Building Python Packages
To build Python packages for ThetaGPU, there are two options: build on top of a bare-metal build, or build on top of (and within) a Singularity container. Additionally, you can build a new container from NVIDIA's Docker images.
Build on ThetaGPU compute using Conda
To build on ThetaGPU compute and install your own packages, log in to Theta and then submit an interactive job to get onto a ThetaGPU compute node.
Please see Running PyTorch with Conda or Running TensorFlow with Conda for more information.
Building on top of a container
At the moment, you will need two shells to do this: one open on a login node (for example, thetaloginN), and one open on a compute node (thetagpuN). First, start the container in interactive mode:
singularity exec -B /lus:/lus --nv /lus/theta-fs0/projects/datascience/thetaGPU/containers/pytorch_20.08-py3.sif bash
export VENV_LOCATION=/path/to/virtualenv # replace this with your path!
python -m venv --system-site-packages $VENV_LOCATION
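If the venv module is not available, you can try python -m virtualenv instead. If neither is available, you can install virtualenv in your user directory (for example, with pip install --user virtualenv) and it should work.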
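A minimal sketch of creating the environment with a fallback: it tries the standard-library venv module first and then virtualenv. The function name make_venv and the PYTHON override variable are hypothetical helpers introduced here for illustration, not part of the ThetaGPU setup.

```shell
# Hypothetical helper: create a virtual environment at the path given as $1.
# PYTHON is an assumed override for the interpreter name (defaults to "python").
make_venv() {
    location="$1"
    py="${PYTHON:-python}"
    # Prefer the standard-library venv module.
    if "$py" -m venv --system-site-packages "$location" 2>/dev/null; then
        echo "created $location with venv"
    # Fall back to the third-party virtualenv package.
    elif "$py" -m virtualenv --system-site-packages "$location" 2>/dev/null; then
        echo "created $location with virtualenv"
    else
        echo "neither venv nor virtualenv is available for $py" >&2
        return 1
    fi
}
```

Inside the container you might call it as make_venv "$VENV_LOCATION". The --system-site-packages flag keeps the packages that ship with the container visible from inside the environment.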
Next time you log in, you'll have to start the container and then run source $VENV_LOCATION/bin/activate to re-enable your installed packages.
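If you re-activate often, a small guard can catch a wrong or missing path before anything else runs. This is only a sketch; activate_venv is a hypothetical helper name, and it assumes the standard bin/activate layout that venv and virtualenv produce.

```shell
# Hypothetical helper: activate the virtual environment at $1 if it exists.
activate_venv() {
    script="$1/bin/activate"
    if [ -f "$script" ]; then
        # Source the activation script into the current shell.
        . "$script"
    else
        echo "no virtual environment found at $1" >&2
        return 1
    fi
}
```

After starting the container, activate_venv "$VENV_LOCATION" would then either activate the environment or tell you that the path is wrong.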
Reaching the outside world for pip packages
You'll notice right away that when you try to pip install, you cannot, because the connection fails. You can, however, go through a proxy server for pip by setting these variables:
export HTTP_PROXY=http://theta-proxy.tmi.alcf.anl.gov:3128
export HTTPS_PROXY=https://theta-proxy.tmi.alcf.anl.gov:3128
pip install mpi4py
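To avoid retyping the proxy settings in every session, they can be wrapped in a pair of shell functions. This is a sketch: the function names are hypothetical, and mirroring the values into the lowercase variable names is an assumption made here because some tools read only those.

```shell
# Hypothetical helpers for switching the proxy on and off in the current shell.
enable_pip_proxy() {
    # Proxy endpoints exactly as given above.
    export HTTP_PROXY=http://theta-proxy.tmi.alcf.anl.gov:3128
    export HTTPS_PROXY=https://theta-proxy.tmi.alcf.anl.gov:3128
    # Assumption: mirror into lowercase names for tools that read only those.
    export http_proxy="$HTTP_PROXY"
    export https_proxy="$HTTPS_PROXY"
}

disable_pip_proxy() {
    unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
}
```

Calling enable_pip_proxy before pip install mpi4py has the same effect as the two export lines above; disable_pip_proxy restores a clean environment afterwards.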
Building custom packages
Most packages (HDF5, for example, or Python packages) can be built and installed into your virtual environment. Here are two common examples that aren't currently part of the PyTorch container but may be useful.
HDF5
You can find the source code for HDF5 on their website: https://www.hdfgroup.org/downloads/hdf5/source-code. Once it is downloaded and un-tarred, cd to the directory, then configure, build, and install it using your virtual environment as the install prefix.
This should get you HDF5! For example, after this:
(pytorch_20.08) Singularity> which h5cc
/home/cadams/ThetaGPU/venvs/pytorch_20.08/bin/h5cc # This is my virtualenv, success!
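A minimal sketch of such a build, assuming the HDF5 source tree uses the usual Autotools flow (configure, make, make install) and that the virtual environment is the install prefix, which is consistent with h5cc landing in the virtualenv's bin directory above. The function name build_hdf5 and the HDF5_BUILD_JOBS variable are hypothetical, and any extra configure arguments you need are not shown.

```shell
# Hypothetical helper: build HDF5 from the source tree in $1 and install into $2.
build_hdf5() {
    src_dir="$1"
    prefix="$2"
    if [ ! -x "$src_dir/configure" ]; then
        echo "no configure script found in $src_dir" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$src_dir" &&
        ./configure --prefix="$prefix" &&
        # HDF5_BUILD_JOBS is an assumed knob for parallel make; defaults to 4.
        make -j "${HDF5_BUILD_JOBS:-4}" &&
        make install
    )
}
```

With the environment active, build_hdf5 /path/to/hdf5-source "$VENV_LOCATION" would place h5cc and the libraries under the virtual environment, where which h5cc can find them.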
Horovod
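Horovod is useful for distributed training. To use it, you need it enabled within the container, which means installing it there (for example, into your virtual environment).
Installing it from inside the container, with your virtual environment active, should make Horovod available within your container.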
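One possible way to do the install, as a sketch rather than the exact command used for this container: Horovod reads build-time environment variables such as HOROVOD_GPU_OPERATIONS and HOROVOD_WITH_PYTORCH when pip builds it. Which settings are right for this container is an assumption here (NCCL for GPU operations, PyTorch as the framework), and the install_horovod name and DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: install Horovod with pip inside the container.
install_horovod() {
    # Assumption: the container provides NCCL and PyTorch is the target framework.
    export HOROVOD_GPU_OPERATIONS=NCCL
    export HOROVOD_WITH_PYTORCH=1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "pip install --no-cache-dir horovod"
    else
        pip install --no-cache-dir horovod
    fi
}
```

Running it with DRY_RUN=1 set first shows the command without touching the environment. The real install needs the pip proxy settings described earlier, because pip has to download the Horovod source.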