Building Python Packages
To build Python packages for ThetaGPU, there are two options: build on top of a bare-metal build, or build on top of (and within) a Singularity container. Additionally, you can build a new container from NVIDIA's Docker images.
Build on ThetaGPU compute using Conda
To build on ThetaGPU compute and install your own packages, log in to Theta and then submit an interactive job to get a shell on a ThetaGPU compute node.
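As a rough sketch of that job submission, assuming the Cobalt `qsub` command is available on the login node: the queue name and walltime defaults below are assumptions rather than values from this guide, and `start_interactive_job` is a hypothetical helper name.

```shell
# Sketch only: request an interactive session on one ThetaGPU node.
# Assumes Cobalt's qsub is on your PATH on the Theta login node.
# start_interactive_job is a hypothetical helper, not a site-provided tool.
start_interactive_job() {
    local project="$1"              # your allocation name (required)
    local queue="${2:-single-gpu}"  # assumed queue name; check which queues you can use
    local minutes="${3:-60}"        # requested walltime in minutes

    # -I: interactive, -n: node count, -t: walltime, -q: queue, -A: project
    qsub -I -n 1 -t "$minutes" -q "$queue" -A "$project"
}
```

On a login node, calling `start_interactive_job MyProject` (with your own project name in place of the placeholder) should give you a shell on a compute node once the job starts.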
Building on top of a container
At the moment, you will need two shells to do this: one open on a login node (for example, thetaloginN) and one open on a compute node (thetagpuN). First, start the container in interactive mode:
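A minimal sketch of starting the container, assuming `singularity` is on your PATH on the compute node. The image path is left for you to supply, and `start_container` is a hypothetical helper name.

```shell
# Sketch only: open an interactive shell inside a Singularity image.
# start_container is a hypothetical helper; the image path is yours to supply.
start_container() {
    local image="$1"   # path to the container image file (placeholder)

    # --nv exposes the host's NVIDIA driver and GPUs inside the container.
    singularity exec --nv "$image" bash
}
```

Depending on where your files live, you may also need to bind-mount a file system into the container with Singularity's `-B` option. Which paths to bind is site-specific and not covered here.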
Inside the container, create a virtual environment with either the virtualenv command or python -m virtualenv. If neither is available, you can install virtualenv in your user directory:
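One way to do this is sketched below, assuming `python` inside the container has `pip` available and that the proxy settings described later in this guide are in place if the node cannot reach the network directly. `create_venv` is a hypothetical helper name.

```shell
# Sketch only: install virtualenv for the current user, then create an
# environment that can still see the packages shipped in the container.
# create_venv is a hypothetical helper, not part of the container.
create_venv() {
    local venv_location="$1"   # directory where the environment is created

    # --user installs into your home directory, so no root access is needed.
    python -m pip install --user virtualenv &&
    # --system-site-packages keeps the container's own packages importable.
    python -m virtualenv --system-site-packages "$venv_location"
}
```

For example, set `VENV_LOCATION` to a directory you own, such as `$HOME/my_venv` (a placeholder), and call `create_venv "$VENV_LOCATION"`.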
Next time you log in, you'll have to start the container and then run source $VENV_LOCATION/bin/activate to re-enable your installed packages.
Reaching the outside world for pip packages
When you try to pip install, you'll notice right away that it fails because the connection cannot be established. You can, however, route pip through a proxy server by setting these variables:
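The variables in question are the standard proxy environment variables that pip honors. The sketch below uses a placeholder address (`proxy.example.com:3128` is not the real ThetaGPU proxy), so substitute the host and port published for the system.

```shell
# Placeholder proxy address: replace it with the proxy host and port
# published for ThetaGPU. This value is NOT taken from this guide.
PROXY_URL="http://proxy.example.com:3128"

# pip and most command-line tools read these standard variables.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"
export HTTP_PROXY="$PROXY_URL"
export HTTPS_PROXY="$PROXY_URL"
```

With the variables exported in the shell where the virtual environment is active, pip can reach the package index: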
pip install mpi4py
Building custom packages
Most packages (HDF5, for example, or Python packages) can be built and installed into your virtual environment. Here are two common examples that aren't currently part of the PyTorch container but may be useful.
You can find the source code for HDF5 on their website: https://www.hdfgroup.org/downloads/hdf5/source-code. Once it is downloaded and un-tarred, cd to the directory and run:
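A sketch of a typical build, assuming the HDF5 release you downloaded ships a `configure` script and that `VENV_LOCATION` points at your virtual environment. `build_hdf5` is a hypothetical helper name, and the parallel job count is an arbitrary choice.

```shell
# Sketch only: configure, build, and install HDF5 into the virtual environment.
# build_hdf5 is a hypothetical helper; pass it the un-tarred source directory.
build_hdf5() {
    local src_dir="$1"
    (
        # The subshell keeps your current directory unchanged afterwards.
        cd "$src_dir" || exit 1
        # Installing under $VENV_LOCATION avoids needing system-wide write access.
        ./configure --prefix="${VENV_LOCATION:?set VENV_LOCATION first}" &&
        make -j 8 &&      # 8 parallel jobs is an arbitrary choice
        make install
    )
}
```

After `make install`, the libraries and headers should sit under `$VENV_LOCATION/lib` and `$VENV_LOCATION/include`.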
Horovod is useful for distributed training. To use it, you need it enabled within the container.
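If Horovod is not already enabled, one option is to build it into your virtual environment with pip. The sketch below assumes NCCL and an MPI installation are present in the container and that the proxy variables are set. `install_horovod` is a hypothetical helper name, and the build settings are a common choice rather than ones confirmed by this guide.

```shell
# Sketch only: build Horovod with PyTorch support and NCCL for GPU collectives.
# install_horovod is a hypothetical helper; run it inside the container with
# your virtual environment activated.
install_horovod() {
    # HOROVOD_WITH_PYTORCH=1 makes the build fail if PyTorch support cannot be built.
    # HOROVOD_GPU_OPERATIONS=NCCL selects NCCL for GPU allreduce and related ops.
    # --no-cache-dir forces a fresh build instead of reusing a cached wheel.
    HOROVOD_WITH_PYTORCH=1 HOROVOD_GPU_OPERATIONS=NCCL \
        pip install --no-cache-dir horovod
}
```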