Containers on Crux
Apptainer will be supported on Crux at a future date.
Recipe-Based Container Building
As mentioned earlier, you can build Apptainer containers from recipe files. Instructions are available here. See available containers for more recipes.
Note: You can also build custom recipes by bootstrapping from prebuilt images. For example, the first two lines of a recipe that uses our custom TensorFlow implementation would be Bootstrap: oras followed by From: ghcr.io/argonne-lcf/tf2-mpich-nvidia-gpu:latest.
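As a rough illustration of the note above, the following sketch writes a minimal definition file whose first two lines are the Bootstrap and From entries quoted there. The file name tf2-custom.def, the %post contents, and the output name tf2-custom.sif are hypothetical placeholders, and whether --fakeroot is needed depends on the system you build on.

```shell
#!/bin/sh
# Write a minimal Apptainer definition file that bootstraps from the prebuilt image.
# The file name and the %post contents are hypothetical placeholders.
cat > tf2-custom.def <<'EOF'
Bootstrap: oras
From: ghcr.io/argonne-lcf/tf2-mpich-nvidia-gpu:latest

%post
    # Add your own build steps here (hypothetical placeholder).
    echo "customizing image"
EOF

# Show the two header lines that select the base image.
head -n 2 tf2-custom.def

# Then build on a system where Apptainer is available, for example:
#   apptainer build --fakeroot tf2-custom.sif tf2-custom.def
```

The build command is left as a comment so the snippet can be run anywhere to generate the recipe; run the build step separately on a node where Apptainer is loaded.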
Available containers
If you just want to know what containers are available, here you go:
- Examples for running MPICH containers can be found here.
- Examples for running databases can be found here.
- shpc, which allows running containers as modules, can be found here.
The latest containers are updated periodically. If you have trouble using containers, or need a newer or different container, please contact ALCF support at [email protected].
Troubleshooting Common Issues
Permission Denied Error: If you encounter permission errors during the build:
- Check your quota and delete any unnecessary files.
- Clean up the Apptainer cache, ~/.apptainer/cache, and set the Apptainer tmp and cache directories as below. If your home directory is full and you are building your container on a compute node, set the tmpdir and cachedir to local scratch:
export BASE_SCRATCH_DIR=/local/scratch # FOR POLARIS
#export BASE_SCRATCH_DIR=/raid/scratch # FOR SOPHIA
export APPTAINER_TMPDIR="$BASE_SCRATCH_DIR/apptainer-tmpdir"
mkdir -p "$APPTAINER_TMPDIR" # -p: do not fail if the directory already exists
export APPTAINER_CACHEDIR="$BASE_SCRATCH_DIR/apptainer-cachedir"
mkdir -p "$APPTAINER_CACHEDIR"
- Make sure you are not in a directory accessed through a symbolic link, i.e., check that pwd and pwd -P return the same path.
- If none of the above works, try running the build in your home directory.
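The symbolic-link check from the list above can be sketched as a small shell function. The function name check_symlinked_cwd is hypothetical; it only compares the logical path (pwd -L) with the physical path (pwd -P) and does not change directories for you.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the current directory was not
# reached through a symbolic link.
check_symlinked_cwd() {
    logical=$(pwd -L)   # path as typed, symlinks preserved
    physical=$(pwd -P)  # path with all symlinks resolved
    if [ "$logical" = "$physical" ]; then
        echo "OK: not under a symlink ($physical)"
        return 0
    fi
    echo "Symlinked path: $logical resolves to $physical" >&2
    return 1
}
```

If the function reports a symlinked path, one option is to cd to the resolved path it prints before starting the build.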
Mapping to rank 0 on all nodes: Ensure that the container's MPI aligns with the system MPI. For example, follow the additional steps outlined in the container registry documentation for MPI on Polaris.
libmpi.so.40 not found: This can happen if the container's application has an OpenMPI dependency, which is not currently supported on Polaris. It can also appear if the container's base environment is not a Debian-based distribution such as Ubuntu. Ensure the application has an MPICH implementation as well. Also, try removing the .conda/, .cache/, and .local/ folders from your home directory and rebuilding the container.
Disabled port mapping, user namespace, and network virtualization: Network virtualization is disabled for the container due to security constraints. See issue #2533.
Apptainer instance errors with version 1.3.2: Use nohup and & as an alternative if you want to run Apptainer as a background process. See below for an example of running Postgres as a background process: