
Containers on Crux

Apptainer will be supported on Crux at a future date.

Recipe-Based Container Building

As mentioned earlier, you can build Apptainer containers from recipe files. Instructions are available here. See available containers for more recipes.

Note: You can also write custom recipes that bootstrap from prebuilt images. For example, a recipe that builds on our custom TensorFlow image would start with the two lines Bootstrap: oras and From: ghcr.io/argonne-lcf/tf2-mpich-nvidia-gpu:latest.
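The note above can be turned into a complete recipe. The following is a minimal sketch, assuming a POSIX shell. It writes a hypothetical recipe file, tf2-custom.def, whose first two lines are the ones given in the note. The %post step (installing an extra package with pip) is an illustrative assumption and does not come from the ALCF image documentation. Because Apptainer is not yet available on Crux, the build runs only where an apptainer command is found. Depending on the system, apptainer build may also need extra options such as --fakeroot.

```shell
# Write a minimal recipe that bootstraps from the prebuilt ALCF TensorFlow image.
# The %post contents are a hypothetical example; replace them with your own steps.
cat > tf2-custom.def <<'EOF'
Bootstrap: oras
From: ghcr.io/argonne-lcf/tf2-mpich-nvidia-gpu:latest

%post
    # Hypothetical extra package installed on top of the base image
    pip install --no-cache-dir tqdm
EOF

# Build the image only where Apptainer is installed (it is not yet on Crux)
if command -v apptainer >/dev/null 2>&1; then
  apptainer build tf2-custom.sif tf2-custom.def
fi
```

Keeping the recipe in its own file lets you version it and rebuild the image later from the same starting point.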

Available containers

If you just want to know what containers are available, here you go:

  • Examples for running MPICH containers can be found here.

  • Examples for running databases can be found here.

  • Instructions for using shpc, which lets you run containers as modules, can be found here.

The latest containers are updated periodically. If you have trouble using containers or would like to request a newer or different container, please contact ALCF support at [email protected].

Troubleshooting Common Issues

Permission Denied Error: If you encounter permission errors during the build:

  • Check your quota and delete any unnecessary files.

  • Clean up the Apptainer cache (~/.apptainer/cache) and set the Apptainer temporary and cache directories as shown below. If your home directory is full and you are building your container on a compute node, point both directories to node-local scratch:

export BASE_SCRATCH_DIR=/local/scratch      # for Polaris
#export BASE_SCRATCH_DIR=/raid/scratch      # for Sophia
export APPTAINER_TMPDIR=$BASE_SCRATCH_DIR/apptainer-tmpdir
mkdir -p "$APPTAINER_TMPDIR"    # -p: no error if the directory already exists
export APPTAINER_CACHEDIR=$BASE_SCRATCH_DIR/apptainer-cachedir
mkdir -p "$APPTAINER_CACHEDIR"
  • Make sure you are not in a directory accessed through a symbolic link, i.e., check that pwd and pwd -P return the same path.

  • If none of the above works, try running the build in your home directory.
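The symbolic-link check and the cache cleanup from the list above can be combined into a small pre-build step. This is a sketch that assumes a POSIX shell. cwd_is_physical is a hypothetical helper name, and apptainer cache clean --force, which empties the cache without prompting, runs only when an apptainer command is on the PATH.

```shell
# Hypothetical helper: succeed only if the current directory was reached
# without symbolic links, i.e. the logical (pwd) and physical (pwd -P) paths agree.
cwd_is_physical() {
  [ "$(pwd)" = "$(pwd -P)" ]
}

if ! cwd_is_physical; then
  echo "Warning: $(pwd) resolves to $(pwd -P); cd to the physical path before building." >&2
fi

# Empty the Apptainer cache without a confirmation prompt
if command -v apptainer >/dev/null 2>&1; then
  apptainer cache clean --force
fi
```

If the warning appears, running cd "$(pwd -P)" moves you to the same directory by its physical path before you start the build.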

Mapping to rank 0 on all nodes: Ensure that the container's MPI aligns with the system MPI. For example, follow the additional steps outlined in the container registry documentation for MPI on Polaris.
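Before you change the MPI setup, you can confirm this symptom by having every process print its rank and checking whether all of them report 0. This is a minimal sketch that assumes a POSIX shell with awk. all_ranks_zero is a hypothetical helper. The commented launch line is also hypothetical: it assumes an image named my_image.sif that contains Python with mpi4py.

```shell
# Hypothetical helper: read one rank number per line (as printed by each MPI
# process) and succeed only if there is at least one line and every line is 0.
all_ranks_zero() {
  awk 'BEGIN { n = 0; bad = 0 }
       { n++; if ($1 != 0) bad = 1 }
       END { exit (n == 0 || bad) }'
}

# Example (hypothetical image; assumes mpi4py inside the container):
# mpiexec -n 4 apptainer exec my_image.sif \
#   python3 -c 'from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())' \
#   | all_ranks_zero && echo "All ranks report 0: container MPI likely not matched to system MPI"
```

With a working MPI setup, the four processes print 0 through 3 and the check fails. When every process prints 0, each process is running as an independent single-rank job.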

libmpi.so.40 not found: This can happen if the container's application depends on Open MPI, which is not currently supported on Polaris. It can also occur if the container's base environment is not a Debian-based distribution such as Ubuntu. Make sure the application is also built against an MPICH implementation. You can also try removing the .conda/, .cache/, and .local/ folders from your home directory and rebuilding the container.
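To see which MPI an application inside the container expects, you can inspect the library names that ldd reports. Recent Open MPI releases (3.x and later) use the soname libmpi.so.40, while MPICH and MPICH-ABI-compatible libraries use libmpi.so.12. The sketch below assumes a POSIX shell. mpi_flavor_from_ldd is a hypothetical helper, and the image name and application path in the comment are placeholders.

```shell
# Hypothetical helper: classify ldd output by the MPI library soname it mentions.
#   libmpi.so.40 -> Open MPI (3.x and later)
#   libmpi.so.12 -> MPICH or an MPICH-ABI-compatible library
mpi_flavor_from_ldd() {
  ldd_out=$(cat)
  if printf '%s\n' "$ldd_out" | grep -q 'libmpi\.so\.40'; then
    echo openmpi
  elif printf '%s\n' "$ldd_out" | grep -q 'libmpi\.so\.12'; then
    echo mpich
  else
    echo unknown
  fi
}

# Example (placeholder image and application path):
# apptainer exec my_image.sif ldd /path/to/app | mpi_flavor_from_ldd
```

If the result is openmpi, the application needs an MPICH build before it can run against the system MPI.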

Disabled port mapping, user namespaces, and network virtualization: Network virtualization is disabled for containers due to security constraints. See issue #2533.

Apptainer instance errors with version 1.3.2: As an alternative to Apptainer instances, use nohup and & to run Apptainer as a background process. The example below runs Postgres this way:

# 1) Make sure the host directories used for the bind mounts exist
mkdir -p pgrun pgdata

# 2) Start Postgres in the background; nohup keeps it running if the shell hangs up
nohup apptainer run \
  -B pgrun:/var/run/postgresql \
  -B pgdata:/var/lib/postgresql/data \
  --env-file pg.env \
  postgres.sing postgres &

# 3) Capture its PID so we can kill it later
echo $! > postgres_pid.txt
echo "Started Postgres in the background with PID $(cat postgres_pid.txt)"

# 4) Perform whatever work you need while Postgres is running.
#    In this demo, we just sleep for 30 minutes (1800 seconds).
sleep 1800

# 5) Kill the background process at the end of the job
kill "$(cat postgres_pid.txt)"
rm postgres_pid.txt
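If the job script exits early, for example because one of its steps fails, the final kill never runs and Postgres keeps running. The following is a minimal sketch, assuming a POSIX shell. It wraps the same nohup/& pattern in two hypothetical helper functions and uses trap ... EXIT so that the background process is stopped however the script ends. The PID file and log file names are illustrative.

```shell
# Hypothetical helpers (not part of Apptainer) for the nohup/& pattern.
PIDFILE="${PIDFILE:-bg_service_pid.txt}"

start_background() {
  # "$@" is the full command, e.g. apptainer run ... postgres.sing postgres
  nohup "$@" > bg_service.log 2>&1 &
  echo $! > "$PIDFILE"
  echo "Started '$1' in the background with PID $(cat "$PIDFILE")"
}

stop_background() {
  # Kill the recorded PID if it is still running, then remove the PID file
  if [ -f "$PIDFILE" ]; then
    kill "$(cat "$PIDFILE")" 2>/dev/null || true
    rm -f "$PIDFILE"
  fi
}

# Stop the background process whenever the script exits, even on failure
trap stop_background EXIT
```

In a job script, you would call start_background apptainer run -B pgrun:/var/run/postgresql -B pgdata:/var/lib/postgresql/data --env-file pg.env postgres.sing postgres and then do your work. You no longer need an explicit kill at the end, because the trap handles cleanup.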