Getting Started on Aurora¶

Logging Into Aurora:¶

To log into Aurora:

1	`ssh <username>@aurora.alcf.anl.gov`

Then, type in the password from your CRYPTOCard/MobilePASS+ token.

Hardware Overview¶

An overview of the Aurora system, including details on the compute node architecture, is available on the Machine Overview page.

Compiling Applications¶

Users are encouraged to read through the Compiling and Linking Overview page and corresponding pages depending on the target compiler and programming model.

Autotools and CMake are available in the default Aurora Programming Environment (PE) and can be loaded via Lmod modules:

module load autoconf cmake

Submitting and Running Jobs¶

Users are encouraged to read through the Running Jobs with PBS at the ALCF page for information on using the PBS scheduler and preparing job submission scripts. For Aurora-specific job documentation, refer to Running Jobs on Aurora.

Early User Notes and Known Issues¶

Hardware instabilities - possible frequent downtime
Software instabilities - non-optimized compilers, libraries, and tools; frequent software updates
Non-final configurations (storage, OS versions, etc.)
Short notice for downtimes (scheduled downtimes will be with 4-hour notice, but sometimes downtimes may occur with just an email notice). Notices go to the aurora-notify@alcf.anl.gov email list. All users with access are added to the list initially.

See Early User Notes and Known Issues for details.

Python on Aurora¶

Frameworks on Aurora can be loaded into a user's environment by loading the frameworks module as follows. The conda environment loaded with this module makes available TensorFlow, Horovod, and PyTorch with Intel extensions and optimizations.

1	`module load frameworks`

Note that there is a separate Python installation in spack-pe-gcc which is used as a dependency of a number of Spack PE packages. Users will need to exercise caution when loading both frameworks and python from the Spack PE. For more details about Python on Aurora, please review Python on Aurora.

Software Environment¶

The Aurora Programming Environment (Aurora PE) provides the OneAPI SDK, MPICH, runtime libraries, and a suite of additional tools and libraries. The Aurora PE is available in the default environment and is accessible through modules. For example, tools and libraries like cmake, boost, and hdf5 are available in the default environment.

1	`module load cmake`

More details are on the Aurora PE page.

Additional software is installed in /soft and can be accessed by adding /soft/modulefiles to the module search path.

1	`module use /soft/modulefiles`

This will make available a handful of additional software modules, such as kokkos.

Proxy¶

If the node you are on doesn’t have outbound network connectivity, add the following to your ~/.bash_profile file to access the proxy host:

# proxy settings
export HTTP_PROXY="http://proxy.alcf.anl.gov:3128"
export HTTPS_PROXY="http://proxy.alcf.anl.gov:3128"
export http_proxy="http://proxy.alcf.anl.gov:3128"
export https_proxy="http://proxy.alcf.anl.gov:3128"
export ftp_proxy="http://proxy.alcf.anl.gov:3128"
export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov"

File Systems and DAOS¶

Home and Project Directories¶

Home directories on Aurora are /home/username, available on login and compute nodes. This is provided from /lus/gecko/home. The default quota is 50 GB.

Lustre project directories are under /lus/flare/projects. Default quota is 1 TB. The project PI should email [email protected] if their project requires additional storage.

DAOS¶

The primary storage system on Aurora is not a file system, but rather an object store called the Distributed Asynchronous Object Store. This is a key-array based system embedded directly in the Slingshot fabric, which provides much faster I/O than conventional block-based parallel file systems such as Lustre (even those using non-spinning disk and/or burst buffers). Project PIs will have requested a storage pool on DAOS via INCITE/ALCC/DD allocation proposals.

Aurora project PIs should email [email protected] to request DAOS storage with the following information:

Project name
Storage capacity (if this is different than in the current project proposal, please give brief justification)
Who should be the designated owner of the pool (please provide their ALCF username). The owner can be the project PI or any other team member, but they need to have an active ALCF account.
Have you used DAOS in the past? Is yes, please provide details.

See DAOS Overview for more on using DAOS for I/O.

Lustre File Striping¶

In addition to the content above, here is a document on Lustre File Striping Basics:

Lustre File Striping Basics

Getting Assistance¶

Please direct all questions, requests, and feedback to [email protected].