Getting Started on Aurora
Overview
*** ACCESS IS CURRENTLY ENABLED FOR ESP and ECP TEAMS ONLY ***
How to Get Access to Aurora (for New Users)
If You Already Have Access to Sunspot
If you already have access to Sunspot, all you need to do to gain access to Aurora is send an email to [email protected] requesting access to Aurora. In your email, include
- Your ALCF username
- Your institutional email address
- The ESP or ECP project in which you are a member
For Aurora Early Science Program (ESP) Team Members
If you have never had access to Sunspot, here are the steps to gain access to Aurora:
- Verify that your institution has signed a CNDA with Intel that covers you.
- If you do not have an active ALCF account, request one using the ALCF Account request webpage. When you come to the part about joining a project, request the ProjectName_aesp_CNDA project.
- Acknowledge the Intel Terms of Use agreement (TOU) for the Aurora Software Development Kit (SDK) by submitting this form.
Getting a new ALCF account typically takes anywhere from a few days to a few weeks (processing new access for foreign nationals is what can take weeks). After you acknowledge the TOU, there is a manual step that typically takes a few days. You will receive an email notifying you when Aurora access is granted, including some getting started instructions.
For Aurora Exascale Computing Project (ECP) Team Members
See this page for instructions.
Caveats About Using Aurora and Reporting Findings
NOTE: Sharing of any results from Aurora publicly no longer requires a review or approval from Intel. However, anyone publishing these results should include the following in their materials:
"This work was done on a pre-production supercomputer with early versions of the Aurora software development kit."
In addition, users should acknowledge the ALCF. Refer to the acknowledgement policy page for details. Please note that certain information on Aurora hardware and software is considered NDA and cannot be shared publicly.
Aurora is in the very early stages of the system deployment - do not expect a production environment!
Expect to experience:
- Hardware instabilities - possible frequent downtime
- Software instabilities - non-optimized compilers, libraries and tools; frequent software updates
- Non-final configurations (storage, OS versions, etc...)
- Short notice for downtimes (scheduled downtimes will be with 4 hr notice, but sometimes downtimes may occur with just an email notice). Notices go to the aurora-notify@alcf.anl.gov email list. All users with access are added to the list initially.
Getting Help
Email ALCF support at support@alcf.anl.gov for bugs, technical questions, software requests, reservations, priority boosts, etc...
- ALCF's user support team will triage and forward the tickets to the appropriate technical SME as needed.
- Expect turnaround times to be slower than on a production system as the technical team will be focused on stabilizing and debugging the system.
For faster assistance, consider contacting your project's POC at ALCF (project catalyst or liaison)
- They are an excellent source of assistance during this early period and will be aware of common bugs and known issues.
ECP and ESP users will be added to a CNDA Slack workspace, where CNDA discussions may occur. An invite to the Slack workspace will be sent when a user is added to the Aurora resource.
Known Issues
See this page for known issues.
A known issues page can be found in the JLSE Wiki space used for NDA content. Note that this page requires a JLSE Aurora early hw/sw resource account for access. See page for other known issues.
Allocation usage
The allocation accounting system sbank is not yet installed on Aurora.
To obtain the usage information for all your projects on Aurora, issue the sbank command on another ALCF resource where sbank is installed, such as Polaris.
For more information, see this page.
Transition to Aurora from Sunspot
Some guidance is provided here to aid users in the process of moving their work from the Sunspot Test & Development System.
Logging Into Aurora
Logging into Aurora is a two-stage process. You must first log in through the bastion node.
Then, type in the one-time password from your CRYPTOCard/MobilePASS+ token. This bastion node is a pass-through erected for security purposes and is not meant to host files. Once on the bastion, SSH to login.aurora.alcf.anl.gov. This address resolves round-robin to the Aurora login nodes.
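The two hops can also be collapsed into a single command with OpenSSH's ProxyJump option (-J). The sketch below is not taken from this page: the function name aurora_ssh is hypothetical, and it assumes the bastion host is bastion.alcf.anl.gov (the name used in the ProxyJump example further down) and that it accepts jump connections for Aurora logins.

```shell
# Hypothetical helper; pass your ALCF username as the first argument.
aurora_ssh() {
    user="${1:?usage: aurora_ssh <alcf_username>}"
    # -J (ProxyJump) connects to the bastion first, then on to a login node.
    # Expect the one-time password prompt from the bastion.
    ssh -J "${user}@bastion.alcf.anl.gov" "${user}@login.aurora.alcf.anl.gov"
}
```

Typing aurora_ssh followed by your username on your local machine should then land on one of the Aurora login nodes, if the assumptions above hold.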
Proxies for outbound connections: Git, ssh, etc...
The Aurora login nodes don't currently have outbound network connectivity enabled by default. Setting the following environment variables will provide access to the proxy host. This is necessary, for example, to clone remote git repos.
# proxy settings
export HTTP_PROXY="http://proxy.alcf.anl.gov:3128"
export HTTPS_PROXY="http://proxy.alcf.anl.gov:3128"
export http_proxy="http://proxy.alcf.anl.gov:3128"
export https_proxy="http://proxy.alcf.anl.gov:3128"
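For convenience, the four exports above can be wrapped in a pair of shell functions so the proxy can be switched on and off within a session. This is only a sketch: the names ALCF_PROXY, proxy_on, and proxy_off are hypothetical, and the proxy URL is the one given above.

```shell
# Hypothetical helper names; the proxy URL is the one listed above.
ALCF_PROXY="http://proxy.alcf.anl.gov:3128"

proxy_on() {
    # Set both upper- and lower-case forms, since tools differ in which they read.
    export HTTP_PROXY="$ALCF_PROXY" HTTPS_PROXY="$ALCF_PROXY"
    export http_proxy="$ALCF_PROXY" https_proxy="$ALCF_PROXY"
}

proxy_off() {
    unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
}

proxy_on
```

Placing the definitions in a shell startup file and calling proxy_on only when needed keeps the proxy from affecting connections that should stay inside the ALCF network.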
SSH to other machines
To ssh to another machine from an Aurora login node, it can be helpful to add a ProxyJump through the bastion in your .ssh/config file. The first password prompt will be for the bastion, followed by a prompt for the remote machine.
$ cat .ssh/config
Host my.awesome.machine.edu
    ProxyJump bastion.alcf.anl.gov
$ ssh [email protected]
Additional guidance on scp and transferring files to Aurora is available here.
Working with Git repos
The default SSH port is currently blocked on Aurora; by default, this prevents communication with Git remotes that use SSH URLs.
As a workaround for GitHub, GitLab, and Bitbucket, the following can be added to your ~/.ssh/config file. This requires updating your environment with the above proxy settings.
Host github.com
    User git
    hostname ssh.github.com

Host gitlab.com
    User git
    hostname altssh.gitlab.com

Host bitbucket.org
    User git
    hostname altssh.bitbucket.org

Host github.com gitlab.com bitbucket.org
    Port 443
    ProxyCommand /usr/bin/socat - PROXY:proxy.alcf.anl.gov:%h:%p,proxyport=3128
If you need to use something besides your default SSH key on Aurora for authentication to GitHub in conjunction with the above SSH workaround, you may set
where specialGitKey is the name of the private key in your .ssh directory, for which you have uploaded the public key to GitHub. The -F option can be used to specify a different SSH config file if needed; for example, -F none will completely ignore your config file, including the above workaround.
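The exact setting is not shown above. One common way to select a non-default key, assuming Git's standard GIT_SSH_COMMAND environment variable is what is meant, is sketched below; specialGitKey is the placeholder key name from the text.

```shell
# Assumption: GIT_SSH_COMMAND (a standard Git environment variable) is the
# setting referred to above. specialGitKey is a placeholder key file name.
export GIT_SSH_COMMAND="ssh -i ${HOME}/.ssh/specialGitKey"

# Git runs this ssh command for SSH remotes for the rest of the shell session,
# for example on a later `git fetch` from a GitHub SSH remote.
echo "GIT_SSH_COMMAND=${GIT_SSH_COMMAND}"
```

Because no -F option is given here, ssh still reads your default config file, so the port 443 workaround above remains in effect.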
Hardware Overview
An overview of the Aurora system including details on the compute node architecture is available on the Machine Overview page.
File Systems and DAOS
Home and Project Directories
Home directories on Aurora are /home/username, available on login and compute nodes. This is provided from /lus/gecko/home. The default quota is 50 GB. Note that the bastions have a different /home, with a default quota of 500 MB.
Lustre project directories are under /lus/flare/projects. ALCF staff should use the /lus/flare/projects/Aurora_deployment project directory. ESP and ECP project members should use their corresponding project directories. The project name is similar to the name on Polaris with a _CNDA suffix (e.g., projectA_aesp_CNDA, CSC250ADABC_CNDA). The default quota is 1 TB. The project PI should email [email protected] if their project requires additional storage.
Note: The project Lustre file system has changed from Gecko to Flare. Project data from /lus/gecko/projects/* has been copied over to /lus/flare/projects/*. /lus/gecko/projects is only available on the User Access Nodes (UANs).
DAOS
The primary storage system on Aurora is not a file system, but rather an object store called the Distributed Asynchronous Object Store. This is a key-array based system embedded directly in the Slingshot fabric, which provides much faster I/O than conventional block-based parallel file systems such as Lustre (even those using non-spinning disk and/or burst buffers). Project PIs will have requested a storage pool on DAOS via INCITE/ALCC/DD allocation proposals.
Preproduction ESP and ECP Aurora project PIs should email [email protected] to request DAOS storage, providing the following information:
- Project name (e.g. FOO_aesp_CNDA)
- Storage capacity (For ESP projects, if this is different than in the ESP proposal, please give brief justification)
See DAOS Overview for more on using DAOS for I/O.
Software Environment
The Aurora Programming Environment (Aurora PE) provides the oneAPI SDK, MPICH, runtime libraries, and a suite of additional tools and libraries. The Aurora PE is available in the default environment and is accessible through modules. For example, tools and libraries like cmake, boost, and hdf5 are available in the default environment.
Additional software is installed in /soft and can be accessed by adding /soft/modulefiles to the module search path. This makes additional modules, such as kokkos, available.
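A minimal sketch of adding /soft/modulefiles to the search path, assuming the standard module use subcommand provided by Environment Modules and Lmod. The guard is not part of the Aurora instructions; it only keeps the snippet harmless on machines that have no module command.

```shell
if command -v module >/dev/null 2>&1; then
    # Prepend /soft/modulefiles to the module search path (MODULEPATH).
    module use /soft/modulefiles
    # Check that the additional modules are now visible, using kokkos as the example.
    module avail kokkos
else
    echo "module command not found; run this on an Aurora login or compute node" >&2
fi
```

The change lasts only for the current shell session, so it needs to be repeated in job scripts or added to a shell startup file.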
Compiling Applications
Users are encouraged to read through the Compiling and Linking Overview page and corresponding pages depending on the target compiler and programming model.
Autotools and cmake are available in the default Aurora PE environment and can be loaded via modules.
Python on Aurora
Frameworks on Aurora can be loaded into a user's environment by loading the frameworks module. The conda environment loaded with this module makes available TensorFlow, Horovod, and PyTorch with Intel extensions and optimizations.
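Loading the module and confirming that the conda environment is active might look like the following sketch. The module name frameworks comes from the text above; the PyTorch version check is just one illustrative sanity test, and the guard only keeps the snippet harmless on machines without a module command.

```shell
if command -v module >/dev/null 2>&1; then
    module load frameworks
    # Show which python is now first on PATH, then confirm that PyTorch imports.
    command -v python
    python -c 'import torch; print(torch.__version__)'
else
    echo "module command not found; run this on an Aurora login or compute node" >&2
fi
```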
Note that there is a separate Python installation in spack-pe-gcc, which is used as a dependency of a number of Spack PE packages. Users will need to exercise caution when loading both frameworks and python from the Spack PE.
Submitting and Running Jobs
Aurora uses the PBSPro job scheduler system. For Aurora-specific job documentation, refer to Running Jobs on Aurora.
Getting Assistance
Please direct all questions, requests, and feedback to [email protected].