Known Issues
This is a collection of known issues that have been encountered on Polaris. Documentation will be updated as issues are resolved. Users are encouraged to email [email protected] to report issues.
Submitting Jobs
- For batch job submissions, if the parameters within your submission script do not meet the parameters of any of the execution queues (small, ..., backfill-large), you might not receive the "Job submission" error on the command line at all, and the job will never appear in the history shown by qstat -xu <username> (current bug in PBS). For example, if a user submits a script to the prod routing queue requesting 10 nodes for 24 hours, exceeding the "Time Max" of 6 hrs of the small execution queue (which handles jobs with 10-24 nodes), then it may behave as if the job was never submitted.
- Job scripts are copied to temporary locations after qsub, and any changes to the original script while the job is queued will not be reflected in the copied script. Furthermore, qalter requires -A <allocation name> when changing job properties. Currently, there is a request for a qalter-like command to trigger a re-copy of the original script to the temporary location.
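
Because a rejected submission can fail silently, it may help to confirm that PBS actually registered the job. The sketch below is one possible check, not an official ALCF tool. The function name submit_and_verify is hypothetical, and the sketch assumes that qsub prints the job ID on standard output and that qstat -x <jobid> exits non-zero for a job ID the server does not know about.

```shell
# submit_and_verify SCRIPT
# Hypothetical helper: submit SCRIPT with qsub, then ask PBS whether the
# returned job ID is known (qstat -x also covers finished and moved jobs).
submit_and_verify() {
    job_script="$1"

    # qsub prints the job ID on success; treat a failure or an empty ID as an error.
    job_id=$(qsub "$job_script") || job_id=""
    if [ -z "$job_id" ]; then
        echo "qsub did not return a job ID for $job_script" >&2
        return 1
    fi

    # Assumption: qstat exits non-zero when the job ID is unknown to the server.
    if qstat -x "$job_id" >/dev/null 2>&1; then
        echo "registered: $job_id"
    else
        echo "not found in qstat -x: $job_id (check the queue limits)" >&2
        return 2
    fi
}
```

A call would look like submit_and_verify ./job.sh, where job.sh is a placeholder script name. A job that is still sitting in the routing queue might pass this check and be dropped afterwards, so repeating the qstat step after a short wait is reasonable.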
Compiling & Running Applications
- If your job fails to start with an RPC launch message like the one below, please forward the complete messages to [email protected].
launch failed on x3104c0s1b0n0: Couldn't forward RPC launch(ab751d77-e80a-4c54-b1c2-4e881f7e8c90) to child x3104c0s31b0n0.hsn.cm.polaris.alcf.anl.gov: Resource temporarily unavailable
apptainer. For other commands, please forward the complete message to [email protected] so we are aware of your use case.
ERROR: ld.so: object '/soft/xalt/3.0.2-202408282050/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ssh'ing between Polaris Compute Nodes
- You should be able to ssh freely (without needing a password) between your assigned compute nodes on Polaris. If you are running into ssh issues, check for the following causes:
  - Your /home/<username> directory permissions should be set to 700 (chmod 700 /home/<username>)
  - Confirm the following files exist in your .ssh directory and the permissions are set to the following:
    1. -rw------- (600) authorized_keys
    2. -rw-r--r-- (644) config
    3. -rw------- (600) id_rsa
    4. -rw-r--r-- (644) id_rsa.pub
  - Copy the contents of your .ssh/id_rsa.pub file to .ssh/authorized_keys
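
The checks above can be scripted. The following is a minimal sketch, assuming a Linux system with GNU stat (for stat -c '%a'). The function names check_mode and check_ssh_setup are hypothetical and are not part of any Polaris tooling. The sketch only reports problems and does not change any permissions.

```shell
# check_mode PATH EXPECTED
# Report whether PATH exists and has the expected octal permission mode.
check_mode() {
    path="$1"
    expected="$2"
    if [ ! -e "$path" ]; then
        echo "MISSING $path"
        return 1
    fi
    actual=$(stat -c '%a' "$path")   # GNU stat: print the mode in octal
    if [ "$actual" = "$expected" ]; then
        echo "OK $path ($actual)"
    else
        echo "FIX $path (is $actual, expected $expected)"
        return 1
    fi
}

# check_ssh_setup HOME_DIR
# Run the permission checks listed above against HOME_DIR and HOME_DIR/.ssh.
check_ssh_setup() {
    home_dir="$1"
    result=0
    check_mode "$home_dir" 700 || result=1
    check_mode "$home_dir/.ssh/authorized_keys" 600 || result=1
    check_mode "$home_dir/.ssh/config" 644 || result=1
    check_mode "$home_dir/.ssh/id_rsa" 600 || result=1
    check_mode "$home_dir/.ssh/id_rsa.pub" 644 || result=1

    # The public key should appear as a line in authorized_keys.
    pub="$home_dir/.ssh/id_rsa.pub"
    auth="$home_dir/.ssh/authorized_keys"
    if [ -f "$pub" ] && [ -f "$auth" ]; then
        if grep -qxF -f "$pub" "$auth"; then
            echo "OK public key is listed in authorized_keys"
        else
            echo "FIX public key is not listed in authorized_keys"
            result=1
        fi
    fi
    return $result
}
```

Running check_ssh_setup "$HOME" prints one line per check and returns non-zero if anything is missing or has a different mode.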