# DeepSpeed
The base `frameworks` environment on Aurora does not come with Microsoft's DeepSpeed pre-installed, so it must be installed by the user. Further instructions for working with the base environment can be found here.
We describe below the steps needed to get started with DeepSpeed on Aurora. We focus on the `cifar` example provided in the DeepSpeedExamples repository, though this approach should be generally applicable for running any model with DeepSpeed support.
## Running DeepSpeed on Aurora
Note

The instructions below should be run directly from a compute node. Explicitly, to request an interactive job (from `uan-00xx`):
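A minimal sketch of such a request (queue name, node count, and walltime are illustrative; `<project>` is your project allocation):

```bash
# Request a 2-node interactive job in the debug queue
qsub -q debug -A <project> -l select=2 -l walltime=01:00:00 -I
```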
Refer to job scheduling and execution for additional information.
- Load the `frameworks` module:
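    For example (check `module avail frameworks` for the currently available versions):

    ```bash
    module load frameworks
    ```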
- Create a (new) virtual environment:
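    One possible sketch, using `venv` with access to the base environment's packages (the target path is illustrative):

    ```bash
    python -m venv --system-site-packages venvs/deepspeed
    source venvs/deepspeed/bin/activate
    ```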
- Install DeepSpeed:
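    The generic install from PyPI (Aurora's Intel GPUs may additionally require Intel-specific packages; that is beyond this sketch):

    ```bash
    python -m pip install deepspeed
    ```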
- Clone microsoft/DeepSpeedExamples and navigate into the directory:
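    For example (the cifar example has moved within the repository over time; `training/cifar` is its location at the time of writing):

    ```bash
    git clone https://github.com/microsoft/DeepSpeedExamples.git
    cd DeepSpeedExamples/training/cifar
    ```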
## Launching DeepSpeed
In both examples, the `train_batch_size` variable needs to be modified from 16 to 12 in the DeepSpeed config embedded in the `get_ds_config()` function of `cifar10_deepspeed.py`. This is because the default of 16 is not evenly divisible by the 12 ranks per node we are launching with. DeepSpeed features can be further modified in the DeepSpeed config, and the full feature set is described in the DeepSpeed documentation.
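The constraint DeepSpeed enforces here is that `train_batch_size` equal the per-GPU micro batch size times `gradient_accumulation_steps` times the number of ranks. A quick shell sanity check (variable names are ours, not DeepSpeed's):

```bash
# DeepSpeed requires:
#   train_batch_size == micro_batch_per_gpu * gradient_accumulation_steps * world_size
WORLD_SIZE=12          # ranks per node in this example
GRAD_ACCUM_STEPS=1
TRAIN_BATCH_SIZE=12    # the default of 16 fails this check with 12 ranks
if (( TRAIN_BATCH_SIZE % (WORLD_SIZE * GRAD_ACCUM_STEPS) != 0 )); then
    echo "invalid: micro batch per GPU would not be a positive integer"
fi
```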
- Get the total number of available GPUs:
    - Count the number of lines in `$PBS_NODEFILE` (one host per line)
    - Count the number of GPUs available on the current host
    - `NGPUS="$((${NHOSTS}*${NGPU_PER_HOST}))"`
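    Putting these together (there is no `nvidia-smi` on Aurora's Intel GPUs; we assume 12 ranks per host, since each node has 6 GPUs exposed as 2 tiles apiece):

    ```bash
    NHOSTS=$(wc -l < "${PBS_NODEFILE}")
    NGPU_PER_HOST=12   # assumption: 6 Intel GPUs x 2 tiles per Aurora node
    NGPUS="$(( NHOSTS * NGPU_PER_HOST ))"
    echo "${NHOSTS} hosts x ${NGPU_PER_HOST} ranks/host = ${NGPUS} total"
    ```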
- Launch with `mpiexec`:
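    A sketch of the launch command (flags follow the `mpiexec` conventions used on ALCF systems):

    ```bash
    mpiexec \
        --verbose \
        --envall \
        -n "${NGPUS}" \
        --ppn "${NGPU_PER_HOST}" \
        --hostfile="${PBS_NODEFILE}" \
        python cifar10_deepspeed.py
    ```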
- Create a DeepSpeed-compliant `hostfile`, specifying the `hostname` and number of GPUs (`slots`) for each of our available workers (more info here):
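    One way to build it from `$PBS_NODEFILE` (again assuming 12 slots per Aurora node):

    ```bash
    # One "<hostname> slots=<n>" line per worker node
    cat "${PBS_NODEFILE}" > hostfile
    sed -i 's/$/ slots=12/' hostfile
    ```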
- Create a `.deepspeed_env` file (more info here) containing the environment variables our workers will need access to:
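    For example (which variables are needed is job-specific; these are common ones):

    ```bash
    echo "PATH=${PATH}" >> .deepspeed_env
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" >> .deepspeed_env
    echo "http_proxy=${http_proxy}" >> .deepspeed_env
    echo "https_proxy=${https_proxy}" >> .deepspeed_env
    ```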
Warning

The `.deepspeed_env` file expects each line to be of the form `KEY=VALUE`. Each of these will then be set as environment variables on each available worker specified in our `hostfile`.
We can then run the `cifar10_deepspeed.py` module using DeepSpeed:
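A sketch of the launch via the `deepspeed` launcher, which reads our `hostfile` (the `--deepspeed` and `--deepspeed_config` arguments are the ones accepted by the example script):

```bash
deepspeed --hostfile=hostfile cifar10_deepspeed.py \
    --deepspeed \
    --deepspeed_config ds_config.json
```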
Depending on the details of your specific job, it may be necessary to modify the provided `ds_config.json`.

If you encounter an error:

```
AssertionError: Micro batch size per gpu: 0 has to be greater than 0
```

you can modify the `"train_batch_size": 16` variable in the provided `ds_config.json` to the (total) number of available GPUs, and explicitly set `"gradient_accumulation_steps": 1`, as shown below.
```bash
$ export NHOSTS=$(wc -l < "${PBS_NODEFILE}")
$ # Aurora's Intel GPUs have no nvidia-smi; assume 12 ranks per node (6 GPUs x 2 tiles)
$ export NGPU_PER_HOST=12
$ export NGPUS="$((${NHOSTS}*${NGPU_PER_HOST}))"
$ echo $NHOSTS $NGPU_PER_HOST $NGPUS
24 12 288
$ # replace the "train_batch_size" value with $NGPUS in ds_config.json
$ # and write the result to `ds_config-aurora.json`
$ sed \
    "s/$(cat ds_config.json | grep batch | cut -d ':' -f 2)/ ${NGPUS},/" \
    ds_config.json \
    > ds_config-aurora.json
$ cat ds_config-aurora.json
{
    "train_batch_size": 288,
    "gradient_accumulation_steps": 1,
    ...
}
```