# DeepSpeed
The base conda environment on ThetaGPU comes with Microsoft's DeepSpeed pre-installed. Instructions for using / cloning the base environment can be found here.
We describe below the steps needed to get started with DeepSpeed on ThetaGPU. We focus on the `cifar` example provided in the DeepSpeedExamples repository, though this approach should be generally applicable for running any model with DeepSpeed support.
## Running DeepSpeed on ThetaGPU
**Note:** The instructions below should be run directly from a compute node. Explicitly, to request an interactive job (from `thetalogin`):
```bash
qsub-gpu -A <project> -n 2 -t 01:00 -q full-node \
    --attrs="filesystems=home,grand,eagle,theta-fs0:ssds=required" \
    -I
```
Refer to GPU Node Queue and Policy.
- Load the `conda` module and activate the base environment:
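    A minimal sketch, assuming the default `conda` module on ThetaGPU (the exact module name or version may differ):

    ```bash
    # Load the conda module and activate the pre-built base environment
    # NOTE: the module name/version shown here is an assumption
    module load conda
    conda activate base
    ```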
- Clone microsoft/DeepSpeedExamples and navigate into the directory:
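    For example (the location of the `cifar` example within the repository may differ between DeepSpeedExamples versions):

    ```bash
    git clone https://github.com/microsoft/DeepSpeedExamples.git
    # NOTE: the path to the cifar example is assumed; check the repo layout
    cd DeepSpeedExamples/cifar
    ```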
- Our newer conda environments should come with DeepSpeed pre-installed, but in the event your environment has no `deepspeed`, it can be installed[^2] with `pip`:
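    For example, installing into the active environment:

    ```bash
    python3 -m pip install deepspeed
    ```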
## Launching DeepSpeed
- Get the total number of available GPUs:
    - Count the number of lines in `$COBALT_NODEFILE` (1 host per line)
    - Count the number of GPUs available on the current host
    - `NGPUS = $((${NHOSTS} * ${NGPU_PER_HOST}))`
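    For example (these mirror the commands shown in the troubleshooting snippet at the end of this section):

    ```bash
    export NHOSTS=$(wc -l < "${COBALT_NODEFILE}")   # number of hosts in the job
    export NGPU_PER_HOST=$(nvidia-smi -L | wc -l)   # GPUs on the current host
    export NGPUS=$((NHOSTS * NGPU_PER_HOST))        # total GPUs across all hosts
    ```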
- Launch with `mpirun`[^1]:
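    A sketch of what the launch might look like with OpenMPI's `mpirun`; the exact flags and script arguments shown here are assumptions and may need adjusting for your job:

    ```bash
    # NOTE: flag choices and script arguments are illustrative, not prescriptive
    mpirun -n "${NGPUS}" -npernode "${NGPU_PER_HOST}" \
        --hostfile "${COBALT_NODEFILE}" \
        -x PATH -x LD_LIBRARY_PATH \
        python3 cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json
    ```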
- Create a DeepSpeed-compliant `hostfile`, specifying the hostname and number of GPUs (`slots`) for each of our available workers:
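    For example, a `hostfile` can be generated from `$COBALT_NODEFILE` (the hostnames in the comment below are illustrative):

    ```bash
    # one "<hostname> slots=<num_gpus>" line per worker
    while read -r host; do
        echo "${host} slots=${NGPU_PER_HOST}"
    done < "${COBALT_NODEFILE}" > hostfile
    # hostfile now looks something like:
    #   thetagpu01 slots=8
    #   thetagpu02 slots=8
    ```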
- Create a `.deepspeed_env` file containing the environment variables our workers will need access to:
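    For example (which variables your workers actually need is job-dependent; `PATH` and `LD_LIBRARY_PATH` here are illustrative):

    ```bash
    # each line is of the form KEY=VALUE, as described in the warning below
    echo "PATH=${PATH}" >> .deepspeed_env
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" >> .deepspeed_env
    ```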
**Warning:** The `.deepspeed_env` file expects each line to be of the form `KEY=VALUE`. Each of these will then be set as an environment variable on each available worker specified in our `hostfile`.
We can then run the `cifar10_deepspeed.py` module using DeepSpeed:
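A sketch using the `deepspeed` launcher with the `hostfile` created above (the `--deepspeed` / `--deepspeed_config` script arguments are assumed from the DeepSpeedExamples defaults):

```bash
# NOTE: script arguments are assumptions; adjust for your setup
deepspeed --hostfile=hostfile cifar10_deepspeed.py \
    --deepspeed --deepspeed_config ds_config.json
```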
Depending on the details of your specific job, it may be necessary to modify the provided `ds_config.json`.

If you encounter the error:

```
AssertionError: Micro batch size per gpu: 0 has to be greater than 0
```

you can modify the `"train_batch_size": 16` variable in the provided `ds_config.json` to the (total) number of available GPUs, and explicitly set `"gradient_accumulation_steps": 1`, as shown below.
```bash
$ export NHOSTS=$(wc -l < "${COBALT_NODEFILE}")
$ export NGPU_PER_HOST=$(nvidia-smi -L | wc -l)
$ export NGPUS="$((${NHOSTS}*${NGPU_PER_HOST}))"
$ echo $NHOSTS $NGPU_PER_HOST $NGPUS
2 8 16
$ # replace the value of "train_batch_size" with $NGPUS in ds_config.json
$ # and write the result to ds_config-polaris.json
$ sed \
    "s/$(cat ds_config.json | grep batch | cut -d ':' -f 2)/ ${NGPUS},/" \
    ds_config.json \
    > ds_config-polaris.json
$ cat ds_config-polaris.json
{
    "train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    ...
}
```
[^1]: The flag `-x ENVIRONMENT_VARIABLE` ensures the `$ENVIRONMENT_VARIABLE` will be set in the launched processes.

[^2]: Additional details for installing DeepSpeed can be found in their docs: Installation Details