Scaling ResNet50¶
Follow all the instructions in Getting Started to log into a Graphcore node.
Examples Repo¶
Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.
Clone the examples repository to your personal directory structure:
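For example (the remainder of this guide assumes the repository lives at ${HOME}/graphcore/examples):

mkdir -p ~/graphcore
cd ~/graphcore
git clone https://github.com/graphcore/examples.git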
Environment Setup¶
Establish a virtual environment.
mkdir -p ~/venvs/graphcore
rm -rf ~/venvs/graphcore/poptorch31_rn50_env
virtualenv ~/venvs/graphcore/poptorch31_rn50_env
source ~/venvs/graphcore/poptorch31_rn50_env/bin/activate
Install PopTorch¶
Install the PopTorch wheel that ships with the Poplar SDK.
export POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.1.0
pip install $POPLAR_SDK_ROOT/poptorch-3.1.0+98660_0a383de63f_ubuntu_20_04-cp38-cp38-linux_x86_64.whl
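Optionally, verify that the wheel installed into the active environment (a quick sanity check, assuming PopTorch's usual version attribute):

python3 -c "import poptorch; print(poptorch.__version__)"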
Environment Variables¶
Establish the following environment variables.
mkdir -p ${HOME}/tmp
export TF_POPLAR_FLAGS=--executable_cache_path=${HOME}/tmp
export POPTORCH_CACHE_DIR=${HOME}/tmp
export POPART_LOG_LEVEL=WARN
export POPLAR_LOG_LEVEL=WARN
export POPLIBS_LOG_LEVEL=WARN
export PYTHONPATH=/software/graphcore/poplar_sdk/3.1.0/poplar-ubuntu_20_04-3.1.0+6824-9c103dc348/python:$PYTHONPATH
Install Requirements¶
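The CNN training example carries its own Python dependencies. A minimal sketch of installing them, assuming the standard layout of the examples repository (check the directory for the exact requirements file name):

cd ${HOME}/graphcore/examples/vision/cnns/pytorch
pip install -r requirements.txt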
One-time per-user SSH key setup¶
Set up the SSH key on gc-poplar-01.
gc-poplar-01¶
On gc-poplar-01:
mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa -b 4096
# Accept the default filename of id_rsa
# Enter passphrase (empty for no passphrase):
# Enter same passphrase again:
cat id_rsa.pub >> authorized_keys
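Next, add each node's host keys to your known_hosts file. The commands below are a sketch assuming ssh-keyscan, whose comment output (one line per probed key type) matches what is shown:

ssh-keyscan -H gc-poplar-01 >> ~/.ssh/known_hosts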
You should see:
# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
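Likewise for gc-poplar-02:

ssh-keyscan -H gc-poplar-02 >> ~/.ssh/known_hosts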
You should see:
# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
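Likewise for gc-poplar-03:

ssh-keyscan -H gc-poplar-03 >> ~/.ssh/known_hosts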
You should see:
# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
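And for gc-poplar-04:

ssh-keyscan -H gc-poplar-04 >> ~/.ssh/known_hosts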
You should see:
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
benchmarks.yml¶
Update ${HOME}/graphcore/examples/vision/cnns/pytorch/train/benchmarks.yml with your favorite editor to match the reference benchmarks.yml.
configs.yml¶
Update ${HOME}/graphcore/examples/vision/cnns/pytorch/train/configs.yml with your favorite editor. At about line 30, change use_bbox_info: true to use_bbox_info: false.
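The edit itself is a one-line change (the comment is illustrative; only the use_bbox_info key comes from the instruction above):

# configs.yml, ResNet50 training config, around line 30
use_bbox_info: false   # was: true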
Scale ResNet50¶
Scale and benchmark ResNet50.
Note: The number at the end of each benchmark name (e.g. _1, _8, _pod16) indicates the number of IPUs.
Note: Use screen because every run is long.
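For example (the session name is arbitrary):

screen -S rn50      # start a named session and launch the run inside it
# Detach with Ctrl-a d; reattach later with: screen -r rn50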
"PopRun exposes this control with the --process-placement flag and provides multiple pre-defined strategies. By default (and with --process-placement spreadnuma), PopRun is designed to be NUMA-aware. On each host, all the available NUMA nodes are divided among the instances. This means that each instance is bound to execute on and allocate memory from its assigned NUMA nodes, ensuring memory access locality. This strategy maximises memory bandwidth and is likely to yield optimal performance for most of the data loading workloads in machine learning." [Multi-Instance Multi-Host(https://docs.graphcore.ai/projects/poprun-user-guide/en/latest/launching.html#multi-instance-multi-host)
Setup¶
Move to the correct directory and establish the datasets directory.
cd ${HOME}/graphcore/examples/vision/cnns/pytorch/train
export DATASETS_DIR=/mnt/localdata/datasets/
Scaling to 16 IPUs¶
One may use any of the following commands to run ResNet50 on one to sixteen IPUs.
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_1
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_2
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_4
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_8
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod16
Scaling to 64 IPUs¶
Note: One must complete the instructions on Multi-node Setup before running this example.
Establish Environment Variables¶
# Derive this host's IP address from interface eno1.
HOST1=$(ifconfig eno1 | grep "inet " | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | head -1)
# Split off the first three octets and the final octet.
OCT123=$(echo "$HOST1" | cut -d "." -f 1,2,3)
OCT4=$(echo "$HOST1" | cut -d "." -f 4)
# The remaining three hosts have consecutive final octets.
HOST2=$OCT123.$(expr $OCT4 + 1)
HOST3=$OCT123.$(expr $OCT4 + 2)
HOST4=$OCT123.$(expr $OCT4 + 3)
export HOSTS=$HOST1,$HOST2,$HOST3,$HOST4
export CLUSTER=c16
export IPUOF_VIPU_API_PARTITION_ID=p64
# /24 is the subnet spanned by the three-octet prefix above.
export TCP_IF_INCLUDE=$OCT123.0/24
export IPUOF_VIPU_API_HOST=$HOST1
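A quick sanity check before launching (the addresses shown are purely illustrative):

echo $HOSTS   # e.g. 10.1.3.101,10.1.3.102,10.1.3.103,10.1.3.104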
64 IPU Run¶
This runs to convergence, occupying all 64 IPUs for more than 12 hours.
Note: Run this only if absolutely required.
Execute one of the following (the pod64_conv variant is the run-to-convergence configuration):
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod64
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod64_conv
Benchmark Results¶
One IPU¶
[INFO] 2022-12-16 17:07:32: Total runtime: 3956.836479 seconds
[INFO] 2022-12-16 17:07:32: throughput = '7527.626315789474'
[INFO] 2022-12-16 17:07:32: accuracy = '57.41'
[INFO] 2022-12-16 17:07:32: loss = '2.8153'
[INFO] 2022-12-16 17:07:33: Total compile time: 429.59 seconds
Two IPUs¶
[INFO] 2022-12-16 15:56:23: Total runtime: 5866.494071 seconds
[INFO] 2022-12-16 15:56:23: throughput = '4798.778947368421'
[INFO] 2022-12-16 15:56:23: accuracy = '68.23'
[INFO] 2022-12-16 15:56:23: loss = '2.3148'
[INFO] 2022-12-16 15:56:24: Total compile time: 418.75 seconds
Four IPUs¶
[INFO] 2022-12-16 04:05:28: Total runtime: 3070.994553 seconds
[INFO] 2022-12-16 04:05:28: throughput = '9959.821052631578'
[INFO] 2022-12-16 04:05:28: accuracy = '67.76'
[INFO] 2022-12-16 04:05:28: loss = '2.338'
[INFO] 2022-12-16 04:05:29: Total compile time: 377.4 seconds
Eight IPUs¶
[INFO] 2022-12-16 02:46:45: Total runtime: 1831.437598 seconds
[INFO] 2022-12-16 02:46:45: throughput = '19865.263157894733'
[INFO] 2022-12-16 02:46:45: accuracy = '64.94'
[INFO] 2022-12-16 02:46:45: loss = '2.4649'
[INFO] 2022-12-16 02:46:46: Total compile time: 386.27 seconds
Sixteen IPUs¶
Epochs: 20
[INFO] 2022-12-15 22:01:14: Total runtime: 1297.274336 seconds
[INFO] 2022-12-15 22:01:14: throughput = '39057.447368421046'
[INFO] 2022-12-15 22:01:14: accuracy = '57.43'
[INFO] 2022-12-15 22:01:14: loss = '2.8162'
[INFO] 2022-12-15 22:01:16: Total compile time: 397.08 seconds