Argonne Leadership Computing Facility

Steps to Run a Model/Program

Note: Please be mindful of how you are using the system. For example, consider running larger jobs in the evening or on weekends.

Running any model or application involves compiling the model into a graph that is then deployed on the IPUs. The steps below train a neural network for classification on the MNIST dataset using PopTorch (the PyTorch framework optimized for the IPU).

Examples Repo

Graphcore provides examples of some well-known AI applications in their repository at

Clone the examples repository into your personal directory structure and check out the v3.1.0 release:

mkdir ~/graphcore
cd ~/graphcore
git clone
cd examples
git checkout v3.1.0


Activate PopTorch Environment

Follow the steps in PopTorch environment setup to enable the Poplar SDK.

source ~/venvs/graphcore/poptorch31_env/bin/activate

Install Requirements

Change directory and install packages specific to the MNIST model:

cd ~/graphcore/examples/tutorials/simple_applications/pytorch/mnist
python -m pip install torchvision==0.14.0


Execute the command:

/opt/slurm/bin/srun --ipus=1 python

All models are run through Slurm; the --ipus flag specifies how many IPUs to allocate for the model being run. This example uses a batch size of 8 and runs for 10 epochs. It also sets the device iterations to 50, which is the number of iterations the device runs over the data before returning to the user. The dataset is taken from TorchVision, and the PopTorch dataloader loads the data required for the 50 device iterations from the host to the device in a single step.
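The relationship between batch size and device iterations can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical (they are not part of PopTorch), it assumes no replication or gradient accumulation, and it assumes the standard MNIST training split of 60,000 images with any incomplete final step dropped.

```python
# Hypothetical helpers for illustration; not part of the PopTorch API.

def samples_per_host_step(batch_size: int, device_iterations: int) -> int:
    """Samples moved from host to device in one dataloader step.

    Assumes a single replica and no gradient accumulation.
    """
    return batch_size * device_iterations


def host_steps_per_epoch(dataset_size: int, batch_size: int,
                         device_iterations: int) -> int:
    """Number of host-to-device steps per epoch (incomplete last step dropped)."""
    return dataset_size // samples_per_host_step(batch_size, device_iterations)


MNIST_TRAIN_SIZE = 60_000  # size of the standard MNIST training split

per_step = samples_per_host_step(batch_size=8, device_iterations=50)
steps = host_steps_per_epoch(MNIST_TRAIN_SIZE, batch_size=8, device_iterations=50)
print(f"samples per host step: {per_step}")
print(f"host steps per epoch:  {steps}")
```

Under these assumptions, each step hands the device 400 samples, so an epoch takes 150 host-to-device transfers rather than 7,500 transfers of 8 samples each.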

The model used here is a simple CNN with a classifier (softmax layer) output. The PyTorch model is turned into a PopTorch model in two steps: poptorch.Options() holds the IPU execution settings, and poptorch.trainingModel is the function that wraps the PyTorch model. The first call to the wrapped model compiles it for the IPU. You can observe the compilation progress in the output of the command above.

Graph compilation:   3%|▎         | 3/100 [00:00<00:03]2023-04-26T16:53:21.225944Z PL:POPLIN    3680893.3680893 W: poplin::preplanMatMuls() is deprecated! Use poplin::preplan() instead
Graph compilation: 100%|██████████| 100/100 [00:20<00:00]2023-04-26T16:53:38.241395Z popart:session 3680893.3680893

The artifacts from graph compilation are cached in the location set by the POPTORCH_CACHE_DIR environment variable, where the .popef file corresponding to the model is stored.
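To see what has been cached, a small script can list the .popef files under the cache directory. This is a minimal sketch: the function name is hypothetical, and it searches recursively because the exact layout inside the cache directory is an assumption here.

```python
import os
from pathlib import Path


def list_cached_executables(cache_dir=None):
    """Return sorted paths of .popef files under the cache directory.

    Falls back to the POPTORCH_CACHE_DIR environment variable when no
    directory is given. Returns an empty list if the directory is unset
    or does not exist.
    """
    cache_dir = cache_dir or os.environ.get("POPTORCH_CACHE_DIR")
    if not cache_dir:
        return []
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    # Recursive search: the internal layout of the cache is assumed, not known.
    return sorted(root.rglob("*.popef"))


if __name__ == "__main__":
    for path in list_cached_executables():
        print(f"{path}  ({path.stat().st_size} bytes)")
```

If the list is non-empty before a run and the model and options are unchanged, the compilation step may be served from the cache instead of being repeated.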


The expected output starts with the dataset downloads. It also shows the model being used, the progress bar of the compilation process, and the training progress bar.

srun: job 2623 queued and waiting for resources
srun: job 2623 has been allocated resources
/home/arnoldw/workspace/poptorch31.env/lib/python3.8/site-packages/torchvision/io/ UserWarning: Failed to load image Python extension: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
Epochs:   0%|          | 0/10 [00:00<?,[16:58:56.683] [poptorch:cpp] [warning] [DISPATCHER] Type coerced from Long to Int for tensor id 10
Graph compilation: 100%|██████████| 100/100 [00:20<00:00]
Epochs: 100%|██████████| 10/10 [01:35<00:00,  9.57s/it]
Graph compilation: 100%|██████████| 100/100 [00:13<00:00]
TrainingModelWithLoss(%|█████████▋| 97/100 [00:13<00:01]
  (model): Network(
    (layer1): Block(
      (conv): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (relu): ReLU()
    (layer2): Block(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (relu): ReLU()
    (layer3): Linear(in_features=1600, out_features=128, bias=True)
    (layer3_act): ReLU()
    (layer3_dropout): Dropout(p=0.5, inplace=False)
    (layer4): Linear(in_features=128, out_features=10, bias=True)
    (softmax): Softmax(dim=1)
  (loss): CrossEntropyLoss()
Accuracy on test set: 98.59%
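The in_features=1600 value of layer3 in the printout above follows from the convolution and pooling sizes. The sketch below reproduces that arithmetic, assuming 28x28 single-channel MNIST inputs and the layer parameters shown in the printout; the helper names are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers that trace the spatial size through the printed layers.

def conv2d_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a Conv2d layer for a square input."""
    return (size + 2 * padding - kernel_size) // stride + 1


def maxpool2d_out(size: int, kernel_size: int = 2, stride: int = 2) -> int:
    """Output height/width of a MaxPool2d layer (no padding, ceil_mode=False)."""
    return (size - kernel_size) // stride + 1


def flattened_features(input_size: int = 28, out_channels: int = 64) -> int:
    """Number of features entering the first Linear layer."""
    size = conv2d_out(input_size, kernel_size=3)  # layer1 conv: 28 -> 26
    size = maxpool2d_out(size)                    # layer1 pool: 26 -> 13
    size = conv2d_out(size, kernel_size=3)        # layer2 conv: 13 -> 11
    size = maxpool2d_out(size)                    # layer2 pool: 11 -> 5
    return out_channels * size * size             # 64 channels * 5 * 5


print(flattened_features())
```

The result, 64 x 5 x 5 = 1600, matches the in_features of layer3 in the model summary.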

Refer to the script to learn more about this example.

Example Programs lists the different example applications with corresponding commands for each of the above steps.