Example Programs

SambaNova provides examples of some well-known AI applications under /opt/sambaflow/apps/starters on all SambaNova compute nodes. Copy the starters to your home directory:

cd ~/
mkdir apps
cp -r /opt/sambaflow/apps/starters apps/starters
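
To confirm the copy, list the directory; the lenet, ffn_mnist, and logreg examples used below should be present:

ls ~/apps/starters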

LeNet

Change directory

cd ~/apps/starters/lenet

Common Arguments

Below are some of the common arguments used across most of the models in the example code.

Argument                     Default        Help
-b                           1              Batch size for training
-n, --num-iterations         100            Number of iterations to run the pef for
-e, --num-epochs             1              Number of epochs for training
--log-path                   'checkpoints'  Log path
--num-workers                0              Number of workers
--measure-train-performance  None           Measure training performance

LeNet Arguments

Argument        Default       Help
--lr            0.01          Learning rate for training
--momentum      0.0           Momentum value for training
--weight-decay  0.01          Weight decay for training
--data-path     './data'      Data path
--data-folder   'mnist_data'  Folder containing mnist data
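
As an illustration only, a compile that overrides a few of these defaults might look like the following. Whether a given flag belongs to the compile or the run step is model-specific (compare the Step column in the Logistic Regression table below), so treat this as a sketch rather than a verified command line:

srun python lenet.py compile -b=1 --lr 0.05 --momentum 0.9 --pef-name="lenet" --output-folder="pef"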

Establish the Environment

source /opt/sambaflow/apps/starters/lenet/venv/bin/activate

NOTE: If you receive an "HTTP error" message from any of the following commands, run the command again. Such errors (e.g., 503) are commonly an intermittent failure to download a dataset.

Run these commands:

srun python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
srun python lenet.py run --pef="pef/lenet/lenet.pef"

To use Slurm sbatch, create submit-lenet-job.sh with the following contents:

#!/bin/sh

python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
python lenet.py run --pef="pef/lenet/lenet.pef"

Then

sbatch --output=pef/lenet/output.log submit-lenet-job.sh
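
Equivalently, the Slurm options can live in the script itself as #SBATCH directives (standard Slurm behavior; the job name below is just an illustrative choice):

#!/bin/sh
#SBATCH --job-name=lenet
#SBATCH --output=pef/lenet/output.log

python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
python lenet.py run --pef="pef/lenet/lenet.pef"

With the directives in place, sbatch submit-lenet-job.sh needs no extra flags.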

squeue shows the queue status.

squeue
# One may also...
watch squeue
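
To limit the listing to your own jobs (a standard squeue option):

squeue -u $USER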

View the run log with:

cat pef/lenet/output.log

MNIST - Feed Forward Network

Establish the Environment

source /opt/sambaflow/apps/starters/ffn_mnist/venv/bin/activate

Change directory

cd ~/apps/starters/ffn_mnist/

Commands to run MNIST example:

srun python ffn_mnist.py  compile -b 1 --pef-name="ffn_mnist" --mac-v2
srun python ffn_mnist.py  run -b 1 -p out/ffn_mnist/ffn_mnist.pef

To run the same using Slurm sbatch, create submit-ffn_mnist-job.sh with the following contents:

#!/bin/sh
python ffn_mnist.py compile -b 1 --pef-name="ffn_mnist" --mac-v2
python ffn_mnist.py run -b 1 -p out/ffn_mnist/ffn_mnist.pef

Then

sbatch --output=pef/ffn_mnist/output.log submit-ffn_mnist-job.sh
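
After submission, the job can be tracked with standard Slurm tools; for example, assuming Slurm accounting is enabled on the system:

squeue -u $USER
sacct -j <jobid> --format=JobID,State,Elapsed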

Logistic Regression

Establish the Environment

source /opt/sambaflow/apps/starters/logreg/venv/bin/activate

Change directory

cd ~/apps/starters/logreg

Logistic Regression Arguments

This is not an exhaustive list of arguments.

Argument        Default  Help                             Step
--lr            0.001    Learning rate for training       Compile
--momentum      0.0      Momentum value for training      Compile
--weight-decay  1e-4     Weight decay for training        Compile
--num-features  784      Number of features for training  Compile
--num-classes   10       Number of classes for training   Compile
--weight-norm   na       Enable weight normalization      Compile
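
As an illustration, different hyperparameters can be baked into the compiled PEF at the compile step; a sketch, assuming these compile-step flags combine as shown:

srun python logreg.py compile --lr 0.005 --weight-decay 1e-4 --pef-name="logreg" --output-folder="pef"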

Run these commands:

srun python logreg.py compile --pef-name="logreg" --output-folder="pef"
srun python logreg.py test --pef="pef/logreg/logreg.pef"
srun python logreg.py run --pef="pef/logreg/logreg.pef"

To use Slurm, create submit-logreg-job.sh with the following contents:

#!/bin/sh
python logreg.py compile --pef-name="logreg" --output-folder="pef"
python logreg.py test --pef="pef/logreg/logreg.pef"
python logreg.py run --pef="pef/logreg/logreg.pef"

Then

sbatch --output=pef/logreg/output.log submit-logreg-job.sh

The output, pef/logreg/output.log, will look something like this:

2023-03-08 21:18:25.168190: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-08 21:18:25.334389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-03-08 21:18:25.334430: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-08 21:18:26.422458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-08 21:18:26.422701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-08 21:18:26.422709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[Info][SAMBA]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.samba.log
[Info][MAC]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.mac.log
...

Epoch [1/1], Step [10000/60000], Loss: 0.4642
Epoch [1/1], Step [20000/60000], Loss: 0.4090
Epoch [1/1], Step [30000/60000], Loss: 0.3863
Epoch [1/1], Step [40000/60000], Loss: 0.3703
Epoch [1/1], Step [50000/60000], Loss: 0.3633
Epoch [1/1], Step [60000/60000], Loss: 0.3553
Test Accuracy: 91.40  Loss: 0.3014
2023-03-08T21:19:08 : [INFO][LIB][2688517]: sn_create_session: PEF File: pef/logreg/logreg.pef
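
To pull just the final metrics out of a long log, plain grep works (the match string comes from the sample output above):

grep "Test Accuracy" pef/logreg/output.log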

UNet

Establish the Environment

#TODO Update
# source /opt/sambaflow/apps/starters/logreg/venv/bin/activate

Create a working directory and change into it:

mkdir -p ~/apps/image/unet
cd ~/apps/image/unet

Using your favorite editor, create a file named unet_compile_run_all.sh in the current directory and paste in its contents. Then make it executable:

chmod +x unet_compile_run_all.sh

Run these commands for training (compile + train):

./unet_compile_run_all.sh compile 256 256
./unet_compile_run_all.sh run 256 256

The performance data is located at the bottom of run_unet_256_256_single_4.log.
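
For example, using the log name given above:

tail run_unet_256_256_single_4.log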

squeue shows the queue status.

squeue