Example Programs
SambaNova provides examples of some well-known AI applications under the path: /opt/sambaflow/apps/starters
, on all SambaNova compute nodes. Make a copy of this to your home directory:
Copy the starters to your personal directory structure:
LeNet
Change directory
Common Arguments
Below are some of the common arguments used across most of the models in the example code.
Argument | Default | Help |
---|---|---|
-b | 1 | Batch size for training |
-n, | 100 | Number of iterations to run |
--num-iterations | the pef for | |
-e, | 1 | Number epochs for training |
--num-epochs | ||
--log-path | 'check | Log path |
points' | ||
--num-workers | 0 | Number of workers |
--measure-train- | None | Measure training performance |
performance | ||
LeNet Arguments
Argument | Default | Help |
---|---|---|
--lr | 0.01 | Learning rate for training |
--momentum | 0.0 | Momentum value for training |
--weight-decay | 0.01 | Weight decay for training |
--data-path | './data' | Data path |
--data-folder | 'mnist_ | Folder containing mnist data |
data' | ||
Establish the Environment
NOTE: If you receive an \"HTTP error\" message on any of the following commands, run the command again. Such errors (e.g 503) are commonly an intermittent failure to download a dataset.
Run these commands:
srun python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
srun python lenet.py run --pef="pef/lenet/lenet.pef"
To use Slurm sbatch, create submit-lenet-job.sh with the following contents:
#!/bin/sh
python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
python lenet.py run --pef="pef/lenet/lenet.pef"
Then
Squeue will give you the queue status.
One may see the run log using:
MNIST - Feed Forward Network
Establish the Environment
Change directory
Commands to run MNIST example:
srun python ffn_mnist.py compile -b 1 --pef-name="ffn_mnist" --mac-v2
srun python ffn_mnist.py run -b 1 -p out/ffn_mnist/ffn_mnist.pef
To run the same using Slurm sbatch, create and run the submit-ffn_mnist-job.sh with the following contents.
#!/bin/sh
python python ffn_mnist.py compile -b 1 --pef-name="ffn_mnist" --mac-v2
python ffn_mnist.py run -b 1 -p out/ffn_mnist/ffn_mnist.pef
Logistic Regression
Establish the Environment
Change directory
Logistic Regression Arguments
This is not an exhaustive list of arguments.
Arguments
Argument | Default | Help | Step |
---|---|---|---|
--lr | 0.001 | Learning rate for training | Compile |
--momentum | 0.0 | Momentum value for training | Compile |
--weight-decay | 1e-4 | Weight decay for training | Compile |
--num-features | 784 | Number features for training | Compile |
--num-classes | 10 | Number classes for training | Compile |
--weight-norm | na | Enable weight normalization | Compile |
Run these commands:
srun python logreg.py compile --pef-name="logreg" --output-folder="pef"
srun python logreg.py test --pef="pef/logreg/logreg.pef"
srun python logreg.py run --pef="pef/logreg/logreg.pef"
To use Slurm, create submit-logreg-job.sh with the following contents:
#!/bin/sh
python logreg.py compile --pef-name="logreg" --output-folder="pef"
python logreg.py test --pef="pef/logreg/logreg.pef"
python logreg.py run --pef="pef/logreg/logreg.pef"
Then
The output, pef/logreg/output.log, will look something like this:
2023-03-08 21:18:25.168190: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-08 21:18:25.334389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-03-08 21:18:25.334430: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-08 21:18:26.422458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-08 21:18:26.422701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-08 21:18:26.422709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[Info][SAMBA]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.samba.log
[Info][MAC]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.mac.log
...
Epoch [1/1], Step [10000/60000], Loss: 0.4642
Epoch [1/1], Step [20000/60000], Loss: 0.4090
Epoch [1/1], Step [30000/60000], Loss: 0.3863
Epoch [1/1], Step [40000/60000], Loss: 0.3703
Epoch [1/1], Step [50000/60000], Loss: 0.3633
Epoch [1/1], Step [60000/60000], Loss: 0.3553
Test Accuracy: 91.40 Loss: 0.3014
2023-03-08T21:19:08 : [INFO][LIB][2688517]: sn_create_session: PEF File: pef/logreg/logreg.pef
UNet
Establish the Environment
Change directory and copy files.
Copy and paste the contents of unet_compile_run_all.sh to a file with the same name into the current directory using your favorite editor.
Run these commands for training (compile + train):
The performance data is located at the bottom of run_unet_256_256_single_4.log.
Squeue will give you the queue status.