CosmicTagger Conversion
The intent of this page is to show conceptually how to convert a model to run on the SambaNova system. It is not necessary to convert CosmicTagger because it has already been converted and is located at CosmicTagger on the SambaNova branch. The original is located at CosmicTagger.
Run Model on CPU
The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.
Config.py
CosmicTagger can run on multiple machines. As such, it is necessary to specify the architecture that one is using. For example, CPU or GPU. The architecture is stored in the ComputeMode class.
Edit src/config/config.py. Add RDU to the ComputeMode class.
Trainer.py
Edit src/utils/torch/trainer.py.
Import SambaNova Packages
Insert the imports at the top of the file.
SambaFlow is a complete software stack designed to take input from standard machine learning frameworks such as PyTorch and TensorFlow. SambaFlow automatically extracts, optimizes, and maps dataflow graphs onto RDUs.
try:
from sambaflow import samba
import sambaflow.samba.utils as utils
from sambaflow.samba.utils.argparser import parse_app_args
from sambaflow.samba.utils.common import common_app_driver
except:
pass
Wrap Model
Wrap the model using poptorch.trainingModel() so that it may be ran on IPUs for training.
Wrap the model using poptorch.inferenceModel() when not training.
Find the following code around line 90 in the init_network method.
# Foregoing any fusions as to not disturb the existing ingestion pipeline
if self.is_training() and self.args.mode.quantization_aware:
self._raw_net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
self._net = torch.quantization.prepare_qat(self._raw_net)
else:
self._net = self._raw_net
After the above code, add:
if self.args.run.compute_mode == ComputeMode.IPU:
if self.is_training():
opts = poptorch.Options()
self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))
else:
self._net = poptorch.inferenceModel(self._net)
See poptorch.trainingModel() and poptorch.inferenceModel() for more information.
There is also a Build the Model tutorial.
Update Optimizer
Update init_optimizer() to use the poptorch class instead of the torch class as needed.
Change:
if self.args.mode.optimizer.name == OptimizerKind.rmsprop:
self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
else:
self._opt = torch.optim.Adam(self._net.parameters(), 1.0)
to:
if self.args.mode.optimizer.name == OptimizerKind.rmsprop:
if self.args.run.compute_mode == ComputeMode.IPU:
self._opt = poptorch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
else:
self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
else:
if self.args.run.compute_mode == ComputeMode.IPU:
self._opt = poptorch.optim.Adam(self._net.parameters(), 1.0)
else:
self._opt = torch.optim.Adam(self._net.parameters(), 1.0)
Update the Forward Pass
Putting the loss calculation in forward_pass() allows the loss computation to be performed on the IPUs. This will be faster because the data will not need to be transfered round-trip to the CPU.
Change forward_pass():
Original
if net is None:
logits_image = self._net(minibatch_data['image'])
else:
logits_image = net(minibatch_data['image'])
Updated
The following code changes are to account for the loss function, i.e., self.loss_calculator, and the image labels, i.e., labels_image, to be passed to the model's forward_pass method. Additionally, the calculated loss is returned from the forward_pass method.
if net is None:
if self.args.run.compute_mode == ComputeMode.IPU:
logits_image, labels_image, loss = self._net(minibatch_data['image'], self.loss_calculator, labels_image)
return logits_image, labels_image, loss
else:
logits_image = self._net(minibatch_data['image'])
else:
if self.args.run.compute_mode == ComputeMode.IPU and self.args.mode.name != ModeKind.inference:
logits_image, labels_image, loss = net(minibatch_data['image'], self.loss_calculator, labels_image)
return logits_image, labels_image, loss
else:
logits_image = net(minibatch_data['image'])
Update the Training Step
Receive the extra loss variable from the forward_pass method.
Update the train_step method.
Original Training Step
with self.timing_context("forward"):
if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
with torch.cuda.amp.autocast():
logits_image, labels_image = self.forward_pass(minibatch_data)
else:
logits_image, labels_image = self.forward_pass(minibatch_data)
verbose = False
# Compute the loss based on the logits
with self.timing_context("loss"):
loss = self.loss_calculator(labels_image, logits_image)
Updated Training Step
The forward_pass() method was changed to return the extra variable loss in the previous section. It is now received conditionally when using an IPU(s).
In the with self.timing_context("loss"): section, only calculate loss if not using an IPU(s).
with self.timing_context("forward"):
if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
with torch.cuda.amp.autocast():
logits_image, labels_image = self.forward_pass(minibatch_data)
else:
if self.args.run.compute_mode == ComputeMode.IPU:
logits_image, labels_image, loss = self.forward_pass(minibatch_data)
else:
logits_image, labels_image = self.forward_pass(minibatch_data)
verbose = False
# Compute the loss based on the logits
with self.timing_context("loss"):
if self.args.run.compute_mode == ComputeMode.IPU:
loss = loss
else:
loss = self.loss_calculator(labels_image, logits_image)
Update Validation Step
Update the val_step method.
Original Validation Step Code
Find this code.
if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
with torch.cuda.amp.autocast():
logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
else:
logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
# Compute the loss based on the logits
loss = self.loss_calculator(labels_image, logits_image)
Updated Validation Step Code
Change the code to the following.
if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
with torch.cuda.amp.autocast():
logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
# Compute the loss based on the logits
loss = self.loss_calculator(labels_image, logits_image)
else:
if self.args.run.compute_mode == ComputeMode.IPU:
logits_image, labels_image, loss = self.forward_pass(minibatch_data, net=val_net)
else:
logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
# Compute the loss based on the logits
loss = self.loss_calculator(labels_image, logits_image)
UResNet2D Model
Update Model
The Graphcore system is more computationally efficient if the loss function is on the IPU. This is accomplished by using the loss function within the model's forward method.
Edit src/networks/torch/uresnet2D.py.
Update the Forward Declaration
Find the forward method.
Update the argument list to include the loss function, i.e., loss_calculator and the image labels, i.e., labels_image.
Add Loss Calculation
Add the loss calculation just before the forward method returns.
if loss_calculator is not None:
labels_image = labels_image.long()
labels_image = torch.chunk(labels_image, chunks=3, dim=1)
shape = labels_image[0].shape
labels_image = [ _label.view([shape[0], shape[-2], shape[-1]]) for _label in labels_image ]
loss = loss_calculator(labels_image, x)
import poptorch
loss = poptorch.identity_loss(loss , reduction="mean")
return x, labels_image, loss
# This return already exists.
return x
The poptorch.identity_loss method takes a single PyTorch tensor and will backpropagate a gradient of ones through it. You may find an example at here
bin/exec.py
The following is included for completeness. One will not likely find this in other code.
Open bin/exec.py in your favorite editor. Change:
to