LibTorch C++ Library
LibTorch is a C++ library for Torch that exposes many of the APIs available in PyTorch; more information can be found in the PyTorch documentation. It is useful for integrating the Torch ML framework into traditional HPC simulation codes, enabling training and inference of ML models from C++. During compilation, Intel optimizations are activated automatically once the IPEX dynamic library is linked.
Environment Setup
To use LibTorch on Polaris, load the ML frameworks module, which will also load PrgEnv-gnu/8.5.0 and cmake.
Torch Libraries
With the ML frameworks module loaded as shown above, run the following commands to find the path to the Torch installation and to the CMake configuration files it ships with; the second path is what CMake needs in order to find LibTorch:
python -c 'import torch; print(torch.__path__[0])'
python -c 'import torch;print(torch.utils.cmake_prefix_path)'
Linking the Torch Libraries
When using the CMake build system, the LibTorch libraries can be linked to an example C++ application using the following CMakeLists.txt file:
cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
cmake_policy(SET CMP0074 NEW)
project(project-name)
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed")
set(TORCH_LIBS ${TORCH_LIBRARIES})
add_executable(exe main.cpp)
target_link_libraries(exe ${TORCH_LIBS})
set_property(TARGET exe PROPERTY CXX_STANDARD 17)
and configuring the build with
cmake \
-DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \
./
make
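The CMakeLists.txt above builds a main.cpp from the same directory. Any of the C++ sources shown below can play that role; as a minimal sketch to verify the build (an illustrative addition, not part of the original example), main.cpp can simply create a tensor and print it:
#include <torch/torch.h>
#include <iostream>

int main() {
    // Create a small random tensor on the CPU and print it to confirm
    // that the LibTorch headers and libraries are found and linked.
    torch::Tensor t = torch::rand({2, 3});
    std::cout << t << std::endl;
    return 0;
}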
Device Introspection
Similarly to PyTorch, LibTorch provides an API to perform introspection on the devices available on the system. The simple code below shows how to check whether CUDA devices are available, how many are present, and how to select the device type accordingly.
#include <torch/torch.h>
#include <iostream>

int main(int argc, const char* argv[])
{
    torch::DeviceType device;
    int num_devices = 0;
    if (torch::cuda::is_available()) {
        std::cout << "CUDA devices detected" << std::endl;
        device = torch::kCUDA;
        num_devices = torch::cuda::device_count();
        std::cout << "Number of CUDA devices: " << num_devices << std::endl;
    } else {
        device = torch::kCPU;
        std::cout << "No CUDA devices detected, setting device to CPU" << std::endl;
    }
    return 0;
}
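The selected device type can then be used when allocating tensors. The sketch below (an illustrative addition that reuses the same availability check) places a tensor directly on the chosen device and prints where it ended up:
#include <torch/torch.h>
#include <iostream>

int main() {
    // Pick the GPU if one is visible, otherwise fall back to the CPU.
    torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                       : torch::Device(torch::kCPU);

    // Allocate a tensor directly on the selected device and confirm its placement.
    torch::Tensor x = torch::ones({4, 4}, torch::TensorOptions().device(device));
    std::cout << "Tensor allocated on: " << x.device() << std::endl;
    return 0;
}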
Model Inferencing Using the Torch API
This example shows how to perform inference with the ResNet50 model using LibTorch.
First, generate a jit-traced version of the model by executing python resnet50_trace.py (shown below) on a compute node.
import torch
import torchvision
from time import perf_counter
device = 'cuda'
model = torchvision.models.resnet50()
model.to(device)
model.eval()
dummy_input = torch.rand(1, 3, 224, 224).to(device)
model_jit = torch.jit.trace(model, dummy_input)
tic = perf_counter()
predictions = model_jit(dummy_input)
toc = perf_counter()
print(f"Inference time: {toc-tic}")
torch.jit.save(model_jit, "resnet50_jit.pt")
Then, build inference-example.cpp (shown below):
#include <torch/torch.h>
#include <torch/script.h>
#include <cassert>
#include <iostream>

int main(int argc, const char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: inference-example <path-to-traced-model>\n";
        return -1;
    }

    torch::jit::script::Module model;
    try {
        model = torch::jit::load(argv[1]);
        std::cout << "Loaded the model\n";
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    model.to(torch::Device(torch::kCUDA));
    std::cout << "Model offloaded to GPU\n\n";

    auto options = torch::TensorOptions()
                       .dtype(torch::kFloat32)
                       .device(torch::kCUDA);
    torch::Tensor input_tensor = torch::rand({1, 3, 224, 224}, options);
    assert(input_tensor.dtype() == torch::kFloat32);
    assert(input_tensor.device().type() == torch::kCUDA);
    std::cout << "Created the input tensor on GPU\n";

    torch::Tensor output = model.forward({input_tensor}).toTensor();
    std::cout << "Performed inference\n\n";

    std::cout << "Slice of predicted tensor is : \n";
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\n';

    return 0;
}
and execute it with ./inference-example ./resnet50_jit.pt.
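For parity with the timing printed by resnet50_trace.py, the forward pass can also be timed on the C++ side. The self-contained sketch below is an illustrative addition, using a small torch::nn::Linear module instead of the traced ResNet50, and times the call with std::chrono after synchronizing the GPU:
#include <torch/torch.h>
#include <chrono>
#include <iostream>

int main() {
    // Pick the GPU if one is visible, otherwise fall back to the CPU.
    torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                       : torch::Device(torch::kCPU);

    // A small stand-in module, just to illustrate the timing pattern.
    torch::nn::Linear layer(224, 10);
    layer->to(device);

    torch::Tensor input = torch::rand({1, 224}, torch::TensorOptions().device(device));

    if (device.is_cuda()) torch::cuda::synchronize();  // ensure setup work has finished
    auto tic = std::chrono::steady_clock::now();
    torch::Tensor output = layer->forward(input);
    if (device.is_cuda()) torch::cuda::synchronize();  // wait for the kernel to complete
    auto toc = std::chrono::steady_clock::now();

    std::cout << "Inference time: "
              << std::chrono::duration<double>(toc - tic).count() << " s\n";
    return 0;
}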