Image classification with pre-trained models using libtorch (PyTorch C++ API)
Deep learning has revolutionized computer vision. There are thousands of Python code snippets to get started with, but few in C++. If, like me, you like C++ and want to deploy your models on edge devices, then this series of posts is for you. As a gentle introduction, I will explain how to use libtorch to do image classification with pre-trained models. But there will be much more exciting posts in the future ;) Stay tuned.
1. Environment setup
We start by downloading a pre-built version of libtorch from the PyTorch website. The CPU-only build doesn’t support GPUs, but we don’t need one for now.
The next step is to download Torchvision, a PyTorch package that provides popular image datasets and model architectures. Unfortunately, its C++ API doesn’t currently support pre-trained models, but there is a workaround: we load the model in Python, trace it with a random input tensor, save the JIT-traced model, and load it in our C++ code. To install torchvision:
pip3 install torchvision
2. The code
First of all, let’s save the JIT-traced model. I used the pre-trained version of SqueezeNet, one of the most efficient networks. You can check other models here.
import torch
import torchvision

# An instance of your model, switched to inference mode.
model = torchvision.models.squeezenet1_0(pretrained=True)
model.eval()
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
# Save the model
traced_script_module.save("traced_squeezenet_model.pt")
Run the above code and make sure the .pt file has been generated. It’s time to dive into the C++ code:
The code is mostly self-explanatory, but let me walk through some parts. First, the model is loaded in the main() function. To feed an image to this model, we have to store it in a torch::Tensor variable. This is what the read_image() function is responsible for. It uses OpenCV to read the image, center-crop it, resize it to the fixed size of 224×224, and reorder its channels from BGR to RGB. The pre-trained models in PyTorch expect the input images to be in the range [0, 1], normalized with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. The permute function reorders the image from HxWxC to CxHxW, which is the standard layout of image tensors of the torch::Tensor type.
The image, now held in a torch::Tensor, is added to a vector of torch::jit::IValue to prepare it for feeding to the model. To get a better sense of the network output, the raw scores are converted to probabilities in the range [0, 1] using the softmax() function. The highest probability and the corresponding class index are obtained using the max() function. The class name is then retrieved from a .txt file that contains the 1000 class names of the ImageNet dataset.
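Putting these pieces together, a minimal sketch of such a program might look like the following. The helper name read_image() and the overall flow follow the description above; details such as error handling are simplified, and the exact center-crop logic is my own assumption:

```cpp
#include <torch/script.h>
#include <opencv2/opencv.hpp>

#include <fstream>
#include <iostream>
#include <string>
#include <tuple>
#include <vector>

// Load an image with OpenCV, center-crop it to a square, resize it to
// 224x224, convert BGR -> RGB, scale to [0, 1], normalize with the
// ImageNet statistics, and return a 1x3x224x224 float tensor.
torch::Tensor read_image(const std::string& path) {
  cv::Mat img = cv::imread(path);
  int side = std::min(img.rows, img.cols);
  cv::Rect roi((img.cols - side) / 2, (img.rows - side) / 2, side, side);
  img = img(roi);
  cv::resize(img, img, cv::Size(224, 224));
  cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
  img.convertTo(img, CV_32FC3, 1.0f / 255.0f);

  // HxWxC -> 1xCxHxW; clone() copies the data out of the cv::Mat buffer.
  torch::Tensor tensor =
      torch::from_blob(img.data, {1, 224, 224, 3}, torch::kFloat32).clone();
  tensor = tensor.permute({0, 3, 1, 2});

  // Per-channel normalization: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
  tensor[0][0] = tensor[0][0].sub_(0.485).div_(0.229);
  tensor[0][1] = tensor[0][1].sub_(0.456).div_(0.224);
  tensor[0][2] = tensor[0][2].sub_(0.406).div_(0.225);
  return tensor;
}

int main(int argc, char** argv) {
  if (argc != 4) {
    std::cerr << "usage: classify <model.pt> <labels.txt> <image>\n";
    return 1;
  }

  // Load the JIT-traced model saved from Python.
  torch::jit::script::Module module = torch::jit::load(argv[1]);

  // Read the ImageNet class names, one per line.
  std::vector<std::string> labels;
  std::ifstream label_file(argv[2]);
  for (std::string line; std::getline(label_file, line);)
    labels.push_back(line);

  // Wrap the input tensor in a vector of IValue and run the forward pass.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(read_image(argv[3]));
  torch::Tensor output = module.forward(inputs).toTensor();

  // Convert raw scores to probabilities and pick the best class.
  torch::Tensor probs = torch::softmax(output, /*dim=*/1);
  auto result = torch::max(probs, /*dim=*/1);
  torch::Tensor max_prob = std::get<0>(result);
  torch::Tensor max_index = std::get<1>(result);
  std::cout << labels[max_index.item<int64_t>()] << " ("
            << max_prob.item<float>() * 100.0f << "%)\n";
  return 0;
}
```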
There is only one step remaining. We must build the code and run it.
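For reference, a minimal CMakeLists.txt for such a project might look like the following sketch. The target name classify matches the run command below, and linking OpenCV is needed for the image loading described above; the source file name main.cpp is an assumption:

```cmake
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(classify)

# Both libtorch and OpenCV ship CMake config files found via find_package.
find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(classify main.cpp)
target_link_libraries(classify "${TORCH_LIBRARIES}" "${OpenCV_LIBS}")
set_property(TARGET classify PROPERTY CXX_STANDARD 14)
```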
Use the above CMakeLists.txt. From a build folder, pass the path of the pre-built libtorch to generate the Makefile, then run make. Finally, run the executable, giving it the JIT-traced model file, the label .txt file, and the image file, respectively.
cmake -DCMAKE_PREFIX_PATH=/home/zana/Pytorch/libtorch/ ..
./classify ../models/traced_squeezenet_model.pt ../labels/imageNetLabels.txt ../images/panda.jpg