Building a faster OpenCV on Raspberry Pi

In embedded computer vision, even a fractional speed-up counts as a big win. Today, I’ll explain how to build a customized OpenCV for the Raspberry Pi, one of the most popular single-board computers. By following these simple tips, you can expect OpenCV to run 2-3x faster on your board.

Raspberry Pi 2 and later models have multi-core CPUs that support ARM NEON technology. Clearly, code that takes advantage of both features will run much faster than plain single-threaded, scalar code. The good news is that many OpenCV functions are parallelized on the CPU, and a limited number of them also use NEON C intrinsics. You can check this by digging into the source code (e.g. the KLT tracker implementation in opencv/modules/video/src/lkpyramid.cpp) and looking for parallel_for_ calls and #if CV_NEON blocks.
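To make the two features concrete, here is a minimal standalone sketch of the pattern. It is not OpenCV's actual code: the work is split into chunks that run on separate threads, and each chunk uses NEON intrinsics when the compiler targets NEON, with a scalar fallback otherwise. The names add_range and parallel_add are made up for this illustration, and std::thread only stands in for whatever backend parallel_for_ uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Adds a[i] + b[i] for i in [begin, end).
// Processes 4 floats per step with NEON when available, then finishes
// the remaining elements (or everything, without NEON) with scalar code.
static void add_range(const float* a, const float* b, float* out,
                      std::size_t begin, std::size_t end)
{
    std::size_t i = begin;
#if defined(__ARM_NEON)
    for (; i + 4 <= end; i += 4)
        vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
#endif
    for (; i < end; ++i)
        out[i] = a[i] + b[i];
}

// Hypothetical helper: splits the arrays into contiguous chunks and
// hands each chunk to its own thread (4 by default, one per Pi core).
std::vector<float> parallel_add(const std::vector<float>& a,
                                const std::vector<float>& b,
                                unsigned num_threads = 4)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> out(n);
    num_threads = std::max(1u, num_threads);
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end)
            break;  // nothing left to hand out
        workers.emplace_back(add_range, a.data(), b.data(), out.data(),
                             begin, end);
    }
    for (auto& w : workers)
        w.join();
    return out;
}
```

On a Pi you would compile this with something like g++ -O2 -pthread, adding -mfpu=neon on a 32-bit OS so that the NEON branch is actually compiled in. On other machines the same code still runs, just through the scalar loop.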

The bad news is that, if you have previously built OpenCV on your board, your library most likely doesn’t benefit from these features. You can check by compiling and running the following program:

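#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    // getNumThreads() reports how many threads OpenCV will use for parallel
    // regions (4 matches the quad-core CPU of the Pi 2/3/4), and
    // checkHardwareSupport() returns true when OpenCV can use NEON.
    if (getNumThreads() == 4 && checkHardwareSupport(CV_CPU_NEON))
        cout << "OpenCV is optimized" << endl;
    return 0;
}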

If it doesn’t print the message, your OpenCV build lacks multi-threading and/or NEON vectorization. To build a customized OpenCV with both capabilities, first install Intel TBB:

sudo apt-get install libtbb-dev

and then set the following flags in your cmake command:

cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=/usr/local -DWITH_TBB=ON -DENABLE_VFPV3=ON -DENABLE_NEON=ON -DBUILD_TESTS=OFF -DINSTALL_C_EXAMPLES=OFF -DINSTALL_PYTHON_EXAMPLES=OFF -DBUILD_EXAMPLES=OFF ..

The -DWITH_TBB=ON flag enables multi-threading, while -DENABLE_VFPV3=ON and -DENABLE_NEON=ON let OpenCV use the NEON coprocessor for vectorization. The remaining flags turn off the test and example programs, just to shorten the overall build time.

After building, you have an OpenCV that should run noticeably faster. The accompanying figure shows the speed-up for some well-known functions.
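If you would rather measure the speed-up on your own board, a small timing helper is enough. The sketch below is one way to do it and is not the code used for the figure: median_ms and speedup are hypothetical helper names, and the workload is whatever OpenCV call you wrap in the callable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock time in
// milliseconds. The median is less sensitive to one-off hiccups
// (e.g. a background process) than the mean.
double median_ms(const std::function<void()>& work, int runs = 11)
{
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::nth_element(samples.begin(), samples.begin() + runs / 2,
                     samples.end());
    return samples[runs / 2];
}

// Speed-up of the optimized build relative to the baseline build,
// e.g. 30 ms before and 10 ms after gives 3.0.
double speedup(double baseline_ms, double optimized_ms)
{
    assert(optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}
```

Link the same test program against the old and the new OpenCV build, time the same call in both (for example the KLT tracker, cv::calcOpticalFlowPyrLK, on a fixed pair of frames), and pass the two medians to speedup.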

Notes


1- If you are using a 64-bit OS on your Pi, don’t set -DENABLE_VFPV3=ON and -DENABLE_NEON=ON; doing so throws an error. CMake detects the co-processors automatically (tested with OpenCV 4.2.0 on Ubuntu Mate 18).

2- Although the documentation suggests other threading frameworks such as OpenMP, pthreads, Concurrency, and GCD, I didn’t get a multi-threaded OpenCV after setting -DWITH_OPENMP=ON.

3- You can also use libjpeg-turbo instead of OpenCV’s default libjpeg library. It uses NEON instructions to speed up JPEG reading and writing. If you are interested, first build it with:

wget https://github.com/libjpeg-turbo/libjpeg-turbo/archive/1.5.0.tar.gz -O libjpeg-turbo.tar.gz
tar xvf libjpeg-turbo.tar.gz
cd libjpeg-turbo-1.5.0
mkdir build
autoreconf -fiv
cd build
sh ../configure
make -j4
sudo make install

and then add the following options to your CMake command:

-DWITH_JPEG=ON -DBUILD_JPEG=OFF -DJPEG_INCLUDE_DIR=/opt/libjpeg-turbo/include/ -DJPEG_LIBRARY=/opt/libjpeg-turbo/lib32/libjpeg.a

4- The instructions in this post are not limited to the Raspberry Pi. They apply to any SBC with a multi-core processor that supports NEON technology, including the Odroid XU4, BeagleBoard X15, Nvidia Jetson series, etc.
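As a companion to note 1, you can ask the compiler itself which ARM target it is building for. The sketch below relies on macros that GCC and Clang predefine (__aarch64__, __arm__, __ARM_NEON); the function names are hypothetical, and the rule in needs_explicit_neon_flags simply restates note 1 rather than anything OpenCV's CMake scripts are known to do.

```cpp
#include <cassert>
#include <string>

// Name of the target this file is being compiled for.
std::string target_arch()
{
#if defined(__aarch64__)
    return "aarch64";  // 64-bit ARM OS and toolchain
#elif defined(__arm__)
    return "arm32";    // 32-bit ARM OS and toolchain
#else
    return "other";    // e.g. a desktop x86 machine
#endif
}

// True when the compiler is generating code with NEON available.
bool neon_enabled_at_compile_time()
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return true;
#else
    return false;
#endif
}

// Assumption taken from note 1: only a 32-bit ARM build needs
// -DENABLE_VFPV3=ON and -DENABLE_NEON=ON passed explicitly.
bool needs_explicit_neon_flags()
{
    const std::string arch = target_arch();
    assert(!arch.empty());
    return arch == "arm32";
}
```

With a default 64-bit toolchain, target_arch() returns "aarch64" and NEON is normally already enabled, which fits the observation that the extra flags are unnecessary there.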
