Building a faster OpenCV on Raspberry Pi

When it comes to embedded computer vision, even a small speed-up is regarded as a big win. Today, I’ll explain how to build a customized OpenCV for the Raspberry Pi, one of the most popular single-board computers. By following these simple tips, you can expect a 2-3x faster OpenCV on your board.

Raspberry Pi 2 and later models have multi-core CPUs that support ARM NEON technology. Code that exploits both features (multi-threading and NEON SIMD vectorization) can run much faster than plain single-threaded, scalar code. The good news is that most OpenCV functions are parallelized on the CPU, and a limited number of them also use NEON C intrinsics. You can check this by digging into the source code (e.g. the KLT tracker implementation in opencv/modules/video/src/lkpyramid.cpp) and looking for parallel_for_ and #if CV_NEON statements.
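The #if CV_NEON blocks in OpenCV follow a simple pattern: a vectorized path that processes several elements per instruction, plus a scalar loop for whatever is left over. The sketch below reproduces that pattern outside OpenCV. addFloats is a hypothetical helper, not an OpenCV function, and it checks the compiler's own __ARM_NEON macro instead of OpenCV's CV_NEON (assumption: on a 32-bit OS the code is compiled with NEON enabled, e.g. with -mfpu=neon).

```cpp
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
// __ARM_NEON stands in for OpenCV's CV_NEON macro here.
void addFloats(const float* a, const float* b, float* dst, std::size_t n)
{
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // NEON path: one 128-bit register holds 4 floats, so each
    // iteration loads, adds, and stores 4 elements at once.
    for (; i + 4 <= n; i += 4)
    {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar path: finishes the last n % 4 elements on NEON builds,
    // or does all of the work when NEON is not available.
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

On a non-ARM machine the NEON branch is simply compiled out, so the same source builds everywhere, which is the same reason OpenCV guards its NEON code behind a macro.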

The bad news is that if you have previously built OpenCV on your board, your library most likely doesn’t benefit from these options. You can check this by compiling and running the following program:

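#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Optimized build: 4 worker threads (one per core on a quad-core Pi 2/3/4)
    // and NEON reported as available by this OpenCV build.
    if (cv::getNumThreads() == 4 && cv::checkHardwareSupport(CV_CPU_NEON))
        std::cout << "OpenCV is optimized" << std::endl;
    return 0;
}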

If it doesn't print the message, it means that your OpenCV doesn’t support multi-threading and/or NEON vectorization.
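The program above checks the library. To check the board itself, you can look for the NEON flag in /proc/cpuinfo. Here is a minimal sketch, assuming the Linux /proc/cpuinfo format: 32-bit ARM kernels list "neon" on the Features line, while 64-bit (aarch64) kernels report the same unit as "asimd". cpuinfoHasNeon and readCpuinfo are hypothetical helper names.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: true if a /proc/cpuinfo dump advertises NEON
// ("neon" on 32-bit ARM kernels, "asimd" on aarch64 kernels).
bool cpuinfoHasNeon(const std::string& cpuinfo)
{
    std::istringstream lines(cpuinfo);
    std::string line;
    while (std::getline(lines, line))
    {
        // Only inspect lines of the form "Features : flag flag ...".
        if (line.rfind("Features", 0) != 0)
            continue;
        // Skip past the colon (if there is none, npos + 1 wraps to 0).
        std::istringstream words(line.substr(line.find(':') + 1));
        std::string flag;
        while (words >> flag)
            if (flag == "neon" || flag == "asimd")
                return true;
    }
    return false;
}

// Hypothetical helper: reads /proc/cpuinfo, or returns "" if unreadable.
std::string readCpuinfo()
{
    std::ifstream in("/proc/cpuinfo");
    std::ostringstream buf;
    if (in)
        buf << in.rdbuf();
    return buf.str();
}
```

On a Pi 2 or later, cpuinfoHasNeon(readCpuinfo()) should return true. If it does while the OpenCV check still fails, the problem lies in how OpenCV was built rather than in the hardware.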

1- Uninstall your current OpenCV

To build a customized OpenCV with these capabilities, you must first remove your current version. If you installed it via sudo apt install libopencv-dev, just run sudo apt purge libopencv-dev. If you installed it from source, go to its build folder and run the commands below (double-check that you are inside the build folder first, since rm -r * deletes everything in the current directory):

sudo make uninstall
sudo rm -r *

To make sure that your Pi is clean of any OpenCV libraries, run the following commands. If they print anything, some OpenCV libraries are still installed and the uninstallation was incomplete.

cd /usr/local/lib
ls | grep -e libopencv

2- Install an optimized OpenCV

Before building OpenCV itself, we must install some base dependencies:

sudo apt update 
sudo apt upgrade
sudo apt install build-essential cmake pkg-config
sudo apt install libjpeg-dev libtiff5-dev libpng-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt install libxvidcore-dev libx264-dev
sudo apt install libgtk2.0-dev libgtk-3-dev
sudo apt install libatlas-base-dev gfortran

Then install Intel TBB and OpenBLAS via:

sudo apt-get install libtbb-dev
sudo apt-get install libopenblas-dev liblapacke-dev

Now download the OpenCV sources and, optionally, OpenCV-Contrib:

cd ~
mkdir OpenCV && cd OpenCV
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.5.0.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.5.0.zip
unzip opencv.zip
unzip opencv_contrib.zip

and finally, run:

cd opencv-4.5.0
mkdir build && cd build

cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=/usr/local -DWITH_TBB=ON -DWITH_LAPACK=ON -DENABLE_VFPV3=ON -DENABLE_NEON=ON -DBUILD_TESTS=OFF -DINSTALL_C_EXAMPLES=OFF -DINSTALL_PYTHON_EXAMPLES=OFF -DBUILD_EXAMPLES=OFF -DOPENCV_EXTRA_MODULES_PATH=~/OpenCV/opencv_contrib-4.5.0/modules ..

make -j2
sudo make install
sudo ldconfig

The flag -DWITH_TBB=ON enables multi-threading with TBB, while -DWITH_LAPACK=ON enables faster matrix operations through OpenBLAS. The -DENABLE_VFPV3=ON and -DENABLE_NEON=ON flags let OpenCV use the VFPv3 floating-point unit and NEON SIMD instructions for vectorization. The remaining flags turn off tests and example programs to reduce the overall build time. If you skipped OpenCV-Contrib, drop the -DOPENCV_EXTRA_MODULES_PATH flag.
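After make install, you can confirm that these flags took effect by inspecting cv::getBuildInformation(), which returns OpenCV's CMake configuration summary as text. The sketch below scans that text. parseBuildInfo and BuildFlags are hypothetical names, and the labels it looks for ("Parallel framework:" and "Baseline:") are an assumption based on the OpenCV 4.x summary format; since matching is by substring, treat it as a quick sanity check rather than a full parser.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: which optimizations the summary mentions.
struct BuildFlags
{
    bool tbb = false;   // TBB is the parallel framework
    bool neon = false;  // NEON is part of the CPU baseline
};

// Hypothetical helper: scans the text of cv::getBuildInformation().
BuildFlags parseBuildInfo(const std::string& info)
{
    BuildFlags flags;
    std::istringstream lines(info);
    std::string line;
    while (std::getline(lines, line))
    {
        if (line.find("Parallel framework:") != std::string::npos &&
            line.find("TBB") != std::string::npos)
            flags.tbb = true;
        if (line.find("Baseline:") != std::string::npos &&
            line.find("NEON") != std::string::npos)
            flags.neon = true;
    }
    return flags;
}
```

In a program linked against OpenCV, include opencv2/core.hpp and call parseBuildInfo(cv::getBuildInformation()); both fields should be true for a build configured as above.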

After building, you have an OpenCV that certainly works faster. The following figure shows the speed-up of some well-known functions.
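To reproduce a comparison like the one in that figure, time the same function on the same input before and after rebuilding. Here is a minimal sketch with std::chrono. medianMillis is a hypothetical helper, and taking the median of repeated runs is my own choice to damp one-off delays (thermal throttling, background tasks), not necessarily how the figure was produced.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs fn `runs` times and returns the median
// wall-clock time in milliseconds (upper median when runs is even).
double medianMillis(const std::function<void()>& fn, int runs = 21)
{
    if (runs <= 0)
        return 0.0;
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r)
    {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    // Partially sort so that the middle element is in its sorted position.
    std::nth_element(times.begin(), times.begin() + runs / 2, times.end());
    return times[static_cast<std::size_t>(runs / 2)];
}
```

For example, wrap a call such as cv::GaussianBlur(src, dst, cv::Size(5, 5), 0) in a lambda, measure it with the old and the new library, and divide the two medians to get the speed-up factor.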

3- Notes

If you are using a 64-bit OS on your Pi, don't set -DENABLE_VFPV3=ON and -DENABLE_NEON=ON; CMake stops with an error if you do. On a 64-bit OS, CMake detects NEON support automatically (tested with OpenCV 4.2.0 on Ubuntu MATE 18).
Although the OpenCV documentation lists other threading frameworks (OpenMP, pthreads, Concurrency, and GCD), setting -DWITH_OPENMP=ON did not give me a multi-threaded OpenCV.
You can also use libjpeg-turbo instead of OpenCV’s default libjpeg library. It uses NEON instructions to speed up JPEG reading and writing. If you are interested, first build it as follows:

wget https://github.com/libjpeg-turbo/libjpeg-turbo/archive/1.5.0.tar.gz -O libjpeg-turbo.tar.gz
tar xvf libjpeg-turbo.tar.gz
cd libjpeg-turbo-1.5.0
autoreconf -fiv
mkdir build
cd build
sh ../configure
make -j4
sudo make install

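and then add the following flags to your CMake command (the lib32 path assumes a 32-bit OS; on a 64-bit OS, check /opt/libjpeg-turbo for a lib64 folder instead):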

-DWITH_JPEG=ON -DBUILD_JPEG=OFF -DJPEG_INCLUDE_DIR=/opt/libjpeg-turbo/include/ -DJPEG_LIBRARY=/opt/libjpeg-turbo/lib32/libjpeg.a

The instructions in this post are not limited to the Raspberry Pi. They apply to other SBCs with multi-core processors that support NEON, including the Odroid XU4, BeagleBoard X15, and the Nvidia Jetson series.
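The 64-bit note in this section boils down to a simple rule that you could encode in a build script. Here is a minimal sketch, assuming a NEON-capable board (Raspberry Pi 2 or later) and that the word size comes from running getconf LONG_BIT, which reports the width of the userland (32 or 64) that the compiler targets. neonCmakeFlags is a hypothetical helper.

```cpp
#include <string>

// Hypothetical helper: returns the extra NEON-related CMake flags for a
// Pi 2 or later, given the userland word size (output of getconf LONG_BIT).
std::string neonCmakeFlags(int userlandBits)
{
    if (userlandBits == 64)
    {
        // 64-bit OS: CMake detects NEON on its own, and forcing the flags
        // fails with "Required baseline optimization is not supported: VFPV3".
        return "";
    }
    // 32-bit OS: enable VFPv3 and NEON explicitly.
    return "-DENABLE_VFPV3=ON -DENABLE_NEON=ON";
}
```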

16 replies

Kenton says:
March 13, 2019 at 11:32 am

Hi there! Such a wonderful article, thank you!
- Zana Zakaryaie says:
  March 16, 2019 at 11:05 am
  
  Thanks Kenton
jsxyhelu says:
August 17, 2020 at 1:22 pm

Hi, in the image shown in this blog (“you have an OpenCV that certainly works faster”), how did you get this image?
Thank you!
- Zana Zakaryaie says:
  August 20, 2020 at 10:20 am
  
  Hi. You can reproduce this image by comparing the runtimes of these functions before and after the optimizations. I used MATLAB’s “bar” function to plot the results.
David says:
April 4, 2022 at 1:51 pm

Nice article, attempting to follow the build for ARM VO etc. Kept on failing and realised I needed to increase swap while building opencv

based on: https://www.nerdynat.com/programming/2019/how-to-install-opencv-on-raspberry-pi-3b/
- Zana Zakaryaie says:
  April 4, 2022 at 2:08 pm
  
  Thank you David for reporting this. I had forgotten to mention swap as a note. Generally, when building big libraries with multiple cores, you can run out of memory (RAM). In such cases, the operating system kills the build process unless some swap space is available. As you may know, swap lets the OS use disk space (the microSD card here) as extra RAM. It is much slower than RAM, but it can at least prevent the build from being killed. BTW, make sure to call swapoff after you have built OpenCV. I read somewhere that using swap space frequently might damage the microSD card.
dfhjkfdj says:
April 10, 2022 at 12:10 am

I am getting the error: “./ARM_VO: error while loading shared libraries: libopencv_gapi.so.405: cannot open shared object file: No such file or directory”
- Zana Zakaryaie says:
  April 10, 2022 at 9:22 am
  
  Hi. Please make a new issue in ARM-VO’s Github repo. I will answer there. Thanks
Jeva says:
July 16, 2022 at 8:27 pm

Bro, will this method work for opencv-python too?
- Zana Zakaryaie says:
  July 17, 2022 at 11:48 am
  
  Hi Jeva.
  opencv-python is just an API that calls the underlying C/C++ functions. So, if the C/C++ functions are optimized, the python calls run faster too.
  - Golden Strawberry says:
    April 8, 2023 at 6:18 am
    
    Hi,
    I was able to follow the instructions without errors, but in Python it keeps saying that cv2 can't be found, even though it's in the site-packages folder for the specific version of Python I'm using.
    - Zana Zakaryaie says:
      April 10, 2023 at 4:19 pm
      
      Hi
      This is strange because your site-packages folder is populated with opencv files. After a short search, I found a GitHub issue that might be helpful for you:
      https://github.com/opencv/opencv/issues/21471
zoldaten says:
August 29, 2022 at 11:08 am

Hi!
I'm trying to check OpenCV using the script at the start of the post.
What should go after #include?
- Zana Zakaryaie says:
  August 30, 2022 at 11:55 am
  
  Hi. Unfortunately, my editor removes includes due to some kind of conflict with the website’s tags! The first include is “iostream” and the other one is “opencv2/opencv.hpp”
zoldaten says:
August 29, 2022 at 12:54 pm

Tried to build on Raspberry Pi 4 (aarch64) but got an error:
CMake Error at cmake/OpenCVCompilerOptimizations.cmake:546 (message):
Required baseline optimization is not supported: VFPV3
(CPU_BASELINE_REQUIRE=;VFPV3;NEON)
- Zana Zakaryaie says:
  August 30, 2022 at 11:59 am
  
  As explained in section “3-Notes”, there is no need to pass vectorization flags for 64bit operating systems