[ARM] Arm Compute Library for computer vision and machine learning
Basic information
1. Resources
Arm Compute Library for computer vision and machine learning now publicly available!, 3 years ago.
https://github.com/ARM-software/ComputeLibrary
How to Integrate Arm Compute Library with OpenCV?
Edge AI and Vision Alliance
2. Performance comparison charts
ARM's Roberto Mijat Explores Industry Use Cases for Optimized Low-level ARM Primitives
Examples benchmarking problem in RaspberryPi 3B+ #599
3. What exactly is it?
Goto: A software library for computer vision and machine learning
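The library implements a number of basic algorithms from these fields in two ways, with OpenCL and with NEON:
- OpenCL mainly targets acceleration on Arm Mali GPUs.
- NEON targets Arm Cortex-A series CPUs.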
* Mali GPU --> OpenCL
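The Mali GPU was originally developed by Falanx, a company spun out of a project at the Norwegian University of Science and Technology (NTNU). ARM acquired Falanx in 2006 and turned it into its GPU division; in 2007, with Mali now part of ARM, the Mali-200 GPU was released.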
* NEON
Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD) architecture extension for the Arm Cortex-A series and Cortex-R52 processors.
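To make the SIMD idea concrete, here is a minimal sketch of a NEON-accelerated element-wise addition. It is not taken from the Compute Library; the function names `add_f32` and `add` are made up for illustration. The intrinsics `vld1q_f32`, `vaddq_f32`, and `vst1q_f32` come from `arm_neon.h`, and a scalar loop serves as the fallback when the compiler does not define `__ARM_NEON`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Element-wise sum of two float arrays: out[i] = a[i] + b[i].
void add_f32(const float *a, const float *b, float *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // NEON path: one instruction adds four 32-bit floats at a time.
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar path: handles the tail, or everything on non-NEON targets.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Convenience wrapper (hypothetical helper, for illustration only).
std::vector<float> add(const std::vector<float> &a, const std::vector<float> &b)
{
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    add_f32(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

The same source builds on x86 and on Arm; only the Arm build with NEON enabled takes the vector path.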
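The project splits an image column-wise into sub-blocks and then starts several threads to process them. For convolution:
* The NEON path implements two methods.
(1) One is the GEMM method: the input image goes through im2col and then an interleave step, the weights are transposed, and a matrix multiplication follows. The interleave and transpose steps exist so that NEON loads during the matrix multiplication are contiguous and smooth and no data has to be loaded twice, which makes the most of the NEON instruction set.
(2) The other is standard (direct) convolution, which is also written with NEON intrinsics.
* I have not yet looked closely at the OpenCL path that uses the GPU, but the overall flow looks similar to the NEON one: the image is split according to the number of threads and the sub-blocks are processed in parallel. It also has the concepts of shape, window, and iterator; only the actual computation differs from the NEON instruction set.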
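The GEMM method for convolution can be sketched in plain C++ as follows. This is a simplified illustration under stated assumptions, not the library's implementation: a single-channel image, stride 1, no padding, and no interleave step. The `Matrix` type and the function names are hypothetical. The filters are stored already transposed (one flattened filter per row) so that the inner loop reads both operands contiguously, which is the property the transpose step is meant to provide. As is usual in CNN code, "convolution" here means the sliding dot product without flipping the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major float matrix (illustrative type, not an ACL class).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// im2col: row (y * out_w + x) holds the k x k input patch under output pixel (y, x).
Matrix im2col(const Matrix &img, std::size_t k)
{
    assert(k >= 1 && img.rows >= k && img.cols >= k);
    const std::size_t out_h = img.rows - k + 1;
    const std::size_t out_w = img.cols - k + 1;
    Matrix patches(out_h * out_w, k * k);
    for (std::size_t y = 0; y < out_h; ++y)
        for (std::size_t x = 0; x < out_w; ++x)
            for (std::size_t ky = 0; ky < k; ++ky)
                for (std::size_t kx = 0; kx < k; ++kx)
                    patches.at(y * out_w + x, ky * k + kx) = img.at(y + ky, x + kx);
    return patches;
}

// C = A * B^T, where bt already stores B transposed (one flattened filter per row),
// so both operands are read contiguously in the inner loop.
Matrix gemm_transposed_b(const Matrix &a, const Matrix &bt)
{
    assert(a.cols == bt.cols);
    Matrix c(a.rows, bt.rows);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < bt.rows; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < a.cols; ++p)
                acc += a.at(i, p) * bt.at(j, p);
            c.at(i, j) = acc;
        }
    return c;
}

// Direct sliding-window convolution, used as the reference result.
Matrix conv_direct(const Matrix &img, const Matrix &kernel)
{
    assert(kernel.rows == kernel.cols);
    const std::size_t k = kernel.rows;
    assert(img.rows >= k && img.cols >= k);
    Matrix out(img.rows - k + 1, img.cols - k + 1);
    for (std::size_t y = 0; y < out.rows; ++y)
        for (std::size_t x = 0; x < out.cols; ++x) {
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < k; ++ky)
                for (std::size_t kx = 0; kx < k; ++kx)
                    acc += img.at(y + ky, x + kx) * kernel.at(ky, kx);
            out.at(y, x) = acc;
        }
    return out;
}
```

Both paths produce the same numbers; the GEMM path trades extra memory for the patch matrix against a single, highly regular matrix multiplication that vectorizes well.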
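Since the release of this library, developers no longer need to care how Arm CPUs and GPUs achieve hardware acceleration through NEON or OpenCL; they can simply call the library's interfaces. This is a great help to engineers who build computer vision applications but are not familiar with hardware-accelerated programming.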
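The column-wise split into sub-blocks mentioned earlier can be illustrated with a small partitioning sketch. It only shows how the column ranges might be computed and that each range can be processed independently; the names `ColumnRange`, `split_columns`, and `invert_block` are hypothetical, and the library's own window and scheduler classes are not used here. In a real setup each range would be handed to its own worker thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open column range [begin, end) assigned to one worker.
struct ColumnRange {
    std::size_t begin;
    std::size_t end;
};

// Split `width` columns into at most `num_workers` contiguous, near-equal ranges.
std::vector<ColumnRange> split_columns(std::size_t width, std::size_t num_workers)
{
    assert(num_workers >= 1);
    const std::size_t n = std::min(width, num_workers);
    std::vector<ColumnRange> ranges;
    if (n == 0) {
        return ranges;
    }
    const std::size_t base  = width / n;
    const std::size_t extra = width % n; // the first `extra` ranges get one more column
    std::size_t begin = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        ranges.push_back({begin, begin + len});
        begin += len;
    }
    return ranges;
}

// Example per-block kernel: invert the pixels of a row-major U8 image
// inside one column range. Blocks never overlap, so no locking is needed.
void invert_block(std::vector<std::uint8_t> &img, std::size_t width,
                  std::size_t height, const ColumnRange &range)
{
    assert(img.size() == width * height && range.end <= width);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = range.begin; x < range.end; ++x)
            img[y * width + x] = static_cast<std::uint8_t>(255 - img[y * width + x]);
}
```

Because the ranges are disjoint, running `invert_block` on every range in any order (or concurrently) gives the same result as processing the whole image at once.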
Hands-on progress
1. Is it integrated into OpenCV?
== 2018 ==
In 2018, probably not yet.
December 2018 corresponds to OpenCV 3.4.5: https://opencv.org/releases/page/2/ [release note]
== 2019 ==
Ref: https://github.com/opencv/opencv/issues/13739#
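How to integrate it falls under custom hardware abstraction layers: Custom HAL samples
Goto: Slimming down the OpenCV library; the build options show which components are included.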

cmake -G "Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=..\..\android\android.toolchain.cmake ..\..\.. ^
  -DANDROID_NDK="D:\Android\sdk\ndk-bundle" -DANDROID_TOOLCHAIN_NAME=arm-linux-androideabi-4.9 ^
  -DCMAKE_MAKE_PROGRAM="D:\Android\sdk\ndk-bundle\prebuilt\windows-x86_64\bin\make.exe" ^
  -DCMAKE_BUILD_TYPE=Release -DANDROID_ABI="armeabi" -DANDROID_NATIVE_API_LEVEL=14 -DANDROID_FORCE_ARM_BUILD=ON ^
  -DWITH_CAROTENE=OFF -DWITH_CLP=OFF -DWITH_CUBLAS=OFF -DWITH_CUDA=OFF -DWITH_CUFFT=OFF -DWITH_EIGEN=OFF ^
  -DWITH_GDCM=OFF -DWITH_GSTREAMER_0_10=OFF -DWITH_JASPER=OFF -DWITH_JPEG=OFF -DWITH_NVCUVID=OFF ^
  -DWITH_OPENCL=OFF -DWITH_OPENCL_SVM=OFF -DWITH_OPENEXR=OFF -DWITH_OPENMP=OFF -DWITH_OPENVX=OFF ^
  -DWITH_PNG=ON -DWITH_PTHREADS_PF=OFF -DWITH_TBB=OFF -DWITH_TIFF=OFF -DWITH_WEBP=OFF ^
  -DBUILD_ANDROID_EXAMPLES=OFF -DBUILD_ANDROID_SERVICE=OFF -DBUILD_CUDA_STUBS=OFF -DBUILD_DOCS=OFF ^
  -DBUILD_EXAMPLES=OFF -DBUILD_FAT_JAVA_LIB=OFF -DBUILD_JASPER=OFF -DBUILD_JPEG=OFF -DBUILD_OPENEXR=OFF ^
  -DBUILD_PACKAGE=OFF -DBUILD_PERF_TESTS=OFF -DBUILD_PNG=ON -DBUILD_SHARED_LIBS=OFF -DBUILD_TBB=OFF ^
  -DBUILD_TESTS=OFF -DBUILD_TIFF=OFF -DBUILD_WITH_DEBUG_INFO=OFF -DBUILD_WITH_DYNAMIC_IPP=OFF ^
  -DBUILD_opencv_apps=OFF -DBUILD_opencv_calib3d=ON -DBUILD_ZLIB=ON -DBUILD_opencv_core=ON ^
  -DBUILD_opencv_features2d=ON -DBUILD_opencv_flann=ON -DBUILD_opencv_highgui=ON -DBUILD_opencv_imgcodecs=ON ^
  -DBUILD_opencv_imgproc=ON -DBUILD_opencv_java=OFF -DBUILD_opencv_ml=ON -DBUILD_opencv_objdetect=OFF ^
  -DBUILD_opencv_photo=OFF -DBUILD_opencv_shape=OFF -DBUILD_opencv_stitching=OFF -DBUILD_opencv_stereo=OFF ^
  -DBUILD_opencv_superres=OFF -DBUILD_opencv_ts=OFF -DBUILD_opencv_video=OFF -DBUILD_opencv_videoio=OFF ^
  -DBUILD_opencv_line_descriptor=OFF -DBUILD_opencv_reg=OFF -DBUILD_opencv_saliency=OFF ^
  -DBUILD_opencv_videostab=OFF -DBUILD_opencv_world=OFF ^
  -DCMAKE_CXX_FLAGS="-ffunction-sections -fdata-sections -fvisibility=hidden -O3 -std=c++11 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a8" ^
  -DCMAKE_C_FLAGS="-ffunction-sections -fdata-sections -fvisibility=hidden -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a8" ^
  -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--gc-sections" ^
  -DBUILD_opencv_xfeatures2d=OFF -DBUILD_opencv_face=OFF -DBUILD_opencv_bgsegm=OFF -DBUILD_opencv_datasets=OFF ^
  -DBUILD_opencv_dpm=OFF -DBUILD_opencv_tracking=OFF -DBUILD_opencv_xobjdetect=OFF -DBUILD_opencv_optflow=OFF ^
  -DBUILD_opencv_tracking=OFF -DENABLE_NEON=ON -DOPENCV_EXTRA_MODULES_PATH="E:/opencv_contrib-3.2.0/modules" ^
  -DBUILD_opencv_ximgproc=ON -DBUILD_opencv_dnn=OFF -DBUILD_opencv_structured_light=OFF ^
  -DBUILD_opencv_surface_matching=OFF -DBUILD_opencv_text=OFF -DBUILD_opencv_xphoto=OFF -DBUILD_opencv_fuzzy=OFF ^
  -DBUILD_opencv_bioinspired=OFF -DBUILD_opencv_phase_unwrapping=OFF -DBUILD_opencv_plot=OFF ^
  -DBUILD_opencv_rgbd=OFF -DBUILD_opencv_aruco=OFF
Carotene
This is Carotene, a low-level library containing optimized CPU routines that are useful for computer vision algorithms.
Looking for traces of ACL in the OpenCV source: see opencv/3rdparty/carotene/
2. How to use it standalone
Tutorials:
Tutorial: Cartoonifying Images on Raspberry Pi with the Compute Library
Tutorial: Running AlexNet on Raspberry Pi with Compute Library
Deploying these applications still poses challenges such as:
1. Code/performance portability: a problem developers regularly face, since most of the time the algorithm has to be rewritten from scratch to reach the desired performance.
2. Code optimization on specific architectures: Does the architecture support SIMD acceleration? Does the architecture support FP16 acceleration? Is the architecture 32 or 64-bit? These are just a few questions to have in mind when we want to considerably boost the performance of our algorithms.
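Some of these questions can be answered at compile time. The sketch below is an illustration, not part of the Compute Library: it inspects the predefined macros `__aarch64__`, `__ARM_NEON`, and `__ARM_FEATURE_FP16_VECTOR_ARITHMETIC` that GCC and Clang set according to the target and the flags passed to the compiler. The `TargetInfo` struct and the function names are hypothetical. Note that this reflects what the build was compiled for, which is not necessarily what the device running the binary supports.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Compile-time view of the build target (illustrative type).
struct TargetInfo {
    std::size_t pointer_bits;
    bool aarch64;
    bool neon;
    bool fp16_vector_arithmetic;
};

TargetInfo detect_target()
{
    TargetInfo info{};
    info.pointer_bits = sizeof(void *) * 8;
#if defined(__aarch64__)
    info.aarch64 = true; // 64-bit Arm execution state
#endif
#if defined(__ARM_NEON)
    info.neon = true; // SIMD (NEON) enabled for this build
#endif
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    info.fp16_vector_arithmetic = true; // native FP16 vector arithmetic enabled
#endif
    return info;
}

// Human-readable summary, e.g. for a start-up log line.
std::string describe(const TargetInfo &info)
{
    assert(info.pointer_bits > 0);
    std::string s = std::to_string(info.pointer_bits) + "-bit pointers";
    s += info.aarch64 ? ", aarch64" : ", not aarch64";
    s += info.neon ? ", NEON" : ", no NEON";
    s += info.fp16_vector_arithmetic ? ", FP16 vector arithmetic" : ", no FP16 vector arithmetic";
    return s;
}
```

A library such as ACL hides these decisions behind its function interfaces, which is the portability argument made above.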
ACL API
At the current state the library has roughly 60 functions, accelerated for both Arm Cortex-A CPUs (both aarch32 and aarch64 with NEON support) and Arm Mali GPUs (both Midgard and Bifrost architectures).
The Arm Compute Library is a collection of low-level functions optimized for Arm CPU and GPU architectures targeted at image processing, computer vision, and machine learning. It is available free of charge under a permissive MIT open source license.
Just to name a few:
- Image processing: Convolution, Gaussian filtering, Sobel filtering, Warp, Remap
- Computer Vision: Canny Edge, Harris Corner, HOG, Optical Flow
- Machine Learning: S/H/LOWP/GEMM, Convolution Layer, Activation Layer, Fully Connected Layer, Pooling Layer
The library’s collection of functions includes:
- Basic arithmetic, mathematical, and binary operator functions
- Color manipulation (conversion, channel extraction, and more)
- Convolution filters (Sobel, Gaussian, and more)
- Canny Edge, Harris corners, optical flow, and more
- Pyramids (such as Laplacians)
- HOG (Histogram of Oriented Gradients)
- SVM (Support Vector Machines)
- H/SGEMM (Half and Single precision General Matrix Multiply)
- Convolutional Neural Networks building blocks (Activation, Convolution, Fully connected, Locally connected, Normalization, Pooling, Soft-max)
Cartoon effect
In order to achieve the basic cartoon effect, we need to apply the Gaussian filter 5x5 and the Canny edge over the input image.
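For intuition about the blur stage, here is a plain C++ sketch of a 5x5 Gaussian filter on a U8 image. It is a reference-style illustration, not the library's NEON kernel. It assumes the common binomial kernel [1 4 6 4 1] applied separably (total weight 256), replicated borders, and truncating division; the function name `gaussian5x5` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an index into [0, size - 1]; this implements the replicate border.
static int clamp_index(int i, int size)
{
    return std::max(0, std::min(i, size - 1));
}

// 5x5 Gaussian blur of a row-major U8 image using the separable
// binomial kernel [1 4 6 4 1] (assumed kernel, total weight 16 * 16 = 256).
std::vector<std::uint8_t> gaussian5x5(const std::vector<std::uint8_t> &src, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    static const int k[5] = {1, 4, 6, 4, 1};

    // Horizontal pass: keep full precision in int (max 255 * 16).
    std::vector<int> tmp(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int i = -2; i <= 2; ++i)
                sum += k[i + 2] * src[y * width + clamp_index(x + i, width)];
            tmp[y * width + x] = sum;
        }

    // Vertical pass, then normalize by 256 (truncating).
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int i = -2; i <= 2; ++i)
                sum += k[i + 2] * tmp[clamp_index(y + i, height) * width + x];
            dst[y * width + x] = static_cast<std::uint8_t>(sum / 256);
        }
    return dst;
}
```

A constant image passes through unchanged, while a single bright pixel is spread over its 5x5 neighborhood.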
References:
Source location of the interfaces: ComputeLibrary/src/runtime/NEON/functions/
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/core/Types.h"
#include "utils/Utils.h"

using namespace arm_compute;
using namespace utils;

int main(int argc, const char **argv)
{
Step 1: Image definitions
    Image src_img;
    Image dst_img;
    Image gaus5x5_img;
    Image canny_edge_img;

    if(argc < 2)
    {
        // Print help
        std::cerr << "Usage: ./build/neon_cartoon_effect [input_image.ppm]\n\n";
        std::cerr << "No input_image provided\n";
        return -1;
    }
Step 2: Input image initialization
    // Open PPM file
    PPMLoader ppm;
    ppm.open(argv[1]);

    // Initialize just the dimensions and format of [your buffers]:
    ppm.init_image(src_img, Format::U8);
Step 3: Initialization of the images
    // Initialize just the dimensions and format of [the images]:
    gaus5x5_img.allocator()->init(*src_img.info());
    canny_edge_img.allocator()->init(*src_img.info());
    dst_img.allocator()->init(*src_img.info());

    NEGaussian5x5           gaus5x5;
    NECannyEdge             canny_edge;
    NEArithmeticSubtraction sub;
Step 4: Function configuration
    // Configure the functions to call
    gaus5x5.configure(&src_img, &gaus5x5_img, BorderMode::REPLICATE);
    canny_edge.configure(&src_img, &canny_edge_img, 100, 80, 3, 1, BorderMode::REPLICATE);
    sub.configure(&gaus5x5_img, &canny_edge_img, &dst_img, ConvertPolicy::SATURATE);
Step 5: Memory allocation
    // Now that the padding requirements are known we can allocate the images:
    src_img.allocator()->allocate();
    dst_img.allocator()->allocate();
    gaus5x5_img.allocator()->allocate();
    canny_edge_img.allocator()->allocate();
Step 6: Fill the input image and...run!
    // Fill the input image with the content of the PPM image
    ppm.fill_image(src_img);

    // Execute the functions:
    gaus5x5.run();
    canny_edge.run();
    sub.run();

    // Save the result to file:
    save_to_ppm(dst_img, "cartoon_effect.ppm");
}
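The last stage of the pipeline above subtracts the edge map from the blurred image with a saturating policy. As a rough plain C++ illustration of what that means per pixel (not the library's implementation; the function names are hypothetical), assume the edge map holds 255 on edge pixels and 0 elsewhere. Saturation clamps negative results to 0 instead of letting them wrap around, so edges turn black while all other pixels keep their blurred value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Saturating U8 subtraction: results below 0 are clamped to 0.
std::uint8_t sub_saturate_u8(std::uint8_t a, std::uint8_t b)
{
    return a > b ? static_cast<std::uint8_t>(a - b) : static_cast<std::uint8_t>(0);
}

// dst[i] = saturate(blurred[i] - edges[i]) for two images of equal size.
std::vector<std::uint8_t> subtract_saturate(const std::vector<std::uint8_t> &blurred,
                                            const std::vector<std::uint8_t> &edges)
{
    assert(blurred.size() == edges.size());
    std::vector<std::uint8_t> dst(blurred.size());
    for (std::size_t i = 0; i < blurred.size(); ++i) {
        dst[i] = sub_saturate_u8(blurred[i], edges[i]);
    }
    return dst;
}
```

With wrap-around arithmetic instead, a blurred value of 200 minus an edge value of 255 would become 201, which would leave bright specks along the outlines rather than dark lines.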
3. How to integrate on Android
- A dedicated Android build
Binaries available here:
Doc: https://arm-software.github.io/ComputeLibrary/v19.11.1/
Build for Android: https://arm-software.github.io/ComputeLibrary/v19.11.1/index.xhtml#S3_3_android
Related issue: How to use the Binaries release (android version) ? #434
In order to use the compute library on Android there are a few steps to follow to make everything work.
Before presenting the necessary steps, I would suggest using the static libs of the arm compute library (i.e. libarm_compute-static.a) rather than the dynamic ones, and using the arm compute library only with NEON acceleration, as there are extra steps to follow for OpenCL.
- ACL on embedded devices.
Ref: Enabling Embedded Inference Engine with the ARM Compute Library: A Case Study
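That is only about a 25% gain on top of the CPU optimizations. It looks like ACL --> OpenCL is where the real power is, and it is also TF Lite's little secret.
Related issue: how to build for android #300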
Did you manage to use ACL in your project ?
There shouldn't be anything different from other libraries (Except that you might need to link against OpenCL if you're using it, in which case it will not work on Android O and after).
It means that because of some added security now applications can only link against system libraries that have been explicitly whitelisted by the manufacturer.
Therefore, even if OpenCL is available on your platform, you're unlikely to be able to use it in an App
Studying the documentation
1. ACL OpenCL
Ref: https://arm-software.github.io/ComputeLibrary/v19.11.1/functions_list.xhtml
/* implement */
End.