OpenCL For Opencv

一、Opencv-OCL编程基础

1. Opencv OCL基本编程API

a) Opencv4.2 OCL API

b) Opencv3.1 OCL API

2. 图像处理Kernel实现及CU单元配置

3. Demo实验

        我目前编译使用的opencv版本是opencv4.2【如果版本不同请下载不同版本下的Demo程序】,使用如下官方提供的Opencv-OCL代码,如果只是单纯的测试运行此官方提供的代码不需要有特定的加速设备,直接使用多核心CPU-PC平台即可,因为OpenCL本身就支持了CPU加速,具体代码如下:

  1 // This file is part of OpenCV project.
  2 // It is subject to the license terms in the LICENSE file found in the top-level directory
  3 // of this distribution and at http://opencv.org/license.html
  4 
  5 #include "opencv2/core.hpp"
  6 #include "opencv2/core/ocl.hpp"
  7 #include "opencv2/highgui.hpp"
  8 #include "opencv2/imgcodecs.hpp"
  9 #include "opencv2/imgproc.hpp"
 10 
 11 #include <iostream>
 12 
 13 using namespace std;
 14 using namespace cv;
 15 
 16 static const char* opencl_kernel_src =
 17 "__kernel void magnutude_filter_8u(\n"
 18 "       __global const uchar* src, int src_step, int src_offset,\n"
 19 "       __global uchar* dst, int dst_step, int dst_offset, int dst_rows, int dst_cols,\n"
 20 "       float scale)\n"
 21 "{\n"
 22 "   int x = get_global_id(0);\n"
 23 "   int y = get_global_id(1);\n"
 24 "   if (x < dst_cols && y < dst_rows)\n"
 25 "   {\n"
 26 "       int dst_idx = y * dst_step + x + dst_offset;\n"
 27 "       if (x > 0 && x < dst_cols - 1 && y > 0 && y < dst_rows - 2)\n"
 28 "       {\n"
 29 "           int src_idx = y * src_step + x + src_offset;\n"
 30 "           int dx = (int)src[src_idx]*2 - src[src_idx - 1]          - src[src_idx + 1];\n"
 31 "           int dy = (int)src[src_idx]*2 - src[src_idx - 1*src_step] - src[src_idx + 1*src_step];\n"
 32 "           dst[dst_idx] = convert_uchar_sat(sqrt((float)(dx*dx + dy*dy)) * scale);\n"
 33 "       }\n"
 34 "       else\n"
 35 "       {\n"
 36 "           dst[dst_idx] = 0;\n"
 37 "       }\n"
 38 "   }\n"
 39 "}\n";
 40 
 41 int main(int argc, char** argv)
 42 {
 43     const char* keys =
 44         "{ i input    | | specify input image }"
 45         "{ h help     | | print help message }";
 46 
 47     cv::CommandLineParser args(argc, argv, keys);
 48     if (args.has("help"))
 49     {
 50         cout << "Usage : " << argv[0] << " [options]" << endl;
 51         cout << "Available options:" << endl;
 52         args.printMessage();
 53         return EXIT_SUCCESS;
 54     }
 55 
 56     cv::ocl::Context ctx = cv::ocl::Context::getDefault();
 57     if (!ctx.ptr())
 58     {
 59         cerr << "OpenCL is not available" << endl;
 60         return 1;
 61     }
 62     cv::ocl::Device device = cv::ocl::Device::getDefault();
 63     if (!device.compilerAvailable())
 64     {
 65         cerr << "OpenCL compiler is not available" << endl;
 66         return 1;
 67     }
 68 
 69 
 70     UMat src;
 71     {
 72         string image_file = args.get<string>("i");
 73         if (!image_file.empty())
 74         {
 75             Mat image = imread(samples::findFile(image_file));
 76             if (image.empty())
 77             {
 78                 cout << "error read image: " << image_file << endl;
 79                 return 1;
 80             }
 81             cvtColor(image, src, COLOR_BGR2GRAY);
 82         }
 83         else
 84         {
 85             Mat frame(cv::Size(640, 480), CV_8U, Scalar::all(128));
 86             Point p(frame.cols / 2, frame.rows / 2);
 87             line(frame, Point(0, frame.rows / 2), Point(frame.cols, frame.rows / 2), 1);
 88             circle(frame, p, 200, Scalar(32, 32, 32), 8, LINE_AA);
 89             string str = "OpenCL";
 90             int baseLine = 0;
 91             Size box = getTextSize(str, FONT_HERSHEY_COMPLEX, 2, 5, &baseLine);
 92             putText(frame, str, Point((frame.cols - box.width) / 2, (frame.rows - box.height) / 2 + baseLine),
 93                     FONT_HERSHEY_COMPLEX, 2, Scalar(255, 255, 255), 5, LINE_AA);
 94             frame.copyTo(src);
 95         }
 96     }
 97 
 98 
 99     cv::String module_name; // empty to disable OpenCL cache
100 
101     {
102         cout << "OpenCL program source: " << endl;
103         cout << "======================================================================================================" << endl;
104         cout << opencl_kernel_src << endl;
105         cout << "======================================================================================================" << endl;
106         //! [Define OpenCL program source]
107         cv::ocl::ProgramSource source(module_name, "simple", opencl_kernel_src, "");
108         //! [Define OpenCL program source]
109 
110         //! [Compile/build OpenCL for current OpenCL device]
111         cv::String errmsg;
112         cv::ocl::Program program(source, "", errmsg);
113         if (program.ptr() == NULL)
114         {
115             cerr << "Can't compile OpenCL program:" << endl << errmsg << endl;
116             return 1;
117         }
118         //! [Compile/build OpenCL for current OpenCL device]
119 
120         if (!errmsg.empty())
121         {
122             cout << "OpenCL program build log:" << endl << errmsg << endl;
123         }
124 
125         //! [Get OpenCL kernel by name]
126         cv::ocl::Kernel k("magnutude_filter_8u", program);
127         if (k.empty())
128         {
129             cerr << "Can't get OpenCL kernel" << endl;
130             return 1;
131         }
132         //! [Get OpenCL kernel by name]
133 
134         UMat result(src.size(), CV_8UC1);
135 
136         //! [Define kernel parameters and run]
137         size_t globalSize[2] = {(size_t)src.cols, (size_t)src.rows};
138         size_t localSize[2] = {8, 8};
139         bool executionResult = k
140             .args(
141                 cv::ocl::KernelArg::ReadOnlyNoSize(src), // size is not used (similar to 'dst' size)
142                 cv::ocl::KernelArg::WriteOnly(result),
143                 (float)2.0
144             )
145             .run(2, globalSize, localSize, true);
146         if (!executionResult)
147         {
148             cerr << "OpenCL kernel launch failed" << endl;
149             return 1;
150         }
151         //! [Define kernel parameters and run]
152 
153         imshow("Source", src);
154         imshow("Result", result);
155 
156         for (;;)
157         {
158             int key = waitKey();
159             if (key == 27/*ESC*/ || key == 'q' || key == 'Q')
160                 break;
161         }
162     }
163     return 0;
164 }

使用mingw编译上述程序并运行结果如下(根据运行结果,说明对图像进行了边缘提取的功能):

当然如果你想处理其他的图像,也可以使用在CMD窗口当中调用编译完成的.exe文件加上文件的绝对路径。

终端Terminal当中显示了Kernel的基本内容,如下所示:

        从Kernel的结构可以分析出来,kernel中实现的是求解了图像自身的x方向以及y方向的梯度,并求出了每一点的梯度方向,实际上比较类似与Canny边缘算子的检测算法,kernel使用了二维的方式处理图片上每一个点,在kernel核当中,使用了if判断是否存在指针越界的情况,具体的基础实现相关内容请移步OpenCL基础入门

二、Demo代码变形

实现图像3x3均值滤波

三、嵌入式平台移植与编译(TI AM57x 系列)

Opencv3.1.0版本master下,opencv-ocl代码官方Demo

   1 /*
   2 // The example of interoperability between OpenCL and OpenCV.
   3 // This will loop through frames of video either from input media file
   4 // or camera device and do processing of these data in OpenCL and then
   5 // in OpenCV. In OpenCL it does inversion of pixels in left half of frame and
   6 // in OpenCV it does bluring in the right half of frame.
   7 */
   8 #include <cstdio>
   9 #include <cstdlib>
  10 #include <iostream>
  11 #include <fstream>
  12 #include <string>
  13 #include <sstream>
  14 #include <iomanip>
  15 #include <stdexcept>
  16 
  17 #define CL_USE_DEPRECATED_OPENCL_2_0_APIS // eliminate build warning
  18 
  19 #if __APPLE__
  20 #include <OpenCL/cl.h>
  21 #else
  22 #include <CL/cl.h>
  23 #endif
  24 
  25 #include <opencv2/core/ocl.hpp>
  26 #include <opencv2/core/utility.hpp>
  27 #include <opencv2/video.hpp>
  28 #include <opencv2/highgui.hpp>
  29 #include <opencv2/imgproc.hpp>
  30 
  31 
  32 using namespace std;
  33 using namespace cv;
  34 
  35 namespace opencl {
  36 
  37 class PlatformInfo
  38 {
  39 public:
  40     PlatformInfo()
  41     {}
  42 
  43     ~PlatformInfo()
  44     {}
  45 
  46     cl_int QueryInfo(cl_platform_id id)
  47     {
  48         query_param(id, CL_PLATFORM_PROFILE, m_profile);
  49         query_param(id, CL_PLATFORM_VERSION, m_version);
  50         query_param(id, CL_PLATFORM_NAME, m_name);
  51         query_param(id, CL_PLATFORM_VENDOR, m_vendor);
  52         query_param(id, CL_PLATFORM_EXTENSIONS, m_extensions);
  53         return CL_SUCCESS;
  54     }
  55 
  56     std::string Profile()    { return m_profile; }
  57     std::string Version()    { return m_version; }
  58     std::string Name()       { return m_name; }
  59     std::string Vendor()     { return m_vendor; }
  60     std::string Extensions() { return m_extensions; }
  61 
  62 private:
  63     cl_int query_param(cl_platform_id id, cl_platform_info param, std::string& paramStr)
  64     {
  65         cl_int res;
  66 
  67         size_t psize;
  68         cv::AutoBuffer<char> buf;
  69 
  70         res = clGetPlatformInfo(id, param, 0, 0, &psize);
  71         if (CL_SUCCESS != res)
  72             throw std::runtime_error(std::string("clGetPlatformInfo failed"));
  73 
  74         buf.resize(psize);
  75         res = clGetPlatformInfo(id, param, psize, buf, 0);
  76         if (CL_SUCCESS != res)
  77             throw std::runtime_error(std::string("clGetPlatformInfo failed"));
  78 
  79         // just in case, ensure trailing zero for ASCIIZ string
  80         buf[psize] = 0;
  81 
  82         paramStr = buf;
  83 
  84         return CL_SUCCESS;
  85     }
  86 
  87 private:
  88     std::string m_profile;
  89     std::string m_version;
  90     std::string m_name;
  91     std::string m_vendor;
  92     std::string m_extensions;
  93 };
  94 
  95 
  96 class DeviceInfo
  97 {
  98 public:
  99     DeviceInfo()
 100     {}
 101 
 102     ~DeviceInfo()
 103     {}
 104 
 105     cl_int QueryInfo(cl_device_id id)
 106     {
 107         query_param(id, CL_DEVICE_TYPE, m_type);
 108         query_param(id, CL_DEVICE_VENDOR_ID, m_vendor_id);
 109         query_param(id, CL_DEVICE_MAX_COMPUTE_UNITS, m_max_compute_units);
 110         query_param(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, m_max_work_item_dimensions);
 111         query_param(id, CL_DEVICE_MAX_WORK_ITEM_SIZES, m_max_work_item_sizes);
 112         query_param(id, CL_DEVICE_MAX_WORK_GROUP_SIZE, m_max_work_group_size);
 113         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, m_preferred_vector_width_char);
 114         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, m_preferred_vector_width_short);
 115         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, m_preferred_vector_width_int);
 116         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, m_preferred_vector_width_long);
 117         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, m_preferred_vector_width_float);
 118         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, m_preferred_vector_width_double);
 119 #if defined(CL_VERSION_1_1)
 120         query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, m_preferred_vector_width_half);
 121         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, m_native_vector_width_char);
 122         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, m_native_vector_width_short);
 123         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, m_native_vector_width_int);
 124         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, m_native_vector_width_long);
 125         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, m_native_vector_width_float);
 126         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, m_native_vector_width_double);
 127         query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, m_native_vector_width_half);
 128 #endif
 129         query_param(id, CL_DEVICE_MAX_CLOCK_FREQUENCY, m_max_clock_frequency);
 130         query_param(id, CL_DEVICE_ADDRESS_BITS, m_address_bits);
 131         query_param(id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, m_max_mem_alloc_size);
 132         query_param(id, CL_DEVICE_IMAGE_SUPPORT, m_image_support);
 133         query_param(id, CL_DEVICE_MAX_READ_IMAGE_ARGS, m_max_read_image_args);
 134         query_param(id, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, m_max_write_image_args);
 135 #if defined(CL_VERSION_2_0)
 136         query_param(id, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, m_max_read_write_image_args);
 137 #endif
 138         query_param(id, CL_DEVICE_IMAGE2D_MAX_WIDTH, m_image2d_max_width);
 139         query_param(id, CL_DEVICE_IMAGE2D_MAX_HEIGHT, m_image2d_max_height);
 140         query_param(id, CL_DEVICE_IMAGE3D_MAX_WIDTH, m_image3d_max_width);
 141         query_param(id, CL_DEVICE_IMAGE3D_MAX_HEIGHT, m_image3d_max_height);
 142         query_param(id, CL_DEVICE_IMAGE3D_MAX_DEPTH, m_image3d_max_depth);
 143 #if defined(CL_VERSION_1_2)
 144         query_param(id, CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, m_image_max_buffer_size);
 145         query_param(id, CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, m_image_max_array_size);
 146 #endif
 147         query_param(id, CL_DEVICE_MAX_SAMPLERS, m_max_samplers);
 148 #if defined(CL_VERSION_1_2)
 149         query_param(id, CL_DEVICE_IMAGE_PITCH_ALIGNMENT, m_image_pitch_alignment);
 150         query_param(id, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, m_image_base_address_alignment);
 151 #endif
 152 #if defined(CL_VERSION_2_0)
 153         query_param(id, CL_DEVICE_MAX_PIPE_ARGS, m_max_pipe_args);
 154         query_param(id, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, m_pipe_max_active_reservations);
 155         query_param(id, CL_DEVICE_PIPE_MAX_PACKET_SIZE, m_pipe_max_packet_size);
 156 #endif
 157         query_param(id, CL_DEVICE_MAX_PARAMETER_SIZE, m_max_parameter_size);
 158         query_param(id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, m_mem_base_addr_align);
 159         query_param(id, CL_DEVICE_SINGLE_FP_CONFIG, m_single_fp_config);
 160 #if defined(CL_VERSION_1_2)
 161         query_param(id, CL_DEVICE_DOUBLE_FP_CONFIG, m_double_fp_config);
 162 #endif
 163         query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, m_global_mem_cache_type);
 164         query_param(id, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, m_global_mem_cacheline_size);
 165         query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, m_global_mem_cache_size);
 166         query_param(id, CL_DEVICE_GLOBAL_MEM_SIZE, m_global_mem_size);
 167         query_param(id, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, m_max_constant_buffer_size);
 168         query_param(id, CL_DEVICE_MAX_CONSTANT_ARGS, m_max_constant_args);
 169 #if defined(CL_VERSION_2_0)
 170         query_param(id, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, m_max_global_variable_size);
 171         query_param(id, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, m_global_variable_preferred_total_size);
 172 #endif
 173         query_param(id, CL_DEVICE_LOCAL_MEM_TYPE, m_local_mem_type);
 174         query_param(id, CL_DEVICE_LOCAL_MEM_SIZE, m_local_mem_size);
 175         query_param(id, CL_DEVICE_ERROR_CORRECTION_SUPPORT, m_error_correction_support);
 176 #if defined(CL_VERSION_1_1)
 177         query_param(id, CL_DEVICE_HOST_UNIFIED_MEMORY, m_host_unified_memory);
 178 #endif
 179         query_param(id, CL_DEVICE_PROFILING_TIMER_RESOLUTION, m_profiling_timer_resolution);
 180         query_param(id, CL_DEVICE_ENDIAN_LITTLE, m_endian_little);
 181         query_param(id, CL_DEVICE_AVAILABLE, m_available);
 182         query_param(id, CL_DEVICE_COMPILER_AVAILABLE, m_compiler_available);
 183 #if defined(CL_VERSION_1_2)
 184         query_param(id, CL_DEVICE_LINKER_AVAILABLE, m_linker_available);
 185 #endif
 186         query_param(id, CL_DEVICE_EXECUTION_CAPABILITIES, m_execution_capabilities);
 187         query_param(id, CL_DEVICE_QUEUE_PROPERTIES, m_queue_properties);
 188 #if defined(CL_VERSION_2_0)
 189         query_param(id, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, m_queue_on_host_properties);
 190         query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, m_queue_on_device_properties);
 191         query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, m_queue_on_device_preferred_size);
 192         query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, m_queue_on_device_max_size);
 193         query_param(id, CL_DEVICE_MAX_ON_DEVICE_QUEUES, m_max_on_device_queues);
 194         query_param(id, CL_DEVICE_MAX_ON_DEVICE_EVENTS, m_max_on_device_events);
 195 #endif
 196 #if defined(CL_VERSION_1_2)
 197         query_param(id, CL_DEVICE_BUILT_IN_KERNELS, m_built_in_kernels);
 198 #endif
 199         query_param(id, CL_DEVICE_PLATFORM, m_platform);
 200         query_param(id, CL_DEVICE_NAME, m_name);
 201         query_param(id, CL_DEVICE_VENDOR, m_vendor);
 202         query_param(id, CL_DRIVER_VERSION, m_driver_version);
 203         query_param(id, CL_DEVICE_PROFILE, m_profile);
 204         query_param(id, CL_DEVICE_VERSION, m_version);
 205 #if defined(CL_VERSION_1_1)
 206         query_param(id, CL_DEVICE_OPENCL_C_VERSION, m_opencl_c_version);
 207 #endif
 208         query_param(id, CL_DEVICE_EXTENSIONS, m_extensions);
 209 #if defined(CL_VERSION_1_2)
 210         query_param(id, CL_DEVICE_PRINTF_BUFFER_SIZE, m_printf_buffer_size);
 211         query_param(id, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, m_preferred_interop_user_sync);
 212         query_param(id, CL_DEVICE_PARENT_DEVICE, m_parent_device);
 213         query_param(id, CL_DEVICE_PARTITION_MAX_SUB_DEVICES, m_partition_max_sub_devices);
 214         query_param(id, CL_DEVICE_PARTITION_PROPERTIES, m_partition_properties);
 215         query_param(id, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, m_partition_affinity_domain);
 216         query_param(id, CL_DEVICE_PARTITION_TYPE, m_partition_type);
 217         query_param(id, CL_DEVICE_REFERENCE_COUNT, m_reference_count);
 218 #endif
 219         return CL_SUCCESS;
 220     }
 221 
 222     std::string Name() { return m_name; }
 223 
 224 private:
 225     template<typename T>
 226     cl_int query_param(cl_device_id id, cl_device_info param, T& value)
 227     {
 228         cl_int res;
 229         size_t size = 0;
 230 
 231         res = clGetDeviceInfo(id, param, 0, 0, &size);
 232         if (CL_SUCCESS != res && size != 0)
 233             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
 234 
 235         if (0 == size)
 236             return CL_SUCCESS;
 237 
 238         if (sizeof(T) != size)
 239             throw std::runtime_error(std::string("clGetDeviceInfo: param size mismatch"));
 240 
 241         res = clGetDeviceInfo(id, param, size, &value, 0);
 242         if (CL_SUCCESS != res)
 243             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
 244 
 245         return CL_SUCCESS;
 246     }
 247 
 248     template<typename T>
 249     cl_int query_param(cl_device_id id, cl_device_info param, std::vector<T>& value)
 250     {
 251         cl_int res;
 252         size_t size;
 253 
 254         res = clGetDeviceInfo(id, param, 0, 0, &size);
 255         if (CL_SUCCESS != res)
 256             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
 257 
 258         if (0 == size)
 259             return CL_SUCCESS;
 260 
 261         value.resize(size / sizeof(T));
 262 
 263         res = clGetDeviceInfo(id, param, size, &value[0], 0);
 264         if (CL_SUCCESS != res)
 265             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
 266 
 267         return CL_SUCCESS;
 268     }
 269 
 270     cl_int query_param(cl_device_id id, cl_device_info param, std::string& value)
 271     {
 272         cl_int res;
 273         size_t size;
 274 
 275         res = clGetDeviceInfo(id, param, 0, 0, &size);
 276         if (CL_SUCCESS != res)
 277             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
 278 
 279         value.resize(size + 1);
 280 
 281         res = clGetDeviceInfo(id, param, size, &value[0], 0);
 282         if (CL_SUCCESS != res)
 283             throw std::runtime_error(std::string("clGetDeviceInfo failed"));
 284 
 285         // just in case, ensure trailing zero for ASCIIZ string
 286         value[size] = 0;
 287 
 288         return CL_SUCCESS;
 289     }
 290 
 291 private:
 292     cl_device_type                            m_type;
 293     cl_uint                                   m_vendor_id;
 294     cl_uint                                   m_max_compute_units;
 295     cl_uint                                   m_max_work_item_dimensions;
 296     std::vector<size_t>                       m_max_work_item_sizes;
 297     size_t                                    m_max_work_group_size;
 298     cl_uint                                   m_preferred_vector_width_char;
 299     cl_uint                                   m_preferred_vector_width_short;
 300     cl_uint                                   m_preferred_vector_width_int;
 301     cl_uint                                   m_preferred_vector_width_long;
 302     cl_uint                                   m_preferred_vector_width_float;
 303     cl_uint                                   m_preferred_vector_width_double;
 304 #if defined(CL_VERSION_1_1)
 305     cl_uint                                   m_preferred_vector_width_half;
 306     cl_uint                                   m_native_vector_width_char;
 307     cl_uint                                   m_native_vector_width_short;
 308     cl_uint                                   m_native_vector_width_int;
 309     cl_uint                                   m_native_vector_width_long;
 310     cl_uint                                   m_native_vector_width_float;
 311     cl_uint                                   m_native_vector_width_double;
 312     cl_uint                                   m_native_vector_width_half;
 313 #endif
 314     cl_uint                                   m_max_clock_frequency;
 315     cl_uint                                   m_address_bits;
 316     cl_ulong                                  m_max_mem_alloc_size;
 317     cl_bool                                   m_image_support;
 318     cl_uint                                   m_max_read_image_args;
 319     cl_uint                                   m_max_write_image_args;
 320 #if defined(CL_VERSION_2_0)
 321     cl_uint                                   m_max_read_write_image_args;
 322 #endif
 323     size_t                                    m_image2d_max_width;
 324     size_t                                    m_image2d_max_height;
 325     size_t                                    m_image3d_max_width;
 326     size_t                                    m_image3d_max_height;
 327     size_t                                    m_image3d_max_depth;
 328 #if defined(CL_VERSION_1_2)
 329     size_t                                    m_image_max_buffer_size;
 330     size_t                                    m_image_max_array_size;
 331 #endif
 332     cl_uint                                   m_max_samplers;
 333 #if defined(CL_VERSION_1_2)
 334     cl_uint                                   m_image_pitch_alignment;
 335     cl_uint                                   m_image_base_address_alignment;
 336 #endif
 337 #if defined(CL_VERSION_2_0)
 338     cl_uint                                   m_max_pipe_args;
 339     cl_uint                                   m_pipe_max_active_reservations;
 340     cl_uint                                   m_pipe_max_packet_size;
 341 #endif
 342     size_t                                    m_max_parameter_size;
 343     cl_uint                                   m_mem_base_addr_align;
 344     cl_device_fp_config                       m_single_fp_config;
 345 #if defined(CL_VERSION_1_2)
 346     cl_device_fp_config                       m_double_fp_config;
 347 #endif
 348     cl_device_mem_cache_type                  m_global_mem_cache_type;
 349     cl_uint                                   m_global_mem_cacheline_size;
 350     cl_ulong                                  m_global_mem_cache_size;
 351     cl_ulong                                  m_global_mem_size;
 352     cl_ulong                                  m_max_constant_buffer_size;
 353     cl_uint                                   m_max_constant_args;
 354 #if defined(CL_VERSION_2_0)
 355     size_t                                    m_max_global_variable_size;
 356     size_t                                    m_global_variable_preferred_total_size;
 357 #endif
 358     cl_device_local_mem_type                  m_local_mem_type;
 359     cl_ulong                                  m_local_mem_size;
 360     cl_bool                                   m_error_correction_support;
 361 #if defined(CL_VERSION_1_1)
 362     cl_bool                                   m_host_unified_memory;
 363 #endif
 364     size_t                                    m_profiling_timer_resolution;
 365     cl_bool                                   m_endian_little;
 366     cl_bool                                   m_available;
 367     cl_bool                                   m_compiler_available;
 368 #if defined(CL_VERSION_1_2)
 369     cl_bool                                   m_linker_available;
 370 #endif
 371     cl_device_exec_capabilities               m_execution_capabilities;
 372     cl_command_queue_properties               m_queue_properties;
 373 #if defined(CL_VERSION_2_0)
 374     cl_command_queue_properties               m_queue_on_host_properties;
 375     cl_command_queue_properties               m_queue_on_device_properties;
 376     cl_uint                                   m_queue_on_device_preferred_size;
 377     cl_uint                                   m_queue_on_device_max_size;
 378     cl_uint                                   m_max_on_device_queues;
 379     cl_uint                                   m_max_on_device_events;
 380 #endif
 381 #if defined(CL_VERSION_1_2)
 382     std::string                               m_built_in_kernels;
 383 #endif
 384     cl_platform_id                            m_platform;
 385     std::string                               m_name;
 386     std::string                               m_vendor;
 387     std::string                               m_driver_version;
 388     std::string                               m_profile;
 389     std::string                               m_version;
 390 #if defined(CL_VERSION_1_1)
 391     std::string                               m_opencl_c_version;
 392 #endif
 393     std::string                               m_extensions;
 394 #if defined(CL_VERSION_1_2)
 395     size_t                                    m_printf_buffer_size;
 396     cl_bool                                   m_preferred_interop_user_sync;
 397     cl_device_id                              m_parent_device;
 398     cl_uint                                   m_partition_max_sub_devices;
 399     std::vector<cl_device_partition_property> m_partition_properties;
 400     cl_device_affinity_domain                 m_partition_affinity_domain;
 401     std::vector<cl_device_partition_property> m_partition_type;
 402     cl_uint                                   m_reference_count;
 403 #endif
 404 };
 405 
 406 } // namespace opencl
 407 
 408 
 409 class App
 410 {
 411 public:
 412     App(CommandLineParser& cmd);
 413     ~App();
 414 
 415     int initOpenCL();
 416     int initVideoSource();
 417 
 418     int process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* cl_buffer);
 419     int process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u);
 420     int process_cl_image_with_opencv(cl_mem image, cv::UMat& u);
 421 
 422     int run();
 423 
 424     bool isRunning() { return m_running; }
 425     bool doProcess() { return m_process; }
 426     bool useBuffer() { return m_use_buffer; }
 427 
 428     void setRunning(bool running)      { m_running = running; }
 429     void setDoProcess(bool process)    { m_process = process; }
 430     void setUseBuffer(bool use_buffer) { m_use_buffer = use_buffer; }
 431 
 432 protected:
 433     bool nextFrame(cv::Mat& frame) { return m_cap.read(frame); }
 434     void handleKey(char key);
 435     void timerStart();
 436     void timerEnd();
 437     std::string timeStr() const;
 438     std::string message() const;
 439 
 440 private:
 441     bool                        m_running;
 442     bool                        m_process;
 443     bool                        m_use_buffer;
 444 
 445     int64                       m_t0;
 446     int64                       m_t1;
 447     float                       m_time;
 448     float                       m_frequency;
 449 
 450     string                      m_file_name;
 451     int                         m_camera_id;
 452     cv::VideoCapture            m_cap;
 453     cv::Mat                     m_frame;
 454     cv::Mat                     m_frameGray;
 455 
 456     opencl::PlatformInfo        m_platformInfo;
 457     opencl::DeviceInfo          m_deviceInfo;
 458     std::vector<cl_platform_id> m_platform_ids;
 459     cl_context                  m_context;
 460     cl_device_id                m_device_id;
 461     cl_command_queue            m_queue;
 462     cl_program                  m_program;
 463     cl_kernel                   m_kernelBuf;
 464     cl_kernel                   m_kernelImg;
 465     cl_mem                      m_img_src; // used as src in case processing of cl image
 466     cl_mem                      m_mem_obj;
 467     cl_event                    m_event;
 468 };
 469 
 470 
 471 App::App(CommandLineParser& cmd)
 472 {
 473     cout << "\nPress ESC to exit\n" << endl;
 474     cout << "\n      'p' to toggle ON/OFF processing\n" << endl;
 475     cout << "\n       SPACE to switch between OpenCL buffer/image\n" << endl;
 476 
 477     m_camera_id  = cmd.get<int>("camera");
 478     m_file_name  = cmd.get<string>("video");
 479 
 480     m_running    = false;
 481     m_process    = false;
 482     m_use_buffer = false;
 483 
 484     m_t0         = 0;
 485     m_t1         = 0;
 486     m_time       = 0.0;
 487     m_frequency  = (float)cv::getTickFrequency();
 488 
 489     m_context    = 0;
 490     m_device_id  = 0;
 491     m_queue      = 0;
 492     m_program    = 0;
 493     m_kernelBuf  = 0;
 494     m_kernelImg  = 0;
 495     m_img_src    = 0;
 496     m_mem_obj    = 0;
 497     m_event      = 0;
 498 } // ctor
 499 
 500 
 501 App::~App()
 502 {
 503     if (m_queue)
 504     {
 505         clFinish(m_queue);
 506         clReleaseCommandQueue(m_queue);
 507         m_queue = 0;
 508     }
 509 
 510     if (m_program)
 511     {
 512         clReleaseProgram(m_program);
 513         m_program = 0;
 514     }
 515 
 516     if (m_img_src)
 517     {
 518         clReleaseMemObject(m_img_src);
 519         m_img_src = 0;
 520     }
 521 
 522     if (m_mem_obj)
 523     {
 524         clReleaseMemObject(m_mem_obj);
 525         m_mem_obj = 0;
 526     }
 527 
 528     if (m_event)
 529     {
 530         clReleaseEvent(m_event);
 531     }
 532 
 533     if (m_kernelBuf)
 534     {
 535         clReleaseKernel(m_kernelBuf);
 536         m_kernelBuf = 0;
 537     }
 538 
 539     if (m_kernelImg)
 540     {
 541         clReleaseKernel(m_kernelImg);
 542         m_kernelImg = 0;
 543     }
 544 
 545     if (m_device_id)
 546     {
 547         clReleaseDevice(m_device_id);
 548         m_device_id = 0;
 549     }
 550 
 551     if (m_context)
 552     {
 553         clReleaseContext(m_context);
 554         m_context = 0;
 555     }
 556 } // dtor
 557 
 558 
 559 int App::initOpenCL()
 560 {
 561     cl_int res = CL_SUCCESS;
 562     cl_uint num_entries = 0;
 563 
 564     res = clGetPlatformIDs(0, 0, &num_entries);
 565     if (CL_SUCCESS != res)
 566         return -1;
 567 
 568     m_platform_ids.resize(num_entries);
 569 
 570     res = clGetPlatformIDs(num_entries, &m_platform_ids[0], 0);
 571     if (CL_SUCCESS != res)
 572         return -1;
 573 
 574     unsigned int i;
 575 
 576     // create context from first platform with GPU device
 577     for (i = 0; i < m_platform_ids.size(); i++)
 578     {
 579         cl_context_properties props[] =
 580         {
 581             CL_CONTEXT_PLATFORM,
 582             (cl_context_properties)(m_platform_ids[i]),
 583             0
 584         };
 585 
 586         m_context = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, 0, 0, &res);
 587         if (0 == m_context || CL_SUCCESS != res)
 588             continue;
 589 
 590         res = clGetContextInfo(m_context, CL_CONTEXT_DEVICES, sizeof(cl_device_id), &m_device_id, 0);
 591         if (CL_SUCCESS != res)
 592             return -1;
 593 
 594         m_queue = clCreateCommandQueue(m_context, m_device_id, 0, &res);
 595         if (0 == m_queue || CL_SUCCESS != res)
 596             return -1;
 597 
 598         const char* kernelSrc =
 599             "__kernel "
 600             "void bitwise_inv_buf_8uC1("
 601             "    __global unsigned char* pSrcDst,"
 602             "             int            srcDstStep,"
 603             "             int            rows,"
 604             "             int            cols)"
 605             "{"
 606             "    int x = get_global_id(0);"
 607             "    int y = get_global_id(1);"
 608             "    int idx = mad24(y, srcDstStep, x);"
 609             "    pSrcDst[idx] = ~pSrcDst[idx];"
 610             "}"
 611             "__kernel "
 612             "void bitwise_inv_img_8uC1("
 613             "    read_only  image2d_t srcImg,"
 614             "    write_only image2d_t dstImg)"
 615             "{"
 616             "    int x = get_global_id(0);"
 617             "    int y = get_global_id(1);"
 618             "    int2 coord = (int2)(x, y);"
 619             "    uint4 val = read_imageui(srcImg, coord);"
 620             "    val.x = (~val.x) & 0x000000FF;"
 621             "    write_imageui(dstImg, coord, val);"
 622             "}";
 623         size_t len = strlen(kernelSrc);
 624         m_program = clCreateProgramWithSource(m_context, 1, &kernelSrc, &len, &res);
 625         if (0 == m_program || CL_SUCCESS != res)
 626             return -1;
 627 
 628         res = clBuildProgram(m_program, 1, &m_device_id, 0, 0, 0);
 629         if (CL_SUCCESS != res)
 630             return -1;
 631 
 632         m_kernelBuf = clCreateKernel(m_program, "bitwise_inv_buf_8uC1", &res);
 633         if (0 == m_kernelBuf || CL_SUCCESS != res)
 634             return -1;
 635 
 636         m_kernelImg = clCreateKernel(m_program, "bitwise_inv_img_8uC1", &res);
 637         if (0 == m_kernelImg || CL_SUCCESS != res)
 638             return -1;
 639 
 640         m_platformInfo.QueryInfo(m_platform_ids[i]);
 641         m_deviceInfo.QueryInfo(m_device_id);
 642 
 643         // attach OpenCL context to OpenCV
 644         cv::ocl::attachContext(m_platformInfo.Name(), m_platform_ids[i], m_context, m_device_id);
 645 
 646         break;
 647     }
 648 
 649     return m_context != 0 ? CL_SUCCESS : -1;
 650 } // initOpenCL()
 651 
 652 
 653 int App::initVideoSource()
 654 {
 655     try
 656     {
 657         if (!m_file_name.empty() && m_camera_id == -1)
 658         {
 659             m_cap.open(m_file_name.c_str());
 660             if (!m_cap.isOpened())
 661                 throw std::runtime_error(std::string("can't open video file: " + m_file_name));
 662         }
 663         else if (m_camera_id != -1)
 664         {
 665             m_cap.open(m_camera_id);
 666             if (!m_cap.isOpened())
 667             {
 668                 std::stringstream msg;
 669                 msg << "can't open camera: " << m_camera_id;
 670                 throw std::runtime_error(msg.str());
 671             }
 672         }
 673         else
 674             throw std::runtime_error(std::string("specify video source"));
 675     }
 676 
 677     catch (std::exception e)
 678     {
 679         cerr << "ERROR: " << e.what() << std::endl;
 680         return -1;
 681     }
 682 
 683     return 0;
 684 } // initVideoSource()
 685 
 686 
 687 // this function is an example of "typical" OpenCL processing pipeline
 688 // It creates OpenCL buffer or image, depending on use_buffer flag,
 689 // from input media frame and process these data
 690 // (inverts each pixel value in half of frame) with OpenCL kernel
 691 int App::process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* mem_obj)
 692 {
 693     cl_int res = CL_SUCCESS;
 694 
 695     CV_Assert(mem_obj);
 696 
 697     cl_kernel kernel = 0;
 698     cl_mem mem = mem_obj[0];
 699 
 700     if (0 == mem || 0 == m_img_src)
 701     {
 702         // allocate/delete cl memory objects every frame for the simplicity.
 703         // in real applicaton more efficient pipeline can be built.
 704 
 705         if (use_buffer)
 706         {
 707             cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR;
 708 
 709             mem = clCreateBuffer(m_context, flags, frame.total(), frame.ptr(), &res);
 710             if (0 == mem || CL_SUCCESS != res)
 711                 return -1;
 712 
 713             res = clSetKernelArg(m_kernelBuf, 0, sizeof(cl_mem), &mem);
 714             if (CL_SUCCESS != res)
 715                 return -1;
 716 
 717             res = clSetKernelArg(m_kernelBuf, 1, sizeof(int), &frame.step[0]);
 718             if (CL_SUCCESS != res)
 719                 return -1;
 720 
 721             res = clSetKernelArg(m_kernelBuf, 2, sizeof(int), &frame.rows);
 722             if (CL_SUCCESS != res)
 723                 return -1;
 724 
 725             int cols2 = frame.cols / 2;
 726             res = clSetKernelArg(m_kernelBuf, 3, sizeof(int), &cols2);
 727             if (CL_SUCCESS != res)
 728                 return -1;
 729 
 730             kernel = m_kernelBuf;
 731         }
 732         else
 733         {
 734             cl_mem_flags flags_src = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
 735 
 736             cl_image_format fmt;
 737             fmt.image_channel_order     = CL_R;
 738             fmt.image_channel_data_type = CL_UNSIGNED_INT8;
 739 
 740             cl_image_desc desc_src;
 741             desc_src.image_type        = CL_MEM_OBJECT_IMAGE2D;
 742             desc_src.image_width       = frame.cols;
 743             desc_src.image_height      = frame.rows;
 744             desc_src.image_depth       = 0;
 745             desc_src.image_array_size  = 0;
 746             desc_src.image_row_pitch   = frame.step[0];
 747             desc_src.image_slice_pitch = 0;
 748             desc_src.num_mip_levels    = 0;
 749             desc_src.num_samples       = 0;
 750             desc_src.buffer            = 0;
 751             m_img_src = clCreateImage(m_context, flags_src, &fmt, &desc_src, frame.ptr(), &res);
 752             if (0 == m_img_src || CL_SUCCESS != res)
 753                 return -1;
 754 
 755             cl_mem_flags flags_dst = CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR;
 756 
 757             cl_image_desc desc_dst;
 758             desc_dst.image_type        = CL_MEM_OBJECT_IMAGE2D;
 759             desc_dst.image_width       = frame.cols;
 760             desc_dst.image_height      = frame.rows;
 761             desc_dst.image_depth       = 0;
 762             desc_dst.image_array_size  = 0;
 763             desc_dst.image_row_pitch   = 0;
 764             desc_dst.image_slice_pitch = 0;
 765             desc_dst.num_mip_levels    = 0;
 766             desc_dst.num_samples       = 0;
 767             desc_dst.buffer            = 0;
 768             mem = clCreateImage(m_context, flags_dst, &fmt, &desc_dst, 0, &res);
 769             if (0 == mem || CL_SUCCESS != res)
 770                 return -1;
 771 
 772             size_t origin[] = { 0, 0, 0 };
 773             size_t region[] = { (size_t)frame.cols, (size_t)frame.rows, 1 };
 774             res = clEnqueueCopyImage(m_queue, m_img_src, mem, origin, origin, region, 0, 0, &m_event);
 775             if (CL_SUCCESS != res)
 776                 return -1;
 777 
 778             res = clWaitForEvents(1, &m_event);
 779             if (CL_SUCCESS != res)
 780                 return -1;
 781 
 782             res = clSetKernelArg(m_kernelImg, 0, sizeof(cl_mem), &m_img_src);
 783             if (CL_SUCCESS != res)
 784                 return -1;
 785 
 786             res = clSetKernelArg(m_kernelImg, 1, sizeof(cl_mem), &mem);
 787             if (CL_SUCCESS != res)
 788                 return -1;
 789 
 790             kernel = m_kernelImg;
 791         }
 792     }
 793 
 794     m_event = clCreateUserEvent(m_context, &res);
 795     if (0 == m_event || CL_SUCCESS != res)
 796         return -1;
 797 
 798     // process left half of frame in OpenCL
 799     size_t size[] = { (size_t)frame.cols / 2, (size_t)frame.rows };
 800     res = clEnqueueNDRangeKernel(m_queue, kernel, 2, 0, size, 0, 0, 0, &m_event);
 801     if (CL_SUCCESS != res)
 802         return -1;
 803 
 804     res = clWaitForEvents(1, &m_event);
 805     if (CL_SUCCESS != res)
 806         return - 1;
 807 
 808     mem_obj[0] = mem;
 809 
 810     return  0;
 811 }
 812 
 813 
 814 // this function is an example of interoperability between OpenCL buffer
 815 // and OpenCV UMat objects. It converts (without copying data) OpenCL buffer
 816 // to OpenCV UMat and then do blur on these data
 817 int App::process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u)
 818 {
 819     cv::ocl::convertFromBuffer(buffer, step, rows, cols, type, u);
 820 
 821     // process right half of frame in OpenCV
 822     cv::Point pt(u.cols / 2, 0);
 823     cv::Size  sz(u.cols / 2, u.rows);
 824     cv::Rect roi(pt, sz);
 825     cv::UMat uroi(u, roi);
 826     cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
 827 
 828     if (buffer)
 829         clReleaseMemObject(buffer);
 830     m_mem_obj = 0;
 831 
 832     return 0;
 833 }
 834 
 835 
 836 // this function is an example of interoperability between OpenCL image
 837 // and OpenCV UMat objects. It converts OpenCL image
 838 // to OpenCV UMat and then do blur on these data
 839 int App::process_cl_image_with_opencv(cl_mem image, cv::UMat& u)
 840 {
 841     cv::ocl::convertFromImage(image, u);
 842 
 843     // process right half of frame in OpenCV
 844     cv::Point pt(u.cols / 2, 0);
 845     cv::Size  sz(u.cols / 2, u.rows);
 846     cv::Rect roi(pt, sz);
 847     cv::UMat uroi(u, roi);
 848     cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
 849 
 850     if (image)
 851         clReleaseMemObject(image);
 852     m_mem_obj = 0;
 853 
 854     if (m_img_src)
 855         clReleaseMemObject(m_img_src);
 856     m_img_src = 0;
 857 
 858     return 0;
 859 }
 860 
 861 
 862 int App::run()
 863 {
 864     if (0 != initOpenCL())
 865         return -1;
 866 
 867     if (0 != initVideoSource())
 868         return -1;
 869 
 870     Mat img_to_show;
 871 
 872     // set running state until ESC pressed
 873     setRunning(true);
 874     // set process flag to show some data processing
 875     // can be toggled on/off by 'p' button
 876     setDoProcess(true);
 877     // set use buffer flag,
 878     // when it is set to true, will demo interop opencl buffer and cv::Umat,
 879     // otherwise demo interop opencl image and cv::UMat
 880     // can be switched on/of by SPACE button
 881     setUseBuffer(true);
 882 
 883     // Iterate over all frames
 884     while (isRunning() && nextFrame(m_frame))
 885     {
 886         cv::cvtColor(m_frame, m_frameGray, COLOR_BGR2GRAY);
 887 
 888         UMat uframe;
 889 
 890         // work
 891         timerStart();
 892 
 893         if (doProcess())
 894         {
 895             process_frame_with_open_cl(m_frameGray, useBuffer(), &m_mem_obj);
 896 
 897             if (useBuffer())
 898                 process_cl_buffer_with_opencv(
 899                     m_mem_obj, m_frameGray.step[0], m_frameGray.rows, m_frameGray.cols, m_frameGray.type(), uframe);
 900             else
 901                 process_cl_image_with_opencv(m_mem_obj, uframe);
 902         }
 903         else
 904         {
 905             m_frameGray.copyTo(uframe);
 906         }
 907 
 908         timerEnd();
 909 
 910         uframe.copyTo(img_to_show);
 911 
 912         putText(img_to_show, "Version : " + m_platformInfo.Version(), Point(5, 30), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
 913         putText(img_to_show, "Name : " + m_platformInfo.Name(), Point(5, 60), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
 914         putText(img_to_show, "Device : " + m_deviceInfo.Name(), Point(5, 90), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
 915         cv::String memtype = useBuffer() ? "buffer" : "image";
 916         putText(img_to_show, "interop with OpenCL " + memtype, Point(5, 120), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
 917         putText(img_to_show, "Time : " + timeStr() + " msec", Point(5, 150), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
 918 
 919         imshow("opencl_interop", img_to_show);
 920 
 921         handleKey((char)waitKey(3));
 922     }
 923 
 924     return 0;
 925 }
 926 
 927 
 928 void App::handleKey(char key)
 929 {
 930     switch (key)
 931     {
 932     case 27:
 933         setRunning(false);
 934         break;
 935 
 936     case ' ':
 937         setUseBuffer(!useBuffer());
 938         break;
 939 
 940     case 'p':
 941     case 'P':
 942         setDoProcess( !doProcess() );
 943         break;
 944 
 945     default:
 946         break;
 947     }
 948 }
 949 
 950 
 951 inline void App::timerStart()
 952 {
 953     m_t0 = getTickCount();
 954 }
 955 
 956 
 957 inline void App::timerEnd()
 958 {
 959     m_t1 = getTickCount();
 960     int64 delta = m_t1 - m_t0;
 961     m_time = (delta / m_frequency) * 1000; // units msec
 962 }
 963 
 964 
 965 inline string App::timeStr() const
 966 {
 967     stringstream ss;
 968     ss << std::fixed << std::setprecision(1) << m_time;
 969     return ss.str();
 970 }
 971 
 972 
 973 int main(int argc, char** argv)
 974 {
 975     const char* keys =
 976         "{ help h ?    |          | print help message }"
 977         "{ camera c    | -1       | use camera as input }"
 978         "{ video  v    |          | use video as input }";
 979 
 980     CommandLineParser cmd(argc, argv, keys);
 981     if (cmd.has("help"))
 982     {
 983         cmd.printMessage();
 984         return EXIT_SUCCESS;
 985     }
 986 
 987     App app(cmd);
 988 
 989     try
 990     {
 991         app.run();
 992     }
 993 
 994     catch (const cv::Exception& e)
 995     {
 996         cout << "error: " << e.what() << endl;
 997         return 1;
 998     }
 999 
1000     catch (const std::exception& e)
1001     {
1002         cout << "error: " << e.what() << endl;
1003         return 1;
1004     }
1005 
1006     catch (...)
1007     {
1008         cout << "unknown exception" << endl;
1009         return 1;
1010     }
1011 
1012     return EXIT_SUCCESS;
1013 } // main()
View Code

     官方给出的代码,有很多和OpenCL平台相关的处理部分,看起的比较冗杂,因此我们就不直接移植这个代码了(参考部分思想和API接口的基本调用),重新写一个新的基于Opencv3.1.0的OCL程序的Demo框架,基于TI的AM57x系列的板卡,其他的设备类似:

1. Opencv3.1版本OpenCL支持情况:

        首先对于Opencv3.1.0版本,从正式发布Opencv3开始,其对OpenCL的支持已经发生了很大的变化,在之前需要使用cv::ocl相关函数API来实现kernel的编译调用等等,在其中还包括了很多的数据搬移,而Opencv3正式改变了这样的情况,重新封装了一个新的数据类型cv::UMat,这个数据类型能够无缝对接Opencv的普通接口,从而最少的改动代码而最大的完成OpenCL平台的加速功能[OpenCV3.x-OpenCL.pptx]

         如上图所示,只需要将原来的Mat格式换为UMat格式就可以实现Opencv函数在OpenCL设备上加速运行,而这其中具体实施的基本原理是什么呢?接下来看一下其底层实现的基本原理,具体参看Opencv中OpenCL部分实现的源代码:

        上图中表明了,当你使用的数据类型是UMat {data.isUmat()},并且开启了OpenCL使能{useOpenCL()},那么Opencv的接口将会跳转到OpenCL支持的设备中进行加速运行,当然你需要注意的是,在第一次使用OpenCL加速程序时,OpenCL需要编译生成对应平台的Kernel代码,而编译是需要花费大量的时间的,因此初次运行需要比较长的时间。

2. 接下来开始AM57x系列的Opencv-OCL编程

主要代码如下所示(Line28非常重要,使能OpenCL平台)

 1 #include <iostream>
 2 #include "opencv2/opencv.hpp"
 3 #include "opencv2/core/ocl.hpp"
 4 #include "opencv2/imgcodecs.hpp"
 5 #include "opencv2/videoio/videoio.hpp"
 6 #include "opencv2/highgui/highgui.hpp"
 7 #include "opencv2/imgproc/imgproc.hpp"
 8 
 9 using namespace std;
10 using namespace cv;
11 using namespace cv::ocl;
12 
13 #define DSP 1
14 
15 int main()
16 {
17     double t = 0.0;
18 #if DSP
19     std::vector<cv::ocl::PlatformInfo> plats;
20     cv::ocl::getPlatfomsInfo(plats);
21     const cv::ocl::PlatformInfo *platform = &plats[0];
22     cout << "Platform Name:" << platform->name().c_str() << endl;
23 
24     cv::ocl::Device c_dev;
25     platform->getDevice(c_dev,0);
26     cout << "Device name:" << c_dev.name().c_str() << endl;
27     c_dev.set(0);
28     cv::ocl::setUseOpenCL(true);
29     cout << "Use the OpenCL Deivice?" << cv::ocl::useOpenCL() << endl;
30 
31     cv::UMat Ori = cv::imread("/home/root/test.jpg").getUMat(cv::ACCESS_RW),Res,Canny;
32 
33     t = (double)cv::getTickCount();
34     cv::cvtColor(Ori,Res,cv::COLOR_RGB2GRAY);
35     cv::Canny(Res,Res,0,30);
36     t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
37     std::cout << "TI AM57x Accelerate Time Cost:" << t << "s" << std::endl;
38     cv::imshow("Test",Ori);
39     cv::imshow("Gray",Res);
40 #else
41     Mat I = imread("/home/root/test.jpg"),gray;
42 
43     t = (double)cv::getTickCount();
44     cv::cvtColor(I,gray,cv::COLOR_RGB2GRAY);
45     cv::Canny(gray,gray,0,30);
46     t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
47     std::cout << "CPU Time Cost:" << t << "s" << std::endl;
48     cv::imshow("Ori",I);
49     cv::imshow("Res",gray);
50 #endif
51 
52     for (;;)
53     {
54         int key = waitKey();
55         if (key == 27/*ESC*/ || key == 'q' || key == 'Q')
56             break;
57     }
58     return 0;
59 }

编译Makfile:

 1 TARGET3 = ocl_demo
 2 CXX = arm-linux-gnueabihf-g++
 3 CFLAGS += -Wl,-rpath-link,$(COMPILE_TOOL_PATH)/usr/lib \
 4           -Wl,-rpath-link,$(COMPILE_TOOL_PATH)/lib \
 5           -I$(COMPILE_TOOL_PATH)/usr/include \
 6           -L$(COMPILE_TOOL_PATH)/usr/lib \
 7           -L$(COMPILE_TOOL_PATH)/lib -Wall -W \
 8           -std=c++98
 9 CFLAGS +=  -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lOpenCL -lpthread -lrt
10 
11 all:
12     @$(CXX)  $(TARGET3).cpp -o $(TARGET3) $(CFLAGS)
13 clean:
14     rm -rf  $(TARGET3)
View Code

运行脚本opencv-ocl-runtime.sh

1 export TI_OCL_LOAD_KERNELS_ONCHIP=Y
2 export TI_OCL_CACHE_KERNELS=Y
3 export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
4 echo "OpenCL on, canny"
5 ./ocl_demo
6 export OPENCV_OPENCL_DEVICE='disabled'
7 echo "OpenCL off, canny"
8 ./ocl_demo

使用make指令编译后在AM5718平台上运行结果如下所示:

 根据运行的结构可以看出,CPU运行时间是经过OpenCL-DSP平台加速后的十倍左右,因此能够明显体现出加速的效果。

注1:我们需要注意的是在第一次运行此代码时,一定要注意,通过脚本运行ocl_demo,在脚本当中,第一行表示如果需要加载Kernel,是直接通过本地芯片上加载Kernel的,第二行表示如果编译了Kernel,将会将Kernel保存在Cache当中,下次调用(直到关机之前)就不需要编译了,如果没有这两个选项,代码就会编译Kernel而消耗大量的时间,因此这是必须的。

注2:脚本中第三行是非常重要的,必须要在这里使能OpenCL设备,否则Opencv将永远不会调用DSP加速算法,而在CPU上运行,具体参考这里

猜想与验证:当我在调用Opencv3.1中erode或者其他形态学相关的接口时,出现了经过DSP加速之后的效果居然比CPU端要差很多!!!这是为什么呢?猜想如下:

猜想:首先我们使用的是TI的DSP Accelerator,型号是C66系列, 而这个系列的DSP处理器实际上只有不超过10个核心的Processor,而DSP最主要的优势是其包括的乘法器资源以及加法器资源,因此对于包括乘法加法等运算的这些情况,其加速效果比较明显,对于Opencv中的形态学相关API主要用到的是基本的if逻辑分支判断的情况,因此,不能够充分使用DSP当中的加法器或者乘法器,因此加速效果不明显,甚至更慢,如下所示(erode algorithm):

验证:根据上述猜想,我们可以选择有多个核心的GPU来对erode算法加速,查看erode算法在GPU加速下的效率,加速效果如下所示:

Reference:

官方代码说明:https://www.khronos.org/opencl/

Opencv(编译OCL模块)+VSC+MinGW环境搭建:https://www.cnblogs.com/uestc-mm/p/12758110.html

TI官方OpenCV3.1-Release:https://git.ti.com/cgit/opencv/tiopencv/?h=tiopencvrelease_3.1

TI官方AM57x Processor SDK Linux(OpenCV支持):http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_OpenCV.html

posted @ 2020-05-27 12:48  小淼博客  阅读(4019)  评论(9编辑  收藏  举报

大家转载请注明出处!谢谢! 在这里要感谢GISPALAB实验室的各位老师和学长学姐的帮助!谢谢~