
add libdnn acceleration to dnn module #9114

Merged
alalek merged 110 commits into opencv:master from pengli:dnn_rebase
Oct 2, 2017

Conversation


@pengli pengli commented Jul 7, 2017

libdnn provides OpenCL acceleration for the current dnn module. It currently provides OpenCL kernels for five layers: convolution, softmax, LRN, fully connected, and pooling. For convolution it implements an auto-tuning mechanism that enumerates the possible convolution kernels and finds the best-performing one at first run time, so the application can then use the tuned kernel to achieve the best performance.
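The tune-once-then-reuse flow described above can be sketched roughly as follows. This is an illustration only: `KernelConfig`, `tuneOrLookup`, the signature string, and the benchmark callback are hypothetical stand-ins, not the actual ocl4dnn API.

```cpp
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical kernel configuration (work-group/block sizes etc.); the real
// ocl4dnn tuner enumerates actual OpenCL convolution kernel variants.
struct KernelConfig { int blockW, blockH; };

// On first use, benchmark every candidate and cache the fastest one under a
// signature string, so later runs with the same layer shape skip tuning.
KernelConfig tuneOrLookup(const std::string& signature,
                          const std::vector<KernelConfig>& candidates,
                          const std::function<double(const KernelConfig&)>& benchmark,
                          std::map<std::string, KernelConfig>& cache)
{
    std::map<std::string, KernelConfig>::iterator it = cache.find(signature);
    if (it != cache.end())
        return it->second;                  // already tuned for this shape

    double bestTime = std::numeric_limits<double>::max();
    KernelConfig best = candidates.front();
    for (const KernelConfig& c : candidates)
    {
        double t = benchmark(c);            // run and time this candidate
        if (t < bestTime) { bestTime = t; best = c; }
    }
    cache[signature] = best;                // remember for subsequent runs
    return best;
}
```

In the real patch the signature is a readable string derived from the layer parameters (and, per a later commit, deliberately excludes device name and vendor), and the benchmark step launches the actual OpenCL kernels.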

@pengli pengli force-pushed the dnn_rebase branch 20 times, most recently from f7c4ee6 to ae3afa9 Compare July 11, 2017 07:20
@@ -0,0 +1,54 @@
/*
Contributor

Thank you for the contribution; I believe it will be very valuable!

Just a quick note while you are debugging the OpenCL kernels. We now have much stricter legal requirements, especially for the main repository. This heading comment has no license and no copyright; we cannot include it into OpenCV in such a form. Besides, we now regularly run tools to check code cleanness. So, before the patch is integrated, we need to be sure that the proper header with the license and copyright is in place and that the code in the file does not match non-approved 3rd-party open-source software.

Author

@vpisarev, thanks for your comments. We will check every single file to make sure it is clean and has a proper license and copyright.

@vpisarev
Contributor

@pengli, not sure if my comment on one of the files is visible. In brief, we need to make sure that every single contributed file has a proper license and copyright. And the code should be absolutely clean: it should not match code from other 3rd-party open-source software.

@pengli pengli force-pushed the dnn_rebase branch 8 times, most recently from a7b6368 to 0350b6b Compare July 13, 2017 01:50
pli2-intel and others added 23 commits September 29, 2017 10:26
Signed-off-by: Li Peng <peng.li@intel.com>
- Use a more readable string as the signature of a kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
  24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
Avoid unwanted creation of directories
Signed-off-by: Li Peng <peng.li@intel.com>
{
bool ret = false;
ocl::Queue queue = ocl::Queue::getDefault();
bool intel_subgroup = 0 && ocl::Device::getDefault().intelSubgroupsSupport();
Member

Looks like, softmax is completely disabled. What is the problem here?

Author

@pengli pengli Sep 30, 2017

The subgroup version of the softmax kernel is enabled in the new commit. Both the subgroup and non-subgroup versions work in ocl4dnn.

{
int major = ocl::Device::getDefault().deviceVersionMajor();
int minor = ocl::Device::getDefault().deviceVersionMinor();
return (major >= 2) && (minor >= 1);
Member

This condition is not a valid check for ">=2.1": "3.0" would not pass it.

Also, SRB5.0 supports the "ifp" flag (at least it doesn't fail), but it still reports "OpenCL 2.0".
Perhaps we should try building with the "ifp" flag and then fall back to building without this problematic flag in case of errors.
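As the comment notes, `(major >= 2) && (minor >= 1)` wrongly rejects OpenCL 3.0 (minor 0 fails the second clause). A version comparison that accepts 2.1 and everything newer could look like the following; the helper name is illustrative, not from the patch.

```cpp
// Returns true for OpenCL >= 2.1 (e.g. 2.1, 2.2, 3.0).
// The reviewed check `(major >= 2) && (minor >= 1)` rejects 3.0,
// because 3.0 has minor == 0.
bool isOpenCLVersionAtLeast21(int major, int minor)
{
    return (major > 2) || (major == 2 && minor >= 1);
}
```

Note that, as the follow-up suggests, a capability probe (try building with the flag, fall back on error) is more robust than any version-number comparison; the author's new commit went that way.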

Contributor

A more universal check method is implemented in new commit.

// Set up the bias multiplier
if (bias_term_)
{
bias_multiplier_ = UMat(1, M_, 1.0f, CV_32FC1);
Member

Not used.

Author

Dropped the unused variable in the new commit.

float arg = 0;
clSetKernelArg((cl_kernel)kernel.ptr(), 0, sizeof(arg), &arg);
clEnqueueTask((cl_command_queue)queue.ptr(), (cl_kernel)kernel.ptr(), 0,
NULL, &start_gpu_cl_);
Member

Did you try to use events from clEnqueueMarker() (OpenCL 1.1), or preferably clEnqueueBarrierWithWaitList() (OpenCL 1.2+), instead of submitting a kernel task?
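For reference, the reviewer's suggestion replaces the dummy-kernel submission above with a barrier that emits an event. An untested sketch (not from the patch; it assumes the same `queue` handle and an OpenCL 1.2+ device):

```c
cl_event start_event = NULL;

/* OpenCL 1.2+: insert a barrier that produces an event, instead of
 * enqueueing a dummy kernel just to obtain a timestamp event. */
cl_int err = clEnqueueBarrierWithWaitList(
        (cl_command_queue)queue.ptr(),  /* OpenCV ocl::Queue handle */
        0, NULL,                        /* no explicit wait list */
        &start_event);

/* On OpenCL 1.1, clEnqueueMarker(queue, &start_event) is the fallback. */
```

This avoids the overhead of compiling and launching a kernel whose only purpose is to yield a profiling event.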

Contributor

Thanks for your advice, I will try this in the next commit.



7 participants