add libdnn acceleration to dnn module #9114
Conversation
@@ -0,0 +1,54 @@
/*
Thank you for the contribution; I believe it will be very valuable!
Just a quick note while you are debugging the OpenCL kernels. We now have much stricter legal requirements, especially for the main repository. This heading comment has no license and no copyright, so we cannot include it in OpenCV in this form. Besides, we now regularly run tools to check code cleanliness. So, before the patch is integrated, we need to be sure that a proper header with the license and copyright is in place and that the code in the file does not match non-approved 3rd-party open-source software.
@vpisarev, thanks for your comments. We will check every single file to make sure it is clean and has the proper license and copyright.
@pengli, not sure if my comment on one of the files is visible. In brief, we need to make sure that every single contributed file has a proper license and copyright, and the code must be absolutely clean: it should not match code from other 3rd-party open-source software.
Signed-off-by: Li Peng <peng.li@intel.com>
- Use a more readable string as the signature of a kernel config
- Don't include the device name and vendor in the signature string
- Default kernel configurations are tuned for Intel GPUs with 24/48/72 EUs, and for the GoogLeNet, AlexNet, and ResNet-50 net models
Avoid unwanted creation of directories
Signed-off-by: Li Peng <peng.li@intel.com>
{
    bool ret = false;
    ocl::Queue queue = ocl::Queue::getDefault();
    bool intel_subgroup = 0 && ocl::Device::getDefault().intelSubgroupsSupport();
It looks like softmax is completely disabled. What is the problem here?
The subgroup version of the softmax kernel is enabled in the new commit. Both the subgroup and non-subgroup versions work in ocl4dnn.
{
    int major = ocl::Device::getDefault().deviceVersionMajor();
    int minor = ocl::Device::getDefault().deviceVersionMinor();
    return (major >= 2) && (minor >= 1);
This condition is not a valid check for ">= 2.1": "3.0" would not pass, because its minor version is 0.
Also, SRB5.0 supports the "ifp" flag (at least it doesn't fail), but it still reports "OpenCL 2.0".
Perhaps we should try building with the "ifp" flag and then fall back to building without this problematic flag in case of errors.
A more universal check is implemented in the new commit.
// Set up the bias multiplier
if (bias_term_)
{
    bias_multiplier_ = UMat(1, M_, CV_32FC1, 1.0f);
Dropped the unused variable in the new commit.
Signed-off-by: Li Peng <peng.li@intel.com>
float arg = 0;
clSetKernelArg((cl_kernel)kernel.ptr(), 0, sizeof(arg), &arg);
clEnqueueTask((cl_command_queue)queue.ptr(), (cl_kernel)kernel.ptr(), 0,
              NULL, &start_gpu_cl_);
Did you try using events from clEnqueueMarker() (OpenCL 1.1) or, preferably, clEnqueueBarrierWithWaitList() (OpenCL 1.2+) instead of submitting a kernel task?
Thanks for your advice, I will try this in the next commit.
libdnn provides OpenCL acceleration for the current dnn module. It currently provides OpenCL kernels for five layers: convolution, softmax, LRN, fully connected, and pooling. For convolution it implements an auto-tuning mechanism that enumerates all possible convolution kernels and finds the best-performing one at first run time, so applications can then use the tuned kernel to achieve the best performance.