
add libdnn acceleration to dnn module #9114

Merged
alalek merged 110 commits into opencv:master from pengli:dnn_rebase
Oct 2, 2017

Conversation


@pengli pengli commented Jul 7, 2017

libdnn provides OpenCL acceleration for the current dnn module. It currently provides OpenCL kernels for five layers: convolution, softmax, LRN, fully connected, and pooling. For convolution it implements an auto-tuning mechanism that enumerates the possible convolution kernels and finds the best-performing one at first run time, so the application can then use the tuned kernel to achieve the best performance.
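The tune-once-then-reuse flow described above can be sketched roughly as follows. This is an illustration only: `KernelConfig`, `tuneOrLookup`, the signature string, and the benchmark callback are hypothetical stand-ins, not the actual ocl4dnn API.

```cpp
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical kernel configuration (work-group/block sizes etc.); the real
// ocl4dnn tuner enumerates actual OpenCL convolution kernel variants.
struct KernelConfig { int blockW, blockH; };

// On first use, benchmark every candidate and cache the fastest one under a
// signature string, so later runs with the same layer shape skip tuning.
KernelConfig tuneOrLookup(const std::string& signature,
                          const std::vector<KernelConfig>& candidates,
                          const std::function<double(const KernelConfig&)>& benchmark,
                          std::map<std::string, KernelConfig>& cache)
{
    std::map<std::string, KernelConfig>::iterator it = cache.find(signature);
    if (it != cache.end())
        return it->second;                  // already tuned for this shape

    double bestTime = std::numeric_limits<double>::max();
    KernelConfig best = candidates.front();
    for (const KernelConfig& c : candidates)
    {
        double t = benchmark(c);            // run and time this candidate
        if (t < bestTime) { bestTime = t; best = c; }
    }
    cache[signature] = best;                // remember for subsequent runs
    return best;
}
```

In the real patch the signature is a readable string derived from the layer parameters (and, per a later commit, deliberately excludes device name and vendor), and the benchmark step launches the actual OpenCL kernels.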

@pengli pengli force-pushed the dnn_rebase branch 20 times, most recently from f7c4ee6 to ae3afa9 Compare July 11, 2017 07:20
@@ -0,0 +1,54 @@
/*
Contributor

Thank you for the contribution; I believe it will be very valuable!

Just a quick note while you are debugging the OpenCL kernels. We now have much stricter legal requirements, especially for the main repository. This heading comment has no license and no copyright; we cannot include it into OpenCV in such a form. Besides, we now regularly run tools to check code cleanness. So, before the patch is integrated, we need to be sure that the proper header with the license and copyright is in place and that the code in the file does not match non-approved 3rd-party open-source software.

Author

@vpisarev, thanks for your comments. We will check every single file to make sure it is clean and has a proper license and copyright.

@vpisarev
Contributor

@pengli, not sure if my comment on one of the files is visible. In brief, we need to make sure that every single contributed file has a proper license and copyright. And the code should be absolutely clean: it should not match code from other 3rd-party open-source software.

@pengli pengli force-pushed the dnn_rebase branch 8 times, most recently from a7b6368 to 0350b6b Compare July 13, 2017 01:50
pli2-intel and others added 23 commits September 29, 2017 10:26
Signed-off-by: Li Peng <peng.li@intel.com>
- Use a more readable string as the signature of a kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
  24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
Avoid unwanted creation of directories
Signed-off-by: Li Peng <peng.li@intel.com>
{
bool ret = false;
ocl::Queue queue = ocl::Queue::getDefault();
bool intel_subgroup = 0 && ocl::Device::getDefault().intelSubgroupsSupport();
Member

Looks like, softmax is completely disabled. What is the problem here?

Author

@pengli pengli Sep 30, 2017

The subgroup version of the softmax kernel is enabled in the new commit. Both the subgroup and non-subgroup versions work in ocl4dnn.

{
int major = ocl::Device::getDefault().deviceVersionMajor();
int minor = ocl::Device::getDefault().deviceVersionMinor();
return (major >= 2) && (minor >= 1);
Member

This condition is not a valid check for ">=2.1": "3.0" would not pass it.

Also, SRB5.0 supports the "ifp" flag (at least it doesn't fail), but it still reports "OpenCL 2.0".
Perhaps we should try building with the "ifp" flag and then fall back to building without this problematic flag in case of errors.
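As the comment notes, `(major >= 2) && (minor >= 1)` wrongly rejects OpenCL 3.0 (minor 0 fails the second clause). A version comparison that accepts 2.1 and everything newer could look like the following; the helper name is illustrative, not from the patch.

```cpp
// Returns true for OpenCL >= 2.1 (e.g. 2.1, 2.2, 3.0).
// The reviewed check `(major >= 2) && (minor >= 1)` rejects 3.0,
// because 3.0 has minor == 0.
bool isOpenCLVersionAtLeast21(int major, int minor)
{
    return (major > 2) || (major == 2 && minor >= 1);
}
```

Note that, as the follow-up suggests, a capability probe (try building with the flag, fall back on error) is more robust than any version-number comparison; the author's new commit went that way.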

Contributor

A more universal check method is implemented in new commit.

// Set up the bias multiplier
if (bias_term_)
{
bias_multiplier_ = UMat(1, M_, 1.0f, CV_32FC1);
Member

Not used.

Author

Dropped the unused variable in the new commit.

float arg = 0;
clSetKernelArg((cl_kernel)kernel.ptr(), 0, sizeof(arg), &arg);
clEnqueueTask((cl_command_queue)queue.ptr(), (cl_kernel)kernel.ptr(), 0,
NULL, &start_gpu_cl_);
Member

Did you try to use events from clEnqueueMarker() (OpenCL 1.1), or preferably clEnqueueBarrierWithWaitList() (OpenCL 1.2+), instead of submitting a kernel task?
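For reference, the reviewer's suggestion replaces the dummy-kernel submission above with a barrier that emits an event. An untested sketch (not from the patch; it assumes the same `queue` handle and an OpenCL 1.2+ device):

```c
cl_event start_event = NULL;

/* OpenCL 1.2+: insert a barrier that produces an event, instead of
 * enqueueing a dummy kernel just to obtain a timestamp event. */
cl_int err = clEnqueueBarrierWithWaitList(
        (cl_command_queue)queue.ptr(),  /* OpenCV ocl::Queue handle */
        0, NULL,                        /* no explicit wait list */
        &start_event);

/* On OpenCL 1.1, clEnqueueMarker(queue, &start_event) is the fallback. */
```

This avoids the overhead of compiling and launching a kernel whose only purpose is to yield a profiling event.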

Contributor

Thanks for your advice, I will try this in the next commit.



7 participants