Skip to content

OpenCL: add explicit cast for half#18360

Merged
opencv-pushbot merged 1 commit intoopencv:3.4from
tomoaki0705:fixClampFailure
Sep 18, 2020
Merged

OpenCL: add explicit cast for half#18360
opencv-pushbot merged 1 commit intoopencv:3.4from
tomoaki0705:fixClampFailure

Conversation

@tomoaki0705
Copy link
Copy Markdown
Contributor

relates #18283

Pull Request Readiness Checklist

OpenCL FP16 inference on Arm Mali is not supported, as claimed on #18283
There are 2 points, one is isIntel() call and another is kernel build failure

OpenCL program build log: dnn/prior_box
Status -11: CL_BUILD_PROGRAM_FAILURE
-DDtype=half -DDtype4=half4 -Dconvert_T=convert_half4
<source>:67:9: error: call to 'clamp' is ambiguous
vstore4(clamp(vec, 0.0f, 1.0f), index, dst);
        ^~~~~

This PR is to fix the 2nd point.
The error comes that clamp is accepting
half4, float, float as a parameter, when it's supposed to accept either half4, half, half or float4, float, float

This PR will explicitly cast the latter 2 parameters, and let the kernel to be built.
I confirmed on RK3399, by also disabling here

opencv/modules/dnn/src/dnn.cpp

Lines 1316 to 1323 in e668cff

else if (preferableTarget == DNN_TARGET_OPENCL_FP16 && !ocl::Device::getDefault().isIntel())
{
CV_LOG_WARNING(NULL,
"DNN: OpenCL target with fp16 precision is not supported "
"with current OpenCL device (tested with Intel GPUs only), "
"switching to OpenCL with fp32 precision.");
preferableTarget = DNN_TARGET_OPENCL;
}

The build passed successfully and the opencv_test_dnn only raised rounding error.
The kernel seems to be doing its work.

I don't think it's safe enough to comment out the isIntel() due to the support situation, but at least the kernel build failure comes from OpenCL spec violation, so I think it's worth to fix it.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@tomoaki0705
Copy link
Copy Markdown
Contributor Author

Here's a summary of opencv_test_dnn, if I bypass isIntel() call

[ RUN      ] Test_Torch_layers.run_convolution/1, where GetParam() = OCV/OCL_FP16
/opencv/modules/dnn/test/test_common.impl.hpp:69: Failure
Expected: (normInf) <= (lInf), actual: 0.421955 vs 0.42
[  FAILED  ] Test_Torch_layers.run_convolution/1, where GetParam() = OCV/OCL_FP16 (103 ms)
(snip)
[ RUN      ] Test_Torch_layers.run_depth_concat/1, where GetParam() = OCV/OCL_FP16
/opencv/modules/dnn/test/test_common.impl.hpp:69: Failure
Expected: (normInf) <= (lInf), actual: 0.0317307 vs 0.021
[  FAILED  ] Test_Torch_layers.run_depth_concat/1, where GetParam() = OCV/OCL_FP16 (180 ms)
(snip)
[  PASSED  ] 3340 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] Test_Torch_layers.run_convolution/1, where GetParam() = OCV/OCL_FP16
[  FAILED  ] Test_Torch_layers.run_depth_concat/1, where GetParam() = OCV/OCL_FP16

So basically, it doesn't harm the implementation.
2 tests fails, but it's only a rounding error.

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 18, 2020

I will add runtime option to bypass this FP16 check in a separate PR.

(We tried OpenCL/FP16 on AMD/NVIDIA GPUs some time ago and accuracy/performance of current implementation was very poor on these platforms)

@opencv-pushbot opencv-pushbot merged commit 3e3787e into opencv:3.4 Sep 18, 2020
@tomoaki0705 tomoaki0705 deleted the fixClampFailure branch September 18, 2020 13:11
@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 18, 2020

Actually there is already such parameter to bypass these OpenCL checks: OPENCV_DNN_OPENCL_ALLOW_ALL_DEVICES

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants