OpenCL: add explicit cast for half by tomoaki0705 · Pull Request #18360 · opencv/opencv

tomoaki0705 · 2020-09-18T12:06:19Z

Pull Request Readiness Checklist

OpenCL FP16 inference on Arm Mali is not supported, as claimed on #18283
There are 2 points, one is isIntel() call and another is kernel build failure

OpenCL program build log: dnn/prior_box
Status -11: CL_BUILD_PROGRAM_FAILURE
-DDtype=half -DDtype4=half4 -Dconvert_T=convert_half4
<source>:67:9: error: call to 'clamp' is ambiguous
vstore4(clamp(vec, 0.0f, 1.0f), index, dst);
        ^~~~~

This PR is to fix the 2nd point.
The error comes that clamp is accepting
half4, float, float as a parameter, when it's supposed to accept either half4, half, half or float4, float, float

This PR will explicitly cast the latter 2 parameters, and let the kernel to be built.
I confirmed on RK3399, by also disabling here

opencv/modules/dnn/src/dnn.cpp

Lines 1316 to 1323 in e668cff

    
           else if (preferableTarget == DNN_TARGET_OPENCL_FP16 && !ocl::Device::getDefault().isIntel()) 
        
           { 
        
               CV_LOG_WARNING(NULL, 
        
                   "DNN: OpenCL target with fp16 precision is not supported " 
        
                   "with current OpenCL device (tested with Intel GPUs only), " 
        
                   "switching to OpenCL with fp32 precision."); 
        
               preferableTarget = DNN_TARGET_OPENCL; 
        
           }

The build passed successfully and the opencv_test_dnn only raised rounding error.
The kernel seems to be doing its work.

I don't think it's safe enough to comment out the isIntel() due to the support situation, but at least the kernel build failure comes from OpenCL spec violation, so I think it's worth to fix it.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

tomoaki0705 · 2020-09-18T12:43:37Z

Here's a summary of opencv_test_dnn, if I bypass isIntel() call

[ RUN      ] Test_Torch_layers.run_convolution/1, where GetParam() = OCV/OCL_FP16
/opencv/modules/dnn/test/test_common.impl.hpp:69: Failure
Expected: (normInf) <= (lInf), actual: 0.421955 vs 0.42
[  FAILED  ] Test_Torch_layers.run_convolution/1, where GetParam() = OCV/OCL_FP16 (103 ms)
(snip)
[ RUN      ] Test_Torch_layers.run_depth_concat/1, where GetParam() = OCV/OCL_FP16
/opencv/modules/dnn/test/test_common.impl.hpp:69: Failure
Expected: (normInf) <= (lInf), actual: 0.0317307 vs 0.021
[  FAILED  ] Test_Torch_layers.run_depth_concat/1, where GetParam() = OCV/OCL_FP16 (180 ms)
(snip)
[  PASSED  ] 3340 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] Test_Torch_layers.run_convolution/1, where GetParam() = OCV/OCL_FP16
[  FAILED  ] Test_Torch_layers.run_depth_concat/1, where GetParam() = OCV/OCL_FP16

So basically, it doesn't harm the implementation.
2 tests fails, but it's only a rounding error.

alalek

Thank you!

alalek · 2020-09-18T12:58:55Z

I will add runtime option to bypass this FP16 check in a separate PR.

(We tried OpenCL/FP16 on AMD/NVIDIA GPUs some time ago and accuracy/performance of current implementation was very poor on these platforms)

alalek · 2020-09-18T15:35:20Z

Actually there is already such parameter to bypass these OpenCL checks: OPENCV_DNN_OPENCL_ALLOW_ALL_DEVICES

add explicit cast for half

f77c2d7

alalek approved these changes Sep 18, 2020

View reviewed changes

opencv-pushbot merged commit 3e3787e into opencv:3.4 Sep 18, 2020

tomoaki0705 deleted the fixClampFailure branch September 18, 2020 13:11

alalek mentioned this pull request Sep 19, 2020

Merge 3.4 #18368

Merged

This was referenced Sep 23, 2020

OpenCL: fix cast for native_powr when half #18395

Merged

DNN_TARGET_OPENCL_FP16 support for RK3399's Mali-T860 #18283

Open

dnn: loosen threshold for Mali #18434

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

OpenCL: add explicit cast for half#18360

OpenCL: add explicit cast for half#18360
opencv-pushbot merged 1 commit intoopencv:3.4from
tomoaki0705:fixClampFailure

tomoaki0705 commented Sep 18, 2020

Uh oh!

tomoaki0705 commented Sep 18, 2020

Uh oh!

alalek left a comment

Uh oh!

alalek commented Sep 18, 2020

Uh oh!

alalek commented Sep 18, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	else if (preferableTarget == DNN_TARGET_OPENCL_FP16 && !ocl::Device::getDefault().isIntel())
	{
	CV_LOG_WARNING(NULL,
	"DNN: OpenCL target with fp16 precision is not supported "
	"with current OpenCL device (tested with Intel GPUs only), "
	"switching to OpenCL with fp32 precision.");
	preferableTarget = DNN_TARGET_OPENCL;
	}

Uh oh!

Conversation

tomoaki0705 commented Sep 18, 2020

Pull Request Readiness Checklist

Uh oh!

tomoaki0705 commented Sep 18, 2020

Uh oh!

alalek left a comment

Choose a reason for hiding this comment

Uh oh!

alalek commented Sep 18, 2020

Uh oh!

alalek commented Sep 18, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants