DNN(OpenCL): avoid mess FP16/FP32 in convolution layer #19115

Merged: opencv-pushbot merged 1 commit into opencv:3.4 from alalek:dnn_ocl_conv_fp16_consistency on Dec 15, 2020

@alalek (Member) commented Dec 15, 2020

relates #18465

/cc @tomoaki0705

@asmorkalov (Contributor) commented

cc @sl-sergei

@asmorkalov requested a review from sl-sergei December 15, 2020 07:19

@sl-sergei (Contributor) left a comment

Thank you! 👍

@tomoaki0705 (Contributor) commented

Confirmed that this works on my RK3399:

$ export | grep OPENCV
declare -x OPENCV_DNN_OPENCL_ALLOW_ALL_DEVICES="true"
declare -x OPENCV_TEST_DATA_PATH="/home/linaro/opencv_extra/testdata/"
$ ./opencv_test_dnn
CTEST_FULL_OUTPUT
OpenCV version: 3.4.13-pre
OpenCV VCS version: 3.4.12-177-gc240355cc6
Build type: Release
Compiler: /usr/bin/c++  (ver 6.3.0)
Parallel framework: pthreads (nthreads=6)
CPU features: NEON FP16
OpenCL Platforms:
    ARM Platform
        iGPU: Mali-T860 (OpenCL 1.2 v1.r14p0-01rel0-git(a79caef).e1315d65458474a7b7d3598c7bfdc17e)
Current OpenCL device:
    Type = iGPU
    Name = Mali-T860
    Version = OpenCL 1.2 v1.r14p0-01rel0-git(a79caef).e1315d65458474a7b7d3598c7bfdc17e
    Driver version = 1.2
    Address bits = 64
    Compute units = 4
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 952 MB 530 KB
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_byte_addressable_store
        cl_khr_3d_image_writes
        cl_khr_fp64
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_fp16
        cl_khr_gl_sharing
        cl_khr_icd
        cl_khr_egl_event
        cl_khr_egl_image
        cl_khr_image2d_from_buffer
        cl_arm_core_id
        cl_arm_printf
        cl_arm_thread_limit_hint
        cl_arm_non_uniform_work_group_size
        cl_arm_import_memory
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2
TEST: Skip tests with tags: 'mem_6gb', 'verylong', 'dnn_skip_halide', 'dnn_skip_ocl', 'dnn_skip_ocl_fp16', 'dnn_skip_ie_ocl', 'dnn_skip_ie_ocl_fp16'
[==========] Running 3424 tests from 70 test cases.
[----------] Global test environment set-up.
 :

[----------] Global test environment tear-down
[ SKIPSTAT ] 669 tests skipped
[ SKIPSTAT ] TAG='mem_6gb' skip 3 tests
[ SKIPSTAT ] TAG='verylong' skip 3 tests
[ SKIPSTAT ] TAG='dnn_skip_ocl' skip 4 tests
[ SKIPSTAT ] TAG='dnn_skip_ocl_fp16' skip 29 tests (3 times in extra skip list)
[ SKIPSTAT ] TAG='dnn_skip_ie_ocl' skip 1 tests
[ SKIPSTAT ] TAG='dnn_skip_ie_ocl_fp16' skip 1 tests
[ SKIPSTAT ] TAG='skip_other' skip 628 tests
[==========] 3424 tests from 70 test cases ran. (41722 ms total)
[  PASSED  ] 3424 tests.
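
As a quick sanity check on the skip summary above, the per-tag counts can be added up and compared with the reported total. The following is only a minimal sketch: the `LOG` string is copied from the `SKIPSTAT` lines in this comment, and the helper name `tally_skips` is hypothetical (it is not part of OpenCV or its test harness). In this log the per-tag counts happen to sum to the total; that need not hold in general if one test carries several tags.

```python
import re

# SKIPSTAT lines copied verbatim from the test output in this comment.
LOG = """\
[ SKIPSTAT ] 669 tests skipped
[ SKIPSTAT ] TAG='mem_6gb' skip 3 tests
[ SKIPSTAT ] TAG='verylong' skip 3 tests
[ SKIPSTAT ] TAG='dnn_skip_ocl' skip 4 tests
[ SKIPSTAT ] TAG='dnn_skip_ocl_fp16' skip 29 tests (3 times in extra skip list)
[ SKIPSTAT ] TAG='dnn_skip_ie_ocl' skip 1 tests
[ SKIPSTAT ] TAG='dnn_skip_ie_ocl_fp16' skip 1 tests
[ SKIPSTAT ] TAG='skip_other' skip 628 tests
"""

TOTAL_RE = re.compile(r"\[ SKIPSTAT \] (\d+) tests skipped")
TAG_RE = re.compile(r"\[ SKIPSTAT \] TAG='([^']+)' skip (\d+) tests")


def tally_skips(log):
    """Return (reported_total, {tag: count}) parsed from SKIPSTAT lines.

    Hypothetical helper written for this illustration only.
    """
    total = None
    per_tag = {}
    for line in log.splitlines():
        m = TOTAL_RE.match(line)
        if m:
            total = int(m.group(1))
            continue
        m = TAG_RE.match(line)
        if m:
            per_tag[m.group(1)] = int(m.group(2))
    return total, per_tag


total, per_tag = tally_skips(LOG)
print(total, sum(per_tag.values()))  # prints "669 669" for the log above
```

Note that only 29 of the 669 skips carry the `dnn_skip_ocl_fp16` tag; most fall under `skip_other`.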

Looking closely at the output, the FP16 tests are triggered and pass successfully:

[ RUN      ] Test_Torch_layers.net_residual/0, where GetParam() = OCV/OCL
[       OK ] Test_Torch_layers.net_residual/0 (10 ms)
[ RUN      ] Test_Torch_layers.net_residual/1, where GetParam() = OCV/OCL_FP16
[       OK ] Test_Torch_layers.net_residual/1 (12 ms)
[ RUN      ] Test_Torch_layers.net_residual/2, where GetParam() = OCV/CPU
[       OK ] Test_Torch_layers.net_residual/2 (0 ms)
[ RUN      ] Test_Torch_layers.upsampling_nearest/0, where GetParam() = OCV/OCL
[       OK ] Test_Torch_layers.upsampling_nearest/0 (3 ms)
[ RUN      ] Test_Torch_layers.upsampling_nearest/1, where GetParam() = OCV/OCL_FP16
[       OK ] Test_Torch_layers.upsampling_nearest/1 (8 ms)
[ RUN      ] Test_Torch_layers.upsampling_nearest/2, where GetParam() = OCV/CPU
[       OK ] Test_Torch_layers.upsampling_nearest/2 (1 ms)
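
The same check can be done mechanically instead of by eye. The sketch below groups gtest results by the `GetParam()` value and reports whether every `OCV/OCL_FP16` case has a matching `OK` line. It assumes the log format shown in this excerpt; the helper name `results_by_target` is hypothetical and the `LOG` string is just the excerpt above.

```python
import re

# Excerpt copied from the gtest output in this comment.
LOG = """\
[ RUN      ] Test_Torch_layers.net_residual/0, where GetParam() = OCV/OCL
[       OK ] Test_Torch_layers.net_residual/0 (10 ms)
[ RUN      ] Test_Torch_layers.net_residual/1, where GetParam() = OCV/OCL_FP16
[       OK ] Test_Torch_layers.net_residual/1 (12 ms)
[ RUN      ] Test_Torch_layers.net_residual/2, where GetParam() = OCV/CPU
[       OK ] Test_Torch_layers.net_residual/2 (0 ms)
[ RUN      ] Test_Torch_layers.upsampling_nearest/0, where GetParam() = OCV/OCL
[       OK ] Test_Torch_layers.upsampling_nearest/0 (3 ms)
[ RUN      ] Test_Torch_layers.upsampling_nearest/1, where GetParam() = OCV/OCL_FP16
[       OK ] Test_Torch_layers.upsampling_nearest/1 (8 ms)
[ RUN      ] Test_Torch_layers.upsampling_nearest/2, where GetParam() = OCV/CPU
[       OK ] Test_Torch_layers.upsampling_nearest/2 (1 ms)
"""

RUN_RE = re.compile(r"\[ RUN\s+\] (\S+), where GetParam\(\) = (\S+)")
OK_RE = re.compile(r"\[\s+OK \] (\S+) \(\d+ ms\)")


def results_by_target(log):
    """Map each GetParam() value to a list of (test_name, passed) pairs.

    Hypothetical helper; assumes one RUN line per test, followed by an
    OK line with the same test name if the test passed.
    """
    target_of = {}
    passed = set()
    for line in log.splitlines():
        m = RUN_RE.match(line)
        if m:
            target_of[m.group(1)] = m.group(2)
            continue
        m = OK_RE.match(line)
        if m:
            passed.add(m.group(1))
    grouped = {}
    for name, target in target_of.items():
        grouped.setdefault(target, []).append((name, name in passed))
    return grouped


fp16 = results_by_target(LOG).get("OCV/OCL_FP16", [])
for name, ok in fp16:
    print(name, "OK" if ok else "NOT OK")
```

For this excerpt, both `OCV/OCL_FP16` cases (`net_residual/1` and `upsampling_nearest/1`) are reported as passed.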

Great work, @alalek! Thank you!

@asmorkalov mentioned this pull request Dec 15, 2020
@opencv-pushbot merged commit 50fed1d into opencv:3.4 Dec 15, 2020
@alalek mentioned this pull request Dec 17, 2020
@alalek mentioned this pull request Apr 9, 2021