DNN fails OCL_FP16 tests on recent Intel GPU #21004
Closed
Labels: bug, category: dnn, category: ocl, confirmed (There is stable reproducer / investigation complete)
Description
DNN tests fail on modern Intel iGPUs when running OCL_FP16 test cases.
The failures do not reproduce with the CPU test cases or the standard OCL test cases.
They also do not reproduce when my NVIDIA GPU is selected: the NVIDIA device runs both OCL and OCL_FP16 correctly.
[ RUN ] Test_ONNX_nets.Alexnet/1, where GetParam() = OCV/OCL_FP16
C:\repos-nobackup\opencv\modules\dnn\test\test_common.impl.hpp(74): error: Expected: (normInf) <= (lInf), actual: 0.640317 vs 0.02
|ref| = 0.00392913818359375
C:\repos-nobackup\opencv\modules\dnn\test\test_common.impl.hpp(74): error: Expected: (normInf) <= (lInf), actual: 0.640317 vs 0.02
|ref| = 0.00392913818359375
[ FAILED ] Test_ONNX_nets.Alexnet/1, where GetParam() = OCV/OCL_FP16 (775 ms)
[ RUN ] Test_ONNX_nets.Googlenet/1, where GetParam() = OCV/OCL_FP16
[ERROR:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1205) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::verifyResult Kernel: U_GEMM_LIKE_CONV_k1x1_cn64_g1_s1x1_d1x1_b1_in64x64_p0x0_num2_M64_activ1_eltwise0_FP16_5_1_8_32_SIMD8
[ERROR:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1211) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::verifyResult test verification failed @ image 0 group 0 out_ch 0 h 0 w 0 (offset: 0) got -nan(ind) expected -nan(ind)
[ERROR:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1211) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::verifyResult test verification failed @ image 0 group 0 out_ch 0 h 0 w 1 (offset: 1) got -nan(ind) expected -nan(ind)
...thousands of errors...
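The Alexnet failure above is an assertion that `normInf` stays at or below the threshold `lInf` (0.02 in the log, against an actual value of 0.640317). As a rough illustration of what such a check measures, here is a minimal sketch that assumes `normInf` is the largest absolute element-wise difference between the reference and the computed output. The function names and sample values are hypothetical, and this is not the code from test_common.impl.hpp.

```python
def norm_inf(ref, out):
    """Largest absolute element-wise difference between two sequences."""
    if len(ref) != len(out):
        raise ValueError("reference and output must have the same length")
    return max(abs(r - o) for r, o in zip(ref, out))


def within_tolerance(ref, out, l_inf):
    """True when the worst-case element error does not exceed l_inf."""
    return norm_inf(ref, out) <= l_inf


# Hypothetical values, chosen only to show a pass and a fail.
ref = [0.10, 0.25, 0.65]
good = [0.11, 0.24, 0.66]   # small rounding-level differences
bad = [0.10, 0.25, 0.00]    # one element is badly wrong

print(within_tolerance(ref, good, 0.02))
print(within_tolerance(ref, bad, 0.02))
```

A single badly wrong output element is enough to push this kind of metric far past the threshold, which matches the scale of the mismatch reported in the log.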
System information (version)
- OpenCV => 4.5.4
- Operating System / Platform => Microsoft Windows [Version 10.0.19043.1288]
- Compiler => VS 2019 Community v16.11.5
- GPU => Intel UHD Graphics (integrated, 10th gen CPU)
- Intel graphics driver => 30.0.100.9805 (note that recent drivers report OpenCL 3.0 NEO)
Repro
- Build OpenCV 4.5.4 for Windows with OpenCL, the DNN module, OpenCL DNN, IPP, tests, etc.
- Run the DNN test suite with the following environment variable set, to force the Intel GPU:
OPENCV_OPENCL_DEVICE=Intel:GPU
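The repro step can be scripted. The sketch below only builds the command line and environment. It assumes the debug test binary is named `opencv_test_dnnd.exe` (the name that appears in the exception messages in this report) and narrows the run with the standard GoogleTest `--gtest_filter` flag. The filter pattern and the helper name `build_repro` are illustrative choices, not part of the original report.

```python
import os


def build_repro(test_exe="opencv_test_dnnd.exe",
                device="Intel:GPU",
                gtest_filter="Test_ONNX_nets.*"):
    """Return (cmd, env) for running the DNN tests on a chosen OpenCL device."""
    env = dict(os.environ)
    # Select the OpenCL device, as described in the repro step.
    env["OPENCV_OPENCL_DEVICE"] = device
    # Assumption: restrict the run to the ONNX network tests shown in the logs.
    cmd = [test_exe, "--gtest_filter=" + gtest_filter]
    return cmd, env


cmd, env = build_repro()
print(cmd, env["OPENCV_OPENCL_DEVICE"])
# To actually run it: subprocess.run(cmd, env=env) from the build's bin directory.
```

Dropping the filter argument runs the whole DNN suite, which is what the report describes.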
Result
[ RUN ] Test_ONNX_nets.Alexnet/0, where GetParam() = OCV/OCL
[ OK ] Test_ONNX_nets.Alexnet/0 (911 ms)
[ RUN ] Test_ONNX_nets.Alexnet/1, where GetParam() = OCV/OCL_FP16
C:\repos-nobackup\opencv\modules\dnn\test\test_common.impl.hpp(74): error: Expected: (normInf) <= (lInf), actual: 0.640317 vs 0.02
|ref| = 0.00392913818359375
C:\repos-nobackup\opencv\modules\dnn\test\test_common.impl.hpp(74): error: Expected: (normInf) <= (lInf), actual: 0.640317 vs 0.02
|ref| = 0.00392913818359375
[ FAILED ] Test_ONNX_nets.Alexnet/1, where GetParam() = OCV/OCL_FP16 (775 ms)
[ RUN ] Test_ONNX_nets.Alexnet/2, where GetParam() = OCV/CPU
[ OK ] Test_ONNX_nets.Alexnet/2 (578 ms)
[ RUN ] Test_ONNX_nets.Googlenet/0, where GetParam() = OCV/OCL
[ WARN:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1927) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::loadTunedConfig OpenCV(ocl4dnn): consider to specify kernel configuration cache directory through OPENCV_OCL4DNN_CONFIG_PATH parameter.
[ OK ] Test_ONNX_nets.Googlenet/0 (1013 ms)
[ RUN ] Test_ONNX_nets.Googlenet/1, where GetParam() = OCV/OCL_FP16
[ERROR:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1205) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::verifyResult Kernel: U_GEMM_LIKE_CONV_k1x1_cn64_g1_s1x1_d1x1_b1_in64x64_p0x0_num2_M64_activ1_eltwise0_FP16_5_1_8_32_SIMD8
[ERROR:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1211) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::verifyResult test verification failed @ image 0 group 0 out_ch 0 h 0 w 0 (offset: 0) got -nan(ind) expected -nan(ind)
[ERROR:0] global ..\modules\dnn\src\ocl4dnn\src\ocl4dnn_conv_spatial.cpp (1211) cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::verifyResult test verification failed @ image 0 group 0 out_ch 0 h 0 w 1 (offset: 1) got -nan(ind) expected -nan(ind)
...thousands of errors...
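The verification lines above print `got -nan(ind) expected -nan(ind)`, so both the computed and the expected value are NaN, yet the check still reports a failure. Under IEEE 754, any ordered comparison involving NaN is false, so a tolerance test of the form `abs(got - expected) <= tol` cannot succeed on NaN inputs. The sketch below illustrates only that general behaviour. It is not the ocl4dnn verification code, and the function name and tolerance are hypothetical.

```python
import math


def close_enough(got, expected, tol):
    """Plain absolute-tolerance check; returns False if either value is NaN."""
    return abs(got - expected) <= tol


nan = float("nan")

print(close_enough(1.0, 1.05, 0.1))   # finite values within tolerance
print(close_enough(nan, nan, 0.1))    # NaN on both sides still fails
print(math.isnan(nan - nan))          # the difference itself is NaN
```

When reading the log, this means the reference result used for verification already contained NaN values, not just the output of the kernel under test.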
Expected
No critical errors.
Info
OpenCV version: 4.5.4
OpenCV VCS version: 4.5.4
Build type: Debug
Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30136.0)
Parallel framework: tbb (nthreads=16)
CPU features: SSE SSE2 SSE3 SSSE3 SSE4.1 POPCNT SSE4.2 AVX *FP16 *AVX2 *AVX512-SKX?
Intel(R) IPP version: ippIP AVX2 (l9) 2021.4 (r0xa677c254) Sep 3 2021
Intel(R) IPP features code: 0x8000
OpenCL Platforms:
NVIDIA CUDA
dGPU: NVIDIA GeForce RTX 2070 Super (OpenCL 3.0 CUDA)
Intel(R) OpenCL HD Graphics
iGPU: Intel(R) UHD Graphics (OpenCL 3.0 NEO )
Intel(R) OpenCL
CPU: Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (OpenCL 2.1 (Build 0))
Current OpenCL device:
Type = iGPU
Name = Intel(R) UHD Graphics
Version = OpenCL 3.0 NEO
Driver version = 30.0.100.9805
Address bits = 64
Compute units = 24
Max work group size = 256
Local memory size = 64 KB
Max memory allocation size = 3 GB 1023 MB 1016 KB
Double support = Yes
Half support = Yes
Host unified memory = Yes
Device extensions:
cl_khr_byte_addressable_store
cl_khr_fp16
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_icd
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_intel_command_queue_families
cl_intel_subgroups
cl_intel_required_subgroup_size
cl_intel_subgroups_short
cl_khr_spir
cl_intel_accelerator
cl_intel_driver_diagnostics
cl_khr_priority_hints
cl_khr_throttle_hints
cl_khr_create_command_queue
cl_intel_subgroups_char
cl_intel_subgroups_long
cl_khr_il_program
cl_intel_mem_force_host_memory
cl_khr_subgroup_extended_types
cl_khr_subgroup_non_uniform_vote
cl_khr_subgroup_ballot
cl_khr_subgroup_non_uniform_arithmetic
cl_khr_subgroup_shuffle
cl_khr_subgroup_shuffle_relative
cl_khr_subgroup_clustered_reduce
cl_intel_device_attribute_query
cl_khr_fp64
cl_khr_subgroups
cl_intel_spirv_device_side_avc_motion_estimation
cl_intel_spirv_media_block_io
cl_intel_spirv_subgroups
cl_khr_spirv_no_integer_wrap_decoration
cl_intel_unified_shared_memory_preview
cl_khr_mipmap_image
cl_khr_mipmap_image_writes
cl_intel_planar_yuv
cl_intel_packed_yuv
cl_intel_motion_estimation
cl_intel_device_side_avc_motion_estimation
cl_intel_advanced_motion_estimation
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_image2d_from_buffer
cl_khr_depth_images
cl_khr_3d_image_writes
cl_intel_media_block_io
cl_khr_gl_sharing
cl_khr_gl_depth_images
cl_khr_gl_event
cl_khr_gl_msaa_sharing
cl_intel_dx9_media_sharing
cl_khr_dx9_media_sharing
cl_khr_d3d10_sharing
cl_khr_d3d11_sharing
cl_intel_d3d11_nv12_media_sharing
cl_intel_sharing_format_query
cl_khr_pci_bus_info
cl_intel_simultaneous_sharing
Exception thrown at 0x00007FF8AA5F4F99 in opencv_test_dnnd.exe: Microsoft C++ exception: cv::Exception at memory location 0x000000FA786F99D0.
Has AMD Blas = No
Exception thrown at 0x00007FF8AA5F4F99 in opencv_test_dnnd.exe: Microsoft C++ exception: cv::Exception at memory location 0x000000FA786F9900.
Has AMD Fft = No
Preferred vector width char = 16
Preferred vector width short = 8
Preferred vector width int = 4
Preferred vector width long = 1
Preferred vector width float = 1
Preferred vector width double = 1
Preferred vector width half = 8
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found solution
- I updated to latest OpenCV version and the issue is still there
- There is reproducer code and related data files: videos, images, onnx, etc