-
-
Notifications
You must be signed in to change notification settings - Fork 56.5k
cudafilters: Median_Accuracy fails with CUDA 9.0 and after #12721
Copy link
Copy link
Closed
Labels
bugcategory: gpu/cuda (contrib)OpenCV 4.0+: moved to opencv_contribOpenCV 4.0+: moved to opencv_contrib
Description
System information (version)
- OpenCV =>3.4 branch ( a1ffc00 )
- Operating System / Platform => Windows 10 64bit
- Compiler => Visual Studio 2015
- CUDA => 10.0
- GPU => Geforce GTX 1060
Summary
The opencv_test_cudafilters fails on Median_Accuracy test
opencv_test_cudafilters --gtest_filter=*Median.Accuracy*
[ PASSED ] 0 tests.
[ FAILED ] 28 tests, listed below:
[ FAILED ] CUDA_Filters/Median.Accuracy/0, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(3), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/1, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(3), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/2, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(5), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/3, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(5), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/4, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(7), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/5, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(7), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/6, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(9), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/7, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(9), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/8, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(11), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/9, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(11), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/10, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(13), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/11, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(13), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/12, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(15), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/13, where GetParam() = (GeForce GTX 1060, 128x128, CV_8U, KernelSize(15), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/14, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(3), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/15, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(3), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/16, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(5), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/17, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(5), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/18, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(7), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/19, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(7), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/20, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(9), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/21, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(9), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/22, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(11), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/23, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(11), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/24, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(13), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/25, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(13), sub matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/26, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(15), whole matrix)
[ FAILED ] CUDA_Filters/Median.Accuracy/27, where GetParam() = (GeForce GTX 1060, 113x113, CV_8U, KernelSize(15), sub matrix)
28 FAILED TESTS
Detailed description
- I downloaded the GpuMat and stored in file.
and
, definitly, it's not a minor error. Something was wrong.
- Also, I traced back my personal build farm log, and realized that in July, with CUDA 8.0 + Jetson TX2, the
opencv_test_cudafiltershad no problem
Device 0: "GP10B"
CUDA Driver Version / Runtime Version 8.50 / 8.0
CUDA Capability Major/Minor version number: 6.2
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.50, CUDA Runtime Version = 8.0, NumDevs = 1
CTEST_FULL_OUTPUT
OpenCV version: 3.4.2-dev
OpenCV VCS version: 3.4.2-119-g8c75555
[----------] 28 tests from CUDA_Filters/Median
[ RUN ] CUDA_Filters/Median.Accuracy/0, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(3), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/0 (22 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/1, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(3), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/1 (23 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/2, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(5), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/2 (25 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/3, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(5), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/3 (25 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/4, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(7), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/4 (32 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/5, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(7), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/5 (26 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/6, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(9), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/6 (28 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/7, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(9), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/7 (27 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/8, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(11), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/8 (30 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/9, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(11), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/9 (28 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/10, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(13), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/10 (30 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/11, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(13), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/11 (30 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/12, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(15), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/12 (32 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/13, where GetParam() = (GP10B, 128x128, CV_8U, KernelSize(15), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/13 (33 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/14, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(3), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/14 (18 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/15, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(3), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/15 (17 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/16, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(5), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/16 (18 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/17, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(5), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/17 (19 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/18, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(7), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/18 (20 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/19, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(7), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/19 (19 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/20, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(9), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/20 (21 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/21, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(9), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/21 (20 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/22, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(11), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/22 (22 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/23, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(11), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/23 (20 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/24, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(13), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/24 (19 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/25, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(13), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/25 (20 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/26, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(15), whole matrix)
[ OK ] CUDA_Filters/Median.Accuracy/26 (21 ms)
[ RUN ] CUDA_Filters/Median.Accuracy/27, where GetParam() = (GP10B, 113x113, CV_8U, KernelSize(15), sub matrix)
[ OK ] CUDA_Filters/Median.Accuracy/27 (21 ms)
[----------] 28 tests from CUDA_Filters/Median (669 ms total)
- I rolled back OpenCV source code on Windows to 8c75555 and confirmed that the test fails. (CUDA 10.0, Visual Studio 2015)
- To narrow down the cause, I installed CUDA 8.0 on Windows, too.
- Building OpenCV with CUDA 8.0 passes, but with CUDA 10.0, it fails. (Also, it failed on CUDA 9.0 on my platform)
- To narrow down the cause further, I jumped in the CUDA code
- Now, luckily, the output was done only at one location.
dest.ptr(i)[j]=(firstBin<<5) + retval; - Comparing both CUDA 8.0 version and CUDA 10.0 version, both
firstBinandretvaldiffered in the source code
dest.ptr(i)[j]=(firstBin<<5) + retval; // differs
dest.ptr(i)[j]=firstBin; // differs
dest.ptr(i)[j]=retval; // differs
- I chose
retvalto trace back andretvalwas decided here
histogramMedianPar32LookupOnly(HFine[firstBin],HCoarseScan,leftOver,&retval,&countAtMed); - In this function
histogramMedianPar32LookupOnlyI found a "racing condition" while accessing shared memory here
opencv/modules/cudafilters/src/cuda/median_filter.cu
Lines 162 to 171 in a1ffc00
if(tx>=1) Hscan[tx]+=Hscan[tx-1]; if(tx>=2) Hscan[tx]+=Hscan[tx-2]; if(tx>=4) Hscan[tx]+=Hscan[tx-4]; if(tx>=8) Hscan[tx]+=Hscan[tx-8]; if(tx>=16) Hscan[tx]+=Hscan[tx-16]; - I did very similar fix in Arm: fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4 #11409, I added
__syncthreads()to avoid the race condition - Also there was another race condition in
histogramMedianPar8LookupOnly, here
opencv/modules/cudafilters/src/cuda/median_filter.cu
Lines 131 to 136 in a1ffc00
if(tx>=1 ) Hscan[tx]+=Hscan[tx-1]; if(tx>=2) Hscan[tx]+=Hscan[tx-2]; if(tx>=4) Hscan[tx]+=Hscan[tx-4]; - which was also causing the difference.
- Adding the
__syncthreads()in neccessary places, the test managed to pass.
opencv_test_cudafilters --gtest_filters=*Median_Accuracy*
[==========] Running 28 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 28 tests from CUDA_Filters/Median
[----------] 28 tests from CUDA_Filters/Median (1491 ms total)
[----------] Global test environment tear-down
[==========] 28 tests from 1 test case ran. (1521 ms total)
[ PASSED ] 28 tests.
Press any key to continue . . .[----------] 28 tests from CUDA_Filters/Median (1491 ms total)
[----------] Global test environment tear-down
[==========] 28 tests from 1 test case ran. (1521 ms total)
[ PASSED ] 28 tests.
- The cause explains the situation. The bug did exists from before, but it just didn't appear.
- Upgrading the CUDA was big enough to make the race condition visible.
- I'll send a PR later
Steps to reproduce
- Build OpenCV with CUDA 9.0 or later
- run
opencv_test_cudafilters
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugcategory: gpu/cuda (contrib)OpenCV 4.0+: moved to opencv_contribOpenCV 4.0+: moved to opencv_contrib