Try to Fix #3725: cudaarithm: fix the compile failure of CUDA 12. #3726
asmorkalov merged 1 commit into opencv:4.x
/cc @cudawarped
sdy623
left a comment
I am pretty sure it was introduced in CUDA 12.4 (12040).
Is it worth including an assert before it is used because BufferPool.getBuffer() expects an int? e.g.
CV_Assert(bufSize <= std::numeric_limits<int>::max());
What about the next call to get buffer size?
Have you tested this on 12.4? I think that it will still fail because of the other bug so I am not sure if it will pass any CI tests built against CUDA 12.4.
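To illustrate the check suggested above, here is a minimal sketch of narrowing a size_t buffer size to the int that BufferPool.getBuffer() expects. The helper name checkedBufferSize is hypothetical, and it throws a standard exception in place of CV_Assert so that it compiles without OpenCV.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not part of OpenCV): narrows a size_t buffer size
// to int, failing loudly instead of silently truncating.
inline int checkedBufferSize(std::size_t bufSize)
{
    const std::size_t maxInt =
        static_cast<std::size_t>(std::numeric_limits<int>::max());
    if (bufSize > maxInt)
        throw std::overflow_error("buffer size does not fit in int");
    return static_cast<int>(bufSize);
}
```

In the actual module the same condition would presumably be expressed with CV_Assert, as in the one-liner above.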
I checked the definition of CUDA_VERSION in CUDA 12, which is in the form 12.XX.XX instead of an int number, so I edited CMakeLists.txt to add a new definition, CUDA_12_OR_HIGHER, and fixed the compilation of reductions.cpp. However, it ran into other errors that still need to be solved:
C:\opencv-bld\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error: no instance of overloaded function "cv::cudev::blockReduce" matches the argument list
argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp))
I think you are confusing CMake generation and compilation. Adding that definition, which is still for the incorrect version of CUDA, to the CMake file is unnecessary. Version Info C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h Function definitions
39073a2 to 34bd538
I removed the CMake edit and found that the correct CUDA_VERSION is 12040 in CUDA 12.4. I also found a way to check cuda.h without installing the toolkit: use 7-Zip to open cuda_12.4.1_551.78_windows.exe, go to cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h, and look for the macro definition.
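For reference, cuda.h encodes the toolkit version as a single integer, 1000 * major + 10 * minor, which is why CUDA 12.4 appears as 12040. A small sketch of decoding such a value follows; the function names are made up for illustration.

```cpp
// Decode a CUDA_VERSION-style integer (1000 * major + 10 * minor).
// cudaMajor and cudaMinor are hypothetical names used only in this sketch.
constexpr int cudaMajor(int version) { return version / 1000; }
constexpr int cudaMinor(int version) { return (version % 1000) / 10; }

// Compile-time checks against the value reported for CUDA 12.4.
static_assert(cudaMajor(12040) == 12, "12040 is CUDA 12.x");
static_assert(cudaMinor(12040) == 4, "12040 is CUDA 12.4");
```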
Whilst this is a completely valid way to check the header I would advise you to install CUDA 12.4 when submitting a PR which fixes something that it breaks. If you do that you will realize that
If I was authoring this PR I would install both CUDA 12.3 and 12.4 and check that this builds on both without errors.
A slight API change of NPP nppiMeanStdDevGetBufferHostSize_8u_C1R: the type of bufSize is size_t instead of int in CUDA 12.4.x
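One way to handle such a change at the call site is to select the variable type with a preprocessor guard on CUDA_VERSION. This is a minimal sketch under stated assumptions, not necessarily the merged patch: the alias name NppBufSize is hypothetical, and the fallback #define exists only so the snippet compiles without cuda.h.

```cpp
#include <cstddef>
#include <type_traits>

// Assumption: in a real build CUDA_VERSION comes from <cuda.h>.
// The fallback below exists only so this sketch compiles standalone.
#ifndef CUDA_VERSION
#define CUDA_VERSION 12040
#endif

// Hypothetical alias: the type passed to the NPP buffer-size query.
#if CUDA_VERSION >= 12040
using NppBufSize = std::size_t;  // CUDA 12.4 and later
#else
using NppBufSize = int;          // earlier toolkits
#endif

// True when the alias resolved to the size_t variant.
constexpr bool usesSizeT = std::is_same<NppBufSize, std::size_t>::value;
```

A variable declared as NppBufSize can then be passed by address to the buffer-size query on either toolkit version.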
@opencv-alalek this resolves the issue mentioned but still results in build errors because
I still have an issue #3728
@jiapei100 Your issue is related to
great job! thank you
asmorkalov
left a comment
The patch looks reasonable. I'm looking at #3690 to see if we can do something on the OpenCV side.
A slight API change of nppiMeanStdDevGetBufferHostSize_8u_C1R and nppiMeanStdDevGetBufferHostSize_32f_C1R in the NPP of CUDA 12 caused #3725. I will try to fix this. I found that the type of bufSize in reductions.cpp needs to be size_t instead of int, because the NPP header file nppi_statistics_functions.h changed the type of the second parameter from int* to size_t*. (nppi_statistics_functions.h 5392:5408)
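An alternative that avoids version macros is to let the compiler deduce the out-parameter type from the function signature. The sketch below uses stand-in types and functions (FakeStatus, FakeSize, oldGetSize, newGetSize and queryBufferSize are all hypothetical) rather than the real NPP declarations, so it compiles without CUDA. It is an illustration of the idea, not the approach taken in this PR.

```cpp
#include <cstddef>

// Stand-ins for the NPP status and ROI types; hypothetical, for illustration only.
enum FakeStatus { FAKE_SUCCESS = 0 };
struct FakeSize { int width; int height; };

// Stand-in for the older signature: second parameter is int*.
inline FakeStatus oldGetSize(FakeSize roi, int* bufSize)
{
    *bufSize = roi.width * roi.height;
    return FAKE_SUCCESS;
}

// Stand-in for the newer signature: second parameter is size_t*.
inline FakeStatus newGetSize(FakeSize roi, std::size_t* bufSize)
{
    *bufSize = static_cast<std::size_t>(roi.width) *
               static_cast<std::size_t>(roi.height);
    return FAKE_SUCCESS;
}

// SizeT is deduced from the function pointer, so one call site works with
// both signatures. Returns 0 if the query reports a failure.
template <typename SizeT>
std::size_t queryBufferSize(FakeStatus (*getSize)(FakeSize, SizeT*), FakeSize roi)
{
    SizeT bufSize = 0;
    if (getSize(roi, &bufSize) != FAKE_SUCCESS)
        return 0;
    return static_cast<std::size_t>(bufSize);
}
```

With this shape, queryBufferSize(oldGetSize, roi) and queryBufferSize(newGetSize, roi) both compile unchanged, whichever declaration the installed header provides.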
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.