
Enable cuda4dnn on hardware without support for __half #16218

Merged
alalek merged 5 commits into opencv:master from JulienMaille:cuda-dnn-for-older-gpus
Jan 15, 2020

Conversation

@JulienMaille
Contributor

@JulienMaille JulienMaille commented Dec 21, 2019

i.e. hardware with compute capability < 5.3

It compiles/links fine and I was able to run some inference on my GeForce 960!
Right now I limited support to CC 5.2+, but I suppose we can go lower. What do you think? 5.0? 4.0?

@YashasSamaga said

Some checks in dnn.cpp to identify use of DNN_TARGET_CUDA_FP16 when half precision is disabled.

But compute capability has to be queried at runtime on the selected device in order to tell if FP16 is supported or not, correct?

force_builders=Custom,docs
buildworker:Custom=linux-4
docker_image:Custom=ubuntu-cuda:18.04

build_image:Custom Mac=openvino-2019r3.0
build_image:Custom Win=openvino-2019r3.0
test_opencl:Custom Win=OFF
test_modules:Custom Mac=dnn,java,python3

@YashasSamaga
Contributor

YashasSamaga commented Dec 21, 2019

The DNN_TARGET_CUDA_FP16 option must exist in the enumeration irrespective of whether that target is supported; this is required to maintain ABI compatibility. Hence, a user who has built the module without FP16 support can still set the target to DNN_TARGET_CUDA_FP16. Since this target isn't supported on the device, there should be an error or a warning, followed by a fallback to DNN_TARGET_CUDA.
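As a sketch of that fallback behaviour (the enum values mirror the public DNN target enumeration, but the helper name and warning text are purely illustrative, not the actual dnn.cpp code):

```cpp
#include <iostream>

// Target constants mirroring the DNN enumeration (illustrative subset)
enum Target { DNN_TARGET_CUDA = 6, DNN_TARGET_CUDA_FP16 = 7 };

// If the user requests FP16 but the build/device lacks support,
// warn and fall back to the full-precision CUDA target.
inline Target resolveCudaTarget(Target requested, bool halfPrecisionSupported) {
    if (requested == DNN_TARGET_CUDA_FP16 && !halfPrecisionSupported) {
        std::cerr << "DNN_TARGET_CUDA_FP16 is not supported; "
                     "falling back to DNN_TARGET_CUDA\n";
        return DNN_TARGET_CUDA;
    }
    return requested;
}
```

With logic of this shape, requesting FP16 on an unsupported build would log a warning and silently degrade to the full-precision target instead of failing.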

The capability would be known at compile-time because you decide at compile-time whether the half precision kernels would be instantiated or not.

The CUDA backend would create computation nodes for each supported layer in initCUDABackend(). This would invoke initCUDA() method on every layer which will create and return a node. This node is constructed using a helper template called make_cuda_node which automatically instantiates the correct node template based on the target.

The backend nodes take the form given below where T is float or half depending on the target.

template <class T>
class SomeComputeNode : public CUDABackendNode;

Here is what make_cuda_node does:

/** @brief utility function which creates CUDA node of correct type from `targetId`
 *
 * CUDA operation nodes take the type of data they operate on as a template parameter.
 * For example, ConcatOp<float> is an operation node which concats tensors of `float` type
 * into a tensor of `float` type.
 *
 * This utility function aids the creation of nodes of different types and eliminates the
 * need for CUDA target constants (`DNN_TARGET_XXX`) to appear in the operation code, which
 * reduces coupling between modules.
 *
 * Example:
 * template <class T>
 * class ConcatOp : public CUDABackendNode;
 *
 * // returns a cv::Ptr to a ConcatOp<half> object
 * auto node = make_cuda_node<ConcatOp>(DNN_TARGET_CUDA_FP16, axis);
 *
 * // returns a cv::Ptr to a ConcatOp<float> object
 * auto node = make_cuda_node<ConcatOp>(DNN_TARGET_CUDA, axis);
 */
template <template <class> class NodeType, class ...Args>
cv::Ptr<BackendNode> make_cuda_node(int targetId, Args&& ...args) {
    switch (targetId)
    {
    case DNN_TARGET_CUDA_FP16:
        return Ptr<BackendNode>(new NodeType<half>(std::forward<Args>(args)...));
    case DNN_TARGET_CUDA:
        return Ptr<BackendNode>(new NodeType<float>(std::forward<Args>(args)...));
    default:
        CV_Assert(IS_DNN_CUDA_TARGET(targetId));
    }
    return Ptr<BackendNode>();
}

This will attempt to instantiate half backend nodes, which in turn will attempt to invoke half-precision kernels. Since the half-precision kernels were not instantiated, this should lead to a truckload of linker errors.

I am confused about how you managed to build. I'll check in a few hours.

@JulienMaille
Contributor Author

JulienMaille commented Dec 21, 2019

The capability would be known at compile-time because you decide at compile-time whether the half precision kernels would be instantiated or not.

I do not fully agree, see below.

I am confused how you have managed to build. I'll check in a few hours.

What I did is make sure nvcc (device code) doesn't compile the __half-related code when compiling for CC < 5.3.
gcc/msvc (host code) still compiles all the code handling __half.

I ran my CMake with CUDA_ARCH_BIN="5.2 5.3 6.0 6.1 7.0 7.5", so in the end I still support __half when it is available.

That's why I said you don't know at compile time whether FP16 will be supported; you need device info at runtime to resolve this.
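For reference, a CMake invocation along these lines produces such a multi-architecture build (WITH_CUDA, OPENCV_DNN_CUDA, and CUDA_ARCH_BIN are real OpenCV build options; the source path is illustrative):

```shell
# Build device binaries for CC 5.2 through 7.5; nvcc only instantiates the
# __half kernels for the architectures that actually support them (>= 5.3).
cmake -DWITH_CUDA=ON \
      -DOPENCV_DNN_CUDA=ON \
      -DCUDA_ARCH_BIN="5.2 5.3 6.0 6.1 7.0 7.5" \
      ../opencv
```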

@YashasSamaga
Contributor

Can you try with just 5.2 in CUDA_ARCH_BIN?

@JulienMaille
Contributor Author

Sure, what do you expect?

@JulienMaille
Contributor Author

Can you try with just 5.2 in CUDA_ARCH_BIN?

I just did, it compiles and works

@asmorkalov asmorkalov self-assigned this Dec 23, 2019
@asmorkalov asmorkalov added category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib category: dnn labels Dec 23, 2019
@JulienMaille JulienMaille force-pushed the cuda-dnn-for-older-gpus branch from 0fc2d14 to 349278e Compare December 29, 2019 10:41
@JulienMaille
Contributor Author

@asmorkalov I updated this PR to handle latest changes in .cu files. Let me know if and how I can help.

@YashasSamaga
Contributor

YashasSamaga commented Dec 29, 2019

There is a new transpose kernel in permute.cu which was added along with the copy kernel.

All the FP16 tests fail when there is no FP16 support (used CUDA_ARCH_BIN as 5.2). An example failure:

[ RUN      ] Test_ONNX_nets.ResNet50v1/1, where GetParam() = CUDA/CUDA_FP16
unknown file: Failure
C++ exception with description "OpenCV(4.2.0-dev) /FakePath/execution.hpp:52: error: (-217:Gpu API call) invalid device function in function 'make_policy'
" thrown in the test body.
[  FAILED  ] Test_ONNX_nets.ResNet50v1/1, where GetParam() = CUDA/CUDA_FP16 (271 ms)

These should be disabled at runtime if possible as they are not really failures.

@JulienMaille
Contributor Author

Do you know how to test compute capability at runtime?

@YashasSamaga
Contributor

YashasSamaga commented Dec 29, 2019

You might have to use cudaDeviceGetAttribute and obtain the attributes corresponding to cudaDevAttrComputeCapabilityMajor and cudaDevAttrComputeCapabilityMinor. You have to combine them as major.minor to get the compute capability.

You will need the cuda_runtime.h header, whose inclusion would have to be guarded by #ifdef HAVE_CUDA. You can get the device id on which the tests will run using cudaGetDevice.
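The decision itself is simple once the two attributes are in hand; a minimal sketch of that logic (the helper names are illustrative, and in a real build `major`/`minor` would come from `cudaDeviceGetAttribute` on the device returned by `cudaGetDevice`):

```cpp
// Combine the two queried attributes into a single comparable number,
// e.g. major 5, minor 3 -> 53.
constexpr int computeCapability(int major, int minor) {
    return major * 10 + minor;
}

// Native __half arithmetic requires compute capability 5.3 or higher.
constexpr bool supportsFp16(int major, int minor) {
    return computeCapability(major, minor) >= 53;
}
```

A check like this lets the tests (and target enumeration) exclude FP16 at runtime on, say, a CC 5.2 device even when the build itself includes the half-precision kernels.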

There is another issue which is unrelated to this PR: if you build for 7.5 only and try to run on a 6.1 GPU, all the CUDA tests would fail because there is no kernel PTX or binaries available for that GPU.

@JulienMaille
Contributor Author

In that case (built for 7.5 but ran on 6.x) can't you have just in time compilation?

@YashasSamaga
Contributor

YashasSamaga commented Dec 29, 2019

CUDA support in OpenCV provides two options:

  • you can build binaries for various architectures
  • you can generate PTX for various virtual architectures

The CUDA runtime generates the binary for the device from the PTX (generated by the compiler) at runtime. This generation incurs a cost at runtime but the generated binary is cached. If the binaries are pre-built at compile-time, you can avoid this initialization cost but this would increase the size of the binaries.

Currently, these are the only two mechanisms supported by the CUDA backend. Runtime compilation is something I have planned for the future (other ideas can be found here). It's quite complex and non-trivial to implement JIT, especially with the current template-based kernels.

@JulienMaille
Contributor Author

JulienMaille commented Dec 29, 2019

All the FP16 tests fail when there is no FP16 support (used CUDA_ARCH_BIN as 5.2). An example failure:
These should be disabled at runtime if possible as they are not really failures.

Can you try again with the latest commit? BTW I must be stupid, but I can't find how to compile and run the tests. I replaced -DBUILD_TESTS=OFF with -DBUILD_TESTS=ON but I don't see them in the generated solution.

@alalek
Member

alalek commented Dec 29, 2019

--D_BUILD_TESTS=ON
+-DBUILD_TESTS=ON

@JulienMaille
Contributor Author

JulienMaille commented Dec 29, 2019

@alalek sorry this is just a typo in my comment, but not in my command line.
When I build RUN_TESTS I get

No tests were found!!!

@alalek
Member

alalek commented Dec 29, 2019

Build opencv_test_dnn target.
Run ./bin/opencv_test_dnn binary.
Also you should specify environment variables for tests:

  • OPENCV_TEST_DATA_PATH=<opencv_extra>/testdata (clone "opencv_extra" repository)
  • and optionally OPENCV_DNN_TEST_DATA_PATH (need to download 5+Gb)

@YashasSamaga
Contributor

YashasSamaga commented Dec 30, 2019

@Nefast You need to have the ts module (which in turn requires the videoio module) to run the tests. I have often gotten "No tests were found" when I forgot to enable these modules.

@JulienMaille
Contributor Author

I confirm the runtime check works and doesn't show the FP16 target on my GeForce 960.

@JulienMaille
Contributor Author

@YashasSamaga Do you confirm you now pass the tests?
I've been looking at the code and there's a lot of stuff I don't understand like this:

CV_TEST_TAG_DNN_SKIP_CUDA, CV_TEST_TAG_DNN_SKIP_CUDA_FP32, CV_TEST_TAG_DNN_SKIP_CUDA_FP16

@YashasSamaga
Contributor

YashasSamaga commented Jan 4, 2020

@JulienMaille I don't own a device with CC 5.2 or below (back at college and I don't have one here). I have to borrow it from someone. I think it's sufficient if you could upload the output of opencv_test_dnn and opencv_perf_dnn.

Those are tags which are used to mark the tests. They are specifically skip tags which cause the tests which are marked with any of them to be skipped.

@JulienMaille
Contributor Author

@YashasSamaga What I don't understand about the code I've linked is that it looks like, if CUDA is present, we set the flag to skip the CUDA tests.

@YashasSamaga
Contributor

@JulienMaille Registering skip tags and applying skip tags are different. Applying a skip tag is what causes the test to be skipped.
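A toy model of that two-step mechanism (an illustrative sketch only, not the actual `ts` module API):

```cpp
#include <set>
#include <string>

// Tags registered for this test run; registering alone skips nothing.
static std::set<std::string> registeredSkipTags;

void registerSkipTag(const std::string& tag) {
    registeredSkipTags.insert(tag);
}

// A test that applies a tag is skipped only if that tag was registered.
bool shouldSkip(const std::string& appliedTag) {
    return registeredSkipTags.count(appliedTag) != 0;
}
```

So seeing a tag such as CV_TEST_TAG_DNN_SKIP_CUDA_FP16 being registered when CUDA is present only makes skipping *possible*; an individual test is skipped only when it applies that tag itself.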

@YashasSamaga
Contributor

YashasSamaga commented Jan 8, 2020

@JulienMaille Please rebase onto master. #16230 added new half-precision kernels. These kernels are used in FP16 target only.

Need a check for the transpose kernel here:

template void transpose(const Stream&, Span<__half>, View<__half>, std::size_t, std::size_t);
template void transpose(const Stream&, Span<float>, View<float>, std::size_t, std::size_t);

@JulienMaille JulienMaille force-pushed the cuda-dnn-for-older-gpus branch from c8419ff to 6750ab6 Compare January 8, 2020 20:04
@JulienMaille
Contributor Author

Done, rebasing was enough. Do you think this can be merged soon?

@asmorkalov
Contributor

@JulienMaille CI bot reports build error:

/build/precommit_linux64/opencv/modules/dnn/src/dnn.cpp:138:2: error: #endif without #if
 #endif

@JulienMaille JulienMaille force-pushed the cuda-dnn-for-older-gpus branch from 78edbd0 to 4b5340d Compare January 10, 2020 10:54
@asmorkalov
Contributor

@JulienMaille JulienMaille force-pushed the cuda-dnn-for-older-gpus branch from 4b5340d to f0df8ce Compare January 10, 2020 14:16
@JulienMaille
Contributor Author

@asmorkalov Forgot to remove an extra #endif, it is fixed now

@JulienMaille
Contributor Author

@YashasSamaga Probably a stupid question, but does the cudnn module rely on cublas? (I'm really surprised by the size of the DLLs that have to be redistributed, and I'm trying to squeeze out anything useless.)

@YashasSamaga
Contributor

@JulienMaille cuDNN does not require cuBLAS but the CUDA backend requires cuBLAS for GEMM.

@asmorkalov
Contributor

👍

@JulienMaille JulienMaille force-pushed the cuda-dnn-for-older-gpus branch 2 times, most recently from d5fb32e to 9c24ca2 Compare January 14, 2020 21:16
@asmorkalov
Contributor

👍 @alalek Please take a look and merge.

Member

@alalek alalek left a comment


Looks good to me 👍

@alalek alalek merged commit 4e2ef8c into opencv:master Jan 15, 2020
@JulienMaille JulienMaille deleted the cuda-dnn-for-older-gpus branch January 15, 2020 16:20
@jiapei100

jiapei100 commented Jan 15, 2020

Clearly, there is one trivial mistake; I cannot open a PR for now. But please refer to: https://github.com/jiapei100/opencv/blob/master/modules/dnn/src/cuda/math.hpp, around line 135.

@JulienMaille JulienMaille restored the cuda-dnn-for-older-gpus branch January 15, 2020 20:03
@JulienMaille
Contributor Author

@alalek jiapei100 is right, the #endif was including a float operation.
Correction here, shall I rebase and create a new PR?
JulienMaille@3d3ed03
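The shape of the fix can be sketched like this (the names and guard macro are illustrative stand-ins for the `__CUDA_ARCH__ >= 530` check in math.hpp; the point is that the float overload must sit outside the guard):

```cpp
// Stand-in for the real device-side check: #if __CUDA_ARCH__ >= 530
#define HAS_NATIVE_HALF 0

#if HAS_NATIVE_HALF
// Half-precision overload: compiled only where __half is supported.
half clampValue(half x, half lo, half hi);
#endif

// The float overload must stay OUTSIDE the guard; the misplaced #endif
// was accidentally hiding a float operation on older architectures.
inline float clampValue(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}
```

With the guard drawn too wide, the float overload disappears on CC < 5.3 builds and every caller of it fails to compile or link.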

@alalek
Member

alalek commented Jan 16, 2020

@JulienMaille Sure! Feel free to prepare new PR with fix (add relates #16218 into description to add GitHub cross-link).

@sl1pkn07

sl1pkn07 commented Feb 13, 2020

please backport (or forward-port) to the opencv 4.2 tag

greetings

for example, my hardware setup doesn't have the CUDA 5.3 feature:

--     NVIDIA GPU arch:             30 35 37 50 52 60 61 70 75

(NVIDIA RTX 2060, CUDA 10.2)

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
…gpus

Enable cuda4dnn on hardware without support for __half

* Enable cuda4dnn on hardware without support for half (i.e. compute capability < 5.3)

Update CMakeLists.txt

Lowered minimum CC to 3.0

* UPD: added ifdef on new copy kernel

* added fp16 support detection at runtime

* Clarified #if condition on atomicAdd definition

* More explicit CMake error message


6 participants