DNN: Let part of the operators in nary_eltwise support CUDA #22478

Merged
asmorkalov merged 1 commit into opencv:4.x from WanliZhong:nary_eltwise_cuda
Nov 22, 2022

Conversation

@WanliZhong
Member

@WanliZhong WanliZhong commented Sep 6, 2022

This PR lets MAX, MIN, SUM, PRODUCT and DIV operators without broadcast support CUDA.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable
    The patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
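To make "without broadcast" concrete: all inputs to the layer must share the same shape, so each output element depends only on the elements at the same index in the inputs. A minimal host-side sketch of that contract is below; the actual PR launches a CUDA kernel, and the `EltwiseOp`/`eltwise` names here are illustrative, not the OpenCV API.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical host-side sketch of the broadcast-free element-wise ops the
// PR moves to CUDA (MAX, MIN, SUM, PRODUCT, DIV). Inputs must already have
// identical shapes; mismatched shapes are rejected instead of broadcast.
enum class EltwiseOp { MAX, MIN, SUM, PRODUCT, DIV };

std::vector<float> eltwise(EltwiseOp op,
                           const std::vector<float>& a,
                           const std::vector<float>& b)
{
    if (a.size() != b.size())
        throw std::invalid_argument("broadcast is not supported: shapes must match");
    std::vector<float> out(a.size());
    for (size_t i = 0; i < a.size(); ++i)
    {
        switch (op)
        {
        case EltwiseOp::MAX:     out[i] = std::max(a[i], b[i]); break;
        case EltwiseOp::MIN:     out[i] = std::min(a[i], b[i]); break;
        case EltwiseOp::SUM:     out[i] = a[i] + b[i]; break;
        case EltwiseOp::PRODUCT: out[i] = a[i] * b[i]; break;
        case EltwiseOp::DIV:     out[i] = a[i] / b[i]; break;
        }
    }
    return out;
}
```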

@WanliZhong WanliZhong changed the title Let part of the operators in nary_eltwise support CUDA DNN: Let part of the operators in nary_eltwise support CUDA Sep 6, 2022
@asmorkalov asmorkalov added category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib pr: needs test New functionality requires minimal tests set labels Sep 6, 2022
@asmorkalov
Contributor

@WanliZhong Thanks for the contribution. Please enable related tests for CuDNN backend too. They are skipped for now.

@WanliZhong
Member Author

This work is incomplete; inputs of different shapes cannot be handled yet.

@WanliZhong WanliZhong changed the title DNN: Let part of the operators in nary_eltwise support CUDA DNN: Let part of the operators in nary_eltwise support CUDA [WIP] Sep 7, 2022
@fengyuentau fengyuentau self-requested a review September 7, 2022 13:29
@asmorkalov
Contributor

@WanliZhong Friendly reminder about tests.

@WanliZhong WanliZhong changed the title DNN: Let part of the operators in nary_eltwise support CUDA [WIP] DNN: Let part of the operators in nary_eltwise support CUDA Sep 13, 2022
@asmorkalov asmorkalov requested a review from rogday September 13, 2022 07:58
@asmorkalov asmorkalov removed the pr: needs test New functionality requires minimal tests set label Sep 13, 2022
@asmorkalov
Contributor

@fengyuentau @rogday please take a look.

@asmorkalov
Contributor

@WanliZhong The PR fails a lot of tests with CUDA. For example, on my 1080 with CUDA 10.2 I see:

[  FAILED  ] 70 tests, listed below:
[  FAILED  ] Test_Model.Keypoints_pose/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_Model.Keypoints_pose/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_Model.TextDetectionByDB/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_Model.TextDetectionByDB/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA)
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA_FP16, where GetParam() = (test_add, CUDA/CUDA_FP16)
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_sub_CUDA_CUDA, where GetParam() = (test_sub, CUDA/CUDA)
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_sub_CUDA_CUDA_FP16, where GetParam() = (test_sub, CUDA/CUDA_FP16)
[  FAILED  ] Test_ONNX_layers.Shape/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Shape/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Scale/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Scale/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Power/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Power/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Elementwise_Log/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Elementwise_Log/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Elementwise_not/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Elementwise_not/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Compare_EQ/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Compare_EQ/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Compare_GT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Compare_GT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Compare_LT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Compare_LT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.CompareSameDims_EQ/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.CompareSameDims_EQ/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.CompareSameDims_GT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.CompareSameDims_GT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.CompareSameDims_LT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.CompareSameDims_LT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Eltwise3D/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Eltwise3D/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.MatMulAdd/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.MatMulAdd/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.MultyInputs/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.MultyInputs/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Broadcast/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Broadcast/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.DynamicResize/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.DynamicResize/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Div/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Div/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.DynamicReshape/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.DynamicReshape/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Split/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Split/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.MatmulWithTwoInputs/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.MatmulWithTwoInputs/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.GatherMultiOutput/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.GatherMultiOutput/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.SubFromConst/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.SubFromConst/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.DivConst/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.DivConst/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.OutputRegistration/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.OutputRegistration/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.ResNet18v1/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.ResNet18v1/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.ResNet50v1/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.ResNet50v1/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.MobileNet_v2/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.MobileNet_v2/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.LResNet100E_IR/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.LResNet100E_IR/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.Inception_v2/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.Inception_v2/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.DenseNet121/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.DenseNet121/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.Resnet34_kinetics/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.Resnet34_kinetics/1, where GetParam() = CUDA/CUDA_FP16

In particular:

Note: Google Test filter = Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_ONNX_conformance
[ RUN      ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA)
Exception during net.forward() call!
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/nary_eltwise_layers.cpp:674: error: (-213:The function/feature is not implemented) Other operators except MAX, MIN, SUM, PRODUCT and DIV are not supported with cuda. in function 'operator()'
" thrown in the test body.
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA) (772 ms)
[----------] 1 test from Test_ONNX_conformance (773 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (773 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA)
  1. If the layer is not implemented, the test should be disabled or skipped with a not-implemented status.
  2. The missing layers should not break existing tests for full-featured networks like ResNet.
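The error in the log ("Other operators except MAX, MIN, SUM, PRODUCT and DIV are not supported with cuda") suggests one way to act on this feedback: detect support up front and skip the case rather than let the exception surface as a failure. A minimal sketch, with illustrative names (`cudaSupportsOp`, `runOrSkip` are not the OpenCV or GoogleTest API):

```cpp
#include <set>
#include <string>

// Hypothetical support check mirroring the operators the PR implements.
bool cudaSupportsOp(const std::string& op)
{
    static const std::set<std::string> supported{
        "MAX", "MIN", "SUM", "PRODUCT", "DIV"};
    return supported.count(op) != 0;
}

// Test-side pattern: mark unsupported cases as skipped instead of failed.
// In a real GoogleTest body this branch would call GTEST_SKIP() or apply
// a not-implemented tag rather than return a string.
std::string runOrSkip(const std::string& op)
{
    if (!cudaSupportsOp(op))
        return "SKIPPED";
    return "RAN";
}
```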

@WanliZhong WanliZhong marked this pull request as draft September 27, 2022 08:22
@asmorkalov
Contributor

@WanliZhong friendly reminder.

1 similar comment
@asmorkalov
Contributor

@WanliZhong friendly reminder.

@asmorkalov
Contributor

Also please rebase your PR branch onto current 4.x. You'll get automated CuDNN testing in CI (new GitHub Actions job).

@WanliZhong
Member Author

WanliZhong commented Oct 17, 2022 via email

@WanliZhong WanliZhong force-pushed the nary_eltwise_cuda branch 3 times, most recently from b78b76e to 00d75d6 Compare October 21, 2022 05:11
@WanliZhong
Member Author

I don't know if there is a way to get the shape of the input in supportBackend(). Otherwise, I have to turn off some tests. If we could get the input shape there, we could make the backend fall back to the CPU when some nodes are not supported.

Comment on lines +654 to +659
Ptr<BackendNode> initCUDA(
void *context_,
const std::vector<Ptr<BackendWrapper>>& inputs,
const std::vector<Ptr<BackendWrapper>>& outputs
) override
{
Member


I don't know if there is some way to get the shape of the input in supportBackend().

You can do this check in initCUDA. Take a look at CudaBackendWrapper, it should have the host mat and you can get the shape of input from the host mat.

Member Author


I do use it, but initCUDA is called after supportBackend, so it doesn't fall back to the CPU backend.
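The shape check being discussed could still live in initCUDA, where the input shapes are known (e.g. via the backend wrapper's host mat), even though that is too late to trigger the CPU fallback that supportBackend would provide. A minimal sketch of such a check, with illustrative types and names (`Shape`, `checkNoBroadcast` are not the OpenCV API):

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical initCUDA-time check: reject inputs that would require
// broadcasting, since only equal-shaped inputs are supported on CUDA.
// supportBackend() runs earlier, before shapes are available, which is
// why it cannot make this decision by itself.
using Shape = std::vector<int>;

void checkNoBroadcast(const std::vector<Shape>& inputShapes)
{
    for (size_t i = 1; i < inputShapes.size(); ++i)
        if (inputShapes[i] != inputShapes[0])
            throw std::runtime_error(
                "inputs of different shapes are not supported with CUDA");
}
```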

@WanliZhong
Member Author

Sorry for the late update.

@WanliZhong WanliZhong marked this pull request as ready for review November 2, 2022 07:51
Member

@rogday rogday left a comment


LGTM! 👍

@asmorkalov asmorkalov merged commit 6ca205a into opencv:4.x Nov 22, 2022
@alalek alalek mentioned this pull request Jan 8, 2023
@WanliZhong WanliZhong deleted the nary_eltwise_cuda branch May 16, 2023 12:33