DNN: Let part of the operators in nary_eltwise support CUDA #22478

Merged
asmorkalov merged 1 commit into opencv:4.x from WanliZhong:nary_eltwise_cuda
Nov 22, 2022

Conversation

@WanliZhong
Member

@WanliZhong WanliZhong commented Sep 6, 2022

This PR lets MAX, MIN, SUM, PRODUCT and DIV operators without broadcast support CUDA.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable
    The patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
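To make "without broadcast" concrete: all inputs to the layer must share the same shape, so each output element depends only on the elements at the same index in the inputs. A minimal host-side sketch of that contract is below; the actual PR launches a CUDA kernel, and the `EltwiseOp`/`eltwise` names here are illustrative, not the OpenCV API.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical host-side sketch of the broadcast-free element-wise ops the
// PR moves to CUDA (MAX, MIN, SUM, PRODUCT, DIV). Inputs must already have
// identical shapes; mismatched shapes are rejected instead of broadcast.
enum class EltwiseOp { MAX, MIN, SUM, PRODUCT, DIV };

std::vector<float> eltwise(EltwiseOp op,
                           const std::vector<float>& a,
                           const std::vector<float>& b)
{
    if (a.size() != b.size())
        throw std::invalid_argument("broadcast is not supported: shapes must match");
    std::vector<float> out(a.size());
    for (size_t i = 0; i < a.size(); ++i)
    {
        switch (op)
        {
        case EltwiseOp::MAX:     out[i] = std::max(a[i], b[i]); break;
        case EltwiseOp::MIN:     out[i] = std::min(a[i], b[i]); break;
        case EltwiseOp::SUM:     out[i] = a[i] + b[i]; break;
        case EltwiseOp::PRODUCT: out[i] = a[i] * b[i]; break;
        case EltwiseOp::DIV:     out[i] = a[i] / b[i]; break;
        }
    }
    return out;
}
```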

@WanliZhong WanliZhong changed the title Let part of the operators in nary_eltwise support CUDA DNN: Let part of the operators in nary_eltwise support CUDA Sep 6, 2022
@asmorkalov asmorkalov added category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib pr: needs test New functionality requires minimal tests set labels Sep 6, 2022
@asmorkalov
Contributor

@WanliZhong Thanks for the contribution. Please enable related tests for CuDNN backend too. They are skipped for now.

@WanliZhong
Member Author

This work is incomplete; inputs of different shapes cannot be handled yet.

@WanliZhong WanliZhong changed the title DNN: Let part of the operators in nary_eltwise support CUDA DNN: Let part of the operators in nary_eltwise support CUDA [WIP] Sep 7, 2022
@fengyuentau fengyuentau self-requested a review September 7, 2022 13:29
@asmorkalov
Contributor

@WanliZhong Friendly reminder about tests.

@WanliZhong WanliZhong changed the title DNN: Let part of the operators in nary_eltwise support CUDA [WIP] DNN: Let part of the operators in nary_eltwise support CUDA Sep 13, 2022
@asmorkalov asmorkalov requested a review from rogday September 13, 2022 07:58
@asmorkalov asmorkalov removed the pr: needs test New functionality requires minimal tests set label Sep 13, 2022
@asmorkalov
Contributor

@fengyuentau @rogday please take a look.

@asmorkalov
Contributor

@WanliZhong The PR fails a lot of tests with CUDA. For example, on my 1080 with CUDA 10.2 I see:

[  FAILED  ] 70 tests, listed below:
[  FAILED  ] Test_Model.Keypoints_pose/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_Model.Keypoints_pose/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_Model.TextDetectionByDB/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_Model.TextDetectionByDB/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA)
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA_FP16, where GetParam() = (test_add, CUDA/CUDA_FP16)
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_sub_CUDA_CUDA, where GetParam() = (test_sub, CUDA/CUDA)
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_sub_CUDA_CUDA_FP16, where GetParam() = (test_sub, CUDA/CUDA_FP16)
[  FAILED  ] Test_ONNX_layers.Shape/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Shape/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Scale/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Scale/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Power/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Power/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Elementwise_Log/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Elementwise_Log/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Elementwise_not/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Elementwise_not/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Compare_EQ/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Compare_EQ/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Compare_GT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Compare_GT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Compare_LT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Compare_LT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.CompareSameDims_EQ/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.CompareSameDims_EQ/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.CompareSameDims_GT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.CompareSameDims_GT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.CompareSameDims_LT/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.CompareSameDims_LT/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Eltwise3D/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Eltwise3D/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.MatMulAdd/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.MatMulAdd/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.MultyInputs/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.MultyInputs/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Broadcast/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Broadcast/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.DynamicResize/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.DynamicResize/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Div/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Div/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.DynamicReshape/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.DynamicReshape/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.Split/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.Split/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.MatmulWithTwoInputs/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.MatmulWithTwoInputs/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.GatherMultiOutput/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.GatherMultiOutput/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.SubFromConst/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.SubFromConst/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.DivConst/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.DivConst/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_layers.OutputRegistration/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_layers.OutputRegistration/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.ResNet18v1/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.ResNet18v1/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.ResNet50v1/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.ResNet50v1/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.MobileNet_v2/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.MobileNet_v2/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.LResNet100E_IR/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.LResNet100E_IR/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.Inception_v2/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.Inception_v2/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.DenseNet121/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.DenseNet121/1, where GetParam() = CUDA/CUDA_FP16
[  FAILED  ] Test_ONNX_nets.Resnet34_kinetics/0, where GetParam() = CUDA/CUDA
[  FAILED  ] Test_ONNX_nets.Resnet34_kinetics/1, where GetParam() = CUDA/CUDA_FP16

In particular:

Note: Google Test filter = Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_ONNX_conformance
[ RUN      ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA)
Exception during net.forward() call!
unknown file: Failure
C++ exception with description "OpenCV(4.6.0-dev) /mnt/projects/Projects/OpenCV/opencv-master/modules/dnn/src/layers/nary_eltwise_layers.cpp:674: error: (-213:The function/feature is not implemented) Other operators except MAX, MIN, SUM, PRODUCT and DIV are not supported with cuda. in function 'operator()'
" thrown in the test body.
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA) (772 ms)
[----------] 1 test from Test_ONNX_conformance (773 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (773 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Test_ONNX_conformance.Layer_Test/test_add_CUDA_CUDA, where GetParam() = (test_add, CUDA/CUDA)
  1. If the layer is not implemented, the test should be disabled or skipped with a not-implemented status.
  2. The missing layers should not break existing tests for full-featured networks like ResNet.
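The error in the log ("Other operators except MAX, MIN, SUM, PRODUCT and DIV are not supported with cuda") suggests one way to act on this feedback: detect support up front and skip the case rather than let the exception surface as a failure. A minimal sketch, with illustrative names (`cudaSupportsOp`, `runOrSkip` are not the OpenCV or GoogleTest API):

```cpp
#include <set>
#include <string>

// Hypothetical support check mirroring the operators the PR implements.
bool cudaSupportsOp(const std::string& op)
{
    static const std::set<std::string> supported{
        "MAX", "MIN", "SUM", "PRODUCT", "DIV"};
    return supported.count(op) != 0;
}

// Test-side pattern: mark unsupported cases as skipped instead of failed.
// In a real GoogleTest body this branch would call GTEST_SKIP() or apply
// a not-implemented tag rather than return a string.
std::string runOrSkip(const std::string& op)
{
    if (!cudaSupportsOp(op))
        return "SKIPPED";
    return "RAN";
}
```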

@WanliZhong WanliZhong marked this pull request as draft September 27, 2022 08:22
@asmorkalov
Contributor

@WanliZhong friendly reminder.

1 similar comment
@asmorkalov
Contributor

@WanliZhong friendly reminder.

@asmorkalov
Contributor

Also please rebase your PR branch onto current 4.x. You'll get automated CuDNN testing in CI (new GitHub Actions job).

@WanliZhong
Member Author

WanliZhong commented Oct 17, 2022 via email

@WanliZhong WanliZhong force-pushed the nary_eltwise_cuda branch 3 times, most recently from b78b76e to 00d75d6 Compare October 21, 2022 05:11
@WanliZhong
Member Author

I don't know if there is a way to get the shape of the input in supportBackend(). Otherwise, I have to turn off some tests. If we could get the input shape there, we could make the backend fall back to the CPU when some nodes are not supported.

Comment on lines +654 to +659
Ptr<BackendNode> initCUDA(
void *context_,
const std::vector<Ptr<BackendWrapper>>& inputs,
const std::vector<Ptr<BackendWrapper>>& outputs
) override
{
Member


I don't know if there is some way to get the shape of the input in supportBackend().

You can do this check in initCUDA. Take a look at CudaBackendWrapper, it should have the host mat and you can get the shape of input from the host mat.

Member Author


I do use it, but initCUDA is called after supportBackend, so it doesn't fall back to the CPU backend.
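The shape check being discussed could still live in initCUDA, where the input shapes are known (e.g. via the backend wrapper's host mat), even though that is too late to trigger the CPU fallback that supportBackend would provide. A minimal sketch of such a check, with illustrative types and names (`Shape`, `checkNoBroadcast` are not the OpenCV API):

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical initCUDA-time check: reject inputs that would require
// broadcasting, since only equal-shaped inputs are supported on CUDA.
// supportBackend() runs earlier, before shapes are available, which is
// why it cannot make this decision by itself.
using Shape = std::vector<int>;

void checkNoBroadcast(const std::vector<Shape>& inputShapes)
{
    for (size_t i = 1; i < inputShapes.size(); ++i)
        if (inputShapes[i] != inputShapes[0])
            throw std::runtime_error(
                "inputs of different shapes are not supported with CUDA");
}
```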

@WanliZhong
Member Author

Sorry for the late update.

@WanliZhong WanliZhong marked this pull request as ready for review November 2, 2022 07:51
Member

@rogday rogday left a comment


LGTM! 👍

@asmorkalov asmorkalov merged commit 6ca205a into opencv:4.x Nov 22, 2022
@alalek alalek mentioned this pull request Jan 8, 2023
@WanliZhong WanliZhong deleted the nary_eltwise_cuda branch May 16, 2023 12:33