
Remove exception for NaryEltwise #24786

Closed

Abdurrahheem wants to merge 3 commits into opencv:4.x from Abdurrahheem:ash/cuda_concat

Conversation

@Abdurrahheem
Contributor

@Abdurrahheem Abdurrahheem commented Dec 27, 2023

This is an experimental PR to check what fails if the exception for NaryEltwise is removed. This PR is based on a suggestion made in #24721.

Data for this PR is located in 1136

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@Abdurrahheem
Contributor Author

Abdurrahheem commented Dec 27, 2023

@dkurt it seems that simply removing this exception works just fine.

@zihaomu zihaomu requested a review from WanliZhong December 27, 2023 23:36
@zihaomu
Member

zihaomu commented Dec 27, 2023

Hi @Abdurrahheem, this solution looks good to me. Concat fusion is used much less frequently than Conv fusion, so turning off NaryEltwise fusion in such cases has little impact on inference speed.

The following is my earlier reply, in which I misunderstood the change; please ignore it.
This change will remove the support of NaryEltwise on all CUDA devices, which will affect the inference speed. A long-term solution to this issue is needed. @WanliZhong please take a look.

Member

@zihaomu zihaomu left a comment


Good short-term solution. 👍

@dkurt
Member

dkurt commented Dec 28, 2023

@Abdurrahheem, we need a test for it.

@dkurt dkurt added the pr: needs test New functionality requires minimal tests set label Dec 28, 2023
@asmorkalov asmorkalov added this to the 4.10.0 milestone Dec 28, 2023
@fengyuentau
Member

This bug was originally introduced in #23255.

This PR can also resolve #23977 and #24606.

ASSERT_FALSE(net_cuda.empty());

net.setPreferableBackend(backend);
net.setPreferableTarget(target);
Member


Here net also runs on different backends. I think we want it to run only on the CPU backend and compare its results against those from the CUDA backend, right? If so, I propose moving it to test_backends.cpp.

Member


Agree. And there is no need for reference input/output data. Let's run on CPU and compare with the backend.


asmorkalov pushed a commit that referenced this pull request Jan 11, 2024
dnn (cuda): support broadcasting if a.rank() != b.rank() #24834

Inspired by #24786. This PR keeps the fusion of `NaryEltwise` and `Concat` while addressing the missing-data problem by supporting broadcasting if a.rank() != b.rank().

Resolves #23977
Resolves #24606
Resolves #24635
Resolves #24721 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
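The broadcasting fix referenced above follows the usual NumPy-style rule: when two operands have different ranks, the lower-rank shape is padded with leading 1s before the dimensions are matched pairwise. A minimal sketch of that rule (illustrative only; `broadcast_shapes` is a hypothetical helper, not OpenCV's actual CUDA implementation):

```python
def broadcast_shapes(a, b):
    """Compute the broadcast output shape of two shapes a and b,
    NumPy-style: align ranks by prepending 1s, then match dims."""
    a, b = tuple(a), tuple(b)
    # Align ranks by prepending 1s to the shorter shape
    if len(a) < len(b):
        a = (1,) * (len(b) - len(a)) + a
    else:
        b = (1,) * (len(a) - len(b)) + b
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"incompatible dimensions: {da} vs {db}")
    return tuple(out)
```

For example, a shape `(2, 3, 4)` broadcast against `(4,)` aligns to `(1, 1, 4)` and yields `(2, 3, 4)`, which is the rank-mismatch case the CPU fallback used to handle.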
@Abdurrahheem
Contributor Author

Abdurrahheem commented Jan 11, 2024

This PR is no longer needed as #24834 has covered the issue.

@asmorkalov asmorkalov reopened this Jan 11, 2024
@asmorkalov
Contributor

@Abdurrahheem please keep it open until the discussion with Dmitry.

@asmorkalov asmorkalov removed the pr: needs test New functionality requires minimal tests set label Jan 12, 2024
@asmorkalov asmorkalov marked this pull request as ready for review January 12, 2024 09:04
inp_i_data->layerInstance->type != "Reorg" &&
inp_i_data->layerInstance->type != "Eltwise" &&
inp_i_data->layerInstance->type != "NaryEltwise" &&
// inp_i_data->layerInstance->type != "NaryEltwise" && // link to the issue: https://github.com/opencv/opencv/issues/24721
Member


@dkurt I guess they want your comment on this change.

In my opinion, we do not need to comment out this line, as #24834 has mostly resolved the issue. Previously, NaryEltwise and Concat were fused regardless of the CPU fallback in NaryEltwise, so the output of Concat had data both on the host and on the device, leading to the missing data reported in the related issues. The CPU fallback happened when broadcasting or an operation was not supported. Broadcast support is now done, leaving only the risk of an unsupported operation. However, that should also be fine, since the common operations (add, sub, mul, div) have been supported and we see no performance regression from disabling this fusion.

Member


@fengyuentau, yes, I understand this, but this PR, before #24834 was merged, showed that all the tests passed even with broadcasting disabled, so an exclusion like this is redundant.

Member


> shows that even with disabled broadcast all the tests passed

All tests passed because:

  • there is no naryeltwise + concat model case in the tests before this PR,
  • this PR added the test case but commented out the naryeltwise + concat fusion.

@dkurt
Member

dkurt commented Jan 18, 2024

As the original issue was resolved by a different PR, I'd like to close this one and suggest refactoring the CUDA part later. @fengyuentau rightly pointed out that benchmarking is required before touching this sensitive part of the code. @Abdurrahheem, thanks!

@dkurt dkurt closed this Jan 18, 2024