dnn (cuda): support broadcasting if a.rank() != b.rank() #24834
asmorkalov merged 5 commits into opencv:4.x from
|
I tried adding yolov8n to the tests on different backends, but it turns out we may have more problems, especially with the CUDA_FP16 target: |
|
@fengyuentau once this PR is complete (currently yolov8 is not supported on CUDA here, AFAIK), does it mean that PR #24786 is going to be obsolete? |
Yes.
It's not true. There are some minor differences in the results between CPU and CUDA/CUDA, which is OK, I think, but the differences are much bigger when it comes to the CUDA_FP16 target. I guess we lose some accuracy in |
|
Locally I observe several test failures like this: Full list: |
|
This is because these failing tests have inputs of shape [1] (1D Mat). In the CUDA backend, there are asserts checking It worked previously because it was not actually testing the CUDA backend: if two inputs have different dimensions, the layer falls back to the CPU implementation, so in these cases nothing related to the CUDA backend is tested. See below for the fallback (lines 804-805): opencv/modules/dnn/src/layers/nary_eltwise_layers.cpp Lines 800 to 811 in 5c9ad9d With that being said, I propose turning off these tests specifically for the CUDA backend. @asmorkalov What do you think? @WanliZhong Please join this discussion as well. |
Or we could still fall back to the CPU when the dimension is 1. |
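For readers following along, the fallback behaviour described above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not the code in nary_eltwise_layers.cpp: the function name `pick_backend` and the string labels are hypothetical, and the only rule modelled is "inputs with different ranks are handled on the CPU".

```python
def pick_backend(input_shapes, requested="cuda"):
    """Hypothetical sketch: decide where an element-wise layer would run.

    Models only the rule quoted in the thread: when the inputs do not all
    have the same number of dimensions, the CUDA request is dropped and
    the CPU implementation is used instead.
    """
    ranks = {len(shape) for shape in input_shapes}
    if requested == "cuda" and len(ranks) > 1:
        return "cpu"  # mixed ranks: silently fall back
    return requested


# A test with a [1]-shaped input next to a 3D input never reaches CUDA here.
mixed = pick_backend([[2, 3, 4], [1]])
same = pick_backend([[2, 3, 4], [2, 3, 4]])
```

Under this rule a test that mixes a 1D input with a higher-rank input reports a pass on "CUDA" while actually exercising the CPU path, which is the situation described in the comment above.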
|
I propose falling back when dim is 1, so that CUDA runs correctly rather than throwing an error
That does not work, because the 1D Mat is actually produced during broadcasting inside the CUDA backend itself. Let me find another solution to this. |
|
New commits should resolve this problem. |
|
Tests pass with CUDA locally now. |
|
Sporadic crash in |
Inspired by #24786. This PR keeps the fusion of NaryEltwise and Concat while addressing the data missing problem by supporting broadcasting if a.rank() != b.rank().
Resolves #23977
Resolves #24606
Resolves #24635
Resolves #24721
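As a rough illustration of what broadcasting with a.rank() != b.rank() involves, the sketch below pads the lower-rank shape with leading 1s and then applies the usual broadcasting rule per dimension. It is plain Python written for this description, not the CUDA implementation in the PR; the helper names `align_ranks` and `broadcast_shape` and the example shapes are assumptions chosen for illustration.

```python
def align_ranks(shape_a, shape_b):
    """Pad the lower-rank shape with leading 1s so both have equal rank."""
    rank = max(len(shape_a), len(shape_b))

    def pad(shape):
        return [1] * (rank - len(shape)) + list(shape)

    return pad(shape_a), pad(shape_b)


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast output shape, or raise ValueError if incompatible."""
    a, b = align_ranks(shape_a, shape_b)
    out = []
    for da, db in zip(a, b):
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # A dimension of size 1 stretches to match the other operand.
        out.append(db if da == 1 else da)
    return out


# Arbitrary example: a 3D tensor combined with a 1D tensor of shape [1].
aligned = align_ranks([2, 3, 4], [1])
result = broadcast_shape([2, 3, 4], [1])
```

Once both shapes have the same rank, an element-wise kernel only needs to handle the equal-rank case, which is one plausible reason to align ranks first.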
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.