cuda4dnn(Eltwise, Power): fix eltwise fusion, enable more eltwise fusion, fix power fusion #17939
To ensure that this doesn't happen again and to catch future bugs, I was thinking of adding a test for every possible fusion in the CUDA backend.
I would create a test template that builds the above scenarios and compares the outputs of the CUDA backend (and maybe others?) against the OCV CPU backend without fusions.
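A minimal, self-contained sketch of the comparison idea follows. It does not use OpenCV: `eltwiseSumThenPower`, `fusedEltwiseSumPower`, and `maxAbsDiff` are hypothetical stand-ins for "reference path without fusion", "backend path with fusion", and the tolerance check. The power activation is assumed to be `(shift + scale * x)^power`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference path: eltwise sum and power activation run as
// two separate passes, i.e. without fusion.
std::vector<float> eltwiseSumThenPower(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       float power, float scale, float shift)
{
    assert(a.size() == b.size());
    std::vector<float> sum(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        sum[i] = a[i] + b[i];

    std::vector<float> out(sum.size());
    for (std::size_t i = 0; i < sum.size(); ++i)
        out[i] = std::pow(shift + scale * sum[i], power);
    return out;
}

// Hypothetical fused path: the same computation done in a single pass.
std::vector<float> fusedEltwiseSumPower(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        float power, float scale, float shift)
{
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = std::pow(shift + scale * (a[i] + b[i]), power);
    return out;
}

// Largest absolute element-wise difference between two equally sized outputs.
float maxAbsDiff(const std::vector<float>& x, const std::vector<float>& y)
{
    assert(x.size() == y.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
        worst = std::max(worst, std::fabs(x[i] - y[i]));
    return worst;
}
```

A test built on this pattern would run both paths on the same inputs and require `maxAbsDiff(reference, fused)` to stay below a small tolerance; in a real test the two paths would be the same network run on different backends.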
@YashasSamaga, yes, generic fusion tests would be useful for all backends. Is this PR ready to be merged?
@mshabunin I was waiting for a reply about the tests. I will finish this PR in maybe another 6-8 hours. EDIT: might need more time. Tests with power activation are failing for both OpenCL and CUDA and I am not able to figure out why.
I agree that it would be nice to have the generic tests. You may build the network at runtime, without test data, like here: https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_layers.cpp
PR is ready. An unrelated calib3d test is failing in the Win64 OpenCL CI build, which prevents the PR from turning green. All tests are passing locally (CUDA + OpenCL). Results of the fusion tests (#17976) on 4.4.0 with debug printfs enabled: https://gist.github.com/YashasSamaga/354b2ef2cf570dcd5beb8fc91c657b84
…e-fusion * fix eltwise fusion segfault, more eltwise fusions, fix power fusion * add assertion
The Eltwise layer can create two types of operators: ShortcutOp and EltwiseOp. There is a dynamic cast to EltwiseOpBase to retrieve the configuration and check fusion compatibility, but no check is made to see whether the cast succeeded. The check is required since some EltwiseLayer instances use cuda4dnn::ShortcutOp, which is not derived from EltwiseOpBase.
fixes #17934
fixes #17946 for master
fixes cuda part of #17964
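The unchecked cast described above can be illustrated with a small standalone sketch. The class names mirror the description, but the hierarchy, the `isSum` field, and `canFuseEltwise` are hypothetical simplifications, not the actual cuda4dnn code; the fusion condition (a plain sum) is only an illustrative assumption.

```cpp
#include <cassert>
#include <memory>

// Hypothetical common base for backend nodes (polymorphic so that
// dynamic casts are possible).
struct CUDABackendNode {
    virtual ~CUDABackendNode() = default;
};

// Hypothetical base carrying the eltwise configuration.
struct EltwiseOpBase : CUDABackendNode {
    bool isSum = false;  // illustrative stand-in for the real configuration
};

struct EltwiseOp : EltwiseOpBase {
    explicit EltwiseOp(bool sum) { isSum = sum; }
};

// Deliberately NOT derived from EltwiseOpBase, as in the description.
struct ShortcutOp : CUDABackendNode {};

// Returns true only when the node really is an eltwise op with a
// configuration this sketch treats as fusable.
bool canFuseEltwise(const std::shared_ptr<CUDABackendNode>& node)
{
    assert(node && "expected a non-null backend node");

    // The cast yields an empty pointer for ShortcutOp; dereferencing it
    // without this check would be undefined behaviour (e.g. a segfault).
    auto eltwise = std::dynamic_pointer_cast<EltwiseOpBase>(node);
    if (!eltwise)
        return false;

    return eltwise->isSum;
}
```

With this check, passing a `ShortcutOp` node simply reports "not fusable" instead of dereferencing a null pointer.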
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.