cuda4dnn: enable fusion tests, update thresholds, fix eltwise fusion regression#18285
Conversation
@YashasSamaga Do you have progress on the patch? Do you need help with testing on different hardware?

@asmorkalov I will finish this in a few hours. I am testing on GTX 1050, 2080 Ti and 1080 Ti. There are probably tests failing on other devices. It's hard to get these tests to pass on all devices since cuDNN chooses different algorithms for convolution on different devices.
force-pushed from fc4f9c6 to e9a717e
force-pushed from 7abe9f3 to 0f8ab05
```diff
 #endif
-if (pinsToKeep.count(lpNext) != 0)
+if (IS_DNN_OPENCL_TARGET(preferableTarget) && pinsToKeep.count(lpNext) != 0)
```
This change fixes a regression (it affects master only -- 3.4 is fine) from #18554 (which landed in master through merge PR #18556). The CUDA backend can optionally fuse an activation before eltwise (for YOLO-like networks) and also optionally fuse an activation after eltwise (for ResNet-like networks). Hence, this check is wrong for the CUDA backend: it prevents valid fusions and causes the fusion tests to fail (yay, the tests are doing their job!).

The check is correct for OpenCL, since OpenCL always fuses an activation after eltwise. However, it is redundant on master: the fusion logic downstream performs the same check anyway. Its only benefit is that it lets the fusion process break out early.

So I have added a target check to ensure that it is applied only for OpenCL targets.
```diff
 break;
+if (IS_DNN_OPENCL_TARGET(preferableTarget))
 if (!nextData->params.has("operation") || toLowerCase(nextData->params.get<String>("operation")) == "sum")
```
I have added an OpenCL target check so that the eltwise operation and coeffs checks do not run for other targets. These checks are strictly not required for the CUDA backend, which performs its own equivalent checks earlier. Leaving the OpenCL checks shared would merely redo those checks in a different (redundant) way, but it couples the two backends. Decoupling the OpenCL and CUDA checks might prevent future surprises, since the two backends already have different fusion capabilities.

I am not sure this change is really required. It also clutters the git diff because of the indentation added across the whole block.
```diff
 testONNXModels("deconvolution_output_shape", npy, 0, 0, false, false);
-testONNXModels("deconv_adjpad_2d", npy, 0, 0, false, false);
+if (target != DNN_TARGET_CUDA_FP16) // bug
+    testONNXModels("deconv_adjpad_2d", npy, 0, 0, false, false);
```
This test case has five tests, and only the last one fails on the CUDA FP16 target. I haven't skipped the whole test case since the other tests are still useful, but not skipping might make the failure harder to track (and to re-enable once the bug is fixed).

Maybe it would be a good idea to split this test case into multiple tests (or at least move the failing one into its own test).
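A possible split could look like the following fragment (hypothetical test name; `applyTestTag` and the skip tag follow the pattern used elsewhere in OpenCV's dnn tests):

```cpp
// Hypothetical split: the failing case gets its own test so it can be
// skipped (and later re-enabled) precisely, without hiding the other four.
TEST_P(Test_ONNX_layers, Deconvolution_AdjPad2D)
{
    if (target == DNN_TARGET_CUDA_FP16)
        applyTestTag(CV_TEST_TAG_DNN_SKIP_CUDA_FP16);  // tracked bug
    testONNXModels("deconv_adjpad_2d", npy, 0, 0, false, false);
}
```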
alalek left a comment
Well done! Thank you for the contribution 👍
All tests pass on GTX 1080 Ti (cuDNN 7.6.5) and RTX 2080 Ti (cuDNN 7.6.5).
Some tests fail on GTX 1050 (cuDNN 8.0.5) because cuDNN does not provide a suitable convolution algorithm there; this cannot be fixed on our side, as it is a cuDNN issue.
This PR also fixes an issue that arose because of a check from #18554 that disabled valid fusions.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.