DNN: fixed bug in depthwise conv of stride 2 by zihaomu · Pull Request #23162 · opencv/opencv

zihaomu · 2023-01-20T15:19:26Z

Merge this PR with test data: opencv/opencv_extra#1041.
Related issue: #23151
Related PR: #22905

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

dkurt

Thanks!

alalek · 2023-01-21T14:35:44Z

modules/dnn/test/test_onnx_importer.cpp

+TEST_P(Test_ONNX_layers, DepthWiseConv)
 {
    testONNXModels("depthwiseconv_add");
+    testONNXModels("depthwise_stride2");
 }


Please separate the tests. Merging them is bad practice.

It looks like the added test doesn't exercise the modified code.

We also need a test that triggers the out_j <= pad_l condition from above (both cases, on lines 126 and 95).

Thanks for the code review.
I have updated the test to trigger out_j <= pad_l more easily.

The difference is: the previous code would enter the SIMD optimization, which led to memory errors. Now we break when out_j <= pad_l is true (in the test, out_j and pad_l are both equal to 1).

Thank you for the update!

The dumped values are now: out_j=1 pad_l=1 outW1=2 VEC_NLANES=4
So both parts of the condition are still true:

if (out_j <= pad_l || outW1 - VEC_NLANES < 0) break;

The problem is that the canonical pattern for vectorization with SIMD tail processing is the following:

    int j = 0;
    for ( ; j < len; j += VECSZ)
    {
        if (j > len - VECSZ)
        {
            if (j == 0)
                break;
            j = len - VECSZ;  // "shifted" tail
        }
        ... processing ...
    }

(e.g., core / convert.simd.hpp)

We could also move the inner "j == 0" check out of the loop and get better code (in almost all cases):

    int j = 0;
    if (len < VECSZ)  // data is too small
        return j;     // or skip the loop
    for ( ; j < len; j += VECSZ)
    {
        if (j > len - VECSZ)
        {
            j = len - VECSZ;  // "shifted" tail
        }
        ... processing ...
    }

This pattern is verified and approved for use. It needs to be completely understood.

Any other extra checks in such code are very suspicious.
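To make the hoisted variant of the pattern above concrete, here is a minimal, self-contained sketch. It is not code from OpenCV: `VECSZ`, `process_block`, `double_all_vec` and `double_all` are hypothetical names, and the "vector" kernel is a plain loop standing in for real SIMD intrinsics. It also assumes out-of-place processing (separate `src` and `dst`), which is what makes re-processing the overlapped elements in the shifted tail harmless.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the number of SIMD lanes.
static const int VECSZ = 4;

// Hypothetical "vector" kernel: always touches exactly VECSZ elements.
static void process_block(const int* src, int* dst)
{
    for (int k = 0; k < VECSZ; k++)
        dst[k] = src[k] * 2;
}

// Vector path. Returns the number of elements it has handled.
static int double_all_vec(const int* src, int* dst, int len)
{
    int j = 0;
    if (len < VECSZ)               // data is too small: leave it to scalar code
        return j;
    const int last = len - VECSZ;  // loop-invariant, computed once
    for (; j < len; j += VECSZ)
    {
        if (j > last)
            j = last;              // "shifted" tail, overlaps the previous block
        process_block(src + j, dst + j);
    }
    return j;
}

// Full routine: vector path first, scalar loop for whatever is left.
static void double_all(const int* src, int* dst, int len)
{
    assert(len >= 0);
    int j = double_all_vec(src, dst, len);
    for (; j < len; j++)           // only runs when len < VECSZ
        dst[j] = src[j] * 2;
}
```

With `len = 6` the vector path processes elements 0..3 and then the shifted block 2..5, so the only check inside the loop is the tail shift itself; with `len = 3` the vector path returns immediately and the scalar loop does all the work.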

Now back to the tail handling part that is actually used:

    if (out_j + VEC_NLANES > outW1)
    {
        if (out_j <= pad_l || outW1 - VEC_NLANES < 0)
            break;
        out_j = outW1 - VEC_NLANES;
    }

out_j + VEC_NLANES > outW1 could be replaced with out_j > outW1 - VEC_NLANES, because out_j + VEC_NLANES changes on every iteration, whereas outW1 - VEC_NLANES is constant for the loop (and could be computed once). We could assume that out_j + VEC_NLANES is reused for the next loop increment, but this may not be true if the processing part is large (and there are not enough registers to keep this temporary value) or if the compiler is not smart enough.

outW1 - VEC_NLANES < 0 is similar to len < VEC_NLANES

out_j does not always start from 0. It can start from 1 (line 65), but not from pad_l. So the out_j <= pad_l check looks suspicious.

I believe we don't need both checks. We should have a single modified check that accounts for the offset.

depthwise_convolution.cpp

There is a mess of SIMD and SIMD-dispatcher code.

SIMD + reference/baseline code must go to .simd.hpp

The dispatcher code must be in a separate file, as is done in the core/imgproc and even dnn (layers_common.simd.hpp) modules

There is dispatching based on instruction set(!) on the fastDepthwiseConv optimization of another algorithm (this should not happen, because it is a mess - @vpisarev)

Hi @alalek, I have updated the code and replaced the check of out_j <= pad_l with in_j < 0.

In the following code:

    if (out_j > outW1 - VEC_NLANES)
        out_j = outW1 - VEC_NLANES;
    int in_j = out_j * stride_w - pad_l;

When out_j > outW1 - VEC_NLANES is true, out_j may be set to 0. If pad_l is 1, in_j may then be < 0, and this will cause a memory error.
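The arithmetic behind this can be checked with a small standalone sketch. The helper names `shifted_tail_in_j` and `tail_block_starts_in_bounds` are hypothetical and are not functions from depthwise_convolution.cpp; the sketch only reproduces the two lines quoted above and tests the lower bound of the read position, under the assumption that a negative in_j means reading before the start of the input row.

```cpp
#include <cassert>

// Hypothetical helper: input column at which the shifted tail block
// would start reading, following the two quoted lines.
static int shifted_tail_in_j(int out_j, int outW1, int vec_nlanes,
                             int stride_w, int pad_l)
{
    if (out_j > outW1 - vec_nlanes)
        out_j = outW1 - vec_nlanes;   // shift the tail block back
    return out_j * stride_w - pad_l;  // map output column to input column
}

// Hypothetical guard: true when the block does not start before
// the beginning of the row (only the lower bound is checked here).
static bool tail_block_starts_in_bounds(int out_j, int outW1, int vec_nlanes,
                                        int stride_w, int pad_l)
{
    return shifted_tail_in_j(out_j, outW1, vec_nlanes, stride_w, pad_l) >= 0;
}
```

For example, with out_j = 1, outW1 = 4, VEC_NLANES = 4, stride_w = 2 and pad_l = 1, the shift sets out_j to 0 and in_j becomes -1, which is the case the in_j < 0 check is meant to catch. With outW1 = 8 and the other values unchanged, no shift happens and in_j is 1.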

dispatcher code must be in a separate file like it is done in core/imgproc and even dnn(layers_common.simd.hpp) modules

This is something I've been wanting to optimize for a while, and I hope to finish it this week in another PR.

alalek · 2023-01-23T00:23:42Z

FYI, Test_Int8_nets.EfficientDet/0 test is broken on some configurations and it is not related to this patch - skipped in #23167

alalek

Please add assumption checks for the input parameters of this code at the beginning of the function.

E.g.:

CV_DbgAssert(pad_l ...);

(just because lines 32 and 65 don't correlate - tail and head processing are different).

alalek · 2023-01-27T03:53:18Z