
Reimplementation of Element-wise layers with broadcasting support#21865

Merged
alalek merged 31 commits into opencv:4.x from rogday:nary_eltwise_layers
Jul 19, 2022

Conversation

@rogday
Member

@rogday rogday commented Apr 13, 2022

This version supports element-wise n-ary operations on k-dimensional tensors with broadcast-compatible shapes.

=== Update by @fengyuentau ===
ONNX operators that need broadcasting: https://github.com/onnx/onnx/blob/main/docs/Broadcasting.md
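As a reference for the rule described in the linked ONNX document, here is a small pure-Python sketch of multidirectional broadcast shape inference. The helper name `broadcast_shape` is hypothetical and is not part of OpenCV's API; this only illustrates the shape rule the layer must implement.

```python
def broadcast_shape(a, b):
    """Compute the multidirectional-broadcast output shape of two
    shapes, following the ONNX/NumPy rule: align trailing dims, and
    a dimension of size 1 stretches to match the other tensor."""
    # Left-pad the shorter shape with 1s so both have equal rank.
    ndims = max(len(a), len(b))
    a = (1,) * (ndims - len(a)) + tuple(a)
    b = (1,) * (ndims - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"incompatible dims {da} vs {db}")
    return tuple(out)
```

For example, `broadcast_shape((1, 4, 5), (2, 3, 1, 1))` yields `(2, 3, 4, 5)`.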

Benchmarks on Apple M1:
Input tensor A of shape [8, 256, 128, 100], input tensor B of shape [8, 256, 128, 100]

Operation   Eltwise layers (ms)   This PR (ms)
---------   -------------------   ------------
Add         Not supported         5.60
And         Not supported         -
Div         8.30                  5.55
Equal       Not supported         5.58
Greater     Not supported         5.50
Less        Not supported         5.45
Max         8.30                  5.64 (nary)
Mean        Not supported         5.44 (nary)
Min         8.30                  5.60 (nary)
Mul         8.59                  5.69
Or          Not supported         -
Pow         Not supported         213.38
Sub         Not supported         5.70
Sum         8.34                  5.56 (nary)
Xor         Not supported         -

=== End of updates ===

Benchmarks:
Sum of two tensors with shape=(8,256,128,100)
The Eltwise layer takes 24 ms; this implementation takes 49 ms.

Note: small_vector doesn't speed up anything right now, so it'll probably get removed.
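To illustrate the broadcast-aware forward path discussed above, here is a hedged pure-Python sketch (not the PR's C++ code; `broadcast_binary_op` is a hypothetical name) of how a binary elementwise op can honor broadcasting without materializing expanded inputs: axes of size 1 get a step of 0, so their single element is reused along that axis.

```python
import itertools

def broadcast_binary_op(a, a_shape, b, b_shape, op):
    """Apply a binary elementwise op over flat lists `a` and `b`
    with broadcast-compatible shapes. Broadcast (size-1) axes get
    step 0, so no expanded copy of either input is made."""
    ndims = max(len(a_shape), len(b_shape))
    a_shape = (1,) * (ndims - len(a_shape)) + tuple(a_shape)
    b_shape = (1,) * (ndims - len(b_shape)) + tuple(b_shape)
    out_shape = tuple(max(da, db) for da, db in zip(a_shape, b_shape))

    def steps(shape):
        # Row-major steps; broadcast (size-1) axes get step 0.
        s, acc = [0] * ndims, 1
        for i in range(ndims - 1, -1, -1):
            s[i] = acc if shape[i] > 1 else 0
            acc *= shape[i]
        return s

    sa, sb = steps(a_shape), steps(b_shape)
    out = []
    # Iterate over every output coordinate and gather inputs by step.
    for idx in itertools.product(*(range(d) for d in out_shape)):
        ia = sum(i * s for i, s in zip(idx, sa))
        ib = sum(i * s for i, s in zip(idx, sb))
        out.append(op(a[ia], b[ib]))
    return out, out_shape
```

A real implementation would flatten the inner loop and vectorize it; the zero-step trick is the essential idea.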

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@vpisarev
Contributor

@rogday, as we discussed, the slowdown is too big. Please rework this pull request based on the code that I've sent you. Also, please get rid of the smallvec header. We don't need it. We have cv::AutoBuffer, std::vector and many other standard or already implemented containers. Let's not introduce fundamental containers without big need. In this case, as I showed in my implementation, we can add broadcasting in a universal way without losing speed and without introducing new containers.
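The faster implementation referred to above is not shown in the thread, but one standard trick for broadcasting "without losing speed" is to coalesce adjacent axes that broadcast identically across all inputs, so the elementwise kernel loops over fewer, longer axes. The sketch below is a speculative pure-Python illustration of that idea only (the hypothetical `coalesce_shapes` assumes inputs already padded to equal rank); the PR itself does this in C++.

```python
def coalesce_shapes(shapes):
    """Merge adjacent axes that broadcast the same way in every
    input, so an elementwise kernel can run over fewer, longer axes.
    Returns (coalesced_output_shape, per_input_coalesced_shapes).
    All input shapes must already have equal rank."""
    out = [max(dims) for dims in zip(*shapes)]
    # Per input and axis: True if the axis is broadcast (size 1
    # while the output size is larger).
    pats = [tuple(s == 1 and o > 1 for s, o in zip(shape, out))
            for shape in shapes]
    coalesced = [out[0]]
    group_pats = [tuple(p[0] for p in pats)]
    for i in range(1, len(out)):
        col = tuple(p[i] for p in pats)
        if col == group_pats[-1]:
            # Same broadcast pattern for every input: fuse the axes.
            coalesced[-1] *= out[i]
        else:
            coalesced.append(out[i])
            group_pats.append(col)
    per_input = [
        [1 if g[k] else d for g, d in zip(group_pats, coalesced)]
        for k in range(len(shapes))
    ]
    return coalesced, per_input
```

For instance, broadcasting [8, 256, 128, 100] against [1, 256, 1, 1] collapses four axes into three ([8, 256, 12800]), and two identical shapes collapse into a single flat axis.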

@rogday rogday requested a review from fengyuentau May 24, 2022 08:24
@fengyuentau
Member

Thanks for the comments. I am looking into your code.

@fengyuentau
Member

ONNX conformance-related flags, I think you need the first one

I still have disordered outputs like the screenshot I shared in the comment above. I guess let's keep it as is, since only a few ONNX conformance tests can be enabled in the parse deny list.

@vpisarev vpisarev self-requested a review July 16, 2022 09:31
@vpisarev
Contributor

@alalek, looks good to me and can be merged

@alalek
Member

alalek commented Jul 18, 2022

Please take a look at the "Linux Debug" configuration and this failed test:

[ RUN      ] Test_Int8_nets.EfficientDet/0, where GetParam() = OCV/CPU
[ INFO:0@494.207] global /build/precommit_linux64_no_opt/4.x/opencv/modules/dnn/src/tensorflow/tf_importer.cpp (2993) populateNet DNN/TF: parsing model (N/A version info). Number of nodes = 2011
[ INFO:0@494.207] global /build/precommit_linux64_no_opt/4.x/opencv/modules/dnn/src/tensorflow/tf_importer.cpp (3000) populateNet DNN/TF: parsing config (N/A version info). Number of nodes = 938
Unmatched reference: class 17 score 0.824592 box [0.244581 x 0.530952 from (0.166575, 0.399682)] IoU diff: 1
/build/precommit_linux64_no_opt/4.x/opencv/modules/dnn/test/test_common.impl.hpp:155: Failure
Expected: (refScores[i]) <= (confThreshold), actual: 0.824592 vs 0.65
[  FAILED  ] Test_Int8_nets.EfficientDet/0, where GetParam() = OCV/CPU (32941 ms)

It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).

@rogday
Member Author

rogday commented Jul 18, 2022

> Please take a look at the "Linux Debug" configuration and this failed test: Test_Int8_nets.EfficientDet/0 [log quoted above] It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).

AFAICT, the model from this test is loaded by TF importer, which is unchanged.

@alalek alalek merged commit ed69bca into opencv:4.x Jul 19, 2022
@fengyuentau
Member

> Please take a look at the "Linux Debug" configuration and this failed test: Test_Int8_nets.EfficientDet/0 [log quoted above] It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).

Let me try to fix this in a separate PR.

@JulienMaille
Contributor

Dear @rogday, I'm trying to understand why your changes have broken my model inference when using the OpenVINO CPU backend.
"add" layers used to be handled in eltwise_layer.cpp and are now processed by nary_eltwise_layers.cpp.
However, while I see code for ngraph, cuda, and cpu in the former, I don't see this in your reimplementation. How do you deal with specific backends here?

@rogday
Member Author

rogday commented Nov 21, 2022

> Dear @rogday, I'm trying to understand why your changes have broken my model inference when using the OpenVINO CPU backend. "add" layers used to be handled in eltwise_layer.cpp and are now processed by nary_eltwise_layers.cpp. However, while I see code for ngraph, cuda, and cpu in the former, I don't see this in your reimplementation. How do you deal with specific backends here?

Hello, @JulienMaille. We decided to support only the CPU version for now. Your model should still work - it should just fall back to the default backend in this case. Can you provide more details (maybe with a reproducer)?

@JulienMaille
Contributor

@rogday Thanks for your reply, I can invite you to a private repository with a full reproducer.
I have an open issue here, #22640, with a follow-up in openvinotoolkit/openvino#13493.

@JulienMaille
Contributor

@rogday I see you accepted my repository invitation, were you able to reproduce the exception?
Exception: OpenCV(4.6.0-dev) D:\Dev\opencv\modules\dnn\src\ie_ngraph.cpp:747: error: (-2:Unspecified error) Failed to initialize Inference Engine backend (device = CPU): Cannot get memory! in function

@alalek
Member

alalek commented Nov 28, 2022

As there is no OpenVINO implementation for this layer anymore, the OpenCVDNN-OpenVINO engine tries to execute this node through an "OpenVINO custom layer". Custom layer support is not widely tested in OpenVINO itself (at least in public tests), so something could be broken there (even before this patch in OpenCV).

As reported in the original issue #22640:

  • OpenVINO GPU plugin is able to properly work with "new" OpenCV execution flow.
  • OpenVINO CPU plugin raises this useless error message.

The first item shows that the OpenCVDNN-OpenVINO engine may work properly and that the interaction with OpenVINO works (somehow).
Likely no workaround is possible in OpenCV code for the OpenVINO CPU plugin ("OpenVINO custom layers" are already a workaround).

BTW, did you try OpenVINO 2021.4.2 LTS with updated OpenCV? ("Cannot get memory!" error message is widely observed with 2022.1 update of OV - several tests are disabled/skipped due to this)

@JulienMaille
Contributor

JulienMaille commented Nov 29, 2022

@alalek thanks for your feedback, so you would say the bug is in OpenVINO, but they say it's in OpenCV :)

BTW, did you try OpenVINO 2021.4.2 LTS with updated OpenCV?

I'm pretty sure that it works; I will try again and report back, but my plan was to update to the latest OpenVINO.

> "Cannot get memory!" error message is widely observed with 2022.1 update of OV - several tests are disabled/skipped due to this

Is there a related issue I can follow?

@zihaomu
Member

zihaomu commented Mar 13, 2023

Link: #21078

@zihaomu zihaomu mentioned this pull request Mar 13, 2023
48 tasks
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Reimplementation of Element-wise layers with broadcasting support

* init

* semi-working initial version

* add small_vector

* wip

* remove smallvec

* add nary function

* replace auto with Mat in lambda expr used in transform

* uncomment asserts

* autobuffer shape_buf & step_buf

* fix a missing bracket

* fixed a missing addLayer in parseElementWise

* solve one-dimensional broadcast

* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called

* one autobuffer for step & shape

* temporal fix for the missing original dimension information

* fix parseUnsqueeze when it gets a 1d tensor constant

* support sum/mean/min/max with only one input

* reuse old code to handle cases of two non-constant inputs

* add condition to handle div & mul of two non-constant inputs

* use || instead of or

* remove trainling spaces

* enlarge buf in binary_forward to contain other buffer

* use autobuffer in nary_forward

* generate data randomly and add more cases for perf

* add op and, or & xor

* update perf_dnn

* remove some comments

* remove legacy; add two ONNX conformance tests in filter

* move from cpu_denylist to all_denylist

* adjust parsing for inputs>=2

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
