
Reimplementation of Element-wise layers with broadcasting support#21865

Merged
alalek merged 31 commits into opencv:4.x from rogday:nary_eltwise_layers
Jul 19, 2022

Conversation

@rogday
Member

@rogday rogday commented Apr 13, 2022

This version supports element-wise n-ary operations on k-dimensional tensors with broadcast-compatible shapes.

=== Update by @fengyuentau ===
ONNX operators that need broadcasting: https://github.com/onnx/onnx/blob/main/docs/Broadcasting.md
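As a reference for the rule described in the linked ONNX document, here is a small pure-Python sketch of multidirectional broadcast shape inference. The helper name `broadcast_shape` is hypothetical and is not part of OpenCV's API; this only illustrates the shape rule the layer must implement.

```python
def broadcast_shape(a, b):
    """Compute the multidirectional-broadcast output shape of two
    shapes, following the ONNX/NumPy rule: align trailing dims, and
    a dimension of size 1 stretches to match the other tensor."""
    # Left-pad the shorter shape with 1s so both have equal rank.
    ndims = max(len(a), len(b))
    a = (1,) * (ndims - len(a)) + tuple(a)
    b = (1,) * (ndims - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"incompatible dims {da} vs {db}")
    return tuple(out)
```

For example, `broadcast_shape((1, 4, 5), (2, 3, 1, 1))` yields `(2, 3, 4, 5)`.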

Benchmarks on Apple M1:
Input tensor A of shape [8, 256, 128, 100], input tensor B of shape [8, 256, 128, 100]

Operation   Eltwise layers (ms)   This PR (ms)
---------   -------------------   ------------
Add         Not supported         5.60
And         Not supported         -
Div         8.30                  5.55
Equal       Not supported         5.58
Greater     Not supported         5.50
Less        Not supported         5.45
Max         8.30                  5.64 (nary)
Mean        Not supported         5.44 (nary)
Min         8.30                  5.60 (nary)
Mul         8.59                  5.69
Or          Not supported         -
Pow         Not supported         213.38
Sub         Not supported         5.70
Sum         8.34                  5.56 (nary)
Xor         Not supported         -

=== End of updates ===

Benchmarks:
Sum of two tensors with shape=(8,256,128,100)
The Eltwise layer takes 24 ms; this implementation takes 49 ms.

Note: small_vector doesn't speed up anything right now, so it'll probably get removed.
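To illustrate the broadcast-aware forward path discussed above, here is a hedged pure-Python sketch (not the PR's C++ code; `broadcast_binary_op` is a hypothetical name) of how a binary elementwise op can honor broadcasting without materializing expanded inputs: axes of size 1 get a step of 0, so their single element is reused along that axis.

```python
import itertools

def broadcast_binary_op(a, a_shape, b, b_shape, op):
    """Apply a binary elementwise op over flat lists `a` and `b`
    with broadcast-compatible shapes. Broadcast (size-1) axes get
    step 0, so no expanded copy of either input is made."""
    ndims = max(len(a_shape), len(b_shape))
    a_shape = (1,) * (ndims - len(a_shape)) + tuple(a_shape)
    b_shape = (1,) * (ndims - len(b_shape)) + tuple(b_shape)
    out_shape = tuple(max(da, db) for da, db in zip(a_shape, b_shape))

    def steps(shape):
        # Row-major steps; broadcast (size-1) axes get step 0.
        s, acc = [0] * ndims, 1
        for i in range(ndims - 1, -1, -1):
            s[i] = acc if shape[i] > 1 else 0
            acc *= shape[i]
        return s

    sa, sb = steps(a_shape), steps(b_shape)
    out = []
    # Iterate over every output coordinate and gather inputs by step.
    for idx in itertools.product(*(range(d) for d in out_shape)):
        ia = sum(i * s for i, s in zip(idx, sa))
        ib = sum(i * s for i, s in zip(idx, sb))
        out.append(op(a[ia], b[ib]))
    return out, out_shape
```

A real implementation would flatten the inner loop and vectorize it; the zero-step trick is the essential idea.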

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@vpisarev
Contributor

@rogday, as we discussed, the slowdown is too big. Please rework this pull request based on the code that I've sent you. Also, please get rid of the smallvec header. We don't need it. We have cv::AutoBuffer, std::vector and many other standard or already implemented containers. Let's not introduce fundamental containers without big need. In this case, as I showed in my implementation, we can add broadcasting in a universal way without losing speed and without introducing new containers.
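The faster implementation referred to above is not shown in the thread, but one standard trick for broadcasting "without losing speed" is to coalesce adjacent axes that broadcast identically across all inputs, so the elementwise kernel loops over fewer, longer axes. The sketch below is a speculative pure-Python illustration of that idea only (the hypothetical `coalesce_shapes` assumes inputs already padded to equal rank); the PR itself does this in C++.

```python
def coalesce_shapes(shapes):
    """Merge adjacent axes that broadcast the same way in every
    input, so an elementwise kernel can run over fewer, longer axes.
    Returns (coalesced_output_shape, per_input_coalesced_shapes).
    All input shapes must already have equal rank."""
    out = [max(dims) for dims in zip(*shapes)]
    # Per input and axis: True if the axis is broadcast (size 1
    # while the output size is larger).
    pats = [tuple(s == 1 and o > 1 for s, o in zip(shape, out))
            for shape in shapes]
    coalesced = [out[0]]
    group_pats = [tuple(p[0] for p in pats)]
    for i in range(1, len(out)):
        col = tuple(p[i] for p in pats)
        if col == group_pats[-1]:
            # Same broadcast pattern for every input: fuse the axes.
            coalesced[-1] *= out[i]
        else:
            coalesced.append(out[i])
            group_pats.append(col)
    per_input = [
        [1 if g[k] else d for g, d in zip(group_pats, coalesced)]
        for k in range(len(shapes))
    ]
    return coalesced, per_input
```

For instance, broadcasting [8, 256, 128, 100] against [1, 256, 1, 1] collapses four axes into three ([8, 256, 12800]), and two identical shapes collapse into a single flat axis.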

@rogday rogday requested a review from fengyuentau May 24, 2022 08:24
@fengyuentau
Member

Thanks for the comments. I am looking into your code.

@fengyuentau
Member

ONNX conformance-related flags, I think you need the first one

I still have disordered outputs like the screenshot I shared in the comment above. I guess let's keep it as is, since only a few ONNX conformance tests can be enabled in the parse deny list.

@vpisarev vpisarev self-requested a review July 16, 2022 09:31
@vpisarev
Contributor

@alalek, looks good to me and can be merged

@alalek
Member

alalek commented Jul 18, 2022

Please take a look at the "Linux Debug" configuration and this failed test:

[ RUN      ] Test_Int8_nets.EfficientDet/0, where GetParam() = OCV/CPU
[ INFO:0@494.207] global /build/precommit_linux64_no_opt/4.x/opencv/modules/dnn/src/tensorflow/tf_importer.cpp (2993) populateNet DNN/TF: parsing model (N/A version info). Number of nodes = 2011
[ INFO:0@494.207] global /build/precommit_linux64_no_opt/4.x/opencv/modules/dnn/src/tensorflow/tf_importer.cpp (3000) populateNet DNN/TF: parsing config (N/A version info). Number of nodes = 938
Unmatched reference: class 17 score 0.824592 box [0.244581 x 0.530952 from (0.166575, 0.399682)] IoU diff: 1
/build/precommit_linux64_no_opt/4.x/opencv/modules/dnn/test/test_common.impl.hpp:155: Failure
Expected: (refScores[i]) <= (confThreshold), actual: 0.824592 vs 0.65
[  FAILED  ] Test_Int8_nets.EfficientDet/0, where GetParam() = OCV/CPU (32941 ms)

It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).

@rogday
Member Author

rogday commented Jul 18, 2022

> Please take a look at the "Linux Debug" configuration and this failed test: Test_Int8_nets.EfficientDet/0 [log quoted above] It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).

AFAICT, the model from this test is loaded by TF importer, which is unchanged.

@alalek alalek merged commit ed69bca into opencv:4.x Jul 19, 2022
@fengyuentau
Member

> Please take a look at the "Linux Debug" configuration and this failed test: Test_Int8_nets.EfficientDet/0 [log quoted above] It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).

Let me try to fix this in a separate PR.

@JulienMaille
Contributor

Dear @rogday, I'm trying to understand why your changes have broken my model inference when using the OpenVINO CPU backend.
"add" layers used to be handled in eltwise_layer.cpp and are now processed by nary_eltwise_layers.cpp.
However, while I see code for ngraph, cuda, and cpu in the former, I don't see this in your reimplementation. How do you deal with specific backends here?

@rogday
Member Author

rogday commented Nov 21, 2022

> Dear @rogday, I'm trying to understand why your changes have broken my model inference when using the OpenVINO CPU backend. "add" layers used to be handled in eltwise_layer.cpp and are now processed by nary_eltwise_layers.cpp. However, while I see code for ngraph, cuda, and cpu in the former, I don't see this in your reimplementation. How do you deal with specific backends here?

Hello, @JulienMaille. We decided to support only the CPU version for now. Your model should still work - it should just fall back to the default backend in this case. Can you provide more details (maybe with a reproducer)?

@JulienMaille
Contributor

@rogday Thanks for your reply, I can invite you to a private repository with a full reproducer.
I have an open issue here, #22640, with a follow-up in openvinotoolkit/openvino#13493.

@JulienMaille
Contributor

@rogday I see you accepted my repository invitation, were you able to reproduce the exception?
Exception: OpenCV(4.6.0-dev) D:\Dev\opencv\modules\dnn\src\ie_ngraph.cpp:747: error: (-2:Unspecified error) Failed to initialize Inference Engine backend (device = CPU): Cannot get memory! in function

@alalek
Member

alalek commented Nov 28, 2022

As there is no OpenVINO implementation for this layer anymore, the OpenCVDNN-OpenVINO engine tries to execute this node through an "OpenVINO custom layer". Custom layer support is not widely tested in OpenVINO itself (at least in public tests), so something could be broken there (even before this patch in OpenCV).

As reported in the original issue #22640:

  • OpenVINO GPU plugin is able to properly work with "new" OpenCV execution flow.
  • OpenVINO CPU plugin raises this useless error message.

The first item shows that the OpenCVDNN-OpenVINO engine may work properly and that the interaction with OpenVINO works (somehow).
Likely no workaround is possible in OpenCV code for the OpenVINO CPU plugin ("OpenVINO custom layers" are already a workaround).

BTW, did you try OpenVINO 2021.4.2 LTS with updated OpenCV? ("Cannot get memory!" error message is widely observed with 2022.1 update of OV - several tests are disabled/skipped due to this)

@JulienMaille
Contributor

JulienMaille commented Nov 29, 2022

@alalek thanks for your feedback, so you would say the bug is in OpenVINO, but they say it's in OpenCV :)

BTW, did you try OpenVINO 2021.4.2 LTS with updated OpenCV?

I'm pretty sure that it works; I will try again and report back, but my plan was to update to the latest OpenVINO.

> "Cannot get memory!" error message is widely observed with 2022.1 update of OV - several tests are disabled/skipped due to this

Is there a related issue I can follow?

@zihaomu
Member

zihaomu commented Mar 13, 2023

Link: #21078

@zihaomu zihaomu mentioned this pull request Mar 13, 2023
48 tasks
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Reimplementation of Element-wise layers with broadcasting support

* init

* semi-working initial version

* add small_vector

* wip

* remove smallvec

* add nary function

* replace auto with Mat in lambda expr used in transform

* uncomment asserts

* autobuffer shape_buf & step_buf

* fix a missing bracket

* fixed a missing addLayer in parseElementWise

* solve one-dimensional broadcast

* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called

* one autobuffer for step & shape

* temporal fix for the missing original dimension information

* fix parseUnsqueeze when it gets a 1d tensor constant

* support sum/mean/min/max with only one input

* reuse old code to handle cases of two non-constant inputs

* add condition to handle div & mul of two non-constant inputs

* use || instead of or

* remove trainling spaces

* enlarge buf in binary_forward to contain other buffer

* use autobuffer in nary_forward

* generate data randomly and add more cases for perf

* add op and, or & xor

* update perf_dnn

* remove some comments

* remove legacy; add two ONNX conformance tests in filter

* move from cpu_denylist to all_denylist

* adjust parsing for inputs>=2

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
