Reimplementation of Element-wise layers with broadcasting support #21865
alalek merged 31 commits into opencv:4.x
Conversation
@rogday, as we discussed, the slowdown is too big. Please rework this pull request based on the code that I've sent you. Also, please get rid of the smallvec header. We don't need it. We have cv::AutoBuffer, std::vector and many other standard or already implemented containers. Let's not introduce fundamental containers without a big need. In this case, as I showed in my implementation, we can add broadcasting in a universal way without losing speed and without introducing new containers.
Thanks for the comments. I am looking into your code.
I still have disordered outputs like the screenshot that I shared in the comment above. I guess let's keep it as is, since only a few ONNX conformance tests can be enabled from the parser deny list.
@alalek, looks good to me and can be merged
Please take a look at the "Linux Debug" configuration and this failed test: It is unlikely related to this PR; the regression happened before (so it should be fixed in a separate PR).
AFAICT, the model from this test is loaded by TF importer, which is unchanged. |
Let me try to fix this in a separate PR.
Dear @rogday, I'm trying to understand why your changes have broken my model inference when using the OpenVINO CPU backend.
Hello, @JulienMaille. We decided to support only the CPU version for now. Your model should still work; it should just fall back to the default backend in this case. Can you provide more details (maybe with a reproducer)?
@rogday Thanks for your reply, I can invite you to a private repository with a full reproducer
@rogday I see you accepted my repository invitation, were you able to reproduce the exception?
As there is no OpenVINO implementation for this layer anymore, the OpenCV DNN-OpenVINO engine tries to execute this node through an "OpenVINO custom layer". Custom layers support is not widely tested in OpenVINO itself (at least in public tests), so something could be broken there (even before this patch in OpenCV). As reported in the original issue #22640:
The first item shows that the OpenCV DNN-OpenVINO engine may work properly and that the interaction with OpenVINO works (somehow). BTW, did you try OpenVINO 2021.4.2 LTS with updated OpenCV? (The "Cannot get memory!" error message is widely observed with the 2022.1 update of OV; several tests are disabled/skipped due to this.)
@alalek thanks for your feedback, so you would say the bug is in OpenVINO, but they say it's in OpenCV :)
I'm pretty sure that it works; I will try again and report, but my plan was to update to the latest OpenVINO.
Is there a related issue I can follow?
Link: #21078
Squashed commits:
* init
* semi-working initial version
* add small_vector
* wip
* remove smallvec
* add nary function
* replace auto with Mat in lambda expr used in transform
* uncomment asserts
* autobuffer shape_buf & step_buf
* fix a missing bracket
* fixed a missing addLayer in parseElementWise
* solve one-dimensional broadcast
* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called
* one autobuffer for step & shape
* temporal fix for the missing original dimension information
* fix parseUnsqueeze when it gets a 1d tensor constant
* support sum/mean/min/max with only one input
* reuse old code to handle cases of two non-constant inputs
* add condition to handle div & mul of two non-constant inputs
* use || instead of or
* remove trailing spaces
* enlarge buf in binary_forward to contain other buffer
* use autobuffer in nary_forward
* generate data randomly and add more cases for perf
* add op and, or & xor
* update perf_dnn
* remove some comments
* remove legacy; add two ONNX conformance tests in filter
* move from cpu_denylist to all_denylist
* adjust parsing for inputs>=2

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
This version supports element-wise n-ary operations on k-dimensional tensors with broadcast-compatible shapes.
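The core idea behind such a broadcast-aware element-wise forward (the commit list mentions keeping step and shape in one buffer) can be sketched in Python. This is only an illustrative reimplementation, not the PR's actual C++ code; `binary_forward` and its helper names are hypothetical. A size-1 axis gets stride 0, so the same element is reused along the broadcast dimension:

```python
from itertools import product

def _row_major_strides(shape):
    # Row-major (C-order) strides for a dense tensor.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def _broadcast_strides(shape, out_shape):
    # Left-pad with 1s, then zero the stride of every size-1 axis so the
    # same input element is reused along that (broadcast) axis.
    full = (1,) * (len(out_shape) - len(shape)) + tuple(shape)
    return [0 if d == 1 else s
            for d, s in zip(full, _row_major_strides(full))]

def binary_forward(op, a, shape_a, b, shape_b):
    # a and b are flat row-major buffers. Per axis, dims must match or be 1;
    # the output dim is the max of the two.
    n = max(len(shape_a), len(shape_b))
    fa = (1,) * (n - len(shape_a)) + tuple(shape_a)
    fb = (1,) * (n - len(shape_b)) + tuple(shape_b)
    assert all(x == y or 1 in (x, y) for x, y in zip(fa, fb)), "not broadcastable"
    out_shape = tuple(max(x, y) for x, y in zip(fa, fb))
    sa = _broadcast_strides(shape_a, out_shape)
    sb = _broadcast_strides(shape_b, out_shape)
    out = []
    for idx in product(*(range(d) for d in out_shape)):
        out.append(op(a[sum(i * s for i, s in zip(idx, sa))],
                      b[sum(i * s for i, s in zip(idx, sb))]))
    return out, out_shape

# Adding a (2,1) column to a (3,) row yields a (2,3) result:
out, shape = binary_forward(lambda x, y: x + y, [1, 2, 3], (3,), [10, 20], (2, 1))
print(shape, out)  # (2, 3) [11, 12, 13, 21, 22, 23]
```

A real implementation would of course collapse contiguous dimensions and iterate over flat inner loops rather than enumerating every output index, which is where the performance discussion above comes in.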
=== Update by @fengyuentau ===
ONNX operators that need broadcasting: https://github.com/onnx/onnx/blob/main/docs/Broadcasting.md
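The multidirectional broadcasting rule from that document (shared with NumPy) can be stated as a short sketch; `multidirectional_broadcast_shape` is a hypothetical name for illustration:

```python
def multidirectional_broadcast_shape(shape_a, shape_b):
    # ONNX/NumPy rule: right-align the shapes, pad the shorter one with 1s;
    # per axis, dims must be equal or one of them must be 1; result is the max.
    n = max(len(shape_a), len(shape_b))
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"axes {x} and {y} are not broadcast-compatible")
        out.append(max(x, y))
    return tuple(out)

# Examples taken from the ONNX broadcasting document:
print(multidirectional_broadcast_shape((2, 3, 4, 5), (5,)))       # (2, 3, 4, 5)
print(multidirectional_broadcast_shape((4, 5), (2, 3, 4, 5)))     # (2, 3, 4, 5)
print(multidirectional_broadcast_shape((1, 4, 5), (2, 3, 1, 1)))  # (2, 3, 4, 5)
```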
Benchmarks on Apple M1:
Input tensor A of shape [8, 256, 128, 100], input tensor B of shape [8, 256, 128, 100]
=== End of updates ===
Benchmarks:
Sum of two tensors with shape=(8,256,128,100):
* Eltwise layer takes 24 ms
* this implementation takes 49 ms
Note: small_vector doesn't speed up anything right now, so it'll probably get removed.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.