(4.x) Merge 3.4 #21186

Merged
alalek merged 22 commits into opencv:4.x from alalek:merge-3.4
Dec 3, 2021

@alalek
Member

@alalek alalek commented Dec 3, 2021

#20658 from smbz:lstm_optimisation
#21107 from take1014:remove_assert_21038
#21112 from vrabaud:3.4_luv_overflow
#21142 from alalek:dnn_two_inputs_ocl_fp16_3.4
#21152 from rogday:fix_defaults
#21159 from rogday:ceil_mode
#21160 from rogday:elu_alpha
#21162 from rogday:softmax_simplification
#21163 from rogday:transpose_default
#21164 from rogday:sum_identity
#21173 from alalek:3.4_dnn_test_reenable_ov_2021_4
#21174 from APrigarina:fix_qr_encoder

Previous "Merge 3.4": #21140

Details
buildworker:Win64 OpenCL=windows-2
Xbuildworker:Custom=linux-1,linux-2,linux-4,linux-6
buildworker:Docs=linux-4,linux-6
build_image:Docs=docs-js:18.04
build_image:Custom=javascript
buildworker:Custom=linux-4,linux-6
Xbuild_image:Custom=javascript-simd
Xbuild_image:Custom=powerpc64le
Xbuild_image:Custom=ubuntu-openvino-2019r3.0:16.04
Xbuild_image:Custom=ubuntu-openvino-2020.3.0:16.04
Xbuild_image:Custom=ubuntu-openvino-2020.4.0:16.04
Xbuild_image:Custom=ubuntu-openvino-2021.1.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.2.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.4.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.4.1:20.04
Xbuildworker:Custom=linux-1
Xbuild_image:Custom=ubuntu-vulkan:16.04
Xbuildworker:Custom=linux-4
Xbuild_image:Custom=fedora:28
Xbuild_image:Custom=ubuntu-cuda:16.04
Xbuild_image:Custom=ubuntu-clang:18.04
Xbuild_image:Custom=ubuntu:20.04
Xbuildworker:Custom=linux-1
Xbuild_image:Custom=javascript-simd
Xbuild_image:Custom=mips64el

Xbuild_image:Custom Mac=openvino-2019r3.0
Xbuild_image:Custom Mac=openvino-2020.3.0
Xbuild_image:Custom Mac=openvino-2020.4.0
Xbuild_image:Custom Mac=openvino-2021.1.0
Xbuild_image:Custom Mac=openvino-2021.2.0
Xbuild_image:Custom Mac=openvino-2021.3.0
Xbuild_image:Custom Mac=openvino-2021.4.0
build_image:Custom Mac=openvino-2021.4.1
test_modules:Custom Mac=dnn,gapi,python2,python3,java

Xbuild_image:Custom Win=openvino-2019r3.0
Xbuild_image:Custom Win=openvino-2020.3.0
Xbuild_image:Custom Win=openvino-2020.4.0
Xbuild_image:Custom Win=openvino-2021.1.0
Xbuild_image:Custom Win=openvino-2021.2.0
Xbuild_image:Custom Win=openvino-2021.3.0
Xbuild_image:Custom Win=openvino-2021.4.0
build_image:Custom Win=openvino-2021.4.1
buildworker:Custom Win=windows-3
test_bigdata:Custom Win=1
test_filter:Custom Win=*
test_modules:Custom Win=dnn,gapi,python2,python3,java
test_opencl:Custom Win=ON
build_contrib:Custom Win=OFF
Xbuild_image:Custom Win=msvs2017
Xbuild_image:Custom Win=msvs2019

take1014 and others added 22 commits November 27, 2021 18:34
resolves opencv#21038

* remove C assert

* revert C header

* fix several points in review

* fix test_ds.cpp
- remove similar test from IE scope under HAVE_INF_ENGINE
* dnn: LSTM optimisation

This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.

fastGEMM1T is already used by the fully-connected layer.  This commit involves two minor modifications:
 - Use unaligned access.  I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
 - Allow for weight matrices where the number of columns is not a multiple of 8.

I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.

* Fix warning about initialisation order

* Remove C++11 syntax

* Fix build when AVX(2) is not available

In this case the CV_TRY_X macros are defined to 0, rather than being undefined.

* Minor changes as requested:

 - Don't check hardware support for AVX(2) when dispatch is disabled for these
 - Add braces

* Fix out-of-bounds access in fully connected layer

The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this.  The new tail handling does not round the vecsize upwards like this, but it does require that the vecsize is at least 8.  To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway).

This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
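The rounding described above can be illustrated with a minimal sketch. It is written in plain Python rather than the C++ of the actual layer, and the names `aligned_vecsize` and `pad_row` are hypothetical, chosen only for this illustration. It assumes zero padding and a lane count of 8, matching the "multiple of 8" wording in the commit message.

```python
LANES = 8  # assumed width: 8 floats per AVX register


def aligned_vecsize(vecsize, lanes=LANES):
    """Round vecsize up to the next multiple of `lanes`.

    For any vecsize >= 1 the result is at least `lanes`.
    """
    return (vecsize + lanes - 1) // lanes * lanes


def pad_row(row, lanes=LANES):
    """Return a copy of `row` zero-padded to the rounded-up length."""
    padded = aligned_vecsize(len(row), lanes)
    return row + [0.0] * (padded - len(row))


row = [1.0, 2.0, 3.0]
print(aligned_vecsize(len(row)))  # 8
print(len(pad_row(row)))          # 8
```

Rounding the size in the same place that allocates the padding keeps the two values in step, which is the point the commit message makes.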

* Improve tail mask handling

 - Use static array for generating tail masks (as requested)
 - Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
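Why the mask has to cover both operands can be seen in a small scalar model. This is only a sketch in plain Python, not the AVX code from the commit: `TAIL_MASKS`, `masked`, and `dot_tail` are hypothetical names, and the table layout is an assumption. The key fact is that `0.0 * NaN` is `NaN`, so zeroing only the input vector does not stop garbage in the weight padding from reaching the sum.

```python
import math

LANES = 8  # assumed width: 8 floats per AVX register

# Static table: row r keeps the first r lanes and drops the rest.
TAIL_MASKS = [[lane < r for lane in range(LANES)] for r in range(LANES)]


def masked(block, mask):
    """Model a masked load: dropped lanes become exactly 0.0."""
    return [x if keep else 0.0 for x, keep in zip(block, mask)]


def dot_tail(weights, vec, vecsize, mask_weights=True):
    """Dot product over the first `vecsize` elements, 8 lanes at a time.

    Both buffers are assumed readable up to the next multiple of 8.
    """
    full = vecsize - vecsize % LANES
    acc = 0.0
    for i in range(0, full, LANES):
        acc += sum(w * v for w, v in zip(weights[i:i + LANES], vec[i:i + LANES]))
    rem = vecsize % LANES
    if rem:
        mask = TAIL_MASKS[rem]
        v = masked(vec[full:full + LANES], mask)
        w = weights[full:full + LANES]
        if mask_weights:
            w = masked(w, mask)
        acc += sum(a * b for a, b in zip(w, v))
    return acc


# 10 valid elements followed by 6 lanes of padding; the weight padding holds NaN.
weights = [1.0] * 10 + [float("nan")] * 6
vec = [2.0] * 10 + [5.0] * 6

print(dot_tail(weights, vec, 10))                                  # 20.0
print(math.isnan(dot_tail(weights, vec, 10, mask_weights=False)))  # True
```

With both operands masked, the padding lanes contribute exactly `0.0 * 0.0` and the result depends only on the first `vecsize` elements.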

* Revert whitespace change

* Improve readability of conditions for using AVX

* dnn(lstm): minor coding style changes, replaced left aligned load
* Fix integer overflow in cv::Luv2RGBinteger::process.

For LL=49, uu=205, vv=23, we end up with x=7373056 and y=458
which overflows y*x.
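The arithmetic behind this can be checked directly. The sketch below uses Python's unbounded integers to compute the exact product and compare it with the 32-bit signed limit. The `wrap_int32` helper is hypothetical and only models two's-complement wraparound. In C++ signed overflow is undefined behaviour, so the wrapped value is what typically happens, not something the language guarantees.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reinterpret an integer as a two's-complement 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


x, y = 7373056, 458  # values quoted in the commit message
product = y * x

print(product)              # 3376859648
print(product > INT32_MAX)  # True
print(wrap_int32(product))  # -918107648
```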

* imgproc(test): adjust test parameters to cover SIMD code
fix ceil_mode for Average/MaxPooling

* fix ceil_mode

* add a comment
@alalek
Member Author

alalek commented Dec 3, 2021

👍

@alalek alalek merged commit 8b4fa26 into opencv:4.x Dec 3, 2021
@alalek alalek mentioned this pull request Dec 11, 2021

6 participants