(4.x) Merge 3.4 by alalek · Pull Request #21186 · opencv/opencv

alalek · 2021-12-03T12:35:07Z

#20658 from smbz:lstm_optimisation
#21107 from take1014:remove_assert_21038
#21112 from vrabaud:3.4_luv_overflow
#21142 from alalek:dnn_two_inputs_ocl_fp16_3.4
#21152 from rogday:fix_defaults
#21159 from rogday:ceil_mode
#21160 from rogday:elu_alpha
#21162 from rogday:softmax_simplification
#21163 from rogday:transpose_default
#21164 from rogday:sum_identity
#21173 from alalek:3.4_dnn_test_reenable_ov_2021_4
#21174 from APrigarina:fix_qr_encoder

Previous "Merge 3.4": #21140

Details

buildworker:Win64 OpenCL=windows-2
Xbuildworker:Custom=linux-1,linux-2,linux-4,linux-6
buildworker:Docs=linux-4,linux-6
build_image:Docs=docs-js:18.04
build_image:Custom=javascript
buildworker:Custom=linux-4,linux-6
Xbuild_image:Custom=javascript-simd
Xbuild_image:Custom=powerpc64le
Xbuild_image:Custom=ubuntu-openvino-2019r3.0:16.04
Xbuild_image:Custom=ubuntu-openvino-2020.3.0:16.04
Xbuild_image:Custom=ubuntu-openvino-2020.4.0:16.04
Xbuild_image:Custom=ubuntu-openvino-2021.1.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.2.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.4.0:20.04
Xbuild_image:Custom=ubuntu-openvino-2021.4.1:20.04
Xbuildworker:Custom=linux-1
Xbuild_image:Custom=ubuntu-vulkan:16.04
Xbuildworker:Custom=linux-4
Xbuild_image:Custom=fedora:28
Xbuild_image:Custom=ubuntu-cuda:16.04
Xbuild_image:Custom=ubuntu-clang:18.04
Xbuild_image:Custom=ubuntu:20.04
Xbuildworker:Custom=linux-1
Xbuild_image:Custom=javascript-simd
Xbuild_image:Custom=mips64el

Xbuild_image:Custom Mac=openvino-2019r3.0
Xbuild_image:Custom Mac=openvino-2020.3.0
Xbuild_image:Custom Mac=openvino-2020.4.0
Xbuild_image:Custom Mac=openvino-2021.1.0
Xbuild_image:Custom Mac=openvino-2021.2.0
Xbuild_image:Custom Mac=openvino-2021.3.0
Xbuild_image:Custom Mac=openvino-2021.4.0
build_image:Custom Mac=openvino-2021.4.1
test_modules:Custom Mac=dnn,gapi,python2,python3,java

Xbuild_image:Custom Win=openvino-2019r3.0
Xbuild_image:Custom Win=openvino-2020.3.0
Xbuild_image:Custom Win=openvino-2020.4.0
Xbuild_image:Custom Win=openvino-2021.1.0
Xbuild_image:Custom Win=openvino-2021.2.0
Xbuild_image:Custom Win=openvino-2021.3.0
Xbuild_image:Custom Win=openvino-2021.4.0
build_image:Custom Win=openvino-2021.4.1
buildworker:Custom Win=windows-3
test_bigdata:Custom Win=1
test_filter:Custom Win=*
test_modules:Custom Win=dnn,gapi,python2,python3,java
test_opencl:Custom Win=ON
build_contrib:Custom Win=OFF
Xbuild_image:Custom Win=msvs2017
Xbuild_image:Custom Win=msvs2019


* dnn: LSTM optimisation

  This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm. fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
  - Use unaligned access. I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
  - Allow for weight matrices where the number of columns is not a multiple of 8.

  I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available

  In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
  - Don't check hardware support for AVX(2) when dispatch is disabled for these
  - Add braces
* Fix out-of-bounds access in fully connected layer

  The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this, but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway). This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
  - Use static array for generating tail masks (as requested)
  - Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load
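The tail handling described above can be illustrated with a scalar stand-in. This is only a sketch of one plausible reading of the commit message, not the code from the PR: the names `roundUpTo8`, `kTailMaskTable` and `dotWithMaskedTail` are hypothetical, and the real fastGEMM1T works on AVX registers rather than scalar loops. The sketch assumes the tail is handled by re-reading the last 8 elements and masking off the lanes that were already summed, which would explain why vecsize has to be at least 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round n up to the next multiple of 8 (8 floats fill one 256-bit AVX register).
inline std::size_t roundUpTo8(std::size_t n)
{
    return (n + 7) & ~static_cast<std::size_t>(7);
}

// Static table: 8 "drop" entries followed by 8 "keep" entries.
// The 8-entry window starting at offset `tail` holds (8 - tail) zeros
// followed by `tail` ones, so no mask has to be built at run time.
static const std::uint32_t kTailMaskTable[16] = {
    0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1
};

// Scalar model of a dot product processed in blocks of 8 lanes.
// Hypothetical helper; requires vecsize >= 8.
inline float dotWithMaskedTail(const float* w, const float* x, std::size_t vecsize)
{
    assert(vecsize >= 8);
    float sum = 0.f;
    std::size_t k = 0;
    for (; k + 8 <= vecsize; k += 8)          // full blocks
        for (std::size_t j = 0; j < 8; ++j)
            sum += w[k + j] * x[k + j];

    const std::size_t tail = vecsize - k;     // 0..7 leftover elements
    if (tail > 0)
    {
        // Re-read the last 8 elements; this never reads past vecsize.
        const std::size_t start = vecsize - 8;
        const std::uint32_t* mask = kTailMaskTable + tail;
        for (std::size_t j = 0; j < 8; ++j)
        {
            // Mask both operands: 0 * NaN is NaN, so zeroing only one side
            // would not stop a NaN/Inf in a masked lane from reaching the sum.
            const float wv = mask[j] ? w[start + j] : 0.f;
            const float xv = mask[j] ? x[start + j] : 0.f;
            sum += wv * xv;
        }
    }
    return sum;
}
```

With 11 elements, the first 8 go through the full-block loop and the remaining 3 are picked up from the window starting at index 3, where the first 5 lanes are masked out. A caller with fewer than 8 columns would pad the buffer to `roundUpTo8(n)` first, which mirrors what the fully connected layer is said to do.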


alalek · 2021-12-03T15:01:42Z

👍

take1014 and others added 22 commits November 27, 2021 18:34

Merge pull request opencv#21107 from take1014:remove_assert_21038

a627737

resolves opencv#21038
* remove C assert
* revert C header
* fix several points in review
* fix test_ds.cpp

dnn(test): add two_inputs test with FP32/U8 data types

58dc397

- remove similar test from IE scope under HAVE_INF_ENGINE

dnn(DataLayer): fix CPU/OpenCL code paths for FP16 handling

58b0622

fix Clip, LeakyReLU, LRN, Split defaults

05db878

Merge pull request opencv#21142 from alalek:dnn_two_inputs_ocl_fp16_3.4

17d99e6

Merge pull request opencv#21152 from rogday:fix_defaults

0d2857a

add alpha parameter to ELU layer

0e2a368

add new (Log)SoftMax simplification passes

8294107

add default order to transpose

11e6848

add sum of 1 input

33e97e9

Merge pull request opencv#21112 from vrabaud:3.4_luv_overflow

1a1a7bb

* Fix integer overflow in cv::Luv2RGBinteger::process.

  For LL=49, uu=205, vv=23, we end up with x=7373056 and y=458 which overflows y*x.
* imgproc(test): adjust test parameters to cover SIMD code
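The numbers quoted in this commit message can be checked directly: 458 * 7373056 = 3376859648, which is larger than the 32-bit signed maximum of 2147483647. The sketch below shows the general remedy of widening before multiplying. The helper names are hypothetical, and this is not necessarily how the fix in cv::Luv2RGBinteger::process is written.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Multiply in 64 bits so the intermediate value cannot wrap.
inline std::int64_t wideProduct(int a, int b)
{
    return static_cast<std::int64_t>(a) * b;
}

// True when a * b is representable as a 32-bit signed integer.
inline bool productFitsInt32(int a, int b)
{
    const std::int64_t p = wideProduct(a, b);
    return p >= std::numeric_limits<std::int32_t>::min() &&
           p <= std::numeric_limits<std::int32_t>::max();
}

// Debug-build guard: the narrow product is only taken after the check passes.
inline int checkedProduct(int a, int b)
{
    assert(productFitsInt32(a, b));
    return a * b;
}
```

For the values in the commit message, `productFitsInt32(458, 7373056)` is false, so code on that path has to keep the product in a 64-bit type or rescale one operand before multiplying.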

Merge pull request opencv#21163 from rogday:transpose_default

a806e8c

Merge pull request opencv#21164 from rogday:sum_identity

5da69c0

dnn(test): re-enable tests which works with OpenVINO 2021.4.x (3.4)

bd396e1

qr encoder: fix memory and unused variables issues

37b1876

Merge pull request opencv#21173 from alalek:3.4_dnn_test_reenable_ov_…

b9d0dc6

…2021_4

Merge pull request opencv#21174 from APrigarina:fix_qr_encoder

b6df9de

Merge pull request opencv#21159 from rogday:ceil_mode

1613d30

fix ceil_mode for Average/MaxPooling
* fix ceil_mode
* add a comment
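For context, ceil_mode controls how the pooled output size is rounded. The sketch below follows the convention commonly used by deep learning frameworks: round the window count up instead of down, then drop a last window that would start beyond the input plus its leading padding. The function name and parameters are hypothetical, and the commit message does not say exactly what was changed in OpenCV, so treat this as background rather than a description of the fix.

```cpp
#include <cassert>

// Output length of a 1-D pooling window sweep (hypothetical helper).
// in: input length, kernel: window size, stride: step, pad: padding per side.
inline int pooledSize(int in, int kernel, int stride, int pad, bool ceilMode)
{
    assert(in > 0 && kernel > 0 && stride > 0 && pad >= 0);
    assert(in + 2 * pad >= kernel);

    const int span = in + 2 * pad - kernel;
    // Floor mode truncates; ceil mode rounds the division up.
    int out = (ceilMode ? (span + stride - 1) / stride : span / stride) + 1;

    // In ceil mode the last window must still start inside the input
    // or its leading padding; otherwise it covers no real data.
    if (ceilMode && (out - 1) * stride >= in + pad)
        --out;
    return out;
}
```

For example, an input of length 5 with kernel 2, stride 2 and no padding gives 2 outputs in floor mode and 3 in ceil mode, because the final element gets its own partial window.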

Merge pull request opencv#21160 from rogday:elu_alpha

dad2b9a

Merge pull request opencv#21162 from rogday:softmax_simplification

35ff9af

Merge remote-tracking branch 'upstream/3.4' into merge-3.4

8b4fa26

alalek merged commit 8b4fa26 into opencv:4.x Dec 3, 2021

alalek mentioned this pull request Dec 11, 2021

(4.x) Merge 3.4 #21238

Merged

alalek merged 22 commits into opencv:4.x from alalek:merge-3.4
