[hal][neon] Optimize the v_dotprod_fast intrinsics for aarch64. by fpetrogalli · Pull Request #19486 · opencv/opencv

fpetrogalli · 2021-02-09T11:46:40Z

On Armv8 in AArch64 execution mode, we can skip the sequence

   v<op>_<ty>(vget_high_<ty>(x), vget_high_<ty>(y))

in favour of

   v<op>_high_<ty>(x, y)

This gives recent compilers a better chance to use fewer data movement
operations and to allocate registers better. See for example:

https://godbolt.org/z/bPq7vd
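
As a concrete illustration of the pattern above, here is a minimal sketch that instantiates `<op>` as `mull` and `<ty>` as `s16`. That choice is an assumption made for the example, and the helper names (`mull_high_old`, `mull_high_new`, `mull_high_ref`, `forms_agree`) are hypothetical and not taken from the patch. The NEON code is guarded by `__aarch64__`, so on other targets only the scalar reference is compiled.

```cpp
#include <cassert>   // only needed by the usage check shown after the block
#include <cstdint>

#if defined(__aarch64__)
#include <arm_neon.h>

// Old form: extract the upper 64-bit halves explicitly, then do the widening multiply.
inline int32x4_t mull_high_old(int16x8_t x, int16x8_t y)
{
    return vmull_s16(vget_high_s16(x), vget_high_s16(y));
}

// New form: the AArch64-only "_high" intrinsic reads the upper halves directly.
inline int32x4_t mull_high_new(int16x8_t x, int16x8_t y)
{
    return vmull_high_s16(x, y);
}
#endif

// Scalar reference: widening product of lanes 4..7 of x and y.
inline void mull_high_ref(const int16_t x[8], const int16_t y[8], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = int32_t(x[i + 4]) * int32_t(y[i + 4]);
}

// Returns true when both NEON forms match the scalar reference.
// Outside AArch64 only the reference runs, so the result is trivially true.
inline bool forms_agree(const int16_t x[8], const int16_t y[8])
{
    int32_t ref[4];
    mull_high_ref(x, y, ref);
#if defined(__aarch64__)
    int32_t a[4], b[4];
    vst1q_s32(a, mull_high_old(vld1q_s16(x), vld1q_s16(y)));
    vst1q_s32(b, mull_high_new(vld1q_s16(x), vld1q_s16(y)));
    for (int i = 0; i < 4; ++i)
        if (a[i] != ref[i] || b[i] != ref[i])
            return false;
#endif
    return true;
}
```

Calling `assert(forms_agree(x, y))` on two 8-element `int16_t` arrays checks that the two forms are interchangeable. Whether the second form actually compiles to fewer instructions depends on the compiler and its version.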

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[ X ] I agree to contribute to the project under Apache 2 License.
[ X ] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
[ X ] The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=linux,docs,ARMv8,ARMv7

fpetrogalli · 2021-02-09T14:42:07Z

Here is the cycle-count ratio (new version / old version) I measured on Graviton2 on AWS (Neoverse N1).

I used perf to measure the following command:

./bin/opencv_perf_core --gtest_filter='*PerfHamming*'

The total cycle count of the function cv::cpu_baseline::dotProd_32s went down to 0.86x in the version with the patch, compared to the baseline in master.

asmorkalov · 2021-02-09T14:49:26Z

@fpetrogalli Please take a look at the CI builds for iOS. The patch breaks the build.

asmorkalov · 2021-02-09T14:51:04Z

The same for Android on arm-v7a.

fpetrogalli · 2021-02-09T14:52:33Z

@asmorkalov - yes! Thank you, I have noticed. I am trying to reproduce the error. I am quite surprised, because the macro CV_SIMD128_64F is defined only when __aarch64__ is, and I have no clue (yet) why this is happening on armv7. I am probably missing something trivial.

alalek · 2021-02-10T11:09:25Z

/cc @tomoaki0705

modules/core/include/opencv2/core/hal/intrin_neon.hpp

tomoaki0705

Generally looks good.
Please see small notes from me.

alalek

@fpetrogalli Thank you for contribution!
@tomoaki0705 Thank you for review!

[hal][neon] Fix build failure on armv7.

fd92382

asmorkalov requested a review from terfendail February 10, 2021 08:00

alalek reviewed Feb 10, 2021

tomoaki0705 suggested changes Feb 10, 2021

Francesco Petrogalli added 2 commits February 10, 2021 13:36

[hal][neon] Address review comments in PR.

1e59b9a

PR: opencv#19486

[hal][neon] Define macro to check for the AArch64 execution state of …

9b5ee45

…Armv8.

alalek reviewed Feb 10, 2021

[hal][neon] Fix macro definition for AArch64.

720da8a

The fix is needed to prevent warnings when building for Armv7.
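
The commits above do not show which warning was involved. One common cause with feature macros is testing an undefined macro with `#if`, which some compilers report under `-Wundef`. The sketch below assumes that scenario and defines the macro to 0 or 1 on every target. The name `EXAMPLE_NEON_AARCH64` and the helper `uses_high_half_intrinsics` are hypothetical, and `__aarch64__` is used for detection only because it is the macro mentioned in the discussion.

```cpp
#include <cassert>   // only needed by the usage check shown after the block

// Hypothetical feature macro: always defined, so "#if EXAMPLE_NEON_AARCH64"
// never evaluates an undefined identifier on Armv7 or other targets.
#if defined(__aarch64__)
#  define EXAMPLE_NEON_AARCH64 1
#else
#  define EXAMPLE_NEON_AARCH64 0
#endif

// Reports at compile time which code path a guarded intrinsic would take.
constexpr bool uses_high_half_intrinsics()
{
#if EXAMPLE_NEON_AARCH64
    return true;    // AArch64: the v<op>_high_<ty> forms are available
#else
    return false;   // elsewhere: keep the vget_high_<ty> sequence
#endif
}
```

A check such as `assert(uses_high_half_intrinsics() == (EXAMPLE_NEON_AARCH64 == 1));` holds on every target, since both sides come from the same macro.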

alalek approved these changes Feb 11, 2021

alalek merged commit 6ee23c9 into opencv:3.4 Feb 11, 2021

fpetrogalli deleted the dotprod_fast-3.4 branch February 11, 2021 13:57

alalek mentioned this pull request Feb 12, 2021

Merge 3.4 #19517

Merged

jondea mentioned this pull request Mar 24, 2021

[hal][neon] Optimise v_expand for AArch64 #19773

Merged

6 tasks

alalek mentioned this pull request Apr 9, 2021

(5.x) Merge 4.x #19885

Merged

alalek merged 5 commits into opencv:3.4 from fpetrogalli:dotprod_fast-3.4

4 participants
