[hal][neon] Optimise v_expand for AArch64 #19773

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from jondea:add-aarch64-specialised-v_expand-3.4 on Mar 26, 2021
jondea commented Mar 24, 2021

Similar to #19486, we can fuse the two intrinsics

vmovl_##suffix(vget_high_##suffix());

into the equivalent single intrinsic

vmovl_high_##suffix();

for AArch64.

With the fused intrinsic, GCC 10.2 and earlier emit fewer instructions (see https://godbolt.org/z/x9GcY5evj).
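As a rough illustration of the fusion for one concrete suffix (`u8`), the sketch below widens the upper eight lanes of a 16-lane `uint8_t` vector both ways. The function names `expand_high_two_step` and `expand_high_fused` are hypothetical and are not the names used in OpenCV's HAL; the scalar branch is only there so the sketch also builds on non-AArch64 hosts.

```cpp
#include <cstdint>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not an OpenCV function): widen lanes 8..15 of src
// to 16 bits using the two-intrinsic form, vmovl_u8(vget_high_u8(v)).
void expand_high_two_step(const uint8_t src[16], uint16_t dst[8])
{
#if defined(__aarch64__)
    uint8x16_t v = vld1q_u8(src);
    vst1q_u16(dst, vmovl_u8(vget_high_u8(v)));
#else
    // Scalar fallback with the same result, for hosts without AArch64 NEON.
    for (int i = 0; i < 8; ++i) dst[i] = src[8 + i];
#endif
}

// Hypothetical helper (not an OpenCV function): the same widening using
// the single AArch64-only intrinsic vmovl_high_u8(v).
void expand_high_fused(const uint8_t src[16], uint16_t dst[8])
{
#if defined(__aarch64__)
    uint8x16_t v = vld1q_u8(src);
    vst1q_u16(dst, vmovl_high_u8(v));
#else
    // Scalar fallback with the same result, for hosts without AArch64 NEON.
    for (int i = 0; i < 8; ++i) dst[i] = src[8 + i];
#endif
}
```

Both functions should produce identical output; the difference is only in how many instructions the compiler is likely to emit for the AArch64 branch.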

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=linux,docs,ARMv8,ARMv7

jondea commented Mar 24, 2021

@alalek I would like to add you as a reviewer, but I can't seem to, sorry.

jondea commented Mar 25, 2021

To benchmark this change, I ran a few perf tests which call v_expand:

./opencv_perf_imgproc --gtest_filter=*cvtColor* --gtest_param_filter=*127*Luv*RG* --perf_min_samples=20 --perf_force_samples=20  --perf_write_validation_results=results.log

on a c6gd.4xlarge instance, taking the average of 100 runs before and after the changes. As you can see below, I got a speedup of just under 1%. v_expand is also used in other code paths, so those should see improvements as well.

| test_set | test number | time before changes | time after changes | ratio |
|---|---|---|---|---|
| Size_CvtMode_cvtColor8u--cvtColor8u | 77 | 0.155181 | 0.153927 | 0.99192 |
| Size_CvtMode_cvtColor8u--cvtColor8u | 78 | 0.161906 | 0.160384 | 0.990604 |
| Size_CvtMode_cvtColor8u--cvtColor8u | 81 | 0.156524 | 0.155422 | 0.992961 |
| Size_CvtMode_cvtColor8u--cvtColor8u | 82 | 0.162851 | 0.161776 | 0.993396 |
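
The ratio column appears to be the time after the changes divided by the time before, so values below 1 mean the patched build is faster. A minimal sketch of that arithmetic, using hypothetical helper names `time_ratio` and `percent_faster` that are not part of OpenCV's perf tooling:

```cpp
// Ratio of run times: after / before. A value below 1 means a speedup.
// Assumption: this matches how the ratio column was computed.
double time_ratio(double before, double after)
{
    return after / before;
}

// Reduction in run time expressed as a percentage of the original time.
double percent_faster(double before, double after)
{
    return (1.0 - after / before) * 100.0;
}
```

For test 77, `time_ratio(0.155181, 0.153927)` gives roughly 0.9919, which corresponds to a reduction of about 0.8% and is consistent with the "just under 1%" figure.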

@alalek alalek left a comment

Looks good to me! Thank you for the contribution 👍

@opencv-pushbot opencv-pushbot merged commit 6e8022a into opencv:3.4 Mar 26, 2021
@alalek alalek mentioned this pull request Mar 27, 2021
@alalek alalek mentioned this pull request Apr 9, 2021
Labels

optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc