
New v_reverse HAL intrinsic for reversing the ordering of a vector#15662

Merged
alalek merged 6 commits into opencv:3.4 from ChipKerchner:addVReverseIntrinsic
Oct 11, 2019
Conversation

Contributor

@ChipKerchner commented Oct 8, 2019

New v_reverse HAL intrinsic for reversing the ordering of a vector.

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@ChipKerchner
Contributor Author

Someone with AVX512 expertise, please review the AVX512F and AVX512VBMI code.

Contributor

@tomoaki0705 left a comment


I suggested some improvements. Great work!


inline v_uint64x4 v_reverse(const v_uint64x4 &a)
{
    return v_uint64x4(_mm256_permute4x64_epi64(a.val, (3 << 0) | (2 << 2) | (1 << 4) | (0 << 6)));
}

Please use _MM_SHUFFLE instead of immediate masking.


inline v_uint32x4 v_reverse(const v_uint32x4 &a)
{
#if CV_SSE2

No need to guard CV_SSE2 here.

inline v_uint32x4 v_reverse(const v_uint32x4 &a)
{
#if CV_SSE2
return v_uint32x4(_mm_shuffle_epi32(a.val, (3 << 0) | (2 << 2) | (1 << 4) | (0 << 6)));

Please use _MM_SHUFFLE here as well.


inline v_uint64x2 v_reverse(const v_uint64x2 &a)
{
#if CV_SSE2

Same as above, no need to guard CV_SSE2. Remove the else part below, too.

inline v_uint64x2 v_reverse(const v_uint64x2 &a)
{
#if CV_SSE2
return v_uint64x2(_mm_shuffle_epi32(a.val, (2 << 0) | (3 << 2) | (0 << 4) | (1 << 6)));

Please use _MM_SHUFFLE here as well.

#else
ushort CV_DECL_ALIGNED(32) d[8];
v_store_aligned(d, a);
return v_uint16x8(d[7], d[6], d[5], d[4], d[3], d[2], d[1], d[0]);

It would be better to use _mm_shufflelo/hi_epi16 here; that gives slightly better performance, at least on my PC. Something like this:

    __m128i r = _mm_shuffle_epi32(a.val, _MM_SHUFFLE(0, 1, 2, 3));
    r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(2, 3, 0, 1));
    r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(2, 3, 0, 1));
    return v_uint16x8(r);

@ChipKerchner
Contributor Author

Sorry about the broken builds for WASM and MIPS. In the future, I'll be able to add the intrinsic code for WASM but I doubt I'll be able to handle the optimized MIPS code.

@alalek alalek mentioned this pull request Oct 24, 2019
@ChipKerchner ChipKerchner deleted the addVReverseIntrinsic branch November 5, 2019 17:54


4 participants