New v_reverse HAL intrinsic for reversing the ordering of a vector#15662
New v_reverse HAL intrinsic for reversing the ordering of a vector#15662alalek merged 6 commits intoopencv:3.4from ChipKerchner:addVReverseIntrinsic
Conversation
|
Some one with AVX512 expertise please review the AVX512F and AVX512VBMI code |
tomoaki0705
left a comment
There was a problem hiding this comment.
I suggested some improvements. Great work !
|
|
||
| inline v_uint64x4 v_reverse(const v_uint64x4 &a) | ||
| { | ||
| return v_uint64x4(_mm256_permute4x64_epi64(a.val, (3 << 0) | (2 << 2) | (1 << 4) | (0 << 6))); |
There was a problem hiding this comment.
please use _MM_SHUFFLE instead of immediate masking.
|
|
||
| inline v_uint32x4 v_reverse(const v_uint32x4 &a) | ||
| { | ||
| #if CV_SSE2 |
There was a problem hiding this comment.
No need to guard CV_SSE2 here.
| inline v_uint32x4 v_reverse(const v_uint32x4 &a) | ||
| { | ||
| #if CV_SSE2 | ||
| return v_uint32x4(_mm_shuffle_epi32(a.val, (3 << 0) | (2 << 2) | (1 << 4) | (0 << 6))); |
|
|
||
| inline v_uint64x2 v_reverse(const v_uint64x2 &a) | ||
| { | ||
| #if CV_SSE2 |
There was a problem hiding this comment.
Same as above, no need to guard CV_SSE2. Remove else part below, too
| inline v_uint64x2 v_reverse(const v_uint64x2 &a) | ||
| { | ||
| #if CV_SSE2 | ||
| return v_uint64x2(_mm_shuffle_epi32(a.val, (2 << 0) | (3 << 2) | (0 << 4) | (1 << 6))); |
…e issue with NEON.
| #else | ||
| ushort CV_DECL_ALIGNED(32) d[8]; | ||
| v_store_aligned(d, a); | ||
| return v_uint16x8(d[7], d[6], d[5], d[4], d[3], d[2], d[1], d[0]); |
There was a problem hiding this comment.
It would be better to use _mm_shufflelo/hi_epi16 here. That gives a slightly better performance at least on my PC.
Something like that:
__m128i r = _mm_shuffle_epi32(a.val, _MM_SHUFFLE(0, 1, 2, 3));
r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(2, 3, 0, 1));
r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(2, 3, 0, 1));
return v_uint16x8(r);
|
Sorry about the broken builds for WASM and MIPS. In the future, I'll be able to add the intrinsic code for WASM but I doubt I'll be able to handle the optimized MIPS code. |
New v_reverse HAL intrinsic for reversing the ordering of a vector.