
New v_reverse HAL intrinsic for reversing the ordering of a vector#15662

Merged
alalek merged 6 commits into opencv:3.4 from ChipKerchner:addVReverseIntrinsic
Oct 11, 2019
Conversation

Contributor

@ChipKerchner commented Oct 8, 2019

New v_reverse HAL intrinsic for reversing the ordering of a vector.

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@ChipKerchner
Contributor Author

Someone with AVX512 expertise, please review the AVX512F and AVX512VBMI code.

Contributor

@tomoaki0705 left a comment


I suggested some improvements. Great work!


inline v_uint64x4 v_reverse(const v_uint64x4 &a)
{
    return v_uint64x4(_mm256_permute4x64_epi64(a.val, (3 << 0) | (2 << 2) | (1 << 4) | (0 << 6)));
}

Please use _MM_SHUFFLE instead of immediate masking.


inline v_uint32x4 v_reverse(const v_uint32x4 &a)
{
#if CV_SSE2

No need to guard CV_SSE2 here.

inline v_uint32x4 v_reverse(const v_uint32x4 &a)
{
#if CV_SSE2
return v_uint32x4(_mm_shuffle_epi32(a.val, (3 << 0) | (2 << 2) | (1 << 4) | (0 << 6)));

Please use _MM_SHUFFLE here as well.


inline v_uint64x2 v_reverse(const v_uint64x2 &a)
{
#if CV_SSE2

Same as above, no need to guard CV_SSE2. Remove the else part below, too.

inline v_uint64x2 v_reverse(const v_uint64x2 &a)
{
#if CV_SSE2
return v_uint64x2(_mm_shuffle_epi32(a.val, (2 << 0) | (3 << 2) | (0 << 4) | (1 << 6)));

Please use _MM_SHUFFLE here as well.

#else
ushort CV_DECL_ALIGNED(32) d[8];
v_store_aligned(d, a);
return v_uint16x8(d[7], d[6], d[5], d[4], d[3], d[2], d[1], d[0]);

It would be better to use _mm_shufflelo/hi_epi16 here; that gives slightly better performance, at least on my PC. Something like this:

    __m128i r = _mm_shuffle_epi32(a.val, _MM_SHUFFLE(0, 1, 2, 3));
    r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(2, 3, 0, 1));
    r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(2, 3, 0, 1));
    return v_uint16x8(r);

@ChipKerchner
Contributor Author

Sorry about the broken builds for WASM and MIPS. In the future, I'll be able to add the intrinsic code for WASM but I doubt I'll be able to handle the optimized MIPS code.

@alalek alalek mentioned this pull request Oct 24, 2019
@ChipKerchner ChipKerchner deleted the addVReverseIntrinsic branch November 5, 2019 17:54


4 participants