Fixing bug with comparison of v_int64x2 or v_uint64x2#15738
Fixing bug with comparison of v_int64x2 or v_uint64x2#15738alalek merged 6 commits intoopencv:3.4from ChipKerchner:bugInt64x2Comparison
Conversation
…cases. Rewrite using epi64 instructions - faster too.
|
The WASM version needs updating as well. I'm not sure I can do this. It seems like the changes are something like this: |
| #define OPENCV_HAL_IMPL_SSE_64BIT_CMP_OP(_Tpvec) \ | ||
| inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \ | ||
| { __m128i cmp = _mm_cmpeq_epi32(a.val, b.val); \ | ||
| return _Tpvec(_mm_or_si128(cmp, _mm_shuffle_epi32(cmp, _MM_SHUFFLE(2, 3, 0, 1)))); } \ |
There was a problem hiding this comment.
or
low 32-bits are equal "or" high 32-bits are equal?
Probably operator != should be defined first through "or".
There was a problem hiding this comment.
Actually if either low or upper are set, the whole 64-bit register needs to be set. If not, functions like v_check_any/all will not work
There was a problem hiding this comment.
Not quite sure what you are suggesting in your above comment.
There was a problem hiding this comment.
Fix _mm_or_si128 => _mm_and_si128() is enough.
Example:
v_int64x2 a = v_setall_s64(0x111223344);
v_int64x2 b = v_setall_s64(0x011223344);
There was a problem hiding this comment.
You're right. It should be AND and not OR.
I'll add some test cases.
alalek
left a comment
There was a problem hiding this comment.
BTW, it would be nice to add test case for that.
| #define OPENCV_HAL_IMPL_SSE_64BIT_CMP_OP(_Tpvec) \ | ||
| inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \ | ||
| { __m128i cmp = _mm_cmpeq_epi32(a.val, b.val); \ | ||
| return _Tpvec(_mm_or_si128(cmp, _mm_shuffle_epi32(cmp, _MM_SHUFFLE(2, 3, 0, 1)))); } \ |
There was a problem hiding this comment.
Fix _mm_or_si128 => _mm_and_si128() is enough.
Example:
v_int64x2 a = v_setall_s64(0x111223344);
v_int64x2 b = v_setall_s64(0x011223344);
Looks like there is no Lets update WASM backend in a separate PR (test would help to catch error there too). |
Your branch is based on very old commit from 2019-08-08 |
|
Please ignore errors from Custom/AVX512 builder - they are not related to this PR. |
| .test_loadstore() | ||
| .test_addsub() | ||
| #if CV_SIMD_64F | ||
| .test_cmp64() |
There was a problem hiding this comment.
Perhaps we should avoid this check for v_uint64 types.
@terfendail Could you take a look?
There was a problem hiding this comment.
The problem is for NEON comparison of v_int64x2 and v_uint64x2 does NOT exist unless CV_SIMD_64F.
There was a problem hiding this comment.
You are right.
In theory, CV_SIMD_64F should guard using of vectors of doubles and their operations.
Additionally, it guards some 64u/64s operations and some operations with 32f (like div) on NEON.
Perhaps we should address that in a separate PR.
There was a problem hiding this comment.
On the first look It seems like there is no reason to guard 64u/64s for NEON. This intrinsics doesn't use specific 64f related instructions. Maybe it make sense to change NEON guard scope?
There was a problem hiding this comment.
Underlying instructions are available on AARCH64 only. Removing guard breaks ARMv7 builds.
Probably we need dedicated macro for this functionality (btw, == can be emulated through 32-bit (like SSE2/3), but less/greater comparisons are not easy to re-implement).
Casting v_int64x2 or v_uint64x2 to v_float64x2 and comparing does NOT work in all cases. Rewrite using epi64 instructions - faster too.
Here is an example that does NOT work
v_int64x2 a = v_setall_s64(-1);
v_int64x2 b = v_setall_s64(-1);
if (v_reinterpret_as_f64(a) == v_reinterpret_as_f64(b))
// Always fails because reinterpreting it as a f64 produces a NaN. NaN is never equal to any other number, not even itself.