Skip to content

Fixing bug with comparison of v_int64x2 or v_uint64x2#15738

Merged
alalek merged 6 commits intoopencv:3.4from
ChipKerchner:bugInt64x2Comparison
Oct 22, 2019
Merged

Fixing bug with comparison of v_int64x2 or v_uint64x2#15738
alalek merged 6 commits intoopencv:3.4from
ChipKerchner:bugInt64x2Comparison

Conversation

@ChipKerchner
Copy link
Copy Markdown
Contributor

@ChipKerchner ChipKerchner commented Oct 18, 2019

Casting v_int64x2 or v_uint64x2 to v_float64x2 and comparing does NOT work in all cases. Rewrite using epi64 instructions - faster too.

Here is an example that does NOT work

v_int64x2 a = v_setall_s64(-1);
v_int64x2 b = v_setall_s64(-1);

if (v_reinterpret_as_f64(a) == v_reinterpret_as_f64(b))

// Always fails because reinterpreting it as a f64 produces a NaN. NaN is never equal to any other number, not even itself.

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

ChipKerchner added 2 commits October 18, 2019 10:11
@ChipKerchner
Copy link
Copy Markdown
Contributor Author

ChipKerchner commented Oct 18, 2019

The WASM version needs updating as well. I'm not sure I can do this.

It seems like the changes are something like this:

--- a/modules/core/include/opencv2/core/hal/intrin_wasm.hpp
+++ b/modules/core/include/opencv2/core/hal/intrin_wasm.hpp
@@ -2742,6 +2742,8 @@ OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_uint16x8, u16x8, i16x8)
 OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_int16x8, i16x8, i16x8)
 OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_uint32x4, u32x4, i32x4)
 OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_int32x4, i32x4, i32x4)
+OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_uint64x2, u64x2, i64x2)
+OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_int64x2, i64x2, i64x2)
 OPENCV_HAL_IMPL_WASM_INIT_CMP_OP(v_float32x4, f32x4, f32x4)

 #ifdef __wasm_unimplemented_simd128__
@@ -2762,15 +2764,6 @@ OPENCV_HAL_IMPL_INIT_FALLBACK_CMP_OP(v_float64x2, <=)
 OPENCV_HAL_IMPL_INIT_FALLBACK_CMP_OP(v_float64x2, >=)
 #endif

-#define OPENCV_HAL_IMPL_WASM_64BIT_CMP_OP(_Tpvec, cast) \
-inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \
-{ return cast(v_reinterpret_as_f64(a) == v_reinterpret_as_f64(b)); } \
-inline _Tpvec operator != (const _Tpvec& a, const _Tpvec& b) \
-{ return cast(v_reinterpret_as_f64(a) != v_reinterpret_as_f64(b)); }
-
-OPENCV_HAL_IMPL_WASM_64BIT_CMP_OP(v_uint64x2, v_reinterpret_as_u64)
-OPENCV_HAL_IMPL_WASM_64BIT_CMP_OP(v_int64x2, v_reinterpret_as_s64)
-
 inline v_float32x4 v_not_nan(const v_float32x4& a)
 {
     v128_t z = wasm_i32x4_splat(0x7fffffff);

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

#define OPENCV_HAL_IMPL_SSE_64BIT_CMP_OP(_Tpvec) \
inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \
{ __m128i cmp = _mm_cmpeq_epi32(a.val, b.val); \
return _Tpvec(_mm_or_si128(cmp, _mm_shuffle_epi32(cmp, _MM_SHUFFLE(2, 3, 0, 1)))); } \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or

low 32-bits are equal "or" high 32-bits are equal?

Probably operator != should be defined first through "or".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually if either low or upper are set, the whole 64-bit register needs to be set. If not, functions like v_check_any/all will not work

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure what you are suggesting in your above comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix _mm_or_si128 => _mm_and_si128() is enough.

Example:

v_int64x2 a = v_setall_s64(0x111223344);
v_int64x2 b = v_setall_s64(0x011223344);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It should be AND and not OR.

I'll add some test cases.

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it would be nice to add test case for that.

#define OPENCV_HAL_IMPL_SSE_64BIT_CMP_OP(_Tpvec) \
inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \
{ __m128i cmp = _mm_cmpeq_epi32(a.val, b.val); \
return _Tpvec(_mm_or_si128(cmp, _mm_shuffle_epi32(cmp, _MM_SHUFFLE(2, 3, 0, 1)))); } \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix _mm_or_si128 => _mm_and_si128() is enough.

Example:

v_int64x2 a = v_setall_s64(0x111223344);
v_int64x2 b = v_setall_s64(0x011223344);

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 18, 2019

WASM

Looks like there is no wasm_i64x2_eq() yet:
https://github.com/emscripten-core/emscripten/blob/incoming/system/include/wasm_simd128.h

Lets update WASM backend in a separate PR (test would help to catch error there too).

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 18, 2019

merge conflict

Your branch is based on very old commit from 2019-08-08
Please avoid moving code - put fresh code near the end of the file.

# add upstream (once)
git remote add upstream	https://github.com/opencv/opencv.git
# this is very useful for conflicts resolving: git config --global merge.conflictstyle diff3

# rebase current branch on latest code
git fetch -t upstream master:master 3.4:3.4
git rebase -i upstream/3.4
# in case of conflict: edit files, "git add <modified files>", "git rebase --continue"

# update with --force
git push origin <your_branch_name_like_bugInt64x2Comparison> -f

# In the future: starting of new bugfix branch:
git fetch -t upstream master:master 3.4:3.4
git checkout -B my_new_fix upstream/3.4

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 18, 2019

Please ignore errors from Custom/AVX512 builder - they are not related to this PR.

.test_loadstore()
.test_addsub()
#if CV_SIMD_64F
.test_cmp64()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should avoid this check for v_uint64 types.
@terfendail Could you take a look?

Copy link
Copy Markdown
Contributor Author

@ChipKerchner ChipKerchner Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is for NEON comparison of v_int64x2 and v_uint64x2 does NOT exist unless CV_SIMD_64F.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.
In theory, CV_SIMD_64F should guard using of vectors of doubles and their operations.
Additionally, it guards some 64u/64s operations and some operations with 32f (like div) on NEON.

Perhaps we should address that in a separate PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first look It seems like there is no reason to guard 64u/64s for NEON. This intrinsics doesn't use specific 64f related instructions. Maybe it make sense to change NEON guard scope?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlying instructions are available on AARCH64 only. Removing guard breaks ARMv7 builds.

Probably we need dedicated macro for this functionality (btw, == can be emulated through 32-bit (like SSE2/3), but less/greater comparisons are not easy to re-implement).

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! Thank you 👍

@alalek alalek merged commit 5a6a494 into opencv:3.4 Oct 22, 2019
@alalek alalek mentioned this pull request Oct 24, 2019
@ChipKerchner ChipKerchner deleted the bugInt64x2Comparison branch November 5, 2019 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants