Skip to content

fix to convert float32 to int32/uint32 with rounding to nearest (ties to even).#24271

Merged
asmorkalov merged 11 commits intoopencv:4.xfrom
Kumataro:fix24163
Dec 25, 2023
Merged

fix to convert float32 to int32/uint32 with rounding to nearest (ties to even).#24271
asmorkalov merged 11 commits intoopencv:4.xfrom
Kumataro:fix24163

Conversation

@Kumataro
Copy link
Copy Markdown
Contributor

fix #24163

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

(carotene is BSD)

@Kumataro
Copy link
Copy Markdown
Contributor Author

Hello, at Ubuntu2004-ARM64-Debug, It looks like to failed to save unit test result.

The test itself appears to complete without any problems.
I believe this is not side-effect at this fix.

Could you please confirm that this test environment works properly?

Container for artifact "junit-html-ubuntu20-arm64" successfully created. Starting upload of file(s)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
An error has been caught http-client index 0, retrying the upload
Error: read ECONNRESET

@opencv-alalek opencv-alalek added bug optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Sep 15, 2023
@opencv-alalek opencv-alalek added this to the 4.9.0 milestone Sep 21, 2023
@mshabunin
Copy link
Copy Markdown
Contributor

What about cvRound? Is it correct?

#elif defined __GNUC__ && defined __arm__ && (defined __ARM_PCS_VFP || defined __ARM_VFPV3__ || defined __ARM_NEON__) && !defined __SOFTFP__
// 1. general scheme
#define ARM_ROUND(_value, _asm_string) \
int res; \
float temp; \
CV_UNUSED(temp); \
__asm__(_asm_string : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (_value)); \
return res
// 2. version for double
#ifdef __clang__
#define CV_INLINE_ROUND_DBL(value) ARM_ROUND(value, "vcvtr.s32.f64 %[temp], %[value] \n vmov %[res], %[temp]")
#else
#define CV_INLINE_ROUND_DBL(value) ARM_ROUND(value, "vcvtr.s32.f64 %[temp], %P[value] \n vmov %[res], %[temp]")
#endif
// 3. version for float
#define CV_INLINE_ROUND_FLT(value) ARM_ROUND(value, "vcvtr.s32.f32 %[temp], %[value]\n vmov %[res], %[temp]")

As I understand it uses vcvtr on ARMv8 and ARMv7 with VFPv3 support.

@Kumataro
Copy link
Copy Markdown
Contributor Author

Thank you for your comment, I think cvRound() is tested and passed.

ASSERT_EQ(2, cvRound(2.5));
ASSERT_EQ(4, cvRound(3.5));
ASSERT_EQ(-2, cvRound(-2.5));
ASSERT_EQ(-4, cvRound(-3.5));

I'll try to think of ways to improve it a little more.
vcvtr works with the rounding mode with FPCLR register setting.
But FPCLR has no round-to-ties-to-even mode, so it is wonder...

Mat src1 = ( Mat_<uchar>(1, 16) << 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 );
Mat src2 = ( Mat_<uchar>(1, 16) << 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 );
Mat mean = ( src1 + src2 ) / 2;
Mat expected(1,16,CV_8U);
Copy link
Copy Markdown
Contributor

@mshabunin mshabunin Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to use non 16-divisible size to verify that tail processing also works as expected. E.g. 20 elements.

Comon array size can be stored as a separate constant to reduce duplication.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review.
I updated this test including 3x3, 3x4, 3x5, 3x6 cases.

  • 3x3 - 9 elements - float32_4 x 2 + float32
  • 3x4 - 12 elements - float32_4 x 3
  • 3x5 - 15 elements - float32_4 x 3 + float32_2 x 1 + float32 x 1
  • 3x6 - 18 elements - float32_4 x 4 + float32_2 x 1

I think these test are including above test.

  • 3x1 - 3 elements - float32_2 x 1 + float32 x 1
  • 3x2 - 6 elements - float32_4 x 1 + float32x2 x 1

And CV_8S, CV_16U, CV_16S and CV_32S are supported to test.

@Kumataro
Copy link
Copy Markdown
Contributor Author

Thank you for waiting. I'm a little confused.

What about cvRound? Is it correct?

Strictly, it is not correct in this implementation.

And when calculating (1+0)/2, SIMD implementation and cvRound() are mixed.
I would like to avoid additional implementations that slow down cvRound() if possible.


At OpenCV 4.8.0 doc, cvRound is described as Rounds floating-point number to the nearest integer.

https://docs.opencv.org/4.8.0/db/de0/group__core__utils.html#ga085eca238176984a0b72df2818598d85

cvRound seems to work with the floating-point rounding direction. So this description is not correct.

For arm, vcvtr is used. it is instruction to convert double/float to integer with VFP register. "r" means working with the floating-point rounding direction

We can control the floating-point rounding direction with https://en.cppreference.com/w/cpp/numeric/fenv/feround .

So the result of cvRound() is predictable, however it is not only Rounds floating-point number to the nearest integer.

#include <iostream>
#include <cfenv>
#include <opencv2/core.hpp>

void test(std::string title, int mode)
{
  switch(mode)
  {
  case FE_DOWNWARD:
  case FE_TONEAREST:
  case FE_TOWARDZERO:
  case FE_UPWARD:
    std::fesetround(mode);
    break;
  case -1:
  default:
    break;
  }

  std::cout << title << std::endl;
  std::cout << "cvRound(-1.5) = " << cvRound(-1.5) << std::endl;
  std::cout << "cvRound(-0.5) = " << cvRound(-0.5) << std::endl;
  std::cout << "cvRound(-0.2) = " << cvRound(-0.2) << std::endl;
  std::cout << "cvRound( 0.0) = " << cvRound( 0.0) << std::endl;
  std::cout << "cvRound( 0.2) = " << cvRound( 0.2) << std::endl;
  std::cout << "cvRound( 0.5) = " << cvRound( 0.5) << std::endl;
  std::cout << "cvRound( 1.5) = " << cvRound( 1.5) << std::endl;

  cv::Mat ones  = cv::Mat::ones(3,3,CV_8U);
  cv::Mat zeros = cv::Mat::zeros(3,3,CV_8U);
  cv::Mat means = ( ones + zeros ) / 2;
  std::cout << "(1+0)/2 = " << means << std::endl;
  std::cout << std::endl;
}

int main(void)
{
  test("Default",       -1);
  test("FE_DOWNLOAD",   FE_DOWNWARD);
  test("FE_TONREAREST", FE_TONEAREST);
  test("FE_TOWARDZERO", FE_TOWARDZERO);
  test("FE_UPWARD",     FE_UPWARD);

  return 0;
}

Result on x86-64. (at 4.x branch)

Default
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_DOWNLOAD
cvRound(-1.5) = -2
cvRound(-0.5) = -1
cvRound(-0.2) = -1
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_TONREAREST
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_TOWARDZERO
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_UPWARD
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 1
cvRound( 0.5) = 1
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   1]

Result on Rasberry Pi 4 with Ubuntu 23.10(32bit) (at 4.x branch)

Default
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_DOWNLOAD
cvRound(-1.5) = -2
cvRound(-0.5) = -1
cvRound(-0.2) = -1
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_TONREAREST
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_TOWARDZERO
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_UPWARD
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 1
cvRound( 0.5) = 1
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   1]

@vpisarev
Copy link
Copy Markdown
Contributor

vpisarev commented Dec 22, 2023

@Kumataro,
thank you for the patch! Having IEEE754-compliant rounding is a useful feature that would make results on ARM more similar to other platforms. We found, however, that on ARMv7 the code is very heavy and can lead to serious performance degradations.

Can you please do it on ARMv7 and earlier in the following way (the Carotene part is shown, in HAL it should be similar):

...
#if CAROTENE_NEON_ARCH >= 8 /* get ready for ARMv9 */
return vcvtnq_u32_f32(val);
#else
float32x4_t delta = vdupq_n_f32(12582912.0f);
return vcvtq_s32_f32(vsubq_f32(vaddq_f32(val, delta), delta));
#endif

it will produce correct results for inputs within +/-12,582,912, which is plenty enough for coordinates, pixel values etc. Even for values beyond this range the results are reasonable, even though not always correct.
And it should not impact performance.

@vpisarev vpisarev self-requested a review December 22, 2023 07:05
@asmorkalov
Copy link
Copy Markdown
Contributor

@vpisarev could you give a link to some paper to mention in the code.

@Kumataro
Copy link
Copy Markdown
Contributor Author

Kumataro commented Dec 22, 2023

Thank you for your review, I try to fix it.

On RasberryPi 4B (Ubuntu 23.04, 32bit, armv7 kernel), I run opencv_test_imgproc, only 1 test is failed.

[==========] 10409 tests from 217 test cases ran. (262782 ms total)
[  PASSED  ] 10408 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy
[ RUN      ] Imgproc_Hist_Compare.accuracy
/home/kmtr/work/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #70
        seed: 00000000000c5aa2
-----------------------------------
        LOG:
The comparison result using the method #2 (Intersection)
        is inaccurate (=2.10718e+07, should be =2.10718e+07)
test_case_idx = 70

-----------------------------------
        CONSOLE: ..........................................
-----------------------------------

[  FAILED  ] Imgproc_Hist_Compare.accuracy (237 ms)
[----------] 1 test from Imgproc_Hist_Compare (237 ms total)
‘‘‘

@asmorkalov
Copy link
Copy Markdown
Contributor

@Kumataro I appologize for the false-alarm. I found configuration issue on my side. I have the same test failure as yours. It's old issue, not relevant to your PR.

@Kumataro
Copy link
Copy Markdown
Contributor Author

Kumataro commented Dec 22, 2023

@asmorkalov No problem, thank you very much !

I investigated why test is failed. I reproduced failed test with 1a8d37d commit.
It is before current patches, so this is not related with this issue,

commit 1a8d37d19e236036310b055e6baf9d00cac839c1 (HEAD)
Merge: 91cf0d1843 910db5c9b7
Author: Alexander Smorkalov <2536374+asmorkalov@users.noreply.github.com>
Date:   Fri Sep 8 16:33:18 2023 +0300

And at RISC-V, it seems that similar problem is occurred with CV_32S, (0+1)/2 is not same as cvRound(0.5).
But CV_8U, CV_8S. CV_16U, CV_16S are OK. Is it better to remove this test case temporary ?

https://github.com/opencv/opencv/actions/runs/7298088083/job/19888636576?pr=24271

[  FAILED  ] 16 tests, listed below:
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/240, where GetParam() = (CV_32S, 3, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/248, where GetParam() = (CV_32S, 3, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/250, where GetParam() = (CV_32S, 3, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/252, where GetParam() = (CV_32S, 3, 2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/255, where GetParam() = (CV_32S, 4, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/263, where GetParam() = (CV_32S, 4, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/265, where GetParam() = (CV_32S, 4, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/267, where GetParam() = (CV_32S, 4, 2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/270, where GetParam() = (CV_32S, 5, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/278, where GetParam() = (CV_32S, 5, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/280, where GetParam() = (CV_32S, 5, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/282, where GetParam() = (CV_32S, 5, 2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/285, where GetParam() = (CV_32S, 6, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/293, where GetParam() = (CV_32S, 6, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/295, where GetParam() = (CV_32S, 6, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/297, where GetParam() = (CV_32S, 6, 2, -1)

@Kumataro
Copy link
Copy Markdown
Contributor Author

@vpisarev Thank you very mush for your advice !
I added new commit with your suggestion, I believe it works well.

At RISC-V, additional test for this issue is failed. However it is potential problem,.
So I think it is better to relax test temporary.

@asmorkalov
Copy link
Copy Markdown
Contributor

@mshabunin Could you take a look on RISC-V related issues?

@asmorkalov
Copy link
Copy Markdown
Contributor

@Kumataro Please disable the test for RISC-V for now in the following way:

// Issue: https://github.com/opencv/opencv/issues/24746
#if defined __riscv
TEST_P(Core_Arith_Regression24163, _DISABLED_test_for_ties_to_even)
#else
TEST_P(Core_Arith_Regression24163, test_for_ties_to_even)
#endif

@Kumataro
Copy link
Copy Markdown
Contributor Author

Imgproc_Hist_Compare.accuracy is failed at recent/current 4.x branch.
So it is potential problem, not degrade occurred by this patch.

kmtr@ubuntu:~/work/build4-main$ uname -a && ./bin/opencv_version --v | head -5 && ./bin/opencv_test_imgproc --gtest_filter="Imgproc_Hist_Compare*"
Linux ubuntu 6.2.0-1013-raspi #15-Ubuntu SMP PREEMPT Fri Sep  8 17:00:11 UTC 2023 armv7l armv7l armv7l GNU/Linux

General configuration for OpenCV 4.8.0-dev =====================================
  Version control:               4.8.0-559-g853e5dfcdf

  Platform:
CTEST_FULL_OUTPUT
OpenCV version: 4.8.0-dev
OpenCV VCS version: 4.8.0-559-g853e5dfcdf
Build type: Release
Compiler: /usr/bin/c++  (ver 12.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: NEON
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = Imgproc_Hist_Compare*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Hist_Compare
[ RUN      ] Imgproc_Hist_Compare.accuracy
/home/kmtr/work/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #70
        seed: 00000000000c5aa2
-----------------------------------
        LOG:
The comparison result using the method #2 (Intersection)
        is inaccurate (=2.10718e+07, should be =2.10718e+07)
test_case_idx = 70

-----------------------------------
        CONSOLE: ..........................................
-----------------------------------

[  FAILED  ] Imgproc_Hist_Compare.accuracy (237 ms)
[----------] 1 test from Imgproc_Hist_Compare (237 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (238 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

 1 FAILED TEST

@Kumataro
Copy link
Copy Markdown
Contributor Author

I fix to use "DISABLE" prefix instead of "_DISABLE". I think it works well.

@vpisarev
Copy link
Copy Markdown
Contributor

vpisarev commented Dec 22, 2023

@vpisarev could you give a link to some paper to mention in the code.

@asmorkalov, for many years OpenCV used a similar trick for more or less efficient implementation of cvRound(). In general, to properly round IEEE754-compliant number, which has M bits in mantissa representation (not counting the implicit 1-bit), to the nearest integer, you need to add and then subtract a magic number (3 << (M-1)). 32-bit floating-point number has 23 bits in mantissa, so the delta is (3 << 22). For double it's (3 << 51), for half it's (3 << 9).

Here is the corresponding stack overflow question where you may find some explanation of the trick: https://stackoverflow.com/questions/17035464/a-fast-method-to-round-a-double-to-a-32-bit-int-explained.

{
static const int32x4_t v_sign = vdupq_n_s32(1 << 31),
v_05 = vreinterpretq_s32_f32(vdupq_n_f32(0.5f));
static const float32x4_t v0_0_f32 = vdupq_n_f32(0.0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kumataro, can you please fix HAL implementation of v_round as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vpisarev

OK, I fixed it and committed.
However, I ran opencv_test_core, there are 1 failed.
I believe it doesn't used v_round().I think this is a potential problem.

[----------] Global test environment tear-down
[ SKIPSTAT ] 86 tests skipped
[ SKIPSTAT ] TAG='mem_2gb' skip 1 tests
[ SKIPSTAT ] TAG='skip_other' skip 85 tests
[==========] 11949 tests from 264 test cases ran. (351138 ms total)
[  PASSED  ] 11948 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)
Note: Google Test filter = Core/HasNonZeroLimitValues*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from Core/HasNonZeroLimitValues
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/0, where GetParam() = (5, 1x1)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/0 (0 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)
/home/kmtr/work/opencv/modules/core/test/test_hasnonzero.cpp:109: Failure
Value of: hasNonZero(m)
  Actual: false
Expected: true
[  FAILED  ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240) (1 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/2, where GetParam() = (5, 127x113)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/2 (0 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/3, where GetParam() = (5, 1x113)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/3 (1 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/4, where GetParam() = (6, 1x1)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/4 (0 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/5, where GetParam() = (6, 320x240)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/5 (2 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/6, where GetParam() = (6, 127x113)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/6 (0 ms)
[ RUN      ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/7, where GetParam() = (6, 1x113)
[       OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/7 (0 ms)
[----------] 8 tests from Core/HasNonZeroLimitValues (4 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (5 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)

 1 FAILED TEST

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I cleared it is not related with this work.
(Some code is updated, however almost are not changed.)
I believe we can split from this issue.

static bool hasNonZero32f( const float* src, size_t len )
{
bool res = false;
const float* srcEnd = src+len;
#if (CV_SIMD || CV_SIMD_SCALABLE)
typedef v_float32 v_type;
const v_type v_zero = vx_setzero_f32();
constexpr const int unrollCount = 8;
int step = VTraits<v_type>::vlanes() * unrollCount;
int len0 = len & -step;
const float* srcSimdEnd = src+len0;

I disabled L196 to stop using simd, same this test is passed.
And those code is simple, loading from memory and or-ing registers, checking to equal it with zero. So this logic is not related currently fixing.

More investigation is required, but it is related with only special cases(denorm_min).

TEST_P(HasNonZeroLimitValues, hasNonZeroLimitValues)
{
const int type = std::get<0>(GetParam());
const Size size = std::get<1>(GetParam());
Mat m = Mat(size, type);
m.setTo(Scalar::all(std::numeric_limits<double>::infinity()));
EXPECT_TRUE(hasNonZero(m));
m.setTo(Scalar::all(-std::numeric_limits<double>::infinity()));
EXPECT_TRUE(hasNonZero(m));
m.setTo(Scalar::all(std::numeric_limits<double>::quiet_NaN()));
EXPECT_TRUE(hasNonZero(m));
m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::epsilon()) : Scalar::all(std::numeric_limits<float>::epsilon()));
EXPECT_TRUE(hasNonZero(m));
m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::min()) : Scalar::all(std::numeric_limits<float>::min()));
EXPECT_TRUE(hasNonZero(m));
m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::denorm_min()) : Scalar::all(std::numeric_limits<float>::denorm_min()));
EXPECT_TRUE(hasNonZero(m));
}

*/

// See https://github.com/opencv/opencv/pull/24271#issuecomment-1867318007
#define ROUND_DELTA (12582912.0f)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add CAROTENE prefix to the macro, ROUND_DELTA name is too generic and may conflict with something else

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I agree with you, I fixed it, thank you !

@Kumataro
Copy link
Copy Markdown
Contributor Author

Test result with e50fa1a

RasberryPi 4B (Ubuntu 23.04 32bit/armv7)
Additional cmake option -DENABLE_NEON=ON

core

[----------] Global test environment tear-down
[ SKIPSTAT ] 86 tests skipped
[ SKIPSTAT ] TAG='mem_2gb' skip 1 tests
[ SKIPSTAT ] TAG='skip_other' skip 85 tests
[==========] 11949 tests from 264 test cases ran. (351138 ms total)
[  PASSED  ] 11948 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)

imgproc

[----------] Global test environment tear-down
[ SKIPSTAT ] 1 tests skipped
[ SKIPSTAT ] TAG='verylong' skip 1 tests
[==========] 10409 tests from 217 test cases ran. (267628 ms total)
[  PASSED  ] 10408 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

imgcodecs

[----------] Global test environment tear-down
[ SKIPSTAT ] 4 tests skipped
[ SKIPSTAT ] TAG='mem_2gb' skip 0 tests (2 times in extra skip list)
[ SKIPSTAT ] TAG='verylong' skip 2 tests
[ SKIPSTAT ] TAG='skip_other' skip 2 tests
[==========] 454 tests from 26 test cases ran. (168672 ms total)
[  PASSED  ] 454 tests.

@vpisarev
Copy link
Copy Markdown
Contributor

let's see what our CI bots will say. It's stange that histogram test fails after modification of round(). I thought that histogram bin is computed using floor() function, not round(), since it needs to find n, such that n <= (x-lower_bound)*nbins/(upper_bound - lower_bound) < n+1 and then n = floor((x - lower_bound)*nbins/(upper_bound - lower_bound)).

@Kumataro
Copy link
Copy Markdown
Contributor Author

Kumataro commented Dec 24, 2023

let's see what our CI bots will say. It's stange that histogram test fails after modification of round()

My recognize is it occurs with 4.x branch code potentially. So test fault exists before this fix.

kmtr@ubuntu:~/work2/opencv$ git log -n 1
commit 953dddd26b5cdc32fc9c2fc21fdedef1fa6fb04d (HEAD -> 4.x, origin/HEAD, origin/4.x)
Merge: f399bdfa1a 53cd921ab4
Author: Alexander Smorkalov <2536374+asmorkalov@users.noreply.github.com>
Date:   Fri Dec 22 17:04:46 2023 +0300

    Merge pull request #24747 from asmorkalov:as/tune_vitb_cuda

    Increate Vit_b test threshold a bit for CUDA FP16.

Without ENABLE_NEON=ON => Passed

kmtr@ubuntu:~/work2/build4-main$ uname -a && ./bin/opencv_version --v | head -5 && ./bin/opencv_test_imgproc --gtest_filter="Imgproc_Hist_Compare*"
Linux ubuntu 6.2.0-1013-raspi #15-Ubuntu SMP PREEMPT Fri Sep  8 17:00:11 UTC 2023 armv7l armv7l armv7l GNU/Linux

General configuration for OpenCV 4.8.0-dev =====================================
  Version control:               4.8.0-570-g953dddd26b

  Platform:
CTEST_FULL_OUTPUT
OpenCV version: 4.8.0-dev
OpenCV VCS version: 4.8.0-570-g953dddd26b
Build type: Release
Compiler: /usr/bin/c++  (ver 12.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: N/A
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = Imgproc_Hist_Compare*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Hist_Compare
[ RUN      ] Imgproc_Hist_Compare.accuracy
[       OK ] Imgproc_Hist_Compare.accuracy (291 ms)
[----------] 1 test from Imgproc_Hist_Compare (291 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (292 ms total)
[  PASSED  ] 1 test.

With ENABLE_NEON=ON => Failed

kmtr@ubuntu:~/work2/build4-main$ uname -a && ./bin/opencv_version --v | head -5 && ./bin/opencv_test_imgproc --gtest_filter="Imgproc_Hist_Compare*"
Linux ubuntu 6.2.0-1013-raspi #15-Ubuntu SMP PREEMPT Fri Sep  8 17:00:11 UTC 2023 armv7l armv7l armv7l GNU/Linux

General configuration for OpenCV 4.8.0-dev =====================================
  Version control:               4.8.0-570-g953dddd26b

  Platform:
CTEST_FULL_OUTPUT
OpenCV version: 4.8.0-dev
OpenCV VCS version: 4.8.0-570-g953dddd26b
Build type: Release
Compiler: /usr/bin/c++  (ver 12.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: NEON
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = Imgproc_Hist_Compare*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Hist_Compare
[ RUN      ] Imgproc_Hist_Compare.accuracy
/home/kmtr/work2/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #70
        seed: 00000000000c5aa2
-----------------------------------
        LOG:
The comparison result using the method #2 (Intersection)
        is inaccurate (=2.10718e+07, should be =2.10718e+07)
test_case_idx = 70

-----------------------------------
        CONSOLE: ..........................................
-----------------------------------

[  FAILED  ] Imgproc_Hist_Compare.accuracy (238 ms)
[----------] 1 test from Imgproc_Hist_Compare (239 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (239 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

 1 FAILED TEST

@Kumataro
Copy link
Copy Markdown
Contributor Author

I'm sorry, but do I have misrecognize about what is problem ?
All I can see CI test are passed, and in default, armv7 target are not tested.

@vpisarev
Copy link
Copy Markdown
Contributor

vpisarev commented Dec 24, 2023

@Kumataro, I suggest to do extensive testing of your patch by temporarily switching to the new round even for ARMv8. If your new implementations of SIMD round(), please, temporary change

#if <armv8>
return vcvtnq_...();
#else
// round trick ...
#endif

to

#if 0
return vcvtnq_...();
#else
// round trick ...
#endif

As long as it's passed, change it back and we will merge it.

@Kumataro
Copy link
Copy Markdown
Contributor Author

@vpisarev
OK, I pushed it.

@asmorkalov
Copy link
Copy Markdown
Contributor

Imgproc_Hist_Compare.accuracy failed on ARMv7 before patch. It's old issue.

@Kumataro
Copy link
Copy Markdown
Contributor Author

Ok, almost tests (including ARM64) are passed, last 1 is default test (it is not started.) .

Can I revert temporary commit ?

@Kumataro
Copy link
Copy Markdown
Contributor Author

As long as it's passed, change it back and we will merge it.

Now that we have confirmed that we are passing the necessary ARMv8 tests, I will revert the temporary fix.

@vpisarev vpisarev self-requested a review December 25, 2023 00:37
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov asmorkalov merged commit dba7186 into opencv:4.x Dec 25, 2023
@Kumataro
Copy link
Copy Markdown
Contributor Author

I would like to express my deepest gratitude to everyone for their help in completing this pull request.
Thank you very much !!!

@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Fix to convert float32 to int32/uint32 with rounding to nearest (ties to even). opencv#24271

Fix opencv#24163

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

(carotene is BSD)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Carotene (ARM HAL) uses wrong rounding in some places

5 participants