fix to convert float32 to int32/uint32 with rounding to nearest (ties to even). by Kumataro · Pull Request #24271 · opencv/opencv

Kumataro · 2023-09-14T11:47:47Z

fix #24163

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

(carotene is BSD)

… to even).

Kumataro · 2023-09-14T20:42:50Z

Hello, at Ubuntu2004-ARM64-Debug, It looks like to failed to save unit test result.

The test itself appears to complete without any problems.
I believe this is not side-effect at this fix.

Could you please confirm that this test environment works properly?

Container for artifact "junit-html-ubuntu20-arm64" successfully created. Starting upload of file(s)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
An error has been caught http-client index 0, retrying the upload
Error: read ECONNRESET

mshabunin · 2023-12-12T16:18:46Z

What about cvRound? Is it correct?

opencv/modules/core/include/opencv2/core/fast_math.hpp

Lines 87 to 102 in 850be1e

    
           #elif defined __GNUC__ && defined __arm__ && (defined __ARM_PCS_VFP || defined __ARM_VFPV3__ || defined __ARM_NEON__) && !defined __SOFTFP__ 
        
             // 1. general scheme 
        
             #define ARM_ROUND(_value, _asm_string) \ 
        
                 int res; \ 
        
                 float temp; \ 
        
                 CV_UNUSED(temp); \ 
        
                 __asm__(_asm_string : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (_value)); \ 
        
                 return res 
        
             // 2. version for double 
        
             #ifdef __clang__ 
        
                 #define CV_INLINE_ROUND_DBL(value) ARM_ROUND(value, "vcvtr.s32.f64 %[temp], %[value] \n vmov %[res], %[temp]") 
        
             #else 
        
                 #define CV_INLINE_ROUND_DBL(value) ARM_ROUND(value, "vcvtr.s32.f64 %[temp], %P[value] \n vmov %[res], %[temp]") 
        
             #endif 
        
             // 3. version for float 
        
             #define CV_INLINE_ROUND_FLT(value) ARM_ROUND(value, "vcvtr.s32.f32 %[temp], %[value]\n vmov %[res], %[temp]")

As I understand it uses vcvtr on ARMv8 and ARMv7 with VFPv3 support.

Kumataro · 2023-12-12T21:12:15Z

Thank you for your comment, I think cvRound() is tested and passed.

opencv/modules/core/test/test_arithm.cpp

Lines 1686 to 1689 in 850be1e

    
           ASSERT_EQ(2, cvRound(2.5)); 
        
           ASSERT_EQ(4, cvRound(3.5)); 
        
           ASSERT_EQ(-2, cvRound(-2.5)); 
        
           ASSERT_EQ(-4, cvRound(-3.5));

I'll try to think of ways to improve it a little more.
vcvtr works with the rounding mode with FPCLR register setting.
But FPCLR has no round-to-ties-to-even mode, so it is wonder...

mshabunin · 2023-12-13T11:31:23Z

modules/core/test/test_operations.cpp

+        Mat src1 = ( Mat_<uchar>(1, 16) <<  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 );
+        Mat src2 = ( Mat_<uchar>(1, 16) <<  1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  1,  1,  1,  1,  1,  1 );
+        Mat mean = ( src1 + src2 ) / 2;
+        Mat expected(1,16,CV_8U);


Maybe it would be better to use non 16-divisible size to verify that tail processing also works as expected. E.g. 20 elements.

Comon array size can be stored as a separate constant to reduce duplication.

Thank you for your review.
I updated this test including 3x3, 3x4, 3x5, 3x6 cases.

3x3 - 9 elements - float32_4 x 2 + float32

3x4 - 12 elements - float32_4 x 3

3x5 - 15 elements - float32_4 x 3 + float32_2 x 1 + float32 x 1

3x6 - 18 elements - float32_4 x 4 + float32_2 x 1

I think these test are including above test.

3x1 - 3 elements - float32_2 x 1 + float32 x 1

3x2 - 6 elements - float32_4 x 1 + float32x2 x 1

And CV_8S, CV_16U, CV_16S and CV_32S are supported to test.

Kumataro · 2023-12-15T23:03:34Z

Thank you for waiting. I'm a little confused.

What about cvRound? Is it correct?

Strictly, it is not correct in this implementation.

And when calculating (1+0)/2, SIMD implementation and cvRound() are mixed.
I would like to avoid additional implementations that slow down cvRound() if possible.

At OpenCV 4.8.0 doc, cvRound is described as Rounds floating-point number to the nearest integer.

https://docs.opencv.org/4.8.0/db/de0/group__core__utils.html#ga085eca238176984a0b72df2818598d85

cvRound seems to work with the floating-point rounding direction. So this description is not correct.

For arm, vcvtr is used. it is instruction to convert double/float to integer with VFP register. "r" means working with the floating-point rounding direction

We can control the floating-point rounding direction with https://en.cppreference.com/w/cpp/numeric/fenv/feround .

So the result of cvRound() is predictable, however it is not only Rounds floating-point number to the nearest integer.

#include <iostream>
#include <cfenv>
#include <opencv2/core.hpp>

void test(std::string title, int mode)
{
  switch(mode)
  {
  case FE_DOWNWARD:
  case FE_TONEAREST:
  case FE_TOWARDZERO:
  case FE_UPWARD:
    std::fesetround(mode);
    break;
  case -1:
  default:
    break;
  }

  std::cout << title << std::endl;
  std::cout << "cvRound(-1.5) = " << cvRound(-1.5) << std::endl;
  std::cout << "cvRound(-0.5) = " << cvRound(-0.5) << std::endl;
  std::cout << "cvRound(-0.2) = " << cvRound(-0.2) << std::endl;
  std::cout << "cvRound( 0.0) = " << cvRound( 0.0) << std::endl;
  std::cout << "cvRound( 0.2) = " << cvRound( 0.2) << std::endl;
  std::cout << "cvRound( 0.5) = " << cvRound( 0.5) << std::endl;
  std::cout << "cvRound( 1.5) = " << cvRound( 1.5) << std::endl;

  cv::Mat ones  = cv::Mat::ones(3,3,CV_8U);
  cv::Mat zeros = cv::Mat::zeros(3,3,CV_8U);
  cv::Mat means = ( ones + zeros ) / 2;
  std::cout << "(1+0)/2 = " << means << std::endl;
  std::cout << std::endl;
}

int main(void)
{
  test("Default",       -1);
  test("FE_DOWNLOAD",   FE_DOWNWARD);
  test("FE_TONREAREST", FE_TONEAREST);
  test("FE_TOWARDZERO", FE_TOWARDZERO);
  test("FE_UPWARD",     FE_UPWARD);

  return 0;
}

Result on x86-64. (at 4.x branch)

Default
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_DOWNLOAD
cvRound(-1.5) = -2
cvRound(-0.5) = -1
cvRound(-0.2) = -1
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_TONREAREST
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_TOWARDZERO
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  0,   0,   0;
   0,   0,   0;
   0,   0,   0]

FE_UPWARD
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 1
cvRound( 0.5) = 1
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   1]

Result on Rasberry Pi 4 with Ubuntu 23.10(32bit) (at 4.x branch)

Default
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_DOWNLOAD
cvRound(-1.5) = -2
cvRound(-0.5) = -1
cvRound(-0.2) = -1
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_TONREAREST
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_TOWARDZERO
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   0]

FE_UPWARD
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 1
cvRound( 0.5) = 1
cvRound( 1.5) = 2
(1+0)/2 = [  1,   1,   1;
   1,   1,   1;
   1,   1,   1]

3rdparty/carotene/src/convert_scale.cpp

vpisarev · 2023-12-22T07:05:02Z

@Kumataro,
thank you for the patch! Having IEEE754-compliant rounding is a useful feature that would make results on ARM more similar to other platforms. We found, however, that on ARMv7 the code is very heavy and can lead to serious performance degradations.

Can you please do it on ARMv7 and earlier in the following way (the Carotene part is shown, in HAL it should be similar):

...
#if CAROTENE_NEON_ARCH >= 8 /* get ready for ARMv9 */
return vcvtnq_u32_f32(val);
#else
float32x4_t delta = vdupq_n_f32(12582912.0f);
return vcvtq_s32_f32(vsubq_f32(vaddq_f32(val, delta), delta));
#endif

it will produce correct results for inputs within +/-12,582,912, which is plenty enough for coordinates, pixel values etc. Even for values beyond this range the results are reasonable, even though not always correct.
And it should not impact performance.

asmorkalov · 2023-12-22T08:18:32Z

@vpisarev could you give a link to some paper to mention in the code.

Kumataro · 2023-12-22T09:01:11Z

Thank you for your review, I try to fix it.

On RasberryPi 4B (Ubuntu 23.04, 32bit, armv7 kernel), I run opencv_test_imgproc, only 1 test is failed.

[==========] 10409 tests from 217 test cases ran. (262782 ms total)
[  PASSED  ] 10408 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

[ RUN      ] Imgproc_Hist_Compare.accuracy
/home/kmtr/work/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #70
        seed: 00000000000c5aa2
-----------------------------------
        LOG:
The comparison result using the method #2 (Intersection)
        is inaccurate (=2.10718e+07, should be =2.10718e+07)
test_case_idx = 70

-----------------------------------
        CONSOLE: ..........................................
-----------------------------------

[  FAILED  ] Imgproc_Hist_Compare.accuracy (237 ms)
[----------] 1 test from Imgproc_Hist_Compare (237 ms total)
‘‘‘

asmorkalov · 2023-12-22T09:12:54Z

@Kumataro I appologize for the false-alarm. I found configuration issue on my side. I have the same test failure as yours. It's old issue, not relevant to your PR.

Kumataro · 2023-12-22T10:08:19Z

@asmorkalov No problem, thank you very much !

I investigated why test is failed. I reproduced failed test with 1a8d37d commit.
It is before current patches, so this is not related with this issue,

commit 1a8d37d19e236036310b055e6baf9d00cac839c1 (HEAD)
Merge: 91cf0d1843 910db5c9b7
Author: Alexander Smorkalov <2536374+asmorkalov@users.noreply.github.com>
Date:   Fri Sep 8 16:33:18 2023 +0300

And at RISC-V, it seems that similar problem is occurred with CV_32S, (0+1)/2 is not same as cvRound(0.5).
But CV_8U, CV_8S. CV_16U, CV_16S are OK. Is it better to remove this test case temporary ?

https://github.com/opencv/opencv/actions/runs/7298088083/job/19888636576?pr=24271

[  FAILED  ] 16 tests, listed below:
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/240, where GetParam() = (CV_32S, 3, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/248, where GetParam() = (CV_32S, 3, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/250, where GetParam() = (CV_32S, 3, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/252, where GetParam() = (CV_32S, 3, 2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/255, where GetParam() = (CV_32S, 4, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/263, where GetParam() = (CV_32S, 4, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/265, where GetParam() = (CV_32S, 4, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/267, where GetParam() = (CV_32S, 4, 2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/270, where GetParam() = (CV_32S, 5, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/278, where GetParam() = (CV_32S, 5, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/280, where GetParam() = (CV_32S, 5, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/282, where GetParam() = (CV_32S, 5, 2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/285, where GetParam() = (CV_32S, 6, -2, -1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/293, where GetParam() = (CV_32S, 6, 0, 1)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/295, where GetParam() = (CV_32S, 6, 1, 0)
[  FAILED  ] Core_Arith_Regression24163.test_for_ties_to_even/297, where GetParam() = (CV_32S, 6, 2, -1)

Kumataro · 2023-12-22T10:25:57Z

@vpisarev Thank you very mush for your advice !
I added new commit with your suggestion, I believe it works well.

At RISC-V, additional test for this issue is failed. However it is potential problem,.
So I think it is better to relax test temporary.

asmorkalov · 2023-12-22T10:26:37Z

@mshabunin Could you take a look on RISC-V related issues?

asmorkalov · 2023-12-22T10:32:57Z

@Kumataro Please disable the test for RISC-V for now in the following way:

// Issue: https://github.com/opencv/opencv/issues/24746
#if defined __riscv
TEST_P(Core_Arith_Regression24163, _DISABLED_test_for_ties_to_even)
#else
TEST_P(Core_Arith_Regression24163, test_for_ties_to_even)
#endif

Kumataro · 2023-12-22T11:05:21Z

Imgproc_Hist_Compare.accuracy is failed at recent/current 4.x branch.
So it is potential problem, not degrade occurred by this patch.

kmtr@ubuntu:~/work/build4-main$ uname -a && ./bin/opencv_version --v | head -5 && ./bin/opencv_test_imgproc --gtest_filter="Imgproc_Hist_Compare*"
Linux ubuntu 6.2.0-1013-raspi #15-Ubuntu SMP PREEMPT Fri Sep  8 17:00:11 UTC 2023 armv7l armv7l armv7l GNU/Linux

General configuration for OpenCV 4.8.0-dev =====================================
  Version control:               4.8.0-559-g853e5dfcdf

  Platform:
CTEST_FULL_OUTPUT
OpenCV version: 4.8.0-dev
OpenCV VCS version: 4.8.0-559-g853e5dfcdf
Build type: Release
Compiler: /usr/bin/c++  (ver 12.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: NEON
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = Imgproc_Hist_Compare*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Hist_Compare
[ RUN      ] Imgproc_Hist_Compare.accuracy
/home/kmtr/work/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #70
        seed: 00000000000c5aa2
-----------------------------------
        LOG:
The comparison result using the method #2 (Intersection)
        is inaccurate (=2.10718e+07, should be =2.10718e+07)
test_case_idx = 70

-----------------------------------
        CONSOLE: ..........................................
-----------------------------------

[  FAILED  ] Imgproc_Hist_Compare.accuracy (237 ms)
[----------] 1 test from Imgproc_Hist_Compare (237 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (238 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

 1 FAILED TEST

Kumataro · 2023-12-22T12:55:49Z

I fix to use "DISABLE" prefix instead of "_DISABLE". I think it works well.

vpisarev · 2023-12-22T22:01:35Z

@vpisarev could you give a link to some paper to mention in the code.

@asmorkalov, for many years OpenCV used a similar trick for more or less efficient implementation of cvRound(). In general, to properly round IEEE754-compliant number, which has M bits in mantissa representation (not counting the implicit 1-bit), to the nearest integer, you need to add and then subtract a magic number (3 << (M-1)). 32-bit floating-point number has 23 bits in mantissa, so the delta is (3 << 22). For double it's (3 << 51), for half it's (3 << 9).

Here is the corresponding stack overflow question where you may find some explanation of the trick: https://stackoverflow.com/questions/17035464/a-fast-method-to-round-a-double-to-a-32-bit-int-explained.

vpisarev · 2023-12-22T22:35:37Z

modules/core/include/opencv2/core/hal/intrin_neon.hpp

 {
-    static const int32x4_t v_sign = vdupq_n_s32(1 << 31),
-        v_05 = vreinterpretq_s32_f32(vdupq_n_f32(0.5f));
+    static const float32x4_t v0_0_f32  = vdupq_n_f32(0.0);


@Kumataro, can you please fix HAL implementation of v_round as well?

@vpisarev

OK, I fixed it and committed.
However, I ran opencv_test_core, there are 1 failed.
I believe it doesn't used v_round().I think this is a potential problem.

[----------] Global test environment tear-down [ SKIPSTAT ] 86 tests skipped [ SKIPSTAT ] TAG='mem_2gb' skip 1 tests [ SKIPSTAT ] TAG='skip_other' skip 85 tests [==========] 11949 tests from 264 test cases ran. (351138 ms total) [ PASSED ] 11948 tests. [ FAILED ] 1 test, listed below: [ FAILED ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)

Note: Google Test filter = Core/HasNonZeroLimitValues* [==========] Running 8 tests from 1 test case. [----------] Global test environment set-up. [----------] 8 tests from Core/HasNonZeroLimitValues [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/0, where GetParam() = (5, 1x1) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/0 (0 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240) /home/kmtr/work/opencv/modules/core/test/test_hasnonzero.cpp:109: Failure Value of: hasNonZero(m) Actual: false Expected: true [ FAILED ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240) (1 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/2, where GetParam() = (5, 127x113) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/2 (0 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/3, where GetParam() = (5, 1x113) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/3 (1 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/4, where GetParam() = (6, 1x1) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/4 (0 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/5, where GetParam() = (6, 320x240) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/5 (2 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/6, where GetParam() = (6, 127x113) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/6 (0 ms) [ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/7, where GetParam() = (6, 1x113) [ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/7 (0 ms) [----------] 8 tests from Core/HasNonZeroLimitValues (4 ms total) [----------] Global test environment tear-down [==========] 8 tests from 1 test case ran. (5 ms total) [ PASSED ] 7 tests. [ FAILED ] 1 test, listed below: [ FAILED ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240) 1 FAILED TEST

OK, I cleared it is not related with this work.
(Some code is updated, however almost are not changed.)
I believe we can split from this issue.

opencv/modules/core/src/has_non_zero.simd.hpp

Lines 192 to 202 in 953dddd

static bool hasNonZero32f( const float* src, size_t len )

{

bool res = false;

const float* srcEnd = src+len;

#if (CV_SIMD || CV_SIMD_SCALABLE)

typedef v_float32 v_type;

const v_type v_zero = vx_setzero_f32();

constexpr const int unrollCount = 8;

int step = VTraits<v_type>::vlanes() * unrollCount;

int len0 = len & -step;

const float* srcSimdEnd = src+len0;

I disabled L196 to stop using simd, same this test is passed.
And those code is simple, loading from memory and or-ing registers, checking to equal it with zero. So this logic is not related currently fixing.

More investigation is required, but it is related with only special cases(denorm_min).

opencv/modules/core/test/test_hasnonzero.cpp

Lines 86 to 110 in 953dddd

TEST_P(HasNonZeroLimitValues, hasNonZeroLimitValues)

{

const int type = std::get<0>(GetParam());

const Size size = std::get<1>(GetParam());

Mat m = Mat(size, type);

m.setTo(Scalar::all(std::numeric_limits<double>::infinity()));

EXPECT_TRUE(hasNonZero(m));

m.setTo(Scalar::all(-std::numeric_limits<double>::infinity()));

EXPECT_TRUE(hasNonZero(m));

m.setTo(Scalar::all(std::numeric_limits<double>::quiet_NaN()));

EXPECT_TRUE(hasNonZero(m));

m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::epsilon()) : Scalar::all(std::numeric_limits<float>::epsilon()));

EXPECT_TRUE(hasNonZero(m));

m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::min()) : Scalar::all(std::numeric_limits<float>::min()));

EXPECT_TRUE(hasNonZero(m));

m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::denorm_min()) : Scalar::all(std::numeric_limits<float>::denorm_min()));

EXPECT_TRUE(hasNonZero(m));

}

vpisarev · 2023-12-22T22:41:21Z

3rdparty/carotene/src/vround_helper.hpp

+ */
+
+// See https://github.com/opencv/opencv/pull/24271#issuecomment-1867318007
+#define ROUND_DELTA (12582912.0f)


Please, add CAROTENE prefix to the macro, ROUND_DELTA name is too generic and may conflict with something else

OK, I agree with you, I fixed it, thank you !

Kumataro · 2023-12-22T23:55:24Z

Test result with e50fa1a

RasberryPi 4B (Ubuntu 23.04 32bit/armv7)
Additional cmake option -DENABLE_NEON=ON

core

[----------] Global test environment tear-down
[ SKIPSTAT ] 86 tests skipped
[ SKIPSTAT ] TAG='mem_2gb' skip 1 tests
[ SKIPSTAT ] TAG='skip_other' skip 85 tests
[==========] 11949 tests from 264 test cases ran. (351138 ms total)
[  PASSED  ] 11948 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)

imgproc

[----------] Global test environment tear-down
[ SKIPSTAT ] 1 tests skipped
[ SKIPSTAT ] TAG='verylong' skip 1 tests
[==========] 10409 tests from 217 test cases ran. (267628 ms total)
[  PASSED  ] 10408 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

imgcodecs

[----------] Global test environment tear-down
[ SKIPSTAT ] 4 tests skipped
[ SKIPSTAT ] TAG='mem_2gb' skip 0 tests (2 times in extra skip list)
[ SKIPSTAT ] TAG='verylong' skip 2 tests
[ SKIPSTAT ] TAG='skip_other' skip 2 tests
[==========] 454 tests from 26 test cases ran. (168672 ms total)
[  PASSED  ] 454 tests.

vpisarev · 2023-12-23T16:56:39Z

let's see what our CI bots will say. It's stange that histogram test fails after modification of round(). I thought that histogram bin is computed using floor() function, not round(), since it needs to find n, such that n <= (x-lower_bound)*nbins/(upper_bound - lower_bound) < n+1 and then n = floor((x - lower_bound)*nbins/(upper_bound - lower_bound)).

Kumataro · 2023-12-24T00:23:23Z

let's see what our CI bots will say. It's stange that histogram test fails after modification of round()

My recognize is it occurs with 4.x branch code potentially. So test fault exists before this fix.

kmtr@ubuntu:~/work2/opencv$ git log -n 1
commit 953dddd26b5cdc32fc9c2fc21fdedef1fa6fb04d (HEAD -> 4.x, origin/HEAD, origin/4.x)
Merge: f399bdfa1a 53cd921ab4
Author: Alexander Smorkalov <2536374+asmorkalov@users.noreply.github.com>
Date:   Fri Dec 22 17:04:46 2023 +0300

    Merge pull request #24747 from asmorkalov:as/tune_vitb_cuda

    Increate Vit_b test threshold a bit for CUDA FP16.

Without ENABLE_NEON=ON => Passed

kmtr@ubuntu:~/work2/build4-main$ uname -a && ./bin/opencv_version --v | head -5 && ./bin/opencv_test_imgproc --gtest_filter="Imgproc_Hist_Compare*"
Linux ubuntu 6.2.0-1013-raspi #15-Ubuntu SMP PREEMPT Fri Sep  8 17:00:11 UTC 2023 armv7l armv7l armv7l GNU/Linux

General configuration for OpenCV 4.8.0-dev =====================================
  Version control:               4.8.0-570-g953dddd26b

  Platform:
CTEST_FULL_OUTPUT
OpenCV version: 4.8.0-dev
OpenCV VCS version: 4.8.0-570-g953dddd26b
Build type: Release
Compiler: /usr/bin/c++  (ver 12.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: N/A
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = Imgproc_Hist_Compare*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Hist_Compare
[ RUN      ] Imgproc_Hist_Compare.accuracy
[       OK ] Imgproc_Hist_Compare.accuracy (291 ms)
[----------] 1 test from Imgproc_Hist_Compare (291 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (292 ms total)
[  PASSED  ] 1 test.

With ENABLE_NEON=ON => Failed

kmtr@ubuntu:~/work2/build4-main$ uname -a && ./bin/opencv_version --v | head -5 && ./bin/opencv_test_imgproc --gtest_filter="Imgproc_Hist_Compare*"
Linux ubuntu 6.2.0-1013-raspi #15-Ubuntu SMP PREEMPT Fri Sep  8 17:00:11 UTC 2023 armv7l armv7l armv7l GNU/Linux

General configuration for OpenCV 4.8.0-dev =====================================
  Version control:               4.8.0-570-g953dddd26b

  Platform:
CTEST_FULL_OUTPUT
OpenCV version: 4.8.0-dev
OpenCV VCS version: 4.8.0-570-g953dddd26b
Build type: Release
Compiler: /usr/bin/c++  (ver 12.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: NEON
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = Imgproc_Hist_Compare*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Hist_Compare
[ RUN      ] Imgproc_Hist_Compare.accuracy
/home/kmtr/work2/opencv/modules/ts/src/ts.cpp:618: Failure
Failed

        failure reason: Bad accuracy
        test case #70
        seed: 00000000000c5aa2
-----------------------------------
        LOG:
The comparison result using the method #2 (Intersection)
        is inaccurate (=2.10718e+07, should be =2.10718e+07)
test_case_idx = 70

-----------------------------------
        CONSOLE: ..........................................
-----------------------------------

[  FAILED  ] Imgproc_Hist_Compare.accuracy (238 ms)
[----------] 1 test from Imgproc_Hist_Compare (239 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (239 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Imgproc_Hist_Compare.accuracy

 1 FAILED TEST

Kumataro · 2023-12-24T01:28:37Z

I'm sorry, but do I have misrecognize about what is problem ?
All I can see CI test are passed, and in default, armv7 target are not tested.

vpisarev · 2023-12-24T08:07:36Z

@Kumataro, I suggest to do extensive testing of your patch by temporarily switching to the new round even for ARMv8. If your new implementations of SIMD round(), please, temporary change

#if <armv8>
return vcvtnq_...();
#else
// round trick ...
#endif

to

#if 0
return vcvtnq_...();
#else
// round trick ...
#endif

As long as it's passed, change it back and we will merge it.

Kumataro · 2023-12-24T09:36:13Z

@vpisarev
OK, I pushed it.

asmorkalov · 2023-12-24T09:44:32Z

Imgproc_Hist_Compare.accuracy failed on ARMv7 before patch. It's old issue.

Kumataro · 2023-12-24T11:07:16Z

Ok, almost tests (including ARM64) are passed, last 1 is default test (it is not started.) .

Can I revert temporary commit ?

This reverts commit c4e151d.

Kumataro · 2023-12-24T14:05:24Z

As long as it's passed, change it back and we will merge it.

Now that we have confirmed that we are passing the necessary ARMv8 tests, I will revert the temporary fix.

asmorkalov

👍

Kumataro · 2023-12-25T09:27:49Z

I would like to express my deepest gratitude to everyone for their help in completing this pull request.
Thank you very much !!!

Fix to convert float32 to int32/uint32 with rounding to nearest (ties to even). opencv#24271 Fix opencv#24163 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake (carotene is BSD)

Kumataro added 3 commits September 13, 2023 07:28

fix to convert float32 to int32/uint32 with rounding to nearest (ties…

1438275

… to even).

update convert_scale.cpp and add CAROTENE_NEON_ARCH option

1ea853b

fix C4244

f13861b

opencv-alalek added bug optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Sep 15, 2023

opencv-alalek requested a review from asmorkalov September 21, 2023 22:07

opencv-alalek added this to the 4.9.0 milestone Sep 21, 2023

mshabunin reviewed Dec 13, 2023

View reviewed changes

core: update regression test for 24163

74ef902

asmorkalov reviewed Dec 21, 2023

View reviewed changes

3rdparty/carotene/src/convert_scale.cpp Show resolved Hide resolved

3rdparty/carotene/src/convert_scale.cpp Show resolved Hide resolved

3rdparty/carotene/src/convert_scale.cpp Show resolved Hide resolved

vpisarev self-requested a review December 22, 2023 07:05

3rdparty: carotene: optimize for armv7

0c8e8e1

asmorkalov mentioned this pull request Dec 22, 2023

Test Core_Arith_Regression24163, test_for_ties_to_even fails on RISC-V #24746

Closed

4 tasks

Kumataro added 2 commits December 22, 2023 11:10

core: disable test_for_ties_to_even for riscv

988d048

refix disable DISABLED_test_for_ties_to_even for riscv

f8be743

vpisarev reviewed Dec 22, 2023

View reviewed changes

Kumataro added 2 commits December 22, 2023 23:07

3rdparty: carotene: refactoring vround_helper

d087c2f

core: hal: fix for review, update v_round()

e50fa1a

Kumataro mentioned this pull request Dec 24, 2023

ARMv7 With NEON, "Imgproc_Hist_Compare" is failed. #24757

Closed

4 tasks

(temporary) force enable round trick for ARMv8

c4e151d

Revert "(temporary) force enable round trick for ARMv8"

70facde

This reverts commit c4e151d.

vpisarev self-requested a review December 25, 2023 00:37

vpisarev approved these changes Dec 25, 2023

View reviewed changes

asmorkalov approved these changes Dec 25, 2023

View reviewed changes

asmorkalov assigned vpisarev Dec 25, 2023

asmorkalov merged commit dba7186 into opencv:4.x Dec 25, 2023

This was referenced Dec 25, 2023

Subset of MathOperatorArithmeticTestFluid/MathOperatorMatScalarTest cases fails on arm64 #19118

Closed

cvtColor with COLOR_YUV2RGB_NV21 get different result on android arm and linux x64 #20432

Closed

asmorkalov mentioned this pull request Jan 19, 2024

5.x merge 4.x #24862

Merged

asmorkalov mentioned this pull request Jun 21, 2024

WIP: Restore cvtColor implementation in Carotene HAL after rounding fixes. #25801

Closed

6 tasks

asmorkalov mentioned this pull request Sep 2, 2024

Got rid of CAROTENE_NEON_ARCH and use standard __ARM_ARCH check #26091

Merged

6 tasks

	static bool hasNonZero32f( const float* src, size_t len )
	{
	bool res = false;
	const float* srcEnd = src+len;
	#if (CV_SIMD \|\| CV_SIMD_SCALABLE)
	typedef v_float32 v_type;
	const v_type v_zero = vx_setzero_f32();
	constexpr const int unrollCount = 8;
	int step = VTraits<v_type>::vlanes() * unrollCount;
	int len0 = len & -step;
	const float* srcSimdEnd = src+len0;

	TEST_P(HasNonZeroLimitValues, hasNonZeroLimitValues)
	{
	const int type = std::get<0>(GetParam());
	const Size size = std::get<1>(GetParam());

	Mat m = Mat(size, type);

	m.setTo(Scalar::all(std::numeric_limits<double>::infinity()));
	EXPECT_TRUE(hasNonZero(m));

	m.setTo(Scalar::all(-std::numeric_limits<double>::infinity()));
	EXPECT_TRUE(hasNonZero(m));

	m.setTo(Scalar::all(std::numeric_limits<double>::quiet_NaN()));
	EXPECT_TRUE(hasNonZero(m));

	m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::epsilon()) : Scalar::all(std::numeric_limits<float>::epsilon()));
	EXPECT_TRUE(hasNonZero(m));

	m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::min()) : Scalar::all(std::numeric_limits<float>::min()));
	EXPECT_TRUE(hasNonZero(m));

	m.setTo((CV_MAT_DEPTH(type) == CV_64F) ? Scalar::all(std::numeric_limits<double>::denorm_min()) : Scalar::all(std::numeric_limits<float>::denorm_min()));
	EXPECT_TRUE(hasNonZero(m));
	}

Uh oh!

Conversation

Kumataro commented Sep 14, 2023

Pull Request Readiness Checklist

Uh oh!

Kumataro commented Sep 14, 2023

Uh oh!

mshabunin commented Dec 12, 2023

Uh oh!

Kumataro commented Dec 12, 2023

Uh oh!

mshabunin Dec 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kumataro Dec 16, 2023

Choose a reason for hiding this comment

Uh oh!

Kumataro commented Dec 15, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vpisarev commented Dec 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asmorkalov commented Dec 22, 2023

Uh oh!

Kumataro commented Dec 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asmorkalov commented Dec 22, 2023

Uh oh!

Kumataro commented Dec 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kumataro commented Dec 22, 2023

Uh oh!

asmorkalov commented Dec 22, 2023

Uh oh!

asmorkalov commented Dec 22, 2023

Uh oh!

Kumataro commented Dec 22, 2023

Uh oh!

Kumataro commented Dec 22, 2023

Uh oh!

vpisarev commented Dec 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vpisarev Dec 22, 2023

Choose a reason for hiding this comment

Uh oh!

Kumataro Dec 22, 2023

Choose a reason for hiding this comment

Uh oh!

Kumataro Dec 23, 2023

Choose a reason for hiding this comment

Uh oh!

vpisarev Dec 22, 2023

Choose a reason for hiding this comment

Uh oh!

Kumataro Dec 23, 2023

Choose a reason for hiding this comment

Uh oh!

Kumataro commented Dec 22, 2023

Uh oh!

vpisarev commented Dec 23, 2023

Uh oh!

Kumataro commented Dec 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kumataro commented Dec 24, 2023

Uh oh!

vpisarev commented Dec 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kumataro commented Dec 24, 2023

Uh oh!

asmorkalov commented Dec 24, 2023

Uh oh!

mshabunin Dec 13, 2023 •

edited

Loading

vpisarev commented Dec 22, 2023 •

edited

Loading

Kumataro commented Dec 22, 2023 •

edited

Loading

Kumataro commented Dec 22, 2023 •

edited

Loading

vpisarev commented Dec 22, 2023 •

edited

Loading

Kumataro commented Dec 24, 2023 •

edited

Loading

vpisarev commented Dec 24, 2023 •

edited

Loading