fix to convert float32 to int32/uint32 with rounding to nearest (ties to even).#24271
fix to convert float32 to int32/uint32 with rounding to nearest (ties to even).#24271asmorkalov merged 11 commits intoopencv:4.xfrom
Conversation
|
Hello, at Ubuntu2004-ARM64-Debug, It looks like to failed to save unit test result. The test itself appears to complete without any problems. Could you please confirm that this test environment works properly? |
|
What about opencv/modules/core/include/opencv2/core/fast_math.hpp Lines 87 to 102 in 850be1e As I understand it uses |
|
Thank you for your comment, I think opencv/modules/core/test/test_arithm.cpp Lines 1686 to 1689 in 850be1e I'll try to think of ways to improve it a little more. |
| Mat src1 = ( Mat_<uchar>(1, 16) << 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ); | ||
| Mat src2 = ( Mat_<uchar>(1, 16) << 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ); | ||
| Mat mean = ( src1 + src2 ) / 2; | ||
| Mat expected(1,16,CV_8U); |
There was a problem hiding this comment.
Maybe it would be better to use non 16-divisible size to verify that tail processing also works as expected. E.g. 20 elements.
Comon array size can be stored as a separate constant to reduce duplication.
There was a problem hiding this comment.
Thank you for your review.
I updated this test including 3x3, 3x4, 3x5, 3x6 cases.
- 3x3 - 9 elements - float32_4 x 2 + float32
- 3x4 - 12 elements - float32_4 x 3
- 3x5 - 15 elements - float32_4 x 3 + float32_2 x 1 + float32 x 1
- 3x6 - 18 elements - float32_4 x 4 + float32_2 x 1
I think these test are including above test.
- 3x1 - 3 elements - float32_2 x 1 + float32 x 1
- 3x2 - 6 elements - float32_4 x 1 + float32x2 x 1
And CV_8S, CV_16U, CV_16S and CV_32S are supported to test.
|
Thank you for waiting. I'm a little confused.
Strictly, it is not correct in this implementation. And when calculating (1+0)/2, SIMD implementation and cvRound() are mixed. At OpenCV 4.8.0 doc, https://docs.opencv.org/4.8.0/db/de0/group__core__utils.html#ga085eca238176984a0b72df2818598d85
For arm, We can control So the result of #include <iostream>
#include <cfenv>
#include <opencv2/core.hpp>
void test(std::string title, int mode)
{
switch(mode)
{
case FE_DOWNWARD:
case FE_TONEAREST:
case FE_TOWARDZERO:
case FE_UPWARD:
std::fesetround(mode);
break;
case -1:
default:
break;
}
std::cout << title << std::endl;
std::cout << "cvRound(-1.5) = " << cvRound(-1.5) << std::endl;
std::cout << "cvRound(-0.5) = " << cvRound(-0.5) << std::endl;
std::cout << "cvRound(-0.2) = " << cvRound(-0.2) << std::endl;
std::cout << "cvRound( 0.0) = " << cvRound( 0.0) << std::endl;
std::cout << "cvRound( 0.2) = " << cvRound( 0.2) << std::endl;
std::cout << "cvRound( 0.5) = " << cvRound( 0.5) << std::endl;
std::cout << "cvRound( 1.5) = " << cvRound( 1.5) << std::endl;
cv::Mat ones = cv::Mat::ones(3,3,CV_8U);
cv::Mat zeros = cv::Mat::zeros(3,3,CV_8U);
cv::Mat means = ( ones + zeros ) / 2;
std::cout << "(1+0)/2 = " << means << std::endl;
std::cout << std::endl;
}
int main(void)
{
test("Default", -1);
test("FE_DOWNLOAD", FE_DOWNWARD);
test("FE_TONREAREST", FE_TONEAREST);
test("FE_TOWARDZERO", FE_TOWARDZERO);
test("FE_UPWARD", FE_UPWARD);
return 0;
}Result on x86-64. (at 4.x branch) Default
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [ 0, 0, 0;
0, 0, 0;
0, 0, 0]
FE_DOWNLOAD
cvRound(-1.5) = -2
cvRound(-0.5) = -1
cvRound(-0.2) = -1
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [ 0, 0, 0;
0, 0, 0;
0, 0, 0]
FE_TONREAREST
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [ 0, 0, 0;
0, 0, 0;
0, 0, 0]
FE_TOWARDZERO
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [ 0, 0, 0;
0, 0, 0;
0, 0, 0]
FE_UPWARD
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 1
cvRound( 0.5) = 1
cvRound( 1.5) = 2
(1+0)/2 = [ 1, 1, 1;
1, 1, 1;
1, 1, 1]Result on Rasberry Pi 4 with Ubuntu 23.10(32bit) (at 4.x branch) Default
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [ 1, 1, 1;
1, 1, 1;
1, 1, 0]
FE_DOWNLOAD
cvRound(-1.5) = -2
cvRound(-0.5) = -1
cvRound(-0.2) = -1
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [ 1, 1, 1;
1, 1, 1;
1, 1, 0]
FE_TONREAREST
cvRound(-1.5) = -2
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 2
(1+0)/2 = [ 1, 1, 1;
1, 1, 1;
1, 1, 0]
FE_TOWARDZERO
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 0
cvRound( 0.5) = 0
cvRound( 1.5) = 1
(1+0)/2 = [ 1, 1, 1;
1, 1, 1;
1, 1, 0]
FE_UPWARD
cvRound(-1.5) = -1
cvRound(-0.5) = 0
cvRound(-0.2) = 0
cvRound( 0.0) = 0
cvRound( 0.2) = 1
cvRound( 0.5) = 1
cvRound( 1.5) = 2
(1+0)/2 = [ 1, 1, 1;
1, 1, 1;
1, 1, 1] |
|
@Kumataro, Can you please do it on ARMv7 and earlier in the following way (the Carotene part is shown, in HAL it should be similar): it will produce correct results for inputs within +/-12,582,912, which is plenty enough for coordinates, pixel values etc. Even for values beyond this range the results are reasonable, even though not always correct. |
|
@vpisarev could you give a link to some paper to mention in the code. |
|
Thank you for your review, I try to fix it. On RasberryPi 4B (Ubuntu 23.04, 32bit, armv7 kernel), I run opencv_test_imgproc, only 1 test is failed. |
|
@Kumataro I appologize for the false-alarm. I found configuration issue on my side. I have the same test failure as yours. It's old issue, not relevant to your PR. |
|
@asmorkalov No problem, thank you very much ! I investigated why test is failed. I reproduced failed test with And at RISC-V, it seems that similar problem is occurred with CV_32S, (0+1)/2 is not same as cvRound(0.5). https://github.com/opencv/opencv/actions/runs/7298088083/job/19888636576?pr=24271 |
|
@vpisarev Thank you very mush for your advice ! At RISC-V, additional test for this issue is failed. However it is potential problem,. |
|
@mshabunin Could you take a look on RISC-V related issues? |
|
@Kumataro Please disable the test for RISC-V for now in the following way: |
|
|
|
I fix to use "DISABLE" prefix instead of "_DISABLE". I think it works well. |
@asmorkalov, for many years OpenCV used a similar trick for more or less efficient implementation of cvRound(). In general, to properly round IEEE754-compliant number, which has M bits in mantissa representation (not counting the implicit 1-bit), to the nearest integer, you need to add and then subtract a magic number (3 << (M-1)). 32-bit floating-point number has 23 bits in mantissa, so the delta is (3 << 22). For double it's (3 << 51), for half it's (3 << 9). Here is the corresponding stack overflow question where you may find some explanation of the trick: https://stackoverflow.com/questions/17035464/a-fast-method-to-round-a-double-to-a-32-bit-int-explained. |
| { | ||
| static const int32x4_t v_sign = vdupq_n_s32(1 << 31), | ||
| v_05 = vreinterpretq_s32_f32(vdupq_n_f32(0.5f)); | ||
| static const float32x4_t v0_0_f32 = vdupq_n_f32(0.0); |
There was a problem hiding this comment.
@Kumataro, can you please fix HAL implementation of v_round as well?
There was a problem hiding this comment.
OK, I fixed it and committed.
However, I ran opencv_test_core, there are 1 failed.
I believe it doesn't used v_round().I think this is a potential problem.
[----------] Global test environment tear-down
[ SKIPSTAT ] 86 tests skipped
[ SKIPSTAT ] TAG='mem_2gb' skip 1 tests
[ SKIPSTAT ] TAG='skip_other' skip 85 tests
[==========] 11949 tests from 264 test cases ran. (351138 ms total)
[ PASSED ] 11948 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)
Note: Google Test filter = Core/HasNonZeroLimitValues*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from Core/HasNonZeroLimitValues
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/0, where GetParam() = (5, 1x1)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/0 (0 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)
/home/kmtr/work/opencv/modules/core/test/test_hasnonzero.cpp:109: Failure
Value of: hasNonZero(m)
Actual: false
Expected: true
[ FAILED ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240) (1 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/2, where GetParam() = (5, 127x113)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/2 (0 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/3, where GetParam() = (5, 1x113)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/3 (1 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/4, where GetParam() = (6, 1x1)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/4 (0 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/5, where GetParam() = (6, 320x240)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/5 (2 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/6, where GetParam() = (6, 127x113)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/6 (0 ms)
[ RUN ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/7, where GetParam() = (6, 1x113)
[ OK ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/7 (0 ms)
[----------] 8 tests from Core/HasNonZeroLimitValues (4 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (5 ms total)
[ PASSED ] 7 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Core/HasNonZeroLimitValues.hasNonZeroLimitValues/1, where GetParam() = (5, 320x240)
1 FAILED TEST
There was a problem hiding this comment.
OK, I cleared it is not related with this work.
(Some code is updated, however almost are not changed.)
I believe we can split from this issue.
opencv/modules/core/src/has_non_zero.simd.hpp
Lines 192 to 202 in 953dddd
I disabled L196 to stop using simd, same this test is passed.
And those code is simple, loading from memory and or-ing registers, checking to equal it with zero. So this logic is not related currently fixing.
More investigation is required, but it is related with only special cases(denorm_min).
opencv/modules/core/test/test_hasnonzero.cpp
Lines 86 to 110 in 953dddd
| */ | ||
|
|
||
| // See https://github.com/opencv/opencv/pull/24271#issuecomment-1867318007 | ||
| #define ROUND_DELTA (12582912.0f) |
There was a problem hiding this comment.
Please, add CAROTENE prefix to the macro, ROUND_DELTA name is too generic and may conflict with something else
There was a problem hiding this comment.
OK, I agree with you, I fixed it, thank you !
|
Test result with e50fa1a RasberryPi 4B (Ubuntu 23.04 32bit/armv7) core imgproc imgcodecs |
|
let's see what our CI bots will say. It's stange that histogram test fails after modification of round(). I thought that histogram bin is computed using floor() function, not round(), since it needs to find n, such that |
My recognize is it occurs with 4.x branch code potentially. So test fault exists before this fix. Without ENABLE_NEON=ON => Passed With ENABLE_NEON=ON => Failed |
|
I'm sorry, but do I have misrecognize about what is problem ? |
|
@Kumataro, I suggest to do extensive testing of your patch by temporarily switching to the new round even for ARMv8. If your new implementations of SIMD round(), please, temporary change to As long as it's passed, change it back and we will merge it. |
|
@vpisarev |
|
|
|
Ok, almost tests (including ARM64) are passed, last 1 is default test (it is not started.) . Can I revert temporary commit ? |
This reverts commit c4e151d.
Now that we have confirmed that we are passing the necessary ARMv8 tests, I will revert the temporary fix. |
|
I would like to express my deepest gratitude to everyone for their help in completing this pull request. |
Fix to convert float32 to int32/uint32 with rounding to nearest (ties to even). opencv#24271 Fix opencv#24163 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake (carotene is BSD)
fix #24163
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.
(carotene is BSD)