invSqrt SIMD_SCALABLE implementation & HAL tests refactoring by kyler1cartesis · Pull Request #26887 · opencv/opencv

kyler1cartesis · 2025-02-07T14:48:28Z

Enable CV_SIMD_SCALABLE for invSqrt.

Banana Pi BF3 (SpacemiT K1) RISC-V
Compiler: Syntacore Clang 18.1.4 (build 2024.12)

Geometric mean (ms)

                Name of Test                  baseline   simd      simd   
                                                       scalable  scalable
                                                                    vs
                                                                 baseline
                                                                (x-factor)
InvSqrtf::InvSqrtfFixture::(127x61, 32FC1)     0.163    0.051      3.23   
InvSqrtf::InvSqrtfFixture::(127x61, 64FC1)     0.241    0.103      2.35   
InvSqrtf::InvSqrtfFixture::(640x480, 32FC1)    6.460    1.893      3.41   
InvSqrtf::InvSqrtfFixture::(640x480, 64FC1)    9.687    3.999      2.42   
InvSqrtf::InvSqrtfFixture::(1280x720, 32FC1)   19.292   5.701      3.38   
InvSqrtf::InvSqrtfFixture::(1280x720, 64FC1)   29.452   11.963     2.46   
InvSqrtf::InvSqrtfFixture::(1920x1080, 32FC1)  43.326   12.805     3.38   
InvSqrtf::InvSqrtfFixture::(1920x1080, 64FC1)  65.566   26.881     2.44

dkurt · 2025-02-13T13:41:57Z

modules/core/test/test_hal_core.cpp

+        min_hal_t = std::min(min_hal_t, t);
+
+        t = (double)getTickCount();
+        bool solveStatus = solve(a0, b, x0, (nfunc == HAL_LU ? DECOMP_LU : DECOMP_CHOLESKY));


This test is weak because it compares an explicit HAL call with solve(), which makes the same HAL call at

opencv/modules/core/src/matrix_decomp.cpp

Line 187 in 8e65075

CALL_HAL_RET(Cholesky64f, cv_hal_Cholesky64f, output, A, astep, m, b, bstep, n)

I propose inverting the ground-truth calculation: generate A and x with randu, compute b = A * x, then use A and b as the test input and x as the ground truth.

I've inverted the test but kept the comparison between A * x and A * x0, because comparing x with x0 directly gives a larger error for size=15:

[ RUN ] Core_HAL/mat_decomp.accuracy/7, where GetParam() = (5, Cholesky, 15)
/home/d.kurtaev/opencv/modules/core/test/test_hal_core.cpp:189: Failure
Expected: (cvtest::norm(x, x0, NORM_INF | NORM_RELATIVE)) <= (eps), actual: 0.03707 vs 1e-05
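
For illustration, the inverted check can be sketched without OpenCV. This is not the code in test_hal_core.cpp: the names `multiply`, `choleskySolve`, `relInfNorm` and `invertedResidual` are hypothetical, a small hand-written Cholesky solver stands in for solve(), and std::mt19937 stands in for randu. The matrix is built as M * M^T + n * I only to guarantee that it is symmetric positive definite. The point is that x is generated first, b is derived from it, and the check compares A * x0 with b rather than x0 with x.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Dense matrix-vector product A * x.
Vec multiply(const Mat& A, const Vec& x)
{
    Vec r(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t k = 0; k < x.size(); ++k)
            r[i] += A[i][k] * x[k];
    return r;
}

// Solves A * x = b through A = L * L^T. A is taken by value because the
// factor L overwrites its lower triangle. Returns false if A is not SPD.
bool choleskySolve(Mat A, const Vec& b, Vec& x)
{
    const std::size_t n = A.size();
    for (std::size_t j = 0; j < n; ++j)
    {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= A[j][k] * A[j][k];
        if (d <= 0.0) return false;
        A[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i)
        {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= A[i][k] * A[j][k];
            A[i][j] = s / A[j][j];
        }
    }
    x = b;
    for (std::size_t i = 0; i < n; ++i)            // forward: L * y = b
    {
        for (std::size_t k = 0; k < i; ++k) x[i] -= A[i][k] * x[k];
        x[i] /= A[i][i];
    }
    for (std::size_t i = n; i-- > 0;)              // backward: L^T * x = y
    {
        for (std::size_t k = i + 1; k < n; ++k) x[i] -= A[k][i] * x[k];
        x[i] /= A[i][i];
    }
    return true;
}

// max|a - ref| / max|ref|, the same quantity as NORM_INF | NORM_RELATIVE.
double relInfNorm(const Vec& a, const Vec& ref)
{
    double diff = 0.0, scale = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i)
    {
        diff = std::max(diff, std::fabs(a[i] - ref[i]));
        scale = std::max(scale, std::fabs(ref[i]));
    }
    return diff / scale;
}

// Inverted test: x is the ground truth, b is derived, the residual is checked.
double invertedResidual(std::size_t n, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    Mat M(n, Vec(n)), A(n, Vec(n, 0.0));
    for (auto& row : M) for (auto& v : row) v = dist(rng);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
        {
            for (std::size_t k = 0; k < n; ++k) A[i][j] += M[i][k] * M[j][k];
            if (i == j) A[i][j] += static_cast<double>(n);   // keeps A positive definite
        }
    Vec x(n), x0;
    for (auto& v : x) v = dist(rng);
    const Vec b = multiply(A, x);
    const bool ok = choleskySolve(A, b, x0);
    assert(ok);
    return relInfNorm(multiply(A, x0), b);
}
```

Calling `invertedResidual(15, 42u)` and comparing the result with eps mirrors the size=15 case above. Checking the residual rather than x0 itself keeps the test independent of how well-conditioned the random matrix happens to be.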

asmorkalov · 2025-02-15T07:27:55Z

What if we just add CV_SIMD_SCALABLE here:

opencv/modules/core/src/mathfuncs_core.simd.hpp

Line 343 in 36a5176

#if CV_SIMD

and the corresponding 64-bit guard for the corresponding function? IMHO, it's more efficient to tune the code with universal intrinsics rather than duplicate it in HAL.
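
A minimal sketch of what such a guard might look like for the 32-bit case, written as a standalone function. The name `invSqrt32fSketch` is hypothetical and this is not the code from mathfuncs_core.simd.hpp; it assumes an OpenCV version that provides `VTraits`. The `__has_include` check and the fallback macro definitions are there only so that the sketch also builds without OpenCV, in which case just the scalar loop is compiled. To my understanding the 64-bit analogue would be guarded by `CV_SIMD_64F || CV_SIMD_SCALABLE_64F` in the same way.

```cpp
#include <cassert>
#include <cmath>

// Assumption: inside OpenCV this header is included unconditionally.
#if defined(__has_include)
#  if __has_include(<opencv2/core/hal/intrin.hpp>)
#    include <opencv2/core/hal/intrin.hpp>
#  endif
#endif

// Fallbacks so that the guard below is well-defined without OpenCV.
#ifndef CV_SIMD
#  define CV_SIMD 0
#endif
#ifndef CV_SIMD_SCALABLE
#  define CV_SIMD_SCALABLE 0
#endif

// dst[i] = 1 / sqrt(src[i]) for i in [0, len).
void invSqrt32fSketch(const float* src, float* dst, int len)
{
    assert(src != nullptr && dst != nullptr && len >= 0);
    int i = 0;
#if (CV_SIMD || CV_SIMD_SCALABLE)
    // Lane count is a compile-time constant for fixed-width SIMD and a
    // run-time value for scalable backends such as RVV.
    const int VECSZ = cv::VTraits<cv::v_float32>::vlanes();
    for (; i <= len - VECSZ; i += VECSZ)
        cv::v_store(dst + i, cv::v_invsqrt(cv::vx_load(src + i)));
    cv::vx_cleanup();
#endif
    // Scalar tail; also the whole computation when no SIMD is available.
    for (; i < len; i++)
        dst[i] = 1.0f / std::sqrt(src[i]);
}
```

With this pattern the same source serves both fixed-width and scalable backends, which is the argument for tuning the universal-intrinsics path instead of adding a separate HAL implementation.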

dkurt · 2025-02-15T13:07:38Z

@asmorkalov, yes, that also worked fine. Can we enable only invSqrt in this PR, and I will open a new PR with changes to the other methods?

Geometric mean (ms)

                Name of Test                   hal     simd      simd
                                                     scalable  scalable
                                                                  vs
                                                                 hal
                                                              (x-factor)
InvSqrtf::InvSqrtfFixture::(127x61, 32FC1)    0.051   0.051      1.01   
InvSqrtf::InvSqrtfFixture::(127x61, 64FC1)    0.102   0.103      1.00
InvSqrtf::InvSqrtfFixture::(640x480, 32FC1)   1.903   1.893      1.00
InvSqrtf::InvSqrtfFixture::(640x480, 64FC1)   4.008   3.999      1.00
InvSqrtf::InvSqrtfFixture::(1280x720, 32FC1)  5.713   5.701      1.00
InvSqrtf::InvSqrtfFixture::(1280x720, 64FC1)  12.000  11.963     1.00
InvSqrtf::InvSqrtfFixture::(1920x1080, 32FC1) 12.832  12.805     1.00
InvSqrtf::InvSqrtfFixture::(1920x1080, 64FC1) 26.911  26.881     1.00

kyler1cartesis · 2025-02-17T13:14:27Z

The Newton approximation method could be used in the future; maybe there will even be a more accurate instruction (the current one gives only a 7-bit estimate). But somehow my algorithm didn't work.
What accuracy is required for the result?
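
As a rough illustration of the Newton idea (not the algorithm that was tried in this PR), the sketch below refines a coarse reciprocal square root estimate in plain C++. `roughInvSqrt` is a hypothetical stand-in for a 7-bit hardware estimate: it truncates the exact value to 8 significant bits, so it does not reproduce the real instruction's output. Each step y <- y * (1.5 - 0.5 * x * y^2) roughly doubles the number of correct bits, so about two steps are enough for float and about three for double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical coarse estimate: exact 1/sqrt(x) truncated to 8 significant
// bits, which keeps the relative error below about 2^-7.
float roughInvSqrt(float x)
{
    assert(x > 0.0f);
    int exponent = 0;
    const float mantissa = std::frexp(1.0f / std::sqrt(x), &exponent); // in [0.5, 1)
    return std::ldexp(std::floor(mantissa * 256.0f) / 256.0f, exponent);
}

// One Newton-Raphson step for f(y) = 1/y^2 - x.
float newtonStep(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

// Coarse estimate followed by the requested number of refinement steps.
float refinedInvSqrt(float x, int steps)
{
    float y = roughInvSqrt(x);
    for (int s = 0; s < steps; ++s)
        y = newtonStep(x, y);
    return y;
}

// Relative error against a double-precision reference.
double relativeError(float x, int steps)
{
    const double ref = 1.0 / std::sqrt(static_cast<double>(x));
    return std::fabs(static_cast<double>(refinedInvSqrt(x, steps)) - ref) / ref;
}
```

Comparing `relativeError(x, 0)`, `relativeError(x, 1)` and `relativeError(x, 2)` for a few inputs shows how quickly the estimate converges, which helps when deciding how many steps a given accuracy requirement needs.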

dkurt · 2025-02-17T13:38:32Z

@kyler1cartesis, I think it depends on the range of input values and on the algorithm. As we discussed offline, some OpenCV algorithms do not require perfect accuracy, so you could do the following research:

Find OpenCV modules in the main repo and https://github.com/opencv/opencv_contrib/ that depend heavily on inv_sqrt
Check whether they already have universal intrinsics optimizations
Try applying fast inv sqrt versions to them. Despite the accuracy degradation of the local operation, the overall algorithm may keep the same quality.

If using fast inverse sqrt gives a performance benefit in some of the methods, it is worth adding a new universal intrinsic (with __riscv_vfrsqrt7_v_* in the RVV branch, as you tried previously)

asmorkalov · 2025-02-18T06:41:30Z

modules/core/test/test_hal_core.cpp

+        min_hal_t = std::min(min_hal_t, t);
+
+        t = (double)getTickCount();
+        bool solveStatus = solve(a0, b, x0, (nfunc == HAL_LU ? DECOMP_LU : DECOMP_CHOLESKY));


I propose inverting the ground-truth calculation: generate A and x with randu, compute b = A * x, then use A and b as the test input and x as the ground truth.

asmorkalov · 2025-02-19T09:12:36Z

Results for Spacemit muse pi v30 (gcc 14.2.1):

Geometric mean (ms)

               Name of Test                 4.x-1   patch-1  patch-1
                                                               vs
                                                              4.x-1
                                                           (x-factor)
InvSqrt::InvSqrtFixture::(127x61, 32FC1)    0.261   0.050     5.20
InvSqrt::InvSqrtFixture::(127x61, 64FC1)    0.339   0.102     3.32
InvSqrt::InvSqrtFixture::(640x480, 32FC1)   10.456  1.895     5.52
InvSqrt::InvSqrtFixture::(640x480, 64FC1)   13.814  4.003     3.45
InvSqrt::InvSqrtFixture::(1280x720, 32FC1)  31.338  5.701     5.50
InvSqrt::InvSqrtFixture::(1280x720, 64FC1)  41.446  11.933    3.47
InvSqrt::InvSqrtFixture::(1920x1080, 32FC1) 70.409  12.767    5.51
InvSqrt::InvSqrtFixture::(1920x1080, 64FC1) 92.907  26.813    3.46

dkurt marked this pull request as draft February 7, 2025 15:03

dkurt added optimization platform: riscv labels Feb 7, 2025

dkurt self-assigned this Feb 7, 2025

dkurt reviewed Feb 7, 2025 (four times)

3rdparty/hal_rvv/hal_rvv_1p0/inv_sqrt_f.hpp (outdated, resolved)

asmorkalov reviewed Feb 10, 2025

modules/core/test/test_hal_core.cpp (outdated, resolved)

Added RVV HAL invSqrt32f() implementation & tests

fe3d046

dkurt force-pushed the 4.x branch from aeef354 to 1647b97 Compare February 13, 2025 13:08

dkurt reviewed Feb 13, 2025

dkurt force-pushed the 4.x branch from 1647b97 to 59e24d0 Compare February 13, 2025 15:28

Remove remainings. Refactor tests

247f51a

dkurt force-pushed the 4.x branch from 59e24d0 to 247f51a Compare February 13, 2025 15:29

dkurt added 2 commits February 13, 2025 20:02

Update test_hal_core.cpp

732b018

Update test_hal_core.cpp

57386a8

dkurt marked this pull request as ready for review February 13, 2025 18:13

dkurt requested review from asmorkalov and mshabunin February 13, 2025 20:17

dkurt changed the title ~~Added RVV HAL invSqrt32f() implementation & tests~~ Added RVV HAL invSqrt implementation & tests Feb 13, 2025

Enable CV_SIMD_SCALABLE for invSqrt

0af6100

dkurt changed the title ~~Added RVV HAL invSqrt implementation & tests~~ invSqrt SIMD_SCALABLE implementation & tests Feb 15, 2025

fengyuentau mentioned this pull request Feb 17, 2025

core: vectorize cv::normalize / cv::norm #26885

Merged

6 tasks

dkurt approved these changes Feb 17, 2025

dkurt added this to the 4.12.0 milestone Feb 17, 2025

asmorkalov requested changes Feb 17, 2025

modules/core/test/test_hal_core.cpp (outdated, resolved)

Fix typo

569ac1f

opencv-alalek added the category: core label Feb 17, 2025

asmorkalov reviewed Feb 18, 2025

dkurt changed the title ~~invSqrt SIMD_SCALABLE implementation & tests~~ invSqrt SIMD_SCALABLE implementation & HAL tests refactoring Feb 18, 2025

asmorkalov reviewed Feb 18, 2025

modules/core/test/test_hal_core.cpp (outdated, resolved)

Refactor hal mat_decomp test

5a57e70

dkurt force-pushed the 4.x branch from 578cef4 to 5a57e70 Compare February 18, 2025 08:54

asmorkalov approved these changes Feb 18, 2025

asmorkalov merged commit d32d4da into opencv:4.x Feb 19, 2025
27 of 29 checks passed

asmorkalov mentioned this pull request Mar 4, 2025

5.x merge 4.x #27009

Merged
