invSqrt SIMD_SCALABLE implementation & HAL tests refactoring#26887
invSqrt SIMD_SCALABLE implementation & HAL tests refactoring#26887asmorkalov merged 7 commits intoopencv:4.xfrom
Conversation
modules/core/test/test_hal_core.cpp
Outdated
| min_hal_t = std::min(min_hal_t, t); | ||
|
|
||
| t = (double)getTickCount(); | ||
| bool solveStatus = solve(a0, b, x0, (nfunc == HAL_LU ? DECOMP_LU : DECOMP_CHOLESKY)); |
There was a problem hiding this comment.
This test is weak because it tests explicit HAL call and the same HAL call at
opencv/modules/core/src/matrix_decomp.cpp
Line 187 in 8e65075
There was a problem hiding this comment.
I propose to invert the ground truth calculation: generate A and x with randu, calculate b with multiplication, use A and b for test and x as ground truth.
There was a problem hiding this comment.
I've inverted the test but keep comparison between A * x and A * x0 because test x vs x0 gives bigger error on size=15:
[ RUN ] Core_HAL/mat_decomp.accuracy/7, where GetParam() = (5, Cholesky, 15)
/home/d.kurtaev/opencv/modules/core/test/test_hal_core.cpp:189: Failure
Expected: (cvtest::norm(x, x0, NORM_INF | NORM_RELATIVE)) <= (eps), actual: 0.03707 vs 1e-05
|
What if just add CV_SIMD_SCALABLE here: and corresponding 64-bit guard for corresponding function? IMHO, it's more efficient to tune code with universal intrinsics rather then duplicate the code in HAL. |
|
@asmorkalov, yes, that worked also fine. So Can we enable in this PR only |
|
Newton approximation method could be used in the future, maybe there even will be more accurate instruction (now only 7-bit). But somehow my algorithm didn't work. |
|
@kyler1cartesis, I think it depends on the input values range and the algorithm. As we discussed offline, some of the OpenCV algorithms does not require perfect accuracy and probably you may do the following research:
If there is a performance benefit on using fast inverse sqrt on some of the methods, it worth to add a new universal intrinsic (with |
modules/core/test/test_hal_core.cpp
Outdated
| min_hal_t = std::min(min_hal_t, t); | ||
|
|
||
| t = (double)getTickCount(); | ||
| bool solveStatus = solve(a0, b, x0, (nfunc == HAL_LU ? DECOMP_LU : DECOMP_CHOLESKY)); |
There was a problem hiding this comment.
I propose to invert the ground truth calculation: generate A and x with randu, calculate b with multiplication, use A and b for test and x as ground truth.
|
Results for Spacemit muse pi v30 (gcc 14.2.1): |
invSqrt SIMD_SCALABLE implementation & HAL tests refactoring opencv#26887 Enable CV_SIMD_SCALABLE for invSqrt. * Banana Pi BF3 (SpacemiT K1) RISC-V * Compiler: Syntacore Clang 18.1.4 (build 2024.12) ``` Geometric mean (ms) Name of Test baseline simd simd scalable scalable vs baseline (x-factor) InvSqrtf::InvSqrtfFixture::(127x61, 32FC1) 0.163 0.051 3.23 InvSqrtf::InvSqrtfFixture::(127x61, 64FC1) 0.241 0.103 2.35 InvSqrtf::InvSqrtfFixture::(640x480, 32FC1) 6.460 1.893 3.41 InvSqrtf::InvSqrtfFixture::(640x480, 64FC1) 9.687 3.999 2.42 InvSqrtf::InvSqrtfFixture::(1280x720, 32FC1) 19.292 5.701 3.38 InvSqrtf::InvSqrtfFixture::(1280x720, 64FC1) 29.452 11.963 2.46 InvSqrtf::InvSqrtfFixture::(1920x1080, 32FC1) 43.326 12.805 3.38 InvSqrtf::InvSqrtfFixture::(1920x1080, 64FC1) 65.566 26.881 2.44 ```
Enable CV_SIMD_SCALABLE for invSqrt.