[HAL RVV] unify and impl polar_to_cart | add perf test by GenshinImpactStarts · Pull Request #26999 · opencv/opencv

GenshinImpactStarts · 2025-03-03T09:47:51Z

Summary

1. Implement polar_to_cart in the RVV HAL through the existing cv_hal_polarToCart32f and cv_hal_polarToCart64f interfaces.
2. Add polarToCart performance tests.
3. Make cv::polarToCart use CALL_HAL in the same way as cv::cartToPolar.
4. To achieve the 3rd point, the original implementation was moved and modified.
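
The "try the HAL first, fall back otherwise" dispatch that the summary refers to can be sketched roughly as follows. This is a minimal illustration, not the OpenCV source: `HalStatus`, `halPolarToCart32f`, `fallbackPolarToCart32f` and the `polarToCart32f` dispatcher are hypothetical names, and the toy HAL hook declines calls without a magnitude array only so that the fallback path can be exercised.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical status codes standing in for a HAL's return values.
enum HalStatus { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1, HAL_ERROR = -1 };

// Hypothetical HAL hook. This toy backend declines calls without a
// magnitude array so that the fallback path can be exercised.
int halPolarToCart32f(const float* mag, const float* angle,
                      float* x, float* y, std::size_t len)
{
    if (!mag)
        return HAL_NOT_IMPLEMENTED;
    for (std::size_t i = 0; i < len; ++i)
    {
        const float m = mag[i], a = angle[i];  // read inputs before writing
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
    return HAL_OK;
}

// Generic fallback; a null mag is treated as unit magnitude.
void fallbackPolarToCart32f(const float* mag, const float* angle,
                            float* x, float* y, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        const float a = angle[i];
        const float m = mag ? mag[i] : 1.0f;
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}

// Dispatcher: returns true when the HAL handled the call.
bool polarToCart32f(const float* mag, const float* angle,
                    float* x, float* y, std::size_t len)
{
    const int status = halPolarToCart32f(mag, angle, x, y, len);
    if (status == HAL_OK)
        return true;                        // HAL did the work
    if (status != HAL_NOT_IMPLEMENTED)
        throw std::runtime_error("HAL polarToCart32f failed");
    fallbackPolarToCart32f(mag, angle, x, y, len);  // HAL declined
    return false;
}
```

The point of the pattern is that the caller does not need to know whether a platform backend exists: a "not implemented" status silently routes the call to the generic code, while any other failure is reported.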

Tested through:

opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*" 
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300

HAL performance test

UPDATE: The current implementation no longer depends on vlen.

NOTE: Because of the 4th point in the summary above, the scalar and ui tests are based on the modified code in this PR. The impact of this patch on scalar and ui is evaluated in the next section, Effect of Point 4.

Vlen 256 (Muse Pi):

                   Name of Test                     scalar    ui     rvv       ui        rvv    
                                                                               vs         vs    
                                                                             scalar     scalar  
                                                                           (x-factor) (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315  0.110  0.034     2.85       9.34   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423  0.163  0.045     2.59       9.34   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.695  4.325  1.278     3.17      10.71   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   17.719  7.118  2.105     2.49       8.42   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  40.678  13.114 3.977     3.10      10.23   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  53.124  21.298 6.519     2.49       8.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158  29.465 8.894     3.23      10.70   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129    2.50       8.44

Effect of Point 4

To make cv::polarToCart behave the same as cv::cartToPolar, the implementation detail of the former has been moved to the latter's location (from mathfuncs.cpp to mathfuncs_core.simd.hpp).

Reason for Changes:

This function works as follows:
$y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$.

However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer for the $\sin$ and $\cos$ values so that $\text{mag}$ is not overwritten before it is used. This extra buffer allocation prevents cv::polarToCart from working in the same way as cv::cartToPolar.

Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.
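
The difference between the two approaches can be sketched in scalar form. This is an illustration under stated assumptions, not the code from this PR: `polarToCartTwoPass` and `polarToCartFused` are hypothetical names, angles are assumed to be in radians, and a null `mag` is treated as unit magnitude.

```cpp
#include <cmath>
#include <cstddef>

// Two-pass variant (hypothetical): stores cos/sin in the outputs first,
// then scales by mag. If x or y aliases mag, the first pass overwrites
// the magnitudes before the second pass reads them.
void polarToCartTwoPass(const float* mag, const float* angle,
                        float* x, float* y, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        x[i] = std::cos(angle[i]);
        y[i] = std::sin(angle[i]);
    }
    for (std::size_t i = 0; i < len; ++i)
    {
        x[i] *= mag[i];
        y[i] *= mag[i];
    }
}

// Fused variant (hypothetical): both inputs are read into locals before
// either output is written, so element i stays correct even when an
// output buffer is the same buffer as mag or angle.
void polarToCartFused(const float* mag, const float* angle,
                      float* x, float* y, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        const float a = angle[i];
        const float m = mag ? mag[i] : 1.0f;  // null mag: unit magnitude
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}
```

With `mag = {2, 3}`, `angle = {0, pi/2}` and `x` pointing at the same buffer as `mag`, the fused loop yields `x[0] = 2` and `y[1] = 3`, whereas the two-pass loop yields `x[0] = 1` because the magnitude was replaced by the cosine before the scaling step.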

UPDATE: UI now uses v_sincos from #25892. The original implementation had AVX2 optimizations, but it is much slower than the current UI, so it has been removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than using the UI's method.

Test Result

The scalar and ui tests were run on a Muse Pi, and the AVX2 test was run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

scalar test:

                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.333   0.294     1.13   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.385   0.403     0.96   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   14.749  12.343     1.19   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   19.419  16.743     1.16   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  44.155  37.822     1.17   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  62.108  50.358     1.23   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011  85.769     1.15   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874    1.13

ui test:

                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306  0.110     2.77   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455  0.163     2.79   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.381  4.325     3.09   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   21.851  7.118     3.07   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  39.975  13.114    3.05   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  67.006  21.298    3.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362  29.465    3.07   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743    2.72

AVX2 test:

                   Name of Test                     orig   pr       pr    
                                                                    vs    
                                                                   orig   
                                                                (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.019 0.009    2.11   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.022 0.013    1.74   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   0.788 0.355    2.22   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   1.102 0.618    1.78   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  2.383 1.042    2.29   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  3.758 2.316    1.62   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559    2.18   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424    1.51

A slight performance loss occurs because the check for whether $\text{mag}$ is nullptr is performed for every calculation instead of once per batch. This allows the current SinCos_32f function to be reused.
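
The trade-off mentioned above, checking for a null magnitude per element versus once per call, can be sketched as follows. Both functions are hypothetical illustrations and are not OpenCV's SinCos_32f; angles are assumed to be in radians and a null `mag` is treated as unit magnitude.

```cpp
#include <cmath>
#include <cstddef>

// Check inside the loop: a single loop body serves both cases, at the
// cost of evaluating the branch for every element.
void polarToCartCheckEach(const float* mag, const float* angle,
                          float* x, float* y, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        const float a = angle[i];
        const float c = std::cos(a), s = std::sin(a);
        if (mag)
        {
            const float m = mag[i];  // read before writing the outputs
            x[i] = m * c;
            y[i] = m * s;
        }
        else
        {
            x[i] = c;
            y[i] = s;
        }
    }
}

// Check hoisted out of the loop: one branch per call, but two loops
// that have to be kept in sync.
void polarToCartCheckOnce(const float* mag, const float* angle,
                          float* x, float* y, std::size_t len)
{
    if (mag)
    {
        for (std::size_t i = 0; i < len; ++i)
        {
            const float a = angle[i], m = mag[i];
            x[i] = m * std::cos(a);
            y[i] = m * std::sin(a);
        }
    }
    else
    {
        for (std::size_t i = 0; i < len; ++i)
        {
            const float a = angle[i];
            x[i] = std::cos(a);
            y[i] = std::sin(a);
        }
    }
}
```

Both variants produce the same values; the first keeps a single code path, which is the kind of reuse the note above describes, while the second avoids the repeated branch.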

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2025-03-05T11:20:16Z

My performance numbers for Muse Pi v 30 (gcc 14.2):

PolarToCart::PolarToCartFixture::(127x61, 32FC1) 	0.309 	0.052 	5.94
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 	0.505 	0.063 	7.99
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 	12.621 	1.972 	6.40
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 	21.412 	2.737 	7.82
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 	37.940 	5.991 	6.33
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 	67.401 	8.152 	8.27
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 	84.716 	13.363 	6.34
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 	141.790 	17.983 	7.88

asmorkalov · 2025-03-05T11:26:01Z

@GenshinImpactStarts Thanks a lot for the contribution! I want to ask you to try UI branch for the function too. See cartToPolar32f_:

opencv/modules/core/src/mathfuncs_core.simd.hpp

Line 122 in dbd4e45

GenshinImpactStarts · 2025-03-05T18:34:38Z

@asmorkalov Do you want me to turn on CV_SIMD_SCALABLE for cartToPolar32f_, or should I work on the UI for polarToCart?

asmorkalov · 2025-03-06T05:00:29Z

Try to turn on CV_SIMD_SCALABLE for cartToPolar32f_ and fix the implementation, if possible.

GenshinImpactStarts · 2025-03-06T05:19:57Z

OK. I will work on this in another PR, #27000.

3rdparty/hal_rvv/hal_rvv_1p0/polar_to_cart.hpp

3rdparty/hal_rvv/hal_rvv_1p0/sincos.hpp

modules/core/src/mathfuncs_core.simd.hpp

fengyuentau · 2025-03-10T08:54:09Z

My performance results (K1 vs RK3568):

Geometric mean (ms)

                   Name of Test                       rk   patch-gcc patch-clang patch-gcc  patch-clang
                                                                                     vs         vs     
                                                                                     rk         rk     
                                                                                 (x-factor) (x-factor) 
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.246    0.053      0.054       4.67       4.58    
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.266    0.063      0.063       4.20       4.21    
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   10.193   1.951      1.994       5.22       5.11    
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   11.518   2.818      2.764       4.09       4.17    
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  30.418   5.911      6.097       5.15       4.99    
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  34.519   8.710      8.303       3.96       4.16    
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 67.175  13.529     13.866       4.97       4.84    
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 77.590  18.428     18.075       4.21       4.29

GenshinImpactStarts · 2025-03-10T18:46:01Z

UI implementation of sincos has been replaced with v_sincos from #25892. HAL implementation also improved. Comment and perf test updated.

Updated parts of the PR description (identical to the current text above): the HAL performance test table, with the note that the implementation no longer depends on vlen; the UPDATE note under Effect of Point 4; and the ui and AVX2 test results.

fengyuentau · 2025-03-11T06:45:02Z

According to your results, UI works better than the AVX2 branch. It makes sense to drop the AVX2 branch and keep the code clean.

fengyuentau · 2025-03-11T06:53:22Z

CI failed: https://github.com/opencv/opencv/actions/runs/13771668032/job/38541550131?pr=26999.

fengyuentau · 2025-03-12T11:27:25Z

Updated performance results (K1 vs RK3568):

                   Name of Test                       rk   k1-patch-gcc k1-patch-clang k1-patch-gcc k1-patch-clang
                                                                                            vs            vs
                                                                                            rk            rk
                                                                                        (x-factor)    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.245     0.035         0.035          6.94          7.09
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.265     0.048         0.045          5.55          5.89
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   10.170    1.277         1.264          7.96          8.04
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   11.359    2.024         1.942          5.61          5.85
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  30.668    3.855         3.814          7.95          8.04
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  34.467    6.333         6.196          5.44          5.56
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 67.273    8.638         8.632          7.79          7.79
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 77.439    13.415        12.930         5.77          5.99

Impressive 👍

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

asmorkalov · 2025-03-17T09:37:21Z

There is a speedup on aarch64 (Jetson Orin):

PolarToCart::OCL_PolarToCartFixture::(640x480, 32FC1) 	2.295 	1.301 	1.76
PolarToCart::OCL_PolarToCartFixture::(640x480, 32FC4) 	9.395 	5.164 	1.82
PolarToCart::OCL_PolarToCartFixture::(1280x720, 32FC1) 	6.943 	3.872 	1.79
PolarToCart::OCL_PolarToCartFixture::(1280x720, 32FC4) 	28.141 	15.403 	1.83
PolarToCart::OCL_PolarToCartFixture::(1920x1080, 32FC1) 	15.707 	8.657 	1.81
PolarToCart::OCL_PolarToCartFixture::(1920x1080, 32FC4) 	63.444 	34.729 	1.83
PolarToCart::OCL_PolarToCartFixture::(3840x2160, 32FC1) 	63.401 	34.626 	1.83
PolarToCart::OCL_PolarToCartFixture::(3840x2160, 32FC4) 	252.696 	138.592 	1.82
PolarToCart::PolarToCartFixture::(127x61, 32FC1) 	0.057 	0.033 	1.75
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 	0.062 	0.040 	1.57
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 	2.289 	1.285 	1.78
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 	2.794 	1.616 	1.73
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 	6.939 	3.855 	1.80
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 	8.291 	4.864 	1.70
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 	15.657 	8.671 	1.81
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 	19.421 	10.957 	1.77

mshabunin added the platform: riscv label Mar 3, 2025

asmorkalov added the optimization label Mar 3, 2025

asmorkalov added this to the 4.12.0 milestone Mar 3, 2025

GenshinImpactStarts force-pushed the polar_to_cart branch from 6d3d8d0 to 12b22a6 Compare March 4, 2025 15:59

GenshinImpactStarts force-pushed the polar_to_cart branch from 12b22a6 to 6f61e59 Compare March 7, 2025 17:19

fengyuentau self-requested a review March 10, 2025 06:29

fengyuentau reviewed Mar 10, 2025

GenshinImpactStarts force-pushed the polar_to_cart branch from 6f61e59 to 87059be Compare March 10, 2025 17:29

GenshinImpactStarts force-pushed the polar_to_cart branch from c081d0c to 850ec68 Compare March 11, 2025 08:29

fengyuentau approved these changes Mar 12, 2025

GenshinImpactStarts and others added 4 commits March 13, 2025 14:23

unify and impl polar_to_cart | add perf test

f3bc44e

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

opt hal and ui sincos | unified helper class

e0733d9

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

fix wrong cast between float and int

b5c9c6d

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

add copyright

992714d

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

GenshinImpactStarts force-pushed the polar_to_cart branch from c7bbbd6 to 992714d Compare March 13, 2025 14:24

asmorkalov assigned fengyuentau Mar 14, 2025

asmorkalov approved these changes Mar 17, 2025

asmorkalov merged commit 2090407 into opencv:4.x Mar 17, 2025
69 of 82 checks passed

asmorkalov mentioned this pull request Apr 29, 2025

5.x merge 4.x #27265

Merged
