
[HAL RVV] unify and impl polar_to_cart | add perf test #26999

Merged
asmorkalov merged 4 commits into opencv:4.x from GenshinImpactStarts:polar_to_cart
Mar 17, 2025

@GenshinImpactStarts
Contributor

@GenshinImpactStarts GenshinImpactStarts commented Mar 3, 2025

Summary

  1. Implement the RVV HAL for polarToCart through the existing cv_hal_polarToCart32f and cv_hal_polarToCart64f interfaces.
  2. Add polarToCart performance tests.
  3. Make cv::polarToCart use CALL_HAL in the same way as cv::cartToPolar.
  4. To achieve point 3, the original implementation was moved and modified.
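
As a rough illustration of point 3, the sketch below shows the general "try the HAL first, otherwise fall back" pattern. It is not OpenCV's CALL_HAL macro: the status enum, `demoHalPolarToCart32f`, `fallbackPolarToCart32f` and `polarToCart32f` are hypothetical names invented for this example, and the demo HAL's choice to decline degree input is arbitrary.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical status codes for this sketch only.
enum HalStatus { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

using PolarToCartHal = HalStatus (*)(const float* mag, const float* angle,
                                     float* x, float* y, int len, bool angleInDegrees);

// Hypothetical built-in path: plain scalar loop.
void fallbackPolarToCart32f(const float* mag, const float* angle,
                            float* x, float* y, int len, bool angleInDegrees)
{
    const float scale = angleInDegrees ? 3.14159265358979323846f / 180.0f : 1.0f;
    for (int i = 0; i < len; ++i)
    {
        const float m = mag[i];
        const float a = angle[i] * scale;
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}

// Hypothetical HAL backend: handles radians only and declines degree input.
HalStatus demoHalPolarToCart32f(const float* mag, const float* angle,
                                float* x, float* y, int len, bool angleInDegrees)
{
    if (angleInDegrees)
        return HAL_NOT_IMPLEMENTED;
    fallbackPolarToCart32f(mag, angle, x, y, len, false);
    return HAL_OK;
}

// Dispatcher: returns true when the HAL handled the call, false when the
// built-in path was used instead.
bool polarToCart32f(PolarToCartHal hal, const float* mag, const float* angle,
                    float* x, float* y, int len, bool angleInDegrees)
{
    assert(mag && angle && x && y && len >= 0);
    if (hal && hal(mag, angle, x, y, len, angleInDegrees) == HAL_OK)
        return true;
    fallbackPolarToCart32f(mag, angle, x, y, len, angleInDegrees);
    return false;
}
```

With this shape the caller never needs to know which backend ran; a backend that cannot handle a particular call simply reports that and the generic loop takes over.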

Tested through:

opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*" 
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300

HAL performance test

UPDATE: The current implementation no longer depends on vlen.
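
One common way for vector code to avoid assuming a fixed vector length is strip-mining: ask how many elements can be processed in this iteration, process that many, and advance. Whether the PR does exactly this is not stated here; the sketch below emulates the idea in portable C++, where `setVl` and the `vlmax` parameter are hypothetical stand-ins for a vsetvl-style query.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a vsetvl-style query: number of elements to
// process in this iteration, capped by the per-iteration capacity vlmax.
std::size_t setVl(std::size_t remaining, std::size_t vlmax)
{
    return std::min(remaining, vlmax);
}

// Strip-mined polar-to-Cartesian loop. The result does not depend on vlmax;
// only the number of outer iterations does.
void polarToCartStripMined(const float* mag, const float* angle,
                           float* x, float* y,
                           std::size_t len, std::size_t vlmax)
{
    assert(mag && angle && x && y && vlmax > 0);
    for (std::size_t i = 0; i < len; )
    {
        const std::size_t vl = setVl(len - i, vlmax);
        // The inner loop stands in for one vector operation over vl lanes.
        for (std::size_t k = 0; k < vl; ++k)
        {
            const float m = mag[i + k];
            const float a = angle[i + k];
            x[i + k] = m * std::cos(a);
            y[i + k] = m * std::sin(a);
        }
        i += vl;
    }
}
```

Calling it with `vlmax` of 4 or 8 on the same input yields identical output, including for lengths that are not a multiple of `vlmax`, because the last iteration simply gets a smaller `vl`.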

NOTE: Because of point 4 in the summary above, the scalar and UI tests are based on the modified code in this PR. The impact of this patch on the scalar and UI paths is evaluated in the next section, Effect of Point 4.

Vlen 256 (Muse Pi):

                   Name of Test                     scalar    ui     rvv       ui        rvv    
                                                                               vs         vs    
                                                                             scalar     scalar  
                                                                           (x-factor) (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315  0.110  0.034     2.85       9.34   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423  0.163  0.045     2.59       9.34   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.695  4.325  1.278     3.17      10.71   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   17.719  7.118  2.105     2.49       8.42   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  40.678  13.114 3.977     3.10      10.23   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  53.124  21.298 6.519     2.49       8.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158  29.465 8.894     3.23      10.70   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129    2.50       8.44   

Effect of Point 4

To make cv::polarToCart behave the same as cv::cartToPolar, the implementation detail of the former has been moved to the latter's location (from mathfuncs.cpp to mathfuncs_core.simd.hpp).

Reason for Changes:

This function works as follows:
$y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$.
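
A minimal scalar reference for these two formulas might look like the following. The helper name `polarToCartRef` and its parameter list are hypothetical and do not reproduce OpenCV's code; treating a null `mag` as magnitude 1 follows the documented behaviour of cv::polarToCart for an empty magnitude input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference helper (not OpenCV's implementation).
// Computes x[i] = mag[i] * cos(angle[i]) and y[i] = mag[i] * sin(angle[i]).
// A null mag pointer is treated as "all magnitudes are 1".
void polarToCartRef(const float* mag, const float* angle,
                    float* x, float* y, std::size_t len, bool angleInDegrees)
{
    assert(angle != nullptr && x != nullptr && y != nullptr);
    // Convert degrees to radians when requested.
    const float scale = angleInDegrees ? 3.14159265358979323846f / 180.0f : 1.0f;
    for (std::size_t i = 0; i < len; ++i)
    {
        const float a = angle[i] * scale;
        const float m = mag ? mag[i] : 1.0f;
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}
```

For example, a magnitude of 3 at 90 degrees gives x close to 0 and y close to 3, up to floating-point rounding.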

However, when the function is used in place (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to hold the $\sin$ and $\cos$ values so that $\text{mag}$ is not overwritten before it is used. This extra buffer allocation prevents cv::polarToCart from working in the same way as cv::cartToPolar.

Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.
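
To make the aliasing problem concrete, here is a small sketch with two hypothetical functions (neither is the PR's code). The two-pass version writes $\cos$ and $\sin$ into the outputs first and is shown without any temporary buffer, so it goes wrong when `x` aliases `mag`; the fused version reads its inputs into locals before storing anything and needs no buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Two-pass variant without a temporary buffer: store cos/sin into the
// outputs, then scale by mag. If x or y aliases mag, the first loop
// overwrites mag before the second loop reads it.
void polarToCartTwoPass(const float* mag, const float* angle,
                        float* x, float* y, std::size_t len)
{
    assert(mag && angle && x && y);
    for (std::size_t i = 0; i < len; ++i)
    {
        x[i] = std::cos(angle[i]);
        y[i] = std::sin(angle[i]);
    }
    for (std::size_t i = 0; i < len; ++i)
    {
        x[i] *= mag[i];
        y[i] *= mag[i];
    }
}

// Fused variant: load mag[i] and angle[i] into locals, then store both
// products. This stays correct when an output buffer is also an input.
void polarToCartFused(const float* mag, const float* angle,
                      float* x, float* y, std::size_t len)
{
    assert(mag && angle && x && y);
    for (std::size_t i = 0; i < len; ++i)
    {
        const float m = mag[i];
        const float a = angle[i];
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}
```

With `mag = {2, 4}`, all angles 0 and `x` pointing at the same storage as `mag`, the fused version leaves `{2, 4}` in the shared buffer, while the two-pass version ends up with `{1, 1}` because the magnitudes were replaced by $\cos(0) = 1$ before the multiplication.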

UPDATE: The UI path now uses v_sincos from #25892. The original implementation had AVX2 optimizations but was much slower than the current UI path, so it has been removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than using the UI's method.

Test Result

The scalar and UI tests were run on a Muse Pi, and the AVX2 test was run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

scalar test:

                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.333   0.294     1.13   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.385   0.403     0.96   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   14.749  12.343     1.19   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   19.419  16.743     1.16   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  44.155  37.822     1.17   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  62.108  50.358     1.23   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011  85.769     1.15   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874    1.13   

ui test:

                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306  0.110     2.77   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455  0.163     2.79   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.381  4.325     3.09   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   21.851  7.118     3.07   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  39.975  13.114    3.05   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  67.006  21.298    3.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362  29.465    3.07   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743    2.72   

AVX2 test:

                   Name of Test                     orig   pr       pr    
                                                                    vs    
                                                                   orig   
                                                                (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.019 0.009    2.11   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.022 0.013    1.74   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   0.788 0.355    2.22   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   1.102 0.618    1.78   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  2.383 1.042    2.29   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  3.758 2.316    1.62   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559    2.18   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424    1.51   

A slight performance loss occurs because the check for whether $\text{mag}$ is nullptr is performed for every calculation instead of once per batch. This is done to reuse the existing SinCos_32f function.
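
The trade-off described above can be sketched as follows. Both functions are hypothetical illustrations rather than the PR's SinCos_32f: the first tests `mag` for every element, which keeps a single loop body that is easy to reuse; the second tests once and runs one of two specialised loops.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical variant A: check mag for nullptr on every element.
void polarToCartCheckEach(const float* mag, const float* angle,
                          float* x, float* y, std::size_t len)
{
    assert(angle && x && y);
    for (std::size_t i = 0; i < len; ++i)
    {
        const float m = mag ? mag[i] : 1.0f;  // evaluated per element
        const float a = angle[i];
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}

// Hypothetical variant B: check once, then run a specialised loop.
void polarToCartCheckOnce(const float* mag, const float* angle,
                          float* x, float* y, std::size_t len)
{
    assert(angle && x && y);
    if (mag)
    {
        for (std::size_t i = 0; i < len; ++i)
        {
            const float m = mag[i];
            const float a = angle[i];
            x[i] = m * std::cos(a);
            y[i] = m * std::sin(a);
        }
    }
    else
    {
        // No magnitudes supplied: outputs are plain cos and sin.
        for (std::size_t i = 0; i < len; ++i)
        {
            const float a = angle[i];
            x[i] = std::cos(a);
            y[i] = std::sin(a);
        }
    }
}
```

Both variants produce the same values; the difference is only where the branch sits, and therefore how much code is duplicated versus how often the test runs.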

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

My performance numbers for Muse Pi v 30 (gcc 14.2):

PolarToCart::PolarToCartFixture::(127x61, 32FC1) 	0.309 	0.052 	5.94
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 	0.505 	0.063 	7.99
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 	12.621 	1.972 	6.40
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 	21.412 	2.737 	7.82
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 	37.940 	5.991 	6.33
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 	67.401 	8.152 	8.27
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 	84.716 	13.363 	6.34
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 	141.790 	17.983 	7.88 

@asmorkalov
Contributor

@GenshinImpactStarts Thanks a lot for the contribution! I would like to ask you to try the UI branch for the function too. See cartToPolar32f_:

@GenshinImpactStarts
Contributor Author

@asmorkalov Do you want me to turn on CV_SIMD_SCALABLE for cartToPolar32f_, or should I work on the UI for polarToCart?

@asmorkalov
Contributor

Try to turn on CV_SIMD_SCALABLE for cartToPolar32f_ and fix the implementation, if possible.

@GenshinImpactStarts
Contributor Author

OK. I will work on this in another PR, #27000.

@fengyuentau
Member

My performance results (K1 vs RK3568):

Geometric mean (ms)

                   Name of Test                       rk   patch-gcc patch-clang patch-gcc  patch-clang
                                                                                     vs         vs     
                                                                                     rk         rk     
                                                                                 (x-factor) (x-factor) 
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.246    0.053      0.054       4.67       4.58    
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.266    0.063      0.063       4.20       4.21    
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   10.193   1.951      1.994       5.22       5.11    
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   11.518   2.818      2.764       4.09       4.17    
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  30.418   5.911      6.097       5.15       4.99    
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  34.519   8.710      8.303       3.96       4.16    
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 67.175  13.529     13.866       4.97       4.84    
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 77.590  18.428     18.075       4.21       4.29 

@GenshinImpactStarts
Contributor Author

The UI implementation of sincos has been replaced with v_sincos from #25892. The HAL implementation has also been improved. The comment and perf tests have been updated.

Updated parts in comment:

HAL performance test

UPDATE: The current implementation no longer depends on vlen.

Vlen 256 (Muse Pi):

                   Name of Test                     scalar    ui     rvv       ui        rvv    
                                                                               vs         vs    
                                                                             scalar     scalar  
                                                                           (x-factor) (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315  0.110  0.034     2.85       9.34   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423  0.163  0.045     2.59       9.34   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.695  4.325  1.278     3.17      10.71   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   17.719  7.118  2.105     2.49       8.42   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  40.678  13.114 3.977     3.10      10.23   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  53.124  21.298 6.519     2.49       8.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158  29.465 8.894     3.23      10.70   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129    2.50       8.44   

Effect of Point 4

UPDATE: The UI path now uses v_sincos from #25892. The original implementation had AVX2 optimizations but was much slower than the current UI path, so it has been removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than using the UI's method.

Test Result

The scalar and UI tests were run on a Muse Pi, and the AVX2 test was run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

ui test:

                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306  0.110     2.77   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455  0.163     2.79   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.381  4.325     3.09   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   21.851  7.118     3.07   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  39.975  13.114    3.05   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  67.006  21.298    3.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362  29.465    3.07   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743    2.72   

AVX2 test:

                   Name of Test                     orig   pr       pr    
                                                                    vs    
                                                                   orig   
                                                                (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.019 0.009    2.11   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.022 0.013    1.74   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   0.788 0.355    2.22   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   1.102 0.618    1.78   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  2.383 1.042    2.29   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  3.758 2.316    1.62   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559    2.18   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424    1.51   

@fengyuentau
Member

According to your results, the UI path works better than the AVX2 branch. It makes sense to drop the AVX2 branch and keep the code clean.

@fengyuentau
Member

Updated performance results (K1 vs RK3568):

                   Name of Test                       rk   k1-patch-gcc k1-patch-clang k1-patch-gcc k1-patch-clang
                                                                                            vs            vs
                                                                                            rk            rk
                                                                                        (x-factor)    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.245     0.035         0.035          6.94          7.09
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.265     0.048         0.045          5.55          5.89
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   10.170    1.277         1.264          7.96          8.04
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   11.359    2.024         1.942          5.61          5.85
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  30.668    3.855         3.814          7.95          8.04
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  34.467    6.333         6.196          5.44          5.56
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 67.273    8.638         8.632          7.79          7.79
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 77.439    13.415        12.930         5.77          5.99

Impressive 👍

GenshinImpactStarts and others added 4 commits March 13, 2025 14:23
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@asmorkalov
Contributor

There is a speedup on aarch64 (Jetson Orin):

PolarToCart::OCL_PolarToCartFixture::(640x480, 32FC1) 	2.295 	1.301 	1.76
PolarToCart::OCL_PolarToCartFixture::(640x480, 32FC4) 	9.395 	5.164 	1.82
PolarToCart::OCL_PolarToCartFixture::(1280x720, 32FC1) 	6.943 	3.872 	1.79
PolarToCart::OCL_PolarToCartFixture::(1280x720, 32FC4) 	28.141 	15.403 	1.83
PolarToCart::OCL_PolarToCartFixture::(1920x1080, 32FC1) 	15.707 	8.657 	1.81
PolarToCart::OCL_PolarToCartFixture::(1920x1080, 32FC4) 	63.444 	34.729 	1.83
PolarToCart::OCL_PolarToCartFixture::(3840x2160, 32FC1) 	63.401 	34.626 	1.83
PolarToCart::OCL_PolarToCartFixture::(3840x2160, 32FC4) 	252.696 	138.592 	1.82
PolarToCart::PolarToCartFixture::(127x61, 32FC1) 	0.057 	0.033 	1.75
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 	0.062 	0.040 	1.57
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 	2.289 	1.285 	1.78
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 	2.794 	1.616 	1.73
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 	6.939 	3.855 	1.80
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 	8.291 	4.864 	1.70
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 	15.657 	8.671 	1.81
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 	19.421 	10.957 	1.77 

@asmorkalov asmorkalov merged commit 2090407 into opencv:4.x Mar 17, 2025
69 of 82 checks passed
@asmorkalov asmorkalov mentioned this pull request Apr 29, 2025