Skip to content

[HAL RVV] impl magnitude | add perf test#27002

Merged
asmorkalov merged 3 commits intoopencv:4.xfrom
GenshinImpactStarts:magnitude
Mar 13, 2025
Merged

[HAL RVV] impl magnitude | add perf test#27002
asmorkalov merged 3 commits intoopencv:4.xfrom
GenshinImpactStarts:magnitude

Conversation

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor

@GenshinImpactStarts GenshinImpactStarts commented Mar 3, 2025

Implement through the existing cv_hal_magnitude32f and cv_hal_magnitude64f interfaces.

UPDATE: UI is enabled. The only difference between UI and HAL now is HAL use a approximate sqrt.

Perf test done on MUSE-PI.

$ opencv_test_core --gtest_filter="*Magnitude*"
$ opencv_perf_core --gtest_filter="*Magnitude*" --perf_min_samples=300 --perf_force_samples=300

Test result between enabled UI and HAL:

                 Name of Test                     ui    rvv      rvv    
                                                                  vs    
                                                                  ui    
                                                              (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.029  0.016     1.75   
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.057  0.036     1.57   
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.063  0.648     1.64   
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.261  1.530     1.48   
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.261  2.118     1.54   
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  6.802  4.682     1.45   
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287  4.738     1.54   
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334    1.47   

Test result before and after enabling UI:

                 Name of Test                    orig    pr       pr    
                                                                  vs    
                                                                 orig   
                                                              (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.032  0.029     1.11   
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.067  0.057     1.17   
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.228  1.063     1.16   
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.786  2.261     1.23   
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.762  3.261     1.15   
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  8.549  6.802     1.26   
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408  7.287     1.15   
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226    1.24   

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Copy link
Copy Markdown
Contributor

Performance for Muse Pi v30 (GCC 14.2):

Magnitude::MagnitudeFixture::(127x61, 32FC1) 	0.033 	0.029 	1.13
Magnitude::MagnitudeFixture::(127x61, 64FC1) 	0.068 	0.038 	1.80
Magnitude::MagnitudeFixture::(640x480, 32FC1) 	1.250 	1.118 	1.12
Magnitude::MagnitudeFixture::(640x480, 64FC1) 	2.922 	1.591 	1.84
Magnitude::MagnitudeFixture::(1280x720, 32FC1) 	3.881 	3.415 	1.14
Magnitude::MagnitudeFixture::(1280x720, 64FC1) 	8.767 	4.811 	1.82
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 	8.563 	7.539 	1.14
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 	19.435 	10.625 	1.83 

@asmorkalov
Copy link
Copy Markdown
Contributor

@fengyuentau Could you take a look too.

@asmorkalov
Copy link
Copy Markdown
Contributor

@GenshinImpactStarts Could you rebase and fix conflicts.

@asmorkalov
Copy link
Copy Markdown
Contributor

@fengyuentau please take a look too.

@asmorkalov
Copy link
Copy Markdown
Contributor

I found out again, that we have magnitude32f and magnitude64f in mathfuncs_core.sing.hpp implemented with UI, but not enabled for scalable intrinsics. Makes sense to add the option there and benchmark it too.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

This patch can be benefit from PR #27015. I think we can wait #27015 to be merged first and then back to this.

@fengyuentau

This comment was marked as outdated.

auto vx = __riscv_vle32_v_f32m4(x, vl);
auto vy = __riscv_vle32_v_f32m4(y, vl);

auto vmag = __riscv_vfsqrt(__riscv_vfmadd(vx, vx, __riscv_vfmul(vy, vy, vl), vl), vl);
Copy link
Copy Markdown
Member

@fengyuentau fengyuentau Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can try the same as in #27015.

@fengyuentau
Copy link
Copy Markdown
Member

Please resolve conflicts.


My performance results (K1 vs RK3568):

Geometric mean (ms)

                 Name of Test                     rk   patch-gcc patch-clang patch-gcc  patch-clang
                                                                                 vs         vs     
                                                                                 rk         rk     
                                                                             (x-factor) (x-factor) 
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.036    0.029      0.029       1.24       1.24    
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.064    0.038      0.038       1.69       1.67    
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.143    1.075      1.060       1.06       1.08    
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.707    1.571      1.506       1.72       1.80    
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.178    3.339      3.280       0.95       0.97    
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  8.922    4.740      4.548       1.88       1.96    
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 6.949    7.482      7.311       0.93       0.95    
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 20.014  10.562      9.887       1.89       2.02 

@fengyuentau
Copy link
Copy Markdown
Member

This patch can be benefit from PR #27015. I think we can wait #27015 to be merged first and then back to this.

#27015 is just merged.

GenshinImpactStarts and others added 2 commits March 12, 2025 07:38
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

UI is enabled. Now the only difference between UI and HAL is HAL use a approximation of sqrt. If HAL use __riscv_vfsqrt, their performance would be same. Perf test is updated in the first comment.

@fengyuentau
Copy link
Copy Markdown
Member

My performance results (K1 vs. RK3568):

                 Name of Test                     rk   patch-gcc patch-clang patch-gcc  patch-clang
                                                                                 vs         vs
                                                                                 rk         rk
                                                                             (x-factor) (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.036    0.018      0.018       2.05       2.02
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.064    0.037      0.038       1.73       1.66
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.180    0.656      0.652       1.80       1.81
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.808    1.470      1.514       1.91       1.86
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.309    2.004      2.011       1.65       1.65
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  7.521    4.605      4.629       1.63       1.62
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.051    4.420      4.454       1.60       1.58
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 16.948   9.823     10.024       1.73       1.69

Nice 👍

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

The modification of sqrt leads to 1~2% regression on MUSE-PI while 1~2% improvement on CanMV K230. I think this is negligible and have no need to do perf test again.

I apologize for the repeated CI failures on the same test case. Some tests require reading local files, but I didn’t know where to get the files used in CI, so I had to find corner cases myself, which wasn’t always comprehensive. As a result, the tests kept failing. I’m sorry for wasting CI resources.

Now that I know where to get the files, I’ve tested the two failing projects locally, and this shouldn’t happen again.

@asmorkalov asmorkalov merged commit e30697f into opencv:4.x Mar 13, 2025
25 of 28 checks passed
basavaraj2711 added a commit to basavaraj2711/opencv that referenced this pull request Mar 13, 2025
Merge pull request opencv#27002 from GenshinImpactStarts:magnitude
@GenshinImpactStarts GenshinImpactStarts deleted the magnitude branch March 13, 2025 14:21
@asmorkalov asmorkalov mentioned this pull request Apr 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants