Skip to content

[HAL RVV] impl exp and log | add log perf test#27010

Merged
asmorkalov merged 2 commits intoopencv:4.xfrom
GenshinImpactStarts:exp_log
Mar 11, 2025
Merged

[HAL RVV] impl exp and log | add log perf test#27010
asmorkalov merged 2 commits intoopencv:4.xfrom
GenshinImpactStarts:exp_log

Conversation

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor

@GenshinImpactStarts GenshinImpactStarts commented Mar 4, 2025

Implement through the existing interfaces cv_hal_exp32f, cv_hal_exp64f, cv_hal_log32f, cv_hal_log64f.

Perf test done on MUSE-PI. Because the performance of scalar is much worse than universal intrinsic, only ui and hal rvv is compared.

$ opencv_test_core --gtest_filter="Core_HAL/mathfuncs.*:Core_Exp/ElemWiseTest*:Core_Log/ElemWiseTest*"
$ opencv_perf_core --gtest_filter="ExpFixture_Exp.*:LogFixture_Log.*" --perf_min_samples=300 --perf_force_samples=300
           Name of Test               ui    rvv      rvv    
                                                      vs    
                                                      ui    
                                                  (x-factor)
Exp::ExpFixture::(127x61, 32FC1)    0.036  0.028     1.32   
Exp::ExpFixture::(127x61, 64FC1)    0.089  0.052     1.71   
Exp::ExpFixture::(640x480, 32FC1)   1.308  0.949     1.38   
Exp::ExpFixture::(640x480, 64FC1)   3.494  1.973     1.77   
Exp::ExpFixture::(1280x720, 32FC1)  4.032  2.886     1.40   
Exp::ExpFixture::(1280x720, 64FC1)  10.588 6.069     1.74   
Exp::ExpFixture::(1920x1080, 32FC1) 9.071  6.507     1.39   
Exp::ExpFixture::(1920x1080, 64FC1) 23.710 13.524    1.75   
Log::LogFixture::(127x61, 32FC1)    0.049  0.037     1.33   
Log::LogFixture::(127x61, 64FC1)    0.160  0.075     2.13   
Log::LogFixture::(640x480, 32FC1)   1.809  1.279     1.41   
Log::LogFixture::(640x480, 64FC1)   6.315  2.904     2.17   
Log::LogFixture::(1280x720, 32FC1)  5.540  3.958     1.40   
Log::LogFixture::(1280x720, 64FC1)  19.098 8.862     2.16   
Log::LogFixture::(1920x1080, 32FC1) 12.512 8.956     1.40   
Log::LogFixture::(1920x1080, 64FC1) 42.924 19.883    2.16   

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Copy link
Copy Markdown
Contributor

Related: #26886

@asmorkalov
Copy link
Copy Markdown
Contributor

There is UI implementation for log:

void log32f( const float *_x, float *y, int n )
. I propose to enable it first for RVV and then benchmark HAL. More generic implementation with UI preferable, if it produces similar performance. Also, you use copy of table with pre-computed values.

@asmorkalov asmorkalov added this to the 4.12.0 milestone Mar 5, 2025
@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

GenshinImpactStarts commented Mar 5, 2025

Actually, I just wrote it according to the code here. And I mean, there doesn’t seem to be a UI for RVV, because it doesn’t allow CV_SIMD_SCALABLE. Do you mean that I should first adapt the UI for RVV?

@asmorkalov
Copy link
Copy Markdown
Contributor

I should first adapt the UI for RVV

Yes, it should not be hard, IMHO.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

@asmorkalov I will do it. And I found a logical error in my implementation of exp, which caused the CI to fail. This error only occurs when the input is large enough, and that's why it passes all tests related to exp in the core module (Core_HAL/mathfuncs.:Core_Exp/ElemWiseTest). So, should I expand the test range? However, this may introduce more issues due to precision.

@asmorkalov
Copy link
Copy Markdown
Contributor

Let's try.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

Test on MUSE-PI after enable UI:

           Name of Test               ui    rvv      rvv    
                                                      vs    
                                                      ui    
                                                  (x-factor)
Exp::ExpFixture::(127x61, 32FC1)    0.036  0.034     1.06   
Exp::ExpFixture::(127x61, 64FC1)    0.091  0.064     1.41   
Exp::ExpFixture::(640x480, 32FC1)   1.306  1.202     1.09   
Exp::ExpFixture::(640x480, 64FC1)   3.539  2.478     1.43   
Exp::ExpFixture::(1280x720, 32FC1)  3.977  3.636     1.09   
Exp::ExpFixture::(1280x720, 64FC1)  10.655 7.462     1.43   
Exp::ExpFixture::(1920x1080, 32FC1) 8.948  8.177     1.09   
Exp::ExpFixture::(1920x1080, 64FC1) 23.991 16.761    1.43   
Log::LogFixture::(127x61, 32FC1)    0.049  0.036     1.38   
Log::LogFixture::(127x61, 64FC1)    0.162  0.074     2.19   
Log::LogFixture::(640x480, 32FC1)   1.810  1.254     1.44   
Log::LogFixture::(640x480, 64FC1)   6.376  2.797     2.28   
Log::LogFixture::(1280x720, 32FC1)  5.547  3.831     1.45   
Log::LogFixture::(1280x720, 64FC1)  19.239 8.452     2.28   
Log::LogFixture::(1920x1080, 32FC1) 12.475 8.626     1.45   
Log::LogFixture::(1920x1080, 64FC1) 43.220 19.017    2.27   

@asmorkalov
Copy link
Copy Markdown
Contributor

The new numbers generally say that the patch brings significant speedup for doubles. OpenCV uses std::exp and std::log for doubles, so the speedup is expected there.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

Rebase the branch. Fix CV_SIMD_SCALABLE_64F and potential bug in log. A small improvement is done in exp. Perf result updated.

GenshinImpactStarts and others added 2 commits March 7, 2025 17:11
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@fengyuentau
Copy link
Copy Markdown
Member

My performance results (K1 vs RK3568):

           Name of Test               rk   patch-gcc patch-clang patch-gcc  patch-clang
                                                                     vs         vs     
                                                                     rk         rk     
                                                                 (x-factor) (x-factor) 
Exp::ExpFixture::(127x61, 32FC1)    0.042    0.029      0.028       1.48       1.49    
Exp::ExpFixture::(127x61, 64FC1)    0.127    0.053      0.054       2.40       2.36    
Exp::ExpFixture::(640x480, 32FC1)   1.928    0.951      0.951       2.03       2.03    
Exp::ExpFixture::(640x480, 64FC1)   5.352    1.967      1.961       2.72       2.73    
Exp::ExpFixture::(1280x720, 32FC1)  5.829    2.867      2.852       2.03       2.04    
Exp::ExpFixture::(1280x720, 64FC1)  16.040   5.930      5.925       2.70       2.71    
Exp::ExpFixture::(1920x1080, 32FC1) 12.681   6.436      6.421       1.97       1.98    
Exp::ExpFixture::(1920x1080, 64FC1) 36.081  13.289     13.232       2.72       2.73    
Log::LogFixture::(127x61, 32FC1)    0.065    0.038      0.036       1.71       1.79    
Log::LogFixture::(127x61, 64FC1)    0.162    0.076      0.076       2.14       2.15    
Log::LogFixture::(640x480, 32FC1)   2.754    1.274      1.267       2.16       2.17    
Log::LogFixture::(640x480, 64FC1)   6.510    2.853      2.864       2.28       2.27    
Log::LogFixture::(1280x720, 32FC1)  8.287    3.858      3.857       2.15       2.15    
Log::LogFixture::(1280x720, 64FC1)  19.493   8.555      8.771       2.28       2.22    
Log::LogFixture::(1920x1080, 32FC1) 17.966   8.670      8.690       2.07       2.07    
Log::LogFixture::(1920x1080, 64FC1) 43.837  19.227     19.300       2.28       2.27 

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

@asmorkalov @fengyuentau I found a potential speedup in intrin_math.hpp. I will try it and than return back. If it's also faster than current UI, should I also replace it in this PR?

@fengyuentau
Copy link
Copy Markdown
Member

If it's also faster than current UI, should I also replace it in this PR?

I think it is a different topic and worths a different PR.

@asmorkalov asmorkalov merged commit 4be88e9 into opencv:4.x Mar 11, 2025
28 checks passed
@asmorkalov asmorkalov mentioned this pull request Mar 11, 2025
@GenshinImpactStarts GenshinImpactStarts deleted the exp_log branch March 12, 2025 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants