Add RISC-V HAL implementation for fastAtan32f/fastAtan64f#26853

Merged
asmorkalov merged 1 commit into opencv:4.x from horror-proton:rvv-fast-atan
Jan 31, 2025

@horror-proton
Contributor

The current universal-intrinsic implementation of fastAtan32f_ in mathfuncs_core.simd.hpp does not work for RVV (CV_SIMD_SCALABLE), so it falls back to the scalar path.

This pull request adds a HAL implementation as a workaround, allowing it to benefit from RVV vectorization.
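The fallback behaviour described above can be sketched as follows. This is a minimal, portable illustration and not the code from this PR: the namespace `hal_sketch`, the status constants, and `fast_atan_dispatch` are hypothetical names, and `std::atan2` stands in for the RVV kernel. The hook signature (`y`, `x`, `dst`, `len`, `angleInDegrees`, returning a status code) follows what I understand the OpenCV HAL hook for `fastAtan32f` to look like; treat that as an assumption.

```cpp
#include <cmath>

namespace hal_sketch {

// Stand-ins for the HAL status codes (assumed values, not taken from this PR).
constexpr int kOk = 0;
constexpr int kNotImplemented = 1;
constexpr float kPi = 3.14159265358979f;

// Default hook: reports "not implemented", so the caller uses its own path.
inline int default_fastAtan32f(const float*, const float*, float*, int, bool) {
    return kNotImplemented;
}

// Reference angle in [0, 2*pi) or [0, 360). A real HAL would vectorize this step.
inline float angle_of(float y, float x, bool in_degrees) {
    float a = std::atan2(y, x);
    if (a < 0.0f) a += 2.0f * kPi;
    return in_degrees ? a * (180.0f / kPi) : a;
}

// Platform hook: processes the whole array and reports success.
inline int fast_atan_32(const float* y, const float* x, float* dst, int len,
                        bool in_degrees) {
    for (int i = 0; i < len; ++i) dst[i] = angle_of(y[i], x[i], in_degrees);
    return kOk;
}

using Hook = int (*)(const float*, const float*, float*, int, bool);

// Tries the hook first; falls back to the generic loop if it declines.
// Returns true when the hook handled the call.
inline bool fast_atan_dispatch(Hook hook, const float* y, const float* x,
                               float* dst, int len, bool in_degrees) {
    if (hook(y, x, dst, len, in_degrees) == kOk) return true;
    for (int i = 0; i < len; ++i) dst[i] = angle_of(y[i], x[i], in_degrees);
    return false;
}

}  // namespace hal_sketch
```

A function pointer is used here only to keep the sketch easy to exercise; as far as I know, OpenCV selects HAL hooks at compile time through `cv_hal_*` macros instead.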

Tested on a Spacemit X60 CPU:

gcc version 14.2.1

Name of Test                     perf scalar  perf rvv  rvv vs scalar (x-factor)
phase32f::VectorLength::128      0.007        0.002     3.20
phase32f::VectorLength::1000     0.046        0.007     6.66
phase32f::VectorLength::131072   5.995        0.740     8.10
phase32f::VectorLength::524288   23.857       2.906     8.21
phase32f::VectorLength::1048576  47.654       5.789     8.23
phase64f::VectorLength::128      0.008        0.002     3.51
phase64f::VectorLength::1000     0.056        0.009     6.47
phase64f::VectorLength::131072   7.427        0.978     7.59
phase64f::VectorLength::524288   30.726       3.937     7.80
phase64f::VectorLength::1048576  59.783       7.739     7.72

clang version 19.1.7

Name of Test                     perf clang scalar  perf clang rvv  clang rvv vs clang scalar (x-factor)
phase32f::VectorLength::128      0.007              0.002           3.22
phase32f::VectorLength::1000     0.047              0.007           6.36
phase32f::VectorLength::131072   6.133              0.803           7.63
phase32f::VectorLength::524288   24.542             3.249           7.55
phase32f::VectorLength::1048576  49.033             6.489           7.56
phase64f::VectorLength::128      0.008              0.002           3.81
phase64f::VectorLength::1000     0.059              0.009           6.66
phase64f::VectorLength::131072   7.788              1.008           7.72
phase64f::VectorLength::524288   31.644             4.132           7.66
phase64f::VectorLength::1048576  62.591             8.074           7.75

All builds were compiled with -O2.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

namespace cv::cv_hal_rvv {

namespace detail {
// ref: mathfuncs_core.simd.hpp
Contributor

Could you add a reference to the computation method (its name, formula, or anything else that identifies the approach)? The meaning of the constants and the other arithmetic details would be more obvious with it.

Contributor Author

@horror-proton horror-proton Jan 29, 2025

This implementation doesn't introduce any new math approach; the constants are simply the polynomial coefficients copied from the existing code:

static const float atan2_p1 = 0.9997878412794807f*(float)(180/CV_PI);
static const float atan2_p3 = -0.3258083974640975f*(float)(180/CV_PI);
static const float atan2_p5 = 0.1555786518463281f*(float)(180/CV_PI);
static const float atan2_p7 = -0.04432655554792128f*(float)(180/CV_PI);

Presumably it's a 7th-degree polynomial approximation of arctan(x) for x in [0, 1]. I guess the optimal coefficients can be computed with the Remez algorithm?
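To make the role of the constants concrete, here is a minimal scalar sketch of how such an odd 7th-degree polynomial is typically used to compute a full-range angle in degrees. The namespace `atan_sketch` and both function names are hypothetical, and the range reduction (divide the smaller magnitude by the larger one, then fix up the quadrant) is my assumption about the usual scheme rather than code taken from this PR. Only the four coefficients come from the thread.

```cpp
#include <cfloat>
#include <cmath>

namespace atan_sketch {

constexpr float kRadToDeg = 57.29577951308232f;  // 180 / pi

// Coefficients quoted in the thread, pre-scaled so the result is in degrees.
constexpr float kP1 = 0.9997878412794807f * kRadToDeg;
constexpr float kP3 = -0.3258083974640975f * kRadToDeg;
constexpr float kP5 = 0.1555786518463281f * kRadToDeg;
constexpr float kP7 = -0.04432655554792128f * kRadToDeg;

// p1*c + p3*c^3 + p5*c^5 + p7*c^7 for c in [0, 1], via Horner's rule in c^2.
inline float atan_poly_deg(float c) {
    const float c2 = c * c;
    return (((kP7 * c2 + kP5) * c2 + kP3) * c2 + kP1) * c;
}

// Angle of the vector (x, y) in degrees, in the range [0, 360].
inline float fast_atan2_deg(float y, float x) {
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    const float eps = static_cast<float>(DBL_EPSILON);  // avoids 0/0 at the origin
    // Keep the polynomial argument in [0, 1] by dividing small by large.
    float a = (ax >= ay) ? atan_poly_deg(ay / (ax + eps))
                         : 90.0f - atan_poly_deg(ax / (ay + eps));
    if (x < 0.0f) a = 180.0f - a;  // mirror into quadrants II and III
    if (y < 0.0f) a = 360.0f - a;  // mirror into the lower half-plane
    return a;
}

}  // namespace atan_sketch
```

Comparing `atan_sketch::fast_atan2_deg` against `std::atan2` over a sweep of angles is a quick way to see how closely the polynomial tracks the exact function.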

Should I add this to my code comment, or add it to mathfuncs_core.simd.hpp?

Suggested change
- // ref: mathfuncs_core.simd.hpp
+ // ref: mathfuncs_core.simd.hpp, 7th degree polynomial from Remez algorithm?

@vpisarev
Contributor

@horror-proton, thank you!

@asmorkalov, @mshabunin, I don't quite like that we have many small .h files, but that is not a problem of this PR; it just follows the current style. Maybe we should revisit this and make it more manageable.

@asmorkalov asmorkalov merged commit 2f58f82 into opencv:4.x Jan 31, 2025
29 of 30 checks passed
@asmorkalov asmorkalov mentioned this pull request Feb 19, 2025