Add RISC-V HAL implementation for fastAtan32f/fastAtan64f by horror-proton · Pull Request #26853 · opencv/opencv

horror-proton · 2025-01-29T05:13:04Z

The current universal intrinsic implementation of fastAtan32f_ in mathfuncs_core.simd.hpp does not work for RVV (CV_SIMD_SCALABLE), so it falls back to the scalar method.

This pull request adds a HAL implementation as a workaround, allowing it to benefit from RVV vectorization.

Tested on a Spacemit X60 CPU:
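As a rough illustration of the shape such a HAL replacement takes, the sketch below processes the arrays in chunks, the way a vector-length-agnostic RVV loop would. It is portable C++, not RVV intrinsics and not the code from this PR: phase_hal_sketch, kChunk, and the two status constants are hypothetical names, the signature is only modelled on the HAL convention of returning a status code, and std::atan2 stands in for the vectorized polynomial kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical status codes: 0 means "handled", non-zero means
// "caller should fall back to its generic implementation".
constexpr int kHalOk = 0;
constexpr int kHalNotImplemented = 1;

// Hypothetical stand-in for the number of lanes granted per iteration.
constexpr std::size_t kChunk = 8;

// Sketch of a HAL-style entry point: writes the phase of (x[i], y[i])
// into dst[i] as a non-negative angle, in degrees or radians.
inline int phase_hal_sketch(const float* y, const float* x, float* dst,
                            std::size_t n, bool angle_in_degrees)
{
    if (!y || !x || !dst)
        return kHalNotImplemented;  // let the caller use its own fallback

    const float pi = 3.14159265358979323846f;
    const float scale = angle_in_degrees ? 180.0f / pi : 1.0f;
    const float full_turn = angle_in_degrees ? 360.0f : 2.0f * pi;

    // Strip-mined loop: each pass handles up to kChunk elements, so no
    // separate scalar tail loop is needed.
    for (std::size_t i = 0; i < n; ) {
        const std::size_t vl = std::min(kChunk, n - i);
        for (std::size_t k = 0; k < vl; ++k) {
            // std::atan2 is a placeholder for the vector polynomial kernel.
            float a = std::atan2(y[i + k], x[i + k]) * scale;
            if (a < 0.0f)
                a += full_turn;
            dst[i + k] = a;
        }
        i += vl;
    }
    return kHalOk;
}
```

In a real RVV build the inner loop would be replaced by vector loads, arithmetic, and stores over vl lanes, with vl chosen by the hardware on each iteration.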

gcc version 14.2.1

Name of Test                     perf scalar  perf rvv  perf rvv vs perf scalar (x-factor)
phase32f::VectorLength::128      0.007        0.002     3.20
phase32f::VectorLength::1000     0.046        0.007     6.66
phase32f::VectorLength::131072   5.995        0.740     8.10
phase32f::VectorLength::524288   23.857       2.906     8.21
phase32f::VectorLength::1048576  47.654       5.789     8.23
phase64f::VectorLength::128      0.008        0.002     3.51
phase64f::VectorLength::1000     0.056        0.009     6.47
phase64f::VectorLength::131072   7.427        0.978     7.59
phase64f::VectorLength::524288   30.726       3.937     7.80
phase64f::VectorLength::1048576  59.783       7.739     7.72

clang version 19.1.7

Name of Test                     perf clang scalar  perf clang rvv  perf clang rvv vs perf clang scalar (x-factor)
phase32f::VectorLength::128      0.007              0.002           3.22
phase32f::VectorLength::1000     0.047              0.007           6.36
phase32f::VectorLength::131072   6.133              0.803           7.63
phase32f::VectorLength::524288   24.542             3.249           7.55
phase32f::VectorLength::1048576  49.033             6.489           7.56
phase64f::VectorLength::128      0.008              0.002           3.81
phase64f::VectorLength::1000     0.059              0.009           6.66
phase64f::VectorLength::131072   7.788              1.008           7.72
phase64f::VectorLength::524288   31.644             4.132           7.66
phase64f::VectorLength::1048576  62.591             8.074           7.75

All builds were compiled with -O2.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2025-01-29T06:30:59Z

3rdparty/hal_rvv/hal_rvv_1p0/atan.hpp

+namespace cv::cv_hal_rvv {
+
+namespace detail {
+// ref: mathfuncs_core.simd.hpp

Could you add a reference to the computation method (its name, formula, or anything else that identifies the approach)? The meaning of the constants and the other arithmetic details would be more obvious with that.

horror-proton · 2025-01-29 (edited)

This implementation does not introduce any new math approach; the constants are simply the polynomial coefficients copied from

opencv/modules/core/src/mathfuncs_core.simd.hpp

Lines 36 to 39 in 08a24ba

static const float atan2_p1 = 0.9997878412794807f*(float)(180/CV_PI);
static const float atan2_p3 = -0.3258083974640975f*(float)(180/CV_PI);
static const float atan2_p5 = 0.1555786518463281f*(float)(180/CV_PI);
static const float atan2_p7 = -0.04432655554792128f*(float)(180/CV_PI);

Presumably it is a 7th-degree polynomial approximation of arctan(x) for x in [0, 1]; I guess the optimal coefficients can be computed with the Remez algorithm?

Should I add this to my code comment, or add it to mathfuncs_core.simd.hpp?

Suggested change

-// ref: mathfuncs_core.simd.hpp
+// ref: mathfuncs_core.simd.hpp, 7th degree polynomial from Remez algorithm?
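To make the role of the constants concrete, here is a minimal scalar sketch of the approximation being discussed. The four coefficients are the ones quoted above, kept in radians form; the function names, the epsilon guard against 0/0, and the octant-based range reduction are assumptions made for illustration and are not copied from the PR.

```cpp
#include <cmath>

// Coefficients quoted above from mathfuncs_core.simd.hpp. The original
// pre-multiplies each by 180/pi; here the conversion is applied afterwards.
constexpr float kP1 =  0.9997878412794807f;
constexpr float kP3 = -0.3258083974640975f;
constexpr float kP5 =  0.1555786518463281f;
constexpr float kP7 = -0.04432655554792128f;

// Odd polynomial p(c) = p1*c + p3*c^3 + p5*c^5 + p7*c^7, approximating
// arctan(c) in radians for c in [0, 1]. Evaluated in Horner form on c^2.
inline float atan_poly(float c)
{
    const float c2 = c * c;
    return (((kP7 * c2 + kP5) * c2 + kP3) * c2 + kP1) * c;
}

// Hypothetical helper: angle of (x, y) in degrees, in the range [0, 360].
// The ratio min(|x|,|y|) / max(|x|,|y|) keeps the polynomial argument in [0, 1];
// the quadrant is then restored from the signs of x and y.
inline float fast_atan2_deg_sketch(float y, float x)
{
    const float rad2deg = 57.29577951308232f;
    const float eps = 2.2e-16f;  // assumed guard so that (0, 0) yields 0, not NaN
    const float ax = std::fabs(x), ay = std::fabs(y);

    float a;
    if (ax >= ay)
        a = atan_poly(ay / (ax + eps)) * rad2deg;          // angle in [0, 45]
    else
        a = 90.0f - atan_poly(ax / (ay + eps)) * rad2deg;  // angle in (45, 90]

    if (x < 0.0f) a = 180.0f - a;  // mirror into the left half-plane
    if (y < 0.0f) a = 360.0f - a;  // mirror into the lower half-plane
    return a;
}
```

For example, fast_atan2_deg_sketch(1.0f, 1.0f) evaluates the polynomial at c = 1 and lands close to 45 degrees; the small residual is the approximation error of the polynomial itself.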

vpisarev · 2025-01-29T15:20:39Z

@horror-proton, thank you!

@asmorkalov, @mshabunin, I don't quite like that we have many small .h files, but that is not a problem with this PR; it just follows the current style. Maybe we should revise this and make it more manageable.

Add RISC-V HAL implementation for cv::phase

8624165

asmorkalov reviewed Jan 29, 2025

asmorkalov added this to the 4.12.0 milestone Jan 29, 2025

asmorkalov added optimization platform: riscv labels Jan 29, 2025

asmorkalov requested a review from mshabunin January 29, 2025 06:33

vpisarev self-requested a review January 29, 2025 15:18

vpisarev approved these changes Jan 29, 2025

mshabunin approved these changes Jan 29, 2025

asmorkalov merged commit 2f58f82 into opencv:4.x Jan 31, 2025
29 of 30 checks passed

asmorkalov mentioned this pull request Feb 19, 2025

5.x merge 4.x #26939

Merged

asmorkalov merged 1 commit into opencv:4.x from horror-proton:rvv-fast-atan
