
Enable SIMD_SCALABLE for exp and sqrt #26886

Merged
asmorkalov merged 6 commits into opencv:4.x from sk1er52:feature/exp64f on Feb 21, 2025


@sk1er52 (Contributor) commented Feb 7, 2025

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
```
CPU - Banana Pi k1, compiler - clang 18.1.4
```
```
Geometric mean (ms)

              Name of Test               baseline  hal     ui      hal         ui    
                                                                    vs         vs
                                                                 baseline   baseline
                                                                (x-factor) (x-factor)
Exp::ExpFixture::(127x61, 32FC1)          0.358     --   0.033      --       10.70   
Exp::ExpFixture::(640x480, 32FC1)         14.304    --   1.167      --       12.26   
Exp::ExpFixture::(1280x720, 32FC1)        42.785    --   3.538      --       12.09
Exp::ExpFixture::(1920x1080, 32FC1)       96.206    --   7.927      --       12.14   
Exp::ExpFixture::(127x61, 64FC1)          0.433   0.050  0.098     8.59       4.40   
Exp::ExpFixture::(640x480, 64FC1)         17.315  1.935  3.813     8.95       4.54   
Exp::ExpFixture::(1280x720, 64FC1)        52.181  5.877  11.519    8.88       4.53   
Exp::ExpFixture::(1920x1080, 64FC1)      117.082  13.157 25.854    8.90       4.53
```
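The "vs baseline (x-factor)" columns are speedups: the baseline time divided by the optimized time, so values above 1 mean the new code is faster. Each time is reported as a geometric mean over the measured samples (that is how we read the "Geometric mean (ms)" heading). A minimal C++ sketch of both calculations; `geometric_mean` and `x_factor` are hypothetical helpers for illustration, not part of OpenCV's perf tooling:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of timings (samples must be non-empty
// and positive). Computed in log space so that multiplying many samples
// cannot overflow or underflow.
double geometric_mean(const std::vector<double>& samples_ms) {
    double log_sum = 0.0;
    for (double t : samples_ms) {
        log_sum += std::log(t);
    }
    return std::exp(log_sum / static_cast<double>(samples_ms.size()));
}

// Hypothetical helper: x-factor = baseline time / optimized time.
double x_factor(double baseline_ms, double optimized_ms) {
    return baseline_ms / optimized_ms;
}
```

For example, `x_factor(14.304, 1.167)` gives about 12.26, matching the 640x480 32FC1 row. Because the printed times are rounded to three decimals, ratios recomputed from them can differ from the printed x-factor (0.358 / 0.033 is about 10.85, while the table shows 10.70).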

Additionally, this PR brings a Sqrt optimization with universal intrinsics (UI). In the test names, 5 and 6 correspond to CV_32FC1 and CV_64FC1:

```
Geometric mean (ms)

              Name of Test                     baseline    ui       ui    
                                                                    vs
                                                                 baseline
                                                                (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)            0.111   0.027     4.11   
Sqrt::SqrtFixture::(127x61, 6, false)            0.149   0.053     2.82   
Sqrt::SqrtFixture::(640x480, 5, false)           4.374   0.967     4.52   
Sqrt::SqrtFixture::(640x480, 6, false)           5.885   2.046     2.88   
Sqrt::SqrtFixture::(1280x720, 5, false)          12.960  2.915     4.45   
Sqrt::SqrtFixture::(1280x720, 6, false)          17.648  6.107     2.89   
Sqrt::SqrtFixture::(1920x1080, 5, false)         29.178  6.524     4.47   
Sqrt::SqrtFixture::(1920x1080, 6, false)         39.709  13.670    2.90   
```

Reference
Muller, J.-M. Elementary Functions: Algorithms and Implementation. 2nd ed. Boston: Birkhäuser, 2006.
https://www.springer.com/gp/book/9780817643720

@dkurt marked this pull request as draft on February 7, 2025 13:34
@dkurt self-assigned this on Feb 7, 2025
@asmorkalov requested a review from vpisarev on February 7, 2025 14:13
@asmorkalov (Contributor) commented:

We have the v_exp universal intrinsic introduced in #25881. I propose integrating it into core first.

dkurt and others added 5 commits on February 20, 2025 at 18:43:
- Build with Syntacore Clang
- Run on self-hosted
- Disable OpenCL
- summary
- Run baseline
- filter from PR body
@dkurt marked this pull request as ready for review on February 20, 2025 15:59
@dkurt marked this pull request as draft on February 20, 2025 15:59

@dkurt (Member) commented Feb 20, 2025

Changing the RVV register grouping (LMUL) from m8 to m4 does not change performance significantly:

```
Geometric mean (ms)

              Name of Test                hal    hal      hal    
                                           m8     m4       m4
                                                           vs
                                                          hal
                                                           m8
                                                       (x-factor)
Exp::ExpFixture::(127x61, 64FC1)         0.050  0.051     0.99
Exp::ExpFixture::(640x480, 64FC1)        1.935  1.979     0.98
Exp::ExpFixture::(1280x720, 64FC1)       5.877  5.979     0.98
Exp::ExpFixture::(1920x1080, 64FC1)      13.157 13.294    0.99
```
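In RVV terms, m8 and m4 are LMUL settings, that is, how many vector registers are grouped into one operand. The RISC-V Vector specification defines the maximum number of elements per instruction as VLMAX = LMUL · VLEN / SEW. As a worked illustration only, assuming a hypothetical VLEN of 256 bits (the actual VLEN of the test board is not stated in this thread) and 64-bit elements (SEW = 64, as for 64FC1):

$$
\mathrm{VLMAX} = \frac{\mathrm{LMUL}\cdot\mathrm{VLEN}}{\mathrm{SEW}},\qquad
\mathrm{m8}:\ \frac{8\cdot 256}{64} = 32,\qquad
\mathrm{m4}:\ \frac{4\cdot 256}{64} = 16.
$$

So m4 processes half as many elements per instruction, but it leaves 8 register groups available instead of 4 (out of the 32 vector registers), which can reduce register pressure.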

So the main open question is in the algorithmic part. Although the tests pass, vfmacc remains a critical point, because the order of accumulation may give different results.
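A minimal, portable C++ illustration of that point (plain scalar code, not RVV; the inputs are chosen only to make the rounding visible): floating-point addition is not associative, and a fused multiply-add such as vfmacc rounds once where a separate multiply and add round twice, so reordering or fusing an accumulation can change the last bits of the result.

```cpp
#include <cassert>
#include <cmath>

// Summation order: (a + b) + c versus a + (b + c) in double precision.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Separate multiply then add: the product is rounded before the addition.
// 'volatile' forces the product to be stored and reloaded, which prevents
// the compiler from contracting a * b + c into an FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c with a single rounding at the end,
// the same per-lane operation a vector FMA instruction performs.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

For example, `sum_left(0.1, 0.2, 0.3)` and `sum_right(0.1, 0.2, 0.3)` differ by one unit in the last place. With a = 1 + 2^-30 and b = 1 - 2^-30, `mul_then_add(a, b, -1.0)` returns 0.0 while `fused(a, b, -1.0)` returns -2^-60. Each result is correctly rounded for its own evaluation order, which is why accuracy tests typically compare against a tolerance rather than requiring bit-exact output.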

@sk1er52 (Contributor, Author) commented Feb 20, 2025

Regarding the algorithmic part:
I'm still studying the algorithm from J.-M. Muller's book Elementary Functions. I think I'll fix it soon.
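For context, the textbook scheme for exp (covered in Muller's book, among others) reduces the argument as x = k·ln 2 + r with integer k and |r| ≤ ln(2)/2, approximates e^r with a polynomial, and rescales by 2^k. A minimal scalar C++ sketch of that idea follows. It is not the code from this PR or OpenCV's v_exp; the plain Taylor polynomial, its degree, and the unsplit ln 2 constant are simplifying assumptions (production code typically uses a minimax polynomial and a split ln 2 for extra accuracy).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of exp(x) = 2^k * exp(r), with k = round(x / ln2)
// and r = x - k * ln2. No handling of NaN, infinities or overflow.
double exp_sketch(double x) {
    const double ln2 = 0.69314718055994530942;
    const double k = std::nearbyint(x / ln2);
    const double r = x - k * ln2;  // |r| <= ln2 / 2, about 0.347

    // Degree-12 Taylor polynomial for exp(r) in Horner form:
    // 1 + r*(1 + r/2*(1 + r/3*(... (1 + r/12)))).
    double p = 1.0;
    for (int n = 12; n >= 1; --n) {
        p = 1.0 + p * r / n;
    }
    // Multiply by 2^k exactly by adjusting the binary exponent.
    return std::ldexp(p, static_cast<int>(k));
}
```

Because |r| is at most about 0.35, the truncation error of the degree-12 polynomial is on the order of r^13/13!, roughly 2e-16, so for moderate x the result agrees with `std::exp` to within a small multiple of double-precision rounding error. A vectorized version would apply the same steps lane by lane.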

@sk1er52 closed this Feb 20, 2025
@sk1er52 reopened this Feb 20, 2025

@dkurt (Member) left a comment

@sk1er52, to give you credit for the work done, I recommend merging this PR with the v_exp universal intrinsics enabled. Let's keep this PR or a separate issue for future discussion. The potential 2x performance improvement (the HAL version is roughly twice as fast as UI for 64FC1 in the Exp results above) should be analyzed more precisely.

@dkurt (Member) commented Feb 20, 2025

> I'm still studying the algorithm from J.-M. Muller's book Elementary Functions. I think I'll fix it soon.

Please add a link to the article in the PR description.

@dkurt changed the title from "[HAL] exp64f rvv1.0" to "Enable SIMD_SCALABLE for exp and sqrt" on Feb 20, 2025
@fengyuentau self-requested a review on February 20, 2025 17:42
@dkurt marked this pull request as ready for review on February 21, 2025 06:48
@asmorkalov added this to the 4.12.0 milestone on Feb 21, 2025
@asmorkalov (Contributor) commented Feb 21, 2025

My perf results for Muse Pi v 30 (GCC 14.2):

```
Name of Test                               baseline       PR  x-factor
Exp::ExpFixture::(127x61, 32FC1)              0.310    0.038      8.20
Exp::ExpFixture::(127x61, 64FC1)              0.368    0.091      4.03
Exp::ExpFixture::(640x480, 32FC1)            12.412    1.324      9.37
Exp::ExpFixture::(640x480, 64FC1)            15.071    3.664      4.11
Exp::ExpFixture::(1280x720, 32FC1)           37.106    3.999      9.28
Exp::ExpFixture::(1280x720, 64FC1)           45.205   10.493      4.31
Exp::ExpFixture::(1920x1080, 32FC1)          83.658    8.976      9.32
Exp::ExpFixture::(1920x1080, 64FC1)         100.830   23.518      4.29
Sqrt::SqrtFixture::(127x61, 5, false)         0.182    0.028      6.60
Sqrt::SqrtFixture::(127x61, 5, true)          0.050    0.052      0.98
Sqrt::SqrtFixture::(127x61, 6, false)         0.220    0.053      4.12
Sqrt::SqrtFixture::(127x61, 6, true)          0.102    0.102      0.99
Sqrt::SqrtFixture::(640x480, 5, false)        7.328    0.967      7.58
Sqrt::SqrtFixture::(640x480, 5, true)         1.883    1.904      0.99
Sqrt::SqrtFixture::(640x480, 6, false)        9.098    2.246      4.05
Sqrt::SqrtFixture::(640x480, 6, true)         3.978    4.217      0.94
Sqrt::SqrtFixture::(1280x720, 5, false)      22.062    2.921      7.55
Sqrt::SqrtFixture::(1280x720, 5, true)        5.691    5.753      0.99
Sqrt::SqrtFixture::(1280x720, 6, false)      27.224    6.110      4.46
Sqrt::SqrtFixture::(1280x720, 6, true)       11.999   12.023      1.00
Sqrt::SqrtFixture::(1920x1080, 5, false)     50.162    6.575      7.63
Sqrt::SqrtFixture::(1920x1080, 5, true)      12.810   12.946      0.99
Sqrt::SqrtFixture::(1920x1080, 6, false)     61.132   13.722      4.45
Sqrt::SqrtFixture::(1920x1080, 6, true)      26.906   27.025      1.00
```

@asmorkalov merged commit b5f5540 into opencv:4.x on Feb 21, 2025 (28 checks passed)
NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request Feb 24, 2025