HAL: implemented cv_hal_div* and cv_hal_recip* in hal_rvv#27175

Merged
asmorkalov merged 2 commits into opencv:4.x from fengyuentau:4x/hal_rvv/div_recip
Apr 7, 2025
@fengyuentau
Member

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
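
For readers unfamiliar with the operation behind the `cv_hal_recip*` hooks, here is a minimal scalar sketch of an 8-bit unsigned reciprocal, assuming the semantics documented for `cv::divide`: the result is `scale / src` rounded and saturated to the destination type, and an integer division by zero yields 0. The function name `recipU8Reference` is hypothetical, and this is not the vectorized code from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (not the hal_rvv implementation):
// dst[i] = saturate_to_u8(round(scale / src[i])), and 0 where src[i] == 0.
void recipU8Reference(const std::uint8_t* src, std::uint8_t* dst,
                      std::size_t n, double scale) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        if (src[i] == 0) {          // integer division by zero is defined as 0
            dst[i] = 0;
            continue;
        }
        // Round to nearest under the default floating-point rounding mode.
        double v = std::nearbyint(scale / src[i]);
        if (v < 0.0) v = 0.0;       // clamp to the uint8 range
        if (v > 255.0) v = 255.0;
        dst[i] = static_cast<std::uint8_t>(v);
    }
}
```

With `scale = 100`, the inputs `{0, 1, 2, 4, 3}` map to `{0, 100, 50, 25, 33}`; with `scale = 1000`, an input of 2 saturates to 255.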

@fengyuentau

This comment was marked as outdated.

@asmorkalov self-assigned this on Mar 31, 2025
@asmorkalov
Contributor

My performance numbers for Muse Pi v 30 (gcc 14.2):
perf-div-recip.zip

@asmorkalov
Contributor

There are some accuracy failures on CI. Please take a look.

@fengyuentau
Member Author

Updated performance results (K1):

                Name of Test                 base-gcc patch-gcc patch-gcc vs base-gcc (x-factor)
divide::BinaryOpTest::(640x480, 8UC1)         1.336     1.204      1.11   
divide::BinaryOpTest::(640x480, 8SC1)         1.292     1.207      1.07   
divide::BinaryOpTest::(640x480, 16SC1)        1.255     1.168      1.07   
divide::BinaryOpTest::(640x480, 32SC1)        1.277     1.265      1.01   
divide::BinaryOpTest::(640x480, 32FC1)        1.059     1.172      0.90   
divide::BinaryOpTest::(640x480, 16SC2)        2.540     2.371      1.07   
divide::BinaryOpTest::(640x480, 8UC3)         4.020     3.704      1.09   
divide::BinaryOpTest::(640x480, 16SC3)        3.810     3.546      1.07   
divide::BinaryOpTest::(640x480, 8UC4)         5.338     4.860      1.10   
divide::BinaryOpTest::(640x480, 16SC4)        5.084     4.723      1.08   
divide::BinaryOpTest::(1280x720, 8UC1)        4.014     3.634      1.10   
divide::BinaryOpTest::(1280x720, 8SC1)        3.874     3.660      1.06   
divide::BinaryOpTest::(1280x720, 16SC1)       3.830     3.636      1.05   
divide::BinaryOpTest::(1280x720, 32SC1)       3.882     3.699      1.05   
divide::BinaryOpTest::(1280x720, 32FC1)       3.291     3.430      0.96   
divide::BinaryOpTest::(1280x720, 16SC2)       7.676     7.280      1.05   
divide::BinaryOpTest::(1280x720, 8UC3)        11.984   11.002      1.09   
divide::BinaryOpTest::(1280x720, 16SC3)       11.407   10.747      1.06   
divide::BinaryOpTest::(1280x720, 8UC4)        15.962   14.704      1.09   
divide::BinaryOpTest::(1280x720, 16SC4)       15.215   14.323      1.06   
divide::BinaryOpTest::(1920x1080, 8UC1)       8.995     8.259      1.09   
divide::BinaryOpTest::(1920x1080, 8SC1)       8.680     8.264      1.05   
divide::BinaryOpTest::(1920x1080, 16SC1)      8.625     8.166      1.06   
divide::BinaryOpTest::(1920x1080, 32SC1)      8.680     8.176      1.06   
divide::BinaryOpTest::(1920x1080, 32FC1)      7.271     7.582      0.96   
divide::BinaryOpTest::(1920x1080, 16SC2)      17.201   16.270      1.06   
divide::BinaryOpTest::(1920x1080, 8UC3)       26.917   24.750      1.09   
divide::BinaryOpTest::(1920x1080, 16SC3)      25.579   24.171      1.06   
divide::BinaryOpTest::(1920x1080, 8UC4)       35.917   33.016      1.09   
divide::BinaryOpTest::(1920x1080, 16SC4)      34.086   32.213      1.06   
reciprocal::BinaryOpTest::(640x480, 8UC1)     1.230     0.042     29.44   
reciprocal::BinaryOpTest::(640x480, 8SC1)     1.214     0.042     28.99   
reciprocal::BinaryOpTest::(640x480, 16SC1)    1.149     0.081     14.10   
reciprocal::BinaryOpTest::(640x480, 32SC1)    1.097     0.163      6.71   
reciprocal::BinaryOpTest::(640x480, 32FC1)    0.958     0.971      0.99   
reciprocal::BinaryOpTest::(640x480, 16SC2)    2.304     0.164     14.02   
reciprocal::BinaryOpTest::(640x480, 8UC3)     3.692     0.122     30.22   
reciprocal::BinaryOpTest::(640x480, 16SC3)    3.474     0.246     14.14   
reciprocal::BinaryOpTest::(640x480, 8UC4)     4.922     0.165     29.90   
reciprocal::BinaryOpTest::(640x480, 16SC4)    4.652     0.325     14.32   
reciprocal::BinaryOpTest::(1280x720, 8UC1)    3.696     0.111     33.22   
reciprocal::BinaryOpTest::(1280x720, 8SC1)    3.632     0.121     29.96   
reciprocal::BinaryOpTest::(1280x720, 16SC1)   3.474     0.244     14.22   
reciprocal::BinaryOpTest::(1280x720, 32SC1)   3.329     0.490      6.79   
reciprocal::BinaryOpTest::(1280x720, 32FC1)   2.932     3.007      0.98   
reciprocal::BinaryOpTest::(1280x720, 16SC2)   7.083     0.490     14.44   
reciprocal::BinaryOpTest::(1280x720, 8UC3)    11.108    0.366     30.36   
reciprocal::BinaryOpTest::(1280x720, 16SC3)   10.738    0.733     14.65   
reciprocal::BinaryOpTest::(1280x720, 8UC4)    14.779    0.491     30.11   
reciprocal::BinaryOpTest::(1280x720, 16SC4)   14.287    0.974     14.67   
reciprocal::BinaryOpTest::(1920x1080, 8UC1)   8.296     0.276     30.08   
reciprocal::BinaryOpTest::(1920x1080, 8SC1)   8.169     0.276     29.62   
reciprocal::BinaryOpTest::(1920x1080, 16SC1)  7.939     0.551     14.41   
reciprocal::BinaryOpTest::(1920x1080, 32SC1)  7.488     1.066      7.03   
reciprocal::BinaryOpTest::(1920x1080, 32FC1)  6.607     6.805      0.97   
reciprocal::BinaryOpTest::(1920x1080, 16SC2)  16.071    1.096     14.66   
reciprocal::BinaryOpTest::(1920x1080, 8UC3)   24.936    0.820     30.40   
reciprocal::BinaryOpTest::(1920x1080, 16SC3)  23.863    1.644     14.52   
reciprocal::BinaryOpTest::(1920x1080, 8UC4)   33.272    1.096     30.37   
reciprocal::BinaryOpTest::(1920x1080, 16SC4)  32.064    2.161     14.84
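
The x-factor column reads as the baseline time divided by the patched time, so values above 1 mean the patch is faster and values below 1 mean it is slower. A minimal sketch of that arithmetic, with the helper name `xFactor` being hypothetical:

```cpp
#include <cassert>

// Speed-up factor: baseline time divided by patched time (same unit for both).
// Above 1.0 the patched build is faster; below 1.0 it is slower.
double xFactor(double baseTime, double patchTime) {
    assert(patchTime > 0.0);
    return baseTime / patchTime;
}
```

For the `divide` 640x480 rows, `xFactor(1.336, 1.204)` is about 1.11 (8UC1) and `xFactor(1.059, 1.172)` is about 0.90 (32FC1), matching the table. For the very small `reciprocal` timings the printed times are rounded, so recomputing from them can differ slightly from the listed factor.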

More: perf-div_recip.zip

@asmorkalov merged commit 8185925 into opencv:4.x on Apr 7, 2025
52 of 55 checks passed
@fengyuentau deleted the 4x/hal_rvv/div_recip branch on April 8, 2025 05:52
@asmorkalov mentioned this pull request on Apr 29, 2025