hal/riscv-rvv: further optimize div #27348

Merged
asmorkalov merged 1 commit into opencv:4.x from fengyuentau:4x/hal/riscv_rvv/faster_div_f32
May 23, 2025

@fengyuentau
Member

Previous optimization of div in hal/riscv-rvv: #27175

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is an accuracy test, performance test and test data in the opencv_extra repository, if applicable.
    The patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@fengyuentau
Member Author

Performance numbers:

  • K1 (base vs. patch)

GCC

Name of Test                                 base-gcc patch-gcc patch-gcc vs base-gcc (x-factor)
divide::BinaryOpTest::(640x480, 8UC1)         1.206     0.794      1.52   
divide::BinaryOpTest::(640x480, 8SC1)         1.205     0.793      1.52   
divide::BinaryOpTest::(640x480, 16SC1)        1.169     0.703      1.66   
divide::BinaryOpTest::(640x480, 32SC1)        1.224     0.773      1.58   
divide::BinaryOpTest::(640x480, 32FC1)        1.141     0.626      1.82   
divide::BinaryOpTest::(640x480, 16SC2)        2.360     1.426      1.65   
divide::BinaryOpTest::(640x480, 8UC3)         3.719     2.386      1.56   
divide::BinaryOpTest::(640x480, 16SC3)        3.540     2.143      1.65   
divide::BinaryOpTest::(640x480, 8UC4)         4.854     3.142      1.54   
divide::BinaryOpTest::(640x480, 16SC4)        4.802     2.852      1.68   
divide::BinaryOpTest::(1280x720, 8UC1)        3.629     2.357      1.54   
divide::BinaryOpTest::(1280x720, 8SC1)        3.630     2.381      1.52   
divide::BinaryOpTest::(1280x720, 16SC1)       3.564     2.170      1.64   
divide::BinaryOpTest::(1280x720, 32SC1)       3.671     2.385      1.54   
divide::BinaryOpTest::(1280x720, 32FC1)       3.413     1.931      1.77   
divide::BinaryOpTest::(1280x720, 16SC2)       7.141     4.298      1.66   
divide::BinaryOpTest::(1280x720, 8UC3)        10.932    7.027      1.56   
divide::BinaryOpTest::(1280x720, 16SC3)       10.583    6.402      1.65   
divide::BinaryOpTest::(1280x720, 8UC4)        14.578    9.485      1.54   
divide::BinaryOpTest::(1280x720, 16SC4)       14.181    8.543      1.66   
divide::BinaryOpTest::(1920x1080, 8UC1)       8.197     5.333      1.54   
divide::BinaryOpTest::(1920x1080, 8SC1)       8.200     5.348      1.53   
divide::BinaryOpTest::(1920x1080, 16SC1)      7.970     4.877      1.63   
divide::BinaryOpTest::(1920x1080, 32SC1)      8.227     5.401      1.52   
divide::BinaryOpTest::(1920x1080, 32FC1)      7.635     4.530      1.69   
divide::BinaryOpTest::(1920x1080, 16SC2)      16.006    9.608      1.67   
divide::BinaryOpTest::(1920x1080, 8UC3)       24.552   16.001      1.53   
divide::BinaryOpTest::(1920x1080, 16SC3)      23.779   14.379      1.65   
divide::BinaryOpTest::(1920x1080, 8UC4)       32.757   21.231      1.54   
divide::BinaryOpTest::(1920x1080, 16SC4)      31.702   19.162      1.65   
reciprocal::BinaryOpTest::(640x480, 8UC1)     0.042     0.042      1.00   
reciprocal::BinaryOpTest::(640x480, 8SC1)     0.042     0.042      1.00   
reciprocal::BinaryOpTest::(640x480, 16SC1)    0.080     0.075      1.08   
reciprocal::BinaryOpTest::(640x480, 32SC1)    0.162     0.156      1.04   
reciprocal::BinaryOpTest::(640x480, 32FC1)    0.974     0.492      1.98   
reciprocal::BinaryOpTest::(640x480, 16SC2)    0.163     0.162      1.00   
reciprocal::BinaryOpTest::(640x480, 8UC3)     0.122     0.122      1.00   
reciprocal::BinaryOpTest::(640x480, 16SC3)    0.243     0.243      1.00   
reciprocal::BinaryOpTest::(640x480, 8UC4)     0.162     0.162      1.00   
reciprocal::BinaryOpTest::(640x480, 16SC4)    0.324     0.323      1.00   
reciprocal::BinaryOpTest::(1280x720, 8UC1)    0.108     0.107      1.01   
reciprocal::BinaryOpTest::(1280x720, 8SC1)    0.122     0.122      1.00   
reciprocal::BinaryOpTest::(1280x720, 16SC1)   0.243     0.243      1.00   
reciprocal::BinaryOpTest::(1280x720, 32SC1)   0.486     0.485      1.00   
reciprocal::BinaryOpTest::(1280x720, 32FC1)   2.991     1.518      1.97   
reciprocal::BinaryOpTest::(1280x720, 16SC2)   0.486     0.486      1.00   
reciprocal::BinaryOpTest::(1280x720, 8UC3)    0.364     0.364      1.00   
reciprocal::BinaryOpTest::(1280x720, 16SC3)   0.729     0.729      1.00   
reciprocal::BinaryOpTest::(1280x720, 8UC4)    0.486     0.486      1.00   
reciprocal::BinaryOpTest::(1280x720, 16SC4)   0.972     0.971      1.00   
reciprocal::BinaryOpTest::(1920x1080, 8UC1)   0.273     0.271      1.01   
reciprocal::BinaryOpTest::(1920x1080, 8SC1)   0.274     0.273      1.00   
reciprocal::BinaryOpTest::(1920x1080, 16SC1)  0.547     0.546      1.00   
reciprocal::BinaryOpTest::(1920x1080, 32SC1)  1.060     1.063      1.00   
reciprocal::BinaryOpTest::(1920x1080, 32FC1)  6.719     3.413      1.97   
reciprocal::BinaryOpTest::(1920x1080, 16SC2)  1.093     1.092      1.00   
reciprocal::BinaryOpTest::(1920x1080, 8UC3)   0.820     0.820      1.00   
reciprocal::BinaryOpTest::(1920x1080, 16SC3)  1.640     1.639      1.00   
reciprocal::BinaryOpTest::(1920x1080, 8UC4)   1.094     1.092      1.00   
reciprocal::BinaryOpTest::(1920x1080, 16SC4)  2.156     2.158      1.00

Clang

Name of Test                                 base-clang patch-clang patch-clang vs base-clang (x-factor)
divide::BinaryOpTest::(640x480, 8UC1)          1.304       0.822       1.59    
divide::BinaryOpTest::(640x480, 8SC1)          1.304       0.823       1.58    
divide::BinaryOpTest::(640x480, 16SC1)         1.277       0.752       1.70    
divide::BinaryOpTest::(640x480, 32SC1)         1.290       0.877       1.47    
divide::BinaryOpTest::(640x480, 32FC1)         1.152       0.666       1.73    
divide::BinaryOpTest::(640x480, 16SC2)         2.578       1.521       1.69    
divide::BinaryOpTest::(640x480, 8UC3)          3.972       2.497       1.59    
divide::BinaryOpTest::(640x480, 16SC3)         3.865       2.289       1.69    
divide::BinaryOpTest::(640x480, 8UC4)          5.281       3.318       1.59    
divide::BinaryOpTest::(640x480, 16SC4)         5.165       3.051       1.69    
divide::BinaryOpTest::(1280x720, 8UC1)         3.954       2.464       1.60    
divide::BinaryOpTest::(1280x720, 8SC1)         3.956       2.471       1.60    
divide::BinaryOpTest::(1280x720, 16SC1)        3.876       2.298       1.69    
divide::BinaryOpTest::(1280x720, 32SC1)        3.914       2.574       1.52    
divide::BinaryOpTest::(1280x720, 32FC1)        3.502       1.999       1.75    
divide::BinaryOpTest::(1280x720, 16SC2)        7.761       4.592       1.69    
divide::BinaryOpTest::(1280x720, 8UC3)         11.823      7.423       1.59    
divide::BinaryOpTest::(1280x720, 16SC3)        11.567      6.837       1.69    
divide::BinaryOpTest::(1280x720, 8UC4)         15.778      9.841       1.60    
divide::BinaryOpTest::(1280x720, 16SC4)        15.411      9.105       1.69    
divide::BinaryOpTest::(1920x1080, 8UC1)        8.868       5.564       1.59    
divide::BinaryOpTest::(1920x1080, 8SC1)        8.865       5.573       1.59    
divide::BinaryOpTest::(1920x1080, 16SC1)       8.701       5.573       1.56    
divide::BinaryOpTest::(1920x1080, 32SC1)       8.764       5.813       1.51    
divide::BinaryOpTest::(1920x1080, 32FC1)       7.841       4.673       1.68    
divide::BinaryOpTest::(1920x1080, 16SC2)       17.394     10.331       1.68    
divide::BinaryOpTest::(1920x1080, 8UC3)        26.553     16.734       1.59    
divide::BinaryOpTest::(1920x1080, 16SC3)       25.979     15.349       1.69    
divide::BinaryOpTest::(1920x1080, 8UC4)        35.424     22.513       1.57    
divide::BinaryOpTest::(1920x1080, 16SC4)       34.628     20.444       1.69    
reciprocal::BinaryOpTest::(640x480, 8UC1)      0.041       0.041       1.01    
reciprocal::BinaryOpTest::(640x480, 8SC1)      0.042       0.041       1.01    
reciprocal::BinaryOpTest::(640x480, 16SC1)     0.075       0.082       0.92    
reciprocal::BinaryOpTest::(640x480, 32SC1)     0.159       0.151       1.05    
reciprocal::BinaryOpTest::(640x480, 32FC1)     0.972       0.528       1.84    
reciprocal::BinaryOpTest::(640x480, 16SC2)     0.162       0.162       1.00    
reciprocal::BinaryOpTest::(640x480, 8UC3)      0.122       0.122       1.00    
reciprocal::BinaryOpTest::(640x480, 16SC3)     0.243       0.243       1.00    
reciprocal::BinaryOpTest::(640x480, 8UC4)      0.162       0.162       1.00    
reciprocal::BinaryOpTest::(640x480, 16SC4)     0.324       0.323       1.00    
reciprocal::BinaryOpTest::(1280x720, 8UC1)     0.114       0.122       0.93    
reciprocal::BinaryOpTest::(1280x720, 8SC1)     0.122       0.122       1.00    
reciprocal::BinaryOpTest::(1280x720, 16SC1)    0.243       0.243       1.00    
reciprocal::BinaryOpTest::(1280x720, 32SC1)    0.485       0.485       1.00    
reciprocal::BinaryOpTest::(1280x720, 32FC1)    2.980       1.662       1.79    
reciprocal::BinaryOpTest::(1280x720, 16SC2)    0.486       0.486       1.00    
reciprocal::BinaryOpTest::(1280x720, 8UC3)     0.364       0.364       1.00    
reciprocal::BinaryOpTest::(1280x720, 16SC3)    0.728       0.728       1.00    
reciprocal::BinaryOpTest::(1280x720, 8UC4)     0.486       0.485       1.00    
reciprocal::BinaryOpTest::(1280x720, 16SC4)    0.972       0.971       1.00    
reciprocal::BinaryOpTest::(1920x1080, 8UC1)    0.273       0.273       1.00    
reciprocal::BinaryOpTest::(1920x1080, 8SC1)    0.273       0.273       1.00    
reciprocal::BinaryOpTest::(1920x1080, 16SC1)   0.547       0.546       1.00    
reciprocal::BinaryOpTest::(1920x1080, 32SC1)   1.059       1.061       1.00    
reciprocal::BinaryOpTest::(1920x1080, 32FC1)   6.699       3.727       1.80    
reciprocal::BinaryOpTest::(1920x1080, 16SC2)   1.093       1.092       1.00    
reciprocal::BinaryOpTest::(1920x1080, 8UC3)    0.820       0.819       1.00    
reciprocal::BinaryOpTest::(1920x1080, 16SC3)   1.638       1.638       1.00    
reciprocal::BinaryOpTest::(1920x1080, 8UC4)    1.093       1.092       1.00    
reciprocal::BinaryOpTest::(1920x1080, 16SC4)   2.155       2.158       1.00
  • K1 vs. RK3568
Name of Test                                   rk   patch-gcc patch-clang patch-gcc vs rk (x-factor) patch-clang vs rk (x-factor)
divide::BinaryOpTest::(640x480, 8UC1)        1.096    0.794      0.822       1.38       1.33    
divide::BinaryOpTest::(640x480, 8SC1)        1.055    0.793      0.823       1.33       1.28    
divide::BinaryOpTest::(640x480, 16SC1)       1.205    0.703      0.752       1.71       1.60    
divide::BinaryOpTest::(640x480, 32SC1)       1.775    0.773      0.877       2.30       2.03    
divide::BinaryOpTest::(640x480, 32FC1)       1.606    0.626      0.666       2.56       2.41    
divide::BinaryOpTest::(640x480, 16SC2)       2.457    1.426      1.521       1.72       1.62    
divide::BinaryOpTest::(640x480, 8UC3)        3.193    2.386      2.497       1.34       1.28    
divide::BinaryOpTest::(640x480, 16SC3)       3.749    2.143      2.289       1.75       1.64    
divide::BinaryOpTest::(640x480, 8UC4)        4.182    3.142      3.318       1.33       1.26    
divide::BinaryOpTest::(640x480, 16SC4)       5.031    2.852      3.051       1.76       1.65    
divide::BinaryOpTest::(1280x720, 8UC1)       3.140    2.357      2.464       1.33       1.27    
divide::BinaryOpTest::(1280x720, 8SC1)       3.138    2.381      2.471       1.32       1.27    
divide::BinaryOpTest::(1280x720, 16SC1)      3.708    2.170      2.298       1.71       1.61    
divide::BinaryOpTest::(1280x720, 32SC1)      5.481    2.385      2.574       2.30       2.13    
divide::BinaryOpTest::(1280x720, 32FC1)      4.784    1.931      1.999       2.48       2.39    
divide::BinaryOpTest::(1280x720, 16SC2)      7.385    4.298      4.592       1.72       1.61    
divide::BinaryOpTest::(1280x720, 8UC3)       9.413    7.027      7.423       1.34       1.27    
divide::BinaryOpTest::(1280x720, 16SC3)      11.354   6.402      6.837       1.77       1.66    
divide::BinaryOpTest::(1280x720, 8UC4)       12.509   9.485      9.841       1.32       1.27    
divide::BinaryOpTest::(1280x720, 16SC4)      15.107   8.543      9.105       1.77       1.66    
divide::BinaryOpTest::(1920x1080, 8UC1)      7.059    5.333      5.564       1.32       1.27    
divide::BinaryOpTest::(1920x1080, 8SC1)      7.067    5.348      5.573       1.32       1.27    
divide::BinaryOpTest::(1920x1080, 16SC1)     8.116    4.877      5.573       1.66       1.46    
divide::BinaryOpTest::(1920x1080, 32SC1)     12.441   5.401      5.813       2.30       2.14    
divide::BinaryOpTest::(1920x1080, 32FC1)     11.909   4.530      4.673       2.63       2.55    
divide::BinaryOpTest::(1920x1080, 16SC2)     16.636   9.608     10.331       1.73       1.61    
divide::BinaryOpTest::(1920x1080, 8UC3)      21.142  16.001     16.734       1.32       1.26    
divide::BinaryOpTest::(1920x1080, 16SC3)     27.191  14.379     15.349       1.89       1.77    
divide::BinaryOpTest::(1920x1080, 8UC4)      28.808  21.231     22.513       1.36       1.28    
divide::BinaryOpTest::(1920x1080, 16SC4)     34.154  19.162     20.444       1.78       1.67    
reciprocal::BinaryOpTest::(640x480, 8UC1)    0.043    0.042      0.041       1.03       1.05    
reciprocal::BinaryOpTest::(640x480, 8SC1)    0.037    0.042      0.041       0.89       0.90    
reciprocal::BinaryOpTest::(640x480, 16SC1)   0.089    0.075      0.082       1.19       1.09    
reciprocal::BinaryOpTest::(640x480, 32SC1)   0.178    0.156      0.151       1.14       1.17    
reciprocal::BinaryOpTest::(640x480, 32FC1)   1.326    0.492      0.528       2.69       2.51    
reciprocal::BinaryOpTest::(640x480, 16SC2)   0.181    0.162      0.162       1.12       1.12    
reciprocal::BinaryOpTest::(640x480, 8UC3)    0.135    0.122      0.122       1.11       1.11    
reciprocal::BinaryOpTest::(640x480, 16SC3)   0.276    0.243      0.243       1.13       1.14    
reciprocal::BinaryOpTest::(640x480, 8UC4)    0.177    0.162      0.162       1.09       1.09    
reciprocal::BinaryOpTest::(640x480, 16SC4)   0.368    0.323      0.323       1.14       1.14    
reciprocal::BinaryOpTest::(1280x720, 8UC1)   0.137    0.107      0.122       1.28       1.12    
reciprocal::BinaryOpTest::(1280x720, 8SC1)   0.133    0.122      0.122       1.09       1.09    
reciprocal::BinaryOpTest::(1280x720, 16SC1)  0.279    0.243      0.243       1.14       1.15    
reciprocal::BinaryOpTest::(1280x720, 32SC1)  0.573    0.485      0.485       1.18       1.18    
reciprocal::BinaryOpTest::(1280x720, 32FC1)  4.000    1.518      1.662       2.63       2.41    
reciprocal::BinaryOpTest::(1280x720, 16SC2)  0.566    0.486      0.486       1.17       1.17    
reciprocal::BinaryOpTest::(1280x720, 8UC3)   0.420    0.364      0.364       1.16       1.16    
reciprocal::BinaryOpTest::(1280x720, 16SC3)  0.868    0.729      0.728       1.19       1.19    
reciprocal::BinaryOpTest::(1280x720, 8UC4)   0.565    0.486      0.485       1.16       1.17    
reciprocal::BinaryOpTest::(1280x720, 16SC4)  1.154    0.971      0.971       1.19       1.19    
reciprocal::BinaryOpTest::(1920x1080, 8UC1)  0.312    0.271      0.273       1.15       1.14    
reciprocal::BinaryOpTest::(1920x1080, 8SC1)  0.308    0.273      0.273       1.13       1.13    
reciprocal::BinaryOpTest::(1920x1080, 16SC1) 0.637    0.546      0.546       1.17       1.17    
reciprocal::BinaryOpTest::(1920x1080, 32SC1) 1.349    1.063      1.061       1.27       1.27    
reciprocal::BinaryOpTest::(1920x1080, 32FC1) 8.946    3.413      3.727       2.62       2.40    
reciprocal::BinaryOpTest::(1920x1080, 16SC2) 1.312    1.092      1.092       1.20       1.20    
reciprocal::BinaryOpTest::(1920x1080, 8UC3)  0.974    0.820      0.819       1.19       1.19    
reciprocal::BinaryOpTest::(1920x1080, 16SC3) 1.980    1.639      1.638       1.21       1.21    
reciprocal::BinaryOpTest::(1920x1080, 8UC4)  1.306    1.092      1.092       1.20       1.20    
reciprocal::BinaryOpTest::(1920x1080, 16SC4) 2.648    2.158      2.158       1.23       1.23
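
For readers checking the tables, the x-factor column appears to be the reference time divided by the candidate time, so values above 1.00 mean the candidate (patch) is faster. That reading is an assumption inferred from the numbers, not something stated in the PR. A minimal Python sketch that recomputes a few rows copied from the tables above (the row labels and the `x_factor` helper are illustrative names, not part of OpenCV):

```python
# Times copied from the tables above, as (reference, candidate) pairs.
# Units are whatever the perf run reported; only the ratio matters here.
rows = {
    "divide 640x480 32FC1 (base-gcc vs. patch-gcc)": (1.141, 0.626),
    "reciprocal 1920x1080 32FC1 (base-gcc vs. patch-gcc)": (6.719, 3.413),
    "divide 640x480 8UC1 (rk vs. patch-gcc)": (1.096, 0.794),
}


def x_factor(reference, candidate):
    """Assumed definition: reference time / candidate time, to two decimals."""
    return round(reference / candidate, 2)


factors = {name: x_factor(ref, cand) for name, (ref, cand) in rows.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}")
```

Because the table shows times rounded to three decimals, a recomputed ratio can differ from the reported one in the last digit for some rows.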

@asmorkalov self-assigned this on May 23, 2025
@asmorkalov merged commit b8099d3 into opencv:4.x on May 23, 2025
27 of 28 checks passed
@fengyuentau deleted the 4x/hal/riscv_rvv/faster_div_f32 branch on May 23, 2025 08:31
@asmorkalov mentioned this pull request on May 27, 2025