hal_rvv: further optimized flip #27257

Merged
asmorkalov merged 6 commits into opencv:4.x from fengyuentau:4x/hal_rvv/flip_opt
Apr 26, 2025

@fengyuentau
Member

@fengyuentau fengyuentau commented Apr 25, 2025

Checklist:

  • flipX
  • flipY
  • flipXY
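
For reference, the three modes in the checklist can be sketched as a scalar routine. This is only an illustration of the expected output, assuming flipX, flipY and flipXY follow the `cv::flip` convention (flip around the x-axis, around the y-axis, and around both). The names `FlipMode` and `flip_reference` are hypothetical and do not appear in the patch; the RVV implementation itself is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not identifiers from the patch.
enum class FlipMode { X, Y, XY };

// Scalar reference flip for a single-channel, row-major 8-bit image.
// Assumption: X = flip around the x-axis (rows reversed),
//             Y = flip around the y-axis (columns reversed),
//             XY = both.
std::vector<std::uint8_t> flip_reference(const std::vector<std::uint8_t>& src,
                                         std::size_t rows, std::size_t cols,
                                         FlipMode mode)
{
    assert(src.size() == rows * cols);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        // X and XY read from the mirrored row.
        const std::size_t sr = (mode == FlipMode::Y) ? r : rows - 1 - r;
        for (std::size_t c = 0; c < cols; ++c) {
            // Y and XY read from the mirrored column.
            const std::size_t sc = (mode == FlipMode::X) ? c : cols - 1 - c;
            dst[r * cols + c] = src[sr * cols + sc];
        }
    }
    return dst;
}
```

Under this convention flipX only reorders whole rows, while flipY and flipXY reverse the elements inside each row.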

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

@fengyuentau I started HAL reorganization in #27252. Please move the files to the new location.

@asmorkalov asmorkalov self-assigned this Apr 25, 2025
@fengyuentau fengyuentau force-pushed the 4x/hal_rvv/flip_opt branch 2 times, most recently from afdab6b to 0944325 Compare April 25, 2025 15:44
@fengyuentau fengyuentau force-pushed the 4x/hal_rvv/flip_opt branch from 0944325 to c450654 Compare April 25, 2025 16:04
@fengyuentau
Member Author

Updated performance results:

K1 GCC

                      Name of Test                       base-gcc patch-gcc patch-gcc vs base-gcc (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)      0.035     0.021      1.62   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)     0.089     0.091      0.97   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)      0.083     0.090      0.92   
flip::Size_MatType_FlipCode::(320x240, 8SC1, FLIP_X)      0.026     0.020      1.32   
flip::Size_MatType_FlipCode::(320x240, 8SC1, FLIP_XY)     0.088     0.091      0.96   
flip::Size_MatType_FlipCode::(320x240, 8SC1, FLIP_Y)      0.083     0.090      0.92   
flip::Size_MatType_FlipCode::(320x240, 16SC1, FLIP_X)     0.050     0.040      1.26   
flip::Size_MatType_FlipCode::(320x240, 16SC1, FLIP_XY)    0.190     0.136      1.39   
flip::Size_MatType_FlipCode::(320x240, 16SC1, FLIP_Y)     0.187     0.135      1.38   
flip::Size_MatType_FlipCode::(320x240, 32SC1, FLIP_X)     0.123     0.083      1.49   
flip::Size_MatType_FlipCode::(320x240, 32SC1, FLIP_XY)    0.309     0.224      1.38   
flip::Size_MatType_FlipCode::(320x240, 32SC1, FLIP_Y)     0.325     0.223      1.46   
flip::Size_MatType_FlipCode::(320x240, 32FC1, FLIP_X)     0.090     0.082      1.10   
flip::Size_MatType_FlipCode::(320x240, 32FC1, FLIP_XY)    0.296     0.224      1.32   
flip::Size_MatType_FlipCode::(320x240, 32FC1, FLIP_Y)     0.294     0.223      1.32   
flip::Size_MatType_FlipCode::(320x240, 8UC2, FLIP_X)      0.049     0.041      1.21   
flip::Size_MatType_FlipCode::(320x240, 8UC2, FLIP_XY)     0.191     0.136      1.40   
flip::Size_MatType_FlipCode::(320x240, 8UC2, FLIP_Y)      0.190     0.135      1.41   
flip::Size_MatType_FlipCode::(320x240, 16SC2, FLIP_X)     0.118     0.085      1.39   
flip::Size_MatType_FlipCode::(320x240, 16SC2, FLIP_XY)    0.305     0.224      1.36   
flip::Size_MatType_FlipCode::(320x240, 16SC2, FLIP_Y)     0.295     0.223      1.32   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)      0.089     0.063      1.41   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)     0.253     0.081      3.12   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)      0.228     0.080      2.85   
flip::Size_MatType_FlipCode::(320x240, 16SC3, FLIP_X)     0.166     0.126      1.32   
flip::Size_MatType_FlipCode::(320x240, 16SC3, FLIP_XY)    0.627     0.164      3.84   
flip::Size_MatType_FlipCode::(320x240, 16SC3, FLIP_Y)     0.575     0.163      3.52   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)      0.106     0.089      1.19   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)     0.305     0.224      1.36   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)      0.299     0.223      1.34   
flip::Size_MatType_FlipCode::(320x240, 16SC4, FLIP_X)     0.225     0.186      1.21   
flip::Size_MatType_FlipCode::(320x240, 16SC4, FLIP_XY)    0.555     0.445      1.25   
flip::Size_MatType_FlipCode::(320x240, 16SC4, FLIP_Y)     0.546     0.444      1.23   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)      0.118     0.080      1.47   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)     0.389     0.270      1.44   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)      0.411     0.267      1.54   
flip::Size_MatType_FlipCode::(640x480, 8SC1, FLIP_X)      0.121     0.082      1.47   
flip::Size_MatType_FlipCode::(640x480, 8SC1, FLIP_XY)     0.424     0.270      1.57   
flip::Size_MatType_FlipCode::(640x480, 8SC1, FLIP_Y)      0.420     0.267      1.57   
flip::Size_MatType_FlipCode::(640x480, 16SC1, FLIP_X)     0.272     0.177      1.54   
flip::Size_MatType_FlipCode::(640x480, 16SC1, FLIP_XY)    0.654     0.449      1.46   
flip::Size_MatType_FlipCode::(640x480, 16SC1, FLIP_Y)     0.633     0.445      1.42   
flip::Size_MatType_FlipCode::(640x480, 32SC1, FLIP_X)     0.443     0.372      1.19   
flip::Size_MatType_FlipCode::(640x480, 32SC1, FLIP_XY)    1.293     0.890      1.45   
flip::Size_MatType_FlipCode::(640x480, 32SC1, FLIP_Y)     1.203     0.888      1.36   
flip::Size_MatType_FlipCode::(640x480, 32FC1, FLIP_X)     0.422     0.375      1.13   
flip::Size_MatType_FlipCode::(640x480, 32FC1, FLIP_XY)    1.292     0.890      1.45   
flip::Size_MatType_FlipCode::(640x480, 32FC1, FLIP_Y)     1.204     0.888      1.36   
flip::Size_MatType_FlipCode::(640x480, 8UC2, FLIP_X)      0.257     0.181      1.42   
flip::Size_MatType_FlipCode::(640x480, 8UC2, FLIP_XY)     0.652     0.448      1.46   
flip::Size_MatType_FlipCode::(640x480, 8UC2, FLIP_Y)      0.639     0.445      1.44   
flip::Size_MatType_FlipCode::(640x480, 16SC2, FLIP_X)     0.451     0.383      1.18   
flip::Size_MatType_FlipCode::(640x480, 16SC2, FLIP_XY)    1.296     0.891      1.45   
flip::Size_MatType_FlipCode::(640x480, 16SC2, FLIP_Y)     1.204     0.888      1.36   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)      0.369     0.276      1.33   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)     1.372     0.333      4.13   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)      1.308     0.334      3.91   
flip::Size_MatType_FlipCode::(640x480, 16SC3, FLIP_X)     0.642     0.591      1.09   
flip::Size_MatType_FlipCode::(640x480, 16SC3, FLIP_XY)    3.411     0.682      5.00   
flip::Size_MatType_FlipCode::(640x480, 16SC3, FLIP_Y)     3.448     0.680      5.07   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)      0.455     0.381      1.19   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)     1.295     0.890      1.45   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)      1.203     0.888      1.36   
flip::Size_MatType_FlipCode::(640x480, 16SC4, FLIP_X)     0.900     0.802      1.12   
flip::Size_MatType_FlipCode::(640x480, 16SC4, FLIP_XY)    3.555     1.802      1.97   
flip::Size_MatType_FlipCode::(640x480, 16SC4, FLIP_Y)     4.028     1.794      2.25   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)    0.864     0.673      1.28   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY)   2.628     1.616      1.63   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)    2.237     1.607      1.39   
flip::Size_MatType_FlipCode::(1920x1080, 8SC1, FLIP_X)    0.868     0.674      1.29   
flip::Size_MatType_FlipCode::(1920x1080, 8SC1, FLIP_XY)   2.622     1.617      1.62   
flip::Size_MatType_FlipCode::(1920x1080, 8SC1, FLIP_Y)    2.237     1.607      1.39   
flip::Size_MatType_FlipCode::(1920x1080, 16SC1, FLIP_X)   1.649     1.376      1.20   
flip::Size_MatType_FlipCode::(1920x1080, 16SC1, FLIP_XY)  5.971     3.056      1.95   
flip::Size_MatType_FlipCode::(1920x1080, 16SC1, FLIP_Y)   5.399     3.031      1.78   
flip::Size_MatType_FlipCode::(1920x1080, 32SC1, FLIP_X)   3.105     2.763      1.12   
flip::Size_MatType_FlipCode::(1920x1080, 32SC1, FLIP_XY)  13.713    6.084      2.25   
flip::Size_MatType_FlipCode::(1920x1080, 32SC1, FLIP_Y)   15.612    6.065      2.57   
flip::Size_MatType_FlipCode::(1920x1080, 32FC1, FLIP_X)   3.108     2.761      1.13   
flip::Size_MatType_FlipCode::(1920x1080, 32FC1, FLIP_XY)  13.704    6.084      2.25   
flip::Size_MatType_FlipCode::(1920x1080, 32FC1, FLIP_Y)   15.615    6.067      2.57   
flip::Size_MatType_FlipCode::(1920x1080, 8UC2, FLIP_X)    1.642     1.377      1.19   
flip::Size_MatType_FlipCode::(1920x1080, 8UC2, FLIP_XY)   5.975     3.056      1.96   
flip::Size_MatType_FlipCode::(1920x1080, 8UC2, FLIP_Y)    5.400     3.031      1.78   
flip::Size_MatType_FlipCode::(1920x1080, 16SC2, FLIP_X)   3.111     2.765      1.12   
flip::Size_MatType_FlipCode::(1920x1080, 16SC2, FLIP_XY)  13.711    6.078      2.26   
flip::Size_MatType_FlipCode::(1920x1080, 16SC2, FLIP_Y)   15.618    6.060      2.58   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)    2.404     2.056      1.17   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY)   12.963    2.361      5.49   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)    13.453    2.337      5.76   
flip::Size_MatType_FlipCode::(1920x1080, 16SC3, FLIP_X)   4.529     4.082      1.11   
flip::Size_MatType_FlipCode::(1920x1080, 16SC3, FLIP_XY)  22.543    4.716      4.78   
flip::Size_MatType_FlipCode::(1920x1080, 16SC3, FLIP_Y)   24.099    4.688      5.14   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)    3.109     2.763      1.13   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY)   13.713    6.084      2.25   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)    15.621    6.064      2.58   
flip::Size_MatType_FlipCode::(1920x1080, 16SC4, FLIP_X)   5.822     5.426      1.07   
flip::Size_MatType_FlipCode::(1920x1080, 16SC4, FLIP_XY)  30.029   12.169      2.47   
flip::Size_MatType_FlipCode::(1920x1080, 16SC4, FLIP_Y)   26.578   12.189      2.18   
rotate::RotateTest::(640x480, 0, 8UC1)                    1.351     1.427      0.95   
rotate::RotateTest::(640x480, 0, 8SC1)                    1.360     1.437      0.95   
rotate::RotateTest::(640x480, 0, 16SC1)                   2.331     2.494      0.93   
rotate::RotateTest::(640x480, 0, 32SC1)                   6.839     7.180      0.95   
rotate::RotateTest::(640x480, 0, 32FC1)                   6.827     7.172      0.95   
rotate::RotateTest::(640x480, 0, 8UC2)                    2.335     2.493      0.94   
rotate::RotateTest::(640x480, 0, 16SC2)                   6.574     6.910      0.95   
rotate::RotateTest::(640x480, 0, 8UC3)                    7.683     7.942      0.97   
rotate::RotateTest::(640x480, 0, 16SC3)                   17.194   17.638      0.97   
rotate::RotateTest::(640x480, 0, 8UC4)                    6.810     7.140      0.95   
rotate::RotateTest::(640x480, 0, 16SC4)                   14.209   14.860      0.96   
rotate::RotateTest::(640x480, 1, 8UC1)                    0.416     0.271      1.54   
rotate::RotateTest::(640x480, 1, 8SC1)                    0.411     0.271      1.52   
rotate::RotateTest::(640x480, 1, 16SC1)                   0.657     0.449      1.46   
rotate::RotateTest::(640x480, 1, 32SC1)                   1.315     0.891      1.48   
rotate::RotateTest::(640x480, 1, 32FC1)                   1.314     0.892      1.47   
rotate::RotateTest::(640x480, 1, 8UC2)                    0.659     0.450      1.47   
rotate::RotateTest::(640x480, 1, 16SC2)                   1.305     0.891      1.46   
rotate::RotateTest::(640x480, 1, 8UC3)                    1.381     0.328      4.21   
rotate::RotateTest::(640x480, 1, 16SC3)                   3.287     0.687      4.79   
rotate::RotateTest::(640x480, 1, 8UC4)                    1.291     0.891      1.45   
rotate::RotateTest::(640x480, 1, 16SC4)                   3.640     1.803      2.02   
rotate::RotateTest::(640x480, 2, 8UC1)                    0.631     0.624      1.01   
rotate::RotateTest::(640x480, 2, 8SC1)                    0.633     0.623      1.02   
rotate::RotateTest::(640x480, 2, 16SC1)                   1.173     1.171      1.00   
rotate::RotateTest::(640x480, 2, 32SC1)                   3.483     3.478      1.00   
rotate::RotateTest::(640x480, 2, 32FC1)                   3.483     3.476      1.00   
rotate::RotateTest::(640x480, 2, 8UC2)                    1.173     1.169      1.00   
rotate::RotateTest::(640x480, 2, 16SC2)                   3.485     3.476      1.00   
rotate::RotateTest::(640x480, 2, 8UC3)                    3.400     3.396      1.00   
rotate::RotateTest::(640x480, 2, 16SC3)                   9.820     9.820      1.00   
rotate::RotateTest::(640x480, 2, 8UC4)                    3.479     3.476      1.00   
rotate::RotateTest::(640x480, 2, 16SC4)                   6.352     6.341      1.00   
rotate::RotateTest::(1280x720, 0, 8UC1)                   4.693     4.968      0.94   
rotate::RotateTest::(1280x720, 0, 8SC1)                   4.712     4.965      0.95   
rotate::RotateTest::(1280x720, 0, 16SC1)                  15.921   16.463      0.97   
rotate::RotateTest::(1280x720, 0, 32SC1)                  23.127   24.159      0.96   
rotate::RotateTest::(1280x720, 0, 32FC1)                  23.178   24.209      0.96   
rotate::RotateTest::(1280x720, 0, 8UC2)                   16.137   16.639      0.97   
rotate::RotateTest::(1280x720, 0, 16SC2)                  23.197   24.197      0.96   
rotate::RotateTest::(1280x720, 0, 8UC3)                   32.137   32.678      0.98   
rotate::RotateTest::(1280x720, 0, 16SC3)                  56.544   57.542      0.98   
rotate::RotateTest::(1280x720, 0, 8UC4)                   23.200   24.199      0.96   
rotate::RotateTest::(1280x720, 0, 16SC4)                  29.425   31.537      0.93   
rotate::RotateTest::(1280x720, 1, 8UC1)                   1.024     0.671      1.53   
rotate::RotateTest::(1280x720, 1, 8SC1)                   1.025     0.671      1.53   
rotate::RotateTest::(1280x720, 1, 16SC1)                  2.044     1.349      1.52   
rotate::RotateTest::(1280x720, 1, 32SC1)                  5.373     2.714      1.98   
rotate::RotateTest::(1280x720, 1, 32FC1)                  5.373     2.715      1.98   
rotate::RotateTest::(1280x720, 1, 8UC2)                   2.039     1.348      1.51   
rotate::RotateTest::(1280x720, 1, 16SC2)                  5.360     2.714      1.97   
rotate::RotateTest::(1280x720, 1, 8UC3)                   5.303     1.052      5.04   
rotate::RotateTest::(1280x720, 1, 16SC3)                  10.491    2.100      5.00   
rotate::RotateTest::(1280x720, 1, 8UC4)                   5.245     2.714      1.93   
rotate::RotateTest::(1280x720, 1, 16SC4)                  11.567    5.415      2.14   
rotate::RotateTest::(1280x720, 2, 8UC1)                   2.750     2.753      1.00   
rotate::RotateTest::(1280x720, 2, 8SC1)                   2.751     2.752      1.00   
rotate::RotateTest::(1280x720, 2, 16SC1)                  7.704     7.687      1.00   
rotate::RotateTest::(1280x720, 2, 32SC1)                  11.133   11.087      1.00   
rotate::RotateTest::(1280x720, 2, 32FC1)                  11.098   11.108      1.00   
rotate::RotateTest::(1280x720, 2, 8UC2)                   7.753     7.749      1.00   
rotate::RotateTest::(1280x720, 2, 16SC2)                  11.110   11.099      1.00   
rotate::RotateTest::(1280x720, 2, 8UC3)                   19.699   19.635      1.00   
rotate::RotateTest::(1280x720, 2, 16SC3)                  33.900   33.314      1.02   
rotate::RotateTest::(1280x720, 2, 8UC4)                   11.061   11.044      1.00   
rotate::RotateTest::(1280x720, 2, 16SC4)                  8.160     8.123      1.00   
rotate::RotateTest::(1920x1080, 0, 8UC1)                  11.798   12.424      0.95   
rotate::RotateTest::(1920x1080, 0, 8SC1)                  11.721   12.354      0.95   
rotate::RotateTest::(1920x1080, 0, 16SC1)                 40.292   41.556      0.97   
rotate::RotateTest::(1920x1080, 0, 32SC1)                 56.577   58.601      0.97   
rotate::RotateTest::(1920x1080, 0, 32FC1)                 56.511   58.535      0.97   
rotate::RotateTest::(1920x1080, 0, 8UC2)                  40.367   41.695      0.97   
rotate::RotateTest::(1920x1080, 0, 16SC2)                 56.625   58.499      0.97   
rotate::RotateTest::(1920x1080, 0, 8UC3)                  75.247   76.965      0.98   
rotate::RotateTest::(1920x1080, 0, 16SC3)                121.242   124.348     0.98   
rotate::RotateTest::(1920x1080, 0, 8UC4)                  56.495   58.349      0.97   
rotate::RotateTest::(1920x1080, 0, 16SC4)                 64.054   68.397      0.94   
rotate::RotateTest::(1920x1080, 1, 8UC1)                  2.624     1.619      1.62   
rotate::RotateTest::(1920x1080, 1, 8SC1)                  2.627     1.619      1.62   
rotate::RotateTest::(1920x1080, 1, 16SC1)                 5.967     3.059      1.95   
rotate::RotateTest::(1920x1080, 1, 32SC1)                 13.740    6.083      2.26   
rotate::RotateTest::(1920x1080, 1, 32FC1)                 13.724    6.090      2.25   
rotate::RotateTest::(1920x1080, 1, 8UC2)                  5.979     3.057      1.96   
rotate::RotateTest::(1920x1080, 1, 16SC2)                 13.589    6.094      2.23   
rotate::RotateTest::(1920x1080, 1, 8UC3)                  12.993    2.367      5.49   
rotate::RotateTest::(1920x1080, 1, 16SC3)                 23.561    4.721      4.99   
rotate::RotateTest::(1920x1080, 1, 8UC4)                  13.243    6.086      2.18   
rotate::RotateTest::(1920x1080, 1, 16SC4)                 27.895   12.162      2.29   
rotate::RotateTest::(1920x1080, 2, 8UC1)                  7.790     7.687      1.01   
rotate::RotateTest::(1920x1080, 2, 8SC1)                  7.703     7.752      0.99   
rotate::RotateTest::(1920x1080, 2, 16SC1)                 22.036   22.253      0.99   
rotate::RotateTest::(1920x1080, 2, 32SC1)                 23.084   22.893      1.01   
rotate::RotateTest::(1920x1080, 2, 32FC1)                 23.086   22.880      1.01   
rotate::RotateTest::(1920x1080, 2, 8UC2)                  22.076   22.297      0.99   
rotate::RotateTest::(1920x1080, 2, 16SC2)                 23.150   22.888      1.01   
rotate::RotateTest::(1920x1080, 2, 8UC3)                  48.287   48.370      1.00   
rotate::RotateTest::(1920x1080, 2, 16SC3)                 79.481   79.561      1.00   
rotate::RotateTest::(1920x1080, 2, 8UC4)                  23.134   22.890      1.01   
rotate::RotateTest::(1920x1080, 2, 16SC4)                 21.256   21.114      1.01

More: perf-flip+rotate.zip
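
For reading the last column: the x-factor appears to be the base time divided by the patched time, so values above 1.00 mean the patch is faster and values below 1.00 mean it is slower. A minimal sketch of that arithmetic, assuming this definition (the helper names `x_factor` and `round2` are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Speed-up of the patched build over the baseline.
// Assumption: x-factor = base time / patched time, both in the same unit.
double x_factor(double base, double patch)
{
    assert(patch > 0.0);
    return base / patch;
}

// Round to two decimals, matching the precision shown in the table.
double round2(double v)
{
    return std::round(v * 100.0) / 100.0;
}
```

For the (1920x1080, 8UC3, FLIP_XY) row, `round2(x_factor(12.963, 2.361))` gives 5.49, which matches the table. For very small timings the recomputed ratio can differ from the listed one (0.035 / 0.021 is about 1.67, while the table lists 1.62), presumably because the report divides the unrounded measurements.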

@fengyuentau fengyuentau marked this pull request as ready for review April 25, 2025 16:22
@asmorkalov asmorkalov merged commit 2fb7865 into opencv:4.x Apr 26, 2025
26 of 28 checks passed
@fengyuentau fengyuentau deleted the 4x/hal_rvv/flip_opt branch April 26, 2025 09:49
@asmorkalov asmorkalov mentioned this pull request Apr 29, 2025
