
HAL: implemented cv_hal_transpose in hal_rvv #27229

Merged
asmorkalov merged 10 commits into opencv:4.x from
fengyuentau:4x/hal_rvv/transpose
Apr 22, 2025

Conversation

@fengyuentau (Member) commented Apr 14, 2025

Checklists:

  • transpose2d_8u
  • transpose2d_16u
  • transpose2d_8uC3
  • transpose2d_32s
  • transpose2d_16uC3
  • transpose2d_32sC2
  • transpose_32sC3
  • transpose_32sC4
  • transpose_32sC6
  • transpose_32sC8
  • inplace transpose
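
For readers unfamiliar with the operation, the cases in the checklist above differ mainly in element size (for example 1 byte for 8u, 3 bytes for 8uC3, 8 bytes for 32sC2). A minimal scalar sketch of an out-of-place 2D transpose follows. It is not the RVV implementation from this PR; the function name `transpose2d_ref` and its parameter layout are assumptions made for illustration, and it does not cover the in-place case.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar reference, NOT the RVV code from this PR.
// Transposes an image of src_width x src_height elements, where each
// element occupies element_size bytes (e.g. 3 for 8UC3, 8 for 32SC2).
// src_step and dst_step are row strides in bytes.
// src and dst must not overlap (in-place transpose is not handled here).
void transpose2d_ref(const std::uint8_t* src, std::size_t src_step,
                     std::uint8_t* dst, std::size_t dst_step,
                     int src_width, int src_height, int element_size)
{
    const std::size_t esz = static_cast<std::size_t>(element_size);
    for (int y = 0; y < src_height; ++y) {
        const std::uint8_t* src_row = src + static_cast<std::size_t>(y) * src_step;
        for (int x = 0; x < src_width; ++x) {
            // Element (row y, col x) of src becomes (row x, col y) of dst.
            std::uint8_t* dst_elem = dst
                + static_cast<std::size_t>(x) * dst_step
                + static_cast<std::size_t>(y) * esz;
            std::memcpy(dst_elem, src_row + static_cast<std::size_t>(x) * esz, esz);
        }
    }
}
```

A vectorized version would typically process tiles of elements at once instead of copying one element per iteration; this sketch only pins down the expected result.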

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@fengyuentau fengyuentau added this to the 4.12.0 milestone Apr 14, 2025
@fengyuentau fengyuentau requested a review from asmorkalov April 14, 2025 10:14
@fengyuentau fengyuentau force-pushed the 4x/hal_rvv/transpose branch 2 times, most recently from 1462dc9 to 4d993c2 April 21, 2025 09:59
@fengyuentau fengyuentau changed the title from "[WIP] HAL: implemented cv_hal_transpose in hal_rvv" to "HAL: implemented cv_hal_transpose in hal_rvv" Apr 21, 2025
@fengyuentau (Member, Author) commented:

Performance results:


K1 (GCC)

Name of Test                                  base-gcc patch-gcc  patch-gcc vs base-gcc (x-factor)
transpose2d::BinaryOpTest::(640x480, 8UC1)     0.868     0.583      1.49   
transpose2d::BinaryOpTest::(640x480, 8SC1)     1.053     0.595      1.77   
transpose2d::BinaryOpTest::(640x480, 16SC1)    1.221     0.881      1.39   
transpose2d::BinaryOpTest::(640x480, 32SC1)    5.838     2.863      2.04   
transpose2d::BinaryOpTest::(640x480, 32FC1)    5.846     2.862      2.04   
transpose2d::BinaryOpTest::(640x480, 64FC1)    12.638    5.134      2.46   
transpose2d::BinaryOpTest::(640x480, 16SC2)    5.828     2.881      2.02   
transpose2d::BinaryOpTest::(640x480, 32SC2)    12.520    5.047      2.48   
transpose2d::BinaryOpTest::(640x480, 8UC3)     2.929     2.919      1.00   
transpose2d::BinaryOpTest::(640x480, 16SC3)    9.046     8.993      1.01   
transpose2d::BinaryOpTest::(640x480, 8UC4)     5.830     2.884      2.02   
transpose2d::BinaryOpTest::(640x480, 16SC4)    12.183    4.833      2.52   
transpose2d::BinaryOpTest::(1280x720, 8UC1)    4.135     2.221      1.86   
transpose2d::BinaryOpTest::(1280x720, 8SC1)    3.895     2.263      1.72   
transpose2d::BinaryOpTest::(1280x720, 16SC1)   11.912    6.416      1.86   
transpose2d::BinaryOpTest::(1280x720, 32SC1)   51.289   14.298      3.59   
transpose2d::BinaryOpTest::(1280x720, 32FC1)   51.268   14.504      3.53   
transpose2d::BinaryOpTest::(1280x720, 64FC1)   58.313    5.194     11.23   
transpose2d::BinaryOpTest::(1280x720, 16SC2)   51.127   14.476      3.53   
transpose2d::BinaryOpTest::(1280x720, 32SC2)   57.778    5.278     10.95   
transpose2d::BinaryOpTest::(1280x720, 8UC3)    18.708   18.335      1.02   
transpose2d::BinaryOpTest::(1280x720, 16SC3)   32.911   32.212      1.02   
transpose2d::BinaryOpTest::(1280x720, 8UC4)    51.212   14.243      3.60   
transpose2d::BinaryOpTest::(1280x720, 16SC4)   57.877    5.302     10.92   
transpose2d::BinaryOpTest::(1920x1080, 8UC1)   7.290     6.384      1.14   
transpose2d::BinaryOpTest::(1920x1080, 8SC1)   7.271     6.404      1.14   
transpose2d::BinaryOpTest::(1920x1080, 16SC1)  28.719   20.027      1.43   
transpose2d::BinaryOpTest::(1920x1080, 32SC1)  91.993   35.660      2.58   
transpose2d::BinaryOpTest::(1920x1080, 32FC1)  93.707   35.472      2.64   
transpose2d::BinaryOpTest::(1920x1080, 64FC1) 128.183   15.175      8.45   
transpose2d::BinaryOpTest::(1920x1080, 16SC2)  92.106   36.506      2.52   
transpose2d::BinaryOpTest::(1920x1080, 32SC2) 126.663   14.946      8.47   
transpose2d::BinaryOpTest::(1920x1080, 8UC3)   45.504   45.876      0.99   
transpose2d::BinaryOpTest::(1920x1080, 16SC3)  76.988   77.027      1.00   
transpose2d::BinaryOpTest::(1920x1080, 8UC4)   92.121   39.825      2.31   
transpose2d::BinaryOpTest::(1920x1080, 16SC4) 129.334   15.164      8.53
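
The x-factor column is not defined in the PR text. Judging from the numbers, it appears to be the baseline time divided by the patched time, so values above 1.00 mean the patch is faster. A small sketch under that assumption (the helper name `x_factor` is hypothetical):

```cpp
// Hypothetical helper: speedup of the patched build relative to the baseline.
// Assumes both timings are in the same unit; a result above 1.0 means
// the patched build is faster than the baseline.
double x_factor(double base_time, double patch_time)
{
    return base_time / patch_time;
}
```

For the first row, 0.868 / 0.583 is roughly 1.49, which agrees with the value reported in the table.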

K1 vs RK3568

Name of Test                                    rk   patch-gcc patch-clang  patch-gcc vs rk (x-factor)  patch-clang vs rk (x-factor)
transpose2d::BinaryOpTest::(640x480, 8UC1)    0.606    0.579      0.563       1.05       1.08    
transpose2d::BinaryOpTest::(640x480, 8SC1)    0.573    0.595      0.576       0.96       1.00    
transpose2d::BinaryOpTest::(640x480, 16SC1)   2.043    0.875      0.884       2.33       2.31    
transpose2d::BinaryOpTest::(640x480, 32SC1)   3.935    2.654      2.968       1.48       1.33    
transpose2d::BinaryOpTest::(640x480, 32FC1)   3.971    2.968      2.732       1.34       1.45    
transpose2d::BinaryOpTest::(640x480, 64FC1)   7.568    5.123      4.427       1.48       1.71    
transpose2d::BinaryOpTest::(640x480, 16SC2)   3.975    2.642      2.954       1.50       1.35    
transpose2d::BinaryOpTest::(640x480, 32SC2)   7.534    4.977      4.298       1.51       1.75    
transpose2d::BinaryOpTest::(640x480, 8UC3)    3.911    2.902      3.453       1.35       1.13    
transpose2d::BinaryOpTest::(640x480, 16SC3)   7.217    9.448      9.642       0.76       0.75    
transpose2d::BinaryOpTest::(640x480, 8UC4)    3.961    2.621      2.902       1.51       1.36    
transpose2d::BinaryOpTest::(640x480, 16SC4)   7.431    4.752      4.781       1.56       1.55    
transpose2d::BinaryOpTest::(1280x720, 8UC1)   5.029    2.549      2.428       1.97       2.07    
transpose2d::BinaryOpTest::(1280x720, 8SC1)   5.052    2.552      2.433       1.98       2.08    
transpose2d::BinaryOpTest::(1280x720, 16SC1)  9.020    7.310      7.594       1.23       1.19    
transpose2d::BinaryOpTest::(1280x720, 32SC1)  23.668   9.436      9.566       2.51       2.47    
transpose2d::BinaryOpTest::(1280x720, 32FC1)  23.678  10.775     11.006       2.20       2.15    
transpose2d::BinaryOpTest::(1280x720, 64FC1)  28.978   5.260      5.152       5.51       5.63    
transpose2d::BinaryOpTest::(1280x720, 16SC2)  23.377   9.283      9.714       2.52       2.41    
transpose2d::BinaryOpTest::(1280x720, 32SC2)  28.990   5.264      5.441       5.51       5.33    
transpose2d::BinaryOpTest::(1280x720, 8UC3)   14.734  18.810     18.988       0.78       0.78    
transpose2d::BinaryOpTest::(1280x720, 16SC3)  25.451  34.185     34.285       0.74       0.74    
transpose2d::BinaryOpTest::(1280x720, 8UC4)   23.548  10.283     10.394       2.29       2.27    
transpose2d::BinaryOpTest::(1280x720, 16SC4)  28.927   5.274      5.284       5.48       5.47    
transpose2d::BinaryOpTest::(1920x1080, 8UC1)  11.735   7.247      7.014       1.62       1.67    
transpose2d::BinaryOpTest::(1920x1080, 8SC1)  11.726   7.243      6.997       1.62       1.68    
transpose2d::BinaryOpTest::(1920x1080, 16SC1) 20.069  20.598     20.885       0.97       0.96    
transpose2d::BinaryOpTest::(1920x1080, 32SC1) 43.286  22.174     22.340       1.95       1.94    
transpose2d::BinaryOpTest::(1920x1080, 32FC1) 43.409  22.148     20.280       1.96       2.14    
transpose2d::BinaryOpTest::(1920x1080, 64FC1) 65.920  15.468     15.156       4.26       4.35    
transpose2d::BinaryOpTest::(1920x1080, 16SC2) 43.022  19.776     19.640       2.18       2.19    
transpose2d::BinaryOpTest::(1920x1080, 32SC2) 65.947  15.214     15.793       4.33       4.18    
transpose2d::BinaryOpTest::(1920x1080, 8UC3)  35.095  45.925     46.504       0.76       0.75    
transpose2d::BinaryOpTest::(1920x1080, 16SC3) 63.474  76.210     77.432       0.83       0.82    
transpose2d::BinaryOpTest::(1920x1080, 8UC4)  43.185  19.546     20.095       2.21       2.15    
transpose2d::BinaryOpTest::(1920x1080, 16SC4) 65.924  15.511     15.180       4.25       4.34

perf-transpose2d.zip

@fengyuentau fengyuentau marked this pull request as ready for review April 21, 2025 10:07
@asmorkalov (Contributor) commented:

@fengyuentau Please rebase and fix conflicts. I merged your previous changes.

@asmorkalov asmorkalov self-assigned this Apr 21, 2025
@asmorkalov asmorkalov force-pushed the 4x/hal_rvv/transpose branch from 4d993c2 to 21a48f5 April 22, 2025 05:38
@asmorkalov (Contributor) commented:

I did it myself. Let's wait for CI and I'll merge.

@asmorkalov asmorkalov merged commit 325e59b into opencv:4.x Apr 22, 2025
26 of 28 checks passed
@fengyuentau fengyuentau deleted the 4x/hal_rvv/transpose branch April 22, 2025 08:05
@asmorkalov asmorkalov mentioned this pull request Apr 29, 2025
