Further optimization of cv::merge RVV HAL for 8U and 16S #26923

Merged
asmorkalov merged 3 commits into opencv:4.x from dkurt:merge_rvv_opt on Feb 20, 2025

@dkurt
Member

@dkurt dkurt commented Feb 15, 2025

Pull Request Readiness Checklist

  • Banana Pi BF3 (SpacemiT K1) RISC-V
  • Compiler: Syntacore Clang 18.1.4 (build 2024.12)
Geometric mean (ms)

                     Name of Test                       baseline   pr       pr
                                                         merge              vs    
                                                                         baseline
                                                                          merge
                                                                        (x-factor)
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)      0.013   0.003     3.76   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)      0.020   0.006     3.46   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)      0.026   0.010     2.61   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 5)      0.043   0.028     1.56   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 6)      0.054   0.035     1.53   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 7)      0.065   0.050     1.30   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 8)      0.070   0.036     1.95   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)     0.015   0.008     1.82   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)     0.022   0.015     1.48   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)     0.029   0.018     1.63   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 5)     0.067   0.044     1.54   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 6)     0.088   0.056     1.58   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 7)     0.104   0.076     1.38   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 8)     0.116   0.065     1.79   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)     0.421   0.176     2.39   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)     0.792   0.284     2.79   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)     1.090   0.370     2.95   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 5)     1.835   1.399     1.31   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 6)     2.389   1.776     1.35   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 7)     3.000   2.471     1.21   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 8)     3.178   2.104     1.51   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)    0.490   0.377     1.30   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)    1.348   0.602     2.24   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)    1.827   0.813     2.25   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 5)    3.283   2.692     1.22   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 6)    4.922   3.334     1.48   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 7)    5.725   4.399     1.30   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 8)    6.278   4.748     1.32   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)    1.267   0.603     2.10   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)    2.394   0.934     2.56   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)    3.236   1.434     2.26   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 5)    5.398   4.345     1.24   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 6)    7.127   5.459     1.31   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 7)    8.590   7.298     1.18   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 8)    9.360   6.152     1.52   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)   1.482   1.242     1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)   4.008   1.817     2.21   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)   6.079   2.468     2.46   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 5)   11.300  8.644     1.31   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 6)   15.125  12.126    1.25   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 7)   17.555  14.804    1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 8)   18.890  14.163    1.33   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)   2.910   1.326     2.19   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)   5.351   1.997     2.68   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)   7.290   2.629     2.77   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 5)   12.426  9.611     1.29   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 6)   16.453  12.162    1.35   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 7)   19.420  16.190    1.20   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 8)   20.588  13.699    1.50   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2)  3.400   2.640     1.29   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3)  8.986   3.952     2.27   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4)  11.972  5.273     2.27   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 5)  20.544  17.996    1.14   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 6)  28.677  22.086    1.30   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 7)  32.958  27.713    1.19   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 8)  36.499  27.439    1.33
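
For context on what these numbers measure: cv::merge takes N single-channel planes and interleaves them into one N-channel image, so element i of plane c ends up at position i * N + c of the output. The C++ sketch below is a plain scalar reference of that interleaving for the 8U and 16S cases benchmarked above. `merge_reference` is a hypothetical name used only for illustration; this is not the RVV HAL code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference of the interleaving cv::merge performs for single-channel
// inputs (illustration only, not OpenCV's implementation):
//   dst[i * cn + c] = src[c][i]  for every element i and channel c.
// Works for uint8_t (8U) and int16_t (16S) alike via the template parameter.
template <typename T>
void merge_reference(const T* const* src, T* dst, size_t len, int cn)
{
    assert(cn > 0);
    const size_t ncn = static_cast<size_t>(cn);
    for (size_t i = 0; i < len; ++i)
        for (size_t c = 0; c < ncn; ++c)
            dst[i * ncn + c] = src[c][i];
}
```

A loop like this is also a convenient correctness oracle when validating a vectorized version on real RISC-V hardware.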

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@dkurt
Member Author

dkurt commented Feb 15, 2025

@hanliutong @amane-ame, may I ask you to review this proposal and verify that MUSE-PI also shows a performance improvement?

@hanliutong
Contributor

Hi @dkurt ,

The performance on MUSE-Pi is as follows; it also shows an improvement:

Clang 19.1.7
Geometric mean (ms)

                     Name of Test                        base    pr       pr    
                                                                          vs    
                                                                         base   
                                                                      (x-factor)
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)     0.012  0.003     4.42   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)     0.020  0.004     4.72   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)     0.025  0.007     3.51   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)    0.015  0.006     2.49   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)    0.021  0.016     1.31   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)    0.027  0.018     1.45   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 2)    0.029  0.029     1.00   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 3)    0.039  0.037     1.04   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 4)    0.050  0.050     1.02   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 2)    0.029  0.028     1.01   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 3)    0.039  0.037     1.03   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 4)    0.051  0.050     1.01   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 2)    0.039  0.039     1.00   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 3)    0.061  0.062     0.98   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 4)    0.079  0.079     1.00   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)    0.421  0.191     2.21   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)    0.821  0.293     2.80   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)    1.120  0.395     2.84   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)   0.494  0.398     1.24   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)   1.306  0.617     2.12   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)   1.884  0.841     2.24   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 2)   1.054  1.052     1.00   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 3)   1.464  1.449     1.01   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 4)   2.057  2.174     0.95   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 2)   1.053  1.055     1.00   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 3)   1.488  1.503     0.99   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 4)   2.050  2.042     1.00   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 2)   1.849  1.866     0.99   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 3)   2.690  2.882     0.93   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 4)   3.675  3.606     1.02   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)   1.291  0.633     2.04   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)   2.405  0.961     2.50   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)   3.316  1.269     2.61   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)  1.755  1.287     1.36   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)  3.907  1.831     2.13   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)  5.286  2.662     1.99   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 2)  3.218  3.278     0.98   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 3)  4.419  4.281     1.03   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 4)  6.712  5.657     1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 2)  3.216  3.246     0.99   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 3)  4.431  4.435     1.00   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 4)  5.964  6.215     0.96   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 2)  6.011  5.450     1.10   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 3)  9.257  8.583     1.08   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 4)  14.760 12.584    1.17   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)  2.913  1.373     2.12   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)  5.393  2.031     2.66   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)  7.100  2.693     2.64   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2) 3.598  2.723     1.32   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3) 8.991  4.898     1.84   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4) 12.360 5.709     2.16   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 2) 7.165  7.215     0.99   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 3) 9.826  9.836     1.00   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 4) 13.081 12.754    1.03   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 2) 7.157  7.233     0.99   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 3) 9.840  9.851     1.00   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 4) 13.226 13.259    1.00   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 2) 12.899 12.372    1.04   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 3) 19.058 17.930    1.06   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 4) 27.752 24.288    1.14

GCC 14.2
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)     0.012  0.002     4.78   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)     0.018  0.005     4.10   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)     0.024  0.008     2.86   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)    0.014  0.007     1.93   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)    0.020  0.015     1.35   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)    0.027  0.018     1.55   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 2)    0.025  0.026     0.95   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 3)    0.038  0.037     1.01   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 4)    0.049  0.050     0.99   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 2)    0.025  0.024     1.01   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 3)    0.036  0.039     0.92   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 4)    0.049  0.055     0.90   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 2)    0.038  0.040     0.96   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 3)    0.055  0.062     0.89   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 4)    0.075  0.077     0.98   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)    0.417  0.192     2.18   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)    0.758  0.288     2.63   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)    1.106  0.400     2.77   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)   0.480  0.394     1.22   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)   1.297  0.602     2.15   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)   1.900  0.835     2.27   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 2)   0.940  0.931     1.01   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 3)   1.450  1.498     0.97   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 4)   2.208  2.255     0.98   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 2)   0.969  0.964     1.00   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 3)   1.462  1.533     0.95   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 4)   2.249  2.161     1.04   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 2)   1.838  1.723     1.07   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 3)   2.745  2.777     0.99   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 4)   3.606  3.777     0.95   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)   1.269  0.621     2.04   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)   2.278  0.927     2.46   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)   3.295  1.288     2.56   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)  1.616  1.291     1.25   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)  3.877  1.917     2.02   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)  5.634  2.540     2.22   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 2)  2.913  2.860     1.02   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 3)  4.196  4.444     0.94   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 4)  5.998  5.692     1.05   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 2)  2.913  2.904     1.00   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 3)  4.287  4.322     0.99   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 4)  6.260  5.828     1.07   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 2)  5.217  5.072     1.03   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 3)  8.855  8.063     1.10   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 4)  11.749 11.487    1.02   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)  2.893  1.384     2.09   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)  5.125  2.025     2.53   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)  7.249  2.701     2.68   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2) 3.547  2.716     1.31   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3) 8.754  3.982     2.20   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4) 12.123 5.416     2.24   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 2) 6.560  6.449     1.02   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 3) 9.712  9.840     0.99   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 4) 12.843 12.852    1.00   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 2) 6.536  6.461     1.01   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 3) 9.644  9.823     0.98   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 4) 12.952 13.730    0.94   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 2) 12.470 12.822    0.97   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 3) 18.344 18.537    0.99   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 4) 27.346 27.177    1.01 

However, it seems that Clang 17, the minimum compiler version that supports RVV intrinsics v0.12 (currently used in CI), does not support __riscv_vcreate_..., although the code compiles with the latest compiler releases (Clang 19.1.7 and GCC 14.2).

Maybe we should add a compiler version check.
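
One way such a check could look, sketched under assumptions: gate the __riscv_vcreate_* path behind a compile-time test and fall back to intrinsics that older toolchains already accept. The version threshold only encodes the compilers reported to work in this thread (Clang 19.1.7, GCC 14.2) and is an assumption, not a verified minimum. The names `MERGE_SKETCH_HAS_VCREATE` and `merge2_u8` are hypothetical, and the fallback (two strided stores via `__riscv_vsse8_v_u8m1` instead of a segment store) is not necessarily what this PR ended up doing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical feature macro. Assumption: __riscv_vcreate_* exists on the
// compilers reported to work in this thread (Clang >= 19, GCC >= 14).
// Clang also defines __GNUC__, hence the !defined(__clang__) check.
#if defined(__riscv_vector) && \
    ((defined(__clang__) && __clang_major__ >= 19) || \
     (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 14))
#define MERGE_SKETCH_HAS_VCREATE 1
#else
#define MERGE_SKETCH_HAS_VCREATE 0
#endif

// Interleave two 8-bit planes: dst = a0 b0 a1 b1 ...
void merge2_u8(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t len)
{
#if defined(__riscv_vector)
    for (size_t i = 0; i < len;) {
        size_t vl = __riscv_vsetvl_e8m1(len - i);
        vuint8m1_t va = __riscv_vle8_v_u8m1(a + i, vl);
        vuint8m1_t vb = __riscv_vle8_v_u8m1(b + i, vl);
#if MERGE_SKETCH_HAS_VCREATE
        // Newer toolchains: build a 2-register tuple, then one segment store.
        __riscv_vsseg2e8_v_u8m1x2(dst + 2 * i, __riscv_vcreate_v_u8m1x2(va, vb), vl);
#else
        // Older toolchains (e.g. Clang 17): two strided stores, 2-byte stride.
        __riscv_vsse8_v_u8m1(dst + 2 * i, 2, va, vl);
        __riscv_vsse8_v_u8m1(dst + 2 * i + 1, 2, vb, vl);
#endif
        i += vl;
    }
#else
    // Non-RVV build: plain scalar loop.
    for (size_t i = 0; i < len; ++i) {
        dst[2 * i] = a[i];
        dst[2 * i + 1] = b[i];
    }
#endif
}
```

Keeping the fallback inside the same function means CI images pinned to an older compiler still build and test the RVV path, while newer toolchains pick up the segment store automatically.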

@mshabunin
Contributor

I agree, we need to keep compatibility with Clang 17 somehow. Perhaps we can raise the version in pre-commit builders but keep the old image in weekly builds. I'll take a look.

@dkurt
Member Author

dkurt commented Feb 16, 2025

@mshabunin, I can modify this PR to be compatible with Clang 17, so we can postpone upgrading to a newer version until it becomes critical.

@dkurt
Member Author

dkurt commented Feb 18, 2025

@asmorkalov, just a note that GitHub has updated its UI and the CI mark is misleading (2 jobs failed and were restarted, but the mark is ✔️).

@asmorkalov
Contributor

My performance results for Spacemit Muse Pi v 30 (GCC 14.2.1):

merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)                                               0.012    0.003     4.23   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)                                               0.020    0.006     3.54   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)                                               0.025    0.010     2.59   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)                                              0.015    0.009     1.78   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)                                              0.022    0.016     1.38   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)                                              0.028    0.019     1.50   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 2)                                              0.025    0.026     0.97   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 3)                                              0.037    0.038     0.98   
merge::Size_SrcDepth_DstChannels::(127x61, 32SC1, 4)                                              0.051    0.051     1.00   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 2)                                              0.025    0.026     0.97   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 3)                                              0.036    0.037     0.97   
merge::Size_SrcDepth_DstChannels::(127x61, 32FC1, 4)                                              0.050    0.050     1.00   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 2)                                              0.040    0.039     1.03   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 3)                                              0.062    0.062     0.99   
merge::Size_SrcDepth_DstChannels::(127x61, 64FC1, 4)                                              0.078    0.077     1.01   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)                                              0.420    0.188     2.23   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)                                              0.779    0.300     2.60   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)                                              1.119    0.395     2.83   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)                                             0.486    0.394     1.23   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)                                             1.391    0.645     2.15   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)                                             1.847    0.875     2.11   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 2)                                             0.888    0.927     0.96   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 3)                                             1.430    1.410     1.01   
merge::Size_SrcDepth_DstChannels::(640x480, 32SC1, 4)                                             2.082    2.024     1.03   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 2)                                             0.890    0.910     0.98   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 3)                                             1.429    1.445     0.99   
merge::Size_SrcDepth_DstChannels::(640x480, 32FC1, 4)                                             2.115    2.127     0.99   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 2)                                             1.742    1.767     0.99   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 3)                                             2.780    2.954     0.94   
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 4)                                             3.873    3.613     1.07   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)                                             1.271    0.604     2.10   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)                                             2.415    0.943     2.56   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)                                             3.292    1.296     2.54   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)                                            1.628    1.204     1.35   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)                                            4.199    1.875     2.24   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)                                            5.944    2.553     2.33   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 2)                                            2.793    2.779     1.01   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 3)                                            4.306    4.241     1.02   
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 4)                                            6.332    6.648     0.95   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 2)                                            2.777    2.808     0.99   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 3)                                            4.333    4.249     1.02   
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 4)                                            6.305    6.858     0.92   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 2)                                            5.202    6.026     0.86   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 3)                                            8.345    9.370     0.89   
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 4)                                            12.057  14.086     0.86   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)                                            2.861    1.359     2.11   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)                                            5.426    2.066     2.63   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)                                            7.541    2.778     2.71   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2)                                           3.534    2.684     1.32   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3)                                           9.315    3.994     2.33   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4)                                           12.649   5.349     2.36   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 2)                                           6.257    6.247     1.00   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 3)                                           9.667    9.405     1.03   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 4)                                           13.389  12.657     1.06   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 2)                                           6.296    6.280     1.00   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 3)                                           9.613    9.489     1.01   
merge::Size_SrcDepth_DstChannels::(1920x1080, 32FC1, 4)                                           13.265  12.978     1.02   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 2)                                           11.609  12.137     0.96   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 3)                                           18.184  17.366     1.05   
merge::Size_SrcDepth_DstChannels::(1920x1080, 64FC1, 4)                                           26.313  24.362     1.08

@asmorkalov
Contributor

@mshabunin, can I merge the PR?

@asmorkalov asmorkalov self-assigned this Feb 20, 2025
@asmorkalov asmorkalov merged commit 7a2b048 into opencv:4.x Feb 20, 2025
27 of 29 checks passed
@dkurt dkurt deleted the merge_rvv_opt branch February 20, 2025 14:42
NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request Feb 24, 2025
@asmorkalov asmorkalov mentioned this pull request Mar 4, 2025

5 participants