Further optimization of cv::merge RVV HAL for 8U and 16S #26923
asmorkalov merged 3 commits into opencv:4.x

@hanliutong @amane-ame, may I ask you to review this proposal and verify that MUSE-PI also shows a performance improvement?

Hi @dkurt, the performance on MUSE-Pi is as follows; it also improves. [Results for Clang 19.1.7 and GCC 14.2 not preserved.] However, it seems that Clang 17, which is the minimum compiler version that supports RVV intrinsics v0.12 (currently used for CI), does not support [...]. Maybe we should add a compiler version check.

I agree, we need to somehow keep compatibility with Clang 17. Perhaps we can raise the version in the precommit builders but keep the old image in the weekly builds. I'll take a look.

@mshabunin, I can modify this PR to be compatible with Clang 17, so we can postpone the upgrade to a newer version until it's critical.

@asmorkalov, just a note that GitHub updated its UI and the CI mark is misleading (2 jobs failed and were restarted, but the mark is ✔️). [Screenshot not preserved.]

My performance results for SpacemiT Muse Pi v30 (GCC 14.2.1): [results not preserved]

@mshabunin, could I merge the PR?
Further optimization of cv::merge RVV HAL for 8U and 16S opencv#26923

### Pull Request Readiness Checklist

* Banana Pi BF3 (SpacemiT K1) RISC-V
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)

```
Geometric mean (ms)

Name of Test                                               baseline        pr  pr vs baseline (x-factor)
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)           0.013     0.003      3.76
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)           0.020     0.006      3.46
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)           0.026     0.010      2.61
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 5)           0.043     0.028      1.56
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 6)           0.054     0.035      1.53
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 7)           0.065     0.050      1.30
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 8)           0.070     0.036      1.95
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)          0.015     0.008      1.82
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)          0.022     0.015      1.48
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)          0.029     0.018      1.63
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 5)          0.067     0.044      1.54
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 6)          0.088     0.056      1.58
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 7)          0.104     0.076      1.38
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 8)          0.116     0.065      1.79
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)          0.421     0.176      2.39
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)          0.792     0.284      2.79
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)          1.090     0.370      2.95
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 5)          1.835     1.399      1.31
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 6)          2.389     1.776      1.35
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 7)          3.000     2.471      1.21
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 8)          3.178     2.104      1.51
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)         0.490     0.377      1.30
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)         1.348     0.602      2.24
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)         1.827     0.813      2.25
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 5)         3.283     2.692      1.22
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 6)         4.922     3.334      1.48
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 7)         5.725     4.399      1.30
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 8)         6.278     4.748      1.32
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)         1.267     0.603      2.10
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)         2.394     0.934      2.56
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)         3.236     1.434      2.26
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 5)         5.398     4.345      1.24
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 6)         7.127     5.459      1.31
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 7)         8.590     7.298      1.18
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 8)         9.360     6.152      1.52
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)        1.482     1.242      1.19
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)        4.008     1.817      2.21
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)        6.079     2.468      2.46
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 5)       11.300     8.644      1.31
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 6)       15.125    12.126      1.25
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 7)       17.555    14.804      1.19
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 8)       18.890    14.163      1.33
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)        2.910     1.326      2.19
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)        5.351     1.997      2.68
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)        7.290     2.629      2.77
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 5)       12.426     9.611      1.29
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 6)       16.453    12.162      1.35
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 7)       19.420    16.190      1.20
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 8)       20.588    13.699      1.50
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2)       3.400     2.640      1.29
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3)       8.986     3.952      2.27
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4)      11.972     5.273      2.27
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 5)      20.544    17.996      1.14
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 6)      28.677    22.086      1.30
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 7)      32.958    27.713      1.19
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 8)      36.499    27.439      1.33
```

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
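
As a reading aid for the benchmark table, the x-factor column can be reproduced from the two timing columns. The sketch below assumes the x-factor is simply the baseline time divided by the PR time, which matches the larger rows of the table; the function names `xFactor` and `geometricMean` are illustrative and are not part of the OpenCV performance tooling.

```cpp
#include <cmath>
#include <vector>

// Speed-up of the PR relative to the baseline.
// Values above 1 mean the PR is faster than the baseline.
inline double xFactor(double baselineMs, double prMs) {
    return baselineMs / prMs;
}

// Geometric mean of positive timing samples, computed in log space
// to avoid overflow on long sample lists. Assumes a non-empty input.
inline double geometricMean(const std::vector<double>& samplesMs) {
    double logSum = 0.0;
    for (double s : samplesMs) {
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(samplesMs.size()));
}
```

For example, `xFactor(0.421, 0.176)` gives about 2.39, matching the 640x480, 8UC1, 2-channel row. For the smallest images the printed times are too coarse to reproduce the column (0.013 / 0.003 is about 4.33, while the table reports 3.76), presumably because the report divides the unrounded times.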
