Impl RISC-V HAL for cv::flip | Add perf test for flip #26943

Merged
asmorkalov merged 2 commits into opencv:4.x from
GenshinImpactStarts:flip_hal_rvv
Feb 24, 2025

@GenshinImpactStarts
Contributor

Implement through the existing cv_hal_flip interfaces.

Add perf test for cv::flip.

Reasons for choosing these test arguments:

  • size: copied from perf_lut
  • type:
    • U8C1: basic situation
    • U8C3: unaligned element size
    • U8C4: large element size

Tested on

  • MUSE-PI (vlen=256)
  • Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
Geometric mean (ms)

                     Name of Test                       scalar   ui    rvv       ui        rvv    
                                                                                 vs         vs    
                                                                               scalar     scalar  
                                                                             (x-factor) (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)    0.026  0.033  0.031     0.81       0.84   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)   0.206  0.212  0.091     0.97       2.26   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)    0.185  0.189  0.082     0.98       2.25   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)    0.070  0.084  0.084     0.83       0.83   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)   0.616  0.612  0.235     1.01       2.62   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)    0.587  0.603  0.204     0.97       2.88   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)    0.263  0.110  0.109     2.40       2.41   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)   0.930  0.831  0.316     1.12       2.95   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)    1.175  1.129  0.313     1.04       3.75   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)    0.303  0.118  0.111     2.57       2.73   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)   0.949  0.836  0.405     1.14       2.34   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)    0.784  0.783  0.409     1.00       1.92   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)    1.084  0.360  0.355     3.01       3.06   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)   3.768  3.348  1.364     1.13       2.76   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)    4.361  4.473  1.296     0.97       3.37   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)    1.252  0.469  0.451     2.67       2.78   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)   5.732  5.220  1.303     1.10       4.40   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)    5.041  5.105  1.203     0.99       4.19   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)  2.382  0.903  0.903     2.64       2.64   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606  7.508  2.581     1.15       3.33   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)  8.421  8.535  2.219     0.99       3.80   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)  6.312  2.416  2.429     2.61       2.60   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761    1.12       2.29   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)  25.373 25.500 13.382    1.00       1.90   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)  7.620  3.204  3.115     2.38       2.45   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976    1.12       2.53   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)  28.831 29.094 14.919    0.99       1.93   
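
As a reading aid for the table above, here is a minimal sketch of how the two reported quantities relate. It assumes the x-factor is the baseline (scalar) time divided by the candidate time, which matches rows such as 0.206 / 0.091 ≈ 2.26. The function names are hypothetical and are not taken from the OpenCV perf tooling. The displayed times are presumably rounded, so recomputing a ratio from the table may differ in the last digit.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of strictly positive samples, computed in log space.
// The input must be non-empty.
inline double geometric_mean(const std::vector<double>& samples)
{
    double log_sum = 0.0;
    for (double s : samples)
        log_sum += std::log(s);
    return std::exp(log_sum / static_cast<double>(samples.size()));
}

// Assumed definition of the x-factor: baseline time divided by candidate time.
// Values above 1 mean the candidate is faster than the baseline.
inline double x_factor(double baseline_ms, double candidate_ms)
{
    return baseline_ms / candidate_ms;
}
```

For example, `x_factor(0.206, 0.091)` corresponds to the 320x240, 8UC1, FLIP_XY row, rvv vs scalar.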

The optimizations for vlen <= 256 and vlen > 256 are different, but I have no real hardware with vlen > 256. The accuracy tests for those cases (such as 512 and 1024) were therefore run on QEMU built from riscv-collab/riscv-gnu-toolchain.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@GenshinImpactStarts
Contributor Author

In the failed default check, Linux x64 Debug failed test_videoio. However, that test was killed because it produced no output while running buildenv python run.py, so I don't think the failure is caused by this patch.

@mshabunin
Contributor

mshabunin commented Feb 20, 2025

There are some issues with tests Core_Rotate/ElemWiseTest.accuracy and Core_SolvePoly.accuracy (clang + qemu). Please take a look.

@asmorkalov
Contributor

Hm, I do not see the issue with Muse Pi v 30 and GCC. It may be a vector size issue.

@GenshinImpactStarts
Contributor Author

It's my fault. I forgot to consider the case src == dst, where data may be overwritten before it is read when flipping along the Y or XY axis.
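
The hazard described here can be reproduced without any vector code. The following is a minimal scalar sketch, not the code from this patch: the function names are hypothetical, and it only reverses a single row of bytes. It contrasts a copy-style reversal, which is only correct when source and destination do not alias, with a swap-based reversal that is safe in place.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Copy-style reversal of one row: reads src back to front, writes dst front to back.
// Correct only when src and dst do not alias. With src == dst, the first half of the
// row is overwritten before the second half of the loop reads it.
inline void flip_row_naive(const unsigned char* src, unsigned char* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[n - 1 - i];
}

// In-place-safe variant: swap elements from both ends toward the middle,
// so every element is read before its slot is overwritten.
inline void flip_row_inplace(unsigned char* row, std::size_t n)
{
    for (std::size_t i = 0, j = n; i + 1 < j; ++i)
    {
        --j;
        std::swap(row[i], row[j]);
    }
}
```

Calling `flip_row_naive` with the same pointer for both arguments on the row `{1, 2, 3, 4}` leaves `{4, 3, 3, 4}`, while `flip_row_inplace` produces `{4, 3, 2, 1}`.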

To solve this problem, I need to make the helper struct longer, which is too ugly. I think a helper file holding a helper class like the following needs to be added:

template<typename T, size_t m>
struct RVV;

where T may be a type like uchar and m is the LMUL. This file could also benefit all other hal_rvv work. But I have no idea where I should add this file, or whether I should add it in a separate pull request. Could you give me some suggestions?
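
To make the proposal concrete, here is one possible shape such a helper could take. This is only a sketch under assumptions: the member names (`ElemType`, `VecType`, `lmul`, `setvl`, `load`, `store`) are hypothetical and not taken from any OpenCV file, and only the `uchar`, LMUL = 1 specialization is shown. The intrinsic-dependent members are guarded so the file still compiles on a toolchain without the RVV intrinsics.

```cpp
#include <cstddef>
#if defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#endif

using uchar = unsigned char;

// Primary template, as in the comment above: T is the element type, m is the LMUL.
template<typename T, std::size_t m>
struct RVV;

// Hypothetical specialization for 8-bit unsigned elements with LMUL = 1.
template<>
struct RVV<uchar, 1>
{
    using ElemType = uchar;
    static constexpr std::size_t lmul = 1;
#if defined(__riscv_v_intrinsic)
    using VecType = vuint8m1_t;
    // Thin wrappers so generic code can be written against RVV<T, m>
    // instead of spelling out type-specific intrinsic names.
    static std::size_t setvl(std::size_t n) { return __riscv_vsetvl_e8m1(n); }
    static VecType load(const ElemType* p, std::size_t vl) { return __riscv_vle8_v_u8m1(p, vl); }
    static void store(ElemType* p, VecType v, std::size_t vl) { __riscv_vse8_v_u8m1(p, v, vl); }
#endif
};
```

With this layout, a kernel templated on `T` and `m` could call `RVV<T, m>::load` and `RVV<T, m>::store` uniformly, and each new element type or LMUL would only need one more specialization.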

@asmorkalov
Contributor

You can return the NOT_IMPLEMENTED status if src == dst for now and handle that case in another optimization round.
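
A minimal sketch of this suggestion, under stated assumptions: the function and constant names below are hypothetical stand-ins rather than the real HAL entry point, the two constants are intended to mirror OpenCV's CV_HAL_ERROR_OK and CV_HAL_ERROR_NOT_IMPLEMENTED, and the body is a plain scalar loop for single-channel 8-bit data instead of the vectorized code from this patch.

```cpp
#include <cstddef>

// Local stand-ins for illustration only (assumed to mirror the OpenCV HAL status codes).
constexpr int SKETCH_HAL_OK = 0;
constexpr int SKETCH_HAL_NOT_IMPLEMENTED = 1;

// Hypothetical HAL-style horizontal flip for 8-bit single-channel data.
// Steps are row strides in bytes.
inline int flip_horizontal_8uc1_sketch(const unsigned char* src, std::size_t src_step,
                                       int width, int height,
                                       unsigned char* dst, std::size_t dst_step)
{
    // In-place calls are declined so the caller can fall back to its generic path.
    // Only exact pointer equality is checked; partial overlap is not detected.
    if (src == dst)
        return SKETCH_HAL_NOT_IMPLEMENTED;

    for (int y = 0; y < height; ++y)
    {
        const unsigned char* s = src + y * src_step;
        unsigned char* d = dst + y * dst_step;
        for (int x = 0; x < width; ++x)
            d[x] = s[width - 1 - x];
    }
    return SKETCH_HAL_OK;
}
```

Declining the in-place case keeps the fast path simple, at the cost of leaving in-place flips on the slower fallback until a later optimization round.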

@GenshinImpactStarts
Contributor Author

OK. I have also looked at several perf_*.cpp files in the core module. None of them has a license header. Should I add one?

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@fengyuentau fengyuentau self-requested a review February 20, 2025 17:40
@asmorkalov
Contributor

My performance numbers for Muse Pi v 30 (GCC 14.2):

flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X) 	0.035 	0.031 	1.13
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY) 	0.237 	0.090 	2.64
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y) 	0.210 	0.084 	2.50
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X) 	0.105 	0.105 	1.00
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY) 	0.674 	0.280 	2.40
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y) 	0.642 	0.246 	2.61
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X) 	0.139 	0.136 	1.03
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY) 	0.908 	0.328 	2.77
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y) 	1.259 	0.322 	3.91
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X) 	0.137 	0.127 	1.09
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY) 	0.913 	0.396 	2.31
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y) 	0.868 	0.384 	2.26
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X) 	0.384 	0.354 	1.08
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY) 	3.514 	1.298 	2.71
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y) 	4.536 	1.186 	3.82
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X) 	0.463 	0.465 	1.00
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY) 	5.553 	1.350 	4.11
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y) 	5.625 	1.155 	4.87
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X) 	0.924 	0.880 	1.05
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 	8.491 	2.594 	3.27
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y) 	8.998 	2.294 	3.92
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X) 	2.470 	2.498 	0.99
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 	27.718 	12.883 	2.15
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y) 	27.196 	13.471 	2.02
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X) 	3.252 	3.139 	1.04
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 	31.627 	13.076 	2.42
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y) 	30.696 	13.904 	2.21 

@asmorkalov asmorkalov self-assigned this Feb 24, 2025
@asmorkalov asmorkalov merged commit 6a6a5a7 into opencv:4.x Feb 24, 2025
28 checks passed
@asmorkalov asmorkalov mentioned this pull request Mar 4, 2025
@GenshinImpactStarts GenshinImpactStarts deleted the flip_hal_rvv branch March 12, 2025 07:33