RISC-V: fix unaligned loads and stores #23973

Merged
asmorkalov merged 1 commit into opencv:4.x from mshabunin:riscv-unaligned-access on Jul 12, 2023

@mshabunin
Contributor

We experienced segmentation faults in the core Flip and imgproc Bayer tests. Debugging showed that they are caused by unaligned memory accesses in the RVV code. This PR fixes them.
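The diff itself is not reproduced in this conversation, so the following is only a minimal scalar sketch of the general problem, not the change made in this PR. It assumes the usual portable workaround: copying bytes with `std::memcpy` instead of dereferencing a pointer cast to a wider type. The helper names `load_u32_unaligned` and `store_u32_unaligned` are hypothetical and do not exist in OpenCV.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of OpenCV): read a 32-bit value from an
// address that may not be 4-byte aligned. std::memcpy has no alignment
// requirement, so this is well defined where
// *reinterpret_cast<const uint32_t*>(p) would be undefined behaviour and
// could trap on hardware that rejects misaligned accesses.
static inline uint32_t load_u32_unaligned(const unsigned char* p)
{
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Hypothetical helper: the matching store to a possibly unaligned address.
static inline void store_u32_unaligned(unsigned char* p, uint32_t v)
{
    std::memcpy(p, &v, sizeof(v));
}
```

For example, `store_u32_unaligned(dst + 3, load_u32_unaligned(src + 1))` moves four bytes between odd offsets without forming a misaligned `uint32_t*`. Vector code follows the same principle: the element width of a load or store should not exceed the alignment the pointer is known to have.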

Below is a quick performance comparison:

x86_64 performance results

Core i5-11600

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| Flip::OCL_FlipFixture::(640x480, 8UC1, FLIP_BOTH) | 0.008 | 0.009 | 0.99 |
| Flip::OCL_FlipFixture::(640x480, 8UC1, FLIP_COLS) | 0.009 | 0.009 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 8UC1, FLIP_ROWS) | 0.008 | 0.008 | 1.05 |
| Flip::OCL_FlipFixture::(640x480, 32FC1, FLIP_BOTH) | 0.033 | 0.033 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 32FC1, FLIP_COLS) | 0.035 | 0.035 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 32FC1, FLIP_ROWS) | 0.035 | 0.035 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 8UC4, FLIP_BOTH) | 0.033 | 0.033 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 8UC4, FLIP_COLS) | 0.035 | 0.035 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 8UC4, FLIP_ROWS) | 0.037 | 0.036 | 1.00 |
| Flip::OCL_FlipFixture::(640x480, 32FC4, FLIP_BOTH) | 0.151 | 0.152 | 0.99 |
| Flip::OCL_FlipFixture::(640x480, 32FC4, FLIP_COLS) | 0.155 | 0.156 | 0.99 |
| Flip::OCL_FlipFixture::(640x480, 32FC4, FLIP_ROWS) | 0.155 | 0.155 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 8UC1, FLIP_BOTH) | 0.026 | 0.026 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 8UC1, FLIP_COLS) | 0.026 | 0.026 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 8UC1, FLIP_ROWS) | 0.027 | 0.027 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 32FC1, FLIP_BOTH) | 0.100 | 0.101 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 32FC1, FLIP_COLS) | 0.103 | 0.103 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 32FC1, FLIP_ROWS) | 0.104 | 0.107 | 0.97 |
| Flip::OCL_FlipFixture::(1280x720, 8UC4, FLIP_BOTH) | 0.100 | 0.101 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 8UC4, FLIP_COLS) | 0.103 | 0.104 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 8UC4, FLIP_ROWS) | 0.106 | 0.107 | 1.00 |
| Flip::OCL_FlipFixture::(1280x720, 32FC4, FLIP_BOTH) | 0.600 | 0.608 | 0.99 |
| Flip::OCL_FlipFixture::(1280x720, 32FC4, FLIP_COLS) | 0.602 | 0.609 | 0.99 |
| Flip::OCL_FlipFixture::(1280x720, 32FC4, FLIP_ROWS) | 1.001 | 1.004 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC1, FLIP_BOTH) | 0.057 | 0.057 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC1, FLIP_COLS) | 0.056 | 0.056 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC1, FLIP_ROWS) | 0.061 | 0.061 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC1, FLIP_BOTH) | 0.214 | 0.215 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC1, FLIP_COLS) | 0.214 | 0.216 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC1, FLIP_ROWS) | 0.398 | 0.394 | 1.01 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC4, FLIP_BOTH) | 0.215 | 0.216 | 0.99 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC4, FLIP_COLS) | 0.214 | 0.216 | 0.99 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC4, FLIP_ROWS) | 0.276 | 0.278 | 0.99 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC4, FLIP_BOTH) | 1.517 | 1.519 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC4, FLIP_COLS) | 1.521 | 1.514 | 1.00 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC4, FLIP_ROWS) | 1.534 | 1.542 | 0.99 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC1, FLIP_BOTH) | 0.260 | 0.254 | 1.02 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC1, FLIP_COLS) | 0.257 | 0.250 | 1.03 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC1, FLIP_ROWS) | 0.403 | 0.412 | 0.98 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC1, FLIP_BOTH) | 1.450 | 1.459 | 0.99 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC1, FLIP_COLS) | 1.475 | 1.477 | 1.00 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC1, FLIP_ROWS) | 2.373 | 2.363 | 1.00 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC4, FLIP_BOTH) | 1.461 | 1.462 | 1.00 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC4, FLIP_COLS) | 1.481 | 1.473 | 1.01 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC4, FLIP_ROWS) | 1.551 | 1.546 | 1.00 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC4, FLIP_BOTH) | 6.487 | 6.449 | 1.01 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC4, FLIP_COLS) | 6.474 | 6.464 | 1.00 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC4, FLIP_ROWS) | 6.528 | 6.518 | 1.00 |

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2BGR) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2BGRA) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2BGR_VNG) | 0.038 | 0.038 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2GRAY) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2BGR) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2BGRA) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2BGR_VNG) | 0.038 | 0.038 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2GRAY) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2BGR) | 0.003 | 0.003 | 1.01 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2BGRA) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2BGR_VNG) | 0.037 | 0.038 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2GRAY) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2BGR) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2BGRA) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2BGR_VNG) | 0.038 | 0.037 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2GRAY) | 0.003 | 0.003 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2BGR) | 0.042 | 0.043 | 0.97 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2BGRA) | 0.039 | 0.039 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2BGR_VNG) | 1.534 | 1.542 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2GRAY) | 0.031 | 0.031 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2BGR) | 0.039 | 0.042 | 0.92 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2BGRA) | 0.035 | 0.038 | 0.93 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2BGR_VNG) | 1.534 | 1.542 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2GRAY) | 0.028 | 0.032 | 0.87 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2BGR) | 0.038 | 0.043 | 0.89 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2BGRA) | 0.036 | 0.039 | 0.92 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2BGR_VNG) | 1.535 | 1.546 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2GRAY) | 0.028 | 0.031 | 0.90 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2BGR) | 0.038 | 0.042 | 0.92 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2BGRA) | 0.035 | 0.038 | 0.93 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2BGR_VNG) | 1.535 | 1.544 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2GRAY) | 0.029 | 0.028 | 1.02 |

AArch64 performance results

RK3588

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| Flip::OCL_FlipFixture::(640x480, 8UC1, FLIP_BOTH) | 0.040 | 0.036 | 1.12 |
| Flip::OCL_FlipFixture::(640x480, 8UC1, FLIP_COLS) | 0.018 | 0.018 | 1.01 |
| Flip::OCL_FlipFixture::(640x480, 8UC1, FLIP_ROWS) | 0.011 | 0.011 | 1.01 |
| Flip::OCL_FlipFixture::(640x480, 32FC1, FLIP_BOTH) | 0.239 | 0.261 | 0.92 |
| Flip::OCL_FlipFixture::(640x480, 32FC1, FLIP_COLS) | 0.174 | 0.139 | 1.25 |
| Flip::OCL_FlipFixture::(640x480, 32FC1, FLIP_ROWS) | 0.132 | 0.115 | 1.14 |
| Flip::OCL_FlipFixture::(640x480, 8UC4, FLIP_BOTH) | 0.152 | 0.185 | 0.82 |
| Flip::OCL_FlipFixture::(640x480, 8UC4, FLIP_COLS) | 0.086 | 0.110 | 0.78 |
| Flip::OCL_FlipFixture::(640x480, 8UC4, FLIP_ROWS) | 0.067 | 0.089 | 0.75 |
| Flip::OCL_FlipFixture::(640x480, 32FC4, FLIP_BOTH) | 1.120 | 1.277 | 0.88 |
| Flip::OCL_FlipFixture::(640x480, 32FC4, FLIP_COLS) | 0.675 | 0.759 | 0.89 |
| Flip::OCL_FlipFixture::(640x480, 32FC4, FLIP_ROWS) | 0.416 | 0.474 | 0.88 |
| Flip::OCL_FlipFixture::(1280x720, 8UC1, FLIP_BOTH) | 0.110 | 0.119 | 0.92 |
| Flip::OCL_FlipFixture::(1280x720, 8UC1, FLIP_COLS) | 0.064 | 0.057 | 1.13 |
| Flip::OCL_FlipFixture::(1280x720, 8UC1, FLIP_ROWS) | 0.043 | 0.044 | 0.97 |
| Flip::OCL_FlipFixture::(1280x720, 32FC1, FLIP_BOTH) | 0.837 | 0.851 | 0.98 |
| Flip::OCL_FlipFixture::(1280x720, 32FC1, FLIP_COLS) | 0.487 | 0.519 | 0.94 |
| Flip::OCL_FlipFixture::(1280x720, 32FC1, FLIP_ROWS) | 0.348 | 0.371 | 0.94 |
| Flip::OCL_FlipFixture::(1280x720, 8UC4, FLIP_BOTH) | 0.783 | 0.840 | 0.93 |
| Flip::OCL_FlipFixture::(1280x720, 8UC4, FLIP_COLS) | 0.484 | 0.502 | 0.96 |
| Flip::OCL_FlipFixture::(1280x720, 8UC4, FLIP_ROWS) | 0.351 | 0.377 | 0.93 |
| Flip::OCL_FlipFixture::(1280x720, 32FC4, FLIP_BOTH) | 3.199 | 3.580 | 0.89 |
| Flip::OCL_FlipFixture::(1280x720, 32FC4, FLIP_COLS) | 1.763 | 2.033 | 0.87 |
| Flip::OCL_FlipFixture::(1280x720, 32FC4, FLIP_ROWS) | 1.246 | 1.455 | 0.86 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC1, FLIP_BOTH) | 0.276 | 0.296 | 0.93 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC1, FLIP_COLS) | 0.201 | 0.206 | 0.98 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC1, FLIP_ROWS) | 0.164 | 0.165 | 0.99 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC1, FLIP_BOTH) | 1.841 | 2.070 | 0.89 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC1, FLIP_COLS) | 1.118 | 1.192 | 0.94 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC1, FLIP_ROWS) | 0.740 | 0.851 | 0.87 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC4, FLIP_BOTH) | 1.847 | 2.077 | 0.89 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC4, FLIP_COLS) | 1.105 | 1.211 | 0.91 |
| Flip::OCL_FlipFixture::(1920x1080, 8UC4, FLIP_ROWS) | 0.738 | 0.851 | 0.87 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC4, FLIP_BOTH) | 7.048 | 8.085 | 0.87 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC4, FLIP_COLS) | 3.593 | 4.216 | 0.85 |
| Flip::OCL_FlipFixture::(1920x1080, 32FC4, FLIP_ROWS) | 2.835 | 3.362 | 0.84 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC1, FLIP_BOTH) | 1.891 | 2.145 | 0.88 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC1, FLIP_COLS) | 0.991 | 1.146 | 0.86 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC1, FLIP_ROWS) | 0.796 | 0.937 | 0.85 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC1, FLIP_BOTH) | 9.008 | 10.067 | 0.89 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC1, FLIP_COLS) | 4.972 | 5.634 | 0.88 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC1, FLIP_ROWS) | 3.041 | 3.505 | 0.87 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC4, FLIP_BOTH) | 9.044 | 10.042 | 0.90 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC4, FLIP_COLS) | 4.970 | 5.625 | 0.88 |
| Flip::OCL_FlipFixture::(3840x2160, 8UC4, FLIP_ROWS) | 3.122 | 3.506 | 0.89 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC4, FLIP_BOTH) | 28.244 | 30.723 | 0.92 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC4, FLIP_COLS) | 13.721 | 13.630 | 1.01 |
| Flip::OCL_FlipFixture::(3840x2160, 32FC4, FLIP_ROWS) | 12.112 | 12.300 | 0.98 |

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2BGR) | 0.009 | 0.009 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2BGRA) | 0.010 | 0.010 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2BGR_VNG) | 0.095 | 0.096 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerBG2GRAY) | 0.008 | 0.008 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2BGR) | 0.009 | 0.009 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2BGRA) | 0.010 | 0.010 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2BGR_VNG) | 0.095 | 0.096 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGB2GRAY) | 0.008 | 0.008 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2BGR) | 0.009 | 0.009 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2BGRA) | 0.010 | 0.010 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2BGR_VNG) | 0.095 | 0.096 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerGR2GRAY) | 0.008 | 0.008 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2BGR) | 0.009 | 0.009 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2BGRA) | 0.010 | 0.010 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2BGR_VNG) | 0.095 | 0.095 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(127x61, COLOR_BayerRG2GRAY) | 0.008 | 0.008 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2BGR) | 0.162 | 0.163 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2BGRA) | 0.173 | 0.174 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2BGR_VNG) | 4.003 | 4.031 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerBG2GRAY) | 0.174 | 0.147 | 1.19 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2BGR) | 0.162 | 0.163 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2BGRA) | 0.173 | 0.174 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2BGR_VNG) | 3.990 | 4.011 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGB2GRAY) | 0.151 | 0.171 | 0.88 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2BGR) | 0.165 | 0.163 | 1.01 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2BGRA) | 0.173 | 0.174 | 0.99 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2BGR_VNG) | 4.006 | 4.014 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerGR2GRAY) | 0.154 | 0.171 | 0.90 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2BGR) | 0.163 | 0.163 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2BGRA) | 0.173 | 0.174 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2BGR_VNG) | 3.992 | 4.007 | 1.00 |
| cvtColorBayer8u::Size_CvtMode_Bayer::(640x480, COLOR_BayerRG2GRAY) | 0.164 | 0.160 | 1.03 |

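For reading the last column: the rows above are consistent with the x-factor being the "before" time divided by the "after" time, so a value above 1 means the patched build is faster. The sketch below reproduces that arithmetic; `x_factor` and `round2` are hypothetical helper names, and because the table shows rounded timings, a recomputed ratio can differ from the listed one in the last digit.

```cpp
#include <cmath>

// Hypothetical helper: ratio of the time measured before the patch to the
// time measured after it. Values above 1 mean the patched build is faster.
static double x_factor(double before, double after)
{
    return before / after;
}

// Hypothetical helper: round to two decimals, matching the table's precision.
static double round2(double x)
{
    return std::round(x * 100.0) / 100.0;
}
```

With the RK3588 row for (640x480, 32FC1, FLIP_COLS), `round2(x_factor(0.174, 0.139))` gives 1.25, and for (640x480, 8UC4, FLIP_BOTH), `round2(x_factor(0.152, 0.185))` gives 0.82, both matching the listed values.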
Contributor

@opencv-alalek left a comment

LGTM 👍

@asmorkalov merged commit 85f0074 into opencv:4.x on Jul 12, 2023
@mshabunin deleted the riscv-unaligned-access branch on July 12, 2023 12:14
@asmorkalov mentioned this pull request on Jul 27, 2023