
Change the lsx to baseline features. #24565

Merged
asmorkalov merged 1 commit into opencv:4.x from CNClareChen:4.x
Nov 30, 2023


CNClareChen (Contributor) commented Nov 21, 2023

This patch changes LSX to a baseline feature and LASX to a dispatch feature. It also modifies the runtime detection of LASX and LSX.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

CNClareChen (Contributor, Author)

@fengyuentau Please run the accuracy and performance tests. The previous PR was closed for some reason, so I opened a new one.

asmorkalov added the optimization and platform: loongson (Loongson CPU architecture and LASX SIMD) labels Nov 21, 2023
asmorkalov added this to the 4.9.0 milestone Nov 21, 2023
fengyuentau self-assigned this Nov 28, 2023
fengyuentau (Member)

Hello @CNClareChen, the accuracy tests all pass. The only thing that confuses me is the performance regressions. Could you elaborate on them?

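Geometric mean (ms). The x-factor column is core.before / core.w, so values below 1 mean the build with this patch (core.w) is slower.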

                                          Name of Test                                             opencv    opencv    opencv
                                                                                                    perf      perf      perf
                                                                                                 core.before core.w    core.w
                                                                                                                         vs
                                                                                                                       opencv
                                                                                                                        perf
                                                                                                                     core.before
                                                                                                                     (x-factor)
LUT::SizePrm::1920x1080                                                                             0.289     0.550     0.52
Mat_Clone::Size_MatType::(127x61, 32FC1)                                                            0.003     0.003     0.95
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC1)                                                    0.001     0.001     0.67
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC1)                                                   0.001     0.001     0.78
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC2)                                                    0.001     0.001     0.78
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC1)                                                 0.197     0.286     0.69
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC3)                                                 6.479     7.070     0.92
Mat_SetToWithMask::Size_MatType::(127x61, 8UC1)                                                     0.005     0.005     0.92
Mat_SetToWithMask::Size_MatType::(127x61, 8UC2)                                                     0.003     0.003     0.82
Mat_SetToWithMask::Size_MatType::(640x480, 8UC1)                                                    0.030     0.041     0.73
Mat_SetToWithMask::Size_MatType::(640x480, 8UC2)                                                    0.050     0.062     0.81
Mat_SetToWithMask::Size_MatType::(1280x720, 8UC1)                                                   0.079     0.113     0.70
Mat_SetToWithMask::Size_MatType::(1280x720, 8UC2)                                                   0.146     0.181     0.81
Mat_SetToWithMask::Size_MatType::(1920x1080, 8UC1)                                                  0.173     0.250     0.69
Mat_SetToWithMask::Size_MatType::(1920x1080, 8UC2)                                                  0.332     0.404     0.82
Mat_Transform::Size_MatType::(127x61, 64FC3)                                                        0.046     0.050     0.92
Mat_Transform::Size_MatType::(1280x720, 32FC3)                                                     14.929    16.768     0.89
Mat_Transform::Size_MatType::(1920x1080, 32FC3)                                                    35.473    74.373     0.48
Mat_Transform::Size_MatType::(1920x1080, 64FC3)                                                    16.332    31.254     0.52
PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)                                                       0.082     0.103     0.80
PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)                                                       0.162     0.202     0.80
PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)                                                       0.244     0.304     0.80
PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)                                                       0.325     0.405     0.80
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)                                                      0.246     0.304     0.81
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)                                                      0.551     0.643     0.86
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)                                                     0.657     0.731     0.90
abs::Size_MatType::(127x61, 8SC1)                                                                   0.013     0.016     0.83
abs::Size_MatType::(127x61, 32FC1)                                                                  0.005     0.005     0.94
abs::Size_MatType::(1280x720, 32SC1)                                                                0.379     0.399     0.95
absdiff::BinaryOpTest::(640x480, 8UC1)                                                              0.029     0.031     0.94
absdiff::BinaryOpTest::(640x480, 32SC1)                                                             0.141     0.159     0.89
absdiff::BinaryOpTest::(640x480, 32FC1)                                                             0.127     0.143     0.89
absdiff::BinaryOpTest::(640x480, 16SC2)                                                             0.142     0.153     0.93
absdiff::BinaryOpTest::(640x480, 8UC4)                                                              0.123     0.148     0.83
absdiff::BinaryOpTest::(640x480, 16SC4)                                                             0.238     0.282     0.84
absdiff::BinaryOpTest::(1920x1080, 16SC1)                                                           0.602     1.402     0.43
absdiff::BinaryOpTest::(1920x1080, 16SC2)                                                           3.214    12.061     0.27
absdiff::BinaryOpTest::(1920x1080, 16SC3)                                                           5.475    13.372     0.41
absdiff::BinaryOpTest::(1920x1080, 8UC4)                                                            3.602    12.083     0.30
absdiffScalarDouble::BinaryOpTest::(640x480, 8UC1)                                                  0.541     0.641     0.84
absdiffScalarDouble::BinaryOpTest::(640x480, 8SC1)                                                  0.450     0.529     0.85
absdiffScalarDouble::BinaryOpTest::(640x480, 8UC3)                                                  1.471     1.739     0.85
absdiffScalarDouble::BinaryOpTest::(640x480, 16SC3)                                                 0.863     1.194     0.72
absdiffScalarDouble::BinaryOpTest::(640x480, 8UC4)                                                  2.414     2.620     0.92
absdiffScalarDouble::BinaryOpTest::(1280x720, 8UC1)                                                 1.782     1.975     0.90
absdiffScalarDouble::BinaryOpTest::(1280x720, 8SC1)                                                 1.342     1.582     0.85
absdiffScalarDouble::BinaryOpTest::(1280x720, 16SC1)                                                0.661     1.033     0.64
absdiffScalarDouble::BinaryOpTest::(1280x720, 16SC3)                                                3.796     4.043     0.94
absdiffScalarDouble::BinaryOpTest::(1280x720, 8UC4)                                                 6.895     7.547     0.91
absdiffScalarDouble::BinaryOpTest::(1280x720, 16SC4)                                                2.769     4.427     0.63
absdiffScalarDouble::BinaryOpTest::(1920x1080, 8UC1)                                                3.012     4.223     0.71
absdiffScalarDouble::BinaryOpTest::(1920x1080, 16SC2)                                               3.330     7.065     0.47
absdiffScalarDouble::BinaryOpTest::(1920x1080, 8UC3)                                               10.936    14.237     0.77
absdiffScalarDouble::BinaryOpTest::(1920x1080, 16SC3)                                              13.387    25.442     0.53
absdiffScalarDouble::BinaryOpTest::(1920x1080, 8UC4)                                               15.500    18.142     0.85
absdiffScalarDouble::BinaryOpTest::(1920x1080, 16SC4)                                               7.011     9.099     0.77
absdiffScalarSameType::BinaryOpTest::(640x480, 8SC1)                                                0.449     0.530     0.85
absdiffScalarSameType::BinaryOpTest::(640x480, 16SC3)                                               0.857     1.207     0.71
absdiffScalarSameType::BinaryOpTest::(1280x720, 8SC1)                                               1.347     1.583     0.85
absdiffScalarSameType::BinaryOpTest::(1280x720, 16SC1)                                              0.659     1.028     0.64
absdiffScalarSameType::BinaryOpTest::(1280x720, 16SC4)                                              2.763     3.273     0.84
absdiffScalarSameType::BinaryOpTest::(1920x1080, 8SC1)                                              3.019     3.564     0.85
absdiffScalarSameType::BinaryOpTest::(1920x1080, 16SC2)                                             3.338     4.056     0.82
absdiffScalarSameType::BinaryOpTest::(1920x1080, 8UC3)                                              1.531     1.639     0.93
absdiffScalarSameType::BinaryOpTest::(1920x1080, 16SC4)                                             6.983     8.221     0.85
add::BinaryOpTest::(640x480, 8SC1)                                                                  0.030     0.032     0.93
add::BinaryOpTest::(640x480, 32SC1)                                                                 0.154     0.165     0.93
add::BinaryOpTest::(640x480, 32FC1)                                                                 0.155     0.170     0.91
add::BinaryOpTest::(640x480, 8UC4)                                                                  0.125     0.151     0.83
add::BinaryOpTest::(640x480, 16SC4)                                                                 0.287     0.309     0.93
add::BinaryOpTest::(1920x1080, 16SC1)                                                               0.630     0.688     0.92
addScalarDouble::BinaryOpTest::(640x480, 32FC1)                                                     0.124     0.131     0.95
addScalarDouble::BinaryOpTest::(640x480, 16SC3)                                                     0.814     1.233     0.66
addScalarDouble::BinaryOpTest::(1280x720, 16SC1)                                                    0.643     1.067     0.60
addScalarDouble::BinaryOpTest::(1280x720, 16SC4)                                                    2.758     3.431     0.80
addScalarDouble::BinaryOpTest::(1920x1080, 8UC1)                                                    3.539     3.833     0.92
addScalarDouble::BinaryOpTest::(1920x1080, 32SC1)                                                   1.602     1.704     0.94
addScalarDouble::BinaryOpTest::(1920x1080, 32FC1)                                                   1.582     1.784     0.89
addScalarDouble::BinaryOpTest::(1920x1080, 16SC2)                                                   3.345     4.076     0.82
addScalarDouble::BinaryOpTest::(1920x1080, 16SC4)                                                   6.992     8.217     0.85
addScalarSameType::BinaryOpTest::(640x480, 8SC1)                                                    0.456     0.569     0.80
addScalarSameType::BinaryOpTest::(640x480, 16SC3)                                                   0.852     1.177     0.72
addScalarSameType::BinaryOpTest::(1280x720, 8UC1)                                                   0.101     0.107     0.94
addScalarSameType::BinaryOpTest::(1280x720, 16SC1)                                                  0.657     1.048     0.63
addScalarSameType::BinaryOpTest::(1280x720, 16SC4)                                                  2.762     3.429     0.81
addScalarSameType::BinaryOpTest::(1920x1080, 16SC2)                                                 3.375     4.064     0.83
addScalarSameType::BinaryOpTest::(1920x1080, 8UC4)                                                  1.566     1.714     0.91
addScalarSameType::BinaryOpTest::(1920x1080, 16SC4)                                                 7.097     8.302     0.85
addWeighted::Size_MatType::(640x480, 8UC1)                                                          0.708     0.874     0.81
addWeighted::Size_MatType::(640x480, 8SC1)                                                          0.702     0.867     0.81
addWeighted::Size_MatType::(640x480, 8UC4)                                                          2.806     3.464     0.81
addWeighted::Size_MatType::(1280x720, 8UC1)                                                         2.099     2.597     0.81
addWeighted::Size_MatType::(1280x720, 8SC1)                                                         2.103     2.618     0.80
addWeighted::Size_MatType::(1280x720, 8UC4)                                                         8.455    10.478     0.81
addWeighted::Size_MatType::(1920x1080, 8UC1)                                                        4.722     5.839     0.81
addWeighted::Size_MatType::(1920x1080, 8SC1)                                                        4.724     5.845     0.81
addWeighted::Size_MatType::(1920x1080, 8UC4)                                                       19.415    23.974     0.81
basic::BroadcastTest::({ 1, 100, 800 }, 32FC1, { 10, 100, 800 })                                    0.224     0.240     0.93
basic::BroadcastTest::({ 10, 1, 800 }, 32FC1, { 10, 100, 800 })                                     0.132     0.216     0.61
basic::BroadcastTest::({ 10, 100, 1 }, 32FC1, { 10, 100, 800 })                                     0.127     0.213     0.59
bitwise_and::Size_MatType::(640x480, 32SC1)                                                         0.120     0.150     0.80
bitwise_and::Size_MatType::(640x480, 8UC4)                                                          0.120     0.150     0.80
bitwise_or::Size_MatType::(640x480, 32SC1)                                                          0.120     0.149     0.80
bitwise_or::Size_MatType::(640x480, 8UC4)                                                           0.121     0.150     0.81
bitwise_xor::Size_MatType::(640x480, 32SC1)                                                         0.121     0.149     0.81
bitwise_xor::Size_MatType::(640x480, 8UC4)                                                          0.120     0.150     0.80
compare::Size_MatType_CmpType::(640x480, 8UC4, CMP_EQ)                                              0.120     0.150     0.80
compare::Size_MatType_CmpType::(640x480, 8UC4, CMP_GE)                                              0.119     0.150     0.79
compare::Size_MatType_CmpType::(640x480, 8UC4, CMP_GT)                                              0.125     0.151     0.83
compare::Size_MatType_CmpType::(640x480, 8UC4, CMP_LE)                                              0.139     0.158     0.88
compare::Size_MatType_CmpType::(640x480, 8UC4, CMP_NE)                                              0.123     0.149     0.83
compareScalar::Size_MatType_CmpType::(127x61, 32FC1, CMP_GE)                                        0.004     0.004     0.95
compareScalar::Size_MatType_CmpType::(127x61, 32FC1, CMP_GT)                                        0.004     0.004     0.93
compareScalar::Size_MatType_CmpType::(127x61, 32FC1, CMP_NE)                                        0.004     0.004     0.95
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8UC1, 32FC1, 1, 0.00392157)             0.385     0.471     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8UC1, 32FC1, 4, 0.00392157)             1.539     1.884     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8UC1, 64FC1, 1, 0.00392157)             0.800     0.973     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8UC1, 64FC1, 4, 0.00392157)             3.218     3.903     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8SC1, 32FC1, 1, 0.00392157)             0.385     0.472     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8SC1, 32FC1, 4, 0.00392157)             1.540     1.884     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8SC1, 64FC1, 1, 0.00392157)             0.800     0.974     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 8SC1, 64FC1, 4, 0.00392157)             3.218     3.909     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 64FC1, 16UC1, 4, 0.00392157)            0.914     1.052     0.87
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 64FC1, 16UC1, 4, 1)                     0.970     1.072     0.90
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 64FC1, 16SC1, 4, 0.00392157)            0.913     1.050     0.87
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(640x480, 64FC1, 16SC1, 4, 1)                     0.971     1.082     0.90
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8UC1, 32FC1, 1, 0.00392157)           2.619     3.181     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8UC1, 32FC1, 4, 0.00392157)          11.131    13.173     0.84
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8UC1, 64FC1, 1, 0.00392157)           5.565     6.640     0.84
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8UC1, 64FC1, 4, 0.00392157)          22.342    26.997     0.83
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8SC1, 32FC1, 1, 0.00392157)           2.622     3.181     0.82
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8SC1, 32FC1, 4, 0.00392157)          11.135    13.167     0.85
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8SC1, 64FC1, 1, 0.00392157)           5.556     6.641     0.84
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 8SC1, 64FC1, 4, 0.00392157)          22.376    26.979     0.83
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 16SC1, 32SC1, 1, 1)                   0.508     0.579     0.88
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 16SC1, 32FC1, 1, 1)                   0.557     0.592     0.94
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 8UC1, 1, 1)                    0.485     0.518     0.94
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 8SC1, 1, 1)                    0.486     0.519     0.94
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 16UC1, 1, 0.00392157)          0.743     0.805     0.92
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 16UC1, 1, 1)                   0.433     0.572     0.76
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 16SC1, 1, 0.00392157)          0.739     0.806     0.92
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 16SC1, 1, 1)                   0.432     0.572     0.76
convertTo::Size_DepthSrc_DepthDst_Channels_alpha::(1920x1080, 32SC1, 32SC1, 1, 1)                   1.209     1.299     0.93
dct::Size_MatType_Flag::(1024x768, 64FC1, DCT_INVERSE|DCT_ROWS)                                     8.023     8.537     0.94
dct::Size_MatType_Flag::(1920x1080, 32FC1, DCT_INVERSE)                                            48.058    50.800     0.95
dct::Size_MatType_Flag::(2048x2048, 64FC1, 0)                                                      231.804   334.157    0.69
dct::Size_MatType_Flag::(2048x2048, 64FC1, DCT_INVERSE)                                            272.449   362.527    0.75
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE, false)                      68.462    72.273     0.95
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE, true)                       55.110    59.173     0.93
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   68.224    72.337     0.94
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    55.122    59.151     0.93
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, 0, false)                                78.458    82.636     0.95
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_SCALE, false)                        79.758    84.497     0.94
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_SCALE, true)                         59.816    63.935     0.94
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, 0, false)                                107.680   163.146    0.66
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, 0, true)                                 89.140    143.410    0.62
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_COMPLEX_OUTPUT, false)               132.633   175.469    0.76
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_COMPLEX_OUTPUT, true)                112.349   154.698    0.73
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE, true)                       111.773   142.264    0.79
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, 0, false)                                165.625   202.782    0.82
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, 0, true)                                 142.100   178.548    0.80
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_COMPLEX_OUTPUT, false)               171.916   187.802    0.92
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_COMPLEX_OUTPUT, true)                147.685   163.270    0.90
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    175.928   188.496    0.93
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_SCALE, false)                        168.225   203.849    0.83
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_SCALE, true)                         144.459   180.912    0.80
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, 0, false)                                232.879   314.649    0.74
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, 0, true)                                 195.710   280.171    0.70
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_COMPLEX_OUTPUT, false)               233.133   314.801    0.74
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_COMPLEX_OUTPUT, true)                195.282   280.722    0.70
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE, false)                      238.362   322.474    0.74
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE, true)                       200.990   285.772    0.70
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   238.145   322.432    0.74
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    200.674   285.474    0.70
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_SCALE, false)                        235.950   318.731    0.74
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_SCALE, true)                         197.062   283.973    0.69
dot::MatType_Length::(8UC1, 256)                                                                    0.005     0.008     0.66
dot::MatType_Length::(32SC1, 512)                                                                   0.117     0.128     0.91
dot::MatType_Length::(32FC1, 256)                                                                   0.019     0.033     0.59
dot::MatType_Length::(32FC1, 512)                                                                   0.076     0.115     0.66
hal_normL1_f32::test_len::300000                                                                    0.084     0.120     0.70
hal_normL1_u8::test_len::2000000                                                                    0.268     0.316     0.85
hal_normL2Sqr::test_len::300000                                                                     0.075     0.096     0.78
inRange::Size_MatType::(640x480, 8UC1)                                                              0.044     0.057     0.77
inRange::Size_MatType::(640x480, 8SC1)                                                              0.043     0.057     0.77
inRange::Size_MatType::(640x480, 16UC1)                                                             0.093     0.111     0.85
inRange::Size_MatType::(640x480, 16SC1)                                                             0.094     0.109     0.86
inRange::Size_MatType::(640x480, 32SC1)                                                             0.203     0.238     0.85
inRange::Size_MatType::(640x480, 32FC1)                                                             0.205     0.233     0.88
inRange::Size_MatType::(1280x720, 8UC1)                                                             0.130     0.169     0.77
inRange::Size_MatType::(1280x720, 8SC1)                                                             0.129     0.167     0.77
inRange::Size_MatType::(1280x720, 16UC1)                                                            0.274     0.330     0.83
inRange::Size_MatType::(1280x720, 16SC1)                                                            0.279     0.326     0.85
inRange::Size_MatType::(1280x720, 32SC1)                                                            0.615     0.764     0.81
inRange::Size_MatType::(1280x720, 32FC1)                                                            0.627     0.779     0.81
inRange::Size_MatType::(1920x1080, 8UC1)                                                            0.293     0.384     0.76
inRange::Size_MatType::(1920x1080, 8SC1)                                                            0.293     0.379     0.77
inRange::Size_MatType::(1920x1080, 16UC1)                                                           0.856     1.008     0.85
inRange::Size_MatType::(1920x1080, 16SC1)                                                           0.866     0.996     0.87
max::BinaryOpTest::(640x480, 16SC4)                                                                 0.258     0.308     0.84
max::BinaryOpTest::(1280x720, 16SC3)                                                                1.366     1.460     0.94
maxScalarDouble::BinaryOpTest::(1920x1080, 8UC3)                                                    1.558     1.685     0.92
meanStdDev_mask::Size_MatType::(1280x720, 32FC1)                                                    2.351     2.503     0.94
mean_mask::Size_MatType::(127x61, 8UC1)                                                             0.016     0.020     0.80
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 3)                                               0.647     3.080     0.21
merge::Size_SrcDepth_DstChannels::(640x480, 64FC1, 4)                                               2.893     7.853     0.37
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)                                              0.455     0.862     0.53
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)                                              0.827     3.493     0.24
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 2)                                              0.713     3.368     0.21
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 3)                                              3.136    10.673     0.29
merge::Size_SrcDepth_DstChannels::(1280x720, 32SC1, 4)                                              5.040    17.304     0.29
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 2)                                              0.716     3.412     0.21
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 3)                                              3.140    10.691     0.29
merge::Size_SrcDepth_DstChannels::(1280x720, 32FC1, 4)                                              5.055    17.309     0.29
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 2)                                              5.340    16.624     0.32
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 3)                                              7.961    25.166     0.32
merge::Size_SrcDepth_DstChannels::(1280x720, 64FC1, 4)                                             10.716    28.937     0.37
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)                                              0.531     1.410     0.38
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)                                              1.363     5.467     0.25
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2)                                             1.232     4.716     0.26
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3)                                             3.830    14.114     0.27
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4)                                             5.792    19.605     0.30
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 2)                                             5.444    19.061     0.29
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 3)                                             7.686    28.557     0.27
merge::Size_SrcDepth_DstChannels::(1920x1080, 32SC1, 4)                                            11.177    31.835     0.35
min::BinaryOpTest::(640x480, 32SC1)                                                                 0.154     0.198     0.78
min::BinaryOpTest::(640x480, 32FC1)                                                                 0.140     0.199     0.70
min::BinaryOpTest::(640x480, 16SC4)                                                                 0.334     0.354     0.94
min::BinaryOpTest::(1920x1080, 16SC1)                                                               0.615     0.677     0.91
minScalarDouble::BinaryOpTest::(640x480, 32FC1)                                                     0.121     0.133     0.91
minScalarDouble::BinaryOpTest::(1280x720, 32SC1)                                                    0.364     0.388     0.94
minScalarDouble::BinaryOpTest::(1280x720, 16SC2)                                                    0.361     0.380     0.95
minScalarDouble::BinaryOpTest::(1280x720, 8UC4)                                                     0.381     0.407     0.94
minScalarDouble::BinaryOpTest::(1920x1080, 32SC1)                                                   1.542     1.691     0.91
minScalarDouble::BinaryOpTest::(1920x1080, 16SC3)                                                   7.044    24.494     0.29
minScalarDouble::BinaryOpTest::(1920x1080, 8UC4)                                                    1.470     1.583     0.93
minScalarDouble::BinaryOpTest::(1920x1080, 16SC4)                                                   5.668    16.226     0.35
minScalarSameType::BinaryOpTest::(1280x720, 16SC3)                                                  1.132     1.278     0.89
minScalarSameType::BinaryOpTest::(1280x720, 8UC4)                                                   0.383     0.411     0.93
minScalarSameType::BinaryOpTest::(1280x720, 16SC4)                                                  1.110     1.241     0.89
minScalarSameType::BinaryOpTest::(1920x1080, 32SC1)                                                 1.466     1.707     0.86
minScalarSameType::BinaryOpTest::(1920x1080, 32FC1)                                                 1.451     1.591     0.91
minScalarSameType::BinaryOpTest::(1920x1080, 8UC3)                                                  1.358     1.570     0.87
multiply::BinaryOpTest::(640x480, 32FC1)                                                            0.154     0.169     0.91
multiply::BinaryOpTest::(640x480, 16SC2)                                                            0.136     0.152     0.90
multiply::BinaryOpTest::(640x480, 16SC4)                                                            0.266     0.290     0.92
norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480)                                                  0.032     0.046     0.70
norm2::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080)                                                0.206     0.315     0.65
norm2::Size_MatType_NormType::(127x61, 8UC1, NORM_INF)                                              0.002     0.003     0.71
norm2::Size_MatType_NormType::(127x61, 8UC1, NORM_INF|NORM_RELATIVE)                                0.003     0.005     0.73
norm2::Size_MatType_NormType::(127x61, 8UC4, NORM_INF)                                              0.008     0.012     0.69
norm2::Size_MatType_NormType::(127x61, 8UC4, NORM_INF|NORM_RELATIVE)                                0.012     0.017     0.70
norm2::Size_MatType_NormType::(640x480, 8UC1, NORM_INF)                                             0.088     0.119     0.74
norm2::Size_MatType_NormType::(640x480, 8UC1, NORM_INF|NORM_RELATIVE)                               0.132     0.173     0.76
norm2::Size_MatType_NormType::(640x480, 8UC4, NORM_INF)                                             0.352     0.476     0.74
norm2::Size_MatType_NormType::(640x480, 8UC4, NORM_INF|NORM_RELATIVE)                               0.516     0.681     0.76
norm2::Size_MatType_NormType::(1280x720, 8UC1, NORM_INF)                                            0.264     0.357     0.74
norm2::Size_MatType_NormType::(1280x720, 8UC1, NORM_INF|NORM_RELATIVE)                              0.388     0.510     0.76
norm2::Size_MatType_NormType::(1280x720, 8UC4, NORM_INF)                                            1.084     1.462     0.74
norm2::Size_MatType_NormType::(1280x720, 8UC4, NORM_INF|NORM_RELATIVE)                              1.575     2.075     0.76
norm2::Size_MatType_NormType::(1920x1080, 8UC1, NORM_INF)                                           0.592     0.809     0.73
norm2::Size_MatType_NormType::(1920x1080, 8UC1, NORM_INF|NORM_RELATIVE)                             0.871     1.163     0.75
norm2::Size_MatType_NormType::(1920x1080, 8UC4, NORM_INF)                                           3.088     3.804     0.81
norm2::Size_MatType_NormType::(1920x1080, 8UC4, NORM_INF|NORM_RELATIVE)                             4.352     5.199     0.84
norm2_mask::Size_MatType_NormType::(127x61, 8UC1, NORM_L1|NORM_RELATIVE)                            0.060     0.063     0.95
norm2_mask::Size_MatType_NormType::(127x61, 8UC1, NORM_L2)                                          0.032     0.033     0.95
norm2_mask::Size_MatType_NormType::(127x61, 8UC1, NORM_L2|NORM_RELATIVE)                            0.059     0.065     0.91
norm2_mask::Size_MatType_NormType::(127x61, 8UC4, NORM_L2)                                          0.069     0.073     0.94
norm2_mask::Size_MatType_NormType::(127x61, 8UC4, NORM_L2|NORM_RELATIVE)                            0.131     0.146     0.90
norm2_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L1|NORM_RELATIVE)                           2.364     2.499     0.95
norm2_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L2)                                         1.256     1.346     0.93
norm2_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L2|NORM_RELATIVE)                           2.350     2.591     0.91
norm2_mask::Size_MatType_NormType::(640x480, 8UC4, NORM_L2)                                         2.697     2.902     0.93
norm2_mask::Size_MatType_NormType::(640x480, 8UC4, NORM_L2|NORM_RELATIVE)                           5.137     5.856     0.88
norm2_mask::Size_MatType_NormType::(1280x720, 8UC1, NORM_L2)                                        3.764     4.027     0.93
norm2_mask::Size_MatType_NormType::(1280x720, 8UC1, NORM_L2|NORM_RELATIVE)                          7.044     7.750     0.91
norm2_mask::Size_MatType_NormType::(1280x720, 8UC4, NORM_L2)                                        8.101     8.715     0.93
norm2_mask::Size_MatType_NormType::(1280x720, 8UC4, NORM_L2|NORM_RELATIVE)                         15.585    17.581     0.89
norm2_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L1|NORM_RELATIVE)                        15.934    16.877     0.94
norm2_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L2)                                       8.466     9.063     0.93
norm2_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L2|NORM_RELATIVE)                        15.849    17.438     0.91
norm2_mask::Size_MatType_NormType::(1920x1080, 8UC4, NORM_L2)                                      18.676    19.945     0.94
norm2_mask::Size_MatType_NormType::(1920x1080, 8UC4, NORM_L2|NORM_RELATIVE)                        35.455    39.898     0.89
norm::PerfHamming::(NORM_HAMMING2, 8UC1, 640x480)                                                   0.019     0.035     0.54
norm::PerfHamming::(NORM_HAMMING2, 8UC1, 1920x1080)                                                 0.127     0.194     0.66
norm::Size_MatType_NormType::(127x61, 8UC1, NORM_INF)                                               0.001     0.001     0.76
norm::Size_MatType_NormType::(127x61, 8UC4, NORM_INF)                                               0.004     0.005     0.72
norm::Size_MatType_NormType::(640x480, 8UC1, NORM_INF)                                              0.046     0.058     0.79
norm::Size_MatType_NormType::(640x480, 8UC4, NORM_INF)                                              0.165     0.203     0.81
norm::Size_MatType_NormType::(1280x720, 8UC1, NORM_INF)                                             0.124     0.154     0.81
norm::Size_MatType_NormType::(1280x720, 8UC4, NORM_INF)                                             0.494     0.626     0.79
norm::Size_MatType_NormType::(1920x1080, 8UC1, NORM_INF)                                            0.279     0.344     0.81
norm::Size_MatType_NormType::(1920x1080, 8UC4, NORM_INF)                                            1.185     1.451     0.82
norm_mask::Size_MatType_NormType::(127x61, 8UC1, NORM_INF)                                          0.031     0.033     0.95
norm_mask::Size_MatType_NormType::(127x61, 8UC1, NORM_L1)                                           0.027     0.030     0.91
norm_mask::Size_MatType_NormType::(127x61, 8UC1, NORM_L2)                                           0.027     0.031     0.88
norm_mask::Size_MatType_NormType::(127x61, 8UC4, NORM_INF)                                          0.074     0.083     0.89
norm_mask::Size_MatType_NormType::(127x61, 8UC4, NORM_L2)                                           0.061     0.074     0.83
norm_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_INF)                                         1.235     1.358     0.91
norm_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L1)                                          1.093     1.209     0.90
norm_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L2)                                          1.095     1.242     0.88
norm_mask::Size_MatType_NormType::(640x480, 8UC4, NORM_INF)                                         2.949     3.282     0.90
norm_mask::Size_MatType_NormType::(640x480, 8UC4, NORM_L2)                                          2.469     2.954     0.84
norm_mask::Size_MatType_NormType::(1280x720, 8UC1, NORM_INF)                                        3.703     4.075     0.91
norm_mask::Size_MatType_NormType::(1280x720, 8UC1, NORM_L1)                                         3.272     3.608     0.91
norm_mask::Size_MatType_NormType::(1280x720, 8UC1, NORM_L2)                                         3.281     3.718     0.88
norm_mask::Size_MatType_NormType::(1280x720, 8UC4, NORM_INF)                                        8.853     9.894     0.89
norm_mask::Size_MatType_NormType::(1280x720, 8UC4, NORM_L2)                                         7.372     8.856     0.83
norm_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_INF)                                       8.328     9.146     0.91
norm_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L1)                                        7.362     8.282     0.89
norm_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L2)                                        7.378     8.356     0.88
norm_mask::Size_MatType_NormType::(1920x1080, 8UC4, NORM_INF)                                      20.031    22.223     0.90
norm_mask::Size_MatType_NormType::(1920x1080, 8UC4, NORM_L2)                                       16.988    19.989     0.85
normalize_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_INF)                                    1.811     1.958     0.92
normalize_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L1)                                     1.663     1.800     0.92
normalize_mask::Size_MatType_NormType::(640x480, 8UC1, NORM_L2)                                     1.666     1.828     0.91
normalize_mask::Size_MatType_NormType::(640x480, 8UC4, NORM_INF)                                    5.387     5.712     0.94
normalize_mask::Size_MatType_NormType::(640x480, 8UC4, NORM_L2)                                     4.929     5.385     0.92
normalize_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_INF)                                 12.182    13.040     0.93
normalize_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L1)                                  11.223    12.151     0.92
normalize_mask::Size_MatType_NormType::(1920x1080, 8UC1, NORM_L2)                                  11.229    12.290     0.91
normalize_mask::Size_MatType_NormType::(1920x1080, 8UC4, NORM_L2)                                  34.388    37.611     0.91
phase32f::VectorLength::1048576                                                                     2.421     2.903     0.83
phase64f::VectorLength::128                                                                         0.001     0.001     0.95
phase64f::VectorLength::1000                                                                        0.003     0.004     0.93
phase64f::VectorLength::131072                                                                      0.449     0.494     0.91
phase64f::VectorLength::524288                                                                      1.931     2.789     0.69
phase64f::VectorLength::1048576                                                                     5.884    14.657     0.40
reduceC::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_SUM2)                                            0.263     0.322     0.82
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_AVG)                                            0.649     0.685     0.95
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM2)                                           0.502     0.640     0.78
reduceC::Size_MatType_ROp::(1920x1080, 8UC4, REDUCE_SUM)                                            0.624     0.666     0.94
reduceR::Size_MatType_ROp::(640x480, 8UC1, REDUCE_SUM)                                              0.116     0.134     0.87
reduceR::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MAX)                                             0.056     0.220     0.25
reduceR::Size_MatType_ROp::(640x480, 32FC1, REDUCE_MIN)                                             0.056     0.222     0.25
reduceR::Size_MatType_ROp::(640x480, 32FC1, REDUCE_SUM)                                             0.051     0.113     0.45
reduceR::Size_MatType_ROp::(640x480, 8UC4, REDUCE_SUM)                                              0.146     0.823     0.18
reduceR::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_AVG)                                             0.775     0.844     0.92
reduceR::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_MAX)                                             0.298     0.920     0.32
reduceR::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM2)                                            0.595     0.730     0.82
reduceR::Size_MatType_ROp::(1280x720, 8UC1, REDUCE_SUM)                                             0.744     0.832     0.89
reduceR::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_AVG)                                            0.162     0.190     0.85
reduceR::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MAX)                                            0.752     0.979     0.77
reduceR::Size_MatType_ROp::(1280x720, 32FC1, REDUCE_MIN)                                            0.747     0.979     0.76
reduceR::Size_MatType_ROp::(1280x720, 8UC4, REDUCE_AVG)                                             0.437     1.578     0.28
reduceR::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_AVG)                                           0.123     1.356     0.09
reduceR::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_MAX)                                           1.494     1.949     0.77
reduceR::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM2)                                          0.127     0.194     0.65
reduceR::Size_MatType_ROp::(1920x1080, 32FC1, REDUCE_SUM)                                           0.111     1.326     0.08
single_iter::KMeans::(4, 3, 500)                                                                    1.220     1.541     0.79
single_iter::KMeans::(4, 3, 1000)                                                                   1.531     1.654     0.93
single_iter::KMeans::(8, 3, 1000)                                                                   3.056     3.390     0.90
single_iter::KMeans::(100, 2, 1000)                                                                34.623    46.413     0.75
split::Size_Depth_Channels::(127x61, 8UC1, 3)                                                       0.002     0.002     0.91
split::Size_Depth_Channels::(127x61, 8UC1, 4)                                                       0.002     0.002     0.94
split::Size_Depth_Channels::(640x480, 64FC1, 3)                                                     1.124     1.412     0.80
split::Size_Depth_Channels::(1280x720, 16SC1, 3)                                                    0.619     0.679     0.91
split::Size_Depth_Channels::(1280x720, 16SC1, 4)                                                    1.125     1.591     0.71
split::Size_Depth_Channels::(1280x720, 32FC1, 2)                                                    1.005     1.247     0.81
split::Size_Depth_Channels::(1920x1080, 8UC1, 2)                                                    0.292     0.360     0.81
split::Size_Depth_Channels::(1920x1080, 8UC1, 3)                                                    0.777     0.993     0.78
split::Size_Depth_Channels::(1920x1080, 64FC1, 2)                                                  12.250    13.110     0.93
split::Size_Depth_Channels::(1920x1080, 64FC1, 3)                                                  21.245    22.710     0.94
subtract::BinaryOpTest::(640x480, 32SC1)                                                            0.141     0.162     0.87
subtract::BinaryOpTest::(640x480, 16SC2)                                                            0.154     0.166     0.93
subtract::BinaryOpTest::(640x480, 8UC4)                                                             0.121     0.150     0.81
subtract::BinaryOpTest::(640x480, 16SC4)                                                            0.259     0.308     0.84
subtractScalarDouble::BinaryOpTest::(640x480, 8SC1)                                                 0.527     0.585     0.90
subtractScalarDouble::BinaryOpTest::(640x480, 16SC3)                                                0.816     1.229     0.66
subtractScalarDouble::BinaryOpTest::(1280x720, 16SC1)                                               0.661     1.081     0.61
subtractScalarDouble::BinaryOpTest::(1280x720, 16SC4)                                               2.747     3.359     0.82
subtractScalarDouble::BinaryOpTest::(1920x1080, 8UC1)                                               3.519     4.054     0.87
subtractScalarDouble::BinaryOpTest::(1920x1080, 16SC2)                                              3.352     4.185     0.80
subtractScalarDouble::BinaryOpTest::(1920x1080, 16SC4)                                              6.967     8.373     0.83
subtractScalarSameType::BinaryOpTest::(640x480, 8SC1)                                               0.455     0.547     0.83
subtractScalarSameType::BinaryOpTest::(640x480, 16SC3)                                              0.855     1.202     0.71
subtractScalarSameType::BinaryOpTest::(1280x720, 16SC1)                                             0.664     1.060     0.63
subtractScalarSameType::BinaryOpTest::(1280x720, 16SC4)                                             2.735     3.249     0.84
subtractScalarSameType::BinaryOpTest::(1920x1080, 32SC1)                                            1.495     1.603     0.93
subtractScalarSameType::BinaryOpTest::(1920x1080, 16SC2)                                            3.410     4.039     0.84
subtractScalarSameType::BinaryOpTest::(1920x1080, 8UC3)                                             1.370     1.624     0.84
subtractScalarSameType::BinaryOpTest::(1920x1080, 16SC4)                                            7.038     8.228     0.86
sum::Size_MatType::(640x480, 8UC1)                                                                  0.019     0.021     0.94
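The third column matches the first geometric mean divided by the second. For example, on the first merge row, 0.531 / 1.410 ≈ 0.38. Values below 1.0 therefore mean the second run was slower.

Below is a minimal C++ sketch for filtering such rows. It assumes the first timing column is the reference build and the second is the patched build, since the header does not say which is which. PerfRow, x_factor and regressions are hypothetical names, not part of OpenCV's perf tooling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the perf comparison: test name plus two geometric-mean timings (ms).
// Assumption: first_ms is the reference build, second_ms the patched build.
struct PerfRow {
    std::string name;
    double first_ms;
    double second_ms;
};

// Ratio that matches the table's third column: first / second.
// Below 1.0 means the second run took longer.
double x_factor(const PerfRow& r) {
    assert(r.second_ms > 0.0);
    return r.first_ms / r.second_ms;
}

// Keep only the rows whose ratio falls below `threshold`
// (e.g. 0.5 selects tests that became more than 2x slower).
std::vector<PerfRow> regressions(const std::vector<PerfRow>& rows, double threshold) {
    std::vector<PerfRow> out;
    for (const PerfRow& r : rows) {
        if (x_factor(r) < threshold) {
            out.push_back(r);
        }
    }
    return out;
}
```

The printed timings are rounded to three decimals. For very short tests, recomputing the ratio will not reproduce the printed x-factor exactly. For sum::Size_MatType::(640x480, 8UC1), 0.019 / 0.021 ≈ 0.90, while the table shows 0.94.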

@CNClareChen
Contributor Author

CNClareChen commented Nov 29, 2023

@fengyuentau This patch does not make many changes to SIMD optimization, so there should be no widespread performance degradation. With this patch applied, the build no longer needs parameters such as -DCPU_BASELINE to be specified. I didn't see such a large performance gap in my local tests, so I recommend running the tests a few more times.
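The PR description makes LSX a baseline feature and LASX a dispatched one. That is presumably why -DCPU_BASELINE no longer has to be passed.

One way to see what a given build actually enabled is cv::getCPUFeaturesLine(). Its documented output is a space-separated feature list with these markers:
- no marker for baseline features;
- a '*' prefix for dispatched features;
- a '?' suffix for features enabled at build time but not available on the running CPU.

The sketch below only parses such a string. CpuFeatureSplit and split_cpu_features are hypothetical helpers. The sample "LSX *LASX" used to exercise them is an assumed expectation for a LoongArch build with this patch, not captured output.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Buckets for the markers documented for cv::getCPUFeaturesLine():
//   no marker  -> baseline feature (compiled in unconditionally)
//   prefix '*' -> feature enabled through the runtime dispatcher
//   suffix '?' -> enabled at build time but not available on this CPU
struct CpuFeatureSplit {
    std::vector<std::string> baseline;
    std::vector<std::string> dispatched;
    std::vector<std::string> unavailable;  // also listed in baseline or dispatched
};

CpuFeatureSplit split_cpu_features(const std::string& line) {
    CpuFeatureSplit out;
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        bool dispatched = false;
        bool missing = false;
        if (!tok.empty() && tok.front() == '*') {  // dispatcher marker
            dispatched = true;
            tok.erase(0, 1);
        }
        if (!tok.empty() && tok.back() == '?') {   // not available on this hardware
            missing = true;
            tok.pop_back();
        }
        if (missing) {
            out.unavailable.push_back(tok);
        }
        (dispatched ? out.dispatched : out.baseline).push_back(tok);
    }
    return out;
}
```

In a real build, the input would be split_cpu_features(cv::getCPUFeaturesLine()). If the patch behaves as described, LSX should land under baseline and LASX under dispatched.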

Member

@fengyuentau fengyuentau left a comment

Ok. LGTM👍

@asmorkalov asmorkalov merged commit 3893936 into opencv:4.x Nov 30, 2023
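{
    // Rounding arithmetic shift right by n, narrowing each 64-bit element to 32 bits
    __m256i res = __lasx_xvsrarni_w_d(a.val, a.val, n);
    // Store 64-bit element 0 (low 128-bit lane) at ptr, element 2 (high lane) at ptr + 8 bytes
    __lasx_xvstelm_d(res, ptr, 0, 0);
    __lasx_xvstelm_d(res, ptr, 8, 2);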
Contributor

This patch does not make many changes to SIMD optimization

So these changes are unintended, right?

Contributor Author

No, these changes are intentional. I only changed the implementation to avoid mixing in LSX instructions, and the number of instructions did not increase.
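The excerpt discussed above narrows four 64-bit integers to 32 bits with a rounding right shift, then stores the 16-byte result. The following is based on my reading of the LASX semantics, which is worth checking against the LoongArch SIMD reference:
- __lasx_xvsrarni_w_d operates on each 128-bit lane separately.
- Because a.val is passed as both sources, the narrowed pair (a0, a1) sits in 64-bit element 0 and (a2, a3) in element 2.
- The two __lasx_xvstelm_d calls therefore write them at byte offsets 0 and 8, using only LASX instructions.

Below is a portable scalar model of that result, usable as a test oracle. rounding_shift_right and rshr_pack_store_ref are hypothetical names. The int64 to int32 element types are an assumption, because the function signature is not shown in the excerpt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model (not OpenCV code).
// Rounding arithmetic right shift (round half up), as in the "srar" family.
// (x >> n) + bit (n-1) of x equals (x + 2^(n-1)) >> n but cannot overflow.
inline int64_t rounding_shift_right(int64_t x, int n) {
    if (n == 0) {
        return x;  // no rounding when nothing is shifted out
    }
    return (x >> n) + ((x >> (n - 1)) & 1);
}

// Assumed signature: four int64 inputs, four int32 outputs written to ptr.
template <int n>
void rshr_pack_store_ref(int32_t* ptr, const int64_t (&a)[4]) {
    static_assert(n >= 0 && n < 64, "shift must fit the 64-bit source");
    for (int i = 0; i < 4; ++i) {
        // Narrowing keeps the low 32 bits; xvsrarni does not saturate
        // (the xvssrarni variants do).
        ptr[i] = static_cast<int32_t>(rounding_shift_right(a[i], n));
    }
}
```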

Labels

category: core, optimization, platform: loongson (Loongson CPU architecture and LASX SIMD)

4 participants