Add SymmColumnVec_32f8u #20740

Merged
alalek merged 2 commits into opencv:3.4 from Nicholas-Ho-arm:3.4_SymmColumnVec_32f8u
Oct 15, 2021
@Nicholas-Ho-arm Nicholas-Ho-arm commented Sep 23, 2021

Improve the performance of SymmColumnFilter in imgproc for 32f input, 8u output data by adding a SIMD optimisation SymmColumnVec_32f8u.
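
For readers unfamiliar with the operation, a symmetric column filter combines the rows above and below a centre row using mirrored coefficients, then converts the floating-point sum to 8-bit output. The scalar sketch below illustrates that computation, which a SIMD routine of this kind would speed up. It is not the code from this PR: the function name symmColumnFilter32f8u, the row-pointer layout, the delta parameter and the rounding mode (round half away from zero via std::lround, which may differ from OpenCV's own rounding) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference, NOT the PR's implementation.
// rows:   2*half+1 pointers to float rows; rows[half] is the centre row.
// kernel: half+1 coefficients; kernel[0] is the centre tap and kernel[k]
//         applies to both rows at distance k from the centre (symmetry).
// delta:  constant added to every filtered value before conversion.
static void symmColumnFilter32f8u(const std::vector<const float*>& rows,
                                  const std::vector<float>& kernel,
                                  float delta, std::uint8_t* dst, int width)
{
    const int half = static_cast<int>(kernel.size()) - 1;
    for (int x = 0; x < width; ++x)
    {
        float sum = kernel[0] * rows[half][x] + delta;
        for (int k = 1; k <= half; ++k)
            sum += kernel[k] * (rows[half + k][x] + rows[half - k][x]);

        // Saturate to the 8-bit range, then round to the nearest integer.
        const float clamped = std::min(255.0f, std::max(0.0f, sum));
        dst[x] = static_cast<std::uint8_t>(std::lround(clamped));
    }
}
```

Exploiting the symmetry halves the number of multiplications per output pixel, since each pair of mirrored rows is added once before being scaled by the shared coefficient.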

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux AVX2,Custom,ARMv7,ARMv8
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@Nicholas-Ho-arm
Contributor Author

Speedup results from the corresponding perf tests, opencv_perf_features2d --gtest_filter=*extract/*:*detectAndExtract/*, run on a single thread on a Cortex-A72.

Geometric mean (ms)

Name of Test                                                                                                             features2d myChange detectAndExtract   features2d master detectAndExtract   master vs myChange (x-factor)
detectAndExtract::feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")             222.212          214.238            1.04    
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                                172.278          166.826            1.03    
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                                557.166          535.180            1.04    
detectAndExtract::feature2d::(AKAZE_DESCRIPTOR_KAZE, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")     256.534          251.973            1.02    
detectAndExtract::feature2d::(AKAZE_DESCRIPTOR_KAZE, "stitching/a3.png")                                                        198.285          190.955            1.04    
detectAndExtract::feature2d::(AKAZE_DESCRIPTOR_KAZE, "stitching/s2.jpg")                                                        715.233          692.244            1.03    
detectAndExtract::feature2d::(BRISK_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")             205.183          203.862            1.01    
detectAndExtract::feature2d::(BRISK_DEFAULT, "stitching/a3.png")                                                                128.711          128.246            1.00    
detectAndExtract::feature2d::(BRISK_DEFAULT, "stitching/s2.jpg")                                                                1295.035         1292.400           1.00    
detectAndExtract::feature2d::(KAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")              1405.261         1344.826           1.04    
detectAndExtract::feature2d::(KAZE_DEFAULT, "stitching/a3.png")                                                                 1148.566         1091.093           1.05    
detectAndExtract::feature2d::(KAZE_DEFAULT, "stitching/s2.jpg")                                                                 2590.820         2736.460           0.95    
detectAndExtract::feature2d::(ORB_1500_13_1, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")              18.160           20.384            0.89    
detectAndExtract::feature2d::(ORB_1500_13_1, "stitching/a3.png")                                                                 15.663           17.534            0.89    
detectAndExtract::feature2d::(ORB_1500_13_1, "stitching/s2.jpg")                                                                 36.655           40.611            0.90    
detectAndExtract::feature2d::(ORB_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                39.580           46.755            0.85    
detectAndExtract::feature2d::(ORB_DEFAULT, "stitching/a3.png")                                                                   32.529           38.555            0.84    
detectAndExtract::feature2d::(ORB_DEFAULT, "stitching/s2.jpg")                                                                   95.350          107.906            0.88    
detectAndExtract::feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")              495.280          474.871            1.04    
detectAndExtract::feature2d::(SIFT_DEFAULT, "stitching/a3.png")                                                                 582.961          572.074            1.02    
detectAndExtract::feature2d::(SIFT_DEFAULT, "stitching/s2.jpg")                                                                 1457.919         1395.741           1.04    
extract::feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                      194.687          209.851            0.93    
extract::feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                                         152.208          145.743            1.04    
extract::feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                                         482.798          454.353            1.06    
extract::feature2d::(AKAZE_DESCRIPTOR_KAZE, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")              236.363          232.898            1.01    
extract::feature2d::(AKAZE_DESCRIPTOR_KAZE, "stitching/a3.png")                                                                 178.518          172.524            1.03    
extract::feature2d::(AKAZE_DESCRIPTOR_KAZE, "stitching/s2.jpg")                                                                 643.482          626.733            1.03    
extract::feature2d::(BRISK_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                       21.597           21.741            0.99    
extract::feature2d::(BRISK_DEFAULT, "stitching/a3.png")                                                                          14.841           14.928            0.99    
extract::feature2d::(BRISK_DEFAULT, "stitching/s2.jpg")                                                                          84.740           84.998            1.00    
extract::feature2d::(KAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                       725.864          696.094            1.04    
extract::feature2d::(KAZE_DEFAULT, "stitching/a3.png")                                                                          581.521          562.657            1.03    
extract::feature2d::(KAZE_DEFAULT, "stitching/s2.jpg")                                                                          1638.190         1591.597           1.03    
extract::feature2d::(ORB_1500_13_1, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                       19.614           23.742            0.83    
extract::feature2d::(ORB_1500_13_1, "stitching/a3.png")                                                                          15.546           19.222            0.81    
extract::feature2d::(ORB_1500_13_1, "stitching/s2.jpg")                                                                          52.026           60.506            0.86    
extract::feature2d::(ORB_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                         21.551           26.099            0.83    
extract::feature2d::(ORB_DEFAULT, "stitching/a3.png")                                                                            17.195           21.254            0.81    
extract::feature2d::(ORB_DEFAULT, "stitching/s2.jpg")                                                                            56.786           67.400            0.84    
extract::feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")                       198.914          200.911            0.99    
extract::feature2d::(SIFT_DEFAULT, "stitching/a3.png")                                                                          126.152          126.685            1.00    
extract::feature2d::(SIFT_DEFAULT, "stitching/s2.jpg")                                                                          644.808          654.250            0.99
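
The x-factor column in the table above is consistent with dividing the first timing column by the second, so values below 1 correspond to rows where the myChange time is the smaller of the two. That definition is inferred from the numbers rather than stated in the thread. A minimal sketch for reproducing the column from the raw timings, with hypothetical helper names xFactor and roundTo2:

```cpp
#include <cmath>

// Assumed definition, inferred from the table values: the ratio of the
// first timing column to the second timing column.
static double xFactor(double firstMs, double secondMs)
{
    return firstMs / secondMs;
}

// Round to two decimal places, matching the precision shown in the table.
static double roundTo2(double value)
{
    return std::round(value * 100.0) / 100.0;
}
```

For example, roundTo2(xFactor(222.212, 214.238)) gives 1.04 for the first AKAZE_DEFAULT row, and roundTo2(xFactor(18.160, 20.384)) gives 0.89 for the first ORB_1500_13_1 row.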

@asmorkalov asmorkalov added the "optimization" and "platform: arm" (ARM boards related issues: RPi, NVIDIA TK/TX, etc) labels Sep 24, 2021
Member

@alalek alalek left a comment

Well done! Thank you 👍

Perf results on i5-6600 (1 thread, noIPP, noOpenCL):

Name of Test                                     base      patch    speedup
Sobel::OCL_SobelFixture::(640x480, 8UC1)         2.349     0.460    5.10
Sobel::OCL_SobelFixture::(640x480, 8UC4)         9.442     1.973    4.78
Sobel::OCL_SobelFixture::(1280x720, 8UC1)        6.995     1.384    5.05
Sobel::OCL_SobelFixture::(1280x720, 8UC4)       28.219     5.910    4.77
Sobel::OCL_SobelFixture::(1920x1080, 8UC1)      15.380     3.271    4.70
Sobel::OCL_SobelFixture::(1920x1080, 8UC4)      63.457    13.123    4.84
Sobel::OCL_SobelFixture::(3840x2160, 8UC1)      61.049    13.055    4.68
Sobel::OCL_SobelFixture::(3840x2160, 8UC4)     259.697    53.806    4.83

@alalek alalek merged commit bd0732b into opencv:3.4 Oct 15, 2021
This was referenced Oct 15, 2021
@Nicholas-Ho-arm
Contributor Author

@alalek Thank you for reviewing. 👍

Labels

category: imgproc optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
