Change fast corner flags in HAL version from char array to single int #15357

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from ChipKerchner:fastCorner on Aug 29, 2019

Conversation

@ChipKerchner
Contributor

Change fast corner flags in HAL version from char array to single int

@alalek
Member

alalek commented Aug 21, 2019

relates #14916

@ChipKerchner
Contributor Author

ChipKerchner commented Aug 21, 2019

I'm seeing slightly better performance with v_signmask than with the array of chars. If you disagree, we can cancel this pull request; I didn't realize that I was undoing a previous check-in. You would think a single variable would be better than multiple memory accesses.
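To make the two approaches concrete, here is a minimal, portable sketch of the idea being compared. It is not the OpenCV code: `Lanes`, `signmask`, `candidatesFromArray` and `candidatesFromMask` are hypothetical names, and the scalar `signmask` only emulates what a sign-mask operation on a 16-lane byte vector would return (bit k set when lane k has its top bit set).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 16-lane byte comparison result:
// each lane is 0xFF (corner candidate) or 0x00 (not a candidate).
using Lanes = std::array<std::uint8_t, 16>;

// Scalar emulation of a sign-mask operation (assumption: bit k of the
// result is the most significant bit of lane k).
int signmask(const Lanes& v)
{
    int m = 0;
    for (int k = 0; k < 16; ++k)
        m |= ((v[k] >> 7) & 1) << k;
    return m;
}

// Variant A: store the lanes to a char array and test every element.
std::vector<int> candidatesFromArray(const Lanes& v)
{
    std::uint8_t flags[16];
    for (int k = 0; k < 16; ++k)
        flags[k] = v[k];              // stands in for a vector store to memory
    std::vector<int> out;
    for (int k = 0; k < 16; ++k)
        if (flags[k])
            out.push_back(k);
    return out;
}

// Variant B: pack the lanes into one int and stop once no bits remain.
std::vector<int> candidatesFromMask(const Lanes& v)
{
    std::vector<int> out;
    int m = signmask(v);
    for (int k = 0; m > 0 && k < 16; ++k, m >>= 1)
        if (m & 1)
            out.push_back(k);
    return out;
}
```

Both variants return the same lane indices. The mask variant keeps the flags in a single register-sized value and can exit the loop early when the remaining bits are all zero, which is one plausible reason for the small difference described above.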

@alalek
Member

alalek commented Aug 21, 2019

This code is under CV_SIMD128, so v_signmask() is probably fine there.
@terfendail Could you please check performance changes?

@terfendail
Contributor

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.241 | 1.307 | 0.95 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/a3.png") | 0.718 | 0.752 | 0.96 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/s2.jpg") | 4.954 | 5.014 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 3.916 | 3.949 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/a3.png") | 2.195 | 2.233 | 0.98 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/s2.jpg") | 14.500 | 14.498 | 1.00 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.077 | 1.039 | 1.04 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/a3.png") | 0.731 | 0.715 | 1.02 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/s2.jpg") | 3.973 | 3.807 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.311 | 1.313 | 1.00 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/a3.png") | 0.752 | 0.753 | 1.00 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/s2.jpg") | 5.113 | 5.169 | 0.99 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 4.122 | 4.228 | 0.97 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/a3.png") | 2.397 | 2.456 | 0.98 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/s2.jpg") | 15.509 | 16.036 | 0.97 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.374 | 1.394 | 0.99 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/a3.png") | 1.024 | 1.050 | 0.98 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/s2.jpg") | 5.815 | 5.976 | 0.97 |
| detect::feature2d::(FAST_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 2.569 | 2.427 | 1.06 |
| detect::feature2d::(FAST_DEFAULT, "stitching/a3.png") | 2.076 | 2.056 | 1.01 |
| detect::feature2d::(FAST_DEFAULT, "stitching/s2.jpg") | 9.493 | 9.404 | 1.01 |

Performance for SSE3 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.287 | 1.300 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/a3.png") | 0.749 | 0.753 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/s2.jpg") | 4.953 | 4.951 | 1.00 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 3.842 | 3.770 | 1.02 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/a3.png") | 2.198 | 2.171 | 1.01 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/s2.jpg") | 14.532 | 14.360 | 1.01 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.068 | 1.015 | 1.05 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/a3.png") | 0.726 | 0.692 | 1.05 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/s2.jpg") | 3.965 | 3.746 | 1.06 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.297 | 1.252 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/a3.png") | 0.749 | 0.718 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/s2.jpg") | 5.104 | 4.930 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 4.070 | 3.893 | 1.05 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/a3.png") | 2.332 | 2.370 | 0.98 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/s2.jpg") | 15.205 | 15.809 | 0.96 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.438 | 1.397 | 1.03 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/a3.png") | 1.084 | 1.053 | 1.03 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/s2.jpg") | 6.188 | 5.981 | 1.03 |
| detect::feature2d::(FAST_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 2.550 | 2.427 | 1.05 |
| detect::feature2d::(FAST_DEFAULT, "stitching/a3.png") | 2.143 | 2.000 | 1.07 |
| detect::feature2d::(FAST_DEFAULT, "stitching/s2.jpg") | 9.804 | 8.987 | 1.09 |

Performance for SSE4_2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.276 | 1.262 | 1.01 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/a3.png") | 0.731 | 0.736 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/s2.jpg") | 4.814 | 4.869 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 3.779 | 3.893 | 0.97 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/a3.png") | 2.152 | 2.221 | 0.97 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/s2.jpg") | 13.805 | 14.470 | 0.95 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.026 | 1.053 | 0.97 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/a3.png") | 0.700 | 0.709 | 0.99 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/s2.jpg") | 3.962 | 3.841 | 1.03 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.224 | 1.271 | 0.96 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/a3.png") | 0.706 | 0.732 | 0.96 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/s2.jpg") | 4.817 | 5.005 | 0.96 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 4.163 | 4.111 | 1.01 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/a3.png") | 2.404 | 2.415 | 1.00 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/s2.jpg") | 15.977 | 16.113 | 0.99 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.454 | 1.398 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/a3.png") | 1.093 | 1.049 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/s2.jpg") | 6.262 | 5.947 | 1.05 |
| detect::feature2d::(FAST_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 2.646 | 2.542 | 1.04 |
| detect::feature2d::(FAST_DEFAULT, "stitching/a3.png") | 2.135 | 2.045 | 1.04 |
| detect::feature2d::(FAST_DEFAULT, "stitching/s2.jpg") | 9.831 | 9.366 | 1.05 |

Performance for AVX2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.338 | 1.296 | 1.03 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/a3.png") | 0.777 | 0.757 | 1.03 |
| detect::feature2d::(FAST_20_FALSE_TYPE5_8, "stitching/s2.jpg") | 5.079 | 4.935 | 1.03 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 3.854 | 3.863 | 1.00 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/a3.png") | 2.215 | 2.203 | 1.01 |
| detect::feature2d::(FAST_20_FALSE_TYPE7_12, "stitching/s2.jpg") | 14.048 | 14.471 | 0.97 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 0.800 | 0.837 | 0.96 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/a3.png") | 0.547 | 0.568 | 0.96 |
| detect::feature2d::(FAST_20_FALSE_TYPE9_16, "stitching/s2.jpg") | 2.770 | 2.906 | 0.95 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.346 | 1.313 | 1.03 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/a3.png") | 0.780 | 0.758 | 1.03 |
| detect::feature2d::(FAST_20_TRUE_TYPE5_8, "stitching/s2.jpg") | 5.057 | 5.224 | 0.97 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 3.923 | 4.106 | 0.96 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/a3.png") | 2.311 | 2.301 | 1.00 |
| detect::feature2d::(FAST_20_TRUE_TYPE7_12, "stitching/s2.jpg") | 15.796 | 15.269 | 1.03 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 1.184 | 1.143 | 1.04 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/a3.png") | 0.908 | 0.869 | 1.05 |
| detect::feature2d::(FAST_20_TRUE_TYPE9_16, "stitching/s2.jpg") | 5.027 | 4.801 | 1.05 |
| detect::feature2d::(FAST_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") | 2.110 | 2.111 | 1.00 |
| detect::feature2d::(FAST_DEFAULT, "stitching/a3.png") | 1.740 | 1.675 | 1.04 |
| detect::feature2d::(FAST_DEFAULT, "stitching/s2.jpg") | 8.354 | 7.954 | 1.05 |
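
The Speedup column appears to be the reference time divided by the PR time, rounded to two decimals (for example, 1.241 / 1.307 ≈ 0.95 in the first SSE2 row), so values above 1.00 favor the PR. The sketch below shows that arithmetic and one possible way to summarize a whole table with a geometric mean. The `Timing` struct and the helper names are hypothetical, and the geometric mean is an assumption about how one might aggregate the rows, not something reported in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One table row: reference build time and PR build time (same unit).
struct Timing
{
    double reference;
    double pr;
};

// Speedup as it appears to be defined in the tables: reference / PR.
double speedup(const Timing& t)
{
    return t.reference / t.pr;
}

// Round to two decimals, matching the precision shown in the tables.
double round2(double x)
{
    return std::round(x * 100.0) / 100.0;
}

// Geometric mean of per-row speedups; returns 1.0 for an empty table.
double geometricMeanSpeedup(const std::vector<Timing>& rows)
{
    if (rows.empty())
        return 1.0;
    double logSum = 0.0;
    for (const Timing& t : rows)
        logSum += std::log(speedup(t));
    return std::exp(logSum / static_cast<double>(rows.size()));
}
```

A geometric mean is used here because the rows are ratios; an arithmetic mean would weight a 2x gain more heavily than a 2x loss.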

@terfendail
Contributor

It looks like the change improves performance slightly. The earlier switch to an array was part of an effort to avoid the non-universal v_signmask, but I agree that using it is not a big issue for SIMD128.

@opencv-pushbot opencv-pushbot merged commit 51ceabb into opencv:3.4 Aug 29, 2019
@alalek alalek mentioned this pull request Aug 30, 2019
@ChipKerchner ChipKerchner deleted the fastCorner branch September 3, 2019 13:47

4 participants