Improve SIFT for arm64/Apple silicon #20204
Investigating build failures
TIPS 1 : rather filtering with TIPS 2: use Will give you something like this Which is WAY MORE READABLE on Github Median
Also, the commit message
Thanks for the tips, will reformat.
We tested the full-vector version on Coffee Lake; the results were marginal, both up and down. Since the benefit was clear on NEON, that's where we landed.
@Developer-Ecosystem-Engineering, thank you for the contribution! This work on improving OpenCV@M1 performance is brilliant! Note, however, that except for the kernels in the DNN module, which are few and really critical, we no longer accept native optimizations. It would be impossible for our tiny team to maintain all those branches. Please rewrite the native NEON code using our universal intrinsics.
An updated patch is available, with the NEON code rewritten using universal intrinsics.
@Developer-Ecosystem-Engineering, thank you! The patch is almost ready to be merged. Please fix the compile warnings on Windows (see pullrequest.opencv.org) and squash the commits into one.
3506d96 to cb12f86
- Reduce branch density by collapsing compares.
- Fix Windows build errors
- Use OpenCV universal intrinsics
- Use v_check_any and v_signmask as requested
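For readers unfamiliar with the phrase, "collapsing compares" can be illustrated with a minimal scalar sketch. This is not the code from the patch: the function names and the fixed 26-neighbor array are hypothetical, and the sketch assumes the goal is to replace one early-exit branch per neighbor with a single test on an accumulated result. The patch itself applies the idea through OpenCV universal intrinsics; plain C++ is used here only so that the example stands alone.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helpers for illustration; names are not taken from the patch.

// Branchy form: one conditional exit per neighbor compare.
inline bool isLocalMaxBranchy(float val, const std::array<float, 26>& nbrs)
{
    for (std::size_t i = 0; i < nbrs.size(); ++i)
        if (!(val >= nbrs[i]))
            return false;   // data-dependent branch on every neighbor
    return true;
}

// Collapsed form: fold every compare result into one flag with a
// bitwise AND, then test that flag once after the loop.
inline bool isLocalMaxCollapsed(float val, const std::array<float, 26>& nbrs)
{
    unsigned all = 1u;
    for (std::size_t i = 0; i < nbrs.size(); ++i)
        all &= static_cast<unsigned>(val >= nbrs[i]);
    return all != 0u;       // single data-dependent test on the result
}
```

Both functions return the same answer for the same input. The collapsed form always performs all 26 compares, which trades a little extra arithmetic for fewer hard-to-predict branches and a loop body that is easier to vectorize.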
cb12f86 to 9557b9f
Modifications requested by @alalek have been integrated and re-squashed.
Reduce branch density by collapsing compares.
Performance improvements range from 1.03x to 1.53x on the existing tests.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.