Add Neon optimised RGB2Lab conversion#19883
Conversation
|
Marking this RFC, because it doesn't follow OpenCV guidelines to avoid using of raw native intrinsics in OpenCV modules. |
|
Hi @alalek, thank you for looking into this. We (me and @fpetrogalli) submitted it like this because we weren't sure what correct approach was. Also, sorry if this is an silly question, but what does it mean to mark it as RFC? One possible solution to the raw intrinsics is to keep the |
|
@jondea, thank you for the contribution! I'd start with the first option that you suggested - keep the separate branch under CV_NEON, but rewrite it using HAL intrinsics. I briefly looked at the current implementation and I found it too bulky for the equivalent C code that it accelerates. So, I'm 60-80% sure that the HAL code that you write will be faster than the existing implementation not just on ARM, but on the other platforms as well. And then we will just replace that code with yours, i.e. remove |
|
The changes have been rewritten to use just HAL intrinsics, any feedback would be appreciated. |
fpetrogalli
left a comment
There was a problem hiding this comment.
LGTM, with a nit.
Final word on the maintainers, of course!
Thank you, Francesco
Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>
|
@jondea, thank you very much! I tested the code both on Mac-Intel and Mac-ARM (M1), it works well, the claimed acceleration is achieved. On Intel it's no slower than the previous version, but, unfortunately, it's 128-bit only. In any case, it can be merged as-is, and later we can modify this code to use some new variations of 👍 |
@vpisarev , @jondea is working on an equivalent version that uses SVE2 intrinsics. He is using the intrinsic |
A Neon specific implementation of RGB2Lab increases single threaded performance by ~25%, here's the numbers run on aws c6gd.4xlarge with gcc 9.3 (numbers are similar using gcc 10)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.