Add Neon optimised RGB2Lab conversion by jondea · Pull Request #19883 · opencv/opencv

jondea · 2021-04-09T09:51:38Z

A Neon-specific implementation of RGB2Lab increases single-threaded performance by ~25%. Here are the numbers, run on an AWS c6gd.4xlarge with gcc 9.3 (numbers are similar with gcc 10):

Test set	Test number	After/before ratio	Speedup with 1/million bounds [%]
cvtColor8u	8	0.76835	23.2 ± 0.2
cvtColor8u	34	0.76204	23.8 ± 0.2
cvtColor8u	67	0.76667	23.3 ± 0.2
cvtColor8u	69	0.76773	23.2 ± 0.2
cvtColor8u	71	0.76231	23.8 ± 0.2
cvtColor8u	73	0.76184	23.8 ± 0.2
cvtColor8u	90	0.76851	23.1 ± 0.2
cvtColor8u	103	0.76143	23.9 ± 0.2
cvtColor8u	128	0.73870	26.1 ± 0.1
cvtColor8u	154	0.73760	26.2 ± 0.2
cvtColor8u	187	0.73891	26.1 ± 0.1
cvtColor8u	189	0.73889	26.1 ± 0.1
cvtColor8u	191	0.73802	26.2 ± 0.2
cvtColor8u	193	0.73817	26.2 ± 0.2
cvtColor8u	210	0.73879	26.1 ± 0.1
cvtColor8u	223	0.73745	26.3 ± 0.2
cvtColor8u	248	0.73756	26.2 ± 0.1
cvtColor8u	274	0.73613	26.4 ± 0.1
cvtColor8u	307	0.73768	26.2 ± 0.1
cvtColor8u	309	0.73767	26.2 ± 0.1
cvtColor8u	311	0.73676	26.3 ± 0.2
cvtColor8u	313	0.73672	26.3 ± 0.2
cvtColor8u	330	0.73748	26.3 ± 0.1
cvtColor8u	343	0.73591	26.4 ± 0.1
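
For readers checking the table: the speedup column appears to be derived from the after/before ratio as (1 - ratio) × 100, rounded to one decimal place. That relationship is an assumption inferred from the rows above, not something stated in the PR. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Percentage speedup implied by an after/before runtime ratio.
// Assumption: speedup [%] = (1 - ratio) * 100, which matches the table rows.
double speedup_percent(double after_before_ratio)
{
    assert(after_before_ratio > 0.0);
    return (1.0 - after_before_ratio) * 100.0;
}

// Round to one decimal place, as the table appears to do.
double round1(double x)
{
    return std::round(x * 10.0) / 10.0;
}
```

For example, a ratio of 0.76835 gives 23.165, which rounds to the 23.2 shown for test 8.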

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=linux,docs,ARMv8,ARMv7

alalek · 2021-04-09T10:00:04Z

Marking this RFC, because it doesn't follow OpenCV guidelines to avoid using raw native intrinsics in OpenCV modules.

jondea · 2021-04-09T13:38:12Z

Hi @alalek, thank you for looking into this. We (me and @fpetrogalli) submitted it like this because we weren't sure what the correct approach was. Also, sorry if this is a silly question, but what does it mean to mark it as RFC?

One possible solution to the raw intrinsics is to keep the #if CV_NEON block in color_lab.cpp but rewrite it using the HAL intrinsics. Another solution would be to split it into a Neon-specific file, as is done with resize.cpp, resize.avx2.cpp and resize.sse4_1.cpp, for example. Is either of these acceptable or preferable? Or is there another way which would achieve the same goal?
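
To make the trade-off concrete, here is a minimal sketch of the idea behind a portable intrinsics layer: the kernel is written once against a small wrapper, and only the wrapper knows about the native instructions. Everything named here (vec4f, vec4f_load, vec4f_store, vec4f_muladd, scale_offset) is hypothetical and is not OpenCV's HAL API; it only illustrates the pattern, using Neon when __ARM_NEON is defined and a scalar fallback otherwise.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical 4-lane float vector (not an OpenCV type).
struct vec4f
{
#if defined(__ARM_NEON)
    float32x4_t v;
#else
    float v[4];
#endif
};

inline vec4f vec4f_load(const float* p)
{
    vec4f r;
#if defined(__ARM_NEON)
    r.v = vld1q_f32(p);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
#endif
    return r;
}

inline void vec4f_store(float* p, const vec4f& a)
{
#if defined(__ARM_NEON)
    vst1q_f32(p, a.v);
#else
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
#endif
}

// Per-lane a * b + c.
inline vec4f vec4f_muladd(const vec4f& a, const vec4f& b, const vec4f& c)
{
    vec4f r;
#if defined(__ARM_NEON)
    r.v = vmlaq_f32(c.v, a.v, b.v); // c + a * b
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i] + c.v[i];
#endif
    return r;
}

// Kernel written once against the wrapper: dst[i] = src[i] * scale + offset.
void scale_offset(const float* src, float* dst, std::size_t n,
                  float scale, float offset)
{
    assert(src != nullptr && dst != nullptr);
    const float scale4[4] = {scale, scale, scale, scale};
    const float offset4[4] = {offset, offset, offset, offset};
    const vec4f vs = vec4f_load(scale4);
    const vec4f vo = vec4f_load(offset4);

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) // full vectors
        vec4f_store(dst + i, vec4f_muladd(vec4f_load(src + i), vs, vo));
    for (; i < n; ++i)         // scalar tail
        dst[i] = src[i] * scale + offset;
}
```

The kernel body contains no platform-specific code, which is the property the maintainers' guideline is after; supporting another instruction set would mean extending only the wrapper.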

vpisarev · 2021-04-15T12:37:12Z

@jondea, thank you for the contribution!
As @alalek said, for the tiny OpenCV core team it's simply infeasible to maintain separate code branches for the growing amount of code and the growing number of platforms that we support. With time we hope to port most of the remaining native branches to HAL/universal intrinsics. There will be some exceptions, like deep learning, where the number of critical kernels is not that big and where we can afford separate branches, but overall universal intrinsics are the preferable (by far) option.

I'd start with the first option that you suggested - keep the separate branch under CV_NEON, but rewrite it using HAL intrinsics. I briefly looked at the current implementation and I found it too bulky for the equivalent C code that it accelerates. So, I'm 60-80% sure that the HAL code that you write will be faster than the existing implementation not just on ARM, but on the other platforms as well. And then we will just replace that code with yours, i.e. remove #if CV_NEON ... #endif around your code and remove the other branch.

jondea · 2021-04-27T14:35:58Z

The changes have been rewritten to use just HAL intrinsics. Any feedback would be appreciated.

fpetrogalli

Hi @jondea,

just a couple of minor observations.

Thank you for your work.

Francesco

modules/imgproc/src/color_lab.cpp

fpetrogalli

LGTM, with a nit.

The final word rests with the maintainers, of course!

Thank you, Francesco

modules/imgproc/src/color_lab.cpp

Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>

jondea · 2021-05-10T15:13:16Z

@vpisarev @alalek is there anything else which needs to be done before this can be merged?

jondea · 2021-05-19T08:53:20Z

Hi @vpisarev @alalek would you be able to take another look at this please and let me know if it can be merged?

vpisarev · 2021-05-25T05:45:03Z

@jondea, thank you very much! I tested the code both on Mac-Intel and Mac-ARM (M1), it works well, the claimed acceleration is achieved. On Intel it's no slower than the previous version, but, unfortunately, it's 128-bit only.

In any case, it can be merged as-is, and later we can modify this code to use some new variations of the v_lut() intrinsic.
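
For context, a table-lookup intrinsic such as v_lut() fills each vector lane with the table element selected by that lane's index. The following is a scalar model of that behaviour for four lanes, assuming 16-bit table entries; lut_gather4 is a hypothetical helper for illustration, not OpenCV's implementation or signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 4-lane lookup-table gather: lane i receives table[idx[i]].
// Hypothetical helper, shown only to illustrate what a LUT intrinsic computes.
std::array<std::int16_t, 4> lut_gather4(const std::int16_t* table,
                                        std::size_t table_size,
                                        const std::array<int, 4>& idx)
{
    std::array<std::int16_t, 4> out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane)
    {
        const int k = idx[lane];
        // Indices must stay inside the table; a hardware gather does not check this.
        assert(k >= 0 && static_cast<std::size_t>(k) < table_size);
        out[lane] = table[k];
    }
    return out;
}
```

The fixed width of four lanes in this model corresponds to the "128-bit only" limitation mentioned above for 32-bit indices.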

👍

modules/imgproc/src/color_lab.cpp

fpetrogalli · 2021-05-25T09:52:30Z

> @jondea, thank you very much! I tested the code both on Mac-Intel and Mac-ARM (M1), it works well, the claimed acceleration is achieved. On Intel it's no slower than the previous version, but, unfortunately, it's 128-bit only.

@vpisarev, @jondea is working on an equivalent version that uses SVE2 intrinsics. He is using the intrinsic svld1uh_gather_s32index_s32 for the variation of v_lut() that does a gather from the indexes. It is a Vector Length Agnostic (VLA) version, so it could be ported easily to HAL once the nlanes field becomes a runtime value and we have the corresponding indexed lut intrinsic.
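
As a rough illustration of what "vector length agnostic" means here, the sketch below models a gather loop whose lane count is only known at run time. It is plain scalar C++, not SVE code: lut_gather_vla and its parameters are hypothetical, and the inner loop stands in for a single predicated gather that zero-extends 16-bit table entries to 32 bits, which is my understanding of what a gather like svld1uh_gather_s32index_s32 does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a vector-length-agnostic gather loop (hypothetical helper).
// nlanes is a runtime value, so the same code is correct for any vector width.
void lut_gather_vla(const std::uint16_t* table, std::size_t table_size,
                    const std::int32_t* idx, std::int32_t* out,
                    std::size_t n, std::size_t nlanes)
{
    assert(nlanes > 0);
    for (std::size_t i = 0; i < n; i += nlanes)
    {
        // Number of active lanes in this iteration; models a loop predicate,
        // so no separate scalar tail is needed.
        const std::size_t active = std::min(nlanes, n - i);
        for (std::size_t lane = 0; lane < active; ++lane)
        {
            const std::int32_t k = idx[i + lane];
            assert(k >= 0 && static_cast<std::size_t>(k) < table_size);
            // Zero-extend the 16-bit table entry to 32 bits.
            out[i + lane] = static_cast<std::int32_t>(table[k]);
        }
    }
}
```

Calling it with different nlanes values produces identical output, which is the property that would let a HAL version run unchanged on hardware with different vector lengths.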

Add Neon optimised RGB2Lab conversion

8c25876

Fix compile errors, change lambda to macro

00e21e4

jondea added 2 commits April 23, 2021 09:01

Change NEON optimised RGB2Lab to just use HAL

df4ed8c

Change [] to v_extract_n in RGB2Lab

c73eae8

fpetrogalli reviewed Apr 28, 2021

RGB2LAB Code quality, change to nlane agnostic

fe8a1bd

fpetrogalli reviewed Apr 29, 2021

Change RGB2Lab to use function rather than macro

422583a

fpetrogalli approved these changes May 4, 2021

Remove whitespace

1da33dc

Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>

vpisarev self-assigned this May 24, 2021

vpisarev self-requested a review May 25, 2021 05:45

vpisarev approved these changes May 25, 2021

asmorkalov requested changes May 25, 2021

vpisarev requested a review from asmorkalov May 28, 2021 04:28

asmorkalov requested review from asmorkalov and removed request for asmorkalov May 28, 2021 07:24

alalek merged commit 8ecfbdb into opencv:3.4 May 28, 2021

This was referenced May 29, 2021

(4.x) Merge 3.4 #20180

Merged

(5.x) Merge 4.x #20216

Merged
