Improve resize 8u3 by pmur · Pull Request #16146 · opencv/opencv

pmur · 2019-12-12T20:51:18Z

I wasn't impressed with my workaround for the OOB fix. Let's fix it with little or no regressing of performance.

This fixes #16137 with little or no performance impact. I have
also added a debug check as a sanity check.

alalek · 2019-12-12T23:55:07Z

modules/imgproc/src/resize.cpp

                int *D1 = dst[k+1];

-                for( dx = 0; dx < len0; dx += cn )
+                for( dx = 0; (xofs[dx] + cn) < smax; dx += cn )


cn

As we reading 4 bytes (v_load_expand_q()), so I believe we should count these + 4 somewhere.

(in meaning I would like to see " + 4" / " - 4" in the code directly)

This loop is predicated on single units of the input matrix (cn), not vector unit size. Would you prefer a variable similar to int step = (vuint32x4::nlanes/cn)*cn; to replace the direct usage of cn? The 8u3 case can easily port to larger vector sizes (the 75% efficiency may show more).

This note is not about the "step".
It is about the limit, like xofs[dx] + cn + 4 < some_buffer_limit

8u3 case can easily port to larger vector sizes

Make sense, there are several algorithm implementations which process 4 pixels (4*3 = 12 bytes) per iteration. Image resolutions are usually divided by 4 (640x480, 1280x720, etc), so they don't require heavy tail processing.

Any unit not a multiple of cn doesn't make sense here. Adding + 4 only guarantees 1 more operation which can safely vectorize does not get vectorized. The only thing the vector loop must satisfy is that it never touches the final cn bytes of src or dst. It strides by 3B.

v_load_expand_q() reads 4 bytes, so we must check that all these 4 bytes are fit into the buffer range limits.

We must check that all 4 bytes valid.
Through explicit check (with 4 bytes, not through "obfuscated" conditions). Make it clear for review.
We already have that mistake, so let avoid similar mistakes in the future.

Checking a non-modulo 3 value against such a loop counter is at best accidental correctness, and at worst subtly wrong. (e.g failing to account for src[] offset as was the original bug, which was a different bug altogether). The loop touches 1 + 1/3 8c3 values per iteration, but increments by 1.

As far as I understand fourth value will be OOB on writing when dx >= (dwidth - cn) and will be OOB on reading when (xofs[dx] + cn) >= (swidth -cn).
So the condition could be (xofs[dx] + 2*cn) < smax or (xofs[dx] + cn + 4) <= smax

That isn't wrong, but will overlook safe vector ops. It ignores how smax is selected. smax >= (swidth - cn) can only be met if dx >= dmax. This can never be true since xofs[] is by design an array of increasingly large offsets.

Sorry for mistake in the previous message. I've been confused by the change of smax meaning.
So due to the way smax is defined we could state that smax < swidth - cn therefore (xofs[dx] + cn) < smax check ensures that xofs[dx] + cn < swidth - cn or that xofs[dx] + cn <= swidth - cn - 1 equal to xofs[dx] + cn <= swidth - 4
Both statements ensure safe reading of 4 values starting from xofs[dx] + cn however the latter clearly states that it safe for reading of strictly 4 values while the former rather looks like safe reading of a pixel triplet.
Direct mention of 4 values(or even better of v_int32x4::nlanes) is important to avoid introduction of OOB mistakes when the code will be migrated to wide universal intrinsics.

Actually, we can do this in constant time. xofs always contains same or increasing offset values. We can instead find the most extreme value used and never attempt to load it. Similarly, we can note for all dx >= 0 and dx < (dwidth - cn) where xofs[dx] + cn < xofs[dwidth-cn] implies dx < (dwidth - cn). Thus, we can use this to control our loop termination optimally. This fixes opencv#16137 with little or no performance impact. I have also added a debug check as a sanity check.

asmorkalov · 2020-01-17T14:21:46Z

@terfendail @alalek Any comments on latest version?

terfendail

There still is a readability issue that could affect porting to wide universal intrinsics. However this version is better and cleaner than previous so I think it could be merged.

alalek reviewed Dec 12, 2019

View reviewed changes

pmur force-pushed the reg_16137x2 branch from eefa513 to c1cdb24 Compare January 3, 2020 20:48

terfendail approved these changes Jan 22, 2020

View reviewed changes

alalek assigned terfendail Jan 22, 2020

opencv-pushbot pushed a commit that referenced this pull request Jan 22, 2020

Merge pull request #16146 from pmur:reg_16137x2

881cee4

opencv-pushbot merged commit c1cdb24 into opencv:3.4 Jan 22, 2020

alalek mentioned this pull request Jan 22, 2020

Merge 3.4 #16406

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve resize 8u3#16146

Improve resize 8u3#16146
opencv-pushbot merged 1 commit intoopencv:3.4from
pmur:reg_16137x2

pmur commented Dec 12, 2019

Uh oh!

alalek Dec 12, 2019 •

edited

Loading

Uh oh!

pmur Dec 13, 2019

Uh oh!

alalek Dec 13, 2019 •

edited

Loading

Uh oh!

pmur Dec 13, 2019

Uh oh!

alalek Dec 13, 2019

Uh oh!

alalek Dec 16, 2019

Uh oh!

pmur Dec 16, 2019

Uh oh!

terfendail Dec 17, 2019

Uh oh!

pmur Jan 3, 2020

Uh oh!

terfendail Jan 15, 2020 •

edited

Loading

Uh oh!

asmorkalov commented Jan 17, 2020

Uh oh!

terfendail left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

pmur commented Dec 12, 2019

Uh oh!

alalek Dec 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmur Dec 13, 2019

Choose a reason for hiding this comment

Uh oh!

alalek Dec 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmur Dec 13, 2019

Choose a reason for hiding this comment

Uh oh!

alalek Dec 13, 2019

Choose a reason for hiding this comment

Uh oh!

alalek Dec 16, 2019

Choose a reason for hiding this comment

Uh oh!

pmur Dec 16, 2019

Choose a reason for hiding this comment

Uh oh!

terfendail Dec 17, 2019

Choose a reason for hiding this comment

Uh oh!

pmur Jan 3, 2020

Choose a reason for hiding this comment

Uh oh!

terfendail Jan 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asmorkalov commented Jan 17, 2020

Uh oh!

terfendail left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

alalek Dec 12, 2019 •

edited

Loading

alalek Dec 13, 2019 •

edited

Loading

terfendail Jan 15, 2020 •

edited

Loading