GAPI Fluid: Resize Linear U8C3 - reworking horizontal pass. #21144
alalek merged 2 commits into opencv:4.x
6bf90aa to 000588b
000588b to 52f38b6
sivanov-work left a comment
I'm sorry, it looks like I'm not the right person to review intrinsics code this deeply.
@anna-khakimova, you had better add a more qualified reviewer than me.
I expected to find differences in the buffer tail processing conditions, but there seem to be many more modifications here. Sorry again, I'm not qualified enough to review such heavy SSE code.
bool yRatioEq = inSz.height == outSz.height;
constexpr int nlanes = 16;
constexpr int half_nlanes = 16 / 2;
constexpr int nlanes = 16; // number of 8-bit integers that fit into a 128-bit SIMD vector.
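I suggest making the code self-documenting (maybe for future use), for example:
struct sse_traits {
    static constexpr int instruction_size = 128;              // SIMD register width in bits
    static constexpr int lanes_count = instruction_size / 8;  // number of 8-bit lanes
    ...
};
and reusing these constants.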
for (int x = 0; outSz.width >= nlanes; )
__m128i horizontal_shuf_mask1 = _mm_setr_epi8(0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 3, 7, 11, 15);
constexpr int nproc_pixels = 5;
What is nproc_pixels: does it mean "non_proc" or "number of processed", and how was 5 obtained?
UPDATE: discussed offline.
But I still think that processing_pixel_number looks better.
5 = floor(128 / 24), where 24 = 3 channels (RGB) * 8 bits. Could you please put that into a comment?
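The arithmetic behind that constant can be written out so the 5 is derived rather than hard-coded. The following is only a sketch: `sse_traits` echoes the reviewer's suggestion, and the member names and the helpers `pixels_per_register` and `spare_bytes` are hypothetical, not identifiers from the PR.

```cpp
#include <cassert>

// Hypothetical traits struct in the spirit of the review suggestion (not from the PR).
struct sse_traits {
    static constexpr int register_bits = 128;  // width of one SSE register
    static constexpr int lane_bits = 8;        // one U8 channel value
    static constexpr int lanes_count = register_bits / lane_bits;  // 16 byte lanes
};

// Number of whole pixels with `channels` 8-bit channels that fit into one register.
inline int pixels_per_register(int channels) {
    assert(channels > 0);
    // Integer division drops the fractional pixel: 128 / 24 gives 5 for 3 channels.
    return sse_traits::register_bits / (channels * sse_traits::lane_bits);
}

// Byte lanes of the register left over after packing whole pixels.
inline int spare_bytes(int channels) {
    return sse_traits::lanes_count - pixels_per_register(channels) * channels;
}
```

For U8C3, `pixels_per_register(3)` evaluates to 5 and `spare_bytes(3)` to 1, since five 3-byte pixels occupy 15 of the 16 byte lanes.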
GAPI Fluid: Resize Linear U8C3 - reworking horizontal pass.
* Reworked horizontal pass
* Fixed valgrind issue and removed unnecessary snippet
Reworking the Resize Linear U8C3 horizontal pass. The previous version handled 16 pixels per loop iteration; the new version handles 5 pixels per iteration.
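With a fixed step of 5 pixels, row widths that are not a multiple of 5 need some form of tail handling. The PR's loop body is not shown here, so the following is only a sketch of one common pattern for fixed-width SIMD loops, in which the last chunk is shifted back to end exactly at the row boundary and a few pixels are recomputed. The function name `chunk_starts` and its parameters are hypothetical, and this is not claimed to be what the PR does.

```cpp
#include <cassert>
#include <vector>

// Returns the starting pixel index of every chunk when a row of `width` pixels
// is processed `step` pixels at a time. If `width` is not a multiple of `step`,
// the final chunk starts at `width - step`, overlapping the previous chunk
// instead of running past the end of the row.
inline std::vector<int> chunk_starts(int width, int step) {
    assert(step > 0 && width >= step);
    std::vector<int> starts;
    int x = 0;
    for (; x + step <= width; x += step) {
        starts.push_back(x);           // full chunk that fits inside the row
    }
    if (x < width) {
        starts.push_back(width - step); // overlapping tail chunk
    }
    return starts;
}
```

For example, a 12-pixel row with a step of 5 yields chunks starting at 0, 5 and 7; the last chunk recomputes pixels 7 to 9 and stays inside the row.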
Fix for valgrind issue
Enabling SSE41 SIMD Resize U8C3
Valgrind run (✖ failed due to GAPI_Streaming_Desync.Python_Pull_Overload hang/timeout - reproducible on weekly builds)
Performance report: Resize8uc3NewHorizontal.xlsx