GAPI Fluid: Resize Linear U8C3 - reworking horizontal pass.#21144

Merged
alalek merged 2 commits into opencv:4.x from anna-khakimova:ak/resize_simd_v2
Jan 14, 2022
Member

@anna-khakimova anna-khakimova commented Nov 28, 2021

  • Reworked the Resize Linear U8C3 horizontal pass. The previous version handled 16 pixels per loop iteration; the new version handles 5 pixels per iteration.

  • Fix for valgrind issue

  • Enabling SSE41 SIMD Resize U8C3

  • Valgrind run ( ✖ failed due to GAPI_Streaming_Desync.Python_Pull_Overload hang/timeout - reproducible on weekly builds)
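For readers unfamiliar with what a "horizontal pass" computes, here is a minimal scalar sketch of linear resizing along one row of interleaved 3-channel 8-bit pixels. It is not the Fluid kernel: the function name `resize_row_linear_u8c3`, the pixel-center coordinate mapping, and the floating-point rounding are all assumptions made for illustration, and the real SIMD code may differ in each of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference (not taken from the OpenCV sources):
// resizes one row of interleaved 3-channel u8 pixels from in_w to out_w
// pixels using linear interpolation between the two nearest source pixels.
std::vector<std::uint8_t> resize_row_linear_u8c3(const std::vector<std::uint8_t>& src,
                                                 int in_w, int out_w)
{
    constexpr int chan = 3;
    assert(in_w > 0 && out_w > 0);
    assert(src.size() == static_cast<std::size_t>(in_w) * chan);

    std::vector<std::uint8_t> dst(static_cast<std::size_t>(out_w) * chan);
    const float scale = static_cast<float>(in_w) / static_cast<float>(out_w);

    for (int x = 0; x < out_w; ++x) {
        // Assumed mapping: align pixel centers, clamp at the left border.
        float fx = std::max((x + 0.5f) * scale - 0.5f, 0.0f);
        const int x0 = std::min(static_cast<int>(fx), in_w - 1);
        const int x1 = std::min(x0 + 1, in_w - 1);  // clamp at the right border
        const float alpha = fx - static_cast<float>(x0);

        for (int c = 0; c < chan; ++c) {
            const float v = src[x0 * chan + c] * (1.0f - alpha)
                          + src[x1 * chan + c] * alpha;
            dst[x * chan + c] = static_cast<std::uint8_t>(std::lround(v));
        }
    }
    return dst;
}
```

With a two-pixel row `{0, 0, 0, 100, 100, 100}`, resizing to width 2 returns the row unchanged, and resizing to width 1 yields the midpoint value in every channel. A scalar reference like this is mainly useful as a baseline when checking a vectorized version.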

Performance report:

Resize8uc3NewHorizontal.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

Contributor

@sivanov-work sivanov-work left a comment

I'm sorry, it looks like I'm not the right person to review intrinsics code in such depth.
@anna-khakimova, you had better add a more qualified reviewer than me.

I expected to find differences in the buffer tail processing conditions, but there seem to be many more modifications here. Sorry again, I'm not qualified enough to review such heavy SSE code.

bool yRatioEq = inSz.height == outSz.height;
constexpr int nlanes = 16;
constexpr int half_nlanes = 16 / 2;
constexpr int nlanes = 16; // number of 8-bit integers that fit into a 128-bit SIMD vector.
Contributor

I suggest making the code self-documenting (maybe in a future change), for example:

struct sse_traits {
static constexpr int instruction_size = 128; // register width in bits
static constexpr int lanes_count = instruction_size / 8; // number of 8-bit lanes
...
};

and reusing them.
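The traits idea suggested in this comment could look roughly like the sketch below. The member names `register_bits` and `lanes()` are hypothetical and not taken from the OpenCV sources; the only facts relied on are that an SSE register is 128 bits wide and that the diff defines `nlanes = 16` and `half_nlanes = 16 / 2`.

```cpp
#include <cstdint>

// Hypothetical traits type illustrating the reviewer's suggestion;
// the names are illustrative and do not exist in OpenCV.
struct sse_traits {
    // Width of one SSE register (__m128i) in bits.
    static constexpr int register_bits = 128;

    // Number of elements of type T that fit into one register.
    template <typename T>
    static constexpr int lanes() {
        return register_bits / (8 * static_cast<int>(sizeof(T)));
    }
};

// The constants from the diff, derived instead of hard-coded.
constexpr int nlanes = sse_traits::lanes<std::uint8_t>();  // 8-bit lanes
constexpr int half_nlanes = nlanes / 2;

static_assert(nlanes == 16, "sixteen 8-bit lanes fit into 128 bits");
static_assert(half_nlanes == 8, "half of the 8-bit lane count");
```

Deriving the constants this way keeps the literal `16` out of the kernel body, and the same struct can answer the lane count for other element types, e.g. `sse_traits::lanes<std::int16_t>()`.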


for (int x = 0; outSz.width >= nlanes; )
__m128i horizontal_shuf_mask1 = _mm_setr_epi8(0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 3, 7, 11, 15);
constexpr int nproc_pixels = 5;
Contributor

@sivanov-work sivanov-work Jan 14, 2022

What is nproc_pixels: is it "non_proc" or "number of processed", and how was 5 obtained?

UPDATE: discussed offline.
But I still think that processing_pixel_number looks better.
5 = 128 / 24 (rounded down), where 24 is the 3 RGB channels * 8 bits. Could you please put that into a comment?
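The arithmetic behind the value 5 can be written down as compile-time constants, following the reviewer's explanation above. This is only a sketch: apart from `nproc_pixels`, which appears in the diff, the constant names are assumptions introduced here.

```cpp
// Sketch of how 5 follows from the register and pixel sizes.
// All names except nproc_pixels are hypothetical.
constexpr int register_bits    = 128;  // one SSE register
constexpr int channels         = 3;    // interleaved R, G, B
constexpr int bits_per_channel = 8;    // U8 data
constexpr int bits_per_pixel   = channels * bits_per_channel;  // 24

// Whole pixels that fit into one register (integer division rounds down).
constexpr int nproc_pixels = register_bits / bits_per_pixel;

// Bits of the register left over after packing whole pixels.
constexpr int unused_bits = register_bits - nproc_pixels * bits_per_pixel;

static_assert(nproc_pixels == 5, "128 / 24 rounds down to 5");
static_assert(unused_bits == 8, "one byte of the register is not used by whole pixels");
```

Five 3-byte pixels occupy 15 of the 16 bytes in a register, so one byte per register carries no complete pixel.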

@alalek alalek merged commit 60228d3 into opencv:4.x Jan 14, 2022
@alalek alalek mentioned this pull request Feb 22, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
GAPI Fluid: Resize Linear U8C3 - reworking horizontal pass.

* Reworked horizontal pass

* Fixed valgrind issue and removed unnesesary snippet