GAPI Fluid: SIMD AVX2 Resize F32C1. #21728

Merged
alalek merged 6 commits into opencv:4.x from anna-khakimova:ak/resize_f32c1_avx_simd on Mar 25, 2022

@anna-khakimova
Member

@anna-khakimova anna-khakimova commented Mar 16, 2022

SIMD AVX2 for Resize F32C1.

Performance:
ResizeF32C1_SIMD_AVX.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due to high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

@anna-khakimova anna-khakimova added this to the 4.6.0 milestone Mar 16, 2022
@anna-khakimova anna-khakimova force-pushed the ak/resize_f32c1_avx_simd branch from 009e3bb to c1938d8 on March 16, 2022 11:04
@anna-khakimova anna-khakimova changed the title from "GAPI Fluid: SIMD Resize F32C1." to "GAPI Fluid: SIMD AVX2 Resize F32C1." on Mar 16, 2022
@anna-khakimova anna-khakimova force-pushed the ak/resize_f32c1_avx_simd branch from c1938d8 to c0e49b5 on March 21, 2022 09:50
@anna-khakimova anna-khakimova force-pushed the ak/resize_f32c1_avx_simd branch from c0e49b5 to 8f920d4 on March 21, 2022 10:12
Contributor

@sivanov-work sivanov-work left a comment

I think the last loop can be replaced with a single memcpy.


#if defined __GNUC__
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wstrict-overflow"
Contributor

Why do we need to mask this warning here? Does it hide some sort of error?

Member Author

This is legacy code carried over from OpenVINO. The warning suppression was added there without my involvement.

Contributor

I would suggest removing it and trying a build with the OpenCV CI.

Member Author

@anna-khakimova anna-khakimova Mar 23, 2022

Applied.
The CI checks passed.

{
GAPI_DbgAssert(xRatioEq1 && yRatioEq1);
int length = inSz.width; // == outSz.width
for (int line = 0; line < lpi; ++line)
Contributor

The memory is still allocated contiguously (I believe). Is it possible to copy the whole matrix block (which is linear data in memory) with a single memcpy call instead of spending CPU cycles on loop iterations?

Member Author

Applied.
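
For readers following the memcpy suggestion, here is a minimal sketch of the idea, not the code that was merged. It assumes the rows are handed over as arrays of row pointers and checks whether they really sit back to back in memory before collapsing the per-row loop into one call. The helper name `copyRows` and its parameters are hypothetical and do not exist in OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of OpenCV): copies `lines` rows of `length`
// floats from src[] to dst[]. Uses one memcpy when both row sets are laid out
// back to back in memory, otherwise falls back to one memcpy per row.
// Returns true when the single-call path was taken.
inline bool copyRows(float* const dst[], const float* const src[],
                     int lines, int length)
{
    assert(lines > 0 && length >= 0);

    bool contiguous = true;
    for (int l = 1; l < lines; ++l)
    {
        // Row l must start exactly where row l-1 ends, in both buffers.
        if (src[l] != src[l - 1] + length || dst[l] != dst[l - 1] + length)
        {
            contiguous = false;
            break;
        }
    }

    const std::size_t rowBytes =
        static_cast<std::size_t>(length) * sizeof(float);

    if (contiguous)
    {
        // Whole block in one call.
        std::memcpy(dst[0], src[0], rowBytes * static_cast<std::size_t>(lines));
        return true;
    }

    // Assumption did not hold: copy row by row.
    for (int l = 0; l < lines; ++l)
        std::memcpy(dst[l], src[l], rowBytes);
    return false;
}
```

The contiguity check costs a few pointer comparisons per call, which is small next to the copy itself. Whether the real Fluid buffers guarantee contiguous rows is not stated in this thread, so the fallback keeps the sketch safe either way.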

Contributor

@sivanov-work sivanov-work left a comment

LGTM (once the tests pass)

if (inSz.width >= nlanes && outSz.width >= nlanes)
{
avx2::calcRowLinear32FC1Impl(reinterpret_cast<float**>(dst),
reinterpret_cast<const float**>(src0),
Contributor

A reinterpret_cast is normally a red flag. Can it be avoided? Could avx2::calcRowLinear32FC1Impl accept a float*[] type instead?

Member Author

Ok. Removed.
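
As an illustration of the point about typed parameters, the sketch below shows a row kernel that takes `float*` row arrays directly, so the call site needs no `reinterpret_cast`. Everything here is an assumption made for illustration: `blendRowsF32` and `useVectorPath` are hypothetical names, the signature is not the real one of `avx2::calcRowLinear32FC1Impl`, and the body is a plain scalar vertical blend rather than the AVX2 code from the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for a row kernel; NOT the real signature of
// avx2::calcRowLinear32FC1Impl. Taking typed row-pointer arrays lets callers
// pass their float buffers directly, with no reinterpret_cast at the call site.
inline void blendRowsF32(float* dst[], const float* src0[], const float* src1[],
                         const float beta[], int width, int lines)
{
    for (int l = 0; l < lines; ++l)
    {
        const float b0 = beta[l];        // weight of the upper source row
        const float b1 = 1.0f - b0;      // weight of the lower source row
        for (int x = 0; x < width; ++x)
            dst[l][x] = b0 * src0[l][x] + b1 * src1[l][x];
    }
}

// Hypothetical predicate mirroring the width guard quoted above: rows narrower
// than one vector register (nlanes floats) are left to the scalar path.
inline bool useVectorPath(int inWidth, int outWidth, int nlanes)
{
    return inWidth >= nlanes && outWidth >= nlanes;
}
```

With 32-bit floats a 256-bit AVX2 register holds 8 lanes, so under that assumption `useVectorPath(16, 8, 8)` selects the vector path and `useVectorPath(4, 16, 8)` does not.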

@anna-khakimova
Member Author

@alalek Could you please take a look at this PR and merge it if it looks good to you?

@alalek alalek merged commit e5bdab0 into opencv:4.x Mar 25, 2022
@opencv-pushbot opencv-pushbot mentioned this pull request Apr 23, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
…vx_simd

GAPI Fluid: SIMD AVX2 Resize F32C1.

* GAPI Fluid: Resize F32C1 scalar.

* Final version

* GAPI Fluid: SIMD AVX2 for Resize F32C1.

* Applied comments.

* Deleted warning suppression.

* Applied comments.