Skip to content

GAPI Fluid: SIMD for AddC kernel.#21119

Merged
alalek merged 3 commits intoopencv:4.xfrom
anna-khakimova:ak/simd_addc
Nov 29, 2021
Merged

GAPI Fluid: SIMD for AddC kernel.#21119
alalek merged 3 commits intoopencv:4.xfrom
anna-khakimova:ak/simd_addc

Conversation

@anna-khakimova
Copy link
Copy Markdown
Member

@anna-khakimova anna-khakimova commented Nov 25, 2021

SIMD for GAPI Fluid AddC kernel.

Performance report:

FluidAddCSIMD.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

Copy link
Copy Markdown
Contributor

@sivanov-work sivanov-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in overall LGTM.
Could you please clarify about: Why 1, 2 & 4 falls into the same case and Warning?Exception?

static void initScratch(const GMatDesc&, const GScalarDesc&, int, Buffer& scratch)
{
#if CV_SIMD
// Max value of v_float32::nlanes = 16 in AVX512 ISA
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to launch this library compiled with CV_SIMD on a machine without SIMD support?
I mean: do we need runtime check on supporting SIMD on executable machine ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#if CV_SIMD is the compilation macro. When CV_SIMD is false this code is truncated in compilation time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is different.
There are BUILD machine and EXECUTION machine. The build machine may turn on SIMD or other instructions sets through MACRO. In this case the appropriate code burn out binary (#if CV_SIMD) and we will have buflen = 16.

Let's assume that OPENCV was built on machine with CV_SIMD (code is exist) but executes on machine with CPU which doesn't support SIMD (code exist but cannot be executed) - so we still have 16 here, but actual earlier dispatching work out in assumption that there are no SIMD support and expect that it should have 4 due to CPU doesn't support SIMD

To resolve this situation we usually use TWO kind of check

#if EXTENSION_EXIST   // compile code ON
   if( extension_supported()) // test is supported
   {
      //      supported
   }
   else 
   {
         // compiled, but not supported
   }
#else
    // not compiled, but not supported 

Is this situation applicable here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, no errors or problems will arise. The buffer will be larger, but it will not cause any unwanted side effects.

Comment on lines +1074 to +1081
constexpr int nlanes = v_float32::nlanes;
constexpr int chan = 3;
constexpr int lanes = chan * nlanes;

int x = 0;
for (;;)
{
for (; x <= length - lanes; x += lanes)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this copy-paste part with lanes calculation & internal for-loop could be incorporated in initial dispatcher addc_simd_c3(const SRC in[], const float scalar[], DST out[], const int length) then
this *_impl would just preform intrisict calculation only without this dupliacted pre- and post- operations

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

case 3: \
return addc_simd_c3(in, scalar, out, length); \
default: \
break; \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning/Exception?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

Copy link
Copy Markdown
Contributor

@sivanov-work sivanov-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@rgarnov rgarnov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@anna-khakimova
Copy link
Copy Markdown
Member Author

@alalek Could you please merge the PR?

@alalek alalek merged commit d58b5ef into opencv:4.x Nov 29, 2021
@alalek alalek mentioned this pull request Dec 30, 2021
@alalek alalek mentioned this pull request Feb 22, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
* GAPI Fluid: SIMD for AddC kernel

* Final version

* Applied comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants