Skip to content

GAPI Fluid: SIMD for SubC kernel.#21158

Merged
alalek merged 2 commits intoopencv:4.xfrom
anna-khakimova:ak/simd_subC
Dec 1, 2021
Merged

GAPI Fluid: SIMD for SubC kernel.#21158
alalek merged 2 commits intoopencv:4.xfrom
anna-khakimova:ak/simd_subC

Conversation

@anna-khakimova
Copy link
Copy Markdown
Member

@anna-khakimova anna-khakimova commented Nov 30, 2021

SIMD for GAPI Fluid SubC kernel.

Performance report:

FluidSubCSIMD.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

@anna-khakimova anna-khakimova force-pushed the ak/simd_subC branch 2 times, most recently from 5abee9b to 86f21d4 Compare November 30, 2021 12:43
Copy link
Copy Markdown
Contributor

@sivanov-work sivanov-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: added minor comments

float* sc = scratch.OutLine<float>();

for (int i = 0; i < scratch.length(); ++i)
sc[i] = static_cast<float>(_scalar[i % chan]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive: till scratch size is known at compile phase - is it possible to make unrolling loop for that determined size?
In case of compiler didn't make unroll it automatically...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not entirely clear yet. Could you explain in more detail?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure:

Scratch buffer size depends from MACRO definition here https://github.com/opencv/opencv/pull/21158/files#diff-fe0ac6b5a07c1bec59c3756100eb186c9117a82dd1d951ef6997b8423a89943dR1505

we have constant buffer size here for CV_SIMD turn on/off case... So it is possible to introduce MACRO branch here by CV_SIMD turn on/off and make manual loop unrolling in a way

#ifdef CV_SIMD
  cs[0] = _scalar[0];
   cs[1] = _scalar[1];
...<16  + 2 times>

#else

But from the other hand, C++ compiler can do this work of manual loop unrolling in more optimized way... But I do not known much details how is it applicable here: would C++ compiler make unrolling or wouldn't in this case - if scratch.length() has runtime (but not compile time) value.

So my point here: We can try to simply replace scratch.length() from dynamic parameter to constant parameter from MACRO where scratch buffer is determined and believe that C++ compiler would make loop unrolling or straight way vectorization here

P.S. this comment may be (or may not be ) applicable if this code is placed on critical performance path ONLY

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me there are no sense to do this for current case, because this loop work just once when function is called for the first line of the handling picture. And the compiler usually does it well itself.


#undef MUL_SIMD

struct add_op {};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple find on a page doesn't return to me anything about using this structure: where is it used?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Removed.

{
#if CV_SIMD
// 512 bits / 32 bits = 16 elements of float32 can contain a AVX 512 SIMD vector.
constexpr int maxNlanes = 16;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the same initialization which used for AddC? if yes - then maybe we need single common function to reuse in both?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

typename std::enable_if<(std::is_same<DST, ushort>::value ||
std::is_same<DST, short>::value), void>::type
addc_simd_common_impl(const SRC* inx, DST* outx, const v_float32& sc, const int nlanes)
arithmOpScalar_simd_common_impl(op_tag t, const SRC* inx, DST* outx, const v_float32& sc, const int nlanes)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻
Now code for sum + sub is unified?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think to rename op for something more less domain specific -> operation or op/operation_dispatcher ? too much short-named variables here and functions name stays hidden beside them

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

@anna-khakimova
Copy link
Copy Markdown
Member Author

anna-khakimova commented Dec 1, 2021

@alalek CI checks passed, PR was approved. Could you please merge it?

@alalek alalek merged commit 369b260 into opencv:4.x Dec 1, 2021
@alalek alalek mentioned this pull request Dec 30, 2021
@alalek alalek mentioned this pull request Feb 22, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
* GAPI Fluid: SIMD for SubC kernel.

* Applied comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants