GAPI Fluid: SIMD for AddC kernel.#21119
Conversation
cb58db3 to
d5554e2
Compare
d5554e2 to
fc59736
Compare
fc59736 to
57a9eed
Compare
sivanov-work
left a comment
There was a problem hiding this comment.
in overall LGTM.
Could you please clarify about: Why 1, 2 & 4 falls into the same case and Warning?Exception?
| static void initScratch(const GMatDesc&, const GScalarDesc&, int, Buffer& scratch) | ||
| { | ||
| #if CV_SIMD | ||
| // Max value of v_float32::nlanes = 16 in AVX512 ISA |
There was a problem hiding this comment.
Is it possible to launch this library compiled with CV_SIMD on a machine without SIMD support?
I mean: do we need runtime check on supporting SIMD on executable machine ?
There was a problem hiding this comment.
#if CV_SIMD is the compilation macro. When CV_SIMD is false this code is truncated in compilation time.
There was a problem hiding this comment.
The question is different.
There are BUILD machine and EXECUTION machine. The build machine may turn on SIMD or other instructions sets through MACRO. In this case the appropriate code burn out binary (#if CV_SIMD) and we will have buflen = 16.
Let's assume that OPENCV was built on machine with CV_SIMD (code is exist) but executes on machine with CPU which doesn't support SIMD (code exist but cannot be executed) - so we still have 16 here, but actual earlier dispatching work out in assumption that there are no SIMD support and expect that it should have 4 due to CPU doesn't support SIMD
To resolve this situation we usually use TWO kind of check
#if EXTENSION_EXIST // compile code ON
if( extension_supported()) // test is supported
{
// supported
}
else
{
// compiled, but not supported
}
#else
// not compiled, but not supported
Is this situation applicable here?
There was a problem hiding this comment.
In this case, no errors or problems will arise. The buffer will be larger, but it will not cause any unwanted side effects.
| constexpr int nlanes = v_float32::nlanes; | ||
| constexpr int chan = 3; | ||
| constexpr int lanes = chan * nlanes; | ||
|
|
||
| int x = 0; | ||
| for (;;) | ||
| { | ||
| for (; x <= length - lanes; x += lanes) |
There was a problem hiding this comment.
looks like this copy-paste part with lanes calculation & internal for-loop could be incorporated in initial dispatcher addc_simd_c3(const SRC in[], const float scalar[], DST out[], const int length) then
this *_impl would just preform intrisict calculation only without this dupliacted pre- and post- operations
| case 3: \ | ||
| return addc_simd_c3(in, scalar, out, length); \ | ||
| default: \ | ||
| break; \ |
37c7008 to
535133c
Compare
|
@alalek Could you please merge the PR? |
* GAPI Fluid: SIMD for AddC kernel * Final version * Applied comments.
SIMD for GAPI Fluid AddC kernel.
Performance report:
FluidAddCSIMD.xlsx