GAPI Fluid: SIMD AVX2 Resize F32C1. #21728
009e3bb to c1938d8
c1938d8 to c0e49b5
c0e49b5 to 8f920d4
sivanov-work left a comment:
I think the last loop can be optimized into a single memcpy.
#if defined __GNUC__
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wstrict-overflow"
Why do we need to suppress this warning here? Does it hide some sort of error?
This is legacy code carried over from OpenVINO. The warning suppression was added to OpenVINO without my participation.
I would suggest removing it and trying a build with the OpenCV CI.
Applied.
The CI checks ran successfully.
{
    GAPI_DbgAssert(xRatioEq1 && yRatioEq1);
    int length = inSz.width; // == outSz.width
    for (int line = 0; line < lpi; ++line)
The memory is still allocated contiguously (I believe). Is it possible to copy the whole matrix block (which is linear data in memory) with a single memcpy call instead of wasting CPU cycles on loop iterations?
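To make the suggestion concrete, here is a minimal sketch of what a single-memcpy fast path could look like. It is not code from this PR: `copyRows` is a hypothetical helper, and it assumes the rows are reachable through arrays of row pointers. Whether the rows really are contiguous in this kernel is not confirmed in the thread, so the sketch checks at run time and falls back to a per-row copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the PR): copies `lpi` rows of `length`
// floats each. Assumes source and destination rows do not overlap.
inline void copyRows(float* dst[], const float* src[], int lpi, int length)
{
    assert(lpi >= 0 && length >= 0);
    const std::size_t rowBytes = static_cast<std::size_t>(length) * sizeof(float);

    // Check whether every row starts exactly where the previous one ends,
    // in both the source and the destination.
    bool contiguous = true;
    for (int line = 1; line < lpi && contiguous; ++line)
    {
        contiguous = (src[line] == src[line - 1] + length) &&
                     (dst[line] == dst[line - 1] + length);
    }

    if (lpi > 0 && contiguous)
    {
        // Fast path: one call copies the whole block.
        std::memcpy(dst[0], src[0], rowBytes * static_cast<std::size_t>(lpi));
        return;
    }

    // Fallback: rows are scattered, so copy them one at a time.
    for (int line = 0; line < lpi; ++line)
    {
        std::memcpy(dst[line], src[line], rowBytes);
    }
}
```

The contiguity check costs a few pointer comparisons per call, so the fast path only pays off when rows are long or `lpi` is large. If contiguity were guaranteed by the caller, the check could be dropped.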
if (inSz.width >= nlanes && outSz.width >= nlanes)
{
    avx2::calcRowLinear32FC1Impl(reinterpret_cast<float**>(dst),
                                 reinterpret_cast<const float**>(src0),
reinterpret_cast is normally a red flag. Can it be avoided? Could avx2::calcRowLinear32FC1Impl accept a float*[] type instead?
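One way to avoid the cast, sketched under assumptions: the actual types of `dst` and `src0` at the call site are not shown in this excerpt, so the example below uses a hypothetical `processRows` function rather than the real `calcRowLinear32FC1Impl`. The idea is to keep the row pointers typed and let overload resolution pick the float-specific path, so no `reinterpret_cast` is needed.

```cpp
#include <cassert>

// Hypothetical generic path (not from the PR): plain scalar copy.
// Returns 0 so a caller can tell which path ran.
template <typename T>
int processRows(T* dst[], const T* src[], int lpi, int length)
{
    assert(lpi >= 0 && length >= 0);
    for (int line = 0; line < lpi; ++line)
        for (int x = 0; x < length; ++x)
            dst[line][x] = src[line][x];
    return 0;
}

// Hypothetical float overload: this is where a SIMD implementation taking
// float*[] / const float*[] would be called directly. Returns 1.
inline int processRows(float* dst[], const float* src[], int lpi, int length)
{
    assert(lpi >= 0 && length >= 0);
    for (int line = 0; line < lpi; ++line)
        for (int x = 0; x < length; ++x)
            dst[line][x] = src[line][x];
    return 1;
}
```

A call with `float*` rows binds to the non-template overload because it is an exact match and non-templates win ties; any other element type falls through to the template. This only works if the caller already holds typed pointers. If the pointers arrive as an untyped byte type, some conversion is still needed somewhere.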
@alalek Could you please take a look at this PR and merge it if it looks good to you?
…vx_simd
GAPI Fluid: SIMD AVX2 Resize F32C1.
* GAPI Fluid: Resize F32C1 scalar.
* Final version
* GAPI Fluid: SIMD AVX2 for Resize F32C1.
* Applied comments.
* Deleted warning suppression.
* Applied comments.
SIMD AVX2 for Resize F32C1.
Performance:
ResizeF32C1_SIMD_AVX.xlsx