Adding support for vectorized masking for uchar/ushort.#15527
alalek merged 5 commits into opencv:3.4
Just found a bug here. I was under the impression that the mask should return 0, but masking should just keep the value of dst as it is. I also need to make some further improvements.
the mask and tweaked for further performance improvements.
Fixed the problem and got even further performance improvements, up to almost 4x.
modules/imgproc/src/accum.simd.hpp
Outdated
int size = len * cn;
int y = -cVectorWidth;
int cc = 0;
for (; x <= size - cVectorWidth; x += cVectorWidth, cc = (cc+1) % cn)
cc will always be equal to 0 since cn is 1 in this branch
modules/imgproc/src/accum.simd.hpp
Outdated
v_store(dst + x + step * 3, v_dst11);
}
} else if ( cn == 1 ){
//#include <stdio.h>
modules/imgproc/src/accum.simd.hpp
Outdated
v_float32 v_dst10 = vx_load(dst + x + step * 2);
v_float32 v_dst11 = vx_load(dst + x + step * 3);

v_dst00 = v_fma(v_fma(v_dst00, v_beta, v_cvt_f32(v_reinterpret_as_s32(v_src00)) * v_alpha), d0, (~d0)*v_mf00);
Looks like for a zero mask this line will produce a result equal to (~d0)*v_mf00, which is definitely unrelated to the expected initial dst value.
Yeah I'm fixing that in a bit, thanks for the heads up.
It's possible to use the v_select(d0, v_fma(...), v_dst00) intrinsic to choose between the updated and the original destination values.
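The select-based approach suggested above can be illustrated with a scalar sketch. This is not the code from the PR: `select_lane` and `accumulate_weighted_masked` are hypothetical names, and `select_lane` only stands in for what a lane-wise select does on one element. The point it shows is that the destination is passed through unchanged wherever the mask is zero, rather than being rebuilt from a multiply by the mask.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a lane-wise select:
// returns `a` where the mask element is non-zero, `b` otherwise.
static inline float select_lane(std::uint8_t mask, float a, float b)
{
    return mask ? a : b;
}

// Sketch of a masked weighted accumulation for single-channel uchar input:
//   dst[i] = dst[i] * (1 - alpha) + src[i] * alpha   where mask[i] != 0
//   dst[i] is left untouched                         where mask[i] == 0
void accumulate_weighted_masked(const std::uint8_t* src,
                                const std::uint8_t* mask,
                                float* dst,
                                std::size_t len,
                                float alpha)
{
    const float beta = 1.0f - alpha;
    for (std::size_t i = 0; i < len; ++i)
    {
        // Compute the update unconditionally, as a vector loop would.
        const float updated = dst[i] * beta + static_cast<float>(src[i]) * alpha;
        // Choose between the updated and the original destination value.
        dst[i] = select_lane(mask[i], updated, dst[i]);
    }
}
```

Computing the update for every element and selecting afterwards mirrors how a vector loop avoids branches; the original value is the fallback operand, so a zero mask can never alter dst.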
Performance for SSE2 baseline
Performance for SSE3 baseline
Performance for SSE4_2 baseline
Performance for AVX2 baseline
Currently, whenever a mask is passed to the accumulate function, OpenCV falls back to the generic non-vectorized implementation. This PR adds masking support to the uchar and ushort implementations. Performance improvements on ppc64le are up to 89%.
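For reference, the behaviour that any vectorized masked path has to reproduce can be written as a plain scalar loop. The sketch below is an assumption about the semantics, not OpenCV's actual generic implementation: `accumulate_masked_ref` is a hypothetical name, the mask is taken to be single-channel with one entry per pixel, and `cn` is the number of interleaved channels.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference sketch: add src to dst only for pixels whose mask is non-zero.
// src and dst hold len * cn interleaved elements; mask holds len elements.
void accumulate_masked_ref(const std::uint8_t* src,
                           const std::uint8_t* mask,
                           float* dst,
                           std::size_t len,
                           std::size_t cn)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        if (!mask[i])
            continue; // masked-out pixel: dst keeps its value
        for (std::size_t k = 0; k < cn; ++k)
            dst[i * cn + k] += static_cast<float>(src[i * cn + k]);
    }
}
```

A loop like this is useful as a test oracle: running it and a vectorized version on the same inputs and comparing dst element by element would catch the zero-mask bug discussed in the review.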