GAPI Fluid: SIMD Multiply kernel.#21024
Conversation
| { | ||
| // like OpenCV: returns 0, if y=0 | ||
| // like OpenCV: returns 0, if DST type=uchar/short/ushort and divider(y)=0 | ||
| auto result = y? scale * x / y: 0; |
There was a problem hiding this comment.
Yes, it can be. And next overload handles this case.
| BINARY_(uchar, uchar, uchar, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); | ||
| BINARY_(uchar, ushort, ushort, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); | ||
| BINARY_(uchar, short, short, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); | ||
| BINARY_(uchar, float, float, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); |
There was a problem hiding this comment.
Is it supposed to have SRC1 & SRC2 for the same type (i mean if it falls into the div/mul with SFINAE with SRC1 & SRC2 then do we need to satisfy static_check(std::is_same<SRC1, SRC2>))?
There was a problem hiding this comment.
This check is already exist. Please take a look here
| namespace gapi { | ||
| namespace fluid { | ||
|
|
||
| #define DIV_SIMD(SRC, DST) \ |
There was a problem hiding this comment.
looks like order is different with BINARY_
https://github.com/opencv/opencv/pull/21024/files#diff-fe0ac6b5a07c1bec59c3756100eb186c9117a82dd1d951ef6997b8423a89943dR787
What do you think to align: DST <-->SRC according to BINARY_ order or vise versa?
There was a problem hiding this comment.
Two first arguments of div_simd() have SRC type. Third argument has DST type. So for MACRO I should mention SRC type first and second should be DST.
There was a problem hiding this comment.
BYNARY_ macro is working good, so in the bounds of this task, I can't change API of this MACRO. Otherwise I have to change its invocation for all kernels that use it.
There was a problem hiding this comment.
I see your point.
But what do you think to align argument positional agreement in newly added functions for mul/div in accommodation with the old approach with BINARY_?
UPDATED: Or it will affect template type auto-deduction?
| CV_ALWAYS_INLINE | ||
| typename std::enable_if<(std::is_same<SRC, short>::value && std::is_same<DST, ushort>::value) || | ||
| (std::is_same<SRC, ushort>::value && std::is_same<DST, ushort>::value) || | ||
| (std::is_same<SRC, ushort>::value && std::is_same<DST, short>::value), int>::type |
There was a problem hiding this comment.
IMHO: enable_if is good for simple condition check otherwise it confused
but as I noticed we need to pass type into v_load_f32 and others function, so we need to know actual type here...
IMHO again:
To avoid eyeballs scattering when reading this code i think it is possible to introduce MACRO at least
#define SRC_DST_SHORT_USHORT (std::is_same<SRC, >...)
#define DST_SHORT_USHORT (std::is_same ...)
...
template<...>
CV_ALWAYS_INLINE
typename std::enable_if<SRC_DST_SHORT_USHORT, int >::type
div_hal(...)
...
template<...>
CV_ALWAYS_INLINE
typename std::enable_if<SRC_DST_SHORT_USHORT, int >::type
mul_hal(...)
Feel free to ignore it
| #if CV_SIMD | ||
| x = div_simd(in1, in2, out, length, scale); | ||
| #endif | ||
| for (; x < length; ++x) |
There was a problem hiding this comment.
is there are no loop-unrolling MACRO in opencv?
There was a problem hiding this comment.
Does it make sense here?
|
Need to rebase after "Div" PR merge ( |
ecc0595 to
5ad0b60
Compare
5ad0b60 to
c47673b
Compare
@alalek Ok. Done. Please take a look. |
SIMD optimization for Fluid Multiply kernel.
Performance report:
FluidMulSIMD.xlsx