Conversation
sivanov-work
left a comment
LGTM (besides the assert-for-unsupported-types comment in div_hal)
BINARY_(uchar , short, short, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
BINARY_(uchar , float, float, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
BINARY_( short, short, short, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
BINARY_(ushort, ushort, ushort, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
Comment not for this PR:
maybe in the future it would be nice to have something like:
GAPI_KERNEL(...)
...
#define KERNEL_XXX_SUPPORTED_TYPES ushort, int, char,...
static void run(...) {
run_dispatcher<KERNEL_XXX_SUPPORTED_TYPES>(...);
}
template <class... Types>
void run_dispatcher(...) {
    bool expander[] = { (run_impl<Types>(...), true)... };
}
template<class SupportedType>
void run_impl(...) {
    BINARY_(SupportedType, ...);
}
Probably yes; however, for this purpose the entire Fluid backend would need to be reworked, which will take a certain amount of time. As you know, the allocation of resources is handled by @dmatveev, so this issue should be discussed with him.
Force-pushed 525c9a3 to eace9c7
cv::divide(in_mat1, in_mat2, out_mat_ocv, dtype);
cv::divide(in_mat1, in_mat2, out_mat_ocv, scale, dtype);
//out_mat_ocv.size().
@@ -0,0 +1,427 @@
// This file is part of OpenCV project.
.simd.hpp
These files are used for dynamic dispatching between multiple instruction sets.
Check:
- documentation / tutorial here: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options
- some of these PRs
- how gfluidimgproc_func.simd.hpp is used in G-API
Ok. Reworked. I've applied dynamic dispatching to gfluidcore_func.simd.hpp in the same way as it is applied to gfluidimgproc_func.simd.hpp.
Force-pushed eace9c7 to 54e5744
Force-pushed 6e7af53 to fd6a2fe
Force-pushed fd6a2fe to d69ae64
alalek
left a comment
Looks good to me! Thank you for the update 👍
CV_ALWAYS_INLINE void v_store_select(short* dst, const v_int16&, const v_int16&,
                                     const v_int32& res1, const v_int32& res2)
{
    vx_store(dst, v_pack(res1, res2));
v_store_select here only calls v_pack. Do we need a select somewhere here?
Yes. Reworked. These lines were written deliberately as a workaround for a known issue in OpenCV.
Since one more workaround has been added to the DivPerfTestFluid test, this workaround can now be reworked.
template<> struct vector_type_of<ushort> { using type = v_uint16; };
template<> struct vector_type_of<short> { using type = v_int16; };

CV_ALWAYS_INLINE v_float32 v_load_f32(const float* in)
v_load_f32
It makes sense to add "gapi_v_" prefix to avoid confusion with OpenCV SIMD functions.
Ok. However, "gapi_v_" is too long, so I replaced the "v" prefix with "vg".
// This condition is needed to work around a bug in OpenCV.
// It reinitializes the divider matrix without zero values.
if (dtype == CV_16S && type != CV_16S)
    cv::randu(in_mat2, cv::Scalar::all(1), cv::Scalar::all(255));
BTW, there is a similar problem in cv::divide() with dst=8S (e.g., src=8U).
Probably the accurate condition should be dtype != type (&& dtype != CV_32F).
Fortunately, fluid's Div kernel doesn't support the 8S type at all.
For the supported types:
Since the bug is observed only when DST_Type == 16S and SRC_Type != 16S (both conditions together), I think the accurate condition is
(dtype == CV_16S) && (dtype != type)
This condition has already been applied in the last update of the PR.
For all other combinations of DST and SRC types, this test must verify the handling of division-by-zero cases.
Force-pushed d69ae64 to e54fafe
When src2(I) is zero, dst(I) will also be zero. Different channels of
For integer types when src2(I) is zero, dst(I) will also be zero.
Floating point case returns Inf/NaN (according to IEEE).
Unfortunately, I hadn't heard about this issue earlier. However, I can say with confidence that during testing with the DivPerfTestFluid test, the behavior of the fluid Div kernel is fully consistent with the behavior of the cv::divide() operation. All CI checks passed successfully. The one exception is the behavior caused by the known issue in OpenCV (dtype == CV_16S && type != dtype); a workaround for that OpenCV issue has already been added to the test.
Since all CI checks passed successfully, I can assume that the issue mentioned in PR #13147 was successfully resolved.
static inline
typename std::enable_if<!std::is_same<DST, float>::value, DST>::type
div(SRC1 x, SRC2 y, float scale=1)
{
    // like OpenCV: returns 0, if y=0
    // like OpenCV: returns 0, if DST type=uchar/short/ushort and divider(y)=0
    auto result = y? scale * x / y: 0;
BTW, we saw test failures for the "Linux Debug" builder only.
This is strange, because SIMD-optimized builders should fail too.
}

template<typename DST, typename SRC1, typename SRC2>
static inline DST divr(SRC1 x, SRC2 y, float scale=1)
divr has the same flow, but it is still not updated.
This operation wasn't updated within the current task and won't be. I've added SIMD for the Div kernel only.
BTW, the divr() function is declared but not used anywhere in gfluidcore.cpp.
dmatveev
left a comment
LGTM, but not sure whether everything is ok about .simd.hpp
GAPI Fluid: SIMD Div kernel.
* HAL implementation for Div kernel
* Removed dbg lines
* Applied comments
* Reworked
* Final version
SIMD for Divide kernel.
Performance report:
FluidDivSIMD.xlsx