Univ Intrinsics implementation of Add, Sub, Absdiff kernels by anna-khakimova · Pull Request #18338 · opencv/opencv

anna-khakimova · 2020-09-14T21:58:19Z

Performance report:

AbsDiff_Add_Sub_perf_report.xlsx
SIMD optimization via universal intrinsics for Add, Sub and AbsDiff fluid kernels.

Published for review 24th of September.
@smirnov-alexey , @anton-potapov , @OrestChura, @rgarnov please take a look.

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac,ARMv8,ARMv7,Linux32,Win32
disable_ipp:Custom=ON

buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX

Xbuildworker:Custom=linux-1,linux-2,linux-4
Xbuild_image:Custom=powerpc64le

asmorkalov · 2020-09-17T08:20:52Z

@anna-khakimova Welcome back! Please take a look on CI failures: https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/27868.

anna-khakimova · 2020-09-21T02:06:38Z

@OrestChura , please test this patch on KMB.

modules/gapi/src/backends/fluid/gfluidcore.cpp

alalek · 2020-09-25T12:57:16Z

modules/core/include/opencv2/core/hal/intrin_sse.hpp

 { return v_float32x4(_mm_castsi128_ps(a.val)); }
 inline v_float32x4 v_reinterpret_as_f32(const v_int64x2& a)
 { return v_float32x4(_mm_castsi128_ps(a.val)); }
+


please remove unrelated changes from the patch

modules/core/include/opencv2/core/hal/intrin_neon.hpp

alalek · 2020-10-14T07:41:37Z

modules/core/include/opencv2/core/hal/intrin_avx.hpp

 { return v_float32x8(_mm256_cvtepi32_ps(a.val)); }

+inline v_float32x8 v_cvt_f32(const v_uint32x8& a)
+{ return v_float32x8(_mm256_cvtepi32_ps(a.val)); }


_mm256_cvtepi32_ps() documentation has this statement:

Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.

This doesn't work with unsigned values.

Testing doesn't test "unsigned" case.

If this works with your code, then you probably don't really need to add this intrinsic.

alalek · 2020-10-14T07:42:48Z

modules/core/include/opencv2/core/hal/intrin_cpp.hpp

+{
+    v_reg<float, n> c;
+    for (int i = 0; i < n; i++)
+        c.s[i] = (float)a.s[i];


let left this "as is" (this file prefers C style casts)

alalek · 2020-10-14T07:45:21Z

modules/core/test/test_intrin_utils.hpp

        return *this;
    }
-
+        


Git hook must be installed before development to avoid that.
Refer to "How to contribute" Wiki page.

alalek · 2020-10-14T07:50:49Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+                v_int16 b1 = vx_load(reinterpret_cast<const short*>(&in2[x]));
+                v_int16 b2 = vx_load(reinterpret_cast<const short*>(&in2[x + nlanes / 2]));
+
+                vx_store(reinterpret_cast<uchar*>(&out[x]), v_pack_u(a1 + b1, a2 + b2));


What is about saturation rules?

+ is overloaded operation with saturation.

alalek · 2020-10-14T07:52:04Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+#if CV_SIMD
+        absdiff_simd(in1, in2, out, length, x);
+#endif
+        for (; x < length; ++x)
+            out[x] = absdiff<DST>(in1[x], in2[x]);
+}


it is broken

alalek · 2020-10-14T07:56:57Z

modules/gapi/src/backends/fluid/gfluidcore.cpp


+#if CV_SIMD
+template<typename T, typename VT>
+static inline void absdiff_impl(const T in1[], const T in2[], T out[], int length, int& x)


void ... int& x

It is better to "return x" instead (avoid code which may block compiler optimizations).

anna-khakimova · 2020-10-15T09:40:08Z

@alalek please take a look one more

alalek · 2020-10-15T09:41:53Z

Remove unnecessary changes from the patch and make required builds green.

anna-khakimova · 2020-10-15T10:10:34Z

modules/core/include/opencv2/core/hal/intrin_forward.hpp

 //! @endcond

-} // cv::
+} // cv::


Unfortunately I can't unroll this changes. Visual Studio insert shift to next line automatically. However as I know file should be ended by empty line.

I think it is configurable, can you please find this setting?

BTW, git can unroll any file. We don't need editor for that.

git checkout upstream/master -- modules/core/include/opencv2/core/hal/intrin_forward.hpp

@alalek Ok. Done.

anna-khakimova · 2020-10-15T12:08:02Z

@alalek Sorry for inconvenient. Builds are passed now. Please look one more.

alalek · 2020-10-15T12:14:53Z

Test in AVX512 build (custom) is crashed.

anna-khakimova · 2020-10-15T12:39:47Z

Test in AVX512 build (custom) is crashed.

@alalek Custom build crashes constantly and on all PRs. It catches exception when try to get video that absent. It's known issue and it doesn't have relation with my PR.

alalek · 2020-10-15T13:27:32Z

This is not true.

Exception != crash
Current nighly AVX512 build doesn't crash on gapi test.

when try to get video that absent

you should configure your local environment properly.

alalek · 2020-10-15T14:40:43Z

GDB log of this crash is below.

Details

[ RUN      ] GComputationCompile.FluidReshapeWithDifferentDims

Thread 1 "opencv_test_gap" received signal SIGSEGV, Segmentation fault.
cv::gapi::fluid::add_simd_sametype<unsigned char, cv::hal_baseline::v_uint8x64> (length=<optimized out>, out=<optimized out>, in2=<optimized out>, in1=<optimized out>)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:242
242	            vx_store(&out[x], a + b);
(gdb) bt
#0  0x00007f7fc3a66253 in cv::gapi::fluid::add_simd_sametype<unsigned char, cv::hal_baseline::v_uint8x64> (length=<optimized out>, out=<optimized out>, in2=<optimized out>, in1=<optimized out>)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:242
#1  0x00007f7fc3a66253 in cv::gapi::fluid::add_simd<unsigned char, unsigned char> (length=<optimized out>, out=<optimized out>, in2=<optimized out>, in1=<optimized out>)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:266
#2  0x00007f7fc3a66253 in cv::gapi::fluid::run_arithm<unsigned char, unsigned char, unsigned char> (scale=1, arithm=cv::gapi::fluid::ARITHM_ADD, src2=..., src1=..., dst=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:467
#3  0x00007f7fc3a66253 in cv::gapi::fluid::GFluidAdd::run(cv::gapi::fluid::View const&, cv::gapi::fluid::View const&, int, cv::gapi::fluid::Buffer&) (dst=..., src2=..., src1=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:501
#4  0x00007f7fc3a66253 in cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call_impl<0, 1, 2, 0>(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&, cv::detail::Seq<0, 1, 2>, cv::detail::Seq<0>) (out_bufs=..., in_args=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/include/opencv2/gapi/fluid/gfluidkernel.hpp:369
#5  0x00007f7fc3a66253 in cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&) (in_args=..., out_bufs=...) at /build/precommit_custom_linux/opencv/modules/gapi/include/opencv2/gapi/fluid/gfluidkernel.hpp:376
#6  0x00007f7fc39cbf73 in std::function<void (std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)>::operator()(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&) const (__args#1=std::vector of length 1, capacity 1 = {...}, __args#0=std::vector of length 3, capacity 3 = {...}, this=0x55e7d1384308) at /usr/include/c++/7/bits/std_function.h:706
#7  0x00007f7fc39cbf73 in cv::gimpl::FluidAgent::doWork() (this=0x55e7d13842f0) at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidbackend.cpp:506
#8  0x00007f7fc39d09d8 in cv::gimpl::GFluidExecutable::run(std::vector<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> >, std::allocator<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> > > >&, std::vector<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> >, std::allocator<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> > > >&) (this=0x55e7d1328260, input_objs=std::vector of length 1, capacity 1 = {...}, output_objs=std::vector of length 1, capacity 1 = {...}) at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidbackend.cpp:1354
#9  0x00007f7fc38fc51e in cv::gimpl::GIslandExecutable::run(cv::gimpl::GIslandExecutable::IInput&, cv::gimpl::GIslandExecutable::IOutput&) (this=0x55e7d1328260, in=..., out=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/compiler/gislandmodel.cpp:371
#10 0x00007f7fc395af82 in cv::gimpl::GExecutor::run(cv::gimpl::GRuntimeArgs&&) (this=0x55e7d133c7b0, args=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/executor/gexecutor.cpp:336
#11 0x00007f7fc3916133 in cv::GCompiled::Priv::run(cv::gimpl::GRuntimeArgs&&) (this=<optimized out>, args=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/compiler/gcompiled.cpp:38
#12 0x00007f7fc3917070 in cv::GCompiled::operator()(std::vector<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame>, std::allocator<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> > >&&, std::vector<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef>, std::allocator<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> > >&&) (this=<optimized out>, ins=..., outs=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/compiler/gcompiled.cpp:107
#13 0x00007f7fc38874cd in cv::GComputation::apply(std::vector<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame>, std::allocator<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> > >&&, std::vector<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef>, std::allocator<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> > >&&, std::vector<cv::GCompileArg, std::allocator<cv::GCompileArg> >&&) (this=this@entry=0x7ffc4c0ef7f0, ins=..., outs=..., args=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/api/gcomputation.cpp:155
#14 0x00007f7fc3887c89 in cv::GComputation::apply(cv::Mat, cv::Mat&, std::vector<cv::GCompileArg, std::allocator<cv::GCompileArg> >&&) (this=this@entry=0x7ffc4c0ef7f0, in=..., out=..., args=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/api/gcomputation.cpp:210
#15 0x000055e7ce9be02d in opencv_test::GComputationCompile_FluidReshapeWithDifferentDims_Test::Body() (this=<optimized out>) at /build/precommit_custom_linux/opencv/modules/gapi/test/internal/gapi_int_recompilation_test.cpp:84
#16 0x000055e7ce9b6370 in opencv_test::GComputationCompile_FluidReshapeWithDifferentDims_Test::TestBody() (this=0x55e7d123abd0) at /build/precommit_custom_linux/opencv/modules/gapi/test/internal/gapi_int_recompilation_test.cpp:75
#17 0x000055e7ceb442ea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (location=0x55e7cebce41c "the test body", method=<optimized out>, object=0x55e7d123abd0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3917
#18 0x000055e7ceb442ea in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (object=0x55e7d123abd0, method=<optimized out>, location=0x55e7cebce41c "the test body") at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3953
#19 0x000055e7ceb4453a in testing::Test::Run() (this=0x55e7d123abd0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3991
#20 0x000055e7ceb44858 in testing::TestInfo::Run() (this=0x55e7d055a9c0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:4167
#21 0x000055e7ceb44965 in testing::TestCase::Run() (this=0x55e7d055a420) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:4285
#22 0x000055e7ceb4a4c8 in testing::internal::UnitTestImpl::RunAllTests() (this=this@entry=0x55e7d05079b0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:6660
#23 0x000055e7ceb4a6b1 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (location=0x55e7cebce828 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=<optimized out>) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3917
#24 0x000055e7ceb4a6b1 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (location=0x55e7cebce828 "auxiliary test code (environments or event listeners)", method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x55e7ceb4a050 <testing::internal::UnitTestImpl::RunAllTests()>, object=0x55e7d05079b0)
    at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3953
#25 0x000055e7ceb4a6b1 in testing::UnitTest::Run() (this=0x55e7cef3fca0 <testing::UnitTest::GetInstance()::instance>) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:6269
#26 0x000055e7ce554771 in RUN_ALL_TESTS() () at /build/precommit_custom_linux/opencv/modules/ts/include/opencv2/ts/ts_gtest.h:22228
#27 0x000055e7ce554771 in main(int, char**) (argc=<optimized out>, argv=0x7ffc4c0eff78) at /build/precommit_custom_linux/opencv/modules/gapi/test/test_main.cpp:12
(gdb) info registers 
rax            0xd53b40	13974336
rbx            0x55e7d1386f80	94454135943040
rcx            0xffffffe0	4294967264
rdx            0xd53b00	13974272
rsi            0x55e7d1382900	94454135924992
rdi            0x55e7d1382900	94454135924992
rbp            0x7ffc4c0eeab0	0x7ffc4c0eeab0
rsp            0x7ffc4c0eea40	0x7ffc4c0eea40
r8             0x55e7d138e500	94454135973120
r9             0x20	32
r10            0x3f	63
r11            0x0	0
r12            0x0	0
r13            0x55e7d1210498	94454134408344
r14            0x55e7d1386f80	94454135943040
r15            0x55e7d123c1d8	94454134587864
rip            0x7f7fc3a66253	0x7f7fc3a66253 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+307>
eflags         0x10202	[ IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
k0             0xffff	65535
k1             0xffff	65535
k2             0x24924924	613566756
k3             0xaaaaaaaaaaaaaaaa	12297829382473034410
k4             0xffff	65535
k5             0x1ff	511
k6             0x1ff	511
k7             0xfe00	65024

(gdb) x/16i $rip-0x13 
   0x7f7fc3a66240 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+288>:	mov    %eax,%edx
   0x7f7fc3a66242 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+290>:	add    $0x40,%eax
   0x7f7fc3a66245 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+293>:	vmovdqu8 (%rdi,%rdx,1),%zmm0
   0x7f7fc3a6624c <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+300>:	vpaddusb (%rsi,%rdx,1),%zmm0,%zmm0
=> 0x7f7fc3a66253 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+307>:	vmovdqu64 %zmm0,(%r8,%rdx,1)
   0x7f7fc3a6625a <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+314>:	cmp    %ecx,%eax
   0x7f7fc3a6625c <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+316>:	
    jbe    0x7f7fc3a66240 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+288>
   0x7f7fc3a6625e <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+318>:	cmp    $0x3f,%eax
   0x7f7fc3a66261 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+321>:	mov    %r10d,%edx
   0x7f7fc3a66264 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+324>:	cmovae %eax,%edx
   0x7f7fc3a66267 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+327>:	cmp    %edx,%r9d
   0x7f7fc3a6626a <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+330>:	
    jbe    0x7f7fc3a667f1 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+1745>
   0x7f7fc3a66270 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+336>:	test   %r11d,%r11d
   0x7f7fc3a66273 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+339>:	
    je     0x7f7fc3a667e9 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+1737>
   0x7f7fc3a66279 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+345>:	mov    %ecx,%eax
   0x7f7fc3a6627b <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+347>:	
    jmp    0x7f7fc3a6625a <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+314>

anna-khakimova · 2020-10-15T16:10:30Z

Now test log from Custom build:
[ RUN ] StatefulKernel.StateIsAutoResetForNewStream
unknown file: Failure
C++ exception with description "OpenCV(4.5.0-pre) /home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp:54: error: (-215:Assertion failed) testDataPath != nullptr in function 'initTestDataPath'
" thrown in the test body.
[ FAILED ] StatefulKernel.StateIsAutoResetForNewStream (0 ms)

anna-khakimova · 2020-10-15T16:12:52Z

Now test log from Custom build:
[ RUN ] StatefulKernel.StateIsAutoResetForNewStream
unknown file: Failure
C++ exception with description "OpenCV(4.5.0-pre) /home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp:54: error: (-215:Assertion failed) testDataPath != nullptr in function 'initTestDataPath'
" thrown in the test body.
[ FAILED ] StatefulKernel.StateIsAutoResetForNewStream (0 ms)

It's mean that now Custom build tests fail only because of test infrastructure.

alalek · 2020-10-15T17:20:56Z

/home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp
It's mean that now Custom build tests fail because of test infrastructure.

There is no /home on CI test infrastructure, so it is not true again.

/cc @dmatveev Please help how to check build logs

Currently test app just hangs:

[ RUN      ] GComputationCompile.FluidReshapeWithDifferentDims

command timed out: 180 seconds without output running buildenv python ../opencv/modules/ts/misc/run.py --gtest_output=xml:results_test_gapi.xml -t gapi -a, attempting to kill
process killed by signal 15
program finished with exit code -1

anna-khakimova · 2020-10-15T17:31:41Z

/home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp
It's mean that now Custom build tests fail because of test infrastructure.

There is no /home on CI test infrastructure, so it is not true again.

/cc @dmatveev Please help how to check build logs

Currently test app just hangs:
[ RUN      ] GComputationCompile.FluidReshapeWithDifferentDims

command timed out: 180 seconds without output running buildenv python ../opencv/modules/ts/misc/run.py --gtest_output=xml:results_test_gapi.xml -t gapi -a, attempting to kill
process killed by signal 15
program finished with exit code -1

I've already run tests on AVX512 machine and show the log mentioned in previous comment.
There are only several failed tests as you can see in the log. They failed because of test infrastructure.
Please see that log.

terfendail · 2020-10-16T10:40:25Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+        }
+
+        if (x < length)
+            x = length - nlanes;


Such tail processing requires that in != out. (However that's ok if such a check is performed somewhere higher by call stack)

Ok. Add check for cases when input and output types are the same.

@dmatveev Do we really need these checks in G-API code? Does G-API support inplace processing? If no, then it makes sense to add CV_DbgAssert() instead.

@anna-khakimova There are 7 similar loops in this patch. Commit contains 3 updates only. Why?

@alalek
Answer for first question: This checks necessary to process last several elements of input array (tail) via univ intrinsics since their number is less than nlanes .
Answer for second question: As I've already mentioned in previous comment, for inplace implementation it's necessary that input and output array types should be the same. So, this check is needed only in 3 cases (in 3 functions which calls when input and output types are the same).
Note: Please read my comments more attentively.

Does G-API support inplace processing?

Yes, we have to support inplace processing at least to avoid copy when calling cv::gapi::wip::draw::render().

Not sure if Fluid have ever been tested with such type of inplace execution, but the Fluid itself never forces input and output buffer to be the same.

@alalek
There are no such cases in GAPI.
For AbsDiff there are cases such as:

uchar inputs and uchar output

short int inputs and short int output.

ushort inputs and ushort output.

float inputs and float outputs
For all cases mentioned above there is check to detect inplace.

For Add and Sub there are cases such as:

uchar inputs and uchar output <--- For this case there are checks for inplace.

short int inputs and uchar output <---- There is no sense to check for inplace.

float inputs and uchar output <------ There is no sense to check for inplace.

short int inputs and short int output <------- For this case there are check for inplace.

float inputs and float output <-------- For this case there are checks for inplace.

uchar inputs and float output <------- There is no sense to check for inplace.

short int inputs and float output <------- There is no sense to check for inplace.

As you can see, there are no cases with int at all and with int and float in particular.

These GAPI kernels don't support int type at all.

Inplace implementation is absent for these kernels. And case when user gives one the same int matrix for inputs, then cast this matrix to float and pass it to output is difficult to imagine in reality.

What's the point in adding checks for all occasions?

... for now.

Assertions verify assumptions which are required by related code below.
It is a really powerful tool.
This helps with investigations in the future through emitting error messages. This reduces annoying debugging process of related problems.

... for now.

Assertions verify assumptions which are required by related code below.
It is a really powerful tool.
This helps with investigations in the future through emitting error messages. This reduces annoying debugging process of related problems.

@alalek Ok. Done.

Anyway, I would suggest to put CV_DbgAssert() for other cases.

when input and output types are the same

to catch cases when different types are inplaced (e.g. both float and int are 32-bit)

I've already put checks for inplace when input and output types are the same.

I've already put checks for inplace when input and output types are the same.

Please open an issue internally to start checking for inplace at the Fluid backend level.
Then we could remove such checks from the kernels (where it may be costly given that kernels are called for every line of an image)

anna-khakimova · 2020-10-19T15:40:15Z

@alalek please review one more. All checks are passed.

modules/gapi/src/backends/fluid/gfluidcore.cpp

anna-khakimova · 2020-10-20T19:32:00Z

@alalek @dmatveev, all comments were addressed, all checks were passed. Please take a look.

anton-potapov · 2020-10-20T22:22:10Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+        {
+            VT a = vx_load(&in1[x]);
+            VT b = vx_load(&in2[x]);
+            absdiff_store(out, a, b, x);


Is it the only place where absdiff_store is used, or did I miss something ? If so - that is the point of separate function for this , why not inline it's body here ?

@anton-potapov
I forced to add these overloads because I realized that new universal intrinsic absdiffs() (added by me for v_uint8, v_uint16, v_float32 types earlier) work the same as already exist absdiff() for argument types mentioned above. So now I have to use only one absdiffs() for v_int16 type only. For the rest types (v_uint8, v_uint16, v_float32) I use absdiff(). And so these overloads are need here.

dmatveev

The code can be really simplified using templates, but that's another story

anna-khakimova force-pushed the ak/opt_arithm_kernel branch 4 times, most recently from f023760 to ed3e1bb Compare September 16, 2020 10:05

asmorkalov added the category: core label Sep 17, 2020

anna-khakimova force-pushed the ak/opt_arithm_kernel branch 2 times, most recently from 4640203 to de8eab5 Compare September 21, 2020 01:56

anna-khakimova force-pushed the ak/opt_arithm_kernel branch 18 times, most recently from 4c2a6e4 to dcb6425 Compare September 24, 2020 15:06

OrestChura reviewed Sep 25, 2020

View reviewed changes

modules/gapi/src/backends/fluid/gfluidcore.cpp Outdated Show resolved Hide resolved

anna-khakimova force-pushed the ak/opt_arithm_kernel branch 2 times, most recently from 9ea8588 to 2050b86 Compare September 25, 2020 01:04

anna-khakimova force-pushed the ak/opt_arithm_kernel branch 4 times, most recently from 17747d1 to 931caaf Compare October 14, 2020 07:51

alalek reviewed Oct 14, 2020

View reviewed changes

anna-khakimova force-pushed the ak/opt_arithm_kernel branch 2 times, most recently from 588f8e4 to ff1078c Compare October 14, 2020 12:21

anna-khakimova commented Oct 15, 2020

View reviewed changes

Add, sub, absdiff kernels optimization

043192e

terfendail reviewed Oct 16, 2020

View reviewed changes

avoid unused conditions

212b10d

anna-khakimova commented Oct 19, 2020

View reviewed changes

modules/gapi/src/backends/fluid/gfluidcore.cpp Show resolved Hide resolved

add conditions for tail processing

e31af6a

anton-potapov reviewed Oct 20, 2020

View reviewed changes

terfendail approved these changes Oct 21, 2020

View reviewed changes

dmatveev approved these changes Oct 21, 2020

View reviewed changes

alalek mentioned this pull request Nov 27, 2020

Merge 4.x #18951

Merged

Uh oh!

Conversation

anna-khakimova commented Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

asmorkalov commented Sep 17, 2020

Uh oh!

anna-khakimova commented Sep 21, 2020

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anna-khakimova Oct 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anna-khakimova commented Oct 15, 2020

Uh oh!

alalek commented Oct 15, 2020

Uh oh!

anna-khakimova Oct 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anna-khakimova commented Oct 15, 2020

Uh oh!

alalek commented Oct 15, 2020

Uh oh!

anna-khakimova commented Oct 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alalek commented Oct 15, 2020

Uh oh!

alalek commented Oct 15, 2020

Uh oh!

anna-khakimova commented Oct 15, 2020

Uh oh!

anna-khakimova commented Oct 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alalek commented Oct 15, 2020

Uh oh!

anna-khakimova commented Oct 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anna-khakimova Oct 19, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

anna-khakimova commented Sep 14, 2020 •

edited

Loading

anna-khakimova Oct 14, 2020 •

edited

Loading

anna-khakimova Oct 15, 2020 •

edited

Loading

anna-khakimova commented Oct 15, 2020 •

edited

Loading

anna-khakimova commented Oct 15, 2020 •

edited

Loading

anna-khakimova commented Oct 15, 2020 •

edited

Loading

anna-khakimova Oct 19, 2020 •

edited

Loading

anna-khakimova Oct 20, 2020 •

edited

Loading

anna-khakimova commented Oct 20, 2020 •

edited

Loading

anna-khakimova Oct 21, 2020 •

edited

Loading