Skip to content

Univ Intrinsics implementation of Add, Sub, Absdiff kernels#18338

Merged
alalek merged 3 commits intoopencv:masterfrom
anna-khakimova:ak/opt_arithm_kernel
Oct 21, 2020
Merged

Univ Intrinsics implementation of Add, Sub, Absdiff kernels#18338
alalek merged 3 commits intoopencv:masterfrom
anna-khakimova:ak/opt_arithm_kernel

Conversation

@anna-khakimova
Copy link
Copy Markdown
Member

@anna-khakimova anna-khakimova commented Sep 14, 2020

Performance report:

AbsDiff_Add_Sub_perf_report.xlsx
SIMD optimization via universal intrinsics for Add, Sub and AbsDiff fluid kernels.

Published for review 24th of September.
@smirnov-alexey , @anton-potapov , @OrestChura, @rgarnov please take a look.

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac,ARMv8,ARMv7,Linux32,Win32
disable_ipp:Custom=ON

buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX

Xbuildworker:Custom=linux-1,linux-2,linux-4
Xbuild_image:Custom=powerpc64le

@anna-khakimova anna-khakimova force-pushed the ak/opt_arithm_kernel branch 4 times, most recently from f023760 to ed3e1bb Compare September 16, 2020 10:05
@asmorkalov
Copy link
Copy Markdown
Contributor

@anna-khakimova Welcome back! Please take a look on CI failures: https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/27868.

@anna-khakimova anna-khakimova force-pushed the ak/opt_arithm_kernel branch 2 times, most recently from 4640203 to de8eab5 Compare September 21, 2020 01:56
@anna-khakimova
Copy link
Copy Markdown
Member Author

@OrestChura , please test this patch on KMB.

@anna-khakimova anna-khakimova force-pushed the ak/opt_arithm_kernel branch 18 times, most recently from 4c2a6e4 to dcb6425 Compare September 24, 2020 15:06
@anna-khakimova anna-khakimova force-pushed the ak/opt_arithm_kernel branch 2 times, most recently from 9ea8588 to 2050b86 Compare September 25, 2020 01:04
@anna-khakimova anna-khakimova force-pushed the ak/opt_arithm_kernel branch 4 times, most recently from 17747d1 to 931caaf Compare October 14, 2020 07:51
{ return v_float32x4(_mm_castsi128_ps(a.val)); }
inline v_float32x4 v_reinterpret_as_f32(const v_int64x2& a)
{ return v_float32x4(_mm_castsi128_ps(a.val)); }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove unrelated changes from the patch

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

{ return v_float32x8(_mm256_cvtepi32_ps(a.val)); }

inline v_float32x8 v_cvt_f32(const v_uint32x8& a)
{ return v_float32x8(_mm256_cvtepi32_ps(a.val)); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mm256_cvtepi32_ps() documentation has this statement:

Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.

  1. This doesn't work with unsigned values.
  2. Testing doesn't test "unsigned" case.
  3. If this works with your code, then you probably don't really need to add this intrinsic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

{
v_reg<float, n> c;
for (int i = 0; i < n; i++)
c.s[i] = (float)a.s[i];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let left this "as is" (this file prefers C style casts)

return *this;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Git hook must be installed before development to avoid that.
Refer to "How to contribute" Wiki page.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

v_int16 b1 = vx_load(reinterpret_cast<const short*>(&in2[x]));
v_int16 b2 = vx_load(reinterpret_cast<const short*>(&in2[x + nlanes / 2]));

vx_store(reinterpret_cast<uchar*>(&out[x]), v_pack_u(a1 + b1, a2 + b2));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is about saturation rules?

Copy link
Copy Markdown
Member Author

@anna-khakimova anna-khakimova Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ is overloaded operation with saturation.

Comment on lines +533 to +615
#if CV_SIMD
absdiff_simd(in1, in2, out, length, x);
#endif
for (; x < length; ++x)
out[x] = absdiff<DST>(in1[x], in2[x]);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is broken


#if CV_SIMD
template<typename T, typename VT>
static inline void absdiff_impl(const T in1[], const T in2[], T out[], int length, int& x)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void ... int& x

It is better to "return x" instead (avoid code which may block compiler optimizations).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@anna-khakimova anna-khakimova force-pushed the ak/opt_arithm_kernel branch 2 times, most recently from 588f8e4 to ff1078c Compare October 14, 2020 12:21
@anna-khakimova
Copy link
Copy Markdown
Member Author

@alalek please take a look one more

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 15, 2020

Remove unnecessary changes from the patch and make required builds green.

//! @endcond

} // cv:: No newline at end of file
} // cv::
Copy link
Copy Markdown
Member Author

@anna-khakimova anna-khakimova Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I can't unroll this changes. Visual Studio insert shift to next line automatically. However as I know file should be ended by empty line.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is configurable, can you please find this setting?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, git can unroll any file. We don't need editor for that.

git checkout upstream/master -- modules/core/include/opencv2/core/hal/intrin_forward.hpp

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalek Ok. Done.

@anna-khakimova
Copy link
Copy Markdown
Member Author

@alalek Sorry for inconvenient. Builds are passed now. Please look one more.

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 15, 2020

Test in AVX512 build (custom) is crashed.

@anna-khakimova
Copy link
Copy Markdown
Member Author

anna-khakimova commented Oct 15, 2020

Test in AVX512 build (custom) is crashed.

@alalek Custom build crashes constantly and on all PRs. It catches exception when try to get video that absent. It's known issue and it doesn't have relation with my PR.

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 15, 2020

This is not true.

  1. Exception != crash
  2. Current nighly AVX512 build doesn't crash on gapi test.

when try to get video that absent

you should configure your local environment properly.

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 15, 2020

GDB log of this crash is below.

Details
[ RUN      ] GComputationCompile.FluidReshapeWithDifferentDims

Thread 1 "opencv_test_gap" received signal SIGSEGV, Segmentation fault.
cv::gapi::fluid::add_simd_sametype<unsigned char, cv::hal_baseline::v_uint8x64> (length=<optimized out>, out=<optimized out>, in2=<optimized out>, in1=<optimized out>)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:242
242	            vx_store(&out[x], a + b);
(gdb) bt
#0  0x00007f7fc3a66253 in cv::gapi::fluid::add_simd_sametype<unsigned char, cv::hal_baseline::v_uint8x64> (length=<optimized out>, out=<optimized out>, in2=<optimized out>, in1=<optimized out>)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:242
#1  0x00007f7fc3a66253 in cv::gapi::fluid::add_simd<unsigned char, unsigned char> (length=<optimized out>, out=<optimized out>, in2=<optimized out>, in1=<optimized out>)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:266
#2  0x00007f7fc3a66253 in cv::gapi::fluid::run_arithm<unsigned char, unsigned char, unsigned char> (scale=1, arithm=cv::gapi::fluid::ARITHM_ADD, src2=..., src1=..., dst=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:467
#3  0x00007f7fc3a66253 in cv::gapi::fluid::GFluidAdd::run(cv::gapi::fluid::View const&, cv::gapi::fluid::View const&, int, cv::gapi::fluid::Buffer&) (dst=..., src2=..., src1=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidcore.cpp:501
#4  0x00007f7fc3a66253 in cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call_impl<0, 1, 2, 0>(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&, cv::detail::Seq<0, 1, 2>, cv::detail::Seq<0>) (out_bufs=..., in_args=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/include/opencv2/gapi/fluid/gfluidkernel.hpp:369
#5  0x00007f7fc3a66253 in cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&) (in_args=..., out_bufs=...) at /build/precommit_custom_linux/opencv/modules/gapi/include/opencv2/gapi/fluid/gfluidkernel.hpp:376
#6  0x00007f7fc39cbf73 in std::function<void (std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)>::operator()(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&) const (__args#1=std::vector of length 1, capacity 1 = {...}, __args#0=std::vector of length 3, capacity 3 = {...}, this=0x55e7d1384308) at /usr/include/c++/7/bits/std_function.h:706
#7  0x00007f7fc39cbf73 in cv::gimpl::FluidAgent::doWork() (this=0x55e7d13842f0) at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidbackend.cpp:506
#8  0x00007f7fc39d09d8 in cv::gimpl::GFluidExecutable::run(std::vector<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> >, std::allocator<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> > > >&, std::vector<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> >, std::allocator<std::pair<cv::gimpl::RcDesc, cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> > > >&) (this=0x55e7d1328260, input_objs=std::vector of length 1, capacity 1 = {...}, output_objs=std::vector of length 1, capacity 1 = {...}) at /build/precommit_custom_linux/opencv/modules/gapi/src/backends/fluid/gfluidbackend.cpp:1354
#9  0x00007f7fc38fc51e in cv::gimpl::GIslandExecutable::run(cv::gimpl::GIslandExecutable::IInput&, cv::gimpl::GIslandExecutable::IOutput&) (this=0x55e7d1328260, in=..., out=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/compiler/gislandmodel.cpp:371
#10 0x00007f7fc395af82 in cv::gimpl::GExecutor::run(cv::gimpl::GRuntimeArgs&&) (this=0x55e7d133c7b0, args=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/executor/gexecutor.cpp:336
#11 0x00007f7fc3916133 in cv::GCompiled::Priv::run(cv::gimpl::GRuntimeArgs&&) (this=<optimized out>, args=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/compiler/gcompiled.cpp:38
#12 0x00007f7fc3917070 in cv::GCompiled::operator()(std::vector<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame>, std::allocator<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> > >&&, std::vector<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef>, std::allocator<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> > >&&) (this=<optimized out>, ins=..., outs=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/compiler/gcompiled.cpp:107
#13 0x00007f7fc38874cd in cv::GComputation::apply(std::vector<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame>, std::allocator<cv::util::variant<cv::UMat, cv::RMat, std::shared_ptr<cv::gapi::wip::IStreamSource>, cv::Mat, cv::Scalar_<double>, cv::detail::VectorRef, cv::detail::OpaqueRef, cv::MediaFrame> > >&&, std::vector<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef>, std::allocator<cv::util::variant<cv::UMat*, cv::Mat*, cv::RMat*, cv::Scalar_<double>*, cv::detail::VectorRef, cv::detail::OpaqueRef> > >&&, std::vector<cv::GCompileArg, std::allocator<cv::GCompileArg> >&&) (this=this@entry=0x7ffc4c0ef7f0, ins=..., outs=..., args=...) at /build/precommit_custom_linux/opencv/modules/gapi/src/api/gcomputation.cpp:155
#14 0x00007f7fc3887c89 in cv::GComputation::apply(cv::Mat, cv::Mat&, std::vector<cv::GCompileArg, std::allocator<cv::GCompileArg> >&&) (this=this@entry=0x7ffc4c0ef7f0, in=..., out=..., args=...)
    at /build/precommit_custom_linux/opencv/modules/gapi/src/api/gcomputation.cpp:210
#15 0x000055e7ce9be02d in opencv_test::GComputationCompile_FluidReshapeWithDifferentDims_Test::Body() (this=<optimized out>) at /build/precommit_custom_linux/opencv/modules/gapi/test/internal/gapi_int_recompilation_test.cpp:84
#16 0x000055e7ce9b6370 in opencv_test::GComputationCompile_FluidReshapeWithDifferentDims_Test::TestBody() (this=0x55e7d123abd0) at /build/precommit_custom_linux/opencv/modules/gapi/test/internal/gapi_int_recompilation_test.cpp:75
#17 0x000055e7ceb442ea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (location=0x55e7cebce41c "the test body", method=<optimized out>, object=0x55e7d123abd0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3917
#18 0x000055e7ceb442ea in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (object=0x55e7d123abd0, method=<optimized out>, location=0x55e7cebce41c "the test body") at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3953
#19 0x000055e7ceb4453a in testing::Test::Run() (this=0x55e7d123abd0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3991
#20 0x000055e7ceb44858 in testing::TestInfo::Run() (this=0x55e7d055a9c0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:4167
#21 0x000055e7ceb44965 in testing::TestCase::Run() (this=0x55e7d055a420) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:4285
#22 0x000055e7ceb4a4c8 in testing::internal::UnitTestImpl::RunAllTests() (this=this@entry=0x55e7d05079b0) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:6660
#23 0x000055e7ceb4a6b1 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (location=0x55e7cebce828 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=<optimized out>) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3917
#24 0x000055e7ceb4a6b1 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (location=0x55e7cebce828 "auxiliary test code (environments or event listeners)", method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x55e7ceb4a050 <testing::internal::UnitTestImpl::RunAllTests()>, object=0x55e7d05079b0)
    at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:3953
#25 0x000055e7ceb4a6b1 in testing::UnitTest::Run() (this=0x55e7cef3fca0 <testing::UnitTest::GetInstance()::instance>) at /build/precommit_custom_linux/opencv/modules/ts/src/ts_gtest.cpp:6269
#26 0x000055e7ce554771 in RUN_ALL_TESTS() () at /build/precommit_custom_linux/opencv/modules/ts/include/opencv2/ts/ts_gtest.h:22228
#27 0x000055e7ce554771 in main(int, char**) (argc=<optimized out>, argv=0x7ffc4c0eff78) at /build/precommit_custom_linux/opencv/modules/gapi/test/test_main.cpp:12
(gdb) info registers 
rax            0xd53b40	13974336
rbx            0x55e7d1386f80	94454135943040
rcx            0xffffffe0	4294967264
rdx            0xd53b00	13974272
rsi            0x55e7d1382900	94454135924992
rdi            0x55e7d1382900	94454135924992
rbp            0x7ffc4c0eeab0	0x7ffc4c0eeab0
rsp            0x7ffc4c0eea40	0x7ffc4c0eea40
r8             0x55e7d138e500	94454135973120
r9             0x20	32
r10            0x3f	63
r11            0x0	0
r12            0x0	0
r13            0x55e7d1210498	94454134408344
r14            0x55e7d1386f80	94454135943040
r15            0x55e7d123c1d8	94454134587864
rip            0x7f7fc3a66253	0x7f7fc3a66253 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+307>
eflags         0x10202	[ IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
k0             0xffff	65535
k1             0xffff	65535
k2             0x24924924	613566756
k3             0xaaaaaaaaaaaaaaaa	12297829382473034410
k4             0xffff	65535
k5             0x1ff	511
k6             0x1ff	511
k7             0xfe00	65024

(gdb) x/16i $rip-0x13 
   0x7f7fc3a66240 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+288>:	mov    %eax,%edx
   0x7f7fc3a66242 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+290>:	add    $0x40,%eax
   0x7f7fc3a66245 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+293>:	vmovdqu8 (%rdi,%rdx,1),%zmm0
   0x7f7fc3a6624c <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+300>:	vpaddusb (%rsi,%rdx,1),%zmm0,%zmm0
=> 0x7f7fc3a66253 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+307>:	vmovdqu64 %zmm0,(%r8,%rdx,1)
   0x7f7fc3a6625a <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+314>:	cmp    %ecx,%eax
   0x7f7fc3a6625c <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+316>:	
    jbe    0x7f7fc3a66240 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+288>
   0x7f7fc3a6625e <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+318>:	cmp    $0x3f,%eax
   0x7f7fc3a66261 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+321>:	mov    %r10d,%edx
   0x7f7fc3a66264 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+324>:	cmovae %eax,%edx
   0x7f7fc3a66267 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+327>:	cmp    %edx,%r9d
   0x7f7fc3a6626a <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+330>:	
    jbe    0x7f7fc3a667f1 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+1745>
   0x7f7fc3a66270 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+336>:	test   %r11d,%r11d
   0x7f7fc3a66273 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+339>:	
    je     0x7f7fc3a667e9 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+1737>
   0x7f7fc3a66279 <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+345>:	mov    %ecx,%eax
   0x7f7fc3a6627b <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+347>:	
    jmp    0x7f7fc3a6625a <cv::detail::FluidCallHelper<cv::gapi::fluid::GFluidAdd, std::tuple<cv::GMat, cv::GMat, int>, std::tuple<cv::GMat>, false>::call(std::vector<cv::GArg, std::allocator<cv::GArg> > const&, std::vector<cv::gapi::fluid::Buffer*, std::allocator<cv::gapi::fluid::Buffer*> > const&)+314>

@anna-khakimova
Copy link
Copy Markdown
Member Author

Now test log from Custom build:
[ RUN ] StatefulKernel.StateIsAutoResetForNewStream
unknown file: Failure
C++ exception with description "OpenCV(4.5.0-pre) /home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp:54: error: (-215:Assertion failed) testDataPath != nullptr in function 'initTestDataPath'
" thrown in the test body.
[ FAILED ] StatefulKernel.StateIsAutoResetForNewStream (0 ms)

@anna-khakimova
Copy link
Copy Markdown
Member Author

anna-khakimova commented Oct 15, 2020

Now test log from Custom build:
[ RUN ] StatefulKernel.StateIsAutoResetForNewStream
unknown file: Failure
C++ exception with description "OpenCV(4.5.0-pre) /home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp:54: error: (-215:Assertion failed) testDataPath != nullptr in function 'initTestDataPath'
" thrown in the test body.
[ FAILED ] StatefulKernel.StateIsAutoResetForNewStream (0 ms)

It's mean that now Custom build tests fail only because of test infrastructure.

@alalek
Copy link
Copy Markdown
Member

alalek commented Oct 15, 2020

/home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp
It's mean that now Custom build tests fail because of test infrastructure.

There is no /home on CI test infrastructure, so it is not true again.

/cc @dmatveev Please help how to check build logs


Currently test app just hangs:

[ RUN      ] GComputationCompile.FluidReshapeWithDifferentDims

command timed out: 180 seconds without output running buildenv python ../opencv/modules/ts/misc/run.py --gtest_output=xml:results_test_gapi.xml -t gapi -a, attempting to kill
process killed by signal 15
program finished with exit code -1

@anna-khakimova
Copy link
Copy Markdown
Member Author

anna-khakimova commented Oct 15, 2020

/home/user/Projects/opencv/opencv/modules/gapi/test/cpu/gapi_ocv_stateful_kernel_tests.cpp
It's mean that now Custom build tests fail because of test infrastructure.

There is no /home on CI test infrastructure, so it is not true again.

/cc @dmatveev Please help how to check build logs

Currently test app just hangs:

[ RUN      ] GComputationCompile.FluidReshapeWithDifferentDims

command timed out: 180 seconds without output running buildenv python ../opencv/modules/ts/misc/run.py --gtest_output=xml:results_test_gapi.xml -t gapi -a, attempting to kill
process killed by signal 15
program finished with exit code -1

I've already run tests on AVX512 machine and show the log mentioned in previous comment.
There are only several failed tests as you can see in the log. They failed because of test infrastructure.
Please see that log.

}

if (x < length)
x = length - nlanes;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such tail processing requires that in != out. (However that's ok if such a check is performed somewhere higher by call stack)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Add check for cases when input and output types are the same.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmatveev Do we really need these checks in G-API code? Does G-API support inplace processing? If no, then it makes sense to add CV_DbgAssert() instead.

@anna-khakimova There are 7 similar loops in this patch. Commit contains 3 updates only. Why?

Copy link
Copy Markdown
Member Author

@anna-khakimova anna-khakimova Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalek
Answer for first question: This checks necessary to process last several elements of input array (tail) via univ intrinsics since their number is less than nlanes .
Answer for second question: As I've already mentioned in previous comment, for inplace implementation it's necessary that input and output array types should be the same. So, this check is needed only in 3 cases (in 3 functions which calls when input and output types are the same).
Note: Please read my comments more attentively.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does G-API support inplace processing?

Yes, we have to support inplace processing at least to avoid copy when calling cv::gapi::wip::draw::render().

Not sure if Fluid have ever been tested with such type of inplace execution, but the Fluid itself never forces input and output buffer to be the same.

Copy link
Copy Markdown
Member Author

@anna-khakimova anna-khakimova Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalek
There are no such cases in GAPI.
For AbsDiff there are cases such as:

  1. uchar inputs and uchar output
  2. short int inputs and short int output.
  3. ushort inputs and ushort output.
  4. float inputs and float outputs
    For all cases mentioned above there is check to detect inplace.

For Add and Sub there are cases such as:

  1. uchar inputs and uchar output <--- For this case there are checks for inplace.
  2. short int inputs and uchar output <---- There is no sense to check for inplace.
  3. float inputs and uchar output <------ There is no sense to check for inplace.
  4. short int inputs and short int output <------- For this case there are check for inplace.
  5. float inputs and float output <-------- For this case there are checks for inplace.
  6. uchar inputs and float output <------- There is no sense to check for inplace.
  7. short int inputs and float output <------- There is no sense to check for inplace.

As you can see, there are no cases with int at all and with int and float in particular.

These GAPI kernels don't support int type at all.

Inplace implementation is absent for these kernels. And case when user gives one the same int matrix for inputs, then cast this matrix to float and pass it to output is difficult to imagine in reality.

What's the point in adding checks for all occasions?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... for now.

Assertions verify assumptions which are required by related code below.
It is a really powerful tool.
This helps with investigations in the future through emitting error messages. This reduces annoying debugging process of related problems.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... for now.

Assertions verify assumptions which are required by related code below.
It is a really powerful tool.
This helps with investigations in the future through emitting error messages. This reduces annoying debugging process of related problems.

@alalek Ok. Done.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I would suggest to put CV_DbgAssert() for other cases.

when input and output types are the same

to catch cases when different types are inplaced (e.g. both float and int are 32-bit)

I've already put checks for inplace when input and output types are the same.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already put checks for inplace when input and output types are the same.

Please open an issue internally to start checking for inplace at the Fluid backend level.
Then we could remove such checks from the kernels (where it may be costly given that kernels are called for every line of an image)

@anna-khakimova
Copy link
Copy Markdown
Member Author

@alalek please review one more. All checks are passed.

@anna-khakimova
Copy link
Copy Markdown
Member Author

anna-khakimova commented Oct 20, 2020

@alalek @dmatveev, all comments were addressed, all checks were passed. Please take a look.

{
VT a = vx_load(&in1[x]);
VT b = vx_load(&in2[x]);
absdiff_store(out, a, b, x);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the only place where absdiff_store is used, or did I miss something ? If so - that is the point of separate function for this , why not inline it's body here ?

Copy link
Copy Markdown
Member Author

@anna-khakimova anna-khakimova Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anton-potapov
I forced to add these overloads because I realized that new universal intrinsic absdiffs() (added by me for v_uint8, v_uint16, v_float32 types earlier) work the same as already exist absdiff() for argument types mentioned above. So now I have to use only one absdiffs() for v_int16 type only. For the rest types (v_uint8, v_uint16, v_float32) I use absdiff(). And so these overloads are need here.

Copy link
Copy Markdown
Contributor

@dmatveev dmatveev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code can be really simplified using templates, but that's another story

@alalek alalek mentioned this pull request Nov 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants