performance tests for ORB by ilya-lavrenov · Pull Request #2 · vpisarev/opencv

ilya-lavrenov · 2014-03-13T19:32:55Z

No description provided.

update to master

[GSOC] Speeding-up AKAZE, part #2 (opencv#8951) * feature2d: instrument more functions used in AKAZE * rework Compute_Determinant_Hessian_Response * this takes 84% of time of Feature_Detection * run everything in parallel * compute Scharr kernels just once * compute sigma more efficiently * allocate all matrices in evolution without zeroing * features2d: add one bigger image to tests * now test have images: 600x768, 900x600 and 1385x700 to cover different resolutions * explicitly zero Lx and Ly * add Lflow and Lstep to evolution as in original AKAZE code * reworked computing keypoints orientation integrated faster function from https://github.com/h2suzuki/fast_akaze * use standard fastAtan2 instead of getAngle * compute keypoints orientation in parallel * fix visual studio warnings * replace some wrapped functions with direct calls to OpenCV functions * improved readability for people familiar with opencv * do not same image twice in base level * rework diffusity stencil * use one pass stencil for diffusity from https://github.com/h2suzuki/fast_akaze * improve locality in Create_Scale_Space * always compute determinat od hessian and spacial derivatives * this needs to be computed always as we need derivatives while computing descriptors * fixed tests of AKAZE with KAZE descriptors which have been affected by this Currently it computes all first and second order derivatives together and the determiant of the hessian. For descriptors it would be enough to compute just first order derivates, but it is not probably worth it optimize for scenario where descriptors and keypoints are computed separately, since it is already very inefficient. When computing keypoint and descriptors together it is faster to do it the current way (preserves locality). * parallelize non linear diffusion computation * do multiplication right in the nlp diffusity kernel * rework kfactor computation * get rid of sharing buffers when creating scale space pyramid, the performace impact is neglegible * features2d: initialize TBB scheduler in perf tests * ensures more stable output * more reasonable profiles, since the first call of parallel_for_ is not getting big performace hit * compute_kfactor: interleave finding of maximum and computing distance * no need to go twice through the data * start to use UMats in AKAZE to leverage OpenCl in the future * fixed bug that prevented computing determinant for scale pyramid of size 1 (just the base image) * all descriptors now support writing to uninitialized memory * use InputArray and OutputArray for input image and descriptors, allows to make use UMAt that user passes to us * enable use of all existing ocl paths in AKAZE * all parts that uses ocl-enabled functions should use ocl by now * imgproc: fix dispatching of IPP version when OCL is disabled * when OCL is disabled IPP version should be always prefered (even when the dst is UMat) * get rid of copy in DeterminantHessian response * this slows CPU version considerably * do no run in parallel when running with OCL * store derivations as UMat in pyramid * enables OCL path computing of determint hessian * will allow to compute descriptors on GPU in the future * port diffusivity to OCL * diffusivity itself is not a blocker, but this saves us downloading and uploading derivations * implement kernel for nonlinear scalar diffusion step * download the pyramid from GPU just once we don't want to downlaod matrices ad hoc from gpu when the function in AKAZE needs it. There is a HUGE mapping overhead and without shared memory support a LOT of unnecessary transfers. This maps/downloads matrices just once. * fix bug with uninitialized values in non linear diffusion * this was causing spurious segfaults in stitching tests due to propagation of NaNs * added new test, which checks for NaNs (added new debug asserts for NaNs) * valgrind now says everything is ok * add nonlinear diffusion step OCL implementation * Lt in pyramid changed to UMat, it will be downlaoded from GPU along with Lx, Ly * fix bug in pm_g2 kernel. OpenCV mangles dimensions passed to OpenCL, so we need to check for boundaries in each OCL kernel. * port computing of determinant to OCL * computing of determinant is not a blocker, but with this change we don't need to download all spatial derivatives to CPU, we only download determinant * make Ldet in the pyramid UMat, download it from CPU together with the other parts of the pyramid * add profiling macros * fix visual studio warning * instrument non_linear_diffusion * remove changes I have made to TEvolution * TEvolution is used only in KAZE now * Revert "features2d: initialize TBB scheduler in perf tests" This reverts commit ba81e2a.

* Update demosaicing.cpp Fixed calculation of Bs for non-green pixels. * Fixed cvtColor perf test for bayer VNG

The address sanitizer highlighted this issue in our code base. It looks like the code is currently grabbing a pointer to a temporary object and then performing operations on it. I printed some information right before the asan crash: eigensolver address: 0x7f0ad95032f0 eigensolver size: 4528 eig_vecs_ ptr: 0x7f0ad95045e0 eig_vecs_ offset: 4848 This shows that `eig_vecs_` points past the end of `eigensolver`. In other words, it points at the temporary object created by the `eigensolver.eigenvectors()` call. Compare the docs for `.eigenvalues()`: https://eigen.tuxfamily.org/dox/classEigen_1_1EigenSolver.html#a0f507ad7ab14797882f474ca8f2773e7 to the docs for `.eigenvectors()`: https://eigen.tuxfamily.org/dox/classEigen_1_1EigenSolver.html#a66288022802172e3ee059283b26201d7 The difference in return types is interesting. `.eigenvalues()` returns a reference. But `.eigenvectors()` returns a matrix. This patch here fixes the problem by saving the temporary object and then grabbing a pointer into it. This is a curated snippet of the original asan failure: ==12==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fc633704640 at pc 0x7fc64f7f1593 bp 0x7ffe8875fc90 sp 0x7ffe8875fc88 READ of size 8 at 0x7fc633704640 thread T0 #0 0x7fc64f7f1592 in cv::usac::EssentialMinimalSolverStewenius5ptsImpl::estimate(std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<cv::Mat, std::__1::allocator<cv::Mat> >&) const /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/usac/essential_solver.cpp:181:48 #1 0x7fc64f915d92 in cv::usac::EssentialEstimatorImpl::estimateModels(std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<cv::Mat, std::__1::allocator<cv::Mat> >&) const /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/usac/estimator.cpp:110:46 #2 0x7fc64fa74fb0 in cv::usac::Ransac::run(cv::Ptr<cv::usac::RansacOutput>&) /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/usac/ransac_solvers.cpp:152:58 #3 0x7fc64fa6cd8e in cv::usac::run(cv::Ptr<cv::usac::Model const> const&, cv::_InputArray const&, cv::_InputArray const&, int, cv::Ptr<cv::usac::RansacOutput>&, cv::_InputArray const&, cv::_InputArray const&, cv::_InputArray const&, cv::_InputArray const&) /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/usac/ransac_solvers.cpp:1010:16 #4 0x7fc64fa6fb46 in cv::usac::findEssentialMat(cv::_InputArray const&, cv::_InputArray const&, cv::_InputArray const&, int, double, double, cv::_OutputArray const&) /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/usac/ransac_solvers.cpp:527:9 #5 0x7fc64f3b5522 in cv::findEssentialMat(cv::_InputArray const&, cv::_InputArray const&, cv::_InputArray const&, int, double, double, int, cv::_OutputArray const&) /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/five-point.cpp:437:16 #6 0x7fc64f3b7e00 in cv::findEssentialMat(cv::_InputArray const&, cv::_InputArray const&, cv::_InputArray const&, int, double, double, cv::_OutputArray const&) /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/five-point.cpp:486:12 ... Address 0x7fc633704640 is located in stack of thread T0 at offset 17984 in frame #0 0x7fc64f7ed4ff in cv::usac::EssentialMinimalSolverStewenius5ptsImpl::estimate(std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<cv::Mat, std::__1::allocator<cv::Mat> >&) const /proc/self/cwd/external/com_github_opencv_opencv/modules/calib3d/src/usac/essential_solver.cpp:36 This frame has 63 object(s): [32, 56) 'coefficients' (line 38) [96, 384) 'ee' (line 55) ... [13040, 17568) 'eigensolver' (line 142) [17824, 17840) 'ref.tmp518' (line 143) [17856, 17872) 'ref.tmp523' (line 144) [17888, 19488) 'ref.tmp524' (line 144) <== Memory access at offset 17984 is inside this variable [19616, 19640) 'ref.tmp532' (line 169) ... The crash report says that we're accessing a temporary object from line 144 when we shouldn't be. Line 144 looks like this: https://github.com/opencv/opencv/blob/4.6.0/modules/calib3d/src/usac/essential_solver.cpp#L144 const auto * const eig_vecs_ = (double *) eigensolver.eigenvectors().real().data(); We are using version 4.6.0 for this, but the problem is present on the 4.x branch. Note that I am dropping the .real() call here. I think that is safe because of the code further down (line 277 in the most recent version): const int eig_i = 20 * i + 12; // eigen stores imaginary values too The code appears to expect to have to skip doubles for the imaginary parts of the complex numbers. Admittedly, I couldn't find a test case that exercised this code path to validate correctness.

Handle huge images in IPP distanceTransform opencv#24535 ### Pull Request Readiness Checklist * Do not use IPP for huge Mat (reproduced with opencv#23895 (comment) on `DIST_MASK_5`) I have observed two types of errors on the reproducer from the issue: 1. When `temp` is not allocated: ``` Thread 1 "app" received signal SIGSEGV, Segmentation fault. 0x00007ffff65dc755 in icv_l9_ownDistanceTransform_5x5_8u32f_C1R_21B_g9e9 () from /home/dkurtaev/opencv_install/bin/../lib/libopencv_imgproc.so.408 (gdb) bt #0 0x00007ffff65dc755 in icv_l9_ownDistanceTransform_5x5_8u32f_C1R_21B_g9e9 () from /home/dkurtaev/opencv_install/bin/../lib/libopencv_imgproc.so.408 #1 0x00007ffff659e8df in icv_l9_ippiDistanceTransform_5x5_8u32f_C1R () from /home/dkurtaev/opencv_install/bin/../lib/libopencv_imgproc.so.408 #2 0x00007ffff5c390f0 in cv::distanceTransform (_src=..., _dst=..., _labels=..., distType=2, maskSize=5, labelType=1) at /home/dkurtaev/opencv/modules/imgproc/src/distransform.cpp:854 #3 0x00007ffff5c396ef in cv::distanceTransform (_src=..., _dst=..., distanceType=2, maskSize=5, dstType=5) at /home/dkurtaev/opencv/modules/imgproc/src/distransform.cpp:903 #4 0x000055555555669e in main (argc=1, argv=0x7fffffffdef8) at /home/dkurtaev/main.cpp:18 ``` 2. When we keep `temp` allocated every time: ``` OpenCV(4.8.0-dev) Error: Assertion failed (udata < (uchar*)ptr && ((uchar*)ptr - udata) <= (ptrdiff_t)(sizeof(void*)+64)) in fastFree, file /home/dkurtaev/opencv/modules/core/src/alloc.cpp, line 191 terminate called after throwing an instance of 'cv::Exception' what(): OpenCV(4.8.0-dev) /home/dkurtaev/opencv/modules/core/src/alloc.cpp:191: error: (-215:Assertion failed) udata < (uchar*)ptr && ((uchar*)ptr - udata) <= (ptrdiff_t)(sizeof(void*)+64) in function 'fastFree' ``` * Try enable IPP for 3x3 (see opencv#15904) * Reduce memory footprint with IPP See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake

Use size_t when calculating size of all_points opencv#26650 Closes: opencv#26642 Asan log ``` ================================================================= ==41401==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fc55a02a3fc at pc 0x7fc58e304131 bp 0x7ffd54787b00 sp 0x7ffd54787af8 WRITE of size 4 at 0x7fc55a02a3fc thread T0 #0 0x7fc58e304130 in cv::QRDetectMulti::checkSets(std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >&) /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3726 #1 0x7fc58e3054b0 in cv::QRDetectMulti::localization() /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3829 #2 0x7fc58e308020 in cv::ImplContour::detectMulti(cv::_InputArray const&, cv::_OutputArray const&) const /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3987 #3 0x7fc58e30b5b1 in cv::ImplContour::detectAndDecodeMulti(cv::_InputArray const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, cv::_OutputArray const&, cv::_OutputArray const&) const /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:4176 #4 0x7fc58e28922f in cv::GraphicalCodeDetector::detectAndDecodeMulti(cv::_InputArray const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, cv::_OutputArray const&, cv::_OutputArray const&) const /home/fanta/source/opencv/modules/objdetect/src/graphical_code_detector.cpp:42 #5 0x5954e8 in Body /home/fanta/source/opencv/modules/objdetect/test/test_qrcode.cpp:48 #6 0x594fc0 in TestBody /home/fanta/source/opencv/modules/objdetect/test/test_qrcode.cpp:42 #7 0x67ee6a in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3919 #8 0x6734a4 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3955 #9 0x641fe8 in testing::Test::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3993 #10 0x6431ac in testing::TestInfo::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:4169 #11 0x643d15 in testing::TestCase::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:4287 #12 0x659ff3 in testing::internal::UnitTestImpl::RunAllTests() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:6662 #13 0x681205 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3919 #14 0x675127 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3955 #15 0x65734c in testing::UnitTest::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:6271 #16 0x5907f0 in RUN_ALL_TESTS() /home/fanta/source/opencv/modules/ts/include/opencv2/ts/ts_gtest.h:22240 #17 0x590cdd in main (/home/fanta/source/opencv-build-4.x-clang/bin/opencv_test_objdetect+0x590cdd) (BuildId: a9363fc788d57c48225fc0559ac9199d07d415db) #18 0x7fc58ab242ad in __libc_start_call_main (/lib64/libc.so.6+0x2a2ad) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7) opencv#19 0x7fc58ab24378 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2a378) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7) opencv#20 0x417014 in _start ../sysdeps/x86_64/start.S:115 0x7fc55a02a3fc is located 0 bytes after 2938510332-byte region [0x7fc4aadc8800,0x7fc55a02a3fc) allocated by thread T0 here: #0 0x7fc58e590298 in operator new(unsigned long) (/lib64/libasan.so.8+0xfd298) (BuildId: da72ee674d801ced58193987786b90646d94ff8d) #1 0x7fc58e34d010 in std::__new_allocator<cv::Vec<int, 3> >::allocate(unsigned long, void const*) /usr/include/c++/14/bits/new_allocator.h:151 SUMMARY: AddressSanitizer: heap-buffer-overflow /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3726 in cv::QRDetectMulti::checkSets(std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >&) Shadow bytes around the buggy address: 0x7fc55a02a100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7fc55a02a180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7fc55a02a200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7fc55a02a280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7fc55a02a300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x7fc55a02a380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00[04] 0x7fc55a02a400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x7fc55a02a480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x7fc55a02a500: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x7fc55a02a580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x7fc55a02a600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==41401==ABORTING ``` `(true_points_group[i].size()` is 1794 and `(true_points_group[i].size() - 2 ) * (true_points_group[i].size() - 1) * true_points_group[i].size())` is 5764222464 which overflows `int` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake

ilya-lavrenov added 2 commits March 13, 2014 23:30

performance tests for ORB

b47bec2

performance tests for FAST

4ebd343

ilya-lavrenov closed this Mar 14, 2014

vpisarev pushed a commit that referenced this pull request Jun 23, 2014

ipp: some functions are not available in the ICV package #2

ab691a4

vpisarev pushed a commit that referenced this pull request Dec 22, 2016

Merge pull request #2 from opencv/master

5ebdf6c

update to master

vpisarev pushed a commit that referenced this pull request Sep 13, 2019

Bayer VNG Demosaicing Fix #2 (Merge pull request opencv#15086)

d3cf0d2

* Update demosaicing.cpp Fixed calculation of Bs for non-green pixels. * Fixed cvtColor perf test for bayer VNG

vpisarev added a commit that referenced this pull request Oct 12, 2020

try to fix ARMv7/ARMv8 compile errors; attempt #2

2a901f6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance tests for ORB#2

performance tests for ORB#2
ilya-lavrenov wants to merge 2 commits intovpisarev:ocl_orbfrom
ilya-lavrenov:tapi_perf_orb

ilya-lavrenov commented Mar 13, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ilya-lavrenov commented Mar 13, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant