[GSOC] Speeding-up AKAZE, part #2 #8951

Merged
alalek merged 31 commits into opencv:master from hrnr:akaze_part2
Aug 1, 2017

@hrnr
Contributor

hrnr commented Jun 21, 2017

This part focuses on performance improvements for AKAZE on CPU and implementing basic OCL support.

cc: @bmagyar

OpenCL:

Some parts (mainly construction of the pyramid) execute OpenCL paths.

  • Create_Nonlinear_Scale_Space: almost all operations are OpenCL-enabled
    • GaussianBlur - executes OCL, but takes a less efficient path through sepFilter2D
    • Scharr - has OCL
    • resize - has OCL
    • kfactor estimation - needs a custom OCL kernel, might reuse some parts for computing the histogram
    • diffusion + conductivity - needs a custom OCL kernel
    • Compute_Determinant_Hessian_Response - executes the general OpenCL sepFilter2D; computing the determinant would need a custom kernel, but this is not currently a blocker
  • Feature_Detection: no OCL
  • Compute_Keypoints_Orientation: no OCL
  • Compute_Descriptors: no OCL

CPU status:

  • Create_Nonlinear_Scale_Space: reworked, some intrinsics might help with diffusion
    • Compute_Determinant_Hessian_Response: reworked, needs specialized fine-tuning for 5x5, 7x7 and 9x9 kernels
  • Feature_Detection: will be reworked together with the GPU part, since we might want the same keypoint format on GPU and CPU
  • Compute_Keypoints_Orientation: reworked
  • Compute_Descriptors: not a blocker

The main parts (and largest bottlenecks) are:

  • Create_Nonlinear_Scale_Space ~19% ~21ms
    • needs to be reworked
  • Feature_Detection ~52% ~60ms, of which:
    • Compute_Determinant_Hessian_Response 83% (over 43% of the whole algorithm)
      • reworked slightly -> reduced Feature_Detection to ~58ms
    • Find_Scale_Space_Extrema 16% (globally just 8%)
      • this might be worth looking into once the other parts are optimized
    • Do_Subpixel_Refinement 1%
      • not a bottleneck in the current state
  • Compute_Keypoints_Orientation ~21% ~24ms
    • reworked completely, replaced by a faster implementation, parallelized -> reduced to 3.69ms
  • Compute_Descriptors ~4% ~4.8ms
    • not a bottleneck in the current state
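
The per-run times and percentages in this list can be recomputed from the INITIAL instrumentation trace in this description. The sketch below assumes that `T:` is the time accumulated over all `C:` calls and that the global share is a node's `T:` divided by the root `detectAndCompute` total; the helper names are hypothetical and only illustrate the arithmetic.

```python
# Totals (ms) copied from the INITIAL trace; each top-level node was called 13 times.
RUNS = 13
ROOT_TOTAL_MS = 1513.67  # detectAndCompute

TOTALS_MS = {
    "Create_Nonlinear_Scale_Space": 281.87,
    "Feature_Detection": 785.74,
    "Compute_Keypoints_Orientation": 321.29,
    "Compute_Descriptors": 63.65,
}
HESSIAN_MS = 652.72  # Compute_Determinant_Hessian_Response, child of Feature_Detection


def per_run_ms(total_ms, runs=RUNS):
    """Average time of one run, assuming T: is summed over all runs."""
    return total_ms / runs


def share(part_ms, whole_ms):
    """Fraction of whole_ms spent in part_ms."""
    return part_ms / whole_ms


for name, total in TOTALS_MS.items():
    print(f"{name}: ~{per_run_ms(total):.1f} ms per run, "
          f"{share(total, ROOT_TOTAL_MS):.0%} of the whole algorithm")

# Local share (within the parent) versus global share (within the whole algorithm).
hessian_local = share(HESSIAN_MS, TOTALS_MS["Feature_Detection"])
hessian_global = share(HESSIAN_MS, ROOT_TOTAL_MS)
print(f"Hessian response: {hessian_local:.0%} of Feature_Detection, "
      f"{hessian_global:.0%} globally")
```

The global share of a nested node is its local share multiplied by the global share of its parent, which is how 83% of Feature_Detection turns into roughly 43% of the whole algorithm.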

The improvement in Compute_Determinant_Hessian_Response is of course not very
satisfying. The main problem is that AKAZE uses Scharr with a non-standard
kernel size (different from 3x3). Implementing Scharr with sepFilter2D is much
slower than the specialized cv::Scharr we have.

I have tried replacing the sepFilter2D implementation with cv::Scharr,
just to get an idea of the possible speedup. It is in a separate branch. With
cv::Scharr, Compute_Determinant_Hessian_Response went from ~47ms to ~11ms,
which is quite significant. However, Scharr cannot simply be swapped in like
that, so the tests fail. I have opened pablofdezalc/akaze#32 to get
some additional info on this.
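
For context, the standard 3x3 Scharr x-derivative is a separable filter: it is the outer product of the smoothing column [3, 10, 3] and the derivative row [-1, 0, 1]. The sketch below is plain Python, not the code of cv::Scharr or cv::sepFilter2D, and all helper names are hypothetical. It only checks that a row pass followed by a column pass gives the same result as the full 2D kernel; the larger kernels AKAZE uses are not covered here.

```python
SMOOTH = [3, 10, 3]   # smoothing taps, applied along y for the x-derivative
DERIV = [-1, 0, 1]    # central-difference taps, applied along x


def outer(col, row):
    """Outer product of a column vector and a row vector as a nested list."""
    return [[c * r for r in row] for c in col]


def correlate2d_valid(img, kernel):
    """Plain 2D correlation without border handling ('valid' region only)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


def separable_valid(img, row_kernel, col_kernel):
    """Horizontal pass with row_kernel, then vertical pass with col_kernel."""
    tmp = correlate2d_valid(img, [row_kernel])
    return correlate2d_valid(tmp, [[c] for c in col_kernel])


SCHARR_X = outer(SMOOTH, DERIV)

# Small integer test image so the comparison is exact.
img = [[(3 * x + 5 * y * y) % 17 for x in range(6)] for y in range(5)]
full = correlate2d_valid(img, SCHARR_X)
sep = separable_valid(img, DERIV, SMOOTH)
print(SCHARR_X)
print(full == sep)
```

The two-pass form needs 6 multiplications per output pixel instead of 9 for the 3x3 case, and the gap grows with kernel size, which is why a generic separable filter is the natural fallback for the non-3x3 kernels.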

Performance stats per commit:

I'm testing all CPU performance on Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz.


Geometric mean

                                                   Name of Test                                                        perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf
                                                                                                                    8200996b1  b12facf48  aa5a72b46  8cc0b286c  1d3f7fe9e  c13351891  76151e566  ea089a8ab  f9c2951fa  ba071d1ad  b12facf48  aa5a72b46  8cc0b286c  1d3f7fe9e  c13351891  76151e566  ea089a8ab  f9c2951fa  ba071d1ad
                                                                                                                                                                                                                                      vs         vs         vs         vs         vs         vs         vs         vs         vs
                                                                                                                                                                                                                                     perf       perf       perf       perf       perf       perf       perf       perf       perf
                                                                                                                                                                                                                                  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1
                                                                                                                                                                                                                                  (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor)
detectAndExtract::feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 71.572 ms  67.583 ms  44.742 ms  45.349 ms  42.336 ms  40.726 ms  38.732 ms  38.268 ms  43.283 ms  38.549 ms     1.06       1.60       1.58       1.69       1.76       1.85       1.87       1.65       1.86
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    56.269 ms  52.597 ms  34.432 ms  34.941 ms  32.298 ms  31.299 ms  29.619 ms  29.137 ms  31.197 ms  29.492 ms     1.07       1.63       1.61       1.74       1.80       1.90       1.93       1.80       1.91
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    273.459 ms 263.874 ms 163.169 ms 168.078 ms 153.912 ms 164.177 ms 147.598 ms 145.255 ms 156.495 ms 150.994 ms    1.04       1.68       1.63       1.78       1.67       1.85       1.88       1.75       1.81

If you want to reproduce these results on your own machine, you might find this performance evaluation script helpful. It automates running the perf tests for this pull request across many revisions and generates this summary, along with XML test outputs and the instrumentation output shown below.
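
The x-factor columns in the summary table above are the baseline time (8200996b1) divided by the time at the compared revision. A minimal sketch of that arithmetic for the last revision (ba071d1ad), with the timings copied from the table and a hypothetical helper name:

```python
# Times in ms, copied from the table: baseline 8200996b1 and latest ba071d1ad.
BASELINE_MS = {
    "leuven/img1.png": 71.572,
    "stitching/a3.png": 56.269,
    "stitching/s2.jpg": 273.459,
}
LATEST_MS = {
    "leuven/img1.png": 38.549,
    "stitching/a3.png": 29.492,
    "stitching/s2.jpg": 150.994,
}


def x_factor(baseline_ms, current_ms):
    """Speedup relative to the baseline; values above 1 mean faster."""
    return baseline_ms / current_ms


factors = {name: round(x_factor(BASELINE_MS[name], LATEST_MS[name]), 2)
           for name in BASELINE_MS}
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")
```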

Instrumentation output:

INITIAL:

Time compensation is 0
CTEST_FULL_OUTPUT
OpenCV version: 3.2.0-dev
OpenCV VCS version: 3.2.0-816-g28d66b332
Build type: release
Parallel framework: tbb
CPU features: popcnt mmx sse sse2 sse3 ssse3 sse4.1 sse4.2 avx avx2 fma3 fp16
cl_get_gt_device(): error, unknown device: ffffffff
cl_get_gt_device(): error, unknown device: ffffffff
cl_get_gt_device(): error, unknown device: ffffffff
OpenCL is disabled
Note: Google Test filter = feature2d_detectAndExtract.detectAndExtract/4
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from feature2d_detectAndExtract
[ RUN      ] feature2d_detectAndExtract.detectAndExtract/4
[ PERFSTAT ]    (samples = 13, mean = 114.95, median = 115.11, stddev = 1.16 (1.0%))
[ VALUE    ]    (AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")
[ TRACE    ]
ROOT
\---detectAndCompute - TC:1 C:13 T:1513.67ms
    |---convertTo - TC:1 C:13 T:2.22ms L:0% G:0%
    |   \---ipp_convertTo<W_IPP> - TC:1 C:13 T:2.18ms L:98% G:0%
    |       \---::ipp::iwiScale<F_IPP> - TC:1 C:13 T:2.17ms L:99% G:0%
    |
    |---zeros - TC:1 C:1664 T:0.75ms L:0% G:0%
    |---operator= - TC:1 C:1664 T:55.10ms L:4% G:4%
    |---Create_Nonlinear_Scale_Space - TC:1 C:13 T:281.87ms L:19% G:19%
    |   |---copyTo - TC:1 C:182 T:17.01ms L:6% G:1%
    |   |   \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:182 T:16.84ms L:99% G:1%
    |   |
    |   |---GaussianBlur - TC:1 C:221 T:50.43ms L:18% G:3%
    |   |   |---ipp_GaussianBlur<W_IPP> - TC:1 C:221 T:27.69ms L:55% G:2%
    |   |   |   |---parallel_for_ - TC:1 C:117 T:25.80ms L:93% G:2%
    |   |   |   |   \---operator() - TC:5 C:2912 T:90.64ms L:351% G:6%
    |   |   |   |       \---operator()<W_IPP> - TC:8 C:2912 T:73.70ms L:81% G:5%
    |   |   |   |           \---::ipp::iwiFilterGaussian<F_IPP> - TC:8 C:2593 T:60.24ms L:82% G:4%
    |   |   |   |               \---::ipp::iwiFilterGaussian - BadExit<MARK_IPP> - TC:8 C:97 T:0.05ms L:0% G:0%
    |   |   |   |
    |   |   |   \---::ipp::iwiFilterGaussian<F_IPP> - TC:1 C:104 T:1.61ms L:6% G:0%
    |   |   |
    |   |   \---sepFilter2D - TC:1 C:13 T:22.39ms L:44% G:1%
    |   |       |---convertTo - TC:1 C:26 T:0.09ms L:0% G:0%
    |   |       |   \---ipp_convertTo<W_IPP> - TC:1 C:26 T:0.06ms L:66% G:0%
    |   |       |       \---::ipp::iwiScale<F_IPP> - TC:1 C:26 T:0.03ms L:60% G:0%
    |   |       |
    |   |       \---apply - TC:1 C:13 T:21.93ms L:98% G:1%
    |   |           |---borderInterpolate - TC:1 C:23608 T:2.98ms L:14% G:0%
    |   |           \---ippiOperator<W_IPP> - TC:1 C:7800 T:8.71ms L:40% G:1%
    |   |               \---ippicviFilterRowBorderPipeline_32f_C1R<F_IPP> - TC:1 C:7800 T:6.43ms L:74% G:0%
    |   |
    |   |---zeros - TC:1 C:143 T:0.09ms L:0% G:0%
    |   |---operator= - TC:1 C:143 T:6.98ms L:2% G:0%
    |   |---Scharr - TC:1 C:416 T:31.74ms L:11% G:2%
    |   |   \---ipp_Deriv<W_IPP> - TC:1 C:416 T:31.45ms L:99% G:2%
    |   |       \---::ipp::iwiFilterScharr<F_IPP> - TC:1 C:416 T:31.04ms L:99% G:2%
    |   |
    |   |---parallel_for_ - TC:1 C:2158 T:72.07ms L:26% G:5%
    |   |   \---operator() - TC:5 C:1612 T:143.77ms L:199% G:9%
    |   |
    |   |---add - TC:1 C:2158 T:29.22ms L:10% G:2%
    |   |   \---ippicviAdd_32f_C1R<F_IPP> - TC:1 C:2158 T:27.56ms L:94% G:2%
    |   |
    |   \---resize - TC:1 C:39 T:2.46ms L:1% G:0%
    |       \---resize - TC:1 C:39 T:2.40ms L:97% G:0%
    |           |---ipp_resize<W_IPP> - TC:1 C:39 T:0.01ms L:1% G:0%
    |           |---parallel_for_ - TC:1 C:26 T:1.56ms L:65% G:0%
    |           |   \---operator() - TC:1 C:26 T:1.49ms L:96% G:0%
    |           |
    |           \---parallel_for_ - TC:1 C:13 T:0.65ms L:27% G:0%
    |
    |---Feature_Detection - TC:1 C:13 T:785.74ms L:52% G:52%
    |   |---Compute_Determinant_Hessian_Response - TC:1 C:13 T:652.72ms L:83% G:43%
    |   |   \---parallel_for_ - TC:1 C:13 T:625.91ms L:96% G:41%
    |   |       \---operator() - TC:1 C:208 T:4019.96ms L:642% G:266%
    |   |           |---copyTo - TC:8 C:2080 T:3.92ms L:0% G:0%
    |   |           |   \---ippicviCopy_8u_C1R_L<F_IPP> - TC:8 C:2080 T:1.48ms L:38% G:0%
    |   |           |
    |   |           |---sepFilter2D - TC:8 C:1040 T:1295.68ms L:32% G:86%
    |   |           |   |---convertTo - TC:8 C:2080 T:6.56ms L:1% G:0%
    |   |           |   |   \---ipp_convertTo<W_IPP> - TC:8 C:2080 T:4.00ms L:61% G:0%
    |   |           |   |       \---::ipp::iwiScale<F_IPP> - TC:8 C:2080 T:1.41ms L:35% G:0%
    |   |           |   |
    |   |           |   \---apply - TC:8 C:1040 T:1287.03ms L:99% G:85%
    |   |           |       |---borderInterpolate - TC:8 C:744900 T:207.46ms L:16% G:14%
    |   |           |       \---ippiOperator<W_IPP> - TC:8 C:219375 T:220.04ms L:17% G:15%
    |   |           |           \---ippicviFilterRowBorderPipeline_32f_C1R<F_IPP> - TC:8 C:219375 T:103.64ms L:47% G:7%
    |   |           |
    |   |           \---convertTo - TC:3 C:1040 T:59.62ms L:1% G:4%
    |   |               \---ipp_convertTo<W_IPP> - TC:3 C:1040 T:56.39ms L:95% G:4%
    |   |                   \---::ipp::iwiScale<F_IPP> - TC:3 C:1040 T:53.24ms L:94% G:4%
    |   |
    |   |---Find_Scale_Space_Extrema - TC:1 C:13 T:122.20ms L:16% G:8%
    |   \---Do_Subpixel_Refinement - TC:1 C:13 T:10.79ms L:1% G:1%
    |       \---solve - TC:1 C:23985 T:5.07ms L:47% G:0%
    |
    |---Compute_Keypoints_Orientation - TC:1 C:13 T:321.29ms L:21% G:21%
    |   \---fastAtan32f - TC:1 C:23946 T:14.92ms L:5% G:1%
    |       \---fastAtan32f - TC:1 C:23946 T:7.91ms L:53% G:1%
    |
    |---Compute_Descriptors - TC:1 C:13 T:63.65ms L:4% G:4%
    |   |---zeros - TC:1 C:13 T:0.01ms L:0% G:0%
    |   |---operator= - TC:1 C:13 T:0.06ms L:0% G:0%
    |   \---parallel_for_ - TC:1 C:13 T:63.48ms L:100% G:4%
    |       \---operator() - TC:8 C:10239 T:149.85ms L:236% G:10%
    |
    \---copyTo - TC:1 C:13 T:0.27ms L:0% G:0%
        \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:13 T:0.24ms L:89% G:0%

IPP weight: 20.2%
OPENCL weight: 0.0%
[/TRACE    ]
[       OK ] feature2d_detectAndExtract.detectAndExtract/4 (1528 ms)
[----------] 1 test from feature2d_detectAndExtract (1528 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1528 ms total)
[  PASSED  ] 1 test.

CURRENT (ba071d1):

Time compensation is 0
CTEST_FULL_OUTPUT
OpenCV version: 3.2.0-dev
OpenCV VCS version: 3.2.0-932-g6744b7cc3
Build type: release
Parallel framework: tbb
CPU features: popcnt mmx sse sse2 sse3 ssse3 sse4.1 sse4.2 avx avx2 fma3 fp16
cl_get_gt_device(): error, unknown device: ffffffff
cl_get_gt_device(): error, unknown device: ffffffff
cl_get_gt_device(): error, unknown device: ffffffff
OpenCL is disabled
Note: Google Test filter = feature2d_detectAndExtract.detectAndExtract/6:feature2d_detectAndExtract.detectAndExtract/7:feature2d_detectAndExtract.detectAndExtract/8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from feature2d_detectAndExtract
[ RUN      ] feature2d_detectAndExtract.detectAndExtract/6
[ PERFSTAT ]    (samples = 10, mean = 76.67, median = 76.70, stddev = 1.14 (1.5%))
[ VALUE    ]    (AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")
[ TRACE    ]
ROOT
\---detectAndCompute - TC:1 C:10 T:766.70ms
    |---convertTo - TC:1 C:10 T:1.90ms L:0% G:0%
    |   \---ipp_convertTo<W_IPP> - TC:1 C:10 T:1.86ms L:98% G:0%
    |       \---::ipp::iwiScale<F_IPP> - TC:1 C:10 T:1.84ms L:99% G:0%
    |
    |---Allocate_Memory_Evolution - TC:1 C:10 T:0.14ms L:0% G:0%
    |---Create_Nonlinear_Scale_Space - TC:1 C:10 T:575.78ms L:75% G:75%
    |   |---GaussianBlur - TC:1 C:170 T:16.65ms L:3% G:2%
    |   |   \---ipp_GaussianBlur<W_IPP> - TC:1 C:170 T:16.28ms L:98% G:2%
    |   |       |---parallel_for_ - TC:1 C:90 T:14.42ms L:89% G:2%
    |   |       |   \---operator() - TC:8 C:2240 T:12.98ms L:90% G:2%
    |   |       |       \---operator()<W_IPP> - TC:8 C:2240 T:12.47ms L:96% G:2%
    |   |       |           \---::ipp::iwiFilterGaussian<F_IPP> - TC:8 C:2240 T:11.99ms L:96% G:2%
    |   |       |
    |   |       \---::ipp::iwiFilterGaussian<F_IPP> - TC:1 C:80 T:1.52ms L:9% G:0%
    |   |
    |   |---copyTo - TC:1 C:10 T:1.63ms L:0% G:0%
    |   |---copyTo - TC:1 C:120 T:9.20ms L:2% G:1%
    |   |   \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:120 T:8.88ms L:97% G:1%
    |   |
    |   |---compute_kcontrast - TC:1 C:10 T:17.30ms L:3% G:2%
    |   |   \---convertTo - TC:1 C:10 T:0.66ms L:4% G:0%
    |   |       \---ipp_convertTo<W_IPP> - TC:1 C:10 T:0.64ms L:97% G:0%
    |   |           \---::ipp::iwiScale<F_IPP> - TC:1 C:10 T:0.63ms L:98% G:0%
    |   |
    |   |---Scharr - TC:1 C:320 T:26.78ms L:5% G:3%
    |   |   \---ipp_Deriv<W_IPP> - TC:1 C:320 T:26.43ms L:99% G:3%
    |   |       \---::ipp::iwiFilterScharr<F_IPP> - TC:1 C:320 T:26.03ms L:98% G:3%
    |   |
    |   |---pm_g2 - TC:1 C:150 T:11.74ms L:2% G:2%
    |   |---parallel_for_ - TC:1 C:1660 T:269.44ms L:47% G:35%
    |   |   \---operator() - TC:8 C:253923 T:205.26ms L:76% G:27%
    |   |       \---nld_step_scalar_one_lane - TC:8 C:253923 T:88.20ms L:43% G:12%
    |   |
    |   |---add - TC:1 C:1660 T:25.94ms L:5% G:3%
    |   |   \---ippicviAdd_32f_C1R<F_IPP> - TC:1 C:1660 T:23.99ms L:92% G:3%
    |   |
    |   |---resize - TC:1 C:30 T:1.99ms L:0% G:0%
    |   |   \---resize - TC:1 C:30 T:1.88ms L:94% G:0%
    |   |       |---ipp_resize<W_IPP> - TC:1 C:30 T:0.01ms L:1% G:0%
    |   |       |---parallel_for_ - TC:1 C:20 T:1.09ms L:58% G:0%
    |   |       |   \---operator() - TC:2 C:20 T:0.80ms L:73% G:0%
    |   |       |
    |   |       \---parallel_for_ - TC:1 C:10 T:0.63ms L:33% G:0%
    |   |
    |   \---Compute_Determinant_Hessian_Response - TC:1 C:10 T:193.26ms L:34% G:25%
    |       \---parallel_for_ - TC:1 C:10 T:193.24ms L:100% G:25%
    |           \---operator() - TC:8 C:160 T:193.08ms L:100% G:25%
    |               |---compute_derivative_kernels - TC:8 C:320 T:1.17ms L:1% G:0%
    |               |   \---copyTo - TC:8 C:640 T:0.72ms L:62% G:0%
    |               |       \---ippicviCopy_8u_C1R_L<F_IPP> - TC:8 C:640 T:0.25ms L:35% G:0%
    |               |
    |               \---sepFilter2D - TC:8 C:800 T:186.08ms L:96% G:24%
    |                   |---convertTo - TC:8 C:1600 T:2.43ms L:1% G:0%
    |                   |   \---ipp_convertTo<W_IPP> - TC:8 C:1600 T:1.42ms L:58% G:0%
    |                   |       \---::ipp::iwiScale<F_IPP> - TC:8 C:1600 T:0.45ms L:32% G:0%
    |                   |
    |                   \---apply - TC:8 C:800 T:183.31ms L:99% G:24%
    |                       \---ippiOperator<W_IPP> - TC:8 C:168750 T:113.84ms L:62% G:15%
    |                           \---ippicviFilterRowBorderPipeline_32f_C1R<F_IPP> - TC:8 C:168750 T:57.25ms L:50% G:7%
    |
    |---Feature_Detection - TC:1 C:10 T:96.31ms L:13% G:13%
    |   |---Find_Scale_Space_Extrema - TC:1 C:10 T:87.95ms L:91% G:11%
    |   \---Do_Subpixel_Refinement - TC:1 C:10 T:8.33ms L:9% G:1%
    |       \---solve - TC:1 C:18400 T:3.88ms L:46% G:1%
    |
    |---Compute_Keypoints_Orientation - TC:1 C:10 T:32.10ms L:4% G:4%
    |   \---parallel_for_ - TC:1 C:10 T:32.10ms L:100% G:4%
    |       \---operator() - TC:8 C:7555 T:30.50ms L:95% G:4%
    |           \---fastAtan2 - TC:8 C:18360 T:18.38ms L:60% G:2%
    |               \---fastAtan32f - TC:8 C:18360 T:11.48ms L:62% G:1%
    |                   \---fastAtan32f - TC:8 C:18360 T:4.18ms L:36% G:1%
    |
    \---Compute_Descriptors - TC:1 C:10 T:54.78ms L:7% G:7%
        \---parallel_for_ - TC:1 C:10 T:54.71ms L:100% G:7%
            \---operator() - TC:8 C:8067 T:54.17ms L:99% G:7%

IPP weight: 17.3%
OPENCL weight: 0.0%
[/TRACE    ]
[       OK ] feature2d_detectAndExtract.detectAndExtract/6 (779 ms)
[ RUN      ] feature2d_detectAndExtract.detectAndExtract/7
[ PERFSTAT ]    (samples = 10, mean = 66.97, median = 67.19, stddev = 0.65 (1.0%))
[ VALUE    ]    (AKAZE_DEFAULT, "stitching/a3.png")
[ TRACE    ]
ROOT
\---detectAndCompute - TC:1 C:10 T:669.65ms
    |---convertTo - TC:1 C:10 T:1.56ms L:0% G:0%
    |   \---ipp_convertTo<W_IPP> - TC:1 C:10 T:1.52ms L:98% G:0%
    |       \---::ipp::iwiScale<F_IPP> - TC:1 C:10 T:1.50ms L:99% G:0%
    |
    |---Allocate_Memory_Evolution - TC:1 C:10 T:0.10ms L:0% G:0%
    |---Create_Nonlinear_Scale_Space - TC:1 C:10 T:539.96ms L:81% G:81%
    |   |---GaussianBlur - TC:1 C:130 T:13.03ms L:2% G:2%
    |   |   \---ipp_GaussianBlur<W_IPP> - TC:1 C:130 T:12.80ms L:98% G:2%
    |   |       |---parallel_for_ - TC:1 C:90 T:11.45ms L:89% G:2%
    |   |       |   \---operator() - TC:8 C:2080 T:10.16ms L:89% G:2%
    |   |       |       \---operator()<W_IPP> - TC:8 C:2080 T:9.68ms L:95% G:1%
    |   |       |           \---::ipp::iwiFilterGaussian<F_IPP> - TC:8 C:2080 T:9.24ms L:95% G:1%
    |   |       |
    |   |       \---::ipp::iwiFilterGaussian<F_IPP> - TC:1 C:40 T:1.10ms L:9% G:0%
    |   |
    |   |---copyTo - TC:1 C:10 T:1.35ms L:0% G:0%
    |   |---copyTo - TC:1 C:90 T:5.47ms L:1% G:1%
    |   |   \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:90 T:5.27ms L:96% G:1%
    |   |
    |   |---compute_kcontrast - TC:1 C:10 T:14.72ms L:3% G:2%
    |   |   \---convertTo - TC:1 C:10 T:0.56ms L:4% G:0%
    |   |       \---ipp_convertTo<W_IPP> - TC:1 C:10 T:0.55ms L:97% G:0%
    |   |           \---::ipp::iwiScale<F_IPP> - TC:1 C:10 T:0.54ms L:98% G:0%
    |   |
    |   |---Scharr - TC:1 C:240 T:19.86ms L:4% G:3%
    |   |   \---ipp_Deriv<W_IPP> - TC:1 C:240 T:19.62ms L:99% G:3%
    |   |       \---::ipp::iwiFilterScharr<F_IPP> - TC:1 C:240 T:19.32ms L:98% G:3%
    |   |
    |   |---pm_g2 - TC:1 C:110 T:7.88ms L:1% G:1%
    |   |---parallel_for_ - TC:1 C:760 T:239.34ms L:44% G:36%
    |   |   \---operator() - TC:8 C:227591 T:183.49ms L:77% G:27%
    |   |       \---nld_step_scalar_one_lane - TC:8 C:227591 T:74.65ms L:41% G:11%
    |   |
    |   |---add - TC:1 C:760 T:19.87ms L:4% G:3%
    |   |   \---ippicviAdd_32f_C1R<F_IPP> - TC:1 C:760 T:18.89ms L:95% G:3%
    |   |
    |   |---resize - TC:1 C:20 T:0.87ms L:0% G:0%
    |   |   \---resize - TC:1 C:20 T:0.81ms L:93% G:0%
    |   |       |---ipp_resize<W_IPP> - TC:1 C:20 T:0.01ms L:1% G:0%
    |   |       \---parallel_for_ - TC:1 C:20 T:0.74ms L:91% G:0%
    |   |           \---operator() - TC:2 C:20 T:0.46ms L:62% G:0%
    |   |
    |   \---Compute_Determinant_Hessian_Response - TC:1 C:10 T:216.48ms L:40% G:32%
    |       \---parallel_for_ - TC:1 C:10 T:216.47ms L:100% G:32%
    |           \---operator() - TC:8 C:120 T:216.33ms L:100% G:32%
    |               |---compute_derivative_kernels - TC:8 C:240 T:0.91ms L:0% G:0%
    |               |   \---copyTo - TC:8 C:480 T:0.58ms L:64% G:0%
    |               |       \---ippicviCopy_8u_C1R_L<F_IPP> - TC:8 C:480 T:0.21ms L:36% G:0%
    |               |
    |               \---sepFilter2D - TC:8 C:600 T:210.64ms L:97% G:31%
    |                   |---convertTo - TC:8 C:1200 T:2.11ms L:1% G:0%
    |                   |   \---ipp_convertTo<W_IPP> - TC:8 C:1200 T:1.22ms L:58% G:0%
    |                   |       \---::ipp::iwiScale<F_IPP> - TC:8 C:1200 T:0.42ms L:35% G:0%
    |                   |
    |                   \---apply - TC:8 C:600 T:208.67ms L:99% G:31%
    |                       \---ippiOperator<W_IPP> - TC:8 C:201600 T:140.09ms L:67% G:21%
    |                           \---ippicviFilterRowBorderPipeline_32f_C1R<F_IPP> - TC:8 C:201600 T:65.83ms L:47% G:10%
    |
    |---Feature_Detection - TC:1 C:10 T:61.85ms L:9% G:9%
    |   |---Find_Scale_Space_Extrema - TC:1 C:10 T:55.59ms L:90% G:8%
    |   \---Do_Subpixel_Refinement - TC:1 C:10 T:6.24ms L:10% G:1%
    |       \---solve - TC:1 C:14180 T:2.97ms L:48% G:0%
    |
    |---Compute_Keypoints_Orientation - TC:1 C:10 T:25.25ms L:4% G:4%
    |   \---parallel_for_ - TC:1 C:10 T:25.24ms L:100% G:4%
    |       \---operator() - TC:8 C:6788 T:23.82ms L:94% G:4%
    |           \---fastAtan2 - TC:8 C:14180 T:14.28ms L:60% G:2%
    |               \---fastAtan32f - TC:8 C:14180 T:8.90ms L:62% G:1%
    |                   \---fastAtan32f - TC:8 C:14180 T:3.32ms L:37% G:0%
    |
    \---Compute_Descriptors - TC:1 C:10 T:40.64ms L:6% G:6%
        \---parallel_for_ - TC:1 C:10 T:40.59ms L:100% G:6%
            \---operator() - TC:8 C:7038 T:40.12ms L:99% G:6%

IPP weight: 18.3%
OPENCL weight: 0.0%
[/TRACE    ]
[       OK ] feature2d_detectAndExtract.detectAndExtract/7 (682 ms)
[ RUN      ] feature2d_detectAndExtract.detectAndExtract/8
[ PERFSTAT ]    (samples = 10, mean = 196.35, median = 195.55, stddev = 1.82 (0.9%))
[ VALUE    ]    (AKAZE_DEFAULT, "stitching/s2.jpg")
[ TRACE    ]
ROOT
\---detectAndCompute - TC:1 C:10 T:1963.44ms
    |---convertTo - TC:1 C:10 T:2.85ms L:0% G:0%
    |   \---ipp_convertTo<W_IPP> - TC:1 C:10 T:2.82ms L:99% G:0%
    |       \---::ipp::iwiScale<F_IPP> - TC:1 C:10 T:2.80ms L:99% G:0%
    |
    |---Allocate_Memory_Evolution - TC:1 C:10 T:0.13ms L:0% G:0%
    |---Create_Nonlinear_Scale_Space - TC:1 C:10 T:779.65ms L:40% G:40%
    |   |---GaussianBlur - TC:1 C:170 T:33.91ms L:4% G:2%
    |   |   \---ipp_GaussianBlur<W_IPP> - TC:1 C:170 T:33.45ms L:99% G:2%
    |   |       |---parallel_for_ - TC:1 C:90 T:30.99ms L:93% G:2%
    |   |       |   \---operator() - TC:8 C:2720 T:29.24ms L:94% G:1%
    |   |       |       \---operator()<W_IPP> - TC:8 C:2720 T:28.67ms L:98% G:1%
    |   |       |           \---::ipp::iwiFilterGaussian<F_IPP> - TC:8 C:2720 T:28.17ms L:98% G:1%
    |   |       |
    |   |       \---::ipp::iwiFilterGaussian<F_IPP> - TC:1 C:80 T:2.07ms L:6% G:0%
    |   |
    |   |---copyTo - TC:1 C:10 T:3.12ms L:0% G:0%
    |   |---copyTo - TC:1 C:120 T:16.78ms L:2% G:1%
    |   |   \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:120 T:16.39ms L:98% G:1%
    |   |
    |   |---compute_kcontrast - TC:1 C:10 T:31.24ms L:4% G:2%
    |   |   \---convertTo - TC:1 C:10 T:1.23ms L:4% G:0%
    |   |       \---ipp_convertTo<W_IPP> - TC:1 C:10 T:1.21ms L:98% G:0%
    |   |           \---::ipp::iwiScale<F_IPP> - TC:1 C:10 T:1.20ms L:99% G:0%
    |   |
    |   |---Scharr - TC:1 C:320 T:47.77ms L:6% G:2%
    |   |   \---ipp_Deriv<W_IPP> - TC:1 C:320 T:47.43ms L:99% G:2%
    |   |       \---::ipp::iwiFilterScharr<F_IPP> - TC:1 C:320 T:46.99ms L:99% G:2%
    |   |
    |   |---pm_g2 - TC:1 C:150 T:21.81ms L:3% G:1%
    |   |---parallel_for_ - TC:1 C:1660 T:311.89ms L:40% G:16%
    |   |   \---operator() - TC:8 C:289449 T:242.03ms L:78% G:12%
    |   |       \---nld_step_scalar_one_lane - TC:8 C:289449 T:115.17ms L:48% G:6%
    |   |
    |   |---add - TC:1 C:1660 T:50.06ms L:6% G:3%
    |   |   \---ippicviAdd_32f_C1R<F_IPP> - TC:1 C:1660 T:48.02ms L:96% G:2%
    |   |
    |   |---resize - TC:1 C:30 T:7.97ms L:1% G:0%
    |   |   \---resize - TC:1 C:30 T:7.88ms L:99% G:0%
    |   |       |---ipp_resize<W_IPP> - TC:1 C:30 T:0.01ms L:0% G:0%
    |   |       |---parallel_for_ - TC:1 C:20 T:7.29ms L:93% G:0%
    |   |       |   \---operator() - TC:4 C:40 T:6.14ms L:84% G:0%
    |   |       |
    |   |       \---parallel_for_ - TC:1 C:10 T:0.30ms L:4% G:0%
    |   |
    |   \---Compute_Determinant_Hessian_Response - TC:1 C:10 T:253.08ms L:32% G:13%
    |       \---parallel_for_ - TC:1 C:10 T:253.07ms L:100% G:13%
    |           \---operator() - TC:8 C:160 T:252.89ms L:100% G:13%
    |               |---compute_derivative_kernels - TC:8 C:320 T:1.20ms L:0% G:0%
    |               |   \---copyTo - TC:8 C:640 T:0.74ms L:62% G:0%
    |               |       \---ippicviCopy_8u_C1R_L<F_IPP> - TC:8 C:640 T:0.24ms L:32% G:0%
    |               |
    |               \---sepFilter2D - TC:8 C:800 T:243.93ms L:96% G:12%
    |                   |---convertTo - TC:8 C:1600 T:2.40ms L:1% G:0%
    |                   |   \---ipp_convertTo<W_IPP> - TC:8 C:1600 T:1.38ms L:58% G:0%
    |                   |       \---::ipp::iwiScale<F_IPP> - TC:8 C:1600 T:0.46ms L:33% G:0%
    |                   |
    |                   \---apply - TC:8 C:800 T:241.43ms L:99% G:12%
    |                       \---ippiOperator<W_IPP> - TC:8 C:196800 T:144.94ms L:60% G:7%
    |                           \---ippicviFilterRowBorderPipeline_32f_C1R<F_IPP> - TC:8 C:196800 T:82.44ms L:57% G:4%
    |
    |---Feature_Detection - TC:1 C:10 T:824.76ms L:42% G:42%
    |   |---Find_Scale_Space_Extrema - TC:1 C:10 T:790.53ms L:96% G:40%
    |   \---Do_Subpixel_Refinement - TC:1 C:10 T:34.20ms L:4% G:2%
    |       \---solve - TC:1 C:75310 T:15.64ms L:46% G:1%
    |
    |---Compute_Keypoints_Orientation - TC:1 C:10 T:119.35ms L:6% G:6%
    |   \---parallel_for_ - TC:1 C:10 T:119.33ms L:100% G:6%
    |       \---operator() - TC:8 C:10596 T:117.30ms L:98% G:6%
    |           \---fastAtan2 - TC:8 C:75200 T:72.71ms L:62% G:4%
    |               \---fastAtan32f - TC:8 C:75200 T:45.09ms L:62% G:2%
    |                   \---fastAtan32f - TC:8 C:75200 T:16.34ms L:36% G:1%
    |
    \---Compute_Descriptors - TC:1 C:10 T:228.30ms L:12% G:12%
        \---parallel_for_ - TC:1 C:10 T:228.09ms L:100% G:12%
            \---operator() - TC:8 C:10921 T:227.32ms L:100% G:12%

IPP weight: 11.7%
OPENCL weight: 0.0%
[/TRACE    ]
[       OK ] feature2d_detectAndExtract.detectAndExtract/8 (1988 ms)
[----------] 3 tests from feature2d_detectAndExtract (3449 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (3449 ms total)
[  PASSED  ] 3 tests.

Failed branches:

These are branches that contain code that is faster, but not suitable for
inclusion in the main branch (there might be failing tests etc.):

test_scharr: I have tried to
replace the Scharr operator in Compute_Determinant_Hessian_Response with Scharr
with a fixed 3x3 kernel.

akaze_octaves: Reworked the
non-linear scale space pyramid so that diffusivity is propagated only inside
octaves. Probably not worth it, since it damages accuracy.

@sovrasov sovrasov added the GSoC label Jun 21, 2017
@alalek
Member

alalek commented Jun 21, 2017

@hrnr Please keep patches in a single PR: #8869

# git commit pending changes
git checkout -B akaze_part1 HEAD
git push origin akaze_part1

@alalek
Member

alalek commented Jun 21, 2017

OK, we merged part 1 (#8869). Let's continue here.
Please rebase the latest commits on the current master.

@hrnr
Contributor Author

hrnr commented Jun 21, 2017

Thanks. Sorry for the confusion; I should have added more description to clarify the situation.

I will rebase for sure.

@ysolovyov
Contributor

How much does it speed up the algorithm on your machine?

@hrnr
Contributor Author

hrnr commented Jun 22, 2017

This is work in progress. I will update the description with my measurements. The current improvement is minor.

edit: description updated, please see the stats above. The current speedup is ~1.7x.

@hrnr
Contributor Author

hrnr commented Jun 30, 2017

Rebased due to a merge conflict. The commit hashes are now different from those reported above in the perf stats (I will fix that in the future).

hrnr added 9 commits June 30, 2017 12:06
* tests now use images of 600x768, 900x600 and 1385x700 to cover different resolutions
* this takes 84% of time of Feature_Detection
* run everything in parallel
* compute Scharr kernels just once
* compute sigma more efficiently
* allocate all matrices in evolution without zeroing
* add Lflow and Lstep to evolution as in original AKAZE code
* improved readability for people familiar with opencv
* do not same image twice in base level
hrnr added 11 commits June 30, 2017 23:37
* use one pass stencil for diffusivity from https://github.com/h2suzuki/fast_akaze
* improve locality in Create_Scale_Space
* this needs to be computed always as we need derivatives while computing descriptors
* fixed tests of AKAZE with KAZE descriptors which have been affected by this

Currently it computes all first and second order derivatives together with the determinant of the Hessian. For descriptors it would be enough to compute just the first order derivatives, but it is probably not worth optimizing for the scenario where descriptors and keypoints are computed separately, since that is already very inefficient. When computing keypoints and descriptors together it is faster to do it the current way (preserves locality).
* get rid of sharing buffers when creating scale space pyramid, the performance impact is negligible
* ensures more stable output
* more reasonable profiles, since the first call of parallel_for_ is not getting a big performance hit
* fixed bug that prevented computing determinant for scale pyramid of size 1 (just the base image)
* all descriptors now support writing to uninitialized memory
* use InputArray and OutputArray for input image and descriptors, which allows us to make use of a UMat that the user passes to us
* all parts that use OCL-enabled functions should use OCL by now
* when OCL is disabled the IPP version should always be preferred (even when the dst is UMat)
* this slows CPU version considerably
* do not run in parallel when running with OCL
@hrnr
Contributor Author

hrnr commented Jul 10, 2017

I have evaluated the option of using CV_8U for images and derivatives in AKAZE. It does not seem to be a viable path. The precision is badly affected: in our tests only 40 keypoints were found out of 507.

A viable option might be to use half-precision floats when they become widely available.
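
One plausible contributor to the precision loss, sketched below as an assumption rather than as an analysis of the actual experiment: an unsigned 8-bit store clamps values to [0, 255], so a signed derivative loses its sign and large responses saturate. The helpers are hypothetical, and `saturate_u8` only roughly mimics what a saturating 8-bit conversion does.

```python
def saturate_u8(value):
    """Round and clamp to [0, 255], roughly what a saturating 8-bit store does."""
    return max(0, min(255, int(round(value))))


def scharr_x_center(rows):
    """3x3 Scharr x-derivative at the center of three 3-pixel rows."""
    top, mid, bottom = rows
    return (3 * (top[2] - top[0])
            + 10 * (mid[2] - mid[0])
            + 3 * (bottom[2] - bottom[0]))


# Assumed 8-bit input patches: a strong edge, a gentle ramp, and their mirror images.
strong_rising = [[0, 128, 255]] * 3
strong_falling = [[255, 128, 0]] * 3
gentle_rising = [[10, 11, 12]] * 3
gentle_falling = [[12, 11, 10]] * 3

for name, patch in [("strong rising", strong_rising),
                    ("strong falling", strong_falling),
                    ("gentle rising", gentle_rising),
                    ("gentle falling", gentle_falling)]:
    exact = scharr_x_center(patch)
    print(f"{name}: exact {exact}, stored as 8-bit {saturate_u8(exact)}")
```

In this toy setup the strong edge saturates at 255 and both falling edges collapse to 0, so the stored derivative can no longer distinguish a falling edge from a flat region.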

[ RUN      ] Features2d_DescriptorExtractor_AKAZE.regression
/home/henry/.opencv/modules/ts/src/ts.cpp:541: Failure
Failed

        failure reason: Invalid test data
        test case #-1
        seed: ffffffffffffffff
-----------------------------------
        LOG:

Average time of computing one descriptor = 4.13323e-06 ms.
Valid and computed descriptors matrices must have the same size and type.

-----------------------------------

[  FAILED  ] Features2d_DescriptorExtractor_AKAZE.regression (357 ms)
[----------] 1 test from Features2d_DescriptorExtractor_AKAZE (357 ms total)

[----------] 1 test from Features2d_DescriptorExtractor_AKAZE_DESCRIPTOR_KAZE
[ RUN      ] Features2d_DescriptorExtractor_AKAZE_DESCRIPTOR_KAZE.regression
/home/henry/.opencv/modules/ts/src/ts.cpp:541: Failure
Failed

        failure reason: Invalid test data
        test case #-1
        seed: ffffffffffffffff
-----------------------------------
        LOG:

Average time of computing one descriptor = 5.14768e-06 ms.
Valid and computed descriptors matrices must have the same size and type.

-----------------------------------

[  FAILED  ] Features2d_DescriptorExtractor_AKAZE_DESCRIPTOR_KAZE.regression (365 ms)
[----------] 1 test from Features2d_DescriptorExtractor_AKAZE_DESCRIPTOR_KAZE (365 ms total)

[----------] 2 tests from Features2d_Detector_AKAZE
[ RUN      ] Features2d_Detector_AKAZE.regression
/home/henry/.opencv/modules/ts/src/ts.cpp:541: Failure
Failed

        failure reason: Invalid function output
        test case #-1
        seed: ffffffffffffffff
-----------------------------------
        LOG:
Bad keypoints count ratio (validCount = 507, calcCount = 40).

-----------------------------------

[  FAILED  ] Features2d_Detector_AKAZE.regression (280 ms)
[ RUN      ] Features2d_Detector_AKAZE.detect_and_compute_split
[       OK ] Features2d_Detector_AKAZE.detect_and_compute_split (5 ms)
[----------] 2 tests from Features2d_Detector_AKAZE (285 ms total)

[----------] 1 test from Features2d_Detector_AKAZE_DESCRIPTOR_KAZE
[ RUN      ] Features2d_Detector_AKAZE_DESCRIPTOR_KAZE.regression
/home/henry/.opencv/modules/ts/src/ts.cpp:541: Failure
Failed

        failure reason: Invalid function output
        test case #-1
        seed: ffffffffffffffff
-----------------------------------
        LOG:
Bad keypoints count ratio (validCount = 439, calcCount = 35).

-----------------------------------

[  FAILED  ] Features2d_Detector_AKAZE_DESCRIPTOR_KAZE.regression (180 ms)
[----------] 1 test from Features2d_Detector_AKAZE_DESCRIPTOR_KAZE (180 ms total)

[----------] 1 test from Features2d_Detector_Keypoints_AKAZE
[ RUN      ] Features2d_Detector_Keypoints_AKAZE.validation
[       OK ] Features2d_Detector_Keypoints_AKAZE.validation (352 ms)
[----------] 1 test from Features2d_Detector_Keypoints_AKAZE (352 ms total)

[----------] 1 test from AKAZE/DescriptorRotationInvariance
[ RUN      ] AKAZE/DescriptorRotationInvariance.rotation/0
Intial keypoints: 40
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.925 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.925 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.8 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.775 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.9 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.9 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.975 vs 0.99
[  FAILED  ] AKAZE/DescriptorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <C0-6E D6-AC DD-55 00-00 30-80 D6-AC DD-55 00-00>, 16-byte object <30-BE D6-AC DD-55 00-00 00-A3 D6-AC DD-55 00-00>, 0.99) (9241 ms)
[----------] 1 test from AKAZE/DescriptorRotationInvariance (9241 ms total)

[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DescriptorRotationInvariance
[ RUN      ] AKAZE_DESCRIPTOR_KAZE/DescriptorRotationInvariance.rotation/0
Intial keypoints: 35
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.942857 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.628571 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.485714 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.228571 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.114286 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.114286 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0857143 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0857143 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0571429 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0285714 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0285714 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0285714 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0857143 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.0857143 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.142857 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.228571 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.428571 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.6 vs 0.99
/home/henry/.opencv/modules/features2d/test/test_descriptors_invariance.cpp:108: Failure
Expected: (descInliersRatio) >= (minInliersRatio), actual: 0.857143 vs 0.99
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DescriptorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <70-65 D6-AC DD-55 00-00 60-6D D6-AC DD-55 00-00>, 16-byte object <50-68 D6-AC DD-55 00-00 60-74 D6-AC DD-55 00-00>, 0.99) (9208 ms)
[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DescriptorRotationInvariance (9208 ms total)

[----------] 1 test from AKAZE/DescriptorScaleInvariance
[ RUN      ] AKAZE/DescriptorScaleInvariance.scale/0
unknown file: Failure
C++ exception with description "/home/henry/.opencv/modules/features2d/src/kaze/AKAZEFeatures.cpp:930: error: (-215) 0 <= kpts[i].class_id && kpts[i].class_id < static_cast<int>(evolution_.size()) in function Compute_Descriptors
" thrown in the test body.
[  FAILED  ] AKAZE/DescriptorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <40-1B D8-AC DD-55 00-00 C0-88 D6-AC DD-55 00-00>, 16-byte object <C0-66 D6-AC DD-55 00-00 A0-67 D6-AC DD-55 00-00>, 0.6) (2932 ms)
[----------] 1 test from AKAZE/DescriptorScaleInvariance (2932 ms total)

[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DescriptorScaleInvariance
[ RUN      ] AKAZE_DESCRIPTOR_KAZE/DescriptorScaleInvariance.scale/0
unknown file: Failure
C++ exception with description "/home/henry/.opencv/modules/features2d/src/kaze/AKAZEFeatures.cpp:930: error: (-215) 0 <= kpts[i].class_id && kpts[i].class_id < static_cast<int>(evolution_.size()) in function Compute_Descriptors
" thrown in the test body.
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DescriptorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <D0-16 D8-AC DD-55 00-00 90-13 D8-AC DD-55 00-00>, 16-byte object <60-13 D8-AC DD-55 00-00 D0-22 D8-AC DD-55 00-00>, 0.55) (2987 ms)
[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DescriptorScaleInvariance (2987 ms total)

[----------] 1 test from AKAZE/DetectorRotationInvariance
[ RUN      ] AKAZE/DetectorRotationInvariance.rotation/0
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.225 vs 0.5
angle: 15
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.175 vs 0.5
angle: 30
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.175 vs 0.5
angle: 45
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0.714286 vs 0.76
angle: 45
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.075 vs 0.5
angle: 60
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.05 vs 0.5
angle: 75
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0.5 vs 0.76
angle: 75
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.1 vs 0.5
angle: 90
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 90
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.05 vs 0.5
angle: 105
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 105
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.025 vs 0.5
angle: 120
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 120
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.025 vs 0.5
angle: 135
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 135
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 150
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 165
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 180
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.025 vs 0.5
angle: 195
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 195
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 210
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 225
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 240
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.025 vs 0.5
angle: 255
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 255
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 270
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.1 vs 0.5
angle: 285
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 285
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.05 vs 0.5
angle: 300
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 300
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.075 vs 0.5
angle: 315
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 315
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.2 vs 0.5
angle: 330
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0.625 vs 0.76
angle: 330
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.225 vs 0.5
angle: 345
[  FAILED  ] AKAZE/DetectorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <B0-26 D8-AC DD-55 00-00 E0-26 D8-AC DD-55 00-00>, 0.5, 0.76) (9367 ms)
[----------] 1 test from AKAZE/DetectorRotationInvariance (9367 ms total)

[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DetectorRotationInvariance
[ RUN      ] AKAZE_DESCRIPTOR_KAZE/DetectorRotationInvariance.rotation/0
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.257143 vs 0.5
angle: 15
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.2 vs 0.5
angle: 30
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.171429 vs 0.5
angle: 45
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0571429 vs 0.5
angle: 60
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0571429 vs 0.5
angle: 75
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0.5 vs 0.76
angle: 75
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.114286 vs 0.5
angle: 90
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 90
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0285714 vs 0.5
angle: 105
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 105
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 120
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0285714 vs 0.5
angle: 135
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 135
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 150
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 165
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 180
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0285714 vs 0.5
angle: 195
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 195
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 210
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 225
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 240
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0285714 vs 0.5
angle: 255
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 255
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0 vs 0.5
angle: 270
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0857143 vs 0.5
angle: 285
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 285
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0571429 vs 0.5
angle: 300
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 300
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.0857143 vs 0.5
angle: 315
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0 vs 0.76
angle: 315
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.2 vs 0.5
angle: 330
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0.714286 vs 0.76
angle: 330
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:141: Failure
Expected: (keyPointMatchesRatio) >= (minKeyPointMatchesRatio), actual: 0.2 vs 0.5
angle: 345
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:146: Failure
Expected: (angleInliersRatio) >= (minInliersRatio), actual: 0.714286 vs 0.76
angle: 345
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DetectorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <A0-DB D8-AC DD-55 00-00 10-27 D8-AC DD-55 00-00>, 0.5, 0.76) (9356 ms)
[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DetectorRotationInvariance (9356 ms total)

[----------] 1 test from AKAZE/DetectorScaleInvariance
[ RUN      ] AKAZE/DetectorScaleInvariance.scale/0
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:210: Failure
Expected: (scaleInliersRatio) >= (minInliersRatio), actual: 0.222222 vs 0.49
[  FAILED  ] AKAZE/DetectorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <60-99 D9-AC DD-55 00-00 50-94 D9-AC DD-55 00-00>, 0.08, 0.49) (2081 ms)
[----------] 1 test from AKAZE/DetectorScaleInvariance (2081 ms total)

[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DetectorScaleInvariance
[ RUN      ] AKAZE_DESCRIPTOR_KAZE/DetectorScaleInvariance.scale/0
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:210: Failure
Expected: (scaleInliersRatio) >= (minInliersRatio), actual: 0.266667 vs 0.49
/home/henry/.opencv/modules/features2d/test/test_detectors_invariance.cpp:173: Failure
Expected: (keypoints1.size()) >= (15u), actual: 14 vs 15
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DetectorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <D0-9D D9-AC DD-55 00-00 30-99 D9-AC DD-55 00-00>, 0.08, 0.49) (2099 ms)
[----------] 1 test from AKAZE_DESCRIPTOR_KAZE/DetectorScaleInvariance (2100 ms total)

[----------] Global test environment tear-down
[==========] 14 tests from 13 test cases ran. (48812 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 12 tests, listed below:
[  FAILED  ] Features2d_DescriptorExtractor_AKAZE.regression
[  FAILED  ] Features2d_DescriptorExtractor_AKAZE_DESCRIPTOR_KAZE.regression
[  FAILED  ] Features2d_Detector_AKAZE.regression
[  FAILED  ] Features2d_Detector_AKAZE_DESCRIPTOR_KAZE.regression
[  FAILED  ] AKAZE/DescriptorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <C0-6E D6-AC DD-55 00-00 30-80 D6-AC DD-55 00-00>, 16-byte object <30-BE D6-AC DD-55 00-00 00-A3 D6-AC DD-55 00-00>, 0.99)
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DescriptorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <70-65 D6-AC DD-55 00-00 60-6D D6-AC DD-55 00-00>, 16-byte object <50-68 D6-AC DD-55 00-00 60-74 D6-AC DD-55 00-00>, 0.99)
[  FAILED  ] AKAZE/DescriptorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <40-1B D8-AC DD-55 00-00 C0-88 D6-AC DD-55 00-00>, 16-byte object <C0-66 D6-AC DD-55 00-00 A0-67 D6-AC DD-55 00-00>, 0.6)
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DescriptorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <D0-16 D8-AC DD-55 00-00 90-13 D8-AC DD-55 00-00>, 16-byte object <60-13 D8-AC DD-55 00-00 D0-22 D8-AC DD-55 00-00>, 0.55)
[  FAILED  ] AKAZE/DetectorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <B0-26 D8-AC DD-55 00-00 E0-26 D8-AC DD-55 00-00>, 0.5, 0.76)
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DetectorRotationInvariance.rotation/0, where GetParam() = ("features2d/tsukuba.png", 16-byte object <A0-DB D8-AC DD-55 00-00 10-27 D8-AC DD-55 00-00>, 0.5, 0.76)
[  FAILED  ] AKAZE/DetectorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <60-99 D9-AC DD-55 00-00 50-94 D9-AC DD-55 00-00>, 0.08, 0.49)
[  FAILED  ] AKAZE_DESCRIPTOR_KAZE/DetectorScaleInvariance.scale/0, where GetParam() = ("detectors_descriptors_evaluation/images_datasets/bikes/img1.png", 16-byte object <D0-9D D9-AC DD-55 00-00 30-99 D9-AC DD-55 00-00>, 0.08, 0.49)

* diffusivity itself is not a blocker, but this saves us downloading and uploading derivatives
@bmagyar
Contributor

bmagyar commented Jul 11, 2017

It was worth a shot!
Half-precision floats are a good idea, although more of a long shot. I think this could be added to the descriptor's notes as a TODO. Someone may pick it up in the future.

@hrnr
Contributor Author

hrnr commented Jul 12, 2017

I have finally got the perf measurements for the OCL version on a GRID K520 NVIDIA card with OpenCL 1.2. The performance as of now is pretty bad, much slower than the CPU version.

Nevertheless I wasn't able to reproduce the test failure that occurs on Linux OCL buildbot.

There is a bug in computing keypoint orientation and computing descriptors, which causes matrices to be downloaded again and again for each keypoint. I need to fix this and then the times will be back to reasonable.
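
The kind of bug described above can be pictured with a mock device buffer that counts host downloads. The sketch uses no OpenCV types; `DeviceImage`, `download()` and both `sum_at_keypoints_*` functions are hypothetical names invented for this illustration, and the actual fix in the PR may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock of an image living in device memory. Every download() stands for a
// (slow) device-to-host transfer and is counted.
struct DeviceImage {
    std::vector<float> device_data;
    mutable int downloads = 0;

    std::vector<float> download() const {
        ++downloads;
        return device_data;  // copy to "host"
    }
};

// Slow pattern: the whole image is transferred again for each keypoint.
float sum_at_keypoints_per_keypoint(const DeviceImage& img,
                                    const std::vector<std::size_t>& keypoints)
{
    float sum = 0.0f;
    for (std::size_t idx : keypoints) {
        const std::vector<float> host = img.download();
        assert(idx < host.size());
        sum += host[idx];
    }
    return sum;
}

// Fixed pattern: transfer once, then reuse the host copy for all keypoints.
float sum_at_keypoints_hoisted(const DeviceImage& img,
                               const std::vector<std::size_t>& keypoints)
{
    const std::vector<float> host = img.download();
    float sum = 0.0f;
    for (std::size_t idx : keypoints) {
        assert(idx < host.size());
        sum += host[idx];
    }
    return sum;
}
```

Both functions return the same result; only the number of transfers differs (one per keypoint versus one in total).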

Apart from this bug, there are a lot of transfers between CPU and GPU while building the scale pyramid. I'm working on porting fast explicit diffusion to GPU, so that almost the whole pyramid could be computed on GPU.
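
For context, fast explicit diffusion (FED) runs cycles of explicit diffusion steps with varying step sizes. A minimal sketch of the step-size formula from the FED literature is shown below. `fed_step_sizes` and `fed_cycle_time` are hypothetical names, and the step reordering and rescaling that a real implementation would likely need are left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Step sizes of one FED cycle of length n:
//   tau_j = tau_max / (2 * cos^2(pi * (2j + 1) / (4n + 2))),  j = 0..n-1
// tau_max is the stability limit of a single explicit step
// (0.25 for 2D diffusion with unit grid spacing).
std::vector<double> fed_step_sizes(int n, double tau_max)
{
    assert(n > 0 && tau_max > 0.0);
    const double pi = std::acos(-1.0);
    std::vector<double> tau(n);
    for (int j = 0; j < n; ++j) {
        const double c = std::cos(pi * (2.0 * j + 1.0) / (4.0 * n + 2.0));
        tau[j] = tau_max / (2.0 * c * c);
    }
    return tau;
}

// Total diffusion time covered by one cycle (sum of its step sizes).
double fed_cycle_time(const std::vector<double>& tau)
{
    double t = 0.0;
    for (double v : tau) t += v;
    return t;
}
```

One cycle of n steps covers a diffusion time of tau_max * (n^2 + n) / 3, compared with n * tau_max for n ordinary explicit steps, because some individual steps exceed the stability limit while the cycle as a whole remains stable.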

Some OpenCL functions (GaussianBlur, Scharr) execute non-optimal OCL paths; this will be subject to fine-tuning later. In the current state they are slower than their IPP equivalents (which is bad).

Time compensation is 0
CTEST_FULL_OUTPUT
OpenCV version: 3.2.0-dev
OpenCV VCS version: 3.1.0-3096-g6f5382a6e
Build type: release
Parallel framework: tbb
CPU features: mmx sse sse2 sse3
OpenCL Platforms: 
    NVIDIA CUDA
        dGPU: GRID K520 (OpenCL 1.2 CUDA)
Current OpenCL device: 
    Type = dGPU
    Name = GRID K520
    Version = OpenCL 1.2 CUDA
    Driver version = 352.99
    Compute units = 8
    Max work group size = 1024
    Local memory size = 48 kB 
    Max memory allocation size = 1023 MB 976 kB 
    Double support = Yes
    Host unified memory = No
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 1
    Preferred vector width short = 1
    Preferred vector width int = 1
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1
Note: Google Test filter = OCL_feature2d_detectAndExtract.detectAndExtract/6:OCL_feature2d_detectAndExtract.detectAndExtract/7:OCL_feature2d_detectAndExtract.detectAndExtract/8
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from OCL_feature2d_detectAndExtract
[ RUN      ] OCL_feature2d_detectAndExtract.detectAndExtract/6
.
.
[ PERFSTAT ]    (samples = 3, mean = 9472.25, median = 9702.75, stddev = 1071.39 (11.3%))
[ VALUE    ] 	(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")
[ TRACE    ]
ROOT
\---detectAndCompute - TC:1 C:3 T:28416.68ms
    |---convertTo - TC:1 C:3 T:1.67ms L:0% G:0%
    |   \---ipp_convertTo<W_IPP> - TC:1 C:3 T:1.63ms L:97% G:0%
    |       \---::ipp::iwiScale<F_IPP> - TC:1 C:3 T:1.61ms L:99% G:0%
    |   
    |---Allocate_Memory_Evolution - TC:1 C:3 T:0.11ms L:0% G:0%
    |---Create_Nonlinear_Scale_Space - TC:1 C:3 T:529.17ms L:2% G:2%
    |   |---GaussianBlur - TC:1 C:51 T:82.13ms L:16% G:0%
    |   |   |---ipp_GaussianBlur<W_IPP> - TC:1 C:3 T:9.89ms L:12% G:0%
    |   |   |   \---parallel_for_ - TC:1 C:3 T:3.04ms L:31% G:0%
    |   |   |       \---operator() - TC:8 C:96 T:1.81ms L:60% G:0%
    |   |   |           \---operator()<W_IPP> - TC:8 C:96 T:1.49ms L:82% G:0%
    |   |   |               \---::ipp::iwiFilterGaussian<F_IPP> - TC:8 C:96 T:1.41ms L:95% G:0%
    |   |   |   
    |   |   \---sepFilter2D - TC:1 C:48 T:71.23ms L:87% G:0%
    |   |       |---convertTo - TC:1 C:96 T:0.48ms L:1% G:0%
    |   |       |   \---ipp_convertTo<W_IPP> - TC:1 C:96 T:0.30ms L:62% G:0%
    |   |       |       \---::ipp::iwiScale<F_IPP> - TC:1 C:96 T:0.15ms L:49% G:0%
    |   |       |   
    |   |       |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:2.33ms L:3% G:0%
    |   |       |---row_filter<F_OCL> - TC:1 C:48 T:18.61ms L:26% G:0%
    |   |       |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.42ms L:1% G:0%
    |   |       |---col_filter<F_OCL> - TC:1 C:48 T:10.76ms L:15% G:0%
    |   |       \---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.56ms L:1% G:0%
    |   |   
    |   |---copyTo - TC:1 C:3 T:3.11ms L:1% G:0%
    |   |---copyTo - TC:1 C:36 T:5.98ms L:1% G:0%
    |   |   \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:36 T:5.60ms L:94% G:0%
    |   |   
    |   |---compute_kcontrast - TC:1 C:3 T:8.34ms L:2% G:0%
    |   |   \---convertTo - TC:1 C:3 T:0.49ms L:6% G:0%
    |   |       \---ipp_convertTo<W_IPP> - TC:1 C:3 T:0.47ms L:97% G:0%
    |   |           \---::ipp::iwiScale<F_IPP> - TC:1 C:3 T:0.46ms L:98% G:0%
    |   |   
    |   |---Scharr - TC:1 C:96 T:51.76ms L:10% G:0%
    |   |   |---convertTo - TC:1 C:192 T:0.89ms L:2% G:0%
    |   |   |   \---ipp_convertTo<W_IPP> - TC:1 C:192 T:0.57ms L:64% G:0%
    |   |   |       \---::ipp::iwiScale<F_IPP> - TC:1 C:192 T:0.28ms L:49% G:0%
    |   |   |   
    |   |   \---sepFilter2D - TC:1 C:96 T:48.60ms L:94% G:0%
    |   |       |---convertTo - TC:1 C:192 T:0.65ms L:1% G:0%
    |   |       |   \---ipp_convertTo<W_IPP> - TC:1 C:192 T:0.37ms L:57% G:0%
    |   |       |       \---::ipp::iwiScale<F_IPP> - TC:1 C:192 T:0.15ms L:41% G:0%
    |   |       |   
    |   |       |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.52ms L:1% G:0%
    |   |       |---row_filter<F_OCL> - TC:1 C:96 T:23.80ms L:49% G:0%
    |   |       |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.38ms L:1% G:0%
    |   |       |---col_filter<F_OCL> - TC:1 C:96 T:14.18ms L:29% G:0%
    |   |       |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.50ms L:1% G:0%
    |   |       |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.40ms L:1% G:0%
    |   |       |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.53ms L:1% G:0%
    |   |       \---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.52ms L:1% G:0%
    |   |   
    |   |---Compile: 784c1f97388a88e9 options: <W_OCL> - TC:1 C:1 T:0.46ms L:0% G:0%
    |   |---AKAZE_pm_g2<F_OCL> - TC:1 C:45 T:4.78ms L:1% G:0%
    |   |---parallel_for_ - TC:1 C:498 T:90.60ms L:17% G:0%
    |   |   \---operator() - TC:8 C:25938 T:69.37ms L:77% G:0%
    |   |       \---nld_step_scalar_one_lane - TC:8 C:25938 T:42.27ms L:61% G:0%
    |   |   
    |   |---add - TC:1 C:498 T:18.60ms L:4% G:0%
    |   |   \---ippicviAdd_32f_C1R<F_IPP> - TC:1 C:498 T:16.41ms L:88% G:0%
    |   |   
    |   |---resize - TC:1 C:9 T:1.25ms L:0% G:0%
    |   |   \---resize - TC:1 C:9 T:1.12ms L:90% G:0%
    |   |       |---ipp_resize<W_IPP> - TC:1 C:9 T:0.02ms L:1% G:0%
    |   |       |---parallel_for_ - TC:1 C:6 T:0.69ms L:61% G:0%
    |   |       |   \---operator() - TC:2 C:6 T:0.48ms L:69% G:0%
    |   |       |   
    |   |       \---parallel_for_ - TC:1 C:3 T:0.28ms L:25% G:0%
    |   |   
    |   \---Compute_Determinant_Hessian_Response - TC:1 C:3 T:217.58ms L:41% G:1%
    |       |---compute_derivative_kernels - TC:1 C:96 T:0.92ms L:0% G:0%
    |       |   \---copyTo - TC:1 C:192 T:0.47ms L:51% G:0%
    |       |       \---ippicviCopy_8u_C1R_L<F_IPP> - TC:1 C:192 T:0.18ms L:38% G:0%
    |       |   
    |       \---sepFilter2D - TC:1 C:240 T:109.57ms L:50% G:0%
    |           |---convertTo - TC:1 C:480 T:2.18ms L:2% G:0%
    |           |   \---ipp_convertTo<W_IPP> - TC:1 C:480 T:1.18ms L:54% G:0%
    |           |       \---::ipp::iwiScale<F_IPP> - TC:1 C:480 T:0.48ms L:41% G:0%
    |           |   
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.64ms L:1% G:0%
    |           |---row_filter<F_OCL> - TC:1 C:240 T:46.11ms L:42% G:0%
    |           |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.57ms L:1% G:0%
    |           |---col_filter<F_OCL> - TC:1 C:240 T:30.56ms L:28% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.63ms L:1% G:0%
    |           |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.41ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.63ms L:1% G:0%
    |           |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.45ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.52ms L:0% G:0%
    |           |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.41ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.64ms L:1% G:0%
    |           |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.37ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.47ms L:0% G:0%
    |           |---Compile: 2156d5d2860fd695 options: -D RA...<W_OCL> - TC:1 C:1 T:0.38ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.60ms L:1% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.48ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.48ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.47ms L:0% G:0%
    |           |---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.47ms L:0% G:0%
    |           \---Compile: aaef5b34beb66fe2 options: -D RA...<W_OCL> - TC:1 C:1 T:0.53ms L:0% G:0%
    |   
    |---Feature_Detection - TC:1 C:3 T:51.55ms L:0% G:0%
    |   |---Find_Scale_Space_Extrema - TC:1 C:3 T:45.32ms L:88% G:0%
    |   \---Do_Subpixel_Refinement - TC:1 C:3 T:6.21ms L:12% G:0%
    |       \---solve - TC:1 C:5520 T:3.28ms L:53% G:0%
    |   
    |---Compute_Keypoints_Orientation - TC:1 C:3 T:10005.75ms L:35% G:35%
    |   \---parallel_for_ - TC:1 C:3 T:10005.73ms L:100% G:35%
    |       \---operator() - TC:8 C:271 T:10005.37ms L:100% G:35%
    |           \---fastAtan2 - TC:8 C:5508 T:5.67ms L:0% G:0%
    |               \---fastAtan32f - TC:8 C:5508 T:2.65ms L:47% G:0%
    |                   \---fastAtan32f - TC:8 C:5508 T:1.17ms L:44% G:0%
    |   
    \---Compute_Descriptors - TC:1 C:3 T:17821.87ms L:63% G:63%
        \---parallel_for_ - TC:1 C:3 T:17821.25ms L:100% G:63%
            \---operator() - TC:8 C:271 T:17820.72ms L:100% G:63%

IPP weight: 0.1%
OPENCL weight: 0.5%
[/TRACE    ]
[       OK ] OCL_feature2d_detectAndExtract.detectAndExtract/6 (28438 ms)
[ RUN      ] OCL_feature2d_detectAndExtract.detectAndExtract/7

Geometric mean

                                                     Name of Test                                                          perf        perf         perf        perf       perf   
                                                                                                                        8200996b1   ba071d1ad    6f5382a6e   ba071d1ad  6f5382a6e 
                                                                                                                                                                 vs         vs    
                                                                                                                                                                perf       perf   
                                                                                                                                                             8200996b1  8200996b1 
                                                                                                                                                             (x-factor) (x-factor)
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 155.292 ms 10895.510 ms 10923.836 ms    0.01       0.01   
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    114.314 ms 6721.282 ms  7428.931 ms     0.02       0.02   
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    495.586 ms 68924.386 ms 92936.885 ms    0.01       0.01   

hrnr added 3 commits July 13, 2017 16:50
We don't want to download matrices ad hoc from the GPU whenever a function in AKAZE needs them. There is a HUGE mapping overhead and, without shared memory support, a LOT of unnecessary transfers.

This maps/downloads matrices just once.
* this was causing spurious segfaults in stitching tests due to propagation of NaNs
* added new test, which checks for NaNs (added new debug asserts for NaNs)
* valgrind now says everything is ok
@hrnr
Contributor Author

hrnr commented Jul 13, 2017

The builders are green again. I spent today with valgrind and gdb hunting down the bug that was failing the builder. Initially there was uninitialized memory in just four pixels in the corners; it spread through the pyramid and corrupted the results. It also caused crashes via segfaults when the uninitialized memory happened to be interpreted as float NaNs.

This was quite a hard bug to track down, because only 4 pixels were uninitialized, so most of the time it did not cause much trouble. The bug is also highly dependent on the selected allocator, which is why it was causing problems only with OpenCL. I'm not sure why it did not cause any problems with OpenCL on Windows.
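
To illustrate how four bad corner pixels can spread, here is a minimal standalone sketch. It is not the code or the debug assert added in this PR: `countNaN`, `fillSkippingCorners` and `boxSum3x3` are hypothetical helpers, the buffer is deliberately poisoned with NaN to stand in for uninitialized memory, and a plain 3x3 box sum stands in for the real filters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: number of NaN pixels in a row-major float image.
std::size_t countNaN(const std::vector<float>& img)
{
    std::size_t n = 0;
    for (float v : img)
        if (std::isnan(v))
            ++n;
    return n;
}

// Simulates the bug: every pixel is written except the four corners.
// The buffer is poisoned with NaN first so skipped pixels are detectable;
// real uninitialized memory holds arbitrary bits and only sometimes decodes as NaN.
std::vector<float> fillSkippingCorners(std::size_t rows, std::size_t cols, float value)
{
    assert(rows > 0 && cols > 0);
    std::vector<float> img(rows * cols, std::numeric_limits<float>::quiet_NaN());
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < cols; ++x) {
            const bool corner = (y == 0 || y == rows - 1) && (x == 0 || x == cols - 1);
            if (!corner)
                img[y * cols + x] = value;
        }
    return img;
}

// 3x3 box sum with clamped borders. Any NaN input contaminates every output
// pixel whose window touches it, so each filtering pass widens the damage.
std::vector<float> boxSum3x3(const std::vector<float>& src, std::size_t rows, std::size_t cols)
{
    assert(src.size() == rows * cols && rows > 0 && cols > 0);
    std::vector<float> dst(src.size(), 0.0f);
    const long r = static_cast<long>(rows), c = static_cast<long>(cols);
    for (long y = 0; y < r; ++y)
        for (long x = 0; x < c; ++x) {
            float sum = 0.0f;
            for (long dy = -1; dy <= 1; ++dy)
                for (long dx = -1; dx <= 1; ++dx) {
                    const long yy = std::min(std::max(y + dy, 0L), r - 1);
                    const long xx = std::min(std::max(x + dx, 0L), c - 1);
                    sum += src[static_cast<std::size_t>(yy * c + xx)];
                }
            dst[static_cast<std::size_t>(y * c + x)] = sum;
        }
    return dst;
}
```

On a 5x5 image the four poisoned corners become a 2x2 block of NaNs in each corner after a single pass, which is why a check like `countNaN(img) == 0` after each stage localizes this kind of bug much faster than waiting for a crash.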

I have also fixed the other OpenCL bug, which caused matrices to be downloaded from the GPU multiple times. OpenCL times are now back to reasonable values, although not really fast.

After fixing those 2 bugs, CPU times are a bit worse, but nothing horrible. I will look into that to see if I can make them better without breaking OpenCL again.


Geometric mean

                                                   Name of Test                                                        perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf       perf   
                                                                                                                    8200996b1  b12facf48  aa5a72b46  8cc0b286c  1d3f7fe9e  c13351891  76151e566  ea089a8ab  f9c2951fa  ba071d1ad  09c7288de  b12facf48  aa5a72b46  8cc0b286c  1d3f7fe9e  c13351891  76151e566  ea089a8ab  f9c2951fa  ba071d1ad  09c7288de 
                                                                                                                                                                                                                                                 vs         vs         vs         vs         vs         vs         vs         vs         vs         vs    
                                                                                                                                                                                                                                                perf       perf       perf       perf       perf       perf       perf       perf       perf       perf   
                                                                                                                                                                                                                                             8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1  8200996b1 
                                                                                                                                                                                                                                             (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor) (x-factor)
detectAndExtract::feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 71.572 ms  67.583 ms  44.742 ms  45.349 ms  42.336 ms  40.726 ms  38.732 ms  38.268 ms  43.283 ms  38.549 ms  39.980 ms     1.06       1.60       1.58       1.69       1.76       1.85       1.87       1.65       1.86       1.79   
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    56.269 ms  52.597 ms  34.432 ms  34.941 ms  32.298 ms  31.299 ms  29.619 ms  29.137 ms  31.197 ms  29.492 ms  30.396 ms     1.07       1.63       1.61       1.74       1.80       1.90       1.93       1.80       1.91       1.85   
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    273.459 ms 263.874 ms 163.169 ms 168.078 ms 153.912 ms 164.177 ms 147.598 ms 145.255 ms 156.495 ms 150.994 ms 156.637 ms    1.04       1.68       1.63       1.78       1.67       1.85       1.88       1.75       1.81       1.75   

I have an OpenCL kernel for non-linear diffusion prepared; once it is deployed, the whole pyramid construction can be done on the GPU.

* Lt in pyramid changed to UMat, it will be downloaded from GPU along with Lx, Ly
* fix bug in pm_g2 kernel. OpenCV mangles dimensions passed to OpenCL, so we need to check for boundaries in each OCL kernel.
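
A CPU-only sketch of why the boundary check matters. It assumes the "mangling" means the launched global size is padded up to a multiple of the work-group size, so some work-items fall outside the image; that is an assumption, not something stated in the commit. `roundUp`, `scaleWorkItem` and `launchScale` are hypothetical names, and the loop only emulates a kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `group` (how a padded global size arises).
std::size_t roundUp(std::size_t n, std::size_t group)
{
    assert(group > 0);
    return (n + group - 1) / group * group;
}

// CPU stand-in for one work-item of a per-pixel kernel.
// Returns true if the work-item actually wrote a pixel.
bool scaleWorkItem(std::vector<float>& img, std::size_t rows, std::size_t cols,
                   std::size_t gx, std::size_t gy, float factor)
{
    if (gx >= cols || gy >= rows)   // the boundary check each kernel needs
        return false;               // padded work-item: touch nothing
    img[gy * cols + gx] *= factor;
    return true;
}

// Emulates launching the kernel over the padded grid.
// Returns how many work-items did real work.
std::size_t launchScale(std::vector<float>& img, std::size_t rows, std::size_t cols,
                        std::size_t groupX, std::size_t groupY, float factor)
{
    assert(img.size() == rows * cols);
    const std::size_t globalX = roundUp(cols, groupX);
    const std::size_t globalY = roundUp(rows, groupY);
    std::size_t active = 0;
    for (std::size_t gy = 0; gy < globalY; ++gy)
        for (std::size_t gx = 0; gx < globalX; ++gx)
            if (scaleWorkItem(img, rows, cols, gx, gy, factor))
                ++active;
    return active;
}
```

For a 5x7 image and 4x4 groups the emulated grid is 8x8, so 64 work-items run but only 35 may write; without the early return the other 29 would read and write out of bounds.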
hrnr added 2 commits July 17, 2017 15:36
* computing of determinant is not a blocker, but with this change we don't need to download all spatial derivatives to CPU, we only download determinant
* make Ldet in the pyramid UMat, download it from the GPU together with the other parts of the pyramid
* add profiling macros
@hrnr
Contributor Author

hrnr commented Jul 18, 2017

I'm finished with basic OpenCL support in AKAZE. Creation of the scale space pyramid runs almost fully on the GPU (except computing the k factor, which runs just once before constructing the pyramid). OCL is not supported for computing keypoints and descriptors. Supporting OCL for the remaining parts will become interesting only once the creation of the pyramid is faster, so that the remaining parts become the bottleneck.

The current OCL performance is not very good. GaussianBlur, Scharr and sepFilter2D all execute non-optimal OCL paths; GaussianBlur and Scharr in particular are slower than the IPP version. This will need to be optimized.

Performance results with NVIDIA GRID K520:


Geometric mean

                                                     Name of Test                                                          perf       perf       perf       perf       perf       perf       perf   
                                                                                                                        8200996b1  09c7288de  d71718dea  61a35d7a6  09c7288de  d71718dea  61a35d7a6 
                                                                                                                                                                        vs         vs         vs    
                                                                                                                                                                       perf       perf       perf   
                                                                                                                                                                    8200996b1  8200996b1  8200996b1 
                                                                                                                                                                    (x-factor) (x-factor) (x-factor)
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 156.955 ms 198.360 ms 179.362 ms 165.495 ms    0.79       0.88       0.95   
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    112.895 ms 126.513 ms 139.499 ms 121.170 ms    0.89       0.81       0.93   
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    502.205 ms 418.058 ms 394.685 ms 355.424 ms    1.20       1.27       1.41   

The same machine without OpenCL (8 cores):


Geometric mean

                                                   Name of Test                                                        perf       perf       perf       perf       perf       perf       perf   
                                                                                                                    8200996b1  09c7288de  d71718dea  61a35d7a6  09c7288de  d71718dea  61a35d7a6 
                                                                                                                                                                    vs         vs         vs    
                                                                                                                                                                   perf       perf       perf   
                                                                                                                                                                8200996b1  8200996b1  8200996b1 
                                                                                                                                                                (x-factor) (x-factor) (x-factor)
detectAndExtract::feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 146.956 ms 75.688 ms  75.397 ms  76.015 ms     1.94       1.95       1.93   
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    114.665 ms 56.766 ms  56.650 ms  57.000 ms     2.02       2.02       2.01   
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    505.238 ms 292.877 ms 296.262 ms 294.253 ms    1.73       1.71       1.72   

@hrnr
Contributor Author

hrnr commented Jul 19, 2017

I have also tried the current OCL version on Intel hardware (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz). The OCL implementation is slower than the CPU one, partly because of the unoptimized paths for Gaussian and Scharr and partly just because the Intel GPU is slow. The Intel GPU seems to execute a better path for sepFilter2D; later I will try to get the same or better on NVIDIA too.


Geometric mean

                                                     Name of Test                                                          perf       perf       perf       perf       perf       perf       perf   
                                                                                                                        8200996b1  09c7288de  d71718dea  61a35d7a6  09c7288de  d71718dea  61a35d7a6 
                                                                                                                                                                        vs         vs         vs    
                                                                                                                                                                       perf       perf       perf   
                                                                                                                                                                    8200996b1  8200996b1  8200996b1 
                                                                                                                                                                    (x-factor) (x-factor) (x-factor)
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 78.903 ms  276.286 ms 372.727 ms 221.087 ms    0.29       0.21       0.36   
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    75.532 ms  93.295 ms  80.923 ms  84.733 ms     0.81       0.93       0.89   
detectAndExtract::OCL_feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    277.869 ms 341.712 ms 343.082 ms 281.725 ms    0.81       0.81       0.99   

The first test is influenced by kernel compilation time. The same machine with OCL disabled:


Geometric mean

                                                   Name of Test                                                        perf       perf       perf       perf       perf       perf       perf   
                                                                                                                    8200996b1  09c7288de  d71718dea  61a35d7a6  09c7288de  d71718dea  61a35d7a6 
                                                                                                                                                                    vs         vs         vs    
                                                                                                                                                                   perf       perf       perf   
                                                                                                                                                                8200996b1  8200996b1  8200996b1 
                                                                                                                                                                (x-factor) (x-factor) (x-factor)
detectAndExtract::feature2d::(AKAZE_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 110.199 ms 38.914 ms  39.671 ms  39.441 ms     2.83       2.78       2.79   
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/a3.png")                                                    66.143 ms  29.708 ms  29.918 ms  29.904 ms     2.23       2.21       2.21   
detectAndExtract::feature2d::(AKAZE_DEFAULT, "stitching/s2.jpg")                                                    310.263 ms 150.959 ms 160.233 ms 154.761 ms    2.06       1.94       2.00   

CPU speedups of up to 2.8x look nice for the current code.
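
For reading the tables: the x-factor columns are consistent with baseline time divided by candidate time, so values above 1 are speedups and values below 1 are slowdowns. A small sketch of that arithmetic follows; `xFactor` and `round2` are hypothetical helper names, not part of the perf tooling.

```cpp
#include <cassert>
#include <cmath>

// x-factor as it appears in the tables: baseline time / candidate time.
double xFactor(double baselineMs, double candidateMs)
{
    assert(candidateMs > 0.0);
    return baselineMs / candidateMs;
}

// Round to two decimals, matching the precision printed in the tables.
double round2(double v)
{
    return std::round(v * 100.0) / 100.0;
}
```

With the leuven/img1.png row of the last table, `round2(xFactor(110.199, 38.914))` gives 2.83, the figure behind the "2.8x" above; with the same image in the Intel OCL table, `round2(xFactor(78.903, 276.286))` gives 0.29, a slowdown.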

* TEvolution is used only in KAZE now
@bmagyar
Contributor

bmagyar commented Jul 24, 2017

This PR is now concluded as everything in the work package has been completed.
Could we please get a review and merge, @vpisarev?

@vpisarev vpisarev self-assigned this Jul 31, 2017
@vpisarev
Contributor

👍

@vpisarev vpisarev added this to the 3.3 milestone Jul 31, 2017
Mat mask;
vector<KeyPoint> points;
// initialize task scheduler for TBB
cv::setNumThreads(cv::getNumberOfCPUs());
Member

Do we really need this?

Contributor Author

This is useful to get consistent results with instrumentation (the first parallel function does not take the initialization penalty). But if you don't like this hack, it can be removed; the timing will just be less stable.

Member

I believe this should be done in the ts module: #9278

Contributor Author

You are right, I have reverted this change. With #9278 it'll be fine.

@hrnr
Contributor Author

hrnr commented Aug 1, 2017

Anything more I should fix?

@alalek
Member

alalek commented Aug 1, 2017

👍
