[GSOC] Speeding-up AKAZE, part #2#8951
Conversation
|
OK, we merged part 1 (#8869). Let's continue here. |
|
Thanks. Sorry for the confusion, I should have added more description to clarify the situation. I will rebase for sure. |
|
How much does it speed up the algorithm on your machine? |
|
This is work in progress. I will update the description with my measurements. The current improvement is minor. edit: description updated, please see the stats above. The current speedup is ~1.7x. |
|
Rebased due to a merge conflict. The commit hashes are now different from what is reported above in the perf stats (I will fix that in the future).
* now tests use images of 600x768, 900x600 and 1385x700 to cover different resolutions
* this takes 84% of the time of Feature_Detection
* run everything in parallel
* compute Scharr kernels just once
* compute sigma more efficiently
* allocate all matrices in the evolution without zeroing
* add Lflow and Lstep to evolution as in original AKAZE code
Integrated a faster function from https://github.com/h2suzuki/fast_akaze
* improved readability for people familiar with OpenCV
* do not process the same image twice in the base level
* use one-pass stencil for diffusivity from https://github.com/h2suzuki/fast_akaze
* improve locality in Create_Scale_Space
* this needs to be computed always, as we need derivatives while computing descriptors
* fixed tests of AKAZE with KAZE descriptors which had been affected by this

Currently it computes all first and second order derivatives together with the determinant of the Hessian. For descriptors it would be enough to compute just the first order derivatives, but it is probably not worth optimizing for the scenario where descriptors and keypoints are computed separately, since that is already very inefficient. When computing keypoints and descriptors together it is faster to do it the current way (it preserves locality).
* get rid of sharing buffers when creating the scale space pyramid, the performance impact is negligible
* ensures more stable output
* more reasonable profiles, since the first call of parallel_for_ is not taking a big performance hit
* no need to go through the data twice
* fixed bug that prevented computing the determinant for a scale pyramid of size 1 (just the base image)
* all descriptors now support writing to uninitialized memory
* use InputArray and OutputArray for the input image and descriptors, which allows us to make use of a UMat that the user passes in
* all parts that use ocl-enabled functions should use OCL by now
* when OCL is disabled, the IPP version should always be preferred (even when the dst is a UMat)
* this slows the CPU version considerably
* do not run in parallel when running with OCL
|
I have evaluated the option of using CV_8U for images and derivatives in AKAZE. It does not seem to be a viable path; the precision is affected badly. In our tests only 40 keypoints out of 507 were found. A viable option might be to use half-precision floats when they become widely available. |
* diffusivity itself is not a blocker, but this saves us downloading and uploading derivatives
|
It was worth a shot! |
|
I have finally got the perf measurements for the OCL version on a GRID K520 NVIDIA card with OpenCL 1.2. The performance as of now is pretty bad, much slower than the CPU version. Nevertheless, I wasn't able to reproduce the test failure that occurs on the Linux OCL buildbot. There is a bug in computing keypoint orientations and computing descriptors, which causes matrices to be downloaded again and again for each keypoint. I need to fix this and then the times will be back to reasonable. Apart from this bug, there are a lot of transfers between CPU and GPU while building the scale pyramid. I'm working on porting fast explicit diffusion to the GPU, so that almost the whole pyramid can be computed on the GPU. Some OpenCL functions (GaussianBlur, Scharr) execute non-optimal OCL paths; this will be subject to fine-tuning later. In the current state they are slower than their IPP equivalents (which is bad). |
we don't want to download matrices ad hoc from the GPU when a function in AKAZE needs them. There is a HUGE mapping overhead, and without shared memory support a LOT of unnecessary transfers. This maps/downloads matrices just once.
* this was causing spurious segfaults in stitching tests due to propagation of NaNs
* added new test which checks for NaNs (added new debug asserts for NaNs)
* valgrind now says everything is ok
|
The builders are green again. I have spent this day debugging with valgrind and gdb to hunt down the bug that was failing the builder. Initially there was uninitialized memory in just four pixels in the corners; it spread through the pyramid and messed up the results. It also caused crashes via segfaults if the uninitialized memory could be interpreted as float NaNs. This was quite a hard bug to track down, because it was just 4 pixels that were uninitialized, so it did not cause too many problems. The bug is also highly dependent on the selected allocator, which is why it was causing problems only with OpenCL. I'm not sure why it did not cause any problem for Windows OpenCL. I have also fixed the other bug with OpenCL, which caused matrices to be downloaded from the GPU multiple times. OpenCL times are now back to reasonable, although not really fast. After fixing those 2 bugs, CPU times are a bit worse, but nothing horrible. I'll look into that and see if I can make them better without breaking OpenCL again. I have an OpenCL kernel for non-linear diffusion prepared; after it is deployed, the whole pyramid construction can be done on the GPU. |
* Lt in the pyramid changed to UMat, it will be downloaded from the GPU along with Lx, Ly
* fix bug in the pm_g2 kernel. OpenCV mangles the dimensions passed to OpenCL, so we need to check for boundaries in each OCL kernel.
* computing of the determinant is not a blocker, but with this change we don't need to download all spatial derivatives to the CPU, we only download the determinant
* make Ldet in the pyramid a UMat, download it from the GPU together with the other parts of the pyramid
* add profiling macros
|
I'm finished with basic OpenCL support in AKAZE. Creation of the scale space pyramid runs almost fully on the GPU (except computing the k factor, which runs just once before constructing the pyramid). For computing keypoints and descriptors, OCL is not supported. Supporting OCL for the remaining parts might be interesting only after the creation of the pyramid becomes faster, so that the remaining parts become the bottleneck. The current OCL performance is not very good. GaussianBlur, Scharr and sepFilter2D all execute non-optimal OCL paths; especially GaussianBlur and Scharr are slower compared to the IPP versions. This will need to be optimized. Performance results with NVIDIA GRID K520: The same machine without OpenCL (8 cores): |
|
I have also tried the current OCL version on Intel hardware (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz). The OCL implementation is slower than the CPU one, partly because of the unoptimized paths for Gaussian and Scharr and partly just because the Intel GPU is slow. The Intel GPU seems to execute a better path for sepFilter2D; later I will try to get the same or better on NVIDIA too. The first test is influenced by kernel compilation time. The same machine with OCL disabled: CPU speedups up to 2.8x look nice for the current code. |
* TEvolution is used only in KAZE now
|
This PR is now concluded as everything in the work package has been completed. |
|
👍 |
    Mat mask;
    vector<KeyPoint> points;
    // initialize task scheduler for TBB
    cv::setNumThreads(cv::getNumberOfCPUs());
this is useful to get consistent results with instrumentation (the first parallel function does not take the initialization penalty). But if you don't like this hack, it can be removed; just the timing will be less stable.
I believe this should be done in the ts module: #9278
You are right, I have reverted this change. With #9278 it'll be fine.
This reverts commit ba81e2a.
|
Anything more I should fix? |
|
👍 |
This part focuses on performance improvements for AKAZE on CPU and implementing basic OCL support.
cc: @bmagyar
OpenCL:
some parts (mainly construction of the pyramid) execute OpenCL paths.
* `Create_Nonlinear_Scale_Space`: almost all operations are OpenCL enabled
* `Compute_Determinant_Hessian_Response`: executes OpenCL generic sepFilter2D; computing the determinant would need a custom kernel, but is not currently a blocker
* `Feature_Detection`: no OCL
* `Compute_Keypoints_Orientation`: no OCL
* `Compute_Descriptors`: no OCL

CPU status:
* `Create_Nonlinear_Scale_Space`: reworked, some intrinsics might help with diffusion
* `Compute_Determinant_Hessian_Response`: reworked, needs specialized fine tuning for 5x5, 7x7 and 9x9 kernels
* `Feature_Detection`: will be reworked together with the GPU part, we might want the same format of keypoints on GPU and CPU
* `Compute_Keypoints_Orientation`: reworked
* `Compute_Descriptors`: not a blocker

The main parts (and largest bottlenecks) are:
* `Create_Nonlinear_Scale_Space`: ~19%, ~21 ms
* `Feature_Detection`: ~52%, ~60 ms, of which:
  * `Compute_Determinant_Hessian_Response`: 83% (over 43% of the whole algorithm)
  * `Find_Scale_Space_Extrema`: 16% (globally just 8%)
  * `Do_Subpixel_Refinement`: 1%
* `Compute_Keypoints_Orientation`: ~21%, ~24 ms
* `Compute_Descriptors`: ~4%, ~4.8 ms

Improvement in `Compute_Determinant_Hessian_Response` is of course not very satisfying. The main problem there is that AKAZE uses Scharr with a non-standard kernel size (different from 3x3). Implementing Scharr with sepFilter2D is much slower than the specialized `cv::Scharr` we have.

I have tried to replace the `sepFilter2D` implementation with `cv::Scharr`, just to get an idea of the possible speedup. It is in a separate branch. With `cv::Scharr`, `Compute_Determinant_Hessian_Response` went from ~47 ms to ~11 ms, which is quite significant. However, it is not possible to replace Scharr just like that, so the tests are failing. I have opened pablofdezalc/akaze#32 to get some additional info on this.
Performance stats per commit:
I'm testing all CPU performance on Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz.

If you want to reproduce these results on your own machine, you might find this performance evaluation script helpful. It automates running perf tests for this pull request across many revisions; it generates this summary along with XML test outputs and the instrumentation output shown below.
Instrumentation output:
INITIAL:
CURRENT (ba071d1):
Failed branches:
These are branches that contain code that is faster, but not suitable for inclusion into the main branch (there might be failing tests etc.):

* test_scharr: I have tried to replace the Scharr operator in `Compute_Determinant_Hessian_Response` with Scharr with a fixed 3x3 kernel.
* akaze_octaves: Reworked the non-linear scale space pyramid so that diffusivity is propagated only inside octaves. Probably not worth it, since it damages accuracy.