Skip to content

Improve BRISK Initialization#18037

Merged
opencv-pushbot merged 1 commit intoopencv:3.4from
danielenricocahall:improve-brisk-init-perf
Aug 18, 2020
Merged

Improve BRISK Initialization#18037
opencv-pushbot merged 1 commit intoopencv:3.4from
danielenricocahall:improve-brisk-init-perf

Conversation

@danielenricocahall
Copy link
Copy Markdown
Contributor

@danielenricocahall danielenricocahall commented Aug 5, 2020

-Reorder loops in BRISK to reduce unnecessary values being recomputed
-Utilize lookup tables and a trigonometry identity to reduce the number of calls to sine/cosine

This PR addresses the ideas described in #13390. It's worth noting that this implementation is heavily based off the fork (https://github.com/okriof/opencv/tree/master/modules/features2d/src) from the OP of #13390 - I just omitted the sine/cosine approximation aspect.

Specs:
OS: Linux (Ubuntu 18.04)
Build type: Debug
Compiler: /usr/bin/c++ (ver 7.5.0)
Parallel framework: pthreads (nthreads=8)
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?

cat /proc/cpuinfo  | grep 'name'| uniq
model name	: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

Quick Performance Evaluation Test:

    std::vector<double> times_elapsed;
    for(int i = 0; i < 100; ++i) {
        const auto t_start = std::chrono::high_resolution_clock::now();
        Ptr<FeatureDetector> detector = BRISK::create();
        const auto t_end = std::chrono::high_resolution_clock::now();
        times_elapsed.push_back(std::chrono::duration<double>(t_end - t_start).count());
    }
    double mean = std::accumulate(times_elapsed.begin(), times_elapsed.end(), 0.0) / times_elapsed.size();
    std::cout << mean << std::endl;

Output (original): 0.50843
Output (refactor): 0.0477887

The refactor is about 10x faster than the original on my machine (specs above), and may yield even better improvements on low power devices.

Update

With @vpisarev suggestion of using sine/cosine approximations for the lookup table, the output time improves to 0.0369674, which is ~13.75x faster.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under OpenCV (BSD) License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@danielenricocahall danielenricocahall changed the title Improve initialization performance of Brisk Improve BRISK Initialization Aug 5, 2020
@asmorkalov
Copy link
Copy Markdown
Contributor

@danielenricocahall Thanks for the contribution! The patch is very noisy and contains a lot of formatting changes and non-functional changes. Please revert all formatting changes and presume only functional part.

@danielenricocahall
Copy link
Copy Markdown
Contributor Author

@danielenricocahall Thanks for the contribution! The patch is very noisy and contains a lot of formatting changes and non-functional changes. Please revert all formatting changes and presume only functional part.

@asmorkalov ah sorry! Used the autoformatting in CLion and it looks like it reformatted the whole file. Fixed!

@asmorkalov
Copy link
Copy Markdown
Contributor

Please pay attention on CI test failure on Windows:

[ RUN      ] Features2d_FLANN_Auto.regression
C:\build\precommit_windows64\3.4\opencv\modules\features2d\test\test_nearestneighbors.cpp(115): error: Expected: (correctPerc) >= (.75), actual: 0.741 vs 0.75
correctMatches=741 pointsCount=1000
C:\build\precommit_windows64\3.4\opencv\modules\ts\src\ts.cpp(616): error: Failed

	failure reason: Bad accuracy
	test case #-1
	seed: ffffffffffffffff
-----------------------------------

[  FAILED  ] Features2d_FLANN_Auto.regression (52446 ms)

Link: https://pullrequest.opencv.org/buildbot/builders/precommit_windows64/builds/25733/steps/test_features2d/logs/stdio

@asmorkalov asmorkalov self-requested a review August 5, 2020 13:07
@danielenricocahall
Copy link
Copy Markdown
Contributor Author

Please pay attention on CI test failure on Windows:

[ RUN      ] Features2d_FLANN_Auto.regression
C:\build\precommit_windows64\3.4\opencv\modules\features2d\test\test_nearestneighbors.cpp(115): error: Expected: (correctPerc) >= (.75), actual: 0.741 vs 0.75
correctMatches=741 pointsCount=1000
C:\build\precommit_windows64\3.4\opencv\modules\ts\src\ts.cpp(616): error: Failed

	failure reason: Bad accuracy
	test case #-1
	seed: ffffffffffffffff
-----------------------------------

[  FAILED  ] Features2d_FLANN_Auto.regression (52446 ms)

Link: https://pullrequest.opencv.org/buildbot/builders/precommit_windows64/builds/25733/steps/test_features2d/logs/stdio

@asmorkalov Sorry I just saw this - that failure looks like a documented transient error though: #9179. Should I just re-kick off the build?

@danielenricocahall danielenricocahall force-pushed the improve-brisk-init-perf branch 4 times, most recently from b106acc to 02ac64c Compare August 8, 2020 17:08
{
points_ += numberList[ring];
}
for (size_t rot = 0; rot < n_rot_; ++rot)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_rot_ is hard-codded constant and I propose to calculate the table once or even replace it with pre-computed values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asmorkalov yes that's pretty much what I did with the trig values - I just don't know if other values in that loop can be precomputed.

@danielenricocahall danielenricocahall force-pushed the improve-brisk-init-perf branch 2 times, most recently from 15a018b to 1db2a58 Compare August 16, 2020 20:15
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share the performance test that you use for the constructor?

}
double cosval = 1., sinval = 0.;
double dcos = cos(2*CV_PI/double(n_rot_)), dsin = sin(2*CV_PI/double(n_rot_));
for( size_t rot = 0; rot < n_rot_; ++rot)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to add a comment that loop body is approximation of sin and cos. It makes code easier to read.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good - will add comments later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asmorkalov added some comments

@danielenricocahall
Copy link
Copy Markdown
Contributor Author

Could you share the performance test that you use for the constructor?

@asmorkalov the performance evaluation I used is contained in the description. But just for clarity:

    std::vector<double> times_elapsed;
    for(int i = 0; i < 100; ++i) {
        const auto t_start = std::chrono::high_resolution_clock::now();
        Ptr<FeatureDetector> detector = BRISK::create();
        const auto t_end = std::chrono::high_resolution_clock::now();
        times_elapsed.push_back(std::chrono::duration<double>(t_end - t_start).count());
    }
    double mean = std::accumulate(times_elapsed.begin(), times_elapsed.end(), 0.0) / times_elapsed.size();
    std::cout << mean << std::endl;

@asmorkalov asmorkalov self-requested a review August 18, 2020 06:57
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well Done! Please squash commits to merge the change as single patch.

reformatting

Improve initialization performance of Brisk

fix formatting

Improve initialization performance of Brisk

formatting

Improve initialization performance of Brisk

make a lookup table for ring

use cosine/sine lookup table for theta in brisk and utilize trig identity

fix ring lookup table

use cosine/sine lookup table for theta in brisk and utilize trig identity

formatting

use cosine/sine lookup table for theta in brisk and utilize trig identity

move scale radius product to ring loop to ensure it's not recomputed for each rot

revert change

move scale radius product to ring loop to ensure it's not recomputed for each rot

remove rings lookup table

move scale radius product to ring loop to ensure it's not recomputed for each rot

fix formatting of for loop

move scale radius product to ring loop to ensure it's not recomputed for each rot

use sine/cosine approximations for brisk lookup table.

add documentation for sine/cosine lookup tables

Improve initialization performance of BRISK
@danielenricocahall
Copy link
Copy Markdown
Contributor Author

Well Done! Please squash commits to merge the change as single patch.

@asmorkalov done!

Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@opencv-pushbot opencv-pushbot merged commit 2c1f348 into opencv:3.4 Aug 18, 2020
@alalek alalek mentioned this pull request Aug 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants