Skip to content

apply partial sort to save computations#19552

Merged
alalek merged 5 commits intoopencv:3.4from
WeiChungChang:partialSort
Feb 22, 2021
Merged

apply partial sort to save computations#19552
alalek merged 5 commits intoopencv:3.4from
WeiChungChang:partialSort

Conversation

@WeiChungChang
Copy link
Copy Markdown
Contributor

@WeiChungChang WeiChungChang commented Feb 18, 2021

According to experiment, for sorting std::vector, when the target to be sorted is less than 12.5% of the whole vector, apply partial_sort instead of sort benefits at most x8 speedup.

Typically within detection out layer, the output is far less than the proposals after NMS; for example, for faster RCNN model coming from TensorFlow, there are 90 (# of classes) * 100 proposals but outputs 100 outcomes only. Sometimes even after NMS, the proposals is still far largen than the number to keep. Ex, for maskRCNN, we keep top 100 but proposal after NMS is 1292, so the ratio is 100/1292 ~ 8% and partial sort is faster x 2 times than sort. In the case if we keep top 10 only, the ratio is 10/1292 < 1% so partial sort benefits more.
As the result, in this case, apply partial sort makes it faster x8 times than applying std::sort().

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
Stepping:            13
CPU MHz:             3699.935
CPU max MHz:         4600.0000
CPU min MHz:         800.0000
BogoMIPS:            5199.98
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            12288K
NUMA node0 CPU(s):   0-11

Following table shows the speedup of the partial sort ratio; from it we can see when ratio < 25%, partial sort is faster than sort.
To make it safe, 12.5% is chosen
.

ratio speedup
0.01 882.304527
0.025 568.382353
0.05 282.666039
0.075 214.444444
0.1 184.935221
0.125 148.211587
0.15 139.189189
0.175 123.027245
0.2 105.377107
0.225 107.397118
0.25 97.782454
0.275 91.972024
0.3 84.519949
0.325 80.845336
0.35 76.994597
0.375 74.148076
0.4 73.282076
0.425 68.344028
0.45 67.519529
0.475 67.85597
0.5 63.972632
0.525 64.522519
0.55 62.118126
0.575 57.528958
0.6 60.985036
0.625 60.06216
0.65 55.178097
0.675 55.948583
0.7 58.331031
0.725 54.652328
0.75 53.867627
0.775 52.787238
0.8 54.747005
0.825 52.296487
0.85 53.444712
0.875 52.717297
0.9 51.256324
0.925 52.305246
0.95 52.063384
0.975 53.834947
1 54.353426

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contribution!

This patch should go into 3.4 branch first.
We will merge changes from 3.4 into master regularly (weekly/bi-weekly).

Please:

  • change "base" branch of this PR: master => 3.4 (use "Edit" button near PR title)
  • rebase your commits from master onto 3.4 branch. For example:
    git rebase -i --onto upstream/3.4 upstream/master
    (check list of your commits, save and quit (Esc + "wq" + Enter)
    where upstream is configured by following this GitHub guide and fetched (git fetch upstream).
  • push rebased commits into source branch of your fork (with --force option)

Note: no needs to re-open PR, apply changes "inplace".

std::sort(scoreIndexPairs.begin(), scoreIndexPairs.end(),
util::SortScorePairDescend<std::pair<int, int> >);
// if _keepTopK * 8 <= scoreIndexPairs.size(), apply partial sort.
if ((_keepTopK << 3) > scoreIndexPairs.size()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<< 3

modern compilers are smart enough, so just * 8 should work well too.

@alalek
Copy link
Copy Markdown
Member

alalek commented Feb 20, 2021

Please remove unrelated commits (the first 2 commits) and squash commits into one (use git rebase -i) before migration on 3.4 branch.

@WeiChungChang WeiChungChang changed the base branch from master to 3.4 February 21, 2021 23:06
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍

@alalek alalek merged commit f6bc4fd into opencv:3.4 Feb 22, 2021
@alalek alalek mentioned this pull request Feb 27, 2021
@alalek alalek mentioned this pull request Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants