scheduler: Use heap-sort instead of quickselect by aaronlehmann · Pull Request #1550 · moby/swarmkit

aaronlehmann · 2016-09-19T19:24:09Z

The HA scheduling algorithm introduced code that built a slice of all
nodes meeting the constraints, then selected the k best nodes from that
slice.

Building the slice can be expensive if there are a lot of nodes.

To improve performance, we can build a max-heap of nodes and pop all of
the elements to perform a heap sort. We never need to have more than k
items in the heap at any time, since the heap enables a trivial
optimization to check if a node is "better" than the root of the heap.
If it isn't, we can skip that node, and if it is, we can replace the
root of the heap with it.

This approach needs less memory allocation with k << len(nodes).

It also removes the dependency on quickselect.

Thanks to @LK4D4 for the idea.

aaronlehmann · 2016-09-19T19:30:45Z

Benchmark results:


benchmark                                           old ns/op       new ns/op       delta
BenchmarkScheduler1kNodes1kTasks-8                  99535349        98353639        -1.19%
BenchmarkScheduler1kNodes10kTasks-8                 546633387       548325249       +0.31%
BenchmarkScheduler1kNodes100kTasks-8                6060034233      6160128707      +1.65%
BenchmarkScheduler100kNodes1kTasks-8                1907288560      311341903       -83.68%
BenchmarkScheduler100kNodes100kTasks-8              6561291350      6670113753      +1.66%
BenchmarkScheduler100kNodes1MTasks-8                87268133282     81827858414     -6.23%
BenchmarkSchedulerConstraints1kNodes1kTasks-8       165552129       153443398       -7.31%
BenchmarkSchedulerConstraints1kNodes10kTasks-8      729888274       722834862       -0.97%
BenchmarkSchedulerConstraints1kNodes100kTasks-8     6605733833      6801978602      +2.97%
BenchmarkSchedulerConstraints5kNodes100kTasks-8     6662712088      6575577455      -1.31%

benchmark                                           old allocs     new allocs     delta
BenchmarkScheduler1kNodes1kTasks-8                  243095         244065         +0.40%
BenchmarkScheduler1kNodes10kTasks-8                 2760071        2760575        +0.02%
BenchmarkScheduler1kNodes100kTasks-8                31112694       31106768       -0.02%
BenchmarkScheduler100kNodes1kTasks-8                348289         349075         +0.23%
BenchmarkScheduler100kNodes100kTasks-8              31005741       31105897       +0.32%
BenchmarkScheduler100kNodes1MTasks-8                345009629      345098851      +0.03%
BenchmarkSchedulerConstraints1kNodes1kTasks-8       255069         257582         +0.99%
BenchmarkSchedulerConstraints1kNodes10kTasks-8      2879668        2880483        +0.03%
BenchmarkSchedulerConstraints1kNodes100kTasks-8     32308673       32314324       +0.02%
BenchmarkSchedulerConstraints5kNodes100kTasks-8     32164883       32168084       +0.01%

benchmark                                           old bytes       new bytes       delta
BenchmarkScheduler1kNodes1kTasks-8                  17889218        17944530        +0.31%
BenchmarkScheduler1kNodes10kTasks-8                 198313008       198338596       +0.01%
BenchmarkScheduler1kNodes100kTasks-8                2207604464      2206614656      -0.04%
BenchmarkScheduler100kNodes1kTasks-8                69665912        43189427        -38.00%
BenchmarkScheduler100kNodes100kTasks-8              2326743080      2332232632      +0.24%
BenchmarkScheduler100kNodes1MTasks-8                24970282280     24969463304     -0.00%
BenchmarkSchedulerConstraints1kNodes1kTasks-8       18517088        18688487        +0.93%
BenchmarkSchedulerConstraints1kNodes10kTasks-8      204626968       204619544       -0.00%
BenchmarkSchedulerConstraints1kNodes100kTasks-8     2269526128      2270519280      +0.04%
BenchmarkSchedulerConstraints5kNodes100kTasks-8     2275526616      2274922968      -0.03%

As expected, this is a big improvement if there are many more nodes than tasks being scheduled. Otherwise, the difference is minor.

codecov-io · 2016-09-19T19:30:52Z

Current coverage is 54.30% (diff: 92.00%)

Merging #1550 into master will increase coverage by 0.31%

@@             master      #1550   diff @@
==========================================
  Files            82         82          
  Lines         13528      13539    +11   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           7305       7353    +48   
+ Misses         5233       5207    -26   
+ Partials        990        979    -11

Powered by Codecov. Last update a2e755d...6c8cb80

The HA scheduling algorithm introduced code that built a slice of all nodes meeting the constraints, then selected the k best nodes from that slice. Building the slice can be expensive if there are a lot of nodes. To improve performance, we can build a max-heap of nodes and pop all of the elements to perform a heap sort. We never need to have more than k items in the heap at any time, since the heap enables a trivial optimization to check if a node is "better" than the root of the heap. If it isn't, we can skip that node, and if it is, we can replace the root of the heap with it. This approach needs less memory allocation with k << len(nodes). It also removes the dependency on quickselect. Thanks to LK4D4 for the idea. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

nishanttotla · 2016-09-19T20:26:19Z

manager/scheduler/nodeset.go


 // findBestNodes returns n nodes (or < n if fewer nodes are available) that
 // rank best (lowest) according to the sorting function.
 func (ns *nodeSet) findBestNodes(n int, meetsConstraints func(*NodeInfo) bool, nodeLess func(*NodeInfo, *NodeInfo) bool) []NodeInfo {


Just to confirm: Is this argument n the same as k in the above description, when findBestNodes actually gets called?

nishanttotla · 2016-09-19T20:26:39Z

LGTM

stevvooe · 2016-09-19T20:30:46Z

manager/scheduler/nodeset.go

+		// If there are fewer then n nodes in the heap, we add this
+		// node if it meets the constraints. Otherwise, the heap has
+		// n nodes, and if this node is better than the worst node in
+		// the heap, we replace the worst node and then fix the heap.


Does this imply that we have a minHeap, rather than a maxHeap?

I think in this case "less" means better, so for all the nodes in the heap, the worst ("largest") is at the top.

The comparator is defined so that Less(a, b) is true if a is better than b.

So the "greatest" node is the least favored, and that's the one we want at the root of our heap so we can easily check if we can replace it with a better one.

LK4D4

LGTM

GordonTheTurtle added the status/0-triage label Sep 19, 2016

aaronlehmann force-pushed the faster-node-selection branch from c6d3ee5 to 6c8cb80 Compare September 19, 2016 19:31

nishanttotla reviewed Sep 19, 2016

View reviewed changes

stevvooe reviewed Sep 19, 2016

View reviewed changes

aaronlehmann mentioned this pull request Sep 19, 2016

Topology-aware scheduling #1512

Merged

LK4D4 approved these changes Sep 20, 2016

View reviewed changes

LK4D4 merged commit e5f6230 into moby:master Sep 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scheduler: Use heap-sort instead of quickselect#1550

scheduler: Use heap-sort instead of quickselect#1550
LK4D4 merged 1 commit intomoby:masterfrom
aaronlehmann:faster-node-selection

aaronlehmann commented Sep 19, 2016

Uh oh!

aaronlehmann commented Sep 19, 2016

Uh oh!

codecov-io commented Sep 19, 2016 •

edited

Loading

Uh oh!

nishanttotla Sep 19, 2016

Uh oh!

aaronlehmann Sep 19, 2016

Uh oh!

nishanttotla commented Sep 19, 2016

Uh oh!

stevvooe Sep 19, 2016

Uh oh!

nishanttotla Sep 19, 2016 •

edited

Loading

Uh oh!

aaronlehmann Sep 19, 2016

Uh oh!

LK4D4 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

aaronlehmann commented Sep 19, 2016

Uh oh!

aaronlehmann commented Sep 19, 2016

Uh oh!

codecov-io commented Sep 19, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current coverage is 54.30% (diff: 92.00%)

Uh oh!

nishanttotla Sep 19, 2016

Choose a reason for hiding this comment

Uh oh!

aaronlehmann Sep 19, 2016

Choose a reason for hiding this comment

Uh oh!

nishanttotla commented Sep 19, 2016

Uh oh!

stevvooe Sep 19, 2016

Choose a reason for hiding this comment

Uh oh!

nishanttotla Sep 19, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aaronlehmann Sep 19, 2016

Choose a reason for hiding this comment

Uh oh!

LK4D4 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

codecov-io commented Sep 19, 2016 •

edited

Loading

nishanttotla Sep 19, 2016 •

edited

Loading