scheduler: Use heap-sort instead of quickselect#1550
Conversation
|
Benchmark results: As expected, this is a big improvement if there are many more nodes than tasks being scheduled. Otherwise, the difference is minor. |
Current coverage is 54.30% (diff: 92.00%)@@ master #1550 diff @@
==========================================
Files 82 82
Lines 13528 13539 +11
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 7305 7353 +48
+ Misses 5233 5207 -26
+ Partials 990 979 -11
|
The HA scheduling algorithm introduced code that built a slice of all nodes meeting the constraints, then selected the k best nodes from that slice. Building the slice can be expensive if there are a lot of nodes. To improve performance, we can build a max-heap of nodes and pop all of the elements to perform a heap sort. We never need to have more than k items in the heap at any time, since the heap enables a trivial optimization to check if a node is "better" than the root of the heap. If it isn't, we can skip that node, and if it is, we can replace the root of the heap with it. This approach needs less memory allocation with k << len(nodes). It also removes the dependency on quickselect. Thanks to LK4D4 for the idea. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
c6d3ee5 to
6c8cb80
Compare
|
|
||
| // findBestNodes returns n nodes (or < n if fewer nodes are available) that | ||
| // rank best (lowest) according to the sorting function. | ||
| func (ns *nodeSet) findBestNodes(n int, meetsConstraints func(*NodeInfo) bool, nodeLess func(*NodeInfo, *NodeInfo) bool) []NodeInfo { |
There was a problem hiding this comment.
Just to confirm: Is this argument n the same as k in the above description, when findBestNodes actually gets called?
|
LGTM |
| // If there are fewer then n nodes in the heap, we add this | ||
| // node if it meets the constraints. Otherwise, the heap has | ||
| // n nodes, and if this node is better than the worst node in | ||
| // the heap, we replace the worst node and then fix the heap. |
There was a problem hiding this comment.
Does this imply that we have a minHeap, rather than a maxHeap?
There was a problem hiding this comment.
I think in this case "less" means better, so for all the nodes in the heap, the worst ("largest") is at the top.
There was a problem hiding this comment.
The comparator is defined so that Less(a, b) is true if a is better than b.
So the "greatest" node is the least favored, and that's the one we want at the root of our heap so we can easily check if we can replace it with a better one.
The HA scheduling algorithm introduced code that built a slice of all
nodes meeting the constraints, then selected the k best nodes from that
slice.
Building the slice can be expensive if there are a lot of nodes.
To improve performance, we can build a max-heap of nodes and pop all of
the elements to perform a heap sort. We never need to have more than k
items in the heap at any time, since the heap enables a trivial
optimization to check if a node is "better" than the root of the heap.
If it isn't, we can skip that node, and if it is, we can replace the
root of the heap with it.
This approach needs less memory allocation with k << len(nodes).
It also removes the dependency on quickselect.
Thanks to @LK4D4 for the idea.