*: re-implement partition pruning for better performance#14679
*: re-implement partition pruning for better performance#14679tiancaiamao merged 15 commits intopingcap:masterfrom
Conversation
The new partition pruning algorithm is not as powerful as the original one. However, it's much faster by using binary search and avoid constructing expressions.
|
Just now I run a benchmark on my own laptop and compare the result The old version: and the new version: |
| func fullRange(end int) partitionRangeOR { | ||
| var reduceAllocation [3]partitionRange | ||
| reduceAllocation[0] = partitionRange{0, end} | ||
| return partitionRangeOR(reduceAllocation[:1]) |
There was a problem hiding this comment.
| return partitionRangeOR(reduceAllocation[:1]) | |
| return reduceAllocation[:1] |
| // Let M = intersection, U = union, then | ||
| // a M (b U c) == (a M b) U (a M c) | ||
| ret := or[:0] | ||
| for _, r1 := range or { |
There was a problem hiding this comment.
just a question: can we make sure or be sorted?
if so maybe we can more optimize to avoid {6, 7} U {3, 9} call in {0, 4}, {6, 7}, {8, 11} U {3, 9} or {3, +INF} U {6, 7} call in {3, +INF} U {0, 4}, {6, 7}- -?
it seems leaf exp (e.g. a > 3 in a > 3 and b < 5) can take benfit from this(although intersectionRange is fast)
There was a problem hiding this comment.
Maintain the sorted condition is more restrictive.
The or array is usually small, so I guess loop over the whole array VS maintain the sorted array and break in advance don't have a big difference.
What problem does this PR solve?
Re-implement partition pruning for better performance.
The old partition pruning's implementation uses an algorithm we called constraint propagate which is powerful, yet slow.
It works like this:
It's powerful as it can propagate something like
a > b and b > 3=>a > 3and it also include some propagate rules for functionsx > const1, f(x) < const2, f is monotonous => false.It's possible to handle some cases like:
But the process is slow because constructing new expressions involve too many object allocation, and the constraint propagation process is also heavy. If there are many partitions, 2048 for example, the whole time spent on partition pruning would be significant.
What is changed and how it works?
The new partition pruning algorithm is much faster by using binary search and avoid constructing expressions.
The query expression would be abstract to
f(col) op const, whereopis one of> < = >= <=, in the code it's represent asdataForPrune.The 'partition p0 less than xxx, partition p1 less than xxx, ...' forms an array
[p0 p1 p2 ... maxvalue], represented as thelessThanData.The new algorithm uses a binary search
pruneUseBinarySearchto locate const in the array.A simple benchmark,
Before:
After:
Some side notes:
As null values are all located in the first partition, the first partition expression in the old implementation is
val < xxx or val is null. Theor val is nullcondition is hard to eliminate so we can't prune the first partition in many cases.The new implementation doesn't consider the null value, so the first partition is pruned.
There is a
relaxoperation during the pruning, and it may lead to less accurate pruning results.For example,
This condition doesn't always hold:
A counterexample is:
So we have to relax
<to<=to handle functions.Check List
Tests
Related changes
Release note