opt: introduce PushLimitIntoOrdinality rule #41908
craig[bot] merged 1 commit into cockroachdb:master from
Conversation
Force-pushed 884a880 to 99e13f0
RaduBerinde
left a comment
Looks good overall!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @savoie)
pkg/sql/opt/norm/custom_funcs.go, line 408 at r1 (raw file):
}

// Returns whether <ordering1> implies <ordering2>.
[nit] exported function comments should start with the function name (OrderingImplies returns ..).
pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):
// is a valid intersection), though it is possible that an intersection exists
// even if neither ordering implies the other (see OrderingChoice.Intersects).
func (c *CustomFuncs) OrderingImplies(ordering1, ordering2 physical.OrderingChoice) bool {
I thought we would be using Intersects instead of Implies, am I missing something?
As an example, if one of the operators has `a+` while the other has `a+,b+`, we should be able to push down with `a+,b+`.
pkg/sql/opt/norm/rules/limit.opt, line 109 at r1 (raw file):
(Const $limit:*) $limitOrdering:* & (HasColsInOrdering $input $limitOrdering)
[nit] & at the end and next line indented (for consistency with existing style)
pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):
# Complex example of a limit that should be pushed down, since the ordinality's
# ordering implies the limit ordering: +3 +4 opt(2) => +3 +2.
[nit] should use the column names in the comment (the col id assignments could change in the future)
Force-pushed 99e13f0 to bfc1437
justinj
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @savoie)
pkg/sql/opt/norm/rules/limit.opt, line 107 at r1 (raw file):
  $private:*
)
(Const $limit:*)
how come this has to be a Const?
pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):
# Complex example of a limit that should be pushed down, since the ordinality's
# ordering implies the limit ordering: +3 +4 opt(2) => +3 +2.
Is this implication true? The comment above Implies seems to indicate no? They do have nonempty intersection (+3 +2 +4) which might be what you mean?
RaduBerinde
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj and @savoie)
pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
Is this implication true? The comment above `Implies` seems to indicate no? They do have a nonempty intersection (+3 +2 +4), which might be what you mean?
I'd expect the limit's ordering to also have opt(i) since we reduce it using the input FDs (I believe that's what's happening in the plan)
Force-pushed bfc1437 to ac785ad
savoie
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):
Previously, RaduBerinde wrote…
I thought we would be using `Intersects` instead of `Implies`, am I missing something? As an example, if one of the operators has `a+` while the other has `a+,b+`, we should be able to push down with `a+,b+`.
The minimal counterexample I found to `Intersects` working uses the orderings `x-` and `<empty>` (which intersect: `x-` is a valid intersection) on this table `foo`:
x
===
a
b
SELECT * FROM (SELECT * FROM foo WITH ORDINALITY) ORDER BY x DESC LIMIT 1
should give the result
x | ordinality
================
b | 2
but if the LIMIT 1 is pushed into the ordinality you get
x | ordinality
================
b | 1
since only 1 row is ever provided to the ordinality; we're never going to get it to assign the ordinal 2.
On the other hand, with the same orderings but with the ordinality ordering implying the limit ordering, we're OK to push down (if we replace the limit's ordering with the intersection of the two orderings):
SELECT * FROM (SELECT * FROM foo ORDER BY x DESC) WITH ORDINALITY LIMIT 1
and
SELECT * FROM (SELECT * FROM foo ORDER BY x DESC LIMIT 1) WITH ORDINALITY
give the same result:
x | ordinality
================
b | 1
I don't have a perfect proof of why intersection isn't sufficient but implication is, but here's my intuition:
Let's say that the limit is for 10 rows. The limit can only be pushed down into the ordinality if those 10 rows are exactly the first 10 rows (numbered 1-10) of the ordinality ordering. If the rows are the ones that, without applying the rule, would be numbered 2-11, the limit can't be pushed down, since there's no way to tell the ordinality to begin its numbering at 2.
So if (prior to pushing down the limit), the result of the ordinality operation needs to be reordered at all when applying the limit, the limit operator can't be pushed down into the ordinality. (Strictly, the limit could still be pushed down if the set of rows actually selected by the limit remained unchanged, but this can't be known as far as I'm aware.) So it's not enough that an intersection of the two orderings exists; the ordinality ordering needs to imply the limit ordering.
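The counterexample above can be simulated directly. This is a minimal sketch in plain Python, not CockroachDB code: `with_ordinality` is a hypothetical helper that numbers rows in arrival order, standing in for the Ordinality operator.

```python
# Table foo has one column x with rows 'a' and 'b'; the limit orders by
# x DESC (written x-), while the ordinality itself has the empty ordering
# <empty>. The two orderings intersect (x- is a valid intersection), yet
# pushing the limit below the ordinality changes the ordinals.

def with_ordinality(rows):
    # Number rows 1..n in the order they arrive.
    return [(r, i + 1) for i, r in enumerate(rows)]

foo = ["a", "b"]

# Original plan: ordinality first, then ORDER BY x DESC LIMIT 1.
original = sorted(with_ordinality(foo), key=lambda t: t[0], reverse=True)[:1]

# Rewritten plan: the limit (with its x- ordering) pushed below the ordinality.
pushed = with_ordinality(sorted(foo, reverse=True)[:1])

print(original)  # [('b', 2)] -- row b was numbered 2 before the limit applied
print(pushed)    # [('b', 1)] -- row b is now numbered 1: a different result
```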
pkg/sql/opt/norm/rules/limit.opt, line 107 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
how come this has to be a
Const?
...sometimes I blindly copy existing code without thinking it through; fixed
pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):
Previously, RaduBerinde wrote…
I'd expect the limit's ordering to also have `opt(i)`, since we reduce it using the input FDs (I believe that's what's happening in the plan)
You're both right; I updated the test to include an optional ordering and didn't think the comment through fully. The limit's ordering has an opt(i) when the rule is applied, so I've updated the comment.
pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):
Previously, savoie (Céline O'Neil) wrote…
This optimizer rule is the first of several that will push limit operators as far down the tree as possible, in order to remove the need for logic in `opt_limits.go` that currently recurses down the plan tree ensuring that leaf scan nodes have hard limits set even when a limit operator is much higher up in the tree. The following explanation is also included as a comment above the rule:

> PushLimitIntoOrdinality pushes the Limit operator into the Ordinality
> operator when the ordering associated with both operators allows it.
>
> Pushing the limit as far as possible down the tree shouldn't have
> negative effects, but will reduce the number of rows processed by
> operators higher up, and if the limit is pushed all the way down to a
> scan, the scan can be limited directly.
>
> In order to prevent this rule from affecting:
> 1. the set of rows kept by the limit,
> 2. the ordinals assigned to those rows by the ordinality, and
> 3. the final ordering of the rows,
> the new limit's ordering should be "extended" to imply the
> ordinality's ordering, so it is set to be an intersection of the
> original limit ordering and the ordinality's ordering (see
> OrderingChoice.Intersection).

Release note: None
Force-pushed ac785ad to 84dcc63
savoie
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):
Previously, RaduBerinde wrote…
`SELECT * FROM foo WITH ORDINALITY` means that the ordinality can be assigned in any order, so both `b 1` and `b 2` are correct results for the query.
Updated to use intersection, and removed ambiguity in the allowed ordering of the logic tests.
RaduBerinde
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
justinj
left a comment
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
bors r+
41908: opt: introduce PushLimitIntoOrdinality rule r=savoie a=savoie

This optimizer rule is the first of several that will push limit operators as far down the tree as possible, in order to remove the need for logic in `opt_limits.go` that currently recurses down the plan tree ensuring that leaf scan nodes have hard limits set even when a limit operator is much higher up in the tree. The following explanation is also included as a comment above the rule:

> PushLimitIntoOrdinality pushes the Limit operator into the Ordinality
> operator when the ordering associated with both operators allows it.
>
> Pushing the limit as far as possible down the tree shouldn't have
> negative effects, but will reduce the number of rows processed by
> operators higher up, and if the limit is pushed all the way down to a
> scan, the scan can be limited directly.
>
> In order to prevent this rule from affecting:
> 1. the set of rows kept by the limit,
> 2. the ordinals assigned to those rows by the ordinality, and
> 3. the final ordering of the rows,
> the rule should only be applied if the ordinality's ordering implies
> the original limit's ordering (see OrderingChoice.Implies).
>
> The new limit's ordering must also be "extended" to imply the
> ordinality's ordering, which is done by setting it to be the
> intersection of the old limit's ordering and the ordinality's ordering
> (see OrderingChoice.Intersection).

Release note: None

Co-authored-by: Céline O'Neil <celineloneil@gmail.com>
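As a rough illustration of the ordering check the rule performs, here is a simplified, hypothetical model in Python. It ignores the optional columns and column equivalences that the real `OrderingChoice.Implies` and `OrderingChoice.Intersection` (Go) handle; `implies` and `intersection` are invented names for this sketch, where an ordering is just a tuple of (column, descending) pairs.

```python
# Simplified model: an ordering is a tuple of (column, descending) pairs.
# Real OrderingChoice also tracks optional columns and equivalence groups.

def implies(a, b):
    # a implies b when b is a prefix of a: any row stream sorted by a is
    # also sorted by b.
    return a[:len(b)] == b

def intersection(a, b):
    # When one ordering implies the other, the intersection is simply the
    # stronger (longer) of the two.
    if implies(a, b):
        return a
    if implies(b, a):
        return b
    raise ValueError("no intersection in this simplified model")

ordinality_ordering = (("x", True), ("y", False))  # x descending, y ascending
limit_ordering = (("x", True),)

# The rule fires only if the ordinality's ordering implies the limit's:
assert implies(ordinality_ordering, limit_ordering)

# The pushed-down limit gets the intersection, so it still implies both
# original orderings:
new_limit_ordering = intersection(ordinality_ordering, limit_ordering)
assert new_limit_ordering == ordinality_ordering
```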
Build succeeded |
Previously, the heuristic planner function `applyLimit` propagated and set both hard and soft limits. Hard limit propagation is now handled in the optimizer by using rules to push Limit operators down the tree as far as is possible, and to remove Limits by adding hard limits to child Scan operators. As a result, a majority of the logic within `applyLimit` that supported propagating hard limits was never exercised, since optimization rules would have already eliminated situations in which Limit operators high up in the tree could be the cause of hard limits further down the tree. The one remaining case for which `applyLimit` was still responsible for propagating hard limits was eliminated by the normalization rule introduced in cockroachdb#41908.

The entirety of `applyLimit` will eventually be removed and replaced by the optimizer. Removing hard limit propagation from it entirely is a step in this direction, and will make it easier to refactor soft limit propagation later.

This patch changes `applyLimit` into `applySoftLimit`, which no longer performs any hard limit propagation; the responsibility for setting any hard limits on scanNodes and spoolNodes is now left to the optimizer. It also renames `setUnlimited` to `propagateSoftLimits` in order to indicate more clearly that this function is not used to remove any limits that might have been set on a node, but instead to trigger soft limit propagation in case a limit hint can be introduced further down the tree.

In the case where the optimizer does not generate a limited Scan in place of a Limit around a Scan (because the Scan cannot be constrained), previously `applyLimit` would still set the `hardLimit` field of the scan. This change will result in the `softLimit` being set instead. The end state of the planNode tree is otherwise the same as before this refactor.

Release note: None