sql: remove hard limit handling from heuristic planner#42086
sql: remove hard limit handling from heuristic planner#42086craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
savoie
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/sql/opt_limits.go, line 35 at r1 (raw file):
// The special value math.MaxInt64 means "no limit". if !n.disableBatchLimits && numRows != math.MaxInt64 { n.hardLimit = 0
This keeps the original behavior of applyLimit, which was to always overwrite an existing hard or soft limit with a new soft limit. hardLimit and softLimit are defined for a scanNode to be mutually exclusive; the other is supposed to be set to 0 if one is defined.
pkg/sql/opt_limits.go, line 57 at r1 (raw file):
case *groupNode: p.propagateSoftLimits(n.plan)
the groupNode case used to have a special condition (n.needOnlyOneRow) that is no longer possible as of #39766 .
pkg/sql/opt_limits.go, line 60 at r1 (raw file):
case *indexJoinNode: p.applySoftLimit(n.input, numRows)
Propagating hard limits through an indexJoinNode is made redundant by the transformation rule PushLimitIntoIndexJoin.
pkg/sql/opt_limits.go, line 65 at r1 (raw file):
case *unionNode: if n.right != nil { p.applySoftLimit(n.right, numRows)
Propagating a soft limit into the right input seems incorrect for EXCEPT at the very least; I'm leaving this behavior unchanged for now with the intention of taking a closer look at it when using required physical properties to propagate soft limits.
pkg/sql/opt_limits.go, line 75 at r1 (raw file):
case *filterNode: p.applySoftLimit(n.source.plan, numRows)
The filterNode case used to pass through a hard limit here if isFilterTrue(n.filter) was true. The SimplifyFilters normalization rule should have eliminated most cases where isFilterTrue(n.filter) would evaluate to true here; the exception may be for filters like WHERE current_user() = 'root', which isn't known during normalization.
I haven't been able to construct an example where a filter like this would have caused a node to have a hard limit before, but after this change it now gets a soft limit, but I think it might be possible.
pkg/sql/opt_limits.go, line 78 at r1 (raw file):
case *renderNode: p.applySoftLimit(n.source.plan, numRows)
Propagating hard limits through a renderNode is made redundant by the normalization rule PushLimitIntoProject.
pkg/sql/opt_limits.go, line 91 at r1 (raw file):
case *ordinalityNode: p.applySoftLimit(n.source, numRows)
Propagating hard limits through an ordinalityNode is made redundant by the normalization rule PushLimitIntoOrdinality.
pkg/sql/opt_limits.go, line 94 at r1 (raw file):
case *spoolNode: p.propagateSoftLimits(n.source)
A spoolNode is always created around a mutation so it shouldn't be propagating hard or soft limits; see the comments for the deleteNode, updateNode, upsertNode, and insertNode cases below.
Previously, applyLimits would set the hardLimit on the spoolNode; this is taken care of by execFactory.ConstructLimit.
pkg/sql/opt_limits.go, line 98 at r1 (raw file):
case *delayedNode: if n.plan != nil { p.propagateSoftLimits(n.plan)
As far as I can understand, adelayedNode corresponds to a virtual table, and virtual scans do not support limits, so neither hard nor soft limits should be propagated through a delayedNode.
pkg/sql/opt_limits.go, line 130 at r1 (raw file):
case *createTableNode: if n.sourcePlan != nil { p.propagateSoftLimits(n.sourcePlan)
CREATE TABLE AS does not have any output rows, so propagating a hard or soft limit through this node does not make sense.
RaduBerinde
left a comment
There was a problem hiding this comment.
Nice work! Thanks for all the comments in the PR, they made reviewing this very easy!
Reviewable status:
complete! 0 of 0 LGTMs obtained
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained
rytaft
left a comment
There was a problem hiding this comment.
[nit] since this change doesn't affect the opt package, I'd change the commit and PR message to say "sql: remove hard limit..."
Reviewed 4 of 4 files at r1.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @savoie)
pkg/sql/opt_limits.go, line 53 at r1 (raw file):
case *sortNode: // We can't propagate the limit, because the sort potentially needs all // rows.
[nit] This comment is a bit confusing now because of the word "propagate"... maybe say we can't apply the limit?
Previously, the heuristic planner function `applyLimit` propagated and set both hard and soft limits. Hard limit propagation is now handled in the optimizer by using rules to push Limit operators down the tree as far as is possible, and to remove Limits by adding hard limits to child Scan operators. As a result, a majority of the logic within `applyLimit` that supported propagating hard limits was never exercised, since optimization rules would have already eliminated situations in which Limit operators high up in the tree could be the cause of hard limits further down the tree. The one remaining case for which `applyLimit` was still responsible for propagating hard limits was eliminated by the normalization rule introduced in cockroachdb#41908. The entirety of `applyLimit` will eventually be removed and replaced by the optimizer. Removing hard limit propagation from it entirely is a step in this direction, and will make it easier to refactor soft limit propagation later. This patch changes `applyLimit` into `applySoftLimit`, which no longer performs any hard limit propagation; the responsibility for setting any hard limits on scanNodes and spoolNodes is now left to the optimizer. It also renames `setUnlimited` to `propagateSoftLimits` in order to indicate more clearly that this function is not used to remove any limits that might have been set on a node, but instead to trigger soft limit propagation in case a limit hint can be introduced further down the tree. In the case where the optimizer does not generate a limited Scan in place of a Limit around a Scan (because the Scan cannot be constrained), previously `applyLimit` would still set the `hardLimit` field of the scan. This change will result in the `softLimit` being set instead. The end state of the planNode tree is otherwise the same as before this refactor. Release note: None
5e0fb4f to
80c27f5
Compare
savoie
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
pkg/sql/opt_limits.go, line 53 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] This comment is a bit confusing now because of the word "propagate"... maybe say we can't apply the limit?
Done.
This patch follows up on cockroachdb#42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None
This patch follows up on cockroachdb#42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None
justinj
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
|
bors r+ |
42086: sql: remove hard limit handling from heuristic planner r=savoie a=savoie Previously, the heuristic planner function `applyLimit` propagated and set both hard and soft limits. Hard limit propagation is now handled in the optimizer by using rules to push Limit operators down the tree as far as is possible, and to remove Limits by adding hard limits to child Scan operators. As a result, a majority of the logic within `applyLimit` that supported propagating hard limits was never exercised, since optimization rules would have already eliminated situations in which Limit operators high up in the tree could be the cause of hard limits further down the tree. The one remaining case for which `applyLimit` was still responsible for propagating hard limits was eliminated by the normalization rule introduced in #41908. The entirety of `applyLimit` will eventually be removed and replaced by the optimizer. Removing hard limit propagation from it entirely is a step in this direction, and will make it easier to refactor soft limit propagation later. This patch changes `applyLimit` into `applySoftLimit`, which no longer performs any hard limit propagation; the responsibility for setting any hard limits on scanNodes and spoolNodes is now left to the optimizer. It also renames `setUnlimited` to `propagateSoftLimits` in order to indicate more clearly that this function is not used to remove any limits that might have been set on a node, but instead to trigger soft limit propagation in case a limit hint can be introduced further down the tree. In the case where the optimizer does not generate a limited Scan in place of a Limit around a Scan (because the Scan cannot be constrained), previously `applyLimit` would still set the `hardLimit` field of the scan. This change will result in the `softLimit` being set instead. The end state of the planNode tree is otherwise the same as before this refactor. Release note: None 42181: storage/engine: port the final few MVCC benchmarks to Pebble r=petermattis a=petermattis The `MVCCDeleteRange`, `ClearRange`, and `ClearIterRange` benchmarks appear to give nonsensical results. I suspect the benchmarks themselves are at fault and need to be rewritten. ``` name old time/op new time/op delta MVCCScanTransactionalData-16 4.20ms ± 1% 3.39ms ± 0% -19.15% (p=0.000 n=9+9) ``` Release note: None 42185: cli/cliflags: add COCKROACH_STORAGE_ENGINE env var r=petermattis a=petermattis `roachprod` passes through COCKROACH_* env vars allowing `COCKROACH_STORAGE_ENGINE=pebble roachprod start` to work. Fixes #41620 Release note (cli change): Add COCKROACH_STORAGE_ENGINE env var which is tied to the `--storage-engine` flag. Allows selection of "pebble" as an alternative to the default of "rocksdb". Co-authored-by: Céline O'Neil <celineloneil@gmail.com> Co-authored-by: Peter Mattis <petermattis@gmail.com>
Build succeeded |
This patch follows up on cockroachdb#42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None
This patch follows up on cockroachdb#42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None
This patch follows up on cockroachdb#42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None
42170: opt: refactor limit hint propagation into a required physical property r=savoie a=savoie This patch follows up on #42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None 42521: jobs: allow jobs to be created and started separately r=spaskob a=ajwerner Prior to this commit, if a creator of a job wanted to get the results it would have to create and start it at the same time by calling jobs.Registry.StartJob. That method would take a jobs.Record, create a new *jobs.Job and start it immediately, sending results on a passed channel and returning an error channel. This is awkward because sometimes callers might want to do other work on the same transaction which is creating the job. Furthermore those callers might not want to start the job until after the creating transaction commits. This refactor provides a mechanism for callers to create and store a job in a transaction and then start it after that transaction commits. Here's an example: ``` j := registry.NewJob(record) if err := db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error { if err := j.WithTxn(txn).Created(ctx); err != nil { return err } ... }); err != nil { ... } resultsCh := make(chan tree.Datums) errCh, err := j.Start(ctx, resultsCh) ... ``` Release note: None Co-authored-by: Céline O'Neil <celineloneil@gmail.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Previously, the heuristic planner function
applyLimitpropagated and set bothhard and soft limits. Hard limit propagation is now handled in the optimizer by
using rules to push Limit operators down the tree as far as is possible, and to
remove Limits by adding hard limits to child Scan operators. As a result, a
majority of the logic within
applyLimitthat supported propagating hardlimits was never exercised, since optimization rules would have already
eliminated situations in which Limit operators high up in the tree could be the
cause of hard limits further down the tree. The one remaining case for which
applyLimitwas still responsible for propagating hard limits was eliminatedby the normalization rule introduced in #41908.
The entirety of
applyLimitwill eventually be removed and replaced by theoptimizer. Removing hard limit propagation from it entirely is a step in this
direction, and will make it easier to refactor soft limit propagation later.
This patch changes
applyLimitintoapplySoftLimit, which no longer performsany hard limit propagation; the responsibility for setting any hard limits on
scanNodes and spoolNodes is now left to the optimizer.
It also renames
setUnlimitedtopropagateSoftLimitsin order to indicatemore clearly that this function is not used to remove any limits that might
have been set on a node, but instead to trigger soft limit propagation in case
a limit hint can be introduced further down the tree.
In the case where the optimizer does not generate a limited Scan in place of a
Limit around a Scan (because the Scan cannot be constrained), previously
applyLimitwould still set thehardLimitfield of the scan. This changewill result in the
softLimitbeing set instead. The end state of theplanNode tree is otherwise the same as before this refactor.
Release note: None