Skip to content

sql: remove hard limit handling from heuristic planner#42086

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
savoie:refactor-hard-limits
Nov 5, 2019
Merged

sql: remove hard limit handling from heuristic planner#42086
craig[bot] merged 1 commit intocockroachdb:masterfrom
savoie:refactor-hard-limits

Conversation

@savoie
Copy link
Copy Markdown
Contributor

@savoie savoie commented Oct 31, 2019

Previously, the heuristic planner function applyLimit propagated and set both
hard and soft limits. Hard limit propagation is now handled in the optimizer by
using rules to push Limit operators down the tree as far as is possible, and to
remove Limits by adding hard limits to child Scan operators. As a result, a
majority of the logic within applyLimit that supported propagating hard
limits was never exercised, since optimization rules would have already
eliminated situations in which Limit operators high up in the tree could be the
cause of hard limits further down the tree. The one remaining case for which
applyLimit was still responsible for propagating hard limits was eliminated
by the normalization rule introduced in #41908.

The entirety of applyLimit will eventually be removed and replaced by the
optimizer. Removing hard limit propagation from it entirely is a step in this
direction, and will make it easier to refactor soft limit propagation later.

This patch changes applyLimit into applySoftLimit, which no longer performs
any hard limit propagation; the responsibility for setting any hard limits on
scanNodes and spoolNodes is now left to the optimizer.

It also renames setUnlimited to propagateSoftLimits in order to indicate
more clearly that this function is not used to remove any limits that might
have been set on a node, but instead to trigger soft limit propagation in case
a limit hint can be introduced further down the tree.

In the case where the optimizer does not generate a limited Scan in place of a
Limit around a Scan (because the Scan cannot be constrained), previously
applyLimit would still set the hardLimit field of the scan. This change
will result in the softLimit being set instead. The end state of the
planNode tree is otherwise the same as before this refactor.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@savoie savoie requested a review from a team October 31, 2019 23:27
Copy link
Copy Markdown
Contributor Author

@savoie savoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt_limits.go, line 35 at r1 (raw file):

		// The special value math.MaxInt64 means "no limit".
		if !n.disableBatchLimits && numRows != math.MaxInt64 {
			n.hardLimit = 0

This keeps the original behavior of applyLimit, which was to always overwrite an existing hard or soft limit with a new soft limit. hardLimit and softLimit are defined for a scanNode to be mutually exclusive; the other is supposed to be set to 0 if one is defined.


pkg/sql/opt_limits.go, line 57 at r1 (raw file):

	case *groupNode:
		p.propagateSoftLimits(n.plan)

the groupNode case used to have a special condition (n.needOnlyOneRow) that is no longer possible as of #39766 .


pkg/sql/opt_limits.go, line 60 at r1 (raw file):

	case *indexJoinNode:
		p.applySoftLimit(n.input, numRows)

Propagating hard limits through an indexJoinNode is made redundant by the transformation rule PushLimitIntoIndexJoin.


pkg/sql/opt_limits.go, line 65 at r1 (raw file):

	case *unionNode:
		if n.right != nil {
			p.applySoftLimit(n.right, numRows)

Propagating a soft limit into the right input seems incorrect for EXCEPT at the very least; I'm leaving this behavior unchanged for now with the intention of taking a closer look at it when using required physical properties to propagate soft limits.


pkg/sql/opt_limits.go, line 75 at r1 (raw file):

	case *filterNode:
		p.applySoftLimit(n.source.plan, numRows)

The filterNode case used to pass through a hard limit here if isFilterTrue(n.filter) was true. The SimplifyFilters normalization rule should have eliminated most cases where isFilterTrue(n.filter) would evaluate to true here; the exception may be for filters like WHERE current_user() = 'root', which isn't known during normalization.

I haven't been able to construct an example where a filter like this would have caused a node to have a hard limit before, but after this change it now gets a soft limit, but I think it might be possible.


pkg/sql/opt_limits.go, line 78 at r1 (raw file):

	case *renderNode:
		p.applySoftLimit(n.source.plan, numRows)

Propagating hard limits through a renderNode is made redundant by the normalization rule PushLimitIntoProject.


pkg/sql/opt_limits.go, line 91 at r1 (raw file):

	case *ordinalityNode:
		p.applySoftLimit(n.source, numRows)

Propagating hard limits through an ordinalityNode is made redundant by the normalization rule PushLimitIntoOrdinality.


pkg/sql/opt_limits.go, line 94 at r1 (raw file):

	case *spoolNode:
		p.propagateSoftLimits(n.source)

A spoolNode is always created around a mutation so it shouldn't be propagating hard or soft limits; see the comments for the deleteNode, updateNode, upsertNode, and insertNode cases below.

Previously, applyLimits would set the hardLimit on the spoolNode; this is taken care of by execFactory.ConstructLimit.


pkg/sql/opt_limits.go, line 98 at r1 (raw file):

	case *delayedNode:
		if n.plan != nil {
			p.propagateSoftLimits(n.plan)

As far as I can understand, adelayedNode corresponds to a virtual table, and virtual scans do not support limits, so neither hard nor soft limits should be propagated through a delayedNode.


pkg/sql/opt_limits.go, line 130 at r1 (raw file):

	case *createTableNode:
		if n.sourcePlan != nil {
			p.propagateSoftLimits(n.sourcePlan)

CREATE TABLE AS does not have any output rows, so propagating a hard or soft limit through this node does not make sense.

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Thanks for all the comments in the PR, they made reviewing this very easy!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice!

[nit] since this change doesn't affect the opt package, I'd change the commit and PR message to say "sql: remove hard limit..."

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @savoie)


pkg/sql/opt_limits.go, line 53 at r1 (raw file):

	case *sortNode:
		// We can't propagate the limit, because the sort potentially needs all
		// rows.

[nit] This comment is a bit confusing now because of the word "propagate"... maybe say we can't apply the limit?

Previously, the heuristic planner function `applyLimit` propagated and set both
hard and soft limits. Hard limit propagation is now handled in the optimizer by
using rules to push Limit operators down the tree as far as is possible, and to
remove Limits by adding hard limits to child Scan operators. As a result, a
majority of the logic within `applyLimit` that supported propagating hard
limits was never exercised, since optimization rules would have already
eliminated situations in which Limit operators high up in the tree could be the
cause of hard limits further down the tree. The one remaining case for which
`applyLimit` was still responsible for propagating hard limits was eliminated
by the normalization rule introduced in cockroachdb#41908.

The entirety of `applyLimit` will eventually be removed and replaced by the
optimizer. Removing hard limit propagation from it entirely is a step in this
direction, and will make it easier to refactor soft limit propagation later.

This patch changes `applyLimit` into `applySoftLimit`, which no longer performs
any hard limit propagation; the responsibility for setting any hard limits on
scanNodes and spoolNodes is now left to the optimizer.

It also renames `setUnlimited` to `propagateSoftLimits` in order to indicate
more clearly that this function is not used to remove any limits that might
have been set on a node, but instead to trigger soft limit propagation in case
a limit hint can be introduced further down the tree.

In the case where the optimizer does not generate a limited Scan in place of a
Limit around a Scan (because the Scan cannot be constrained), previously
`applyLimit` would still set the `hardLimit` field of the scan.  This change
will result in the `softLimit` being set instead.  The end state of the
planNode tree is otherwise the same as before this refactor.

Release note: None
@savoie savoie force-pushed the refactor-hard-limits branch from 5e0fb4f to 80c27f5 Compare November 4, 2019 20:19
@savoie savoie changed the title opt: remove hard limit handling from heuristic planner sql: remove hard limit handling from heuristic planner Nov 4, 2019
Copy link
Copy Markdown
Contributor Author

@savoie savoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)


pkg/sql/opt_limits.go, line 53 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] This comment is a bit confusing now because of the word "propagate"... maybe say we can't apply the limit?

Done.

savoie added a commit to savoie/cockroach that referenced this pull request Nov 5, 2019
This patch follows up on cockroachdb#42086 and finishes the refactor of the
heuristic planner `applyLimits` function. It removes the last of the
heuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.

Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.

Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries. `applyLimit` would not be
called on subqueries created with `Builder.addSubquery`, so no nodes
within subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.

Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of `scanNode.softLimit`. With this
refactor, improvements to limit hint handling can be made within the
optimizer.

Release note: None
savoie added a commit to savoie/cockroach that referenced this pull request Nov 5, 2019
This patch follows up on cockroachdb#42086 and finishes the refactor of the
heuristic planner `applyLimits` function. It removes the last of the
heuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.

Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.

Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries. `applyLimit` would not be
called on subqueries created with `Builder.addSubquery`, so no nodes
within subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.

Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of `scanNode.softLimit`. With this
refactor, improvements to limit hint handling can be made within the
optimizer.

Release note: None
Copy link
Copy Markdown
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

![](https://66.media.tumblr.com/tumblr_m8kpk42WmG1rwl09fo1_500.gifv)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)

@savoie
Copy link
Copy Markdown
Contributor Author

savoie commented Nov 5, 2019

bors r+

craig bot pushed a commit that referenced this pull request Nov 5, 2019
42086: sql: remove hard limit handling from heuristic planner r=savoie a=savoie

Previously, the heuristic planner function `applyLimit` propagated and set both
hard and soft limits. Hard limit propagation is now handled in the optimizer by
using rules to push Limit operators down the tree as far as is possible, and to
remove Limits by adding hard limits to child Scan operators. As a result, a
majority of the logic within `applyLimit` that supported propagating hard
limits was never exercised, since optimization rules would have already
eliminated situations in which Limit operators high up in the tree could be the
cause of hard limits further down the tree. The one remaining case for which
`applyLimit` was still responsible for propagating hard limits was eliminated
by the normalization rule introduced in #41908.

The entirety of `applyLimit` will eventually be removed and replaced by the
optimizer. Removing hard limit propagation from it entirely is a step in this
direction, and will make it easier to refactor soft limit propagation later.

This patch changes `applyLimit` into `applySoftLimit`, which no longer performs
any hard limit propagation; the responsibility for setting any hard limits on
scanNodes and spoolNodes is now left to the optimizer.

It also renames `setUnlimited` to `propagateSoftLimits` in order to indicate
more clearly that this function is not used to remove any limits that might
have been set on a node, but instead to trigger soft limit propagation in case
a limit hint can be introduced further down the tree.

In the case where the optimizer does not generate a limited Scan in place of a
Limit around a Scan (because the Scan cannot be constrained), previously
`applyLimit` would still set the `hardLimit` field of the scan.  This change
will result in the `softLimit` being set instead.  The end state of the
planNode tree is otherwise the same as before this refactor.

Release note: None

42181: storage/engine: port the final few MVCC benchmarks to Pebble r=petermattis a=petermattis

The `MVCCDeleteRange`, `ClearRange`, and `ClearIterRange` benchmarks
appear to give nonsensical results. I suspect the benchmarks themselves
are at fault and need to be rewritten.

```
name                          old time/op    new time/op    delta
MVCCScanTransactionalData-16    4.20ms ± 1%    3.39ms ± 0%  -19.15%  (p=0.000 n=9+9)
```

Release note: None

42185: cli/cliflags: add COCKROACH_STORAGE_ENGINE env var r=petermattis a=petermattis

`roachprod` passes through COCKROACH_* env vars allowing
`COCKROACH_STORAGE_ENGINE=pebble roachprod start` to work.

Fixes #41620

Release note (cli change): Add COCKROACH_STORAGE_ENGINE env var which
is tied to the `--storage-engine` flag. Allows selection of "pebble" as
an alternative to the default of "rocksdb".

Co-authored-by: Céline O'Neil <celineloneil@gmail.com>
Co-authored-by: Peter Mattis <petermattis@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 5, 2019

Build succeeded

@craig craig bot merged commit 80c27f5 into cockroachdb:master Nov 5, 2019
@savoie savoie deleted the refactor-hard-limits branch November 5, 2019 17:39
savoie added a commit to savoie/cockroach that referenced this pull request Nov 7, 2019
This patch follows up on cockroachdb#42086 and finishes the refactor of the
heuristic planner `applyLimits` function. It removes the last of the
heuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.

Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.

Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries. `applyLimit` would not be
called on subqueries created with `Builder.addSubquery`, so no nodes
within subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.

Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of `scanNode.softLimit`. With this
refactor, improvements to limit hint handling can be made within the
optimizer.

Release note: None
savoie added a commit to savoie/cockroach that referenced this pull request Nov 8, 2019
This patch follows up on cockroachdb#42086 and finishes the refactor of the
heuristic planner `applyLimits` function. It removes the last of the
heuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.

Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.

Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries. `applyLimit` would not be
called on subqueries created with `Builder.addSubquery`, so no nodes
within subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.

Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of `scanNode.softLimit`. With this
refactor, improvements to limit hint handling can be made within the
optimizer.

Release note: None
savoie added a commit to savoie/cockroach that referenced this pull request Nov 14, 2019
This patch follows up on cockroachdb#42086 and finishes the refactor of the
heuristic planner `applyLimits` function. It removes the last of the
heuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.

Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.

Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries. `applyLimit` would not be
called on subqueries created with `Builder.addSubquery`, so no nodes
within subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.

Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of `scanNode.softLimit`. With this
refactor, improvements to limit hint handling can be made within the
optimizer.

Release note: None
craig bot pushed a commit that referenced this pull request Nov 19, 2019
42170:  opt: refactor limit hint propagation into a required physical property r=savoie a=savoie

This patch follows up on #42086 and finishes the refactor of the
heuristic planner `applyLimits` function. It removes the last of the
heuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.

Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.

Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries. `applyLimit` would not be
called on subqueries created with `Builder.addSubquery`, so no nodes
within subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.

Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of `scanNode.softLimit`. With this
refactor, improvements to limit hint handling can be made within the
optimizer.

Release note: None

42521: jobs: allow jobs to be created and started separately r=spaskob a=ajwerner

Prior to this commit, if a creator of a job wanted to get the results it would
have to create and start it at the same time by calling jobs.Registry.StartJob.
That method would take a jobs.Record, create a new *jobs.Job and start it
immediately, sending results on a passed channel and returning an error
channel.

This is awkward because sometimes callers might want to do other work on the
same transaction which is creating the job. Furthermore those callers might
not want to start the job until after the creating transaction commits.

This refactor provides a mechanism for callers to create and store a job
in a transaction and then start it after that transaction commits.

Here's an example:

```
j := registry.NewJob(record)
if err := db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
    if err := j.WithTxn(txn).Created(ctx); err != nil {
        return err
    }
    ...
}); err != nil {
    ...
}
resultsCh := make(chan tree.Datums)
errCh, err := j.Start(ctx, resultsCh)
...
```

Release note: None

Co-authored-by: Céline O'Neil <celineloneil@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants