
opt: introduce PushLimitIntoOrdinality rule#41908

Merged
craig[bot] merged 1 commit into cockroachdb:master from savoie:limit-ordinality-rule
Oct 31, 2019

Conversation

@savoie
Contributor

@savoie savoie commented Oct 24, 2019

This optimizer rule is the first of several that will push limit
operators as far down the tree as possible, in order to remove the need
for logic in opt_limits.go that currently recurses down the plan
tree ensuring that leaf scan nodes have hard limits set even when a
limit operator is much higher up in the tree. The following explanation
is also included as a comment above the rule:

PushLimitIntoOrdinality pushes the Limit operator into the Ordinality
operator when the ordering associated with both operators allows it.

Pushing the limit as far as possible down the tree shouldn't have
negative effects, but will reduce the number of rows processed by
operators higher up, and if the limit is pushed all the way down to a
scan, the scan can be limited directly.

In order to prevent this rule from affecting:

  1. the set of rows kept by the limit,
  2. the ordinals assigned to those rows by the ordinality, and
  3. the final ordering of the rows,

the rule should only be applied if the ordinality's ordering implies
the original limit's ordering (see OrderingChoice.Implies).

The new limit's ordering must also be "extended" to imply the
ordinality's ordering, which is done by setting it to be the
intersection of the old limit's ordering and the ordinality's ordering
(see OrderingChoice.Intersection).

Release note: None
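The "implies" condition above can be illustrated with a toy model of orderings. This is a hedged sketch with invented types (`ordering`, `implies`) — CockroachDB's real `OrderingChoice` is more general (equivalence groups, column sets) — but it captures the prefix-plus-optional-columns idea discussed in this PR:

```go
package main

import "fmt"

// ordering is a toy stand-in for OrderingChoice: a required sequence of
// signed columns (e.g. "+3", "-4") plus a set of columns whose position
// doesn't matter (the "optional" set). This models the idea only; it is
// not the actual CockroachDB implementation.
type ordering struct {
	cols     []string
	optional map[string]bool
}

// implies reports whether every row sequence satisfying a also satisfies b.
// In this simplified model, that holds when b's required columns appear in
// order as a prefix of a's, skipping any columns of a that b marks optional.
func implies(a, b ordering) bool {
	i := 0
	for _, col := range b.cols {
		// Advance past columns of a that b doesn't care about.
		for i < len(a.cols) && b.optional[a.cols[i][1:]] {
			i++
		}
		if i >= len(a.cols) || a.cols[i] != col {
			return false
		}
		i++
	}
	return true
}

func main() {
	// The test-file example: +3 +4 implies +3 opt(2).
	ord := ordering{cols: []string{"+3", "+4"}}
	lim := ordering{cols: []string{"+3"}, optional: map[string]bool{"2": true}}
	fmt.Println(implies(ord, lim)) // true: the rule may fire

	// An unordered input implies nothing but the empty ordering.
	fmt.Println(implies(ordering{}, ord)) // false: the rule must not fire
}
```

When the check passes, the ordinality's output order already is an order the limit accepts, so taking the first N rows below the ordinality selects the same rows, with the same ordinals, as taking them above it.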

@savoie savoie requested a review from a team as a code owner October 24, 2019 17:01
@cockroach-teamcity
Member

This change is Reviewable

@savoie savoie force-pushed the limit-ordinality-rule branch from 884a880 to 99e13f0 Compare October 24, 2019 17:29
Member

@RaduBerinde RaduBerinde left a comment


Looks good overall!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @savoie)


pkg/sql/opt/norm/custom_funcs.go, line 408 at r1 (raw file):

}

// Returns whether <ordering1> implies <ordering2>.

[nit] exported function comments should start with the function name (OrderingImplies returns ..).


pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):

// is a valid intersection), though it is possible that an intersection exists
// even if neither ordering implies the other (see OrderingChoice.Intersects).
func (c *CustomFuncs) OrderingImplies(ordering1, ordering2 physical.OrderingChoice) bool {

I thought we would be using Intersects instead of Implies, am I missing something?

As an example, if one of the operators has a+ while the other has a+,b+, we should be able to push down with a+,b+.


pkg/sql/opt/norm/rules/limit.opt, line 109 at r1 (raw file):

    (Const $limit:*)
    $limitOrdering:*
    & (HasColsInOrdering $input $limitOrdering)

[nit] & at the end and next line indented (for consistency with existing style)


pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):


# Complex example of a limit that should be pushed down, since the ordinality's ordering implies
# the limit ordering: +3 +4 opt(2) => +3 +2.

[nit] should use the column names in the comment (the col id assignments could change in the future)

@savoie savoie force-pushed the limit-ordinality-rule branch from 99e13f0 to bfc1437 Compare October 24, 2019 17:50
Contributor

@justinj justinj left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @savoie)


pkg/sql/opt/norm/rules/limit.opt, line 107 at r1 (raw file):

        $private:*
    )
    (Const $limit:*)

how come this has to be a Const?


pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):


# Complex example of a limit that should be pushed down, since the ordinality's ordering implies
# the limit ordering: +3 +4 opt(2) => +3 +2.

Is this implication true? The comment above Implies seems to indicate no? They do have nonempty intersection (+3 +2 +4) which might be what you mean?

Member

@RaduBerinde RaduBerinde left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @savoie)


pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

Is this implication true? The comment above Implies seems to indicate no? They do have nonempty intersection (+3 +2 +4) which might be what you mean?

I'd expect the limit's ordering to also have opt(i) since we reduce it using the input FDs (I believe that's what's happening in the plan)

@savoie savoie force-pushed the limit-ordinality-rule branch from bfc1437 to ac785ad Compare October 24, 2019 18:41
Contributor Author

@savoie savoie left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)


pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):

Previously, RaduBerinde wrote…

I thought we would be using Intersects instead of Implies, am I missing something?

As an example, if either of the operator has a+ while the other has a+,b+, we should be able to push down with a+,b+

The minimal counter-example to Intersects working that I found was using the orderings x- and <empty> (which intersect: x- is an intersection) on this table foo:

 x
===
 a
 b

SELECT * FROM (SELECT * FROM foo WITH ORDINALITY) ORDER BY x DESC LIMIT 1

should give the result

 x | ordinality
================
 b | 2

but if the LIMIT 1 is pushed into the ordinality you get

 x | ordinality
================
 b | 1

since only 1 row is ever provided to the ordinality; we're never going to get it to assign the ordinal 2.

On the other hand, with the same orderings but with the ordinality ordering implying the limit ordering, we're OK to push down (if we replace the limit's ordering with the intersection of the two orderings):

SELECT * FROM (SELECT * FROM foo ORDER BY x DESC) WITH ORDINALITY LIMIT 1

and

SELECT * FROM (SELECT * FROM foo ORDER BY x DESC LIMIT 1) WITH ORDINALITY

give the same result:

 x | ordinality
================
 b | 1

I don't have a perfect proof of why intersection isn't sufficient but implication is, but here's my intuition:

Let's say that the limit is for 10 rows. The limit can only be pushed down into the ordinality if those 10 rows are exactly the first 10 rows (numbered 1-10) of the ordinality ordering. If the rows are the ones that, without applying the rule, would be numbered 2-11, the limit can't be pushed down, since there's no way to tell the ordinality to begin its numbering at 2.

So if (prior to pushing down the limit), the result of the ordinality operation needs to be reordered at all when applying the limit, the limit operator can't be pushed down into the ordinality. (Strictly, the limit could still be pushed down if the set of rows actually selected by the limit remained unchanged, but this can't be known as far as I'm aware.) So it's not enough that an intersection of the two orderings exists; the ordinality ordering needs to imply the limit ordering.
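The counter-example above can be checked with a small simulation, under the assumed semantics that ORDINALITY numbers rows 1..n in the order they arrive (all names here are hypothetical helpers, not planner code):

```go
package main

import (
	"fmt"
	"sort"
)

type row struct {
	x   string
	ord int
}

// ordinality numbers rows 1..n in the order it receives them (the assumed
// semantics of WITH ORDINALITY over its input stream).
func ordinality(xs []string) []row {
	out := make([]row, len(xs))
	for i, x := range xs {
		out[i] = row{x: x, ord: i + 1}
	}
	return out
}

// limitDesc sorts by x descending and keeps the first n rows, modelling a
// Limit whose required ordering is x-.
func limitDesc(rs []row, n int) []row {
	sorted := append([]row(nil), rs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].x > sorted[j].x })
	return sorted[:n]
}

func main() {
	// Limit above the ordinality: row "b" keeps the ordinal it was assigned.
	above := limitDesc(ordinality([]string{"a", "b"}), 1)
	fmt.Println(above[0].x, above[0].ord) // b 2

	// Limit pushed below the ordinality: only one row ever reaches it,
	// so "b" is numbered 1 -- a different result. Pushdown is unsafe when
	// the limit must reorder the ordinality's output.
	below := ordinality([]string{"b"}) // ORDER BY x DESC LIMIT 1 ran first
	fmt.Println(below[0].x, below[0].ord) // b 1
}
```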


pkg/sql/opt/norm/rules/limit.opt, line 107 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

how come this has to be a Const?

...sometimes I blindly copy existing code without thinking it through; fixed


pkg/sql/opt/norm/testdata/rules/limit, line 513 at r1 (raw file):

Previously, RaduBerinde wrote…

I'd expect the limit's ordering to also have opt(i) since we reduce it using the input FDs (I believe that's what's happening in the plan)

You're both right; I updated the test to include an optional ordering and didn't think the comment through fully. The limit's ordering has an opt(i) when the rule is applied, so I've updated the comment.

@RaduBerinde
Member


pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):

Previously, savoie (Céline O'Neil) wrote…

The minimal counter-example to Intersects working that I found was using the orderings x- and <empty> (which intersect: x- is an intersection) on this table foo…

…So it's not enough that an intersection of the two orderings exists; the ordinality ordering needs to imply the limit ordering.

SELECT * FROM foo WITH ORDINALITY means that the ordinality can be assigned in any order, so both b 1 and b 2 are correct results for the query.

This optimizer rule is the first of several that will push limit
operators as far down the tree as possible, in order to remove the need
for logic in `opt_limits.go` that currently recurses down the plan
tree ensuring that leaf scan nodes have hard limits set even when a
limit operator is much higher up in the tree. The following explanation
is also included as a comment above the rule:

> PushLimitIntoOrdinality pushes the Limit operator into the Ordinality
> operator when the ordering associated with both operators allows it.
>
> Pushing the limit as far as possible down the tree shouldn't have
> negative effects, but will reduce the number of rows processed by
> operators higher up, and if the limit is pushed all the way down to a
> scan, the scan can be limited directly.
>
> In order to prevent this rule from affecting:
> 1. the set of rows kept by the limit,
> 2. the ordinals assigned to those rows by the ordinality, and
> 3. the final ordering of the rows,
> the new limit's ordering should be "extended" to imply the
> ordinality's ordering, so it is set to be an intersection of the
> original limit ordering and the ordinality's ordering (see
> OrderingChoice.Intersection).

Release note: None
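The "extension" step in the updated explanation — giving the pushed-down limit an ordering that implies both operators' orderings — can be sketched for the simple prefix case. This is an assumption-based simplification; the real OrderingChoice.Intersection is more general (optional columns, equivalence groups):

```go
package main

import "fmt"

// intersect returns an ordering that implies both a and b, handling only
// the case where one required column list is a prefix of the other (the
// longer list refines the shorter one). A sketch, not the real
// OrderingChoice.Intersection.
func intersect(a, b []string) ([]string, bool) {
	short, long := a, b
	if len(short) > len(long) {
		short, long = long, short
	}
	for i, c := range short {
		if long[i] != c {
			return nil, false // incompatible: no common refinement here
		}
	}
	return long, true
}

func main() {
	// "+a" and "+a,+b" intersect as "+a,+b": a safe ordering for the
	// new, pushed-down limit.
	common, ok := intersect([]string{"+a"}, []string{"+a", "+b"})
	fmt.Println(common, ok)

	// "+a" and "-a" have no intersection; the rule must not fire.
	_, ok = intersect([]string{"+a"}, []string{"-a"})
	fmt.Println(ok)
}
```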
@savoie savoie force-pushed the limit-ordinality-rule branch from ac785ad to 84dcc63 Compare October 29, 2019 14:37
Contributor Author

@savoie savoie left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)


pkg/sql/opt/norm/custom_funcs.go, line 413 at r1 (raw file):

Previously, RaduBerinde wrote…

SELECT * FROM foo WITH ORDINALITY means that the ordinality can be assigned in any order, so both b 1 and b 2 are correct results for the query.

Updated to use intersection, and removed ambiguity in the allowed ordering of the logic tests.

Member

@RaduBerinde RaduBerinde left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)

Contributor

@justinj justinj left a comment


:lgtm: nice job!

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)

@savoie
Contributor Author

savoie commented Oct 31, 2019

bors r+

craig bot pushed a commit that referenced this pull request Oct 31, 2019
41908: opt: introduce PushLimitIntoOrdinality rule r=savoie a=savoie


Co-authored-by: Céline O'Neil <celineloneil@gmail.com>
@craig
Contributor

craig bot commented Oct 31, 2019

Build succeeded

@craig craig bot merged commit 84dcc63 into cockroachdb:master Oct 31, 2019
savoie added a commit to savoie/cockroach that referenced this pull request Oct 31, 2019
Previously, the heuristic planner function `applyLimit` propagated and set both
hard and soft limits. Hard limit propagation is now handled in the optimizer by
using rules to push Limit operators down the tree as far as is possible, and to
remove Limits by adding hard limits to child Scan operators. As a result, a
majority of the logic within `applyLimit` that supported propagating hard
limits was never exercised, since optimization rules would have already
eliminated situations in which Limit operators high up in the tree could be the
cause of hard limits further down the tree. The one remaining case for which
`applyLimit` was still responsible for propagating hard limits was eliminated
by the normalization rule introduced in cockroachdb#41908.

The entirety of `applyLimit` will eventually be removed and replaced by the
optimizer. Removing hard limit propagation from it entirely is a step in this
direction, and will make it easier to refactor soft limit propagation later.

This patch changes `applyLimit` into `applySoftLimit`, which no longer performs
any hard limit propagation; the responsibility for setting any hard limits on
scanNodes and spoolNodes is now left to the optimizer.

It also renames `setUnlimited` to `propagateSoftLimits` in order to indicate
more clearly that this function is not used to remove any limits that might
have been set on a node, but instead to trigger soft limit propagation in case
a limit hint can be introduced further down the tree.

In the case where the optimizer does not generate a limited Scan in place of a
Limit around a Scan (because the Scan cannot be constrained), previously
`applyLimit` would still set the `hardLimit` field of the scan.  This change
will result in the `softLimit` being set instead.  The end state of the
planNode tree is otherwise the same as before this refactor.

Release note: None
savoie added a commit to savoie/cockroach that referenced this pull request Nov 4, 2019
craig bot pushed a commit that referenced this pull request Nov 5, 2019
42086: sql: remove hard limit handling from heuristic planner r=savoie a=savoie


42181: storage/engine: port the final few MVCC benchmarks to Pebble r=petermattis a=petermattis

The `MVCCDeleteRange`, `ClearRange`, and `ClearIterRange` benchmarks
appear to give nonsensical results. I suspect the benchmarks themselves
are at fault and need to be rewritten.

```
name                          old time/op    new time/op    delta
MVCCScanTransactionalData-16    4.20ms ± 1%    3.39ms ± 0%  -19.15%  (p=0.000 n=9+9)
```

Release note: None

42185: cli/cliflags: add COCKROACH_STORAGE_ENGINE env var r=petermattis a=petermattis

`roachprod` passes through COCKROACH_* env vars allowing
`COCKROACH_STORAGE_ENGINE=pebble roachprod start` to work.

Fixes #41620

Release note (cli change): Add COCKROACH_STORAGE_ENGINE env var which
is tied to the `--storage-engine` flag. Allows selection of "pebble" as
an alternative to the default of "rocksdb".

Co-authored-by: Céline O'Neil <celineloneil@gmail.com>
Co-authored-by: Peter Mattis <petermattis@gmail.com>
@savoie savoie deleted the limit-ordinality-rule branch November 7, 2019 22:05