opt: refactor limit hint propagation into a required physical property#42170
opt: refactor limit hint propagation into a required physical property#42170craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
If anyone has pro tips on how to make this easier to review with #42170 still unmerged, let me know :) |
RaduBerinde
left a comment
There was a problem hiding this comment.
Nice work! I posted various comments. Review was easy (using the "review each comment separately" setting in reviewable).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @savoie)
pkg/sql/opt/exec/execbuilder/relational.go, line 471 at r2 (raw file):
} softLimit := int64(scan.RequiredPhysical().LimitHint)
Here we convert 0.1 to 0 (which is no limit). I think we should make it 1 if there's a limit hint.
pkg/sql/opt/props/physical/required.go, line 47 at r2 (raw file):
Ordering OrderingChoice LimitHint float64
Definitely needs a comment, especially mentioning the special value 0.
pkg/sql/opt/props/physical/required.go, line 86 at r2 (raw file):
buf.WriteByte(']') if hasOrdering || hasLimitHint {
The logic in this function is very hard to follow (and hard to extend correctly). We should do something better, e.g.:
var buf bytes.Buffer
output := func(name string, fn func (*bytes.Buffer)) {
if buf.Len() != 0 {
buf.WriteByte(' ')
}
buf.WriteByte('[')
buf.WriteString(name)
fn(&buf)
buf.WriteByte(']')
}and call output for each set property.
pkg/sql/opt/xform/optimizer.go, line 557 at r2 (raw file):
// enforcer operator can provide them. inner.Presentation = nil inner.LimitHint = 0
I don't think that conceptually the limit hint is the same as the presentation here. Zero just happens to be the right choice when adding a SortExpr{} below. And it's not even necessarily the best choice: we can pass through a limit hint in the second case below where we have a partial ordering. For example, if we are ordering by a,b and we have an ordering on a and we know that on average there are say 100 bs for every a, we can pass through limitHint + 100. This could be an important optimization (favoring using an index on a for this operation).
I think the right thing to do here is to calculate inner in the two cases below by calling BuildChildPhysicalProps on the SortExpr that we create. Then we will use whatever logic is in there (which we can improve later as suggested). It is already the case that inner here must be exactly what BuildChildPhysicalProps would return for the enforcer, it was just trivial to achieve that directly until now.
pkg/sql/opt/xform/physical_props.go, line 81 at r2 (raw file):
childProps.Ordering = ordering.BuildChildRequired(parent, &parentProps.Ordering, nth) switch parent.Op() {
We need targeted tests for this logic in a xform/testdata/physprops/limit-hint file.
pkg/sql/opt/xform/physical_props.go, line 92 at r2 (raw file):
if constOffset, ok := parent.(*memo.OffsetExpr).Offset.(*memo.ConstExpr); ok { offsetValue := float64(*constOffset.Value.(*tree.DInt)) if offsetValue > math.MaxFloat64-parentProps.LimitHint {
Seems odd to have to worry about this for floats, I'd remove it. If you're really worried about it, you can add a catch-all below if LimitHint == math.Inf(+1) { LimitHint = 0 }
pkg/sql/opt/xform/physical_props.go, line 103 at r2 (raw file):
case opt.ExceptOp, opt.ExceptAllOp, opt.IntersectOp, opt.IntersectAllOp, opt.UnionOp, opt.UnionAllOp: childProps.LimitHint = parentProps.LimitHint
Add a TODO here to differentiate between the left and right child.
pkg/sql/opt/xform/physical_props.go, line 107 at r2 (raw file):
childProps.LimitHint = parentProps.LimitHint case opt.SelectOp: childProps.LimitHint = parentProps.LimitHint
Add a TODO() here that we should take into account the selectivity.
pkg/sql/opt/xform/physical_props.go, line 108 at r2 (raw file):
case opt.SelectOp: childProps.LimitHint = parentProps.LimitHint case opt.ProjectOp:
these cases can be coalesced case opt.ProjectOp, opt.OrdinalityOp, ...
9558ed8 to
1eb99fa
Compare
|
(Just adding a file I'd forgotten to bundle into the commit) |
justinj
left a comment
There was a problem hiding this comment.
Nice! Less code than I expected, but definitely shows that this was a subtle thing to work on.
It strikes me that it seems like conceptually what we really are tracking as the soft limit is really kind of a probability distribution of the number of rows we might need, and I wonder if future work would also track some other properties of this distribution (say, the soft limit itself is the mean, but then if we also track the stdev we can control how risk-averse we're being in a given situation).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @savoie)
pkg/sql/opt/xform/physical_props.go, line 107 at r2 (raw file):
Previously, RaduBerinde wrote…
Add a
TODO()here that we should take into account the selectivity.
Same with DistinctOn
pkg/sql/opt/xform/physical_props.go, line 113 at r3 (raw file):
childProps.LimitHint = parentProps.LimitHint case opt.ProjectSetOp: childProps.LimitHint = parentProps.LimitHint
consider panicking here so we know to update it for new operators (even if we just have a catch-all case that contains all the existing operators)
1eb99fa to
0a3f09a
Compare
savoie
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj)
pkg/sql/opt/exec/execbuilder/relational.go, line 471 at r2 (raw file):
Previously, RaduBerinde wrote…
Here we convert 0.1 to 0 (which is no limit). I think we should make it 1 if there's a limit hint.
Fixed by calling math.Ceil the 2 times we make the conversion to int64. If LimitHint needs to be converted to an integer number of rows many more times, it should probably get some dedicated toNumberOfRows function. For the moment though this seems fine.
pkg/sql/opt/xform/physical_props.go, line 81 at r2 (raw file):
Previously, RaduBerinde wrote…
We need targeted tests for this logic in a
xform/testdata/physprops/limit-hintfile.
Only have the one test right now (for the case of an Offset somewhere below a Limit), but will be adding more as we get into more complex hint calculations.
pkg/sql/opt/xform/physical_props.go, line 113 at r3 (raw file):
Previously, justinj (Justin Jaffray) wrote…
consider panicking here so we know to update it for new operators (even if we just have a catch-all case that contains all the existing operators)
Radu and I discussed this and it seems not worth it since the default case here ("don't impose a limit hint on your children") is maybe suboptimal in some cases but never wrong.
rytaft
left a comment
There was a problem hiding this comment.
Nice work!
Reviewed 12 of 16 files at r2, 1 of 1 files at r3, 5 of 5 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @justinj and @savoie)
pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r4 (raw file):
hardLimit := scan.HardLimit.RowCount() // At most one of hardLimit and softLimit may be defined at the same time.
why is softLimit prioritized over hardLimit?
pkg/sql/opt/memo/interner.go, line 539 at r4 (raw file):
} h.HashOrderingChoice(val.Ordering) h.HashFloat64(val.LimitHint)
Is there any chance we could have a hash collision here with a longer ordering choice but no limit hint?
pkg/sql/opt/props/physical/required.go, line 52 at r4 (raw file):
// to return all result rows, but it can be optimized based on the assumption // that only the hinted number of rows will be needed. // A LimitHint of 0 indicates "no limit". The LimitHint an intermediate
[nit] LimitHint has an intermediate
0a3f09a to
8009135
Compare
savoie
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rytaft)
pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why is softLimit prioritized over hardLimit?
Overwriting a hard limit with a soft limit was the default behaviour in applyLimits and I was aiming for as close to a straight refactor as possible with this patch. It's not immediately obvious to me what is optimal if, say, a node has a large hard limit and a small soft limit. Adding a TODO here.
pkg/sql/opt/memo/interner.go, line 539 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Is there any chance we could have a hash collision here with a longer ordering choice but no limit hint?
OK so this was an interesting can of worms. I think the answer is probably yes, we could have a collision, but before I got to constructing one, I found out that it's actually quite simple to construct a hash collision without even adding any properties besides OrderingChoice: -2 opt(1) and -(1|2) have the same hash. This is handled correctly by InternPhysicalProps, which does verify that physical properties with the same hash are actually equal before considering them duplicates.
So I think the entirety of HashPhysProps would need a rewrite to avoid collisions, but also collisions are handled correctly, so infrequent collisions look like they're probably fine?
(Side note: turns out we weren't running all of the physical props interner tests; fixing that in this new push.)
pkg/sql/opt/props/physical/required.go, line 52 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] LimitHint has an intermediate
Done.
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rytaft)
pkg/sql/opt/memo/interner.go, line 539 at r4 (raw file):
Previously, savoie (Céline O'Neil) wrote…
OK so this was an interesting can of worms. I think the answer is probably yes, we could have a collision, but before I got to constructing one, I found out that it's actually quite simple to construct a hash collision without even adding any properties besides
OrderingChoice:-2 opt(1)and-(1|2)have the same hash. This is handled correctly byInternPhysicalProps, which does verify that physical properties with the same hash are actually equal before considering them duplicates.So I think the entirety of
HashPhysPropswould need a rewrite to avoid collisions, but also collisions are handled correctly, so infrequent collisions look like they're probably fine?(Side note: turns out we weren't running all of the physical props interner tests; fixing that in this new push.)
I tried to keep the hashing functions extremely simple and fast. Collisions aren't a problem unless they're common.
|
pkg/sql/opt/memo/interner.go, line 539 at r4 (raw file): Previously, andy-kimball (Andy Kimball) wrote…
Yes, our approach is to keep things simple, but avoid unnecessary collisions if it's not more complicated. To answer thr original concern, because we "append" the float in all cases, it shouldn't be a problem. It would be if we only hashed it when it's nonzero. |
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @justinj)
pkg/sql/opt/memo/interner.go, line 539 at r4 (raw file):
Previously, RaduBerinde wrote…
Yes, our approach is to keep things simple, but avoid unnecessary collisions if it's not more complicated. To answer thr original concern, because we "append" the float in all cases, it shouldn't be a problem. It would be if we only hashed it when it's nonzero.
Makes sense - thanks!
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @justinj and @savoie)
pkg/sql/opt/xform/testdata/physprops/limit_hint, line 4 at r5 (raw file):
CREATE TABLE t (x INT PRIMARY KEY, y INT) ----
I would add some simple tests with opt that show that the hint gets all the way into the scan, including through index join and other cases where we're passing the limit down.
This patch follows up on cockroachdb#42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None
8009135 to
56239e4
Compare
savoie
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @justinj and @rytaft)
pkg/sql/opt/xform/testdata/physprops/limit_hint, line 4 at r5 (raw file):
Previously, RaduBerinde wrote…
I would add some simple tests with
optthat show that the hint gets all the way into the scan, including through index join and other cases where we're passing the limit down.
Added the limit hint to the output of opt and added some simple tests that cover this.
justinj
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @justinj and @rytaft)
pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r4 (raw file):
Previously, savoie (Céline O'Neil) wrote…
Overwriting a hard limit with a soft limit was the default behaviour in
applyLimitsand I was aiming for as close to a straight refactor as possible with this patch. It's not immediately obvious to me what is optimal if, say, a node has a large hard limit and a small soft limit. Adding a TODO here.
if we have a hardLimit won't we almost always have a softLimit? Since the softLimit will propagate down from a Limit operator?
savoie
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @justinj and @rytaft)
pkg/sql/opt/exec/execbuilder/relational.go, line 475 at r4 (raw file):
Previously, justinj (Justin Jaffray) wrote…
if we have a
hardLimitwon't we almost always have asoftLimit? Since the softLimit will propagate down from aLimitoperator?
hardLimit gets added in exploration rules, which replace a Limit around a Scan with a hard-limited Scan, so in that case there is no longer a Limit operator that can propagate down a limit hint, and there is only a hard limit.
The case of competing hard and soft limits only comes up if there were at least 2 limits above the scan, one of which can't be pushed down into a scan:
opt
SELECT * FROM (SELECT * FROM t LIMIT 20) where y=1 LIMIT 10
----
limit
├── columns: x:1(int!null) y:2(int!null) z:3(int)
├── select
│ ├── columns: x:1(int!null) y:2(int!null) z:3(int)
│ ├── limit hint: 10.00
│ ├── scan t
│ │ ├── columns: x:1(int!null) y:2(int) z:3(int)
│ │ ├── limit: 20
│ │ └── limit hint: 10.00
│ └── filters
│ └── y = 1 [type=bool]
└── const: 10 [type=int]
If the hint of 10 were actually any good, it might be better to take that over the hard limit of 20. Once I have better limit hint estimates in cases like filtering, I'll look into whether something like "take the lowest of the hard and soft limit" is good in most cases.
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewed 7 of 16 files at r2, 3 of 5 files at r4, 2 of 3 files at r5, 47 of 47 files at r6.
Reviewable status:complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj)
|
bors r+ |
Build failed |
|
bors r+ |
Build failed |
|
bors r+ |
Build failed |
|
bors r+ |
Merge conflict (retrying...) |
42170: opt: refactor limit hint propagation into a required physical property r=savoie a=savoie This patch follows up on #42086 and finishes the refactor of the heuristic planner `applyLimits` function. It removes the last of the heuristic planner limit handling, and introduces the LimitHint required physical property, which is represented by a float64 in anticipation of future improvements that will use statistics to inform better estimations. Previously, in the case where a limit hint was propagated down to a limitNode that had an offset but no limit, the limit hint would be discarded. In this patch, an OffsetOp's limit hint will be propagated to its children, having been increased enough to ensure that the required number of rows can be discarded by the offset. Another change from the behaviour of the heuristic planner's handling of limit hints is in the case of subqueries. `applyLimit` would not be called on subqueries created with `Builder.addSubquery`, so no nodes within subqueries had soft limits set. Since handling of required physical properties occurs before subqueries are created, this ceases to be a special case and soft limits may be set for scanNodes within subqueries. Besides these two departures from the heuristic planner's limit propagation, this patch should not have introduced further changes affecting the ultimate value of `scanNode.softLimit`. With this refactor, improvements to limit hint handling can be made within the optimizer. Release note: None 42521: jobs: allow jobs to be created and started separately r=spaskob a=ajwerner Prior to this commit, if a creator of a job wanted to get the results it would have to create and start it at the same time by calling jobs.Registry.StartJob. That method would take a jobs.Record, create a new *jobs.Job and start it immediately, sending results on a passed channel and returning an error channel. This is awkward because sometimes callers might want to do other work on the same transaction which is creating the job. Furthermore those callers might not want to start the job until after the creating transaction commits. This refactor provides a mechanism for callers to create and store a job in a transaction and then start it after that transaction commits. Here's an example: ``` j := registry.NewJob(record) if err := db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error { if err := j.WithTxn(txn).Created(ctx); err != nil { return err } ... }); err != nil { ... } resultsCh := make(chan tree.Datums) errCh, err := j.Start(ctx, resultsCh) ... ``` Release note: None Co-authored-by: Céline O'Neil <celineloneil@gmail.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
This patch follows up on #42086 and finishes the refactor of the
heuristic planner
applyLimitsfunction. It removes the last of theheuristic planner limit handling, and introduces the LimitHint required
physical property, which is represented by a float64 in anticipation of
future improvements that will use statistics to inform better
estimations.
Previously, in the case where a limit hint was propagated down to a
limitNode that had an offset but no limit, the limit hint would be
discarded. In this patch, an OffsetOp's limit hint will be propagated to
its children, having been increased enough to ensure that the required
number of rows can be discarded by the offset.
Another change from the behaviour of the heuristic planner's handling of
limit hints is in the case of subqueries.
applyLimitwould not becalled on subqueries created with
Builder.addSubquery, so no nodeswithin subqueries had soft limits set. Since handling of required
physical properties occurs before subqueries are created, this ceases to
be a special case and soft limits may be set for scanNodes within
subqueries.
Besides these two departures from the heuristic planner's limit
propagation, this patch should not have introduced further changes
affecting the ultimate value of
scanNode.softLimit. With thisrefactor, improvements to limit hint handling can be made within the
optimizer.
Release note: None