opt: ensure validation of unique constraints is efficient#65355
opt: ensure validation of unique constraints is efficient#65355craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
Please ignore the first commit; it's from #65350. |
RaduBerinde
left a comment
There was a problem hiding this comment.
Nicely done!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/xform/limit_funcs.go, line 215 at r2 (raw file):
// SplitScanIntoUnionScans returns a UnionAll of Scan operators (with an // optional hard limit) that each scan over a single key from the original // Scan's constraints. If no such UnionAll of Scans can be found, ok=false is
[nit] returns a UnionAll [..] if the scans can provide the given ordering, and if the statistics suggest that splitting the scan could be beneficial.
pkg/sql/opt/xform/limit_funcs.go, line 263 at r2 (raw file):
if scan.Relational().Stats.Available && float64(scanCount*limit*threshold) >= scan.Relational().Stats.RowCount {
[nit] I would put all the code that is limit specific inside an if limit > 0. It's more or less luck that it all works out with 0, better to not have to think about it.
pkg/sql/opt/xform/limit_funcs.go, line 272 at r2 (raw file):
// The index ordering must have a prefix of columns of length keyLength // followed by the ordering columns either in order or in reverse order. hasLimitOrderingSeq, reverse := indexHasOrderingSequence(
[nit] hasOrderingSeq
pkg/sql/opt/xform/rules/groupby.opt, line 163 at r2 (raw file):
) => ((OpName) $unionScans $aggs $private)
After this rule runs, we are relying on the existing exploration rules to generate a streaming GroupBy, correct?
a1b3ca6 to
7b9ad1f
Compare
rytaft
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/opt/xform/limit_funcs.go, line 215 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] returns a UnionAll [..] if the scans can provide the given ordering, and if the statistics suggest that splitting the scan could be beneficial.
Done.
pkg/sql/opt/xform/limit_funcs.go, line 263 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] I would put all the code that is limit specific inside an
if limit > 0. It's more or less luck that it all works out with 0, better to not have to think about it.
Done.
pkg/sql/opt/xform/limit_funcs.go, line 272 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] hasOrderingSeq
Done.
pkg/sql/opt/xform/rules/groupby.opt, line 163 at r2 (raw file):
Previously, RaduBerinde wrote…
After this rule runs, we are relying on the existing exploration rules to generate a streaming GroupBy, correct?
Yep! Added a comment.
mgartner
left a comment
There was a problem hiding this comment.
[nit] the PR description is missing the second commit
Reviewed 5 of 7 files at r2, 2 of 2 files at r3, 3 of 3 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/xform/groupby_funcs.go, line 213 at r3 (raw file):
scan memo.RelExpr, sp *memo.ScanPrivate, private *memo.GroupingPrivate, ) (_ memo.RelExpr, ok bool) { cons, ok := c.getKnownScanConstraint(sp)
[nit] move the definition of getKnownScanConstraint from limit_funcs.go to general_funcs.go (probably in a separate commit)
pkg/sql/opt/xform/rules/groupby.opt, line 185 at r3 (raw file):
# inefficient and memory-intensive hash group by operation. Note that the # previous rule, SplitGroupByScanIntoUnionScans, is also needed since it would # apply in cases where a and b are not nullable.
[nit] It might be nice to expand this sentence a bit more—"are not nullable" confused me a bit when I read it because I thought it should read "are nullable". Something like "it would apply in cases where a and b are not nullable (the IS NOT NULL filters would be eliminated during normalization)". (Or is it that the IS NOT NULL filters are not part of the query to begin with?)
DrewKimball
left a comment
There was a problem hiding this comment.
Really cool!
It might be nice to have a comment explaining why zero is passed as the limit to SplitScanIntoUnionScans.
This commit adds two new exploration rules: SplitGroupByScanIntoUnionScans
and SplitGroupByFilteredScanIntoUnionScans.
SplitGroupByScanIntoUnionScans splits a non-inverted scan under a GroupBy,
DistinctOn, or EnsureUpsertDistinctOn into a union-all of scans, where each
scan is ordered on the grouping columns. This ordering is then maintained by
the union-all operation and passed on to the grouping operation. Ordering on
the grouping columns is important since it enables the grouping operation to
execute in a streaming fashion, which is more efficient. Example:
CREATE TABLE tab (
region STRING NOT NULL CHECK (region IN ('ASIA', 'EUROPE')),
data INT NOT NULL,
INDEX (region, data)
);
SELECT DISTINCT data
FROM tab;
=>
SELECT DISTINCT data
FROM (SELECT * FROM tab WHERE region='ASIA')
UNION ALL (SELECT * FROM tab WHERE region='EUROPE');
This rule does not actually build the streaming grouping operation, but it
allows another rule, GenerateStreamingGroupBy, to fire and use the new
interesting orderings provided by the UnionAll of scans to build a streaming
operation.
SplitGroupByFilteredScanIntoUnionScans is like SplitGroupByScanIntoUnionScans,
but the scan is wrapped in a Select.
These transformations are important for ensuring that validation of the
unique constraint in an implicitly-partitioned unique index is efficient. The
validation query to verify that (a, b) is UNIQUE on table tbl looks like this:
SELECT a, b
FROM tbl
WHERE a IS NOT NULL AND b IS NOT NULL
GROUP BY a, b
HAVING count(*) > 1
LIMIT 1;
Without SplitGroupByFilteredScanIntoUnionScans, this query would require an
inefficient and memory-intensive hash group by operation. Note that the
previous rule, SplitGroupByScanIntoUnionScans, is also needed since it would
apply in cases where a and b are not nullable.
Fixes cockroachdb#56201
Release note (performance improvement): Validation of a new UNIQUE index in a
REGIONAL BY ROW table no longer requires an inefficient and memory-intensive
hash aggregation query. The optimizer can now plan the validation query so
that it uses all streaming operations, which are much more efficient.
This commit moves splitScanIntoUnionScans and all helper functions into general_funcs.go since it's now used in both limit_funcs.go and groupby_funcs.go. This is a separate commit for ease of reviewing. Release note: None
7b9ad1f to
35b1ec6
Compare
rytaft
left a comment
There was a problem hiding this comment.
TFTRs!
[nit] the PR description is missing the second commit
Done.
It might be nice to have a comment explaining why zero is passed as the limit to SplitScanIntoUnionScans.
Done.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)
pkg/sql/opt/xform/groupby_funcs.go, line 213 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
[nit] move the definition of
getKnownScanConstraintfromlimit_funcs.gotogeneral_funcs.go(probably in a separate commit)
It was moved in the second commit.
pkg/sql/opt/xform/rules/groupby.opt, line 185 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
[nit] It might be nice to expand this sentence a bit more—"are not nullable" confused me a bit when I read it because I thought it should read "are nullable". Something like "it would apply in cases where a and b are not nullable (the
IS NOT NULLfilters would be eliminated during normalization)". (Or is it that theIS NOT NULLfilters are not part of the query to begin with?)
Done.
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r5, 3 of 3 files at r6.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/xform/groupby_funcs.go, line 213 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It was moved in the second commit.
Ahh I missed that...
rytaft
left a comment
There was a problem hiding this comment.
Thanks!
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
|
Build succeeded: |
opt: ensure validation of unique constraints is efficient
This commit adds two new exploration rules:
SplitGroupByScanIntoUnionScansand
SplitGroupByFilteredScanIntoUnionScans.SplitGroupByScanIntoUnionScanssplits a non-inverted scan under aGroupBy,DistinctOn, orEnsureUpsertDistinctOninto a union-all of scans, where eachscan is ordered on the grouping columns. This ordering is then maintained by
the union-all operation and passed on to the grouping operation. Ordering on
the grouping columns is important since it enables the grouping operation to
execute in a streaming fashion, which is more efficient. Example:
This rule does not actually build the streaming grouping operation, but it
allows another rule,
GenerateStreamingGroupBy, to fire and use the newinteresting orderings provided by the
UnionAllof scans to build a streamingoperation.
SplitGroupByFilteredScanIntoUnionScansis likeSplitGroupByScanIntoUnionScans,but the scan is wrapped in a Select.
These transformations are important for ensuring that validation of the
unique constraint in an implicitly-partitioned unique index is efficient. The
validation query to verify that
(a, b)is UNIQUE on tabletbllooks like this:Without
SplitGroupByFilteredScanIntoUnionScans, this query would require aninefficient and memory-intensive hash group by operation. Note that the
previous rule,
SplitGroupByScanIntoUnionScans, is also needed since it wouldapply in cases where
aandbare not nullable.Fixes #56201
Release note (performance improvement): Validation of a new UNIQUE index in a
REGIONAL BY ROW table no longer requires an inefficient and memory-intensive
hash aggregation query. The optimizer can now plan the validation query so
that it uses all streaming operations, which are much more efficient.
opt: move splitScanIntoUnionScans to general_funcs.go
This commit moves
splitScanIntoUnionScansand all helper functionsinto
general_funcs.gosince it's now used in bothlimit_funcs.goandgroupby_funcs.go. This is a separate commit for ease of reviewing.Release note: None