opt: inline constant insert values in uniqueness checks#77943
opt: inline constant insert values in uniqueness checks#77943craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
This commit revives an old PR where I attempted to do the same: #65093. The main difference is that this PR only inlines values in uniqiueness checks. It does not inline values into FK checks to avoid breaking the insert fast path. I have some ideas on how to overcome this limitation, but I'll save that for a future PR. cc @chengxiong-ruan who might be interested in some of the improvements to queries with hash-sharded indexes. |
There was a problem hiding this comment.
These TODOs were added gratuitously, and I'm removed them in cases where they were not accurate.
146c9b1 to
99f4be6
Compare
msirek
left a comment
There was a problem hiding this comment.
Very nice! I'm curious how much speedup this yields in common cases.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/expr.go, line 843 at r1 (raw file):
if project, ok := input.(*ProjectExpr); ok { for i := range project.Projections { item := &project.Projections[i]
I know this is just moving the function, but I wonder if it's useful to examine Passthrough columns too and peek into the child operation to look for the constant. Maybe there are no use cases where this is needed though.
Code quote:
ExtractColumnFromProjectOrValuespkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):
│ estimated row count: 1 (missing stats) │ table: t_unique_hash_pk@t_unique_hash_pk_pkey │ spans
This looks like when intersecting spans the result is empty. Do you think it would be worth opening an issue to eliminate the UNION ALL when one side is known to return zero rows.
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @msirek and @rytaft)
pkg/sql/opt/memo/expr.go, line 843 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
I know this is just moving the function, but I wonder if it's useful to examine Passthrough columns too and peek into the child operation to look for the constant. Maybe there are no use cases where this is needed though.
I don't think it's necessary because normalization rules should merge all the Projects together, or merge the Projects into a subexpression that is a single Values row.
For example:
optsteps
SELECT a + 1 FROM (VALUES (1)) AS v(a)
----
================================================================================
Initial expression
Cost: 0.05
================================================================================
project
├── columns: "?column?":2!null
├── cardinality: [1 - 1]
├── immutable
├── key: ()
├── fd: ()-->(2)
├── values
│ ├── columns: column1:1!null
│ ├── cardinality: [1 - 1]
│ ├── key: ()
│ ├── fd: ()-->(1)
│ └── (1,)
└── projections
└── column1:1 + 1 [as="?column?":2, outer=(1), immutable]
================================================================================
InlineProjectConstants
Cost: 0.05
================================================================================
project
├── columns: "?column?":2!null
├── cardinality: [1 - 1]
├── immutable
├── key: ()
├── fd: ()-->(2)
├── values
│ ├── columns: column1:1!null
│ ├── cardinality: [1 - 1]
│ ├── key: ()
│ ├── fd: ()-->(1)
│ └── (1,)
└── projections
- └── column1:1 + 1 [as="?column?":2, outer=(1), immutable]
+ └── 1 + 1 [as="?column?":2, immutable]
================================================================================
FoldBinary
Cost: 0.05
================================================================================
project
├── columns: "?column?":2!null
├── cardinality: [1 - 1]
- ├── immutable
├── key: ()
├── fd: ()-->(2)
├── values
│ ├── columns: column1:1!null
│ ├── cardinality: [1 - 1]
│ ├── key: ()
│ ├── fd: ()-->(1)
│ └── (1,)
└── projections
- └── 1 + 1 [as="?column?":2, immutable]
+ └── 2 [as="?column?":2]
================================================================================
MergeProjectWithValues
Cost: 0.02
================================================================================
-project
+values
├── columns: "?column?":2!null
├── cardinality: [1 - 1]
├── key: ()
├── fd: ()-->(2)
- ├── values
- │ ├── columns: column1:1!null
- │ ├── cardinality: [1 - 1]
- │ ├── key: ()
- │ ├── fd: ()-->(1)
- │ └── (1,)
- └── projections
- └── 2 [as="?column?":2]
+ └── (2,)
================================================================================
Final best expression
Cost: 0.02
================================================================================
values
├── columns: "?column?":2!null
├── cardinality: [1 - 1]
├── key: ()
├── fd: ()-->(2)
└── (2,)
Anything beneath the Project should be non-constant (or at least not provably constant).
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):
Previously, msirek (Mark Sirek) wrote…
This looks like when intersecting
spansthe result is empty. Do you think it would be worth opening an issue to eliminate the UNION ALL when one side is known to return zero rows.
Great eye! I spent all day yesterday working on ways to eliminate the scan with a contradiction constraint and the union all. It's a bit tricky, but I'll keep at it. I've created an issue to track: #78026
msirek
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/expr.go, line 843 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I don't think it's necessary because normalization rules should merge all the Projects together, or merge the Projects into a subexpression that is a single Values row.
For example:
optsteps SELECT a + 1 FROM (VALUES (1)) AS v(a) ---- ================================================================================ Initial expression Cost: 0.05 ================================================================================ project ├── columns: "?column?":2!null ├── cardinality: [1 - 1] ├── immutable ├── key: () ├── fd: ()-->(2) ├── values │ ├── columns: column1:1!null │ ├── cardinality: [1 - 1] │ ├── key: () │ ├── fd: ()-->(1) │ └── (1,) └── projections └── column1:1 + 1 [as="?column?":2, outer=(1), immutable] ================================================================================ InlineProjectConstants Cost: 0.05 ================================================================================ project ├── columns: "?column?":2!null ├── cardinality: [1 - 1] ├── immutable ├── key: () ├── fd: ()-->(2) ├── values │ ├── columns: column1:1!null │ ├── cardinality: [1 - 1] │ ├── key: () │ ├── fd: ()-->(1) │ └── (1,) └── projections - └── column1:1 + 1 [as="?column?":2, outer=(1), immutable] + └── 1 + 1 [as="?column?":2, immutable] ================================================================================ FoldBinary Cost: 0.05 ================================================================================ project ├── columns: "?column?":2!null ├── cardinality: [1 - 1] - ├── immutable ├── key: () ├── fd: ()-->(2) ├── values │ ├── columns: column1:1!null │ ├── cardinality: [1 - 1] │ ├── key: () │ ├── fd: ()-->(1) │ └── (1,) └── projections - └── 1 + 1 [as="?column?":2, immutable] + └── 2 [as="?column?":2] ================================================================================ MergeProjectWithValues Cost: 0.02 ================================================================================ -project +values ├── columns: "?column?":2!null ├── cardinality: [1 - 1] ├── key: () ├── fd: ()-->(2) - ├── values - │ ├── columns: column1:1!null - │ ├── cardinality: [1 - 1] - │ ├── key: () - │ ├── fd: ()-->(1) - │ └── (1,) - └── projections - └── 2 [as="?column?":2] + └── (2,) ================================================================================ Final best expression Cost: 0.02 ================================================================================ values ├── columns: "?column?":2!null ├── cardinality: [1 - 1] ├── key: () ├── fd: ()-->(2) └── (2,)Anything beneath the Project should be non-constant (or at least not provably constant).
OK, thanks.
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Great eye! I spent all day yesterday working on ways to eliminate the scan with a contradiction constraint and the union all. It's a bit tricky, but I'll keep at it. I've created an issue to track: #78026
Thanks for looking into it. It shouldn't hold up this PR, but I was just curious about this. We don't currently have a lot of normalization rules matching on UNION ALL, but maybe it's possible to train EliminateSetLeft and EliminateSetRight to handle this. I see they call HasZeroRows which calls Relational().Cardinality.IsZero(). Maybe HasZeroRows could also look for a ScanExpr inside a ProjectExpr or SelectExpr and see if there's a contradiction constraint.
|
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file): Previously, msirek (Mark Sirek) wrote…
The problem is that normalization rules only apply when Only later on, when I think we'd also have trouble making an exploration rule that eliminates the I do have a somewhat promising idea, though. We can add the computed column equality into the filters during normalization, and the other normalization rules should fire to eliminate the |
msirek
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The problem is that normalization rules only apply when
norm.Factory.ConstructXxxxfunctions are called. WhenSplitDisjunctioncallsFactory.ConstructUnionAll, the left side is an expression like(Select (Scan ...) Filters). This doesn't appear to be a contradiction, so the left side is not simplified to an empty values expression, thus theUnionAllis not eliminated.Only later on, when
GenerateConstrainedScansfires, is it determined that the Select's filters and the computed column expression form a contradiction. At this point, there is no opportunity forEliminateSetLeftto fire because theUnionAllhas already been constructed.I think we'd also have trouble making an exploration rule that eliminates the
UnionAll. The(Select (Scan ...) Filters)created on the left side of theUnionAllbySplitDisjunctionis the first member of a new group. The Cardinality in the logical properties of this group is not zero because, like I mentioned above, it's not apparent that there is a contradiction.GenerateConstrainedScansadds the(Scan contradiction)to this same group. However, the cardinality of the scan remains non-zero because logical properties are shared by all members of a group.I do have a somewhat promising idea, though. We can add the computed column equality into the filters during normalization, and the other normalization rules should fire to eliminate the
ORentirely. Then SplitDisjunction never matches, and theUnionAllis never created. I mentioned this in #78026. It is a non-trivial rule, however, mostly because there's not an obvious way to prevent the rule from matching over and over again forever.
Interesting idea. Would the rule keep firing because the idea adds computed column equality within a normalization rule? Would it be possible to add the equality around the same location as Builder.addCheckConstraintsForTable() is called? Maybe not all required info is available at that location though.
Another option may be to re-run normalization rules on all RelExprs after all transformation rules are applied. But that would require pulling all of the normalization rules out into functions instead of inlining them in the norm.Factory.ConstructXxxx functions; a slightly drastic change, but may be worth thinking about. Something like that might mean items in the internCache would need to get removed and re-added if their hash changes.
99f4be6 to
3ee4e37
Compare
|
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):
A simple implementation of the rule would match something like
This has come up before, but we've been reluctant to try it because it won't work in all cases. If we do a pass of normalization rules after the lowest cost plan has been selected, we might not end up with the optimal plan. The optimal plan could be one that only has the lowest cost after that final round of normalization rules. |
|
@rytaft do you mind taking a look at this when you get a chance? No rush. |
rytaft
left a comment
There was a problem hiding this comment.
Very nice! So sorry for the delay.
Reviewed 2 of 2 files at r1, 12 of 12 files at r3, 13 of 13 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @msirek)
pkg/sql/opt/optbuilder/mutation_builder.go, line 1329 at r4 (raw file):
// non-nullable table columns). // // fk should be true when building inputs for FK checks, and false otherwise.
nit: I'd call this isFK
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 86 at r4 (raw file):
# TODO (mgartner): there is a lookup join with lookup condition checking every # single shard. This is unnecessary and could be improved by having the shard # number calculated instead of looking at all possible shards.
can you remove this TODO now?
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r4 (raw file):
│ estimated row count: 1 (missing stats) │ table: t_unique_hash_pk@t_unique_hash_pk_pkey │ spans
what is this expression? looks like a full table scan?
pkg/ccl/logictestccl/testdata/logic_test/partitioning_implicit, line 781 at r4 (raw file):
└── • error if rows │ └── • norows
awesome!
We could potentially optimize this to remove the constraint check altogether in this case (in a future PR)
Edit: I see you added a TODO for this elsewhere
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 1524 at r4 (raw file):
# TODO(treilly): The constraint check for uniq_idx should use uniq_idx but due # to stats issues w/ empty stats, partial indexes and multicol stats its not. # Hopefully fixing #67583 (and possibly #67479) will resolve this.
wonder if we can remove this TODO
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2207 at r4 (raw file):
│ missing stats │ table: regional_by_row_table_virt_partial@regional_by_row_table_virt_partial_pkey │ spans: [/'ap-southeast-2' - /'ap-southeast-2'] [/'ca-central-1' - /'us-east-1']
Not related to this PR, but here's an example where I think we probably want to enable @msirek's stats logic to get distinct counts / histograms from check constraints even when there are no stats available.
This commit moves some logic in normalization custom functions to the memo package so that it can be used more generally in the future. Release note: None
In uniqueness checks, WithScans that buffer the mutation's input are now replaced with Values expression when inserted values are constant. This allows further optimization of the uniqueness check sub-expression. It is particularly beneficial for `REGIONAL BY ROW` tables and hash-sharded `REGIONAL BY ROW` tables because it reduces the search space for duplicate values. Note that this inlining does not occur for FK checks. The FK insert fast path relies on WithScans in FK checks. Until this limitation is lifted, inlining constant insert values would prevent the fast path from being planned. Informs cockroachdb#63882 Release note (performance improvement): Previously, uniqueness checks performed for inserts into REGIONAL BY ROW tables always searched all regions for duplicates. In some cases, these checks will now only search a subset of regions when inserting a single row of constant values.
3ee4e37 to
e55a7bf
Compare
mgartner
left a comment
There was a problem hiding this comment.
No worries!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek and @rytaft)
pkg/sql/opt/optbuilder/mutation_builder.go, line 1329 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: I'd call this
isFK
Good idea. Done.
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 86 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
can you remove this TODO now?
There is still the FK check where the values are not inlined, so all shards are searched. I think if we can inline the values in the FK check in the future, only a single shard would be searched. I've left the TODO but reworded it to make it more clear that it is referencing the FK check, not the uniqueness check.
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
what is this expression? looks like a full table scan?
It's actually a span with a contradiction constraint. See the discussion Mark and I had above. It's quite peculiar. I've already created an issue to track this: #78026
pkg/ccl/logictestccl/testdata/logic_test/partitioning_implicit, line 781 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
awesome!
We could potentially optimize this to remove the constraint check altogether in this case (in a future PR)
Edit: I see you added a TODO for this elsewhere
👍
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 1524 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
wonder if we can remove this TODO
Ya, looks like uniq_idx is now being used. I'm removed the TODO.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2207 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Not related to this PR, but here's an example where I think we probably want to enable @msirek's stats logic to get distinct counts / histograms from check constraints even when there are no stats available.
👍
rytaft
left a comment
There was a problem hiding this comment.
This feels like something we could consider backporting for 22.1.1 -- what do you think?
Reviewed 12 of 12 files at r5, 12 of 12 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek)
pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It's actually a span with a contradiction constraint. See the discussion Mark and I had above. It's quite peculiar. I've already created an issue to track this: #78026
Ohhh got it. Thanks!
Seems fine to me. I'll add the backport label. TFTRs! bors r+ |
|
Build succeeded: |
opt: move constant value helpers to memo package
This commit moves some logic in normalization custom functions to the
memo package so that it can be used more generally in the future.
Release note: None
opt: inline constant insert values in uniqueness checks
In uniqueness checks, WithScans that buffer the mutation's input are now
replaced with Values expression when inserted values are constant. This
allows further optimization of the uniqueness check sub-expression. It
is particularly beneficial for
REGIONAL BY ROWtables and hash-shardedREGIONAL BY ROWtables because it reduces the search space forduplicate values.
Note that this inlining does not occur for FK checks. The FK insert fast
path relies on WithScans in FK checks. Until this limitation is lifted,
inlining constant insert values would prevent the fast path from being
planned.
Informs #63882
Release note (performance improvement): Previously, uniqueness checks
performed for inserts into REGIONAL BY ROW tables always searched all
regions for duplicates. In some cases, these checks will now only search
a subset of regions when inserting a single row of constant values.