Skip to content

opt: inline constant insert values in uniqueness checks#77943

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
mgartner:uniq-check-const-values
Apr 1, 2022
Merged

opt: inline constant insert values in uniqueness checks#77943
craig[bot] merged 2 commits intocockroachdb:masterfrom
mgartner:uniq-check-const-values

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

opt: move constant value helpers to memo package

This commit moves some logic in normalization custom functions to the
memo package so that it can be used more generally in the future.

Release note: None

opt: inline constant insert values in uniqueness checks

In uniqueness checks, WithScans that buffer the mutation's input are now
replaced with Values expression when inserted values are constant. This
allows further optimization of the uniqueness check sub-expression. It
is particularly beneficial for REGIONAL BY ROW tables and hash-sharded
REGIONAL BY ROW tables because it reduces the search space for
duplicate values.

Note that this inlining does not occur for FK checks. The FK insert fast
path relies on WithScans in FK checks. Until this limitation is lifted,
inlining constant insert values would prevent the fast path from being
planned.

Informs #63882

Release note (performance improvement): Previously, uniqueness checks
performed for inserts into REGIONAL BY ROW tables always searched all
regions for duplicates. In some cases, these checks will now only search
a subset of regions when inserting a single row of constant values.

@mgartner mgartner requested review from a team, msirek and rytaft March 16, 2022 15:31
@mgartner mgartner requested a review from a team as a code owner March 16, 2022 15:31
@mgartner
Copy link
Copy Markdown
Contributor Author

This commit revives an old PR where I attempted to do the same: #65093. The main difference is that this PR only inlines values in uniqiueness checks. It does not inline values into FK checks to avoid breaking the insert fast path. I have some ideas on how to overcome this limitation, but I'll save that for a future PR.


cc @chengxiong-ruan who might be interested in some of the improvements to queries with hash-sharded indexes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These TODOs were added gratuitously, and I'm removed them in cases where they were not accurate.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner mgartner force-pushed the uniq-check-const-values branch from 146c9b1 to 99f4be6 Compare March 16, 2022 16:45
Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I'm curious how much speedup this yields in common cases.
:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/expr.go, line 843 at r1 (raw file):

	if project, ok := input.(*ProjectExpr); ok {
		for i := range project.Projections {
			item := &project.Projections[i]

I know this is just moving the function, but I wonder if it's useful to examine Passthrough columns too and peek into the child operation to look for the constant. Maybe there are no use cases where this is needed though.

Code quote:

ExtractColumnFromProjectOrValues

pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):

                            │     estimated row count: 1 (missing stats)
                            │     table: t_unique_hash_pk@t_unique_hash_pk_pkey
                            │     spans

This looks like when intersecting spans the result is empty. Do you think it would be worth opening an issue to eliminate the UNION ALL when one side is known to return zero rows.

Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msirek and @rytaft)


pkg/sql/opt/memo/expr.go, line 843 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

I know this is just moving the function, but I wonder if it's useful to examine Passthrough columns too and peek into the child operation to look for the constant. Maybe there are no use cases where this is needed though.

I don't think it's necessary because normalization rules should merge all the Projects together, or merge the Projects into a subexpression that is a single Values row.

For example:

optsteps
SELECT a + 1 FROM (VALUES (1)) AS v(a)
----
================================================================================
Initial expression
  Cost: 0.05
================================================================================
  project
   ├── columns: "?column?":2!null
   ├── cardinality: [1 - 1]
   ├── immutable
   ├── key: ()
   ├── fd: ()-->(2)
   ├── values
   │    ├── columns: column1:1!null
   │    ├── cardinality: [1 - 1]
   │    ├── key: ()
   │    ├── fd: ()-->(1)
   │    └── (1,)
   └── projections
        └── column1:1 + 1 [as="?column?":2, outer=(1), immutable]
================================================================================
InlineProjectConstants
  Cost: 0.05
================================================================================
   project
    ├── columns: "?column?":2!null
    ├── cardinality: [1 - 1]
    ├── immutable
    ├── key: ()
    ├── fd: ()-->(2)
    ├── values
    │    ├── columns: column1:1!null
    │    ├── cardinality: [1 - 1]
    │    ├── key: ()
    │    ├── fd: ()-->(1)
    │    └── (1,)
    └── projections
  -      └── column1:1 + 1 [as="?column?":2, outer=(1), immutable]
  +      └── 1 + 1 [as="?column?":2, immutable]
================================================================================
FoldBinary
  Cost: 0.05
================================================================================
   project
    ├── columns: "?column?":2!null
    ├── cardinality: [1 - 1]
  - ├── immutable
    ├── key: ()
    ├── fd: ()-->(2)
    ├── values
    │    ├── columns: column1:1!null
    │    ├── cardinality: [1 - 1]
    │    ├── key: ()
    │    ├── fd: ()-->(1)
    │    └── (1,)
    └── projections
  -      └── 1 + 1 [as="?column?":2, immutable]
  +      └── 2 [as="?column?":2]
================================================================================
MergeProjectWithValues
  Cost: 0.02
================================================================================
  -project
  +values
    ├── columns: "?column?":2!null
    ├── cardinality: [1 - 1]
    ├── key: ()
    ├── fd: ()-->(2)
  - ├── values
  - │    ├── columns: column1:1!null
  - │    ├── cardinality: [1 - 1]
  - │    ├── key: ()
  - │    ├── fd: ()-->(1)
  - │    └── (1,)
  - └── projections
  -      └── 2 [as="?column?":2]
  + └── (2,)
================================================================================
Final best expression
  Cost: 0.02
================================================================================
  values
   ├── columns: "?column?":2!null
   ├── cardinality: [1 - 1]
   ├── key: ()
   ├── fd: ()-->(2)
   └── (2,)

Anything beneath the Project should be non-constant (or at least not provably constant).


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):

Previously, msirek (Mark Sirek) wrote…

This looks like when intersecting spans the result is empty. Do you think it would be worth opening an issue to eliminate the UNION ALL when one side is known to return zero rows.

Great eye! I spent all day yesterday working on ways to eliminate the scan with a contradiction constraint and the union all. It's a bit tricky, but I'll keep at it. I've created an issue to track: #78026

Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/expr.go, line 843 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I don't think it's necessary because normalization rules should merge all the Projects together, or merge the Projects into a subexpression that is a single Values row.

For example:

optsteps
SELECT a + 1 FROM (VALUES (1)) AS v(a)
----
================================================================================
Initial expression
  Cost: 0.05
================================================================================
  project
   ├── columns: "?column?":2!null
   ├── cardinality: [1 - 1]
   ├── immutable
   ├── key: ()
   ├── fd: ()-->(2)
   ├── values
   │    ├── columns: column1:1!null
   │    ├── cardinality: [1 - 1]
   │    ├── key: ()
   │    ├── fd: ()-->(1)
   │    └── (1,)
   └── projections
        └── column1:1 + 1 [as="?column?":2, outer=(1), immutable]
================================================================================
InlineProjectConstants
  Cost: 0.05
================================================================================
   project
    ├── columns: "?column?":2!null
    ├── cardinality: [1 - 1]
    ├── immutable
    ├── key: ()
    ├── fd: ()-->(2)
    ├── values
    │    ├── columns: column1:1!null
    │    ├── cardinality: [1 - 1]
    │    ├── key: ()
    │    ├── fd: ()-->(1)
    │    └── (1,)
    └── projections
  -      └── column1:1 + 1 [as="?column?":2, outer=(1), immutable]
  +      └── 1 + 1 [as="?column?":2, immutable]
================================================================================
FoldBinary
  Cost: 0.05
================================================================================
   project
    ├── columns: "?column?":2!null
    ├── cardinality: [1 - 1]
  - ├── immutable
    ├── key: ()
    ├── fd: ()-->(2)
    ├── values
    │    ├── columns: column1:1!null
    │    ├── cardinality: [1 - 1]
    │    ├── key: ()
    │    ├── fd: ()-->(1)
    │    └── (1,)
    └── projections
  -      └── 1 + 1 [as="?column?":2, immutable]
  +      └── 2 [as="?column?":2]
================================================================================
MergeProjectWithValues
  Cost: 0.02
================================================================================
  -project
  +values
    ├── columns: "?column?":2!null
    ├── cardinality: [1 - 1]
    ├── key: ()
    ├── fd: ()-->(2)
  - ├── values
  - │    ├── columns: column1:1!null
  - │    ├── cardinality: [1 - 1]
  - │    ├── key: ()
  - │    ├── fd: ()-->(1)
  - │    └── (1,)
  - └── projections
  -      └── 2 [as="?column?":2]
  + └── (2,)
================================================================================
Final best expression
  Cost: 0.02
================================================================================
  values
   ├── columns: "?column?":2!null
   ├── cardinality: [1 - 1]
   ├── key: ()
   ├── fd: ()-->(2)
   └── (2,)

Anything beneath the Project should be non-constant (or at least not provably constant).

OK, thanks.


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Great eye! I spent all day yesterday working on ways to eliminate the scan with a contradiction constraint and the union all. It's a bit tricky, but I'll keep at it. I've created an issue to track: #78026

Thanks for looking into it. It shouldn't hold up this PR, but I was just curious about this. We don't currently have a lot of normalization rules matching on UNION ALL, but maybe it's possible to train EliminateSetLeft and EliminateSetRight to handle this. I see they call HasZeroRows which calls Relational().Cardinality.IsZero(). Maybe HasZeroRows could also look for a ScanExpr inside a ProjectExpr or SelectExpr and see if there's a contradiction constraint.

@mgartner
Copy link
Copy Markdown
Contributor Author


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):

Previously, msirek (Mark Sirek) wrote…

Thanks for looking into it. It shouldn't hold up this PR, but I was just curious about this. We don't currently have a lot of normalization rules matching on UNION ALL, but maybe it's possible to train EliminateSetLeft and EliminateSetRight to handle this. I see they call HasZeroRows which calls Relational().Cardinality.IsZero(). Maybe HasZeroRows could also look for a ScanExpr inside a ProjectExpr or SelectExpr and see if there's a contradiction constraint.

The problem is that normalization rules only apply when norm.Factory.ConstructXxxx functions are called. When SplitDisjunction calls Factory.ConstructUnionAll, the left side is an expression like (Select (Scan ...) Filters). This doesn't appear to be a contradiction, so the left side is not simplified to an empty values expression, thus the UnionAll is not eliminated.

Only later on, when GenerateConstrainedScans fires, is it determined that the Select's filters and the computed column expression form a contradiction. At this point, there is no opportunity for EliminateSetLeft to fire because the UnionAll has already been constructed.

I think we'd also have trouble making an exploration rule that eliminates the UnionAll. The (Select (Scan ...) Filters) created on the left side of the UnionAll by SplitDisjunction is the first member of a new group. The Cardinality in the logical properties of this group is not zero because, like I mentioned above, it's not apparent that there is a contradiction. GenerateConstrainedScans adds the (Scan contradiction) to this same group. However, the cardinality of the scan remains non-zero because logical properties are shared by all members of a group.

I do have a somewhat promising idea, though. We can add the computed column equality into the filters during normalization, and the other normalization rules should fire to eliminate the OR entirely. Then SplitDisjunction never matches, and the UnionAll is never created. I mentioned this in #78026. It is a non-trivial rule, however, mostly because there's not an obvious way to prevent the rule from matching over and over again forever.

Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

The problem is that normalization rules only apply when norm.Factory.ConstructXxxx functions are called. When SplitDisjunction calls Factory.ConstructUnionAll, the left side is an expression like (Select (Scan ...) Filters). This doesn't appear to be a contradiction, so the left side is not simplified to an empty values expression, thus the UnionAll is not eliminated.

Only later on, when GenerateConstrainedScans fires, is it determined that the Select's filters and the computed column expression form a contradiction. At this point, there is no opportunity for EliminateSetLeft to fire because the UnionAll has already been constructed.

I think we'd also have trouble making an exploration rule that eliminates the UnionAll. The (Select (Scan ...) Filters) created on the left side of the UnionAll by SplitDisjunction is the first member of a new group. The Cardinality in the logical properties of this group is not zero because, like I mentioned above, it's not apparent that there is a contradiction. GenerateConstrainedScans adds the (Scan contradiction) to this same group. However, the cardinality of the scan remains non-zero because logical properties are shared by all members of a group.

I do have a somewhat promising idea, though. We can add the computed column equality into the filters during normalization, and the other normalization rules should fire to eliminate the OR entirely. Then SplitDisjunction never matches, and the UnionAll is never created. I mentioned this in #78026. It is a non-trivial rule, however, mostly because there's not an obvious way to prevent the rule from matching over and over again forever.

Interesting idea. Would the rule keep firing because the idea adds computed column equality within a normalization rule? Would it be possible to add the equality around the same location as Builder.addCheckConstraintsForTable() is called? Maybe not all required info is available at that location though.

Another option may be to re-run normalization rules on all RelExprs after all transformation rules are applied. But that would require pulling all of the normalization rules out into functions instead of inlining them in the norm.Factory.ConstructXxxx functions; a slightly drastic change, but may be worth thinking about. Something like that might mean items in the internCache would need to get removed and re-added if their hash changes.

@mgartner
Copy link
Copy Markdown
Contributor Author


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r2 (raw file):

Would the rule keep firing because the idea adds computed column equality within a normalization rule?

A simple implementation of the rule would match something like (Select (Scan) $filters:*) and produce (Select (Scan) (ConcatFilters $filters $compColFilters)). The generated Go would be a block in ConstructSelect that recursively calls ConstructSelect. After the new filters have been added the first time, we need a way to prevent the rule from matching on the subsequent recursive calls.

Another option may be to re-run normalization rules on all RelExprs after all transformation rules are applied.

This has come up before, but we've been reluctant to try it because it won't work in all cases. If we do a pass of normalization rules after the lowest cost plan has been selected, we might not end up with the optimal plan. The optimal plan could be one that only has the lowest cost after that final round of normalization rules.

@mgartner
Copy link
Copy Markdown
Contributor Author

@rytaft do you mind taking a look at this when you get a chance? No rush.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! So sorry for the delay.

Reviewed 2 of 2 files at r1, 12 of 12 files at r3, 13 of 13 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @msirek)


pkg/sql/opt/optbuilder/mutation_builder.go, line 1329 at r4 (raw file):

// non-nullable table columns).
//
// fk should be true when building inputs for FK checks, and false otherwise.

nit: I'd call this isFK


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 86 at r4 (raw file):

# TODO (mgartner): there is a lookup join with lookup condition checking every
# single shard. This is unnecessary and could be improved by having the shard
# number calculated instead of looking at all possible shards.

can you remove this TODO now?


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r4 (raw file):

                            │     estimated row count: 1 (missing stats)
                            │     table: t_unique_hash_pk@t_unique_hash_pk_pkey
                            │     spans

what is this expression? looks like a full table scan?


pkg/ccl/logictestccl/testdata/logic_test/partitioning_implicit, line 781 at r4 (raw file):

    └── • error if rows
        │
        └── • norows

awesome!

We could potentially optimize this to remove the constraint check altogether in this case (in a future PR)

Edit: I see you added a TODO for this elsewhere


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 1524 at r4 (raw file):

# TODO(treilly): The constraint check for uniq_idx should use uniq_idx but due
# to stats issues w/ empty stats, partial indexes and multicol stats its not.
# Hopefully fixing #67583 (and possibly #67479) will resolve this.

wonder if we can remove this TODO


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2207 at r4 (raw file):

│                         missing stats
│                         table: regional_by_row_table_virt_partial@regional_by_row_table_virt_partial_pkey
│                         spans: [/'ap-southeast-2' - /'ap-southeast-2'] [/'ca-central-1' - /'us-east-1']

Not related to this PR, but here's an example where I think we probably want to enable @msirek's stats logic to get distinct counts / histograms from check constraints even when there are no stats available.

This commit moves some logic in normalization custom functions to the
memo package so that it can be used more generally in the future.

Release note: None
In uniqueness checks, WithScans that buffer the mutation's input are now
replaced with Values expression when inserted values are constant. This
allows further optimization of the uniqueness check sub-expression. It
is particularly beneficial for `REGIONAL BY ROW` tables and hash-sharded
`REGIONAL BY ROW` tables because it reduces the search space for
duplicate values.

Note that this inlining does not occur for FK checks. The FK insert fast
path relies on WithScans in FK checks. Until this limitation is lifted,
inlining constant insert values would prevent the fast path from being
planned.

Informs cockroachdb#63882

Release note (performance improvement): Previously, uniqueness checks
performed for inserts into REGIONAL BY ROW tables always searched all
regions for duplicates. In some cases, these checks will now only search
a subset of regions when inserting a single row of constant values.
@mgartner mgartner force-pushed the uniq-check-const-values branch from 3ee4e37 to e55a7bf Compare March 31, 2022 23:20
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek and @rytaft)


pkg/sql/opt/optbuilder/mutation_builder.go, line 1329 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: I'd call this isFK

Good idea. Done.


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 86 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

can you remove this TODO now?

There is still the FK check where the values are not inlined, so all shards are searched. I think if we can inline the values in the FK check in the future, only a single shard would be searched. I've left the TODO but reworded it to make it more clear that it is referencing the FK check, not the uniqueness check.


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

what is this expression? looks like a full table scan?

It's actually a span with a contradiction constraint. See the discussion Mark and I had above. It's quite peculiar. I've already created an issue to track this: #78026


pkg/ccl/logictestccl/testdata/logic_test/partitioning_implicit, line 781 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

awesome!

We could potentially optimize this to remove the constraint check altogether in this case (in a future PR)

Edit: I see you added a TODO for this elsewhere

👍


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 1524 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

wonder if we can remove this TODO

Ya, looks like uniq_idx is now being used. I'm removed the TODO.


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior, line 2207 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Not related to this PR, but here's an example where I think we probably want to enable @msirek's stats logic to get distinct counts / histograms from check constraints even when there are no stats available.

👍

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: This feels like something we could consider backporting for 22.1.1 -- what do you think?

Reviewed 12 of 12 files at r5, 12 of 12 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek)


pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index_query_plan, line 390 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It's actually a span with a contradiction constraint. See the discussion Mark and I had above. It's quite peculiar. I've already created an issue to track this: #78026

Ohhh got it. Thanks!

@mgartner
Copy link
Copy Markdown
Contributor Author

mgartner commented Apr 1, 2022

This feels like something we could consider backporting for 22.1.1 -- what do you think?

Seems fine to me. I'll add the backport label.

TFTRs!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 1, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants