Skip to content

opt: add INSERT ON CONFLICT WHERE syntax for selecting arbiter indexes#53067

Merged
craig[bot] merged 1 commit into
cockroachdb:masterfrom
mgartner:arbiter-indexes
Aug 25, 2020
Merged

opt: add INSERT ON CONFLICT WHERE syntax for selecting arbiter indexes#53067
craig[bot] merged 1 commit into
cockroachdb:masterfrom
mgartner:arbiter-indexes

Conversation

@mgartner

@mgartner mgartner commented Aug 19, 2020

Copy link
Copy Markdown
Contributor

This commit adds support for selecting unique partial indexes as
"arbiters" for INSERT ON CONFLICT statements. An arbiter index is a
unique index that is used to detect conflicts in which to perform the
ON CONFLICT action on.

Unique partial indexes can only be used as arbiters if the arbiter
predicate in the ON CONFLICT statement implies the partial index's
predicate.

To avoid parsing the same partial index predicates multiple times, this
commit also adds a parsedIndexExprs cache to mutationBuilder,
similar to the cache it has for default and computed column expressions.

Finally, this commit also makes the parsing of ON CONFLICT DO UPDATE
more strict. It now requires a list of conflict columns. This matches
Postgres behavior.

From https://www.postgresql.org/docs/12/sql-insert.html:

For ON CONFLICT DO UPDATE, a conflict_target must be provided.

In Postgres, the conflict_target can be a list of columns or a
constraint name. Cockroach currently only supports a list of column
names as a conflict target, so the list of column names preceeding
DO UPDATE is now required by the parser.

Release note (sql change): An INSERT ... ON CONFLICT DO UPDATE
statement without a list of column names after ON CONFLICT now results
in a SQL syntax error with the error code 42601. Previously it errored
with the message "there is no unqiue or exlcusion constraint matching
the ON CONFLICT specification" and the error code 42P10.

@mgartner mgartner requested a review from a team as a code owner August 19, 2020 18:58
@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/optbuilder/partial_index.go, line 26 at r1 (raw file):

// implication logic. See Factory.NormalizePartialIndexPredicate for more
// details.
func (b *Builder) buildPartialIndexPredicate(tableScope *scope, expr tree.Expr) memo.FiltersExpr {

I'm open to any better ideas on where to put this function. optbuilder/scalar.go didn't seem to be the best place, and neither did optbuilder/util.go, so I made a new file. 🤷

@mgartner mgartner force-pushed the arbiter-indexes branch 3 times, most recently from 3601872 to 1149ccd Compare August 19, 2020 21:51

@RaduBerinde RaduBerinde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! :lgtm:

The release note should mention the DO UPDATE change. I also added a docs-todo label so the docs can be updated.

Maybe for a follow-up, but it would be nice if we kept the list of arbiter indexes in the private and passed it to the exec factory, so we can show it opt tests and in EXPLAIN.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/optbuilder/insert.go, line 698 at r2 (raw file):

		// If conflict columns were explicitly specified, then only check for
		// conflicts on arbiter indexes.
		if !conflictOrds.Empty() && !arbiterIndexes.Contains(idx) {

This code would be simplified if had arbiterIndexes return the set of all unique indexes if conflictOrds is empty.


pkg/sql/opt/optbuilder/insert.go, line 1185 at r2 (raw file):

// or pseudo-partial index guarantee uniqueness of their columns across all
// rows.
func (mb *mutationBuilder) arbiterIndexes(

Very nice code, clean and well documented!


pkg/sql/opt/optbuilder/insert.go, line 1218 at r2 (raw file):

			arbiters = util.FastIntSet{}
			arbiters.Add(idx)
			return arbiters

[nit] return util.MakeFastIntSet(idx)


pkg/sql/opt/optbuilder/insert.go, line 1221 at r2 (raw file):

		}

		// Initialize tableScope once and only if needed.

Are we missing a tableScope == nil here?


pkg/sql/opt/optbuilder/testdata/upsert, line 2180 at r2 (raw file):

      └── projections
           └── column3:7 = 'foo' [as=column17:17]

[nit] I think we should have more tests here, at least for the case where a non-partial unique index is chosen as the sole arbiter, and for a DO UPDATE variant.

This commit adds support for selecting unique partial indexes as
"arbiters" for `INSERT ON CONFLICT` statements. An arbiter index is a
unique index that is used to detect conflicts in which to perform the
`ON CONFLICT` action on.

Unique partial indexes can only be used as arbiters if the arbiter
predicate in the `ON CONFLICT` statement implies the partial index's
predicate.

To avoid parsing the same partial index predicates multiple times, this
commit also adds a `parsedIndexExprs` cache to `mutationBuilder`,
similar to the cache it has for default and computed column expressions.

Finally, this commit also makes the parsing of `ON CONFLICT DO UPDATE`
more strict. It now requires a list of conflict columns. This matches
Postgres behavior.

From https://www.postgresql.org/docs/12/sql-insert.html:

> For ON CONFLICT DO UPDATE, a conflict_target must be provided.

In Postgres, the `conflict_target` can be a list of columns or a
constraint name. Cockroach currently only supports a list of column
names as a conflict target, so the list of column names preceeding
`DO UPDATE` is now required by the parser.

Release note (sql change): An `INSERT ... ON CONFLICT DO UPDATE`
statement without a list of column names after `ON CONFLICT` now results
in a SQL syntax error with the error code 42601. Previously it errored
with the message "there is no unqiue or exlcusion constraint matching
the ON CONFLICT specification" and the error code 42P10.

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the release note.

Good idea re: showing the arbiters in opt tests and EXPLAIN. I'll do that in a follow-up.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)


pkg/sql/opt/optbuilder/insert.go, line 698 at r2 (raw file):

Previously, RaduBerinde wrote…

This code would be simplified if had arbiterIndexes return the set of all unique indexes if conflictOrds is empty.

Great idea. Done.


pkg/sql/opt/optbuilder/insert.go, line 1185 at r2 (raw file):

Previously, RaduBerinde wrote…

Very nice code, clean and well documented!

Thank you, sir!


pkg/sql/opt/optbuilder/insert.go, line 1218 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] return util.MakeFastIntSet(idx)

Done.


pkg/sql/opt/optbuilder/insert.go, line 1221 at r2 (raw file):

Previously, RaduBerinde wrote…

Are we missing a tableScope == nil here?

Ya this was the wrong conditional entirely. Looks like it worked anyway. Fixed.


pkg/sql/opt/optbuilder/testdata/upsert, line 2180 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] I think we should have more tests here, at least for the case where a non-partial unique index is chosen as the sole arbiter, and for a DO UPDATE variant.

I've added the case for the non-partial index chosen as the sole arbiter.

I'll leave DO UPDATE variants for when I add support for them, if that's ok. They use a different code path, buildInputForUpsert instead of buildInputForDoNothing. buildInputForUpsert doesn't yet use the arbiterIndexes function.

@rytaft rytaft left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 9 files at r1, 7 of 9 files at r2, 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/optbuilder/testdata/upsert, line 2281 at r3 (raw file):

----

# If there is a non-partial unique index, then it is the only arbiter.

Is there a way to tell from the plan which index is the arbiter? (is this what Radu was talking about above?)


pkg/sql/opt/optbuilder/testdata/upsert, line 2306 at r3 (raw file):

      │    │         │    │    ├── columns: column1:5!null column2:6!null column3:7!null
      │    │         │    │    └── (1, 1, 'bar')
      │    │         │    ├── scan unique_partial_indexes

should this be @u3?

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/optbuilder/testdata/upsert, line 2281 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is there a way to tell from the plan which index is the arbiter? (is this what Radu was talking about above?)

Yes that's what he was talking about. It's coming in this future PR: #53172


pkg/sql/opt/optbuilder/testdata/upsert, line 2306 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should this be @u3?

optbuilder does not currently build index-specific scans or force an index for detecting UPSERT or INSERT ON CONFLICT collisions. So for the build directive, scans are always full table scans on the primary index. We rely on transformation rules to convert this into a constrained scan on the secondary index. In theory, it could used another index to detect collisions that may be less costly (though in practice I doubt that it would).

I thought about having a way to unsure that the unique indexes are used for these scans, given that UPSERT/INSERT ON CONFLICT performance would be abysmal if it wasn't. But if the optimizer is choosing a full table scan over a constrained scan then there's likely other problems as well...

@mgartner

Copy link
Copy Markdown
Contributor Author

bors r+

@rytaft rytaft left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)


pkg/sql/opt/optbuilder/testdata/upsert, line 2306 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

optbuilder does not currently build index-specific scans or force an index for detecting UPSERT or INSERT ON CONFLICT collisions. So for the build directive, scans are always full table scans on the primary index. We rely on transformation rules to convert this into a constrained scan on the secondary index. In theory, it could used another index to detect collisions that may be less costly (though in practice I doubt that it would).

I thought about having a way to unsure that the unique indexes are used for these scans, given that UPSERT/INSERT ON CONFLICT performance would be abysmal if it wasn't. But if the optimizer is choosing a full table scan over a constrained scan then there's likely other problems as well...

ah right -- thanks for the explanation

@craig

craig Bot commented Aug 24, 2020

Copy link
Copy Markdown
Contributor

Build failed (retrying...):

craig Bot pushed a commit that referenced this pull request Aug 24, 2020
53067: opt: add INSERT ON CONFLICT WHERE syntax for selecting arbiter indexes r=mgartner a=mgartner

This commit adds support for selecting unique partial indexes as
"arbiters" for `INSERT ON CONFLICT` statements. An arbiter index is a
unique index that is used to detect conflicts in which to perform the
`ON CONFLICT` action on.

Unique partial indexes can only be used as arbiters if the arbiter
predicate in the `ON CONFLICT` statement implies the partial index's
predicate.

To avoid parsing the same partial index predicates multiple times, this
commit also adds a `parsedIndexExprs` cache to `mutationBuilder`,
similar to the cache it has for default and computed column expressions.

Finally, this commit also makes the parsing of `ON CONFLICT DO UPDATE`
more strict. It now requires a list of conflict columns. This matches
Postgres behavior.

From https://www.postgresql.org/docs/12/sql-insert.html:

> For ON CONFLICT DO UPDATE, a conflict_target must be provided.

In Postgres, the `conflict_target` can be a list of columns or a
constraint name. Cockroach currently only supports a list of column
names as a conflict target, so the list of column names preceeding
`DO UPDATE` is now required by the parser.

Release note (sql change): An `INSERT ... ON CONFLICT DO UPDATE`
statement without a list of column names after `ON CONFLICT` now results
in a SQL syntax error with the error code 42601. Previously it errored
with the message "there is no unqiue or exlcusion constraint matching
the ON CONFLICT specification" and the error code 42P10.


53126: sql: drop schemas when dropping parent database r=rohany a=rohany

Fixes #52362.

This commit updates `DROP DATABASE CASCADE` to also clean up any child
user defined schemas.

Release note: None

Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig

craig Bot commented Aug 25, 2020

Copy link
Copy Markdown
Contributor

Build failed (retrying...):

@craig

craig Bot commented Aug 25, 2020

Copy link
Copy Markdown
Contributor

Build succeeded:

@craig craig Bot merged commit 56118d8 into cockroachdb:master Aug 25, 2020
@mgartner mgartner deleted the arbiter-indexes branch August 25, 2020 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants