sql: Deduplication logic for `ALTER PRIMARY KEY` by Xiang-Gu · Pull Request #78046 · cockroachdb/cockroach

Xiang-Gu · 2022-03-17T19:38:04Z

When we use ALTER PRIMARY KEY to change the primary key of a table, a new
secondary index on the old primary key columns will be created.
This PR added deduplication logic for ALTER PRIMARY KEY if there exists a unique,
non-partial secondary index that has the same columns as the (old) PK columns in
the same order with same direction.

Release note (sql change): ALTER PRIMARY KEY will not create a seconday index on
the old PK columns if there is already a unique, non-partial secondary index that is
"identical" to the (old) PK columns. Two indexes are "identical" if they have exactly
the same columns, in the same order, with same direction.

Fixes: #74116

Jira issue: CRDB-14766

cockroach-teamcity · 2022-03-17T19:38:11Z

This change is

postamar

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Xiang-Gu)

pkg/sql/alter_primary_key.go, line 663 at r1 (raw file):

		return true
	}
	alreadyHasSecondaryIndexOnPKColumns := func(desc *tabledesc.Mutable) bool {

Can you rewrite this with desc being of type catalog.TableDescriptor instead? We try to use these catalog interfaces wherever possible. There's no way you could have guessed this so don't feel bad.

Really, this whole shouldCopyPrimaryKey function should be rewritten in this way, the only reason it hasn't is because it was written before these interfaces were introduced about a year ago.

pkg/sql/alter_primary_key.go, line 666 at r1 (raw file):

		// Return whether any existing secondary index has identical key columns
		// as the primary key columns.
		pk := desc.GetPrimaryIndex().IndexDesc()

In light of the above comment, this should become pk := desc.GetPrimaryIndex() which returns a catalog.Index. The .IndexDesc() method returns the underlying descriptor protobuf, this has the same information in it but is much less safe to use. Partly because the interface is immutable by design so it's less easy to overwrite a value by mistake, but mostly because we have to keep descriptor protobufs compatible across cluster versions so sometimes some fields have unintuitive values.

pkg/sql/alter_primary_key.go, line 667 at r1 (raw file):

		// as the primary key columns.
		pk := desc.GetPrimaryIndex().IndexDesc()
		for _, existingIndex := range desc.GetIndexes() {

In light of the previous, here you'll want to iterate over desc.PublicNonPrimaryIndexes().

pkg/sql/alter_primary_key.go, line 670 at r1 (raw file):

			if len(pk.KeyColumnIDs) != len(existingIndex.KeyColumnIDs) || !existingIndex.Unique {
				continue
			}

You should also check that existingIndex is not a partial index. It isn't the most obvious thing in the world but we allow WHERE clauses in indexes! To reuse your example in the logic test, this could be something like CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX t_a_key (a) WHERE (a > 0));.

You can check for the presence of this partial predicate with the IsPartial method on catalog.Index.

pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):

			for i := range pk.KeyColumnIDs {
				if pk.KeyColumnIDs[i] != existingIndex.KeyColumnIDs[i] ||
					pk.KeyColumnDirections[i] != existingIndex.KeyColumnDirections[i] {

I'm not sure if column directions matter here, or elsewhere in shouldCopyPrimaryKey for that matter. Let's ask @ajwerner

pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):

					break
				}
			}

catalog.Index has a nice CollectKeyColumnIDs method which returns a catalog.TableColSet which should be much more convenient for what you want to do here, which is to test whether there exists a non-partial unique secondary index whose set of key columns is a superset of the PK columns.

pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1607 at r1 (raw file):

t_pkey     a   N/A
t_a_key    a   ASC
t_a_key    b   ASC

This is a good test. Perhaps you can also add a case for CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX idx (a, b));, we should see a similar outcome: only 2 indexes (the old primary index and idx), the old primary index should be discarded.

Xiang-Gu

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)

pkg/sql/alter_primary_key.go, line 663 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

Can you rewrite this with desc being of type catalog.TableDescriptor instead? We try to use these catalog interfaces wherever possible. There's no way you could have guessed this so don't feel bad.

Really, this whole shouldCopyPrimaryKey function should be rewritten in this way, the only reason it hasn't is because it was written before these interfaces were introduced about a year ago.

Done.

pkg/sql/alter_primary_key.go, line 666 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

In light of the above comment, this should become pk := desc.GetPrimaryIndex() which returns a catalog.Index. The .IndexDesc() method returns the underlying descriptor protobuf, this has the same information in it but is much less safe to use. Partly because the interface is immutable by design so it's less easy to overwrite a value by mistake, but mostly because we have to keep descriptor protobufs compatible across cluster versions so sometimes some fields have unintuitive values.

Done.

pkg/sql/alter_primary_key.go, line 667 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

In light of the previous, here you'll want to iterate over desc.PublicNonPrimaryIndexes().

Done.

pkg/sql/alter_primary_key.go, line 670 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

You should also check that existingIndex is not a partial index. It isn't the most obvious thing in the world but we allow WHERE clauses in indexes! To reuse your example in the logic test, this could be something like CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX t_a_key (a) WHERE (a > 0));.

You can check for the presence of this partial predicate with the IsPartial method on catalog.Index.

Done.

pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm not sure if column directions matter here, or elsewhere in shouldCopyPrimaryKey for that matter. Let's ask @ajwerner

The reason I check column directions is one of the existing logic tests has an expected output that created a new secondary index for ALTER PRIMARY KEY while there is already a secondary index on the (old) primary key columns, but opposite direction.

If directions do not matter, then I'll need to modify existing logic tests. If so, is there any other guidelines I need to follow?

pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

catalog.Index has a nice CollectKeyColumnIDs method which returns a catalog.TableColSet which should be much more convenient for what you want to do here, which is to test whether there exists a non-partial unique secondary index whose set of key columns is a superset of the PK columns.

Currently, I code index equality as "same keyed columns, same size, in the same order".

I think superset is too "loose" because of the unorderedness -- e.g. it's hard to justify not to create a secondary index on the (old) PK column "a" when there is an existing secondary index on ("b", "a"), since this existing index cannot be used in queries filtered by "a" (e.g. SELECT ... WHERE "a" > 4)

I'd argue that a strict prefix will be a better index-equality definition. That is, we don't create a new secondary index on the (old) PK columns if there exists a secondary index s.t. (old) PK columns is a strict prefix of this secondary index columns.
This definition corresponds exactly to the additional test you suggested below.
However, such a change will also break one of the existing logic tests as expected. I've modified the existing logic test to accommodate this new code change.

pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1607 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

This is a good test. Perhaps you can also add a case for CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX idx (a, b));, we should see a similar outcome: only 2 indexes (the old primary index and idx), the old primary index should be discarded.

Added. Thank you for the suggestion.

postamar

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Xiang-Gu)

-- commits, line 7 at r2:
nit: I believe git commit message lines should not exceed a certain length, which is probably described in the style guide. Try to set up your editor to enforce this limit, that way you won't have to concern yourself with this ever again. This isn't strictly enforced by the way, so no big deal.

pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Currently, I code index equality as "same keyed columns, same size, in the same order".

I think superset is too "loose" because of the unorderedness -- e.g. it's hard to justify not to create a secondary index on the (old) PK column "a" when there is an existing secondary index on ("b", "a"), since this existing index cannot be used in queries filtered by "a" (e.g. SELECT ... WHERE "a" > 4)

I'd argue that a strict prefix will be a better index-equality definition. That is, we don't create a new secondary index on the (old) PK columns if there exists a secondary index s.t. (old) PK columns is a strict prefix of this secondary index columns.
This definition corresponds exactly to the additional test you suggested below.
However, such a change will also break one of the existing logic tests as expected. I've modified the existing logic test to accommodate this new code change.

My understanding, which may very well be wrong, was that the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one. Your argument certainly has merit, though. @ajwerner can you weigh in here?

In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.

pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1208 at r2 (raw file):
This is surprising, shouldn't this index be preserved, following your reasoning?

Currently, I code index equality as "same keyed columns, same size, in the same order".

Reading your code I'm also not quite how this is happening. Am I missing something?

Xiang-Gu

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)

-- commits, line 7 at r2:

Previously, postamar (Marius Posta) wrote…

nit: I believe git commit message lines should not exceed a certain length, which is probably described in the style guide. Try to set up your editor to enforce this limit, that way you won't have to concern yourself with this ever again. This isn't strictly enforced by the way, so no big deal.

Done.

pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

My understanding, which may very well be wrong, was that the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one. Your argument certainly has merit, though. @ajwerner can you weigh in here?

In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.

the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one
If we'd like to continue to maintain the uniqueness property on the old primary key column(s), then yes, we do need a "strict" equality definition (i.e. same column(s), same direction, in same order, unique, non-partial). I am just curious why we want to do that, so yeah, Andrew, please give us more insight here.

In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.
In the most recent change, I change the equality definition to strict-prefix (and thus also modified some existing logic tests). We do need to wait until we figure this out before merging.

pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1208 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

This is surprising, shouldn't this index be preserved, following your reasoning?

Currently, I code index equality as "same keyed columns, same size, in the same order".

Reading your code I'm also not quite how this is happening. Am I missing something?

In the most recent change of this PR, I changed the index-equality from "same keyed columns, same size, in the same order" to "strict-prefix".

This breaks some existing logic tests, such as this one, because we already have a secondary index (id, id2), and the old PK (id) is a strict prefix of it.

This is also the reason for other logic tests modification.

ajwerner

I think I'd let the principle of lease surprise guide me here. Two unique indexes of the same exact columns would be surprising. I'm not sure much else would be. This is all an edge case for a behavior cockroachdb made up, so I more or less agree we can't mess up too badly here.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @Xiang-Gu)

pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

The reason I check column directions is one of the existing logic tests has an expected output that created a new secondary index for ALTER PRIMARY KEY while there is already a secondary index on the (old) primary key columns, but opposite direction.

If directions do not matter, then I'll need to modify existing logic tests. If so, is there any other guidelines I need to follow?

This is all a convenience feature which postgres doesn't have. We can do whatever we want here. The reason we did this was out of fear that customers will experience very bad performance on queries they expected previously to work well. The difference in directions can affect some queries, but is much less likely to than the lack of the index. Either choice would be valid.

pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one
If we'd like to continue to maintain the uniqueness property on the old primary key column(s), then yes, we do need a "strict" equality definition (i.e. same column(s), same direction, in same order, unique, non-partial). I am just curious why we want to do that, so yeah, Andrew, please give us more insight here.

In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.
In the most recent change, I change the equality definition to strict-prefix (and thus also modified some existing logic tests). We do need to wait until we figure this out before merging.

I think I agree that the order of the columns matters. I don't know how much we care about the direction.

pkg/sql/alter_primary_key.go, line 631 at r3 (raw file):

//   other than hash sharding.
// * There is no partitioning change.
// * There is no existing secondary index on the old primary key columns.

This is stale given the strict prefix change. Let's make sure the design aligns with the commentary before we merge.

pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):

		for _, existingIndex := range desc.PublicNonPrimaryIndexes() {
			if existingIndex.IsPartial() || !existingIndex.IsUnique() || pk.NumKeyColumns() > existingIndex.NumKeyColumns() {

nit: I encourage you to refactor this logic into yet another helper which takes two indexes and compares them on whatever properties we deem important. That way you can just early return a false. It'll also clean up the readability a tad.

Code quote:

			if existingIndex.IsPartial() || !existingIndex.IsUnique() || pk.NumKeyColumns() > existingIndex.NumKeyColumns() {
				continue
			}
			isDifferent := false

pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):

		for _, existingIndex := range desc.PublicNonPrimaryIndexes() {
			if existingIndex.IsPartial() || !existingIndex.IsUnique() || pk.NumKeyColumns() > existingIndex.NumKeyColumns() {

I'm not so sure about this strict prefix choice. At the very least it should be well commented. I do buy that it makes sense from a query performance perspective. One thing that you lose which is surprising is that we'll no longer be maintaining uniqueness over the same tuple. I also don't expect this case to come up very frequently, so it's not clear that it's worth the complexity when compared to . It'd be weird to have a unique secondary index over a tuple with the same columns in the same order with the same direction as the primary key, but with even more columns.

Xiang-Gu

Thank you for the guidance. yeah, I will revert back to the definition of index equality to "same columns, same order, same direction" and consider only "unique, non-partial" existing secondary indexes.

RFAL

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)

pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):

Previously, ajwerner wrote…

This is all a convenience feature which postgres doesn't have. We can do whatever we want here. The reason we did this was out of fear that customers will experience very bad performance on queries they expected previously to work well. The difference in directions can affect some queries, but is much less likely to than the lack of the index. Either choice would be valid.

Done.

pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):

Previously, ajwerner wrote…

I think I agree that the order of the columns matters. I don't know how much we care about the direction.

acked.

pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):

Previously, ajwerner wrote…

nit: I encourage you to refactor this logic into yet another helper which takes two indexes and compares them on whatever properties we deem important. That way you can just early return a false. It'll also clean up the readability a tad.

done

pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):

Previously, ajwerner wrote…

I'm not so sure about this strict prefix choice. At the very least it should be well commented. I do buy that it makes sense from a query performance perspective. One thing that you lose which is surprising is that we'll no longer be maintaining uniqueness over the same tuple. I also don't expect this case to come up very frequently, so it's not clear that it's worth the complexity when compared to . It'd be weird to have a unique secondary index over a tuple with the same columns in the same order with the same direction as the primary key, but with even more columns.

acked. reverted back to strict-equality definition.

ajwerner

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @postamar, and @Xiang-Gu)

pkg/sql/alter_primary_key.go, line 681 at r4 (raw file):

		isIdentical := func(idx1, idx2 catalog.Index) bool {
			if idx1.IsUnique() != idx2.IsUnique() ||
				idx1.IsPartial() != idx2.IsPartial() ||

I think that if either of them are partial, you would want to check if they have the same predicate. In practice, we know the primary index won't be partial, so 🤷 , but it feels weird to think this is enough to say that they are identical. Somebody might mindlessly refactor this in the future thinking it's complete.

Xiang-Gu

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)

pkg/sql/alter_primary_key.go, line 681 at r4 (raw file):

Previously, ajwerner wrote…

I think that if either of them are partial, you would want to check if they have the same predicate. In practice, we know the primary index won't be partial, so 🤷 , but it feels weird to think this is enough to say that they are identical. Somebody might mindlessly refactor this in the future thinking it's complete.

done

When we use `ALTER PRIMARY KEY` to change primary key of a table, a new secondary index on the old primary key columns will be created. This PR added deduplication logic for `ALTER PRIMARY KEY` if the old PK columns is a strict prefix of an existing secondary index columns. Release note (sql change): `ALTER PRIMARY KEY` will not create a seconday index on the old PK columns if they're a strict prefix of an existing secondary index.

Xiang-Gu · 2022-04-04T18:16:47Z

bors r+

craig · 2022-04-04T19:46:00Z

Build succeeded:

GitHub CI (Cockroach)

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch 2 times, most recently from d945a1a to f8b5824 Compare March 18, 2022 16:30

Xiang-Gu changed the title ~~pkg/sql/alter_primary_key.go: added deduplication logic when altering~~ pkg/sql/alter_primary_key.go: added deduplication logic for ALTER PRIMARY KEY command Mar 18, 2022

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch from f8b5824 to 9b01dff Compare March 18, 2022 18:10

Xiang-Gu marked this pull request as ready for review March 18, 2022 21:15

Xiang-Gu requested a review from a team March 18, 2022 21:15

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch from 9b01dff to b905629 Compare March 21, 2022 15:09

postamar reviewed Mar 21, 2022

View reviewed changes

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch from b905629 to c4746a6 Compare March 21, 2022 18:55

Xiang-Gu changed the title ~~pkg/sql/alter_primary_key.go: added deduplication logic for ALTER PRIMARY KEY command~~ sql: Deduplication logic for ALTER PRIMARY KEY Mar 21, 2022

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch 2 times, most recently from 3426343 to 9aae4ba Compare March 21, 2022 19:00

Xiang-Gu commented Mar 21, 2022

View reviewed changes

postamar reviewed Mar 23, 2022

View reviewed changes

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch from 9aae4ba to 8aa0f6c Compare March 25, 2022 18:51

Xiang-Gu commented Mar 25, 2022

View reviewed changes

ajwerner reviewed Mar 29, 2022

View reviewed changes

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch 2 times, most recently from 379c5d6 to 896c97e Compare March 30, 2022 15:03

Xiang-Gu commented Mar 30, 2022

View reviewed changes

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch from 896c97e to 5f2c786 Compare March 30, 2022 15:07

ajwerner approved these changes Apr 1, 2022

View reviewed changes

Xiang-Gu commented Apr 4, 2022

View reviewed changes

Xiang-Gu force-pushed the deduplicate_unique_indexes_after_altering_primary_key branch from 5f2c786 to 092329d Compare April 4, 2022 17:08

craig bot merged commit 76776c0 into cockroachdb:master Apr 4, 2022

cockroach-teamcity mentioned this pull request Apr 4, 2022

sql: Deduplication logic for ALTER PRIMARY KEY cockroachdb/docs#13484

Closed

This was referenced Jun 8, 2022

sql: Deduplication logic for ALTER PRIMARY KEY cockroachdb/docs#14116

Closed

sql: Deduplication logic for ALTER PRIMARY KEY cockroachdb/docs#14198

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: Deduplication logic for `ALTER PRIMARY KEY`#78046

sql: Deduplication logic for `ALTER PRIMARY KEY`#78046
craig[bot] merged 1 commit intocockroachdb:masterfrom
Xiang-Gu:deduplicate_unique_indexes_after_altering_primary_key

Xiang-Gu commented Mar 17, 2022 •

edited by cockroach-jira-scripts

Loading

Uh oh!

cockroach-teamcity commented Mar 17, 2022

Uh oh!

postamar left a comment

Uh oh!

Xiang-Gu left a comment

Uh oh!

postamar left a comment

Uh oh!

Xiang-Gu left a comment

Uh oh!

ajwerner left a comment

Uh oh!

Xiang-Gu left a comment

Uh oh!

ajwerner left a comment

Uh oh!

Xiang-Gu left a comment

Uh oh!

Xiang-Gu commented Apr 4, 2022

Uh oh!

craig bot commented Apr 4, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Xiang-Gu commented Mar 17, 2022 • edited by cockroach-jira-scripts Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Mar 17, 2022

Uh oh!

postamar left a comment

Choose a reason for hiding this comment

Uh oh!

Xiang-Gu left a comment

Choose a reason for hiding this comment

Uh oh!

postamar left a comment

Choose a reason for hiding this comment

Uh oh!

Xiang-Gu left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

Xiang-Gu left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

Xiang-Gu left a comment

Choose a reason for hiding this comment

Uh oh!

Xiang-Gu commented Apr 4, 2022

Uh oh!

craig bot commented Apr 4, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Xiang-Gu commented Mar 17, 2022 •

edited by cockroach-jira-scripts

Loading