sql: Deduplication logic for ALTER PRIMARY KEY#78046
Conversation
d945a1a to
f8b5824
Compare
ALTER PRIMARY KEY command
f8b5824 to
9b01dff
Compare
9b01dff to
b905629
Compare
postamar
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Xiang-Gu)
pkg/sql/alter_primary_key.go, line 663 at r1 (raw file):
return true } alreadyHasSecondaryIndexOnPKColumns := func(desc *tabledesc.Mutable) bool {
Can you rewrite this with desc being of type catalog.TableDescriptor instead? We try to use these catalog interfaces wherever possible. There's no way you could have guessed this so don't feel bad.
Really, this whole shouldCopyPrimaryKey function should be rewritten in this way, the only reason it hasn't is because it was written before these interfaces were introduced about a year ago.
pkg/sql/alter_primary_key.go, line 666 at r1 (raw file):
// Return whether any existing secondary index has identical key columns // as the primary key columns. pk := desc.GetPrimaryIndex().IndexDesc()
In light of the above comment, this should become pk := desc.GetPrimaryIndex() which returns a catalog.Index. The .IndexDesc() method returns the underlying descriptor protobuf, this has the same information in it but is much less safe to use. Partly because the interface is immutable by design so it's less easy to overwrite a value by mistake, but mostly because we have to keep descriptor protobufs compatible across cluster versions so sometimes some fields have unintuitive values.
pkg/sql/alter_primary_key.go, line 667 at r1 (raw file):
// as the primary key columns. pk := desc.GetPrimaryIndex().IndexDesc() for _, existingIndex := range desc.GetIndexes() {
In light of the previous, here you'll want to iterate over desc.PublicNonPrimaryIndexes().
pkg/sql/alter_primary_key.go, line 670 at r1 (raw file):
if len(pk.KeyColumnIDs) != len(existingIndex.KeyColumnIDs) || !existingIndex.Unique { continue }
You should also check that existingIndex is not a partial index. It isn't the most obvious thing in the world but we allow WHERE clauses in indexes! To reuse your example in the logic test, this could be something like CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX t_a_key (a) WHERE (a > 0));.
You can check for the presence of this partial predicate with the IsPartial method on catalog.Index.
pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):
for i := range pk.KeyColumnIDs { if pk.KeyColumnIDs[i] != existingIndex.KeyColumnIDs[i] || pk.KeyColumnDirections[i] != existingIndex.KeyColumnDirections[i] {
I'm not sure if column directions matter here, or elsewhere in shouldCopyPrimaryKey for that matter. Let's ask @ajwerner
pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):
break } }
catalog.Index has a nice CollectKeyColumnIDs method which returns a catalog.TableColSet which should be much more convenient for what you want to do here, which is to test whether there exists a non-partial unique secondary index whose set of key columns is a superset of the PK columns.
pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1607 at r1 (raw file):
t_pkey a N/A t_a_key a ASC t_a_key b ASC
This is a good test. Perhaps you can also add a case for CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX idx (a, b));, we should see a similar outcome: only 2 indexes (the old primary index and idx), the old primary index should be discarded.
b905629 to
c4746a6
Compare
ALTER PRIMARY KEY commandALTER PRIMARY KEY
3426343 to
9aae4ba
Compare
Xiang-Gu
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)
pkg/sql/alter_primary_key.go, line 663 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Can you rewrite this with
descbeing of typecatalog.TableDescriptorinstead? We try to use thesecataloginterfaces wherever possible. There's no way you could have guessed this so don't feel bad.Really, this whole
shouldCopyPrimaryKeyfunction should be rewritten in this way, the only reason it hasn't is because it was written before these interfaces were introduced about a year ago.
Done.
pkg/sql/alter_primary_key.go, line 666 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
In light of the above comment, this should become
pk := desc.GetPrimaryIndex()which returns acatalog.Index. The.IndexDesc()method returns the underlying descriptor protobuf, this has the same information in it but is much less safe to use. Partly because the interface is immutable by design so it's less easy to overwrite a value by mistake, but mostly because we have to keep descriptor protobufs compatible across cluster versions so sometimes some fields have unintuitive values.
Done.
pkg/sql/alter_primary_key.go, line 667 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
In light of the previous, here you'll want to iterate over
desc.PublicNonPrimaryIndexes().
Done.
pkg/sql/alter_primary_key.go, line 670 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
You should also check that
existingIndexis not a partial index. It isn't the most obvious thing in the world but we allow WHERE clauses in indexes! To reuse your example in the logic test, this could be something likeCREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX t_a_key (a) WHERE (a > 0));.You can check for the presence of this partial predicate with the
IsPartialmethod oncatalog.Index.
Done.
pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
I'm not sure if column directions matter here, or elsewhere in
shouldCopyPrimaryKeyfor that matter. Let's ask @ajwerner
The reason I check column directions is one of the existing logic tests has an expected output that created a new secondary index for ALTER PRIMARY KEY while there is already a secondary index on the (old) primary key columns, but opposite direction.
If directions do not matter, then I'll need to modify existing logic tests. If so, is there any other guidelines I need to follow?
pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
catalog.Indexhas a niceCollectKeyColumnIDsmethod which returns acatalog.TableColSetwhich should be much more convenient for what you want to do here, which is to test whether there exists a non-partial unique secondary index whose set of key columns is a superset of the PK columns.
Currently, I code index equality as "same keyed columns, same size, in the same order".
I think superset is too "loose" because of the unorderedness -- e.g. it's hard to justify not to create a secondary index on the (old) PK column "a" when there is an existing secondary index on ("b", "a"), since this existing index cannot be used in queries filtered by "a" (e.g. SELECT ... WHERE "a" > 4)
I'd argue that a strict prefix will be a better index-equality definition. That is, we don't create a new secondary index on the (old) PK columns if there exists a secondary index s.t. (old) PK columns is a strict prefix of this secondary index columns.
This definition corresponds exactly to the additional test you suggested below.
However, such a change will also break one of the existing logic tests as expected. I've modified the existing logic test to accommodate this new code change.
pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1607 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
This is a good test. Perhaps you can also add a case for
CREATE TABLE t (a INT PRIMARY KEY, b INT NOT NULL, UNIQUE INDEX idx (a, b));, we should see a similar outcome: only 2 indexes (the old primary index andidx), the old primary index should be discarded.
Added. Thank you for the suggestion.
postamar
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Xiang-Gu)
-- commits, line 7 at r2:
nit: I believe git commit message lines should not exceed a certain length, which is probably described in the style guide. Try to set up your editor to enforce this limit, that way you won't have to concern yourself with this ever again. This isn't strictly enforced by the way, so no big deal.
pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Currently, I code index equality as "same keyed columns, same size, in the same order".
I think superset is too "loose" because of the unorderedness -- e.g. it's hard to justify not to create a secondary index on the (old) PK column "a" when there is an existing secondary index on ("b", "a"), since this existing index cannot be used in queries filtered by "a" (e.g. SELECT ... WHERE "a" > 4)
I'd argue that a strict prefix will be a better index-equality definition. That is, we don't create a new secondary index on the (old) PK columns if there exists a secondary index s.t. (old) PK columns is a strict prefix of this secondary index columns.
This definition corresponds exactly to the additional test you suggested below.
However, such a change will also break one of the existing logic tests as expected. I've modified the existing logic test to accommodate this new code change.
My understanding, which may very well be wrong, was that the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one. Your argument certainly has merit, though. @ajwerner can you weigh in here?
In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.
pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1208 at r2 (raw file):
This is surprising, shouldn't this index be preserved, following your reasoning?
Currently, I code index equality as "same keyed columns, same size, in the same order".
Reading your code I'm also not quite how this is happening. Am I missing something?
9aae4ba to
8aa0f6c
Compare
Xiang-Gu
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)
Previously, postamar (Marius Posta) wrote…
nit: I believe git commit message lines should not exceed a certain length, which is probably described in the style guide. Try to set up your editor to enforce this limit, that way you won't have to concern yourself with this ever again. This isn't strictly enforced by the way, so no big deal.
Done.
pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
My understanding, which may very well be wrong, was that the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one. Your argument certainly has merit, though. @ajwerner can you weigh in here?
In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.
the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one
If we'd like to continue to maintain the uniqueness property on the old primary key column(s), then yes, we do need a "strict" equality definition (i.e. same column(s), same direction, in same order, unique, non-partial). I am just curious why we want to do that, so yeah, Andrew, please give us more insight here.
In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.
In the most recent change, I change the equality definition to strict-prefix (and thus also modified some existing logic tests). We do need to wait until we figure this out before merging.
pkg/sql/logictest/testdata/logic_test/alter_primary_key, line 1208 at r2 (raw file):
Previously, postamar (Marius Posta) wrote…
This is surprising, shouldn't this index be preserved, following your reasoning?
Currently, I code index equality as "same keyed columns, same size, in the same order".
Reading your code I'm also not quite how this is happening. Am I missing something?
In the most recent change of this PR, I changed the index-equality from "same keyed columns, same size, in the same order" to "strict-prefix".
This breaks some existing logic tests, such as this one, because we already have a secondary index (id, id2), and the old PK (id) is a strict prefix of it.
This is also the reason for other logic tests modification.
ajwerner
left a comment
There was a problem hiding this comment.
I think I'd let the principle of lease surprise guide me here. Two unique indexes of the same exact columns would be surprising. I'm not sure much else would be. This is all an edge case for a behavior cockroachdb made up, so I more or less agree we can't mess up too badly here.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @postamar and @Xiang-Gu)
pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
The reason I check column directions is one of the existing logic tests has an expected output that created a new secondary index for
ALTER PRIMARY KEYwhile there is already a secondary index on the (old) primary key columns, but opposite direction.If directions do not matter, then I'll need to modify existing logic tests. If so, is there any other guidelines I need to follow?
This is all a convenience feature which postgres doesn't have. We can do whatever we want here. The reason we did this was out of fear that customers will experience very bad performance on queries they expected previously to work well. The difference in directions can affect some queries, but is much less likely to than the lack of the index. Either choice would be valid.
pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
the old primary index was guaranteeing a uniqueness property on the old primary key, and this property should be maintained even after replacing the old primary index with a new one
If we'd like to continue to maintain the uniqueness property on the old primary key column(s), then yes, we do need a "strict" equality definition (i.e. same column(s), same direction, in same order, unique, non-partial). I am just curious why we want to do that, so yeah, Andrew, please give us more insight here.
In any case this change of yours is a strict improvement to the existing code and so this question should not block merging this PR.
In the most recent change, I change the equality definition to strict-prefix (and thus also modified some existing logic tests). We do need to wait until we figure this out before merging.
I think I agree that the order of the columns matters. I don't know how much we care about the direction.
pkg/sql/alter_primary_key.go, line 631 at r3 (raw file):
// other than hash sharding. // * There is no partitioning change. // * There is no existing secondary index on the old primary key columns.
This is stale given the strict prefix change. Let's make sure the design aligns with the commentary before we merge.
pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):
for _, existingIndex := range desc.PublicNonPrimaryIndexes() { if existingIndex.IsPartial() || !existingIndex.IsUnique() || pk.NumKeyColumns() > existingIndex.NumKeyColumns() {
nit: I encourage you to refactor this logic into yet another helper which takes two indexes and compares them on whatever properties we deem important. That way you can just early return a false. It'll also clean up the readability a tad.
Code quote:
if existingIndex.IsPartial() || !existingIndex.IsUnique() || pk.NumKeyColumns() > existingIndex.NumKeyColumns() {
continue
}
isDifferent := falsepkg/sql/alter_primary_key.go, line 670 at r3 (raw file):
for _, existingIndex := range desc.PublicNonPrimaryIndexes() { if existingIndex.IsPartial() || !existingIndex.IsUnique() || pk.NumKeyColumns() > existingIndex.NumKeyColumns() {
I'm not so sure about this strict prefix choice. At the very least it should be well commented. I do buy that it makes sense from a query performance perspective. One thing that you lose which is surprising is that we'll no longer be maintaining uniqueness over the same tuple. I also don't expect this case to come up very frequently, so it's not clear that it's worth the complexity when compared to . It'd be weird to have a unique secondary index over a tuple with the same columns in the same order with the same direction as the primary key, but with even more columns.
379c5d6 to
896c97e
Compare
Xiang-Gu
left a comment
There was a problem hiding this comment.
Thank you for the guidance. yeah, I will revert back to the definition of index equality to "same columns, same order, same direction" and consider only "unique, non-partial" existing secondary indexes.
RFAL
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)
pkg/sql/alter_primary_key.go, line 674 at r1 (raw file):
Previously, ajwerner wrote…
This is all a convenience feature which postgres doesn't have. We can do whatever we want here. The reason we did this was out of fear that customers will experience very bad performance on queries they expected previously to work well. The difference in directions can affect some queries, but is much less likely to than the lack of the index. Either choice would be valid.
Done.
pkg/sql/alter_primary_key.go, line 678 at r1 (raw file):
Previously, ajwerner wrote…
I think I agree that the order of the columns matters. I don't know how much we care about the direction.
acked.
pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):
Previously, ajwerner wrote…
nit: I encourage you to refactor this logic into yet another helper which takes two indexes and compares them on whatever properties we deem important. That way you can just early return a false. It'll also clean up the readability a tad.
done
pkg/sql/alter_primary_key.go, line 670 at r3 (raw file):
Previously, ajwerner wrote…
I'm not so sure about this strict prefix choice. At the very least it should be well commented. I do buy that it makes sense from a query performance perspective. One thing that you lose which is surprising is that we'll no longer be maintaining uniqueness over the same tuple. I also don't expect this case to come up very frequently, so it's not clear that it's worth the complexity when compared to . It'd be weird to have a unique secondary index over a tuple with the same columns in the same order with the same direction as the primary key, but with even more columns.
acked. reverted back to strict-equality definition.
896c97e to
5f2c786
Compare
ajwerner
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @postamar, and @Xiang-Gu)
pkg/sql/alter_primary_key.go, line 681 at r4 (raw file):
isIdentical := func(idx1, idx2 catalog.Index) bool { if idx1.IsUnique() != idx2.IsUnique() || idx1.IsPartial() != idx2.IsPartial() ||
I think that if either of them are partial, you would want to check if they have the same predicate. In practice, we know the primary index won't be partial, so 🤷 , but it feels weird to think this is enough to say that they are identical. Somebody might mindlessly refactor this in the future thinking it's complete.
Xiang-Gu
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)
pkg/sql/alter_primary_key.go, line 681 at r4 (raw file):
Previously, ajwerner wrote…
I think that if either of them are partial, you would want to check if they have the same predicate. In practice, we know the primary index won't be partial, so 🤷 , but it feels weird to think this is enough to say that they are identical. Somebody might mindlessly refactor this in the future thinking it's complete.
done
When we use `ALTER PRIMARY KEY` to change primary key of a table, a new secondary index on the old primary key columns will be created. This PR added deduplication logic for `ALTER PRIMARY KEY` if the old PK columns is a strict prefix of an existing secondary index columns. Release note (sql change): `ALTER PRIMARY KEY` will not create a seconday index on the old PK columns if they're a strict prefix of an existing secondary index.
5f2c786 to
092329d
Compare
|
bors r+ |
|
Build succeeded: |
When we use
ALTER PRIMARY KEYto change the primary key of a table, a newsecondary index on the old primary key columns will be created.
This PR added deduplication logic for
ALTER PRIMARY KEYif there exists a unique,non-partial secondary index that has the same columns as the (old) PK columns in
the same order with same direction.
Release note (sql change):
ALTER PRIMARY KEYwill not create a seconday index onthe old PK columns if there is already a unique, non-partial secondary index that is
"identical" to the (old) PK columns. Two indexes are "identical" if they have exactly
the same columns, in the same order, with same direction.
Fixes: #74116
Jira issue: CRDB-14766