-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql: enable adding columns which are not written to the current primary index #59149
Description
Is your feature request related to a problem? Please describe.
This issue is a breakout of point 2 in #47989 (comment). Column backfills for changing the set of columns in a table have a number of issues outlined in #47989. The alternative is to build a new primary index and swap to it. We support this in theory with primary key changes, however, there's some slight nuance in that primary key changes require the set of columns in the new and old primary index are the same. This issue is concerned with the requirement that concurrent writer not write any new columns being backfilled to this new primary index to the old primary index.
The root of the problem is that primary index descriptors are handled specially. For secondary indexes, the columns stored in the value are primary index columns specified in ExtraColumnIDs for unique indexes and the columns in StoreColumnIDs. For primary indexes, however, all of the table columns are stored.
The hazard is that, if we don't change anything, the column in DELETE_AND_WRITE_ONLY will end up being written to the existing primary index.
Describe the solution you'd like
The thrust of the solution is twofold:
- Update the table descriptor to encode the need to not write these columns to the current primary index.
- Adopt this change where necessary.
Updated descriptor structure
I have two proposals here, one that I prefer but is riskier and one that is less invasive but worse. The latter does not in any
A clear and concise description of what you want to happen.
-
A) Make primary indexes look like secondary indexes and populate their
StoreColumnIDs.- Pros
- This is nice for a variety of reasons. It's not obvious to me why we maintain this distinction save for legacy reasons. It should simplify index handling code.
- Cons
- It's slightly less compact.
- It'd be a migration that would affect a lot of code.
- Pros
-
B) Add another field to the table descriptor to encode the set of columns which should not be written to the primary index
- Pros
- Much less invasive
- Cons
- More cruft
- Pros
Adopting change to descriptor structure.
Let's assume we're going to go with B) as it seems more tractable. One thing which will need to change is the code which writes rows. My current reading is that we might be able to isolate this change to legitimately just this function. @yuzefovich could I ask you to verify that claim and generally have a look at this issue?
The bigger unknown for me is what changes would need to be made in the optimizer. @RaduBerinde could I ask you to give review this and provide some guidance.
Additional context
Something must be done here to realize #47989. More pressingly, we are working to not include support for column backfills in the new schema change rewrite so realistically something needs to be done here in the next few weeks.
Epic: CRDB-2356