sql: column backfills should build a new index instead of mutating the existing one

Currently, when a backfill is needed for a column being added or dropped, the column backfiller rewrites the existing primary index in-place to add or remove the column value. This has some disadvantages compared to the implementation of index backfills (for non-interleaved indexes):

* The index backfiller is much faster because it batches KVs and writes SSTs directly.
* Column backfills are more difficult to clean up. A index being built has no effect on other indexes and can be GC'ed using `ClearRange`s (which is what the index GC job currently does). A column backfill can only be cleaned up by running the "reverse" column backfill.
    * This poses problems for restoring backups taken in the middle of a column backfill (#44019). Currently the best solution is to create a new job to resume the column schema schema change, but it would be good to have the option of being able to simply ignore the column.
    * We have bugs that prevent column backfills from being completely rolled back (#47712). In the worst case, it would be good to have the option of ignoring the partially backfilled column and clearing the mutation. Currently, the presence of column values with no accompanying column mutation on the table descriptor might make the KVs unreadable.
* Dropping a column requires deleting values. If we have to roll back dropping a column and want to recover the previous values (which we currently don't do; see #46541), the only way to get them back is via MVCC.

The proposal is to change the implementation of column backfills to build a new (primary) index every time, instead of mutating the existing one. There are 2 proposed steps in this change (which are somewhat independent):
1. Switch the column backfill implementation so that it works like today's index backfiller. Adding or dropping any column would be like a primary key change: we go through the states of adding a new index until we're writing to both indexes, then do an index backfill, then swap the primary index in the final table descriptor update of the schema change.
2. There are performance concerns with just doing the first step. A proposed optimization: Reorder the steps in adding an index so that the backfill happens before any live traffic to the new index is turned on, then start writing to the new index and use a rangefeed to catch up on writes in the gap between the backfill and when all nodes are writing to the new index. See #45888 and #36850 for details. We'd want to do some speed-of-light experiments before committing to this approach.

Rolling back a column schema change, with this proposed approach, would essentially mean switching back to the pre-existing index. If we rolled back dropping a column, we still have all the column data as of the timestamp when the schema change was started.

This approach would also allow us to do ALTER TYPE in a single index backfill operation (see #46893).

Open questions:

* What do we do for interleaved tables? Very preliminary thoughts:
    * For interleaved tables with no interleaved children, we should just be able to write the new primary index alongside the old one. I'm not sure what the implications are for cleaning up an index if we don't want to/can't roll it back normally. Presumably we can't just leave abandoned index KVs interleaved in some other table with no indication of this on any of the table descriptors.
    * For interleaved parents, can we do this without rewriting all the child interleaved indexes as well?

Jira issue: CRDB-4366

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: column backfills should build a new index instead of mutating the existing one #47989

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

sql: column backfills should build a new index instead of mutating the existing one #47989

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions