Skip to content

sql: enable adding columns which are not written to the current primary index #59149

@ajwerner

Description

@ajwerner

Is your feature request related to a problem? Please describe.

This issue is a breakout of point 2 in #47989 (comment). Column backfills for changing the set of columns in a table have a number of issues outlined in #47989. The alternative is to build a new primary index and swap to it. We support this in theory with primary key changes, however, there's some slight nuance in that primary key changes require the set of columns in the new and old primary index are the same. This issue is concerned with the requirement that concurrent writer not write any new columns being backfilled to this new primary index to the old primary index.

The root of the problem is that primary index descriptors are handled specially. For secondary indexes, the columns stored in the value are primary index columns specified in ExtraColumnIDs for unique indexes and the columns in StoreColumnIDs. For primary indexes, however, all of the table columns are stored.

The hazard is that, if we don't change anything, the column in DELETE_AND_WRITE_ONLY will end up being written to the existing primary index.

Describe the solution you'd like

The thrust of the solution is twofold:

  1. Update the table descriptor to encode the need to not write these columns to the current primary index.
  2. Adopt this change where necessary.

Updated descriptor structure

I have two proposals here, one that I prefer but is riskier and one that is less invasive but worse. The latter does not in any
A clear and concise description of what you want to happen.

  • A) Make primary indexes look like secondary indexes and populate their StoreColumnIDs.

    • Pros
      • This is nice for a variety of reasons. It's not obvious to me why we maintain this distinction save for legacy reasons. It should simplify index handling code.
    • Cons
      • It's slightly less compact.
      • It'd be a migration that would affect a lot of code.
  • B) Add another field to the table descriptor to encode the set of columns which should not be written to the primary index

    • Pros
      • Much less invasive
    • Cons
      • More cruft

Adopting change to descriptor structure.

Let's assume we're going to go with B) as it seems more tractable. One thing which will need to change is the code which writes rows. My current reading is that we might be able to isolate this change to legitimately just this function. @yuzefovich could I ask you to verify that claim and generally have a look at this issue?

The bigger unknown for me is what changes would need to be made in the optimizer. @RaduBerinde could I ask you to give review this and provide some guidance.

Additional context

Something must be done here to realize #47989. More pressingly, we are working to not include support for column backfills in the new schema change rewrite so realistically something needs to be done here in the next few weeks.

Epic: CRDB-2356

Metadata

Metadata

Assignees

Labels

A-schema-descriptorsRelating to SQL table/db descriptor handling.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-sql-foundationsSQL Foundations Team (formerly SQL Schema + SQL Sessions)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions