[db_metadata] Parse schema on migration, await backfill by smklein · Pull Request #9878 · oxidecomputer/omicron

smklein · 2026-02-18T16:49:23Z

Non-atomic schema changes ARE real and CAN hurt you

Adds CRDB-specific preprocessing and sqlparser parsing to each schema change within Nexus
Verifies that each change contains "at most one" schema-modifying DDL statement
Adds a secondary transaction to each "apply schema change" step, which verifies that the underlying backfill operation has completed. The objective here is to transform "delayed, async DDL statements" into synchronous operations, so we can know "it succeeded fully" or "it failed" before moving on

Part of #9866
Fixes #9888

david-crespo · 2026-02-18T16:58:07Z

Wild, but also way simpler than I expected.

david-crespo · 2026-02-18T16:59:31Z

+    /// verification queries.
+    ///
+    /// If verification fails, the error propagates immediately — the outer
+    /// startup retry loop will re-attempt the entire migration.


It's worth spelling out (at least in the PR, if not in the comment) exactly how this prevents the problem we saw. What prevents this from running into the same error over and over upon re-running the migration?

Without @sunshowers ' fix for #9874: this would continue failing over and over again.

This PR is intended to prevent Nexuses from "crossing the barrier" of an individual schema change (basically: trying to help us treat them as synchronous atomic steps again). It makes no changes for "Max memory/batch sizes" nor changes to attempted retries.

I can update the PR description to make this more clear!

david-crespo · 2026-02-18T18:36:46Z

+///    detect them via regex here, in the same pass that removes them.
+///
+/// This is **only for classification** — the original SQL is always what
+/// gets executed against the database.


Calling out this line for other reviewers — important!

jgallagher · 2026-02-18T19:59:23Z

+    // Both are valid PostgreSQL, but sqlparser only handles the
+    // bracket notation.  We match `ARRAY` followed by `,` or end-of-
+    // definition (not `ARRAY[` which is the array literal syntax).
+    let type_array_re = Regex::new(r"(?i)(\w+)\s+ARRAY\s*([,)\n])").unwrap();


This function is impressive, but honestly turns me off of the whole sqlparser plan entirely; I didn't realize it couldn't parse our schema.

Is it worth going back to the idea board for other ways to accomplish this?

I agree that it is wild, but I would vote against "throw it out" at this point on the grounds that the main failure mode here is a test failure due to SQL not parsing. The scarier failure mode is a modification significant enough to be classified wrong, but I feel like we should be able to rule that out decently well with testing. I also wonder if it's worth considering a different parser like https://crates.io/crates/pg_query (much lower download count but still seems legit) since it's just a dev dep. I'm having the robot try it.

Probably not worth it — sounds like we might have to do actual work to get this to work on illumos — but I'll give it a shot and see how it looks.

I'm currently working on eliminating a chunk of the regex by just pulling in a more recent version of sqlparser. Will look into the difficulty of creating our own dialect next.

It's a lot better now. Meanwhile $10 later my pg_query attempt is crap.

I'm gonna try to tweak this further to basically do the following:

Try to remove all DML

Do the crdb-specific pre-processing, but only for the DDL which could have back-filled operations

Treat all other DDL as "Some other DDL", which we don't pass to sqlparser

This should reduce the regex usage further, and hopefully get it into a more bearable territory. The amount of crdb-specific regex will scale with the amount of "actually async backfilling DDL" operations we're catching, but maybe that will tolerable enough to let us punt on a CRDB-specific SQL parser a little longer...

Oh yep, right about the dep. But the logic of this function is identical between dev and prod because it's based on the hard coded SQL on hand. If you wanted to, you could avoid running this function in production by pre-processing the queries, couldn't you? Like you could do the classification at dev time, write it down, and have a snapshot test that makes sure the classification is up to date with the code.

I believe this is true

I think this latest version strips the regex about as far as it can go, by partitioning the world into "DDL we care about parsing" and "Other DDL which we aren't gonna bother parsing at all".

Oh yep, right about the dep. But the logic of this function is identical between dev and prod because it's based on the hard coded SQL on hand. If you wanted to, you could avoid running this function in production by pre-processing the queries, couldn't you? Like you could do the classification at dev time, write it down, and have a snapshot test that makes sure the classification is up to date with the code.

I believe this is true

I like this idea a lot. If we something like an expectorate test that this code filled in to maintain a static list of migration steps that need particular async backfill followup, we'd get:

SQL parsing out of prod and into tests - prod can read a compiled form of the static list

A chance to confirm in every PR review that the parser is catching all the new indices being created (I wouldn't want to rely on this at all - that's the whole point of tooling - but it seems good that there's an opportunity for us to review)

Alright, I went ahead and did that, and updated the docs. sqlparser and regex are now dev-dependencies, and we have an EXPECTORATE test that generates .verify.sql files which sit alongside the migrations.

This also retroactively added checks for all migrations, which I figured would be worthwhile for perusal. However, if we want me to drop that generation (e.g., only generate the .verify.sql files for relatively new migrations) I could do that too.

…docs

…le testing

smklein · 2026-02-20T22:11:39Z

+    // ---------------------------------------------------------------
+    // Integration tests for schema change verification
+    // ---------------------------------------------------------------


I tried to include a fair number of unit tests in this PR, but this sequence of integration tests is pretty dang important for the behavior we're trying to test.

These cover the verification queries:

CREATE INDEX IF NOT EXISTS

Checking that the verification query passes/fails if the index exists/doesn't exist.

Checking that concurrent invocations of index creation can cause some callers to "skip" the backfill, and that verification acts as a back-stop.

ALTER TABLE ADD CONSTRAINT

Checking that the verification query passes/fails if the constraint exists/doesn't exist

Checking that concurrent invocations of constraint creation can cause some callers to "skip" the backfill, and that verification acts as a back-stop.

Additionally, these cover some DDL changes that have backfill, but don't need verification queries:

ADD COLUMN IF NOT EXISTS ... NOT NULL DEFAULT

Confirms that concurrent callers block each other out, so no verification query is needed

ALTER COLUMN ... SET NOT NULL

Confirms that concurrent callers block each other out, so no verification query is needed

(an earlier revision of this PR emitted verification queries for these last two cases, but that has since been removed, as they are not necessary)

smklein · 2026-02-23T19:48:51Z

+///
+/// Permits only `[a-zA-Z0-9_]` characters, which covers all valid
+/// SQL identifiers and our enum variant values.
+#[doc(hidden)]


This struct, and SchemaChangeInfo, are pretty much only used at test-time, but they're used by multiple test modules, so it's handy to have them exposed publicly. I could probably use some cargo manipulation to do a "testing" feature flag, but for the moment I just opted to "use them in cfg(test) stuff, and keep them doc(hidden)".

smklein · 2026-02-23T19:49:42Z

+                // NOT VALID constraints are synchronous — no async
+                // validation job runs, so there is no race to guard
+                // against and no verification query is needed.
+                if *not_valid {


We have exactly one migration which adds a constraint with the NOT VALID keyword (it was one of yours, @david-crespo !) but it seems like a legit use-case, so I added it here.

Probably because only Claude knows about these arcane SQL features. I would never have come up with it myself.

smklein · 2026-02-23T19:51:26Z

+    ///
+    /// Returns trimmed, non-empty statement fragments with comments
+    /// stripped.
+    fn split_and_strip_sql(sql: &str) -> Vec<String> {


This was "strip comments" and "split statements" as two separate functions, but they kinda need to know about each other.

I know there's a ton of string manipulation going on here, but:

I have a lot of tests for it

I added proptests for it

It'll only be running at test-time (EXPECTORATE time) anyway

smklein · 2026-02-24T20:58:20Z

Merged with #9889 , now using the max_sql_memory_mib argument. Thanks @sunshowers

david-crespo · 2026-02-27T19:28:39Z

For posterity, I had gpt-5.3-codex and Opus 4.6 go through the CockroachDB source on our fork and make the argument that the verification queries in this PR work to ensure that a create index operation is complete when we think it's complete. We did this while we were debugging this issue and working out the solution, but I think it's worth documenting the result of that analysis canonically.

It's long, so I'm including it here collapsed so it doesn't take on more importance than it deserves. I think it is a very good account of our understanding of this change at this time. This was produced by ping-ponging back and forth between the two models, having Codex put together the initial argument, then Claude evaluate it and make it more readable, and back and forth a few more times.

Full argument for this change

This PR changes Nexus migrations from "execute DDL step and immediately move on" to "execute DDL step, then run a generated verification query, and only then advance migration state/version." The reason is #9866/#9888: for backfill-prone DDL, Cockroach can acknowledge the statement before asynchronous schema work has fully completed.

The rest of this comment argues that the verification predicates used by the PR are correct — that they become true only after Cockroach has actually finished the relevant async schema-change work, making them valid barriers before Nexus advances migration state. The argument is against release-22.1-oxide; all code links below are permalinks to the current tip of that branch.

Why `CREATE INDEX IF NOT EXISTS` needs a barrier

CockroachDB schema changes are asynchronous. When you run CREATE INDEX, Cockroach adds the index to the table descriptor as a mutation (an in-progress schema change) and returns. The actual work — backfilling data into the index, validating uniqueness constraints — happens asynchronously in the schema changer.

The problem is that IF NOT EXISTS is too permissive about what counts as "exists." In create_index.go#L659-L669, CREATE INDEX calls FindIndexWithName, which searches with AddMutations: true and DropMutations: true (table_desc.go#L361-L366). If it finds a match — even an index that's mid-backfill — IF NOT EXISTS returns success immediately. So a second Nexus replica running the same migration can get "success" before the index is actually usable.

Why `crdb_internal.table_indexes` is a correct barrier

The PR verifies index creation by checking crdb_internal.table_indexes. This virtual table is stricter than IF NOT EXISTS: it iterates indexes via catalog.ForEachIndex with default IndexOpts (crdb_internal.go#L2751-L2764), which excludes mutations. ForEachIndex filters out any index where Adding() or Dropped() is true unless you opt in (table_elements.go#L532-L540). So crdb_internal.table_indexes shows only public indexes — those that have graduated from mutation status.

The remaining question is whether an index becomes public only after the backfill and validation work is actually done. It does. The schema changer's lifecycle runs in three phases (schema_changer.go#L1959-L1971): first advance mutation state, then runBackfill (which includes data backfill, merge from a temporary index that captured concurrent writes, and validateIndexes — backfill.go#L1931-L1940, backfill.go#L2072-L2073), and only if that succeeds, done(). If runBackfill errors, done() is never called and the index stays in mutation state.

done() is the function that actually makes the index public: it calls MakeMutationComplete for each mutation (schema_changer.go#L1588), which for index-add mutations calls AddSecondaryIndex to insert the index into the public index list (structured.go#L1613-L1623), then trims the mutation from the descriptor (schema_changer.go#L1729-L1731).

So the chain is: an index appears in crdb_internal.table_indexes only if it's public, it becomes public only in done(), and done() runs only after backfill and validation succeed. Checking this table is a valid barrier for the IF NOT EXISTS race.

Why `SHOW CONSTRAINTS ... validated = true` is a correct barrier for `ADD CONSTRAINT`

The same pattern applies to constraint creation. The PR verifies ADD CONSTRAINT by checking SHOW CONSTRAINTS for validated = true.

SHOW CONSTRAINTS projects pg_constraint.convalidated as the validated column (show_table.go#L196-L210). convalidated is set to !con.Unvalidated (pg_catalog.go#L1006-L1014). And the constraint detail code marks any constraint whose Validity is not Validated — including Validating — as Unvalidated (table.go#L360-L362, table.go#L382-L384, table.go#L418-L420). So validated = true requires Validity == Validated, and a constraint that's still in the Validating phase reads as validated = false.

The transition to Validated happens in the same done() → MakeMutationComplete path: the function finds constraints in Validating state and flips them to Validated (structured.go#L1629-L1634, structured.go#L1646-L1652, structured.go#L1664-L1670). As with indexes, done() only runs after runBackfill — which includes validateConstraints (backfill.go#L341-L360) — completes successfully.

So for both verification forms used by the PR, the predicate becoming true corresponds to a post-condition stronger than "DDL statement returned OK" — it means completion in the schema changer's state machine sense, which is the property needed to avoid the #9866 / #9888 class of false-progress migration outcomes.

Safety vs. liveness under descriptor caching

The verification query may run on a different CRDB node than the one that executed the DDL. Descriptor lease/cache lag on other nodes can produce temporary false negatives, where a verification query doesn't yet see the now-public index or validated constraint. This affects liveness (Nexus retries the verification) but not safety (Nexus never advances migration state prematurely). The dangerous direction would be a false positive — seeing a public index before backfill completes — which the argument above rules out: an index or constraint can only appear as public after done() atomically commits the updated descriptor, and no stale cache can conjure a public index that was never written.

Scope and caveats

This argument covers the legacy schema changer job path. Our fork defaults use_declarative_schema_changer to on (exec_util.go#L477-L489), which means the declarative path is used for "supported statements in implicit transactions" and falls back to legacy otherwise. Nexus runs migration DDL inside explicit transactions (via transaction_retry_wrapper), so the legacy schema changer is always the active path for these statements. The declarative schema changer uses different code paths and state machines; this comment doesn't address it, but it also isn't exercised here. There are also other MakeMutationComplete call sites for scenarios like in-transaction schema changes on newly created tables and TRUNCATE on empty tables, but those aren't the async existing-table path behind #9866.

One concern worth addressing explicitly: could there be a window during done() where the index has been removed from the mutations list but not yet added to the public list (or vice versa), creating a moment where a verification query sees an inconsistent descriptor? No — done() modifies the descriptor in-memory (both MakeMutationComplete and trimming mutation entries), then writes and commits it in a single transaction via WriteDescToBatch + txn.Run (schema_changer.go#L1729-L1731, schema_changer.go#L1852-L1856). External readers see either the pre-commit descriptor (index still a mutation) or the post-commit descriptor (index public, mutation trimmed), never a half-state.

Finally, this proof covers the two verification predicates the PR actually uses: crdb_internal.table_indexes for index existence and SHOW CONSTRAINTS ... validated = true for constraint validation. If future work adds verification for other backfill-prone DDL types, each new predicate would need its own argument.

Fixing a conflict between #9898 and #9878 (New tests + new postgres -> changes how error messages got exposed)

[db_metadata] Parse schema on migration, await backfill

f118398

david-crespo reviewed Feb 18, 2026

View reviewed changes

jgallagher reviewed Feb 18, 2026

View reviewed changes

smklein force-pushed the migration-await-backfill branch from f792e27 to 0bf9368 Compare February 18, 2026 21:48

smklein changed the base branch from main to sqlparser February 18, 2026 21:48

smklein added 5 commits February 18, 2026 13:50

Upgrade sqlparser from 0.45 to 0.61

a5c9a81

merge

c775f2d

SqlIdentifiers, multiple changes in single statements allowed

b1a5da2

better integration tests

31b202f

fix docs

47abf04

smklein force-pushed the migration-await-backfill branch from 0bf9368 to 47abf04 Compare February 18, 2026 21:50

smklein force-pushed the sqlparser branch from 59125e5 to a5c9a81 Compare February 18, 2026 21:50

smklein added 5 commits February 18, 2026 14:14

use sqlparser 0.61, reduce regex usage

7503080

more regex simplification

535cde7

Reduce regexes further

4f2dfcf

Add docs, handle more column add cases

aa92314

move to test-time, make sqlparser/regex dev-dependency, EXPECTORATE, …

a67e505

…docs

smklein marked this pull request as ready for review February 20, 2026 15:34

Fix semicolon parsing, lazylock regex

ac9b16a

smklein mentioned this pull request Feb 20, 2026

[crdb] set index backfill batch size to 5000, max-sql-memory to 256MiB #9889

Merged

smklein added 3 commits February 20, 2026 12:44

Remove two DDL cases that don't visibly backfill, work on more reliab…

55eac77

…le testing

fmt

262df72

Make test more strict

55049bc

smklein commented Feb 20, 2026

View reviewed changes

Handle adding a NOT VALID constraint

546d8ce

Base automatically changed from sqlparser to main February 23, 2026 19:11

smklein added 2 commits February 23, 2026 11:31

Combine statement splitting and comment stripping, add proptest

e06030a

merge

336969e

smklein commented Feb 23, 2026

View reviewed changes

smklein added 2 commits February 24, 2026 12:44

merge

6bbb88b

use max_sql_memory_mib

c2c3c89

david-crespo approved these changes Feb 27, 2026

View reviewed changes

merge

c139074

smklein enabled auto-merge (squash) February 27, 2026 21:34

smklein merged commit 5a06ffa into main Feb 28, 2026
17 checks passed

smklein deleted the migration-await-backfill branch February 28, 2026 03:40

smklein mentioned this pull request Mar 2, 2026

Unbreak database tests #9955

Merged

smklein added a commit that referenced this pull request Mar 3, 2026

Unbreak database tests (#9955)

56b8e64

Fixing a conflict between #9898 and #9878 (New tests + new postgres -> changes how error messages got exposed)

smklein mentioned this pull request Mar 6, 2026

Make user_provision_type NOT NULL #9985

Merged

Conversation

smklein commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

david-crespo commented Feb 18, 2026

Uh oh!

david-crespo Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

david-crespo Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jgallagher Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smklein commented Feb 24, 2026

Uh oh!

david-crespo commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why CREATE INDEX IF NOT EXISTS needs a barrier

Why crdb_internal.table_indexes is a correct barrier

Why SHOW CONSTRAINTS ... validated = true is a correct barrier for ADD CONSTRAINT

Safety vs. liveness under descriptor caching

Scope and caveats

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

smklein commented Feb 18, 2026 •

edited

Loading

david-crespo Feb 18, 2026 •

edited

Loading

david-crespo Feb 18, 2026 •

edited

Loading

jgallagher Feb 19, 2026 •

edited

Loading

david-crespo commented Feb 27, 2026 •

edited

Loading

Why `CREATE INDEX IF NOT EXISTS` needs a barrier

Why `crdb_internal.table_indexes` is a correct barrier

Why `SHOW CONSTRAINTS ... validated = true` is a correct barrier for `ADD CONSTRAINT`