Skip to content

fix: post-migration schema verification with self-healing#488

Merged
garrytan merged 1 commit intomasterfrom
fix/schema-verify
Apr 28, 2026
Merged

fix: post-migration schema verification with self-healing#488
garrytan merged 1 commit intomasterfrom
fix/schema-verify

Conversation

@garrytan
Copy link
Copy Markdown
Owner

@garrytan garrytan commented Apr 28, 2026

Problem

PgBouncer transaction-mode poolers can silently swallow ALTER TABLE statements during migrations. The SQL executes without error, but the column never materializes in the database. The migration system increments the schema version counter anyway, creating a dangerous desync: the application thinks it's on v29 but the actual table is missing columns.

This caused production embed failures when the embed handler tried to INSERT into symbol_type, start_line, end_line, parent_symbol_path columns that didn't exist on the content_chunks table.

Error Log

PostgresError: column "symbol_type" of relation "content_chunks" does not exist
    at ErrorResponse (.../postgres.js)
    at handle (.../postgres.js)

The schema version in config read v29 (latest), but information_schema.columns showed the v26/v27 columns (symbol_type, start_line, end_line, parent_symbol_path, doc_comment, symbol_name_qualified, search_vector) were absent from the actual table. The migrations had "succeeded" — PgBouncer acknowledged every statement — but ALTER TABLE was a no-op behind the pooler.

What We Tried

  1. Re-running gbrain apply-migrations — no-op because schema version already at v29
  2. Manually running the ALTER TABLE statements against direct Postgres (not PgBouncer) — worked, confirming the pooler was the issue
  3. Considered adding a migration that re-runs the ALTERs — but this doesn't prevent future occurrences

Solution

Add a verifySchema() function (src/core/schema-verify.ts) that runs AFTER all migrations complete in PostgresEngine.initSchema(). It:

  1. Parses the canonical schema from schema-embedded.ts — both CREATE TABLE blocks and ALTER TABLE ADD COLUMN IF NOT EXISTS statements
  2. Queries information_schema.columns for the actual database state
  3. Diffs expected vs actual columns (skipping tables that don't exist yet)
  4. Self-heals missing columns via ALTER TABLE ADD COLUMN IF NOT EXISTS with simplified definitions (strips FKs, CHECKs, UNIQUE that can't go in ADD COLUMN)
  5. Throws with actionable diagnostics if any column can't be healed

Behavior matrix

Scenario Behavior
All columns present (normal case) Silent pass, zero overhead beyond one info_schema query
Missing columns, self-heal succeeds Logs warning + healed columns, continues normally
Missing columns, self-heal fails Throws with list of failed columns + fix instructions
Table doesn't exist yet Skipped (will be created by schema.sql on next run)
PGLite engine Not called (in-process, no PgBouncer, no silent failures)

Files changed

  • src/core/schema-verify.ts (new) — parseExpectedColumns(), simplifyColumnDef(), verifySchema()
  • src/core/postgres-engine.ts — calls verifySchema(this) at end of initSchema()
  • src/core/db.ts — re-exports verifySchema for backward-compat module-level callers
  • test/schema-verify.test.ts (new) — 12 tests covering parser + simplifier
  • package.json — version bump to 0.22.6
  • CHANGELOG.md — release notes

Testing

  • 12 unit tests covering schema parser and definition simplifier
  • Parser correctly extracts all columns from all tables including the production-failing ones (symbol_type, start_line, end_line, parent_symbol_path, search_vector)
  • Typecheck passes clean
  • simplifyColumnDef correctly handles: REFERENCES with ON DELETE/UPDATE, nested CHECK constraints, UNIQUE, vector types, array types, TSVECTOR

View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

PgBouncer transaction-mode poolers can silently swallow ALTER TABLE
statements: the SQL doesn't error, but the column never gets created.
The migration system increments the schema version counter anyway, so
gbrain thinks it's on the latest version but the actual table is missing
columns. This caused production embed failures when the embed handler
tried to INSERT into columns that didn't exist.

Add verifySchema() that runs after all migrations complete:
1. Parses CREATE TABLE + ALTER TABLE ADD COLUMN from schema-embedded.ts
2. Queries information_schema.columns for actual DB state
3. Diffs expected vs actual columns
4. Self-heals missing columns via ALTER TABLE ADD COLUMN IF NOT EXISTS
5. Throws with actionable diagnostics if self-heal fails

Called from PostgresEngine.initSchema() after runMigrations().
PGLite skipped (in-process, no PgBouncer).
@garrytan garrytan merged commit be8fffa into master Apr 28, 2026
4 checks passed
garrytan added a commit that referenced this pull request Apr 28, 2026
Merges 5 master commits since last merge: v0.22.1 autopilot fix wave (#447),
v0.22.2 minions worker reliability (#458), v0.22.4 frontmatter-guard (#448),
sourceId in cycle sync phase (#475), and post-migration schema verification (#488).

Conflict resolutions:
- VERSION: kept this branch's reserved 0.27.0 slot (master at 0.22.6).
- CHANGELOG.md: kept v0.27.0 entry at top, then master's v0.22.6 → v0.21.0 entries below in order.
- CLAUDE.md: merged the v0.27 cycle bullet (8 phases, synthesize, patterns, transcript-discovery, dream CLI flags) with master's v0.22.1/v0.22.5 cycle additions (signal: AbortSignal, willRunExtractPhase, resolveSourceForDir).
- src/core/cycle.ts: kept v0.27 yieldDuringPhase + synthInputFile/synthDate/synthFrom/synthTo CycleOpts fields AND added master's v0.22.1 signal: AbortSignal field. Both coexist.
- llms-full.txt: regenerated against the merged tree.

The dream_verdicts schema migration moved v25 → v30 in the prior merge.
Master ended at v29 (cathedral_ii_code_edges_rls); v30 is uncontested.

Tests pass post-merge: 105/105 dream + cycle tests across 9 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Apr 28, 2026
* upstream/master:
  v0.22.6.1 fix: PGLite/initSchema upgrade-hardening wave (closes 2-year wedge cycle) (garrytan#440)
  fix: post-migration schema verification with self-healing (garrytan#488)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant