Skip to content

Upgrade from v0.17.x → v0.18.x on Postgres fails: SCHEMA_SQL runs before migrations, indexes reference not-yet-added columns #378

@valzav

Description

@valzav

Summary

On v0.18.2, upgrading a pre-v0.18 Postgres brain (schema ≤ v19) fails with column "source_id" does not exist during gbrain init --migrate-only. This blocks the v0.16.0 migration orchestrator (its Phase A shells out to init --migrate-only) and by extension any gbrain apply-migrations --yes run against an older brain.

Reproduced on a schema v15 Postgres brain upgrading to gbrain 0.18.2.

Root cause

Ordering bug in PostgresEngine.initSchema() (src/core/postgres-engine.ts:73):

async initSchema(): Promise<void> {
  await conn`SELECT pg_advisory_lock(42)`;
  try {
    await conn.unsafe(SCHEMA_SQL);              // ← runs first
    const { applied } = await runMigrations(this);  // ← migrations run after
    ...
  }
}

SCHEMA_SQL contains (schema.sql:74):

CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);
  • On a fresh DB, the preceding CREATE TABLE IF NOT EXISTS pages (...) defines source_id as part of the table (schema.sql:52-66), so the index creation succeeds.
  • On an existing v15 DB, CREATE TABLE IF NOT EXISTS pages is a no-op (NOTICE: "relation pages already exists, skipping"). The source_id column is added by migration v21 (pages_source_id_composite_unique) via ALTER TABLE ... ADD COLUMN IF NOT EXISTS source_id. That migration only runs after SCHEMA_SQL finishes — but SCHEMA_SQL dies first on the index creation.

Same pattern applies to files.source_id (schema.sql:282, column added in v23) and files.page_id (schema.sql:idx_files_page_id, column added in v23).

Reproduction

Any Postgres brain with schema_version < 20 upgrading to 0.18.x:

$ gbrain apply-migrations --yes
...
column "source_id" does not exist
Migration v0.16.0 reported status=failed.

Full error chain:

  • apply-migrations --yes invokes migration orchestrator v0.16.0
  • v0.16.0 Phase A runs gbrain init --migrate-only
  • init --migrate-only calls engine.initSchema()
  • initSchema() runs SCHEMA_SQL before runMigrations()
  • SCHEMA_SQL hits CREATE INDEX ... ON pages(source_id) → 42703 column "source_id" does not exist
  • Postgres rolls back, migration status recorded as partial / schema phase failed

Suggested fix

Run runMigrations() before SCHEMA_SQL on existing brains, or split SCHEMA_SQL so object-creating DDL (CREATE TABLE) runs first and additive DDL that assumes multi-source columns (CREATE INDEX ... ON pages(source_id), CREATE INDEX ... ON files(source_id), CREATE INDEX ... ON files(page_id)) runs after migrations have had a chance to ALTER TABLE ... ADD COLUMN IF NOT EXISTS.

Lowest-risk fix: guard the three post-v0.18 indexes in schema.sql with a column-existence check, e.g.

DO $$ BEGIN
  IF EXISTS (SELECT 1 FROM information_schema.columns
             WHERE table_name = 'pages' AND column_name = 'source_id') THEN
    CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);
  END IF;
END $$;

This keeps fresh-install behavior identical (the column is always present from CREATE TABLE) and lets the migration runner add the column + index on upgrade paths.

Alternative: change the init flow so migrations run first on existing brains (gated on whether the schema version row exists). More invasive; more correct long-term.

Workaround (what I did to unblock)

Applied the v20/v21/v23 ALTER TABLE + CREATE TABLE statements manually, then re-ran gbrain apply-migrations --yes. All statements in the migrations are already IF NOT EXISTS-guarded, so they no-op'd when the migration runner revisited them.

BEGIN;

-- v20
CREATE TABLE IF NOT EXISTS sources (
  id            TEXT PRIMARY KEY,
  name          TEXT NOT NULL UNIQUE,
  local_path    TEXT,
  last_commit   TEXT,
  last_sync_at  TIMESTAMPTZ,
  config        JSONB NOT NULL DEFAULT '{}'::jsonb,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
INSERT INTO sources (id, name, local_path, last_commit, config)
SELECT 'default', 'default',
       (SELECT value FROM config WHERE key = 'sync.repo_path'),
       (SELECT value FROM config WHERE key = 'sync.last_commit'),
       '{"federated": true}'::jsonb
WHERE NOT EXISTS (SELECT 1 FROM sources WHERE id = 'default');

-- v21
ALTER TABLE pages ADD COLUMN IF NOT EXISTS source_id TEXT
  NOT NULL DEFAULT 'default' REFERENCES sources(id) ON DELETE CASCADE;
CREATE INDEX IF NOT EXISTS idx_pages_source_id ON pages(source_id);

-- v23 (columns only; migration runner handles the rest)
ALTER TABLE files ADD COLUMN IF NOT EXISTS source_id TEXT
  NOT NULL DEFAULT 'default' REFERENCES sources(id) ON DELETE CASCADE;
ALTER TABLE files ADD COLUMN IF NOT EXISTS page_id INTEGER
  REFERENCES pages(id) ON DELETE SET NULL;

COMMIT;

After this, gbrain apply-migrations --yes walked cleanly v15 → v24, all three pending migrations (0.16.0, 0.18.0, 0.18.1) completed, and gbrain doctor --json went from health_score: 40 (unhealthy) to health_score: 85 (warnings) with RLS on 23/23 tables.

Environment

  • gbrain 0.18.2 (commit 08b3698)
  • Postgres (Supabase-compatible pooler at port 6543 not used here — direct local Postgres at 5435)
  • pgvector installed, pgbouncer not in front
  • Starting schema_version: 15, target: 24

Scope

Affects anyone upgrading from gbrain ≤ 0.17.x to 0.18.x on Postgres. PGLite is unaffected (fresh-install path re-creates tables per boot).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions