v0.18 migration chain fails on PGLite v16→v24 upgrade: column "source_id" does not exist #370

@ChenyqThu

Summary

Running gbrain apply-migrations --yes (or gbrain init --migrate-only
directly) against a PGLite brain at schema v16 with a non-trivial page
count fails partway through the migration chain. schema_version stays
at 16 and the sources table is never created. The failure is clean and
data integrity is preserved, but the upgrade path is blocked.

Reproduced against the feat/migration-hardening branch (= v0.18.2, PR
#356) as well as master (v0.18.1). The hardening in v0.18.2 addresses
the field report's Postgres-side issues (57014 timeouts, v21→v23 FK
integrity window, doctor --locks) but does not cover this PGLite
upgrade-path regression.

Reproduction (5 commands)

Requires: any gbrain PGLite brain at schema v16 with at least a few
hundred pages. Here I used an isolated copy of my production brain
(~1800 pages) in a throwaway $HOME.

# 1. Clone upstream at migration-hardening branch, install deps
git clone https://github.com/garrytan/gbrain.git /tmp/gbrain-peek
cd /tmp/gbrain-peek && git checkout feat/migration-hardening && bun install

# 2. Set up isolated HOME pointing at a copy of a v16 PGLite brain
SMOKE=/tmp/gbrain-smoke-$(date +%s)
mkdir -p $SMOKE/.gbrain
cp -R ~/.gbrain/brain.pglite $SMOKE/.gbrain/brain.pglite
printf '{\n  "engine": "pglite",\n  "database_path": "%s/.gbrain/brain.pglite"\n}\n' "$SMOKE" > $SMOKE/.gbrain/config.json

# 3. Verify pre-migration read works (it does; the engine connects fine)
HOME=$SMOKE bun src/cli.ts stats

# 4. Attempt the upgrade
HOME=$SMOKE bun src/cli.ts apply-migrations --yes

# 5. Observe: schema version never advanced
HOME=$SMOKE bun src/cli.ts doctor 2>&1 | grep schema_version

Expected

Schema advances v16 → v24. gbrain sources list shows seeded
`default` source with page count matching the brain's existing
inventory.

Actual

  • Direct init --migrate-only throws immediately:

    column "source_id" does not exist

  • The apply-migrations --yes orchestrator path fails more quietly,
    with the same net outcome:

    === Applying migration v0.13.0: ... ===
    ... extract.links_db 100% done (no errors)
    Migration v0.13.0 complete.
    === Applying migration v0.13.1: ... ===
    Migration v0.13.1 finished as PARTIAL.
    === Applying migration v0.14.0: ... ===
    Migration v0.14.0 complete.
    === Applying migration v0.16.0: ... ===
    Schema up to date (engine: pglite).
    Migration v0.16.0 complete.
    === Applying migration v0.18.0: ... ===
    Schema up to date (engine: pglite).
    Migration v0.18.0 reported status=failed.
    

    Yet the post-run `doctor` reports `schema_version: Version 16,
    latest is 24`, and `gbrain sources list` returns `relation
    "sources" does not exist`. The pages table still has its v16
    columns (no `source_id` added). Data is unharmed.

Probable root cause

`src/core/pglite-engine.ts` references `pages.source_id` in
multiple engine methods used during the migration chain (not just
post-v21). Sample sites in the v0.18.2 branch:

  • Line 140: `ON CONFLICT (source_id, slug) DO UPDATE` in page upserts
  • Lines 226, 256: `SELECT p.slug, p.id as page_id, p.title, p.type,
    p.source_id FROM pages p ...`
  • Lines 395, 406-416: `addLinksBatch` JOINs pages on
    `(slug, source_id)`
  • Lines 759-765: `addTimelineEntriesBatch` same pattern

The v0.13.0 orchestrator triggers `gbrain extract links --source db`
as one of its phases. That call reaches one of these engine methods
while the schema is still at v16 (v21 hasn't added the `source_id`
column yet). On the orchestrator path, the resulting SQL errors bubble
up and are recorded as `status=failed` without the root cause being
surfaced. On the direct `init --migrate-only` path, they are thrown
uncaught.
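
To illustrate the "without surfacing the root cause" part, here is a minimal sketch of an orchestrator wrapper that keeps the underlying error text next to the status. All names in it (`PhaseFailure`, `toPhaseFailure`, `formatFailure`, `runPhase`) are hypothetical and not taken from the gbrain codebase; the real orchestrator may be structured quite differently.

```typescript
// Hypothetical shapes for illustration; not taken from the gbrain codebase.
interface PhaseFailure {
  migration: string; // e.g. "v0.18.0"
  phase: string;     // e.g. "schema"
  cause: string;     // underlying error text, preserved verbatim
}

// Convert whatever was thrown into a record that keeps the root cause.
function toPhaseFailure(migration: string, phase: string, err: unknown): PhaseFailure {
  const cause = err instanceof Error ? err.message : String(err);
  return { migration, phase, cause };
}

// Render the failure so the status line carries the SQL error with it.
function formatFailure(f: PhaseFailure): string {
  return `Migration ${f.migration} reported status=failed (phase ${f.phase}): ${f.cause}`;
}

// Run one async phase; return null on success or the failure record on error.
async function runPhase(
  migration: string,
  phase: string,
  work: () => Promise<void>,
): Promise<PhaseFailure | null> {
  try {
    await work();
    return null;
  } catch (err) {
    return toPhaseFailure(migration, phase, err);
  }
}
```

With something along these lines, the v0.18.0 status line in the log above would carry the `column "source_id" does not exist` message with it instead of a bare `status=failed`.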

Fresh installs never trigger this because their schema starts at v24
(no upgrade chain runs). The CHANGELOG mentions that v0.18.0 was
tested against real PGLite in <1s for the integration test, which
suggests that the test exercised a fresh install only, not the
v16→v24 upgrade.
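
One cheap way to cover the upgrade path without a real brain might be a static check over the chain: walk the steps from a starting schema version and flag any step whose engine calls need a newer schema than exists at that point. The sketch below is only a model. `ChainStep`, `requiresSchema`, `leavesSchemaAt`, and the two sample steps are assumptions for illustration, with v21 and v24 taken from the description above.

```typescript
// Hypothetical model of the orchestrator chain; names are illustrative only.
interface ChainStep {
  name: string;
  // Lowest schema version whose columns this step's engine calls rely on.
  requiresSchema: number;
  // Schema version after the step finishes (omitted if it runs no DDL).
  leavesSchemaAt?: number;
}

// Walk the chain from a starting schema version and report every step that
// would run before the schema it depends on exists.
function findPrematureSteps(startVersion: number, chain: ChainStep[]): string[] {
  let current = startVersion;
  const premature: string[] = [];
  for (const step of chain) {
    if (step.requiresSchema > current) {
      // The step would fail here, so it cannot advance the schema either.
      premature.push(step.name);
      continue;
    }
    if (step.leavesSchemaAt !== undefined) {
      current = Math.max(current, step.leavesSchemaAt);
    }
  }
  return premature;
}

// Simplified two-step chain: the extract phase needs source_id (v21+),
// but the DDL that reaches v24 only runs afterwards.
const sampleChain: ChainStep[] = [
  { name: "v0.13.0 extract links", requiresSchema: 21 },
  { name: "v0.18.0 schema DDL", requiresSchema: 16, leavesSchemaAt: 24 },
];
```

Under these assumptions, `findPrematureSteps(16, sampleChain)` flags the extract step, while starting from 24 (the fresh-install case) flags nothing. That matches why a fresh-install test would pass.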

Environment

Suggestions for a fix

One of these should work:

  1. Orchestrator ordering: have the v0.18.0 orchestrator run
    `init --migrate-only` (schema DDL) BEFORE any phase that calls
    engine methods using post-v21 columns. Currently the v0.13.0
    orchestrator's extract phase runs "up front" with engine methods
    that assume v24 schema.

  2. Engine version awareness: PGLite engine methods that reference
    `source_id` could check `schema_version` at call time and
    fall back to pre-v21 SQL shapes when schema < v21. More complex,
    but survives out-of-order migration runners.

  3. Simpler: split v21's `ADD COLUMN source_id` out to run FIRST
    in the MIGRATIONS[] array under a dedicated early orchestrator
    (e.g. `v0_18_0_prep`) that runs before any extract-style
    orchestrator phase. The engine methods would then always have the
    column.
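
As a rough sketch of option 2, the engine could build its SQL from the live schema version rather than assuming v24. The helper names (`pageSelectColumns`, `buildPageLookupSql`) and the `WHERE p.slug = $1` clause are hypothetical; only the column list and the v21 threshold come from the description above.

```typescript
// Assumption from the report: v21 is the first schema version that has
// pages.source_id.
const SOURCE_ID_MIN_SCHEMA = 21;

// Build the page column list that matches the live schema.
function pageSelectColumns(schemaVersion: number): string {
  const cols = ["p.slug", "p.id AS page_id", "p.title", "p.type"];
  if (schemaVersion >= SOURCE_ID_MIN_SCHEMA) {
    cols.push("p.source_id");
  }
  return cols.join(", ");
}

// Hypothetical lookup query; the WHERE clause is illustrative only.
function buildPageLookupSql(schemaVersion: number): string {
  return `SELECT ${pageSelectColumns(schemaVersion)} FROM pages p WHERE p.slug = $1`;
}
```

The same branching would be needed for the upsert conflict target and the batch JOINs, which is where the extra complexity of this option comes from.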

Preserved artifacts for further debugging

My smoke environment is intact if deeper investigation helps:

  • `/tmp/gbrain-upstream-peek/` — built-from-source v0.18.2 binary
    + compiled bundle
  • `/tmp/gbrain-smoke-v018-1776964434/` — 285 MB isolated throwaway
    HOME with copy of the v16 brain that fails the migration

Happy to rerun with additional instrumentation or share the brain
itself privately if useful (it contains personal notes, so it can't
go in a public S3 bucket).


Filed by @ChenyqThu — Jarvis-KOS-v2 (private fork of gbrain at
ChenyqThu/jarvis-knowledge-os-v2, upgrade-blocked at v0.17.0 due
to this issue). Fork policy forbids local `src/*` patches; we're
holding at v0.17 until upstream fixes the PGLite upgrade path. No
urgency on our end — v0.17 features cover our Step 2.3 needs.
