v0.21.1 fix: initSchema bootstrap order — multi-version upgrades succeed#432
lloydarmbrust wants to merge 1 commit into
PostgresEngine.initSchema() ran SCHEMA_SQL (the embedded latest-state
schema) before runMigrations(). For an existing brain at an intermediate
version, SCHEMA_SQL contains CREATE INDEX statements that reference
columns added by intermediate migrations (e.g. idx_pages_source_id on
pages.source_id, added in migration v21). The index creation aborts
with `column "source_id" does not exist` and the upgrade wedges.
When config.version > 0 is detected, run migrations first on a
best-effort basis so column-dependent indexes succeed. Then SCHEMA_SQL
runs as latest-state enforcer. Then a final runMigrations() pass mops
up post-SCHEMA_SQL migrations like the v24 RLS-backfill that reference
tables created by SCHEMA_SQL (subagent_messages, etc.). Fresh installs
unchanged: SCHEMA_SQL first, migrations second, no config.version row
to trigger the existing-brain branch.
Verified against a real Postgres rolled back to a v4-era schema:
recovery now reaches LATEST_VERSION and pages.source_id +
sources('default') exist post-recovery. Test added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
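The ordering described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `PostgresEngine.initSchema()` code: `planBootstrap`, `runBootstrap`, and the `BootstrapActions` shape are hypothetical names, and the steps are synchronous here for brevity where the real engine presumably awaits database calls.

```typescript
// Hypothetical sketch of the bootstrap ordering; all names are illustrative.
type BootstrapStep = "migrations-best-effort" | "schema-sql" | "migrations";

interface BootstrapActions {
  runSchemaSql: () => void;  // stands in for applying SCHEMA_SQL
  runMigrations: () => void; // stands in for runMigrations()
}

// Decide the step order from the config.version value
// (null means no version row, i.e. a fresh install).
function planBootstrap(existingVersion: number | null): BootstrapStep[] {
  if (existingVersion !== null && existingVersion > 0) {
    // Existing brain: migrations first, then schema, then a mop-up pass.
    return ["migrations-best-effort", "schema-sql", "migrations"];
  }
  // Fresh install: unchanged order.
  return ["schema-sql", "migrations"];
}

function runBootstrap(
  existingVersion: number | null,
  actions: BootstrapActions,
  log: (msg: string) => void,
): void {
  for (const step of planBootstrap(existingVersion)) {
    if (step === "migrations-best-effort") {
      try {
        actions.runMigrations();
      } catch (err) {
        // Best-effort pass: log and continue so SCHEMA_SQL can still run.
        log(`pre-schema migrations failed, continuing: ${String(err)}`);
      }
    } else if (step === "schema-sql") {
      actions.runSchemaSql();
    } else {
      // Final pass: failures here are not swallowed.
      actions.runMigrations();
    }
  }
}
```

Keeping the plan as a pure function makes the fresh-install path easy to check in isolation: with no version row the step list is the same two-step order as before the fix.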
Master is at v0.21.0. Open PRs claim v0.21.1 (#432) and v0.24.0 (#387). v0.25 is the first uncontested slot, so this branch claims it.

Pure rename across VERSION, package.json, CHANGELOG header, and every "v0.22.0" reference in CLAUDE.md / README.md / TODOS.md / docs/eval-capture.md / src/ / test/ files. CHANGELOG date bumped to 2026-04-26. llms.txt + llms-full.txt regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hi,
Severity: action required | Category: reliability
How to fix: Log to stderr only
Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful.
Found by Qodo code review
…ct test (#437)

* feat(v0.22.0): eval_candidates + eval_capture_failures schema (Lane 1A)

R1 substrate for BrainBench-Real, replayed onto master after Cathedral II landed. Migration v30 (slotted after master's v25-v29 Cathedral II wave) creates two tables:

eval_candidates: per-call capture of MCP/CLI/subagent query+search traffic. Column set lets gbrain-evals replay with full fidelity: source_ids from v0.18 multi-source, vector_enabled/detail_resolved/expansion_applied so replay knows what hybridSearch actually did, remote + job_id + subagent_id so rows are traceable to their origin. query is CHECK-capped at 50KB; PII scrubber (Lane 1B) runs before insert.

eval_capture_failures: cross-process audit trail. In-process counters don't work because `gbrain doctor` runs in a separate process from the MCP server. Persistent rows let doctor query capture health via COUNT(*) GROUP BY reason over the last 24h.

Both tables get RLS on Postgres gated on BYPASSRLS (matches v24/v29 posture). PGLite ignores RLS; sqlFor split carries only DDL.

5 new BrainEngine methods (breaking-interface addition, drives v0.22.0 minor bump): logEvalCandidate, listEvalCandidates, deleteEvalCandidatesBefore, logEvalCaptureFailure, listEvalCaptureFailures. listEvalCandidates uses ORDER BY created_at DESC, id DESC so `gbrain eval export` is deterministic across same-millisecond inserts.

Also adds HybridSearchMeta type for the side-channel callback used by Lane 1C's op-layer capture (no change to hybridSearch return shape; that respects Cathedral II's existing SearchResult[] contract).

Tests: 14 PGLite round-trip cases + 8 v30 structural assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.22.0): PII scrubber + op-layer capture module (Lane 1B)

Replayed onto master post-Cathedral II. Same semantics as the original v0.21.0 work, only adjusted to import HybridSearchMeta from types.ts (canonical home) instead of redeclaring it locally.
src/core/eval-capture-scrub.ts - pure-function regex scrubber with 6 pattern families: emails, phones (US + E.164), SSN (year-aware), Luhn-verified credit cards, JWT-shaped tokens, bearer tokens. Zero deps. Adversarial-input safe.

src/core/eval-capture.ts - op-layer hook helper:
- buildEvalCandidateInput(ctx, {scrub_pii}) - pure row builder
- classifyCaptureFailure(err) - Postgres SQLSTATE → reason tag
- captureEvalCandidate(engine, ctx, opts) - best-effort, never throws
- isEvalCaptureEnabled / isEvalScrubEnabled - file-plane config checks

GBrainConfig gains `eval?: {capture?, scrub_pii?}`. Both default ON. File-plane only: `gbrain config set` writes the DB plane, doesn't control capture.

Tests: 17 scrubber + 21 capture-module cases. Zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.22.0): hybridSearch onMeta callback + op-layer capture (Lane 1C)

Replayed onto master. Adapted from the original v0.21.0 work to keep Cathedral II's contract intact: hybridSearch's return stays `Promise<SearchResult[]>` (unchanged), and meta surfaces via an optional `onMeta?: (meta: HybridSearchMeta) => void` callback in HybridSearchOpts. Cathedral II callers leave onMeta undefined and pay no cost.

The op-layer capture wrapper passes a closure that threads meta into the captured row so gbrain-evals can distinguish:
- "with OPENAI_API_KEY" vs "keyword-only fallback" (vector_enabled)
- "expansion fired" vs "expansion requested + silently fell back" (expansion_applied)
- what hybridSearch actually used after auto-detect (detail_resolved)

Op-layer capture wired into both `query` and `search` op handlers in src/core/operations.ts. Single hook site catches MCP dispatch + CLI + subagent tool-bridge from the same place. Fire-and-forget, never throws, respects ctx.config.eval.capture off-switch.
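The optional `onMeta` side channel described above could look something like the sketch below. It is an assumption-laden illustration rather than the code in hybrid.ts: the field types on `HybridSearchMeta`, the `SearchOptsSketch` and `searchSketch` names, the placeholder `"default"` value, and the synchronous return are all invented for the example. The guard around the callback mirrors the idea that a throwing user-supplied closure should not break the search path.

```typescript
// Field names follow the columns named in the commit message; types are assumed.
interface HybridSearchMeta {
  vector_enabled: boolean;
  expansion_applied: boolean;
  detail_resolved: string;
}

// Hypothetical options shape: the callback is optional, so existing callers
// that omit it see no change.
interface SearchOptsSketch {
  onMeta?: (meta: HybridSearchMeta) => void;
}

// Invoke the optional callback without letting it break the caller.
function emitMeta(opts: SearchOptsSketch, meta: HybridSearchMeta): void {
  if (!opts.onMeta) return;
  try {
    opts.onMeta(meta);
  } catch {
    // A throwing callback is deliberately ignored.
  }
}

// Toy stand-in for a search function: the return value is only the results,
// and meta travels through the callback.
function searchSketch(query: string, opts: SearchOptsSketch = {}): string[] {
  const results = query.length > 0 ? [`page-for:${query}`] : [];
  emitMeta(opts, {
    vector_enabled: false,
    expansion_applied: false,
    detail_resolved: "default", // placeholder value, not a real detail level
  });
  return results;
}
```

A capture wrapper would pass a closure as `onMeta` and copy the received fields into the row it records, while callers that do not care about meta simply call `searchSketch(query)`.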
Tests:
- test/hybrid-meta.test.ts (8 cases) - onMeta accuracy across the 4 return paths in hybridSearch + verification that omitting onMeta leaves Cathedral II callers unchanged.
- test/mcp-eval-capture.test.ts (10 cases) - query/search ops capture correctly with MCP/CLI/subagent contexts, scrub on/off, capture=false off-switch, non-captured ops (list_pages, get_page), F1 failure isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.22.0): gbrain eval export/prune + doctor eval_capture check (Lane 1D)

Replayed onto master. Same semantics as the original v0.21.0 work.

CLI:

gbrain eval export [--since DUR] [--limit N] [--tool query|search]
  NDJSON to stdout, every row prefixed with "schema_version":1 per docs/eval-capture.md contract. EPIPE-safe streaming, stderr heartbeats, deterministic ordering (created_at DESC, id DESC).

gbrain eval prune --older-than DUR [--dry-run]
  Explicit retention cleanup. Requires --older-than (never deletes without a window). Duration strings: 30d, 7d, 1h, 90m, 3600s.

Legacy bare `gbrain eval --qrels …` still works via sub-subcommand fall-through.

gbrain doctor gains an eval_capture check between markdown_body_completeness and queue_health: reads eval_capture_failures for the last 24h, groups by reason, warns when non-zero. Pre-v30 brains get "Skipped (table unavailable)" (non-fatal).

docs/eval-capture.md ships the stable NDJSON schema reference for gbrain-evals consumers.

Tests: 9 export cases + 5 prune cases. Doctor check covered by existing doctor tests on master.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.22.0): public-exports contract test + CI count guard (Lane 2 / R2)

Master locks 17 public subpath exports as gbrain's stable third-party contract. Zero enforcement existed. This PR locks the surface in two layers:

1. test/public-exports.test.ts - runtime contract test. Reads package.json "exports" at startup.
For each subpath, imports via the package name ("gbrain/engine"), NOT the relative filesystem path; that's the difference between exercising the actual resolver and bypassing it. Every subpath gets a canary symbol pinned (e.g. gbrain/search/hybrid must export hybridSearch + rrfFusion) so a refactor that renames or removes one fails CI before downstream consumers (gbrain-evals) silently break.

2. scripts/check-exports-count.sh - CI structural guard. Wired into `bun test` after check-jsonb-pattern.sh + check-progress-to-stdout.sh + check-wasm-embedded.sh per master's precedent. EXPECTED_COUNT=17 baseline: shrinks fail loudly, growth also fails so the new canary must be pinned in the runtime test deliberately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs+e2e(v0.22.0): VERSION/CHANGELOG/CLAUDE/README + Postgres E2E (Lane 3)

Bump VERSION + package.json to 0.22.0 (next free slot after master's v0.21.0 Code Cathedral II minor).

CHANGELOG.md v0.22.0 entry follows the Garry voice template:
- Bold 2-line headline
- Lead paragraph contextualizing v0.20 + v0.21 + v0.22 progression
- Numbers-that-matter table comparing v0.21.0 → v0.22.0
- "What this means for you" sectioned by audience
- "## To take advantage of v0.22.0" operator runbook
- Itemized changes

CLAUDE.md updates:
- Key files: 8 new module entries (eval-capture*, eval-export, eval-prune, docs/eval-capture.md, public-exports test). hybrid.ts entry rewritten to reflect the additive `onMeta` callback (return shape unchanged).
- Key commands: new v0.22.0 section for `gbrain eval export`, `gbrain eval prune`, and the doctor `eval_capture` check, with the file-plane vs DB-plane config gotcha called out.

README.md: one-paragraph pointer after the BrainBench blurb so anyone reading the landing page sees the new session-capture feature.

llms.txt + llms-full.txt regenerated to pick up the doc additions.
test/e2e/eval-capture.test.ts (Postgres-only E1 spec):
- CHECK violation surfaces as Postgres SQLSTATE 23514 on oversize input
- RLS is actually enabled on both eval_candidates + eval_capture_failures
- 50 concurrent logEvalCandidate calls: no deadlock, all distinct IDs

Skips gracefully when DATABASE_URL is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(todos): P0 - PGLite test-runner concurrency flake

Pre-existing on master, surfaces ~27 false failures when bun test runs all 174 files together. Each failing file passes in isolation. Tracked for a dedicated investigation branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.22.0): adversarial review post-fixes (doctor RLS, onMeta safety)

Two surgical fixes from /ship adversarial review, plus 6 follow-ups TODO'd into v0.22.1:

- doctor.ts: distinguish pre-v30 missing-table (42P01, ok skip) from RLS-denied SELECT (42501, warn) and other DB errors (warn). The check exists specifically to surface capture-failure misconfigs cross-process, so silently reporting "ok / skipped" on the most diagnostic class defeated the purpose.
- hybrid.ts: wrap onMeta invocation in try/catch via small emitMeta helper. The callback is part of the public gbrain/search/hybrid contract; a throwing user-supplied closure must never break the search hot path.
- TODOS.md: 6 P1 follow-ups (eval prune real COUNT, scrubber CC false positives, dead 'scrubber_exception' enum value, id-cursor for cross-window dedup, public-export canary pinning, EXPECTED_COUNT dedup).
- TODOS.md: P0 entry for the pre-existing PGLite test-runner concurrency flake (~27 false failures in full bun test on master).
- CHANGELOG.md: 2 bullets noting the doctor + onMeta hardening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(version): bump v0.22.0 → v0.25.0 (queue-aware version pick)

Master is at v0.21.0. Open PRs claim v0.21.1 (#432) and v0.24.0 (#387).
v0.25 is the first uncontested slot, so this branch claims it. Pure rename across VERSION, package.json, CHANGELOG header, and every "v0.22.0" reference in CLAUDE.md / README.md / TODOS.md / docs/eval-capture.md / src/ / test/ files. CHANGELOG date bumped to 2026-04-26. llms.txt + llms-full.txt regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.25.0): gbrain eval replay + contributor doc + CONTRIBUTING link

Closes the gap between "session capture works" (this PR's core) and "contributors actually use it before merging." Three artifacts:

- src/commands/eval-replay.ts (~340 LOC) - reads NDJSON from `gbrain eval export`, re-runs each captured query/search against the current brain, computes set-Jaccard@k, top-1 stability, and latency delta. Stable JSON shape (schema_version:1) for CI gating; human mode prints a regression table sorted worst-first. Pure Bun, zero new deps. Stub-engine tests cover Jaccard math, NDJSON parser (including v2 forward-compat rejection + line-numbered errors), --limit, --verbose, --json, and graceful per-row error handling. 16/16 passing.
- docs/eval-bench.md (~80 lines) - contributor guide. The 4-command loop (export → change → replay → diff), metric definitions with healthy ranges (Jaccard ≥0.85, top-1 ≥85%, latency Δ within ±50ms), trigger paths, CI integration snippet, hand-crafted NDJSON corpus path for fresh installs, and the off-switch. Pairs with the existing docs/eval-capture.md which is the consumer-facing wire format.
- CONTRIBUTING.md gains a "Running real-world eval benchmarks (touching retrieval code)" section with the trigger paths and a link to docs/eval-bench.md. Reviewers now have a one-line ask: "did you run replay?"

CLAUDE.md key files updated. CHANGELOG bullets added. llms.txt regenerated.
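For readers unfamiliar with the replay metrics named above, set-Jaccard@k and top-1 stability can be sketched as below. This is an illustrative version only, not the code in src/commands/eval-replay.ts: the function names, the use of string IDs, and the conventions for empty result lists are assumptions.

```typescript
// Set-based Jaccard similarity over the top-k result IDs of two runs.
function jaccardAtK(before: string[], after: string[], k: number): number {
  const a = new Set(before.slice(0, k));
  const b = new Set(after.slice(0, k));
  // Assumption: two empty result sets count as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((id) => {
    if (b.has(id)) intersection += 1;
  });
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Top-1 stability: did the first result stay the same between runs?
// Assumption: an empty list on either side counts as unstable.
function top1Stable(before: string[], after: string[]): boolean {
  return before.length > 0 && after.length > 0 && before[0] === after[0];
}
```

As a worked case, `jaccardAtK(["a", "b", "c"], ["b", "c", "d"], 3)` shares two IDs out of four distinct ones and yields 0.5, which would fall below the Jaccard ≥0.85 range the commit message lists as healthy.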
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.25.0): CONTRIBUTOR_MODE flag - capture off by default for users

Eval capture was on for everyone in the v0.25.0 draft. Privacy footgun: end users had retrieval traffic accumulate in their brain DB without asking, even with PII scrubbing. Flips to off by default + explicit opt-in for contributors who actually use the replay loop.

Resolution order in isEvalCaptureEnabled():
1. config.eval.capture === true → on
2. config.eval.capture === false → off
3. process.env.GBRAIN_CONTRIBUTOR_MODE === '1' → on
4. otherwise → off

The env var is the contributor-facing toggle (one line in .zshrc, no JSON edit). Explicit config wins both directions for users who want to override per-brain.

PII scrubbing gate stays independent (default true regardless of CONTRIBUTOR_MODE), so any brain that does capture still scrubs.

Tests rewritten: env var hygiene per-test (origMode preserved + restored in finally). 9/9 pass; total v0.25.0 suite is 198/198.

Docs:
- README.md gains a Contributing-section pointer to the env var.
- CONTRIBUTING.md gains a "CONTRIBUTOR_MODE — turn on the dev loop" section with verification commands and resolution-order table.
- docs/eval-bench.md leads with the prerequisite (must set the env var for the rest of the doc to be useful).
- docs/eval-capture.md "Config" section split into Path A (env var) + Path B (config) with explicit resolution-order rules.
- CHANGELOG v0.25.0 entry corrected ("on by default" was wrong) plus a new top itemized bullet calling out the gate change.
- CLAUDE.md eval-capture entry annotated with the new gate logic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: post-ship documentation pass for v0.25.0

Cross-references every doc against the final state of the branch (CONTRIBUTOR_MODE flag, eval replay tool, off-by-default capture):

- README.md: top callout rewritten; it was implying capture-on-by-default, contradicting the gate landed in 7a80ce2. Now leads with "contributor opt-in" and links docs/eval-bench.md alongside docs/eval-capture.md.
- AGENTS.md: new "Eval retrieval changes" task entry with the CONTRIBUTOR_MODE+replay one-liner so non-Claude agents (Codex, Cursor, Aider) have the same path.
- CLAUDE.md: "Key commands added in v0.25.0" gains the replay command and a CONTRIBUTOR_MODE bullet covering the resolution order.
- CHANGELOG.md: headline rewritten to match the actual feature ("benchmark retrieval changes against real captured queries before merging"; was "every real query is captured"). Stale "v0.22 ships the substrate" → v0.25. Test count corrected 82 → 144 (added 16 replay + 9 CONTRIBUTOR_MODE + 8 v31-shape tests since the original count). Two metric rows added to the numbers table: default-off posture, in-tree replay tooling. "To take advantage" block split into user vs contributor branches with shell-rc instructions.
- TODOS.md: v0.22.1 follow-up reference corrected to v0.25.1.

llms.txt + llms-full.txt regenerated. Typecheck clean. 198/198 v0.25.0 tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
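The capture gate's resolution order from the CONTRIBUTOR_MODE commit above can be sketched as a small pure function. The names `resolveEvalCapture`, `resolveEvalScrub`, `EvalConfigSketch`, and `EnvSketch` are hypothetical, and passing the environment in as a parameter (instead of reading `process.env` directly) is a choice made here to keep the example self-contained; the real `isEvalCaptureEnabled()` signature is not shown in the source.

```typescript
// Hypothetical config shape, based on `eval?: {capture?, scrub_pii?}`.
interface EvalConfigSketch {
  eval?: { capture?: boolean; scrub_pii?: boolean };
}

// Minimal stand-in for process.env.
type EnvSketch = { [key: string]: string | undefined };

function resolveEvalCapture(config: EvalConfigSketch, env: EnvSketch): boolean {
  const explicit = config.eval?.capture;
  if (explicit === true) return true;   // 1. explicit config on
  if (explicit === false) return false; // 2. explicit config off
  // 3. env opt-in for contributors, 4. otherwise off
  return env["GBRAIN_CONTRIBUTOR_MODE"] === "1";
}

// Scrubbing is independent of the capture gate and defaults to on.
function resolveEvalScrub(config: EvalConfigSketch): boolean {
  return config.eval?.scrub_pii !== false;
}
```

Because explicit config is checked before the environment, `{ eval: { capture: false } }` stays off even with `GBRAIN_CONTRIBUTOR_MODE=1` set, matching the "explicit config wins both directions" rule.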
Closing: initSchema bootstrap order fix shipped in v0.22.6.1. Thanks for the report. If anything still reproduces on the latest release, please reopen with the version + repro.
Summary
`PostgresEngine.initSchema()` ran `SCHEMA_SQL` (latest target schema) before `runMigrations()`. For any existing brain at an intermediate schema version, `SCHEMA_SQL` contains `CREATE INDEX` statements that reference columns added by intermediate migrations (e.g. `idx_pages_source_id ON pages(source_id)`, where `pages.source_id` is added by migration v21). The index creation aborted with `column "source_id" does not exist` and the upgrade wedged. `gbrain doctor` reported `MINIONS HALF-INSTALLED (partial migration: 0.11.0)` and re-running `gbrain apply-migrations --yes` looped on the same error.

This PR detects existing brains via `config.version` and runs migrations first on a best-effort basis, so column-dependent indexes succeed. Then `SCHEMA_SQL` runs, then a final `runMigrations()` pass mops up post-`SCHEMA_SQL` migrations like the v24 RLS-backfill that needs `subagent_messages` (which `SCHEMA_SQL` creates).

Repro case: a real production brain pinned at v4 (running gbrain 0.10.x for a while), tried `gbrain apply-migrations --yes` after upgrading to 0.20.4. Hit the wedge. Patched `initSchema()` and the same brain advanced to v24 cleanly.

Pre-Landing Review
Diff is 30 lines of bootstrap logic + 113 lines of E2E test. Reviewed locally:
- `existingVersion` is only non-null when `config` table exists and has a `version` row. New installs hit the `else` branch with the same SCHEMA_SQL → migrations order as today.
- `initSchema()` on an already-up-to-date brain runs migrations (no-op), then SCHEMA_SQL (no-op via `IF NOT EXISTS`), then migrations again (no-op). Verified by the 4th test case.
- … `subagent_messages` yet), the swallow logs and continues. SCHEMA_SQL runs, then the second migrations pass retries the same migration with the now-existing tables and succeeds.

Test plan
- `test/e2e/initschema-bootstrap.test.ts`: 4 tests, all pass against `pgvector/pgvector:pg16`. Rolls back to v4-era schema (drops `pages.source_id`, `files.source_id`, `files.page_id`, `sources` table, `file_migration_ledger` table, restores pre-v21 `pages_slug_key`), then asserts `initSchema()` reaches `LATEST_VERSION` and that `pages.source_id` + `sources('default')` exist post-recovery. Also asserts idempotency on re-run.
- … `minions-shell.test.ts`, unrelated to this change). Master before this patch: same 210/211.
- `bun build --compile` succeeds.

Manual repro before/after

CHANGELOG
`v0.21.1` entry added per the gbrain CHANGELOG voice rules: bold headline, lead paragraph, "what this means" closer, "To take advantage of v0.21.1" self-repair block (existing brains running `gbrain doctor` need it to surface this fix). No `Privacy` or `Security hardening` framing; this is a straight bug fix.

🤖 Generated with Claude Code