fix(brainstorm): cost guardrails + judge overflow + far set cap #1234

Closed
garrytan-agents wants to merge 2 commits into garrytan:master from garrytan-agents:fix/brainstorm-cost-guardrails

@garrytan-agents

Incident: LSD Brainstorm 53× Cost Overrun

Estimated: $0.96 → Actual: $50.71 on a 13,690-page brain. Zero ideas delivered.

Root Causes

  1. Far set explosion: listPrefixSampledPages returned one page per prefix (~2K prefixes → 1,985 pages instead of the configured 12)
  2. No cost circuit breaker: no mechanism to abort when actual spend diverges from the estimate
  3. Judge context overflow: 15,868 ideas at ~350 tokens each ≈ 5.5M tokens, exceeding Sonnet's 1M-token context limit
  4. Unpaired UTF-16 surrogates in OCR/import pages crashed JSON serialization
  5. No per-cross timeout — individual crosses could hang indefinitely

Implemented Fixes

P1: Far set cap (domain-bank.ts)

  • Shuffle candidate prefixes, slice to maxFarSet (default max(m*4, 50)) before SQL
  • Final trim to m by distance score
  • Bill now scales with m, not |prefixes|
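
The three steps above can be sketched as follows. This is a minimal illustration, not the code in domain-bank.ts: the `FarPage` shape, the helper names, and the reading that a larger distance score means "farther" are all assumptions.

```typescript
// Hypothetical shape: the real types in domain-bank.ts are not shown in this PR.
interface FarPage {
  slug: string;
  distance: number; // assumed: larger means farther from the question
}

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffled<T>(items: readonly T[], rand: () => number = Math.random): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Cap the prefix list before any per-prefix query is issued.
function capPrefixes(prefixes: readonly string[], m: number, maxFarSet?: number): string[] {
  const cap = maxFarSet ?? Math.max(m * 4, 50);
  return shuffled(prefixes).slice(0, cap);
}

// Final trim: keep the m pages with the highest distance score.
function trimToM(pages: readonly FarPage[], m: number): FarPage[] {
  return [...pages].sort((a, b) => b.distance - a.distance).slice(0, m);
}
```

With m = 6 and no explicit cap, a 1,985-prefix list is cut to 50 candidates before SQL runs, and at most 6 pages survive the final trim.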

P2: Cost guardrails (brainstorm.ts + orchestrator.ts)

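  • --max-cost <usd> (default $5): hard-abort pre-run if the estimate exceeds the cap, and mid-run if running cost overshoots it
  • --strict-budget: abort mid-run if spend exceeds 5× the estimate (warn-only by default)
  • --max-far-set <n> (default max(m*4, 50)): explicit cap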
  • --judge-model <id>: route judge to larger-context model
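
A rough sketch of how the pre-run and mid-run gates could combine, assuming the semantics in the flag descriptions above. `BudgetAbort`, `GuardOpts`, and both function names are hypothetical and do not come from brainstorm.ts or orchestrator.ts.

```typescript
// Hypothetical error type for this sketch; the PR's own error handling may differ.
class BudgetAbort extends Error {}

interface GuardOpts {
  maxCostUsd?: number;    // --max-cost, default $5
  strictBudget?: boolean; // --strict-budget
}

// Pre-run gate: refuse to start when the estimate already exceeds the cap.
function checkPreRun(estimateUsd: number, opts: GuardOpts = {}): void {
  const cap = opts.maxCostUsd ?? 5;
  if (estimateUsd > cap) {
    throw new BudgetAbort(`estimate $${estimateUsd.toFixed(2)} exceeds cap $${cap.toFixed(2)}`);
  }
}

// Mid-run gate: call after each LLM call with the running spend.
function checkMidRun(spentUsd: number, estimateUsd: number, opts: GuardOpts = {}): "ok" | "warn" {
  const cap = opts.maxCostUsd ?? 5;
  if (spentUsd > cap) {
    throw new BudgetAbort(`spend $${spentUsd.toFixed(2)} exceeds cap $${cap.toFixed(2)}`);
  }
  if (spentUsd > 5 * estimateUsd) {
    if (opts.strictBudget) {
      throw new BudgetAbort(`spend $${spentUsd.toFixed(2)} exceeds 5x the estimate`);
    }
    return "warn"; // warn-only unless --strict-budget is set
  }
  return "ok";
}
```

Under these assumptions the incident run ($0.96 estimate) would have passed the pre-run gate, then stopped at the $5 cap instead of reaching $50.71.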

P3: Judge chunking (judges.ts)

  • Split ideas into batches of 100 (configurable via --max-ideas-per-judge-call)
  • Each batch is a separate LLM call; results concatenated
  • 15,868 ideas → 159 calls of ~100 instead of one 3M-token call
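
The batching described above reduces to a fixed-size chunker plus a loop that concatenates per-batch results. The sketch below is illustrative only; `judgeBatch` stands in for whatever judges.ts actually calls, and its signature is an assumption.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size = 100): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// One judge call per batch; results are concatenated in input order.
// `judgeBatch` is a hypothetical stand-in for the real LLM call.
async function judgeInBatches<I, R>(
  ideas: readonly I[],
  judgeBatch: (batch: I[]) => Promise<R[]>,
  size = 100,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(ideas, size)) {
    results.push(...(await judgeBatch(batch)));
  }
  return results;
}
```

For 15,868 ideas at the default size this yields 158 full batches and one final batch of 68, which matches the 159 calls quoted above.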

P4: Unicode sanitization (orchestrator.ts)

  • Strip unpaired UTF-16 surrogates before building cross prompts
  • Prevents JSON-encoding crashes on OCR/import-derived pages
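
One way to strip unpaired surrogates is a single pass over UTF-16 code units. This is a sketch of the technique and not necessarily how orchestrator.ts implements it; the function name is hypothetical.

```typescript
// Remove UTF-16 surrogate code units that are not part of a valid high+low pair.
function stripLoneSurrogates(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const isHigh = code >= 0xd800 && code <= 0xdbff;
    const isLow = code >= 0xdc00 && code <= 0xdfff;
    if (isHigh) {
      const next = text.charCodeAt(i + 1); // NaN past the end, so the check below fails
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += text.charAt(i) + text.charAt(i + 1); // valid pair: keep both units
        i++;
      }
      // otherwise: lone high surrogate, dropped
    } else if (!isLow) {
      out += text.charAt(i); // ordinary code unit
    }
    // a low surrogate reached here has no preceding high surrogate, so it is dropped
  }
  return out;
}
```

Valid pairs such as emoji pass through untouched; only the orphaned halves that OCR or import pipelines sometimes leave behind are removed.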

Postmortem

The full incident report, with token flow forensics and architectural proposals (global budgets for all analysis functions, diarization, checkpointing), is in docs/incidents/2026-05-20-lsd-cost-explosion.md.

Proposed Future Work

  • P5: Global token/time/cost budgets for ALL gbrain analysis functions (brainstorm, dream, extract, enrich, eval, integrity, doctor)
  • P6: Diarization — summarize oversized payloads to fit context instead of failing
  • P7: Structured error recovery with checkpointing for interrupted runs

Tests

  • 12 new tests in test/brainstorm/cost-guardrails.test.ts
  • Full brainstorm suite: 82 pass, 0 fail
  • tsc --noEmit: clean

root added 2 commits May 20, 2026 16:08
- Add --max-cost flag (default $5) to brainstorm/lsd commands; hard-aborts
  pre-run if estimate exceeds, and mid-run if running cost overshoots.
- Add --max-far-set flag (default max(m*4, 50)) to cap the domain bank's
  prefix-stratified sampling. listPrefixSampledPages returns one page per
  prefix; on a 13K-page brain with ~2K distinct prefixes this was pulling
  ~1985 far pages instead of the configured m=6. fetchFar now shuffles +
  caps the prefix list, and trims final pages to m by distance score.
- Add --strict-budget flag: abort mid-run if running cost exceeds 5x the
  initial estimate (warn-only by default).
- Chunk the judge phase (default 100 ideas per LLM call, --max-ideas-per-judge-call
  to override). Large brain runs produced 15K+ ideas, blowing past the
  model's 1M-token context in a single call. Now batched and concatenated.
- Add --judge-model flag for routing the judge phase to a larger-context
  model when needed.
- Sanitize unpaired UTF-16 surrogates in cross-prompt content (close+far
  page bodies, titles, question) to prevent JSON-encoding crashes on
  OCR/import-derived pages with lone surrogates.

Fixes: 53x cost overrun on 13K-page brain ($0.96 estimate vs $50.71 actual)
Fixes: judge phase 3M-token overflow > 1M model context
Fixes: 1985-page far set when m_far was configured at 6

Incident report covering:
- Root cause analysis (5 contributing factors)
- Observed token flow and cost breakdown
- Implemented fixes (P1-P4) in dc080ac
- Proposed architectural changes:
  - P5: Global token/time/cost budgets for ALL analysis functions
  - P6: Diarization/summarization for oversized payloads
  - P7: Structured error recovery with checkpointing

Key insight: every gbrain analysis function that makes LLM calls
needs configurable budgets (tokens, cost, wall-clock time) with
graceful degradation on exhaustion.
garrytan added a commit that referenced this pull request May 22, 2026
… fix (#1283)

* feat(brainstorm): T1 cost guardrails + judge chunking + far-set cap

Ports PR #1234 with a typed-error swap (Q2). Brings:

- `--max-cost`, `--max-far-set`, `--strict-budget`, `--judge-model`,
  `--max-ideas-per-judge-call` CLI flags on `gbrain brainstorm` / `lsd`
- Domain-bank prefix-cap + shuffle + final-trim to `m` by distance score
- Judge auto-chunks idea sets > 100 across multiple LLM calls
- UTF-16 surrogate sanitization on cross prompts
- Phase-0.5 hard cost ceiling + mid-run cost guard

Phase-1 diff from PR #1234: per-cross error-rethrow uses inline typed
`BudgetExhausted` instead of string-match on the error message. Phase 2
of the wave will move the class to `src/core/budget/budget-tracker.ts`
and the orchestrator will import it.

Postmortem doc + 12-case regression test included verbatim from #1234.

T1 of the brainstorm cost cathedral plan
(~/.claude/plans/system-instruction-you-are-working-rippling-moth.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(budget): T2 BudgetTracker + BudgetExhausted + audit-week helper

The keystone primitive for the v0.37.x budget cathedral. One class,
one typed error, one schema-stable audit JSONL. Replaces three parallel
copies (brainstorm orchestrator inline class, cycle/budget-meter,
eval-contradictions cost-prompt/tracker) — those adapt to this one in
T5/T6.

Contracts pinned by 26 unit tests:
  - TX1: record() throws BudgetExhausted(reason:'cost') when cumulative
    spend > cap. A single underestimated call cannot leak past the cap.
  - TX2: reserve() hard-fails with BudgetExhausted(reason:'no_pricing')
    when cap is set + model is missing from pricing maps. When cap is
    unset, legacy warn-once behavior is preserved.
  - A3 amended: extractUsageFromError(err, fallback) returns err.usage
    when SDK provides it, else the pessimistic fallback (caller passes
    maxOutputTokens, not the optimistic pre-call estimate).
  - onExhausted callback fires once, synchronously, before the throw
    propagates. Callbacks do sync I/O (writeFileSync) for checkpoint
    persistence.
  - Audit JSONL is schema-stable: every line carries schema_version=1.
    Reorderings tolerated, field renames are breaking.
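
A minimal sketch of the TX1 and onExhausted contracts listed above. It is not the class in src/core/budget/budget-tracker.ts: the constructor shape and field names are assumptions, and reserve(), pricing maps, and the audit JSONL are left out.

```typescript
type ExhaustedReason = "cost" | "no_pricing";

class BudgetExhausted extends Error {
  reason: ExhaustedReason;
  constructor(reason: ExhaustedReason, message: string) {
    super(message);
    this.name = "BudgetExhausted";
    this.reason = reason;
  }
}

class BudgetTracker {
  private spentUsd = 0;
  private exhaustedFired = false;
  private maxCostUsd: number | undefined;
  private onExhausted: ((err: BudgetExhausted) => void) | undefined;

  // Assumed constructor shape, for illustration only.
  constructor(maxCostUsd?: number, onExhausted?: (err: BudgetExhausted) => void) {
    this.maxCostUsd = maxCostUsd;
    this.onExhausted = onExhausted;
  }

  get spent(): number {
    return this.spentUsd;
  }

  // TX1: charge first, then compare cumulative spend against the cap.
  record(costUsd: number): void {
    this.spentUsd += costUsd;
    if (this.maxCostUsd === undefined || this.spentUsd <= this.maxCostUsd) return;
    const err = new BudgetExhausted(
      "cost",
      `spent $${this.spentUsd.toFixed(2)} > cap $${this.maxCostUsd.toFixed(2)}`,
    );
    if (!this.exhaustedFired) {
      this.exhaustedFired = true;
      this.onExhausted?.(err); // runs synchronously, before the throw propagates
    }
    throw err;
  }
}
```

Because the callback runs before the throw, a caller can persist a checkpoint inside it and still see the typed error at the call site.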

Also ships src/core/audit-week-file.ts — the shared ISO-week filename
helper consumed by every audit writer in T4. Year-boundary correctness
pinned by 5 cases including 2020-W53 (the 53-week year), 2025-W01
rolling in from 2024-12-30 (Monday), and the GBRAIN_AUDIT_DIR override.
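
For reference, the Thursday-anchor ISO-week calculation can be sketched in UTC as below. The function names and the `prefix` parameter are assumptions (the real helper's signature is not shown here), and the GBRAIN_AUDIT_DIR override is omitted.

```typescript
// ISO 8601 week: the week's year is the year of that week's Thursday.
function isoWeekOf(date: Date): { year: number; week: number } {
  // Work on a UTC midnight copy so local time zones cannot shift the day.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek); // move to the Thursday of this week
  const year = d.getUTCFullYear();
  const yearStart = Date.UTC(year, 0, 1);
  const dayOfYear = (d.getTime() - yearStart) / 86_400_000 + 1;
  return { year, week: Math.ceil(dayOfYear / 7) };
}

// Builds names in the `<prefix>-YYYY-Www.jsonl` pattern.
function weekFilename(prefix: string, date: Date): string {
  const { year, week } = isoWeekOf(date);
  return `${prefix}-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}
```

The two boundary cases named above fall out directly: Monday 2024-12-30 maps to 2025-W01, and Thursday 2020-12-31 maps to 2020-W53.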

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): T3 withBudgetTracker + AsyncLocalStorage composition

TX5: every gateway.chat / embed / rerank call now auto-composes the
active BudgetTracker via a module-internal AsyncLocalStorage. No
per-call injection seam, no flag plumbing — callers wrap their
entrypoint in `withBudgetTracker(tracker, async () => { ... })` and
every downstream LLM call honors the cap.

Outside any scope, the gateway is a budget no-op (back-compat with the
pre-v0.37 contract).
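
The composition pattern can be illustrated with Node's `AsyncLocalStorage`. This is a sketch under assumptions: the `Tracker` interface, `activeTracker`, and the stand-in `chat` with its flat cost are hypothetical, and the real gateway also reserves before the provider call.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Minimal tracker surface for this sketch.
interface Tracker {
  record(costUsd: number): void;
}

// Module-internal store: callers never touch it directly.
const trackerScope = new AsyncLocalStorage<Tracker>();

// Everything called inside `fn`, including across awaits, sees `tracker`.
function withBudgetTracker<T>(tracker: Tracker, fn: () => T): T {
  return trackerScope.run(tracker, fn);
}

function activeTracker(): Tracker | undefined {
  return trackerScope.getStore();
}

// Stand-in for a gateway call; no tracker argument is threaded through.
async function chat(prompt: string): Promise<string> {
  await Promise.resolve(); // the store survives awaits inside the scope
  const reply = `echo: ${prompt}`;
  activeTracker()?.record(0.01); // hypothetical flat cost; no-op outside any scope
  return reply;
}
```

A caller would wrap its entrypoint as `await withBudgetTracker(tracker, async () => chat("hello"))`. Nested scopes restore the outer tracker on exit, and parallel scopes each keep their own store.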

Wiring:
  - chat(): reserves on entry using prompt-char heuristic + opts.maxTokens.
    Records actual usage from result.usage on success; on failure, charges
    the pessimistic A3-amended fallback so the cap is real.
  - embed(): reserves total estimated input tokens (chars / chars-per-token).
    Records the same total in try/finally; SDK doesn't surface per-batch
    embed token counts.
  - rerank(): reserves and records query + docs char count.
    Reranker pricing isn't in the canonical map yet, so reserve() takes
    the warn-once path under no-cap and the TX2 hard-fail under cap.

6 unit cases pin the contract: chat auto-composes, outside-scope is
no-op, nested scope restores outer, over-cap reserve throws BEFORE
provider call (proves circuit breaker), TX1 mid-run cumulative cap
fires via record(), parallel Promise.all scopes do not bleed trackers.

All 255 existing gateway tests and 50 brainstorm tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(audit): T4 migrate 4 audit writers to shared isoWeekFilename helper

Q1: extract the ISO-week filename math into one canonical helper
(src/core/audit-week-file.ts, landed in T2) and migrate every audit
JSONL writer in the codebase to consume it.

Sites migrated:
  - src/core/minions/handlers/shell-audit.ts  (shell-jobs-YYYY-Www.jsonl)
  - src/core/facts/phantom-audit.ts            (phantoms-YYYY-Www.jsonl)
  - src/core/audit-slug-fallback.ts            (slug-fallback-YYYY-Www.jsonl)
  - src/core/cycle/budget-meter.ts             (dream-budget-YYYY-Www.jsonl)

Each call site had its own copy of the ISO-week-from-Date algorithm.
They mostly agreed but subtle drift was already accumulating (one used
local time, one approximated the Thursday-anchor formula, etc.). One
helper, one set of regression tests, no drift.

Compute helpers (computeAuditFilename, computePhantomAuditFilename,
computeSlugFallbackAuditFilename) are preserved as thin wrappers so
existing import sites and tests don't break.

All audit + slug-fallback + phantom + budget-meter tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cycle): T5 BudgetMeter schema_version=1 + golden fixture (A2 amended)

Adapter pass: the existing BudgetMeter keeps its public shape
(`BudgetMeter`, `SubmitEstimate`, `BudgetCheckResult`) verbatim so every
dream-cycle call site keeps working without rewires. The audit JSONL
grew one new field on every line: `schema_version: 1`.

A2 amended: the codex outside-voice review relaxed the byte-stable
contract to schema-stable. Field reorderings are tolerated; the
documented set (schema_version, ts, phase, event, model, label,
plus per-event cost or token fields) is what every consumer can rely
on. Renames or removals are breaking.

test/fixtures/dream-budget-schema-v1.jsonl carries one canonical row
per event variant (submit / submit_denied / submit_unpriced) as
documentation of the schema. The new in-suite case in
test/budget-meter.test.ts walks every emitted line and asserts the
fields are present + the right type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): T6 wrap eval-contradictions runner in withBudgetTracker

The runner now installs a BudgetTracker scope around its body so every
gateway-layer chat / embed / rerank call (the judge model + per-query
embedding) auto-records via the AsyncLocalStorage from T3. Currently
telemetry-only — the existing CostTracker remains the primary soft-
ceiling enforcement, so the public --budget-usd surface and
PreFlightBudgetError shape are byte-identical.

The wiring is the seam: future waves can promote the cap to BudgetTracker
semantics (TX1 + TX2 semantics on cumulative + no_pricing) by passing
maxCostUsd through to BudgetTracker without touching the CLI.

All 79 eval-contradictions tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): T7 --remediate budget tracker + checkpoint + --resume (A4)

A4 amended: doctor --remediate gains a resumable cost ceiling. The
runRemediate loop now runs inside `withBudgetTracker(tracker, ...)` so
every gateway-routed LLM call inside a Minion handler (synthesize,
patterns, consolidate, embed) honors the cap. When BudgetExhausted
fires mid-run, the onExhausted callback persists a checkpoint of
completed step ids + idempotency_keys to
~/.gbrain/remediation/<plan_hash>.json BEFORE the throw propagates,
and the catch surfaces a paste-ready --resume hint.

Wire-up:
  - New --resume <plan_hash> flag (with implicit "most recent matching"
    when no hash given) loads the checkpoint and skips already-
    completed steps. Mismatched plan_hash refuses with an explicit
    message.
  - --max-cost is now an alias for --max-usd. Both spellings honored
    and threaded through to BudgetTracker.maxCostUsd so the cap is
    a real ceiling, not just pre-flight advice.
  - On BudgetExhausted, exit 1 with the resume hint; on clean
    completion, clear the checkpoint.

New file: src/core/remediation-checkpoint.ts with
computePlanHash / save / load / list / clear helpers. Atomic write
via .tmp + rename. Pinned by 13 unit cases including determinism +
sort-order invariance + schema-mismatch return-null + atomic-rename.
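
The checkpoint helpers could look roughly like the sketch below. The `Checkpoint` field names, the choice of step ids as hash input, the 16-character hash length, and the temp-directory default are assumptions for illustration; the commit stores files under ~/.gbrain/remediation/.

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch default only; not the real checkpoint location.
const SKETCH_DIR = join(tmpdir(), "gbrain-remediation-sketch");

// Hypothetical on-disk shape.
interface Checkpoint {
  schema_version: 1;
  plan_hash: string;
  completed_step_ids: string[];
}

// Sorting a copy first makes the hash independent of step order.
function computePlanHash(stepIds: readonly string[]): string {
  const canonical = JSON.stringify([...stepIds].sort());
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

// Write to a sibling .tmp file, then rename over the final path.
function saveCheckpoint(cp: Checkpoint, dir: string = SKETCH_DIR): string {
  mkdirSync(dir, { recursive: true });
  const finalPath = join(dir, `${cp.plan_hash}.json`);
  const tmpPath = `${finalPath}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(cp));
  renameSync(tmpPath, finalPath);
  return finalPath;
}

// Returns null for a missing file, corrupt JSON, or a schema mismatch.
function loadCheckpoint(planHash: string, dir: string = SKETCH_DIR): Checkpoint | null {
  try {
    const parsed = JSON.parse(readFileSync(join(dir, `${planHash}.json`), "utf8")) as Checkpoint;
    return parsed.schema_version === 1 ? parsed : null;
  } catch {
    return null;
  }
}

function clearCheckpoint(planHash: string, dir: string = SKETCH_DIR): void {
  rmSync(join(dir, `${planHash}.json`), { force: true });
}
```

The rename step means a reader sees either the previous complete file or the new complete file, never a half-written one, as long as both paths are on the same filesystem.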

All 48 doctor.test.ts cases still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(subagent): T8 A1 ordering ASCII diagram before acquireLease

Documents the load-bearing ordering invariant: the gateway's
BudgetTracker reserve() runs (implicitly, via AsyncLocalStorage)
BEFORE acquireLease() inside the subagent loop. A BudgetExhausted
throw must NOT consume a rate-lease slot, because the lease is the
rate-limit pacer for the entire fleet.

The handler body intentionally does NOT explicitly thread BudgetTracker;
TX5 (gateway-layer composition) handles that. The comment is the
reader's signpost.

No behavioral change. All 58 subagent tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(diarize): T9 payload-fitter (P6) with batch + summarize + gate

Generic utility for fitting arbitrarily-large item lists into a
downstream caller's per-call token budget. Two strategies:

  - 'batch': deterministic token-budgeted chunking. No LLM calls. The
    fitted list shape matches the input; the caller decides how to
    consume it (e.g. brainstorm judge concatenates per-chunk results).
    Surfaces a `dropped` count for items that exceed the per-call cap.

  - 'summarize': embed-cluster into ceil(items/4) groups via cheap
    deterministic nearest-neighbor on cosine; Haiku-summarize each
    cluster via Promise.allSettled at parallelism=4 (Perf1). Each
    Haiku call composes the active BudgetTracker via the gateway's
    AsyncLocalStorage scope (T3) — no per-call injection.
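
The 'batch' strategy amounts to greedy token-budgeted packing. The sketch below shows one plausible reading; `fitBatches`, the `tokensOf` callback, and the result shape are assumptions and not the payload-fitter's actual API.

```typescript
interface BatchFit<T> {
  batches: T[][];
  dropped: number; // items that alone exceed the per-call cap
}

// Greedy, order-preserving packing: start a new batch when the next item would overflow.
function fitBatches<T>(
  items: readonly T[],
  tokensOf: (item: T) => number,
  maxTokensPerCall: number,
): BatchFit<T> {
  const batches: T[][] = [];
  let current: T[] = [];
  let used = 0;
  let dropped = 0;
  for (const item of items) {
    const cost = tokensOf(item);
    if (cost > maxTokensPerCall) {
      dropped++; // cannot fit in any call, so count it and move on
      continue;
    }
    if (used + cost > maxTokensPerCall && current.length > 0) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(item);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return { batches, dropped };
}
```

The function makes no LLM calls and is deterministic for a given input order, which is what lets the caller concatenate per-batch results safely.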

Quality gate (codex outside-voice finding #4): when summarize's
success_ratio < min_success_ratio (default 0.75), the result is
flagged `degraded: true` so the caller (brainstorm) can decide to
surface a partial result or abort. The fitter itself preserves the
successful subset either way.
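
The gate itself is a small calculation over settled results. A sketch, assuming the summaries arrive as `Promise.allSettled` output; the function name, result shape, and treating an empty input as a ratio of 1 are assumptions.

```typescript
interface GateResult<T> {
  summaries: T[];       // the successful subset, always preserved
  successRatio: number;
  degraded: boolean;
}

function applyQualityGate<T>(
  settled: readonly PromiseSettledResult<T>[],
  minSuccessRatio = 0.75,
): GateResult<T> {
  const summaries: T[] = [];
  for (const result of settled) {
    if (result.status === "fulfilled") summaries.push(result.value);
  }
  // Assumption: an empty input counts as fully successful.
  const successRatio = settled.length === 0 ? 1 : summaries.length / settled.length;
  return { summaries, successRatio, degraded: successRatio < minSuccessRatio };
}
```

With five clusters, one failure gives a ratio of 0.8 and the result is not degraded; three failures give 0.4 and the flag flips while the two successful summaries are still returned.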

Tested via 4 cases across two files (T3 contract):
  - happy path (all clusters succeed → degraded=false)
  - partial failure tolerated (1/5 fails, success_ratio=0.8 > 0.75 → degraded=false)
  - high-failure rate flips the gate (3/5 fails → degraded=true)
  - budget-respecting (BudgetExhausted thrown mid-cluster propagates
    via Promise.allSettled)

11 unit cases across batch + summarize. Brainstorm + cost-guardrails
tests still green; judges.ts internal chunking deferred to a follow-up
wave (TODOS) so the existing chunked-batch contract stays byte-stable
during this drop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brainstorm): T10 checkpoint + --resume with full idea bodies (P7)

The brainstorm cathedral capstone. Crashed runs can resume cleanly via
`gbrain brainstorm --resume <run_id>` (and `gbrain lsd --resume` etc).

TX3 load-bearing contract: completed_crosses on disk carries FULL idea
bodies (~50KB per run), not just counts. The resumed BrainstormResult
contains the pre-crash ideas (loaded from disk) merged with the post-
resume ideas — codex's outside-voice finding was that a resume that
produces only "what we generated this run" is silent partial output.

TX4 single rule: --resume continues any cross not in completed_crosses.
The proposed --retry-failed was dropped per codex review; failed AND
never-attempted crosses both go through --resume.

A5 amended: run_id = sha256(question + profile + sort(close_slugs) +
sort(far_slugs)).slice(0,16). NO embedding bits — stable across
embedding-model swaps. 7-day mtime-based GC.
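
The run_id formula and the TX4 rule can be sketched as follows. Serialising the inputs as a JSON array is an assumption (the commit only gives the formula in shorthand), and `pendingCrosses` is a hypothetical helper name.

```typescript
import { createHash } from "node:crypto";

// run_id depends only on the question, profile, and the two slug sets.
// Slugs are sorted on a copy, so caller-side ordering does not matter.
function computeRunId(
  question: string,
  profile: string,
  closeSlugs: readonly string[],
  farSlugs: readonly string[],
): string {
  const payload = JSON.stringify([
    question,
    profile,
    [...closeSlugs].sort(),
    [...farSlugs].sort(),
  ]);
  return createHash("sha256").update(payload).digest("hex").slice(0, 16);
}

// TX4: every cross not recorded as completed is eligible on resume,
// whether it failed or was never attempted.
function pendingCrosses(allCrossIds: readonly string[], completed: ReadonlySet<string>): string[] {
  return allCrossIds.filter((id) => !completed.has(id));
}
```

Because no embedding data enters the hash, swapping the embedding model leaves the run_id unchanged, so an existing checkpoint still matches.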

Q2 fold: orchestrator.ts drops its inline BudgetExhausted class and
re-exports the canonical one from src/core/budget/budget-tracker.ts
(Phase 2). runBrainstorm now wraps the body in withBudgetTracker so
every gateway-layer chat call auto-records cost. The cap remains
opts.maxCostUsd (default $5).

New CLI flags:
  --resume <run_id>   Continue any cross not in completed_crosses.
                      Refuses to start when run_id doesn't match the
                      active inputs (paste-ready hint).
  --force-resume      Bypass the 7-day staleness gate.
  --list-runs         Print saved run_ids and exit.

Cycle purge phase (the 9th cycle phase) now also GCs stale brainstorm
checkpoints alongside op_checkpoints (~7d window).

Tests:
  - 20 unit cases in test/brainstorm/checkpoint.test.ts:
    computeRunId is deterministic + slug-array-order invariant + stable
    across embedding-model swaps; round-trip preserves ideas verbatim;
    saveCheckpoint atomic via .tmp+rename; loadCheckpoint returns null
    on missing/schema-mismatch/corrupt-JSON; gcStaleCheckpoints unlinks
    >N days; listRuns mtime-ordered.
  - 3 E2E cases in test/e2e/brainstorm-resume.test.ts:
    crash on cross 4 → first run aborts with checkpoint of crosses 1..N
    with full idea bodies; second run with resumeRunId merges pre-crash
    + post-resume ideas (TX3 contract); mismatched run_id refuses with
    paste-ready hint.

The PGLite schema-gap workaround in the E2E (CREATE VIEW page_links AS
SELECT * FROM links) is filed as a follow-up in TODOS T12 — the
real-engine brainstorm path needs that view to materialize as a
canonical schema fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: T11 + T12 wave release docs + deferred follow-ups

CHANGELOG entry for the brainstorm cost cathedral (Unreleased slot;
/ship will assign the next version):
  - ELI10 lead per CLAUDE.md voice rules
  - "How to turn it on" with paste-ready commands
  - "Things to watch" calls out the A4 semantic shift for
    `doctor --remediate --max-usd` (pre-flight → mid-run abort
    with resumable checkpoint)
  - Itemized changes by file/area
  - "For contributors" section noting the 73 new tests + the PGLite
    schema-gap workaround for the E2E

CLAUDE.md Key Files: 6 new entries for budget-tracker, audit-week-file,
gateway withBudgetTracker, payload-fitter, brainstorm/checkpoint,
remediation-checkpoint. Regenerated llms-full.txt + llms.txt (passes
test/build-llms.test.ts).

docs/incidents/2026-05-20-lsd-cost-explosion.md gains a closing
"Shipped in v0.37.x (the budget cathedral wave)" section listing P1-P7
completion status + the deferred follow-ups so the incident's audit
trail closes the loop.

TODOS.md gets a new top section for the wave's deferred items:
  - PGLite `page_links` schema gap fix
  - Explicit --max-cost on extract / enrich / integrity auto
  - P5 config-schema budgets: block in ~/.gbrain/config.json
  - Multi-day brainstorm resume (>7d)
  - Async-batched audit writes (profiling trigger criterion)
  - BudgetLedger unification with BudgetTracker
  - judges.ts internal chunking → payload-fitter delegation

Also: fixed a payload-fitter typecheck error (ChatFn import). Final
typecheck is clean on every file the wave touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(schema): F1 page_links view alias for both engines

Brainstorm's domain-bank queries reference `page_links` (pglite-engine.ts:896,
postgres-engine.ts:959) but the canonical table is `links`. Without the alias
view, `gbrain brainstorm` against PGLite fails with `relation "page_links"
does not exist`; the same was a latent bug on Postgres.

This commit lands the fix at three sites:

1. `src/core/pglite-schema.ts` — embedded schema bundle gets the view at
   table-bundle time, so fresh PGLite installs are correct from boot.
2. `src/core/migrate.ts` v81 (`page_links_view_alias`) — existing brains on
   either engine pick up the view via `gbrain apply-migrations`. CREATE OR
   REPLACE VIEW is idempotent; re-running is safe.
3. `test/e2e/brainstorm-resume.test.ts` — removed the ad-hoc workaround view
   from the test setup. The E2E now exercises the same schema path real
   users will see.

`TODOS.md` entry for the gap closed out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brainstorm): F2 pre-flight --max-cost refusal smoke E2E

Pins the user-facing path that closed the original $50 incident: when
the pre-run estimate exceeds the configured cap, runBrainstorm throws
BudgetExhausted with reason='cost' and a paste-ready hint pointing at
--limit / --max-cost / --max-far-set before any chat call happens.

The four assertions are the four things a real user can verify after
the throw lands:
  1. Typed BudgetExhausted (not a generic Error)
  2. reason === 'cost' (not runtime or no_pricing)
  3. Message names the remediation flags
  4. No provider HTTP would have happened (chat.crossCalls === 0)

Uses the same PGLite engine + tinyProfile + stub chatFn as the existing
--resume tests. Hermetic; ~5s wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(reindex-code): F3 --max-cost flag via withBudgetTracker

Wires gbrain reindex --code into the v0.38 budget cathedral. When the
caller passes --max-cost N (or --max-cost-usd N), runReindexCode wraps
its per-page import loop in withBudgetTracker so every gateway.embed()
call inside importCodeFile auto-composes the cap. On BudgetExhausted,
the partial-progress result reports what got reindexed before the cap
fired plus a synthetic failure row naming the cap throw.

reindex-code is idempotent (content_hash short-circuit in importCodeFile),
so a re-run after a budget abort picks up where the cap fired — no
manual checkpoint state needed.

Both --max-cost and --max-cost-usd are accepted (symmetry with brainstorm
which uses --max-cost, and a precedent for the spelling we want long-term).

When --max-cost is unset, the body runs outside any tracker scope — byte-
stable pre-F3 behavior for legacy callers.

Files:
  src/commands/reindex-code.ts:
    - ReindexCodeOpts.maxCostUsd?: number
    - runReindexCode wraps body in withBudgetTracker when set
    - runReindexCodeCli parses --max-cost / --max-cost-usd
    - BudgetExhausted caught + returned as partial-progress result
  test/reindex-code-max-cost.serial.test.ts (NEW):
    - dry-run + maxCostUsd happy path
    - empty-brain + maxCostUsd hits early-return cleanly
    - no tracker installed when cap is unset (regression guard for
      the conditional wrap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(schema): narrow page_links view projection to bootstrap-safe columns

The v0.38 page_links view alias initially used SELECT * FROM links, which
broke the pre-v0.13 bootstrap test: applyForwardReferenceBootstrap drops
link_source + origin_page_id to simulate the pre-v0.13 schema shape, but
the SELECT * view created a dependency that blocked the column DROP.

Engine queries only reference pl.id (via COUNT(*)) and pl.to_page_id, so
the view's projection is now SELECT id, from_page_id, to_page_id FROM links
— what callers actually use, no more. This unblocks legacy-brain upgrade
paths AND keeps the bootstrap forward-reference probes safe.

Bootstrap suite: 15/15 pass after the change.

Also files a P0 TODO for a pre-existing test failure
(test/doctor-report-remote.test.ts "full report on healthy brain") that
fails on master too — out of scope for this wave but noticed during
/ship triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to v0.39.0.0

Brainstorm cost cathedral wave (P1-P7). MINOR bump per user direction:
new architectural seam (gateway-layer BudgetTracker via AsyncLocalStorage),
5 new modules, new CLI flags (--max-cost / --resume / --list-runs /
--force-resume), new migration v81 (page_links view alias).

No breaking changes — BudgetExhausted re-exported from orchestrator for
back-compat; --max-usd preserved as alias for --max-cost; eval-contradictions
--budget-usd surface byte-identical.

CHANGELOG entry renamed from [Unreleased] to [0.39.0.0] and adds the
mandatory "To take advantage of v0.39.0.0" block per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(isolation): rename 3 env-mutating tests to .serial.test.ts (CI fix)

CI's `check:test-isolation` flagged three tests added in the v0.39.0.0
cathedral that directly mutate `process.env` across test boundaries:

- test/brainstorm/checkpoint.test.ts (mutates GBRAIN_HOME)
- test/core/audit-week-file.test.ts (mutates GBRAIN_AUDIT_DIR)
- test/core/remediation-checkpoint.test.ts (mutates GBRAIN_HOME)

Per CLAUDE.md rule R1: env-mutating tests either use withEnv() OR rename
to *.serial.test.ts (the quarantine escape hatch). The mutation lives in
beforeEach/afterEach which spans the whole describe block, so .serial
rename is the cleaner fix — withEnv() would require restructuring every
test. The serial-test runner gives them their own bun process; no cross-
file env races.

Verified: check:test-isolation passes (527 non-serial unit files clean),
`bun run verify` passes, all 41 tests in the three renamed files pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 22, 2026
* v0.38 plan: schema packs — bring your own shape

CEO + Eng + 3x Outside Voice review complete; 16 decisions locked,
58 codex findings folded. Design doc captures the full scope decisions
+ 12-14 week budget + 4-lane parallelization strategy + 29
implementation tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T1: open PageType + TakeKind from closed unions to string

PageType and TakeKind become `string` instead of the pre-v0.38 closed
unions. Validation moves from compile-time exhaustiveness to runtime
checks against the active schema pack (T7+). The 13 `as PageType` and
3 `as TakeKind` casts at engine + cycle + enrichment boundaries widen
to `as string` (still narrowing from `unknown` at SQL row boundaries
but no longer pretending the union is closed).

Closed PageType was already a fiction: Garry's brain has 180+ types
(apple-note, therapy-session, tweet-bundle, …) all riding `as PageType`
casts the engine never enforced. v0.38 formalizes the open shape so
schema packs can declare their own types at runtime.

test/page-type-exhaustive.test.ts rewritten for the v0.38 model:
ALL_PAGE_TYPES becomes the gbrain-base seed list (no longer an
exhaustive enum); a new test asserts the markdown surface accepts
arbitrary user-declared types (paper, researcher, therapy-session,
apple-note, tweet-bundle); assertNever stays as a generic helper for
switches over the closed PackPrimitive enum.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T2 (+E8, E9, D13): schema-pack module skeleton

New module src/core/schema-pack/ — 9 files implementing the v0.38
foundations:

  manifest-v1.ts     SchemaPackManifest (Zod-validated) + sha8 +
                     pack-identity (`<name>@<version>+<sha8>`)
                     per E10/codex F7.
  primitives.ts      Five closed primitives (entity/media/temporal/
                     annotation/concept) with default link verbs,
                     frontmatter fields, expert-routing, rubric,
                     extractable. Closed enum is the *new* surface
                     for compile-time exhaustiveness (assertNever
                     migrates from PageType to PackPrimitive).
  loader.ts          YAML/JSON sniffing by extension. Hand-rolled
                     YAML mini-parser (follows storage-config.ts
                     pattern; no js-yaml dep). Handles nested
                     mappings, sequences of scalars + mappings,
                     YAML flow sequences with bare words.
  closure.ts         E8 alias graph BFS. Symmetric per declaration:
                     `aliases: [other]` adds BOTH directions. Depth
                     cap 4. Cycle detection at LOAD time (codex F15
                     — prevents primitive-sibling adversary-profile
                     leak into expert queries).
  per-source.ts      D13 per-source closure CTE builder. Emits
                     deterministic SQL via UNION ALL + lex-sorted
                     source_id branches. Cache-key stable.
  candidate-audit.ts T12 codex fix — privacy-redacted by default.
                     Audit JSONL stores SHA-8 type hashes,
                     slug_prefix (first segment only), frontmatter
                     KEY names (never values). GBRAIN_SCHEMA_AUDIT_
                     VERBOSE=1 opts into full type names. ISO-week
                     rotation; honors GBRAIN_AUDIT_DIR.
  redos-guard.ts     E6/E9 ReDoS defense. vm.runInContext({timeout:
                     50}) primary path; LINK_EXTRACTION_TOTAL_
                     BUDGET_MS=500 per-page cap. PageRegexBudget
                     class tracks cumulative regex time; degrades
                     to mentions on exhaust (deterministic lex
                     order). T24 spike confirms Bun behavior.
  registry.ts        D13 7-tier resolution chain (per-call CLI-only
                     trust-gated → env → per-source-db → brain-db
                     → gbrain.yml → home-config → gbrain-base
                     default). resolvePack walks extends chain with
                     E4 soft-warn-at-4 + hard-cap-at-8. In-memory
                     cache keyed on pack identity (manifest sha8).
  index.ts           Public exports barrel for downstream Phase B
                     refactors.

Test: 38 cases pinning the contracts (alias graph symmetric per
declaration, E8 adversary-profile-excluded regression, transitive
depth cap, cycle reject at load, CTE deterministic ordering, 7-tier
resolver including D13 trust-gate, YAML round-trip JSON+YAML+flow
sequences, sha8 determinism, primitive defaults). All hermetic; uses
withEnv() per the test-isolation lint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T24: Bun vm.runInContext timeout spike (E9 prerequisite)

E6 locked vm.runInContext({timeout: 50}) as the ReDoS guard. E9
required verifying Bun's vm timeout actually interrupts catastrophic
regex before trusting it in production. This spike runs `^(a+)+$`
against 1MB of 'a' to confirm the timeout fires.

Verdict on Bun 1.3.13 (macOS arm64): PASS — vm.runInContext throws
"Script execution timed out after 50ms" within ~507ms wall-clock for
the test pattern. Wall-clock is ~10x configured timeout because Bun
checks timeout at instruction boundaries and tight backtracking loops
yield infrequently. The per-page budget (500ms cumulative in
redos-guard.ts) absorbs this: ONE catastrophic regex burns the budget,
ALL remaining verbs on that page degrade to mentions per design.
Total CPU per page bounded regardless of pathological pattern count.

Re-run this spike on Bun version bumps: `bun scripts/spike-bun-vm-
timeout.ts`. Exit 0 = production path safe; exit 1 = fall back to
E6 option B (persistent worker pool).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T3 + T4 + T28 + E11: migrations v80 + v81 (takes.kind + eval_candidates)

Migration v80 (T3 + codex T10): drops `takes_kind_check` CONSTRAINT
from the takes table on both engines. Pre-v0.38, kind values were
enforced by a closed DB CHECK (fact|take|bet|hunch) AND a closed TS
union. v0.38 widens both layers together — DB CHECK dropped here;
TS type widened in the prior T1 commit. Runtime validation moves to
the active schema pack's annotation primitive `takes_kinds:` field.
Existing brains see no change (gbrain-base seeds the same 4 values);
schema packs extend to {finding, hypothesis, observation, …} per
domain.

Migration v81 (T4 + T28 + E11 inline canonical snapshot): adds
`eval_candidates.schema_pack_per_source JSONB NULL`. Per-row shape:

  {
    "<source_id>": {
      "pack_name": "garry-pack",
      "pack_version": "1.2.0",
      "manifest_sha8": "ab12cd34",
      "alias_closure_resolved": {"person": ["person","researcher"], ...}
    }, ...
  }

The inline `alias_closure_resolved` is the codex F8/E11 fix — replay
self-contained so a pack file deletion can't break a year-old eval.
~1KB per row, ~10MB/year for a heavy user. Pack identity =
`<pack-name>@<version>+<manifest_sha8>` (codex F7). Replay fails
closed on version-drift unless --use-captured-snapshot.
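
A rough sketch of the identity string and the fail-closed replay rule, assuming the per-row shape shown above. `packIdentity` and `closureForReplay` are hypothetical helper names, and the boolean parameter stands in for the `--use-captured-snapshot` flag; the real replay path may be structured differently.

```typescript
// Hypothetical sketch; field names follow the per-row JSON shape in the commit text.
interface CapturedPack {
  pack_name: string;
  pack_version: string;
  manifest_sha8: string;
  alias_closure_resolved: Record<string, string[]>;
}

/** Pack identity = <pack-name>@<version>+<manifest_sha8>. */
function packIdentity(p: CapturedPack): string {
  return `${p.pack_name}@${p.pack_version}+${p.manifest_sha8}`;
}

/**
 * Picks the alias closure for a replay. Fails closed when the current pack
 * has drifted (or is gone) unless the caller opts into the captured snapshot.
 */
function closureForReplay(
  captured: CapturedPack,
  current: CapturedPack | undefined,
  useCapturedSnapshot: boolean,
): Record<string, string[]> {
  if (current && packIdentity(current) === packIdentity(captured)) {
    return current.alias_closure_resolved;
  }
  if (useCapturedSnapshot) return captured.alias_closure_resolved;
  const now = current ? packIdentity(current) : "missing";
  throw new Error(`pack drift: captured ${packIdentity(captured)}, current ${now}`);
}

const captured: CapturedPack = {
  pack_name: "garry-pack",
  pack_version: "1.2.0",
  manifest_sha8: "ab12cd34",
  alias_closure_resolved: { person: ["person", "researcher"] },
};
```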

Tests:
  - test/v80-v81-smoke.test.ts (3 cases) — pins the drop + add via
    real PGLite engine round-trip. Inserts a 'finding' kind take
    (pre-v80 would have failed CHECK); verifies the new JSONB column
    accepts the canonical snapshot shape.
  - test/schema-bootstrap-coverage.test.ts — adds
    eval_candidates.schema_pack_per_source to COLUMN_EXEMPTIONS
    (no forward-reference index in PGLITE_SCHEMA_SQL so bootstrap
    probe isn't required).

Numbering: v77 + v78 were claimed by v0.37 waves (skillpack-registry
+ cross-modal). v79 was claimed by v0.37.1.0 brainstorm/lsd. v80 +
v81 are the next available slots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T5 + T25: gbrain-base.yaml + codegen validator + parity gate

`src/core/schema-pack/base/gbrain-base.yaml` is the universal starter
pack — every brain inherits gbrain-base by default unless it explicitly
opts out via `extends: null`. Existing brains see ZERO behavior change
after upgrade: the YAML reproduces pre-v0.38 hardcoded behavior
byte-for-byte across:

  - All 22 ALL_PAGE_TYPES seed entries with primitive classifications
    matching the pre-v0.38 inferType + enrichment routing
  - inferType path-prefix table (people/, companies/, deals/, …)
  - inferLinkType verb regexes (founded/invested_in/advises/works_at
    + meeting→attended + image→image_of)
  - FRONTMATTER_LINK_MAP entries (person:company → works_at, etc.)
  - takes_kinds = {fact, take, bet, hunch} (replaces the v41/v48 CHECK)
  - person + company are the only expert_routing defaults (replaces
    whoknows DEFAULT_TYPES + find_experts SQL hardcodes)
  - Empty alias graph (codex F8 + E8 — gbrain-base ships with NO
    alias edges so existing search semantics are unchanged; users
    opt into aliases via schema review-candidates)

scripts/generate-gbrain-base.ts is the codegen validator (T5+T25 +
codex F21 determinism gate). v0.38 ships hand-maintained YAML
validated by this script:
  - Re-loads gbrain-base.yaml and asserts manifest validates
  - Asserts every ALL_PAGE_TYPES seed has a matching page_type entry
  - Asserts re-load produces consistent page_type count
  - Run: `bun scripts/generate-gbrain-base.ts`
  - Exits 0 on PASS, 1 on drift, 2 on script error

test/regressions/gbrain-base-equivalence.test.ts is the CI-blocking
parity gate (8 cases pinning ALL_PAGE_TYPES coverage, takes_kinds
exact match, person+company expert_routing, inferType path mappings,
FRONTMATTER_LINK_MAP key entries, inferLinkType verb regexes, empty
alias graph by default, codegen consistency in-process). If this test
fails, gbrain-base.yaml drifted from the source-of-truth constants.

Loader fix: YAML mini-parser extended to handle flow sequences with
bare words (`[company, companies]`) — previously only accepted
JSON-quoted variants. Tests in T2 commit cover this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T-LaneB1 + E2 + T26: src/core/distribution/ shared-helpers boundary

E2 promotes the shared distribution surface (tarball, trust-prompt,
registry-client, remote-source, registry-schema, scaffold-third-party)
from src/core/skillpack/ to a named src/core/distribution/ module.
Schema-pack (v0.38) and skillpack (v0.37) both consume these helpers
— the new module makes that reuse explicit instead of forcing
schema-pack code to import from a skillpack-named module.

Physical layout (eng-review E2 Option B): the implementations stay
at src/core/skillpack/ to avoid a big-bang move that would touch ~15
v0.37 callers and risk breaking the just-shipped skillpack pipeline.
src/core/distribution/index.ts is a re-export barrel — schema-pack
imports from the canonical name; v0.37 internals stay where they are.
A v0.39+ pass may physically move the implementations if signal
warrants it.

T26 (codex F6 + F25) — src/core/distribution/ has a strict import
boundary: MAY only import from `../skillpack/` and node built-ins.
Forbidden from importing src/commands/, src/core/schema-pack/,
engines, or config resolution. The boundary is pinned by a source-
text grep in test/distribution-import-boundary.test.ts — if a future
edit adds a forbidden import, the test fails loud before the bad
module shape lands in `bun run verify`.

Re-exported surface:
  Tarball: extractTarball, packTarball, fileSha256,
    DEFAULT_EXTRACT_CAPS, TarballError, TarballExtractResult,
    TarballPackOptions/Result, ExtractCaps, TarballErrorCode
  Trust: askTrust, renderIdentityBlock, AskTrustOptions,
    SkillpackTier, TrustPromptInput/Decision
  Registry HTTP: loadRegistry, findPack, findPackWithTier,
    searchPacks, resolveRegistryUrl, DEFAULT_REGISTRY_URL,
    DEFAULT_ENDORSEMENTS_URL, RegistryClientError,
    LoadRegistryOptions, LoadedRegistry, RegistryClientErrorCode
  Remote source: resolveSource, classifySpec, RemoteSourceError,
    ResolvedSource, ResolveSourceOptions, SpecKind, ResolvedSourceKind
  Registry schema: REGISTRY_SCHEMA_VERSION (v1),
    ENDORSEMENTS_SCHEMA_VERSION (v1), RegistryCatalog,
    EndorsementsFile, validateRegistryCatalog,
    validateEndorsementsFile, validateRegistryEntry, effectiveTier,
    RegistryEntry, RegistrySource, RegistryBundles, RegistryTier,
    EndorsementRecord, RegistrySchemaError, RegistrySchemaErrorCode
  Scaffold pipeline: runScaffoldThirdParty, defaultStatePath,
    ScaffoldThirdPartyError, ScaffoldThirdPartyOptions/Result/Status

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T-AP: active-pack boundary loader

`src/core/schema-pack/load-active.ts` is the boundary helper Phase B
consumers call from operations.ts + engines + cycle handlers. It
composes:
  1. 7-tier resolution chain (registry.resolveActivePackName)
  2. Disk-backed pack manifest loading
     - gbrain-base from bundled src/core/schema-pack/base/gbrain-base.yaml
     - User packs from ~/.gbrain/schema-packs/<name>/pack.{yaml,yml,json}
  3. extends-chain resolution + alias-graph build (registry.resolvePack)

Returns a `ResolvedPack` with stable pack identity
(`<name>@<version>+<manifest_sha8>`). In-process cached by identity;
cache invalidated by manifest content change.

Trust gate: per-call schema_pack opt (tier 1) is honored ONLY when
`remote === false`. Operations.ts handles the explicit
permission_denied rejection for remote callers BEFORE invoking this
helper (T8). This loader assumes the input is already-vetted.

Test seam: `__setPackLocatorForTests(locator)` lets tests inject
synthetic packs without writing to ~/.gbrain. Paired
`_resetPackLocatorForTests` in afterAll prevents leak across files.
`resolveActivePackNameOnly` returns just the name + tier source for
`gbrain schema active` provenance display without paying the load cost.

config.ts: GBrainConfig gains `schema_pack?: string` (tier-6 file-plane
field). Edit ~/.gbrain/config.json directly; tier 4 (`gbrain config
set schema_pack <name>`) writes the DB plane and beats the file.

Test: 9 cases covering default-tier-7 gbrain-base load, tier-1
per-call resolution, tier-1 trust gate rejection on remote=true,
tier-2 GBRAIN_SCHEMA_PACK env override (via withEnv()), tier-3
per-source DB config priority, UnknownPackError when pack missing,
injected locator end-to-end with a tempfile-backed pack, identity
stability across reloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T8 + D13: schema_pack per-call trust gate

`src/core/schema-pack/op-trust-gate.ts` is the operations-layer
defense for the per-call `schema_pack` opt (tier 1 of the 7-tier
resolution chain in registry.ts). D13 + codex F4 — remote/MCP callers
passing `schema_pack` even with read+write scope could broaden their
effective read closure or escape strict-mode validation. The
v0.26.9 + v0.34.1.0 trust-boundary hardening waves explicitly closed
this attack class for source_id; v0.38 re-applies the same posture.

Two exports:
  validateSchemaPackTrustGate(ctx, schemaPackParam) — pure validator
    that returns the validated pack name or undefined; throws
    SchemaPackTrustGateError (code: 'permission_denied') on:
      - ctx.remote !== false AND schemaPackParam is set (fail-closed)
      - schemaPackParam is non-string + non-null/undefined
    Op handlers call this once at entry against their declared params.

  loadActivePackForOp(ctx, params) — convenience wrapper that does
    the trust gate AND loads the resolved active pack in one call.
    Threads sourceId from sourceScopeOpts(ctx) into the resolution.
    Returns ResolvedPack.

Fail-closed default per v0.26.9 F7b: `ctx.remote === undefined` is
treated as remote/untrusted. Only the literal `false` is the CLI
escape hatch. Casts via `as any` or `Partial<>` spreads can't downgrade
trust by accident.
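
The validator contract above can be sketched like this. It is an illustration under assumptions: the `OpContext` shape, the error messages, and the order of the two checks are not taken from op-trust-gate.ts. The function and error names mirror the ones in the commit text.

```typescript
// Hypothetical sketch of the validator described above; the real module may differ.
class SchemaPackTrustGateError extends Error {
  readonly code = "permission_denied";
}

interface OpContext {
  remote?: boolean; // only the literal `false` marks a trusted CLI caller
}

function validateSchemaPackTrustGate(
  ctx: OpContext,
  schemaPackParam: unknown,
): string | undefined {
  // undefined/null per-call value is a no-op for every caller
  if (schemaPackParam === undefined || schemaPackParam === null) return undefined;

  if (typeof schemaPackParam !== "string") {
    throw new SchemaPackTrustGateError("schema_pack must be a string");
  }

  // Fail closed: `undefined` is treated the same as `true`.
  if (ctx.remote !== false) {
    throw new SchemaPackTrustGateError(
      "per-call schema_pack is not allowed for remote callers",
    );
  }
  return schemaPackParam;
}
```

Comparing against the literal `false` rather than testing truthiness is what keeps a missing `remote` field from silently granting trust.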

Test (test/schema-pack-trust-boundary.test.ts, 8 cases):
  - CLI (remote=false) accepts per-call freely
  - MCP (remote=true) rejects with SchemaPackTrustGateError
  - Fail-closed: undefined remote rejects
  - undefined/null per-call is a no-op (returns undefined)
  - Non-string per-call rejects with type error
  - Error envelope carries `code: 'permission_denied'` for the
    dispatch layer to surface uniformly
  - Error message names ALL safe channels (gbrain.yml,
    GBRAIN_SCHEMA_PACK env, ~/.gbrain/config.json, `gbrain config
    set schema_pack`) so an MCP operator can self-serve.

The wider op-handler wiring (each query/search/list_pages/find_experts/
traverse/put_page handler calling loadActivePackForOp + threading the
pack into engine queries) lands in T6/T7 alongside the per-source CTE
and inferType refactors. T8 lands the trust gate primitive in
isolation so future handler-by-handler wiring stays mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T7a: pack-aware inferType + gbrain-base.yaml priority reorder

`inferTypeFromPack(filePath, manifest)` is the new pack-aware path →
type primitive. Async/import-aware callers (import-file.ts, sync.ts,
cycle phases) can switch to this variant in subsequent commits to
honor user-declared types in their active pack. Existing
`inferType(filePath)` stays as a synchronous wrapper around the
GBRAIN_BASE_PATH_PREFIXES table that mirrors gbrain-base.yaml exactly.

Caught a real parity bug in gbrain-base.yaml: the YAML emitted page
types in ALL_PAGE_TYPES order, but pre-v0.38 inferType ran in a
SPECIFIC PRIORITY ORDER. `projects/blog/writing/essay.md` should
resolve to `writing` (writing/ wins over projects/ as a stronger
signal), but pack-driven iteration in ALL_PAGE_TYPES order returned
`project` first. Reorder gbrain-base.yaml so the priority chain
preserves pre-v0.38 behavior:

  1. writing → wiki/{analysis,guides,hardware,architecture} → concept
     (wiki subtypes scan FIRST; stronger signal than ancestor dirs)
  2. Ancestor entities: person/company/deal/yc/civic/project/source/media
  3. BrainBench v1 amara-life-v1 corpus: email/slack/calendar-event/note/meeting
  4. No-prefix types (set via frontmatter): code/image/synthesis

Parity is now CI-pinned by test/infer-type-pack.test.ts which:
  - asserts inferTypeFromPack(path, gbrain-base) matches parseMarkdown's
    legacy type inference for 21 representative paths
  - verifies a synthetic research pack with `researchers/` + `papers/`
    routes correctly to user-declared types
  - verifies empty `page_types` arrays fall back to gbrain-base defaults
  - covers undefined filePath + case-insensitive matching

gbrain-base-equivalence.test.ts continues to pass (the path-prefix
spot-checks didn't care about ordering — they just verified each
mapping exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T7b: pack-aware inferLinkType + frontmatter_link primitives

`src/core/schema-pack/link-inference.ts` adds two new primitives:
  inferLinkTypeFromPack(pack, pageType, context, budget?)
  frontmatterLinkTypeFromPack(pack, pageType, fieldName)

Pre-v0.38 `inferLinkType` (in link-extraction.ts) uses richly tuned
production regexes (FOUNDED_RE / INVESTED_RE / ADVISES_RE / WORKS_AT_RE
+ page-role priors) refined against real brain content. Reproducing
those literally in gbrain-base.yaml would require multi-line YAML
escape jujitsu and lose the WHY comments. Pragmatic split:

  - gbrain-base.yaml carries verb NAMES + simplified sketch regexes.
    Community-pack authors copy this pattern; gbrain-base provides
    documentation-grade examples.
  - Production matching for built-in verbs stays in link-extraction.ts
    via the rich FOUNDED_RE / INVESTED_RE / ... constants. Legacy
    `inferLinkType` continues to work exactly as before.
  - `inferLinkTypeFromPack` CONSULTS pack-declared verbs in addition
    to legacy. Pack matches win (user opts in deliberately); fall
    through to legacy `inferLinkType` when no pack rule fires.

Resolution order in inferLinkTypeFromPack:
  1. Page-type-bound verbs from pack (meeting → attended,
     image → image_of). Declared via inference.page_type.
  2. Pack-declared regex matchers, in manifest declaration order
     (first match wins). Runs under PageRegexBudget when one is
     passed — cumulative regex time on the page stays capped at
     LINK_EXTRACTION_TOTAL_BUDGET_MS (500ms) per E9.
  3. Returns null on no match — caller falls through to legacy
     `inferLinkType` for built-in matchers (founded / invested_in /
     advises / works_at + person→company priors).
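
The resolution order above, sketched under assumptions: the `link_types` manifest shape, the case-insensitive flag, and the one-regex legacy stand-in are hypothetical, and the optional budget argument is left out for brevity.

```typescript
// Hypothetical sketch of the three-step resolution order; not the real link-inference.ts.
interface LinkTypeRule {
  name: string;
  inference: { page_type?: string; regex?: string };
}
interface PackSketch {
  link_types: LinkTypeRule[];
}

/** Compiles a pack regex, returning null instead of throwing on malformed input. */
function compile(src: string): RegExp | null {
  try {
    return new RegExp(src, "i");
  } catch {
    return null;
  }
}

function inferLinkTypeFromPackSketch(
  pack: PackSketch,
  pageType: string,
  context: string,
): string | null {
  // 1. Page-type-bound verbs (meeting -> attended, image -> image_of).
  for (const lt of pack.link_types) {
    if (lt.inference.page_type === pageType) return lt.name;
  }
  // 2. Regex matchers in declaration order; malformed patterns are skipped.
  for (const lt of pack.link_types) {
    if (!lt.inference.regex) continue;
    const re = compile(lt.inference.regex);
    if (re && re.test(context)) return lt.name;
  }
  // 3. No pack rule fired: the caller falls through to legacy.
  return null;
}

// Stand-in for the legacy matcher; the production regexes are far richer.
function legacyInferLinkType(context: string): string {
  return /\bfounded\b/i.test(context) ? "founded" : "mentions";
}

function resolveLinkType(pack: PackSketch, pageType: string, context: string): string {
  return inferLinkTypeFromPackSketch(pack, pageType, context) ?? legacyInferLinkType(context);
}

const researchPack: PackSketch = {
  link_types: [
    { name: "attended", inference: { page_type: "meeting" } },
    { name: "broken", inference: { regex: "(" } },
    { name: "supports", inference: { regex: "\\bsupports\\b" } },
    { name: "cites", inference: { regex: "\\bcites\\b|\\bsupports\\b" } },
  ],
};
```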

Malformed regex in a pack returns null gracefully (skip + continue
to next link_type) — defense in depth on top of load-time validation.

frontmatterLinkTypeFromPack mirrors the legacy FRONTMATTER_LINK_MAP
walk: iterates pack.frontmatter_links in declaration order; first
(page_type, field) match wins; returns null on no match.

Test (test/link-inference-pack.test.ts, 10 cases):
  - meeting → attended via page_type binding
  - image → image_of via page_type binding
  - regex matchers: supports / weakens / cites
  - returns null when no rule fires (caller fall-through contract)
  - declaration order: first match wins
  - PageRegexBudget integration (regex time accounted toward cap)
  - legacy inferLinkType still resolves founded / invested_in /
    advises independently (pack-aware path doesn't break legacy)
  - malformed regex returns null gracefully
  - frontmatterLinkTypeFromPack: person:company → works_at,
    meeting:attendees → attended, plus negative cases

Phase B follow-up: callers in extract.ts / sync.ts / cycle phases
that want to honor user-declared verbs call inferLinkTypeFromPack
first then inferLinkType. T7b lands the primitive; per-call-site
adoption is mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T_W: pack-driven expert types for whoknows / find_experts

`expertTypesFromPack(pack)` returns the list of pack-declared types
with `expert_routing: true`, in manifest declaration order. Replaces
the pre-v0.38 hardcoded `DEFAULT_TYPES = ['person', 'company']` in
whoknows.ts:89 (and the matching ['person','company'] literals in
postgres-engine.ts:3451+3482 and pglite-engine.ts:3489+3523 — codex
finding #3's named sites).

gbrain-base preserves person + company as expert_routing defaults, so
existing whoknows behavior is byte-for-byte unchanged. Research packs
declaring `researcher` + `principal-investigator` with
`expert_routing: true` get those types routed automatically.

Two variants:
  expertTypesFromPack(pack) — returns array, possibly empty
  expertTypesFromPackOrThrow(pack) — throws clear error on empty so
    the whoknows CLI entrypoint surfaces "this pack doesn't support
    expert routing — switch packs or edit the manifest" instead of
    silently returning zero results
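
A minimal sketch of the two variants, assuming a manifest shape where each page type carries an optional `expert_routing` flag. The `PackManifestSketch` interface and the error wording are illustrative only.

```typescript
// Hypothetical manifest shape; only the expert_routing flag comes from the commit text.
interface PageTypeDecl {
  name: string;
  expert_routing?: boolean;
}
interface PackManifestSketch {
  name: string;
  page_types: PageTypeDecl[];
}

/** Returns expert-routed types in manifest declaration order (not sorted). */
function expertTypesFromPack(pack: PackManifestSketch): string[] {
  return pack.page_types
    .filter((t) => t.expert_routing === true)
    .map((t) => t.name);
}

/** Same, but an empty result is surfaced as an error instead of zero results. */
function expertTypesFromPackOrThrow(pack: PackManifestSketch): string[] {
  const types = expertTypesFromPack(pack);
  if (types.length === 0) {
    throw new Error(
      `pack "${pack.name}" declares no expert_routing types: switch packs or edit the manifest`,
    );
  }
  return types;
}

const basePack: PackManifestSketch = {
  name: "gbrain-base",
  page_types: [
    { name: "person", expert_routing: true },
    { name: "company", expert_routing: true },
    { name: "note" },
  ],
};
```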

Test (test/expert-types-pack.test.ts, 6 cases):
  - gbrain-base parity: returns [person, company]
  - Research pack: returns researcher + principal-investigator
  - Declaration order preserved (NOT sorted)
  - Empty array when no expert_routing types declared
  - OrThrow variant throws on empty with paste-ready hint
  - OrThrow variant passes when types exist

Phase B follow-up wires whoknows.ts + postgres-engine + pglite-engine
to call expertTypesFromPack(activePack) instead of the hardcoded
DEFAULT_TYPES literal. T_W lands the primitive in isolation; per-call-
site adoption is mechanical and per-engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T7d: pack-driven facts extractable types + gbrain-base.yaml fix

Adds `extractableTypesFromPack(pack)` + `isExtractableType(pack, type)`
primitives. Replaces the hardcoded ELIGIBLE_TYPES list at
src/core/facts/eligibility.ts:51 with pack-driven lookup. gbrain-base
preserves the exact 7 legacy types — note, meeting, slack, email,
calendar-event, source, writing — so existing facts extraction
behavior is byte-for-byte unchanged.

Also fixes gbrain-base.yaml extractable flags. The original codegen
emitted incorrect defaults (person/company/deal marked extractable,
note/slack/email/calendar-event/source/writing marked NOT extractable).
Adjusted to match the legacy ELIGIBLE_TYPES list exactly:
  - writing: true (was false)
  - source: true (was false)
  - email: true (was false)
  - slack: true (was false)
  - calendar-event: true (was false)
  - note: true (was false)
  - meeting: true (was already true)
  - person/company/deal: false (entities, not facts-eligible content)

Tests (test/extractable-pack.test.ts, 4 cases):
  - gbrain-base extractable Set exactly matches legacy 7 types
  - Per-type isExtractableType lookups parity
  - research-state pack with paper + claim + finding extractable
  - Empty page_types returns empty Set

Phase B follow-up wires facts/eligibility.ts to call
extractableTypesFromPack(activePack) instead of the hardcoded
ELIGIBLE_TYPES literal. T7d lands the primitive in isolation; per-call-
site adoption is mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 T_E: pack-driven enrichable types + rubric routing

`enrichableTypesFromPack(pack)` + `rubricNameForType(pack, type)` primitives
replace the hardcoded ['person', 'company', 'deal'] in
src/core/enrichment-service.ts:25 + src/core/enrichment/completeness.ts:221
RUBRICS_BY_TYPE map.

gbrain-base preserves person + company + deal as enrichable defaults
with rubric slots person-default / company-default / deal-default —
existing enrichment behavior unchanged. Custom packs (research-state,
legal, product) override with domain-specific entities.

Design note: the pack manifest declares rubric NAMES, not rubric
BODIES. Rubric implementations stay in-source at
src/core/enrichment/completeness.ts where they're authored
deterministically. Serializing rubric structure into YAML would
require multi-page schemas; the name-to-implementation indirection
keeps the YAML manifest small and rubric authoring stays where
linters + tests already cover it.
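
The name-to-implementation indirection could look roughly like this. Everything except the `person-default` slot name is assumed for illustration: the manifest shape, the `Rubric` type, the `scorePage` helper, and the toy rubric body are not the contents of completeness.ts.

```typescript
// Hypothetical sketch: the manifest names a rubric slot, the source code owns the body.
type Rubric = (page: { fields: string[] }) => number;

// Made-up rubric body, standing in for the in-source implementations.
const RUBRICS_BY_NAME: Record<string, Rubric> = {
  "person-default": (page) => (page.fields.includes("role") ? 1 : 0),
};

interface EnrichableDecl {
  name: string;
  enrichable?: boolean;
  rubric?: string;
}
interface EnrichPackSketch {
  page_types: EnrichableDecl[];
}

/** Returns the declared rubric slot name, or null for non-enrichable types. */
function rubricNameForType(pack: EnrichPackSketch, type: string): string | null {
  const decl = pack.page_types.find((t) => t.name === type);
  return decl?.enrichable && decl.rubric ? decl.rubric : null;
}

/** Resolves the slot name to an in-source rubric and scores the page. */
function scorePage(
  pack: EnrichPackSketch,
  type: string,
  page: { fields: string[] },
): number | null {
  const name = rubricNameForType(pack, type);
  const rubric = name === null ? undefined : RUBRICS_BY_NAME[name];
  return rubric ? rubric(page) : null;
}

const enrichBase: EnrichPackSketch = {
  page_types: [
    { name: "person", enrichable: true, rubric: "person-default" },
    { name: "note" },
  ],
};
```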

Test (test/enrichable-pack.test.ts, 4 cases):
  - gbrain-base parity: person + company + deal enrichable
  - rubricNameForType returns the declared slot name
  - returns null for non-enrichable types
  - custom research pack overrides cleanly

Phase B follow-up wires enrichment-service + completeness.ts to call
enrichableTypesFromPack(activePack) instead of the hardcoded literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.38 Phase C: gbrain schema CLI (active|list|show|validate|use)

User-facing CLI surface that exposes the v0.38 schema-pack engine.
Five essential subcommands ship in v0.38:

  gbrain schema active                Show resolved pack + tier source
  gbrain schema list                  List bundled + installed packs
  gbrain schema show [<pack>]         Pretty-print manifest (default: active)
  gbrain schema validate [<pack>]     Validate manifest shape
  gbrain schema use <pack>            Activate pack (file-plane, tier 6)

Deferred to v0.39+ (mechanical follow-up — primitives are in place):
  init, fork, edit, diff, detect, suggest, review-candidates,
  review-orphans, graph, lint, explain

`gbrain schema use <name>` writes to ~/.gbrain/config.json's schema_pack
field (tier 6 in the 7-tier resolution chain). DB-plane tier 4
(`gbrain config set schema_pack <name>`) and env tier 2
(GBRAIN_SCHEMA_PACK) still beat tier 6 for runtime overrides without
editing the file.
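
As a sketch of the precedence rule only: lower tier numbers win, and gbrain-base is the tier-7 default. The `TierInput` shape and `resolveTierSketch` name are hypothetical, and tier 5 is omitted because this section does not describe it.

```typescript
// Hypothetical sketch of "lowest tier number with a value wins".
interface TierInput {
  tier: number;
  source: string;
  value: string | undefined;
}
interface Resolved {
  name: string;
  tier: number;
  source: string;
}

function resolveTierSketch(inputs: TierInput[]): Resolved {
  const ordered = [...inputs].sort((a, b) => a.tier - b.tier);
  for (const input of ordered) {
    if (input.value) {
      return { name: input.value, tier: input.tier, source: input.source };
    }
  }
  // Nothing set anywhere: fall back to the bundled default.
  return { name: "gbrain-base", tier: 7, source: "default" };
}

// `gbrain schema use garry-pack` wrote the file plane, but the env var is also set.
const resolved = resolveTierSketch([
  { tier: 6, source: "~/.gbrain/config.json", value: "garry-pack" },
  { tier: 4, source: "gbrain config set schema_pack", value: undefined },
  { tier: 2, source: "GBRAIN_SCHEMA_PACK", value: "research-state" },
]);
```

Here the env value (tier 2) beats the file-plane value (tier 6), matching the override behavior described above.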

Dispatch lives in handleCliOnly (no engine connect needed — schema
commands are pure file I/O). Added 'schema' to CLI_ONLY allowlist
so the dispatcher doesn't reject it.

The `use` path runs validation BEFORE writing — refuses to activate
a malformed pack. The `show` and `validate` commands accept either an
explicit pack name or default to the active pack.

Test (test/schema-cli.test.ts, 8 cases via Bun subprocess):
  - list shows bundled gbrain-base
  - show gbrain-base prints 22 page types + 12 link verbs + takes_kinds
  - validate gbrain-base passes
  - active reports default resolution + pack identity
  - unknown pack errors with paste-ready hint
  - unknown subcommand exits 2 with usage hint
  - `schema use` without arg shows usage

End-to-end smoke against the real bundled gbrain-base.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.38.0.0)

v0.38.0.0 — Schema Packs: Bring Your Own Shape

PageType opens from closed 23-element union to `string`. Schema packs
declare your domain (types, link verbs, expert routing, facts
eligibility, enrichment rubrics) and the engine consults the active
pack instead of hardcoded literals.

Phase A (engine flex foundation) + Phase B foundational primitives +
Phase C minimal CLI surface, all landed as 16 atomic bisect-friendly
commits. 95+ new tests across 12 test files. Existing brains see
zero change after upgrade (gbrain-base reproduces pre-v0.38 behavior
byte-for-byte).

16 decisions locked through CEO + Eng + 3x Outside Voice review.
58 codex findings folded.

Phase B per-call-site wiring, Phase C CLI follow-ups (detect/suggest/
init/fork/diff/graph/lint/explain), and Phase D (7 example packs +
distribution + docs) follow in subsequent waves. Primitives are in
place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brainstorm): T1 cost guardrails + judge chunking + far-set cap

Ports PR #1234 with a typed-error swap (Q2). Brings:

- `--max-cost`, `--max-far-set`, `--strict-budget`, `--judge-model`,
  `--max-ideas-per-judge-call` CLI flags on `gbrain brainstorm` / `lsd`
- Domain-bank prefix-cap + shuffle + final-trim to `m` by distance score
- Judge auto-chunks idea sets > 100 across multiple LLM calls
- UTF-16 surrogate sanitization on cross prompts
- Phase-0.5 hard cost ceiling + mid-run cost guard
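
Two of the fixes listed above are small enough to sketch directly. `chunk` and `stripUnpairedSurrogates` are hypothetical helper names, not the functions in judges.ts or orchestrator.ts; the batch size of 100 and the 15,868-idea figure come from the incident description.

```typescript
// Hypothetical sketches of judge chunking and surrogate sanitization.

/** Splits items into consecutive batches of at most `size`. */
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

/**
 * Removes UTF-16 surrogate code units that are not part of a valid pair:
 * a high surrogate not followed by a low one, or a low surrogate not
 * preceded by a high one. Valid pairs (e.g. emoji) are left intact.
 */
function stripUnpairedSurrogates(text: string): string {
  return text.replace(
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    "",
  );
}

const ideas = Array.from({ length: 15868 }, (_, i) => i);
const batches = chunk(ideas, 100);
```

Each batch would then go to the judge as its own call, with the per-batch results concatenated afterward.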

Phase-1 diff from PR #1234: per-cross error-rethrow uses inline typed
`BudgetExhausted` instead of string-match on the error message. Phase 2
of the wave will move the class to `src/core/budget/budget-tracker.ts`
and the orchestrator will import it.

Postmortem doc + 12-case regression test included verbatim from #1234.

T1 of the brainstorm cost cathedral plan
(~/.claude/plans/system-instruction-you-are-working-rippling-moth.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(budget): T2 BudgetTracker + BudgetExhausted + audit-week helper

The keystone primitive for the v0.37.x budget cathedral. One class,
one typed error, one schema-stable audit JSONL. Replaces three parallel
copies (brainstorm orchestrator inline class, cycle/budget-meter,
eval-contradictions cost-prompt/tracker) — those adapt to this one in
T5/T6.

Contracts pinned by 26 unit tests:
  - TX1: record() throws BudgetExhausted(reason:'cost') when cumulative
    spend > cap. A single underestimated call cannot leak past the cap.
  - TX2: reserve() hard-fails with BudgetExhausted(reason:'no_pricing')
    when cap is set + model is missing from pricing maps. When cap is
    unset, legacy warn-once behavior is preserved.
  - A3 amended: extractUsageFromError(err, fallback) returns err.usage
    when SDK provides it, else the pessimistic fallback (caller passes
    maxOutputTokens, not the optimistic pre-call estimate).
  - onExhausted callback fires once, synchronously, before the throw
    propagates. Callbacks do sync I/O (writeFileSync) for checkpoint
    persistence.
  - Audit JSONL is schema-stable: every line carries schema_version=1.
    Reorderings tolerated, field renames are breaking.
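
A compact sketch of the TX1/TX2 contracts and the once-only callback. It is illustrative only: the option names, the per-token pricing map, and the method signatures are assumptions, not the API in src/core/budget/budget-tracker.ts.

```typescript
// Hypothetical sketch of the contracts above; signatures are assumed.
type ExhaustedReason = "cost" | "no_pricing";

class BudgetExhausted extends Error {
  readonly reason: ExhaustedReason;
  constructor(reason: ExhaustedReason, message: string) {
    super(message);
    this.name = "BudgetExhausted";
    this.reason = reason;
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working
  }
}

interface TrackerOptions {
  maxCostUsd?: number;
  pricingUsdPerToken: Record<string, number>;
  onExhausted?: (err: BudgetExhausted) => void;
}

class BudgetTrackerSketch {
  private spentUsd = 0;
  private fired = false;
  private readonly opts: TrackerOptions;

  constructor(opts: TrackerOptions) {
    this.opts = opts;
  }

  /** Pre-call check: throws before any provider call would be made. */
  reserve(model: string, estimatedTokens: number): void {
    const cap = this.opts.maxCostUsd;
    const price = this.opts.pricingUsdPerToken[model];
    if (price === undefined) {
      if (cap !== undefined) this.fail("no_pricing", `no pricing for ${model}`); // TX2
      return; // no cap: legacy warn-once path (warning omitted here)
    }
    if (cap !== undefined && this.spentUsd + estimatedTokens * price > cap) {
      this.fail("cost", "estimated call would exceed cap");
    }
  }

  /** Post-call accounting: throws once cumulative spend passes the cap (TX1). */
  record(model: string, actualTokens: number): void {
    this.spentUsd += actualTokens * (this.opts.pricingUsdPerToken[model] ?? 0);
    const cap = this.opts.maxCostUsd;
    if (cap !== undefined && this.spentUsd > cap) {
      this.fail("cost", `spent $${this.spentUsd.toFixed(2)} > cap $${cap}`);
    }
  }

  private fail(reason: ExhaustedReason, message: string): never {
    const err = new BudgetExhausted(reason, message);
    if (!this.fired) {
      this.fired = true;
      this.opts.onExhausted?.(err); // synchronous, before the throw propagates
    }
    throw err;
  }
}
```

Because the error is typed, callers can branch on `err instanceof BudgetExhausted` and `err.reason` rather than string-matching the message.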

Also ships src/core/audit-week-file.ts — the shared ISO-week filename
helper consumed by every audit writer in T4. Year-boundary correctness
pinned by 5 cases including 2020-W53 (the 53-week year), 2025-W01
rolling in from 2024-12-30 (Monday), and the GBRAIN_AUDIT_DIR override.
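
For reference, the ISO-week math can be sketched with the standard Thursday-anchor rule. This assumes UTC and a `<prefix>-YYYY-Www.jsonl` naming pattern; `isoWeekFilenameSketch` is a hypothetical name, and the GBRAIN_AUDIT_DIR override is not modeled.

```typescript
// Hypothetical sketch of ISO-8601 week numbering in UTC (Thursday-anchor rule).
function isoWeek(date: Date): { year: number; week: number } {
  // Normalize to UTC midnight so local time zones cannot shift the day.
  const d = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()),
  );
  const dayNum = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  // Move to the Thursday of this ISO week; its calendar year is the ISO year.
  d.setUTCDate(d.getUTCDate() + 4 - dayNum);
  const year = d.getUTCFullYear();
  const yearStart = Date.UTC(year, 0, 1);
  const dayOfYear = (d.getTime() - yearStart) / 86_400_000 + 1;
  return { year, week: Math.ceil(dayOfYear / 7) };
}

function isoWeekFilenameSketch(prefix: string, date: Date): string {
  const { year, week } = isoWeek(date);
  return `${prefix}-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

const rollover = isoWeekFilenameSketch("shell-jobs", new Date("2024-12-30T00:00:00Z"));
const longYear = isoWeekFilenameSketch("shell-jobs", new Date("2020-12-31T12:00:00Z"));
```

Monday 2024-12-30 lands in 2025-W01 because the Thursday of that week falls in 2025, which is the kind of boundary a local-time or approximated formula tends to get wrong.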

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): T3 withBudgetTracker + AsyncLocalStorage composition

TX5: every gateway.chat / embed / rerank call now auto-composes the
active BudgetTracker via a module-internal AsyncLocalStorage. No
per-call injection seam, no flag plumbing — callers wrap their
entrypoint in `withBudgetTracker(tracker, async () => { ... })` and
every downstream LLM call honors the cap.

Outside any scope, the gateway is a budget no-op (back-compat with the
pre-v0.37 contract).
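
The composition pattern can be sketched with Node's `AsyncLocalStorage`. This is a toy, not the gateway code: the `Tracker` shape and the `chat` stub are hypothetical, and a real call would reserve, hit the provider, then record.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical tracker shape: just counts calls made inside its scope.
interface Tracker {
  calls: number;
}

// Module-internal storage: callers never pass the tracker explicitly.
const trackerScope = new AsyncLocalStorage<Tracker>();

/** Runs `fn` with `tracker` visible to every async call made inside it. */
function withBudgetTracker<T>(tracker: Tracker, fn: () => Promise<T>): Promise<T> {
  return trackerScope.run(tracker, fn);
}

/** Stub for a gateway call; picks up the active tracker implicitly. */
async function chat(prompt: string): Promise<string> {
  const tracker = trackerScope.getStore(); // undefined outside any scope
  if (tracker) tracker.calls += 1; // real code would reserve() then record()
  await Promise.resolve(); // simulate an async provider round-trip
  return `echo:${prompt}`;
}
```

Two scopes run in parallel via `Promise.all` each see only their own tracker, and a `chat` call made outside any scope touches no tracker at all.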

Wiring:
  - chat(): reserves on entry using prompt-char heuristic + opts.maxTokens.
    Records actual usage from result.usage on success; on failure, charges
    the pessimistic A3-amended fallback so the cap is real.
  - embed(): reserves total estimated input tokens (chars / chars-per-token).
    Records the same total in try/finally; SDK doesn't surface per-batch
    embed token counts.
  - rerank(): reserves and records query + docs char count.
    Reranker pricing isn't in the canonical map yet, so reserve() takes
    the warn-once path under no-cap and the TX2 hard-fail under cap.

6 unit cases pin the contract: chat auto-composes, outside-scope is
no-op, nested scope restores outer, over-cap reserve throws BEFORE
provider call (proves circuit breaker), TX1 mid-run cumulative cap
fires via record(), parallel Promise.all scopes do not bleed trackers.

All 255 existing gateway tests and 50 brainstorm tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(audit): T4 migrate 4 audit writers to shared isoWeekFilename helper

Q1: extract the ISO-week filename math into one canonical helper
(src/core/audit-week-file.ts, landed in T2) and migrate every audit
JSONL writer in the codebase to consume it.

Sites migrated:
  - src/core/minions/handlers/shell-audit.ts  (shell-jobs-YYYY-Www.jsonl)
  - src/core/facts/phantom-audit.ts            (phantoms-YYYY-Www.jsonl)
  - src/core/audit-slug-fallback.ts            (slug-fallback-YYYY-Www.jsonl)
  - src/core/cycle/budget-meter.ts             (dream-budget-YYYY-Www.jsonl)

Each call site had its own copy of the ISO-week-from-Date algorithm.
They mostly agreed but subtle drift was already accumulating (one used
local time, one approximated the Thursday-anchor formula, etc.). One
helper, one set of regression tests, no drift.

Compute helpers (computeAuditFilename, computePhantomAuditFilename,
computeSlugFallbackAuditFilename) are preserved as thin wrappers so
existing import sites and tests don't break.

All audit + slug-fallback + phantom + budget-meter tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cycle): T5 BudgetMeter schema_version=1 + golden fixture (A2 amended)

Adapter pass: the existing BudgetMeter keeps its public shape
(`BudgetMeter`, `SubmitEstimate`, `BudgetCheckResult`) verbatim so every
dream-cycle call site keeps working without rewires. The audit JSONL
grew one new field on every line: `schema_version: 1`.

A2 amended: the codex outside-voice review relaxed the byte-stable
contract to schema-stable. Field reorderings are tolerated; the
documented set (schema_version, ts, phase, event, model, label,
plus per-event cost or token fields) is what every consumer can rely
on. Renames or removals are breaking.

test/fixtures/dream-budget-schema-v1.jsonl carries one canonical row
per event variant (submit / submit_denied / submit_unpriced) as
documentation of the schema. The new in-suite case in
test/budget-meter.test.ts walks every emitted line and asserts the
fields are present + the right type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): T6 wrap eval-contradictions runner in withBudgetTracker

The runner now installs a BudgetTracker scope around its body so every
gateway-layer chat / embed / rerank call (the judge model + per-query
embedding) auto-records via the AsyncLocalStorage from T3. Currently
telemetry-only — the existing CostTracker remains the primary soft-
ceiling enforcement, so the public --budget-usd surface and
PreFlightBudgetError shape are byte-identical.

The wiring is the seam: future waves can promote the cap to BudgetTracker
semantics (TX1 cumulative cap + TX2 no_pricing hard-fail) by passing
maxCostUsd through to BudgetTracker without touching the CLI.

All 79 eval-contradictions tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): T7 --remediate budget tracker + checkpoint + --resume (A4)

A4 amended: doctor --remediate gains a resumable cost ceiling. The
runRemediate loop now runs inside `withBudgetTracker(tracker, ...)` so
every gateway-routed LLM call inside a Minion handler (synthesize,
patterns, consolidate, embed) honors the cap. When BudgetExhausted
fires mid-run, the onExhausted callback persists a checkpoint of
completed step ids + idempotency_keys to
~/.gbrain/remediation/<plan_hash>.json BEFORE the throw propagates,
and the catch surfaces a paste-ready --resume hint.

Wire-up:
  - New --resume <plan_hash> flag (with implicit "most recent matching"
    when no hash given) loads the checkpoint and skips already-
    completed steps. Mismatched plan_hash refuses with an explicit
    message.
  - --max-cost is now an alias for --max-usd. Both spellings honored
    and threaded through to BudgetTracker.maxCostUsd so the cap is
    a real ceiling, not just pre-flight advice.
  - On BudgetExhausted, exit 1 with the resume hint; on clean
    completion, clear the checkpoint.

New file: src/core/remediation-checkpoint.ts with
computePlanHash / save / load / list / clear helpers. Atomic write
via .tmp + rename. Pinned by 13 unit cases including determinism +
sort-order invariance + schema-mismatch return-null + atomic-rename.

All 48 doctor.test.ts cases still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(subagent): T8 A1 ordering ASCII diagram before acquireLease

Documents the load-bearing ordering invariant: the gateway's
BudgetTracker reserve() runs (implicitly, via AsyncLocalStorage)
BEFORE acquireLease() inside the subagent loop. A BudgetExhausted
throw must NOT consume a rate-lease slot, because the lease is the
rate-limit pacer for the entire fleet.

The handler body intentionally does NOT explicitly thread BudgetTracker;
TX5 (gateway-layer composition) handles that. The comment is the
reader's signpost.
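
The ordering invariant can be illustrated with a small synchronous sketch. Everything here is a stand-in: `BudgetTrackerSketch`, `trackerScope`, `acquireLeaseSketch`, and `gatewayCallSketch` are hypothetical names, and the real gateway composes its tracker through its own AsyncLocalStorage scope:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

class BudgetExhaustedSketch extends Error {}

// Stand-in tracker: reserve() throws before any spend is recorded
// when the estimate would push the total past the cap.
class BudgetTrackerSketch {
  maxCostUsd: number;
  reservedUsd = 0;
  constructor(maxCostUsd: number) {
    this.maxCostUsd = maxCostUsd;
  }
  reserve(estimateUsd: number): void {
    if (this.reservedUsd + estimateUsd > this.maxCostUsd) {
      throw new BudgetExhaustedSketch(`cap of $${this.maxCostUsd} would be exceeded`);
    }
    this.reservedUsd += estimateUsd;
  }
}

// Ambient scope: callers never pass the tracker explicitly.
const trackerScope = new AsyncLocalStorage<BudgetTrackerSketch>();

// Counts rate-lease slots consumed, standing in for the fleet-wide pacer.
let leasesAcquired = 0;
function acquireLeaseSketch(): void {
  leasesAcquired += 1;
}

function gatewayCallSketch(estimateUsd: number): void {
  // 1. Budget check first: a throw here leaves the lease count untouched.
  const tracker = trackerScope.getStore();
  if (tracker) tracker.reserve(estimateUsd);
  // 2. Only a call that fits the budget takes a rate-lease slot.
  acquireLeaseSketch();
}
```

With a $1.00 cap, a first call estimated at $0.60 takes a lease; a second $0.60 call throws before `acquireLeaseSketch` runs, so the slot stays free for other workers.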

No behavioral change. All 58 subagent tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(diarize): T9 payload-fitter (P6) with batch + summarize + gate

Generic utility for fitting arbitrarily-large item lists into a
downstream caller's per-call token budget. Two strategies:

  - 'batch': deterministic token-budgeted chunking. No LLM calls. The
    fitted list shape matches the input; the caller decides how to
    consume it (e.g. brainstorm judge concatenates per-chunk results).
    Surfaces a `dropped` count for items that exceed the per-call cap.

  - 'summarize': embed-cluster into ceil(items/4) groups via cheap
    deterministic nearest-neighbor on cosine; Haiku-summarize each
    cluster via Promise.allSettled at parallelism=4 (Perf1). Each
    Haiku call composes the active BudgetTracker via the gateway's
    AsyncLocalStorage scope (T3) — no per-call injection.
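
A minimal sketch of the 'batch' strategy, assuming a crude 4-characters-per-token estimator. `fitBatchSketch` and `estimateTokensSketch` are illustrative names, not the payload-fitter's API:

```typescript
interface FitResultSketch<T> {
  chunks: T[][];
  dropped: number;
}

// Assumption: roughly 4 characters per token. The real fitter may use
// a proper tokenizer.
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

// Deterministic greedy chunking: items keep their input order, each
// chunk stays within the per-call cap, and an item that cannot fit in
// any call on its own is counted in `dropped` instead of being sent.
function fitBatchSketch<T>(
  items: T[],
  maxTokensPerCall: number,
  toText: (item: T) => string,
): FitResultSketch<T> {
  const chunks: T[][] = [];
  let current: T[] = [];
  let used = 0;
  let dropped = 0;
  for (const item of items) {
    const cost = estimateTokensSketch(toText(item));
    if (cost > maxTokensPerCall) {
      dropped += 1;
      continue;
    }
    if (used + cost > maxTokensPerCall && current.length > 0) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(item);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return { chunks, dropped };
}
```

Because no LLM call is involved, the same input always produces the same chunks, which is what lets a caller such as the brainstorm judge concatenate per-chunk results safely.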

Quality gate (codex outside-voice finding #4): when summarize's
success_ratio < min_success_ratio (default 0.75), the result is
flagged `degraded: true` so the caller (brainstorm) can decide to
surface a partial result or abort. The fitter itself preserves the
successful subset either way.
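
The gate arithmetic can be sketched as a pure function over settled results. `SettledSketch` mirrors the shape returned by `Promise.allSettled`; `applyQualityGateSketch` is a hypothetical name, and treating an empty input as a ratio of 1 is an assumption:

```typescript
// Same shape as the entries Promise.allSettled resolves to.
type SettledSketch<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

interface GateResultSketch<T> {
  summaries: T[];
  successRatio: number;
  degraded: boolean;
}

function applyQualityGateSketch<T>(
  settled: SettledSketch<T>[],
  minSuccessRatio: number = 0.75,
): GateResultSketch<T> {
  // Keep the successful subset regardless of the gate outcome.
  const summaries: T[] = [];
  for (const result of settled) {
    if (result.status === "fulfilled") summaries.push(result.value);
  }
  // Assumption: an empty input counts as fully successful.
  const successRatio = settled.length === 0 ? 1 : summaries.length / settled.length;
  return { summaries, successRatio, degraded: successRatio < minSuccessRatio };
}
```

With five clusters, one failure gives a ratio of 0.8 and leaves `degraded` false, while three failures give 0.4 and flip it to true.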

Tested via 4 cases across two files (T3 contract):
  - happy path (all clusters succeed → degraded=false)
  - partial failure tolerated (1/5 fails, success_ratio=0.8 > 0.75 → degraded=false)
  - high-failure rate flips the gate (3/5 fails → degraded=true)
  - budget-respecting (BudgetExhausted thrown mid-cluster propagates
    via Promise.allSettled)

11 unit cases across batch + summarize. Brainstorm + cost-guardrails
tests still green; judges.ts internal chunking deferred to a follow-up
wave (TODOS) so the existing chunked-batch contract stays byte-stable
during this drop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brainstorm): T10 checkpoint + --resume with full idea bodies (P7)

The brainstorm cathedral capstone. Crashed runs can resume cleanly via
`gbrain brainstorm --resume <run_id>` (and `gbrain lsd --resume` etc).

TX3 load-bearing contract: completed_crosses on disk carries FULL idea
bodies (~50KB per run), not just counts. The resumed BrainstormResult
contains the pre-crash ideas (loaded from disk) merged with the post-
resume ideas — codex's outside-voice finding was that a resume that
produces only "what we generated this run" is silent partial output.

TX4 single rule: --resume continues any cross not in completed_crosses.
The proposed --retry-failed was dropped per codex review; failed AND
never-attempted crosses both go through --resume.
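
A minimal sketch of the TX3 and TX4 rules, assuming a simplified checkpoint shape. `CompletedCrossSketch`, `planResumeSketch`, and `mergeIdeasSketch` are hypothetical names, and ideas are plain strings here for brevity:

```typescript
// Hypothetical on-disk entry: the full idea bodies travel with the
// cross id, not just a count.
interface CompletedCrossSketch {
  crossId: string;
  ideas: string[];
}

interface ResumePlanSketch {
  pendingCrossIds: string[];
  carriedIdeas: string[];
}

// TX4: anything not in completed_crosses is pending, whether it failed
// or was never attempted.
function planResumeSketch(
  allCrossIds: string[],
  completed: CompletedCrossSketch[],
): ResumePlanSketch {
  const done = new Set<string>();
  const carriedIdeas: string[] = [];
  for (const cross of completed) {
    done.add(cross.crossId);
    for (const idea of cross.ideas) carriedIdeas.push(idea);
  }
  const pendingCrossIds = allCrossIds.filter((id) => !done.has(id));
  return { pendingCrossIds, carriedIdeas };
}

// TX3: the resumed result is pre-crash ideas plus post-resume ideas.
function mergeIdeasSketch(carried: string[], fresh: string[]): string[] {
  return [...carried, ...fresh];
}
```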

A5 amended: run_id = sha256(question + profile + sort(close_slugs) +
sort(far_slugs)).slice(0,16). NO embedding bits — stable across
embedding-model swaps. 7-day mtime-based GC.
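
The run_id formula can be sketched as follows. The JSON serialization used to join the inputs is an assumption, and `computeRunIdSketch` is an illustrative name, not the function in the checkpoint module:

```typescript
import { createHash } from "node:crypto";

// Sorting copies of the slug arrays makes the id independent of slug
// order. No embedding data is hashed, so swapping the embedding model
// does not change the id.
function computeRunIdSketch(
  question: string,
  profile: string,
  closeSlugs: string[],
  farSlugs: string[],
): string {
  const payload = JSON.stringify([
    question,
    profile,
    [...closeSlugs].sort(),
    [...farSlugs].sort(),
  ]);
  return createHash("sha256").update(payload).digest("hex").slice(0, 16);
}
```

Serializing through JSON rather than plain concatenation avoids accidental collisions between, say, `("ab", "c")` and `("a", "bc")`; whether the real implementation does this is not stated in the source.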

Q2 fold: orchestrator.ts drops its inline BudgetExhausted class and
re-exports the canonical one from src/core/budget/budget-tracker.ts
(Phase 2). runBrainstorm now wraps the body in withBudgetTracker so
every gateway-layer chat call auto-records cost. The cap remains
opts.maxCostUsd (default $5).

New CLI flags:
  --resume <run_id>   Continue any cross not in completed_crosses.
                      Refuses to start when run_id doesn't match the
                      active inputs (paste-ready hint).
  --force-resume      Bypass the 7-day staleness gate.
  --list-runs         Print saved run_ids and exit.

Cycle purge phase (the 9th cycle phase) now also GCs stale brainstorm
checkpoints alongside op_checkpoints (~7d window).

Tests:
  - 20 unit cases in test/brainstorm/checkpoint.test.ts:
    computeRunId is deterministic + slug-array-order invariant + stable
    across embedding-model swaps; round-trip preserves ideas verbatim;
    saveCheckpoint atomic via .tmp+rename; loadCheckpoint returns null
    on missing/schema-mismatch/corrupt-JSON; gcStaleCheckpoints unlinks
    >N days; listRuns mtime-ordered.
  - 3 E2E cases in test/e2e/brainstorm-resume.test.ts:
    crash on cross 4 → first run aborts with checkpoint of crosses 1..N
    with full idea bodies; second run with resumeRunId merges pre-crash
    + post-resume ideas (TX3 contract); mismatched run_id refuses with
    paste-ready hint.

The PGLite schema-gap workaround in the E2E (CREATE VIEW page_links AS
SELECT * FROM links) is filed as a follow-up in TODOS T12 — the
real-engine brainstorm path needs that view to materialize as a
canonical schema fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: T11 + T12 wave release docs + deferred follow-ups

CHANGELOG entry for the brainstorm cost cathedral (Unreleased slot;
/ship will assign the next version):
  - ELI10 lead per CLAUDE.md voice rules
  - "How to turn it on" with paste-ready commands
  - "Things to watch" calls out the A4 semantic shift for
    `doctor --remediate --max-usd` (pre-flight → mid-run abort
    with resumable checkpoint)
  - Itemized changes by file/area
  - "For contributors" section noting the 73 new tests + the PGLite
    schema-gap workaround for the E2E

CLAUDE.md Key Files: 6 new entries for budget-tracker, audit-week-file,
gateway withBudgetTracker, payload-fitter, brainstorm/checkpoint,
remediation-checkpoint. Regenerated llms-full.txt + llms.txt (passes
test/build-llms.test.ts).

docs/incidents/2026-05-20-lsd-cost-explosion.md gains a closing
"Shipped in v0.37.x (the budget cathedral wave)" section listing P1-P7
completion status + the deferred follow-ups so the incident's audit
trail closes the loop.

TODOS.md gets a new top section for the wave's deferred items:
  - PGLite `page_links` schema gap fix
  - Explicit --max-cost on extract / enrich / integrity auto
  - P5 config-schema budgets: block in ~/.gbrain/config.json
  - Multi-day brainstorm resume (>7d)
  - Async-batched audit writes (profiling trigger criterion)
  - BudgetLedger unification with BudgetTracker
  - judges.ts internal chunking → payload-fitter delegation

Also: fixed a payload-fitter typecheck error (ChatFn import). Final
typecheck is clean on every file the wave touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(schema): F1 page_links view alias for both engines

Brainstorm's domain-bank queries reference `page_links` (pglite-engine.ts:896,
postgres-engine.ts:959) but the canonical table is `links`. Without the alias
view, `gbrain brainstorm` against PGLite fails with `relation "page_links"
does not exist`; the same was a latent bug on Postgres.

This commit lands the fix at three sites:

1. `src/core/pglite-schema.ts` — embedded schema bundle gets the view at
   table-bundle time, so fresh PGLite installs are correct from boot.
2. `src/core/migrate.ts` v81 (`page_links_view_alias`) — existing brains on
   either engine pick up the view via `gbrain apply-migrations`. CREATE OR
   REPLACE VIEW is idempotent; re-running is safe.
3. `test/e2e/brainstorm-resume.test.ts` — removed the ad-hoc workaround view
   from the test setup. The E2E now exercises the same schema path real
   users will see.

`TODOS.md` entry for the gap closed out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brainstorm): F2 pre-flight --max-cost refusal smoke E2E

Pins the user-facing path that closed the original $50 incident: when
the pre-run estimate exceeds the configured cap, runBrainstorm throws
BudgetExhausted with reason='cost' and a paste-ready hint pointing at
--limit / --max-cost / --max-far-set before any chat call happens.

The four assertions are the four things a real user can verify after
the throw lands:
  1. Typed BudgetExhausted (not a generic Error)
  2. reason === 'cost' (not runtime or no_pricing)
  3. Message names the remediation flags
  4. No provider HTTP would have happened (chat.crossCalls === 0)
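
A hedged sketch of the pre-flight refusal those four assertions describe. `BudgetExhaustedSketch` and `runWithPreflightSketch` are stand-ins; the real class lives in src/core/budget/budget-tracker.ts and its constructor signature and message text are not shown in this PR:

```typescript
type BudgetReasonSketch = "cost" | "runtime" | "no_pricing";

class BudgetExhaustedSketch extends Error {
  reason: BudgetReasonSketch;
  constructor(reason: BudgetReasonSketch, message: string) {
    super(message);
    // Keep instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "BudgetExhaustedSketch";
    this.reason = reason;
  }
}

// Compare the estimate against the cap BEFORE any chat call is made.
function runWithPreflightSketch(
  estimateUsd: number,
  maxCostUsd: number,
  chat: () => void,
): void {
  if (estimateUsd > maxCostUsd) {
    throw new BudgetExhaustedSketch(
      "cost",
      `Estimated $${estimateUsd.toFixed(2)} exceeds the $${maxCostUsd.toFixed(2)} cap. ` +
        "Lower --limit or --max-far-set, or raise --max-cost.",
    );
  }
  chat();
}
```

Passing the incident's actual spend of $50.71 against the default $5 cap throws before `chat` is invoked, which is the zero-provider-calls property the test pins.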

Uses the same PGLite engine + tinyProfile + stub chatFn as the existing
--resume tests. Hermetic; ~5s wallclock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(reindex-code): F3 --max-cost flag via withBudgetTracker

Wires gbrain reindex --code into the v0.38 budget cathedral. When the
caller passes --max-cost N (or --max-cost-usd N), runReindexCode wraps
its per-page import loop in withBudgetTracker so every gateway.embed()
call inside importCodeFile auto-composes the cap. On BudgetExhausted,
the partial-progress result reports what got reindexed before the cap
fired plus a synthetic failure row naming the cap throw.

reindex-code is idempotent (content_hash short-circuit in importCodeFile),
so a re-run after a budget abort picks up where the cap fired — no
manual checkpoint state needed.

Both --max-cost and --max-cost-usd are accepted (symmetry with brainstorm
which uses --max-cost, and a precedent for the spelling we want long-term).

When --max-cost is unset, the body runs outside any tracker scope — byte-
stable pre-F3 behavior for legacy callers.

Files:
  src/commands/reindex-code.ts:
    - ReindexCodeOpts.maxCostUsd?: number
    - runReindexCode wraps body in withBudgetTracker when set
    - runReindexCodeCli parses --max-cost / --max-cost-usd
    - BudgetExhausted caught + returned as partial-progress result
  test/reindex-code-max-cost.serial.test.ts (NEW):
    - dry-run + maxCostUsd happy path
    - empty-brain + maxCostUsd hits early-return cleanly
    - no tracker installed when cap is unset (regression guard for
      the conditional wrap)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(schema): narrow page_links view projection to bootstrap-safe columns

The v0.38 page_links view alias initially used SELECT * FROM links, which
broke the pre-v0.13 bootstrap test: applyForwardReferenceBootstrap drops
link_source + origin_page_id to simulate the pre-v0.13 schema shape, but
the SELECT * view created a dependency that blocked the column DROP.

Engine queries only reference pl.id (via COUNT(*)) and pl.to_page_id, so
the view's projection is now SELECT id, from_page_id, to_page_id FROM links
— what callers actually use, no more. This unblocks legacy-brain upgrade
paths AND keeps the bootstrap forward-reference probes safe.

Bootstrap suite: 15/15 pass after the change.

Also files a P0 TODO for a pre-existing test failure
(test/doctor-report-remote.test.ts "full report on healthy brain") that
fails on master too — out of scope for this wave but noticed during
/ship triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to v0.39.0.0

Brainstorm cost cathedral wave (P1-P7). MINOR bump per user direction:
new architectural seam (gateway-layer BudgetTracker via AsyncLocalStorage),
5 new modules, new CLI flags (--max-cost / --resume / --list-runs /
--force-resume), new migration v81 (page_links view alias).

No breaking changes — BudgetExhausted re-exported from orchestrator for
back-compat; --max-usd preserved as alias for --max-cost; eval-contradictions
--budget-usd surface byte-identical.

CHANGELOG entry renamed from [Unreleased] to [0.39.0.0] and adds the
mandatory "To take advantage of v0.39.0.0" block per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(isolation): rename 3 env-mutating tests to .serial.test.ts (CI fix)

CI's `check:test-isolation` flagged three tests added in the v0.39.0.0
cathedral that directly mutate `process.env` across test boundaries:

- test/brainstorm/checkpoint.test.ts (mutates GBRAIN_HOME)
- test/core/audit-week-file.test.ts (mutates GBRAIN_AUDIT_DIR)
- test/core/remediation-checkpoint.test.ts (mutates GBRAIN_HOME)

Per CLAUDE.md rule R1: env-mutating tests either use withEnv() OR rename
to *.serial.test.ts (the quarantine escape hatch). The mutation lives in
beforeEach/afterEach which spans the whole describe block, so .serial
rename is the cleaner fix — withEnv() would require restructuring every
test. The serial-test runner gives them their own bun process; no cross-
file env races.
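
For comparison, a scoped env helper of the kind rule R1 refers to might look like the sketch below. This is a guess at the general technique, not the repo's `withEnv()`, whose signature is not shown here; `withEnvSketch` handles synchronous callbacks only:

```typescript
// Apply overrides, run fn, then restore the previous values even if
// fn throws. An override of undefined means "unset for the duration".
function withEnvSketch<T>(
  overrides: Record<string, string | undefined>,
  fn: () => T,
): T {
  const saved = new Map<string, string | undefined>();
  for (const key of Object.keys(overrides)) {
    saved.set(key, process.env[key]);
    const value = overrides[key];
    if (value === undefined) delete process.env[key];
    else process.env[key] = value;
  }
  try {
    return fn();
  } finally {
    saved.forEach((value, key) => {
      if (value === undefined) delete process.env[key];
      else process.env[key] = value;
    });
  }
}
```

A helper like this scopes the mutation to one callback, which is why it fits poorly when the mutation is set up in beforeEach/afterEach and spans a whole describe block.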

Verified: check:test-isolation passes (527 non-serial unit files clean),
`bun run verify` passes, all 41 tests in the three renamed files pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.38): close schema-pack coverage gaps (candidate-audit, registry depth, schema use)

3 new test files / extensions surfacing during the v0.38 wave audit:

- test/candidate-audit.test.ts (17 cases): pins the privacy contract for
  the schema-candidate audit log (sha8 redaction by default, slug-prefix-
  only, frontmatter key names without values, GBRAIN_AUDIT_DIR honor,
  malformed-JSONL skipping, ISO-week-rotation including the 2026-W53 year
  boundary, best-effort write).

- test/schema-pack-registry.test.ts (9 cases): pins the extends-chain
  depth ladder (soft warn at >4, hard cap reject at >8), cyclic-extends
  rejection, and cache identity reuse. Pure unit tests with the loader
  dependency injected — never touches disk.

- test/schema-cli.test.ts (4 new cases extending the existing file): pins
  `gbrain schema use` happy path writing schema_pack to ~/.gbrain/
  config.json, config-merge preservation across re-runs, overwrite
  semantics, and unknown-pack rejection without a config write.

Total: 38 new cases, all green. Closes the gap audit's HIGH-priority items
(candidate-audit file I/O, registry depth-cap enforcement, schema use
happy path).
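
The depth ladder and cycle rejection pinned by test/schema-pack-registry.test.ts can be sketched like this. Counting depth as the number of `extends` hops is an assumption, and `checkExtendsChainSketch` and `PackManifestSketch` are hypothetical names; the manifests are passed in as a map, in the spirit of the injected loader:

```typescript
interface PackManifestSketch {
  extends?: string;
}

interface ChainCheckSketch {
  depth: number;
  warning: string | null;
}

const SOFT_WARN_DEPTH = 4; // warn when depth > 4
const HARD_CAP_DEPTH = 8; // reject when depth > 8

function checkExtendsChainSketch(
  name: string,
  manifests: Map<string, PackManifestSketch>,
): ChainCheckSketch {
  const seen = new Set<string>();
  let current: string | undefined = name;
  let depth = 0;
  while (current !== undefined) {
    // Revisiting a pack means the chain loops back on itself.
    if (seen.has(current)) throw new Error(`cyclic extends chain at "${current}"`);
    seen.add(current);
    const manifest: PackManifestSketch | undefined = manifests.get(current);
    if (manifest === undefined) throw new Error(`unknown pack "${current}"`);
    current = manifest.extends;
    if (current !== undefined) depth += 1;
  }
  if (depth > HARD_CAP_DEPTH) {
    throw new Error(`extends chain depth ${depth} exceeds hard cap ${HARD_CAP_DEPTH}`);
  }
  const warning =
    depth > SOFT_WARN_DEPTH ? `extends chain depth ${depth} exceeds ${SOFT_WARN_DEPTH}` : null;
  return { depth, warning };
}
```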

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): add --no-embedding to claw-test + thin-client init paths

Both tests run `gbrain init --pglite` without an embedding provider env
var. Since v0.37.10.0's env-detection picker, init refuses without
either a provider key or the --no-embedding deferral flag, so these
tests began exiting 1 in their setup phase. Neither test exercises
embedding pipelines (claw-test exercises CLI ergonomics, thin-client
exercises remote routing), so deferring embedding setup is the
correct shape — not stuffing fake API keys into the env.

- src/commands/claw-test.ts: install_brain phase argv adds --no-embedding
- test/e2e/thin-client.test.ts: beforeAll init spawn adds --no-embedding

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): make multi-source e2e order-independent vs storage-tiering

multi-source.test.ts asserts sources.name='default' (lowercase) for the
seeded default source. storage-tiering.test.ts uses
`INSERT INTO sources (id, name) VALUES ('default', 'Default')` (capital D)
without restoring the canonical name on cleanup. When storage-tiering ran
first against the same Postgres DB, multi-source picked up the polluted
'Default' and failed.

Fix at the consumer: reset the default source's name + config + path
fields back to the canonical seed shape in each describe block's
beforeAll. Order-independent regardless of which other e2e file
mutated sources first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(init): honor DEFAULT_EMBEDDING_DIMENSIONS for canonical default model

The v0.37.11.0 fresh-install fix wave introduced
src/core/ai/defaults.ts with DEFAULT_EMBEDDING_MODEL=zeroentropyai:zembed-1
and DEFAULT_EMBEDDING_DIMENSIONS=1280 — the closest Matryoshka step to
legacy OpenAI 1536 while staying on ZE's high-recall section. Every
schema/engine/registry call site was updated to track these constants,
EXCEPT init.ts:resolveEmbeddingByEnv at line 398, which kept using the
recipe's `default_dims` (the recipe's "largest sensible" tier — 2560
for ZE).

Effect: with ZEROENTROPY_API_KEY set, `gbrain init --pglite
--non-interactive` produced a 2560-d schema while every other path
(programmatic SDK, configureGateway, etc.) defaulted to 1280-d. Tests
that round-trip "init resolved choice matches DEFAULT_EMBEDDING_DIMENSIONS"
(test/e2e/fresh-install-pglite.test.ts) failed when ZEROENTROPY_API_KEY
was set, and a real init→embed flow on the same env would produce a
schema-width / vector-width mismatch on first embed.

Fix at the boundary: when the env-detected provider matches
DEFAULT_EMBEDDING_MODEL, use DEFAULT_EMBEDDING_DIMENSIONS. Otherwise
fall back to the recipe's default_dims (correct for non-canonical
providers like Voyage, etc.).
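
The boundary fix amounts to one conditional. In this sketch the two constants carry the values quoted above, while `RecipeSketch` and `resolveInitDimsSketch` are hypothetical names standing in for the recipe type and the logic inside resolveEmbeddingByEnv:

```typescript
// Values as stated in the commit message for src/core/ai/defaults.ts.
const DEFAULT_EMBEDDING_MODEL = "zeroentropyai:zembed-1";
const DEFAULT_EMBEDDING_DIMENSIONS = 1280;

// Hypothetical minimal recipe shape.
interface RecipeSketch {
  model: string;
  default_dims: number;
}

// Canonical default model: use the canonical width so init agrees with
// every other call site. Any other provider keeps its recipe default.
function resolveInitDimsSketch(recipe: RecipeSketch): number {
  return recipe.model === DEFAULT_EMBEDDING_MODEL
    ? DEFAULT_EMBEDDING_DIMENSIONS
    : recipe.default_dims;
}
```

For the ZE recipe, whose `default_dims` is 2560, this returns 1280, so the schema width created by init matches the vector width produced on first embed.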

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T1.5: engine wiring — thread activePack through parseMarkdown / import / sync

v0.39.0.0 schema cathedral wave T1.5. Addresses the load-bearing gap codex
caught: the v0.38 schema-pack engine (1964 LOC) shipped but was INERT at
runtime — no caller consumed loadActivePack except the inspection CLI. This
patch closes the gap end-to-end through the central markdown parsing seam.

Changes:
- src/core/markdown.ts:       ParseOpts.activePack added; parseMarkdown uses
                              inferTypeFromPack(pack) when set, else falls
                              back to legacy inferType (parity preserved).
- src/core/import-file.ts:    importFromContent + importFromFile accept
                              opts.activePack and thread to parseMarkdown.
- src/core/operations.ts:     put_page handler loads activePack ONCE per
                              invocation via loadActivePack(); threads to
                              importFromContent. Best-effort load (failure
                              falls through to legacy behavior).
- src/commands/sync.ts:       performSyncInner loads activePack ONCE at entry
                              and threads to BOTH importFile call sites
                              (serial path + parallel worker path).
- src/commands/import.ts:     runImport loads activePack ONCE at entry and
                              threads to importFile.
- src/commands/whoknows.ts:   types? doc-only note pointing future callers
                              at expertTypesFromPack (actual handler wiring
                              deferred to T19 federated_read closure fix).

Codex perf finding #7 honored: loadActivePack runs ONCE per command, never
per file. The per-process cache in registry.ts amortizes manifest reads
across put_page invocations.

Parity:
- test/regressions/gbrain-base-equivalence.test.ts (8 pass, 69 expects) still green
- New: test/active-pack-wiring.test.ts (5 pass, 8 expects) covers the
  threading regression — pack changes inferred type AND no-pack falls back
  AND empty pack falls back AND frontmatter wins AND Persona A scenario
  (Notion-shape paths typed correctly).

Deferred to T19:
- find_experts / whoknows handler pack-aware type derivation via
  expertTypesFromPack. T19 fixes loadActivePackForOp's first-source
  collapse bug; only then is it safe to wire find_experts through it.
- facts/eligibility ELIGIBLE_TYPES pack-aware variant (extractableTypesFromPack
  already exists; awaits the same closure fix).

Plan: ~/.claude/plans/system-instruction-you-are-working-jiggly-tower.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T2-T5+T15+T20+T23: schema cathedral CLI verbs land

v0.39.0.0: thirteen new schema CLI verbs + supporting libraries + events
audit. Ships as one cohesive bundle because every verb shares the
loadActivePack boundary + the --json/--source CLI contract surface.

New verbs (in `gbrain schema`):
- detect              (T2 P1) — SQL heuristic clustering on pages.source_path
- suggest             (T3 P1) — runSuggest library; heuristic-by-default
                                 with optional gateway refinement
- review-candidates   (T4 P1) — disk-derived candidates; --apply writes a
                                 delta file under ~/.gbrain/schema-pack-deltas/
- init                (T5 P2, experimental) — scaffolds a stub pack
- fork                (T5 P2, experimental) — copies an existing pack
- edit                (T5 P2, experimental) — surfaces the pack file path
- diff                (T5 P2, experimental) — set-diffs type names
- graph               (T5 P2, experimental) — ASCII type listing
- lint                (T5 P2) — flags duplicate names + missing prefixes
- explain             (T5 P2, experimental) — pretty-prints one type
- review-orphans      (T5 P2) — surfaces type=null pages by source
- downgrade           (T20 P1) — restores config.schema_pack to previous
- usage               (T23 P2) — per-verb 30d usage from schema-events audit

New files:
- src/core/schema-pack/detect.ts   (~150 LOC pure data + runDetect)
- src/core/schema-pack/suggest.ts  (~120 LOC runSuggest library + test seam)
- src/core/schema-pack/review.ts   (~140 LOC review-candidates + review-orphans)
- src/core/schema-events.ts        (~80  LOC JSONL audit + readback)

Shared contracts:
- parseFlags() helper enforces --json + --source/--source-id across every
  verb that consumes a brain. T6 will pin this in CI.
- withConnectedEngine() factory connects + disconnects for the verbs that
  need a brain (detect/suggest/review-candidates/review-orphans/usage).
- EXPERIMENTAL_VERBS set = {init, fork, edit, diff, graph, explain}.
  D14 hybrid: surfaced via "(experimental)" in help + JSON tier field +
  T15 audit + T23 usage subcommand for v0.40+ retro deprecation decisions.

Privacy: review-candidates does disk re-derivation, NOT audit-log reads
(D3(eng) + codex #10). CLI output explicitly says "Disk-derived candidates
from current brain state. Audit history at ~/.gbrain/audit/..." so users
understand the data origin.

D4(eng) honored: single runSuggest() library, multiple thin callers (CLI
in this commit; T12 dream-cycle phase, T10 EIIRP, T7 doctor in later
commits all import the same function).

Codex finding #9 honored: heuristic fallback ALWAYS returns confidence 0.5.
Downstream EIIRP consumer (T10) MUST treat confidence < 0.6 as "manual
review required, not auto-apply" — pinned in T16 eval harness.
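
A small sketch of that human gate, assuming a simplified suggestion shape. `SuggestionSketch` and `dispositionForSketch` are hypothetical; treating a confidence of 0.6 or higher as eligible for auto-apply is an inference from the rule, not something the source states:

```typescript
// Confidence the heuristic fallback always reports, per the rule above.
const HEURISTIC_FALLBACK_CONFIDENCE = 0.5;
// Below this threshold a suggestion must go to a human.
const MANUAL_REVIEW_THRESHOLD = 0.6;

interface SuggestionSketch {
  typeName: string;
  confidence: number;
}

type DispositionSketch = "manual_review" | "auto_apply_eligible";

function dispositionForSketch(suggestion: SuggestionSketch): DispositionSketch {
  return suggestion.confidence < MANUAL_REVIEW_THRESHOLD
    ? "manual_review"
    : "auto_apply_eligible";
}
```

Because the fallback confidence of 0.5 sits below the 0.6 threshold, every heuristic-only suggestion lands in manual review by construction.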

Tests green:
- typecheck clean
- test/active-pack-wiring.test.ts: 5 pass
- test/regressions/gbrain-base-equivalence.test.ts: 8 pass
- test/schema-cli.test.ts: 12 pass

Plan: ~/.claude/plans/system-instruction-you-are-working-jiggly-tower.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T6: CLI contract test pins --json + --source on every new schema verb

v0.39.0.0 — locks the parseFlags() contract for the 13 new cathedral CLI
verbs (T2-T5, T20, T23). Source-grep guard ensures every future verb-handler
runs through parseFlags(), preserves the schema_version:1 JSON envelope
shape, and accepts both --source / --source-id flag forms.

7 cases, 67 expect calls. Test runs in <100ms — cheap CI signal that
guarantees the cathedral CLI surface stays uniform for agent consumers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T7 + T9: import warn + 3 schema-pack doctor checks

v0.39.0.0 — closes the silent-mismatch failure mode that Persona A hits
when 3000 Notion-shape pages import against gbrain-base.

Import warn (T7):
- runImport prints end-of-run stderr line when >=10% of imported pages
  have type=null. Fires ONCE, not per page. Best-effort; query failure
  is non-fatal. Breadcrumb points at `gbrain schema detect` + the
  doctor consistency check.

Doctor checks (T7+T9, three v0.38-promised checks finally shipped):
- schema_pack_active        ok/warn — does the active pack resolve?
- schema_pack_consistency   ok/warn at 10% untyped threshold; names
                            the worst source + paste-ready fix command.
- schema_pack_source_drift  ok/warn when per-source overrides disagree.

All three are warn-only; never fail-block.

Files:
- src/commands/import.ts:    end-of-run warn after Import complete summary
- src/commands/doctor.ts:    runDoctor pushes 3 new checks + implementations
                             at file bottom (~110 LOC total)

Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T8: bundle gbrain-recommended pack from GBRAIN_RECOMMENDED_SCHEMA.md

v0.39.0.0 — 1013-line prose taxonomy becomes a real activatable pack.
Users who like the documented operational-brain pattern type
`gbrain schema use gbrain-recommended` and get the documented
behavior in one command instead of inferring it from the doc.

New pack adds 13 page types beyond gbrain-base: deal, meeting, concept,
project, source, daily, personal, civic, original, place, trip,
conversation, writing. Extends gbrain-base; pinned to it via the
`extends:` field so users still get all the legacy types.

Files:
- src/core/schema-pack/base/gbrain-recommended.yaml (new pack manifest)
- src/core/schema-pack/load-active.ts (bundled-packs registry now arrayed)
- src/commands/schema.ts (`gbrain schema list` shows both bundled packs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T10 + T11: port EIIRP + brain-taxonomist skills (genericized)

v0.39.0.0 — wintermute-side filing intelligence ports to gbrain as
first-class skillpack skills. Both rewritten against v0.39 cathedral
primitives (D9 from plan-eng-review) so they consume the active schema
pack as data, not their own hardcoded taxonomy.

skills/eiirp/         — 7-phase post-work organizer:
                        1) INVENTORY  2) TAXONOMY  3) SCHEMA CHECK
                        4) FILE  5) SKILL GRAPH AUDIT  6) VERIFY  7) REPORT
                        Phase 3 calls `gbrain schema detect|suggest|review-candidates`
                        Phase 5 calls `gbrain check-resolvable`
                        Phase 6 reads `gbrain doctor` schema_pack_consistency
                        + routing-eval.jsonl (10 fixtures)

skills/brain-taxonomist/ — write-time filing gate:
                        Zero hardcoded directory table. Every decision
                        reads `gbrain schema show --json`. The active
                        pack is the single source of truth.
                        Per-source flag (--source) first-class for
                        multi-brain users.
                        + routing-eval.jsonl (7 fixtures)

Privacy (CLAUDE.md compliance):
- Zero references to the private fork name (verified via grep).
- "private fork" / "upstream OpenClaw" used in changelog notes only.
- Per CLAUDE.md, this code uses "OpenClaw" / "your OpenClaw" semantics.

Codex finding #9 honored in EIIRP Phase 3:
  Confidence < 0.6 from runSuggest MUST surface to user, NOT auto-apply.
  The cathedral ships the primitives; EIIRP enforces the human gate.

RESOLVER.md updated: 2 new rows; routing is MECE against existing skills
(brain-taxonomist distinct from repo-architecture; EIIRP distinct from
ingest/skillify/data-research per the SKILL.md distinct_from blocks).

bash scripts/check-skill-brain-first.sh: passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* T12+T13+T19+T21: cycle phase, auto-prompt, federated closure, cache isolation

v0.39.0.0 — four surgical wires that thread the schema-pack cathedral
into existing engine paths.

T12 (dream-cycle schema-suggest phase):
- src/core/cycle/schema-suggest.ts (new, ~80 LOC) — thin wrapper around
  runSuggest() library. D4 honored: single library, multiple thin callers
  (CLI verb + EIIRP + this phase all import the same fu…
@garrytan

Owner

Closing: shipped in v0.39.0.0.

The brainstorm cost cathedral (P1-P7) shipped the same fix surface: src/core/brainstorm/* and the cost-guardrails test live in master from PR #1283.

Thanks for the agent-fork PR. Closing because the work is already on master.

@garrytan garrytan closed this May 26, 2026