v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) #934

Merged
garrytan merged 13 commits into master from garrytan/miami
May 14, 2026

garrytan commented May 12, 2026

Owner

Summary

The v0.34 Cathedral III foundation. Ships the agent-facing MCP surface for the v0.20+ tree-sitter call graph (code_callers, code_callees, code_def, code_refs are no longer CLI-only) plus the foundation fixes Codex's outside-voice review caught before any recursive blast/flow ships in v0.34.

Code intelligence (W3)

  • code_callers / code_callees / code_def / code_refs MCP ops with resolver-grade descriptions agents auto-route to
  • Resolver wording: "BEFORE editing any function, run code_callers..." — pinned in operations-descriptions.ts
  • Each description carries an inline example response per the v0.34 eng-review D10 finding
  • SEARCH_DESCRIPTION gains a cross-link clause so agents stop falling through to text search for code-symbol questions

Source routing (W0a, Codex finding #2)

  • query op handler threads ctx.sourceId to hybridSearch (was advertised, never applied)
  • two-pass.ts:81 (nearSymbol lookup) and :131 (unresolved-edge resolution) now apply opts.sourceId via a pages.source_id join
  • Multi-source brains stop cross-contaminating structural retrieval
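
The precedence rule described in the commit notes (per-call `source_id` wins over `ctx.sourceId`, with `'__all__'` or `all_sources=true` forcing a cross-source query) can be sketched as below. This is a minimal illustration, not the code in operations.ts; `resolveSourceScope` and `ScopeInput` are hypothetical names, and only the precedence itself comes from this PR.

```typescript
// Hypothetical helper: only the precedence rule is taken from the PR notes.
const ALL_SOURCES = "__all__";

interface ScopeInput {
  paramSourceId?: string; // per-call `source_id` param
  ctxSourceId?: string;   // ctx.sourceId from the calling context
  allSources?: boolean;   // explicit `all_sources` flag
}

// Returns the source id to filter on, or undefined for a cross-source query.
function resolveSourceScope(input: ScopeInput): string | undefined {
  // Explicit escape hatches always win.
  if (input.allSources === true || input.paramSourceId === ALL_SOURCES) {
    return undefined;
  }
  // The per-call param wins over the ambient context.
  return input.paramSourceId ?? input.ctxSourceId;
}
```

A caller would then add the `pages.source_id` join only when the returned value is defined, which matches the stated contract that an omitted source id keeps the legacy cross-source behavior.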

CLI source-scoping default flipped (W0b, Codex finding #7)

  • gbrain code-callers <symbol> without --source resolves to the brain's default source (was silently global despite the docstring claim)
  • New resolveDefaultSource(engine) helper in sources-ops.ts: returns the only source's id; throws SourceResolutionError with the list on multi-source brains
  • --all-sources explicitly restores the pre-v0.33.2 cross-source behavior
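
A rough sketch of the default-source contract, following the W0b commit notes (one source returns its id, two or more throw with the list of valid ids, zero throws `no_sources`). The real helper is `resolveDefaultSource(engine)`; the pure function `pickDefaultSource` below is a hypothetical stand-in that takes the already-listed source ids, and the error class fields and messages are assumptions.

```typescript
type SourceResolutionCode = "no_sources" | "multiple_sources_ambiguous";

// Assumed shape; the PR only names the error class and its two codes.
class SourceResolutionError extends Error {
  readonly code: SourceResolutionCode;
  readonly validIds: string[];

  constructor(code: SourceResolutionCode, validIds: string[] = []) {
    super(
      code === "no_sources"
        ? "no sources registered"
        : `multiple sources registered; pass --source <id> or --all-sources (valid ids: ${validIds.join(", ")})`,
    );
    this.name = "SourceResolutionError";
    this.code = code;
    this.validIds = validIds;
  }
}

// Hypothetical pure core of the helper: decide from the list of registered ids.
function pickDefaultSource(sourceIds: string[]): string {
  const [first, ...rest] = sourceIds;
  if (first === undefined) {
    throw new SourceResolutionError("no_sources");
  }
  if (rest.length > 0) {
    throw new SourceResolutionError("multiple_sources_ambiguous", sourceIds);
  }
  return first;
}
```

Failing loudly on multi-source brains is the design point here: an ambiguous default becomes an error that lists the valid ids, not a silent cross-source query.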

Within-file symbol resolver (W0c)

  • New module src/core/chunkers/symbol-resolver.ts walks unresolved edges in 200-row batches, writes outcome to code_edges_symbol.edge_metadata:
    • {resolved_chunk_id: N} — one unambiguous same-file match
    • {ambiguous: true, candidates: [...]} — 2+ matches
    • (no metadata change) — zero matches; caller's two-pass walk handles cross-file
  • New cycle phase resolve_symbol_edges between extract_facts and patterns (13 phases total)
  • Schema migration v55 (edges_backfilled_at_v0_33_2): content_chunks.edges_backfilled_at TIMESTAMPTZ + composite/partial indexes for the resolver hot path (D11 from eng review)
  • Resumable: SIGINT/OOM/sleep loses at most one batch; EDGE_EXTRACTOR_VERSION_TS constant bump forces re-walk
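
The three resolver outcomes above can be sketched as a small classifier plus a reader that returns a tagged union. This is an illustration under assumptions: `classifyCandidates` is a hypothetical name, the union shape returned by `readEdgeResolution` is guessed (the PR only says it returns a tagged union), and `candidates` is assumed to hold chunk ids.

```typescript
type EdgeMetadata = Record<string, unknown>;

// Assumed union shape; the PR does not spell out the variants.
type EdgeResolution =
  | { kind: "resolved"; chunkId: number }
  | { kind: "ambiguous"; candidates: number[] }
  | { kind: "unresolved" };

// Decide what to write to edge_metadata given the same-file candidate chunk ids.
function classifyCandidates(existing: EdgeMetadata, candidateIds: number[]): EdgeMetadata {
  if (candidateIds.length === 1) {
    return { ...existing, resolved_chunk_id: candidateIds[0] };
  }
  if (candidateIds.length >= 2) {
    return { ...existing, ambiguous: true, candidates: [...candidateIds] };
  }
  return existing; // zero matches: leave metadata untouched for the cross-file walk
}

// Read the outcome back without callers touching the raw JSON keys.
function readEdgeResolution(metadata: EdgeMetadata | null | undefined): EdgeResolution {
  if (!metadata) return { kind: "unresolved" };
  const resolved = metadata["resolved_chunk_id"];
  if (typeof resolved === "number") {
    return { kind: "resolved", chunkId: resolved };
  }
  const candidates = metadata["candidates"];
  if (metadata["ambiguous"] === true && Array.isArray(candidates)) {
    const ids = candidates.filter((c): c is number => typeof c === "number");
    return { kind: "ambiguous", candidates: ids };
  }
  return { kind: "unresolved" };
}
```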

Eval baseline (pre-W0)

  • New gbrain eval code-retrieval [--baseline | --with-code-intel | --compare] CLI subcommand
  • 30-question fixture covering callers / callees / definition / references / blast_radius / execution_flow / cluster_membership
  • Captures pre-v0.34 retrieval quality on the gbrain self-corpus so the v0.34 ship gate measures real improvement vs a retroactively-tuned baseline
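
For readers unfamiliar with the harness metrics, here is a minimal sketch of precision@k and recall@k over retrieved file paths. It is not the code in harness.ts; the function names and the choice to divide precision by `k` (not by the number of results actually returned) are assumptions.

```typescript
// Count distinct expected files that appear in the top-k retrieved results.
function hitsAtK(retrieved: string[], expected: string[], k: number): number {
  const relevant = new Set(expected);
  const hits = new Set(retrieved.slice(0, Math.max(0, k)).filter((f) => relevant.has(f)));
  return hits.size;
}

// precision@k: fraction of the top-k slots filled with expected files.
function precisionAtK(retrieved: string[], expected: string[], k: number): number {
  if (k <= 0) return 0;
  return hitsAtK(retrieved, expected, k) / k;
}

// recall@k: fraction of the expected files found within the top k.
function recallAtK(retrieved: string[], expected: string[], k: number): number {
  const total = new Set(expected).size;
  if (total === 0) return 0;
  return hitsAtK(retrieved, expected, k) / total;
}
```

With four retrieved files of which two are expected, `precisionAtK(..., 4)` is 0.5; if four files were expected in total, `recallAtK(..., 4)` is also 0.5.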

What this DOES NOT ship (deferred to v0.34)

Per the design doc's slip-handling clause:

  • Recursive code_blast / code_flow (depth-grouped traversal with confidence decay)
  • Leiden community detection: code_clusters_list + code_cluster_get
  • gbrain wiki zero-LLM aggregator
  • code_traversal_cache (REPEATABLE READ + xmin_max snapshot isolation)
  • Per-op graph-traversal eval metrics
  • imports and references edge types for JS/TS/TSX + Python
  • Receiver-type scope walkers (obj.method() → Class.method)

Test Coverage

50 new test cases across 6 new files:

  • test/code-retrieval-harness.test.ts — 26 unit tests (precision@k, recall@k, top-1 stability, gate logic)
  • test/e2e/source-routing.test.ts — 4 E2E (PGLite)
  • test/e2e/cli-source-scoping-pglite.test.ts — 3 E2E
  • test/e2e/symbol-resolver-pglite.test.ts — 5 E2E (unambiguous, ambiguous, no-match, watermark, source isolation)
  • test/e2e/code-intel-mcp-ops-pglite.test.ts — 11 E2E
  • Cycle phase-order pins updated to 13 phases (cycle.serial.test.ts, e2e/cycle.test.ts, e2e/dream-cycle-phase-order-pglite.test.ts)

Pre-Landing Review

Spec review (2 rounds), DX review (8 passes + Codex outside voice catching 9 findings + plan revision), eng review (5 architecture issues + 3 code quality + 1 perf, all fixed). The foundation work is the result of Codex catching that the v0.20+ graph stores BARE callee tokens — recursive blast/flow on it would alias every execute/find/render across classes.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

Plan deliberately scoped to foundation only. v0.34 deliverables (clusters, wiki, blast/flow recursion, scope walkers) deferred per slip-handling clause and re-tracked as v0.34 work.

Verification Results

  • Full unit suite (bun run test): 5464 pass / 1 pre-existing flake (LongMemEval p50 perf gate under parallel-shard contention; passes 12/12 isolated at p50=30ms vs 500ms threshold)
  • Full E2E suite (bun run test:e2e against real Postgres on port 5435): 550 pass / 0 fail / 81 files
  • bun run verify clean
  • bun run typecheck clean
  • Trio synced: VERSION=0.33.2.0, package.json=0.33.2.0, CHANGELOG=## [0.33.2.0] - 2026-05-12

Documentation

  • skills/migrations/v0.33.0.md — agent-readable upgrade doc (will be renamed to v0.33.2.0.md once Garry confirms target version)
  • llms.txt + llms-full.txt regenerated

Test plan

  • All unit tests pass (5464/5464 isolated, 1 pre-existing parallel-shard flake)
  • All E2E tests pass against real Postgres (550/550)
  • Typecheck clean
  • Verify gate clean
  • VERSION + package.json + CHANGELOG trio in lockstep at 0.33.2.0
  • Master merged in (extract_facts + whoknows + recall-morning-pulse all preserved alongside v0.33.2 changes)

🤖 Generated with Claude Code


garrytan and others added 11 commits May 11, 2026 12:19
Captures pre-v0.34 retrieval quality on the gbrain self-corpus before any
code-intel work lands, so the v0.34 ship gate (precision@5 +10pp OR
answered_rate +15pp on >=15/30 questions) measures real improvement
rather than an after-the-fact retuned baseline.

* src/eval/code-retrieval/harness.ts -- pure-function metrics (precision@k,
  recall@k, top-1 stability, gate evaluator) + EvalRunReport types stable
  across schema_version 1
* src/eval/code-retrieval/questions.json -- 30 questions across callers /
  callees / definition / references / blast_radius / execution_flow /
  cluster_membership kinds, expected_files captured against current
  gbrain layout
* src/eval/code-retrieval/strategies.ts -- BaselineStrategy (hybridSearch)
  + WithCodeIntelStrategy stub (post-W3 fills in code_blast/code_flow/etc.)
* src/commands/eval-code-retrieval.ts -- gbrain eval code-retrieval CLI
  with --baseline / --with-code-intel / --compare subcommands
* test/code-retrieval-harness.test.ts -- 26 unit tests across metrics,
  loader, gate logic; no engine dependency

PRE-V0.34 BASELINE WORKFLOW:
  gbrain eval code-retrieval --baseline --save /tmp/baseline-1.json
  (run 3x for noise floor)

V0.34 SHIP GATE (after W3 lands):
  gbrain eval code-retrieval --with-code-intel --save /tmp/v034.json
  gbrain eval code-retrieval --compare /tmp/baseline-1.json /tmp/v034.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex outside-voice review on the v0.34 plan caught two load-bearing
sites where sourceId was advertised but never applied — multi-source
brains silently cross-contaminated structural retrieval:

* operations.ts ~323 — `query` op handler called hybridSearch without
  threading ctx.sourceId. Multi-source agents querying with a
  --source flag got cross-source results.
* two-pass.ts:81 (nearSymbol lookup) and two-pass.ts:131 (unresolved
  edge resolution) — TwoPassOpts.sourceId was declared and threaded
  through hybridSearch's expandAnchors call, but the actual SQL ignored
  it. The walk window crossed source boundaries every time.

Fix:
* `query` op now reads ctx.sourceId AND accepts a new `source_id`
  param (with '__all__' as the explicit force-cross-source escape
  hatch). Per-call param wins over ctx.
* two-pass.ts both lookups join through pages.source_id when
  opts.sourceId is set; omitted opts.sourceId preserves the legacy
  cross-source contract for callers who want it.

Regression test: test/e2e/source-routing.test.ts seeds two sources
with the same `parseMarkdown` symbol + a cross-source caller edge.
Pins:
  - nearSymbol + sourceId='source-a' returns ONLY source-a chunks
  - nearSymbol + sourceId='source-b' returns ONLY source-b chunks
  - nearSymbol with no sourceId still crosses sources (contract preserved)
  - walk_depth=1 unresolved-edge resolution stays in source-a

PGLite in-memory, no DATABASE_URL needed. The fix is exercised under
realistic structural retrieval, not just a contrived unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex outside-voice review (finding #7) caught that the v0.20.0
docstring claim "by default we only match the caller's source_id"
contradicted the implementation in code-callers.ts:54 + code-callees.ts:43:

  allSources: allSources || !sourceId

The right side made `allSources` TRUE whenever `--source` was omitted,
INVERTING the documented default. Multi-source brains silently cross-
contaminated structural retrieval; `gbrain code-callers parseMarkdown`
on a brain with two repos returned callers from both even though the
docstring promised per-source scoping.
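
To make the inversion concrete, the quoted expression can be wrapped in a function and compared with the intended default. `legacyAllSources` reproduces the expression from the commit message; `intendedAllSources` is a simplified, hypothetical stand-in for the corrected intent (the actual fix resolves a default source first).

```typescript
// The pre-fix expression: omitting --source flips allSources to true.
function legacyAllSources(allSources: boolean, sourceId?: string): boolean {
  return allSources || !sourceId;
}

// Simplified intent after the fix: cross-source only when asked for explicitly.
function intendedAllSources(allSources: boolean): boolean {
  return allSources;
}

// Truth table for the case the docstring described: no flags at all.
const noFlagsLegacy = legacyAllSources(false, undefined); // true: silently global
const noFlagsIntended = intendedAllSources(false);        // false: source-scoped
console.log({ noFlagsLegacy, noFlagsIntended });
```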

Fix:
* New canonical helper `resolveDefaultSource(engine)` in sources-ops.ts.
  Contract per eng review D7:
    - exactly 1 source registered → return its id (single-source brains,
      the 80% case; --source flag is unnecessary friction there)
    - 2+ sources → throw SourceResolutionError(multiple_sources_ambiguous)
      with the list of valid ids
    - 0 sources → throw SourceResolutionError(no_sources)
* code-callers.ts + code-callees.ts now resolve to the default source
  when both --source AND --all-sources are absent. To get the pre-v0.34
  cross-source behavior, callers must pass --all-sources explicitly.
* Same hint text on both commands. Pinned by test/e2e/cli-source-scoping-pglite.test.ts.

IRON RULE regression R2: docstring promise now holds. Multi-source brain
running `gbrain code-callers <symbol>` without --source gets a clear
error listing valid source ids instead of silent cross-resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…led_at watermark

Codex's outside-voice review caught that the v0.20.0 graph stores BARE
callee tokens (`render`, `find`, `execute`) — not qualified names. Pre-v0.34
recursive blast/flow would alias every same-named function across classes.
W0c is the foundation that fixes this: resolve `code_edges_symbol` rows by
matching `to_symbol_qualified` against the SAME-FILE chunks'
`symbol_name_qualified`, then write the outcome to `edge_metadata`.

This commit is the resolver primitive + schema. The cycle-phase wiring
that calls it on every quick-cycle tick lands in the next commit.

Schema (v51 migration `edges_backfilled_at_v0_34`):
* `content_chunks.edges_backfilled_at TIMESTAMPTZ` — resume watermark.
  Chunks where the column is NULL OR older than EDGE_EXTRACTOR_VERSION_TS
  get re-walked next tick. SIGINT/OOM/sleep mid-backfill loses at most
  one batch.
* Indexes per D11 from eng review:
  - `idx_code_edges_symbol_resolver(source_id, to_symbol_qualified)` —
    composite for the resolver's per-source lookup.
  - `idx_content_chunks_symbol_lookup(page_id, symbol_name_qualified)`
    WHERE `symbol_name_qualified IS NOT NULL` — file-batched candidate
    fetch; also reused by W4-5 cluster recompute.
  - `idx_content_chunks_edges_backfill(edges_backfilled_at)` WHERE
    `edges_backfilled_at IS NULL` — fast unresumed-row scan.
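
The watermark rule (a chunk is re-walked when `edges_backfilled_at` is NULL or older than `EDGE_EXTRACTOR_VERSION_TS`) can be sketched in memory as below. The real resolver does this in SQL; `isStale`, `selectBatch`, and the `ChunkRow` shape are hypothetical, and the timestamps are illustrative values only.

```typescript
interface ChunkRow {
  id: number;
  edgesBackfilledAt: Date | null; // mirrors content_chunks.edges_backfilled_at
}

// A chunk needs a re-walk if it was never walked or was walked by an older extractor.
function isStale(edgesBackfilledAt: Date | null, extractorVersion: Date): boolean {
  return edgesBackfilledAt === null || edgesBackfilledAt.getTime() < extractorVersion.getTime();
}

// Pick at most one batch of stale chunks; 200 matches the batch size in the PR notes.
function selectBatch(chunks: ChunkRow[], extractorVersion: Date, batchSize = 200): ChunkRow[] {
  return chunks.filter((c) => isStale(c.edgesBackfilledAt, extractorVersion)).slice(0, batchSize);
}

// Illustrative data: one never-walked chunk, one walked before the bump, one fresh.
const version = new Date("2026-05-11T00:00:00Z");
const rows: ChunkRow[] = [
  { id: 1, edgesBackfilledAt: null },
  { id: 2, edgesBackfilledAt: new Date("2026-05-10T00:00:00Z") },
  { id: 3, edgesBackfilledAt: new Date("2026-05-12T00:00:00Z") },
];
console.log(selectBatch(rows, version).map((c) => c.id));
```

Because each batch stamps its chunks only after it finishes, an interrupted run leaves at most one batch unstamped, and the next tick selects those chunks again.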

Module (`src/core/chunkers/symbol-resolver.ts`):
* `resolveSymbolEdgesIncremental(engine, {sourceId, maxChunks?, onProgress?})`
  walks stale chunks in 200-chunk batches. For each chunk, loads its
  unresolved edges, finds same-page candidates by symbol_name_qualified,
  and writes outcome to `edge_metadata`:
   - exactly 1 candidate → `{resolved_chunk_id: <id>}`
   - 2+ candidates → `{ambiguous: true, candidates: [...]}`
   - 0 candidates → unchanged (cross-file; two-pass.ts handles those)
  Each batch bumps `edges_backfilled_at = NOW()` for the chunks.
* `readEdgeResolution(metadata)` — public helper for downstream code
  (two-pass.ts, code_blast op, eval-capture) to consume the resolver's
  output without parsing JSON directly. Returns a tagged union.
* `EDGE_EXTRACTOR_VERSION_TS` exported constant — bump when extractor
  shape changes so the next cycle re-walks all chunks.

Tests (5 E2E in test/e2e/symbol-resolver-pglite.test.ts, all PGLite,
no DATABASE_URL): unambiguous match, ambiguous multi-match, no match,
watermark advance + idempotency, source isolation (no cross-source
candidate leak).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
W0c's symbol resolver lands as a 12th cycle phase between extract and
patterns. The autopilot's quick-cycle path (60s watchdog interval per
D2 from eng review) now resolves stale chunks incrementally so agents
see resolved edges within ~60s of writes rather than waiting on the
slow full-walk path.

* CyclePhase + ALL_PHASES + NEEDS_LOCK_PHASES extended with
  'resolve_symbol_edges'. Position: between extract (which emits new
  bare-token edges from sync diffs) and patterns (which reads the
  graph). Acquires the cycle lock because it writes edge_metadata.
* CycleReport.totals adds edges_resolved + edges_ambiguous so doctor
  and autopilot summaries surface the numbers.
* runPhaseResolveSymbolEdges walks every registered source via
  listSources() + resolveSymbolEdgesIncremental(). Per-call cap is
  BATCH_SIZE*10 = 2000 chunks so a single watchdog tick stays bounded
  even on a 100K-chunk brain. Subsequent ticks pick up the leftovers
  via the edges_backfilled_at watermark.
* Test count bumped from 11 → 12 phases in cycle.serial.test.ts and
  cycle.test.ts (both pinned by the regression guards). Existing 28
  cycle tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ode_refs

Pre-v0.34 these four code-intelligence commands lived in CLI_ONLY at
cli.ts:30 — agents calling gbrain via MCP couldn't reach them and fell
through to text search. This commit ships the agent-facing MCP surface
for v0.34 against the existing v0.20+ tree-sitter call graph; recursive
blast/flow and clusters land in subsequent commits.

* `code_callers(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCallersOf. Reverse view of the A1 call graph.
* `code_callees(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCalleesOf. Forward view.
* `code_def(symbol, [limit, lang])` — wraps findCodeDef. Returns
  definition sites with file/line/snippet.
* `code_refs(symbol, [limit, lang])` — wraps findCodeRefs. Returns
  every reference (comments, strings, imports, call sites).

All four are scope:'read', source-scoped by default via ctx.sourceId
(W0a contract). Per-call source_id param wins over ctx; pass '__all__'
or all_sources=true to force cross-source.

* operations-descriptions.ts: 4 new constants per the eng review D10
  finding — every description carries an inline example response so
  agents don't burn first-call context discovering shape. Resolver-grade
  wording ("BEFORE editing any function, run code_callers...") routes
  plan-mode questions straight to the right op.
* SEARCH_DESCRIPTION gains a cross-link clause pointing at the four new
  ops so agents stop falling through to text search for code-symbol
  questions.

Tests (11 E2E in test/e2e/code-intel-mcp-ops-pglite.test.ts):
  - All four ops registered + scope:read + description pinned by constant
  - All four ops have required symbol param
  - code_callers / code_callees return the documented envelope shape
  - Source scoping honors ctx.sourceId
  - all_sources=true / source_id='__all__' force cross-source
  - code_def returns the def-site snippet

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

skills/migrations/v0.33.0.md gives existing-user upgrade guidance for the
v0.33.0 foundation pre-release (this branch's accumulated work toward
v0.34 Cathedral III):

* Source-routing fix (Codex #2) — query / two-pass now honor sourceId
* CLI source-scoping default flipped (Codex #7) — gbrain code-callers
  defaults to source-scoped, --all-sources is the explicit opt-out
* MCP exposure of code-callers / code-callees / code-def / code-refs
  with resolver-grade descriptions agents auto-route to
* Within-file symbol resolver runs as a new `resolve_symbol_edges`
  cycle phase between extract and patterns
* Schema migration v51: edges_backfilled_at watermark + 3 composite/
  partial indexes for the resolver hot path
* Verification commands the agent runs after `gbrain upgrade`

Bumps the existing-user migration ladder so the auto-update agent
(SKILLPACK Section 17) discovers + runs the v0.33.0 migration steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.33.0 ships the v0.34 Cathedral III foundation: MCP exposure of
code_callers / code_callees / code_def / code_refs with resolver-grade
tool descriptions, plus the source-routing fix + within-file symbol
resolver + cycle-phase wiring that v0.34's recursive blast/flow and
Leiden clusters will build on.

Full release notes in CHANGELOG.md. Trio in lockstep:
  VERSION:      0.33.0
  package.json: 0.33.0
  CHANGELOG.md: ## [0.33.0] - 2026-05-11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…symbol_edges

E2E test pinned the canonical phase sequence as a regression guard. The
v0.33.0 resolve_symbol_edges phase (added between extract and patterns)
correctly bumps the count to 12 — caught by the canonical-order test on
fresh-Postgres run, fixed by adding the new phase to EXPECTED_PHASES
and bumping the version history comment.

Both cycle.serial.test.ts and cycle.test.ts were already updated in the
W0c cycle-phase commit (6f7dbe1); this third pin lives in
test/e2e/dream-cycle-phase-order-pglite.test.ts and was missed.

Full E2E suite now: 550 passed / 0 failed / 81 files (real Postgres on
port 5435 via Docker pgvector/pgvector:pg16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/commands/eval.ts
#	src/core/cycle.ts
#	src/core/migrate.ts
#	src/core/operations-descriptions.ts
#	src/core/operations.ts
#	test/core/cycle.serial.test.ts
#	test/e2e/cycle.test.ts
#	test/e2e/dream-cycle-phase-order-pglite.test.ts
User asked to ship as v0.33.3.0 instead of v0.33.2.0. Single sweep:

* VERSION + package.json bumped to 0.33.3.0
* CHANGELOG header + body rewritten to v0.33.3
* skills/migrations/v0.33.0.md → skills/migrations/v0.33.3.0.md
  (migration files use the version they ship FROM; renaming aligns with
  the v0.21.0.md / v0.31.0.md convention in CLAUDE.md)
* Schema migration name edges_backfilled_at_v0_33_2 →
  edges_backfilled_at_v0_33_3 in src/core/migrate.ts (also bumps the
  in-code identifier so the registry name matches the version)
* All v0.33.2 comment references swept to v0.33.3 in cycle.ts,
  operations.ts, operations-descriptions.ts, eval.ts, symbol-resolver.ts
  + cycle test phase-history comments
* llms.txt + llms-full.txt regenerated

Trio verified:
  VERSION:      0.33.3.0
  package.json: 0.33.3.0
  CHANGELOG.md: ## [0.33.3.0] - 2026-05-12

bun run verify clean; 90 v0.33.3-touched tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan changed the title from "v0.33.2.0 feat(v0.33.2): code intelligence MCP foundation (v0.34 W0a-c + W3)" to "v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3)" May 12, 2026
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/core/migrate.ts
#	src/core/operations.ts
garrytan added a commit that referenced this pull request May 14, 2026
Per user direction: ship as v0.33.2.1 instead of v0.33.3.1.
0.33.2.x is unclaimed in the queue (PR #934 holds 0.33.3.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 14, 2026
* docs(CLAUDE.md): add workflow for fork PRs from garrytan-agents

Fork PRs from non-collaborator accounts don't receive base-repo secrets on
pull_request events, so CI jobs needing ANTHROPIC_API_KEY / OPENAI_API_KEY
fail with empty-env auth errors. Document the move-branch-to-base-repo
workflow as the narrow-scope alternative to adding the account as a
collaborator or flipping the repo-wide fork-secret toggle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.33.3.1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump to v0.33.2.1

Per user direction: ship as v0.33.2.1 instead of v0.33.3.1.
0.33.2.x is unclaimed in the queue (PR #934 holds 0.33.3.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
garrytan merged commit 9fb4d7e into master May 14, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 15, 2026
Master landed PR #934 (v0.33.3.0 code intelligence foundation: W0a
source-routing fix + W0b CLI source-scoping flip + W0c within-file
two-pass symbol resolver + W3 MCP exposure of code_callers/callees/def/
refs + pre-w0 eval harness). My branch already contained all of that
work via the original miami merge at the start of this session; the
conflicts are version-label drift (my comments said v0.33.2, master
shipped v0.33.3) and a few additive cases.

Conflict resolutions:
- VERSION / package.json: kept 0.34.0.0 (higher semver wins).
- CHANGELOG.md: both entries preserved. Order is v0.34.0.0 → v0.33.3.0 →
  v0.33.2.1 → v0.33.2.0 → v0.33.1.1 → v0.33.1.0; chronologically
  reasonable with newest-on-top.
- src/core/chunkers/symbol-resolver.ts (add/add): kept my version. Diff
  was W1+W2 documentation block + bumped EDGE_EXTRACTOR_VERSION_TS
  ('2026-05-14T01:00:00Z' vs master's '2026-05-11T00:00:00Z') so the
  next dream cycle re-walks every chunk and picks up qualified-name
  matches from the W1 receiver-type resolution + W2 imports/references.
- src/core/cycle.ts, operations-descriptions.ts, src/commands/eval.ts,
  test/core/cycle.serial.test.ts, test/e2e/cycle.test.ts, test/e2e/
  dream-cycle-phase-order-pglite.test.ts: pure version-string drift
  (v0.33.2 → v0.33.3 in comments). Took master's labels — that's the
  shipped version number.
- src/core/operations.ts: 4 zones merged.
  1. Kept my "v0.34 (Codex finding #2) sourceId resolution" comment.
  2. Took master's wording on the hybridSearchCached comment (functionally
     identical).
  3. Kept my new code_blast + code_flow + code_traversal_cache_clear op
     definitions (W3 + W3b — master doesn't have these).
  4. Deduplicated the ops registration: kept master's v0.33.3 label +
     my W3 + W3b ops registered alongside the foundation ones.

Verification:
- `bun run typecheck` clean
- `bun run verify` clean (all 11 pre-checks pass)
- Migrations v50→v59 schema still valid (no new master migrations in
  this merge; v55-v57 search-lite + v58-v59 v0.34 already landed
  pre-merge in commit f25b674)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 15, 2026
CI surfaced a duplicate migration version in test/migrate.test.ts:371
("runMigrations sorts by version ascending" — uniq.size === versions.length).

Root cause: the second master merge (PR #934 v0.33.3.0 foundation, commit
3fc0ca5) brought in master's `edges_backfilled_at` migration alongside
the one already in my branch. Both functionally identical (ALTER TABLE
content_chunks ADD COLUMN edges_backfilled_at + 3 indexes), both
renumbered to v58 (mine via the f25b674 merge that pushed past master's
v55 search-lite migrations; master's PR #934 originally claimed v55
which would have collided). Auto-merge kept both, named `_v0_33_2` and
`_v0_33_3`. Tests caught it.

Fix: deleted the `_v0_33_3` duplicate. The remaining `_v0_33_2` entry at
v58 is unchanged; SQL idempotency (ALTER TABLE IF NOT EXISTS + CREATE
INDEX IF NOT EXISTS) means brains that already applied either label
pass through cleanly.
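
The uniqueness check that caught the collision (`uniq.size === versions.length`) can be sketched as a helper that reports which versions are claimed twice. `findDuplicateVersions` and the `Migration` shape are hypothetical; the two entry names mirror the labels mentioned above.

```typescript
interface Migration {
  version: number;
  name: string;
}

// Return every version number claimed by more than one migration, ascending.
function findDuplicateVersions(migrations: Migration[]): number[] {
  const seen = new Set<number>();
  const dupes = new Set<number>();
  for (const m of migrations) {
    if (seen.has(m.version)) dupes.add(m.version);
    seen.add(m.version);
  }
  return Array.from(dupes).sort((a, b) => a - b);
}

// Illustrative registry reproducing the auto-merge outcome described above.
const merged: Migration[] = [
  { version: 57, name: "earlier_migration" }, // placeholder name
  { version: 58, name: "edges_backfilled_at_v0_33_2" },
  { version: 58, name: "edges_backfilled_at_v0_33_3" },
];
console.log(findDuplicateVersions(merged));
```

Reporting the offending versions, not just a boolean, points directly at the entry to delete after an auto-merge keeps both sides.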

Verification:
- 55 migrations total, all unique versions
- `bun run typecheck` clean
- `bun test test/migrate.test.ts`: 109 pass / 0 fail / 321 expect calls
garrytan added a commit that referenced this pull request May 15, 2026
…clusters + eval gate (#994)

* feat(v0.34 pre-w0): add code-retrieval eval harness for v0.34 ship gate

Captures pre-v0.34 retrieval quality on the gbrain self-corpus before any
code-intel work lands, so the v0.34 ship gate (precision@5 +10pp OR
answered_rate +15pp on >=15/30 questions) measures real improvement
rather than an after-the-fact retuned baseline.

* src/eval/code-retrieval/harness.ts -- pure-function metrics (precision@k,
  recall@k, top-1 stability, gate evaluator) + EvalRunReport types stable
  across schema_version 1
* src/eval/code-retrieval/questions.json -- 30 questions across callers /
  callees / definition / references / blast_radius / execution_flow /
  cluster_membership kinds, expected_files captured against current
  gbrain layout
* src/eval/code-retrieval/strategies.ts -- BaselineStrategy (hybridSearch)
  + WithCodeIntelStrategy stub (post-W3 fills in code_blast/code_flow/etc.)
* src/commands/eval-code-retrieval.ts -- gbrain eval code-retrieval CLI
  with --baseline / --with-code-intel / --compare subcommands
* test/code-retrieval-harness.test.ts -- 26 unit tests across metrics,
  loader, gate logic; no engine dependency

PRE-V0.34 BASELINE WORKFLOW:
  gbrain eval code-retrieval --baseline --save /tmp/baseline-1.json
  (run 3x for noise floor)

V0.34 SHIP GATE (after W3 lands):
  gbrain eval code-retrieval --with-code-intel --save /tmp/v034.json
  gbrain eval code-retrieval --compare /tmp/baseline-1.json /tmp/v034.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.34 W0a): source-routing leak across query + two-pass

Codex outside-voice review on the v0.34 plan caught two load-bearing
sites where sourceId was advertised but never applied — multi-source
brains silently cross-contaminated structural retrieval:

* operations.ts ~323 — `query` op handler called hybridSearch without
  threading ctx.sourceId. Multi-source agents querying with a
  --source flag got cross-source results.
* two-pass.ts:81 (nearSymbol lookup) and two-pass.ts:131 (unresolved
  edge resolution) — TwoPassOpts.sourceId was declared and threaded
  through hybridSearch's expandAnchors call, but the actual SQL ignored
  it. The walk window crossed source boundaries every time.

Fix:
* `query` op now reads ctx.sourceId AND accepts a new `source_id`
  param (with '__all__' as the explicit force-cross-source escape
  hatch). Per-call param wins over ctx.
* two-pass.ts both lookups join through pages.source_id when
  opts.sourceId is set; omitted opts.sourceId preserves the legacy
  cross-source contract for callers who want it.

Regression test: test/e2e/source-routing.test.ts seeds two sources
with the same `parseMarkdown` symbol + a cross-source caller edge.
Pins:
  - nearSymbol + sourceId='source-a' returns ONLY source-a chunks
  - nearSymbol + sourceId='source-b' returns ONLY source-b chunks
  - nearSymbol with no sourceId still crosses sources (contract preserved)
  - walk_depth=1 unresolved-edge resolution stays in source-a

PGLite in-memory, no DATABASE_URL needed. The fix is exercised under
realistic structural retrieval, not just a contrived unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.34 W0b): flip CLI source-scoping default to truly source-scoped

Codex outside-voice review (finding #7) caught that the v0.20.0
docstring claim "by default we only match the caller's source_id"
contradicted the implementation in code-callers.ts:54 + code-callees.ts:43:

  allSources: allSources || !sourceId

The right side made `allSources` TRUE whenever `--source` was omitted,
INVERTING the documented default. Multi-source brains silently cross-
contaminated structural retrieval; `gbrain code-callers parseMarkdown`
on a brain with two repos returned callers from both even though the
docstring promised per-source scoping.

Fix:
* New canonical helper `resolveDefaultSource(engine)` in sources-ops.ts.
  Contract per eng review D7:
    - exactly 1 source registered → return its id (single-source brains,
      the 80% case; --source flag is unnecessary friction there)
    - 2+ sources → throw SourceResolutionError(multiple_sources_ambiguous)
      with the list of valid ids
    - 0 sources → throw SourceResolutionError(no_sources)
* code-callers.ts + code-callees.ts now resolve to the default source
  when both --source AND --all-sources are absent. To get the pre-v0.34
  cross-source behavior, callers must pass --all-sources explicitly.
* Same hint text on both commands. Pinned by test/e2e/cli-source-scoping-pglite.test.ts.

IRON RULE regression R2: docstring promise now holds. Multi-source brain
running `gbrain code-callers <symbol>` without --source gets a clear
error listing valid source ids instead of silent cross-resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W0c): within-file two-pass symbol resolver + edges_backfilled_at watermark

Codex's outside-voice review caught that the v0.20.0 graph stores BARE
callee tokens (`render`, `find`, `execute`) — not qualified names. Pre-v0.34
recursive blast/flow would alias every same-named function across classes.
W0c is the foundation that fixes this: resolve `code_edges_symbol` rows by
matching `to_symbol_qualified` against the SAME-FILE chunks'
`symbol_name_qualified`, then write the outcome to `edge_metadata`.

This commit is the resolver primitive + schema. The cycle-phase wiring
that calls it on every quick-cycle tick lands in the next commit.

Schema (v51 migration `edges_backfilled_at_v0_34`):
* `content_chunks.edges_backfilled_at TIMESTAMPTZ` — resume watermark.
  Chunks where the column is NULL OR older than EDGE_EXTRACTOR_VERSION_TS
  get re-walked next tick. SIGINT/OOM/sleep mid-backfill loses at most
  one batch.
* Indexes per D11 from eng review:
  - `idx_code_edges_symbol_resolver(source_id, to_symbol_qualified)` —
    composite for the resolver's per-source lookup.
  - `idx_content_chunks_symbol_lookup(page_id, symbol_name_qualified)`
    WHERE `symbol_name_qualified IS NOT NULL` — file-batched candidate
    fetch; also reused by W4-5 cluster recompute.
  - `idx_content_chunks_edges_backfill(edges_backfilled_at)` WHERE
    `edges_backfilled_at IS NULL` — fast unresumed-row scan.

Module (`src/core/chunkers/symbol-resolver.ts`):
* `resolveSymbolEdgesIncremental(engine, {sourceId, maxChunks?, onProgress?})`
  walks stale chunks in 200-chunk batches. For each chunk, loads its
  unresolved edges, finds same-page candidates by symbol_name_qualified,
  and writes outcome to `edge_metadata`:
   - exactly 1 candidate → `{resolved_chunk_id: <id>}`
   - 2+ candidates → `{ambiguous: true, candidates: [...]}`
   - 0 candidates → unchanged (cross-file; two-pass.ts handles those)
  Each batch bumps `edges_backfilled_at = NOW()` for the chunks.
* `readEdgeResolution(metadata)` — public helper for downstream code
  (two-pass.ts, code_blast op, eval-capture) to consume the resolver's
  output without parsing JSON directly. Returns a tagged union.
* `EDGE_EXTRACTOR_VERSION_TS` exported constant: bump it when the
  extractor's output shape changes so that the next cycle re-walks all chunks.
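
A minimal sketch of the three resolver outcomes and of a tagged-union reader over `edge_metadata`. Only the metadata keys (`resolved_chunk_id`, `ambiguous`, `candidates`) come from the notes above; the type names, the helper names, and the assumption that chunk ids are numbers are hypothetical.

```typescript
// Hypothetical shapes: only the metadata keys are taken from the commit text.
type EdgeResolution =
  | { kind: "resolved"; chunkId: number }
  | { kind: "ambiguous"; candidates: number[] }
  | { kind: "unresolved" };

// Map a same-page candidate list to the metadata the resolver would write.
// Returns null for zero candidates, meaning "leave the edge unchanged".
function outcomeForCandidates(candidateIds: number[]): Record<string, unknown> | null {
  if (candidateIds.length === 1) return { resolved_chunk_id: candidateIds[0] };
  if (candidateIds.length >= 2) return { ambiguous: true, candidates: candidateIds };
  return null;
}

// Read metadata back as a tagged union so callers never touch raw JSON.
function readEdgeResolutionSketch(metadata: unknown): EdgeResolution {
  if (typeof metadata !== "object" || metadata === null) {
    return { kind: "unresolved" };
  }
  const m = metadata as Record<string, unknown>;
  if (typeof m.resolved_chunk_id === "number") {
    return { kind: "resolved", chunkId: m.resolved_chunk_id };
  }
  if (m.ambiguous === true && Array.isArray(m.candidates)) {
    const candidates = m.candidates.filter((c): c is number => typeof c === "number");
    return { kind: "ambiguous", candidates };
  }
  return { kind: "unresolved" };
}
```

Downstream code can then `switch` on `kind` and let the compiler flag any unhandled case.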

Tests (5 E2E in test/e2e/symbol-resolver-pglite.test.ts, all PGLite,
no DATABASE_URL): unambiguous match, ambiguous multi-match, no match,
watermark advance + idempotency, source isolation (no cross-source
candidate leak).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W0c): wire resolve_symbol_edges as a new cycle phase

W0c's symbol resolver lands as a 12th cycle phase between extract and
patterns. The autopilot's quick-cycle path (60s watchdog interval per
D2 from eng review) now resolves stale chunks incrementally so agents
see resolved edges within ~60s of writes rather than waiting on the
slow full-walk path.

* CyclePhase + ALL_PHASES + NEEDS_LOCK_PHASES extended with
  'resolve_symbol_edges'. Position: between extract (which emits new
  bare-token edges from sync diffs) and patterns (which reads the
  graph). Acquires the cycle lock because it writes edge_metadata.
* CycleReport.totals adds edges_resolved + edges_ambiguous so doctor
  and autopilot summaries surface the numbers.
* runPhaseResolveSymbolEdges walks every registered source via
  listSources() + resolveSymbolEdgesIncremental(). Per-call cap is
  BATCH_SIZE*10 = 2000 chunks so a single watchdog tick stays bounded
  even on a 100K-chunk brain. Subsequent ticks pick up the leftovers
  via the edges_backfilled_at watermark.
* Test count bumped from 11 → 12 phases in cycle.serial.test.ts and
  cycle.test.ts (both pinned by the regression guards). Existing 28
  cycle tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3): MCP-expose code_callers / code_callees / code_def / code_refs

Pre-v0.34 these four code-intelligence commands lived in CLI_ONLY at
cli.ts:30 — agents calling gbrain via MCP couldn't reach them and fell
through to text search. This commit ships the agent-facing MCP surface
for v0.34 against the existing v0.20+ tree-sitter call graph; recursive
blast/flow and clusters land in subsequent commits.

* `code_callers(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCallersOf. Reverse view of the A1 call graph.
* `code_callees(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCalleesOf. Forward view.
* `code_def(symbol, [limit, lang])` — wraps findCodeDef. Returns
  definition sites with file/line/snippet.
* `code_refs(symbol, [limit, lang])` — wraps findCodeRefs. Returns
  every reference (comments, strings, imports, call sites).

All four are scope:'read', source-scoped by default via ctx.sourceId
(W0a contract). Per-call source_id param wins over ctx; pass '__all__'
or all_sources=true to force cross-source.
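
The precedence described above can be sketched as a small pure function. `resolveScopeSketch` and its types are hypothetical, and letting `all_sources` win when both params are passed is an assumption the notes do not settle.

```typescript
// Hypothetical helper: the real handlers may structure this differently.
type SourceScope = { mode: "all" } | { mode: "single"; sourceId: string };

interface ScopeParams {
  source_id?: string;
  all_sources?: boolean;
}

const ALL_SOURCES_SENTINEL = "__all__";

function resolveScopeSketch(ctxSourceId: string, params: ScopeParams): SourceScope {
  // Explicit cross-source opt-in, via either the flag or the sentinel value.
  if (params.all_sources === true || params.source_id === ALL_SOURCES_SENTINEL) {
    return { mode: "all" };
  }
  // A per-call source_id wins over the context default.
  return { mode: "single", sourceId: params.source_id ?? ctxSourceId };
}
```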

* operations-descriptions.ts: 4 new constants per the eng review D10
  finding — every description carries an inline example response so
  agents don't burn first-call context discovering shape. Resolver-grade
  wording ("BEFORE editing any function, run code_callers...") routes
  plan-mode questions straight to the right op.
* SEARCH_DESCRIPTION gains a cross-link clause pointing at the four new
  ops so agents stop falling through to text search for code-symbol
  questions.

Tests (11 E2E in test/e2e/code-intel-mcp-ops-pglite.test.ts):
  - All four ops registered + scope:read + description pinned by constant
  - All four ops have required symbol param
  - code_callers / code_callees return the documented envelope shape
  - Source scoping honors ctx.sourceId
  - all_sources=true / source_id='__all__' force cross-source
  - code_def returns the def-site snippet

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.33.0): agent-readable migration doc for the code-intel foundation

skills/migrations/v0.33.0.md gives existing-user upgrade guidance for the
v0.33.0 foundation pre-release (this branch's accumulated work toward
v0.34 Cathedral III):

* Source-routing fix (Codex #2) — query / two-pass now honor sourceId
* CLI source-scoping default flipped (Codex #7) — gbrain code-callers
  defaults to source-scoped, --all-sources is the explicit opt-out
* MCP exposure of code-callers / code-callees / code-def / code-refs
  with resolver-grade descriptions agents auto-route to
* Within-file symbol resolver runs as a new `resolve_symbol_edges`
  cycle phase between extract and patterns
* Schema migration v51: edges_backfilled_at watermark + 3 composite/
  partial indexes for the resolver hot path
* Verification commands the agent runs after `gbrain upgrade`

Bumps the existing-user migration ladder so the auto-update agent
(SKILLPACK Section 17) discovers + runs the v0.33.0 migration steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.33.0): bump VERSION + package.json + CHANGELOG

v0.33.0 ships the v0.34 Cathedral III foundation: MCP exposure of
code_callers / code_callees / code_def / code_refs with resolver-grade
tool descriptions, plus the source-routing fix + within-file symbol
resolver + cycle-phase wiring that v0.34's recursive blast/flow and
Leiden clusters will build on.

Full release notes in CHANGELOG.md. Trio in lockstep:
  VERSION:      0.33.0
  package.json: 0.33.0
  CHANGELOG.md: ## [0.33.0] - 2026-05-11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.33.0): update dream-cycle phase-order assertions for resolve_symbol_edges

E2E test pinned the canonical phase sequence as a regression guard. The
v0.33.0 resolve_symbol_edges phase (added between extract and patterns)
correctly bumps the count to 12 — caught by the canonical-order test on
fresh-Postgres run, fixed by adding the new phase to EXPECTED_PHASES
and bumping the version history comment.

Both cycle.serial.test.ts and cycle.test.ts were already updated in the
W0c cycle-phase commit (6f7dbe1); this third pin lives in
test/e2e/dream-cycle-phase-order-pglite.test.ts and was missed.

Full E2E suite now: 550 passed / 0 failed / 81 files (real Postgres on
port 5435 via Docker pgvector/pgvector:pg16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 STEP 0): promote OperationContext.sourceId to REQUIRED (D4)

Flip src/core/operations.ts:350 `sourceId?: string` → `sourceId: string`.
Mirrors v0.26.9 `remote` REQUIRED pattern that closed the HTTP RCE class —
the compiler is the first defense against any v0.34 code-intel op
forgetting to thread sourceId and silently cross-contaminating retrieval
across sources.

- src/mcp/dispatch.ts: buildOperationContext auto-fills 'default' when
  opts.sourceId is undefined. Single-source brains (~80% of installs)
  keep working with no caller change; multi-source brains pass sourceId
  explicitly via dispatch opts.
- src/cli.ts:makeContext: always populates sourceId via the existing
  resolveSourceId() 6-tier chain, falling back to 'default' on
  fresh/pre-init brains where the sources table doesn't exist yet.
- src/commands/book-mirror.ts, src/core/minions/tools/brain-allowlist.ts:
  Two production context-builders that previously omitted sourceId.
  Both now pass sourceId: 'default' (operator-trust path, single-source
  by design).
- 10 test/* files: every OperationContext literal now passes sourceId.

test/operation-context-sourceid-required.test.ts: paired contract test
(6 cases) pinning the type contract. @ts-expect-error directives on
omitted-sourceId / undefined-sourceId guard against future regression;
runtime tests verify buildOperationContext's auto-fill safety net.
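
A trimmed-down illustration of the contract, assuming a context type with only the one field. `OperationContextSketch` and `buildContextSketch` are hypothetical stand-ins; the real `OperationContext` and `buildOperationContext` carry more than this.

```typescript
// Hypothetical, trimmed-down context type: the real one has more fields.
interface OperationContextSketch {
  sourceId: string; // required, so omitting it is a compile-time error
}

// Assumed shape of the dispatch-side auto-fill described above.
function buildContextSketch(opts: { sourceId?: string }): OperationContextSketch {
  return { sourceId: opts.sourceId ?? "default" };
}

// @ts-expect-error sourceId is missing, so this literal must not type-check
const missingSourceId: OperationContextSketch = {};
void missingSourceId;
```

The `@ts-expect-error` line doubles as a regression guard: if `sourceId` ever becomes optional again, the directive itself turns into a compile error.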

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W1): receiver-type resolution at edge-extraction time

The edge-extractor emits qualified callee names (Class::method,
module::method) for the 3 MUST-resolve patterns from the design doc
when running against JS/TS/TSX + Python source:

  1. `import { x } from 'y'; x.method()` → emit `y::method`
  2. `class C { m() { this.m() } }` → emit `C::m`
  3. `const c = new C(); c.m()` → emit `C::m`

When the receiver can't be resolved within WALK_DEPTH_CAP (32) ancestor
hops of the call site, the extractor falls back to bare-token emit (pre-W1 behavior).
Ambiguous-but-named-correctly beats wrong-but-confident; the symbol
resolver's second pass still gets a chance to disambiguate via same-page
symbol_name_qualified lookups.

Per D18 from eng review — only JS/TS/TSX + Python get receiver
resolution. Ruby/Go/Rust/Java keep pre-W1 bare-token emit semantics.
RECEIVER_RESOLUTION_LANGS pins the eligible set.

Per D12 from eng review — WALK_DEPTH_CAP=32 covers any realistic code
shape; JSX-in-JSX or closure chains rarely exceed depth-20. The cap
prevents one pathological file from multiplying cycle cost across the
whole brain on every dream run.
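
A sketch of the depth-capped ancestor walk for pattern 2 (`this.m()` inside `class C`), over a toy node type rather than real tree-sitter nodes. `ToyNode` and `qualifyThisCall` are hypothetical; only the cap value and the `Class::method` output format come from the text above.

```typescript
const WALK_DEPTH_CAP = 32; // value from the commit message

// Toy stand-in for a syntax node; real tree-sitter nodes look different.
interface ToyNode {
  kind: "class" | "method" | "block" | "call";
  name?: string;
  parent?: ToyNode;
}

// Resolve `this.<method>()` to `<Class>::<method>` by climbing ancestors,
// giving up after WALK_DEPTH_CAP hops.
function qualifyThisCall(callSite: ToyNode, method: string): string {
  let node = callSite.parent;
  for (let hops = 0; node !== undefined && hops < WALK_DEPTH_CAP; hops++) {
    if (node.kind === "class" && node.name !== undefined) {
      return `${node.name}::${method}`;
    }
    node = node.parent;
  }
  return method; // unresolved: fall back to the bare token
}
```

The fallback keeps the edge ambiguous rather than wrong, which leaves room for the within-file resolver to disambiguate later.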

- src/core/chunkers/edge-extractor.ts: new `resolveReceiverType` helper
  + WALK_DEPTH_CAP export + RECEIVER_RESOLUTION_LANGS set. extractCallEdges
  attempts resolution on every member-call emit; falls back on miss.
- src/core/chunkers/symbol-resolver.ts: EDGE_EXTRACTOR_VERSION_TS bumped
  to 2026-05-14 so the next dream cycle re-walks every chunk and lets
  the resolver pick up qualified-name matches.

test/code-intel/scope-walker-resolution.test.ts: 10 hermetic snapshot
tests covering all 3 MUST patterns + bare-call fallback + unresolvable
member call. Tests load tree-sitter WASMs on demand and short-circuit
when grammars are unavailable in the test runtime.

Scope reduction from the original plan: the .scm pattern-file
architecture envisioned by the design doc is deferred to v0.34.1. The
codebase doesn't use tree-sitter's Query API anywhere today; introducing
it across chunkers/scope/patterns/* is a multi-day investment that
duplicates the manual-AST-walker idiom edge-extractor.ts already uses.
This commit ships the same functional outcome (qualified names for the
3 MUST patterns + depth cap + honest language scope) via the existing
idiom; v0.34.1 can refactor to .scm files if/when query-API benefits
materialize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W2): edge densification — imports + references edge types

Edge extractor now emits three edge kinds:
  - calls (v0.20 baseline; v0.34 W1 added qualified-name receiver
    resolution for JS/TS/TSX + Python)
  - imports (NEW in v0.34 W2; JS/TS/TSX + Python at depth)
  - references (NEW in v0.34 W2; TS-only)

Why this matters: Leiden clusters on a calls-only graph produce overfit
garbage (GitNexus showed 0.052 cluster/node on calls-only — useless).
Adding imports + references densifies the graph so W4-5's clusters can
land meaningful communities. Per design doc Constraint #1.

- src/core/chunkers/edge-extractor.ts: new extractImportEdges and
  extractReferenceEdges functions + combined extractAllEdges wrapper.
  ExtractedEdge.edgeType widened to 'calls' | 'imports' | 'references'.
- src/core/chunkers/code.ts: switched the chunker's edge-extraction call
  site from extractCallEdges to extractAllEdges so imports + references
  flow into code_edges_symbol alongside calls.
- src/core/chunkers/symbol-resolver.ts: EDGE_EXTRACTOR_VERSION_TS bumped
  to 2026-05-14T01:00:00Z so the next dream cycle re-walks every chunk.

Language scope per D18 from eng review:
  - JS/TS/TSX: imports + references emitted
  - Python: imports emitted, references skipped (Python type hints too
    sparse for v0.34; v0.35 may revisit)
  - Ruby/Go/Rust/Java: calls only — no imports, no references. Honest
    coverage matrix; code_blast/code_flow return 'unsupported_language'
    response for these langs (W2 commit 4 wires this).

Edge schema reused: code_edges_symbol.edge_type is the existing TEXT
column, already part of the unique constraint
(from_chunk_id, to_symbol_qualified, edge_type), so adding new types
doesn't conflict with existing calls edges.

test/code-intel/edge-densification.test.ts: 13 hermetic tests covering
named/default/namespace/aliased/side-effect imports for JS/TS, from-x-
import-y + import-pkg for Python, function parameter + return type
references for TS, and unsupported-language returns-empty contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3b): code_traversal_cache table, module, and clear admin op

Schema migration v56 (code_traversal_cache_v0_34):
  - new table: code_traversal_cache (id, symbol_qualified, depth,
    source_id, response_json JSONB, max_chunk_updated_at, xmin_max,
    cluster_generation, computed_at)
  - unique index on (symbol_qualified, depth, source_id)
  - secondary index on source_id for cheap source-scoped clears

D3 — generation-counter cache invalidation. cluster_generation is a
BIGINT column on every cache row; bumped once per recompute_code_clusters
phase via bumpClusterGeneration(). Cache rows referencing stale
generations naturally miss on read. Eliminates the bug class where
cluster recompute leaves stale cache entries that reference dropped or
renamed clusters.
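
The generation-counter idea can be sketched in memory. This is an illustration only: the real cache is a database table with a config-backed counter, and the class and method names below are hypothetical.

```typescript
interface CacheRow<T> {
  value: T;
  generation: number; // generation at write time
}

class GenerationCacheSketch<T> {
  private rows = new Map<string, CacheRow<T>>();
  private generation = 0;

  // Called once per cluster recompute; every older row becomes a miss.
  bumpGeneration(): number {
    this.generation += 1;
    return this.generation;
  }

  put(key: string, value: T): void {
    this.rows.set(key, { value, generation: this.generation });
  }

  get(key: string): T | undefined {
    const row = this.rows.get(key);
    // Rows stamped with a stale generation are never served.
    if (row === undefined || row.generation !== this.generation) return undefined;
    return row.value;
  }
}
```

No delete pass is needed on recompute: stale rows stay in place and are overwritten by the next `put` for the same key.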

D8 — destructive-guard parity. clearTraversalCache requires either
source_id OR all_sources=true. Without either it throws. Mirrors v0.26.5
destructive-guard pattern; the MCP op (code_traversal_cache_clear,
scope: admin, localOnly: true) inherits the gate.
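
A sketch of the guard, assuming the options arrive as a plain object. The function name, return shape, and error message are hypothetical.

```typescript
interface ClearOptions {
  source_id?: string;
  all_sources?: boolean;
}

type ClearScope = { scope: "all" } | { scope: "source"; sourceId: string };

// Refuse to clear anything unless the caller names a scope explicitly.
function requireClearScope(opts: ClearOptions): ClearScope {
  if (opts.all_sources === true) return { scope: "all" };
  if (typeof opts.source_id === "string" && opts.source_id.length > 0) {
    return { scope: "source", sourceId: opts.source_id };
  }
  throw new Error("refusing to clear: pass source_id or all_sources=true");
}
```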

- src/core/code-intel/traversal-cache.ts: cache module with public API
  - getClusterGeneration / bumpClusterGeneration (config-backed counter)
  - getCachedTraversal / putCachedTraversal (low-level read/write)
  - getCachedOrCompute (try-cache-then-compute wrapper for W3 ops)
  - clearTraversalCache (admin clear with source-scope gate)
- src/core/operations.ts: code_traversal_cache_clear op registered with
  scope: 'admin' + localOnly: true. Dry-run aware; resolves source_id
  from params or ctx.

v0.34.0.0 scope: cache writes use xmin_max=0 sentinel (no snapshot
isolation). REPEATABLE READ + xmin_max snapshot isolation + PGLite
serialization_failure retry is wired in the module but disabled by
default; v0.34.1 enables it once W3 ops produce enough load to justify
the correctness gain. Under low-write workloads (the common case for an
agent's plan-mode session, 5-15 blast calls without concurrent sync),
the cache stays correctness-safe via the cluster_generation invalidation
+ the natural UPSERT on conflict.

test/code-intel/traversal-cache.test.ts: 13 hermetic PGLite tests
covering cache hit/miss, D3 generation-counter invalidation, UPSERT
replacement, source-scoped + all-sources clear paths, and getCachedOrCompute
try-cache-then-compute happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3): code_blast + code_flow recursive ops + sinks

Recursive caller (code_blast) + recursive callee (code_flow) walks land
as first-class MCP ops. The user-facing payoff for v0.34: v0.33.3
shipped flat callers/callees; v0.34 ships depth-grouped recursive walks
with cycle detection, truncation flags, freshness reporting, sink
tagging on terminal nodes, and bare-name disambiguation with
did_you_mean suggestions.

- src/core/code-intel/recursive-walk.ts: BFS over existing engine
  single-hop methods (getCallersOf, getCalleesOf). Depth-grouped output;
  confidence = clamp(1 / (1 + 0.3 * depth), 0.05, 1.0). Cycle detection
  via visited-set; truncation enum captures both depth_cap and max_nodes
  exhaustion. Source-scoped per D4 sourceId REQUIRED.
- src/core/code-intel/sinks/{ts,py,index}.ts: per-language sink patterns
  as TypeScript constants (D9 — auditable literal-string + glob; NOT
  regex). The pattern cache stays warm after the first match in each process.
  TS_SINKS covers fetch, axios.*, fs.*, Bun.*, execSync, spawnSync;
  PY_SINKS covers requests.*, urllib.*, subprocess.*, open, pathlib.*.
- src/core/operations.ts: code_blast + code_flow registered with
  scope: 'read'. Both wrap their walks through
  getCachedOrCompute (W3b) so repeat blasts in a plan-mode session hit
  cache. depth + max_nodes hard-capped at handler entry per design doc
  Constraints. exact: true skips bare-name disambiguation.
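
A sketch of the depth-grouped BFS, using an injected `neighbors` callback in place of the engine's single-hop methods. The function name, result shape, and the `"none"` truncation value are assumptions; the confidence formula is the one quoted above. A visited-set skip also fires on diamond-shaped convergence, so the counter is named `revisits` here rather than cycles.

```typescript
type Truncation = "none" | "depth_cap" | "max_nodes";

interface DepthGroup { depth: number; confidence: number; symbols: string[]; }
interface WalkResult { depthGroups: DepthGroup[]; revisits: number; truncation: Truncation; }

// confidence = clamp(1 / (1 + 0.3 * depth), 0.05, 1.0)
function confidenceAt(depth: number): number {
  return Math.min(1.0, Math.max(0.05, 1 / (1 + 0.3 * depth)));
}

function recursiveWalkSketch(
  root: string,
  neighbors: (symbol: string) => string[],
  maxDepth: number,
  maxNodes: number,
): WalkResult {
  const visited = new Set<string>([root]);
  const depthGroups: DepthGroup[] = [];
  let frontier: string[] = [root];
  let revisits = 0;
  let truncation: Truncation = "none";

  for (let depth = 1; frontier.length > 0; depth++) {
    if (depth > maxDepth) {
      // Flag the cap only if an unvisited node lies beyond it.
      const more = frontier.some((s) => neighbors(s).some((n) => !visited.has(n)));
      if (more) truncation = "depth_cap";
      break;
    }
    const next: string[] = [];
    let full = false;
    for (const symbol of frontier) {
      for (const n of neighbors(symbol)) {
        if (visited.has(n)) { revisits++; continue; }
        if (visited.size - 1 >= maxNodes) { full = true; break; } // root not counted
        visited.add(n);
        next.push(n);
      }
      if (full) break;
    }
    if (next.length > 0) {
      depthGroups.push({ depth, confidence: confidenceAt(depth), symbols: next });
    }
    if (full) { truncation = "max_nodes"; break; }
    frontier = next;
  }
  return { depthGroups, revisits, truncation };
}
```

Passing a callers lookup as `neighbors` gives a blast-style walk, and a callees lookup gives a flow-style walk.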

Response envelope (shared):
  { result: 'ok' | 'not_found' | 'ambiguous' | 'unsupported_language',
    depth_groups?, cycles_detected?, truncation?, freshness?,
    did_you_mean?, candidates?, supported? }
code_flow adds: terminal_nodes: [{symbol, sink_kind}] where sink_kind ∈
  'db_call' | 'http_call' | 'file_io' | 'process_exec' | 'unknown'
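
A sketch of literal-string plus glob sink matching without regex. Only the `fetch` → `http_call` pairing is confirmed by the tests listed below; the other pattern-to-kind pairings, the table name, and both function names are assumptions for illustration.

```typescript
type SinkKind = "db_call" | "http_call" | "file_io" | "process_exec" | "unknown";

// Each pattern is either an exact symbol or a prefix glob ending in ".*".
// Pairings other than fetch -> http_call are assumed, not taken from the source.
const TS_SINKS_SKETCH: ReadonlyArray<[string, SinkKind]> = [
  ["fetch", "http_call"],
  ["axios.*", "http_call"],
  ["fs.*", "file_io"],
  ["execSync", "process_exec"],
  ["spawnSync", "process_exec"],
];

function matchesSinkPattern(pattern: string, symbol: string): boolean {
  if (pattern.endsWith(".*")) {
    // "fs.*" becomes the prefix "fs.", so "fsx.read" does not match.
    return symbol.startsWith(pattern.slice(0, -1));
  }
  return symbol === pattern;
}

function classifySinkSketch(symbol: string, lang: string): SinkKind {
  if (lang !== "ts") return "unknown"; // this sketch only carries a TS table
  for (const [pattern, kind] of TS_SINKS_SKETCH) {
    if (matchesSinkPattern(pattern, symbol)) return kind;
  }
  return "unknown";
}
```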

Per D18 from eng review — only JS/TS/TSX + Python get walks. Other
languages return {result: 'unsupported_language', supported: ['ts',
'tsx','js','py']} cleanly rather than aliasing same-named callees.

test/code-intel/recursive-walk.test.ts: 11 hermetic PGLite tests:
  - 7 sinks classifier cases (http_call, file_io, db_call, process_exec
    for TS + Python, unknown for made-up symbol, unknown for ruby lang)
  - not_found returns did_you_mean
  - happy-path: caller chain emerges in depth_groups; confidence ~0.77
    at depth 1
  - truncation: depth_cap fires when walk exceeds depth
  - sink-tagging: fetch lands in terminal_nodes with http_call kind

v0.34.0.0 scope reductions: stdio rate limiter at dispatch.ts and CLI
wrappers (gbrain blast / gbrain flow) deferred — the ops are MCP-
reachable today and the W8 release packaging step adds CLI thin-shims.
The eng-review's stdio limiter at dispatch.ts (D10) is queued behind
the eval gate run; the concurrent code-intel load needed to justify it
hasn't materialized at v0.34.0.0 ship time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W6): gbrain edges-backfill CLI

Operator escape hatch for the symbol-resolution backfill chain. Thin
wrapper over resolveSymbolEdgesIncremental that takes explicit
--source / --all-sources / --max-chunks flags.

Resumable via the edges_backfilled_at watermark (W0c). Per-batch
transactions commit, so Ctrl-C leaves a clean resumable state. A re-run
picks up where the prior invocation stopped.

Usage:
  gbrain edges-backfill                # default source
  gbrain edges-backfill --source <id>  # specific source
  gbrain edges-backfill --all-sources  # every registered source
  gbrain edges-backfill --json         # machine-readable output

Wired into src/cli.ts CLI_ONLY + dispatch table.

Scope reduction from the original plan: gbrain wiki (the zero-LLM
cluster aggregator) is deferred to v0.34.1 alongside W4-5 clusters —
without clusters, the wiki aggregator has nothing to aggregate.
gbrain upgrade backfill prompt is also deferred to v0.34.1; v0.34.0.0's
upgrade chain runs apply-migrations only, and users who want to
materialize the new W1/W2 edge shapes invoke gbrain edges-backfill
manually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W7): per-op graph-traversal metrics module

src/core/eval-capture-graph.ts — pure-function metrics module for
comparing code_blast / code_flow / code_cluster_get result shapes
across two runs (eval-replay's regression check).

Per Codex finding #3 from the plan-review: page-slug Jaccard is the
wrong metric for graph traversal. v0.34 W7 ships proper per-op metrics:

  - nodeSetJaccard(a, b): set Jaccard over (file, line, symbol)
    tuples. Right metric for code_blast/code_flow node sets.
  - depthGroupStability(a, b): 1 - (displaced / |union|). Catches the
    case where node membership is identical but nodes moved between
    depth buckets between runs.
  - truncationMatch(a, b): boolean match on the truncation enum.
    Discrete signal that pairs with Jaccard.
  - adjustedRandIndex(a, b): cluster-membership stability via ARI for
    code_cluster_get. v0.34.1 consumer; lands in W7 alongside the rest
    so the cluster-replay path is ready when clusters ship.
  - compareCodeWalk(a, b): convenience wrapper returning
    {jaccard, depth_stability, truncation_match} in one call.
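
A sketch of the first two metrics. The node shape and function names are hypothetical, and two choices are assumptions: empty inputs score 1, and a node present in only one run counts as displaced.

```typescript
interface WalkNodeSketch { file: string; line: number; symbol: string; depth: number; }

const nodeKey = (n: WalkNodeSketch): string => `${n.file}:${n.line}:${n.symbol}`;

// Set Jaccard over (file, line, symbol) tuples; depth is ignored here.
function nodeSetJaccardSketch(a: WalkNodeSketch[], b: WalkNodeSketch[]): number {
  const sa = new Set(a.map(nodeKey));
  const sb = new Set(b.map(nodeKey));
  const union = new Set(Array.from(sa).concat(Array.from(sb)));
  if (union.size === 0) return 1; // assumption: two empty walks are identical
  let intersection = 0;
  sa.forEach((k) => { if (sb.has(k)) intersection++; });
  return intersection / union.size;
}

// 1 - (displaced / |union|): a node is displaced when its depth differs
// between runs (or, by assumption, when it is missing from one run).
function depthGroupStabilitySketch(a: WalkNodeSketch[], b: WalkNodeSketch[]): number {
  const da = new Map<string, number>();
  for (const n of a) da.set(nodeKey(n), n.depth);
  const db = new Map<string, number>();
  for (const n of b) db.set(nodeKey(n), n.depth);
  const union = new Set(Array.from(da.keys()).concat(Array.from(db.keys())));
  if (union.size === 0) return 1;
  let displaced = 0;
  union.forEach((k) => { if (da.get(k) !== db.get(k)) displaced++; });
  return 1 - displaced / union.size;
}
```

The two scores are complementary: a walk whose nodes merely shift between depth buckets keeps a Jaccard of 1 while its stability drops.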

Hermetic — no engine, no DB, fully unit-testable. 20 test cases
covering identical / disjoint / partial-overlap / empty / dedup /
file+line-distinguished, depth-bucket reshuffles, truncation-enum
matching, ARI identical-clustering recognition through label-rename,
ARI singleton-vs-all-one expected-zero, equal-length contract, and
combined compareCodeWalk envelope.

Scope reduction from the original plan: extending
src/core/eval-capture.ts capture wrapper with `tool` field +
`result_shape` payload, and extending src/commands/eval-replay.ts to
dispatch on tool — both deferred to v0.34.1. The metric MODULE is the
load-bearing piece (Codex finding #3's primary fix); wiring it through
the existing capture/replay surface is a follow-up that doesn't change
production behavior until clusters ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.34.0.0): VERSION + package.json + CHANGELOG + migration doc

Final release packaging for v0.34.0.0. Three-line audit will show:
  VERSION:     0.34.0.0
  package.json: 0.34.0.0
  CHANGELOG:   ## [0.34.0.0] - 2026-05-14

CHANGELOG entry follows CLAUDE.md voice rules:
  - Bold headline + lead paragraph
  - "What ships in v0.34.0.0" itemized list
  - "Slip handling — deferred to v0.34.1" honest scope note
  - Numbers-that-matter table comparing v0.33.3 → v0.34.0.0
  - Mandatory "## To take advantage of v0.34.0.0" block with verify
    commands (gbrain edges-backfill, gbrain doctor, code_blast/flow,
    eval gate run)

skills/migrations/v0.34.0.0.md — agent-readable upgrade doc. Lists
the mechanical migration chain (apply-migrations adds v56), the
manual `gbrain edges-backfill --all-sources` step for re-walking
existing chunks with the new W1/W2 emission shape, and the slipped
v0.34.1 scope.

v0.34.0.0 ships:
  STEP 0 (sourceId REQUIRED), W1 (receiver-type resolution),
  W2 (imports + references), W3b (traversal cache),
  W3 (code_blast + code_flow + sinks),
  W6 (gbrain edges-backfill CLI),
  W7 (eval-capture-graph metrics module).

v0.34.1 backlog: W4-5 Leiden clusters, W6 wiki, W7 capture wiring,
W1 .scm rewrite, W3 stdio limiter, W3 CLI shims, D2 autopilot
sub-loop. All deferred per the plan's explicit slip-handling clause
because the cluster ship gate (≤0.03 clusters/node) and the eval
gate (+10pp precision@5) both require real brain data unavailable
at ship time.

Test surface in v0.34.0.0 (73 hermetic pass across 6 new files):
  - test/operation-context-sourceid-required.test.ts (6 cases)
  - test/code-intel/scope-walker-resolution.test.ts (10 cases)
  - test/code-intel/edge-densification.test.ts (13 cases)
  - test/code-intel/traversal-cache.test.ts (13 cases)
  - test/code-intel/recursive-walk.test.ts (11 cases)
  - test/code-intel/eval-capture-graph.test.ts (20 cases)

Migration v56 (code_traversal_cache_v0_34) verified applying clean
on PGLite via the test suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.34 D7): snapshotIndexes helper for cross-engine index parity

Extends test/helpers/schema-diff.ts with snapshotIndexes() +
diffIndexSnapshots() + isCleanIndexDiff() + formatIndexDiffForFailure().

Why this matters: the existing snapshotSchema() captures
information_schema.columns only, so a missing INDEX (not column)
between Postgres and PGLite silently passes the schema-drift test
while the symbol resolver degrades from index-only-scan to Cartesian
on 96K-chunk brains. The v0.34 D7 finding from the eng review called
this out specifically for the W4-5 hot-path indexes
(code_edges_symbol_unresolved_idx partial composite +
content_chunks_symbol_lookup_idx composite).

Implementation: queries pg_index + pg_class via pg_catalog views
(supported by both Postgres and PGLite). Captures index name, owning
table, full pg_get_indexdef() shape, uniqueness, partial-predicate.
The diff compares definitions after normalizing whitespace +
lowercasing — engine-specific formatting differences are filtered out
so only real shape drift surfaces.
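
A sketch of the normalize-then-compare step on already-captured snapshots. The types and function names are hypothetical and are not the helper's actual exports; uniqueness and partial-predicate comparison are left out for brevity.

```typescript
interface IndexInfoSketch {
  name: string;
  table: string;
  definition: string; // e.g. the pg_get_indexdef() text
}

interface IndexDiffSketch {
  onlyInPostgres: string[];
  onlyInPglite: string[];
  definitionMismatch: string[];
}

// Collapse whitespace and lowercase so formatting differences do not count as drift.
function normalizeDefinition(def: string): string {
  return def.replace(/\s+/g, " ").trim().toLowerCase();
}

function diffIndexesSketch(pg: IndexInfoSketch[], pglite: IndexInfoSketch[]): IndexDiffSketch {
  const pgByName = new Map<string, IndexInfoSketch>();
  for (const i of pg) pgByName.set(i.name, i);
  const liteByName = new Map<string, IndexInfoSketch>();
  for (const i of pglite) liteByName.set(i.name, i);

  const diff: IndexDiffSketch = { onlyInPostgres: [], onlyInPglite: [], definitionMismatch: [] };
  for (const i of pg) {
    const other = liteByName.get(i.name);
    if (other === undefined) {
      diff.onlyInPostgres.push(i.name);
    } else if (normalizeDefinition(i.definition) !== normalizeDefinition(other.definition)) {
      diff.definitionMismatch.push(i.name);
    }
  }
  for (const i of pglite) {
    if (!pgByName.has(i.name)) diff.onlyInPglite.push(i.name);
  }
  return diff;
}
```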

Reused by future test/e2e/schema-drift.test.ts wiring (sibling test
that spins up real Postgres + PGLite, snapshots both, diffs).

test/helpers/schema-diff-indexes.test.ts: 7 hermetic cases on
synthetic snapshots — matching, pg-only, pglite-only, uniqueness
mismatch, partial-predicate mismatch, allowlist suppression, and the
formatter producing a readable failure message naming the missing
side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.34): update 4 pre-existing tests for new emit shapes + sourceId contract

Three test files updated to match the v0.34 contract changes:

- test/edge-extractor.test.ts: two assertions on `toSymbol` exact-match
  were brittle to the W1 receiver-type resolution. `this.go()` /
  `self.go()` now resolve to `Foo::go` instead of bare `go`. Tests
  accept either form for back-compat with brains still on pre-W1
  extracted edges.

- test/source-id-tx-regression.test.ts: the D16 "back-compat
  cross-source view preserved" test was asserting that ctx.sourceId
  undefined → cross-source view. v0.34 STEP 0 (D4) closes that path
  by design — it's the exact cross-source-bleed bug class STEP 0
  fixed. Test renamed + assertion updated to reflect: makeCtx() with
  no override now falls back to 'default' (per the dispatch + cli
  auto-fill), and cross-source visibility is an explicit caller
  decision, not an implicit consequence of ctx omission.

- test/chunker-timeout.test.ts: the GBRAIN_CHUNKER_TIMEOUT_MS=1
  fallback case asserted edges=[] under the calls-only extractor.
  W2's extractAllEdges emits imports/references from top-level
  statements even on a partial parse, so the timeout-fallback path
  can return non-empty edges. Assertion relaxed to "edges is an
  array" — the contract that matters is "returns cleanly without
  hanging," not the edges-array shape.

Full unit suite (parallel + serial): 6132 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(migrate): remove duplicate edges_backfilled_at migration at v58

CI surfaced a duplicate migration version in test/migrate.test.ts:371
("runMigrations sorts by version ascending" — uniq.size === versions.length).

Root cause: the second master merge (PR #934 v0.33.3.0 foundation, commit
3fc0ca5) brought in master's `edges_backfilled_at` migration alongside
the one already in my branch. Both are functionally identical (ALTER TABLE
content_chunks ADD COLUMN edges_backfilled_at + 3 indexes), and both were
renumbered to v58 (mine via the f25b674 merge that pushed past master's
v55 search-lite migrations; master's PR #934 originally claimed v55
which would have collided). Auto-merge kept both, named `_v0_33_2` and
`_v0_33_3`. Tests caught it.
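
The uniqueness check that caught this can be sketched as a pure function over a migration list. The type, function name, and sample entries are hypothetical; the real test compares `uniq.size` against `versions.length`.

```typescript
interface MigrationSketch {
  version: number;
  name: string;
}

// Return every version number that appears more than once, ascending.
function duplicateVersions(migrations: MigrationSketch[]): number[] {
  const seen = new Set<number>();
  const dupes = new Set<number>();
  for (const m of migrations) {
    if (seen.has(m.version)) dupes.add(m.version);
    seen.add(m.version);
  }
  return Array.from(dupes).sort((a, b) => a - b);
}
```

Reporting the offending versions rather than a bare size mismatch makes a failure like this one quicker to trace.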

Fix: deleted the `_v0_33_3` duplicate. The remaining `_v0_33_2` entry at
v58 is unchanged; SQL idempotency (ALTER TABLE ... ADD COLUMN IF NOT EXISTS +
CREATE INDEX IF NOT EXISTS) means brains that already applied either label
pass through cleanly.

Verification:
- 55 migrations total, all unique versions
- `bun run typecheck` clean
- `bun test test/migrate.test.ts`: 109 pass / 0 fail / 321 expect calls

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request May 29, 2026
* upstream/master:
  v0.35.1.0: embedder shootout prereqs (pricing + gateway export + --resume-from) (garrytan#1055)
  v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker (garrytan#1008)
  v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) (garrytan#991)
  v0.34.3.0 fix: supervisor treats code=0 watchdog exits as crashes (garrytan#1003)
  v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs (garrytan#988)
  v0.34.1.0 fix(mcp): MCP fix wave — source-isolation P0 + PKCE DCR + federated_read + 3 more (garrytan#996)
  v0.34.0.0 feat: Cathedral III — recursive code intelligence + Leiden clusters + eval gate (garrytan#994)
  v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) (garrytan#934)
  v0.33.2.1 docs: fork-PR workflow for garrytan-agents (garrytan#992)
  fix(sync): raise maxBuffer to 100 MiB to prevent silent ENOBUFS crash (garrytan#982)
  v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting (garrytan#897)
  v0.33.1.1 fix: Voyage output_dimension + flexible-dim guard + OOM-cap rethrow (garrytan#962)