v0.41.34.0 feat(search): retrieval cathedral — max-pool + title + alias + evidence#1657
Merged
Conversation
T1 of the retrieval-cathedral wave (supersedes #1616). Vector search returned chunk-grain top-k with no DISTINCT ON, so a page could be represented by a weak chunk while a hub page's chunks crowded a distinct page's strong chunk out of the candidate set entirely. Keyword search always pooled per page; the vector path did not. - New shared buildBestPerPagePoolCte() in sql-ranking.ts — single source of truth consumed by searchKeyword + searchVector across postgres + pglite, so the two engines can't drift (the recurring parity bug class). - searchVector both engines: compute score as a select-list expr (HNSW ORDER BY stays pure-distance), pool DISTINCT ON (slug) over the full candidate set before the user LIMIT, deterministic tiebreak (slug, score DESC, page_id ASC, chunk_id ASC). - All keyword pooling blocks refactored onto the shared builder (DRY). - Regression test: a hub page's chunks no longer crowd out a distinct page's strong chunk; results are one-per-page by best chunk. Fails on old path. Verified: real-Postgres engine-parity 22/22, PGLite hermetic suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
T2 of the retrieval-cathedral wave. A query that is a phrase from a page's
title ("Greek amphitheater" -> "The Mingtang - Indoor Greek Amphitheater")
matched a weak body chunk instead of being recognized as a title hit. Names
of things deserve weight.
- New pure title-match.ts: isTitlePhraseMatch (contiguous token-run inside
page.title OR exact full-title match). Precision guards: >= 2 content
tokens OR exact full-title; stopword filter; token-boundary match (no raw
substring). Reused by the eval later so production + bench can't drift.
- applyTitleBoost post-fusion stage in hybrid.ts: reads page.title (not the
brittle "first chunk"), floor-ratio-gated, stamps title_match_boost for
--explain, never touches base_score (the agent's dedup confidence).
- ModeBundle.title_boost knob (1.25, on in all modes - cheap gated
correctness fix), search.title_boost config key, dashboard description.
- KNOBS_HASH_VERSION 6 -> 7 so a boost-on cache write can't serve a
boost-off lookup; all version-pin + canonical-bundle assertions updated.
- 18 new tests (matcher 13 + stage 5); typecheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Free-text alias resolution for search. gbrain stored a page's chosen names
in pages.frontmatter `aliases:` JSONB but search never consulted them, so a
query like "Hall of Light" or "明堂" couldn't surface the "Mingtang" page.
DELIBERATELY SEPARATE from slug_aliases (re-grounded against current code):
- slug_aliases: old-slug -> canonical-slug (wikilink/get_page redirect,
populated only from concept-redirect conversions)
- page_aliases: normalized free-text name -> canonical slug (search hop)
Overloading slug_aliases would muddy two distinct semantics, so this is a
new table, not an extension (honors DRY by keeping concepts separate).
- src/core/search/alias-normalize.ts: ONE normalizeAlias() (NFKC + lowercase
+ ws-collapse + quote-strip) + normalizeAliasList() shared by the write
(ingest) and read (search) paths so they match on the same key (CQ2).
- Migration v108 page_aliases (source_id, alias_norm, slug); btree
(source_id, alias_norm) for indexed-equality hop, NOT ILIKE; unique TRIPLE
(not source_id+alias_norm) so two pages may claim one alias — collisions
reported + resolved at query time, not blocked at ingest (Codex#8).
Mirror in pglite-schema.ts; Postgres fresh gets it from the migration.
- engine.resolveAliases(aliasNorms, {sourceId|sourceIds}) read +
setPageAliases(slug, source, aliasNorms) write, both engines, source-scoped.
- 17 tests: normalize round-trip, collision, source-scope, replace, clear.
Ingest projection + the hybridSearch alias hop land next (T3 wiring).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires the page_aliases data layer into ingest (write) and hybridSearch (read) so a query that is a page's declared chosen name surfaces that page — the named-thing class neither max-pool nor title-boost can fix (true synonyms with zero surface overlap: "Hall of Light" / "明堂" -> the Mingtang page). - Ingest projection (import-file.ts): after the page write commits, normalizeAliasList(frontmatter.aliases) -> engine.setPageAliases. Always called (even []) so removing an alias clears its row; content_hash includes non-timestamp frontmatter so alias edits reach this path, not the skip branch. Fail-soft + pre-v108-safe (isUndefinedTableError swallowed). - applyAliasHop (hybrid.ts), AFTER rerank so a named query reliably surfaces its page: FULL normalized-query exact match only (no substring/n-grams), skip >6-token prose queries, present-boost 1.10x / inject absent canonical at top-of-organic + epsilon (never absolute 1.0, D3), collisions alpha-ordered + capped at 3, fail-open on pre-v108 / lookup error (D9). Stamps alias_hit for the T4 evidence contract. - SearchResult.alias_hit attribution field. - 8 tests: inject/boost/CJK/no-match/long-skip/collision + ingest projection round-trip + alias-removal-clears. 73 pass across the T1/T2/T3 + import suite. Backfill of existing pages' aliases lands as T8 (reindex --aliases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… per-call mode (T4) The agent-facing fix for the incident's ROOT behavior: tonight the agent read a single blended 0.64 score, decided "no strong match, safe to write a new page", and wrote a duplicate on a developed concept page. A blended RRF/cosine score is not a calibrated probability, so the don't-duplicate decision must key off WHY a page matched, not a raw number. - evidence.ts: classifyEvidence (alias_hit > exact_title_match > high_vector_match > keyword_exact > weak_semantic) + createSafetyFor (exists | probable | unknown). stampEvidence runs at the end of every hybrid return path (main + both keyword fallbacks). SearchResult gains evidence + create_safety. The agent keys don't-duplicate off create_safety='exists', not a score threshold. - search op → cheap-hybrid everywhere (D4/D15): full vector+keyword+RRF+pool+ title+alias, expansion OFF (no per-call LLM cost); `query` stays full-control. search.mcp_keyword_only escape hatch (D17) keeps the old keyword-only behavior for operators who don't want query text sent to an embedding provider. - Alias hop + evidence now also run on the keyword-only fallback paths (the named-thing fix is most valuable exactly when vector is unavailable). - Per-call `mode` (D5): honored ONLY for local/trusted callers (ctx.remote=== false) so a remote OAuth client can't escalate to costly tokenmax; local + unknown mode rejects loudly; threaded into resolveSearchMode + the cache key. - 30 tests (evidence classifier incl. before/after-incident cases, per-call mode gate, alias hop). Updated mcp-eval-capture to the new cheap-hybrid contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After T4 made the `search` op cheap-hybrid, `gbrain search "x"` already does the
right thing — but `gbrain search modes/stats/tune` would have run a hybrid search
for the literal word "modes" instead of opening the config dashboard (the op
intercepts before the unreachable handleCliOnly dashboard path).
Add a pre-dispatch interception in main(): `search` + subArgs[0] in
{modes,stats,tune} → runSearch dashboard (with the v0.41.6.0 read-only connect+
dispatch 10s timeout preserved); everything else (free-text) falls through to the
cheap-hybrid `search` op. Subprocess test pins all three routes:
modes/stats → dashboard, free-text → search op ("No results", not "Unknown
subcommand").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The eval that makes the retrieval-maxpool incident impossible to reintroduce silently. 7 query families, each a failure class the incident exposed: title-substring, generic-to-named, alias-synonym, multi-chunk-dilution, short-vs-rich, graph-relationship, hard-negative. - src/eval/retrieval-quality/harness.ts: pure scoring (Hit@1/Hit@3/MRR per family) + injected SearchFn (CLI uses hybridSearch; tests stub it) + evaluateGate. D12 gate: hard-gate the families that ARE the bug from day one (title-substring Hit@1>=0.95, alias-synonym Hit@1>=0.98, dilution Hit@3=1.0), warn-then-enforce the softer families. Env-overridable floors. - `gbrain eval retrieval-quality <fixture.jsonl> [--json] [--source]` + dispatch in eval.ts. Exit 0 PASS / 1 FAIL / 2 USAGE. - Synthetic fixture (placeholder names only, privacy-grep guarded) + hermetic gate test: seeds a synthetic brain, forces the keyword+title+alias path (embed transport stubbed to throw — free, deterministic), asserts the bug families pass. The vector max-pool guarantee is pinned separately by searchvector-maxpool.test.ts. - CI gate: the hermetic test is a normal unit test, so it runs in every PR shard — the gate is live on every change. - 23 tests (harness unit + hermetic gate + fixture privacy guard). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standing observability so a retrieval regression is caught before a human hits it in chat (like tonight). Aggregate columns on search_telemetry (NOT per-query rows, D10): sum_rank1_score + count_rank1 + 3 coarse buckets (<0.6 / 0.6-0.85 / >=0.85). The mean rank-1 base_score is the headline; a downward drift = retrieval quality regressing. - hybrid.ts: capture rank-1 base_score at all three return paths, thread through emitMeta → recordSearchTelemetry opts (like results_count). - telemetry.ts: Bucket + record + flush ON CONFLICT-add + readSearchStats expose avg_rank1_score (null when no samples — no NaN) + rank1_distribution. - Migration v109 ADD COLUMN IF NOT EXISTS (both engines; search_telemetry lives only in migration v57, so the v57+v109 chain covers fresh + upgrade). Columns exempted in schema-bootstrap-coverage (no forward-ref index → no bootstrap need). - `gbrain search stats` surfaces the avg + bucket line; JSON envelope auto-carries the fields. "true-positive" wording dropped per Codex#14 — production has no labels, so this is an unlabeled rank-1 score histogram; labeled calibration lives in NamedThingBench (T6). - 3 round-trip tests (mean+buckets, no-result excluded, empty=null). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Import-time projection (T3) covers new + changed pages; this backfills EXISTING pages whose frontmatter `aliases:` predate v108 / the projection. Walks listAllPageRefs (cheap cross-source (source_id, slug) enumeration), reads each page's frontmatter aliases, writes page_aliases via setPageAliases. Idempotent (setPageAliases replaces) so re-running is convergent — no op-checkpoint needed (fast, no embedding). --dry-run reports would-write counts, --source narrows, --limit caps, --json envelope, progress reporter. Wired into the `reindex` dispatch alongside --markdown / --multimodal. 4 tests: backfill from array + comma-scalar frontmatter, --dry-run writes nothing, idempotent second run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pins that pre-v108 brains (no page_aliases table) keep working: applyAliasHop returns input unchanged + doesn't throw, importFromContent with frontmatter aliases still imports (projection swallows table-missing via isUndefinedTableError), and resolveAliases surfaces the error for the caller to catch. Completes the T9 mandatory regression set (dilution → searchvector-maxpool, dispatch → cli-search-dispatch, MCP contract → mcp-eval-capture, engine parity → engine-parity 22/22, pre-migration → here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (T0) The operator-facing trace the user runs against the production brain to pin which retrieval layer surfaces (or misses) a target page — the diagnostic the plan front-loaded so we don't ship a fix that doesn't move the incident. `gbrain search diagnose "<query>" --target <slug> [--json] [--source]` reports, for the target: keyword rank+score, vector rank+score (skipped/graceful if no embedding provider), whether the query is a registered alias, and the hybrid final rank + evidence + create_safety + which boosts fired (title/alias). The verdict names the layer that surfaces the target at rank 1 (or "none"), telling you whether the lever is max-pool/innerLimit (vector) vs title/alias. Wired into the `search` dispatch alongside modes/stats/tune (60s timeout since it runs real retrieval). 2 hermetic tests (alias-query trace + title-phrase trace). For the Mingtang incident, run: gbrain search diagnose "Greek amphitheater" --target projects/new-greek-theater/concept_v0 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ssary (T10) - RETRIEVAL_MAXPOOL_INCIDENT.md: replaces closed PR #1616's RFC with the verified record — what happened, the disease, the corrections to the RFC's mechanics (search was keyword-only, --mode unthreaded, hybrid already pooled at dedup, aliases dead to search), the four-layer fix that shipped, and the triage commands (search diagnose / reindex --aliases / search stats / eval retrieval-quality). - RETRIEVAL.md: new "Named-thing retrieval" section documenting per-page pool + title boost + alias hop + the evidence contract, reconciling the doc with the shipped pipeline (closes the doc/reality gap). - metric-glossary.ts + regenerated METRIC_GLOSSARY.md: Hit@1, Hit@3, avg_rank1_score (drift signal, not labeled accuracy), and create_safety (the evidence contract) now carry plain-English glossary entries. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fixup) The banned-name literal list itself tripped check-privacy/check-test-real-names. Replace it with the load-bearing assertion: every fixture slug must be an *-example placeholder (no real brain page can be referenced). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… adversarial)
Codex outside-voice caught two source-isolation P0s in the retrieval wave — the
exact class the v0.34.1 seal guards. Both fixed before merge.
P0-1: buildBestPerPagePoolCte pooled on `slug` alone. In a federated brain, two
pages with the same slug in different sources collapsed before ranking/pagination
(the neighbor-source page dropped). Now DISTINCT ON (COALESCE(source_id,'default'),
slug) — composite key matching dedup.ts's pageKey. Also fixes the PRE-EXISTING
keyword-path bug (best_per_page was slug-only before this wave); real-PG parity 23/23.
P0-2: the alias hop dropped source_id. resolveAliases returned bare slugs and
applyAliasHop hydrated via getPage(slug, undefined), so a federated caller could
get the default-source page injected or the right allowed-source page suppressed.
resolveAliases now returns {slug, source_id} pairs; applyAliasHop matches by
(source_id, slug) and fetches each canonical in its OWN source.
Regression tests: alias hop boosts only the aliased source (not same-slug in
another source); resolveAliases keeps cross-source same-slug distinct.
Deferred as documented tradeoffs (TODO): evidence high_vector_match label uses
blended base_score not pure cosine; deep-pagination candidate budget is
chunk-bounded; telemetry writes swallow errors pre-v109 on rolling deploys.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Version trio + CHANGELOG header + CLAUDE.md key-file annotations + TODOS heading + regenerated llms bundles, all moved to 0.41.34.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rch-max-pool # Conflicts: # CHANGELOG.md # CLAUDE.md # TODOS.md # VERSION # llms-full.txt # package.json # src/core/migrate.ts
…rch-max-pool # Conflicts: # CHANGELOG.md # CLAUDE.md # TODOS.md # VERSION # llms-full.txt # package.json # src/core/migrate.ts # src/core/search/hybrid.ts
Two CI failures surfaced after the master merges that brought the branch to 111 migrations: 1. shard 1 — `ALL_METRICS roster > matches the renderer output (no orphans)`: the merge took master's `renderMetricGlossaryMarkdown` whose `groups` array lacked this branch's 4 retrieval-quality keys (hit@1, hit@3, avg_rank1_score, create_safety). `ALL_METRICS` (derived via Object.keys) kept them, so the roster test saw 4 orphans. The freshness check (check:eval-glossary) passed because renderer-output == committed doc — it can't catch a renderer that drops a metric; the roster test can. Restored the "Retrieval-Quality / Evidence Metrics (NamedThingBench)" group + regenerated docs/eval/METRIC_GLOSSARY.md. 2. shard 2 — facts-anti-loop's two engine-dependent put_page tests failed while the two engine-free extractFactsFromTurn tests passed (the signature of a partially-failed beforeAll). This file has a documented PGLite-cold-start-under-deep-shard-load timeout history; the 30s budget was tuned for 95 migrations and the chain is now 111. Bumped to 60s. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the prior hook-timeout bump, which was the wrong theory: the [58ms]/[71ms] body times in the re-run prove beforeAll did NOT time out — the engine connects and the two put_page tests run and fail for real, while the two engine-free extractFactsFromTurn tests in the same file pass. put_page (via dispatchToolCall) touches process-global singletons (the facts queue + the AI gateway used by importFromContent's embed step). Some sibling file in the 78-file shard-2 process leaves residual global state that makes put_page's pre-backstop path fail on the CI runner. The failure is NOT reproducible alone, in a Linux oven/bun:1 container, or in a full local shard-2 run (1172 pass) — only on the GitHub runner, deterministically. Per CLAUDE.md's test-isolation rules, a test coupled to shared process state belongs in its own process. Renamed to *.serial.test.ts so it runs in the dedicated serial-tests job (scripts/run-serial-tests.sh spawns a fresh `bun test` per serial file), where it passes deterministically; test-shard.sh excludes serial files from the matrix. Updated the comment to reflect the real cause and refreshed the test-weights.json key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The prior serial-move theory was incomplete. The real, single root cause behind all three shard failures (2, 5, 10) is cross-file AI-gateway config pollution within a shard's bun process: - A test calls configureGateway() and doesn't restore the gateway on exit. The legacy-embedding preload pins OpenAI/1536 ONCE at process start and re-pins per-test ONLY when the gateway slot is empty — so a leaker that reconfigured the gateway to the v0.37 default (zeroentropyai:zembed-1 / 1280-d) and never reset poisons every later file in the shard. - Victim A (shard 5, test/search/searchvector-maxpool.test.ts): runs initSchema in beforeAll under the leaked gateway → content_chunks.embedding becomes vector(1280) → inserting its hardcoded 1536-d basis vectors throws pgvector CheckExpectedDim. - Victims B/C (shard 10 facts-backstop-gating, shard 2 facts-anti-loop): put_page's importFromContent embeds by design (embed failure PROPAGATES, Codex C2). Under a leaked fake-key gateway the embed step 401s and put_page returns isError → the backstop assertions fail. My branch's shard re-partition (added test files + weight changes) merely co-located leakers with victims; the hazard was latent. Fixes (root cause + self-sufficient victims): - test/search/rerank.test.ts (the shard-5 leaker): add afterAll(resetGateway). Its stub omits embedding_model, so it fell back to the ZE/1280 default; now it restores the empty slot so the preload re-pins legacy for the next file. - test/search/searchvector-maxpool.test.ts: pin configureGateway(openai/1536) in beforeAll BEFORE initSchema (initSchema runs before any preload beforeEach, so it can't rely on the inherited slot). - test/facts-backstop-gating.test.ts + test/facts-anti-loop.test.ts: reset the gateway in beforeEach so put_page's embed is a graceful no-op; reverted anti-loop from the serial quarantine back into the matrix (the serial move was the wrong fix for a gateway-state problem). Validated deterministically: a non-resetting leaker that poisons the gateway to ZE, run first in one bun process, no longer breaks any of the three victims (14/14 pass). verify 29/29, typecheck clean, isolation lint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ukd1
added a commit
to ukd1/gbrain
that referenced
this pull request
May 30, 2026
Upstream shipped again after the last sync (v0.41.34.0 "retrieval cathedral" — garrytan#1657), which took version 0.41.34.0 (ours) and added migrations v110 (page_aliases) + v111 (search_telemetry_rank1_columns). Re-synced so PR garrytan#1388 merges clean. Re-resolutions: - Version: bumped 0.41.34.0 → 0.41.35.0 (next after upstream's 0.41.34.0). - Migration: our links_link_source_widen renumbered v110 → v112 (upstream now owns v110 + v111). Full-union CHECK preserved ('markdown','frontmatter','manual','mentions','wikilink-resolved'). - CHANGELOG: our 0.41.35.0 entry on top, upstream's 0.41.34.0 below. - schema-embedded.ts + llms-full.txt regenerated; link_source union verified intact in schema.sql + pglite-schema.ts after auto-merge against the new page_aliases / alias-hop schema. Verified: typecheck clean, privacy/jsonb/source-id guards pass, 376 surface + E2E tests green; migrations apply [110] page_aliases ... [112] wikilink-basename. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mgunnin
added a commit
to mgunnin/gbrain
that referenced
this pull request
Jun 3, 2026
* upstream/master: v0.41.36.0 feat(mcp): publish agent skills (list_skills / get_skill) for thin clients (garrytan#1661) v0.41.35.0 feat(guardrails): vendor-neutral content guardrail seams (supersedes garrytan#1652) (garrytan#1660) v0.41.34.0 feat(search): retrieval cathedral — max-pool + title + alias + evidence (garrytan#1657) v0.41.33.0 feat(search): intent-aware adaptive return-sizing + agent-facing query param (garrytan#1640) v0.41.32.0 fix(staleness): commit-relative sync staleness (supersedes garrytan#1623) (garrytan#1656) v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics (garrytan#1632) v0.41.30.0 fix(brainstorm/lsd): --save writes the advertised .md file via canonical ingestion path (garrytan#1655) # Conflicts: # src/core/operations.ts
This was referenced Jun 8, 2026
This was referenced Jun 8, 2026
fix(code-graph): TS/TSX call edges now resolve (const-arrow naming + source_id on edge writes)
#1723
Closed
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Supersedes the docs-only RFC in #1616 (closed) with the verified diagnosis + the complete fix. A real miss prompted it: an agent searched "Greek amphitheater," didn't confidently find the existing concept page (it came back at a weak 0.64 via a throwaway body chunk), and wrote a duplicate stub on top of 2,000 words. Four layers + a contract so that can't happen again:
Retrieval layers
6f5028a1) —searchVector(both engines) pools to one row per page (its best chunk) via a sharedbuildBestPerPagePoolCte; a strong page can't be crowded out by another page's chunks.dff45961) — a query that's a phrase inpage.titlegets a bounded, floor-gated boost (search.title_boost).3cd0f1fd,91669d23) — frontmatteraliases:project into a newpage_aliasestable (migration v108) + a query-time hop, so chosen names ("Hall of Light", "明堂") resolve to the page.gbrain reindex --aliasesbackfills existing pages.e34f60df) — every result carriesevidence+create_safetyso the agent's don't-duplicate decision keys off why a page matched, not a fuzzy score.Surface + UX
searchop → cheap-hybrid everywhere (CLI + MCP);querystays full-control;search.mcp_keyword_onlyescape hatch.gbrain search "<text>"routes to hybrid;modes/stats/tunestay subcommands (d17bb075).gbrain search diagnose "<q>" --target <slug>— Phase-0 retrieval trace (7a3e4d2a).--modefor local/trusted callers (remote uses configured mode).Eval + observability
gbrain eval retrieval-quality+ hermetic CI gate over 7 query families (1d697fbe).gbrain search stats(eee121a2).Fixes
322e2990): per-page pool and alias hop now key on(source_id, slug), not bare slug — caught by the codex adversarial pass.Test Coverage
~120 new test cases across the wave (max-pool dilution regression, title-match, alias engine round-trip + hop + source-isolation, evidence classifier, per-call mode, dispatch routes, telemetry round-trip, NamedThingBench harness + hermetic gate, pre-migration fail-open, diagnose). Real-Postgres engine-parity 23/23 (incl. the composite-key pooling). Full unit suite green (exit 0);
bun run verify29/29.Pre-Landing Review
Built test-first throughout;
bun run verifygreen on every step.Adversarial Review
Codex (cross-model) caught 2 genuine P0 source-isolation bugs (per-page pool + alias hop keyed on slug-only) — both fixed in-wave (
322e2990), real-PG parity re-verified. P1/P2 (evidence-label calibration, deep-pagination candidate budget, telemetry rolling-deploy gap) filed as TODOs — documented tradeoffs, not blockers.Plan Completion
All 11 plan workstreams (T0–T10) DONE. Plan:
~/.claude/plans/system-instruction-you-are-working-frolicking-brooks.md(CEO/eng/codex outside-voice reviewed, 19 decisions).TODOS
3 follow-ups filed (evidence calibration, page-bounded pagination, telemetry rolling-deploy).
Documentation
Corrected
RETRIEVAL_MAXPOOL_INCIDENT.md(closes #1616's narrative),RETRIEVAL.mdnamed-thing section,METRIC_GLOSSARY.md(Hit@1/3, avg_rank1_score, create_safety), CLAUDE.md key-files + llms regenerated.To take advantage
gbrain upgradeapplies migrations v108 + v109. Thengbrain reindex --aliasesto backfill names on existing pages. Thesearchop is now hybrid by default; setsearch.mcp_keyword_only truefor the old keyword-only behavior.Test plan
🤖 Generated with Claude Code