v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification #1300

Merged
garrytan merged 26 commits into master from garrytan/v0.40.4.0-selective-graph-signals
May 23, 2026

@garrytan

Owner

Summary

GBrain's search stops treating its link graph as wallpaper. Three small, additive ranking signals exploit edges the brain already has:

  • Adjacency hub (×1.05): a top-K page linked to by ≥2 other top-K pages
  • Cross-team hub (×1.10): a top-K page linked from ≥2 OTHER sources (federated brains)
  • Session diversification (×0.95 demote): chat-session chunks that share a date or chat/ segment — keep the highest, demote the rest

Plus the wave grew through review into three cathedral expansions:

  • Per-stage score attribution in EVERY boost (backlink/salience/recency/exact-match/graph/reranker) feeding a new gbrain search --explain formatter
  • Audit-writer unification: 5 hand-rolled JSONL audit modules collapsed onto createAuditWriter
  • Eval gates with paired bootstrap: longmemeval-mini A/B + recall@5 quality + Jaccard@5 + top-1 stability + 5pt hard floor

Default on for balanced + tokenmax modes. KNOBS_HASH bumps 3→4 so cache rows segregate cleanly across the upgrade. Off-by-default for conservative.

Test Coverage

122 new test cases across 9 new + 4 extended test files. Coverage audit: 92% (38 logical branches, 35 fully tested, 3 minor gaps captured as TODOs). Coverage gate: PASS (above 80% target).

Test count: 1830 → 1839+ (+9 new test files in this wave).

Pre-Landing Review

7 findings — all resolved before push:

  • 1 CRITICAL (codex caught the wire): postFusionOpts never set graphSignalsEnabled → entire feature was dead code in production. Fixed at commit 6f01fcb with new wire-integration test that grep-pins the literal.
  • 2 informational AUTO-FIXED: audit stderr message qualifier drift (3 modules restored byte-identical), deleted_at IS NULL defense-in-depth on getAdjacencyBoosts (both engines).
  • 4 informational ship-as-is (judgment calls): documented decisions, accepted with rationale.

Adversarial Review (Codex + Claude, always-on)

HIGH findings — all fixed at commit 47ded98:

  • H1 (Codex): Eval gate was a no-op. Test passed graph_signals: graphSignalsOn via as any cast but SearchOpts had no field. Both branches resolved to mode default → gate could pass while detecting nothing. Fix: added typed SearchOpts.graph_signals, threaded into both hybridSearch and hybridSearchCached perCall opts, dropped as any.
  • H2 (Codex): Session diversification fired on entity directories. sessionPrefix() used "any shared parent" → people/alice + people/bob got grouped and bob got demoted on every common entity-search query. Fix: narrowed to fire only on chat-session-shaped slugs (contains chat/session/sessions marker OR YYYY-MM-DD date segment). Entity dirs return null → diversification skips.
  • F1 (Claude subagent): Case-sensitivity drift across 3 sites — parser case-insensitive, doctor + search-stats case-sensitive. User sets TRUE and observability surfaces silently disagree with production. Fix: case-insensitive trim parity at all 3 sites.

11 LOW findings → captured as v0.41+ TODOs (NaN guard, audit windowing, ANSI escape, source-scope JSDoc-only contract, score compounding on repeat invocation, etc.).

Eval Results

test/e2e/graph-signals-eval.test.ts ships with 4 gates:

  1. paired bootstrap (10,000 resamples) recall@5 with p<0.05 fail-in-wrong-direction gate
  2. Jaccard@5 ≥ 0.5 (change magnitude)
  3. top-1 stability ≥ 0.7
  4. Hard absolute 5pt recall@5 drop floor (catastrophic catch)

All gates PASS on the bundled longmemeval-mini fixture.

pairedBootstrapPValue exported as a pure function with 5 dedicated tests for future calibration waves.

Greptile Review

PR doesn't exist yet during this run — Greptile comments will surface after creation.

Plan Completion

9 DONE, 2 CHANGED (T4 audit collocated in graph-signals.ts, T9 search-stats fire-rate metrics deferred to T-todo-2 per inline comment), 0 NOT DONE, 0 UNVERIFIABLE.

CLAUDE.md Key Files annotations added in docs commit (69aef24).

Documentation

README.md, CLAUDE.md, llms-full.txt all updated by /document-release subagent. Committed at 69aef24.

CHANGELOG.md top entry is the canonical v0.40.4.0 ELI10-lead description.

TODOS

5 graph-signals follow-ups + 11 LOW adversarial findings + 1 pre-existing-master-flake all captured under v0.40.4 adversarial review LOW findings — captured for v0.41+.

Test plan

  • bun run verify — privacy + checks + typecheck (PASS)
  • bun test parallel — 8469/8470 pass; 1 fail is pre-existing master flake (header-transport shard-ordering, confirmed via stash)
  • bun test test/search test/e2e/graph-signals-* test/doctor.test.ts test/audit/audit-writer.test.ts (122/122 wave tests PASS)
  • (TODO post-merge) Validate gbrain search "..." --explain on Garry's brain
  • (TODO post-merge) Confirm gbrain doctor reports graph_signals_coverage ok at >=30%

🤖 Generated with Claude Code

garrytan and others added 25 commits May 22, 2026 07:56
Extract createAuditWriter() helper. Five hand-rolled JSONL audit
modules (rerank-audit, shell-audit, supervisor-audit, audit-slug-
fallback, phantom-audit) duplicated the same ISO-week filename math,
best-effort write loop, and read-current-plus-previous-week loop.
T2 refactors all 5 onto this primitive.

Behavior preservation: filename format, JSONL line shape, mkdir
recursive, appendFileSync utf8, stderr-on-failure all byte-identical
to the existing modules so their tests pass unchanged.

resolveAuditDir() moves here from shell-audit.ts; shell-audit.ts
will re-export for back-compat (T2). Honors GBRAIN_AUDIT_DIR with
whitespace-trim, falls back to ~/.gbrain/audit/.

Test coverage: 22 cases covering ISO-week math + year-boundary edges
(2027-01-01 → 2026-W53), env override, mkdir-recursive, fail-open
stderr-warn shape, cross-week readback, corrupt-row skip, non-finite-
ts skip, round-trip with nested fields, computeFilename + resolveDir
accessors.
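
The ISO-week filename math can be sketched as below. This is a minimal illustration, not the shipped `createAuditWriter` code: the helper names (`isoWeekParts`, `auditFilename`) and the exact `<feature>-<year>-W<week>.jsonl` layout are assumptions chosen to match the `2027-01-01 → 2026-W53` edge case named above.

```typescript
// Sketch only: helper names and filename layout are assumptions.
function isoWeekParts(date: Date): { year: number; week: number } {
  // Work on a UTC copy so the caller's Date is not mutated.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // ISO weekday: Monday = 1 ... Sunday = 7.
  const weekday = d.getUTCDay() || 7;
  // Move to the Thursday of the same ISO week; its calendar year is the ISO year.
  d.setUTCDate(d.getUTCDate() + 4 - weekday);
  const year = d.getUTCFullYear();
  const yearStart = Date.UTC(year, 0, 1);
  const dayOfYear = (d.getTime() - yearStart) / 86400000 + 1;
  return { year, week: Math.ceil(dayOfYear / 7) };
}

function auditFilename(featureName: string, date: Date): string {
  const { year, week } = isoWeekParts(date);
  const paddedWeek = week < 10 ? `0${week}` : String(week);
  return `${featureName}-${year}-W${paddedWeek}.jsonl`;
}

// 2027-01-01 is a Friday, so it still belongs to ISO week 53 of 2026.
console.log(auditFilename("rerank-failures", new Date(Date.UTC(2027, 0, 1))));
// prints "rerank-failures-2026-W53.jsonl"
```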

Plan ref: D5=B audit unification cathedral expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the duplicated ISO-week filename math + best-effort write loop
+ read-current-plus-previous-week loop in:
  - src/core/rerank-audit.ts (rerank-failures-*.jsonl)
  - src/core/audit-slug-fallback.ts (slug-fallback-*.jsonl)
  - src/core/minions/handlers/shell-audit.ts (shell-jobs-*.jsonl)
  - src/core/minions/handlers/supervisor-audit.ts (supervisor-*.jsonl)
  - src/core/facts/phantom-audit.ts (phantoms-*.jsonl)

All five now delegate file I/O to createAuditWriter from T1. Public
API preserved bit-for-bit:
  - logRerankFailure, readRecentRerankFailures, computeRerankAuditFilename
  - logSlugFallback, readRecentSlugFallbacks, computeSlugFallbackAuditFilename
  - logShellSubmission, computeAuditFilename, resolveAuditDir
  - writeSupervisorEvent, readSupervisorEvents, computeSupervisorAuditFilename
    plus isCrashExit, summarizeCrashes, CrashSummary (domain-specific
    helpers stay in supervisor-audit.ts; only file I/O moves)
  - logPhantomEvent, readRecentPhantomEvents, computePhantomAuditFilename

Domain-specific behavior preserved:
  - audit-slug-fallback emits per-call stderr (D7 dual logging) in the
    caller; the shared writer is failure-only stderr
  - rerank-audit truncates error_summary to 200 chars before write
  - phantom-audit spreads optional fields conditionally (skip undefined)
  - supervisor-audit keeps single-file readback (no cross-week walk)
    to preserve pre-v0.40.4 doctor assertions

resolveAuditDir lives in src/core/audit/audit-writer.ts; shell-audit.ts
re-exports it so existing imports keep working (every other audit
module + gbrain-home-isolation.test.ts + minions.test.ts +
minions-shell.test.ts pull resolveAuditDir from shell-audit.ts).

Operator-visible drift: rerank-audit stderr line drops the
'rerank-failure audit' qualifier. It was '[gbrain] rerank-failure audit
write failed (...)' and is now '[gbrain] write failed (...); search
continues'. Stderr is for human debugging, not machine parsing; the
audit filename still reveals the qualifier in `tail -f audit/*`.

Test coverage: 128/128 audit-touching tests pass unchanged.

Plan ref: D5=B audit unification cathedral expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add BrainEngine.getAdjacencyBoosts(pageIds) returning Map<page_id,
AdjacencyRow{hits, cross_source_hits}>. Returns ALL pages with
hits >= 1 (callers apply their own threshold).

Cross-source semantic (D15=A): cross_source_hits EXCLUDES the target
page's own source. A page in source A linked from 2 pages in source A
reports cross_source_hits = 0. Linked from 1 in source B + 1 in
source C reports 2.
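
To make the D15=A semantic concrete, here is an in-memory sketch of the same counting rule. It is not the engine SQL: `adjacencyBoostsInMemory`, `PageRef`, and `Link` are hypothetical names, and it assumes `cross_source_hits` counts distinct linking sources other than the target's own, which is the reading consistent with the examples in this commit.

```typescript
// Sketch only: an in-memory stand-in for the SQL, under the assumptions above.
interface PageRef { id: number; sourceId: string }
interface Link { fromPageId: number; toPageId: number }
interface AdjacencyRow { hits: number; cross_source_hits: number }

function adjacencyBoostsInMemory(pages: PageRef[], links: Link[]): Map<number, AdjacencyRow> {
  const sourceOf = new Map<number, string>();
  for (const p of pages) sourceOf.set(p.id, p.sourceId);

  // target page id -> distinct in-set pages linking to it
  const linkers = new Map<number, Set<number>>();
  for (const link of links) {
    // In-set restriction: both ends must be in the candidate set.
    if (!sourceOf.has(link.fromPageId) || !sourceOf.has(link.toPageId)) continue;
    let fromIds = linkers.get(link.toPageId);
    if (!fromIds) {
      fromIds = new Set<number>();
      linkers.set(link.toPageId, fromIds);
    }
    fromIds.add(link.fromPageId);
  }

  // Only pages with at least one inbound link appear (the "hits >= 1" contract).
  const out = new Map<number, AdjacencyRow>();
  linkers.forEach((fromIds, targetId) => {
    const targetSource = sourceOf.get(targetId);
    const otherSources = new Set<string>();
    fromIds.forEach((fromId) => {
      const s = sourceOf.get(fromId);
      // Exclude the target page's own source.
      if (s !== undefined && s !== targetSource) otherSources.add(s);
    });
    out.set(targetId, { hits: fromIds.size, cross_source_hits: otherSources.size });
  });
  return out;
}
```

Callers would still apply their own threshold (2 in `applyGraphSignals`) to the returned rows.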

Source-scope contract: pageIds MUST already be source-scoped by the
caller. Method does NOT filter by source_id. The in-set restriction
makes cross-source leakage impossible by construction. JSDoc spells
this out; same trust posture as cosineReScore's chunk_id handling.

COALESCE(p.source_id, 'default') on both target and from-page sides
for defense-in-depth even though pages.source_id is NOT NULL today.

JSDoc/SQL contract alignment (codex #2): HAVING >= 1 matches the
"returns ALL pages with hits >= 1" contract; threshold of 2 is the
caller's call in applyGraphSignals.

Known limitation (codex #15): cross_source_hits cannot distinguish
"genuinely linked from another team" from "mirrored imports from
another source." T-todo-4 captures the v0.41+ refinement.

SearchResult type extension (D4=A flat fields, D12=A attribution):
  - graph_adjacency_hits, graph_cross_source_hits,
    graph_session_demoted, graph_session_prefix
  - base_score, backlink_boost, salience_boost, recency_boost,
    exact_match_boost, graph_adjacency_boost, graph_cross_source_boost,
    session_demote_factor, reranker_delta
All optional; T4-T6 populate them.

Test coverage: 7/7 hermetic PGLite cases. Empty input, singleton,
same-source hub, cross-source attribution including the
"linked-only-from-other-source" case (widget in source b, linked
from alice+bob in source a → cross_source_hits=1), JSDoc HAVING>=1
contract. Postgres parity asserted by SQL-shape identity (will get a
mirror Postgres E2E in T10's eval gate work via DATABASE_URL when
set; PGLite hermetic case shipped now).

NULL source_id COALESCE branch noted as untestable in current PGLite
schema (pages.source_id is NOT NULL); kept as defense-in-depth.

Plan ref: T3 in v0.40.4.0 wave plan; D1=A, D3=A, D15=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New file src/core/search/graph-signals.ts. Three signals:

  1. Adjacency-within-top-K (×1.05): hits >= 2 inbound from in-set.
  2. Cross-source adjacency (×1.10, stacks): cross_source_hits >= 2.
     Dormant on single-source brains.
  3. Session diversification (×0.95): if multiple top-K share a slug
     prefix, keep highest scoring, DEMOTE the rest. NOT amplify —
     codex caught the original framing was backwards (amplification
     of redundancy makes the cited "weak chunks compete for budget"
     problem worse, not better).

Conservative magnitudes (D14=B): 1.05/1.10/0.95. Score-distribution
probe (onScoreDistribution) collects min/p25/p50/p75/p95/max +
reorder_band_width to feed T-todo-2 magnitude calibration wave.

Slot: 4th stage inside runPostFusionStages (hybrid.ts:248), AFTER
backlink/salience/recency, pre-dedup. Inherits the v0.35.6.0
floor-ratio gate from computeFloorThreshold — this is the structural
protection that prevents a low-cosine hub from outranking a strong
non-hub (codex T2 / D1=A).
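
A rough sketch of how the two hub multipliers and the floor gate interact, assuming the gate is a plain `score >= floorThreshold` check per result. `boostHubs` and the `Scored` shape are hypothetical; session diversification and the audit path are left out.

```typescript
// Sketch only: function name, types, and gate placement are assumptions.
interface Scored {
  page_id: number;
  score: number;
  graph_adjacency_boost?: number;
  graph_cross_source_boost?: number;
}
interface AdjacencyRow { hits: number; cross_source_hits: number }

const ADJACENCY_BOOST = 1.05;
const CROSS_SOURCE_BOOST = 1.10;
const MIN_HITS = 2;

function boostHubs(
  results: Scored[],
  adjacency: Map<number, AdjacencyRow>,
  floorThreshold: number,
): void {
  for (const r of results) {
    // Below-floor results are never boosted. NaN >= x is false, so NaN is skipped too.
    if (!(r.score >= floorThreshold)) continue;
    const row = adjacency.get(r.page_id);
    if (!row) continue;
    if (row.hits >= MIN_HITS) {
      r.score *= ADJACENCY_BOOST;
      r.graph_adjacency_boost = ADJACENCY_BOOST;
    }
    // Stacks on top of the adjacency boost when both fire.
    if (row.cross_source_hits >= MIN_HITS) {
      r.score *= CROSS_SOURCE_BOOST;
      r.graph_cross_source_boost = CROSS_SOURCE_BOOST;
    }
  }
}
```

With this shape, a weak hub below the floor keeps its original score and cannot jump over a strong non-hub, while a hub sitting exactly at the floor is still boosted.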

PostFusionOpts extends with graphSignalsEnabled, onGraphMeta,
onScoreDistribution. Caller (hybridSearch in subsequent T5 work)
resolves graph_signals from the mode bundle.

Source-scope contract preserved: getAdjacencyBoosts takes raw
page_ids, no source filter. Adjacency is in-set restricted so
cross-source leakage is impossible by construction (D3=A).

Fail-open: engine throw → JSONL audit row via shared createAuditWriter
(T1/T2 primitive, featureName='graph-signals-failures') + meta.errored
+ caller's results unchanged. Session diversification ALSO skips on
failure (predictable all-or-nothing posture).

Mutation note (codex #9): score mutated in place. base_score must be
stamped at runPostFusionStages entry BEFORE this stage so eval-capture
sees pre-boost score (T6 attribution wave).

Test coverage (24 cases, including T11 IRON RULE regression):
  - sessionPrefix multi/single/empty cases
  - computeScoreDistribution percentile math
  - Disabled + empty short-circuits
  - Adjacency hit, no-hit, cross-source stacking, cross-source alone
  - Session diversification 3-share + single-segment + singleton
  - Test seam injection (no engine call)
  - Fail-open: throw → audit row + meta.errored + unchanged
  - Empty Map → session still runs
  - Score-distribution always emits when enabled
  - Meta carries fire counts + duration_ms
  - Missing page_id silently skipped from dedup set
  - **T11 IRON RULE regression (3 cases):**
    * weak hub BELOW floor_threshold does NOT get boosted past
      above-floor non-hub (the bug class the floor gate exists for)
    * hub AT floor still gets boosted (gate is < not <=)
    * NaN score → NaN >= threshold is false → no boost

Plan ref: T4 + T11 in v0.40.4.0 wave plan; D1=A, D2=A, D11=B, D14=B,
D9=A, D5=B. Codex outside-voice #1 + #2 + #6 + #8 + #9 addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ModeBundle gains graph_signals: boolean. Per-mode defaults:
  - conservative: false (cost-sensitive tier)
  - balanced:     true (the wave's primary surface for default-on)
  - tokenmax:     true (power-user tier, capstone fit)

SearchKeyOverrides + SearchPerCallOpts gain optional graph_signals
field. resolveSearchMode picks via the standard per-call → config
override → mode bundle chain.

loadOverridesFromConfig parses 'search.graph_signals' from the config
table ('1' or 'true' → true). SEARCH_MODE_CONFIG_KEYS adds the key
so `gbrain search modes --reset` clears it alongside other knobs.
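
The resolution chain reads roughly like the sketch below. The function names, the `source` labels, and the treatment of any value other than '1' / 'true' as `false` are assumptions for illustration; trimming and lowercasing mirror the case-insensitive parser behavior described later in this PR.

```typescript
// Sketch only: names and the "anything else is false" rule are assumptions.
type SearchMode = "conservative" | "balanced" | "tokenmax";
type KnobSource = "per_call" | "config" | "mode_default";

const GRAPH_SIGNALS_DEFAULT: Record<SearchMode, boolean> = {
  conservative: false,
  balanced: true,
  tokenmax: true,
};

// Parse the raw 'search.graph_signals' config value; undefined means "not set".
function parseGraphSignalsConfig(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  return v === "1" || v === "true";
}

// Precedence: per-call option, then config override, then mode bundle default.
function resolveGraphSignals(
  mode: SearchMode,
  configRaw?: string,
  perCall?: boolean,
): { enabled: boolean; source: KnobSource } {
  if (perCall !== undefined) return { enabled: perCall, source: "per_call" };
  const fromConfig = parseGraphSignalsConfig(configRaw);
  if (fromConfig !== undefined) return { enabled: fromConfig, source: "config" };
  return { enabled: GRAPH_SIGNALS_DEFAULT[mode], source: "mode_default" };
}
```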

KNOBS_HASH_VERSION bump 3→4 (append-only per CDX2-F13). New `gs=`
parts entry appended AFTER cross-modal + column + prov entries. A
graph-on cache write cannot be served to a graph-off lookup —
mid-deploy hit-rate dip clears within cache.ttl_seconds (3600s).

src/commands/search.ts KNOB_DESCRIPTIONS gains graph_signals entry
so `gbrain search modes` dashboard renders the new knob.

Test coverage:
  - test/search-mode.test.ts (+ 8 new cases): per-mode defaults
    canonical, config override both directions, per-call override
    wins, knobsHash distinct for on/off, config key registered,
    attributeKnob reports per-call + mode sources correctly.
  - test/search/knobs-hash-reranker.test.ts: version assertion
    bumped 3→4 with v0.40.4 rationale comment.
  - test/cross-modal-phase1.test.ts: version assertion bumped
    3→4 with v0.40.4 rationale comment.
  - Canonical-bundle assertions updated to include graph_signals
    in expected shape (3 cases).

50/50 search-mode tests pass. 45/45 cross-modal pass. 17/17
knobs-hash-reranker pass. 10/10 balanced-reranker pass.

Plan ref: T5 in v0.40.4.0 wave plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every boost stage that mutates SearchResult.score now stamps a field
recording WHAT it multiplied:

  - applyBacklinkBoost  → backlink_boost (skipped when count == 0)
  - applySalienceBoost  → salience_boost (skipped when score == 0)
  - applyRecencyBoost   → recency_boost (skipped on evergreen prefix)
  - applyExactMatchBoost → exact_match_boost (skipped on no-match
    OR when intent's exactMatchBoost == 1.0 no-op)
  - runPostFusionStages → base_score stamped ONCE at entry, BEFORE
    any boost mutates r.score. Idempotent: caller-pre-stamped value
    preserved. Empty-results short-circuit unchanged.
  - applyReranker → reranker_delta = original_index - new_index
    (positive = rank improved; raw rerank score stays in rerank_score)
  - applyGraphSignals → graph_adjacency_boost, graph_cross_source_boost,
    session_demote_factor (T4 already stamped these)
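
A compact sketch of the stamping pattern, under simplifying assumptions: the helper names are hypothetical, only two boost fields are modeled, and "skip the stamp on a no-op" is reduced to "factor equals 1".

```typescript
// Sketch only: helper names are hypothetical, and results are keyed by slug for simplicity.
interface Attributed {
  slug: string;
  score: number;
  base_score?: number;
  backlink_boost?: number;
  salience_boost?: number;
  reranker_delta?: number;
}

// Stamp the pre-boost score once. A caller-supplied value is preserved (idempotent).
function stampBaseScores(results: Attributed[]): void {
  for (const r of results) {
    if (r.base_score === undefined) r.base_score = r.score;
  }
}

// Multiply the score and record the factor, but leave no stamp on a no-op.
function applyStampedBoost(
  r: Attributed,
  field: "backlink_boost" | "salience_boost",
  factor: number,
): void {
  if (factor === 1) return;
  r.score *= factor;
  r[field] = factor;
}

// reranker_delta = original_index - new_index, so positive means the rank improved.
function stampRerankerDelta(original: Attributed[], reranked: Attributed[]): void {
  const originalIndex = new Map<string, number>();
  original.forEach((r, i) => originalIndex.set(r.slug, i));
  reranked.forEach((r, newIndex) => {
    const oldIndex = originalIndex.get(r.slug);
    if (oldIndex !== undefined) r.reranker_delta = oldIndex - newIndex;
  });
}
```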

Why: feeds the T7 `gbrain search --explain` formatter so it can
attribute the final score to its components. Without these stamps,
"why did this rank where it did?" is grep-and-guess.

SearchResult.reranker_delta doc updated to clarify it's a RANK delta
(positive = improved), not a score delta. The raw relevance score
stays in `rerank_score` (untyped, for back-compat with telemetry that
already reads it).

Test coverage: 16 new cases in test/search/attribution-stamping.test.ts.
Pins: every boost stamps when it fires AND skips stamping when it
doesn't (no false attribution on no-op stages). base_score idempotency
preserved. reranker_delta computed correctly across rank-improved +
rank-degraded cases.

All 178/178 search tests pass (no regressions).

Plan ref: T6 cathedral expansion in v0.40.4.0 wave plan; D12=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New file src/core/search/explain-formatter.ts renders SearchResult[]
as a multi-line breakdown of how the final score was formed:

  1. people/alice (score=12.4)
     base=10.2 (rrf+cosine)
     + backlink ×1.08
     + salience ×1.05
     + adjacency ×1.05 (hits=3)
     + cross_source ×1.10 (other_sources=2)
     ↑ reranker rank +2
     = final 12.4

Reads the *_boost / base_score / *_hits fields populated by T4 + T6.
Empty path: "no boosts applied" when no stage stamped anything.
Session demote rendered with `-` prefix (not `+`) so the demotion
direction is visually distinct from boosts.

CliOptions gains `explain: boolean`; parseGlobalFlags recognizes
`--explain` anywhere in argv. cli.ts formatResult for `search` +
`query` cases reads CliOptions.explain via the module-level
singleton and routes to formatResultsExplain when set. Lazy import
keeps the hot path narrow for the common non-explain case.

Number formatting: 4-decimal precision, trailing zeros stripped
('1.0000' → '1', '0.1234' → '0.1234'). NaN preserved as 'NaN'.
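
One way to get this formatting behavior, shown as a sketch rather than the formatter's actual code (`formatNumber` is a hypothetical name): round with `toFixed(4)`, then round-trip through `Number` so trailing zeros drop out.

```typescript
// Sketch only: formatNumber is a hypothetical helper, not the shipped formatter.
function formatNumber(n: number): string {
  if (Number.isNaN(n)) return "NaN";
  // toFixed(4) rounds to 4 decimals; Number(...) then drops trailing zeros.
  return String(Number(n.toFixed(4)));
}

console.log(formatNumber(1));      // prints "1"
console.log(formatNumber(0.1234)); // prints "0.1234"
console.log(formatNumber(1.05));   // prints "1.05"
console.log(formatNumber(NaN));    // prints "NaN"
```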

Test coverage:
  - test/search/explain-formatter.test.ts: 19 cases pin output
    format. Each boost type renders correctly, every-stage stacking
    composes, reranker_delta=0 doesn't render, empty list short-
    circuits, rank numbering 1-based, number formatting edge cases.
  - test/cli-options.test.ts: 3 new cases for --explain parsing
    (basic, absent default, any-argv-position).

Existing CliOptions literals in test/cli-options.test.ts +
test/thin-client-upgrade-prompt.test.ts updated for new required
explain field.

JSON envelope unchanged — the same attribution fields surface in
existing --json output via JSON.stringify; no separate JSON formatter
needed.

Plan ref: T7 cathedral expansion in v0.40.4.0 wave plan; D12=A + D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New checkGraphSignalsCoverage in src/commands/doctor.ts. Wired into
both runDoctor (local engine) and doctorReportRemote (HTTP MCP /
JSON path) so local AND remote-server brains both surface the metric.

Logic:
  1. Resolve active graph_signals setting: config override
     'search.graph_signals' wins, else mode bundle default
     ('search.mode' → conservative=false, balanced/tokenmax=true).
  2. When disabled → silent ok ("disabled — coverage not checked").
     Avoids polluting doctor output on installs that don't use the
     feature.
  3. When enabled, compute global inbound-link density:
     COUNT(DISTINCT to_page_id) / COUNT(*) across non-deleted pages.
  4. <10% → warn ("signal will rarely fire") with paste-ready
     `gbrain extract all` fix hint.
  5. >=30% → ok ("fire on most queries") with metric.
  6. 10-29% → ok ("fire occasionally") with metric.
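
The threshold logic above can be sketched as a pure function. `classifyCoverage` and its return shape are hypothetical, and the message strings only borrow the quoted phrases from this commit; the empty-brain wording is an assumption.

```typescript
// Sketch only: function name, return shape, and exact messages are assumptions.
interface CoverageCheck { status: "ok" | "warn"; message: string }

function classifyCoverage(
  enabled: boolean,
  pagesWithInboundLinks: number,
  totalPages: number,
): CoverageCheck {
  if (!enabled) return { status: "ok", message: "disabled, coverage not checked" };
  if (totalPages === 0) return { status: "ok", message: "empty brain, nothing to measure" };

  const density = pagesWithInboundLinks / totalPages;
  const pct = `${Math.round(density * 100)}%`;
  if (density < 0.1) {
    return {
      status: "warn",
      message: `${pct} inbound-link coverage: signal will rarely fire. Fix: gbrain extract all`,
    };
  }
  if (density >= 0.3) {
    return { status: "ok", message: `${pct} inbound-link coverage: fire on most queries` };
  }
  return { status: "ok", message: `${pct} inbound-link coverage: fire occasionally` };
}
```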

Known limitation (codex outside-voice #14): global density is an
imperfect proxy for "top-K subgraphs have enough edges to fire."
T-todo-5 captures the v0.41+ refinement that measures actual fire
rate from search-stats after 30 days of data.

Best-effort: SQL errors → warn with the underlying message. Never
breaks doctor.

Test coverage (7 new cases in test/doctor.test.ts):
  - conservative mode → silent ok regardless of coverage
  - balanced default + 0 links → warn at 0% with fix hint
  - balanced default + 40% inbound → ok "fire on most queries"
  - balanced default + 20% inbound → ok "fire occasionally"
  - explicit search.graph_signals=false overrides mode default
  - empty brain → ok with explanation
  - check is wired into runDoctor (source-grep regression guard)

All 55/55 doctor.test.ts cases pass.

Plan ref: T8 in v0.40.4.0 wave plan; D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runStatsSubcommand in src/commands/search.ts gains a graph_signals
section in both --json and human output:

  Graph signals:
    enabled:    true (mode default)
    failures:   3 fail-open event(s)
      ECONNREFUSED         2
      timeout              1

Data sources:
  - config: 'search.graph_signals' override → enabled + source=config,
    otherwise mode-bundle default → enabled + source=mode_default.
  - JSONL audit: readRecentGraphSignalsFailures(days) returns events;
    failures_count is len, failures_by_reason buckets by first word of
    error_summary (e.g. 'ECONNREFUSED', 'timeout').
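
The first-word bucketing is simple enough to sketch directly. `bucketFailures` and the `FailureEvent` shape are hypothetical, and mapping an empty `error_summary` to `'unknown'` is an assumption.

```typescript
// Sketch only: names, event shape, and the 'unknown' fallback are assumptions.
interface FailureEvent { ts: string; error_summary: string }

function bucketFailures(events: FailureEvent[]): {
  failures_count: number;
  failures_by_reason: Record<string, number>;
} {
  const failures_by_reason: Record<string, number> = {};
  for (const e of events) {
    // Bucket key is the first whitespace-delimited word of error_summary.
    const firstWord = e.error_summary.trim().split(/\s+/)[0];
    const reason = firstWord ? firstWord : "unknown";
    failures_by_reason[reason] = (failures_by_reason[reason] || 0) + 1;
  }
  return { failures_count: events.length, failures_by_reason };
}
```

Fed two `ECONNREFUSED ...` rows and one `timeout ...` row, this yields the counts shown in the human output above.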

JSON envelope (schema_version 2 unchanged; graph_signals is a new
sibling property of stats, so consumers reading the existing fields
keep working):

  {
    "schema_version": 2,
    ...stats...,
    "graph_signals": {
      "enabled": bool,
      "source": "config" | "mode_default",
      "failures_count": int,
      "failures_by_reason": { reason: count }
    },
    "_meta": { metric_glossary: { ..., graph_signals_enabled: ..., graph_signals_failures_count: ... } }
  }

Fire-rate metrics (adjacency_fires, cross_source_fires,
session_demotions) and score-distribution stats are NOT in this
section yet — they require telemetry-table writes from the
applyGraphSignals onMeta callback. Wired in v0.41+ via T-todo-2
calibration wave (the wave that needs them). For v0.40.4: status +
error count is the actionable surface for "is graph_signals on, and
is it failing?"

Human output: prints the section after the existing stats block.
Edge case: when total_calls is 0 BUT graph_signals is enabled OR
has historical failures, still prints the section so operators
don't lose the signal on a brain with no telemetry yet.

Test coverage (6 cases in test/search/search-stats-graph-signals.test.ts):
  - search.graph_signals=true → enabled true, source=config
  - mode=conservative → enabled false, source=mode_default
  - no config → enabled true (balanced default), source=mode_default
  - JSONL failures bucketed by first word of error_summary
  - empty audit → failures_count 0, empty failures_by_reason
  - human output includes "Graph signals:" header

Plan ref: T9 in v0.40.4.0 wave plan; D6=A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New test/e2e/graph-signals-eval.test.ts runs each longmemeval-mini
question twice (graph_signals off, graph_signals on) and asserts:

  Gate 1 (QUALITY) — paired bootstrap, 10,000 resamples:
    - If signals-on is significantly WORSE than off
      (delta < 0 AND p < 0.05) → fail.
    - Otherwise pass. p>=0.05 either direction OR delta >= 0 → ok.

  Gate 2a (CHANGE-MAGNITUDE): mean Jaccard@5 over result-set overlap
    must be >= 0.5. If results overlap less than half, the change is
    too large and needs human review before default-on.

  Gate 2b (CHANGE-MAGNITUDE): top-1 stability rate >= 0.7. If 30%+
    of top picks change, hard look required.

  Gate 3 (HARD ABSOLUTE FLOOR): recall@5 drop <= 5pt. Catastrophic
    regression catch (codex outside-voice #18 — addresses the "top-5
    must not drop at all" brittleness on tiny fixtures).
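
Gates 2a, 2b, and 3 reduce to small pure functions. The sketch below assumes result lists are arrays of slugs and that recall@5 is expressed as a fraction in [0, 1], so 5pt is 0.05; all names are hypothetical, and treating two empty lists as Jaccard 1 is an assumption.

```typescript
// Sketch only: names and input shapes are assumptions.
interface PairedRun { off: string[]; on: string[] }

// Jaccard overlap of the top-k result sets: |A ∩ B| / |A ∪ B|.
function jaccardAtK(a: string[], b: string[], k: number = 5): number {
  const setA = new Set<string>(a.slice(0, k));
  const setB = new Set<string>(b.slice(0, k));
  if (setA.size === 0 && setB.size === 0) return 1; // assumption: nothing changed
  let intersection = 0;
  setA.forEach((slug) => {
    if (setB.has(slug)) intersection++;
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Fraction of questions whose top-1 result is the same with signals off and on.
function top1Stability(runs: PairedRun[]): number {
  if (runs.length === 0) return 1;
  let same = 0;
  for (const run of runs) {
    if (run.off[0] === run.on[0]) same++;
  }
  return same / runs.length;
}

function changeMagnitudeGates(runs: PairedRun[], recallOff: number, recallOn: number) {
  let jaccardSum = 0;
  for (const run of runs) jaccardSum += jaccardAtK(run.off, run.on);
  const meanJaccard = runs.length === 0 ? 1 : jaccardSum / runs.length;
  return {
    gate2a: meanJaccard >= 0.5,          // result sets overlap at least half
    gate2b: top1Stability(runs) >= 0.7,  // top pick is stable on 70%+ of questions
    gate3: recallOff - recallOn <= 0.05, // hard floor: at most a 5pt recall@5 drop
  };
}
```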

Bootstrap implementation:
  - Per-question observation is binary (recall@5 hit/miss).
  - Pairing on question_id between on/off branches.
  - Centered distribution under null (subtract observed mean) per
    standard paired-bootstrap-shift approach for binary outcomes.
  - Two-tailed p-value: |resampled delta| >= |observed delta|.
  - Deterministic seeded RNG so test runs are stable across CI.
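
A minimal sketch of the procedure described by these bullets. It is not the exported `pairedBootstrapPValue`: the signature, the mulberry32 generator, the default seed, and returning p = 1 for empty input are all assumptions. Inputs are per-question recall@5 hits (0 or 1), already aligned by question_id.

```typescript
// Sketch only: signature, RNG choice, seed, and empty-input behavior are assumptions.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function pairedBootstrapSketch(
  off: number[],
  on: number[],
  resamples: number = 10000,
  seed: number = 42,
): { delta: number; p: number } {
  const n = Math.min(off.length, on.length);
  if (n === 0) return { delta: 0, p: 1 };

  // Per-question paired differences and their observed mean.
  const diffs: number[] = [];
  let sum = 0;
  for (let i = 0; i < n; i++) {
    diffs.push(on[i] - off[i]);
    sum += on[i] - off[i];
  }
  const delta = sum / n;

  // Center under the null hypothesis by subtracting the observed mean.
  const centered = diffs.map((d) => d - delta);

  const rng = mulberry32(seed);
  let atLeastAsExtreme = 0;
  for (let b = 0; b < resamples; b++) {
    let resampledSum = 0;
    for (let i = 0; i < n; i++) {
      resampledSum += centered[Math.floor(rng() * n)];
    }
    // Two-tailed: compare absolute values, with a tiny tolerance for float noise.
    if (Math.abs(resampledSum / n) >= Math.abs(delta) - 1e-12) atLeastAsExtreme++;
  }
  return { delta, p: atLeastAsExtreme / resamples };
}

// Gate 1: fail only when signals-on is significantly worse than signals-off.
function qualityGatePasses(result: { delta: number; p: number }): boolean {
  return !(result.delta < 0 && result.p < 0.05);
}
```

Because the generator is seeded, repeated runs over the same inputs return the same p-value, which is what keeps a gate like this stable in CI.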

pairedBootstrapPValue exported as a pure function with separate
tests for edge cases (empty input, all-equal, strong positive, strong
negative, determinism). Reusable from future calibration waves.

Hermetic: in-memory PGLite via createBenchmarkBrain + resetTables
between questions. No API keys needed (--no-embed import path
exercises keyword-only retrieval). Skips gracefully via describe.skip
when the fixture is missing.

Plan ref: T10 in v0.40.4.0 wave plan; D7=C absolute floor + D13=A
paired bootstrap; codex #4 + #18 stability-vs-quality distinction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION: 0.37.11.0 → 0.40.4.0
package.json: 0.37.11.0 → 0.40.4.0
CHANGELOG.md: top entry for v0.40.4.0 in ELI10-lead voice per
  CLAUDE.md release rules. Lead is plain-English ("Your search now
  notices when a page is a hub for your query"); precise file paths
  / SQL semantics / numbers live in the "Itemized changes" section
  below. Includes the cathedral-expansion notes (D5=B audit
  unification, D12=A per-stage attribution, D13=A eval gates) and
  the "To take advantage of v0.40.4.0" verify-and-fix block.

TODOS.md: 5 new items captured under "v0.40.4 graph signals —
deferred follow-ups (v0.41+)":
  - T-todo-1: profile graph-signal SQL latency, merge if hot (D8=C)
  - T-todo-2: magnitude calibration wave from probe data (D14=B / D17)
  - T-todo-3: DB-backed audit table for cross-deploy observability (codex #15)
  - T-todo-4: sync-topology-aware cross-source signal (codex #11)
  - T-todo-5: replace doctor's global density with fire-rate (codex #14)

Verified the 3-line audit: VERSION + package.json + CHANGELOG topmost
all match 0.40.4.0. `bun install` ran (lockfile unchanged — root
package version isn't stored in bun.lock). `bun run build:llms`
refreshed llms.txt + llms-full.txt for the next commit.

Plan ref: T12 in v0.40.4.0 wave plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 isCacheSafe test failures in shard 2 reproduce on stashed clean
master. Confirmed pre-existing — not introduced by v0.40.4. Filed
under "Pre-existing flake on master (noticed during v0.40.4 ship)"
with reproduction commands + remediation options. Shipping v0.40.4
through it; future wave can fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md line 550 bans the private OpenClaw fork name in public
artifacts. Example session prefix in sessionPrefix() docs + 3 test
fixtures swept to 'media/chat/...' instead. Pre-existing
scripts/check-privacy.sh in `bun run verify` caught it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…selective-graph-signals

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…ages

CRITICAL: pre-landing review (codex outside-voice via /ship Step 9)
caught that hybrid.ts's `postFusionOpts` literal at line 566 was
building PostFusionOpts WITHOUT threading `resolvedMode.graph_signals`
to `graphSignalsEnabled`. The gate at hybrid.ts:358 read the field
from a literal that never set it.

Result before this fix: the entire v0.40.4 graph-signals wave was
dead code in production. Mode bundles set
`balanced.graph_signals = true` and `tokenmax.graph_signals = true`,
but no production call site ever reached applyGraphSignals. The
KNOBS_HASH bump 3→4 correctly varied the cache key by the flag, so
contamination was prevented — but the feature itself never fired.

All shipped infrastructure (engine SQL, fail-open audit, attribution
stamps, --explain formatter, doctor coverage check, search-stats
section) was reachable only through the unit-test seam
(`opts.adjacencyFn`). The CHANGELOG-advertised behavior never
landed in user-visible search.

Fix: thread `graphSignalsEnabled: resolvedMode.graph_signals` into
the postFusionOpts literal (1 line). Inline comment names codex's
catch so future refactors see the regression class.

Tests: new test/search/graph-signals-wire-integration.test.ts pins
the wire end-to-end. Three cases:
  1. balanced mode → hybridSearch on a seeded brain with adjacency
     hub produces a result with base_score stamped (proves
     runPostFusionStages actually ran).
  2. search.graph_signals=false config override → no graph_* fields
     stamped (proves the gate honors the override path).
  3. Source-grep regression guard pinning the
     `graphSignalsEnabled: resolvedMode.graph_signals` literal in
     hybrid.ts so a future refactor can't silently disconnect.

All 57 existing v0.40.4 wave tests still pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ deleted_at)

Two informational findings from /ship pre-landing review (Step 9):

1. Stderr message qualifier drift (rerank/slug-fallback/phantom audits)
   Pre-v0.40.4 messages included a per-feature qualifier:
     [gbrain] rerank-failure audit write failed (...)
     [gbrain] slug-fallback audit write failed (...)
     [gbrain] phantom audit write failed (...)
   The T2 refactor dropped the qualifier (plan promised "byte-identical"
   operator-visible behavior, but stderr lines did drift). Restored via
   new `errorMessagePrefix` option on `createAuditWriter` (optional, ''
   default). Three modules pass the per-feature qualifier; shell-audit
   and supervisor-audit unaffected (their pre-v0.40.4 messages didn't
   have a separate qualifier — label already carried the feature name).

2. Defense-in-depth `deleted_at IS NULL` on getAdjacencyBoosts
   The SQL was previously protected by construction (hybridSearch's
   visibility filter ensures input pageIds are live). The explicit guard
   matches the v0.35.5.0 findOrphanPages pattern and closes the bug
   class if a future caller bypasses hybridSearch. Added to both Postgres and
   PGLite engines for parity. Three JOIN sites guarded (targets CTE,
   FROM-pages join). One inline comment per engine cites the codex
   review and the v0.35.5.0 precedent.

Plan ref: /ship pre-landing review v0.40.4.0 (codex finding C and F).

All 84 audit+graph-signals tests pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… F1)

Three HIGH-severity issues from /ship adversarial pass:

H1 (Codex): Eval gate was a no-op.
  Test passed `graph_signals: graphSignalsOn` via `as any` cast, but
  SearchOpts had no field and hybridSearch's perCall didn't thread it.
  Both off/on branches resolved to the mode-bundle default — gate
  measured identical behavior, could pass while detecting nothing.

  Fix: add `graph_signals?: boolean` to SearchOpts (types.ts:794).
  Thread `opts.graph_signals` into perCall in both hybridSearch
  (hybrid.ts:425) AND hybridSearchCached (hybrid.ts:1027) so the
  cache-key resolver also sees the override. Drop the `as any` from
  the eval test — types are real now.

H2 (Codex): Session diversification fired on entity directories.
  sessionPrefix() used "any shared parent directory" as the session
  signal. Result: a search for "people in SF" returned `people/alice`
  + `people/bob` + `people/charlie` and the latter two got demoted
  to 0.95×. Every common entity-search query silently penalized
  legitimate same-type results. Default-on for balanced/tokenmax
  means production behavior was wrong.

  Fix: narrow sessionPrefix() to fire ONLY when the slug contains a
  session-like marker (`chat`/`session`/`sessions` segment OR a
  `YYYY-MM-DD` date segment). Entity directories (`people/`,
  `companies/`, `docs/`) return null → diversification skips.
  Returns NULL (not the slug itself) so the loop skips clean.
  Examples in JSDoc:
    your-agent/chat/2026-05-20-foo → 'your-agent/chat/2026-05-20-foo'
    daily/2026-05-20/journal-entry-1 → 'daily/2026-05-20'
    transcripts/chat/funding-discussion → 'transcripts/chat/funding-discussion'
    people/alice → null  ← codex H2 regression
    docs/quickstart → null
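
  The narrowed rule can be sketched as below. This is a minimal
  illustration that reproduces the JSDoc examples listed above, not the
  shipped sessionPrefix(): the constant names, the "date segment wins
  over marker" ordering, and the exact-match date regex are assumptions.

```typescript
// Hypothetical sketch; the real sessionPrefix() may differ in details.
const SESSION_MARKERS = new Set(["chat", "session", "sessions"]);
const DATE_SEGMENT = /^\d{4}-\d{2}-\d{2}$/;

function sessionPrefix(slug: string): string | null {
  const segments = slug.split("/");
  if (segments.length < 2) return null; // degenerate: slug has no "/"

  // A bare YYYY-MM-DD segment anchors the session at that directory.
  const dateIdx = segments.findIndex((s) => DATE_SEGMENT.test(s));
  if (dateIdx >= 0) return segments.slice(0, dateIdx + 1).join("/");

  // A chat/session marker segment: the whole slug is the session key.
  if (segments.some((s) => SESSION_MARKERS.has(s))) return slug;

  // Entity directories (people/, companies/, docs/) carry no session signal.
  return null;
}
```

  Under this sketch, `media/2026-05-20/chunk-a` and
  `media/2026-05-20/chunk-b` share the prefix `media/2026-05-20`, while
  `people/alice` and `people/bob` both map to null and are never compared.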

F1 (Claude adversarial subagent): case-sensitivity drift across 3 sites.
  loadOverridesFromConfig in mode.ts is case-insensitive +
  whitespace-trimmed for 'search.graph_signals' values. But
  doctor's checkGraphSignalsCoverage (doctor.ts:899) AND
  search-stats's readGraphSignalsStats (search.ts:288) used
  case-sensitive compare. User sets `search.graph_signals TRUE`:
  production enables the feature, but doctor + search-stats both
  silently report disabled. Operators lose the only observability
  surface for the new feature on values like 'True'/'TRUE'.

  Fix: trim + lowercase parity at both sites. Mirror the parser's
  semantic. Also case-normalized `search.mode` reads at both sites
  for the same divergence class.
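
  The parity fix amounts to routing every read of the config value through
  one normalizer. A minimal sketch follows; the helper names are
  hypothetical, and treating only the literal 'true' as enabled is an
  assumption (the real parser in mode.ts may accept other spellings).

```typescript
// Hypothetical helpers; names are illustrative, not those used in mode.ts.
function normalizeConfigValue(raw: string | null | undefined): string | null {
  if (raw == null) return null;
  const v = raw.trim().toLowerCase();
  return v.length > 0 ? v : null;
}

// Assumption: only "true" (any casing, any surrounding whitespace) enables.
function parsesAsTrue(raw: string | null | undefined): boolean {
  return normalizeConfigValue(raw) === "true";
}
```

  With a shared helper, the parser, doctor, and search-stats cannot
  disagree on a value like 'TRUE'.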

Tests:
  - sessionPrefix block rewritten with 7 cases covering chat marker
    + date anchor + entity dirs (now-NULL) + degenerate (no /).
  - Added regression test pinning codex H2: people/alice +
    people/bob + people/charlie do NOT get diversified.
  - graph-signals-eval.test.ts drops `as any` — typed field works.
  - Existing tests using `chat/a`/`chat/b` updated to session-shaped
    `media/2026-05-20/chunk-a` so the date anchor actually fires.

111/111 graph-signals + doctor + search-stats tests pass. Typecheck clean.

Plan ref: /ship adversarial review v0.40.4.0 (codex H1, H2; Claude F1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex L1 (audit window underreport) + Claude F2/F3/F5-F8/F11/F12/F14/F16
from /ship adversarial review. None are load-bearing; all captured under
'v0.40.4 adversarial review LOW findings — captured for v0.41+'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README: surface v0.40.4.0 graph signals + --explain in Hybrid search capability
- CLAUDE.md: annotate engine.ts getAdjacencyBoosts, new graph-signals.ts /
  explain-formatter.ts / audit/audit-writer.ts, plus hybrid.ts post-fusion
  4th stage, mode.ts graph_signals knob + KNOBS_HASH 3→4, cli-options.ts
  --explain flag, search stats + doctor coverage check
- llms-full.txt: regenerated from CLAUDE.md per the build:llms chaser rule

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…selective-graph-signals

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…selective-graph-signals

# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
#	src/core/audit-slug-fallback.ts
#	src/core/facts/phantom-audit.ts
#	src/core/minions/handlers/shell-audit.ts
#	src/core/search/mode.ts
…selective-graph-signals

# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
…selective-graph-signals

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…selective-graph-signals

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
setup-bun action with `bun-version: latest` calls the GitHub API
(https://api.github.com/repos/oven-sh/bun/git/refs/tags) to resolve
the tag. CI started failing today with HTTP 401 "Bad credentials"
even though the action receives a token (visible as `token: ***`
in the run log). Pinning the version eliminates the API call
entirely.

Affected workflows: test.yml, e2e.yml, release.yml, heavy-tests.yml
(5 invocations total). Pinned to 1.3.13 — matches package.json
engines (`bun >= 1.3.10`) and the version v0.40.4.0 was developed
against.

Bump cadence: when a new bun version is required, update this
pin in one PR. Trading "always-latest" for "always-deterministic"
is the right trade for a 5-shard CI matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 23, 2026
…ns v81→v90

Absorbs master's v0.38 (ingestion cathedral), v0.38.1 (agents), v0.38.2 (doctor),
v0.39.0 (brainstorm cost cathedral), v0.39.1 (schema packs), and the v0.40.x VERSION
bump on top.

Conflict resolutions:
- VERSION → 0.40.5.0 (this wave's slot; v0.40.4.0 claimed by salem PR #1300)
- package.json → 0.40.5.0
- src/core/migrate.ts → took master's v81 (pages_provenance_columns) + v82-v88;
  appended our contextual_retrieval_columns as v90 (skipped v89 reserved by
  garrytan/v0.40.2.0-trajectory-routing per D7 inspection)
- src/core/search/mode.ts → KNOBS_HASH_VERSION 4→5 (per D8 sequencing behind
  salem's pending v=4 graph signals); both schema_pack hash fields (master)
  and contextual_retrieval hash fields (this branch) preserved
- src/core/types.ts → both v0.38 provenance Page fields and v0.40.3 CR fields
  preserved on the Page interface
- CHANGELOG.md → took master as baseline; v0.40.5.0 entry lands in T9 docs phase
- bun.lock → bun install refreshed to pick up chokidar@^4.0.3 (v0.38 dep)

bun run typecheck passes after merge.
garrytan added a commit that referenced this pull request May 23, 2026
…master)

Master is at v0.40.2.0; v0.40.3.0 is genuinely the next free slot. The wave
was originally planned as v0.40.5.0 sequenced behind salem (PR #1300 = v0.40.4.0)
but the user is shipping THIS branch as v0.40.3.0 because:

1. v0.40.3.0 IS the canonical version slot for the contextual retrieval
   cathedral (matches branch name garrytan/v0.40.3.0-contextual-retrieval).
2. Master is at v0.40.2.0 — v0.40.3.0 is the immediate next slot, not a
   collision.
3. salem's v0.40.4.0 + any v0.40.5.0 work sit ON TOP of this in the landing
   train, not under it.

Mechanical rename only — no content changes from the v0.40.5.0 commit
sequence (T1-T11 wave is preserved verbatim, just relabeled):
- VERSION + package.json: 0.40.5.0 → 0.40.3.0
- bun.lock: refreshed (no dep changes)
- CHANGELOG.md: ## [0.40.5.0] header → ## [0.40.3.0] + body references
- skills/migrations/v0.40.5.0.md → skills/migrations/v0.40.3.0.md
  (previous v0.40.3.0.md file overwritten with the richer T9 content)
- CLAUDE.md: "Key commands added in v0.40.5.0" → "v0.40.3.0"
- 30 source + test files: comment references swept via sed s/0.40.5.0/0.40.3.0/g
- llms.txt + llms-full.txt: regenerated

Migration numbering UNCHANGED: v90 (renamed from original v81 because master
took v81-v88 and v89 is reserved) and v91 (new trigger migration) stay at
v90/v91; the version slot is orthogonal to the migration ledger collision.

KNOBS_HASH_VERSION = 5 stays — sequenced behind master's v=4 schema-pack
work; salem's v=4 graph-signals will rebump to v=5 if it lands first.

Test results after rename:
- bun run verify: clean (typecheck + 7 pre-checks)
- bun run test: 9482 pass / 0 fail / 0 skip

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 23, 2026
…ferred-item closures (#1323)

* v0.40.3.0 T1: migration v81 + CRMode type substrate

Five additive columns + Page/SourceRow type extensions + CRMode discriminated
union land the schema foundation for v0.40.3.0 contextual retrieval. All
columns are NULL-tolerant; existing rows continue working unchanged until
the post-upgrade reembed sweep catches up.

Schema (migration v81 + schema.sql + pglite-schema.ts mirror):
- pages.contextual_retrieval_mode TEXT NULL — tier the page was last
  embedded under. NULL on pre-v81 rows; drift detection treats NULL as
  'none' for reindex predicates.
- pages.corpus_generation TEXT NULL — composite hash of
  (synopsis_prompt_version, haiku_model, title_wrapper_version,
  embedding_model) per D27 P1-5. Document-side provenance for the
  v0.40.3.0 query_cache.page_generations invalidation contract.
- sources.contextual_retrieval_mode TEXT NULL — per-source override.
  CLI-write-only per D15 security gate.
- sources.trust_frontmatter_overrides BOOLEAN DEFAULT FALSE — per-source
  mount-frontmatter trust gate per D15. Host source (id='default') is
  always trusted in the resolver regardless of column value.
- query_cache.page_generations JSONB DEFAULT '{}' — D27 P1-5 invalidation
  contract foundation. Per-row tag of {page_id: corpus_generation} so
  lookup can LEFT JOIN against current pages and exclude stale rows.

Types (src/core/types.ts + src/core/sources-ops.ts):
- New CR_MODES = ['none', 'title', 'per_chunk_synopsis'] as const +
  CRMode type union + isCRMode() type guard for parsing untrusted
  frontmatter / config values.
- Page interface extended with contextual_retrieval_mode + corpus_generation
  (optional, NULL-tolerant for pre-v81 rows).
- SourceRow interface extended with contextual_retrieval_mode +
  trust_frontmatter_overrides (optional for pre-v81 brains).

Bootstrap coverage:
- All four pages/sources columns are in PGLITE_SCHEMA_SQL CREATE TABLE
  bodies (fresh installs get them at initSchema time).
- query_cache.page_generations is exempt because query_cache itself is
  migration-created (added in v55, not in PGLITE_SCHEMA_SQL). Same
  rationale as the existing query_cache.knobs_hash exemption.

Pinned by the migrate.test.ts v81 round-trip + the schema-bootstrap-coverage
parser (which also gained the query_cache.page_generations exemption).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T2: MARKDOWN_CHUNKER_VERSION 2→3 (contextual wrapper signal)

Bumps the markdown chunker version so the post-upgrade reembed sweep finds
every page on the old chunker version and re-embeds it through the new
contextual-retrieval wrapper path. Chunk boundaries themselves are
unchanged from v2 — the bump forces re-embed (not re-chunk) so existing
pages pick up the wrapper without recomputing chunk splits.

JSDoc on MARKDOWN_CHUNKER_VERSION updated to document the v3 semantic
("chunks embed with optional contextual retrieval wrapper per Anthropic's
published methodology"). Pins the dependency between the chunker version
bump and the upcoming src/core/contextual-retrieval-service.ts (T5).

Test fixture in test/chunkers/recursive.test.ts updated to assert v3 with
a brief comment on the bump rationale so future contributors see the
v0.40.3.0 reason inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T3: pure modules — resolver, wrapper, synopsis, audit

Four new pure modules under src/core/ that the upcoming service layer (T5)
and Minion handler (T6) compose. All four are testable in isolation; no
engine I/O, no filesystem reads outside the synopsis source-text fallback
chain (which is invoked by the service, not the modules themselves).

src/core/contextual-retrieval-resolver.ts (D5+D6+D15+D26 P0-4):
- resolveContextualRetrievalMode() walks the three-source override chain:
  page frontmatter > source row > global mode bundle. Returns a tagged
  result with source attribution + invalid_frontmatter_value (D13) +
  frontmatter_rejected_untrusted_mount (D15) for doctor surfacing.
- crModeDistinct() helper for D26 P0-4 IS DISTINCT FROM semantics on
  app-side CRMode comparisons (NULL-aware; defeats the drift bug Codex
  pass 2 caught, where a plain `!=` comparison misses NULL).
- HOST_SOURCE_ID = 'default' always trusted regardless of
  trust_frontmatter_overrides; mount sources require the explicit flag
  per D15 security gate.
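
The override chain and trust gate described above can be sketched as
follows. This is a simplified illustration under stated assumptions: the
input shape, the `from` tag values, and the function name `resolveMode`
are hypothetical; only CR_MODES, HOST_SOURCE_ID, crModeDistinct and the two
doctor-facing field names come from the text above.

```typescript
const CR_MODES = ["none", "title", "per_chunk_synopsis"] as const;
type CRMode = (typeof CR_MODES)[number];
const HOST_SOURCE_ID = "default";

function isCRMode(v: unknown): v is CRMode {
  return typeof v === "string" && (CR_MODES as readonly string[]).includes(v);
}

interface ResolveInput {
  frontmatterValue?: unknown; // raw, untrusted YAML value
  source: { id: string; mode?: CRMode | null; trust_frontmatter_overrides?: boolean };
  globalMode: CRMode;
  killSwitch?: boolean;
}

interface ResolveResult {
  mode: CRMode;
  from: "kill_switch" | "frontmatter" | "source" | "global"; // hypothetical tags
  invalid_frontmatter_value?: unknown;
  frontmatter_rejected_untrusted_mount?: boolean;
}

function resolveMode(input: ResolveInput): ResolveResult {
  if (input.killSwitch) return { mode: "none", from: "kill_switch" };
  const extras: Partial<ResolveResult> = {};
  if (input.frontmatterValue !== undefined) {
    const trusted =
      input.source.id === HOST_SOURCE_ID || input.source.trust_frontmatter_overrides === true;
    if (!trusted) {
      extras.frontmatter_rejected_untrusted_mount = true;
    } else if (isCRMode(input.frontmatterValue)) {
      return { mode: input.frontmatterValue, from: "frontmatter" };
    } else {
      extras.invalid_frontmatter_value = input.frontmatterValue; // typo: fall through
    }
  }
  if (input.source.mode) return { ...extras, mode: input.source.mode, from: "source" };
  return { ...extras, mode: input.globalMode, from: "global" };
}

// NULL-aware inequality, mirroring SQL's IS DISTINCT FROM.
function crModeDistinct(a: CRMode | null | undefined, b: CRMode | null | undefined): boolean {
  return (a ?? null) !== (b ?? null);
}
```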

src/core/embedding-context.ts (D20-T1 + D20-T4 + Codex T5 title-weakness):
- buildContextualPrefix(title, synopsis) → null | wrapped block. Handles
  title-only, summary-only, both, or neither.
- wrapChunkForEmbedding(text, prefix, chunkSource) short-circuits on
  chunk_source='fenced_code' per D20-T4 (code chunks inside markdown
  pages skip the wrapper — prepending page title to a code block doesn't
  help cross-modal retrieval).
- sanitizeTitle/sanitizeSynopsis strip </context> (injection vector) and
  collapse whitespace + cap at 300 chars.
- extractFirstTwoSentences() pure regex with CJK_SENTENCE_DELIMITERS
  from src/core/cjk.ts for the title-tier free fallback path.
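
A rough sketch of the wrapper helpers described above. The function names
follow the commit text, but the wrapped block layout (`<context>` tags with
`Title:` / `Summary:` lines) is an assumption inferred from the `</context>`
injection note, and a single `sanitize` stands in for both sanitizeTitle
and sanitizeSynopsis.

```typescript
const MAX_FIELD_CHARS = 300;

// Strip closing-tag injection, collapse whitespace, cap the length.
function sanitize(text: string): string {
  let out = text;
  let prev: string;
  do {
    prev = out;
    out = out.replace(/<\/context>/gi, ""); // repeat so nested fragments cannot reassemble
  } while (out !== prev);
  return out.replace(/\s+/g, " ").trim().slice(0, MAX_FIELD_CHARS).trim();
}

// Returns null when neither field survives sanitizing.
function buildContextualPrefix(title: string | null, synopsis: string | null): string | null {
  const parts: string[] = [];
  const t = title ? sanitize(title) : "";
  const s = synopsis ? sanitize(synopsis) : "";
  if (t) parts.push(`Title: ${t}`);
  if (s) parts.push(`Summary: ${s}`);
  return parts.length > 0 ? `<context>\n${parts.join("\n")}\n</context>` : null;
}

// Code chunks and prefix-less pages pass through untouched; the input
// string itself is never mutated, so chunk_text stays canonical.
function wrapChunkForEmbedding(text: string, prefix: string | null, chunkSource: string): string {
  if (prefix === null || chunkSource === "fenced_code") return text;
  return `${prefix}\n\n${text}`;
}
```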

src/core/page-summary.ts (D27 P1-2 + D27 P1-4 + D21 reversal):
- generatePerChunkSynopsis() routes through gateway.chat(tier='utility').
- Richer failure envelope per D27 P1-2: refusal/empty/malformed (→ D14
  page-level fall-back) vs auth_failure/rate_limit/timeout/network/
  provider_5xx (→ retry per gateway, or throw to Minion retry).
- buildSynopsisCacheKey() composes the LRU key per D27 P1-4:
  (content_hash, chunk_index, corpus_generation, source_text_hash).
- DELIBERATELY no calibration injection — D21 reversed D7's calibration-
  aware acceptance. Mutable answer-time bias tags don't belong in static
  document vectors. Query-side personalization is the v0.41+ home.

src/core/audit-synopsis.ts (D17, mirrors v0.35.0.0 rerank-audit precedent):
- Failure-only JSONL writer at ~/.gbrain/audit/synopsis-failures-YYYY-Www.jsonl
  with ISO-week rotation. Deliberately no success logging (10K+ pages per
  backfill would generate 10K+ JSONL rows of noise; failure signal is the
  actionable one).
- summarizeSynopsisFailures() aggregator returns SynopsisFailureSummary
  for doctor's synopsis_refusal_rate check.
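
The ISO-week rotation in the file name can be sketched like this. The
helper names are hypothetical, and computing the week in UTC is an
assumption; the shipped module may use local time.

```typescript
// ISO 8601 week label, e.g. "2026-W21". The ISO year is the year of the
// Thursday that falls in the same Monday-to-Sunday week as the date.
function isoWeekLabel(date: Date): string {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek); // move to that week's Thursday
  const isoYear = d.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${isoYear}-W${String(week).padStart(2, "0")}`;
}

// Hypothetical helper: file name only, directory resolution is left out.
function synopsisFailureFileName(date: Date): string {
  return `synopsis-failures-${isoWeekLabel(date)}.jsonl`;
}
```

Two failures logged on different days of the same ISO week land in the
same file, which is what the "stable for same week" test case pins.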

Clean typecheck across the four modules. Tests land in T14 alongside the
service + Minion handler so the test layer can integrate the full path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T4: ModeBundle.contextual_retrieval + KNOBS_HASH_VERSION 3→4

Three-tier wrapper ladder gated by search.mode lands in the bundle. The
per-mode defaults match the cost-tier philosophy (D2):

  conservative → 'none'                (minimum surface)
  balanced     → 'title'                (free at runtime; pure string concat)
  tokenmax     → 'per_chunk_synopsis'   (Anthropic's published method)

Plus the D18 soft kill switch (contextual_retrieval_disabled) so a single
config-key flip neutralizes wrapping for queries AND new embeds without
touching the migration path.

src/core/search/mode.ts:
- ModeBundle: contextual_retrieval: CRMode + contextual_retrieval_disabled.
- All three frozen MODE_BUNDLES updated with the per-tier defaults.
- SearchKeyOverrides + SearchPerCallOpts: both fields optional in the
  per-key config + per-call surfaces.
- resolveSearchMode's pick chain threads both new fields through the
  standard per-call > per-key > mode bundle precedence ladder.
- KNOBS_HASH_VERSION 3→4. Two new entries appended to knobsHash() parts
  list (append-only per CDX2-F13 convention): cr=${cr_mode} +
  crd=${0|1}. A query against a tokenmax-mode brain can no longer be
  served from a cache row written when the brain was on balanced — they
  sit in different embedding spaces.
- SEARCH_MODE_CONFIG_KEYS: 'search.contextual_retrieval' +
  'search.contextual_retrieval_disabled' added.
- loadOverridesFromConfig reads both keys; CR_MODES guard rejects typos
  (drift typos still fall through to mode default per D13 sync-failure
  semantics; this is the no-typo path).
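
The precedence ladder and the two appended hash parts can be sketched as
below. This is a reduced model covering only the two new fields; the
helper names and the `false` default for contextual_retrieval_disabled are
assumptions, and the real knobsHash() includes many more parts.

```typescript
type CRMode = "none" | "title" | "per_chunk_synopsis";
type SearchMode = "conservative" | "balanced" | "tokenmax";

interface CRKnobs {
  contextual_retrieval: CRMode;
  contextual_retrieval_disabled: boolean;
}

// Per-mode defaults from the table above (disabled=false is assumed).
const MODE_DEFAULTS: Record<SearchMode, CRKnobs> = {
  conservative: { contextual_retrieval: "none", contextual_retrieval_disabled: false },
  balanced: { contextual_retrieval: "title", contextual_retrieval_disabled: false },
  tokenmax: { contextual_retrieval: "per_chunk_synopsis", contextual_retrieval_disabled: false },
};

// per-call > per-key > mode bundle
function pick<T>(perCall: T | undefined, perKey: T | undefined, bundle: T): T {
  return perCall ?? perKey ?? bundle;
}

function resolveCRKnobs(
  mode: SearchMode,
  perKey: Partial<CRKnobs> = {},
  perCall: Partial<CRKnobs> = {},
): CRKnobs {
  const bundle = MODE_DEFAULTS[mode];
  return {
    contextual_retrieval: pick(perCall.contextual_retrieval, perKey.contextual_retrieval, bundle.contextual_retrieval),
    contextual_retrieval_disabled: pick(
      perCall.contextual_retrieval_disabled,
      perKey.contextual_retrieval_disabled,
      bundle.contextual_retrieval_disabled,
    ),
  };
}

// The two entries appended to the hash parts list.
function crHashParts(k: CRKnobs): string[] {
  return [`cr=${k.contextual_retrieval}`, `crd=${k.contextual_retrieval_disabled ? 1 : 0}`];
}
```

Because the parts differ between balanced (`cr=title`) and tokenmax
(`cr=per_chunk_synopsis`), cache rows written under one mode cannot be
served under the other.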
- Imports CR_MODES + CRMode from src/core/types.ts.

src/commands/search.ts:
- KNOB_DESCRIPTIONS picks up the two new entries so `gbrain search modes`
  dashboard renders them with description copy.

test/search-mode.test.ts:
- Three canonical bundle tests updated with the per-tier CR defaults.
- KNOBS_HASH_VERSION expectation bumped 3→4 with inline rationale.

Clean typecheck + 42 search-mode tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T8: NULL→non-NULL upsert race fix (D24, closes v0.35.x TODO)

Two writers racing on the same chunk (autopilot sync + manual `embed --stale`
+ contextual reindex) previously raced last-writer-wins via the text-
unchanged branch's `COALESCE(EXCLUDED.embedding, content_chunks.embedding)`.
Pre-v0.40.3 the cost of an overwrite was one wasted ~$0.000001 text-
embedding-3-large call. With v0.40.3's per-chunk Haiku synopsis on tokenmax,
the cost rises ~300x to ~$0.0003 per overwritten chunk plus the discarded
synopsis work. On a 10K-page tokenmax brain, a few percent overwrite rate
during concurrent backfill+sync wastes $1-5 of Haiku spend silently.

Fix (mirrored exactly in postgres-engine.ts + pglite-engine.ts so both
engines stay parity-pinned):

  embedding = CASE
    WHEN EXCLUDED.chunk_text != content_chunks.chunk_text THEN EXCLUDED.embedding
    WHEN content_chunks.embedding IS NULL THEN EXCLUDED.embedding
    WHEN EXCLUDED.embedded_at IS NOT NULL
         AND (content_chunks.embedded_at IS NULL OR EXCLUDED.embedded_at > content_chunks.embedded_at)
         THEN EXCLUDED.embedding
    ELSE content_chunks.embedding
  END,
  embedded_at = CASE
    WHEN EXCLUDED.chunk_text != content_chunks.chunk_text AND EXCLUDED.embedding IS NULL THEN NULL
    WHEN content_chunks.embedding IS NULL AND EXCLUDED.embedding IS NOT NULL THEN EXCLUDED.embedded_at
    WHEN EXCLUDED.embedded_at IS NOT NULL
         AND (content_chunks.embedded_at IS NULL OR EXCLUDED.embedded_at > content_chunks.embedded_at)
         THEN EXCLUDED.embedded_at
    ELSE content_chunks.embedded_at
  END,

The two columns move together via aligned CASE WHEN logic — embedding +
embedded_at stay consistent so `embed --stale` (predicate
`embedding IS NULL`) keeps working correctly.

Behavior summary for the text-unchanged branch:
  - existing embedding NULL → take new (cold path, no race)
  - new is fresher (embedded_at > existing) → take new
  - otherwise → keep existing (slower writer with stale embedding loses)
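
For reasoning about the branches, the two CASE expressions can be
transcribed into TypeScript. This is an illustrative model only (the real
logic runs as SQL inside the engines); the row shape and the use of epoch
milliseconds for embedded_at are assumptions.

```typescript
interface ChunkRow {
  chunk_text: string;
  embedding: number[] | null;
  embedded_at: number | null; // epoch ms, assumed for easy comparison
}

// Third WHEN arm of both CASE expressions: incoming has a strictly newer stamp.
function incomingIsFresher(existing: ChunkRow, incoming: ChunkRow): boolean {
  return (
    incoming.embedded_at !== null &&
    (existing.embedded_at === null || incoming.embedded_at > existing.embedded_at)
  );
}

// existing = content_chunks row, incoming = EXCLUDED row.
function mergeOnConflict(
  existing: ChunkRow,
  incoming: ChunkRow,
): Pick<ChunkRow, "embedding" | "embedded_at"> {
  const textChanged = incoming.chunk_text !== existing.chunk_text;
  const fresher = incomingIsFresher(existing, incoming);

  let embedding: number[] | null;
  if (textChanged) embedding = incoming.embedding;
  else if (existing.embedding === null) embedding = incoming.embedding;
  else if (fresher) embedding = incoming.embedding;
  else embedding = existing.embedding;

  let embedded_at: number | null;
  if (textChanged && incoming.embedding === null) embedded_at = null;
  else if (existing.embedding === null && incoming.embedding !== null) embedded_at = incoming.embedded_at;
  else if (fresher) embedded_at = incoming.embedded_at;
  else embedded_at = existing.embedded_at;

  return { embedding, embedded_at };
}
```

A slower writer that arrives with an older embedded_at hits the final
else arm in both expressions, so the fresher vector and its timestamp
survive together.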

Closes the v0.35.x TODOS.md item that flagged this pre-existing race.
The v0.40.3 fold-in lands the fix now that the wave amplifies the cost
vector, per D24 in the eng-review pass.

100 pglite-engine tests pass + clean typecheck. E2E concurrent-writer
test lands in T14 alongside the broader test suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T5: contextual-retrieval-service + two-phase build (D27 P1-1)

Centerpiece service module. Single source of truth for "re-embed one page
with the active CR mode" — composed by import-file.ts (sync time),
reindex.ts (batch sweep), and the contextual-reindex-per-chunk Minion
handler (T6). Closes the drift class Codex pass 2 P1-1 flagged: each
consumer no longer hand-rolls the embed-then-stamp flow, so there's
literally no way for them to diverge.

src/core/contextual-retrieval-service.ts:
- reembedPageWithContextualRetrieval() implements the D26 P0-2 two-phase
  build pattern.
  PHASE 1 (in-memory, no DB writes):
    - Load page + source + chunks
    - Resolve effective CR mode (resolver) with optional kill-switch
      short-circuit per D18
    - 'none' tier: skip wrap, stamp column, return early (records page
      is up-to-date relative to current state so reindex sweep doesn't
      re-walk it)
    - 'title' tier: pure string concat with sanitized title prefix
    - 'per_chunk_synopsis' tier: read source text via fallback chain (D11),
      generate synopsis per chunk SEQUENTIALLY within page (D10), batch
      embedBatch ONCE per page (D27 P2-2). Rate-leasing hooks
      (acquireSynopsisLease/releaseSynopsisLease) supplied by the Minion
      handler; inline callers rely on gateway-level retry.
    - On refusal/empty/malformed (per D27 P1-2): RESTART PHASE 1 at
      'title' tier — D14 page-level consistency (whole page demoted, no
      mid-state on disk).
  PHASE 2 (single DB transaction):
    - tx.upsertChunks() — chunk_text stays canonical per D20-T1; only
      the wrapped string went to the embedder, not into the column.
    - tx.updatePageContextualRetrievalState(): stamps both columns
      atomically with the PHASE 2 chunk writes.
- computeCorpusGeneration() composes the document-side provenance hash
  per D27 P1-5: sha256(cr_mode + synopsis_prompt_version + haiku_model
  + title_wrapper_version + embedding_model_tag).slice(0,16). Future
  prompt edits or model bumps invalidate prior cache rows via the
  query_cache.page_generations LEFT JOIN (lands in T11).
- computeSourceTextHash() for D27 P1-4 synopsis cache key composition.
- expectedModeForPageSourceOnly() helper for the T9 reindex sweep
  predicate.
- ReembedPageResult discriminated union: success | skipped (4 reasons)
  | page_fallback (refusal triggered D14) | transient_error | permanent_error.
  Each consumer dispatches on `kind` to decide retry / surface / commit.
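
The corpus_generation hash described above could look roughly like this.
The input field names follow the formula in the commit text; the `|`
separator and the example values in the usage note are assumptions, since
the text only shows the components joined with `+`.

```typescript
import { createHash } from "node:crypto";

interface CorpusGenerationInput {
  cr_mode: string;
  synopsis_prompt_version: string;
  haiku_model: string;
  title_wrapper_version: string;
  embedding_model_tag: string;
}

// sha256 over the ordered components, truncated to 16 hex characters.
function computeCorpusGeneration(input: CorpusGenerationInput): string {
  const material = [
    input.cr_mode,
    input.synopsis_prompt_version,
    input.haiku_model,
    input.title_wrapper_version,
    input.embedding_model_tag,
  ].join("|"); // separator is an assumption
  return createHash("sha256").update(material).digest("hex").slice(0, 16);
}
```

Changing any single component (for example bumping the synopsis prompt
version) yields a different 16-character tag, which is what lets stale
cache rows be told apart from current ones.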

New engine method (added to BrainEngine interface + both engines):
- updatePageContextualRetrievalState(slug, sourceId, mode, corpusGeneration):
  narrow UPDATE of just the two CR-state columns + updated_at. Skips
  soft-deleted rows. Mirrors refreshPageBody's narrow-update pattern so
  we don't fire createVersion on every tier upgrade (which would bloat
  page_versions).

Clean typecheck + 272 existing tests pass (no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T6: contextual_reindex_per_chunk Minion handler + protection

Thin handler (D23) that wires the global Haiku rate-leaser (D26 P0-3) +
delegates re-embed work to contextual-retrieval-service.ts (T5). One job
per page (D10). Submitted by the mode-switch hook (T10), the reindex
sweep (T9), and doctor --remediate (T13).

src/core/minions/handlers/contextual-reindex-per-chunk.ts:
- makeContextualReindexHandler(opts) factory closure.
- Per-chunk Haiku call wrapped in acquireLease/releaseLease against the
  shared key 'anthropic:utility:contextual-synopsis'. Default RPM cap is
  50 (Anthropic Haiku 4.5 published limit); operators on a tier with
  higher quota override via GBRAIN_CONTEXTUAL_HAIKU_RPM env var.
- D27 P2-1 source-id derivation: payload carries only page_slug;
  handler loads the page row and uses its source_id as authoritative.
  Optional expected_source_id field on the payload triggers
  UnrecoverableError on mismatch (stale/malicious payload defense).
- Result classification:
    success / page_fallback (D14)        → ok
    transient_error                       → throw (Minion retries)
    permanent_error                       → UnrecoverableError → dead-letter
- 60s poll-wait per Haiku call when the rate-lease is saturated; gives
  up with explicit error rather than blocking forever.
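
The result classification above maps naturally onto an exhaustive switch
over the service's discriminated union. A sketch under assumptions: the
payload fields on each variant, the local UnrecoverableError stand-in, and
treating `skipped` as ok are all hypothetical; only the `kind` values come
from the commit text.

```typescript
// Stand-in for the Minion framework's error class (assumed shape).
class UnrecoverableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnrecoverableError";
  }
}

type ReembedPageResult =
  | { kind: "success" }
  | { kind: "skipped"; reason: string }
  | { kind: "page_fallback" }
  | { kind: "transient_error"; error: Error }
  | { kind: "permanent_error"; error: Error };

function classifyResult(result: ReembedPageResult): "ok" {
  switch (result.kind) {
    case "success":
    case "page_fallback": // D14: page landed at the title tier, still a success
    case "skipped": // assumption: nothing to do counts as ok
      return "ok";
    case "transient_error":
      throw result.error; // plain throw: the job runner retries
    case "permanent_error":
      throw new UnrecoverableError(result.error.message); // dead-letter
    default: {
      const unreachable: never = result; // compile-time exhaustiveness check
      throw new Error(`unhandled result: ${JSON.stringify(unreachable)}`);
    }
  }
}
```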

src/core/minions/protected-names.ts:
- contextual_reindex_per_chunk added to PROTECTED_JOB_NAMES with comment
  documenting the cost vector (1-50 Haiku calls per page, bulk MCP
  submission could drain user's Anthropic budget).

src/commands/jobs.ts:
- registerBuiltinHandlers wires the new handler via dynamic import.
- Registered ABOVE autopilot-cycle so the handler is available when
  doctor --remediate proposes contextual_retrieval_coverage steps.

Clean typecheck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T7: import-file.ts wraps at embed time, stamps CR state columns

import-file.ts now resolves the effective CR mode for each page at embed
time and applies the wrapper inline. Per D20-T1 critical invariant, the
stored chunk_text stays canonical (powers FTS, snippets, reranker, debug);
only the wrapped string goes to the embedder.

Inline path scope (cost-discipline choice):
- title-tier: inline wrap is free (pure string concat). Applied directly.
- per_chunk_synopsis tier: TOO EXPENSIVE for the inline import path
  (one Haiku call per chunk on every sync would compound into hours of
  blocking per `gbrain sync`). The inline path lands the page at the
  title tier; the Minion-driven contextual reindex (T6 handler) upgrades
  it to per_chunk_synopsis later when the user accepts the cost prompt
  in the mode-switch hook (T10). Per D3 explicit-consent contract.
- 'none' tier (conservative mode, or the kill switch engaged): no wrapping,
  raw chunk_text → embedder unchanged from pre-v0.40.3 behavior.

Code chunks (chunk_source='fenced_code') always bypass wrapping per
D20-T4 — wrapChunkForEmbedding short-circuits.

Stamping (alongside putPage in the same transaction):
- pages.contextual_retrieval_mode → tier the page was just embedded at
- pages.corpus_generation → composite hash via computeCorpusGeneration
  from the service module. NULL when 'none' tier or noEmbed=true.

Override chain: page frontmatter > source row > global mode bundle (D5+D6).
Mount-frontmatter trust gate (D15) — currently lookup uses defaults for
source row; future T9 reindex sweep + T10 mode-switch hook can pass a
richer source row when the per-source override lands.

Kill switch (D18): when search.contextual_retrieval_disabled=true, the
resolver short-circuits to 'none' and the wrapper is skipped.

Clean typecheck + 251 unit tests pass (migrate + pglite-engine +
import-file all green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T9: reindex --markdown extends to catch CR state drift

`gbrain reindex --markdown` predicate widens from chunker_version drift
alone to also catch contextual_retrieval_mode IS NULL — the v0.40.3.0
upgrade-path signal that a page has never been evaluated against the
CR ladder (pre-v81 brains where the column is freshly NULL after the
migration ran).

Pages enter the sweep when EITHER:
  (a) chunker_version < MARKDOWN_CHUNKER_VERSION (existing behavior)
  (b) contextual_retrieval_mode IS NULL (new — D26 P0-1 + D26 P0-4 prep)
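
Expressed app-side, the two-clause predicate above is a simple OR. The
sketch below is illustrative only (the real sweep is a SQL predicate); the
row shape and the function name are assumptions.

```typescript
const MARKDOWN_CHUNKER_VERSION = 3; // current version per T2

interface PageRow {
  chunker_version: number;
  contextual_retrieval_mode: string | null;
}

// A page enters the sweep on chunker drift OR a never-stamped CR mode.
function needsMarkdownReindex(page: PageRow): boolean {
  return (
    page.chunker_version < MARKDOWN_CHUNKER_VERSION ||
    page.contextual_retrieval_mode === null
  );
}
```

Note that a page stamped 'none' is not NULL, so it does not re-enter the
sweep once it has been evaluated.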

Since chunker_version 2→3 (T2) already forces every pre-v0.40.3 page into
(a), the IS NULL clause is effectively a belt-and-suspenders guard for the
case where a brain runs the migration but the chunker_version bump somehow
didn't propagate (concurrent upgrade race, manual SQL edit, etc.).

The re-import path uses importFromContent with forceRechunk:true
(existing v0.32.7 behavior) which bypasses the content_hash short-
circuit so the v0.40.3.0 import-file.ts wrapper application path (T7)
actually applies. Each re-imported page picks up the active CR tier and
stamps contextual_retrieval_mode + corpus_generation atomically.

Page-frontmatter overrides are honored at re-import time (importFromFile
re-parses YAML and the resolver picks the per-page tier). The frontmatter-
mismatch drift case Codex P0-1 called for (user removes override after
initial import) is partially handled here via the IS NULL+forceRechunk
path; a v0.41+ wave can add the explicit "frontmatter may contain
override" candidate path if real users hit drift the current predicate
misses.

Clean typecheck + 230 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T10: post-upgrade cost prompt explains contextual retrieval

The existing post-upgrade-reembed.ts prompt fires automatically on
`gbrain upgrade` because T2 bumped MARKDOWN_CHUNKER_VERSION 2→3. Prompt
copy is extended to explain WHY the re-embed is happening. Without this,
users see a "chunker-bump" prompt and cannot tell a routine internal
refresh from the headline feature shipping.

formatReembedPrompt now appends a [contextual retrieval] line below the
chunker-bump cost summary, mentioning that v0.40.3.0 wraps each chunk
with its page title before embedding (Anthropic's published method).

What the user sees on upgrade:
  [chunker-bump] Will re-embed ~N markdown pages via {model}, est.
  ~$X.XX, ~Ymin. Press Ctrl-C within Zs to abort.
  [contextual retrieval] v0.40.3.0 wraps each chunk with its page
  title before embedding (Anthropic's published method).

Title-tier wrap is free at runtime (pure string concat, no Haiku) so
the cost number stays unchanged from the chunker-bump-only case. The
per-chunk Haiku synopsis tier is OPT-IN via
`gbrain config set search.mode tokenmax` post-upgrade, which fires the
contextual_reindex_per_chunk Minion handler (T6) for the backfill.

T10 mode-switch hook in src/commands/config.ts (the explicit per-mode
cost prompt UX on `gbrain config set search.mode tokenmax`) is deferred
to v0.40.3.1 — the explicit-consent contract (D3) is satisfied by the
existing post-upgrade prompt for the title-tier path that the wave
ships by default. The Minion handler from T6 + the protected-name
guard ensure that any direct Minion submission for the per-chunk path
is gated on the CLI/doctor-remediate trust boundary.

Kill switch (D18): the contextual_retrieval_disabled config key is
honored at import time (T7) and in the service (T5) — when true, the
resolver short-circuits to 'none' regardless of mode bundle. No
hybridSearch changes needed: queries embed raw text already; the kill
switch only affects NEW embeds. Existing wrapped vectors keep serving
queries via cosine similarity (asymmetric retrieval is preserved).

11 upgrade-reembed-prompt tests pass + clean typecheck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T11-T13: query cache notes + remediation note + doctor check

T11 (query_cache.page_generations contract): the DB column shipped in
T1 migration v81 + KNOBS_HASH_VERSION 4 bump in T4 invalidates the
common-case cache contamination (full-brain mode upgrade). The LEFT JOIN
read-side gate per Codex P1-5 — for the edge case where a brain is mid-
reindex and some pages are stamped at corpus_generation N+1 while others
are still at N — is deferred to v0.40.3.1. In practice, the post-upgrade
reembed prompt fires automatically + completes before search resumes on
healthy brains, so the edge case is narrow. CHANGELOG documents the
limitation.

T12 (generic RemediationStep contract): the existing recommendation
registry shape (sync/embed/backlinks/extract hardcoded) is extended via
the doctor check below rather than refactored to a generic registry.
Codex P1-6 called for the refactor; v0.40.3.1+ can absorb it once a real
second consumer requires the same registration shape.

T13 (contextual_retrieval_coverage doctor check):
- New checkContextualRetrievalCoverage() in src/commands/doctor.ts.
- Two SQL signals: pages.chunker_version < current + pages.contextual_
  retrieval_mode IS NULL. Single COUNT...FILTER query is cheap on every
  brain size.
- Audit summary line: reads ~/.gbrain/audit/synopsis-failures-*.jsonl
  via the v0.40.3.0 audit-synopsis module (T3). >5% page-level fallback
  rate surfaces explicitly so operators see the Haiku refusal signal.
- Paste-ready fix: `gbrain reindex --markdown` — the v0.32.7 + v0.40.3.0
  sweep covers both chunker_version drift AND CR mode drift per T9.
- Status: ok when fully aligned + no recent failures; warn when drift
  exists (with the paste-ready fix in the message).
- Wired into the standard doctor run alongside the other v0.36+ checks
  (abandoned_threads, calibration_freshness, etc.).

Sources/mounts CLI surfaces (set-cr-mode + trust-frontmatter) deferred
— the post-upgrade-reembed prompt + the per-page frontmatter override
path cover the v0.40.3.0 operational workflow. Per-source override CLI
is a power-user feature that can ship in v0.40.4+ once real federated-
brain users surface specific friction.

48 doctor tests pass + clean typecheck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T14: 5 test files, 77 new tests, IRON-RULE regression coverage

Test suite for the v0.40.3.0 contextual retrieval wave. 77 new test
cases across 5 files, all green. Pins every IRON-RULE invariant
end-to-end so future contributors can't silently regress the wave.

test/contextual-retrieval-resolver.test.ts (29 tests):
- 9-combo override matrix (page-fm > source-row > global, all
  permutations).
- D15 mount-trust gate: host always trusted, mounts honor only when
  trust_frontmatter_overrides=true, rejected frontmatter surfaces via
  result.frontmatter_rejected_untrusted_mount for doctor.
- D13 invalid frontmatter (typo + non-string + empty): falls through
  to source/global with raw value in invalid_frontmatter_value.
- D18 kill switch: short-circuits to 'none' regardless of overrides.
- D26 P0-4 crModeDistinct: NULL-aware comparison, matches SQL IS
  DISTINCT FROM semantics on every combination of NULL/defined args.

test/embedding-context.test.ts (21 tests):
- buildContextualPrefix: title-only, synopsis-only, both, neither.
- wrapChunkForEmbedding: non-code wraps; D20-T4 fenced_code ALWAYS
  bypasses; null prefix passes through; image_asset wraps as text.
- sanitizeTitle: </context> injection stripped (case-insensitive),
  whitespace collapsed, 300-char cap, trim semantics.
- extractFirstTwoSentences: English boundaries, question marks, CJK
  delimiters, run-on cap, empty input, no-delimiter passthrough.
- modeRequiresHaiku / modeRequiresWrapper guards.
- D20-T1 IRON-RULE regression test: wrapping does not mutate input
  string reference (so caller's chunk_text safely flows to upsert).
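
For orientation, the sanitizeTitle rules listed above could look roughly like this. The order of operations and the repeat-until-stable strip are assumptions made for the sketch, not confirmed details of the shipped helper:

```ts
const MAX_TITLE_CHARS = 300; // cap named in the test list above

// Sketch only: strips `</context>` (any casing), collapses whitespace,
// trims, and caps the length. Operation order is an assumption.
function sanitizeTitle(raw: string): string {
  let s = raw;
  let prev: string;
  // Repeat so a nested payload like "</con</context>text>" cannot reassemble.
  do {
    prev = s;
    s = s.replace(/<\/context>/gi, '');
  } while (s !== prev);
  return s
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim()
    .slice(0, MAX_TITLE_CHARS)
    .trim(); // the cap may leave a trailing space
}
```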

test/contextual-retrieval-service-pure.test.ts (16 tests):
- computeCorpusGeneration: 16-char hex, deterministic, mode-sensitive,
  model-sensitive, TITLE_WRAPPER_VERSION stable.
- computeSourceTextHash: D27 P1-4 cache invalidation key composition.
- expectedModeForPageSourceOnly (T9 reindex predicate helper): kill
  switch returns none, source override beats global, invalid override
  falls through, all CR modes round-trip.

test/audit-synopsis.test.ts (11 tests):
- ISO-week filename rotation (stable for same week, different days).
- logSynopsisFailure round-trip: kind, page_level_fallback flag,
  multi-event accumulation, detail 200-char cap.
- summarizeSynopsisFailures aggregation: null on empty, by_kind counts,
  page_level_fallback_rate math.
- Missing audit file returns empty (silent no-op).

test/e2e/contextual-retrieval-pglite.test.ts (5 tests, hermetic PGLite + gateway stub):
- IRON RULE #1 (D20-T1): wrapper text in embedder input but NEVER in
  content_chunks.chunk_text after import — pins the canonical
  chunk_text separation invariant end-to-end.
- IRON RULE #2 (D14 stamping): pages.contextual_retrieval_mode AND
  pages.corpus_generation are set after every import.
- IRON RULE: chunker_version stamps to current MARKDOWN_CHUNKER_VERSION
  (3 for v0.40.3.0).
- D5 per-page frontmatter override: `contextual_retrieval: none` makes
  the embedder receive UNWRAPPED text; mode column stamped 'none'.
- T9 reindex predicate: pages with contextual_retrieval_mode IS NULL
  enter the sweep regardless of chunker_version.

462 tests pass across all v0.40.3.0 + adjacent suites (migrate,
pglite-engine, search-mode, doctor, import-file, upgrade-reembed-prompt,
schema-bootstrap-coverage, recursive chunker, all five new files).
Zero regressions, clean typecheck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T15: VERSION + CHANGELOG + migration self-repair + llms regen

VERSION 0.37.11.0 → 0.40.3.0 with package.json sync. CHANGELOG entry
follows the CLAUDE.md ELI10-lead voice rule: opens with "Your search
now understands what each chunk is about, not just what words are in
it," lays out the tier ladder with a real cost table, calls out the
chunk_text storage separation (D20-T1) with a concrete example, and
includes the "Things to watch" + "What we caught and fixed before
merging" sections per the format spec.

CHANGELOG also includes the canonical "To take advantage of v0.40.3.0"
self-repair block with the manual `gbrain apply-migrations --yes` +
`gbrain reindex --markdown` recovery path for users whose
`gbrain upgrade` post-upgrade-reembed didn't fully fire.

skills/migrations/v0.40.3.0.md walks the agent through the mechanical
upgrade flow, the opt-up to tokenmax path with the realistic backfill
cost table, the opt-out soft kill switch flip, and the per-page
frontmatter override with the D15 mount-trust note. Matches the
v0.13.0 + v0.32.7 migration doc structure so agent muscle memory
works.

llms-full.txt + llms.txt regenerated via `bun run build:llms` to pick
up the CHANGELOG + migration doc additions. test/build-llms.test.ts
passes.

Also moved test/audit-synopsis.test.ts → test/audit-synopsis.serial.test.ts
to satisfy the check-test-isolation lint (the test mutates
GBRAIN_AUDIT_DIR via beforeAll/afterAll for a fixture dir, which the
parallel runner forbids in *.test.ts files; serial quarantine is the
canonical fix per CLAUDE.md test-isolation rules).

`bun run verify` passes (typecheck + 4 CI gate checks). 469 tests
across all v0.40.3.0 + adjacent suites pass with 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 test gaps: doctor check coverage + concurrent race regression

Post-T15 test gap-fill: covers the two highest-leverage spots that the
T14 suite didn't exercise.

test/contextual-retrieval-doctor.serial.test.ts (8 tests, .serial because
the doctor check reads the audit JSONL via GBRAIN_AUDIT_DIR env mutation):
- empty-brain → ok
- fully-aligned brain (chunker_version current + mode stamped) → ok
- chunker_version drift → warn with paste-ready `gbrain reindex --markdown`
- NULL mode column → warn surfaces "never evaluated against CR ladder"
- both drift conditions together → warn with both messages
- soft-deleted pages NOT counted (deleted_at filter works)
- non-markdown (code) pages NOT counted (page_kind filter works)
- audit JSONL refusal event surfaces in the failure-summary line

test/e2e/concurrent-embed-race.test.ts (3 tests, D24 regression guard):
- cold path: existing embedding NULL → take new (no-race case)
- IRON RULE: fresher write wins over stale write when text unchanged.
  Pre-fix this would have been last-writer-wins via COALESCE; post-fix the
  fresher embedded_at survives. Pinned by raw SQL upsert with an
  explicit -5min embedded_at to simulate the slower writer.
- text change with no new embedding → both embedding + embedded_at
  reset to NULL (consistent state so embed --stale picks up).

Cross-shard contamination fix: race test calls configureGateway with
embedding_dimensions=1536 BEFORE initSchema so the PGLite vector column
sizes consistently regardless of what other tests in the same shard
process configured first. Without this, running the race test alongside
the pglite-e2e test triggered "expected 1280 dimensions, not 1536"
when the gateway was left in its default ZE-1280 state by a prior file.

`bun run verify` passes (typecheck + 5 CI gate checks). 88 tests pass
across all v0.40.3.0 + new gap-fill files in one combined run; zero
shared-state contamination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.5.0 T2: schema — v90 contextual_retrieval_columns + v91 trigger + index

Migration v90 (renamed from v0.40.3.0 v81 on master merge per D2/D7):
- 5 additive columns (pages.contextual_retrieval_mode, pages.corpus_generation,
  sources.contextual_retrieval_mode, sources.trust_frontmatter_overrides,
  query_cache.page_generations) for the contextual retrieval wave.

Migration v91 (NEW per D6 + codex #4 + codex #8):
- pages.generation BIGINT NOT NULL DEFAULT 1 (per-page generation counter)
- query_cache.max_generation_at_store BIGINT NOT NULL DEFAULT 0 (Layer 1 bookmark)
- bump_page_generation_fn() trigger function:
  - BEFORE INSERT: NEW.generation := COALESCE(MAX(generation), 0) + 1 — codex #4
    INSERT coverage so cache rows stored before a new page existed invalidate
    correctly.
  - BEFORE UPDATE: bumps generation only when at least one allow-list column
    IS DISTINCT FROM its previous value (compiled_truth, timeline, frontmatter,
    deleted_at, contextual_retrieval_mode, title, type, page_kind,
    corpus_generation, content_hash); the list was widened per D6 to catch
    user-visible mutations.
- CREATE INDEX CONCURRENTLY pages_generation_idx ON pages (generation) so
  MAX(generation) for the bookmark check is O(log N) — codex #8 confirmed
  plain btree, no DESC necessary.

Mirrored in src/schema.sql, src/core/pglite-schema.ts CREATE TABLE body
(trigger included so fresh PGLite installs get it from the schema blob, not
just migration replay).

Extended REQUIRED_BOOTSTRAP_COVERAGE with pages.contextual_retrieval_mode,
pages.corpus_generation, sources.contextual_retrieval_mode,
sources.trust_frontmatter_overrides, pages.generation. Probes added to
applyForwardReferenceBootstrap on both engines + matching ALTER blocks for
pre-v90/pre-v91 brains.

COLUMN_EXEMPTIONS extended: query_cache.max_generation_at_store (same
rationale as page_generations — query_cache is migration-only, not in
PGLITE_SCHEMA_SQL).

Test results:
- bun test test/migrate.test.ts: 140 pass / 0 fail
- bun test test/schema-bootstrap-coverage.test.ts: 9 pass / 0 fail
- bun run typecheck: clean

* v0.40.5.0 T3: cache gate — query-cache-gate.ts + lookup/store rewrites

New pure module src/core/search/query-cache-gate.ts:
- buildPageGenerationsSnapshot(engine, pageIds) builds the {pageId: gen}
  snapshot + MAX(generation) bookmark in one round trip via UNION ALL.
  Pre-v91 brains (no generation column) fall back to empty snapshot +
  zero bookmark — backward compat with legacy rows preserved.
- validateCacheRowAgainstPages() — pure validator for unit testing.
- CACHE_GATE_WHERE_CLAUSE exported as a SQL fragment that lookup() embeds
  in its WHERE clause. Two-layer gate per D11:
    Layer 1 (cheap): (SELECT MAX(generation) FROM pages) <=
                     qc.max_generation_at_store
    Layer 2 (per-page): jsonb_each + LEFT JOIN pages to detect deletes
                        + bumped pages on the cached result set.
  Legacy compat: rows with empty {} snapshot are vacuously valid (Layer 2
  short-circuits) — IRON-RULE pinned.
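
The two-layer gate can be modelled as a pure function. This is a sketch under assumptions: the interface shapes and field names below are hypothetical, and only the Layer 1 / Layer 2 / legacy-row behavior comes from the description above:

```ts
// pageId -> generation recorded when the cache row was stored
type GenerationSnapshot = Record<string, number>;

interface CacheRowGate {
  pageGenerations: GenerationSnapshot; // {} on legacy rows
  maxGenerationAtStore: number;        // Layer 1 bookmark, 0 on legacy rows
}

interface LivePages {
  maxGeneration: number;            // SELECT MAX(generation) FROM pages
  generations: Map<string, number>; // current generation per existing page
}

function validateCacheRowAgainstPages(row: CacheRowGate, live: LivePages): boolean {
  // Layer 1 (cheap): nothing anywhere has been written since the row was stored.
  if (live.maxGeneration <= row.maxGenerationAtStore) return true;

  // Layer 2 (per-page): every cached page must still exist at the same generation.
  for (const [pageId, storedGen] of Object.entries(row.pageGenerations)) {
    const liveGen = live.generations.get(pageId);
    if (liveGen === undefined) return false; // page deleted
    if (liveGen !== storedGen) return false; // page bumped
  }
  // Also reached by legacy rows: an empty snapshot is vacuously valid.
  return true;
}
```

The INSERT-new-page case falls out naturally: the bookmark fires in Layer 1, but the pages in the snapshot are untouched, so Layer 2 still serves the row.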

query-cache.ts wiring:
- lookup() table-aliased to `qc` so the gate fragment can reference
  qc.max_generation_at_store + qc.page_generations. WHERE clause adds
  `AND ${CACHE_GATE_WHERE_CLAUSE}` after the existing similarity + TTL +
  knobs_hash filters.
- store() captures the snapshot via the pure helper, then INSERTs both
  page_generations JSONB and max_generation_at_store BIGINT alongside
  the existing columns. ON CONFLICT (id) DO UPDATE refreshes both.

Test coverage (15 unit + 6 e2e):
- test/query-cache-gate.test.ts: 15 cases covering pure validator
  branches (vacuous valid, bookmark short-circuit, single/multi/partial
  bumps, deleted page, codex D11 critical case), PGLite-backed snapshot
  builder (empty pageIds, populated pageIds, integer JSONB shape,
  non-existent IDs skipped, bump-after-update), SQL shape regression
  on CACHE_GATE_WHERE_CLAUSE.
- test/e2e/cache-gate-pglite.test.ts: 6 cases covering store → HIT,
  content UPDATE → MISS, INSERT new page → HIT (codex #4 case where
  bookmark fires but snapshot intact serves correctly), legacy row →
  HIT (IRON-RULE backward compat), soft-delete → MISS (trigger path),
  multi-page partial bump → MISS.

Test results:
- bun test test/query-cache-gate.test.ts test/query-cache.test.ts
  test/query-cache-isolation.test.ts test/e2e/cache-gate-pglite.test.ts:
  33 pass / 0 fail
- bun run typecheck: clean

Note: hard-delete (raw DELETE FROM pages) is not covered by the trigger
(BEFORE INSERT OR UPDATE doesn't fire on DELETE). Production uses
soft-delete via deleted_at (trigger allow-list catches NULL → timestamp
distinction). Hard-delete via admin-only `gbrain pages purge-deleted` is
best-effort cache-wise — acceptable for the rare admin path.

* v0.40.5.0 T5: mode-switch UX at gbrain config set search.mode

New module src/core/search/mode-switch-ux.ts:
- summarizeTransition(old, new): pure 5-cell matrix (no_change /
  narrowing / broadening / tokenmax_opt_in / invalid_new_mode) + reindex
  command + cost estimate + paste-ready callout lines.
- probeWorkerAvailable(engine): worker liveness proxy. gbrain has no
  minion_workers heartbeat table yet (B7 follow-up from v0.19.1), so we
  use a proxy: minion_jobs activity within 10-min query window. Within
  2 min = active; >2min but <10min = stale; nothing = never_seen.
- buildReindexIdempotencyKey(): content-stable per codex D12 Bug 1.
  Pattern: cr-backfill:<source_id>:<chunker_version>:<mode>. NOT
  timestamp-based — two retries against same brain state dedupe.
- runModeSwitchUx(): orchestrator. Honors GBRAIN_NO_MODE_SWITCH_UX=1
  (full skip), non-TTY (print paste-ready hints to stderr), yesFlag
  (auto-submit reindex). For tokenmax_opt_in + TTY + worker probe
  active: submits via MinionQueue.add with allowProtectedSubmit=true.
  For probe = stale or never_seen: loud-fail per D3 with a "start a
  worker OR run inline" recovery hint — closes the silent-stall
  footgun.
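
A rough sketch of the probe thresholds and the content-stable key described above. `classifyWorkerActivity` and its parameters are hypothetical names; the real probe queries minion_jobs, which is replaced here by a "last activity" timestamp so the logic stays pure:

```ts
type WorkerProbe = 'active' | 'stale' | 'never_seen';

const ACTIVE_WINDOW_MS = 2 * 60_000;  // within 2 min = active
const QUERY_WINDOW_MS = 10 * 60_000;  // 10-min query window

// lastActivityAt stands in for the newest minion_jobs activity, or null if none.
function classifyWorkerActivity(lastActivityAt: Date | null, now: Date): WorkerProbe {
  if (lastActivityAt === null) return 'never_seen';
  const ageMs = now.getTime() - lastActivityAt.getTime();
  if (ageMs <= ACTIVE_WINDOW_MS) return 'active';
  if (ageMs < QUERY_WINDOW_MS) return 'stale';
  // Assumption: activity older than the query window would not be returned
  // by the real query at all, so it reads the same as "nothing".
  return 'never_seen';
}

// Content-stable: no timestamp, so two retries against the same brain state dedupe.
function buildReindexIdempotencyKey(sourceId: string, chunkerVersion: number, mode: string): string {
  return `cr-backfill:${sourceId}:${chunkerVersion}:${mode}`;
}
```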

src/commands/config.ts hook (~30 LOC):
- Captures the OLD search.mode BEFORE setConfig so summarizeTransition
  classifies correctly.
- Fires runModeSwitchUx() AFTER setConfig persisted, wrapped in
  try/catch so UX failures never break the config-set that already
  landed.
- Best-effort: failures emit `[mode-switch] UX hook failed (non-fatal)`
  to stderr.

Test coverage (18 cases):
- summarizeTransition: 8 cases covering all 5 transition kinds + null
  inputs + tokenmax-as-first-set + invalid mode.
- probeWorkerAvailable: 4 cases via real PGLite — never_seen / active /
  stale (seeded via minion_jobs) + threshold constant assertion.
- buildReindexIdempotencyKey: 6 cases pinning content-stable contract
  (codex D12 Bug 1) — identical inputs match, different inputs differ,
  consecutive calls match despite time delta (NOT timestamp-based).

Test results:
- bun test test/mode-switch-ux.test.ts: 18 pass / 0 fail
- bun run typecheck: clean

* v0.40.5.0 T6: gbrain mounts {enable,disable,trust-frontmatter,untrust-frontmatter}

Four new mounts CLI verbs per D4:
- gbrain mounts enable <id>             — re-enable a disabled mount
- gbrain mounts disable <id>            — toggle off without removing
- gbrain mounts trust-frontmatter <id>  — let this mount's per-page
                                          contextual_retrieval_mode
                                          frontmatter override the source
                                          default. Off by default for
                                          mounted brains; host is always
                                          trusted.
- gbrain mounts untrust-frontmatter <id> — clear the trust flag.

Implementation:
- src/core/brain-registry.ts MountEntry interface extended with
  trust_frontmatter_overrides?: boolean. loadMounts() projection threads
  the field through with default false (mounts opt in explicitly per D4
  + D15 security posture).
- src/commands/mounts.ts: new runSetMountFlag() helper handles all 4
  verbs via a shared file-write path. Missing-mount loud rejection
  (GBrainError with list-hint). Host brain rejection. Idempotent: no-op
  when current value already matches. Cache refresh after each write
  so host agents see the new flag immediately.

Test infrastructure:
- GBRAIN_MOUNTS_PATH env override on getMountsPath() in BOTH
  brain-registry.ts AND mounts.ts (the latter has its own
  copy — two source-of-truth paths). Reason: libuv caches homedir()
  on some platforms, so withFakeHome's HOME mutation isn't picked up
  by tests calling runMounts(). Production callers don't set the env.

Test coverage (5 new cases):
- enable → disable → enable cycle persists
- trust-frontmatter → untrust → trust cycle preserves other fields
- missing mount id → loud rejection with list-hint (closes the
  critical gap from idempotent-pebble Failure Modes table)
- host brain rejection: cannot trust-frontmatter "host"
- enable on already-enabled mount: no-op (idempotent)

Test results:
- bun test test/mounts-cli.test.ts test/brain-registry.serial.test.ts:
  54 pass / 0 fail
- bun run typecheck: clean

* v0.40.5.0 T7: gbrain sources set-cr-mode + missing-source loud rejection

New verb `gbrain sources set-cr-mode <id> <mode>` per D5:
- Mode argument validated against CR_MODES via isCRMode (closed enum:
  none | title | per_chunk_synopsis).
- "unset" / "default" / "" clears the column to NULL (falls through to
  the global search.mode bundle).
- Loud rejection on:
  - Missing id/mode → exit 2, prints usage
  - Invalid mode → exit 2, lists valid options
  - Missing source id → exit 4, paste-ready `gbrain sources list` hint
    (closes the idempotent-pebble Failure Modes critical gap)

src/commands/sources.ts wired into the switch dispatch + help text
updated. isCRMode + CR_MODES lazy-imported per existing import pattern
in this file.

Test coverage (10 cases):
- happy path for all 3 valid CRMode values
- unset path via "unset" + "default" both clear to NULL
- invalid mode → exit 2 + no mutation
- missing source id → exit 4
- missing arguments → exit 2 with usage
- missing mode (only id) → exit 2 + no mutation
- round-trip preserves other fields (name)

Test results:
- bun test test/sources-set-cr-mode.test.ts: 10 pass / 0 fail
- bun run typecheck: clean

* v0.40.5.0 T8: RemediationStep refactor + makeRemediationStep factory

New canonical module src/core/remediation-step.ts:
- RemediationStep interface (lifted from brain-score-recommendations.ts).
  Same shape; renamed with the "Step" suffix per D6 for clarity ("a step in a
  remediation plan").
- RemediationSeverity + RemediationStatus type re-exports.
- canonicalJson(value): zero-dep canonical serialization — sorts object
  keys recursively before stringify. Per codex D12 Bug 2: identical
  logical params hash identically regardless of insertion order.
- idempotencyKey(source, job, params): shape
  <source>:<job>:sha8(canonicalJson(params)). Lifted from the legacy
  inline idemKey helper so future check authors don't drift.
- makeRemediationStep(opts): canonical factory. Defaults id to the
  idempotency key (override for human-readable like 'sync.repo').
  Status defaults to 'remediable'. All check authors should use this;
  hand-rolling is the drift hazard the refactor closes.
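
To illustrate the key-ordering invariance, here is a minimal sketch of canonicalJson and idempotencyKey. It assumes `sha8` means the first 8 hex characters of a SHA-256 digest and that params are plain JSON values; neither detail is confirmed by this commit message:

```ts
import { createHash } from 'node:crypto';

// Recursively rebuild objects with sorted keys; arrays keep their order.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) out[key] = sortKeys(obj[key]);
    return out;
  }
  return value; // primitives pass through
}

function canonicalJson(value: unknown): string {
  return JSON.stringify(sortKeys(value));
}

// Assumption: sha8 = first 8 hex chars of SHA-256.
function sha8(input: string): string {
  return createHash('sha256').update(input).digest('hex').slice(0, 8);
}

// Shape from the commit: <source>:<job>:sha8(canonicalJson(params))
function idempotencyKey(source: string, job: string, params: unknown): string {
  return `${source}:${job}:${sha8(canonicalJson(params))}`;
}
```

With this shape, `idempotencyKey('doctor', 'sync-retry-failed', { a: 1, b: 2 })` and the same call with `{ b: 2, a: 1 }` produce the same key, which is the property the codex D12 Bug 2 regression pins.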

src/core/brain-score-recommendations.ts:
- Removed the local Remediation + RemediationSeverity + RemediationStatus
  definitions.
- Re-exports them from remediation-step.ts so existing callers (e.g.
  doctor.ts) still resolve. Also re-exports Remediation as an alias
  for RemediationStep so import paths can migrate gradually.
- Imports type Remediation alias internally so the (substantial) existing
  computeRecommendations body keeps compiling without a sed pass.

Test coverage (17 cases):
- canonicalJson: key-ordering determinism (3 cases), nested objects,
  array order preservation, primitive types, codex D12 Bug 2 regression
- idempotencyKey: shape regex, content invariance, key-ordering
  invariance, source/job/params differentiation
- makeRemediationStep: default id, explicit id override, default status,
  canonical-JSON invariance, all-opts thread-through
- back-compat: `import { Remediation } from brain-score-recommendations`
  still resolves to RemediationStep (compile + runtime check)

Test results:
- bun test test/remediation-step.test.ts: 17 pass / 0 fail
- bun test test/brain-score-recommendations.test.ts test/doctor.test.ts:
  70 pass / 0 fail (back-compat preserved)
- bun run typecheck: clean

Per D6 + D8: T8b in next commit wires lint, integrity, sync_failures
doctor checks to emit RemediationStep via the new factory.

* v0.40.5.0 T8b: RemediationStep consumers — integrity + sync_failures + 3 Minion handlers

Doctor checks now emit RemediationStep via makeRemediationStep():
- `integrity` check (when bareHits > 0) emits integrity-auto step.
  Severity escalates to 'high' when bareHits > 50. Deterministic; $0 cost.
- `sync_failures` check (when unacked > 0) emits sync-retry-failed step.
  Severity escalates to 'high' when count >= 10. Content-stable params
  (failure_count + oldest_failure timestamp) per codex D12 Bug 2.
- sync-skip-failed DELIBERATELY NOT emitted per D12 Bug 3 (auto-skipping
  failed syncs hides data loss). Operators retain `gbrain sync --skip-failed`
  as a direct CLI option.

Lint doctor check NOT wired — there is no `lint` check in doctor.ts
today; the lint workflow is the standalone `gbrain lint` command. Adding
a doctor lint check is a v0.41+ TODO when it justifies its own complete
section.

Three new Minion handlers in registerBuiltinHandlers (NOT in
PROTECTED_JOB_NAMES — they're thin wrappers around already-shipping CLI
commands, idempotent, no shell exec, MCP-safe):
- lint-fix       → runLintCore({ fix: true })
- integrity-auto → runIntegrity(['auto'])
- sync-retry-failed → runSync(['--retry-failed'])

Check.remediation field shape upgrade:
- Was: inline Array<{...}> shape.
- Now: RemediationStep[] from the canonical
  src/core/remediation-step.ts. Check authors `import { makeRemediationStep }`
  and emit through the factory.

Test results:
- bun test test/doctor.test.ts: 48 pass / 0 fail (zero regression on
  the doctor surface; new remediation fields are additive)
- bun run typecheck: clean

* v0.40.5.0 T11: capture-generation regression test (D3 + codex #5)

The v0.38 ingestion cathedral added a new write path to pages via the
`ingest_capture` Minion handler. The v0.40.5.0 cache-invalidation gate
relies on pages.generation being bumped by EVERY write path via the
BEFORE INSERT OR UPDATE trigger.

This file pins that the new v0.38 capture write path correctly bumps
generation through three scenarios:

1. INSERT path (codex #4 INSERT coverage): ingest_capture with a fresh
   slug creates a page with generation = MAX(generation) + 1 so any
   cache row stored before the new page existed has its bookmark fire.
2. UPDATE path: ingest_capture with an existing slug + new content →
   trigger fires on content-column IS DISTINCT FROM and bumps generation.
3. Idempotent UPDATE: capture with the SAME content → trigger
   short-circuits, no bump. Cache freshness preserved on re-runs.

Per codex #5 strengthening: noEmbed: true is set explicitly so the test
doesn't require API keys (test runs against pure PGLite).

Test results:
- bun test test/e2e/capture-generation-regression.test.ts: 3 pass / 0 fail
- bun run typecheck: clean

* v0.40.5.0 T9: docs — CHANGELOG fold-in + CLAUDE.md + migration skill + llms regen

Single combined v0.40.5.0 CHANGELOG entry folds in v0.40.3.0 contextual
retrieval content + v0.40.5.0 wave additions (cache gate + mode-switch
UX + mounts/sources CLI + RemediationStep refactor). Voice per CLAUDE.md:
ELI10 lead, plain language, paste-ready commands, tier table, "Things
to watch", "What we caught and fixed before merging" (summarizes the
8 codex findings + 3 design decisions in user-facing terms), "Itemized
changes", "## To take advantage of v0.40.5.0" mandatory self-repair
block.

CLAUDE.md: new section "Key commands added in v0.40.5.0 (contextual
retrieval + cache gate + 4 CLI verbs)" listing the 4 new mount verbs,
sources set-cr-mode, mode-switch UX, KNOBS_HASH_VERSION bump, 3 new
Minion handlers, and the 3 new modules (remediation-step,
query-cache-gate, mode-switch-ux).

skills/migrations/v0.40.5.0.md: new migration skill with feature_pitch
frontmatter for the auto-update agent. Documents the 6 master commits
merged in, migration v90 (renumber from v81) + v91 (trigger), the
optional opt-up to tokenmax, per-source CR mode overrides, mount
frontmatter trust, the soft kill switch, and the backward-compat
guarantees.

bun run build:llms refreshed llms.txt + llms-full.txt:
- llms.txt: 4314 bytes
- llms-full.txt: 578257 bytes

Test results:
- bun test test/build-llms.test.ts: 7 pass / 0 fail (committed bundles
  byte-match generator output)

* v0.40.5.0 T10: fix 5 unit-suite drift failures from the wave

KNOBS_HASH_VERSION bumped 4→5 per D8 (sequenced behind salem's pending
v=4 graph-signals work). Three test files held stale ==3 / ==4
assertions:
- test/search-mode.test.ts: assertion + comment updated to v=5.
- test/search/knobs-hash-reranker.test.ts: assertion + describe name
  updated to v=5 ladder.
- test/cross-modal-phase1.test.ts: assertion + name updated to v=5.

reindex.test.ts "skips pages already at current chunker_version" — the
v0.40.3.0 reindex predicate (`chunker_version < CURRENT OR
contextual_retrieval_mode IS NULL`) caught the should-skip page
because its CR mode was NULL. Fixed by seeding `contextual_retrieval_mode
= 'title'` on the should-skip row.

reindex.test.ts "idempotent: re-run on a fully-updated brain reports
nothing to do" — by design, `--no-embed` reindex bumps chunker_version
but skips CR-state stamping (import-file.ts:457-466 documents this).
Fixed by manually stamping `contextual_retrieval_mode = 'title'`
between the first and second reindex calls so the brain matches the
"fully updated" state the idempotency test name implies. Production
embed flow stamps both in one pass; the test uses --no-embed only to
avoid requiring API keys.

Test results:
- bun run verify (typecheck + 4 pre-checks): clean
- bun run test: 9482 pass / 0 fail / 0 skip in 410s

* v0.40.3.0: rename version from 0.40.5.0 → 0.40.3.0 (clean slot above master)

Master is at v0.40.2.0; v0.40.3.0 is genuinely the next free slot. The wave
was originally planned as v0.40.5.0 sequenced behind salem (PR #1300 = v0.40.4.0)
but the user is shipping THIS branch as v0.40.3.0 because:

1. v0.40.3.0 IS the canonical version slot for the contextual retrieval
   cathedral (matches branch name garrytan/v0.40.3.0-contextual-retrieval).
2. Master is at v0.40.2.0 — v0.40.3.0 is the immediate next slot, not a
   collision.
3. salem's v0.40.4.0 + any v0.40.5.0 work sit ON TOP of this in the landing
   train, not under it.

Mechanical rename only — no content changes from the v0.40.5.0 commit
sequence (T1-T11 wave is preserved verbatim, just relabeled):
- VERSION + package.json: 0.40.5.0 → 0.40.3.0
- bun.lock: refreshed (no dep changes)
- CHANGELOG.md: ## [0.40.5.0] header → ## [0.40.3.0] + body references
- skills/migrations/v0.40.5.0.md → skills/migrations/v0.40.3.0.md
  (previous v0.40.3.0.md file overwritten with the richer T9 content)
- CLAUDE.md: "Key commands added in v0.40.5.0" → "v0.40.3.0"
- 30 source + test files: comment references swept via sed s/0.40.5.0/0.40.3.0/g
- llms.txt + llms-full.txt: regenerated

Migration numbering UNCHANGED: v90 (renamed from original v81 because master
took v82-v88) and v91 (new trigger migration) stay at v90/v91 — the version
slot is orthogonal to the migration ledger collision.

KNOBS_HASH_VERSION = 5 stays — sequenced behind master's v=4 schema-pack
work; salem's v=4 graph-signals will rebump to v=5 if it lands first.

Test results after rename:
- bun run verify: clean (typecheck + 7 pre-checks)
- bun run test: 9482 pass / 0 fail / 0 skip

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(migrate): v91 CREATE INDEX CONCURRENTLY can't run inside a transaction (CI Tier 1)

CI Tier 1 (Mechanical) failed on real Postgres with:
  ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
  STATEMENT: <v91 multi-statement SQL block including CREATE INDEX CONCURRENTLY ...>

Root cause: postgres.js's multi-statement `.unsafe()` wraps the entire block
in an implicit transaction. `transaction: false` on the migration entry
doesn't help — the implicit wrap happens at the driver layer, below the
migration runner. CONCURRENTLY refuses to run inside any transaction.

Fix: rewrite v91 using the v14 pages_updated_at_index handler pattern —
`sql: ''` + `handler:` function that splits the work into separate
`engine.runMigration()` calls:

1. Columns + trigger function + trigger (single multi-statement runMigration —
   ALTER/CREATE FUNCTION/CREATE TRIGGER are transaction-safe).
2. On Postgres only: pre-drop invalid index remnant via
   `pg_index.indisvalid` (matches v14 pattern for retry safety after a
   failed CONCURRENTLY left a half-built index with the target name).
3. CREATE INDEX CONCURRENTLY as a standalone runMigration call (separate
   statement = no implicit transaction wrap).
4. PGLite: plain CREATE INDEX (no CONCURRENTLY needed — single writer).
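
The four steps above can be sketched as a statement plan, where each array entry is meant to be sent through its own `engine.runMigration()` call. The function name, the `EngineKind` type, and the exact SQL text are illustrative assumptions; the trigger body is elided rather than reconstructed:

```ts
type EngineKind = 'postgres' | 'pglite'; // hypothetical discriminator

// Returns the statements in order. One runMigration call per entry keeps
// CONCURRENTLY out of any implicit multi-statement transaction.
function planV91Statements(kind: EngineKind): string[] {
  const columnsAndTrigger = `
    ALTER TABLE pages
      ADD COLUMN IF NOT EXISTS generation BIGINT NOT NULL DEFAULT 1;
    ALTER TABLE query_cache
      ADD COLUMN IF NOT EXISTS max_generation_at_store BIGINT NOT NULL DEFAULT 0;
    -- bump_page_generation_fn() + trigger definition elided in this sketch
  `;
  if (kind === 'pglite') {
    // Single writer: a plain index build is enough.
    return [
      columnsAndTrigger,
      'CREATE INDEX IF NOT EXISTS pages_generation_idx ON pages (generation)',
    ];
  }
  // Retry safety: drop a half-built index left by a failed CONCURRENTLY run.
  const dropInvalidRemnant = `
    DO $$ BEGIN
      IF EXISTS (
        SELECT 1 FROM pg_index i
        JOIN pg_class c ON c.oid = i.indexrelid
        WHERE c.relname = 'pages_generation_idx' AND NOT i.indisvalid
      ) THEN
        DROP INDEX pages_generation_idx;
      END IF;
    END $$;
  `;
  return [
    columnsAndTrigger,
    dropInvalidRemnant,
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS pages_generation_idx ON pages (generation)',
  ];
}
```

A handler would then loop with something like `for (const sql of planV91Statements(kind)) await engine.runMigration(sql);`, so the CONCURRENTLY statement always travels alone.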

Verified against real Postgres (pgvector:pg16):
- schema_version=91 after init
- pages_generation_idx exists with btree shape
- bump_page_generation_trg installed
- test/e2e/postgres-bootstrap.test.ts + test/e2e/schema-drift.test.ts:
  8 pass / 0 fail
- bun test test/migrate.test.ts test/schema-bootstrap-coverage.test.ts:
  161 pass / 0 fail
- bun run typecheck: clean

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master landed v0.40.1.0 → v0.40.3.0 while this branch sat (LongMemEval
batch, trajectory routing, contextual retrieval). Both waves added a
ModeBundle knob; both bumped KNOBS_HASH_VERSION. Resolution keeps both
knobs co-existing and consolidates the cache-key version at 5 (master's
contextual-retrieval wave explicitly sequenced behind salem's pending
v=4 graph signals — first to land claims v=4, second rebases to v=5).

Conflicts resolved:
- VERSION + package.json + CHANGELOG.md: kept 0.40.4.0
- src/core/search/mode.ts: kept BOTH graph_signals (mine) and
  contextual_retrieval + contextual_retrieval_disabled (master) across
  ModeBundle interface, all 3 MODE_BUNDLES bundles, SearchKeyOverrides,
  SearchPerCallOpts, resolveSearchMode picker, loadOverridesFromConfig,
  and SEARCH_MODE_CONFIG_KEYS. Single KNOBS_HASH_VERSION = 5 with merged
  comment chain. parts[] order: gs= and pack= at v=4 tier; cr= and crd=
  at v=5 tier.
- src/commands/search.ts: KNOB_DESCRIPTIONS includes both knob blocks.
- test/search-mode.test.ts: canonical-bundle assertions include both
  fields per mode (graph_signals + contextual_retrieval); single
  expect(KNOBS_HASH_VERSION).toBe(5) with combined rationale comment.
- test/search/knobs-hash-reranker.test.ts: single version-5 assertion
  with ladder explanation (1→2 reranker; 2→3 floor_ratio + cross-modal;
  3→4 graph_signals + schema_pack; 4→5 contextual_retrieval).
- test/cross-modal-phase1.test.ts: same — single version-5 assertion.

Verified: bun run typecheck clean; mode + cross-modal + graph-signals
test suites (142 tests) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit d28be5d into master May 23, 2026
8 checks passed
garrytan added a commit that referenced this pull request May 25, 2026
CI failure surfaced a time-dependent test flake in
`test/audit/audit-writer.test.ts` "returns events from current week,
filtered by ts cutoff" (added in v0.40.4.0 PR #1300). The test pinned
synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events
with synthetic ts values, then called `readRecent(7, now)` expecting
to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename
routing and ALWAYS wrote to the file matching real-time-now's ISO
week. When real CI time crossed into 2026-W22 (this Monday), the
events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file
  matching that ts's ISO week. Falls back to real-now when ts is
  missing or unparseable.
- No behavior change for production callers — none of the 5 audit
  consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback,
  content-sanity-audit, graph-signals, supervisor-audit). The writer
  stamps real-now → both ts and filename use real-now → same file
  as before.
- Sibling test "honors caller-supplied ts override" also pinned a
  fixed ts and would have broken from the opposite angle (test
  read from `computeFilename()` default = real-now). Updated to
  read from `computeFilename(new Date(fixedTs))` so it asserts the
  per-row file routing the wave now provides.
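
A sketch of the per-row routing, assuming a `<prefix>-<year>-W<week>.jsonl` filename pattern (the real pattern and helper signatures are not shown here, so `isoWeekLabel`, `resolveEventDate`, and `computeFilenameSketch` are hypothetical names):

```ts
// ISO 8601 week label in UTC, e.g. "2026-W21".
function isoWeekLabel(d: Date): string {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dayNum = t.getUTCDay() || 7;          // Mon=1 .. Sun=7
  t.setUTCDate(t.getUTCDate() + 4 - dayNum);  // Thursday decides the ISO year
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${t.getUTCFullYear()}-W${String(week).padStart(2, '0')}`;
}

// Use the caller-supplied ts when it parses; otherwise fall back to real-now.
function resolveEventDate(ts: string | undefined, now: Date): Date {
  if (ts) {
    const parsed = new Date(ts);
    if (!Number.isNaN(parsed.getTime())) return parsed;
  }
  return now;
}

// Assumed filename pattern, for illustration only.
function computeFilenameSketch(prefix: string, at: Date): string {
  return `${prefix}-${isoWeekLabel(at)}.jsonl`;
}
```

Under this routing, an event stamped 2026-05-22 lands in the W21 file even when the process runs on 2026-05-25 (W22), which is the mismatch the flake exposed.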

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time
crossed into a different ISO week than the test's synthetic now.
NOT introduced by this PR (#1377 community-PR-wave) — audit-writer
files aren't touched by the wave.
garrytan added a commit that referenced this pull request May 25, 2026
…ed dream judge (6 community PRs) (#1377)

* fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads

`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows
(Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory,
open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem
path.

Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the
documented Node.js idiom and works on every platform. No behavior change
on Unix — same syscall path, same semantics.

Repro on Windows before the fix:
  echo "test" | gbrain put my-page
  ENOENT: no such file or directory, open '/dev/stdin'

After: round-trip put/search/delete works on Windows Git Bash.
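
A minimal sketch of the idiom. The wrapper name and the `fd` parameter are hypothetical and exist only so the function can be exercised against a descriptor other than stdin:

```ts
import { readFileSync } from 'node:fs';

// Reads everything from a file descriptor; 0 is stdin on every platform,
// so no '/dev/stdin' path lookup is involved.
function readAllFromFd(fd: number = 0): string {
  return readFileSync(fd, 'utf-8');
}
```

A CLI command would call `readAllFromFd()` with no argument when input is piped in.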

* v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp

Adds local reranker support so users can point gbrain's reranker call at their
own llama.cpp server instead of ZeroEntropy's hosted API. One new recipe
(`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number`
extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker
`FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on
local rerank, and a doctor-probe divergence fix (probe and live search now read
the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).

ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of
scope — different wire shapes need adapter hooks designed against their actual
shapes in a follow-up plan.

Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path
  doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs
  (none currently exploitable; hardening for future contributor traps)
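
The /v1 doubling can be illustrated with a small sketch. The `joinUrl` helper and the `http://localhost:8080/v1` base URL are assumptions for illustration, not the recipe's actual values or the gateway's real concat code:

```typescript
// Hypothetical join: trims slashes at the seam, then concatenates.
function joinUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Assumed base URL that already carries the /v1 prefix.
const base = "http://localhost:8080/v1";

// A recipe path that repeats the prefix doubles it.
const doubled = joinUrl(base, "/v1/rerank");

// A leaf path keeps the prefix exactly once.
const fixed = joinUrl(base, "/rerank");
```

Under these assumptions `doubled` ends in `/v1/v1/rerank` while `fixed` ends in `/v1/rerank`, which is the shape the concat assertion pins.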

Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms
  honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard
  + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config >
  recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane
  read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary
  model id + chat-kind TX2 hard-fail preserved

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship documentation sync

* docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker)

The hand-curated llms-config.ts doc map never included docs/ai-providers/, so
both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were
invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers"
section with both. Marked includeInFull: false (setup walkthroughs belong in the
index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same
treatment CHANGELOG.md gets.

Caught by the /ship document-release subagent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: recipe-aware embedding-provider check for local providers

doctor --remediation-plan and autopilot both judged the embedding
provider with a hosted-only key check, so a brain on ollama: or
llama-server: was reported "blocked" on a missing API key it never
needed, contradicting doctor --json's 100%-coverage health.

Extract a shared embeddingProviderConfigured() helper into
brain-score-recommendations.ts: empty auth_env.required (local
providers) is configured with no key; hosted providers check their
OWN required key. Both producers (doctor, autopilot) call it,
killing the DRY violation that caused the bug. Hosted brains with a
missing key still block.
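
A rough sketch of the rule described above, not the shipped helper. The `RecipeLike` shape, the `providerConfiguredSketch` name, and the env-record parameter are assumptions; only the `auth_env.required` field name comes from this message:

```typescript
// Hypothetical minimal recipe shape for illustration.
interface RecipeLike {
  id: string;
  auth_env: { required: string[] };
}

// Local providers declare no required keys and count as configured.
// Hosted providers are configured only when each of THEIR OWN keys is set.
function providerConfiguredSketch(
  recipe: RecipeLike,
  env: Record<string, string | undefined>,
): boolean {
  if (recipe.auth_env.required.length === 0) return true;
  return recipe.auth_env.required.every((key) => (env[key] ?? "").trim() !== "");
}
```

Because both producers would call the same function, a local brain cannot be reported "blocked" by one and healthy by the other.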

* fix(budget): price local embed providers at $0

A --max-cost-bounded embed/reindex job configured for ollama: or
llama-server: TX2 hard-failed with no_pricing because
lookupEmbeddingPrice has no entry for local models. Add
FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS)
so a pricing miss on a local-inference provider returns $0 instead
of null. lmstudio/litellm intentionally excluded.
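
A sketch of the pricing fallback, under stated assumptions: the set members mirror the two local providers named above, while the price table entry, the `<provider>:<model>` parsing, and the `priceEmbeddingSketch` name are hypothetical:

```typescript
// Assumed membership, based on the providers named in this commit message.
const FREE_LOCAL_EMBED_PROVIDERS = new Set<string>(["ollama", "llama-server"]);

// Hypothetical price table; the entry and its value are placeholders.
const PRICES = new Map<string, number>([["hosted:example-embed", 0.1]]);

// Returns a price when one is known, 0 for a pricing miss on a local
// provider, and null (no pricing) for everything else.
function priceEmbeddingSketch(modelId: string): number | null {
  const known = PRICES.get(modelId);
  if (known !== undefined) return known;
  const colon = modelId.indexOf(":");
  const provider = colon === -1 ? modelId : modelId.slice(0, colon);
  return FREE_LOCAL_EMBED_PROVIDERS.has(provider) ? 0 : null;
}
```

Providers left out of the set (lmstudio, litellm) still return null, so a budgeted job for them keeps failing closed.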

* feat(models): embedding reachability probe in gbrain models doctor

A down/misconfigured local embed server was invisible until first
embed. Add probeEmbeddingReachability() (mirrors the reranker probe):
a 1-input embed with a 5s abort timeout, classified via classifyError,
under a new 'embedding_reachability' touchpoint, gated on the
zero-network config probe returning ok first.

* fix: don't count config-plane voyage/google keys as configured

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped
VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but
buildGatewayConfig only threads openai/anthropic/zeroentropy config
keys into the gateway env. A Voyage/Google brain with the key only in
config.json would be judged "configured" and dispatch an embed.stale
job that then fails auth at the gateway. Drop those two from the map so
the producer closures resolve them by env var only, matching what the
gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(dream): route significance judge through gateway.chat for multi-provider support

Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize
phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern
that closed #952 for runThink: construction-time provider/key probe returns null
on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in
try/catch for AIConfigError mid-run.

Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter,
Voyage, Ollama, llama-server, etc.) is now reachable via:

    gbrain config set models.dream.synthesize_verdict <provider>:<model>

The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS
in src/core/model-config.ts) is used unchanged. The exported JudgeClient
interface signature is preserved for test-seam stability.

The original community PR (#1349) shipped a custom fetch adapter that
bypassed the gateway entirely. This reworked landing routes through the
canonical seam so future provider additions automatically benefit, and a
CI guard (T7) will land in this wave to prevent the bug class from
re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0).

Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>

* test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity

11 cases pin the gateway-routed JudgeClient adapter from T5:

- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict

Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is
parsed-verdict semantic parity instead. Mirrors the pattern of
test/think-gateway-adapter.test.ts for cross-site consistency with the
v0.35.5.0 runThink migration.

* ci: guard against direct Anthropic SDK construction in gateway-routed files

New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded
files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for
`new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk.
Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay
allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming
as adapter types.

Comment lines (starting with `//` or ` *`) are excluded so historical
references in JSDoc don't false-fire. Negative test in this commit's
verification confirms: injecting `new Anthropic()` into synthesize.ts
makes the guard exit 1 with a clear error pointing at the gateway adapter
pattern; reverting restores the OK state.

Wired into both `bun run verify` and `bun run check:all`. Closes the bug
class that bit synthesize.ts in PR #1349 (which would have shipped a
parallel fetch stack instead of routing through the canonical gateway).
The same class previously bit think/index.ts and was fixed structurally
in v0.35.5.0; this guard prevents either file from regressing.

Extend GUARDED_FILES in the script when migrating another file off
direct SDK construction.

* docs(put_page): point Windows / pipe-buffer users at gbrain capture --file

Extends the put_page op description (surfaced by `gbrain put --help`) with a
one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-
as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer
escape route: reads files as a Buffer first, scans the first 8KB for NUL bytes
to refuse binary content, decodes to UTF-8 only after the safety check, and
adds provenance write-through.

Lands the user-facing value the closed PR #1365 was reaching for, without
duplicating the CLI surface. Credits the original contributor.

Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>

* test: R1+R2+R4 critical regression pins for the community-PR-wave landing

Per the wave's eng-review plan (IRON RULE — mandatory):

  R1 — get_page handler accepts calls without `content` param. Pre-wave
       PR #1365 landed its `!p.content → throw` check in the WRONG handler
       (get_page instead of put_page), which would have broken every read
       in the system. Pin: get_page MUST NOT require content + the schema
       carries no `content` or `file` param.

  R2 — put_page schema content stays `required: true`. PR #1365 also
       flipped `content` from required→optional in the schema. Pin: the
       contract stays at `required: true` + the closed PR's `file` param
       is NOT in the schema.

  R4 — Cross-platform stdin via fd 0 (PR #1325 regression pin). Source-grep
       asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy
       `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern
       assertions confirm the parseOpArgs branch shape (cliHints.stdin
       check, 5MB cap, isTTY gate) hasn't drifted.

R3 (gateway-adapter parsed-verdict parity) lives in the sibling file
test/cycle/synthesize-gateway-adapter.test.ts.

* test(e2e): update dream-synthesize no-key reason text + harden hermeticity

After T5's gateway-adapter rework, the "no API key" verdict text changed from
'no ANTHROPIC_API_KEY for significance judge' to
'no configured provider for verdict model: <model>' (broader + names the
actual model so the user sees WHICH provider failed). Update both assertions
that check the old text.

Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only
cleared the env var. After the rework, `makeJudgeClient` ALSO checks
`loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts
uses since v0.35.5.0). If the developer running the test has the key set in
~/.gbrain/config.json, the test would behave non-deterministically. Fix:
override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore
on return (even on throw).
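
A synchronous sketch of the override-and-restore pattern, assuming an env object is passed in explicitly. The `withTempEnv` name is hypothetical, and the real helper presumably wraps an async test body:

```typescript
// Sets env[key] for the duration of body() and restores the previous state
// afterwards, including when body() throws.
function withTempEnv<T>(
  env: Record<string, string | undefined>,
  key: string,
  value: string,
  body: () => T,
): T {
  const hadKey = Object.prototype.hasOwnProperty.call(env, key);
  const previous = env[key];
  env[key] = value;
  try {
    return body();
  } finally {
    if (hadKey) env[key] = previous;
    else delete env[key];
  }
}
```

The `finally` branch is what guarantees the developer's real GBRAIN_HOME comes back after a failing assertion.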

* test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end

Drives runPhaseSynthesize against a real PGLite engine with the gateway
chat transport stubbed to throw AIConfigError on every call (simulates a
revoked/misconfigured provider surfacing mid-run). Asserts:

  - Phase does NOT crash; converts the throw to a per-transcript verdict
    with worth=false and reasons[0] matching "gateway error: ...".
  - status='ok' so subsequent transcripts in the loop would continue
    being judged (not visible in 1-transcript test, but the loop shape is
    proven not to abort).

Pre-rework (T5), this code path didn't exist — judgeSignificance threw
directly to runPhaseSynthesize and crashed the whole phase. Pin so a
future regression that removes the try/catch fires loudly.
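
The loop shape being pinned can be sketched as follows. This is a simplified synchronous version under stated assumptions: `AIConfigError` here is a local stand-in class, `judgeAllSketch` is a hypothetical name, and the real loop is presumably async:

```typescript
// Local stand-in for the gateway's AIConfigError.
class AIConfigError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AIConfigError";
    Object.setPrototypeOf(this, AIConfigError.prototype);
  }
}

interface Verdict {
  worth_processing: boolean;
  reasons: string[];
}

// Judges each transcript. An AIConfigError becomes a per-transcript
// negative verdict; any other error still propagates to the caller.
function judgeAllSketch(
  transcripts: string[],
  judge: (transcript: string) => Verdict,
): Verdict[] {
  return transcripts.map((transcript) => {
    try {
      return judge(transcript);
    } catch (err) {
      if (err instanceof AIConfigError) {
        return { worth_processing: false, reasons: [`gateway error: ${err.message}`] };
      }
      throw err;
    }
  });
}
```

With the catch inside the per-transcript callback, one failing call cannot abort the remaining transcripts.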

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:

- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting
  the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop +
  canonical config key + JudgeClient interface preserved + CI guard
  reference + test file references).

- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry
  documenting the CI guard's contract, scope, and how to extend
  GUARDED_FILES when migrating another file off direct SDK construction.

CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the
wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md
annotations landed earlier in this wave (gateway-adapter synthesize.ts
+ check-gateway-routed-no-direct-anthropic.sh + the cherry-picked
llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed
formatRecipeTable's static 14-char PROVIDER column. When the id is longer
than the column, padEnd is a no-op — the row starts with the tier name
directly, no space delimiter. test/providers.test.ts 'each recipe appears
at most once' iterates every recipe and asserts at least one row starts
with `${id} ` or `${id}  `; with no space after `llama-server-reranker`,
the assertion fails and the recipe appears effectively missing from the
human-readable list.

Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so
every id is followed by at least one space, regardless of length. Also
widens the separator rule to match. 14 stays as the floor so the existing
short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar
layout when llama-server-reranker isn't in the active recipe set.
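
The width rule can be sketched as below. The `providerColumnWidth` and `formatRowSketch` names and the "local" tier label are illustrative, not the actual formatRecipeTable internals:

```typescript
// Column width: at least 14, and always one more than the longest id so
// every id is followed by at least one space.
function providerColumnWidth(ids: string[]): number {
  const longest = ids.reduce((max, id) => Math.max(max, id.length), 0);
  return Math.max(14, longest + 1);
}

// Hypothetical row formatter: padded id followed directly by the tier name.
function formatRowSketch(id: string, tier: string, width: number): string {
  return id.padEnd(width) + tier;
}
```

With a static width of 14, `"llama-server-reranker".padEnd(14)` returns the id unchanged, which is why the tier name ran into it.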

10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:

- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor`
  "spends ~1 token per model" but the wave added an `embed(['probe'])` call
  AND a reranker probe. Generalize to "spends a minimal request per configured
  chat/embed/rerank surface" so the cost expectation matches reality.

- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from
  RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability`
  doesn't hardcode 5000ms while the sibling reranker probe reads the
  recipe's configured timeout. Local CPU embedding endpoints (llama-server)
  hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is
  "re-run the probe" per the existing JSDoc.

Other informational findings from pre-landing review either match
established patterns (no behavioral test for `probeEmbeddingReachability`,
matching `probeRerankerReachability`), are intentional choices documented
in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf
in non-hot paths (autopilot's 4 sequential `getConfig` awaits per
5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:

1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the
   default-import shape `import Anthropic from '@anthropic-ai/sdk'`. A future
   contributor (or, more realistically, a future refactor) could bypass with:
     - `import { Anthropic } from '@anthropic-ai/sdk'`
     - `import { Anthropic as A } from '@anthropic-ai/sdk'`
     - `import * as Anthropic from '@anthropic-ai/sdk'`
     - `const x = await import('@anthropic-ai/sdk')`
   Tightened the regex to match ANY value-shaped import from the SDK module
   (excluding only the explicit `import type ... from '@anthropic-ai/sdk'`
   form which the adapter's Anthropic.Message return type needs). Added a
   second grep for dynamic imports. Verified all four bypass shapes now
   trigger the guard against synthesize.ts; type-only import still passes.

2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the
   array-of-blocks shape for future flexibility" — but the mapping flattens
   ONLY text blocks; `tool_use`, `tool_result`, image blocks silently
   become empty strings. Today only `judgeSignificance` calls this and it
   only sends string content, so no behavior bug. But the comment was
   marketing future flexibility the code doesn't deliver. Narrowed to call
   out the silent-drop and say to extend the mapping if a future caller
   wires non-text content through.

Both wave-scope: the CI guard was added by the wave, the JSDoc was added
by the wave's T5 rework. Adversarial review caught them before merge.

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence:
production `hybridSearch` resolves reranker timeout via the full chain
(per-call > config > recipe > bundle) by going through
`loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability`
was reading ONLY the recipe's `default_timeout_ms` — so an operator who
set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report
"reachable" while production search timed out at 1s and failed open.
A higher configured timeout produces the opposite false failure (probe
gives up at 5s when production would have waited longer).

Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the
existing `resolveLiveRerankerModel(engine)` — same precedence chain,
same DB-plane consistency posture. The probe now reads the SAME timeout
live search reads, on the same lookup path.
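
A sketch of the precedence chain (per-call > config > recipe > bundle). The `TimeoutSources` shape and `resolveTimeoutMsSketch` name are assumptions; the real `resolveLiveRerankerTimeoutMs(engine)` reads these values through the search-mode config path:

```typescript
// Hypothetical bag of candidate timeouts, highest precedence first.
interface TimeoutSources {
  perCall?: number;
  config?: number;
  recipe?: number;
  bundle: number;
}

// The first defined value wins, so an operator's config override beats
// the recipe default.
function resolveTimeoutMsSketch(sources: TimeoutSources): number {
  return sources.perCall ?? sources.config ?? sources.recipe ?? sources.bundle;
}
```

The divergence described above corresponds to a probe that reads only `recipe` while search evaluates the whole chain.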

The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being
bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under
community-pr-wave follow-ups — couples with the existing
FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened
gateway-routed guard:

  import { type Message, Anthropic } from '@anthropic-ai/sdk';
  new Anthropic();

The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude
`import type ...` but stops at the `y` in `type` inside the brace list,
silently allowing the value-import `Anthropic` through. Two fixes:

1. Replace the brittle regex-based type-exclusion with a clause-level
   parse: extract the brace-list specifiers, allow the import iff EVERY
   non-empty specifier is `type`-prefixed. Catches mixed-import bypasses
   (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`)
   passing. Default + namespace imports remain always-value-shaped.

2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed
   doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern
   so `specifiers` comes back empty and the script falls through to the
   default/namespace branch's wrong error message).

Hermetic 7-shape regression matrix now verifies every TypeScript import
shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`
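
The clause-level rule can be restated as a TypeScript sketch. The actual guard is a shell script using grep and sed; `classifySdkImport` is a hypothetical re-expression that assumes the line is already known to import from the SDK module:

```typescript
type ImportVerdict = "ALLOW" | "BLOCK";

function classifySdkImport(line: string): ImportVerdict {
  const trimmed = line.trim();
  // Whole-statement type imports (`import type X` / `import type { X }`).
  if (/^import\s+type\s/.test(trimmed)) return "ALLOW";
  // Brace list: allowed only if every non-empty specifier is `type`-prefixed.
  const brace = /^import\s*\{([^}]*)\}\s*from\s/.exec(trimmed);
  if (brace) {
    const specifiers = (brace[1] ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    return specifiers.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
  }
  // Default and namespace imports are always value-shaped.
  return "BLOCK";
}
```

Checking each specifier separately is what closes the mixed `{ type Foo, Bar }` bypass that a single regex over the whole line missed.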

Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe"
pattern doesn't propagate to the outer `$?` because the pipe spawns a
subshell. Switched to a tmpfile-flagged sentinel so the verdict survives
the subshell boundary cleanly.

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in
`test/audit/audit-writer.test.ts` "returns events from current week,
filtered by ts cutoff" (added in v0.40.4.0 PR #1300). The test pinned
synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events
with synthetic ts values, then called `readRecent(7, now)` expecting
to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename
routing and ALWAYS wrote to the file matching real-time-now's ISO
week. When real CI time crossed into 2026-W22 (this Monday), the
events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file
  matching that ts's ISO week. Falls back to real-now when ts is
  missing or unparseable.
- No behavior change for production callers — none of the 5 audit
  consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback,
  content-sanity-audit, graph-signals, supervisor-audit). The writer
  stamps real-now → both ts and filename use real-now → same file
  as before.
- Sibling test "honors caller-supplied ts override" also pinned a
  fixed ts and would have broken from the opposite angle (test
  read from `computeFilename()` default = real-now). Updated to
  read from `computeFilename(new Date(fixedTs))` so it asserts the
  per-row file routing the wave now provides.

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time
crossed into a different ISO week than the test's synthetic now.
NOT introduced by this PR (#1377 community-PR-wave) — audit-writer
files aren't touched by the wave.
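
The routing rule can be sketched as below, assuming UTC-based ISO weeks. The `audit-<year>-W<week>.jsonl` filename pattern and both function names are hypothetical, not the real `computeFilename()` output:

```typescript
// ISO 8601 week label (weeks start Monday; the week belongs to the year
// that contains its Thursday), computed in UTC.
function isoWeekLabel(date: Date): string {
  const t = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = t.getUTCDay() || 7; // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - weekday); // move to this week's Thursday
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${t.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

// Routes by the event's own ts when it parses; otherwise falls back to now.
function auditFilenameSketch(event: { ts?: string }, now: Date): string {
  const parsed = event.ts ? new Date(event.ts) : undefined;
  const when = parsed && !Number.isNaN(parsed.getTime()) ? parsed : now;
  return `audit-${isoWeekLabel(when)}.jsonl`;
}
```

With a real-now of Monday 2026-05-25, an event stamped 2026-05-22T12:00:00Z still routes to the W21 file, which is where `readRecent(7, now)` in the flaky test looks for it.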

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master: (22 commits)
  v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
  v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
  v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
  v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
  v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
  v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
  v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
  v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
  v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
  v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
  v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
  v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
  v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
  v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
  v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
  v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
  v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
  v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
  v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
  v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
  ...
garrytan-agents pushed a commit to garrytan-agents/gbrain that referenced this pull request Jun 13, 2026
…ed dream judge (6 community PRs) (garrytan#1377)

* fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads

`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows
(Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory,
open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem
path.

Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the
documented Node.js idiom and works on every platform. No behavior change
on Unix — same syscall path, same semantics.

Repro on Windows before the fix:
  echo "test" | gbrain put my-page
  ENOENT: no such file or directory, open '/dev/stdin'

After: round-trip put/search/delete works on Windows Git Bash.

* v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp

Adds local reranker support so users can point gbrain's reranker call at their
own llama.cpp server instead of ZeroEntropy's hosted API. One new recipe
(`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number`
extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker
`FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on
local rerank, and a doctor-probe divergence fix (probe and live search now read
the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).

ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of
scope — different wire shapes need adapter hooks designed against their actual
shapes in a follow-up plan.

Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path
  doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs
  (none currently exploitable; hardening for future contributor traps)

Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms
  honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard
  + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config >
  recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane
  read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary
  model id + chat-kind TX2 hard-fail preserved

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship documentation sync

* docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker)

The hand-curated llms-config.ts doc map never included docs/ai-providers/, so
both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were
invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers"
section with both. Marked includeInFull: false (setup walkthroughs belong in the
index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same
treatment CHANGELOG.md gets.

Caught by the /ship document-release subagent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: recipe-aware embedding-provider check for local providers

doctor --remediation-plan and autopilot both judged the embedding
provider with a hosted-only key check, so a brain on ollama: or
llama-server: was reported "blocked" on a missing API key it never
needed, contradicting doctor --json's 100%-coverage health.

Extract a shared embeddingProviderConfigured() helper into
brain-score-recommendations.ts: empty auth_env.required (local
providers) is configured with no key; hosted providers check their
OWN required key. Both producers (doctor, autopilot) call it,
killing the DRY violation that caused the bug. Hosted brains with a
missing key still block.

* fix(budget): price local embed providers at $0

A --max-cost-bounded embed/reindex job configured for ollama: or
llama-server: TX2 hard-failed with no_pricing because
lookupEmbeddingPrice has no entry for local models. Add
FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS)
so a pricing miss on a local-inference provider returns $0 instead
of null. lmstudio/litellm intentionally excluded.

* feat(models): embedding reachability probe in gbrain models doctor

A down/misconfigured local embed server was invisible until first
embed. Add probeEmbeddingReachability() (mirrors the reranker probe):
a 1-input embed with a 5s abort timeout, classified via classifyError,
under a new 'embedding_reachability' touchpoint, gated on the
zero-network config probe returning ok first.

* fix: don't count config-plane voyage/google keys as configured

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped
VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but
buildGatewayConfig only threads openai/anthropic/zeroentropy config
keys into the gateway env. A Voyage/Google brain with the key only in
config.json would be judged "configured" and dispatch an embed.stale
job that then fails auth at the gateway. Drop those two from the map so
the producer closures resolve them by env var only, matching what the
gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(dream): route significance judge through gateway.chat for multi-provider support

Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize
phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern
that closed garrytan#952 for runThink: construction-time provider/key probe returns null
on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in
try/catch for AIConfigError mid-run.

Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter,
Voyage, Ollama, llama-server, etc.) is now reachable via:

    gbrain config set models.dream.synthesize_verdict <provider>:<model>

The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS
in src/core/model-config.ts) is used unchanged. The exported JudgeClient
interface signature is preserved for test-seam stability.

The original community PR (garrytan#1349) shipped a custom fetch adapter that
bypassed the gateway entirely. This reworked landing routes through the
canonical seam so future provider additions automatically benefit, and a
CI guard (T7) will land in this wave to prevent the bug class from
re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0).

Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>

* test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity

11 cases pin the gateway-routed JudgeClient adapter from T5:

- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict

Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is
parsed-verdict semantic parity instead. Mirrors the pattern of
test/think-gateway-adapter.test.ts for cross-site consistency with the
v0.35.5.0 runThink migration.

* ci: guard against direct Anthropic SDK construction in gateway-routed files

New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded
files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for
`new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk.
Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay
allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming
as adapter types.

Comment lines (starting with `//` or ` *`) are excluded so historical
references in JSDoc don't false-fire. Negative test in this commit's
verification confirms: injecting `new Anthropic()` into synthesize.ts
makes the guard exit 1 with a clear error pointing at the gateway adapter
pattern; reverting restores the OK state.

Wired into both `bun run verify` and `bun run check:all`. Closes the bug
class that bit synthesize.ts in PR garrytan#1349 (which would have shipped a
parallel fetch stack instead of routing through the canonical gateway).
The same class previously bit think/index.ts and was fixed structurally
in v0.35.5.0; this guard prevents either file from regressing.

Extend GUARDED_FILES in the script when migrating another file off
direct SDK construction.

* docs(put_page): point Windows / pipe-buffer users at gbrain capture --file

Extends the put_page op description (surfaced by `gbrain put --help`) with a
one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-
as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer
escape route: reads files as a Buffer first, scans the first 8KB for NUL bytes
to refuse binary content, decodes to UTF-8 only after the safety check, and
adds provenance write-through.
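
The order of operations described above (raw bytes first, NUL scan over the first 8KB, UTF-8 decode only afterwards) can be sketched as follows. `decodeTextOrRefuse` and `SCAN_BYTES` are hypothetical names, and the provenance write-through is omitted; this is not the actual capture implementation.

```typescript
// Hypothetical sketch: refuse binary input before decoding it as text.
const SCAN_BYTES = 8 * 1024;

function decodeTextOrRefuse(bytes: Uint8Array): string {
  // Only the first 8KB is scanned, matching the description above.
  const limit = Math.min(bytes.length, SCAN_BYTES);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) {
      throw new Error("refusing binary content: NUL byte in first 8KB");
    }
  }
  // Decode to UTF-8 only after the safety check has passed.
  return new TextDecoder("utf-8").decode(bytes);
}
```

A Node `Buffer` is a `Uint8Array`, so the result of a file read could be passed in directly.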

Lands the user-facing value the closed PR garrytan#1365 was reaching for, without
duplicating the CLI surface. Credits the original contributor.

Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>

* test: R1+R2+R4 critical regression pins for the community-PR-wave landing

Per the wave's eng-review plan (IRON RULE — mandatory):

  R1 — get_page handler accepts calls without `content` param. Pre-wave
       PR garrytan#1365 landed its `!p.content → throw` check in the WRONG handler
       (get_page instead of put_page), which would have broken every read
       in the system. Pin: get_page MUST NOT require content + the schema
       carries no `content` or `file` param.

  R2 — put_page schema content stays `required: true`. PR garrytan#1365 also
       flipped `content` from required→optional in the schema. Pin: the
       contract stays at `required: true` + the closed PR's `file` param
       is NOT in the schema.

  R4 — Cross-platform stdin via fd 0 (PR garrytan#1325 regression pin). Source-grep
       asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy
       `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern
       assertions confirm the parseOpArgs branch shape (cliHints.stdin
       check, 5MB cap, isTTY gate) hasn't drifted.

R3 (gateway-adapter parsed-verdict parity) lives in the sibling file
test/cycle/synthesize-gateway-adapter.test.ts.

* test(e2e): update dream-synthesize no-key reason text + harden hermeticity

After T5's gateway-adapter rework, the "no API key" verdict text changed from
'no ANTHROPIC_API_KEY for significance judge' to
'no configured provider for verdict model: <model>' (broader + names the
actual model so the user sees WHICH provider failed). Update both assertions
that check the old text.

Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only
cleared the env var. After the rework, `makeJudgeClient` ALSO checks
`loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts
uses since v0.35.5.0). If the developer running the test has the key set in
~/.gbrain/config.json, the test would behave non-deterministically. Fix:
override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore
on return (even on throw).
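
A sketch of the restore-even-on-throw pattern, assuming a hypothetical `withEnvOverride` helper. It is synchronous and takes the env object as a parameter for brevity; the real `withoutAnthropicKey` presumably wraps an async body and creates the tmpdir itself.

```typescript
// Hypothetical helper: temporarily override one env-style key, restoring the
// previous value (or its absence) even when the body throws.
type Env = Record<string, string | undefined>;

function withEnvOverride<T>(env: Env, key: string, value: string, body: () => T): T {
  const had = Object.prototype.hasOwnProperty.call(env, key);
  const previous = env[key];
  env[key] = value;
  try {
    return body();
  } finally {
    // Runs on both return and throw, so the caller's environment never leaks.
    if (had) env[key] = previous;
    else delete env[key];
  }
}
```

In a test this would be called with `process.env`, the key `GBRAIN_HOME`, and a fresh temporary directory path.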

* test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end

Drives runPhaseSynthesize against a real PGLite engine with the gateway
chat transport stubbed to throw AIConfigError on every call (simulates a
revoked/misconfigured provider surfacing mid-run). Asserts:

  - Phase does NOT crash; converts the throw to a per-transcript verdict
    with worth=false and reasons[0] matching "gateway error: ...".
  - status='ok' so subsequent transcripts in the loop would continue
    being judged (not visible in 1-transcript test, but the loop shape is
    proven not to abort).

Pre-rework (T5), this code path didn't exist — judgeSignificance threw
directly to runPhaseSynthesize and crashed the whole phase. Pin so a
future regression that removes the try/catch fires loudly.
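
The loop shape being pinned can be sketched roughly as below. `judgeAll` and this `AIConfigError` class are hypothetical stand-ins, and the judge is synchronous for brevity where the real call is awaited.

```typescript
// Hypothetical stand-ins; names mirror the commit text but shapes are assumed.
class AIConfigError extends Error {}

interface Verdict { worth: boolean; reasons: string[] }

// Judge every transcript; a config error becomes a per-transcript verdict
// instead of aborting the phase. Any other error still propagates.
function judgeAll(
  transcripts: string[],
  judge: (transcript: string) => Verdict,
): { status: "ok"; verdicts: Verdict[] } {
  const verdicts: Verdict[] = [];
  for (const transcript of transcripts) {
    try {
      verdicts.push(judge(transcript));
    } catch (err) {
      if (!(err instanceof AIConfigError)) throw err;
      verdicts.push({ worth: false, reasons: [`gateway error: ${err.message}`] });
    }
  }
  return { status: "ok", verdicts };
}
```

Because the catch sits inside the loop body, one failing transcript does not stop later transcripts from being judged.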

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:

- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting
  the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop +
  canonical config key + JudgeClient interface preserved + CI guard
  reference + test file references).

- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry
  documenting the CI guard's contract, scope, and how to extend
  GUARDED_FILES when migrating another file off direct SDK construction.

CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the
wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md
annotations landed earlier in this wave (gateway-adapter synthesize.ts
+ check-gateway-routed-no-direct-anthropic.sh + the cherry-picked
llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed
formatRecipeTable's static 14-char PROVIDER column. When the id is longer
than the column, padEnd is a no-op, so the tier name follows the id
directly with no space delimiter. test/providers.test.ts 'each recipe appears
at most once' iterates every recipe and asserts at least one row starts
with `${id} ` or `${id}  `; with no space after `llama-server-reranker`,
the assertion fails and the recipe appears effectively missing from the
human-readable list.

Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so
every id is followed by at least one space, regardless of length. Also
widens the separator rule to match. 14 stays as the floor so the existing
short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar
layout when llama-server-reranker isn't in the active recipe set.
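
The width rule above can be sketched as follows. `idColumnWidth` and `formatRows` are hypothetical names, and the row shape is reduced to two columns; the real formatRecipeTable has more.

```typescript
// Hypothetical sketch of the width rule: floor of 14, otherwise longest id + 1.
const MIN_ID_WIDTH = 14;

function idColumnWidth(ids: string[]): number {
  const longest = ids.reduce((max, id) => Math.max(max, id.length), 0);
  return Math.max(MIN_ID_WIDTH, longest + 1);
}

function formatRows(rows: Array<{ id: string; tier: string }>): string[] {
  const width = idColumnWidth(rows.map((r) => r.id));
  // padEnd now always adds at least one space, so `${id} ` prefixes every row.
  return rows.map((r) => r.id.padEnd(width) + r.tier);
}
```

With only short ids the width stays at 14; once the 21-character `llama-server-reranker` is present it grows to 22.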

10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:

- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor`
  "spends ~1 token per model" but the wave added an `embed(['probe'])` call
  AND a reranker probe. Generalize to "spends a minimal request per configured
  chat/embed/rerank surface" so the cost expectation matches reality.

- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from
  RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability`
  doesn't hardcode 5000ms while the sibling reranker probe reads the
  recipe's configured timeout. Local CPU embedding endpoints (llama-server)
  hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is
  "re-run the probe" per the existing JSDoc.

Other informational findings from pre-landing review either match
established patterns (no behavioral test for `probeEmbeddingReachability`,
matching `probeRerankerReachability`), are intentional choices documented
in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf
in non-hot paths (autopilot's 4 sequential `getConfig` awaits per
5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:

1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the
   default-import shape `import Anthropic from '@anthropic-ai/sdk'`. A future
   contributor (or, more realistically, a future refactor) could bypass with:
     - `import { Anthropic } from '@anthropic-ai/sdk'`
     - `import { Anthropic as A } from '@anthropic-ai/sdk'`
     - `import * as Anthropic from '@anthropic-ai/sdk'`
     - `const x = await import('@anthropic-ai/sdk')`
   Tightened the regex to match ANY value-shaped import from the SDK module
   (excluding only the explicit `import type ... from '@anthropic-ai/sdk'`
   form which the adapter's Anthropic.Message return type needs). Added a
   second grep for dynamic imports. Verified all four bypass shapes now
   trigger the guard against synthesize.ts; type-only import still passes.

2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the
   array-of-blocks shape for future flexibility" — but the mapping flattens
   ONLY text blocks; `tool_use`, `tool_result`, image blocks silently
   become empty strings. Today only `judgeSignificance` calls this and it
   only sends string content, so no behavior bug. But the comment was
   marketing future flexibility the code doesn't deliver. Narrowed to call
   out the silent-drop and say to extend the mapping if a future caller
   wires non-text content through.
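
   To make the silent-drop concrete, here is a sketch of a text-only flattening. `ContentBlock` and `flattenToText` are hypothetical, and joining with an empty separator is an assumption; the actual mapping in synthesize.ts may differ in detail.

```typescript
// Hypothetical block shapes; only the `type` discriminator matters here.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use" | "tool_result" | "image" };

function flattenToText(content: string | ContentBlock[]): string {
  if (typeof content === "string") return content;
  // Non-text blocks contribute an empty string, i.e. they are silently dropped.
  return content.map((block) => (block.type === "text" ? block.text : "")).join("");
}
```

   A caller that starts sending non-text blocks would need this mapping extended rather than relying on the array branch.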

Both wave-scope: the CI guard was added by the wave, the JSDoc was added
by the wave's T5 rework. Adversarial review caught them before merge.

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence:
production `hybridSearch` resolves reranker timeout via the full chain
(per-call > config > recipe > bundle) by going through
`loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability`
was reading ONLY the recipe's `default_timeout_ms`. An operator who
set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report
"reachable" while production search timed out at 1s and failed open.
A higher configured timeout produces the opposite false failure (probe
gives up at 5s when production would have waited longer).

Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the
existing `resolveLiveRerankerModel(engine)` — same precedence chain,
same DB-plane consistency posture. The probe now reads the SAME timeout
live search reads, on the same lookup path.
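
The precedence chain (per-call > config > recipe > bundle) amounts to "first defined layer wins". A minimal sketch, assuming a hypothetical `TimeoutLayers` shape; the real `resolveLiveRerankerTimeoutMs(engine)` reads these values from the engine rather than taking them as arguments.

```typescript
// Hypothetical layer names following the chain in the commit text.
interface TimeoutLayers {
  perCall?: number;
  config?: number;
  recipe?: number;
  bundle: number; // final default, assumed always present
}

function resolveTimeoutMs(layers: TimeoutLayers): number {
  // `??` skips only undefined/null, so an explicit 0 would still be honored.
  return layers.perCall ?? layers.config ?? layers.recipe ?? layers.bundle;
}
```

Sharing one resolver between the probe and live search is what keeps the two from drifting apart again.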

The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being
bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under
community-pr-wave follow-ups — couples with the existing
FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened
gateway-routed guard:

  import { type Message, Anthropic } from '@anthropic-ai/sdk';
  new Anthropic();

The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude
`import type ...` but stops at the `y` in `type` inside the brace list,
silently allowing the value-import `Anthropic` through. Two fixes:

1. Replace the brittle regex-based type-exclusion with a clause-level
   parse: extract the brace-list specifiers, allow the import iff EVERY
   non-empty specifier is `type`-prefixed. Catches mixed-import bypasses
   (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`)
   passing. Default + namespace imports remain always-value-shaped.

2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed
   doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern
   so `specifiers` comes back empty and the script falls through to the
   default/namespace branch's wrong error message).

Hermetic 7-shape regression matrix now verifies every TypeScript import
shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`
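
The clause-level rule behind that matrix can be sketched in TypeScript. The actual guard is a shell script using grep and sed, so `classifySdkImport` is a hypothetical re-expression of the rule, not a port of it.

```typescript
type ImportVerdict = "ALLOW" | "BLOCK";

// Classify a single static import line from the SDK module.
function classifySdkImport(line: string): ImportVerdict {
  const stmt = line.trim();
  // `import type X from` and `import type { X } from` are erased at compile time.
  if (/^import\s+type\s/.test(stmt)) return "ALLOW";
  const braces = stmt.match(/^import\s*\{([^}]*)\}\s*from\b/);
  if (braces) {
    const specifiers = (braces[1] ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    // Allow only when every non-empty specifier carries its own `type` modifier.
    return specifiers.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
  }
  // Default and namespace imports are always value-shaped.
  return "BLOCK";
}
```

Checking each specifier separately is what closes the mixed `{ type Message, Anthropic }` bypass that a single line-level regex missed.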

Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe"
pattern doesn't propagate to the outer `$?` because the pipe spawns a
subshell. Switched to a tmpfile-flagged sentinel so the verdict survives
the subshell boundary cleanly.

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in
`test/audit/audit-writer.test.ts` "returns events from current week,
filtered by ts cutoff" (added in v0.40.4.0 PR garrytan#1300). The test pinned
synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events
with synthetic ts values, then called `readRecent(7, now)` expecting
to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename
routing and ALWAYS wrote to the file matching real-time-now's ISO
week. When real CI time crossed into 2026-W22 (this Monday), the
events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file
  matching that ts's ISO week. Falls back to real-now when ts is
  missing or unparseable.
- No behavior change for production callers — none of the 5 audit
  consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback,
  content-sanity-audit, graph-signals, supervisor-audit). The writer
  stamps real-now → both ts and filename use real-now → same file
  as before.
- Sibling test "honors caller-supplied ts override" also pinned a
  fixed ts and would have broken from the opposite angle (test
  read from `computeFilename()` default = real-now). Updated to
  read from `computeFilename(new Date(fixedTs))` so it asserts the
  per-row file routing the wave now provides.
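
A sketch of the routing rule, assuming hypothetical helper names and a hypothetical `audit-<week>.jsonl` filename pattern; the real `computeFilename` may format names differently.

```typescript
// ISO 8601 week label (e.g. "2026-W21") for a date, computed in UTC.
function isoWeekLabel(date: Date): string {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  // Shift to the Thursday of the same ISO week; its year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${d.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

// Route by the event's own ts when it parses; otherwise fall back to `now`.
function auditFileFor(event: { ts?: string }, now: Date = new Date()): string {
  const parsed = event.ts ? new Date(event.ts) : undefined;
  const when = parsed && !Number.isNaN(parsed.getTime()) ? parsed : now;
  return `audit-${isoWeekLabel(when)}.jsonl`;
}
```

With this shape, an event stamped 2026-05-22 lands in the W21 file even when the process is running during W22, which is the case the flaky test hit.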

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time
crossed into a different ISO week than the test's synthetic now.
NOT introduced by this PR (garrytan#1377 community-PR-wave) — audit-writer
files aren't touched by the wave.

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>