v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification#1300
Merged
Extract createAuditWriter() helper.
Five hand-rolled JSONL audit modules (rerank-audit, shell-audit,
supervisor-audit, audit-slug-fallback, phantom-audit) duplicated the
same ISO-week filename math, best-effort write loop, and
read-current-plus-previous-week loop. T2 refactors all 5 onto this
primitive.
Behavior preservation: filename format, JSONL line shape, mkdir
recursive, appendFileSync utf8, stderr-on-failure all byte-identical to
the existing modules so their tests pass unchanged.
resolveAuditDir() moves here from shell-audit.ts; shell-audit.ts will
re-export for back-compat (T2). Honors GBRAIN_AUDIT_DIR with
whitespace-trim, falls back to ~/.gbrain/audit/.
Test coverage: 22 cases covering ISO-week math + year-boundary edges
(2027-01-01 → 2026-W53), env override, mkdir-recursive, fail-open
stderr-warn shape, cross-week readback, corrupt-row skip,
non-finite-ts skip, round-trip with nested fields, computeFilename +
resolveDir accessors.
Plan ref: D5=B audit unification cathedral expansion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
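The ISO-week filename math mentioned above can be sketched as follows. This is a minimal illustration, not the code in audit-writer.ts: the names `isoWeek` and `auditFilename` and the exact `<feature>-<year>-W<week>.jsonl` layout are assumptions made for the example.

```typescript
// ISO-8601 weeks start on Monday, and a week belongs to the year that
// contains its Thursday.
function isoWeek(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayNum = d.getUTCDay() === 0 ? 7 : d.getUTCDay(); // Mon=1 .. Sun=7
  d.setUTCDate(d.getUTCDate() + 4 - dayNum); // shift to this week's Thursday
  const year = d.getUTCFullYear();
  const dayOfYear = (d.getTime() - Date.UTC(year, 0, 1)) / 86400000 + 1;
  return { year, week: Math.ceil(dayOfYear / 7) };
}

// Hypothetical filename layout; the real modules may format it differently.
function auditFilename(featureName: string, now: Date): string {
  const { year, week } = isoWeek(now);
  const ww = week < 10 ? "0" + week : String(week);
  return featureName + "-" + year + "-W" + ww + ".jsonl";
}

// Year-boundary edge from the test list: 2027-01-01 falls in 2026-W53.
console.log(auditFilename("rerank-failures", new Date("2027-01-01T12:00:00Z")));
// → rerank-failures-2026-W53.jsonl
```

Anchoring the week on its Thursday is what makes the year-boundary case come out right without a special branch.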
Replace the duplicated ISO-week filename math + best-effort write loop
+ read-current-plus-previous-week loop in:
- src/core/rerank-audit.ts (rerank-failures-*.jsonl)
- src/core/audit-slug-fallback.ts (slug-fallback-*.jsonl)
- src/core/minions/handlers/shell-audit.ts (shell-jobs-*.jsonl)
- src/core/minions/handlers/supervisor-audit.ts (supervisor-*.jsonl)
- src/core/facts/phantom-audit.ts (phantoms-*.jsonl)
All five now delegate file I/O to createAuditWriter from T1. Public
API preserved bit-for-bit:
- logRerankFailure, readRecentRerankFailures, computeRerankAuditFilename
- logSlugFallback, readRecentSlugFallbacks, computeSlugFallbackAuditFilename
- logShellSubmission, computeAuditFilename, resolveAuditDir
- writeSupervisorEvent, readSupervisorEvents, computeSupervisorAuditFilename
plus isCrashExit, summarizeCrashes, CrashSummary (domain-specific
helpers stay in supervisor-audit.ts; only file I/O moves)
- logPhantomEvent, readRecentPhantomEvents, computePhantomAuditFilename
Domain-specific behavior preserved:
- audit-slug-fallback emits per-call stderr (D7 dual logging) in the
caller; the shared writer is failure-only stderr
- rerank-audit truncates error_summary to 200 chars before write
- phantom-audit spreads optional fields conditionally (skip undefined)
- supervisor-audit keeps single-file readback (no cross-week walk)
to preserve pre-v0.40.4 doctor assertions
resolveAuditDir lives in src/core/audit/audit-writer.ts; shell-audit.ts
re-exports it so existing imports keep working (every other audit
module + gbrain-home-isolation.test.ts + minions.test.ts +
minions-shell.test.ts pull resolveAuditDir from shell-audit.ts).
Operator-visible drift: rerank-audit stderr line drops the
'rerank-failure audit' qualifier — was '[gbrain] rerank-failure audit
write failed (...)' now '[gbrain] write failed (...); search continues'.
Stderr is for human debugging, not machine parsing; the name of the
file being written supplies the qualifier in `tail -f audit/*`.
Test coverage: 128/128 audit-touching tests pass unchanged.
Plan ref: D5=B audit unification cathedral expansion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add BrainEngine.getAdjacencyBoosts(pageIds) returning Map<page_id,
AdjacencyRow{hits, cross_source_hits}>. Returns ALL pages with
hits >= 1 (callers apply their own threshold).
Cross-source semantic (D15=A): cross_source_hits EXCLUDES the target
page's own source. A page in source A linked from 2 pages in source A
reports cross_source_hits = 0. Linked from 1 in source B + 1 in
source C reports 2.
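The cross-source rule can be illustrated with an in-memory stand-in for the SQL. This is a sketch under stated assumptions: `adjacencySketch`, `PageRef`, and `Link` are hypothetical names, `hits` is assumed to count distinct linking pages, and `cross_source_hits` is assumed to count distinct sources other than the target's own (the reading that fits both examples in this PR).

```typescript
interface PageRef { id: number; sourceId: string }
interface Link { from: number; to: number }
interface AdjacencyRow { hits: number; cross_source_hits: number }

// Only links whose two endpoints are both in `pages` count (in-set restriction).
function adjacencySketch(pages: PageRef[], links: Link[]): Map<number, AdjacencyRow> {
  const sourceOf = new Map<number, string>();
  for (const p of pages) sourceOf.set(p.id, p.sourceId);

  const inbound = new Map<number, Set<number>>();
  for (const l of links) {
    if (!sourceOf.has(l.from) || !sourceOf.has(l.to)) continue;
    const froms = inbound.get(l.to) ?? new Set<number>();
    froms.add(l.from);
    inbound.set(l.to, froms);
  }

  const out = new Map<number, AdjacencyRow>();
  inbound.forEach((froms, to) => {
    const otherSources = new Set<string>();
    froms.forEach((from) => {
      const src = sourceOf.get(from);
      // The target's own source is excluded from the cross-source count.
      if (src !== undefined && src !== sourceOf.get(to)) otherSources.add(src);
    });
    out.set(to, { hits: froms.size, cross_source_hits: otherSources.size });
  });
  return out; // every page with hits >= 1; callers apply their own threshold
}
```

A page in source A linked from two in-set pages in source A gets `{ hits: 2, cross_source_hits: 0 }` here, while one linked from a page in B and a page in C gets `cross_source_hits: 2`.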
Source-scope contract: pageIds MUST already be source-scoped by the
caller. Method does NOT filter by source_id. The in-set restriction
makes cross-source leakage impossible by construction. JSDoc spells
this out; same trust posture as cosineReScore's chunk_id handling.
COALESCE(p.source_id, 'default') on both target and from-page sides
for defense-in-depth even though pages.source_id is NOT NULL today.
JSDoc/SQL contract alignment (codex #2): HAVING >= 1 matches the
"returns ALL pages with hits >= 1" contract; threshold of 2 is the
caller's call in applyGraphSignals.
Known limitation (codex #15): cross_source_hits cannot distinguish
"genuinely linked from another team" from "mirrored imports from
another source." T-todo-4 captures the v0.41+ refinement.
SearchResult type extension (D4=A flat fields, D12=A attribution):
- graph_adjacency_hits, graph_cross_source_hits,
graph_session_demoted, graph_session_prefix
- base_score, backlink_boost, salience_boost, recency_boost,
exact_match_boost, graph_adjacency_boost, graph_cross_source_boost,
session_demote_factor, reranker_delta
All optional; T4-T6 populate them.
Test coverage: 7/7 hermetic PGLite cases. Empty input, singleton,
same-source hub, cross-source attribution including the
"linked-only-from-other-source" case (widget in source b, linked
from alice+bob in source a → cross_source_hits=1), JSDoc HAVING>=1
contract. Postgres parity asserted by SQL-shape identity (will get a
mirror Postgres E2E in T10's eval gate work via DATABASE_URL when
set; PGLite hermetic case shipped now).
NULL source_id COALESCE branch noted as untestable in current PGLite
schema (pages.source_id is NOT NULL); kept as defense-in-depth.
Plan ref: T3 in v0.40.4.0 wave plan; D1=A, D3=A, D15=A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New file src/core/search/graph-signals.ts. Three signals:
1. Adjacency-within-top-K (×1.05): hits >= 2 inbound from in-set.
2. Cross-source adjacency (×1.10, stacks): cross_source_hits >= 2.
Dormant on single-source brains.
3. Session diversification (×0.95): if multiple top-K share a slug
prefix, keep highest scoring, DEMOTE the rest. NOT amplify —
codex caught the original framing was backwards (amplification
of redundancy makes the cited "weak chunks compete for budget"
problem worse, not better).
Conservative magnitudes (D14=B): 1.05/1.10/0.95. Score-distribution
probe (onScoreDistribution) collects min/p25/p50/p75/p95/max +
reorder_band_width to feed T-todo-2 magnitude calibration wave.
Slot: 4th stage inside runPostFusionStages (hybrid.ts:248), AFTER
backlink/salience/recency, pre-dedup. Inherits the v0.35.6.0
floor-ratio gate from computeFloorThreshold — this is the structural
protection that prevents a low-cosine hub from outranking a strong
non-hub (codex T2 / D1=A).
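How the three multipliers and the floor gate interact can be sketched as below. This is an illustration only: `applySignalsSketch` and the `Scored` shape are hypothetical, the floor threshold is passed in rather than derived, and the sketch copies its input whereas the commit notes that the real stage mutates score in place. Whether session demotion is also subject to the floor gate is not stated in this PR; the sketch leaves it ungated.

```typescript
interface Scored {
  slug: string;
  score: number;
  hits: number;
  crossSourceHits: number;
  sessionPrefix: string | null;
}

const ADJACENCY = 1.05;
const CROSS_SOURCE = 1.1;
const SESSION_DEMOTE = 0.95;

function applySignalsSketch(results: Scored[], floorThreshold: number): Scored[] {
  const out = results.map((r) => ({ ...r }));
  for (const r of out) {
    // Below-floor results get no boost. NaN >= threshold is false, so NaN
    // scores are skipped by the same comparison.
    if (!(r.score >= floorThreshold)) continue;
    if (r.hits >= 2) r.score *= ADJACENCY;
    if (r.crossSourceHits >= 2) r.score *= CROSS_SOURCE; // stacks with adjacency
  }
  // Session diversification: keep the best result per prefix, demote the rest.
  const best = new Map<string, Scored>();
  for (const r of out) {
    if (r.sessionPrefix === null) continue;
    const cur = best.get(r.sessionPrefix);
    if (cur === undefined || r.score > cur.score) best.set(r.sessionPrefix, r);
  }
  for (const r of out) {
    if (r.sessionPrefix !== null && best.get(r.sessionPrefix) !== r) r.score *= SESSION_DEMOTE;
  }
  return out;
}
```

Because the comparison is written as `score >= floorThreshold`, a hub sitting exactly at the floor is still boosted, which matches the "gate is < not <=" regression case.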
PostFusionOpts extends with graphSignalsEnabled, onGraphMeta,
onScoreDistribution. Caller (hybridSearch in subsequent T5 work)
resolves graph_signals from the mode bundle.
Source-scope contract preserved: getAdjacencyBoosts takes raw
page_ids, no source filter. Adjacency is in-set restricted so
cross-source leakage is impossible by construction (D3=A).
Fail-open: engine throw → JSONL audit row via shared createAuditWriter
(T1/T2 primitive, featureName='graph-signals-failures') + meta.errored
+ caller's results unchanged. Session diversification ALSO skips on
failure (predictable all-or-nothing posture).
Mutation note (codex #9): score mutated in place. base_score must be
stamped at runPostFusionStages entry BEFORE this stage so eval-capture
sees pre-boost score (T6 attribution wave).
Test coverage (24 cases, including T11 IRON RULE regression):
- sessionPrefix multi/single/empty cases
- computeScoreDistribution percentile math
- Disabled + empty short-circuits
- Adjacency hit, no-hit, cross-source stacking, cross-source alone
- Session diversification 3-share + single-segment + singleton
- Test seam injection (no engine call)
- Fail-open: throw → audit row + meta.errored + unchanged
- Empty Map → session still runs
- Score-distribution always emits when enabled
- Meta carries fire counts + duration_ms
- Missing page_id silently skipped from dedup set
- **T11 IRON RULE regression (3 cases):**
* weak hub BELOW floor_threshold does NOT get boosted past
above-floor non-hub (the bug class the floor gate exists for)
* hub AT floor still gets boosted (gate is < not <=)
* NaN score → NaN >= threshold is false → no boost
Plan ref: T4 + T11 in v0.40.4.0 wave plan; D1=A, D2=A, D11=B, D14=B,
D9=A, D5=B. Codex outside-voice #1 + #2 + #6 + #8 + #9 addressed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ModeBundle gains graph_signals: boolean. Per-mode defaults:
- conservative: false (cost-sensitive tier)
- balanced: true (the wave's primary surface for default-on)
- tokenmax: true (power-user tier, capstone fit)
SearchKeyOverrides + SearchPerCallOpts gain optional graph_signals
field. resolveSearchMode picks via the standard per-call → config
override → mode bundle chain.
loadOverridesFromConfig parses 'search.graph_signals' from the config
table ('1' or 'true' → true). SEARCH_MODE_CONFIG_KEYS adds the key
so `gbrain search modes --reset` clears it alongside other knobs.
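The per-call → config → mode-bundle chain can be sketched like this. The names `parseFlagSketch` and `resolveGraphSignalsSketch` are hypothetical, as is the `per_call` source label; treating any value other than '1' or 'true' as false is an assumption, and the trim plus lowercase step follows the parser behavior described later in this PR.

```typescript
type Mode = "conservative" | "balanced" | "tokenmax";

const MODE_DEFAULT: Record<Mode, boolean> = {
  conservative: false,
  balanced: true,
  tokenmax: true,
};

// '1' or 'true' (case-insensitive, whitespace-trimmed) means true.
function parseFlagSketch(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  return v === "1" || v === "true";
}

function resolveGraphSignalsSketch(
  mode: Mode,
  configValue?: string,
  perCall?: boolean,
): { enabled: boolean; source: "per_call" | "config" | "mode_default" } {
  if (perCall !== undefined) return { enabled: perCall, source: "per_call" };
  const fromConfig = parseFlagSketch(configValue);
  if (fromConfig !== undefined) return { enabled: fromConfig, source: "config" };
  return { enabled: MODE_DEFAULT[mode], source: "mode_default" };
}

console.log(resolveGraphSignalsSketch("conservative", " TRUE "));
```

Keeping a single parser and reusing it at every read site avoids the kind of drift where one surface treats 'TRUE' as enabled and another does not.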
KNOBS_HASH_VERSION bump 3→4 (append-only per CDX2-F13). New `gs=`
parts entry appended AFTER cross-modal + column + prov entries. A
graph-on cache write cannot be served to a graph-off lookup —
mid-deploy hit-rate dip clears within cache.ttl_seconds (3600s).
src/commands/search.ts KNOB_DESCRIPTIONS gains graph_signals entry
so `gbrain search modes` dashboard renders the new knob.
Test coverage:
- test/search-mode.test.ts (+ 8 new cases): per-mode defaults
canonical, config override both directions, per-call override
wins, knobsHash distinct for on/off, config key registered,
attributeKnob reports per-call + mode sources correctly.
- test/search/knobs-hash-reranker.test.ts: version assertion
bumped 3→4 with v0.40.4 rationale comment.
- test/cross-modal-phase1.test.ts: version assertion bumped
3→4 with v0.40.4 rationale comment.
- Canonical-bundle assertions updated to include graph_signals
in expected shape (3 cases).
50/50 search-mode tests pass. 45/45 cross-modal pass. 17/17
knobs-hash-reranker pass. 10/10 balanced-reranker pass.
Plan ref: T5 in v0.40.4.0 wave plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every boost stage that mutates SearchResult.score now stamps a field
recording WHAT it multiplied:
- applyBacklinkBoost → backlink_boost (skipped when count == 0)
- applySalienceBoost → salience_boost (skipped when score == 0)
- applyRecencyBoost → recency_boost (skipped on evergreen prefix)
- applyExactMatchBoost → exact_match_boost (skipped on no-match
OR when intent's exactMatchBoost == 1.0 no-op)
- runPostFusionStages → base_score stamped ONCE at entry, BEFORE
any boost mutates r.score. Idempotent: caller-pre-stamped value
preserved. Empty-results short-circuit unchanged.
- applyReranker → reranker_delta = original_index - new_index
(positive = rank improved; raw rerank score stays in rerank_score)
- applyGraphSignals → graph_adjacency_boost, graph_cross_source_boost,
session_demote_factor (T4 already stamped these)
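The stamping rules above can be sketched in a few lines. The function names and the trimmed-down `Result` shape are hypothetical; the boost factor is passed in because the real boost formulas are not shown in this PR.

```typescript
interface Result {
  slug: string;
  score: number;
  base_score?: number;
  backlink_boost?: number;
  reranker_delta?: number;
}

// Stamp the pre-boost score once; a value stamped by the caller is preserved.
function stampBaseScore(results: Result[]): void {
  for (const r of results) {
    if (r.base_score === undefined) r.base_score = r.score;
  }
}

// `factor` comes from the caller; only the stamp-or-skip rule is illustrated.
function applyBacklinkBoostSketch(r: Result, backlinkCount: number, factor: number): void {
  if (backlinkCount === 0) return; // no-op stage: leave the field unset
  r.score *= factor;
  r.backlink_boost = factor;
}

// Rank delta, not score delta: positive means the result moved up.
function stampRerankerDelta(before: Result[], after: Result[]): void {
  after.forEach((r, newIndex) => {
    const originalIndex = before.indexOf(r);
    if (originalIndex !== -1) r.reranker_delta = originalIndex - newIndex;
  });
}
```

Leaving the field unset on a no-op stage is what lets a formatter distinguish "stage did not fire" from "stage fired with factor 1".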
Why: feeds the T7 `gbrain search --explain` formatter so it can
attribute the final score to its components. Without these stamps,
"why did this rank where it did?" is grep-and-guess.
SearchResult.reranker_delta doc updated to clarify it's a RANK delta
(positive = improved), not a score delta. The raw relevance score
stays in `rerank_score` (untyped, for back-compat with telemetry that
already reads it).
Test coverage: 16 new cases in test/search/attribution-stamping.test.ts.
Pins: every boost stamps when it fires AND skips stamping when it
doesn't (no false attribution on no-op stages). base_score idempotency
preserved. reranker_delta computed correctly across rank-improved +
rank-degraded cases.
All 178/178 search tests pass (no regressions).
Plan ref: T6 cathedral expansion in v0.40.4.0 wave plan; D12=A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New file src/core/search/explain-formatter.ts renders SearchResult[]
as a multi-line breakdown of how the final score was formed:
1. people/alice (score=12.4)
base=10.2 (rrf+cosine)
+ backlink ×1.08
+ salience ×1.05
+ adjacency ×1.05 (hits=3)
+ cross_source ×1.10 (other_sources=2)
↑ reranker rank +2
= final 12.4
Reads the boost_* / base_score / *_hits fields populated by T4 + T6.
Empty path: "no boosts applied" when no stage stamped anything.
Session demote rendered with `-` prefix (not `+`) so the demotion
direction is visually distinct from boosts.
CliOptions gains `explain: boolean`; parseGlobalFlags recognizes
`--explain` anywhere in argv. cli.ts formatResult for `search` +
`query` cases reads CliOptions.explain via the module-level
singleton and routes to formatResultsExplain when set. Lazy import
keeps the hot path narrow for the common non-explain case.
Number formatting: 4-decimal precision, trailing zeros stripped
('1.0000' → '1', '0.1234' → '0.1234'). NaN preserved as 'NaN'.
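The number-formatting rule is small enough to sketch directly. `formatNumberSketch` is a hypothetical name, and the implementation is one possible way to get the stated behavior, not necessarily the one in explain-formatter.ts.

```typescript
// 4-decimal precision with trailing zeros stripped; NaN is passed through.
function formatNumberSketch(n: number): string {
  if (n !== n) return "NaN"; // NaN is the only value not equal to itself
  return n
    .toFixed(4)
    .replace(/(\.\d*?)0+$/, "$1") // drop trailing zeros after the decimal point
    .replace(/\.$/, ""); // drop a dangling decimal point
}

console.log(formatNumberSketch(1)); // → 1
console.log(formatNumberSketch(0.1234)); // → 0.1234
console.log(formatNumberSketch(1.05)); // → 1.05
```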
Test coverage:
- test/search/explain-formatter.test.ts: 19 cases pin output
format. Each boost type renders correctly, every-stage stacking
composes, reranker_delta=0 doesn't render, empty list short-
circuits, rank numbering 1-based, number formatting edge cases.
- test/cli-options.test.ts: 3 new cases for --explain parsing
(basic, absent default, any-argv-position).
Existing CliOptions literals in test/cli-options.test.ts +
test/thin-client-upgrade-prompt.test.ts updated for new required
explain field.
JSON envelope unchanged — the same attribution fields surface in
existing --json output via JSON.stringify; no separate JSON formatter
needed.
Plan ref: T7 cathedral expansion in v0.40.4.0 wave plan; D12=A + D6=A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New checkGraphSignalsCoverage in src/commands/doctor.ts. Wired into
both runDoctor (local engine) and doctorReportRemote (HTTP MCP /
JSON path) so local AND remote-server brains both surface the metric.
Logic:
1. Resolve active graph_signals setting: config override
'search.graph_signals' wins, else mode bundle default
('search.mode' → conservative=false, balanced/tokenmax=true).
2. When disabled → silent ok ("disabled — coverage not checked").
Avoids polluting doctor output on installs that don't use the
feature.
3. When enabled, compute global inbound-link density:
COUNT(DISTINCT to_page_id) / COUNT(*) across non-deleted pages.
4. <10% → warn ("signal will rarely fire") with paste-ready
`gbrain extract all` fix hint.
5. >=30% → ok ("fire on most queries") with metric.
6. 10-29% → ok ("fire occasionally") with metric.
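The threshold logic in steps 3 to 6 can be sketched as a pure classifier. `classifyCoverageSketch` and its return shape are hypothetical, and the empty-brain message is invented for the example; the quoted phrases for the three bands come from the list above.

```typescript
interface CoverageResult {
  status: "ok" | "warn";
  message: string;
}

function classifyCoverageSketch(pagesWithInbound: number, totalPages: number): CoverageResult {
  if (totalPages === 0) {
    // Hypothetical wording for the empty-brain case.
    return { status: "ok", message: "no pages yet" };
  }
  const density = pagesWithInbound / totalPages;
  const pct = Math.round(density * 100) + "%";
  if (density < 0.1) {
    return { status: "warn", message: "signal will rarely fire (" + pct + "); run `gbrain extract all`" };
  }
  if (density >= 0.3) {
    return { status: "ok", message: "fire on most queries (" + pct + ")" };
  }
  return { status: "ok", message: "fire occasionally (" + pct + ")" };
}

console.log(classifyCoverageSketch(40, 100).message);
```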
Known limitation (codex outside-voice #14): global density is an
imperfect proxy for "top-K subgraphs have enough edges to fire."
T-todo-5 captures the v0.41+ refinement that measures actual fire
rate from search-stats after 30 days of data.
Best-effort: SQL errors → warn with the underlying message. Never
breaks doctor.
Test coverage (7 new cases in test/doctor.test.ts):
- conservative mode → silent ok regardless of coverage
- balanced default + 0 links → warn at 0% with fix hint
- balanced default + 40% inbound → ok "fire on most queries"
- balanced default + 20% inbound → ok "fire occasionally"
- explicit search.graph_signals=false overrides mode default
- empty brain → ok with explanation
- check is wired into runDoctor (source-grep regression guard)
All 55/55 doctor.test.ts cases pass.
Plan ref: T8 in v0.40.4.0 wave plan; D6=A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runStatsSubcommand in src/commands/search.ts gains a graph_signals
section in both --json and human output:
Graph signals:
enabled: true (mode default)
failures: 3 fail-open event(s)
ECONNREFUSED 2
timeout 1
Data sources:
- config: 'search.graph_signals' override → enabled + source=config,
otherwise mode-bundle default → enabled + source=mode_default.
- JSONL audit: readRecentGraphSignalsFailures(days) returns events;
failures_count is len, failures_by_reason buckets by first word of
error_summary (e.g. 'ECONNREFUSED', 'timeout').
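Bucketing by the first word of error_summary can be sketched as follows. `bucketByReasonSketch` is a hypothetical name, and the `"unknown"` bucket for blank summaries is an assumption added for the example.

```typescript
interface FailureEvent {
  error_summary: string;
}

function bucketByReasonSketch(events: FailureEvent[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const e of events) {
    // First whitespace-delimited word; blank summaries fall into "unknown".
    const reason = e.error_summary.trim().split(/\s+/)[0] || "unknown";
    out[reason] = (out[reason] ?? 0) + 1;
  }
  return out;
}

console.log(
  bucketByReasonSketch([
    { error_summary: "ECONNREFUSED 127.0.0.1:5432" },
    { error_summary: "timeout after 5000ms" },
    { error_summary: "ECONNREFUSED 127.0.0.1:5432" },
  ]),
);
```

With the sample events this yields a count of 2 for ECONNREFUSED and 1 for timeout, the same shape as the human output shown above.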
JSON envelope (schema_version 2 unchanged; graph_signals is a new
sibling property of stats, so consumers reading the existing fields
keep working):
{
  "schema_version": 2,
  ...stats...,
  "graph_signals": {
    "enabled": bool,
    "source": "config" | "mode_default",
    "failures_count": int,
    "failures_by_reason": { reason: count }
  },
  "_meta": { metric_glossary: { ..., graph_signals_enabled: ..., graph_signals_failures_count: ... } }
}
Fire-rate metrics (adjacency_fires, cross_source_fires,
session_demotions) and score-distribution stats are NOT in this
section yet — they require telemetry-table writes from the
applyGraphSignals onMeta callback. Wired in v0.41+ via T-todo-2
calibration wave (the wave that needs them). For v0.40.4: status +
error count is the actionable surface for "is graph_signals on, and
is it failing?"
Human output: prints the section after the existing stats block.
Edge case: when total_calls is 0 BUT graph_signals is enabled OR
has historical failures, still prints the section so operators
don't lose the signal on a brain with no telemetry yet.
Test coverage (6 cases in test/search/search-stats-graph-signals.test.ts):
- search.graph_signals=true → enabled true, source=config
- mode=conservative → enabled false, source=mode_default
- no config → enabled true (balanced default), source=mode_default
- JSONL failures bucketed by first word of error_summary
- empty audit → failures_count 0, empty failures_by_reason
- human output includes "Graph signals:" header
Plan ref: T9 in v0.40.4.0 wave plan; D6=A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New test/e2e/graph-signals-eval.test.ts runs each longmemeval-mini
question twice (graph_signals off, graph_signals on) and asserts:
Gate 1 (QUALITY) — paired bootstrap, 10,000 resamples:
- If signals-on is significantly WORSE than off
(delta < 0 AND p < 0.05) → fail.
- Otherwise pass. p>=0.05 either direction OR delta >= 0 → ok.
Gate 2a (CHANGE-MAGNITUDE): mean Jaccard@5 over result-set overlap
must be >= 0.5. If results overlap less than half, the change is
too large and needs human review before default-on.
Gate 2b (CHANGE-MAGNITUDE): top-1 stability rate >= 0.7. If 30%+
of top picks change, hard look required.
Gate 3 (HARD ABSOLUTE FLOOR): recall@5 drop <= 5pt. Catastrophic
regression catch (codex outside-voice #18 — addresses the "top-5
must not drop at all" brittleness on tiny fixtures).
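The gate thresholds can be sketched as a pure decision function plus a Jaccard helper. The names `jaccard`, `EvalSummary`, and `failedGates` are hypothetical, the gate labels are invented for the example, and treating two empty result sets as fully overlapping is an assumption.

```typescript
// Overlap between two top-5 result lists, compared as sets of slugs.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 1; // assumption: empty vs empty overlaps fully
  let inter = 0;
  setA.forEach((x) => {
    if (setB.has(x)) inter++;
  });
  return inter / (setA.size + setB.size - inter);
}

interface EvalSummary {
  delta: number; // recall@5 with signals on minus off, as a fraction
  pValue: number;
  meanJaccardAt5: number;
  top1Stability: number;
}

function failedGates(s: EvalSummary): string[] {
  const failed: string[] = [];
  if (s.delta < 0 && s.pValue < 0.05) failed.push("quality"); // Gate 1
  if (s.meanJaccardAt5 < 0.5) failed.push("jaccard@5"); // Gate 2a
  if (s.top1Stability < 0.7) failed.push("top1-stability"); // Gate 2b
  if (-s.delta * 100 > 5) failed.push("absolute-floor"); // Gate 3: drop > 5pt
  return failed;
}

console.log(jaccard(["a", "b", "c"], ["b", "c", "d"])); // → 0.5
```

Separating the quality gate from the change-magnitude gates means a run can be statistically no worse and still be flagged for review when too many results moved.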
Bootstrap implementation:
- Per-question observation is binary (recall@5 hit/miss).
- Pairing on question_id between the on/off branches.
- Centered distribution under null (subtract observed mean) per
standard paired-bootstrap-shift approach for binary outcomes.
- Two-tailed p-value: |resampled delta| >= |observed delta|.
- Deterministic seeded RNG so test runs are stable across CI.
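The bootstrap steps listed above can be sketched as a pure function. This is an illustration under stated assumptions, not the exported pairedBootstrapPValue: the name `pairedBootstrapSketch`, the linear congruential generator, the default seed, and returning 1 for empty input are all choices made for the example.

```typescript
// Small seeded generator (linear congruential) so runs are reproducible.
// The multiply stays below 2^53, so the arithmetic is exact in doubles.
function seededRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// `off` and `on` hold per-question outcomes (1 = recall@5 hit, 0 = miss),
// aligned by question so index i is the same question in both arrays.
function pairedBootstrapSketch(off: number[], on: number[], resamples = 10000, seed = 42): number {
  if (off.length !== on.length) throw new Error("inputs must be paired");
  const n = off.length;
  if (n === 0) return 1; // assumption: no data means no evidence of a difference
  const diffs = on.map((v, i) => v - (off[i] ?? 0));
  const observed = diffs.reduce((a, b) => a + b, 0) / n;
  const centered = diffs.map((d) => d - observed); // shift to the null hypothesis
  const rng = seededRng(seed);
  let extreme = 0;
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += centered[Math.floor(rng() * n)] ?? 0;
    if (Math.abs(sum / n) >= Math.abs(observed)) extreme++; // two-tailed
  }
  return extreme / resamples;
}
```

Centering the differences before resampling is what turns the resampled means into a null distribution; without it the resamples would cluster around the observed delta and the p-value would be meaningless.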
pairedBootstrapPValue exported as a pure function with separate
tests for edge cases (empty input, all-equal, strong positive, strong
negative, determinism). Reusable from future calibration waves.
Hermetic: in-memory PGLite via createBenchmarkBrain + resetTables
between questions. No API keys needed (--no-embed import path
exercises keyword-only retrieval). Skips gracefully via describe.skip
when the fixture is missing.
Plan ref: T10 in v0.40.4.0 wave plan; D7=C absolute floor + D13=A
paired bootstrap; codex #4 + #18 stability-vs-quality distinction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VERSION: 0.37.11.0 → 0.40.4.0
package.json: 0.37.11.0 → 0.40.4.0
CHANGELOG.md: top entry for v0.40.4.0 in ELI10-lead voice per
CLAUDE.md release rules. Lead is plain-English ("Your search now
notices when a page is a hub for your query"); precise file paths
/ SQL semantics / numbers live in the "Itemized changes" section
below. Includes the cathedral-expansion notes (D5=B audit
unification, D12=A per-stage attribution, D13=A eval gates) and
the "To take advantage of v0.40.4.0" verify-and-fix block.
TODOS.md: 5 new items captured under "v0.40.4 graph signals —
deferred follow-ups (v0.41+)":
- T-todo-1: profile graph-signal SQL latency, merge if hot (D8=C)
- T-todo-2: magnitude calibration wave from probe data (D14=B / D17)
- T-todo-3: DB-backed audit table for cross-deploy observability (codex #15)
- T-todo-4: sync-topology-aware cross-source signal (codex #11)
- T-todo-5: replace doctor's global density with fire-rate (codex #14)
Verified the 3-line audit: VERSION + package.json + CHANGELOG topmost
all match 0.40.4.0. `bun install` ran (lockfile unchanged — root
package version isn't stored in bun.lock). `bun run build:llms`
refreshed llms.txt + llms-full.txt for the next commit.
Plan ref: T12 in v0.40.4.0 wave plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 isCacheSafe test failures in shard 2 reproduce on stashed clean master.
Confirmed pre-existing, not introduced by v0.40.4. Filed under
"Pre-existing flake on master (noticed during v0.40.4 ship)" with
reproduction commands + remediation options. Shipping v0.40.4 through
it; future wave can fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md line 550 bans the private OpenClaw fork name in public
artifacts. Example session prefix in sessionPrefix() docs + 3 test
fixtures swept to 'media/chat/...' instead. Pre-existing
scripts/check-privacy.sh in `bun run verify` caught it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…selective-graph-signals # Conflicts: # CHANGELOG.md # VERSION # package.json
…ages
CRITICAL: pre-landing review (codex outside-voice via /ship Step 9)
caught that hybrid.ts's `postFusionOpts` literal at line 566 was
building PostFusionOpts WITHOUT threading `resolvedMode.graph_signals`
to `graphSignalsEnabled`. The gate at hybrid.ts:358 read the field
from a literal that never set it.
Result before this fix: the entire v0.40.4 graph-signals wave was
dead code in production. Mode bundles set
`balanced.graph_signals = true` and `tokenmax.graph_signals = true`,
but no production call site ever reached applyGraphSignals. The
KNOBS_HASH bump 3→4 correctly varied the cache key by the flag, so
contamination was prevented — but the feature itself never fired.
All shipped infrastructure (engine SQL, fail-open audit, attribution
stamps, --explain formatter, doctor coverage check, search-stats
section) was reachable only through the unit-test seam
(`opts.adjacencyFn`). The CHANGELOG-advertised behavior never
landed in user-visible search.
Fix: thread `graphSignalsEnabled: resolvedMode.graph_signals` into
the postFusionOpts literal (1 line). Inline comment names codex's
catch so future refactors see the regression class.
Tests: new test/search/graph-signals-wire-integration.test.ts pins
the wire end-to-end. Three cases:
1. balanced mode → hybridSearch on a seeded brain with adjacency
hub produces a result with base_score stamped (proves
runPostFusionStages actually ran).
2. search.graph_signals=false config override → no graph_* fields
stamped (proves the gate honors the override path).
3. Source-grep regression guard pinning the
`graphSignalsEnabled: resolvedMode.graph_signals` literal in
hybrid.ts so a future refactor can't silently disconnect.
All 57 existing v0.40.4 wave tests still pass. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ deleted_at)
Two informational findings from /ship pre-landing review (Step 9):
1. Stderr message qualifier drift (rerank/slug-fallback/phantom audits)
Pre-v0.40.4 messages included a per-feature qualifier:
[gbrain] rerank-failure audit write failed (...)
[gbrain] slug-fallback audit write failed (...)
[gbrain] phantom audit write failed (...)
The T2 refactor dropped the qualifier (plan promised "byte-identical"
operator-visible behavior, but stderr lines did drift). Restored via
new `errorMessagePrefix` option on `createAuditWriter` (optional, ''
default). Three modules pass the per-feature qualifier; shell-audit
and supervisor-audit unaffected (their pre-v0.40.4 messages didn't
have a separate qualifier — label already carried the feature name).
2. Defense-in-depth `deleted_at IS NULL` on getAdjacencyBoosts
The SQL was already protected by construction (hybridSearch's
visibility filter ensures input pageIds are live), but the explicit
guard matches the v0.35.5.0 findOrphanPages pattern and closes the bug class if a
future caller bypasses hybridSearch. Added to both Postgres and
PGLite engines for parity. Three JOIN sites guarded (targets CTE,
FROM-pages join). One inline comment per engine cites the codex
review and the v0.35.5.0 precedent.
Plan ref: /ship pre-landing review v0.40.4.0 (codex finding C and F).
All 84 audit+graph-signals tests pass. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… F1)
Three HIGH-severity issues from /ship adversarial pass:
H1 (Codex): Eval gate was a no-op.
Test passed `graph_signals: graphSignalsOn` via an `as any` cast, but
SearchOpts had no such field and hybridSearch's perCall didn't thread it.
Both off/on branches resolved to the mode-bundle default, so the gate
measured identical behavior and could pass while detecting nothing.
Fix: add `graph_signals?: boolean` to SearchOpts (types.ts:794).
Thread `opts.graph_signals` into perCall in both hybridSearch
(hybrid.ts:425) AND hybridSearchCached (hybrid.ts:1027) so the
cache-key resolver also sees the override. Drop the `as any` from
the eval test — types are real now.
H2 (Codex): Session diversification fired on entity directories.
sessionPrefix() used "any shared parent directory" as the session
signal. Result: a search for "people in SF" returned `people/alice`
+ `people/bob` + `people/charlie` and the latter two got demoted
to 0.95×. Every common entity-search query silently penalized
legitimate same-type results. Default-on for balanced/tokenmax
means production behavior was wrong.
Fix: narrow sessionPrefix() to fire ONLY when the slug contains a
session-like marker (`chat`/`session`/`sessions` segment OR a
`YYYY-MM-DD` date segment). Entity directories (`people/`,
`companies/`, `docs/`) return null → diversification skips.
Returns null (not the slug itself) so the loop skips cleanly.
Examples in JSDoc:
your-agent/chat/2026-05-20-foo → 'your-agent/chat/2026-05-20-foo'
daily/2026-05-20/journal-entry-1 → 'daily/2026-05-20'
transcripts/chat/funding-discussion → 'transcripts/chat/funding-discussion'
people/alice → null ← codex H2 regression
docs/quickstart → null
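One rule that reproduces the JSDoc examples above can be sketched as follows. This is a guess at the shape of the logic, not the code in graph-signals.ts: `sessionPrefixSketch` is a hypothetical name, and the choices to keep one segment after a chat/session marker and to cut at an exact `YYYY-MM-DD` segment are assumptions inferred from the five examples.

```typescript
const SESSION_MARKERS = new Set(["chat", "session", "sessions"]);
const DATE_SEGMENT = /^\d{4}-\d{2}-\d{2}$/;

function sessionPrefixSketch(slug: string): string | null {
  const segs = slug.split("/");
  if (segs.length < 2) return null; // degenerate: no "/" in the slug

  // Marker segment: keep everything through the segment after the marker.
  const marker = segs.findIndex((s) => SESSION_MARKERS.has(s));
  if (marker !== -1) {
    return segs.slice(0, Math.min(marker + 2, segs.length)).join("/");
  }

  // Date segment: keep everything through the date.
  const date = segs.findIndex((s) => DATE_SEGMENT.test(s));
  if (date !== -1) return segs.slice(0, date + 1).join("/");

  // Entity directories such as people/ or docs/ carry no session signal.
  return null;
}

console.log(sessionPrefixSketch("daily/2026-05-20/journal-entry-1")); // → daily/2026-05-20
console.log(sessionPrefixSketch("people/alice")); // → null
```

Returning null for entity directories is the point of the fix: `people/alice` and `people/bob` no longer share a prefix, so neither is demoted.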
F1 (Claude adversarial subagent): case-sensitivity drift across 3 sites.
loadOverridesFromConfig in mode.ts is case-insensitive +
whitespace-trimmed for 'search.graph_signals' values. But
doctor's checkGraphSignalsCoverage (doctor.ts:899) AND
search-stats's readGraphSignalsStats (search.ts:288) used
case-sensitive compare. User sets `search.graph_signals TRUE`:
production enables the feature, but doctor + search-stats both
silently report disabled. Operators lose the only observability
surface for the new feature on values like 'True'/'TRUE'.
Fix: trim + lowercase parity at both sites, mirroring the parser's
semantics. Also case-normalized `search.mode` reads at both sites
for the same divergence class.
Tests:
- sessionPrefix block rewritten with 7 cases covering chat marker
+ date anchor + entity dirs (now-NULL) + degenerate (no /).
- Added regression test pinning codex H2: people/alice +
people/bob + people/charlie do NOT get diversified.
- graph-signals-eval.test.ts drops `as any` — typed field works.
- Existing tests using `chat/a`/`chat/b` updated to session-shaped
`media/2026-05-20/chunk-a` so the date anchor actually fires.
111/111 graph-signals + doctor + search-stats tests pass. Typecheck clean.
Plan ref: /ship adversarial review v0.40.4.0 (codex H1, H2; Claude F1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex L1 (audit window underreport) + Claude
F2/F3/F5-F8/F11/F12/F14/F16 from /ship adversarial review. None are
load-bearing; all captured under 'v0.40.4 adversarial review LOW
findings — captured for v0.41+'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README: surface v0.40.4.0 graph signals + --explain in Hybrid search capability
- CLAUDE.md: annotate engine.ts getAdjacencyBoosts, new graph-signals.ts / explain-formatter.ts / audit/audit-writer.ts, plus hybrid.ts post-fusion 4th stage, mode.ts graph_signals knob + KNOBS_HASH 3→4, cli-options.ts --explain flag, search stats + doctor coverage check
- llms-full.txt: regenerated from CLAUDE.md per the build:llms chaser rule
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…selective-graph-signals # Conflicts: # CHANGELOG.md # VERSION # package.json
…selective-graph-signals # Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json # src/core/audit-slug-fallback.ts # src/core/facts/phantom-audit.ts # src/core/minions/handlers/shell-audit.ts # src/core/search/mode.ts
…selective-graph-signals # Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json
…selective-graph-signals # Conflicts: # CHANGELOG.md # VERSION # package.json
…selective-graph-signals # Conflicts: # CHANGELOG.md # VERSION # package.json
setup-bun action with `bun-version: latest` calls the GitHub API
(https://api.github.com/repos/oven-sh/bun/git/refs/tags) to resolve the
tag. CI started failing today with HTTP 401 "Bad credentials" even
though the action receives a token (visible as `token: ***` in the run
log). Pinning the version eliminates the API call entirely.
Affected workflows: test.yml, e2e.yml, release.yml, heavy-tests.yml
(5 invocations total).
Pinned to 1.3.13, which matches package.json engines (`bun >= 1.3.10`)
and the version v0.40.4.0 was developed against.
Bump cadence: when a new bun version is required, update this pin in
one PR. Trading "always-latest" for "always-deterministic" is the right
trade for a 5-shard CI matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request on May 23, 2026
…ns v81→v90
Absorbs master's v0.38 (ingestion cathedral), v0.38.1 (agents), v0.38.2 (doctor), v0.39.0 (brainstorm cost cathedral), v0.39.1 (schema packs), and the v0.40.x VERSION bump on top.
Conflict resolutions:
- VERSION → 0.40.5.0 (this wave's slot; v0.40.4.0 claimed by salem PR #1300)
- package.json → 0.40.5.0
- src/core/migrate.ts → took master's v81 (pages_provenance_columns) + v82-v88; appended our contextual_retrieval_columns as v90 (skipped v89 reserved by garrytan/v0.40.2.0-trajectory-routing per D7 inspection)
- src/core/search/mode.ts → KNOBS_HASH_VERSION 4→5 (per D8 sequencing behind salem's pending v=4 graph signals); both schema_pack hash fields (master) and contextual_retrieval hash fields (this branch) preserved
- src/core/types.ts → both v0.38 provenance Page fields and v0.40.3 CR fields preserved on the Page interface
- CHANGELOG.md → took master as baseline; v0.40.5.0 entry lands in T9 docs phase
- bun.lock → bun install refreshed to pick up chokidar@^4.0.3 (v0.38 dep)
bun run typecheck passes after merge.
garrytan added a commit that referenced this pull request on May 23, 2026
…master)
Master is at v0.40.2.0; v0.40.3.0 is genuinely the next free slot. The wave was originally planned as v0.40.5.0 sequenced behind salem (PR #1300 = v0.40.4.0) but the user is shipping THIS branch as v0.40.3.0 because:
1. v0.40.3.0 IS the canonical version slot for the contextual retrieval cathedral (matches branch name garrytan/v0.40.3.0-contextual-retrieval).
2. Master is at v0.40.2.0, so v0.40.3.0 is the immediate next slot, not a collision.
3. salem's v0.40.4.0 + any v0.40.5.0 work sit ON TOP of this in the landing train, not under it.
Mechanical rename only, with no content changes from the v0.40.5.0 commit sequence (T1-T11 wave is preserved verbatim, just relabeled):
- VERSION + package.json: 0.40.5.0 → 0.40.3.0
- bun.lock: refreshed (no dep changes)
- CHANGELOG.md: ## [0.40.5.0] header → ## [0.40.3.0] + body references
- skills/migrations/v0.40.5.0.md → skills/migrations/v0.40.3.0.md (previous v0.40.3.0.md file overwritten with the richer T9 content)
- CLAUDE.md: "Key commands added in v0.40.5.0" → "v0.40.3.0"
- 30 source + test files: comment references swept via sed s/0.40.5.0/0.40.3.0/g
- llms.txt + llms-full.txt: regenerated
Migration numbering UNCHANGED: v90 (renamed from original v81 because master took v82-v88) and v91 (new trigger migration) stay at v90/v91; the version slot is orthogonal to the migration ledger collision.
KNOBS_HASH_VERSION = 5 stays, sequenced behind master's v=4 schema-pack work; salem's v=4 graph-signals will rebump to v=5 if it lands first.
Test results after rename:
- bun run verify: clean (typecheck + 7 pre-checks)
- bun run test: 9482 pass / 0 fail / 0 skip
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
garrytan added a commit that referenced this pull request on May 23, 2026
…ferred-item closures (#1323) * v0.40.3.0 T1: migration v81 + CRMode type substrate Five additive columns + Page/SourceRow type extensions + CRMode discriminated union land the schema foundation for v0.40.3.0 contextual retrieval. All columns are NULL-tolerant; existing rows continue working unchanged until the post-upgrade reembed sweep catches up. Schema (migration v81 + schema.sql + pglite-schema.ts mirror): - pages.contextual_retrieval_mode TEXT NULL — tier the page was last embedded under. NULL on pre-v81 rows; drift detection treats NULL as 'none' for reindex predicates. - pages.corpus_generation TEXT NULL — composite hash of (synopsis_prompt_version, haiku_model, title_wrapper_version, embedding_model) per D27 P1-5. Document-side provenance for the v0.40.3.0 query_cache.page_generations invalidation contract. - sources.contextual_retrieval_mode TEXT NULL — per-source override. CLI-write-only per D15 security gate. - sources.trust_frontmatter_overrides BOOLEAN DEFAULT FALSE — per-source mount-frontmatter trust gate per D15. Host source (id='default') is always trusted in the resolver regardless of column value. - query_cache.page_generations JSONB DEFAULT '{}' — D27 P1-5 invalidation contract foundation. Per-row tag of {page_id: corpus_generation} so lookup can LEFT JOIN against current pages and exclude stale rows. Types (src/core/types.ts + src/core/sources-ops.ts): - New CR_MODES = ['none', 'title', 'per_chunk_synopsis'] as const + CRMode type union + isCRMode() type guard for parsing untrusted frontmatter / config values. - Page interface extended with contextual_retrieval_mode + corpus_generation (optional, NULL-tolerant for pre-v81 rows). - SourceRow interface extended with contextual_retrieval_mode + trust_frontmatter_overrides (optional for pre-v81 brains). Bootstrap coverage: - All four pages/sources columns are in PGLITE_SCHEMA_SQL CREATE TABLE bodies (fresh installs get them at initSchema time). 
- query_cache.page_generations is exempt because query_cache itself is migration-created (added in v55, not in PGLITE_SCHEMA_SQL). Same rationale as the existing query_cache.knobs_hash exemption. Pinned by the migrate.test.ts v81 round-trip + the schema-bootstrap-coverage parser (which also gained the query_cache.page_generations exemption). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T2: MARKDOWN_CHUNKER_VERSION 2→3 (contextual wrapper signal) Bumps the markdown chunker version so the post-upgrade reembed sweep finds every page on the old chunker version and re-embeds it through the new contextual-retrieval wrapper path. Chunk boundaries themselves are unchanged from v2 — the bump forces re-embed (not re-chunk) so existing pages pick up the wrapper without recomputing chunk splits. JSDoc on MARKDOWN_CHUNKER_VERSION updated to document the v3 semantic ("chunks embed with optional contextual retrieval wrapper per Anthropic's published methodology"). Pins the dependency between the chunker version bump and the upcoming src/core/contextual-retrieval-service.ts (T5). Test fixture in test/chunkers/recursive.test.ts updated to assert v3 with a brief comment on the bump rationale so future contributors see the v0.40.3.0 reason inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T3: pure modules — resolver, wrapper, synopsis, audit Four new pure modules under src/core/ that the upcoming service layer (T5) and Minion handler (T6) compose. All four are testable in isolation; no engine I/O, no filesystem reads outside the synopsis source-text fallback chain (which is invoked by the service, not the modules themselves). src/core/contextual-retrieval-resolver.ts (D5+D6+D15+D26 P0-4): - resolveContextualRetrievalMode() walks the three-source override chain: page frontmatter > source row > global mode bundle. 
Returns a tagged result with source attribution + invalid_frontmatter_value (D13) + frontmatter_rejected_untrusted_mount (D15) for doctor surfacing. - crModeDistinct() helper for D26 P0-4 IS DISTINCT FROM semantics on app-side CRMode comparisons (NULL-aware, defeats the != misses NULL drift bug Codex pass 2 caught). - HOST_SOURCE_ID = 'default' always trusted regardless of trust_frontmatter_overrides; mount sources require the explicit flag per D15 security gate. src/core/embedding-context.ts (D20-T1 + D20-T4 + Codex T5 title-weakness): - buildContextualPrefix(title, synopsis) → null | wrapped block. Handles title-only, summary-only, both, or neither. - wrapChunkForEmbedding(text, prefix, chunkSource) short-circuits on chunk_source='fenced_code' per D20-T4 (code chunks inside markdown pages skip the wrapper — prepending page title to a code block doesn't help cross-modal retrieval). - sanitizeTitle/sanitizeSynopsis strip </context> (injection vector) and collapse whitespace + cap at 300 chars. - extractFirstTwoSentences() pure regex with CJK_SENTENCE_DELIMITERS from src/core/cjk.ts for the title-tier free fallback path. src/core/page-summary.ts (D27 P1-2 + D27 P1-4 + D21 reversal): - generatePerChunkSynopsis() routes through gateway.chat(tier='utility'). - Richer failure envelope per D27 P1-2: refusal/empty/malformed (→ D14 page-level fall-back) vs auth_failure/rate_limit/timeout/network/ provider_5xx (→ retry per gateway, or throw to Minion retry). - buildSynopsisCacheKey() composes the LRU key per D27 P1-4: (content_hash, chunk_index, corpus_generation, source_text_hash). - DELIBERATELY no calibration injection — D21 reversed D7's calibration- aware acceptance. Mutable answer-time bias tags don't belong in static document vectors. Query-side personalization is the v0.41+ home. src/core/audit-synopsis.ts (D17, mirrors v0.35.0.0 rerank-audit precedent): - Failure-only JSONL writer at ~/.gbrain/audit/synopsis-failures-YYYY-Www.jsonl with ISO-week rotation. 
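For readers unfamiliar with ISO-week rotation, here is a minimal sketch of how a `synopsis-failures-YYYY-Www.jsonl` filename could be derived. Only the filename pattern comes from the commit text; the helper names `isoWeekParts` and `synopsisAuditFilename`, and the choice to compute in UTC, are assumptions for illustration and not the module's actual code.

```typescript
// Illustrative sketch only: helper names and the UTC choice are assumptions.
function isoWeekParts(date: Date): { year: number; week: number } {
  // Normalize to midnight UTC so the result does not depend on local timezone.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() || 7; // ISO weekday: Monday = 1 ... Sunday = 7
  // Move to the Thursday of the same ISO week; its calendar year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - weekday);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const dayOfYear = (d.getTime() - yearStart) / 86400000 + 1;
  return { year: d.getUTCFullYear(), week: Math.ceil(dayOfYear / 7) };
}

function synopsisAuditFilename(date: Date): string {
  const { year, week } = isoWeekParts(date);
  const ww = week < 10 ? `0${week}` : String(week);
  return `synopsis-failures-${year}-W${ww}.jsonl`;
}
```

The Thursday shift is what makes year-boundary dates land in the right week-year: 2027-01-01 is a Friday, so it belongs to the last ISO week of 2026.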
Deliberately no success logging (10K+ pages per backfill would generate 10K+ JSONL rows of noise; failure signal is the actionable one). - summarizeSynopsisFailures() aggregator returns SynopsisFailureSummary for doctor's synopsis_refusal_rate check. Clean typecheck across the four modules. Tests land in T14 alongside the service + Minion handler so the test layer can integrate the full path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T4: ModeBundle.contextual_retrieval + KNOBS_HASH_VERSION 3→4 Three-tier wrapper ladder gated by search.mode lands in the bundle. The per-mode defaults match the cost-tier philosophy (D2): conservative → 'none' (minimum surface) balanced → 'title' (free at runtime; pure string concat) tokenmax → 'per_chunk_synopsis' (Anthropic's published method) Plus the D18 soft kill switch (contextual_retrieval_disabled) so a single config-key flip neutralizes wrapping for queries AND new embeds without touching the migration path. src/core/search/mode.ts: - ModeBundle: contextual_retrieval: CRMode + contextual_retrieval_disabled. - All three frozen MODE_BUNDLES updated with the per-tier defaults. - SearchKeyOverrides + SearchPerCallOpts: both fields optional in the per-key config + per-call surfaces. - resolveSearchMode's pick chain threads both new fields through the standard per-call > per-key > mode bundle precedence ladder. - KNOBS_HASH_VERSION 3→4. Two new entries appended to knobsHash() parts list (append-only per CDX2-F13 convention): cr=${cr_mode} + crd=${0|1}. A query against a tokenmax-mode brain can no longer be served from a cache row written when the brain was on balanced — they sit in different embedding spaces. - SEARCH_MODE_CONFIG_KEYS: 'search.contextual_retrieval' + 'search.contextual_retrieval_disabled' added. - loadOverridesFromConfig reads both keys; CR_MODES guard rejects typos (drift typos still fall through to mode default per D13 sync-failure semantics; this is the no-typo path). 
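The precedence ladder described for T4 (per-call > per-key > mode bundle) and the two appended hash parts can be sketched as below. The per-mode defaults and the `cr=` / `crd=` part format are taken from the commit text; the type and function names (`CROptions`, `resolveCR`, `crHashParts`) and the `false` default for the kill switch are assumptions for illustration.

```typescript
// Illustrative sketch; names and the kill-switch default are assumptions.
type CRMode = "none" | "title" | "per_chunk_synopsis";
type SearchMode = "conservative" | "balanced" | "tokenmax";

interface CROptions {
  contextual_retrieval?: CRMode;
  contextual_retrieval_disabled?: boolean;
}

// Per-mode defaults as listed in the T4 commit message.
const CR_DEFAULTS: Record<SearchMode, CRMode> = {
  conservative: "none",
  balanced: "title",
  tokenmax: "per_chunk_synopsis",
};

interface ResolvedCR {
  cr: CRMode;
  disabled: boolean;
}

// Per-call overrides beat per-key config, which beats the mode bundle.
function resolveCR(mode: SearchMode, perKey: CROptions = {}, perCall: CROptions = {}): ResolvedCR {
  return {
    cr: perCall.contextual_retrieval ?? perKey.contextual_retrieval ?? CR_DEFAULTS[mode],
    disabled: perCall.contextual_retrieval_disabled ?? perKey.contextual_retrieval_disabled ?? false,
  };
}

// The two entries appended to the knobs-hash parts list.
function crHashParts(r: ResolvedCR): string[] {
  return [`cr=${r.cr}`, `crd=${r.disabled ? 1 : 0}`];
}
```

Because both values feed the hash, a cache row written under `balanced` cannot be served to a `tokenmax` query, which is the point of the version bump.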
- Imports CR_MODES + CRMode from src/core/types.ts. src/commands/search.ts: - KNOB_DESCRIPTIONS picks up the two new entries so `gbrain search modes` dashboard renders them with description copy. test/search-mode.test.ts: - Three canonical bundle tests updated with the per-tier CR defaults. - KNOBS_HASH_VERSION expectation bumped 3→4 with inline rationale. Clean typecheck + 42 search-mode tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T8: NULL→non-NULL upsert race fix (D24, closes v0.35.x TODO) Two writers racing on the same chunk (autopilot sync + manual `embed --stale` + contextual reindex) previously raced last-writer-wins via the text- unchanged branch's `COALESCE(EXCLUDED.embedding, content_chunks.embedding)`. Pre-v0.40.3 the cost of an overwrite was one wasted ~$0.000001 text- embedding-3-large call. With v0.40.3's per-chunk Haiku synopsis on tokenmax, the cost rises ~300x to ~$0.0003 per overwritten chunk plus the discarded synopsis work. On a 10K-page tokenmax brain, a few percent overwrite rate during concurrent backfill+sync wastes $1-5 of Haiku spend silently. 
Fix (mirrored exactly in postgres-engine.ts + pglite-engine.ts so both engines stay parity-pinned):

    embedding = CASE
      WHEN EXCLUDED.chunk_text != content_chunks.chunk_text THEN EXCLUDED.embedding
      WHEN content_chunks.embedding IS NULL THEN EXCLUDED.embedding
      WHEN EXCLUDED.embedded_at IS NOT NULL
           AND (content_chunks.embedded_at IS NULL
                OR EXCLUDED.embedded_at > content_chunks.embedded_at)
        THEN EXCLUDED.embedding
      ELSE content_chunks.embedding
    END,
    embedded_at = CASE
      WHEN EXCLUDED.chunk_text != content_chunks.chunk_text
           AND EXCLUDED.embedding IS NULL THEN NULL
      WHEN content_chunks.embedding IS NULL
           AND EXCLUDED.embedding IS NOT NULL THEN EXCLUDED.embedded_at
      WHEN EXCLUDED.embedded_at IS NOT NULL
           AND (content_chunks.embedded_at IS NULL
                OR EXCLUDED.embedded_at > content_chunks.embedded_at)
        THEN EXCLUDED.embedded_at
      ELSE content_chunks.embedded_at
    END,

The two columns move together via aligned CASE WHEN logic: embedding + embedded_at stay consistent so `embed --stale` (predicate `embedding IS NULL`) keeps working correctly.

Behavior summary for the text-unchanged branch:
- existing embedding NULL → take new (cold path, no race)
- new is fresher (embedded_at > existing) → take new
- otherwise → keep existing (slower writer with stale embedding loses)

Closes the v0.35.x TODOS.md item that flagged this race pre-existing. v0.40.3 fold-in lands the fix when the wave amplifies the cost vector, per D24 in the eng-review pass.

100 pglite-engine tests pass + clean typecheck. E2E concurrent-writer test lands in T14 alongside the broader test suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.40.3.0 T5: contextual-retrieval-service + two-phase build (D27 P1-1)

Centerpiece service module. Single source of truth for "re-embed one page with the active CR mode", composed by import-file.ts (sync time), reindex.ts (batch sweep), and the contextual-reindex-per-chunk Minion handler (T6).
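To make the T8 upsert rule quoted above easier to reason about, the two CASE expressions can be restated as a pure function. This is a sketch for reading purposes, not code from the engines: the `ChunkState` shape and the name `mergeEmbedding` are hypothetical, and SQL NULL is modeled as `null`.

```typescript
// Illustrative restatement of the two CASE expressions; names are hypothetical.
interface ChunkState {
  chunkText: string;
  embedding: number[] | null;
  embeddedAt: Date | null;
}

function mergeEmbedding(
  existing: ChunkState,
  incoming: ChunkState,
): { embedding: number[] | null; embeddedAt: Date | null } {
  const textChanged = incoming.chunkText !== existing.chunkText;
  // "Fresher" mirrors the third WHEN arm shared by both CASE expressions.
  const incomingIsFresher =
    incoming.embeddedAt !== null &&
    (existing.embeddedAt === null || incoming.embeddedAt.getTime() > existing.embeddedAt.getTime());

  let embedding: number[] | null;
  if (textChanged) embedding = incoming.embedding;
  else if (existing.embedding === null) embedding = incoming.embedding;
  else if (incomingIsFresher) embedding = incoming.embedding;
  else embedding = existing.embedding;

  let embeddedAt: Date | null;
  if (textChanged && incoming.embedding === null) embeddedAt = null;
  else if (existing.embedding === null && incoming.embedding !== null) embeddedAt = incoming.embeddedAt;
  else if (incomingIsFresher) embeddedAt = incoming.embeddedAt;
  else embeddedAt = existing.embeddedAt;

  return { embedding, embeddedAt };
}
```

Reading it this way shows why a slower writer carrying an older `embedded_at` can no longer clobber a fresher vector when the text is unchanged.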
Closes the drift class Codex pass 2 P1-1 flagged: each consumer no longer hand-rolls the embed-then-stamp flow, so there's literally no way for them to diverge.

src/core/contextual-retrieval-service.ts:
- reembedPageWithContextualRetrieval() implements the D26 P0-2 two-phase build pattern.

  PHASE 1 (in-memory, no DB writes):
  - Load page + source + chunks
  - Resolve effective CR mode (resolver) with optional kill-switch short-circuit per D18
  - 'none' tier: skip wrap, stamp column, return early (records that the page is up-to-date relative to current state so the reindex sweep doesn't re-walk it)
  - 'title' tier: pure string concat with sanitized title prefix
  - 'per_chunk_synopsis' tier: read source text via fallback chain (D11), generate synopsis per chunk SEQUENTIALLY within page (D10), batch embedBatch ONCE per page (D27 P2-2). Rate-leasing hooks (acquireSynopsisLease/releaseSynopsisLease) supplied by the Minion handler; inline callers rely on gateway-level retry.
  - On refusal/empty/malformed (per D27 P1-2): RESTART PHASE 1 at 'title' tier for D14 page-level consistency (whole page demoted, no mid-state on disk).

  PHASE 2 (single DB transaction):
  - tx.upsertChunks(): chunk_text stays canonical per D20-T1; only the wrapped string went to the embedder, not into the column.
  - tx.updatePageContextualRetrievalState(): stamps both columns atomically with the chunk writes in the same PHASE 2 transaction.
- computeCorpusGeneration() composes the document-side provenance hash per D27 P1-5: sha256(cr_mode + synopsis_prompt_version + haiku_model + title_wrapper_version + embedding_model_tag).slice(0,16). Future prompt edits or model bumps invalidate prior cache rows via the query_cache.page_generations LEFT JOIN (lands in T11).
- computeSourceTextHash() for D27 P1-4 synopsis cache key composition.
- expectedModeForPageSourceOnly() helper for the T9 reindex sweep predicate.
- ReembedPageResult discriminated union: success | skipped (4 reasons) | page_fallback (refusal triggered D14) | transient_error | permanent_error.
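The corpus-generation hash described above, sha256 over five provenance fields truncated to 16 hex characters, might look roughly like this. The field list and the `.slice(0,16)` come from the commit text; the function name `sketchCorpusGeneration`, the input shape, and the `|` delimiter between fields are assumptions (the commit message does not say how the fields are joined).

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch; the input shape and the "|" delimiter are assumptions.
interface CorpusGenerationInputs {
  crMode: string;
  synopsisPromptVersion: string;
  haikuModel: string;
  titleWrapperVersion: string;
  embeddingModelTag: string;
}

function sketchCorpusGeneration(i: CorpusGenerationInputs): string {
  const material = [
    i.crMode,
    i.synopsisPromptVersion,
    i.haikuModel,
    i.titleWrapperVersion,
    i.embeddingModelTag,
  ].join("|");
  // First 16 hex characters of the SHA-256 digest.
  return createHash("sha256").update(material).digest("hex").slice(0, 16);
}
```

Any change to one of the five inputs yields a different tag, which is what lets a later cache check tell stale rows from current ones.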
Each consumer dispatches on `kind` to decide retry / surface / commit. New engine method (added to BrainEngine interface + both engines): - updatePageContextualRetrievalState(slug, sourceId, mode, corpusGeneration): narrow UPDATE of just the two CR-state columns + updated_at. Skips soft-deleted rows. Mirrors refreshPageBody's narrow-update pattern so we don't fire createVersion on every tier upgrade (which would bloat page_versions). Clean typecheck + 272 existing tests pass (no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T6: contextual_reindex_per_chunk Minion handler + protection Thin handler (D23) that wires the global Haiku rate-leaser (D26 P0-3) + delegates re-embed work to contextual-retrieval-service.ts (T5). One job per page (D10). Submitted by the mode-switch hook (T10), the reindex sweep (T9), and doctor --remediate (T13). src/core/minions/handlers/contextual-reindex-per-chunk.ts: - makeContextualReindexHandler(opts) factory closure. - Per-chunk Haiku call wrapped in acquireLease/releaseLease against the shared key 'anthropic:utility:contextual-synopsis'. Default RPM cap is 50 (Anthropic Haiku 4.5 published limit); operators on a tier with higher quota override via GBRAIN_CONTEXTUAL_HAIKU_RPM env var. - D27 P2-1 source-id derivation: payload carries only page_slug; handler loads the page row and uses its source_id as authoritative. Optional expected_source_id field on the payload triggers UnrecoverableError on mismatch (stale/malicious payload defense). - Result classification: success / page_fallback (D14) → ok transient_error → throw (Minion retries) permanent_error → UnrecoverableError → dead-letter - 60s poll-wait per Haiku call when the rate-lease is saturated; gives up with explicit error rather than blocking forever. 
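The handler's result classification listed above (success / page_fallback → ok, transient_error → throw, permanent_error → UnrecoverableError) maps naturally onto a switch over a discriminated union. The sketch below uses stand-in types: `ReembedResult` is a simplified guess at the union's shape, the local `UnrecoverableError` class is a placeholder for the project's own, and treating `skipped` as ok is an assumption the commit text does not state.

```typescript
// Stand-in for the project's UnrecoverableError; defined here only to keep the sketch self-contained.
class UnrecoverableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnrecoverableError";
  }
}

// Simplified guess at the union; the real type likely carries more fields.
type ReembedResult =
  | { kind: "success" }
  | { kind: "skipped"; reason: string }
  | { kind: "page_fallback" }
  | { kind: "transient_error"; message: string }
  | { kind: "permanent_error"; message: string };

function classifyReembedResult(result: ReembedResult): "ok" {
  switch (result.kind) {
    case "success":
    case "page_fallback": // D14: the page was demoted to the title tier, still a completed job
    case "skipped": // assumption: nothing to do counts as done
      return "ok";
    case "transient_error":
      // A plain throw lets the job queue retry.
      throw new Error(result.message);
    case "permanent_error":
      // Signals the queue to dead-letter instead of retrying.
      throw new UnrecoverableError(result.message);
  }
}
```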
src/core/minions/protected-names.ts: - contextual_reindex_per_chunk added to PROTECTED_JOB_NAMES with comment documenting the cost vector (1-50 Haiku calls per page, bulk MCP submission could drain user's Anthropic budget). src/commands/jobs.ts: - registerBuiltinHandlers wires the new handler via dynamic import. - Registered ABOVE autopilot-cycle so the handler is available when doctor --remediate proposes contextual_retrieval_coverage steps. Clean typecheck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T7: import-file.ts wraps at embed time, stamps CR state columns import-file.ts now resolves the effective CR mode for each page at embed time and applies the wrapper inline. Per D20-T1 critical invariant, the stored chunk_text stays canonical (powers FTS, snippets, reranker, debug); only the wrapped string goes to the embedder. Inline path scope (cost-discipline choice): - title-tier: inline wrap is free (pure string concat). Applied directly. - per_chunk_synopsis tier: TOO EXPENSIVE for the inline import path (one Haiku call per chunk on every sync would compound into hours of blocking per `gbrain sync`). The inline path lands the page at the title tier; the Minion-driven contextual reindex (T6 handler) upgrades it to per_chunk_synopsis later when the user accepts the cost prompt in the mode-switch hook (T10). Per D3 explicit-consent contract. - 'none' tier (conservative mode, kill-switch disabled): no wrapping, raw chunk_text → embedder unchanged from pre-v40.3 behavior. Code chunks (chunk_source='fenced_code') always bypass wrapping per D20-T4 — wrapChunkForEmbedding short-circuits. Stamping (alongside putPage in the same transaction): - pages.contextual_retrieval_mode → tier the page was just embedded at - pages.corpus_generation → composite hash via computeCorpusGeneration from the service module. NULL when 'none' tier or noEmbed=true. Override chain: page frontmatter > source row > global mode bundle (D5+D6). 
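The wrap-at-embed-time invariant (stored chunk_text stays canonical, only the embedder sees the wrapped string, fenced code bypasses wrapping) can be illustrated with a small sketch. The function names and argument order follow the commit text; the `<context>` block layout, the `Title:` / `Summary:` labels, and the single shared sanitizer are assumptions, not the actual wrapper format.

```typescript
// Illustrative sketch; the wrapper layout and labels are assumptions.
const MAX_PREFIX_FIELD_CHARS = 300;

// Strip the closing tag (injection vector), collapse whitespace, trim, cap length.
function sanitizeField(raw: string): string {
  return raw
    .replace(/<\/context>/gi, "")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, MAX_PREFIX_FIELD_CHARS);
}

function buildContextualPrefix(title: string | null, synopsis: string | null): string | null {
  const t = title ? sanitizeField(title) : "";
  const s = synopsis ? sanitizeField(synopsis) : "";
  if (!t && !s) return null;
  const lines: string[] = [];
  if (t) lines.push(`Title: ${t}`);
  if (s) lines.push(`Summary: ${s}`);
  return `<context>\n${lines.join("\n")}\n</context>`;
}

function wrapChunkForEmbedding(text: string, prefix: string | null, chunkSource: string): string {
  // Code chunks and prefix-less pages go to the embedder unchanged.
  if (chunkSource === "fenced_code" || prefix === null) return text;
  return `${prefix}\n\n${text}`;
}
```

A caller would pass the returned string to the embedder and keep writing the original `text` to `chunk_text`, so full-text search and snippets never see the wrapper.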
Mount-frontmatter trust gate (D15) — currently lookup uses defaults for source row; future T9 reindex sweep + T10 mode-switch hook can pass a richer source row when the per-source override lands. Kill switch (D18): when search.contextual_retrieval_disabled=true, the resolver short-circuits to 'none' and the wrapper is skipped. Clean typecheck + 251 unit tests pass (migrate + pglite-engine + import-file all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T9: reindex --markdown extends to catch CR state drift `gbrain reindex --markdown` predicate widens from chunker_version drift alone to also catch contextual_retrieval_mode IS NULL — the v0.40.3.0 upgrade-path signal that a page has never been evaluated against the CR ladder (pre-v81 brains where the column is freshly NULL after the migration ran). Pages enter the sweep when EITHER: (a) chunker_version < MARKDOWN_CHUNKER_VERSION (existing behavior) (b) contextual_retrieval_mode IS NULL (new — D26 P0-1 + D26 P0-4 prep) Since chunker_version 2→3 (T2) already forces every pre-v40 page into (a), the IS NULL clause is effectively a belt-and-suspenders for the case where a brain upgrades migrate but somehow the chunker_version bump didn't propagate (concurrent upgrade race, manual SQL edit, etc.). The re-import path uses importFromContent with forceRechunk:true (existing v0.32.7 behavior) which bypasses the content_hash short- circuit so the v0.40.3.0 import-file.ts wrapper application path (T7) actually applies. Each re-imported page picks up the active CR tier and stamps contextual_retrieval_mode + corpus_generation atomically. Page-frontmatter overrides are honored at re-import time (importFromFile re-parses YAML and the resolver picks the per-page tier). 
The frontmatter- mismatch drift case Codex P0-1 called for (user removes override after initial import) is partially handled here via the IS NULL+forceRechunk path; a v0.41+ wave can add the explicit "frontmatter may contain override" candidate path if real users hit drift the current predicate misses. Clean typecheck + 230 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T10: post-upgrade cost prompt explains contextual retrieval The existing post-upgrade-reembed.ts prompt fires automatically on `gbrain upgrade` because T2 bumped MARKDOWN_CHUNKER_VERSION 2→3. Prompt copy extended to explain WHY the re-embed is happening — without this, users see a "chunker-bump" prompt and wonder if it's a routine internal refresh vs the actual headline feature ship. formatReembedPrompt now appends a [contextual retrieval] line below the chunker-bump cost summary, mentioning that v0.40.3.0 wraps each chunk with its page title before embedding (Anthropic's published method). What the user sees on upgrade: [chunker-bump] Will re-embed ~N markdown pages via {model}, est. ~$X.XX, ~Ymin. Press Ctrl-C within Zs to abort. [contextual retrieval] v0.40.3.0 wraps each chunk with its page title before embedding (Anthropic's published method). Title-tier wrap is free at runtime (pure string concat, no Haiku) so the cost number stays unchanged from the chunker-bump-only case. The per-chunk Haiku synopsis tier is OPT-IN via `gbrain config set search.mode tokenmax` post-upgrade, which fires the contextual_reindex_per_chunk Minion handler (T6) for the backfill. T10 mode-switch hook in src/commands/config.ts (the explicit per-mode cost prompt UX on `gbrain config set search.mode tokenmax`) is deferred to v0.40.3.1 — the explicit-consent contract (D3) is satisfied by the existing post-upgrade prompt for the title-tier path that the wave ships by default. 
The Minion handler from T6 + the protected-name guard ensure that any direct Minion submission for the per-chunk path is gated on the CLI/doctor-remediate trust boundary. Kill switch (D18): the contextual_retrieval_disabled config key is honored at import time (T7) and in the service (T5) — when true, the resolver short-circuits to 'none' regardless of mode bundle. No hybridSearch changes needed: queries embed raw text already; the kill switch only affects NEW embeds. Existing wrapped vectors keep serving queries via cosine similarity (asymmetric retrieval is preserved). 11 upgrade-reembed-prompt tests pass + clean typecheck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T11-T13: query cache notes + remediation note + doctor check T11 (query_cache.page_generations contract): the DB column shipped in T1 migration v81 + KNOBS_HASH_VERSION 4 bump in T4 invalidates the common-case cache contamination (full-brain mode upgrade). The LEFT JOIN read-side gate per Codex P1-5 — for the edge case where a brain is mid- reindex and some pages are stamped at corpus_generation N+1 while others are still at N — is deferred to v0.40.3.1. In practice, the post-upgrade reembed prompt fires automatically + completes before search resumes on healthy brains, so the edge case is narrow. CHANGELOG documents the limitation. T12 (generic RemediationStep contract): the existing recommendation registry shape (sync/embed/backlinks/extract hardcoded) is extended via the doctor check below rather than refactored to a generic registry. Codex P1-6 called for the refactor; v0.40.3.1+ can absorb it once a real second consumer requires the same registration shape. T13 (contextual_retrieval_coverage doctor check): - New checkContextualRetrievalCoverage() in src/commands/doctor.ts. - Two SQL signals: pages.chunker_version < current + pages.contextual_ retrieval_mode IS NULL. Single COUNT...FILTER query is cheap on every brain size. 
- Audit summary line: reads ~/.gbrain/audit/synopsis-failures-*.jsonl via the v0.40.3.0 audit-synopsis module (T3). >5% page-level fallback rate surfaces explicitly so operators see the Haiku refusal signal. - Paste-ready fix: `gbrain reindex --markdown` — the v0.32.7 + v0.40.3.0 sweep covers both chunker_version drift AND CR mode drift per T9. - Status: ok when fully aligned + no recent failures; warn when drift exists (with the paste-ready fix in the message). - Wired into the standard doctor run alongside the other v0.36+ checks (abandoned_threads, calibration_freshness, etc.). Sources/mounts CLI surfaces (set-cr-mode + trust-frontmatter) deferred — the post-upgrade-reembed prompt + the per-page frontmatter override path cover the v0.40.3.0 operational workflow. Per-source override CLI is a power-user feature that can ship in v0.40.4+ once real federated- brain users surface specific friction. 48 doctor tests pass + clean typecheck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T14: 5 test files, 77 new tests, IRON-RULE regression coverage Test suite for the v0.40.3.0 contextual retrieval wave. 77 new test cases across 5 files, all green. Pins every IRON-RULE invariant end-to-end so future contributors can't silently regress the wave. test/contextual-retrieval-resolver.test.ts (29 tests): - 9-combo override matrix (page-fm > source-row > global, all permutations). - D15 mount-trust gate: host always trusted, mounts honor only when trust_frontmatter_overrides=true, rejected frontmatter surfaces via result.frontmatter_rejected_untrusted_mount for doctor. - D13 invalid frontmatter (typo + non-string + empty): falls through to source/global with raw value in invalid_frontmatter_value. - D18 kill switch: short-circuits to 'none' regardless of overrides. - D26 P0-4 crModeDistinct: NULL-aware comparison, matches SQL IS DISTINCT FROM semantics on every combination of NULL/defined args. 
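The resolver behaviors these tests pin (override chain, D15 mount-trust gate, D13 invalid frontmatter, D18 kill switch, NULL-aware comparison) can be sketched as follows. `CR_MODES`, `isCRMode`, `crModeDistinct`, `HOST_SOURCE_ID = 'default'`, and the two result flags are named in the commit text; the input shape, the `source` attribution labels, and the function name `resolveModeSketch` are assumptions.

```typescript
// Illustrative sketch; input shape and attribution labels are assumptions.
const CR_MODES = ["none", "title", "per_chunk_synopsis"] as const;
type CRMode = (typeof CR_MODES)[number];
const HOST_SOURCE_ID = "default";

function isCRMode(v: unknown): v is CRMode {
  return typeof v === "string" && (CR_MODES as readonly string[]).indexOf(v) !== -1;
}

// NULL-aware comparison, like SQL IS DISTINCT FROM: null and undefined compare equal.
function crModeDistinct(a: CRMode | null | undefined, b: CRMode | null | undefined): boolean {
  return (a ?? null) !== (b ?? null);
}

interface ResolveInput {
  frontmatterValue: unknown; // raw value from page YAML; undefined or null when absent
  sourceId: string;
  sourceMode: CRMode | null;
  trustFrontmatterOverrides: boolean;
  globalMode: CRMode;
  disabled: boolean; // D18 kill switch
}

interface ResolveResult {
  mode: CRMode;
  source: "kill_switch" | "frontmatter" | "source" | "global";
  invalid_frontmatter_value?: unknown;
  frontmatter_rejected_untrusted_mount?: boolean;
}

function resolveModeSketch(input: ResolveInput): ResolveResult {
  if (input.disabled) return { mode: "none", source: "kill_switch" };

  const flags: Pick<ResolveResult, "invalid_frontmatter_value" | "frontmatter_rejected_untrusted_mount"> = {};
  if (input.frontmatterValue !== undefined && input.frontmatterValue !== null) {
    const trusted = input.sourceId === HOST_SOURCE_ID || input.trustFrontmatterOverrides;
    if (!trusted) {
      flags.frontmatter_rejected_untrusted_mount = true;
    } else if (isCRMode(input.frontmatterValue)) {
      return { mode: input.frontmatterValue, source: "frontmatter" };
    } else {
      flags.invalid_frontmatter_value = input.frontmatterValue;
    }
  }
  if (input.sourceMode !== null) return { mode: input.sourceMode, source: "source", ...flags };
  return { mode: input.globalMode, source: "global", ...flags };
}
```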
test/embedding-context.test.ts (21 tests): - buildContextualPrefix: title-only, synopsis-only, both, neither. - wrapChunkForEmbedding: non-code wraps; D20-T4 fenced_code ALWAYS bypasses; null prefix passes through; image_asset wraps as text. - sanitizeTitle: </context> injection stripped (case-insensitive), whitespace collapsed, 300-char cap, trim semantics. - extractFirstTwoSentences: English boundaries, question marks, CJK delimiters, run-on cap, empty input, no-delimiter passthrough. - modeRequiresHaiku / modeRequiresWrapper guards. - D20-T1 IRON-RULE regression test: wrapping does not mutate input string reference (so caller's chunk_text safely flows to upsert). test/contextual-retrieval-service-pure.test.ts (16 tests): - computeCorpusGeneration: 16-char hex, deterministic, mode-sensitive, model-sensitive, TITLE_WRAPPER_VERSION stable. - computeSourceTextHash: D27 P1-4 cache invalidation key composition. - expectedModeForPageSourceOnly (T9 reindex predicate helper): kill switch returns none, source override beats global, invalid override falls through, all CR modes round-trip. test/audit-synopsis.test.ts (11 tests): - ISO-week filename rotation (stable for same week, different days). - logSynopsisFailure round-trip: kind, page_level_fallback flag, multi-event accumulation, detail 200-char cap. - summarizeSynopsisFailures aggregation: null on empty, by_kind counts, page_level_fallback_rate math. - Missing audit file returns empty (silent no-op). test/e2e/contextual-retrieval-pglite.test.ts (5 tests, hermetic PGLite + gateway stub): - IRON RULE #1 (D20-T1): wrapper text in embedder input but NEVER in content_chunks.chunk_text after import — pins the canonical chunk_text separation invariant end-to-end. - IRON RULE #2 (D14 stamping): pages.contextual_retrieval_mode AND pages.corpus_generation are set after every import. - IRON RULE: chunker_version stamps to current MARKDOWN_CHUNKER_VERSION (3 for v0.40.3.0). 
- D5 per-page frontmatter override: `contextual_retrieval: none` makes the embedder receive UNWRAPPED text; mode column stamped 'none'. - T9 reindex predicate: pages with contextual_retrieval_mode IS NULL enter the sweep regardless of chunker_version. 462 tests pass across all v0.40.3.0 + adjacent suites (migrate, pglite-engine, search-mode, doctor, import-file, upgrade-reembed-prompt, schema-bootstrap-coverage, recursive chunker, all five new files). Zero regressions, clean typecheck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 T15: VERSION + CHANGELOG + migration self-repair + llms regen VERSION 0.37.11.0 → 0.40.3.0 with package.json sync. CHANGELOG entry follows the CLAUDE.md ELI10-lead voice rule: opens with "Your search now understands what each chunk is about, not just what words are in it," lays out the tier ladder with a real cost table, calls out the chunk_text storage separation (D20-T1) with a concrete example, and includes the "Things to watch" + "What we caught and fixed before merging" sections per the format spec. CHANGELOG also includes the canonical "To take advantage of v0.40.3.0" self-repair block with the manual `gbrain apply-migrations --yes` + `gbrain reindex --markdown` recovery path for users whose `gbrain upgrade` post-upgrade-reembed didn't fully fire. skills/migrations/v0.40.3.0.md walks the agent through the mechanical upgrade flow, the opt-up to tokenmax path with the realistic backfill cost table, the opt-out soft kill switch flip, and the per-page frontmatter override with the D15 mount-trust note. Matches the v0.13.0 + v0.32.7 migration doc structure so agent muscle memory works. llms-full.txt + llms.txt regenerated via `bun run build:llms` to pick up the CHANGELOG + migration doc additions. test/build-llms.test.ts passes. 
Also moved test/audit-synopsis.test.ts → test/audit-synopsis.serial.test.ts to satisfy the check-test-isolation lint (the test mutates GBRAIN_AUDIT_DIR via beforeAll/afterAll for a fixture dir, which the parallel runner forbids in *.test.ts files; serial quarantine is the canonical fix per CLAUDE.md test-isolation rules). `bun run verify` passes (typecheck + 4 CI gate checks). 469 tests across all v0.40.3.0 + adjacent suites pass with 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.3.0 test gaps: doctor check coverage + concurrent race regression Post-T15 test gap-fill: covers the two highest-leverage spots that the T14 suite didn't exercise. test/contextual-retrieval-doctor.serial.test.ts (8 tests, .serial because the doctor check reads the audit JSONL via GBRAIN_AUDIT_DIR env mutation): - empty-brain → ok - fully-aligned brain (chunker_version current + mode stamped) → ok - chunker_version drift → warn with paste-ready `gbrain reindex --markdown` - NULL mode column → warn surfaces "never evaluated against CR ladder" - both drift conditions together → warn with both messages - soft-deleted pages NOT counted (deleted_at filter works) - non-markdown (code) pages NOT counted (page_kind filter works) - audit JSONL refusal event surfaces in the failure-summary line test/e2e/concurrent-embed-race.test.ts (3 tests, D24 regression guard): - cold path: existing embedding NULL → take new (no-race case) - IRON RULE: fresher write wins over stale write when text unchanged. Pre-fix this would have last-writer-wins via COALESCE; post-fix the fresher embedded_at survives. Pinned by raw SQL upsert with an explicit -5min embedded_at to simulate the slower writer. - text change with no new embedding → both embedding + embedded_at reset to NULL (consistent state so embed --stale picks up). 
Cross-shard contamination fix: race test calls configureGateway with embedding_dimensions=1536 BEFORE initSchema so the PGLite vector column sizes consistently regardless of what other tests in the same shard process configured first. Without this, running the race test alongside the pglite-e2e test triggered "expected 1280 dimensions, not 1536" when the gateway was left in its default ZE-1280 state by a prior file. `bun run verify` passes (typecheck + 5 CI gate checks). 88 tests pass across all v0.40.3.0 + new gap-fill files in one combined run; zero shared-state contamination. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.40.5.0 T2: schema — v90 contextual_retrieval_columns + v91 trigger + index Migration v90 (renamed from v0.40.3.0 v81 on master merge per D2/D7): - 5 additive columns (pages.contextual_retrieval_mode, pages.corpus_generation, sources.contextual_retrieval_mode, sources.trust_frontmatter_overrides, query_cache.page_generations) for the contextual retrieval wave. Migration v91 (NEW per D6 + codex #4 + codex #8): - pages.generation BIGINT NOT NULL DEFAULT 1 (per-page generation counter) - query_cache.max_generation_at_store BIGINT NOT NULL DEFAULT 0 (Layer 1 bookmark) - bump_page_generation_fn() trigger function: - BEFORE INSERT: NEW.generation := COALESCE(MAX(generation), 0) + 1 — codex #4 INSERT coverage so cache rows stored before a new page existed invalidate correctly. - BEFORE UPDATE: bumps generation only when allow-list columns IS DISTINCT FROM (compiled_truth, timeline, frontmatter, deleted_at, contextual_retrieval_mode, title, type, page_kind, corpus_generation, content_hash) per D6 widened to catch user-visible mutations. - CREATE INDEX CONCURRENTLY pages_generation_idx ON pages (generation) so MAX(generation) for the bookmark check is O(log N) — codex #8 confirmed plain btree, no DESC necessary. 
Mirrored in src/schema.sql, src/core/pglite-schema.ts CREATE TABLE body (trigger included so fresh PGLite installs get it from the schema blob, not just migration replay). Extended REQUIRED_BOOTSTRAP_COVERAGE with pages.contextual_retrieval_mode, pages.corpus_generation, sources.contextual_retrieval_mode, sources.trust_frontmatter_overrides, pages.generation. Probes added to applyForwardReferenceBootstrap on both engines + matching ALTER blocks for pre-v90/pre-v91 brains. COLUMN_EXEMPTIONS extended: query_cache.max_generation_at_store (same rationale as page_generations — query_cache is migration-only, not in PGLITE_SCHEMA_SQL). Test results: - bun test test/migrate.test.ts: 140 pass / 0 fail - bun test test/schema-bootstrap-coverage.test.ts: 9 pass / 0 fail - bun run typecheck: clean * v0.40.5.0 T3: cache gate — query-cache-gate.ts + lookup/store rewrites New pure module src/core/search/query-cache-gate.ts: - buildPageGenerationsSnapshot(engine, pageIds) builds the {pageId: gen} snapshot + MAX(generation) bookmark in one round trip via UNION ALL. Pre-v91 brains (no generation column) fall back to empty snapshot + zero bookmark — backward compat with legacy rows preserved. - validateCacheRowAgainstPages() — pure validator for unit testing. - CACHE_GATE_WHERE_CLAUSE exported as a SQL fragment that lookup() embeds in its WHERE clause. Two-layer gate per D11: Layer 1 (cheap): (SELECT MAX(generation) FROM pages) <= qc.max_generation_at_store Layer 2 (per-page): jsonb_each + LEFT JOIN pages to detect deletes + bumped pages on the cached result set. Legacy compat: rows with empty {} snapshot are vacuously valid (Layer 2 short-circuits) — IRON-RULE pinned. query-cache.ts wiring: - lookup() table-aliased to `qc` so the gate fragment can reference qc.max_generation_at_store + qc.page_generations. WHERE clause adds `AND ${CACHE_GATE_WHERE_CLAUSE}` after the existing similarity + TTL + knobs_hash filters. 
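The two-layer gate described above (a cheap MAX(generation) bookmark check, then a per-page snapshot comparison, with empty legacy snapshots treated as valid) can be modeled in memory like this. It is a reading aid under assumptions: the shapes `CachedRow` and `CurrentPages` and the name `isCacheRowValid` are hypothetical, and the real gate runs as a SQL WHERE fragment rather than in application code.

```typescript
// Illustrative in-memory model of the gate; shapes and names are hypothetical.
interface CachedRow {
  pageGenerations: Record<string, number>; // {pageId: generation} captured at store time
  maxGenerationAtStore: number; // Layer 1 bookmark
}

interface CurrentPages {
  maxGeneration: number; // MAX(generation) over pages right now
  generations: Map<string, number>; // live pages only; deleted pages are absent
}

function isCacheRowValid(row: CachedRow, current: CurrentPages): boolean {
  const ids = Object.keys(row.pageGenerations);
  // Legacy rows stored before the snapshot existed are vacuously valid.
  if (ids.length === 0) return true;
  // Layer 1: nothing anywhere has been bumped since the row was stored.
  if (current.maxGeneration <= row.maxGenerationAtStore) return true;
  // Layer 2: every cached page must still exist at the generation we recorded.
  return ids.every((id) => current.generations.get(id) === row.pageGenerations[id]);
}
```

Layer 2 is what keeps an unrelated page insert from evicting a cached result: the bookmark moves, but the pages in the snapshot are unchanged, so the row still serves.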
- store() captures the snapshot via the pure helper, then INSERTs both page_generations JSONB and max_generation_at_store BIGINT alongside the existing columns. ON CONFLICT (id) DO UPDATE refreshes both. Test coverage (15 unit + 6 e2e): - test/query-cache-gate.test.ts: 15 cases covering pure validator branches (vacuous valid, bookmark short-circuit, single/multi/partial bumps, deleted page, codex D11 critical case), PGLite-backed snapshot builder (empty pageIds, populated pageIds, integer JSONB shape, non-existent IDs skipped, bump-after-update), SQL shape regression on CACHE_GATE_WHERE_CLAUSE. - test/e2e/cache-gate-pglite.test.ts: 6 cases covering store → HIT, content UPDATE → MISS, INSERT new page → HIT (codex #4 case where bookmark fires but snapshot intact serves correctly), legacy row → HIT (IRON-RULE backward compat), soft-delete → MISS (trigger path), multi-page partial bump → MISS. Test results: - bun test test/query-cache-gate.test.ts test/query-cache.test.ts test/query-cache-isolation.test.ts test/e2e/cache-gate-pglite.test.ts: 33 pass / 0 fail - bun run typecheck: clean Note: hard-delete (raw DELETE FROM pages) is not covered by the trigger (BEFORE INSERT OR UPDATE doesn't fire on DELETE). Production uses soft-delete via deleted_at (trigger allow-list catches NULL → timestamp distinction). Hard-delete via admin-only `gbrain pages purge-deleted` is best-effort cache-wise — acceptable for the rare admin path. * v0.40.5.0 T5: mode-switch UX at gbrain config set search.mode New module src/core/search/mode-switch-ux.ts: - summarizeTransition(old, new): pure 5-cell matrix (no_change / narrowing / broadening / tokenmax_opt_in / invalid_new_mode) + reindex command + cost estimate + paste-ready callout lines. - probeWorkerAvailable(engine): worker liveness proxy. gbrain has no minion_workers heartbeat table yet (B7 follow-up from v0.19.1), so we use a proxy: minion_jobs activity within 10-min query window. 
Within 2 min = active; >2min but <10min = stale; nothing = never_seen. - buildReindexIdempotencyKey(): content-stable per codex D12 Bug 1. Pattern: cr-backfill:<source_id>:<chunker_version>:<mode>. NOT timestamp-based — two retries against same brain state dedupe. - runModeSwitchUx(): orchestrator. Honors GBRAIN_NO_MODE_SWITCH_UX=1 (full skip), non-TTY (print paste-ready hints to stderr), yesFlag (auto-submit reindex). For tokenmax_opt_in + TTY + worker probe active: submits via MinionQueue.add with allowProtectedSubmit=true. For probe = stale or never_seen: loud-fail per D3 with a "start a worker OR run inline" recovery hint — closes the silent-stall footgun. src/commands/config.ts hook (~30 LOC): - Captures the OLD search.mode BEFORE setConfig so summarizeTransition classifies correctly. - Fires runModeSwitchUx() AFTER setConfig persisted, wrapped in try/catch so UX failures never break the config-set that already landed. - Best-effort: failures emit `[mode-switch] UX hook failed (non-fatal)` to stderr. Test coverage (18 cases): - summarizeTransition: 8 cases covering all 5 transition kinds + null inputs + tokenmax-as-first-set + invalid mode. - probeWorkerAvailable: 4 cases via real PGLite — never_seen / active / stale (seeded via minion_jobs) + threshold constant assertion. - buildReindexIdempotencyKey: 6 cases pinning content-stable contract (codex D12 Bug 1) — identical inputs match, different inputs differ, consecutive calls match despite time delta (NOT timestamp-based). Test results: - bun test test/mode-switch-ux.test.ts: 18 pass / 0 fail - bun run typecheck: clean * v0.40.5.0 T6: gbrain mounts {enable,disable,trust-frontmatter,untrust-frontmatter} Four new mounts CLI verbs per D4: - gbrain mounts enable <id> — re-enable a disabled mount - gbrain mounts disable <id> — toggle off without removing - gbrain mounts trust-frontmatter <id> — let this mount's per-page contextual_retrieval_mode frontmatter override the source default. 
Off by default for mounted brains; host is always trusted. - gbrain mounts untrust-frontmatter <id> — clear the trust flag. Implementation: - src/core/brain-registry.ts MountEntry interface extended with trust_frontmatter_overrides?: boolean. loadMounts() projection threads the field through with default false (mounts opt in explicitly per D4 + D15 security posture). - src/commands/mounts.ts: new runSetMountFlag() helper handles all 4 verbs via a shared file-write path. Missing-mount loud rejection (GBrainError with list-hint). Host brain rejection. Idempotent: no-op when current value already matches. Cache refresh after each write so host agents see the new flag immediately. Test infrastructure: - GBRAIN_MOUNTS_PATH env override on getMountsPath() in BOTH brain-registry.ts AND mounts.ts (the latter has its own copy — two source-of-truth paths). Reason: libuv caches homedir() on some platforms, so withFakeHome's HOME mutation isn't picked up by tests calling runMounts(). Production callers don't set the env. Test coverage (5 new cases): - enable → disable → enable cycle persists - trust-frontmatter → untrust → trust cycle preserves other fields - missing mount id → loud rejection with list-hint (closes the critical gap from idempotent-pebble Failure Modes table) - host brain rejection: cannot trust-frontmatter "host" - enable on already-enabled mount: no-op (idempotent) Test results: - bun test test/mounts-cli.test.ts test/brain-registry.serial.test.ts: 54 pass / 0 fail - bun run typecheck: clean * v0.40.5.0 T7: gbrain sources set-cr-mode + missing-source loud rejection New verb `gbrain sources set-cr-mode <id> <mode>` per D5: - Mode argument validated against CR_MODES via isCRMode (closed enum: none | title | per_chunk_synopsis). - "unset" / "default" / "" clears the column to NULL (falls through to the global search.mode bundle). 
- Loud rejection on: - Missing id/mode → exit 2, prints usage - Invalid mode → exit 2, lists valid options - Missing source id → exit 4, paste-ready `gbrain sources list` hint (closes the idempotent-pebble Failure Modes critical gap) src/commands/sources.ts wired into the switch dispatch + help text updated. isCRMode + CR_MODES lazy-imported per existing import pattern in this file. Test coverage (10 cases): - happy path for all 3 valid CRMode values - unset path via "unset" + "default" both clear to NULL - invalid mode → exit 2 + no mutation - missing source id → exit 4 - missing arguments → exit 2 with usage - missing mode (only id) → exit 2 + no mutation - round-trip preserves other fields (name) Test results: - bun test test/sources-set-cr-mode.test.ts: 10 pass / 0 fail - bun run typecheck: clean * v0.40.5.0 T8: RemediationStep refactor + makeRemediationStep factory New canonical module src/core/remediation-step.ts: - RemediationStep interface (lifted from brain-score-recommendations.ts). Same shape; rename to "Step" suffix per D6 for clarity ("a step in a remediation plan"). - RemediationSeverity + RemediationStatus type re-exports. - canonicalJson(value): zero-dep canonical serialization — sorts object keys recursively before stringify. Per codex D12 Bug 2: identical logical params hash identically regardless of insertion order. - idempotencyKey(source, job, params): shape <source>:<job>:sha8(canonicalJson(params)). Lifted from the legacy inline idemKey helper so future check authors don't drift. - makeRemediationStep(opts): canonical factory. Defaults id to the idempotency key (override for human-readable like 'sync.repo'). Status defaults to 'remediable'. All check authors should use this; hand-rolling is the drift hazard the refactor closes. src/core/brain-score-recommendations.ts: - Removed the local Remediation + RemediationSeverity + RemediationStatus definitions. - Re-exports them from remediation-step.ts so existing callers (e.g. 
doctor.ts) still resolve. Also re-exports Remediation as an alias for RemediationStep so import paths can migrate gradually. - Imports type Remediation alias internally so the (substantial) existing computeRecommendations body keeps compiling without sed pass. Test coverage (17 cases): - canonicalJson: key-ordering determinism (3 cases), nested objects, array order preservation, primitive types, codex D12 Bug 2 regression - idempotencyKey: shape regex, content invariance, key-ordering invariance, source/job/params differentiation - makeRemediationStep: default id, explicit id override, default status, canonical-JSON invariance, all-opts threadthrough - back-compat: `import { Remediation } from brain-score-recommendations` still resolves to RemediationStep (compile + runtime check) Test results: - bun test test/remediation-step.test.ts: 17 pass / 0 fail - bun test test/brain-score-recommendations.test.ts test/doctor.test.ts: 70 pass / 0 fail (back-compat preserved) - bun run typecheck: clean Per D6 + D8: T8b in next commit wires lint, integrity, sync_failures doctor checks to emit RemediationStep via the new factory. * v0.40.5.0 T8b: RemediationStep consumers — integrity + sync_failures + 3 Minion handlers Doctor checks now emit RemediationStep via makeRemediationStep(): - `integrity` check (when bareHits > 0) emits integrity-auto step. Severity escalates to 'high' when bareHits > 50. Deterministic; $0 cost. - `sync_failures` check (when unacked > 0) emits sync-retry-failed step. Severity escalates to 'high' when count >= 10. Content-stable params (failure_count + oldest_failure timestamp) per codex D12 Bug 2. - sync-skip-failed DELIBERATELY NOT emitted per D12 Bug 3 (auto-skipping failed syncs hides data loss). Operators retain `gbrain sync --skip-failed` as a direct CLI option. Lint doctor check NOT wired — there is no `lint` check in doctor.ts today; the lint workflow is the standalone `gbrain lint` command. 
Adding a doctor lint check is a v0.41+ TODO when it justifies its own complete section. Three new Minion handlers in registerBuiltinHandlers (NOT in PROTECTED_JOB_NAMES — they're thin wrappers around already-shipping CLI commands, idempotent, no shell exec, MCP-safe): - lint-fix → runLintCore({ fix: true }) - integrity-auto → runIntegrity(['auto']) - sync-retry-failed → runSync(['--retry-failed']) Check.remediation field shape upgrade: - Was: inline Array<{...}> shape. - Now: RemediationStep[] from the canonical src/core/remediation-step.ts. Check authors `import { makeRemediationStep }` and emit through the factory. Test results: - bun test test/doctor.test.ts: 48 pass / 0 fail (zero regression on the doctor surface; new remediation fields are additive) - bun run typecheck: clean * v0.40.5.0 T11: capture-generation regression test (D3 + codex #5) The v0.38 ingestion cathedral added a new write path to pages via the `ingest_capture` Minion handler. The v0.40.5.0 cache-invalidation gate relies on pages.generation being bumped by EVERY write path via the BEFORE INSERT OR UPDATE trigger. This file pins that the new v0.38 capture write path correctly bumps generation through three scenarios: 1. INSERT path (codex #4 INSERT coverage): ingest_capture with a fresh slug creates a page with generation = MAX(generation) + 1 so any cache row stored before the new page existed has its bookmark fire. 2. UPDATE path: ingest_capture with an existing slug + new content → trigger fires on content-column IS DISTINCT FROM and bumps generation. 3. Idempotent UPDATE: capture with the SAME content → trigger short-circuits, no bump. Cache freshness preserved on re-runs. Per codex #5 strengthening: noEmbed: true is set explicitly so the test doesn't require API keys (test runs against pure PGLite). 
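The cache gate that these generation bumps feed can be sketched as a pure function. This is a hedged illustration of the two-layer check described for T3, not the real `validateCacheRowAgainstPages` (whose signature is not shown in this thread): `CacheRowSketch`, `validateCacheRowSketch`, and the `live` map are hypothetical names.

```typescript
// Hypothetical shape of a cached row's freshness metadata.
interface CacheRowSketch {
  pageGenerations: Record<string, number>; // {pageId: generation} snapshot at store time
  maxGenerationAtStore: number;            // Layer 1 bookmark
}

// live: current generation per page id; hard-deleted pages are simply absent.
function validateCacheRowSketch(
  row: CacheRowSketch,
  live: Map<string, number>,
  currentMaxGeneration: number,
): boolean {
  // Layer 1 (cheap): nothing anywhere has been bumped since the row was stored.
  if (currentMaxGeneration <= row.maxGenerationAtStore) return true;

  // Layer 2 (per-page): every page in the cached result set must still exist
  // at the generation recorded in the snapshot. An empty snapshot (legacy row)
  // passes vacuously because every() on an empty array is true.
  return Object.entries(row.pageGenerations).every(
    ([pageId, gen]) => live.get(pageId) === gen,
  );
}
```

The "INSERT new page → HIT" case falls out of this shape: the new page raises the maximum so Layer 1 fails, but the snapshot pages are untouched, so Layer 2 still accepts the row.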
Test results: - bun test test/e2e/capture-generation-regression.test.ts: 3 pass / 0 fail - bun run typecheck: clean * v0.40.5.0 T9: docs — CHANGELOG fold-in + CLAUDE.md + migration skill + llms regen Single combined v0.40.5.0 CHANGELOG entry folds in v0.40.3.0 contextual retrieval content + v0.40.5.0 wave additions (cache gate + mode-switch UX + mounts/sources CLI + RemediationStep refactor). Voice per CLAUDE.md: ELI10 lead, plain language, paste-ready commands, tier table, "Things to watch", "What we caught and fixed before merging" (summarizes the 8 codex findings + 3 design decisions in user-facing terms), "Itemized changes", "## To take advantage of v0.40.5.0" mandatory self-repair block. CLAUDE.md: new section "Key commands added in v0.40.5.0 (contextual retrieval + cache gate + 4 CLI verbs)" listing the 4 new mount verbs, sources set-cr-mode, mode-switch UX, KNOBS_HASH_VERSION bump, 3 new Minion handlers, and the 3 new modules (remediation-step, query-cache-gate, mode-switch-ux). skills/migrations/v0.40.5.0.md: new migration skill with feature_pitch frontmatter for the auto-update agent. Documents the 6 master commits merged in, migration v90 (renumber from v81) + v91 (trigger), the optional opt-up to tokenmax, per-source CR mode overrides, mount frontmatter trust, the soft kill switch, and the backward-compat guarantees. bun run build:llms refreshed llms.txt + llms-full.txt: - llms.txt: 4314 bytes - llms-full.txt: 578257 bytes Test results: - bun test test/build-llms.test.ts: 7 pass / 0 fail (committed bundles byte-match generator output) * v0.40.5.0 T10: fix 5 unit-suite drift failures from the wave KNOBS_HASH_VERSION bumped 4→5 per D8 (sequenced behind salem's pending v=4 graph-signals work). Three test files held stale ==3 / ==4 assertions: - test/search-mode.test.ts: assertion + comment updated to v=5. - test/search/knobs-hash-reranker.test.ts: assertion + describe name updated to v=5 ladder. 
- test/cross-modal-phase1.test.ts: assertion + name updated to v=5. reindex.test.ts "skips pages already at current chunker_version" — the v0.40.3.0 reindex predicate (`chunker_version < CURRENT OR contextual_retrieval_mode IS NULL`) caught the should-skip page because its CR mode was NULL. Fixed by seeding `contextual_retrieval_mode = 'title'` on the should-skip row. reindex.test.ts "idempotent: re-run on a fully-updated brain reports nothing to do" — by design, `--no-embed` reindex bumps chunker_version but skips CR-state stamping (import-file.ts:457-466 documents this). Fixed by manually stamping `contextual_retrieval_mode = 'title'` between the first and second reindex calls so the brain matches the "fully updated" state the idempotency test name implies. Production embed flow stamps both in one pass; the test uses --no-embed only to avoid requiring API keys. Test results: - bun run verify (typecheck + 4 pre-checks): clean - bun run test: 9482 pass / 0 fail / 0 skip across 410s * v0.40.3.0: rename version from 0.40.5.0 → 0.40.3.0 (clean slot above master) Master is at v0.40.2.0; v0.40.3.0 is genuinely the next free slot. The wave was originally planned as v0.40.5.0 sequenced behind salem (PR #1300 = v0.40.4.0) but the user is shipping THIS branch as v0.40.3.0 because: 1. v0.40.3.0 IS the canonical version slot for the contextual retrieval cathedral (matches branch name garrytan/v0.40.3.0-contextual-retrieval). 2. Master is at v0.40.2.0 — v0.40.3.0 is the immediate next slot, not a collision. 3. salem's v0.40.4.0 + any v0.40.5.0 work sit ON TOP of this in the landing train, not under it. 
Mechanical rename only — no content changes from the v0.40.5.0 commit sequence (T1-T11 wave is preserved verbatim, just relabeled): - VERSION + package.json: 0.40.5.0 → 0.40.3.0 - bun.lock: refreshed (no dep changes) - CHANGELOG.md: ## [0.40.5.0] header → ## [0.40.3.0] + body references - skills/migrations/v0.40.5.0.md → skills/migrations/v0.40.3.0.md (previous v0.40.3.0.md file overwritten with the richer T9 content) - CLAUDE.md: "Key commands added in v0.40.5.0" → "v0.40.3.0" - 30 source + test files: comment references swept via sed s/0.40.5.0/0.40.3.0/g - llms.txt + llms-full.txt: regenerated Migration numbering UNCHANGED: v90 (renamed from original v81 because master took v82-v88) and v91 (new trigger migration) stay at v90/v91 — the version slot is orthogonal to the migration ledger collision. KNOBS_HASH_VERSION = 5 stays — sequenced behind master's v=4 schema-pack work; salem's v=4 graph-signals will rebump to v=5 if it lands first. Test results after rename: - bun run verify: clean (typecheck + 7 pre-checks) - bun run test: 9482 pass / 0 fail / 0 skip Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(migrate): v91 CREATE INDEX CONCURRENTLY can't run inside a transaction (CI Tier 1) CI Tier 1 (Mechanical) failed on real Postgres with: ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block STATEMENT: <v91 multi-statement SQL block including CREATE INDEX CONCURRENTLY ...> Root cause: postgres.js's multi-statement `.unsafe()` wraps the entire block in an implicit transaction. `transaction: false` on the migration entry doesn't help — the implicit wrap happens at the driver layer, below the migration runner. CONCURRENTLY refuses to run inside any transaction. Fix: rewrite v91 using the v14 pages_updated_at_index handler pattern — `sql: ''` + `handler:` function that splits the work into separate `engine.runMigration()` calls: 1. 
Columns + trigger function + trigger (single multi-statement runMigration — ALTER/CREATE FUNCTION/CREATE TRIGGER are transaction-safe). 2. On Postgres only: pre-drop invalid index remnant via `pg_index.indisvalid` (matches v14 pattern for retry safety after a failed CONCURRENTLY left a half-built index with the target name). 3. CREATE INDEX CONCURRENTLY as a standalone runMigration call (separate statement = no implicit transaction wrap). 4. PGLite: plain CREATE INDEX (no CONCURRENTLY needed — single writer). Verified against real Postgres (pgvector:pg16): - schema_version=91 after init - pages_generation_idx exists with btree shape - bump_page_generation_trg installed - test/e2e/postgres-bootstrap.test.ts + test/e2e/schema-drift.test.ts: 8 pass / 0 fail - bun test test/migrate.test.ts test/schema-bootstrap-coverage.test.ts: 161 pass / 0 fail - bun run typecheck: clean --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
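The handler split described in the v91 fix can be sketched as a planner that returns one SQL batch per `runMigration` call. This is an illustration under stated assumptions: `MigrationEngineSketch`, its `kind` field, `planV91Batches`, and `applyV91Sketch` are hypothetical, and the DDL is abbreviated (the trigger function and trigger bodies are omitted).

```typescript
// Hypothetical engine surface; the real engine.runMigration signature may differ.
interface MigrationEngineSketch {
  kind: "postgres" | "pglite";
  runMigration(sql: string): Promise<void>;
}

// Returns the ordered SQL batches. Each array element is meant to be sent as
// its own runMigration call so the index build is never part of a
// multi-statement block that the driver wraps in an implicit transaction.
function planV91Batches(kind: "postgres" | "pglite"): string[] {
  const transactionSafe = `
    ALTER TABLE pages ADD COLUMN IF NOT EXISTS generation BIGINT NOT NULL DEFAULT 1;
    ALTER TABLE query_cache ADD COLUMN IF NOT EXISTS max_generation_at_store BIGINT NOT NULL DEFAULT 0;
    -- trigger function and trigger definitions omitted for brevity
  `;
  if (kind === "pglite") {
    // Single writer: a plain index build is enough.
    return [transactionSafe, "CREATE INDEX IF NOT EXISTS pages_generation_idx ON pages (generation)"];
  }
  // Retry safety: drop a half-built index left behind by a failed earlier attempt.
  const dropInvalidRemnant = `
    DO $$
    BEGIN
      IF EXISTS (
        SELECT 1 FROM pg_index i JOIN pg_class c ON c.oid = i.indexrelid
        WHERE c.relname = 'pages_generation_idx' AND NOT i.indisvalid
      ) THEN
        DROP INDEX pages_generation_idx;
      END IF;
    END $$;
  `;
  return [
    transactionSafe,
    dropInvalidRemnant,
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS pages_generation_idx ON pages (generation)",
  ];
}

async function applyV91Sketch(engine: MigrationEngineSketch): Promise<void> {
  for (const sql of planV91Batches(engine.kind)) {
    await engine.runMigration(sql); // one call per batch
  }
}
```

Keeping the plan separate from the runner makes the ordering testable without a database; the concurrent index build is always the last batch and always a single statement.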
Master landed v0.40.1.0 → v0.40.3.0 while this branch sat (LongMemEval batch, trajectory routing, contextual retrieval). Both waves added a ModeBundle knob; both bumped KNOBS_HASH_VERSION. Resolution keeps both knobs co-existing and consolidates the cache-key version at 5 (master's contextual-retrieval wave explicitly sequenced behind salem's pending v=4 graph signals: first to land claims v=4, second rebases to v=5).

Conflicts resolved:
- VERSION + package.json + CHANGELOG.md: kept 0.40.4.0
- src/core/search/mode.ts: kept BOTH graph_signals (mine) and contextual_retrieval + contextual_retrieval_disabled (master) across ModeBundle interface, all 3 MODE_BUNDLES bundles, SearchKeyOverrides, SearchPerCallOpts, resolveSearchMode picker, loadOverridesFromConfig, and SEARCH_MODE_CONFIG_KEYS. Single KNOBS_HASH_VERSION = 5 with merged comment chain. parts[] order: gs= and pack= at v=4 tier; cr= and crd= at v=5 tier.
- src/commands/search.ts: KNOB_DESCRIPTIONS includes both knob blocks.
- test/search-mode.test.ts: canonical-bundle assertions include both fields per mode (graph_signals + contextual_retrieval); single expect(KNOBS_HASH_VERSION).toBe(5) with combined rationale comment.
- test/search/knobs-hash-reranker.test.ts: single version-5 assertion with ladder explanation (1→2 reranker; 2→3 floor_ratio + cross-modal; 3→4 graph_signals + schema_pack; 4→5 contextual_retrieval).
- test/cross-modal-phase1.test.ts: same single version-5 assertion.

Verified: bun run typecheck clean; mode + cross-modal + graph-signals test suites (142 tests) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
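To illustrate why both waves had to agree on a single version and a fixed `parts[]` order, here is a rough sketch of a versioned cache-key string. Only the `gs=`, `pack=`, `cr=`, and `crd=` prefixes and the version number come from the commit message; `KnobsSketch`, `knobsKeySketch`, the value types, and the `|` separator are assumptions, and the real code presumably hashes the result rather than using it directly.

```typescript
const KNOBS_HASH_VERSION_SKETCH = 5;

// Hypothetical knob shape; the real ModeBundle has more fields.
interface KnobsSketch {
  graph_signals: boolean;
  schema_pack: string;
  contextual_retrieval: string;
  contextual_retrieval_disabled: boolean;
}

function knobsKeySketch(k: KnobsSketch): string {
  const parts = [
    `v=${KNOBS_HASH_VERSION_SKETCH}`,
    // v=4 tier
    `gs=${k.graph_signals ? 1 : 0}`,
    `pack=${k.schema_pack}`,
    // v=5 tier
    `cr=${k.contextual_retrieval}`,
    `crd=${k.contextual_retrieval_disabled ? 1 : 0}`,
  ];
  // Fixed order matters: reordering parts would change every key.
  return parts.join("|");
}
```

Because the version is part of the key, bumping it invalidates every cache row stored under the previous knob layout, so two waves that each add a knob need one agreed final version rather than two competing bumps.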
garrytan added a commit that referenced this pull request May 25, 2026
CI failure surfaced a time-dependent test flake in `test/audit/audit-writer.test.ts` "returns events from current week, filtered by ts cutoff" (added in v0.40.4.0 PR #1300). The test pinned synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events with synthetic ts values, then called `readRecent(7, now)` expecting to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename routing and ALWAYS wrote to the file matching real-time-now's ISO week. When real CI time crossed into 2026-W22 (this Monday), the events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file matching that ts's ISO week. Falls back to real-now when ts is missing or unparseable.
- No behavior change for production callers: none of the 5 audit consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback, content-sanity-audit, graph-signals, supervisor-audit). The writer stamps real-now → both ts and filename use real-now → same file as before.
- Sibling test "honors caller-supplied ts override" also pinned a fixed ts and would have broken from the opposite angle (test read from `computeFilename()` default = real-now). Updated to read from `computeFilename(new Date(fixedTs))` so it asserts the per-row file routing the wave now provides.

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time crossed into a different ISO week than the test's synthetic now. NOT introduced by this PR (#1377 community-PR-wave): audit-writer files aren't touched by the wave.
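The routing rule in this fix can be sketched as follows. This is a minimal illustration, not the shipped `createAuditWriter` code: `isoWeekParts`, `filenameFor`, and `routeEvent` are hypothetical names, and the `<prefix>-<year>-W<week>.jsonl` filename format is an assumption made for the example.

```typescript
// ISO 8601 week: the week (Mon..Sun) belongs to the year that contains its Thursday.
function isoWeekParts(d: Date): { year: number; week: number } {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dow = t.getUTCDay() || 7;            // Mon=1 .. Sun=7
  t.setUTCDate(t.getUTCDate() + 4 - dow);    // move to the Thursday of this week
  const year = t.getUTCFullYear();
  const dayOfYear = (t.getTime() - Date.UTC(year, 0, 1)) / 86400000 + 1;
  return { year, week: Math.ceil(dayOfYear / 7) };
}

// ASSUMPTION: illustrative filename format, not necessarily the real one.
function filenameFor(prefix: string, d: Date): string {
  const { year, week } = isoWeekParts(d);
  return `${prefix}-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

// Route by the event's own ts when it parses; otherwise fall back to `now`.
function routeEvent(prefix: string, event: { ts?: string }, now: Date = new Date()): string {
  const parsed = event.ts ? new Date(event.ts) : null;
  const when = parsed !== null && Number.isFinite(parsed.getTime()) ? parsed : now;
  return filenameFor(prefix, when);
}
```

With this shape, a test that pins a synthetic `now` and synthetic `ts` values reads and writes the same weekly file no matter when CI actually runs.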
garrytan added a commit that referenced this pull request May 25, 2026
…ed dream judge (6 community PRs) (#1377) * fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads `readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows (Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory, open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem path. Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the documented Node.js idiom and works on every platform. No behavior change on Unix — same syscall path, same semantics. Repro on Windows before the fix: echo "test" | gbrain put my-page ENOENT: no such file or directory, open '/dev/stdin' After: round-trip put/search/delete works on Windows Git Bash. * v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp Adds local reranker support so users can point gbrain's reranker call at their own llama.cpp server instead of ZeroEntropy's hosted API. One new recipe (`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number` extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker `FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on local rerank, and a doctor-probe divergence fix (probe and live search now read the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`). ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of scope — different wire shapes need adapter hooks designed against their actual shapes in a follow-up plan. 
Verification: - `bun run verify` (typecheck + 13 pre-checks): clean - `bun run check:all` (15 historical checks): clean - 107/107 expect() calls pass across 5 affected test files - /codex review against the full diff: GATE PASS (caught one [P2] /v1 path doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`) - Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs (none currently exploitable; hardening for future contributor traps) Test surface (107 cases, 5 files): - test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms honored, empty models[] accepts any id, ZE regression - test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard + base_url + path concat assertion (codex-caught /v1/v1/ regression) - test/search-mode.test.ts: timeout precedence chain (per-call > config > recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough - test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane read, mode default, disabled, override, DB-error graceful fallback - test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary model id + chat-kind TX2 hard-fail preserved Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: post-ship documentation sync * docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker) The hand-curated llms-config.ts doc map never included docs/ai-providers/, so both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers" section with both. Marked includeInFull: false (setup walkthroughs belong in the index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same treatment CHANGELOG.md gets. Caught by the /ship document-release subagent. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: recipe-aware embedding-provider check for local providers doctor --remediation-plan and autopilot both judged the embedding provider with a hosted-only key check, so a brain on ollama: or llama-server: was reported "blocked" on a missing API key it never needed, contradicting doctor --json's 100%-coverage health. Extract a shared embeddingProviderConfigured() helper into brain-score-recommendations.ts: empty auth_env.required (local providers) is configured with no key; hosted providers check their OWN required key. Both producers (doctor, autopilot) call it, killing the DRY violation that caused the bug. Hosted brains with a missing key still block. * fix(budget): price local embed providers at $0 A --max-cost-bounded embed/reindex job configured for ollama: or llama-server: TX2 hard-failed with no_pricing because lookupEmbeddingPrice has no entry for local models. Add FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS) so a pricing miss on a local-inference provider returns $0 instead of null. lmstudio/litellm intentionally excluded. * feat(models): embedding reachability probe in gbrain models doctor A down/misconfigured local embed server was invisible until first embed. Add probeEmbeddingReachability() (mirrors the reranker probe): a 1-input embed with a 5s abort timeout, classified via classifyError, under a new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first. * fix: don't count config-plane voyage/google keys as configured codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but buildGatewayConfig only threads openai/anthropic/zeroentropy config keys into the gateway env. A Voyage/Google brain with the key only in config.json would be judged "configured" and dispatch an embed.stale job that then fails auth at the gateway. 
Drop those two from the map so the producer closures resolve them by env var only, matching what the gateway can actually use. Pinned by a regression test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(dream): route significance judge through gateway.chat for multi-provider support Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern that closed #952 for runThink: construction-time provider/key probe returns null on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in try/catch for AIConfigError mid-run. Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter, Voyage, Ollama, llama-server, etc.) is now reachable via: gbrain config set models.dream.synthesize_verdict <provider>:<model> The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS in src/core/model-config.ts) is used unchanged. The exported JudgeClient interface signature is preserved for test-seam stability. The original community PR (#1349) shipped a custom fetch adapter that bypassed the gateway entirely. This reworked landing routes through the canonical seam so future provider additions automatically benefit, and a CI guard (T7) will land in this wave to prevent the bug class from re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0). 
Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com> * test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity 11 cases pin the gateway-routed JudgeClient adapter from T5: - A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved) - A2: returns a JudgeClient when chat provider is reachable - A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests) - A4: ChatResult.text → Anthropic.Message.content[0].text mapping - A5: empty text from gateway → graceful empty-text Anthropic.Message - A6: non-AIConfigError from gateway propagates to caller (no swallow) - A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop) - A8: makeJudgeClient returns null on unknown provider prefix - A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time) - R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text - R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is parsed-verdict semantic parity instead. Mirror pattern of test/think-gateway-adapter.test.ts for cross-site consistency with the v0.35.5.0 runThink migration. * ci: guard against direct Anthropic SDK construction in gateway-routed files New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for `new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk. Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming as adapter types. 
Comment lines (starting with `//` or ` *`) are excluded so historical references in JSDoc don't false-fire. Negative test in this commit's verification confirms: injecting `new Anthropic()` into synthesize.ts makes the guard exit 1 with a clear error pointing at the gateway adapter pattern; reverting restores the OK state. Wired into both `bun run verify` and `bun run check:all`. Closes the bug class that bit synthesize.ts in PR #1349 (which would have shipped a parallel fetch stack instead of routing through the canonical gateway). The same class previously bit think/index.ts and was fixed structurally in v0.35.5.0; this guard prevents either file from regressing. Extend GUARDED_FILES in the script when migrating another file off direct SDK construction. * docs(put_page): point Windows / pipe-buffer users at gbrain capture --file Extends the put_page op description (surfaced by `gbrain put --help`) with a one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file- as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer escape route: reads files as a Buffer first, scans the first 8KB for NUL bytes to refuse binary content, decodes to UTF-8 only after the safety check, and adds provenance write-through. Lands the user-facing value the closed PR #1365 was reaching for, without duplicating the CLI surface. Credits the original contributor. Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com> * test: R1+R2+R4 critical regression pins for the community-PR-wave landing Per the wave's eng-review plan (IRON RULE — mandatory): R1 — get_page handler accepts calls without `content` param. Pre-wave PR #1365 landed its `!p.content → throw` check in the WRONG handler (get_page instead of put_page), which would have broken every read in the system. Pin: get_page MUST NOT require content + the schema carries no `content` or `file` param. R2 — put_page schema content stays `required: true`. 
PR #1365 also flipped `content` from required→optional in the schema. Pin: the contract stays at `required: true` + the closed PR's `file` param is NOT in the schema. R4 — Cross-platform stdin via fd 0 (PR #1325 regression pin). Source-grep asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern assertions confirm the parseOpArgs branch shape (cliHints.stdin check, 5MB cap, isTTY gate) hasn't drifted. R3 (gateway-adapter parsed-verdict parity) lives in the sibling file test/cycle/synthesize-gateway-adapter.test.ts. * test(e2e): update dream-synthesize no-key reason text + harden hermeticity After T5's gateway-adapter rework, the "no API key" verdict text changed from 'no ANTHROPIC_API_KEY for significance judge' to 'no configured provider for verdict model: <model>' (broader + names the actual model so the user sees WHICH provider failed). Update both assertions that check the old text. Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only cleared the env var. After the rework, `makeJudgeClient` ALSO checks `loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts uses since v0.35.5.0). If the developer running the test has the key set in ~/.gbrain/config.json, the test would behave non-deterministically. Fix: override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore on return (even on throw). * test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end Drives runPhaseSynthesize against a real PGLite engine with the gateway chat transport stubbed to throw AIConfigError on every call (simulates a revoked/misconfigured provider surfacing mid-run). Asserts: - Phase does NOT crash; converts the throw to a per-transcript verdict with worth=false and reasons[0] matching "gateway error: ...". 
- status='ok' so subsequent transcripts in the loop would continue being judged (not visible in 1-transcript test, but the loop shape is proven not to abort).

Pre-rework (T5), this code path didn't exist — judgeSignificance threw directly to runPhaseSynthesize and crashed the whole phase. Pin so a future regression that removes the try/catch fires loudly.

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:
- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop + canonical config key + JudgeClient interface preserved + CI guard reference + test file references).
- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry documenting the CI guard's contract, scope, and how to extend GUARDED_FILES when migrating another file off direct SDK construction.

CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md annotations landed earlier in this wave (gateway-adapter synthesize.ts + check-gateway-routed-no-direct-anthropic.sh + the cherry-picked llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed formatRecipeTable's static 14-char PROVIDER column. When the id is longer than the column, padEnd is a no-op — the row starts with the tier name directly, no space delimiter. test/providers.test.ts 'each recipe appears at most once' iterates every recipe and asserts at least one row starts with `${id} ` or `${id} `; with no space after `llama-server-reranker`, the assertion fails and the recipe appears effectively missing from the human-readable list.
Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so every id is followed by at least one space, regardless of length. Also widens the separator rule to match. 14 stays as the floor so the existing short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar layout when llama-server-reranker isn't in the active recipe set.

10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:
- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor` "spends ~1 token per model" but the wave added an `embed(['probe'])` call AND a reranker probe. Generalize to "spends a minimal request per configured chat/embed/rerank surface" so the cost expectation matches reality.
- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability` doesn't hardcode 5000ms while the sibling reranker probe reads the recipe's configured timeout. Local CPU embedding endpoints (llama-server) hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is "re-run the probe" per the existing JSDoc.

Other informational findings from pre-landing review either match established patterns (no behavioral test for `probeEmbeddingReachability`, matching `probeRerankerReachability`), are intentional choices documented in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf in non-hot paths (autopilot's 4 sequential `getConfig` awaits per 5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:

1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the default-import shape `import Anthropic from '@anthropic-ai/sdk'`.
A future contributor (or, more realistically, a future refactor) could bypass with:
- `import { Anthropic } from '@anthropic-ai/sdk'`
- `import { Anthropic as A } from '@anthropic-ai/sdk'`
- `import * as Anthropic from '@anthropic-ai/sdk'`
- `const x = await import('@anthropic-ai/sdk')`

Tightened the regex to match ANY value-shaped import from the SDK module (excluding only the explicit `import type ... from '@anthropic-ai/sdk'` form which the adapter's Anthropic.Message return type needs). Added a second grep for dynamic imports. Verified all four bypass shapes now trigger the guard against synthesize.ts; type-only import still passes.

2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the array-of-blocks shape for future flexibility" — but the mapping flattens ONLY text blocks; `tool_use`, `tool_result`, image blocks silently become empty strings. Today only `judgeSignificance` calls this and it only sends string content, so no behavior bug. But the comment was marketing future flexibility the code doesn't deliver. Narrowed to call out the silent-drop and say to extend the mapping if a future caller wires non-text content through.

Both wave-scope: the CI guard was added by the wave, the JSDoc was added by the wave's T5 rework. Adversarial review caught them before merge.

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence: production `hybridSearch` resolves reranker timeout via the full chain (per-call > config > recipe > bundle) by going through `loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability` was reading ONLY the recipe's `default_timeout_ms` — so an operator who set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report "reachable" while production search timed out at 1s and fail-opened.
A higher configured timeout produces the opposite false failure (probe gives up at 5s when production would have waited longer).

Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the existing `resolveLiveRerankerModel(engine)` — same precedence chain, same DB-plane consistency posture. The probe now reads the SAME timeout live search reads, on the same lookup path.

The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under community-pr-wave follow-ups — couples with the existing FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened gateway-routed guard:

    import { type Message, Anthropic } from '@anthropic-ai/sdk';
    new Anthropic();

The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude `import type ...` but stops at the `y` in `type` inside the brace list, silently allowing the value-import `Anthropic` through.

Two fixes:

1. Replace the brittle regex-based type-exclusion with a clause-level parse: extract the brace-list specifiers, allow the import iff EVERY non-empty specifier is `type`-prefixed. Catches mixed-import bypasses (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`) passing. Default + namespace imports remain always-value-shaped.

2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern so `specifiers` comes back empty and the script falls through to the default/namespace branch's wrong error message).
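
The clause-level rule in fix 1 can be sketched in TypeScript. This is an illustration only: the real guard is a shell script built on grep and sed, and `classifySdkImport` is a hypothetical name that does not appear in the repository. It assumes one import statement per input string.

```typescript
// Hypothetical helper (not the guard's shell code): classify one import
// statement from '@anthropic-ai/sdk' as type-only (ALLOW) or value-shaped (BLOCK).
type Verdict = "ALLOW" | "BLOCK";

function classifySdkImport(line: string): Verdict {
  const stmt = line.trim();

  // `import type ...` is erased at compile time, so it never loads the SDK.
  if (/^import\s+type\s/.test(stmt)) return "ALLOW";

  // Pure brace-list import: allow iff every non-empty specifier is `type`-prefixed.
  const braces = stmt.match(/^import\s*\{([^}]*)\}\s*from\s/);
  if (braces) {
    const specifiers = (braces[1] ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    return specifiers.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
  }

  // Default, namespace, and default+brace imports are always value-shaped.
  return "BLOCK";
}
```

Parsing specifiers one at a time is what closes the mixed-import hole: a single value specifier anywhere in the brace list flips the verdict to BLOCK, regardless of how many `type` specifiers precede it.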
Hermetic 7-shape regression matrix now verifies every TypeScript import shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`

Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe" pattern doesn't propagate to the outer `$?` because the pipe spawns a subshell. Switched to a tmpfile-flagged sentinel so the verdict survives the subshell boundary cleanly.

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in `test/audit/audit-writer.test.ts` "returns events from current week, filtered by ts cutoff" (added in v0.40.4.0 PR #1300). The test pinned synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events with synthetic ts values, then called `readRecent(7, now)` expecting to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename routing and ALWAYS wrote to the file matching real-time-now's ISO week. When real CI time crossed into 2026-W22 (this Monday), the events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file matching that ts's ISO week. Falls back to real-now when ts is missing or unparseable.
- No behavior change for production callers — none of the 5 audit consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback, content-sanity-audit, graph-signals, supervisor-audit). The writer stamps real-now → both ts and filename use real-now → same file as before.
- Sibling test "honors caller-supplied ts override" also pinned a fixed ts and would have broken from the opposite angle (test read from `computeFilename()` default = real-now). Updated to read from `computeFilename(new Date(fixedTs))` so it asserts the per-row file routing the wave now provides.

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time crossed into a different ISO week than the test's synthetic now. NOT introduced by this PR (#1377 community-PR-wave) — audit-writer files aren't touched by the wave.

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>
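
The routing rule from the `fix(audit-writer)` entry above can be sketched as follows. This is a minimal illustration, not the code in the audit-writer module: `isoWeekLabel` and `routeFilename` are hypothetical names, and the `<prefix>-YYYY-Www.jsonl` filename layout is an assumption inferred from the `2026-W53` and `rerank-failures-*.jsonl` examples in this PR.

```typescript
// Hypothetical sketch: pick the audit file from the event's own timestamp,
// falling back to "now" when ts is missing or unparseable.

/** ISO-8601 week label in UTC, e.g. "2026-W21". */
function isoWeekLabel(d: Date): string {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dow = t.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  // The ISO week (and ISO year) is the one containing this week's Thursday.
  t.setUTCDate(t.getUTCDate() + 4 - dow);
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${t.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

/** Assumed filename layout: `<prefix>-<ISO week>.jsonl`. */
function routeFilename(prefix: string, event: { ts?: string }, now: Date = new Date()): string {
  const parsed = event.ts ? new Date(event.ts) : now;
  const when = Number.isFinite(parsed.getTime()) ? parsed : now;
  return `${prefix}-${isoWeekLabel(when)}.jsonl`;
}
```

With this shape, an event stamped `2026-05-22T12:00:00Z` lands in the W21 file even when the wall clock has already moved into W22, which is the case the flaky test exercised.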
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on May 28, 2026
* upstream/master: (22 commits)
  v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
  v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
  v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
  v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
  v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
  v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
  v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
  v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
  v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
  v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
  v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
  v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
  v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
  v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
  v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
  v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
  v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
  v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
  v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
  v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
  ...
garrytan-agents pushed a commit to garrytan-agents/gbrain that referenced this pull request on Jun 13, 2026
…ed dream judge (6 community PRs) (garrytan#1377)

* fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads

`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows (Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory, open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem path.

Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the documented Node.js idiom and works on every platform. No behavior change on Unix — same syscall path, same semantics.

Repro on Windows before the fix:

    echo "test" | gbrain put my-page
    ENOENT: no such file or directory, open '/dev/stdin'

After: round-trip put/search/delete works on Windows Git Bash.

* v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp

Adds local reranker support so users can point gbrain's reranker call at their own llama.cpp server instead of ZeroEntropy's hosted API. One new recipe (`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number` extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker `FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on local rerank, and a doctor-probe divergence fix (probe and live search now read the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).

ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of scope — different wire shapes need adapter hooks designed against their actual shapes in a follow-up plan.
Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs (none currently exploitable; hardening for future contributor traps)

Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config > recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary model id + chat-kind TX2 hard-fail preserved

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship documentation sync

* docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker)

The hand-curated llms-config.ts doc map never included docs/ai-providers/, so both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers" section with both. Marked includeInFull: false (setup walkthroughs belong in the index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same treatment CHANGELOG.md gets.

Caught by the /ship document-release subagent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: recipe-aware embedding-provider check for local providers

doctor --remediation-plan and autopilot both judged the embedding provider with a hosted-only key check, so a brain on ollama: or llama-server: was reported "blocked" on a missing API key it never needed, contradicting doctor --json's 100%-coverage health.

Extract a shared embeddingProviderConfigured() helper into brain-score-recommendations.ts: empty auth_env.required (local providers) is configured with no key; hosted providers check their OWN required key. Both producers (doctor, autopilot) call it, killing the DRY violation that caused the bug. Hosted brains with a missing key still block.

* fix(budget): price local embed providers at $0

A --max-cost-bounded embed/reindex job configured for ollama: or llama-server: TX2 hard-failed with no_pricing because lookupEmbeddingPrice has no entry for local models. Add FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS) so a pricing miss on a local-inference provider returns $0 instead of null. lmstudio/litellm intentionally excluded.

* feat(models): embedding reachability probe in gbrain models doctor

A down/misconfigured local embed server was invisible until first embed. Add probeEmbeddingReachability() (mirrors the reranker probe): a 1-input embed with a 5s abort timeout, classified via classifyError, under a new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first.

* fix: don't count config-plane voyage/google keys as configured

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but buildGatewayConfig only threads openai/anthropic/zeroentropy config keys into the gateway env. A Voyage/Google brain with the key only in config.json would be judged "configured" and dispatch an embed.stale job that then fails auth at the gateway.
Drop those two from the map so the producer closures resolve them by env var only, matching what the gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(dream): route significance judge through gateway.chat for multi-provider support

Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern that closed garrytan#952 for runThink: construction-time provider/key probe returns null on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in try/catch for AIConfigError mid-run.

Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter, Voyage, Ollama, llama-server, etc.) is now reachable via:

    gbrain config set models.dream.synthesize_verdict <provider>:<model>

The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS in src/core/model-config.ts) is used unchanged. The exported JudgeClient interface signature is preserved for test-seam stability.

The original community PR (garrytan#1349) shipped a custom fetch adapter that bypassed the gateway entirely. This reworked landing routes through the canonical seam so future provider additions automatically benefit, and a CI guard (T7) will land in this wave to prevent the bug class from re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0).
Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>

* test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity

11 cases pin the gateway-routed JudgeClient adapter from T5:
- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict

Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is parsed-verdict semantic parity instead. Mirror pattern of test/think-gateway-adapter.test.ts for cross-site consistency with the v0.35.5.0 runThink migration.

* ci: guard against direct Anthropic SDK construction in gateway-routed files

New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for `new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk. Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming as adapter types.
Comment lines (starting with `//` or ` *`) are excluded so historical references in JSDoc don't false-fire. Negative test in this commit's verification confirms: injecting `new Anthropic()` into synthesize.ts makes the guard exit 1 with a clear error pointing at the gateway adapter pattern; reverting restores the OK state. Wired into both `bun run verify` and `bun run check:all`.

Closes the bug class that bit synthesize.ts in PR garrytan#1349 (which would have shipped a parallel fetch stack instead of routing through the canonical gateway). The same class previously bit think/index.ts and was fixed structurally in v0.35.5.0; this guard prevents either file from regressing. Extend GUARDED_FILES in the script when migrating another file off direct SDK construction.

* docs(put_page): point Windows / pipe-buffer users at gbrain capture --file

Extends the put_page op description (surfaced by `gbrain put --help`) with a one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer escape route: reads files as a Buffer first, scans the first 8KB for NUL bytes to refuse binary content, decodes to UTF-8 only after the safety check, and adds provenance write-through.

Lands the user-facing value the closed PR garrytan#1365 was reaching for, without duplicating the CLI surface. Credits the original contributor.

Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>

* test: R1+R2+R4 critical regression pins for the community-PR-wave landing

Per the wave's eng-review plan (IRON RULE — mandatory):

R1 — get_page handler accepts calls without `content` param. Pre-wave PR garrytan#1365 landed its `!p.content → throw` check in the WRONG handler (get_page instead of put_page), which would have broken every read in the system. Pin: get_page MUST NOT require content + the schema carries no `content` or `file` param.

R2 — put_page schema content stays `required: true`.
PR garrytan#1365 also flipped `content` from required→optional in the schema. Pin: the contract stays at `required: true` + the closed PR's `file` param is NOT in the schema.

R4 — Cross-platform stdin via fd 0 (PR garrytan#1325 regression pin). Source-grep asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern assertions confirm the parseOpArgs branch shape (cliHints.stdin check, 5MB cap, isTTY gate) hasn't drifted.

R3 (gateway-adapter parsed-verdict parity) lives in the sibling file test/cycle/synthesize-gateway-adapter.test.ts.

* test(e2e): update dream-synthesize no-key reason text + harden hermeticity

After T5's gateway-adapter rework, the "no API key" verdict text changed from 'no ANTHROPIC_API_KEY for significance judge' to 'no configured provider for verdict model: <model>' (broader + names the actual model so the user sees WHICH provider failed). Update both assertions that check the old text.

Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only cleared the env var. After the rework, `makeJudgeClient` ALSO checks `loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts uses since v0.35.5.0). If the developer running the test has the key set in ~/.gbrain/config.json, the test would behave non-deterministically. Fix: override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore on return (even on throw).

* test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end

Drives runPhaseSynthesize against a real PGLite engine with the gateway chat transport stubbed to throw AIConfigError on every call (simulates a revoked/misconfigured provider surfacing mid-run). Asserts:
- Phase does NOT crash; converts the throw to a per-transcript verdict with worth=false and reasons[0] matching "gateway error: ...".
- status='ok' so subsequent transcripts in the loop would continue being judged (not visible in 1-transcript test, but the loop shape is proven not to abort).

Pre-rework (T5), this code path didn't exist — judgeSignificance threw directly to runPhaseSynthesize and crashed the whole phase. Pin so a future regression that removes the try/catch fires loudly.

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:
- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop + canonical config key + JudgeClient interface preserved + CI guard reference + test file references).
- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry documenting the CI guard's contract, scope, and how to extend GUARDED_FILES when migrating another file off direct SDK construction.

CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md annotations landed earlier in this wave (gateway-adapter synthesize.ts + check-gateway-routed-no-direct-anthropic.sh + the cherry-picked llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed formatRecipeTable's static 14-char PROVIDER column. When the id is longer than the column, padEnd is a no-op — the row starts with the tier name directly, no space delimiter. test/providers.test.ts 'each recipe appears at most once' iterates every recipe and asserts at least one row starts with `${id} ` or `${id} `; with no space after `llama-server-reranker`, the assertion fails and the recipe appears effectively missing from the human-readable list.
Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so every id is followed by at least one space, regardless of length. Also widens the separator rule to match. 14 stays as the floor so the existing short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar layout when llama-server-reranker isn't in the active recipe set.

10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:
- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor` "spends ~1 token per model" but the wave added an `embed(['probe'])` call AND a reranker probe. Generalize to "spends a minimal request per configured chat/embed/rerank surface" so the cost expectation matches reality.
- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability` doesn't hardcode 5000ms while the sibling reranker probe reads the recipe's configured timeout. Local CPU embedding endpoints (llama-server) hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is "re-run the probe" per the existing JSDoc.

Other informational findings from pre-landing review either match established patterns (no behavioral test for `probeEmbeddingReachability`, matching `probeRerankerReachability`), are intentional choices documented in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf in non-hot paths (autopilot's 4 sequential `getConfig` awaits per 5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:

1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the default-import shape `import Anthropic from '@anthropic-ai/sdk'`.
A future contributor (or, more realistically, a future refactor) could bypass with:
- `import { Anthropic } from '@anthropic-ai/sdk'`
- `import { Anthropic as A } from '@anthropic-ai/sdk'`
- `import * as Anthropic from '@anthropic-ai/sdk'`
- `const x = await import('@anthropic-ai/sdk')`

Tightened the regex to match ANY value-shaped import from the SDK module (excluding only the explicit `import type ... from '@anthropic-ai/sdk'` form which the adapter's Anthropic.Message return type needs). Added a second grep for dynamic imports. Verified all four bypass shapes now trigger the guard against synthesize.ts; type-only import still passes.

2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the array-of-blocks shape for future flexibility" — but the mapping flattens ONLY text blocks; `tool_use`, `tool_result`, image blocks silently become empty strings. Today only `judgeSignificance` calls this and it only sends string content, so no behavior bug. But the comment was marketing future flexibility the code doesn't deliver. Narrowed to call out the silent-drop and say to extend the mapping if a future caller wires non-text content through.

Both wave-scope: the CI guard was added by the wave, the JSDoc was added by the wave's T5 rework. Adversarial review caught them before merge.

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence: production `hybridSearch` resolves reranker timeout via the full chain (per-call > config > recipe > bundle) by going through `loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability` was reading ONLY the recipe's `default_timeout_ms` — so an operator who set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report "reachable" while production search timed out at 1s and fail-opened.
A higher configured timeout produces the opposite false failure (probe gives up at 5s when production would have waited longer).

Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the existing `resolveLiveRerankerModel(engine)` — same precedence chain, same DB-plane consistency posture. The probe now reads the SAME timeout live search reads, on the same lookup path.

The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under community-pr-wave follow-ups — couples with the existing FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened gateway-routed guard:

    import { type Message, Anthropic } from '@anthropic-ai/sdk';
    new Anthropic();

The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude `import type ...` but stops at the `y` in `type` inside the brace list, silently allowing the value-import `Anthropic` through.

Two fixes:

1. Replace the brittle regex-based type-exclusion with a clause-level parse: extract the brace-list specifiers, allow the import iff EVERY non-empty specifier is `type`-prefixed. Catches mixed-import bypasses (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`) passing. Default + namespace imports remain always-value-shaped.

2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern so `specifiers` comes back empty and the script falls through to the default/namespace branch's wrong error message).
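
For the `fix(models doctor)` entry above, the per-call > config > recipe > bundle precedence can be illustrated with a small resolver. This is a sketch under assumptions: `TimeoutSources` and `resolveTimeoutMs` are hypothetical names, not the signature of `resolveLiveRerankerTimeoutMs`, and the numeric values in the usage note are made up for illustration.

```typescript
// Hypothetical illustration of the precedence chain; the field names are
// assumptions, not the shapes used in src/.
interface TimeoutSources {
  perCall?: number; // explicit per-call override
  config?: number;  // e.g. an operator-set search.reranker.timeout_ms
  recipe?: number;  // the recipe's default_timeout_ms, when it declares one
  bundle: number;   // built-in default, always present
}

function resolveTimeoutMs(s: TimeoutSources): number {
  // `??` skips only null/undefined, so the first source that is set wins.
  return s.perCall ?? s.config ?? s.recipe ?? s.bundle;
}
```

If both the probe and live search call one resolver of this kind, an operator-set value such as `config: 1000` applies to both, which is the consistency property the fix is after.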
Hermetic 7-shape regression matrix now verifies every TypeScript import shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`

Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe" pattern doesn't propagate to the outer `$?` because the pipe spawns a subshell. Switched to a tmpfile-flagged sentinel so the verdict survives the subshell boundary cleanly.

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in `test/audit/audit-writer.test.ts` "returns events from current week, filtered by ts cutoff" (added in v0.40.4.0 PR garrytan#1300). The test pinned synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events with synthetic ts values, then called `readRecent(7, now)` expecting to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename routing and ALWAYS wrote to the file matching real-time-now's ISO week. When real CI time crossed into 2026-W22 (this Monday), the events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file matching that ts's ISO week. Falls back to real-now when ts is missing or unparseable.
- No behavior change for production callers: none of the 5 audit consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback, content-sanity-audit, graph-signals, supervisor-audit). The writer stamps real-now → both ts and filename use real-now → same file as before.
- Sibling test "honors caller-supplied ts override" also pinned a fixed ts and would have broken from the opposite angle (test read from `computeFilename()` default = real-now). Updated to read from `computeFilename(new Date(fixedTs))` so it asserts the per-row file routing the wave now provides.

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time crossed into a different ISO week than the test's synthetic now. NOT introduced by this PR (garrytan#1377 community-PR-wave): audit-writer files aren't touched by the wave.

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>
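The ts-based routing described in the audit-writer fix above can be sketched as follows. This is a minimal illustration, not the code in the repo: `isoWeekLabel` and `routeFilename` are hypothetical names, and the `<prefix>-<year>-W<week>.jsonl` filename layout is an assumption inferred from the ISO-week naming the commit messages mention.

```typescript
/** ISO-8601 week label (e.g. "2026-W21") for a date, computed in UTC. */
function isoWeekLabel(d: Date): string {
  // Work on a UTC-midnight copy so time-of-day never shifts the week.
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // ISO weekday: Monday = 1 ... Sunday = 7.
  const dow = t.getUTCDay() || 7;
  // Move to the Thursday of this week; its calendar year is the ISO week-year.
  t.setUTCDate(t.getUTCDate() + 4 - dow);
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${t.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

/**
 * Hypothetical routing helper: pick the audit file from the event's own ts,
 * falling back to `now` when ts is missing or unparseable.
 * The filename layout is assumed for illustration.
 */
function routeFilename(prefix: string, ts: string | undefined, now: Date = new Date()): string {
  const parsed = ts === undefined ? NaN : Date.parse(ts);
  const when = Number.isFinite(parsed) ? new Date(parsed) : now;
  return `${prefix}-${isoWeekLabel(when)}.jsonl`;
}
```

With routing keyed on the event timestamp, a test that pins a synthetic `now` writes and reads the same weekly file regardless of when CI actually runs.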
Summary
GBrain's search stops treating its link graph as wallpaper. Three small, additive ranking signals exploit edges the brain already has:
chat/segment: keep the highest, demote the rest
Plus the wave grew through review into three cathedral expansions:
`gbrain search --explain` formatter
`createAuditWriter`
Default on for `balanced` + `tokenmax` modes. KNOBS_HASH bumps 3→4 so cache rows segregate cleanly across the upgrade. Off-by-default for `conservative`.
Test Coverage
122 new test cases across 9 new + 4 extended test files. Coverage audit: 92% (38 logical branches, 35 fully tested, 3 minor gaps captured as TODOs). Coverage gate: PASS (above 80% target).
Test count: 1830 → 1839+ (+9 new test files in this wave).
Pre-Landing Review
7 findings — all resolved before push:
- `postFusionOpts` never set `graphSignalsEnabled` → entire feature was dead code in production. Fixed at commit 6f01fcb with new wire-integration test that grep-pins the literal.
- `deleted_at IS NULL` defense-in-depth on `getAdjacencyBoosts` (both engines).
Adversarial Review (Codex + Claude, always-on)
HIGH findings — all fixed at commit 47ded98:
- `graph_signals: graphSignalsOn` via `as any` cast but `SearchOpts` had no field. Both branches resolved to mode default → gate could pass while detecting nothing. Fix: added typed `SearchOpts.graph_signals`, threaded into both `hybridSearch` and `hybridSearchCached` perCall opts, dropped `as any`.
- `sessionPrefix()` used "any shared parent" → `people/alice` + `people/bob` got grouped and bob got demoted on every common entity-search query. Fix: narrowed to fire only on chat-session-shaped slugs (contains `chat`/`session`/`sessions` marker OR `YYYY-MM-DD` date segment). Entity dirs return null → diversification skips.
- `TRUE` and observability surfaces silently disagree with production. Fix: case-insensitive trim parity at all 3 sites.
11 LOW findings → captured as v0.41+ TODOs (NaN guard, audit windowing, ANSI escape, source-scope JSDoc-only contract, score compounding on repeat invocation, etc.).
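The narrowed `sessionPrefix()` rule from the HIGH findings above might look roughly like this. It is a sketch under stated assumptions, not the repo's implementation: it treats the `chat`/`session`/`sessions` markers and the `YYYY-MM-DD` date as whole path segments, and it assumes the returned grouping key is the slug's parent path. Neither detail is confirmed by the PR text.

```typescript
// Assumed marker set and date shape, taken from the PR description.
const SESSION_MARKERS = new Set(["chat", "session", "sessions"]);
const DATE_SEGMENT = /^\d{4}-\d{2}-\d{2}$/;

/**
 * Return a grouping prefix only for chat-session-shaped slugs.
 * Entity directories such as `people/alice` return null so diversification skips them.
 */
function sessionPrefix(slug: string): string | null {
  const segments = slug.split("/").filter((s) => s.length > 0);
  if (segments.length < 2) return null; // no parent to group under

  const looksLikeSession = segments.some(
    (s) => SESSION_MARKERS.has(s) || DATE_SEGMENT.test(s),
  );
  // Assumption: the grouping key is everything except the last segment.
  return looksLikeSession ? segments.slice(0, -1).join("/") : null;
}
```

Under this rule `people/alice` and `people/bob` no longer share a group, while segments of the same chat session still do.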
Eval Results
`test/e2e/graph-signals-eval.test.ts` ships with 4 gates:
All gates PASS on the bundled longmemeval-mini fixture.
`pairedBootstrapPValue` exported as a pure function with 5 dedicated tests for future calibration waves.
Greptile Review
PR doesn't exist yet during this run — Greptile comments will surface after creation.
Plan Completion
9 DONE, 2 CHANGED (T4 audit collocated in `graph-signals.ts`, T9 search-stats fire-rate metrics deferred to T-todo-2 per inline comment), 0 NOT DONE, 0 UNVERIFIABLE.
CLAUDE.md Key Files annotations added in docs commit (69aef24).
Documentation
README.md, CLAUDE.md, llms-full.txt all updated by /document-release subagent. Committed at 69aef24.
CHANGELOG.md top entry is the canonical v0.40.4.0 ELI10-lead description.
TODOS
5 graph-signals follow-ups + 11 LOW adversarial findings + 1 pre-existing-master-flake all captured under "v0.40.4 adversarial review LOW findings: captured for v0.41+".
Test plan
- `bun run verify`: privacy + checks + typecheck (PASS)
- `bun test` parallel: 8469/8470 pass; 1 fail is pre-existing master flake (header-transport shard-ordering, confirmed via stash)
- `bun test test/search test/e2e/graph-signals-* test/doctor.test.ts test/audit/audit-writer.test.ts` (122/122 wave tests PASS)
- `gbrain search "..." --explain` on Garry's brain
- `gbrain doctor` reports `graph_signals_coverage` ok at >=30%
🤖 Generated with Claude Code