v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting by garrytan-agents · Pull Request #897 · garrytan/gbrain

garrytan-agents · 2026-05-12T02:34:57Z

What

Three additive search features inspired by brain-kit. All backward-compatible: existing callers see identical behavior unless they opt in to the new options.

1. Token Budget Enforcement (`src/core/search/token-budget.ts`)

Cap the cumulative token cost of returned results so search payloads fit downstream context windows. Greedy top-down walk; preserves caller ordering; no re-rank. char/4 heuristic for token counting (no tokenizer dependency — keeps the bun --compile bundle small).

SearchOpts.tokenBudget — numeric cap. Default undefined = no-op.
HybridSearchMeta.token_budget = { budget, used, kept, dropped }
HTTP query op: pass token_budget param.

2. Semantic Query Cache (`src/core/search/query-cache.ts` + migration v52)

Cache search results keyed by query embedding similarity. HNSW lookup: embedding <=> $1 < 0.08 (cosine similarity ≥ 0.92). Per-source isolation. Per-row TTL (default 3600s). Best-effort writes; all errors swallowed so the cache never breaks the search hot path.

Migration v52 creates query_cache table. HALFVEC where pgvector ≥ 0.7; VECTOR otherwise. Embedding dim resolved from config.embedding_dimensions so non-OpenAI brains work.
New gbrain cache CLI: stats / clear --yes / prune.
Config keys: search.cache.enabled / search.cache.similarity_threshold / search.cache.ttl_seconds.
HybridSearchMeta.cache = { status, similarity?, age_seconds? }
Routed through new hybridSearchCached(engine, query, opts) wrapper. The query op in operations.ts now uses this wrapper so MCP/CLI calls benefit automatically. Skipped for two-pass walks + non-default embedding columns where cache semantics don't hold.

3. Zero-LLM Intent Weighting (`src/core/search/intent-weights.ts`)

Builds on the existing 4-intent classifier (src/core/search/query-intent.ts). New weight-adjustment layer applies subtle per-intent nudges:

Intent	Adjustment
entity	boost keyword RRF + exact slug/title match boost
temporal	default `recency='on'` when caller left it unset
event	boost keyword RRF (rare named entities) + soft recency
general	no-op (1.0 multipliers)

All adjustments are subtle (max 1.25×). Caller-explicit options always win — intent weighting never silently overrides explicit recency / salience. Default ON; opt out via opts.intentWeighting = false. The LLM query expansion path is still available and opt-in via opts.expansion = true — it just isn't the default anymore.

HybridSearchMeta.intent now surfaces classifier output for debugging.

Files changed (13)

 src/cli.ts                                |  +13 -1
 src/commands/cache.ts                     | +106 NEW
 src/core/migrate.ts                       | +104 NEW (migration v52)
 src/core/operations.ts                    |  +11 -2
 src/core/search/hybrid.ts                 | +281 -7
 src/core/search/intent-weights.ts         | +132 NEW
 src/core/search/query-cache.ts            | +321 NEW
 src/core/search/token-budget.ts           | +110 NEW
 src/core/types.ts                         |  +57 NEW
 test/hybrid-search-lite.serial.test.ts    | +163 NEW
 test/intent-weights.test.ts               | +123 NEW
 test/query-cache.test.ts                  | +247 NEW
 test/token-budget.test.ts                 | +125 NEW

Test results

44 new tests pass across the 4 new test files
- test/token-budget.test.ts — 10 tests (pure module)
- test/intent-weights.test.ts — 13 tests (pure module)
- test/query-cache.test.ts — 12 tests (PGLite, including migration v52 schema verification + TTL expiration + source isolation)
- test/hybrid-search-lite.serial.test.ts — 9 tests (PGLite e2e through hybridSearchCached)
105 existing search tests still pass (test/search.test.ts, test/query-intent*.test.ts, test/hybrid-meta.test.ts, test/search-limit.test.ts, test/search-lang-symbol-kind.test.ts, test/search-image-column.test.ts)
162 tests total pass when run together (no cross-file isolation issues)
bun run verify clean (privacy + jsonb + progress + test-isolation + wasm + admin-build + admin-scope-drift + cli-exec + system-of-record + tsc --noEmit)
bun run src/cli.ts doctor --fast runs successfully — pre-existing resolver_health / minions_migration failures on master are unrelated to this PR.

Design notes

Backward compat is the contract. Token budget defaults to off. Cache defaults on but cache miss = normal search. Intent weighting defaults on but adjusts weights at most 1.25× and never overrides caller-explicit options.
No new dependencies. Token counting uses char/4 (js-tiktoken would balloon the bun --compile bundle). Migration uses the same HALFVEC/VECTOR pgvector-version probe pattern as v45 (facts) and v40, so it works on both Postgres and PGLite without extra config.
System-of-record clean. The query_cache table is a derived cache, not authoritative state. gbrain cache clear is the documented invalidation surface; the system-of-record check passes.

^{Need help on this PR? Tag @codesmith with what you need.}

Let Codesmith autofix CI failures and bot reviews

…ting Adds three additive features to the hybrid search pipeline. All backward-compatible: existing callers see identical behavior unless they opt in to the new options. ## 1. Token Budget Enforcement (src/core/search/token-budget.ts) Cap the cumulative token cost of returned results so search payloads fit downstream context windows. Greedy top-down walk; preserves caller ordering; no re-rank. char/4 heuristic for token counting (no tokenizer dependency \u2014 keeps the bun --compile bundle small). SearchOpts.tokenBudget \u2014 numeric cap. Default undefined = no-op. HybridSearchMeta.token_budget = { budget, used, kept, dropped } HTTP query op: pass `token_budget` param. ## 2. Semantic Query Cache (src/core/search/query-cache.ts + migration v52) Cache search results keyed by query embedding similarity. HNSW lookup: `embedding <=> $1 < 0.08` (cosine similarity >= 0.92). Per-source isolation so multi-source brains don\u2019t bleed. Per-row TTL (default 3600s). Best-effort writes; all errors swallowed so the cache never breaks the search hot path. Migration v52 creates query_cache table with HALFVEC where pgvector >= 0.7; falls back to VECTOR with the resolved config.embedding_dimensions dim. New `gbrain cache` CLI: stats / clear --yes / prune. Config keys: search.cache.enabled / similarity_threshold / ttl_seconds. HybridSearchMeta.cache = { status, similarity?, age_seconds? } Routed through new `hybridSearchCached(engine, query, opts)` wrapper; the operations.ts query op now uses this wrapper so MCP/CLI calls benefit automatically. Skipped for two-pass walks + non-default embedding columns where cache semantics don\u2019t hold. ## 3. Zero-LLM Intent Weighting (src/core/search/intent-weights.ts) Builds on the existing query-intent classifier (4 intents: entity / temporal / event / general). New weight-adjustment layer applies subtle per-intent nudges: entity \u2192 boost keyword RRF + exact slug/title match temporal \u2192 default recency=on when caller left it unset event \u2192 boost keyword RRF (rare named entities) + soft recency general \u2192 no-op (1.0 multipliers everywhere) All adjustments are SUBTLE (max 1.25x). Caller-explicit options ALWAYS win \u2014 intent weighting never silently overrides recency / salience. Default ON; opt out via `opts.intentWeighting = false`. LLM query expansion (expansion.ts) is still available and opt-in via `opts.expansion = true` \u2014 it just isn\u2019t the default anymore. HybridSearchMeta.intent now surfaces classifier output for debugging. ## Tests test/token-budget.test.ts (10 tests, pure module) test/intent-weights.test.ts (13 tests, pure module) test/query-cache.test.ts (12 tests, PGLite) test/hybrid-search-lite.serial.test.ts (9 tests, PGLite e2e) Plus 105 pre-existing search tests still pass. `bun run verify` clean. Co-authored-by: Wintermute <agents@garrytan.com>

Resolved single conflict in src/core/migrate.ts: master claimed v52/53/54 (eval_contradictions_cache + eval_contradictions_runs + cjk_wave). PR garrytan#897's v52 query_cache migration renumbered to v55 to sit after. Typecheck clean.

…ybridSearch Three named modes (conservative / balanced / tokenmax) that bundle the search-lite knobs from PR garrytan#897 into a single config key. Mode resolution lives in bare hybridSearch (NOT just the cached wrapper) so eval-replay and eval-longmemeval — which call bare hybridSearch — test the same mode-affected behavior as production. See [CDX-5+6] in the plan. The mode bundle supplies DEFAULTS for intentWeighting, tokenBudget, expansion, and searchLimit when the caller leaves those undefined. Per-call SearchOpts and per-key config overrides still win (matches the v0.31.12 model-tier resolution chain at model-config.ts:resolveModel). knobsHash() exposes a stable SHA-256 of the resolved knob set; the cache contamination hotfix (next commit) consumes it to prevent a tokenmax write from being served to a conservative read. Three new fields on HybridSearchMeta: - mode (resolved mode name) - existing token_budget meta now fires from bare hybridSearch too Bare hybridSearch now applies tokenBudget at all three return paths (no-embedding-provider, keyword-only-fallback, main). Previously only hybridSearchCached enforced budget; eval commands missed it. Tests: 37 unit cases pin the 3x7 bundle table cell-by-cell, the resolution chain semantics, knobs hash determinism + cross-mode separation, and the config-table parser. All 72 search-lite tests pass. Bisect-friendly: this commit ONLY adds mode resolution. The cache-key contamination hotfix [CDX-4] is a separate atomic commit (next).

PR garrytan#897's query_cache keyed rows on sha256(source_id::query_text) only. A tokenmax search (expansion=on, limit=50) populated a row that a subsequent conservative call (no expansion, limit=10) read back, serving the wrong-shape results. This is a real bug in PR garrytan#897 today, regardless of the v0.32.3 mode picker work — Codex caught it in plan review. Fix: - Migration v56 adds query_cache.knobs_hash TEXT column + composite (source_id, knobs_hash, created_at) index. Existing rows have NULL knobs_hash and are excluded from lookups (silently re-populated with the right hash on first hit — no orphan data, no destructive migration). - cacheRowId(query, source, knobsHash) — knobsHash now part of the PK so a tokenmax write and a conservative write for the same (query, source) land in distinct rows. - SemanticQueryCache.lookup({knobsHash}) filters WHERE knobs_hash = $. - SemanticQueryCache.store({knobsHash}) writes the resolved hash. - hybridSearchCached threads knobsHash from resolveSearchMode through every cache call. Cache config (enabled/threshold/TTL) now reads from the resolved mode bundle, not directly from the config table. Tests (test/query-cache-knobs-hash.test.ts, 11 cases): - cacheRowId bifurcates by knobsHash - Tokenmax write does NOT contaminate conservative lookup - Three modes coexist as distinct rows for same query - Legacy NULL-knobs_hash rows are excluded from lookup - Same-mode write updates in place (no duplicate rows) All 58 cache + mode tests pass. Migration v56 applies cleanly on a fresh PGLite brain. Bisect-friendly: this commit is the cache-key hotfix alone. Mode resolution wiring lives in the previous commit.

…able Migration v57 creates search_telemetry (date, mode, intent, count, sum_results, sum_tokens, sum_budget_dropped, cache_hit, cache_miss, first_seen, last_seen). PK (date, mode, intent) caps growth at ~4380 rows/year. Sums + counts only — averages derive at read time so concurrent ON CONFLICT writes from multiple gbrain processes accumulate correctly [CDX-17]. In-memory bucket flushed periodically (60s OR 100 calls) + on process beforeExit/SIGINT/SIGTERM with a 2-second cap. The search hot path NEVER waits on this write [D2, CDX-19]. Date-bucketed cache_hit / cache_miss columns make hit rate over --days N derivable [CDX-18]. query_cache.hit_count is a lifetime counter and can't be sliced by window. Wired into bare hybridSearch via emitMeta: every search call sync-bumps a bucket. flush() drains atomically by swapping the map before SQL writes so a record() during flush lands in the new map. readSearchStats(engine, {days}) returns the StatsWindow shape that gbrain search stats consumes (next commit). Tests: 16 unit cases pin record/flush/read semantics including ON-CONFLICT-adds-raw-values, concurrent-flush coalescing, cache hit-rate math, missing-table graceful degradation, and window clamping. 53 migrations apply on a fresh PGLite brain.

…+8+9] CDX-8: gbrain config has no unset path today. Required before `gbrain search modes --reset` can clear search.* overrides. - BrainEngine.unsetConfig(key) → returns rows deleted (0|1) - BrainEngine.listConfigKeys(prefix) → exact-literal prefix match with LIKE-escape on user-supplied % / _ / \ characters - PGLiteEngine + PostgresEngine implementations - `gbrain config unset <key>` and `gbrain config unset --pattern <prefix>` sub-subcommands CDX-9: readLine has no EOF detection or timeout. Mode-picker plan calls out "TTY closes mid-prompt → defaults to balanced" but the raw helper hangs forever. New readLineSafe(prompt, defaultValue, timeoutMs=60s): - Returns defaultValue on stdin 'end' event - Returns defaultValue on timeout - Returns defaultValue on empty Enter - Non-TTY stdin returns defaultValue immediately (e2e safe) - Returns trimmed user input otherwise Exported so install picker (next task) can use it. Tests: 9 cases pin unset semantics + prefix matcher edge cases (glob-wildcard escape, sort order, idempotent loop, search.* sweep). All 53 migrations apply on a fresh PGLite brain.

Install picker (src/commands/init-mode-picker.ts): - Runs as a phase inside `gbrain init` AFTER engine.initSchema() so DB config writes work [CDX-7]. - Idempotent: skipped on re-init if search.mode is already set. - Smart auto-suggestion via recommendModeFor() reads models.tier.subagent / models.default / OPENAI_API_KEY: * Opus default/subagent → tokenmax (quality ceiling) * Haiku subagent → conservative (4K budget keeps cost down) * No OpenAI key → conservative (no LLM expansion possible) * Sonnet / unknown → balanced (safe default) - TTY shows menu via readLineSafe (60s timeout, defaults on EOF/empty). - Non-TTY auto-selects + emits operator hint: [gbrain] search mode: X (auto-selected — reason) [gbrain] To change: gbrain config set search.mode <...> - --json mode emits structured `{phase: 'search_mode_picker', ...}` event. - Wired into both initPGLite and initPostgres flows. Upgrade banner (src/commands/upgrade.ts): - One-shot stderr banner in runPostUpgrade. - State persisted via config key `search.mode_upgrade_notice_shown=true` — fires at most once per install. - Copy corrected per [CDX-1+2+3]: production query op STILL defaults expand=true and limit=20. The banner reframes from "behavior is regressing" to "named modes available + here's how to preserve exact current shape." Tests (test/init-mode-picker.test.ts, 16 cases): - recommendModeFor heuristic for all 4 input shapes - parseModeInput accepts numeric/named/case-insensitive, rejects garbage - runModePicker non-TTY auto-selects + writes config - Idempotent + --force re-prompt + JSON output - Opus → tokenmax, Haiku → conservative real wiring through engine

Three sub-subcommands mirroring the gbrain models (v0.31.12) shape: gbrain search modes [--json] Read-only routing dashboard. Shows the three mode bundles, the active mode, and the source of every resolved knob: cache_enabled = true [override: search.cache.enabled] tokenBudget = 4000 [mode: conservative] Plus knob descriptions for legibility. gbrain search modes --reset [--source <mode>] Clears every search.* override (NOT search.mode itself). Preserves the upgrade-notice state key. --source <mode> is a dry-run that lists what --reset would change without writing — the paved path [CDX-8] flagged as missing. gbrain search stats [--days N] [--json] Observability. Reads the search_telemetry rollup over the window (clamps to [1, 365]). Prints cache hit rate, mode mix, intent mix, budget drops, avg results/tokens. JSON output includes _meta.metric_glossary block per [CDX-25]. gbrain search tune [--apply] [--json] Recommendation engine. 5 rules cover the bug class: - Insufficient data → "no_recommendations" status - Conservative + high budget-drop rate → suggest balanced - High cache hit rate (>85%) → suggest similarity threshold bump - Tokenmax + Haiku subagent → suggest balanced (cost mismatch) - Cache disabled but stats show usage → suggest re-enabling --apply mutates config via setConfig / unsetConfig with a paste-ready revert command printed at the end. Registered in src/cli.ts dispatch table. 17 unit cases pin: - Dashboard report shape + per-knob source attribution - --reset preserves search.mode + notice key - --source dry-run never writes - stats reads telemetry rollup; --days clamps - tune recommendation rules fire on real telemetry data - --apply mutates config - --help + unknown subcommand exit codes

… guard Single source of truth at src/core/eval/metric-glossary.ts. Every entry carries 3 fields: - industry_term (canonical IR/NLP literature name, preserved verbatim) - eli10 (plain-English a 16-year-old can follow) - range (numeric range + interpretation) Covers 4 metric families: - Retrieval: P@k, R@k, MRR, nDCG@k - Stability: Jaccard@k, top-1 stability - Statistical: p-value (paired bootstrap + Bonferroni), 95% CI - Operational: cache hit rate, avg results/tokens, cost per query, p99 latency Public surface: - getMetricGloss(metric) → full entry or null - eli10For(metric) → plain-English string or null - buildMetricGlossaryMeta(metrics[]) → {metric → eli10} record for JSON `_meta.metric_glossary` blocks per [CDX-25]. ONE block per response, NOT sibling `_gloss` fields on every metric. - renderMetricGlossaryMarkdown() → deterministic Markdown for the doc Auto-generation: scripts/generate-metric-glossary.ts emits docs/eval/METRIC_GLOSSARY.md. Deterministic (same input → same bytes) so the CI guard can diff. CI guard: scripts/check-eval-glossary-fresh.sh regenerates into a temp file and diffs against the committed doc. Out-of-date doc fails the build. Wired into `bun run verify` (and therefore `bun run test:full`). Tests (test/metric-glossary.test.ts, 18 cases): - Every documented metric is present - Every entry has all 3 required fields - Accessors return null on unknown metrics (no throw) - buildMetricGlossaryMeta silently drops unknown metrics - renderer output is deterministic across calls - Renderer groups metrics into 4 sections docs/eval/METRIC_GLOSSARY.md: 5491 bytes, 124 lines, fresh.

src/core/eval/drift-watch.ts — curated retrieval watch-list [CDX-6]. Five patterns covering the surface that actually affects retrieval quality: - src/core/search/ (search pipeline) - src/core/embedding.ts (embedding shape) - src/core/chunkers/ (chunk granularity) - src/core/ai/recipes/anthropic.ts + openai.ts (expansion + embed routing) - src/core/operations.ts (the query op definition) Adding to the list is a deliberate act — requires a CHANGELOG line so coverage grows on purpose, not by accident. Pure functions: - matchesWatchPattern(path) — trailing-slash = prefix, bare = equality - filesDriftedSince(repoRoot, sha?) — git diff --name-only wrapper - watchedFilesDrifted(repoRoot, sha?) — composite src/commands/doctor.ts — two new checks. checkSearchMode [CDX-20]: status stays 'ok' (never warns, never docks health score). Hint in message field. Three branches: - unset → "search.mode is unset (using balanced fallback). Run `gbrain search modes` to see what is running and pick a mode." - mode + no overrides → "Mode: X (no per-key overrides — mode bundle is canonical)." - mode + overrides → "Mode: X with N per-key override(s) (k1, k2, …). To consolidate to the pure mode bundle: gbrain search modes --reset" Upgrade-notice state key (search.mode_upgrade_notice_shown) is excluded from the override roster — it's not a knob. checkEvalDrift [CDX-6]: surfaces uncommitted changes to retrieval-watched files. Always 'ok'; operator-facing reminder. Names up to 3 drifted files in the message + paste-ready re-eval command. Both helpers exported (was: file-private) so tests can pin behavior without walking the full runDoctor pipeline. Tests: 12 drift-watch cases + 7 doctor-check cases. Pin watch-list shape, prefix-vs-equality matcher semantics, missing-repo graceful failure, and all three search_mode branches.

Per-mode --mode flag plumbed into: - gbrain eval longmemeval --mode <conservative|balanced|tokenmax> Sets search.mode in the benchmark brain's config table; config is in PRESERVE_TABLES so resetTables doesn't wipe it between questions. Mode surfaces in the per-question NDJSON row. - gbrain eval replay --mode <m> + --compare-limit N --compare-limit forces a constant K across modes [CDX-13]; without it, Jaccard@k against the captured baseline measures K-drift, not quality. Mode is set once before the replay loop. - NOT cross-modal per [CDX-11]: cross-modal scores OUTPUT against TASK; it doesn't retrieve. Adding --mode there is theater. New: gbrain eval run-all orchestrator (src/commands/eval-run-all.ts): - Sweeps every requested mode × suite combination - Sequential default per D9; --parallel N opt-in (clamped to mode count) - Cost guard with split caps [CDX-15+16]: --budget-usd-retrieval N (default $5) --budget-usd-answer N (default $20) Non-TTY refuses with exit 2 unless --yes AND explicit --budget-usd-* flags pass. TTY refuses without --yes (defense against agent loops). - estimateRunCost computes per-(suite,mode) breakdown including the expansion-Haiku surcharge for tokenmax. - Audit trail: appends to <repo>/.gbrain-evals/eval-results.jsonl [CDX-23]. Personal brain (~/.gbrain) NEVER touched. - v0.32.3 ships orchestrator + argv + guard + persist hook. In-process per-suite invocation is a v0.32.4 follow-up (operator runs the per-suite CLIs with the documented --mode flag for now; each completion calls persistRunRecord to log). New: gbrain eval compare report (src/commands/eval-compare.ts): - Reads eval-results.jsonl, groups by (suite, mode), renders MD or JSON - Most-recent (suite, mode, commit) wins when duplicates exist - JSON output has schema_version=2 + _meta.metric_glossary block per [CDX-25] (ONE block per response, not sibling _gloss fields) - _meta.methodology field names the paired-bootstrap + Bonferroni discipline per [CDX-14] so haters can reproduce - Missing file → friendly hint pointing at `gbrain eval run-all` Wired into eval dispatch table in src/commands/eval.ts. Metric glossary fuzzy fallback: `recall@10` → `recall@k` lookup (the glossary documents the family; report rows carry specific K values). Routes through getMetricGloss for every call site. Tests (42 cases total — all green): - eval-run-all.test.ts (19): argv parser, cost estimate, guard semantics for all 4 (over/under × tty/non-tty) shapes, persist hook NDJSON shape. - eval-compare.test.ts (5): JSON + MD output shapes, glossary integration, missing-file graceful, mode filter, most-recent-wins. - metric-glossary.test.ts (18): unchanged but updated assertions to cover the fuzzy `@N` → `@k` fallback. Pre-existing eval-replay / eval-longmemeval / eval-export / eval-prune tests (42 cases) still pass — --mode + --compare-limit are additive.

docs/eval/SEARCH_MODE_METHODOLOGY.md — haters-immune 8-section template. Documents what the eval measures + does NOT measure, datasets + sizes (LongMemEval n=500, Replay n=200, BrainBench n=1240 docs / 350 qrels), random seed 42, run procedure verbatim, threats to validity (LongMemEval English+technical skew, char/4 heuristic ~5-10% off, expansion ~97.6% relative lift on this corpus), per-question raw outputs, pre-registered expectations (tokenmax wins R@10 by 5-15pp, conservative wins cost by 5-15x, balanced lands within 3pp), re-run cadence anchored to the src/core/eval/drift-watch.ts watch-list. Statistical-significance section pins paired bootstrap with 10,000 resamples + Bonferroni correction across 3 modes × 4 metrics [CDX-14]. CLAUDE.md gets two new sections: ## Search Mode (3-mode table + resolution chain + [CDX-4] cache contamination fix note + CLI commands) and ## Eval discipline (single-source-of-truth glossary, methodology doc, eval_results in repo NOT personal brain per [CDX-23]). README.md Quick Start gets a paragraph naming the install picker, mode heuristic, and the methodology link. skills/conventions/search-modes.md NEW — convention file consumed by brain-ops + query + signal-detector skills via the existing `> **Convention:**` callout pattern. Routes "what mode" / "tune retrieval" / "compare modes" queries to the right CLI surface. skills/RESOLVER.md gets two new trigger rows pointing at gbrain search * and gbrain eval compare.

bun run build:llms — picks up the new CLAUDE.md sections (Search Mode + Eval discipline) and the docs/eval/SEARCH_MODE_METHODOLOGY.md addition. build-llms.test.ts gate now passes.

… flow The v0.32.3 search_mode + eval_drift helpers were inserted into the DB-checks sub-helper at runDbChecks (line 345-355), but runDoctor itself maintains its own check list and only calls the helpers' subset. Push the two checks into the main runDoctor path (after the existing sync_freshness check at line 2347) so they actually appear in `gbrain doctor --json` output. Both checks gated on engine !== null. Progress reporter heartbeat fires for each. Both still return status 'ok' per [CDX-20] so health score is preserved. Verified end-to-end on a real Postgres brain: gbrain doctor --json now includes 'search_mode' and 'eval_drift' in the checks array.

Two root causes for the hang, both fixed. 1. DATABASE_URL leak in claw-test scripted harness The harness inherits the parent process's env via `...process.env` for every phase child (init / import / query / extract / doctor). When the e2e runner sets DATABASE_URL (for OTHER e2e tests), it leaks into claw-test's children. `loadConfig` at src/core/config.ts:143 then flips inferredEngine to 'postgres' for every subsequent phase, breaking the hermetic-PGLite-tempdir contract: phases race against each other on a shared test Postgres while pointing at different brain states. Fix: strip DATABASE_URL + GBRAIN_DATABASE_URL from the child env before forwarding. Re-apply GBRAIN_HOME / GBRAIN_FRICTION_RUN_ID after the merge so a parent's override can't win. The harness is PGLite-only by design. 2. Telemetry beforeExit deadlock v0.32.3's recordSearchTelemetry installed a `process.on('beforeExit', drainOnExit)` hook that wrapped the flush in `Promise.race([flush(), setTimeout(2000)])`. beforeExit fires when the event loop empties, but the hook enqueued NEW async work (the race's setTimeout + pending flush), so the event loop never re-emptied. Short-lived CLI invocations (`gbrain query "the"` finishing in ~100ms) ended up waiting on the DB write indefinitely. The claw-test harness spawns several short-lived gbrain queries. Each one hung after its real work finished. The harness then waited forever on its child subprocess's exit code. Fix: drop the beforeExit + SIGINT + SIGTERM hooks. Per [CDX-19]'s "stats are directional, not exact" contract, losing one unflushed bucket on process exit is acceptable. The unref'd setInterval handles long-running processes (HTTP MCP, autopilot, jobs work). Short-lived CLI invocations exit immediately. Verified: - `gbrain query "the"` on a fresh PGLite brain exits in <1s (was hanging forever). - `bun test test/e2e/claw-test.test.ts` → 3 pass / 0 fail / 3.86s (was hanging at the banner indefinitely). - 85/85 e2e files / 574/574 tests pass including claw-test, with DATABASE_URL set (the configuration that originally repro'd the hang). - 6235/6235 unit tests pass. - Typecheck clean. The two bugs interacted: the DATABASE_URL leak meant queries hit the real Postgres (slow), making the beforeExit deadlock visible. Fixing either alone would have masked the other. Both fixed in this commit.

…docs The install picker already asks explicitly (1/2/3 menu, default to the recommendation on Enter). What was missing: a way to reason about the cost tradeoff. Without numbers, "tokenmax" looks free and "conservative" sounds restrictive; with numbers, the operator picks intentionally. Cost anchors added everywhere the user encounters the mode choice: - Install picker MENU_TEXT (gbrain init) - Upgrade banner (gbrain upgrade post-upgrade) - CLAUDE.md ## Search Mode section - README.md Quick Start - docs/eval/SEARCH_MODE_METHODOLOGY.md (with the math) Anchors at Sonnet 4.6 downstream ($3/M input): conservative ~$0.012/query ~$12/mo @ 1K ~$1,200/mo @ 100K balanced ~$0.030/query ~$30/mo @ 1K ~$3,000/mo @ 100K tokenmax ~$0.060/query ~$60/mo @ 1K ~$6,000/mo @ 100K Plus tokenmax's Haiku expansion overhead: ~$1.50 per 1K queries on top. Cache hits roughly halve these on a brain with repeat-query traffic. The math is documented in SEARCH_MODE_METHODOLOGY.md so a reviewer can audit each variable (T = ~400 tokens/chunk from the recursive chunker's 300-word target; N = `searchLimit` cap; R = downstream model rate from src/core/anthropic-pricing.ts). Drift away from these numbers requires updating CLAUDE.md + the picker + the methodology doc in lockstep — a regression test pins the picker's anchor strings to enforce this. The framing also names the cost rule honestly: the dominant cost isn't gbrain (semantic cache is free; Haiku expansion is rounding-error). It's the downstream agent reading retrieved chunks back into its context. Operators who don't realize this pick badly. Tests: 5 new regression cases in init-mode-picker.test.ts pin every cost string in MENU_TEXT. Total 21/21 picker tests pass; 6240/6240 unit tests pass; verify gate green.

The per-query cost framing in the picker (~$0.012/$0.030/$0.060) is honest but theoretical — it treats each search as an isolated billable event. Real agent loops amortize a lot of context across turns via Anthropic prompt caching, so the per-query 5x ratio doesn't translate 1:1 into total agent spend. Added a "Realistic-scale anchor" section to SEARCH_MODE_METHODOLOGY.md representing one heavy power-user agent loop running tokenmax: - ~860 turns/mo (~29/day, one active agent) - ~900K tokens/turn (system + tools + history + reasoning + search) - ~$0.85/turn → ~$700/mo total agent spend at tokenmax - ~88% Anthropic prompt-cache hit rate Scaling balanced + conservative DOWN from that anchor: - tokenmax → ~$700/mo, search ~22% of total spend - balanced → ~$620/mo, search ~12% (saves ~$78/mo vs tokenmax) - conservative → ~$575/mo, search ~5% (saves ~$124/mo vs tokenmax) Honest takeaway: at realistic agent-loop scale WITH disciplined prompt caching, mode choice saves 10-20% of total agent spend, not 5x. The per-query math kicks back in for setups WITHOUT cache discipline (churn the prompt prefix every turn → search payload becomes a larger fraction). Both framings live in the doc. CLAUDE.md ## Search Mode gets a forward-pointer paragraph naming the "per-query math vs real-world spend" delta so agents reading the section find the methodology footnote. Numbers in the doc are anonymized + scaled away from any specific deployment. No model names, no specific dollar figures from a real production setup — just the per-turn / cache-hit-rate / search-count shape ratios that a thoughtful operator can validate against their own billing dashboard.

Previous version showed mode costs assuming Sonnet-only downstream. That muted the spread to 5x and made mode choice look minor. Reality: the downstream model tier is the BIGGER cost lever — pairing mode with model is where the 25x spread lives. New 3×3 matrix in the install picker, CLAUDE.md, methodology doc, README: Haiku 4.5 Sonnet 4.6 Opus 4.7 ($1/M input) ($3/M input) ($5/M input) conservative $400/mo $1,200/mo $2,000/mo balanced $1,000/mo $3,000/mo $5,000/mo tokenmax $2,000/mo $6,000/mo $10,000/mo (per-query cost @ 100K queries/mo, full search payload, no cache savings) The methodology doc gets a new "Mode × Model matrix" section above the realistic-scale anchor with concrete right-sizing guidance: - tokenmax + Haiku: wrong direction. Haiku can't filter 50 chunks → noise not signal. Pay Haiku rates, get sub-Haiku quality. - conservative + Opus: wasted Opus. 200K context window starved on retrieval depth. Pay Opus rates, get conservative-shape retrieval. - Natural pairings span ~4x; the matrix corners span 25x. The natural diagonal is where most users should land. Realistic-scale anchor refreshed: - tokenmax + Opus: ~$700/mo at 860 turns - balanced + Sonnet: ~$430/mo - conservative + Haiku: ~$170/mo Plus a "mismatched pairings" section showing the math for tokenmax+Haiku and conservative+Opus — both burn budget for no improvement. Regression test updated: pins the 25x framing + the four anchor cells (two corners + two diagonal mids) + the three downstream model rates. 22/22 picker tests pass. 6241/6241 unit tests pass. CI guards green.

… single user) Most users running gbrain are single-user installs at ~10K queries/month, not the 100K fleet-scale used in the original matrix. The picker numbers ($400 to $10,000/mo) looked alien to the actual audience. Rescaled to 10K with an explicit linear-scaling callout. New matrix in picker, CLAUDE.md, README, methodology doc: Haiku 4.5 Sonnet 4.6 Opus 4.7 ($1/M) ($3/M) ($5/M) conservative $40/mo $120/mo $200/mo balanced $100/mo $300/mo $500/mo tokenmax $200/mo $600/mo $1,000/mo Still 25x corner-to-corner. Still 4x natural-diagonal spread. But now in numbers a single user picks up and reasons about: "balanced + Sonnet at $300/mo, that's fine" or "tokenmax + Opus at $1,000/mo, that's a deliberate choice for max-quality high-stakes work." Every surface updated: - Install picker MENU_TEXT (with "scales linearly — multiply by 10 for 100K/mo" footnote so heavier users still see their number) - CLAUDE.md ## Search Mode table + scaling prose - README Quick Start - methodology doc Mode × Model matrix section - upgrade banner (post-upgrade notice) Regression test updated: pins the 3 new anchor cells ($40, $300, $1,000) + the 10K/mo volume frame + the linear-scaling callout. 23/23 picker tests pass, 6241/6241 unit tests pass, verify gate green. Methodology doc's existing 1K/10K/100K Monthly cost breakdown tables left intact (they already show the linear scaling explicitly).

…NT] directive DX gap: an agent installing gbrain (OpenClaw, Hermes, Codex, Cursor) ran gbrain init non-TTY, saw 2 stderr lines flash by, and silently auto-applied a default search mode. The operator never saw the cost matrix or the choice. At 25x corner-to-corner cost spread, that's surprise-spend territory. Five surfaces fixed: 1. **Auto-suggest default flipped balanced → tokenmax.** The Sonnet/unknown fallback now recommends tokenmax (preserves v0.31.x retrieval shape: expand=on, generous result set). Haiku subagent → conservative still wins (cost-sensitive signal). No-OpenAI-key → conservative still wins (vector search not possible). Heuristic reordered: Haiku check now fires BEFORE the Opus check, because a Haiku subagent loop signalling cost sensitivity should win over a default-model heuristic. 2. **gbrain init non-TTY output rebuilt.** Previously: 2 stderr lines. Now: the full 3×3 cost matrix + an explicit [AGENT] directive block telling the agent to relay the matrix to its operator before continuing. Includes a pointer to INSTALL_FOR_AGENTS.md Step 3.5 for the full protocol. 3. **gbrain upgrade banner same treatment.** Existing v0.32.3 banner now includes [AGENT] directive at the top so upgrading agents relay the matrix to their operator instead of silently accepting v0.31.x → v0.32.x default-applied behavior. 4. **INSTALL_FOR_AGENTS.md Step 3.5 NEW** with the matrix verbatim, the exact paraphrasable ask-the-user wording, and the gbrain config set commands to run after the operator picks. Plus a paragraph in the Upgrade section pointing back at Step 3.5. 5. **AGENTS.md install checklist** gets a new Step 4 ("STOP — ask the user about search mode") between init and the rest of the flow. The agent's job description now explicitly says: silent acceptance is the wrong default. Tests (24/24 pass): - Updated recommendModeFor heuristic order (Haiku floor > Opus default) - New regression test: non-TTY output contains the matrix corners + [AGENT] directive + INSTALL_FOR_AGENTS.md pointer - withEnv() helper used for OPENAI_API_KEY mutation (test-isolation lint) - Default-recommendation tests updated: Sonnet / unknown → tokenmax Privacy + test-isolation gates clean. 6256/6256 unit tests pass.

* upstream/master: v0.35.1.0: embedder shootout prereqs (pricing + gateway export + --resume-from) (garrytan#1055) v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker (garrytan#1008) v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) (garrytan#991) v0.34.3.0 fix: supervisor treats code=0 watchdog exits as crashes (garrytan#1003) v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs (garrytan#988) v0.34.1.0 fix(mcp): MCP fix wave — source-isolation P0 + PKCE DCR + federated_read + 3 more (garrytan#996) v0.34.0.0 feat: Cathedral III — recursive code intelligence + Leiden clusters + eval gate (garrytan#994) v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) (garrytan#934) v0.33.2.1 docs: fork-PR workflow for garrytan-agents (garrytan#992) fix(sync): raise maxBuffer to 100 MiB to prevent silent ENOBUFS crash (garrytan#982) v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting (garrytan#897) v0.33.1.1 fix: Voyage output_dimension + flexible-dim guard + OOM-cap rethrow (garrytan#962)

garrytan changed the title ~~feat(search-lite): token budget + semantic query cache + intent weighting~~ v0.33.1.0 feat(search-lite): token budget + semantic query cache + intent weighting May 12, 2026

garrytan changed the title ~~v0.33.1.0 feat(search-lite): token budget + semantic query cache + intent weighting~~ v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting May 12, 2026

garrytan added 21 commits May 11, 2026 23:25

chore: regen llms.txt + llms-full.txt for v0.32.3 search-mode docs

ca44080

bun run build:llms — picks up the new CLAUDE.md sections (Search Mode + Eval discipline) and the docs/eval/SEARCH_MODE_METHODOLOGY.md addition. build-llms.test.ts gate now passes.

Merge remote-tracking branch 'origin/master' into feat/search-lite-mode

13be24e

Merge remote-tracking branch 'origin/master' into feat/search-lite-mode

cfddf6f

garrytan merged commit 1a6b543 into garrytan:master May 13, 2026
7 checks passed

100yenadmin mentioned this pull request May 17, 2026

Merge upstream GBrain v0.35.1.1 while preserving Eva OpenClaw defaults electricsheephq/eva-brain#101

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting#897

v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting#897
garrytan merged 22 commits into
garrytan:masterfrom
garrytan-agents:feat/search-lite-mode

garrytan-agents commented May 12, 2026 •

edited by blacksmith-sh Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

garrytan-agents commented May 12, 2026 • edited by blacksmith-sh Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

1. Token Budget Enforcement (src/core/search/token-budget.ts)

2. Semantic Query Cache (src/core/search/query-cache.ts + migration v52)

3. Zero-LLM Intent Weighting (src/core/search/intent-weights.ts)

Files changed (13)

Test results

Design notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

garrytan-agents commented May 12, 2026 •

edited by blacksmith-sh Bot

Loading

1. Token Budget Enforcement (`src/core/search/token-budget.ts`)

2. Semantic Query Cache (`src/core/search/query-cache.ts` + migration v52)

3. Zero-LLM Intent Weighting (`src/core/search/intent-weights.ts`)