v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) #1682

Merged
garrytan merged 9 commits into master from garrytan/fix-issue-1663 on Jun 2, 2026

@garrytan garrytan commented May 31, 2026

Owner

Autocut: score-discontinuity result-sizing on the rerank separatrix (v0.42.3.0)

Fixes one wave of #1663 (the floor/ceiling retrieval redesign): recommendation #2, "the direct fix for the 20-vs-1 problem, no LLM call."

What it does. Search returns the confident handful instead of a fixed top-K. When the cross-encoder rerank scores show a clear cliff, autocut cuts there — one obvious answer comes back as one result, a real cluster as that cluster, a broad ambiguous query still returns the full set. Default-ON in balanced/tokenmax; documented no-op in conservative (no reranker → no trustworthy signal).

Why it's trustworthy. gbrain measured (documented in return-policy.ts) that the raw RRF/cosine rank1→rank2 gap is ~identical whether rank-1 is right or wrong — not a separatrix. The cross-encoder rerank score is. So autocut cuts on rerank_score only, and only where the reranker ran.
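
As a rough illustration of the cliff cut described above, the sketch below trims a ranked list at its largest adjacent drop in rerank_score. It is not the code in src/core/search/autocut.ts: the name autocutSketch, the minJump threshold and its default, and the choice of an absolute (rather than relative) drop are all assumptions made for this example.

```typescript
// Hypothetical result shape: only the `rerank_score` field name comes from the PR.
interface ScoredResult {
  id: string;
  rerank_score?: number; // undefined when no reranker scored this result
}

// Assumed rule: cut at the largest adjacent score drop, but only when that
// drop is at least `minJump`; otherwise return the list unchanged.
function autocutSketch<T extends ScoredResult>(ranked: T[], minJump = 0.3): T[] {
  const scores: number[] = [];
  for (const r of ranked) {
    // A result without a rerank score means no trustworthy signal: no-op.
    if (typeof r.rerank_score !== "number") return ranked;
    scores.push(r.rerank_score);
  }

  let cutAt = ranked.length;
  let bestDrop = 0;
  for (let i = 1; i < scores.length; i++) {
    const drop = scores[i - 1]! - scores[i]!;
    if (drop > bestDrop) {
      bestDrop = drop;
      cutAt = i; // keep results [0, i)
    }
  }
  return bestDrop >= minJump ? ranked.slice(0, cutAt) : ranked;
}

// One obvious answer: a single large cliff after rank 1, so only "a" survives.
const single = autocutSketch([
  { id: "a", rerank_score: 0.95 },
  { id: "b", rerank_score: 0.2 },
  { id: "c", rerank_score: 0.18 },
]);

// Broad query: no drop reaches minJump, so all three results come back.
const broad = autocutSketch([
  { id: "a", rerank_score: 0.6 },
  { id: "b", rerank_score: 0.55 },
  { id: "c", rerank_score: 0.5 },
]);
```

Returning the input untouched when any score is missing mirrors the "no reranker, no cut" behavior the description gives for conservative mode.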

Reviews run on this branch

  • /plan-eng-review — CLEAR (rerank-gating, default-ON + eval gate, DRY parallel-module).
  • /codex (plan) — caught the unscored-tail recall risk → fixed by D4 (reranker_top_n_in = searchLimit in reranked modes, no unscored tail).
  • /codex (pre-landing review on the diff) — caught a P1 introduced by the master merge: applyAliasHop() (master's recent retrieval work) injects an exact alias-match page after reranking with no rerank_score, and autocut would drop it when cutting. Fixed: applyAutocut gains a preserve predicate; hybrid passes r => r.alias_hit === true. Also a P2: cache-HIT meta now carries autocut/adaptive_return/mode/embedding_column. 3 new regression tests.
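
The P1 above is easier to see with a toy version of the cut. This is a sketch only: cutWithPreserve, its keepCount parameter, and the position of the alias page in the list are hypothetical, and the real applyAutocut signature is not shown in this PR description. Only the field names rerank_score and alias_hit and the predicate r => r.alias_hit === true come from the text.

```typescript
// Hypothetical result shape using the two field names mentioned in the PR.
interface Hit {
  id: string;
  rerank_score?: number;
  alias_hit?: boolean;
}

// Keep the first `keepCount` reranker-scored results, plus anything the
// `preserve` predicate claims. Other unscored results are dropped by the cut.
function cutWithPreserve<T extends Hit>(
  ranked: T[],
  keepCount: number,
  preserve: (r: T) => boolean = () => false,
): T[] {
  let scoredSeen = 0;
  return ranked.filter((r) => {
    if (preserve(r)) return true; // exempt from the cut, original position retained
    if (typeof r.rerank_score !== "number") return false;
    scoredSeen += 1;
    return scoredSeen <= keepCount;
  });
}

const rankedHits: Hit[] = [
  { id: "alias-target", alias_hit: true }, // injected after reranking, so no rerank_score
  { id: "a", rerank_score: 0.95 },
  { id: "b", rerank_score: 0.2 },
];

// Without a predicate the alias page is lost: only "a" remains.
const withoutPreserve = cutWithPreserve(rankedHits, 1);

// With the predicate from the fix, "alias-target" and "a" both remain.
const withPreserve = cutWithPreserve(rankedHits, 1, (r) => r.alias_hit === true);
```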

Eval gate (in-repo, runs in CI)

bun run eval:autocut (test/search/autocut-eval.test.ts) measures precision-lift-without-recall-regression over labeled qrels fixtures with modeled cross-encoder distributions — no API key, no sibling repo. Result: precision 0.33 → 0.94, recall 1.00 → 0.95, zero recall regression on enumeration queries. Live-corpus PrecisionMemBench remains an optional empirical confirmation, not a blocker.
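
For readers unfamiliar with this kind of gate, here is a minimal sketch of a precision-lift-without-recall-regression check over labeled relevance judgments. The function names, the maxRecallLoss tolerance, and the toy data are assumptions for illustration; they do not reproduce test/search/autocut-eval.test.ts or the numbers reported above.

```typescript
interface Metrics {
  precision: number;
  recall: number;
}

// Precision and recall of a returned id list against the set of relevant ids (qrels).
function precisionRecall(returned: string[], relevant: Set<string>): Metrics {
  const hits = returned.filter((id) => relevant.has(id)).length;
  return {
    precision: returned.length === 0 ? 0 : hits / returned.length,
    recall: relevant.size === 0 ? 1 : hits / relevant.size,
  };
}

// Assumed gate: precision must improve and recall may drop by at most `maxRecallLoss`.
function passesGate(before: Metrics, after: Metrics, maxRecallLoss = 0): boolean {
  return after.precision > before.precision && before.recall - after.recall <= maxRecallLoss;
}

const relevantIds = new Set(["a", "b"]);

// Fixed top-K returns two relevant pages plus four distractors.
const fixedTopK = precisionRecall(["a", "b", "c", "d", "e", "f"], relevantIds);
// A well-placed cut keeps both relevant pages and nothing else.
const trimmed = precisionRecall(["a", "b"], relevantIds);
// An over-eager cut loses a relevant page, which the gate should reject.
const overCut = precisionRecall(["a"], relevantIds);

const goodCutPasses = passesGate(fixedTopK, trimmed);
const overCutPasses = passesGate(fixedTopK, overCut);
```

A nonzero maxRecallLoss would model a gate that trades a small aggregate recall loss for precision; the default of 0 models the stricter "zero regression" case.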

Diff

  • NEW src/core/search/autocut.ts — pure algorithm + resolve ladder + preserve predicate.
  • mode.ts: autocut/autocut_jump knobs → knobsHash (KNOBS_HASH_VERSION 7→8, stacked on master's title_boost v=7); reranker_top_n_in = searchLimit for reranked modes.
  • hybrid.ts — wired after adaptive-return + alias-hop, before the limit slice, first page only; cache-miss AND cache-HIT meta both carry the trimmed-set decision fields.
  • rerank_score first-class on SearchResult; query op autocut boolean; --explain per-result rerank score + formatAutocutSummary; gbrain search modes + metric-glossary.
  • Tests: pure-fn, agent-surface, IRON-RULE behavioral (PGLite via rerankerFn seam), preserve regression, eval gate, v=8 knobsHash + bundle pins.
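
One way to picture why new knobs come with a KNOBS_HASH_VERSION bump is a versioned hash over the knob set. The sketch below is an assumption throughout: the PR does not show how knobsHash is computed or what consumes it, and knobsHashSketch, the FNV-1a hash, the serialization format, and the knob values are all invented for illustration.

```typescript
type Knobs = Record<string, string | number | boolean>;

// 32-bit FNV-1a, used here only to keep the sketch dependency-free.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Hash the version plus the knobs in sorted key order, so insertion order
// never matters and a version bump changes the hash for any knob set.
function knobsHashSketch(knobs: Knobs, version: number): string {
  const body = Object.keys(knobs)
    .sort()
    .map((k) => `${k}=${String(knobs[k])}`)
    .join("|");
  return fnv1a(`v${version}|${body}`);
}

// Hypothetical knob values; only the knob names appear in the PR.
const exampleKnobs: Knobs = { autocut: true, autocut_jump: 0.3 };
const hashAtV7 = knobsHashSketch(exampleKnobs, 7);
const hashAtV8 = knobsHashSketch(exampleKnobs, 8);
```

Under this scheme, anything keyed by the hash under one version no longer matches after the bump, even when the knob values are unchanged.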

Verification

  • bun run verify — 29/29 green; bun run typecheck — clean; full search suite (377 tests) green.
  • Full unit suite: the local machine was under heavy load and the parallel runner SIGTERM-killed shards on the wall-clock cap across attempts — zero actual test failures observed in any shard (shard 4 + serial completed clean; the killed shards showed no (fail) markers). CI runs the authoritative full suite on a clean machine. One known pre-existing, environment-dependent failure lives in master's own test/audit/batch-retry-audit.test.ts (its "ENOENT no-op" case assumes an empty default ~/.gbrain/audit; a real brain has audit files) — not in this diff, fails on master independently, passes in clean CI.

Not closing #1663

One wave. Remaining: query-shape router, structural exact-lookup tier, CRAG-style auto-escalation to think.

🤖 Generated with Claude Code

garrytan and others added 6 commits May 30, 2026 22:17
…nk separatrix

Cut the ranked set at the cross-encoder rerank-score cliff instead of a fixed
top-K. Default-ON in reranked modes (balanced/tokenmax), no-op without a
reranker. New pure src/core/search/autocut.ts; mode.ts knobs + reranker_top_n_in
= searchLimit (no unscored tail); query op autocut param; --explain + glossary.
…recall eval gate

Adds autocut.test.ts, query-op-autocut.test.ts, autocut-integration.serial.test.ts
(IRON-RULE behavioral via rerankerFn seam), autocut-eval.test.ts (in-repo
precision-lift-without-recall-regression gate). Updates existing knobsHash/bundle
pins to v=7 + reranker_top_n_in.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1663

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/core/search/mode.ts
#	test/search-mode.test.ts
…1663

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…ta (codex P1/P2)

P1: applyAliasHop injects the canonical page after reranking (no rerank_score);
autocut would drop it when cutting on the scored set. applyAutocut gains an
optional preserve predicate; hybrid passes r => r.alias_hit === true.
P2: cache-HIT cachedMeta now carries autocut/adaptive_return/mode/embedding_column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) (v0.41.39.0) feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) (v0.42.2.0) May 31, 2026
@garrytan garrytan changed the title feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) (v0.42.2.0) v0.42.2.0 - feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) May 31, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title v0.42.2.0 - feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) (v0.42.3.0) May 31, 2026
@garrytan garrytan changed the title feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) (v0.42.3.0) v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) May 31, 2026
garrytan and others added 2 commits May 31, 2026 09:19
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1663

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
@garrytan garrytan merged commit d9eadfe into master Jun 2, 2026
21 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.42.8.0 feat: content-quality gate on sync — quarantine junk + flag boilerplate (garrytan#1699) (garrytan#1756)
  v0.42.7.0 feat(extract): link/timeline extraction freshness watermark — gbrain extract --stale + doctor lag check (garrytan#1696) (garrytan#1755)
  v0.42.6.0 feat(enrich): gbrain enrich --thin — brain-internal grounded synthesis for stub pages (garrytan#1700) (garrytan#1757)
  v0.42.5.0 fix(minions): RSS watchdog opacity + pooler-reap self-heal + silent lens backlog + cycle lint DB-disconnect (garrytan#1678) (garrytan#1735)
  v0.42.4.0 fix: think --model fails loud — slash-form ids + never persist empty synthesis (garrytan#1698) (garrytan#1736)
  v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (garrytan#1663 wave 1) (garrytan#1682)
  v0.42.2.0 feat: gbrain connect — one-command Claude Code onboarding from a bearer token (garrytan#1683)