v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (#1663 wave 1) #1682
Merged
…nk separatrix Cut the ranked set at the cross-encoder rerank-score cliff instead of a fixed top-K. Default-ON in reranked modes (balanced/tokenmax), no-op without a reranker. New pure src/core/search/autocut.ts; mode.ts knobs + reranker_top_n_in = searchLimit (no unscored tail); query op autocut param; --explain + glossary.
…recall eval gate Adds autocut.test.ts, query-op-autocut.test.ts, autocut-integration.serial.test.ts (IRON-RULE behavioral via rerankerFn seam), autocut-eval.test.ts (in-repo precision-lift-without-recall-regression gate). Updates existing knobsHash/bundle pins to v=7 + reranker_top_n_in.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1663 # Conflicts: # CHANGELOG.md # VERSION # package.json # src/core/search/mode.ts # test/search-mode.test.ts
…1663 # Conflicts: # CHANGELOG.md # VERSION # package.json
…ta (codex P1/P2)
P1: applyAliasHop injects the canonical page after reranking (no rerank_score); autocut would drop it when cutting on the scored set. applyAutocut gains an optional preserve predicate; hybrid passes r => r.alias_hit === true.
P2: cache-HIT cachedMeta now carries autocut/adaptive_return/mode/embedding_column.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1663 # Conflicts: # CHANGELOG.md # VERSION # package.json
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on Jun 3, 2026
* upstream/master:
  v0.42.8.0 feat: content-quality gate on sync — quarantine junk + flag boilerplate (garrytan#1699) (garrytan#1756)
  v0.42.7.0 feat(extract): link/timeline extraction freshness watermark — gbrain extract --stale + doctor lag check (garrytan#1696) (garrytan#1755)
  v0.42.6.0 feat(enrich): gbrain enrich --thin — brain-internal grounded synthesis for stub pages (garrytan#1700) (garrytan#1757)
  v0.42.5.0 fix(minions): RSS watchdog opacity + pooler-reap self-heal + silent lens backlog + cycle lint DB-disconnect (garrytan#1678) (garrytan#1735)
  v0.42.4.0 fix: think --model fails loud — slash-form ids + never persist empty synthesis (garrytan#1698) (garrytan#1736)
  v0.42.3.0 feat(search): autocut — score-discontinuity result-sizing (garrytan#1663 wave 1) (garrytan#1682)
  v0.42.2.0 feat: gbrain connect — one-command Claude Code onboarding from a bearer token (garrytan#1683)
Autocut: score-discontinuity result-sizing on the rerank separatrix (v0.42.3.0)
Fixes one wave of #1663 (the floor/ceiling retrieval redesign): recommendation #2, "the direct fix for the 20-vs-1 problem, no LLM call."
What it does. Search returns the confident handful instead of a fixed top-K. When the cross-encoder rerank scores show a clear cliff, autocut cuts there: one obvious answer comes back as one result, a real cluster comes back as that cluster, and a broad, ambiguous query still returns the full set. Default-ON in balanced/tokenmax; documented no-op in conservative (no reranker → no trustworthy signal).

Why it's trustworthy. gbrain measured (documented in return-policy.ts) that the raw RRF/cosine rank1→rank2 gap is ~identical whether rank-1 is right or wrong, so it is not a separatrix. The cross-encoder rerank score is. So autocut cuts on rerank_score only, and only where the reranker ran.

Reviews run on this branch
- …reranker_top_n_in = searchLimit in reranked modes, no unscored tail).
- applyAliasHop() (master's recent retrieval work) injects an exact alias-match page after reranking with no rerank_score, and autocut would drop it when cutting. Fixed: applyAutocut gains a preserve predicate; hybrid passes r => r.alias_hit === true. Also a P2: cache-HIT meta now carries autocut/adaptive_return/mode/embedding_column. 3 new regression tests.

Eval gate (in-repo, runs in CI)
bun run eval:autocut (test/search/autocut-eval.test.ts) measures precision-lift-without-recall-regression over labeled qrels fixtures with modeled cross-encoder distributions; it needs no API key and no sibling repo. Result: precision 0.33 → 0.94, recall 1.00 → 0.95, zero recall regression on enumeration queries. Live-corpus PrecisionMemBench remains an optional empirical confirmation, not a blocker.

Diff
- src/core/search/autocut.ts: pure algorithm + resolve ladder + preserve predicate.
- mode.ts: autocut/autocut_jump knobs → knobsHash (KNOBS_HASH_VERSION 7→8, stacked on master's title_boost v=7); reranker_top_n_in = searchLimit for reranked modes.
- hybrid.ts: wired after adaptive-return + alias-hop, before the limit slice, first page only; cache-miss AND cache-HIT meta both carry the trimmed-set decision fields.
- rerank_score first-class on SearchResult; query op autocut boolean; --explain per-result rerank score + formatAutocutSummary; gbrain search modes + metric-glossary.
- …rerankerFn seam), preserve regression, eval gate, v=8 knobsHash + bundle pins.

Verification
- bun run verify: 29/29 green; bun run typecheck: clean; full search suite (377 tests) green.
- …(fail) markers). CI runs the authoritative full suite on a clean machine. One known pre-existing, environment-dependent failure lives in master's own test/audit/batch-retry-audit.test.ts (its "ENOENT no-op" case assumes an empty default ~/.gbrain/audit; a real brain has audit files). It is not in this diff, fails on master independently, and passes in clean CI.

Not closing #1663
One wave. Remaining: query-shape router, structural exact-lookup tier, CRAG-style auto-escalation to think.

🤖 Generated with Claude Code
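
For readers who want a concrete picture of the cliff cut and the preserve predicate described in this PR, here is a minimal TypeScript sketch. It is not the code in src/core/search/autocut.ts. The function name sketchAutocut, the use of an absolute score-gap threshold called jump, and the "largest adjacent gap" rule are all assumptions made for illustration; only rerank_score, alias_hit, and the idea of a preserve predicate come from the description above.

```typescript
// Illustrative sketch only: NOT the implementation in src/core/search/autocut.ts.
interface Scored {
  id: string;
  rerank_score?: number; // absent when the reranker did not score this result
  alias_hit?: boolean;   // set on pages injected by the alias hop
}

// Hypothetical signature. `jump` is an assumed absolute score-gap threshold.
function sketchAutocut<T extends Scored>(
  results: T[],
  jump: number,
  preserve: (r: T) => boolean = () => false,
): T[] {
  // Only results that carry a rerank score take part in cliff detection.
  const scored = results
    .filter((r) => typeof r.rerank_score === "number")
    .sort((a, b) => (b.rerank_score as number) - (a.rerank_score as number));
  if (scored.length < 2) return results; // no usable reranker signal: no-op

  // Find the largest drop between adjacent rerank scores.
  const scores = scored.map((r) => r.rerank_score as number);
  let biggest = 0;
  let cutAt = -1;
  for (let i = 0; i < scores.length - 1; i++) {
    const gap = scores[i]! - scores[i + 1]!;
    if (gap > biggest) {
      biggest = gap;
      cutAt = i + 1; // keep everything above the drop
    }
  }
  if (biggest < jump) return results; // no clear cliff: return the full set

  const kept = new Set<T>(scored.slice(0, cutAt));
  // Keep the original ordering; preserved results survive even when unscored.
  return results.filter((r) => kept.has(r) || preserve(r));
}

// Usage: one obvious answer plus an alias-hop page that has no rerank_score.
const hits: Scored[] = [
  { id: "a", rerank_score: 0.95 },
  { id: "b", rerank_score: 0.31 },
  { id: "c", rerank_score: 0.28 },
  { id: "alias", alias_hit: true },
];
const cut = sketchAutocut(hits, 0.3, (r) => r.alias_hit === true);
console.log(cut.map((r) => r.id)); // → ["a", "alias"]

// A flat score distribution has no cliff, so the full set comes back.
const flat: Scored[] = [
  { id: "x", rerank_score: 0.52 },
  { id: "y", rerank_score: 0.5 },
  { id: "z", rerank_score: 0.47 },
];
console.log(sketchAutocut(flat, 0.3).map((r) => r.id)); // → ["x", "y", "z"]
```

The sketch cuts only on rerank scores and leaves the input untouched when fewer than two results are scored, which mirrors the no-op behavior described for modes without a reranker. Without the preserve predicate, the unscored alias page in the first example would be dropped by the cut.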