feat(search): opt-in floor-ratio gate for post-fusion boost stages #1091
jayzalowitz wants to merge 1 commit into
Adds an optional `floorRatio?: number` to applyBacklinkBoost, applySalienceBoost, applyRecencyBoost, and PostFusionOpts. When set, each boost stage skips results whose pre-boost score is below floorRatio * topScore at the moment that stage runs, so only the head of the candidate pool receives the multiplicative bonus. Default undefined preserves exact prior behavior bit-for-bit.

The failure mode
────────────────
Bounded boosts (the [1.0, ~1.6] log-compressed clip on salience, the log-scaled backlink factor, the half-life-decayed recency factor) work as designed on curated test corpora. On larger corpora indexed with real high-dimensional embedders (text-embedding-3-large, voyage-3-large, voyage-4-large, zembed-1), baseline vector similarity between topically-unrelated "professional content" is non-trivial. Weak-overlap pages land in a query's top-K via vector overlap alone, receive the multiplicative boost, and on a non-trivial fraction of queries a weak page with high metadata signal climbs above the legitimate primary hit. Per-boost factors look harmless in isolation; the compound effect across the long tail is what shifts ranks.

The fix
───────
A boost only fires for results within floorRatio * topScore at the moment that stage runs. The long tail keeps its unboosted score and original rank. Stages compose naturally: salience runs against its own top, recency runs against the post-salience top, etc.

0.85 as a starting point comes from a labeled-retrieval ablation in the SkyTwin twin-memory layer: the largest ratio that fully eliminated the leapfrog regression on our labeled corpus while preserving baseline rankings on queries without a metadata signal.

Reference: jayzalowitz/skytwin#272

Backward compatibility
──────────────────────
floorRatio defaults to undefined → no gate, no threshold computation, exact prior behavior. Existing call sites are untouched; the new param is positional-last and optional on each function. PostFusionOpts.floorRatio is similarly optional and unset by default.

Opt-in by design: it changes ranking behavior, so each consumer evaluates against their own corpus before flipping it on.

Tests
─────
7 new cases in test/search.test.ts:
- default (floorRatio undefined) preserves existing behavior
- weak page gated out, top page boosted as before
- borderline page at exactly the threshold is eligible
- regression scenario: weak page with strong metadata signal cannot leapfrog a strong primary
- applySalienceBoost honors the gate (parity with applyBacklinkBoost)
- empty results no-op without divide-by-zero
- single-result trivially eligible

bun test test/search.test.ts: 33/33 pass (was 26/26).
bun run verify: pass (typecheck + 12 guard scripts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
837d1ce to 043b403
Hi @jayzalowitz, thank you for this. The labeled-retrieval ablation behind it is the kind of contribution that makes gbrain better than a single-corpus tool, and the failure-mode framing is sharp. After running this through

Three correctness fixes codex caught:

Two architectural shape changes:

API surface change: the three boost functions now take

What stays exactly as you shipped:

You're credited via

I'll mark this PR as superseded by #1129 (which carries

Thanks again. Filed two follow-up TODOs that grew out of this work: (1) gbrain-side ablation across
…loses #1091) (#1129)

* v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages

Opt-in score-based gate on the three metadata-axis boost stages (backlink, salience, recency) inside `runPostFusionStages`. When `SearchOpts.floorRatio` or `search.floor_ratio` config is set, each stage skips results whose post-cosine-rescore score is below `floorRatio * topScore`. Default undefined preserves prior behavior bit-for-bit. Prevents weak-overlap candidates from accumulating metadata boosts and leapfrogging the legitimate primary hit on dense-embedder corpora.

Built on the contributor PR from @jayzalowitz (PR #1091, SkyTwin twin-memory layer). Refactored on top: threshold is computed ONCE at runPostFusionStages entry instead of per-stage (single-baseline semantic, order-independent); knobsHash bumped 2->3 so a no-floor cache write can't be served to a floor-enabled lookup; NaN scores skip the boost instead of bypassing the gate; SearchOpts/config/MODE_BUNDLES integration replaces the PR's PostFusionOpts-only surface; no env var (resolveSearchMode is pure by design).

Three correctness issues codex outside-voice review caught and this landed with fixed:
- Cache contamination via knobsHash() (same bug class as v0.32.3 CDX-4 hotfix for the other search-lite knobs)
- NaN scores would have bypassed the gate (NaN < threshold is false in JS); realistic on Voyage flexible-dim / zembed-1 Matryoshka dim drift
- Negative top scores would have broken the "single result trivially eligible" claim; gate now disables on no-positive-signal inputs

Scope: gates metadata stages only. Exact-match boost (applyExactMatchBoost) runs independently as a lexical-relevance signal by design. Cross-source floor stays global (per-source deferred to v0.36 if federated-read users hit the suppression). Default-on for any mode bundle deferred until gbrain-side ablation against longmemeval / whoknows / suspected-contradictions / BrainBench-Real (TODOS.md).

Plan + 9-decision review trail (D1-D9): ~/.claude/plans/swift-sniffing-nygaard.md. Empirical motivation, failure-mode framing, dense-embedder targeting, and the 0.85 starting value all from @jayzalowitz's labeled-retrieval ablation. Integration shape is gbrain-side.

Test surface: 30+ new cases (computeFloorThreshold edge cases including T1a NaN / T1b negative top, three boost-function gate parity tests including T6 IRON-RULE applyRecencyBoost regression, runPostFusionStages single-baseline composition pin, KNOBS_HASH_VERSION bump from 2 to 3, floor-ratio-changes-hash cache-contamination prevention, loadOverridesFromConfig coverage for search.floor_ratio config key). bun run verify clean; full unit suite 6753 pass / 0 fail.

Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rewrite v0.35.6.0 CHANGELOG ELI10-lead-first; codify the rule in CLAUDE.md

CHANGELOG entry for v0.35.6.0 was readable only by someone who already understood gbrain's internals (RRF, knobsHash, MODE_BUNDLES, runPostFusionStages, Matryoshka, CDX-4). Rewrote it so the first ~150 words explain what shipped in everyday English, with a concrete worked example, before any file paths or function names appear. Itemized changes section keeps the technical precision for engineers who need it.

Then codified the rule in CLAUDE.md so future release entries land the same way. The "Release-summary template" section now has an iron rule: "lead ELI10, get precise after." No file paths or internal constants in the first 150 words; user-visible behavior change first; everyday-language column headers in any tables. Technical precision is required (the entry is still the technical record) but lives BELOW the plain-English lead, never before it. Smell test: if a reader who has never opened gbrain can walk away from the first 150 words knowing what shipped and whether they care, the entry passes.

bun run build:llms regenerated to pick up the CLAUDE.md change (CI guard test/build-llms.test.ts pins committed bundles against fresh generator output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
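The single-baseline semantics described in the commit message above (threshold computed once at entry, NaN scores skipping the boost, gate disabled when there is no positive top score) can be sketched roughly as follows. This is an illustration under assumptions, not gbrain's code: the commit names `computeFloorThreshold` and `runPostFusionStages` but their signatures are not shown here, so `floorThreshold`, `boostStage`, `runStages`, the `Scored` shape, and the "undefined means gate disabled" convention are all hypothetical.

```typescript
// Hypothetical result shape; the real search result type has more fields.
interface Scored {
  id: string;
  score: number;
}
type Factor = (r: Scored) => number;

// Compute the threshold once from the scores entering the boost stages.
// Assumed convention: undefined means "gate disabled", which happens when
// floorRatio is unset or when no result has a positive finite score.
function floorThreshold(results: Scored[], floorRatio?: number): number | undefined {
  if (floorRatio === undefined) return undefined;
  let top = -Infinity;
  for (const r of results) {
    if (Number.isFinite(r.score) && r.score > top) top = r.score;
  }
  return top > 0 ? floorRatio * top : undefined;
}

// Apply one multiplicative boost stage against a fixed threshold.
function boostStage(results: Scored[], factorFor: Factor, threshold?: number): Scored[] {
  return results.map((r) => {
    if (threshold !== undefined) {
      // `NaN < threshold` is false in JS, so NaN is checked explicitly
      // and skips the boost instead of slipping past the gate.
      if (Number.isNaN(r.score) || r.score < threshold) return r;
    }
    return { ...r, score: r.score * factorFor(r) };
  });
}

// Every stage sees the same threshold. Assuming factors are >= 1, a result
// that was eligible at entry stays eligible, so stage order does not matter.
function runStages(results: Scored[], stages: Factor[], floorRatio?: number): Scored[] {
  const threshold = floorThreshold(results, floorRatio);
  let current = results;
  for (const stage of stages) {
    current = boostStage(current, stage, threshold);
  }
  return current;
}

// Illustrative input: one strong hit, one weak hit, one NaN score.
const input: Scored[] = [
  { id: "primary", score: 0.9 },
  { id: "weak", score: 0.5 },
  { id: "drifted", score: NaN },
];
const out = runStages(input, [() => 1.2, () => 1.1], 0.85);
console.log(out);
```

With these made-up numbers only `primary` is boosted; `weak` keeps its entry score and `drifted` stays NaN without being multiplied.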
* upstream/master:
  v0.37.0.0 feat(skillpack): registry cathedral - third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)
  v0.36.5.0 feat: secure DATABASE_URL access for shell jobs (inherit: ["database_url"]) (garrytan#1192)
  v0.36.4.0 feat: brain-health-100 - autonomous remediation via doctor --remediate + Minions (garrytan#1193)
  fix(docs): comprehensive drift audit - contradictions, broken links, stale refs (garrytan#1201)
  v0.36.3.0 feat: dynamic embedding column selection for search (garrytan#1164)
  v0.36.2.0 feat: ZeroEntropy as default + zero-based README rewrite (garrytan#1136)
  v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes (garrytan#1182)
  v0.36.1.0 Hindsight calibration wave: brain learns how you tend to be wrong (garrytan#1139)
  v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install) (garrytan#1130)
  v0.35.8.0 feat(cycle): phantom-page redirect inside extract_facts (garrytan#1138)
  v0.35.7.0 feat: temporal trajectory + founder scorecard (Phases 2-4) (garrytan#1131)
  v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes garrytan#1091) (garrytan#1129)
  v0.35.5.1 fix(doctor): stop counting clean supervisor exits as crashes (garrytan#1108)
  v0.35.5.0 fix wave: bootstrap + orphans + think MCP + worktree + walker (garrytan#1111)
  v0.35.4.0 fix(doctor,entities): supervisor crash classification + bare-name resolver + 58x perf + stub guard observability (garrytan#1085)
  v0.35.3.1 feat(eval): temporal-aware contradiction probe + verdict enum (garrytan#1052)
  v0.35.3.0 fix wave: extract_facts items + git --no-recurse-submodules placement (garrytan#1053)

# Conflicts:
#	src/core/postgres-engine.ts
#	test/schema-bootstrap-coverage.test.ts
What
Adds an optional `floorRatio?: number` to `applyBacklinkBoost`, `applySalienceBoost`, `applyRecencyBoost`, and `PostFusionOpts`. When set, each boost stage skips results whose pre-boost score is below `floorRatio * topScore` at the moment that stage runs, so only the head of the candidate pool receives the multiplicative bonus. Default `undefined` preserves exact prior behavior bit-for-bit.

The failure mode this addresses

gbrain's bounded boosts (log-compressed salience clipped to `[1.0, ~1.6]`, log-scaled backlink factor, half-life-decayed recency) work as designed on curated test corpora. The existing comments in `hybrid.ts:69` make the design intent explicit: "a strong boost can't catastrophically flip rankings."

That guarantee weakens on larger corpora indexed with real high-dimensional embedders (`text-embedding-3-large`, `voyage-3-large`, `voyage-4-large`, `zembed-1`). Baseline vector similarity between topically-unrelated "professional content" is non-trivial: weak-overlap pages routinely land in a query's top-K via vector overlap alone, each receives the multiplicative boost, and on a non-trivial fraction of queries a weak page with high metadata signal climbs above the legitimate primary hit. The per-boost factor still looks harmless in isolation; the compound effect across the long tail of top-K candidates is what shifts ranks.

Concrete example from the new test suite:

(table comparing `strong-primary` and `weak-with-signal`; score columns not preserved)

`strong-primary` still wins in that pair, but the gap is now ~50% of what the raw text+vector signal said it should be. Drop `strong-primary`'s raw score below ~1.35× the weak page (common in the long tail) and rank flips, even though every component signal said the strong page was the unambiguous match.

The fix

A boost only fires for results within `floorRatio × topScore` at the moment that stage runs. The long tail keeps its unboosted score and original rank. Stages compose naturally: salience boosts against its own top, then recency runs against the post-salience top, etc. A truly-strong primary always wins regardless of boost magnitude on weaker candidates.

`0.85` is a reasonable starting point for dense embedder corpora: it covers the head of a typical RRF candidate pool while excluding weak-overlap distractors. The value comes from a labeled-retrieval ablation in the SkyTwin twin-memory layer (which consumes gbrain as its default memory backend via skytwin#250): `0.85` is the largest ratio that fully eliminated the leapfrog regression on our labeled corpus while preserving baseline rankings on queries that don't have a metadata signal to bias. The relevant ablation is in skytwin#272.

Backward compatibility

`floorRatio` defaults to `undefined` → no gate, no threshold computation, exact prior behavior. Existing call sites are untouched; the new parameter is positional-last and optional on each function. `PostFusionOpts.floorRatio` is similarly optional and unset by default.

The gate is opt-in by design: it changes ranking behavior, and existing adopters have tuned around the current post-fusion stack. Surfacing it as a flag lets each consumer evaluate against their own corpus before flipping it on, rather than landing a behavior change in a patch release.
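The per-stage gate described under "The fix" can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: `Scored`, `applyGatedBoost`, the `factor` callback, and all scores and boost factors are hypothetical, and the real boost functions take different arguments.

```typescript
// Hypothetical result shape; gbrain's real search result type has more fields.
interface Scored {
  id: string;
  score: number;
}

// Multiply each result's score by its boost factor, but only when the
// pre-boost score is at or above floorRatio * topScore for this stage.
// floorRatio === undefined means no gate: every result is boosted.
function applyGatedBoost(
  results: Scored[],
  factorFor: (r: Scored) => number,
  floorRatio?: number,
): Scored[] {
  if (results.length === 0) return []; // nothing to boost, no threshold needed
  let threshold = -Infinity;
  if (floorRatio !== undefined) {
    const topScore = Math.max(...results.map((r) => r.score));
    threshold = floorRatio * topScore;
  }
  return results
    .map((r) => (r.score >= threshold ? { ...r, score: r.score * factorFor(r) } : r))
    .sort((a, b) => b.score - a.score);
}

// Illustrative numbers only: the weak page carries a strong metadata signal.
const pool: Scored[] = [
  { id: "strong-primary", score: 1.0 },
  { id: "weak-with-signal", score: 0.8 },
];
const factor = (r: Scored): number => (r.id === "weak-with-signal" ? 1.35 : 1.0);

const ungated = applyGatedBoost(pool, factor); // weak page is boosted past the primary
const gated = applyGatedBoost(pool, factor, 0.85); // 0.8 < 0.85 * 1.0, so no boost

console.log(ungated.map((r) => r.id));
console.log(gated.map((r) => r.id));
```

In this toy pool the ungated call lets the weak page overtake the primary, while the gated call leaves the original order intact. The `>=` comparison mirrors the "borderline page at exactly the threshold is eligible" test case.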
Tests
7 new cases in `test/search.test.ts`:

- `floorRatio` undefined preserves existing behavior bit-for-bit
- `applySalienceBoost` honors the gate (parity with `applyBacklinkBoost`)

`bun test test/search.test.ts`: 33/33 pass (was 26/26).
`bun run verify`: pass (typecheck + 12 guard scripts).

Why I'm sending this
We've been running gbrain as the production memory backend in SkyTwin for the last week, and this came out of the labeled-retrieval ablation we did on top of it. The gate is general-purpose — the shape applies to any per-result boost on metadata orthogonal to the raw text+vector signal — so it seemed worth offering upstream rather than keeping it forked. The opt-in framing makes it cheap to merge if it lands cleanly and equally cheap to leave un-flipped if your shootout corpus doesn't surface the same regression.