feat(search): opt-in floor-ratio gate for post-fusion boost stages by jayzalowitz · Pull Request #1091 · garrytan/gbrain

jayzalowitz · 2026-05-17T00:44:54Z

What

Adds an optional floorRatio?: number to applyBacklinkBoost, applySalienceBoost, applyRecencyBoost, and PostFusionOpts. When set, each boost stage skips results whose pre-boost score is below floorRatio * topScore at the moment that stage runs — only the head of the candidate pool receives the multiplicative bonus. Default undefined preserves exact prior behavior bit-for-bit.

The failure mode this addresses

gbrain's bounded boosts (log-compressed salience clipped to [1.0, ~1.6], log-scaled backlink factor, half-life-decayed recency) work as designed on curated test corpora. The existing comments in hybrid.ts:69 make the design intent explicit: "a strong boost can't catastrophically flip rankings."

That guarantee weakens on larger corpora indexed with real high-dimensional embedders (text-embedding-3-large, voyage-3-large, voyage-4-large, zembed-1). Baseline vector similarity between topically-unrelated "professional content" is non-trivial — weak-overlap pages routinely land in a query's top-K via vector overlap alone, each receives the multiplicative boost, and on a non-trivial fraction of queries a weak page with high metadata signal climbs above the legitimate primary hit. The per-boost factor still looks harmless in isolation; the compound effect across the long tail of top-K candidates is what shifts ranks.

Concrete example from the new test suite:

| page | raw RRF score | backlink count | post-boost score |
| --- | --- | --- | --- |
| `strong-primary` | 1.00 | 0 | 1.000 |
| `weak-with-signal` | 0.50 | 1000 | 0.673 |

strong-primary still wins in that pair, but its lead has shrunk from 2.0× the weak page's score to about 1.49×, roughly half the margin the raw text+vector signal said it should have. Drop strong-primary's raw score below ~1.35× the weak page's (common in the long tail) and the ranks flip, even though every component signal said the strong page was the unambiguous match.
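The arithmetic behind that example can be checked in a few lines. This is only a sketch: the boost factor is inferred from the table (0.673 / 0.50) rather than taken from gbrain's backlink formula, and the helper name `flips` is illustrative.

```typescript
// Values copied from the table above; the factor is inferred, not gbrain's formula.
const strongRaw = 1.0;
const weakRaw = 0.5;
const weakBoosted = 0.673;

// Implied multiplicative backlink factor for the weak page.
const impliedFactor = weakBoosted / weakRaw;

// The weak page overtakes the strong one once weak * factor > strong,
// i.e. once the strong page's raw score is below factor times the weak score.
// The strong page has no backlinks here, so its own factor is taken as 1.
function flips(strong: number, weak: number, factor: number): boolean {
  return weak * factor > strong;
}

console.log(impliedFactor.toFixed(3));                    // "1.346"
console.log(flips(strongRaw, weakRaw, impliedFactor));    // false
console.log(flips(0.65, weakRaw, impliedFactor));         // true
```

With a strong raw score of 0.65 (1.3× the weak page), the boosted weak page at 0.673 already ranks first, which is the flip the paragraph above describes.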

The fix

A boost only fires for results within floorRatio × topScore at the moment that stage runs. The long tail keeps its unboosted score and original rank. Stages compose naturally — salience boosts against its own top, then recency runs against the post-salience top, etc. A truly-strong primary always wins regardless of boost magnitude on weaker candidates.
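The per-stage gate described above can be sketched as follows. This is a minimal illustration under assumptions, not the PR's code: `applyGatedBoost`, `Result`, and `factorFor` are hypothetical names, and `factorFor` stands in for the real backlink, salience, or recency math.

```typescript
interface Result {
  slug: string;
  score: number;
}

// Hypothetical stand-in for one boost stage. `factorFor` replaces the real
// boost math, which is not reproduced here.
function applyGatedBoost(
  results: Result[],
  factorFor: (r: Result) => number,
  floorRatio?: number, // optional and last, as in the PR description
): Result[] {
  if (results.length === 0) return results; // nothing to boost, no top score

  // No ratio means no gate: every result stays eligible.
  let threshold = -Infinity;
  if (floorRatio !== undefined) {
    const topScore = Math.max(...results.map((r) => r.score));
    threshold = floorRatio * topScore; // computed from this stage's input
  }

  // Inclusive boundary: a score exactly at the threshold is still boosted.
  return results.map((r) =>
    r.score < threshold ? r : { ...r, score: r.score * factorFor(r) },
  );
}

const pool: Result[] = [
  { slug: "strong-primary", score: 1.0 },
  { slug: "weak-with-signal", score: 0.5 },
];
// Illustrative factor only: the weak page carries the strong metadata signal.
const factor = (r: Result): number =>
  r.slug === "weak-with-signal" ? 1.346 : 1.0;

const ungated = applyGatedBoost(pool, factor);       // weak page boosted
const gated = applyGatedBoost(pool, factor, 0.85);   // weak page left at 0.5
console.log(ungated[1].score > gated[1].score);      // true
```

Chaining several such stages, each called with the same `floorRatio`, gives the per-stage composition the paragraph describes: every stage derives its own threshold from the scores it receives.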

0.85 is a reasonable starting point for dense embedder corpora — it covers the head of a typical RRF candidate pool while excluding weak-overlap distractors. The value comes from a labeled-retrieval ablation in the SkyTwin twin-memory layer (which consumes gbrain as its default memory backend via skytwin#250): 0.85 is the largest ratio that fully eliminated the leapfrog regression on our labeled corpus while preserving baseline rankings on queries that don't have a metadata signal to bias. The relevant ablation is in skytwin#272.

Backward compatibility

floorRatio defaults to undefined → no gate, no threshold computation, exact prior behavior. Existing call sites are untouched; the new parameter is positional-last and optional on each function. PostFusionOpts.floorRatio is similarly optional and unset by default.

The gate is opt-in by design — it changes ranking behavior, and existing adopters have tuned around the current post-fusion stack. Surfacing it as a flag lets each consumer evaluate against their own corpus before flipping it on, rather than landing a behavior change in a patch release.

Tests

7 new cases in test/search.test.ts:

- floorRatio undefined preserves existing behavior bit-for-bit
- weak page gated out, top page boosted as before
- borderline page at exactly the threshold is eligible (inclusive boundary)
- regression scenario: weak page with strong metadata signal cannot leapfrog strong primary
- applySalienceBoost honors the gate (parity with applyBacklinkBoost)
- empty results no-op without divide-by-zero
- single-result trivially eligible (page is its own top)

bun test test/search.test.ts: 33/33 pass (was 26/26).
bun run verify: pass (typecheck + 12 guard scripts).

Why I'm sending this

We've been running gbrain as the production memory backend in SkyTwin for the last week, and this came out of the labeled-retrieval ablation we did on top of it. The gate is general-purpose — the shape applies to any per-result boost on metadata orthogonal to the raw text+vector signal — so it seemed worth offering upstream rather than keeping it forked. The opt-in framing makes it cheap to merge if it lands cleanly and equally cheap to leave un-flipped if your shootout corpus doesn't surface the same regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

garrytan · 2026-05-17T22:04:01Z

Hi @jayzalowitz — thank you for this. The labeled-retrieval ablation behind it is the kind of contribution that makes gbrain better than a single-corpus tool, and the failure-mode framing is sharp.

After running this through /plan-eng-review and /codex (outside-voice adversarial review), we integrated it on top of master in #1129 with some refactoring. Summary of what changed and why:

Three correctness fixes codex caught:

Cache contamination via knobsHash(). Without bumping KNOBS_HASH_VERSION 2→3 in src/core/search/mode.ts, a cached no-floor result gets served to a floor-enabled call. Same bug class CDX-4 v0.32.3 hotfix closed for the other search-lite knobs. Direct ranking-correctness leak — the PR would have shipped with this.
NaN scores bypass the gate. NaN < threshold is false in JS, so a NaN-scored row would have skipped the floor and received boosts. Realistic on Voyage flexible-dim or zembed-1 Matryoshka dim drift across reindexes. Fixed: NaN scores skip the boost entirely.
Negative top scores break "single result trivially eligible". Your single-result test passed because score was positive; with a negative top score (-0.5), threshold becomes -0.425 and the top itself fails its own floor. Edge case but real on RRF outputs that can go negative.
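The three fixes above can be illustrated together. This is a hedged sketch, not the code merged in #1129: `cacheKey` is a hypothetical stand-in for `knobsHash()`, `isBoostEligible` and `naiveEligible` are illustrative names, and the signature and "disable the gate when there is no positive top score" rule given to `computeFloorThreshold` are assumptions based on the descriptions in this thread.

```typescript
interface Scored {
  slug: string;
  score: number;
}

// Fix 1 (cache): fold the ratio and a version number into the cache key so a
// no-floor result is never served to a floor-enabled call. Hypothetical shape.
const HASH_VERSION = 3; // mirrors the 2 to 3 bump described above
function cacheKey(query: string, floorRatio?: number): string {
  return `v${HASH_VERSION}|${query}|floor=${floorRatio ?? "off"}`;
}

// Fixes 2 and 3: convert the ratio into an absolute threshold, or return
// undefined (gate disabled) when no finite positive top score exists.
function computeFloorThreshold(
  results: Scored[],
  floorRatio?: number,
): number | undefined {
  if (floorRatio === undefined) return undefined;
  let top = -Infinity;
  for (const r of results) {
    if (Number.isFinite(r.score) && r.score > top) top = r.score;
  }
  return top > 0 ? floorRatio * top : undefined;
}

function isBoostEligible(score: number, floorThreshold?: number): boolean {
  if (floorThreshold === undefined) return true; // gate off: prior behavior
  if (Number.isNaN(score)) return false; // NaN rows never receive the boost
  return score >= floorThreshold;
}

// The shape the review objected to: "skip when score < threshold".
const naiveEligible = (score: number, threshold: number): boolean =>
  !(score < threshold);

console.log(cacheKey("q") === cacheKey("q", 0.85)); // false: keys differ
console.log(naiveEligible(NaN, 0.85));              // true: NaN slips through
console.log(isBoostEligible(NaN, 0.85));            // false
console.log(naiveEligible(-0.5, 0.85 * -0.5));      // false: top fails its own floor
console.log(computeFloorThreshold([{ slug: "only", score: -0.5 }], 0.85)); // undefined
```

Writing the eligibility test as `score >= threshold` rather than `!(score < threshold)` already rejects NaN, since every comparison with NaN is false; the explicit `Number.isNaN` check is kept only to make the intent visible.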

Two architectural shape changes:

Single up-front threshold vs per-stage recompute. Your PR computes the threshold separately at each stage (each of backlink → salience → recency uses topScore × ratio from the array state at that moment). Codex argued this makes stage order part of the API — a backlink boost to the leader raises the salience/recency floor and gates out a page that was originally within 0.85 × top. We refactored to compute the threshold ONCE at runPostFusionStages entry from the post-cosine-rescore snapshot, then pass the same number to all three stages. Order-independent.
Public surface beyond PostFusionOpts. Wired SearchOpts.floorRatio (per-call), search.floor_ratio config key (operator default — included in SEARCH_MODE_CONFIG_KEYS), and a floor_ratio field on MODE_BUNDLES (currently undefined for all three modes pending gbrain-side ablation). No GBRAIN_SEARCH_FLOOR_RATIO env var — resolveSearchMode() is pure by design, and a hidden env knob would make gbrain search modes lie about the resolved state.
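The order-dependence argument in the first item can be reproduced with a toy pool. This is a sketch under assumptions: `runStage`, `perStage`, and `singleBaseline` are hypothetical helpers, each stage is reduced to a multiplicative factor of at least 1, and the numbers are made up for illustration.

```typescript
interface Row {
  slug: string;
  score: number;
}
// A stage is reduced to "which factor does this row get"; a hypothetical
// stand-in for the real backlink/salience/recency math. Factors are >= 1.
type Stage = (row: Row) => number;

const RATIO = 0.85;
const topOf = (rows: Row[]): number => Math.max(...rows.map((r) => r.score));

// Boost every row at or above the given absolute threshold.
function runStage(rows: Row[], stage: Stage, threshold: number): Row[] {
  return rows.map((r) =>
    r.score >= threshold ? { ...r, score: r.score * stage(r) } : r,
  );
}

// Per-stage recompute: the floor follows the top score as it gets boosted.
function perStage(rows: Row[], stages: Stage[]): Row[] {
  return stages.reduce((acc, s) => runStage(acc, s, RATIO * topOf(acc)), rows);
}

// Single baseline: the floor is fixed once from the input snapshot.
function singleBaseline(rows: Row[], stages: Stage[]): Row[] {
  const threshold = RATIO * topOf(rows);
  return stages.reduce((acc, s) => runStage(acc, s, threshold), rows);
}

const pool: Row[] = [
  { slug: "leader", score: 1.0 },
  { slug: "runner-up", score: 0.9 }, // within 0.85 x top at entry
];
const backlink: Stage = (r) => (r.slug === "leader" ? 1.3 : 1.0);
const salience: Stage = (r) => (r.slug === "runner-up" ? 1.2 : 1.0);

const scoreOf = (rows: Row[], slug: string): number =>
  rows.find((r) => r.slug === slug)!.score;

// Per-stage: running backlink first lifts the floor to 0.85 * 1.3 = 1.105,
// so the runner-up (0.9) misses its salience boost. Salience first does not.
const perStageBacklinkFirst = scoreOf(perStage(pool, [backlink, salience]), "runner-up");
const perStageSalienceFirst = scoreOf(perStage(pool, [salience, backlink]), "runner-up");

// Single baseline: the floor stays at 0.85, so both orders agree.
const singleBacklinkFirst = scoreOf(singleBaseline(pool, [backlink, salience]), "runner-up");
const singleSalienceFirst = scoreOf(singleBaseline(pool, [salience, backlink]), "runner-up");

console.log(perStageBacklinkFirst !== perStageSalienceFirst); // true
console.log(singleBacklinkFirst === singleSalienceFirst);     // true
```

The single-baseline variant is order-independent here only because every factor is at least 1, so a row that starts above the fixed floor can never fall below it.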

API surface change: the three boost functions now take floorThreshold?: number (an absolute score floor) instead of floorRatio?: number. runPostFusionStages is the single ratio→threshold converter. This is what makes the single-baseline semantic enforceable through the type system. Your tests' shape stays similar; they update from passing 0.85 to passing 0.85 * topScore inline.

What stays exactly as you shipped:

Default-off posture is correct and preserved.
Empirical framing (dense-embedder targeting, 0.85 starting value, leapfrog regression mechanism) is unchanged — your ablation is what motivates the whole thing.
Gate scope (metadata-axis boost stages only, exact-match runs independently) matches your design intent.

You're credited via Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com> on the integration commit, and the CHANGELOG section in #1129 explicitly names you and links your SkyTwin ablation as the empirical motivation. The full 9-decision review trail (D1-D9) is at ~/.claude/plans/swift-sniffing-nygaard.md in the repo if you want to see exactly which questions came up and how we landed on the integrated shape.

I'll mark this PR as superseded by #1129 (which carries Closes #1091 so it'll auto-close on merge). If anything about the integration looks wrong, please flag it on #1129 — we have ~24h before merge for any final adjustments.

Thanks again. Filed two follow-up TODOs that grew out of this work: (1) gbrain-side ablation across longmemeval / whoknows / suspected-contradictions / BrainBench-Real before flipping any mode-bundle default, and (2) per-source threshold for federated-read users if real usage surfaces the cross-source suppression codex flagged.


…loses #1091) (#1129)

* v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages

Opt-in score-based gate on the three metadata-axis boost stages (backlink, salience, recency) inside `runPostFusionStages`. When `SearchOpts.floorRatio` or `search.floor_ratio` config is set, each stage skips results whose post-cosine-rescore score is below `floorRatio * topScore`. Default undefined preserves prior behavior bit-for-bit. Prevents weak-overlap candidates from accumulating metadata boosts and leapfrogging the legitimate primary hit on dense-embedder corpora.

Built on the contributor PR from @jayzalowitz (PR #1091, SkyTwin twin-memory layer). Refactored on top: threshold is computed ONCE at runPostFusionStages entry instead of per-stage (single-baseline semantic, order-independent); knobsHash bumped 2->3 so a no-floor cache write can't be served to a floor-enabled lookup; NaN scores skip the boost instead of bypassing the gate; SearchOpts/config/MODE_BUNDLES integration replaces the PR's PostFusionOpts-only surface; no env var (resolveSearchMode is pure by design).

Three correctness issues codex outside-voice review caught and this landed with fixed:

- Cache contamination via knobsHash() (same bug class as v0.32.3 CDX-4 hotfix for the other search-lite knobs)
- NaN scores would have bypassed the gate (NaN < threshold is false in JS); realistic on Voyage flexible-dim / zembed-1 Matryoshka dim drift
- Negative top scores would have broken the "single result trivially eligible" claim; gate now disables on no-positive-signal inputs

Scope: gates metadata stages only. Exact-match boost (applyExactMatchBoost) runs independently as a lexical-relevance signal by design. Cross-source floor stays global (per-source deferred to v0.36 if federated-read users hit the suppression). Default-on for any mode bundle deferred until gbrain-side ablation against longmemeval / whoknows / suspected-contradictions / BrainBench-Real (TODOS.md).

Plan + 9-decision review trail (D1-D9): ~/.claude/plans/swift-sniffing-nygaard.md.

Empirical motivation, failure-mode framing, dense-embedder targeting, and the 0.85 starting value all from @jayzalowitz's labeled-retrieval ablation. Integration shape is gbrain-side.

Test surface: 30+ new cases (computeFloorThreshold edge cases including T1a NaN / T1b negative top, three boost-function gate parity tests including T6 IRON-RULE applyRecencyBoost regression, runPostFusionStages single-baseline composition pin, KNOBS_HASH_VERSION bump from 2 to 3, floor-ratio-changes-hash cache-contamination prevention, loadOverridesFromConfig coverage for search.floor_ratio config key). bun run verify clean; full unit suite 6753 pass / 0 fail.

Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rewrite v0.35.6.0 CHANGELOG ELI10-lead-first; codify the rule in CLAUDE.md

CHANGELOG entry for v0.35.6.0 was readable only by someone who already understood gbrain's internals (RRF, knobsHash, MODE_BUNDLES, runPostFusionStages, Matryoshka, CDX-4). Rewrote it so the first ~150 words explain what shipped in everyday English, with a concrete worked example, before any file paths or function names appear. Itemized changes section keeps the technical precision for engineers who need it.

Then codified the rule in CLAUDE.md so future release entries land the same way. The "Release-summary template" section now has an iron rule: "lead ELI10, get precise after." No file paths or internal constants in the first 150 words; user-visible behavior change first; everyday-language column headers in any tables. Technical precision is required (the entry is still the technical record) but lives BELOW the plain-English lead, never before it. Smell test: if a reader who has never opened gbrain can walk away from the first 150 words knowing what shipped and whether they care, the entry passes.

bun run build:llms regenerated to pick up the CLAUDE.md change (CI guard test/build-llms.test.ts pins committed bundles against fresh generator output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* upstream/master: v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208) v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165) v0.36.5.0 feat: secure DATABASE_URL access for shell jobs (inherit: ["database_url"]) (garrytan#1192) v0.36.4.0 feat: brain-health-100 — autonomous remediation via doctor --remediate + Minions (garrytan#1193) fix(docs): comprehensive drift audit — contradictions, broken links, stale refs (garrytan#1201) v0.36.3.0 feat: dynamic embedding column selection for search (garrytan#1164) v0.36.2.0 feat: ZeroEntropy as default + zero-based README rewrite (garrytan#1136) v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes (garrytan#1182) v0.36.1.0 Hindsight calibration wave: brain learns how you tend to be wrong (garrytan#1139) v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install) (garrytan#1130) v0.35.8.0 feat(cycle): phantom-page redirect inside extract_facts (garrytan#1138) v0.35.7.0 feat: temporal trajectory + founder scorecard (Phases 2-4) (garrytan#1131) v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes garrytan#1091) (garrytan#1129) v0.35.5.1 fix(doctor): stop counting clean supervisor exits as crashes (garrytan#1108) v0.35.5.0 fix wave: bootstrap + orphans + think MCP + worktree + walker (garrytan#1111) v0.35.4.0 fix(doctor,entities): supervisor crash classification + bare-name resolver + 58x perf + stub guard observability (garrytan#1085) v0.35.3.1 feat(eval): temporal-aware contradiction probe + verdict enum (garrytan#1052) v0.35.3.0 fix wave: extract_facts items + git --no-recurse-submodules placement (garrytan#1053) # Conflicts: # src/core/postgres-engine.ts # test/schema-bootstrap-coverage.test.ts

jayzalowitz force-pushed the jayzalowitz/floor-ratio-gate branch from 837d1ce to 043b403 on May 17, 2026, 00:51

garrytan mentioned this pull request May 17, 2026

v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes #1091) #1129

Merged

5 tasks

garrytan closed this in #1129 May 17, 2026

jayzalowitz mentioned this pull request May 18, 2026

v0.6.52.0 sync(memory): align floor-ratio gate with gbrain v0.35.6.0 / PR #1129 jayzalowitz/skytwin#334

Merged

4 tasks

dcapclaw mentioned this pull request Jun 12, 2026

Ranking results inverted - most relevant content gets lowest score #895

Open

jayzalowitz wants to merge 1 commit into garrytan:master from jayzalowitz:jayzalowitz/floor-ratio-gate
