
feat(search): opt-in floor-ratio gate for post-fusion boost stages #1091

Closed
jayzalowitz wants to merge 1 commit into garrytan:master from jayzalowitz:jayzalowitz/floor-ratio-gate

jayzalowitz (Contributor) commented May 17, 2026

What

Adds an optional floorRatio?: number to applyBacklinkBoost, applySalienceBoost, applyRecencyBoost, and PostFusionOpts. When set, each boost stage skips results whose pre-boost score is below floorRatio * topScore at the moment that stage runs — only the head of the candidate pool receives the multiplicative bonus. Default undefined preserves exact prior behavior bit-for-bit.

The failure mode this addresses

gbrain's bounded boosts (log-compressed salience clipped to [1.0, ~1.6], log-scaled backlink factor, half-life-decayed recency) work as designed on curated test corpora. The existing comments in hybrid.ts:69 make the design intent explicit: "a strong boost can't catastrophically flip rankings."

That guarantee weakens on larger corpora indexed with real high-dimensional embedders (text-embedding-3-large, voyage-3-large, voyage-4-large, zembed-1). Baseline vector similarity between topically-unrelated "professional content" is non-trivial — weak-overlap pages routinely land in a query's top-K via vector overlap alone, each receives the multiplicative boost, and on a non-trivial fraction of queries a weak page with high metadata signal climbs above the legitimate primary hit. The per-boost factor still looks harmless in isolation; the compound effect across the long tail of top-K candidates is what shifts ranks.

Concrete example from the new test suite:

| page | raw RRF score | backlink count | post-boost score |
| --- | --- | --- | --- |
| strong-primary | 1.00 | 0 | 1.000 |
| weak-with-signal | 0.50 | 1000 | 0.673 |

strong-primary still wins in that pair, but its lead shrinks from 2.0× the weak page (raw) to about 1.49× (post-boost), roughly half of the relative margin the raw text+vector signal gave it. Drop strong-primary's raw score below ~1.35× the weak page (common in the long tail) and the rank flips, even though every component signal said the strong page was the unambiguous match.
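
The arithmetic behind that table can be reproduced with a short sketch. It assumes the backlink factor has the form 1 + c·ln(1 + backlinks) with c = 0.05; both the form and the constant are assumptions picked because they match the 0.673 above, not a quote of gbrain's implementation.

```typescript
// Hypothetical backlink factor: 1 + c * ln(1 + backlinks).
// ASSUMED_COEF = 0.05 is an assumption that reproduces the table above.
const ASSUMED_COEF = 0.05;

function backlinkFactor(backlinks: number): number {
  return 1 + ASSUMED_COEF * Math.log(1 + backlinks);
}

const strong = 1.0 * backlinkFactor(0); // no backlinks, so the factor is exactly 1
const weak = 0.5 * backlinkFactor(1000); // about 0.673

// The weak page overtakes once strongRaw < weakRaw * factor, i.e. once the
// raw-score ratio between the two pages is smaller than the factor itself.
const flipRatio = backlinkFactor(1000); // about 1.35

console.log(strong.toFixed(3), weak.toFixed(3), flipRatio.toFixed(2));
// prints "1.000 0.673 1.35"
```

Under these assumptions the ~1.35× flip point is simply the boost factor applied to the weak page.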

The fix

A boost only fires for results within floorRatio × topScore at the moment that stage runs. The long tail keeps its unboosted score and original rank. Stages compose naturally — salience boosts against its own top, then recency runs against the post-salience top, etc. A truly-strong primary always wins regardless of boost magnitude on weaker candidates.
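
As a minimal sketch of the per-stage gate described above (not the code in this PR): `applyGatedBoost`, `Scored`, and `factorFor` are hypothetical names, and the 1.35 factor is illustrative only.

```typescript
interface Scored {
  slug: string;
  score: number;
}

// Hypothetical stand-in for one boost stage. `factorFor` returns the
// multiplicative bonus for a result; `floorRatio` is the optional gate.
function applyGatedBoost(
  results: Scored[],
  factorFor: (r: Scored) => number,
  floorRatio?: number,
): void {
  if (results.length === 0) return; // nothing to boost, no top score to compute

  // With no floorRatio the threshold is -Infinity, so every result is eligible.
  let threshold = -Infinity;
  if (floorRatio !== undefined) {
    const topScore = Math.max(...results.map((r) => r.score));
    threshold = floorRatio * topScore; // computed before any score is mutated
  }

  for (const r of results) {
    if (r.score < threshold) continue; // below the floor: keep the raw score
    r.score *= factorFor(r); // at or above the floor (inclusive): boost
  }
}

// Usage: the weak page sits below 0.85 * 1.0, so it keeps its raw 0.5.
const pool: Scored[] = [
  { slug: "strong-primary", score: 1.0 },
  { slug: "weak-with-signal", score: 0.5 },
];
applyGatedBoost(pool, (r) => (r.slug === "weak-with-signal" ? 1.35 : 1.0), 0.85);
console.log(pool.map((r) => `${r.slug}=${r.score}`).join(", "));
```

Calling the same stage again for the next boost would recompute `topScore` from the already-boosted scores, which is the "stages compose" behavior described above.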

0.85 is a reasonable starting point for dense embedder corpora — it covers the head of a typical RRF candidate pool while excluding weak-overlap distractors. The value comes from a labeled-retrieval ablation in the SkyTwin twin-memory layer (which consumes gbrain as its default memory backend via skytwin#250): 0.85 is the largest ratio that fully eliminated the leapfrog regression on our labeled corpus while preserving baseline rankings on queries that don't have a metadata signal to bias. The relevant ablation is in skytwin#272.

Backward compatibility

floorRatio defaults to undefined → no gate, no threshold computation, exact prior behavior. Existing call sites are untouched; the new parameter is positional-last and optional on each function. PostFusionOpts.floorRatio is similarly optional and unset by default.

The gate is opt-in by design — it changes ranking behavior, and existing adopters have tuned around the current post-fusion stack. Surfacing it as a flag lets each consumer evaluate against their own corpus before flipping it on, rather than landing a behavior change in a patch release.

Tests

7 new cases in test/search.test.ts:

  • floorRatio undefined preserves existing behavior bit-for-bit
  • weak page gated out, top page boosted as before
  • borderline page at exactly the threshold is eligible (inclusive boundary)
  • regression scenario: weak page with strong metadata signal cannot leapfrog strong primary
  • applySalienceBoost honors the gate (parity with applyBacklinkBoost)
  • empty results no-op without divide-by-zero
  • single-result trivially eligible (page is its own top)

bun test test/search.test.ts: 33/33 pass (was 26/26).
bun run verify: pass (typecheck + 12 guard scripts).

Why I'm sending this

We've been running gbrain as the production memory backend in SkyTwin for the last week, and this came out of the labeled-retrieval ablation we did on top of it. The gate is general-purpose — the shape applies to any per-result boost on metadata orthogonal to the raw text+vector signal — so it seemed worth offering upstream rather than keeping it forked. The opt-in framing makes it cheap to merge if it lands cleanly and equally cheap to leave un-flipped if your shootout corpus doesn't surface the same regression.

Adds an optional `floorRatio?: number` to applyBacklinkBoost,
applySalienceBoost, applyRecencyBoost, and PostFusionOpts. When set,
each boost stage skips results whose pre-boost score is below
floorRatio * topScore at the moment that stage runs — only the head of
the candidate pool receives the multiplicative bonus. Default undefined
preserves exact prior behavior bit-for-bit.

The failure mode
────────────────
Bounded boosts (the [1.0, ~1.6] log-compressed clip on salience, the
log-scaled backlink factor, the half-life-decayed recency factor) work
as designed on curated test corpora. On larger corpora indexed with
real high-dimensional embedders (text-embedding-3-large, voyage-3-large,
voyage-4-large, zembed-1), baseline vector similarity between
topically-unrelated "professional content" is non-trivial. Weak-overlap
pages land in a query's top-K via vector overlap alone, receive the
multiplicative boost, and on a non-trivial fraction of queries a weak
page with high metadata signal climbs above the legitimate primary hit.
Per-boost factors look harmless in isolation; the compound effect
across the long tail is what shifts ranks.

The fix
───────
A boost only fires for results within floorRatio * topScore at the
moment that stage runs. The long tail keeps its unboosted score and
original rank. Stages compose naturally — salience runs against its
own top, recency runs against the post-salience top, etc.

0.85 as a starting point comes from a labeled-retrieval ablation in
the SkyTwin twin-memory layer: the largest ratio that fully eliminated
the leapfrog regression on our labeled corpus while preserving
baseline rankings on queries without a metadata signal.
Reference: jayzalowitz/skytwin#272

Backward compatibility
──────────────────────
floorRatio defaults to undefined → no gate, no threshold computation,
exact prior behavior. Existing call sites are untouched; the new param
is positional-last and optional on each function. PostFusionOpts.floorRatio
is similarly optional and unset by default. Opt-in by design — it
changes ranking behavior, so each consumer evaluates against their own
corpus before flipping it on.

Tests
─────
7 new cases in test/search.test.ts:
- default (floorRatio undefined) preserves existing behavior
- weak page gated out, top page boosted as before
- borderline page at exactly the threshold is eligible
- regression scenario: weak page with strong metadata signal cannot
  leapfrog a strong primary
- applySalienceBoost honors the gate (parity with applyBacklinkBoost)
- empty results no-op without divide-by-zero
- single-result trivially eligible

bun test test/search.test.ts: 33/33 pass (was 26/26).
bun run verify: pass (typecheck + 12 guard scripts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

garrytan (Owner) commented

Hi @jayzalowitz — thank you for this. The labeled-retrieval ablation behind it is the kind of contribution that makes gbrain better than a single-corpus tool, and the failure-mode framing is sharp.

After running this through /plan-eng-review and /codex (outside-voice adversarial review), we integrated it on top of master in #1129 with some refactoring. Summary of what changed and why:

Three correctness fixes codex caught:

  1. Cache contamination via knobsHash(). Without bumping KNOBS_HASH_VERSION 2→3 in src/core/search/mode.ts, a cached no-floor result gets served to a floor-enabled call. This is the same bug class that the v0.32.3 CDX-4 hotfix closed for the other search-lite knobs, and it is a direct ranking-correctness leak that the PR would have shipped with.
  2. NaN scores bypass the gate. NaN < threshold is false in JS, so a NaN-scored row would have skipped the floor and received boosts. Realistic on Voyage flexible-dim or zembed-1 Matryoshka dim drift across reindexes. Fixed: NaN scores skip the boost entirely.
  3. Negative top scores break "single result trivially eligible". Your single-result test passed because score was positive; with a negative top score (-0.5), threshold becomes -0.425 and the top itself fails its own floor. Edge case but real on RRF outputs that can go negative.
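
The three fixes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in #1129: the function bodies, `isBoostEligible`, `knobsKey`, and the key format are hypothetical; only the behaviors (NaN skips the boost, no positive top score disables the gate, the floor ratio participates in a version-bumped cache key) come from the discussion.

```typescript
// Hypothetical helper: returns an absolute floor, or undefined to disable the gate.
function computeFloorThreshold(scores: number[], floorRatio?: number): number | undefined {
  if (floorRatio === undefined) return undefined; // gate is opt-in

  // Ignore NaN/Infinity when looking for the top score.
  let top = -Infinity;
  for (const s of scores) {
    if (Number.isFinite(s) && s > top) top = s;
  }

  // With a top score of -0.5, 0.85 * -0.5 = -0.425 would exclude the top itself,
  // so the gate is disabled whenever there is no positive signal.
  if (!(top > 0)) return undefined;
  return floorRatio * top;
}

// Hypothetical eligibility check used by each boost stage.
function isBoostEligible(score: number, floorThreshold?: number): boolean {
  // `NaN < threshold` is false in JS, so NaN must be rejected explicitly.
  if (Number.isNaN(score)) return false;
  if (floorThreshold === undefined) return true;
  return score >= floorThreshold; // inclusive boundary
}

// Hypothetical cache key: the version number and the floor ratio both take part,
// so a no-floor cache entry can never answer a floor-enabled lookup.
const ASSUMED_KNOBS_VERSION = 3;
function knobsKey(knobs: { floorRatio?: number }): string {
  return `v${ASSUMED_KNOBS_VERSION}|floor=${knobs.floorRatio ?? "none"}`;
}

console.log(computeFloorThreshold([1.0, 0.5], 0.85)); // 0.85
console.log(computeFloorThreshold([-0.5], 0.85)); // undefined
console.log(isBoostEligible(NaN, 0.85)); // false
console.log(knobsKey({}) === knobsKey({ floorRatio: 0.85 })); // false
```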

Two architectural shape changes:

  • Single up-front threshold vs per-stage recompute. Your PR computes the threshold separately at each stage (each of backlink → salience → recency uses topScore × ratio from the array state at that moment). Codex argued this makes stage order part of the API — a backlink boost to the leader raises the salience/recency floor and gates out a page that was originally within 0.85 × top. We refactored to compute the threshold ONCE at runPostFusionStages entry from the post-cosine-rescore snapshot, then pass the same number to all three stages. Order-independent.
  • Public surface beyond PostFusionOpts. Wired SearchOpts.floorRatio (per-call), search.floor_ratio config key (operator default — included in SEARCH_MODE_CONFIG_KEYS), and a floor_ratio field on MODE_BUNDLES (currently undefined for all three modes pending gbrain-side ablation). No GBRAIN_SEARCH_FLOOR_RATIO env var — resolveSearchMode() is pure by design, and a hidden env knob would make gbrain search modes lie about the resolved state.

API surface change: the three boost functions now take floorThreshold?: number (an absolute score floor) instead of floorRatio?: number. runPostFusionStages is the single ratio→threshold converter. This is what makes the single-baseline semantic enforceable through the type system. Your tests' shape stays similar; they update from passing 0.85 to passing 0.85 * topScore inline.
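
To make the order-dependence argument concrete, here is a small sketch comparing the two shapes. The names (`boostStage`, `runPerStage`, `runSingleBaseline`, `Stage`) and the factors 1.2 and 1.1 are hypothetical, and the NaN and non-positive handling is omitted for brevity.

```typescript
interface Row {
  slug: string;
  score: number;
}
type Stage = (r: Row) => number; // hypothetical: multiplicative factor per row

// A stage takes an absolute floor, mirroring the floorThreshold shape described above.
function boostStage(rows: Row[], factor: Stage, floorThreshold?: number): void {
  for (const r of rows) {
    if (floorThreshold !== undefined && r.score < floorThreshold) continue;
    r.score *= factor(r);
  }
}

function topOf(rows: Row[]): number {
  return Math.max(...rows.map((r) => r.score));
}

// Per-stage recompute: the floor moves whenever an earlier stage lifts the leader.
function runPerStage(rows: Row[], stages: Stage[], floorRatio: number): void {
  for (const stage of stages) boostStage(rows, stage, floorRatio * topOf(rows));
}

// Single baseline: ratio -> threshold conversion happens once, at entry.
function runSingleBaseline(rows: Row[], stages: Stage[], floorRatio: number): void {
  const floorThreshold = floorRatio * topOf(rows);
  for (const stage of stages) boostStage(rows, stage, floorThreshold);
}

const backlink: Stage = (r) => (r.slug === "leader" ? 1.2 : 1.0);
const salience: Stage = (r) => (r.slug === "runner-up" ? 1.1 : 1.0);
const fresh = (): Row[] => [
  { slug: "leader", score: 1.0 },
  { slug: "runner-up", score: 0.9 },
];

const perStage = fresh();
runPerStage(perStage, [backlink, salience], 0.85);
const single = fresh();
runSingleBaseline(single, [backlink, salience], 0.85);

// Per-stage: the leader rises to 1.2, the salience floor becomes 1.02, and the
// runner-up (0.9, originally within 0.85 * top) is gated out.
// Single baseline: the floor stays at 0.85, so the runner-up receives its boost.
console.log(perStage[1].score, single[1].score);
```

In this toy case the runner-up ends at 0.9 under per-stage recompute and at roughly 0.99 under the single baseline, which is the gap the refactor removes.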

What stays exactly as you shipped:

  • Default-off posture is correct and preserved.
  • Empirical framing (dense-embedder targeting, 0.85 starting value, leapfrog regression mechanism) is unchanged — your ablation is what motivates the whole thing.
  • Gate scope (metadata-axis boost stages only, exact-match runs independently) matches your design intent.

You're credited via Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com> on the integration commit, and the CHANGELOG section in #1129 explicitly names you and links your SkyTwin ablation as the empirical motivation. The full 9-decision review trail (D1-D9) is at ~/.claude/plans/swift-sniffing-nygaard.md in the repo if you want to see exactly which questions came up and how we landed on the integrated shape.

I'll mark this PR as superseded by #1129 (which carries Closes #1091 so it'll auto-close on merge). If anything about the integration looks wrong, please flag it on #1129 — we have ~24h before merge for any final adjustments.

Thanks again. Filed two follow-up TODOs that grew out of this work: (1) gbrain-side ablation across longmemeval / whoknows / suspected-contradictions / BrainBench-Real before flipping any mode-bundle default, and (2) per-source threshold for federated-read users if real usage surfaces the cross-source suppression codex flagged.

garrytan added a commit that referenced this pull request May 17, 2026
…loses #1091) (#1129)

* v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages

Opt-in score-based gate on the three metadata-axis boost stages (backlink,
salience, recency) inside `runPostFusionStages`. When `SearchOpts.floorRatio`
or `search.floor_ratio` config is set, each stage skips results whose
post-cosine-rescore score is below `floorRatio * topScore`. Default
undefined preserves prior behavior bit-for-bit. Prevents weak-overlap
candidates from accumulating metadata boosts and leapfrogging the
legitimate primary hit on dense-embedder corpora.

Built on the contributor PR from @jayzalowitz (PR #1091, SkyTwin
twin-memory layer). Refactored on top: threshold is computed ONCE at
runPostFusionStages entry instead of per-stage (single-baseline semantic,
order-independent); knobsHash bumped 2->3 so a no-floor cache write can't
be served to a floor-enabled lookup; NaN scores skip the boost instead of
bypassing the gate; SearchOpts/config/MODE_BUNDLES integration replaces
the PR's PostFusionOpts-only surface; no env var (resolveSearchMode is
pure by design).

Three correctness issues that the codex outside-voice review caught, all
fixed in this commit:
- Cache contamination via knobsHash() (same bug class as v0.32.3 CDX-4
  hotfix for the other search-lite knobs)
- NaN scores would have bypassed the gate (NaN < threshold is false in
  JS); realistic on Voyage flexible-dim / zembed-1 Matryoshka dim drift
- Negative top scores would have broken the "single result trivially
  eligible" claim; gate now disables on no-positive-signal inputs

Scope: gates metadata stages only. Exact-match boost
(applyExactMatchBoost) runs independently as a lexical-relevance signal
by design. Cross-source floor stays global (per-source deferred to
v0.36 if federated-read users hit the suppression). Default-on for any
mode bundle deferred until gbrain-side ablation against longmemeval /
whoknows / suspected-contradictions / BrainBench-Real (TODOS.md).

Plan + 9-decision review trail (D1-D9): ~/.claude/plans/swift-sniffing-nygaard.md.
Empirical motivation, failure-mode framing, dense-embedder targeting, and
the 0.85 starting value all from @jayzalowitz's labeled-retrieval
ablation. Integration shape is gbrain-side.

Test surface: 30+ new cases (computeFloorThreshold edge cases including
T1a NaN / T1b negative top, three boost-function gate parity tests
including T6 IRON-RULE applyRecencyBoost regression, runPostFusionStages
single-baseline composition pin, KNOBS_HASH_VERSION bump from 2 to 3,
floor-ratio-changes-hash cache-contamination prevention,
loadOverridesFromConfig coverage for search.floor_ratio config key).
bun run verify clean; full unit suite 6753 pass / 0 fail.

Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rewrite v0.35.6.0 CHANGELOG ELI10-lead-first; codify the rule in CLAUDE.md

CHANGELOG entry for v0.35.6.0 was readable only by someone who already
understood gbrain's internals (RRF, knobsHash, MODE_BUNDLES, runPostFusionStages,
Matryoshka, CDX-4). Rewrote it so the first ~150 words explain what
shipped in everyday English, with a concrete worked example, before any
file paths or function names appear. Itemized changes section keeps the
technical precision for engineers who need it.

Then codified the rule in CLAUDE.md so future release entries land the same
way. The "Release-summary template" section now has an iron rule:
"lead ELI10, get precise after." No file paths or internal constants in
the first 150 words; user-visible behavior change first; everyday-language
column headers in any tables. Technical precision is required (the entry
is still the technical record) but lives BELOW the plain-English lead,
never before it.

Smell test: if a reader who has never opened gbrain can walk away from
the first 150 words knowing what shipped and whether they care, the entry
passes.

bun run build:llms regenerated to pick up the CLAUDE.md change (CI guard
test/build-llms.test.ts pins committed bundles against fresh generator
output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request May 29, 2026
* upstream/master:
  v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)
  v0.36.5.0 feat: secure DATABASE_URL access for shell jobs (inherit: ["database_url"]) (garrytan#1192)
  v0.36.4.0 feat: brain-health-100 — autonomous remediation via doctor --remediate + Minions (garrytan#1193)
  fix(docs): comprehensive drift audit — contradictions, broken links, stale refs (garrytan#1201)
  v0.36.3.0 feat: dynamic embedding column selection for search (garrytan#1164)
  v0.36.2.0 feat: ZeroEntropy as default + zero-based README rewrite (garrytan#1136)
  v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes (garrytan#1182)
  v0.36.1.0 Hindsight calibration wave: brain learns how you tend to be wrong (garrytan#1139)
  v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install) (garrytan#1130)
  v0.35.8.0 feat(cycle): phantom-page redirect inside extract_facts (garrytan#1138)
  v0.35.7.0 feat: temporal trajectory + founder scorecard (Phases 2-4) (garrytan#1131)
  v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes garrytan#1091) (garrytan#1129)
  v0.35.5.1 fix(doctor): stop counting clean supervisor exits as crashes (garrytan#1108)
  v0.35.5.0 fix wave: bootstrap + orphans + think MCP + worktree + walker (garrytan#1111)
  v0.35.4.0 fix(doctor,entities): supervisor crash classification + bare-name resolver + 58x perf + stub guard observability (garrytan#1085)
  v0.35.3.1 feat(eval): temporal-aware contradiction probe + verdict enum (garrytan#1052)
  v0.35.3.0 fix wave: extract_facts items + git --no-recurse-submodules placement (garrytan#1053)

# Conflicts:
#	src/core/postgres-engine.ts
#	test/schema-bootstrap-coverage.test.ts