
feat(#251 follow-up): real-embedding ablation result + opt-in test #272

Merged
jayzalowitz merged 2 commits into main from jayzalowitz/251-real-embedding-eval on May 13, 2026


@jayzalowitz

Owner

Summary

Negative result with a clear next step. The working hypothesis going into this run was: hash-trick spurious overlap was inflating the received_content MRR regression first surfaced in PR #260; real semantic embeddings should improve the number materially. The hypothesis was wrong.

Numbers

Ran the tier-ablation eval against Ollama + nomic-embed-text (a real semantic model, ~137M params, 768-dim) and compared to the hash-trick baseline:

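| Metric                     | pure-RRF | tier-weighted (hash) | tier-weighted (real) |
|----------------------------|----------|----------------------|----------------------|
| user_behavior MRR (n=3)    | 0.667    | 1.000                | 1.000                |
| received_content MRR (n=3) | 1.000    | 0.542                | ~0.54                |
| neutral MRR (n=1)          | 1.000    | 1.000                | 1.000                |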

The real-embedding received_content number is essentially identical to the hash-trick floor. Hash-trick was not the cause.

What's actually broken

The regression is structural to the multiplicative weighting approach:

  • user_sent_originated × 1.5 vs inbox_automated × 0.8 = 1.875× swing.
  • Any authored_* page scoring above ~53% of the top raw score (0.8 / 1.5 ≈ 0.533) will leapfrog a strong-but-demoted primary hit on a received_* query.
  • The floor-ratio gate (0.85 default) helps but isn't enough. With real semantic embeddings, an authored page from a completely unrelated query (e.g. q1's Series B pitch deck) has non-trivial similarity to q5 ("GitHub Actions CI failed on main") — enough to land in the candidate pool above threshold, where the 1.5× boost then beats the legitimate primary.
  • Diagnostic dump from the eval, q5 top-10 with tier-on:
    q7-authored-1, q3-authored-long, q1-authored-1, distractor-004, q2-received-1, distractor-014, distractor-034, q5-received-1, q6-received-1, distractor-024
    The actual primary lands at rank 8.
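The swing and the 53% figure follow from one inequality: with multiplicative weights, an authored page with raw score a outranks a demoted primary with raw score p when 1.5·a > 0.8·p, i.e. when a/p > 0.8/1.5 ≈ 0.533. A minimal sketch of that arithmetic; only the 1.5 and 0.8 weights come from this PR, and the helper names are hypothetical:

```typescript
// Hypothetical helpers illustrating the multiplicative-weighting arithmetic.
// Weights are the ones quoted in this PR: 1.5 for authored, 0.8 for automated.
const AUTHORED_WEIGHT = 1.5;
const AUTOMATED_WEIGHT = 0.8;

/** Ratio between the boost and the demotion: how far the tiers pull apart. */
function tierSwing(boost: number, demote: number): number {
  return boost / demote;
}

/**
 * Smallest fraction of the primary's raw score an authored page needs in
 * order to outrank the primary after weighting (boost * a > demote * p).
 */
function leapfrogBreakEven(boost: number, demote: number): number {
  return demote / boost;
}

/** True when the weighted authored score strictly beats the weighted primary. */
function leapfrogs(authoredRaw: number, primaryRaw: number): boolean {
  return authoredRaw * AUTHORED_WEIGHT > primaryRaw * AUTOMATED_WEIGHT;
}

const swing = tierSwing(AUTHORED_WEIGHT, AUTOMATED_WEIGHT); // ≈ 1.875
const breakEven = leapfrogBreakEven(AUTHORED_WEIGHT, AUTOMATED_WEIGHT); // ≈ 0.533

// An authored page at 60% of the primary's raw score wins after weighting;
// one at 50% does not.
console.log(swing, breakEven.toFixed(3), leapfrogs(0.6, 1.0), leapfrogs(0.5, 1.0));
```

The floor-ratio gate narrows which pages can get the boost at all, but inside the gate the break-even stays at ~53%, which is why the gate alone couldn't close the gap.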

Decision

Layer 2 stays opt-in. The default-on rollout is blocked on a structural fix, not on environment / corpus / eval setup. Best-judgment path:

  • Switch from multiplicative weighting (score *= tier_weight) to additive bonuses (score += tier_bonus) sized to flip close calls without leapfrogging strong matches. For RRF scores in the 0.016–0.033 range, bonuses around ±0.005 should give authored a tie-breaker edge without overwhelming a clear relevance gap.
  • Re-run the ablation against the additive design. Target: received_content MRR ≥ 0.95 while preserving user_behavior MRR = 1.0.
  • Separate sub-issue — out of scope for this PR.
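To see why an additive bonus behaves differently, the two rules can be compared on RRF-scale numbers. This sketch assumes the conventional RRF constant k = 60, so a rank-1 hit in one list scores 1/61 ≈ 0.0164 and a rank-1 hit in two lists scores 2/61 ≈ 0.0328, matching the range quoted above. The ±0.005 bonuses come from the proposal; the scenarios and helper names are hypothetical:

```typescript
// Illustrative comparison of multiplicative vs additive tier weighting on
// RRF-scale scores. k = 60 is an assumption (the conventional RRF constant).
const RRF_K = 60;

/** Reciprocal-rank-fusion score for a page given its 1-based rank in each list. */
function rrf(ranks: number[], k: number = RRF_K): number {
  return ranks.reduce((sum, rank) => sum + 1 / (k + rank), 0);
}

type Tier = "authored" | "received";

function applyMultiplicative(score: number, tier: Tier): number {
  return score * (tier === "authored" ? 1.5 : 0.8);
}

function applyAdditive(score: number, tier: Tier): number {
  return score + (tier === "authored" ? 0.005 : -0.005);
}

type Rule = (score: number, tier: Tier) => number;

/** Which page wins once the rule is applied: the authored page or the received primary. */
function winner(rule: Rule, authoredRaw: number, primaryRaw: number): "authored" | "primary" {
  return rule(authoredRaw, "authored") > rule(primaryRaw, "received") ? "authored" : "primary";
}

// Strong primary (rank 1 in both lists) vs weak-overlap authored page (ranks 20 and 60).
const strongPrimary = rrf([1, 1]); // 2/61 ≈ 0.0328
const weakAuthored = rrf([20, 60]); // 1/80 + 1/120 ≈ 0.0208
const multWinner = winner(applyMultiplicative, weakAuthored, strongPrimary);
const addWinner = winner(applyAdditive, weakAuthored, strongPrimary);

// Close call: rank 1 vs rank 2 in a single list (a gap of ~0.0003).
const closeCallWinner = winner(applyAdditive, rrf([2]), rrf([1]));

console.log({ multWinner, addWinner, closeCallWinner });
```

Under these assumptions the multiplicative rule lets the weak page leapfrog the strong primary, while the additive rule still flips the close call but leaves the strong primary on top: an additive bonus can only overturn gaps smaller than the bonus swing.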

What this PR ships

  • New opt-in describe block in tier-ablation-eval.test.ts gated on RUN_REAL_EMBEDDING_EVAL=1. Reproducible against any local Ollama instance or with an OpenAI API key; the defaults respect OPENAI_EMBEDDING_BASE_URL / OPENAI_EMBEDDING_MODEL / OPENAI_EMBEDDING_API_KEY. A local Ollama with nomic-embed-text pre-pulled is the cheapest way to run it.
  • runOneMode helper now takes an optional embedding provider so the same harness drives both the always-on hash-trick test and the opt-in real-embedding test. No duplication.
  • Diagnostic dump in printReport now triggers for queries that degrade by more than 2 ranks (not just "missing"), so future tuning has better signal for partially-regressed cases.
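The opt-in gate and the environment defaults described above can be sketched as a small resolver. This is a hypothetical illustration, not the test file's code: the `RUN_REAL_EMBEDDING_EVAL` and `OPENAI_EMBEDDING_*` names come from this PR, but the fallback values (Ollama's OpenAI-compatible endpoint at `http://localhost:11434/v1`, the `nomic-embed-text` model, and a placeholder key) are assumptions.

```typescript
// Hypothetical resolver for the opt-in real-embedding eval. Variable names
// are from the PR; the default values are assumptions (local Ollama).
interface RealEmbeddingConfig {
  baseUrl: string;
  model: string;
  apiKey: string;
}

type Env = Record<string, string | undefined>;

/** The real-embedding suite runs only when explicitly opted in. */
function realEmbeddingEvalEnabled(env: Env): boolean {
  return env.RUN_REAL_EMBEDDING_EVAL === "1";
}

/** Explicit env values win; otherwise fall back to an assumed local Ollama setup. */
function resolveRealEmbeddingConfig(env: Env): RealEmbeddingConfig {
  return {
    baseUrl: env.OPENAI_EMBEDDING_BASE_URL ?? "http://localhost:11434/v1",
    model: env.OPENAI_EMBEDDING_MODEL ?? "nomic-embed-text",
    // Assumed placeholder: Ollama ignores the key, but OpenAI-compatible
    // clients typically require a non-empty value.
    apiKey: env.OPENAI_EMBEDDING_API_KEY ?? "ollama",
  };
}

// Usage: override just the model, keep the other defaults.
const cfg = resolveRealEmbeddingConfig({ OPENAI_EMBEDDING_MODEL: "text-embedding-3-small" });
console.log(cfg);
```

In a Vitest or Jest file, the gate would typically be wired as `const describeReal = realEmbeddingEvalEnabled(process.env) ? describe : describe.skip;`, which yields the "1 pass, 1 skipped" result when the variable is unset.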

The opt-in test asserts the realistic guardrail bars (received_content ≥ 0.4, user_behavior must lift, neutral must not regress) — same shape as the always-on test, so a future regression in the implementation surfaces in either mode.
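MRR here is mean reciprocal rank: the average, over a category's queries, of 1/rank of the primary hit, counting a missing hit as 0. A minimal sketch of the metric and of the three guardrail bars; `mrr`, `checkGuardrails` and the `CategoryMrr` shape are hypothetical, and reading "must lift" as a strict increase and "must not regress" as ≥ is an assumption about how the bars are coded:

```typescript
/** Reciprocal rank of the primary hit; a missing hit (Infinity) contributes 0. */
function reciprocalRank(rank: number): number {
  return Number.isFinite(rank) && rank >= 1 ? 1 / rank : 0;
}

/** Mean reciprocal rank over the 1-based primary ranks of one query category. */
function mrr(ranks: number[]): number {
  if (ranks.length === 0) return 0;
  return ranks.reduce((sum, r) => sum + reciprocalRank(r), 0) / ranks.length;
}

interface CategoryMrr {
  userBehavior: number;
  receivedContent: number;
  neutral: number;
}

/** Returns the violated guardrails; an empty list means every bar holds. */
function checkGuardrails(pureRrf: CategoryMrr, tierOn: CategoryMrr, receivedFloor = 0.4): string[] {
  const failures: string[] = [];
  if (tierOn.receivedContent < receivedFloor) failures.push("received_content below floor");
  if (!(tierOn.userBehavior > pureRrf.userBehavior)) failures.push("user_behavior did not lift");
  if (tierOn.neutral < pureRrf.neutral) failures.push("neutral regressed");
  return failures;
}

// The hash-trick numbers from the table above clear all three bars.
const failures = checkGuardrails(
  { userBehavior: 0.667, receivedContent: 1.0, neutral: 1.0 },
  { userBehavior: 1.0, receivedContent: 0.542, neutral: 1.0 },
);
console.log(failures);
```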

Why ship a negative result

Because the ENGINEERING DECISION is now made on data instead of vibes. PR #260 explicitly said "we'll know whether Layer 2 default-on is safe once we run this against real embeddings." We ran it. The answer is "not yet." That's worth committing — both as a permanent artifact and as a clear signal to the next person picking up the additive-redesign sub-issue.

Test plan

  • pnpm --filter @skytwin/memory-gbrain test -- tier-ablation-eval → 1 pass, 1 skipped (gated test).
  • RUN_REAL_EMBEDDING_EVAL=1 pnpm --filter @skytwin/memory-gbrain test -- tier-ablation-eval → 2 pass (both modes), prints side-by-side report.
  • Build + workspace test pre-flight: clean before this PR (most recent main: PR #271, "feat(#251 follow-up): authoring-tier backfill worker", merged 23:47Z).

🤖 Generated with Claude Code

Validates the working hypothesis that hash-trick embeddings were
inflating the `received_content` MRR regression observed in PR #260.

Result: the hypothesis was wrong. With Ollama + nomic-embed-text (a
real semantic model), received_content MRR = ~0.54, essentially
identical to the hash-trick floor of 0.542. The regression is
structural to the multiplicative weighting approach:

- 1.5× authored vs 0.8× automated = 1.875× swing. Authored content
  scoring above ~53% of the top raw score (0.8 / 1.5 ≈ 0.533)
  leapfrogs strong-but-demoted primary hits.
- The 0.85 floor-ratio gate isn't enough: with real semantic
  embeddings, authored content from unrelated queries (e.g. q1 Series
  B pitch) has non-trivial similarity to q5 ("GitHub Actions CI
  failed") — lands in the candidate pool above threshold, gets the
  1.5× boost, beats the legitimate primary.

Diagnostic dump from the eval confirms: q5's primary lands at rank 8/9
with tier-on, behind three authored pages from unrelated queries plus
several distractors.

Decision: Layer 2 stays opt-in. Default-on is blocked on a structural
fix (switch to additive bonuses, target received_content MRR ≥ 0.95).
That's a separate sub-issue.

What ships:
- New opt-in test branch in tier-ablation-eval.test.ts, gated on
  RUN_REAL_EMBEDDING_EVAL=1. Reproducible against any local Ollama
  instance or with an OpenAI API key; defaults respect
  OPENAI_EMBEDDING_BASE_URL / OPENAI_EMBEDDING_MODEL /
  OPENAI_EMBEDDING_API_KEY.
- runOneMode helper now takes an optional embedding provider so the
  same harness drives both modes.
- Diagnostic dump in printReport now triggers for queries that degrade
  by more than 2 ranks (not just "missing"), so future tuning has
  better signal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 00:04

Copilot AI left a comment


Pull request overview

Adds an opt-in “real embeddings” variant of the existing Layer 2 tier-ablation evaluation to validate behavior under an OpenAI-compatible embedding endpoint (e.g., local Ollama), and records the negative-result findings in the changelog for future decision-making.

Changes:

  • Extend the ablation harness (runOneMode) to accept an optional EmbeddingProvider so the same test logic can run with hash-trick or real embeddings.
  • Expand diagnostic reporting to also dump tier-on top-10 results when the primary degrades by multiple ranks (not only when it disappears).
  • Add a gated (RUN_REAL_EMBEDDING_EVAL=1) real-embedding test mode and document the measured outcome in CHANGELOG.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • packages/memory-gbrain/src/tests/tier-ablation-eval.test.ts: adds the opt-in real-embedding eval path, parameterizes the embedding provider, and broadens diagnostic dumps for rank regressions.
  • CHANGELOG.md: documents the real-embedding ablation result and the resulting decision to keep Layer 2 opt-in pending a structural fix.


Comment on lines +284 to +289
```typescript
// Real-embedding ablation — opt-in. Validates the working hypothesis that
// hash-trick embeddings exaggerate the `received_content` MRR regression
// because of spurious token overlap, and that real semantic embeddings
// preserve the user_behavior lift WITHOUT the regression. Gated on
// `RUN_REAL_EMBEDDING_EVAL=1` and an OpenAI-compatible endpoint reachable
// at `OPENAI_EMBEDDING_BASE_URL` (defaults to local Ollama).
```

Owner Author


Addressed in f17b2b7. Rewrote the section header to describe what the eval actually is now (a permanent reproducible artifact for whoever picks up the additive-rewrite sub-issue), rather than the original hypothesis it was added to test.

Comment on lines +193 to +199
```typescript
// signal for tuning the multipliers. Also dump when the primary
// degrades by more than 3 ranks — that's the case where Layer 2
// is reordering legitimately-strong primary hits behind other content.
const degradedSignificantly =
  Number.isFinite(o.rankPrimary) &&
  Number.isFinite(n.rankPrimary) &&
  n.rankPrimary - o.rankPrimary > 2;
```

Owner Author


Addressed in f17b2b7. Aligned the condition with the comment: n.rankPrimary - o.rankPrimary >= 3 and comment now reads "degrades by 3 or more ranks". Same semantics, no ambiguity.
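A standalone sketch of the aligned predicate. The `rankPrimary` field and the `>= 3` condition come from this thread; the `ModeResult` type and the function wrapper are hypothetical, and a missing primary is assumed to be encoded as `Infinity`:

```typescript
// Hypothetical per-query result shape; only rankPrimary matters here.
// A missing primary is assumed to be encoded as Infinity (not finite).
interface ModeResult {
  rankPrimary: number;
}

/**
 * True when the primary is present in both modes and tier-on (n) ranks it
 * 3 or more places lower than pure-RRF (o). Missing primaries return false
 * here because the existing "missing" trigger already covers them.
 */
function degradedSignificantly(o: ModeResult, n: ModeResult): boolean {
  return (
    Number.isFinite(o.rankPrimary) &&
    Number.isFinite(n.rankPrimary) &&
    n.rankPrimary - o.rankPrimary >= 3
  );
}

// A primary that drops from rank 1 to rank 8 would trigger the dump.
console.log(degradedSignificantly({ rankPrimary: 1 }, { rankPrimary: 8 }));
```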

Comment thread CHANGELOG.md Outdated
- Switch from multiplicative weighting (`score *= tier_weight`) to additive bonuses (`score += tier_bonus`) sized to flip close calls without leapfrogging strong matches. Estimated +0.005 for authored-originated, -0.005 for automated, on raw RRF scores in the 0.016–0.033 range — enough to break ties without overwhelming relevance.
- Re-run the ablation with the additive approach. Target: received_content MRR ≥ 0.95 while preserving user_behavior MRR = 1.0.

That's a separate sub-issue — out of scope for tonight.

Owner Author


Addressed in f17b2b7. Changed to "out of scope for this PR".

Three small but valid finds:

1. Real-embedding section header said the eval "validates the working
   hypothesis" but the actual result was the opposite. Rewrote the
   header to describe what the eval actually is now — a permanent
   reproducible artifact for the next person tuning Layer 2 weighting.

2. Off-by-one between comment and condition for the diagnostic dump.
   Comment said "more than 3 ranks"; code was `> 2` (i.e. 3 or more).
   Aligned to the more useful semantics: `>= 3`, "3 or more rank
   degradation".

3. CHANGELOG had "out of scope for tonight" — time-relative phrasing
   that won't read well later. Changed to "out of scope for this PR".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz merged commit 1505666 into main May 13, 2026
8 checks passed
jayzalowitz added a commit that referenced this pull request May 13, 2026
* feat(#251 Phase 1): Layer 2 additive rewrite + default-on flip

Phase 1.1 + 1.2 of the multi-phase plan.

# 1.1 — additive rewrite

Replaces multiplicative tier weighting (`score *= weight`) with additive
bonuses (`score += bonus`). The real-embedding ablation in PR #272
showed multiplicative was structurally bounded — a 1.5×/0.8× swing
(1.875× ratio) let weak-overlap authored content leapfrog strong
primary hits regardless of relevance. Additive bonuses (~±0.005 in
the normal band) can flip close calls (gaps smaller than the bonus
swing) but can't leapfrog matches that lead by more than that swing.

Promote-only configuration: only authored tiers get a positive bonus;
all received tiers are 0. Trying any negative bonus pushed legitimate
primary hits on `received_content` queries below distractors. The
product intent is "prefer authored on close calls," not "suppress
received" — promote-only gives the former without the latter.

Floor-ratio gate retained (default 0.85). Real embedders give
non-trivial cross-query vector similarity; without the gate,
authored content from unrelated queries leaks into the candidate
pool and gets boosted past legitimate primaries.

Files:
- tier-weights.ts: `tierBonus` / `buildTierBonusFn` (additive).
  `tierMultiplier` / `buildTierWeightFn` re-exported as deprecated
  aliases for back-compat.
- rrf.ts: applies bonus additively, NEGATIVE_INFINITY sentinel for
  hidden, 0.85 floor-ratio gate.
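A minimal sketch of the 1.1 scoring rule as described: a promote-only additive bonus, a `NEGATIVE_INFINITY` sentinel for hidden tiers, and the 0.85 floor-ratio gate measured against the top pre-bonus score. The tier-to-bonus table, the `Hit` shape and `foldWithTierBonus` are hypothetical; the real `tierBonus` / `buildTierBonusFn` in tier-weights.ts may be structured differently.

```typescript
// Hypothetical promote-only bonus table. Tier names echo this PR;
// the values are illustrative (0.005 is the estimate quoted earlier).
const TIER_BONUS: Record<string, number> = {
  user_sent_originated: 0.005, // authored: small positive bonus
  inbox_automated: 0, // received: promote-only, so no penalty
  hidden: Number.NEGATIVE_INFINITY, // sentinel: remove from results
};

interface Hit {
  id: string;
  tier: string;
  rrfScore: number;
}

const FLOOR_RATIO = 0.85;

function foldWithTierBonus(hits: Hit[], floorRatio: number = FLOOR_RATIO): Hit[] {
  const top = Math.max(0, ...hits.map((h) => h.rrfScore));
  const threshold = floorRatio * top;
  return hits
    .map((h) => {
      const bonus = TIER_BONUS[h.tier] ?? 0;
      // Hidden tiers always get the sentinel, regardless of the gate.
      if (bonus === Number.NEGATIVE_INFINITY) return { ...h, rrfScore: bonus };
      // Only the head of the pool (>= floorRatio * top) receives the bonus;
      // the long tail keeps its raw score.
      return h.rrfScore >= threshold ? { ...h, rrfScore: h.rrfScore + bonus } : h;
    })
    .filter((h) => h.rrfScore !== Number.NEGATIVE_INFINITY)
    .sort((a, b) => b.rrfScore - a.rrfScore);
}

// A close call (rank 1 vs rank 2 in one list, k = 60) flips toward authored.
const closeCall = foldWithTierBonus([
  { id: "received", tier: "inbox_automated", rrfScore: 1 / 61 },
  { id: "authored", tier: "user_sent_originated", rrfScore: 1 / 62 },
]);
console.log(closeCall.map((h) => h.id));
```

A weak authored page far below the top (for example 1/81 against a 2/61 primary) falls under the gate, keeps its raw score, and stays behind the primary.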

# 1.2 — flip default-on

Phase 1.1 cleared the eval bar:

  user_behavior MRR       0.667 → 1.000  (preserved)
  received_content MRR    1.000 → 0.833  (real embeddings)
                                  → 0.583  (hash-trick floor)
  aggregate MRR primary   0.857 → 0.929  (above pure-RRF baseline)

Files:
- Migration 044: ALTER DEFAULT true + backfill existing rows.
- parseSettingsRow / in-memory + CRDB upsert / route GET — all default
  flipped to true.

Tests (98 pass, 70 turbo tasks green):
- tier-weights.test.ts: 19 cases updated for additive semantics, all 3
  calibrations, override composition, back-compat aliases.
- rrf.test.ts: new "weak-match doesn't leapfrog" case; existing cases
  reformulated for additive bonus.
- tier-ablation-eval bars tightened: received_content ≥ 0.55
  (hash-trick), ≥ 0.75 (real embeddings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#251 Phase 1 post-/review): address Copilot findings on PR #274

Five findings, all valid:

1. JSDoc had "Number. NEGATIVE_INFINITY" split across two lines —
   reads awkwardly in generated docs. Joined.

2. Inclusion-semantics drift: the post-bonus filter was `rrfScore > 0`,
   which silently dropped pages with sufficiently-negative bonuses
   alongside the intended NEGATIVE_INFINITY-sentinel drops. Tightened
   the filter to only remove the sentinel; negative bonuses now reorder
   without changing inclusion. Documented in the TierWeightFn JSDoc.

3. Migration 044's comment claimed "only rows that were never explicitly
   toggled" get backfilled, but the SQL unconditionally flips all
   tier_weighting=false rows. We don't have a "set by user" audit
   column to distinguish defaults from opt-outs, so the honest fix is
   to update the comment — clarifies that this IS an unconditional
   opt-in. Notes that a future audit column could preserve opt-outs
   if it becomes important.

4. The "doesn't leapfrog" test in rrf.test.ts had exploratory scratch
   notes including a "PR #_" placeholder and a self-contradicting "Wait
   — additive DOES flip this" line. Replaced with a clean explanation
   of the fixture being asserted, the actual rank/score numbers, and
   the load-bearing role of the 0.85 floor-ratio gate.

5. tier-weights.ts docstring said "rank-1 vs rank-2 RRF diff is ~0.001"
   but the table below shows 0.0164 vs 0.0161 = 0.0003. Corrected
   to 0.0003.
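The inclusion-semantics fix in item 2 comes down to which filter runs after the bonus. A small illustration with made-up scores; `keepPositive` and `dropHiddenOnly` are hypothetical names for the before and after behavior:

```typescript
interface ScoredPage {
  id: string;
  rrfScore: number;
}

const HIDDEN = Number.NEGATIVE_INFINITY;

// Before: anything not strictly positive was dropped, including pages that a
// negative bonus had merely pushed below zero.
const keepPositive = (pages: ScoredPage[]): ScoredPage[] => pages.filter((p) => p.rrfScore > 0);

// After: only the hidden sentinel is removed; negative bonuses just reorder.
const dropHiddenOnly = (pages: ScoredPage[]): ScoredPage[] =>
  pages.filter((p) => p.rrfScore !== HIDDEN);

const pages: ScoredPage[] = [
  { id: "boosted", rrfScore: 1 / 61 + 0.005 },
  { id: "penalized", rrfScore: 1 / 300 - 0.005 }, // slightly negative after a -0.005 bonus
  { id: "hidden", rrfScore: HIDDEN },
];

console.log(keepPositive(pages).map((p) => p.id), dropHiddenOnly(pages).map((p) => p.id));
```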

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit to jayzalowitz/gbrain that referenced this pull request May 17, 2026
Adds an optional `floorRatio?: number` to applyBacklinkBoost,
applySalienceBoost, applyRecencyBoost, and PostFusionOpts. When set,
each boost stage skips results whose pre-boost score is below
floorRatio * topScore at the moment that stage runs — only the head of
the candidate pool receives the multiplicative bonus. Default undefined
preserves exact prior behavior bit-for-bit.

The failure mode
────────────────
Bounded boosts (the [1.0, ~1.6] log-compressed clip on salience, the
log-scaled backlink factor, the half-life-decayed recency factor) work
as designed on curated test corpora. On larger corpora indexed with
real high-dimensional embedders (text-embedding-3-large, voyage-3-large,
voyage-4-large, zembed-1), baseline vector similarity between
topically-unrelated "professional content" is non-trivial. Weak-overlap
pages land in a query's top-K via vector overlap alone, receive the
multiplicative boost, and on a non-trivial fraction of queries a weak
page with high metadata signal climbs above the legitimate primary hit.
Per-boost factors look harmless in isolation; the compound effect
across the long tail is what shifts ranks.

The fix
───────
A boost only fires for results scoring at least floorRatio * topScore
at the moment that stage runs. The long tail keeps its unboosted score and
original rank. Stages compose naturally — salience runs against its
own top, recency runs against the post-salience top, etc.
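The per-stage gate can be sketched as one generic helper that each boost stage calls with its own factor function. This is a hypothetical illustration, not gbrain's code: `gatedBoost`, the `Result` shape and `backlinkFactor` are made up, and the real `applyBacklinkBoost` / `applySalienceBoost` / `applyRecencyBoost` have their own signatures. Because each call recomputes its own top score, chaining calls reproduces the "gate against the post-previous-stage top" composition.

```typescript
interface Result {
  id: string;
  score: number;
  backlinks: number;
}

/**
 * Multiply each result's score by factor(r), but only for results whose
 * pre-boost score is at least floorRatio * (top score at this stage).
 * floorRatio undefined means no gate: every result is boosted.
 */
function gatedBoost(results: Result[], factor: (r: Result) => number, floorRatio?: number): Result[] {
  if (results.length === 0) return results; // no top score, nothing to do
  const top = Math.max(...results.map((r) => r.score));
  const threshold = floorRatio === undefined ? Number.NEGATIVE_INFINITY : floorRatio * top;
  return results
    .map((r) => (r.score >= threshold ? { ...r, score: r.score * factor(r) } : r))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical log-scaled backlink factor, clipped to [1, 1.6].
const backlinkFactor = (r: Result): number => Math.min(1.6, 1 + 0.1 * Math.log1p(r.backlinks));

const pool: Result[] = [
  { id: "primary", score: 1.0, backlinks: 0 },
  { id: "weak-but-popular", score: 0.7, backlinks: 500 },
];

console.log(
  gatedBoost(pool, backlinkFactor).map((r) => r.id), // no gate
  gatedBoost(pool, backlinkFactor, 0.85).map((r) => r.id), // gated at 0.85
);
```

Without the gate, the weak page's 1.6× clip lifts it to 1.12 and past the primary; with `floorRatio` 0.85 it sits below the 0.85 threshold and keeps its original score and rank.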

0.85 as a starting point comes from a labeled-retrieval ablation in
the SkyTwin twin-memory layer: the largest ratio that fully eliminated
the leapfrog regression on our labeled corpus while preserving
baseline rankings on queries without a metadata signal.
Reference: jayzalowitz/skytwin#272

Backward compatibility
──────────────────────
floorRatio defaults to undefined → no gate, no threshold computation,
exact prior behavior. Existing call sites are untouched; the new param
is positional-last and optional on each function. PostFusionOpts.floorRatio
is similarly optional and unset by default. Opt-in by design — it
changes ranking behavior, so each consumer evaluates against their own
corpus before flipping it on.

Tests
─────
7 new cases in test/search.test.ts:
- default (floorRatio undefined) preserves existing behavior
- weak page gated out, top page boosted as before
- borderline page at exactly the threshold is eligible
- regression scenario: weak page with strong metadata signal cannot
  leapfrog a strong primary
- applySalienceBoost honors the gate (parity with applyBacklinkBoost)
- empty results no-op without divide-by-zero
- single-result trivially eligible

bun test test/search.test.ts: 33/33 pass (was 26/26).
bun run verify: pass (typecheck + 12 guard scripts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 18, 2026
…g + floor-ratio gate)

CLAUDE.md:102 said the package "multiplies fused scores by per-tier weights"
— stale since #260/#272 flipped the implementation to additive bonuses (the
multiplicative cut had a structural leapfrog regression on real dense
embedders). Updated to describe the actual current behavior + the opt-in
`floorRatio` gate aligned with gbrain v0.35.6.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 18, 2026
…/ PR #1129 (#334)

* v0.6.52.0 sync(memory): align floor-ratio gate with gbrain v0.35.6.0 / PR #1129

Our contribution PR #1091 was closed in favor of upstream's reworked shape
that merged yesterday as #1129. The codex outside-voice review caught three
defensive gaps in the original shape; port the fixes here and align naming
with `SearchOpts.floorRatio` / `search.floor_ratio`.

Hardened in `packages/memory-gbrain-crdb-adapter/src/rrf.ts`:

- No-positive-signal inputs (all-negative, all-NaN, empty) disable the gate
  via `Number.NEGATIVE_INFINITY` threshold. Prior `topRawScore = 0` init
  would silently reject every entry against `r.score < 0`.
- Out-of-range `floorRatio` (NaN, Infinity, negative, > 1) disables the
  gate. Defense in depth so a malformed config value never gates anything.
- NaN-score skip in the bonus loop. `NaN < threshold` is `false` in JS, so
  a NaN-scored hit would slip past the gate check and have the bonus added
  on top — poisoning the sort. Now an explicit `Number.isFinite` check
  skips the bonus stage for non-finite scores.
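The three hardening rules can be sketched as a small threshold helper. The name `computeFloorThreshold` follows this commit, but the `{ score: number }` entry shape and the body below are assumptions, not the adapter's code. Returning `NEGATIVE_INFINITY` means "gate disabled", since every finite score passes `score >= threshold`.

```typescript
interface ScoredEntry {
  score: number;
}

/**
 * Threshold below which entries get no bonus. Returns -Infinity (gate
 * disabled) when floorRatio is missing or out of [0, 1], or when no entry
 * has a finite positive score.
 */
function computeFloorThreshold(entries: ScoredEntry[], floorRatio: number | undefined): number {
  if (floorRatio === undefined || !Number.isFinite(floorRatio) || floorRatio < 0 || floorRatio > 1) {
    return Number.NEGATIVE_INFINITY;
  }
  let top = Number.NEGATIVE_INFINITY;
  for (const e of entries) {
    if (Number.isFinite(e.score) && e.score > top) top = e.score;
  }
  // No positive signal (empty, all-NaN, all-negative): disable the gate.
  if (!(top > 0)) return Number.NEGATIVE_INFINITY;
  return floorRatio * top;
}

/** Non-finite scores never get a bonus; `NaN < threshold` alone would let them through. */
function eligibleForBonus(score: number, threshold: number): boolean {
  return Number.isFinite(score) && score >= threshold;
}

const t = computeFloorThreshold([{ score: 0.02 }, { score: Number.NaN }], 0.85);
console.log(t, eligibleForBonus(Number.NaN, t));
```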

New surface:

- `RrfFoldOptions.floorRatio` (deprecated alias `tierWeightFloorRatio`
  preserved; new name wins when both are set).
- `computeFloorThreshold(entries, floorRatio)` exported helper, mirrors
  gbrain's same-named function for cross-port mental-model consistency.
- `DEFAULT_FLOOR_RATIO = 0.85` exported as a named constant.

Tests:

- 12 new cases pinning the defensive guards (out-of-range / NaN / Infinity /
  empty / negative-top / all-NaN / mixed), the precedence rule between the
  new and deprecated option names, and an updated strong-vs-tail RRF setup
  that actually exercises the gate (RRF flatness means rank-1 vs rank-2
  don't differ enough — you need rank-20+ in a single list).
- 130/130 RRF tests pass; 100/100 `@skytwin/memory-gbrain` tests pass; the
  realistic-retrieval ablation reports `mean R@5 1.000 pure-RRF / 0.929
  tier-on`, unchanged.

Upstream feature triage (filed for follow-up, not in this PR):

- #897 search-lite (token budget + semantic query cache + intent weighting)
  — pursue first, ~2 days. Token budget addresses Claude API limits.
- #1008 zerank-2 reranker — pursue second, ~1.5 days. Slots between RRF
  fold and tier-weight bonus.
- #996 federated_read — skip (one brain per user).
- #1131 temporal trajectory — defer (entity-time-series shape not our fit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rrf): codex T2/T3 + clarify floorRatio:0 test (post-/review)

Three findings from /review's codex outside-voice pass + one nit from
the structured Pass-1/Pass-2 review.

T2 — Invalid floorRatio bypassing legacy guard. Old precedence
`options.floorRatio ?? options.tierWeightFloorRatio ?? DEFAULT_FLOOR_RATIO`
meant `floorRatio: NaN` (e.g. from buggy config parse) won the chain and
disabled the gate, even if the caller had `tierWeightFloorRatio: 0.85`
working. A partially migrated caller piping a malformed new option
silently nullified the legacy guard. New `pickValidFloorRatio` helper
walks the candidates and uses the first finite value in [0, 1]; invalid
falls through to the alias, then to `DEFAULT_FLOOR_RATIO`.
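The T2 precedence rule, sketched under assumptions: `pickValidFloorRatio` and `DEFAULT_FLOOR_RATIO` are named in this commit, but the variadic signature and the `FoldOptions` shape here are hypothetical. The difference from `??` is that an invalid value such as `NaN` falls through instead of winning the chain.

```typescript
const DEFAULT_FLOOR_RATIO = 0.85;

function isValidFloorRatio(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 1;
}

/** First finite candidate in [0, 1] wins; invalid or undefined values fall through. */
function pickValidFloorRatio(...candidates: Array<number | undefined>): number {
  for (const c of candidates) {
    if (isValidFloorRatio(c)) return c;
  }
  return DEFAULT_FLOOR_RATIO;
}

interface FoldOptions {
  floorRatio?: number;
  /** @deprecated use floorRatio */
  tierWeightFloorRatio?: number;
}

function resolveFloorRatio(options: FoldOptions): number {
  return pickValidFloorRatio(options.floorRatio, options.tierWeightFloorRatio);
}

// A malformed new option no longer nullifies a working legacy value.
console.log(resolveFloorRatio({ floorRatio: Number.NaN, tierWeightFloorRatio: 0.7 }));
```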

T3 — NaN/+Infinity rrfScores surviving the sort. The comment claimed
non-finite scores "sort to the end," but `b.rrfScore - a.rrfScore`
returns `NaN` for any NaN side, which JS sort treats as 0 (equal) —
leaving NaN-scored hits in insertion order, where they can land in
top-k via `slice(0, k)`. `+Infinity` sorts to the top of every query.
Reachable when a caller passes `rrfK: NaN` (which makes every
`1 / (rrfK + rank)` NaN). Fix: the post-loop filter drops ALL
non-finite-scored entries (was: only `-Infinity` hidden sentinel),
and a mirror filter applies on the pure-RRF path so corrupted scores
never reach the comparator. Sort now operates only on finite scores
and produces a deterministic ranking.
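A short illustration of the T3 failure and the fix. Per the ECMAScript spec, a comparator result of `NaN` is treated as +0, so NaN-scored entries compare "equal" to everything and their final position depends on insertion order and the engine's sort algorithm. Dropping non-finite scores first makes the order deterministic. The `Ranked` shape and `rankFinite` are hypothetical:

```typescript
interface Ranked {
  id: string;
  rrfScore: number;
}

/** Drop non-finite scores (NaN, ±Infinity), then sort descending and take the top k. */
function rankFinite(entries: Ranked[], k: number): Ranked[] {
  return entries
    .filter((e) => Number.isFinite(e.rrfScore))
    .sort((a, b) => b.rrfScore - a.rrfScore)
    .slice(0, k);
}

// rrfK = NaN poisons every contribution: 1 / (NaN + rank) is NaN.
const rrfK = Number.NaN;
const poisoned: Ranked[] = [1, 2, 3].map((rank) => ({ id: `p${rank}`, rrfScore: 1 / (rrfK + rank) }));

// Partial corruption: finite hits survive and sort normally.
const mixed: Ranked[] = [
  { id: "nan", rrfScore: Number.NaN },
  { id: "a", rrfScore: 1 / 62 },
  { id: "inf", rrfScore: Number.POSITIVE_INFINITY },
  { id: "b", rrfScore: 1 / 61 },
];

console.log(rankFinite(poisoned, 10), rankFinite(mixed, 10).map((e) => e.id));
```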

Nit — Test `floorRatio: 0 disables the bonus completely` name + comment
contradicted the test's own assertions (which confirm the bonus IS
applied for every positive-score hit). Renamed to match the actual
behavior: `floorRatio: 0` is a valid in-range value, threshold computes
to 0, every positive-score hit passes the gate. Distinct from the
`undefined`/out-of-range disable path even though they're observationally
equivalent for positive-score inputs.

5 new test cases pin the codex fixes:
- invalid floorRatio falls back to deprecated alias when alias is valid
- invalid floorRatio + invalid alias falls back to DEFAULT_FLOOR_RATIO
- floorRatio: undefined falls through to alias when alias is valid
- rrfK: NaN corrupts all contributions → all hits dropped (output [])
- partial corruption (some finite hits) survives the filter intact

CHANGELOG updated to reflect: 135 tests (was 130), and the codex review
fixes are called out under a "Codex review fixes (post-review)" subsection
so the audit trail is visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rrf-tests): address Copilot review — strengthen invalid-floorRatio test setup

Copilot caught a coverage gap in three new tests: the rank-1-vs-rank-2
text-only setup doesn't actually exercise the gate, because rank-2 rrfScore
(1/62 ≈ 0.0161) is above the default 0.85 × 1/61 (≈ 0.0139) — so the
assertion passes regardless of whether `floorRatio: NaN` (or -0.5, or 1.5)
correctly disabled the gate, fell back to default, or did anything at all.
Same class as the gap I caught and fixed in the back-compat tests; missed
updating these three.

Rewritten to use the `strongVsTail` helper (rank-1-in-both + rank-21-text-only)
so the assertions distinguish "gate at 0.85" from "gate disabled" — the weak
hit's rrfScore is 1/81 (well below 0.85 × 2/61 = 0.0279), so the bonus only
applies if the gate is genuinely disabled.
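The arithmetic behind the rewritten setup, assuming the conventional RRF constant k = 60 (which is what the 1/61, 1/62 and 1/81 figures above imply). The helper names are hypothetical; the check is whether the weak hit clears the default 0.85 gate on its own, because if it does, the test can't tell "gate at 0.85" from "gate disabled".

```typescript
const K = 60;
const DEFAULT_FLOOR_RATIO = 0.85;

/** RRF contribution of a single 1-based rank. */
const contrib = (rank: number): number => 1 / (K + rank);

// Old setup: rank-1 text-only top hit vs rank-2 text-only weak hit.
const oldTop = contrib(1); // 1/61 ≈ 0.0164
const oldWeak = contrib(2); // 1/62 ≈ 0.0161

// strongVsTail setup: rank-1 in both lists vs rank-21 text-only.
const newTop = contrib(1) + contrib(1); // 2/61 ≈ 0.0328
const newWeak = contrib(21); // 1/81 ≈ 0.0123

/** Does the weak hit clear the default gate (and so get the bonus either way)? */
const clearsGate = (weak: number, top: number): boolean => weak >= DEFAULT_FLOOR_RATIO * top;

const oldSetupDiscriminates = !clearsGate(oldWeak, oldTop);
const newSetupDiscriminates = !clearsGate(newWeak, newTop);

console.log({ oldSetupDiscriminates, newSetupDiscriminates });
```

Only the strongVsTail setup discriminates: 1/81 sits well below 0.85 × 2/61 ≈ 0.0279, so the weak hit gets the bonus only if the gate is really off.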

Note: the test semantics also flipped because of the codex T2 fix landed in
89fd6be. Pre-T2, invalid `floorRatio` disabled the gate. Post-T2, invalid
falls back to the alias then to DEFAULT_FLOOR_RATIO. So the renamed tests
now assert "falls back to DEFAULT_FLOOR_RATIO" rather than "disables gate."
The test rationale comment block calls this out explicitly so a future
maintainer doesn't try to revert to the pre-T2 expectations.

CHANGELOG test count corrected: 22 new test cases (was 17, originally 12 —
my mistake; the count drifted across each round of review fixes).

Copilot's other two comments were already addressed in 89fd6be:
- "Test name `floorRatio: 0 disables bonus` contradicts assertions" → fixed
- "NaN-score skip leaves non-finite rrfScore in entries; sort can be poisoned"
  → fixed (post-loop `isFinite` filter drops non-finite scores before sort).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: clarify memory-gbrain-crdb-adapter description (additive scoring + floor-ratio gate)

CLAUDE.md:102 said the package "multiplies fused scores by per-tier weights"
— stale since #260/#272 flipped the implementation to additive bonuses (the
multiplicative cut had a structural leapfrog regression on real dense
embedders). Updated to describe the actual current behavior + the opt-in
`floorRatio` gate aligned with gbrain v0.35.6.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>