feat(#251 Phase 1): Layer 2 additive rewrite + default-on flip #274
Phase 1.1 + 1.2 of the multi-phase plan.

# 1.1: additive rewrite

Replaces multiplicative tier weighting (`score *= weight`) with additive bonuses (`score += bonus`). The real-embedding ablation in PR #272 showed multiplicative was structurally bounded: a 1.5×/0.8× swing (1.875× ratio) let weak-overlap authored content leapfrog strong primary hits regardless of relevance. Additive bonuses (~±0.005 in the normal band) can flip close calls but cannot overturn a lead larger than the bonus itself, so strong matches hold.

Promote-only configuration: only authored tiers get a positive bonus; all received tiers get 0. Every negative bonus we tried pushed legitimate primary hits on `received_content` queries below distractors. The product intent is "prefer authored on close calls," not "suppress received"; promote-only gives the former without the latter.

Floor-ratio gate retained (default 0.85). Real embedders give non-trivial cross-query vector similarity; without the gate, authored content from unrelated queries leaks into the candidate pool and gets boosted past legitimate primaries.

Files:
- tier-weights.ts: `tierBonus` / `buildTierBonusFn` (additive). `tierMultiplier` / `buildTierWeightFn` re-exported as deprecated aliases for back-compat.
- rrf.ts: applies the bonus additively, `NEGATIVE_INFINITY` sentinel for hidden, 0.85 floor-ratio gate.

# 1.2: flip default-on

Phase 1.1 cleared the eval bar:
- user_behavior MRR 0.667 → 1.000 (preserved)
- received_content MRR 1.000 → 0.833 (real embeddings), 0.583 (hash-trick floor)
- aggregate MRR primary 0.857 → 0.929 (above pure-RRF baseline)

Files:
- Migration 044: ALTER DEFAULT true + backfill existing rows.
- parseSettingsRow / in-memory + CRDB upsert / route GET: all defaults flipped to true.

Tests (98 pass, 70 turbo tasks green):
- tier-weights.test.ts: 19 cases updated for additive semantics, all 3 calibrations, override composition, back-compat aliases.
- rrf.test.ts: new "weak-match doesn't leapfrog" case; existing cases reformulated for additive bonus.
- tier-ablation-eval bars tightened: received_content ≥ 0.55 (hash-trick), ≥ 0.75 (real embeddings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
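A minimal sketch of the two weighting shapes described above, assuming single-list RRF scores at rrfK = 60 and the 1.5×/0.8× and +0.005 values quoted in the message. The function names (`multiplicative`, `additive`, `authoredWins`) are hypothetical, not the repo's API, and the sketch deliberately omits the 0.85 floor-ratio gate so the difference between the two combiners is visible on its own.

```typescript
// Hypothetical sketch: old multiplicative tier weighting vs. new promote-only
// additive bonus, on single-list RRF scores 1 / (rrfK + rank) with rrfK = 60.
// The floor-ratio gate is intentionally left out here.
const RRF_K = 60;
const rrf = (rank: number): number => 1 / (RRF_K + rank);

type Tier = "authored" | "received";

// Old shape: score *= weight (1.5x authored, 0.8x received; a 1.875x ratio).
function multiplicative(raw: number, tier: Tier): number {
  return raw * (tier === "authored" ? 1.5 : 0.8);
}

// New shape: score += bonus, promote-only (received tiers get 0).
const AUTHORED_BONUS = 0.005;
function additive(raw: number, tier: Tier): number {
  return raw + (tier === "authored" ? AUTHORED_BONUS : 0);
}

// Does an authored page at `authoredRank` outrank a received page at rank 1?
function authoredWins(
  combine: (raw: number, tier: Tier) => number,
  authoredRank: number,
): boolean {
  return combine(rrf(authoredRank), "authored") > combine(rrf(1), "received");
}

// Close call: rank-2 authored vs rank-1 received. Both shapes flip it.
console.log(authoredWins(multiplicative, 2), authoredWins(additive, 2)); // true true
// Weak match: rank-40 authored vs rank-1 received. Only multiplicative flips it.
console.log(authoredWins(multiplicative, 40), authoredWins(additive, 40)); // true false
```

Under promote-only, an authored page overtakes a received one only when the raw gap is smaller than the bonus. The multiplicative form instead lets any authored page scoring above 1/1.875 (about 53%) of the primary's score win, which is why weak-overlap content could leapfrog.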
Pull request overview
This PR implements Phase 1 of issue #251 by changing Layer 2 authoring-tier weighting from multiplicative factors to additive bonuses in the RRF fold, and flips brain_settings.tier_weighting to be enabled by default (including a backfill migration).
Changes:
- Replaced multiplicative tier weighting with additive `tierBonus` semantics (including `PINNED_BOOST` and a hidden sentinel) and updated the RRF fold logic accordingly.
- Flipped tier-weighting defaults to “on” across persistence + API fallback paths, and added a DB migration to set the default and backfill existing rows.
- Updated/rewrote unit tests and eval guardrails to match additive behavior and tightened expected floors.
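To make the override composition concrete, here is a hedged sketch of an additive `tierBonus`. The names `tierBonus`, `HIDDEN_SENTINEL`, `PINNED_BOOST`, and `userOverride` come from this PR; the `PageMetadata` shape, the two-tier `Calibration` record, the `PINNED_BOOST` value, and the choice to stack a pin on top of the tier bonus are all assumptions for illustration, not the actual implementation.

```typescript
// Hypothetical sketch: names follow the PR, but the metadata shape, tier
// names, numeric values, and pin-stacking rule are assumptions.
type Tier = "authored" | "received";
type UserOverride = "pinned" | "hidden";

interface PageMetadata {
  tier: Tier;
  userOverride?: UserOverride;
}

// A calibration maps each tier to its additive bonus.
type Calibration = Record<Tier, number>;

// Promote-only: authored gets a small positive bonus, received gets 0.
const PROMOTE_ONLY: Calibration = { authored: 0.005, received: 0 };

// Hidden pages get a sentinel that the RRF fold filters out.
const HIDDEN_SENTINEL = Number.NEGATIVE_INFINITY;
// Assumed value, chosen only for illustration.
const PINNED_BOOST = 0.02;

function tierBonus(meta: PageMetadata, calibration: Calibration): number {
  if (meta.userOverride === "hidden") return HIDDEN_SENTINEL;
  const base = calibration[meta.tier];
  // Assumption: a pin stacks on top of the tier bonus rather than replacing it.
  return meta.userOverride === "pinned" ? base + PINNED_BOOST : base;
}

// The fold applies the bonus additively: rrfScore + tierBonus(...).
// Adding -Infinity to any finite score yields -Infinity, so hidden stays hidden.
console.log(0.0164 + tierBonus({ tier: "received", userOverride: "hidden" }, PROMOTE_ONLY)); // -Infinity
```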
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/memory-gbrain/src/tests/tier-ablation-eval.test.ts | Updates eval guardrail commentary and floors for additive semantics. |
| packages/memory-gbrain-crdb-adapter/src/tier-weights.ts | Rewrites tier weighting into additive bonuses; adds pinned/hidden semantics; adds deprecated aliases. |
| packages/memory-gbrain-crdb-adapter/src/rrf.ts | Changes tier hook from multiplier to additive bonus; retains floor-ratio gating and hidden sentinel handling. |
| packages/memory-gbrain-crdb-adapter/src/repository.ts | Flips default insert/fallback behavior for tier_weighting to true. |
| packages/memory-gbrain-crdb-adapter/src/in-memory-repository.ts | Flips in-memory settings default for tier_weighting to true. |
| packages/memory-gbrain-crdb-adapter/src/tests/tier-weights.test.ts | Rewrites tests for additive tier bonuses, pinned/hidden behavior, and aliases. |
| packages/memory-gbrain-crdb-adapter/src/tests/rrf.test.ts | Updates tier-weight tests for additive behavior and adds a “weak authored shouldn’t leapfrog strong primary” case. |
| packages/db/src/migrations/044-brain-tier-weighting-default-on.sql | Sets DB default to true and backfills existing rows. |
| CHANGELOG.md | Documents Phase 1 additive rewrite and default-on flip, including headline eval numbers. |
| apps/api/src/routes/memory-config.ts | Changes GET fallback for tierWeighting to default true for fresh users. |

     * Per-page additive bonus to apply to rrfScore. Returns `Number.
     * NEGATIVE_INFINITY` to drop the page (hidden override).

    // Drop pages that were explicitly hidden or pushed to or below zero
    // by aggressive demote bonuses on already-weak pages. Keeping the
    // `> 0` filter preserves the prior contract that hidden pages don't
    // appear in results.
    entries = entries.filter((h) => h.rrfScore > 0);
Addressed in ec674f3. The post-bonus filter is now `rrfScore !== Number.NEGATIVE_INFINITY`, so only the explicit sentinel drops a page. Ordinary negative bonuses reorder without changing inclusion. The JSDoc on `TierWeightFn` now documents this contract explicitly.
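A small sketch of that inclusion contract, assuming a simplified entry shape; `ScoredPage` and `applyBonusesAndFilter` are hypothetical names, and the floor-ratio gate is omitted. Page `b` ends up with a negative score (it would have been dropped under the old `> 0` filter) but stays in the results, while page `c` carries the sentinel and is removed.

```typescript
// Hypothetical sketch of the post-bonus filter: only the explicit hidden
// sentinel removes a page; an ordinary negative bonus may reorder a page
// but never drops it.
interface ScoredPage {
  id: string;
  rrfScore: number;
}

function applyBonusesAndFilter(
  entries: ScoredPage[],
  bonusFor: (id: string) => number,
): ScoredPage[] {
  return entries
    .map((e) => ({ ...e, rrfScore: e.rrfScore + bonusFor(e.id) }))
    .filter((e) => e.rrfScore !== Number.NEGATIVE_INFINITY) // sentinel only
    .sort((a, b) => b.rrfScore - a.rrfScore);
}

const pages: ScoredPage[] = [
  { id: "a", rrfScore: 0.0164 },
  { id: "b", rrfScore: 0.003 },
  { id: "c", rrfScore: 0.0161 },
];
const bonuses: Record<string, number> = { a: 0, b: -0.005, c: Number.NEGATIVE_INFINITY };

const result = applyBonusesAndFilter(pages, (id) => bonuses[id] ?? 0);
console.log(result.map((p) => p.id).join(",")); // a,b
```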

    -- Two changes here:
    --   1. Default for new rows flips to TRUE so fresh users get the
    --      Phase-1.1 retrieval shape out of the box.
    --   2. Existing rows that still have `tier_weighting = false` AND have
    --      never been explicitly set by the user get migrated up to TRUE.
    --      We can't tell "user said no" vs "default applied" from the schema
    --      alone, but the prior default was false for everyone, so any row
    --      with the default value is opt-in-by-default candidate. Users who
    --      want to opt out can flip it back via Settings → Memory backend.
Addressed in ec674f3 by updating the comment to match the actual behavior. The SQL was correct (unconditional flip); the comment was wrong. New comment makes the trade-off explicit: we don't have a "set by user" audit column, the prior default was load-bearing on a now-fixed bug, the dashboard toggle remains for anyone who wants to opt back out. Notes that a future audit-column migration could preserve opt-outs if needed.

    it('does NOT let a weak-match authored page leapfrog a strong primary', () => {
      // This is the bug PR #272 surfaced and PR #_ (this PR) is fixing.
      // rank-1 newsletter at score 1/(60+1)=0.0164. rank-10 authored at
      // 1/(60+10)=0.0143. With multiplicative weighting (1.5× vs 0.8×)
      // the rank-10 authored at 0.0143*1.5=0.0214 would beat the rank-1
      // newsletter at 0.0164*0.8=0.0131. With additive ±0.005 it can't:
      // 0.0143+0.005=0.0193 vs 0.0164-0.005=0.0114. Wait — additive DOES
      // flip this. The point isn't "never flip" but "don't flip when the
      // gap is large enough that flipping is wrong." Build a fixture
      // with a wider raw gap and verify additive holds.
Addressed in ec674f3. Replaced the exploratory scratch notes with a clean explanation: the fixture's strong primary (rank 1 in both lists ≈ 0.0328), the weak distractor (rank 20 in text only ≈ 0.0125), and the gate that prevents the bonus from applying at that depth (0.85 × 0.0328 = 0.0279 threshold).
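The fixture numbers in that reply can be checked with a short sketch of the gate. It assumes the gate compares each candidate's raw RRF score to 0.85 × the pool's top raw score and withholds a positive bonus below that threshold; `gatedScore` is a hypothetical helper, not the function in rrf.ts.

```typescript
// Hypothetical sketch of the 0.85 floor-ratio gate. Assumption: a positive
// bonus applies only when a candidate's raw RRF score is at least
// floorRatio times the top raw score in the candidate pool.
const RRF_K = 60;
const rrf = (rank: number): number => 1 / (RRF_K + rank);

function gatedScore(raw: number, topRaw: number, bonus: number, floorRatio = 0.85): number {
  return raw >= floorRatio * topRaw ? raw + bonus : raw;
}

const strongPrimary = rrf(1) + rrf(1); // rank 1 in both lists, about 0.0328
const weakDistractor = rrf(20); // rank 20 in the text list only, 0.0125
const topRaw = Math.max(strongPrimary, weakDistractor);

const primaryFinal = gatedScore(strongPrimary, topRaw, 0); // received tier: bonus 0
const distractorFinal = gatedScore(weakDistractor, topRaw, 0.005); // authored, but below threshold

console.log((0.85 * topRaw).toFixed(4)); // 0.0279
console.log(distractorFinal === weakDistractor); // true: the gate withheld the bonus
console.log(primaryFinal > distractorFinal); // true
```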

     *
     * Additive bonuses fix this. RRF scores live in the 0.005–0.033 range at
     * default `rrfK=60`; bonuses of ±0.005 are large enough to flip close
     * calls (rank-1 vs rank-2 raw, diff ~0.001) but small enough that a
Addressed in ec674f3. Changed the docstring to 0.0003 to match the table.
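The corrected figure follows directly from the RRF formula. A short arithmetic sketch, assuming single-list contributions of 1 / (rrfK + rank) at the default rrfK = 60, reproduces the table values and the rank-1 vs rank-2 gap:

```typescript
// Illustrative arithmetic: single-list RRF contributions 1 / (rrfK + rank)
// at rrfK = 60, plus the rank-1 vs rank-2 gap the ~0.005 bonus is sized against.
const RRF_K = 60;
const rrf = (rank: number): number => 1 / (RRF_K + rank);

for (const rank of [1, 2, 10, 20]) {
  console.log(`rank ${rank}: ${rrf(rank).toFixed(4)}`);
}
console.log(`rank-1 vs rank-2 gap: ${(rrf(1) - rrf(2)).toFixed(4)}`);
console.log(`rank 1 in two lists: ${(2 * rrf(1)).toFixed(4)}`);
// Prints:
// rank 1: 0.0164
// rank 2: 0.0161
// rank 10: 0.0143
// rank 20: 0.0125
// rank-1 vs rank-2 gap: 0.0003
// rank 1 in two lists: 0.0328
```

The exact gap is 1/61 − 1/62 ≈ 0.00026, so a +0.005 bonus is roughly 19× the adjacent-rank gap at the top of a single list.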
Five findings, all valid:

1. JSDoc had "Number. NEGATIVE_INFINITY" split across two lines, which reads awkwardly in generated docs. Joined.
2. Inclusion-semantics drift: the post-bonus filter was `rrfScore > 0`, which silently dropped pages with sufficiently negative bonuses alongside the intended NEGATIVE_INFINITY-sentinel drops. Tightened the filter to remove only the sentinel; negative bonuses now reorder without changing inclusion. Documented in the TierWeightFn JSDoc.
3. Migration 044's comment claimed "only rows that were never explicitly toggled" get backfilled, but the SQL unconditionally flips all tier_weighting=false rows. We don't have a "set by user" audit column to distinguish defaults from opt-outs, so the honest fix is to update the comment, which now states that this IS an unconditional opt-in. It also notes that a future audit column could preserve opt-outs if that becomes important.
4. The "doesn't leapfrog" test in rrf.test.ts had exploratory scratch notes, including a "PR #_" placeholder and a self-contradicting "Wait — additive DOES flip this" line. Replaced with a clean explanation of the fixture being asserted, the actual rank/score numbers, and the load-bearing role of the 0.85 floor-ratio gate.
5. The tier-weights.ts docstring said "rank-1 vs rank-2 RRF diff is ~0.001", but the table below it shows 0.0164 vs 0.0161 = 0.0003. Corrected to 0.0003.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Phase 1.1 + 1.2 of the multi-phase #251 plan. Replaces Layer 2's multiplicative tier weighting with additive bonuses, validates against both hash-trick and real-embedding evals, and flips the toggle on by default for new + existing users.
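The eval numbers in this PR are MRR (mean reciprocal rank). As a hedged refresher, not the eval harness's code, the sketch uses hypothetical rank lists to show how n = 3 queries can produce 0.833 and how a single rank-2 hit among seven queries produces 0.929; it makes no claim about which queries actually landed where.

```typescript
// Minimal sketch of mean reciprocal rank: the average, over queries, of
// 1 / (rank of the first relevant result). A query with no relevant hit
// contributes 0 (pass Infinity). The rank lists below are hypothetical.
function meanReciprocalRank(firstRelevantRanks: number[]): number {
  if (firstRelevantRanks.length === 0) return 0;
  const total = firstRelevantRanks.reduce((sum, rank) => sum + 1 / rank, 0);
  return total / firstRelevantRanks.length;
}

// n = 3: two queries hit at rank 1, one at rank 2 -> (1 + 1 + 0.5) / 3.
console.log(meanReciprocalRank([1, 1, 2]).toFixed(3)); // 0.833
// n = 7: six queries at rank 1, one at rank 2 -> 6.5 / 7.
console.log(meanReciprocalRank([1, 1, 1, 1, 1, 1, 2]).toFixed(3)); // 0.929
```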
Headline result (real embeddings, Ollama nomic-embed-text)
[Table: `user_behavior` MRR (n=3), `received_content` MRR (n=3), `neutral` MRR (n=1)]

`received_content` MRR recovered from 0.537 to 0.833. Aggregate MRR primary rose above the pure-RRF baseline (0.929 vs 0.857) because Layer 2 lifts `user_behavior` queries beyond what pure RRF could do. The 0.83-vs-1.0 gap on `received_content` is the q4 case (the user wrote a reply about the alert), which is defensible product behavior.

Phase 1.1: additive rewrite
- `tierBonus(metadata, calibration)` replaces `tierMultiplier`. It returns an additive bonus (~±0.005 in the normal band), sized to flip close calls (rank-1 vs rank-2 raw diff ~0.0003) without leapfrogging strong matches.
- `HIDDEN_SENTINEL` / `PINNED_BOOST` constants make `userOverride` composition explicit. `hidden` returns `Number.NEGATIVE_INFINITY`; the RRF fold drops it.
- `tierMultiplier` / `buildTierWeightFn` are re-exported as deprecated aliases of `tierBonus` / `buildTierBonusFn`, so internal callers keep working through this PR.

Phase 1.2: default-on flip
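- Migration 044 sets the `brain_settings.tier_weighting` default to `true` and backfills every existing row that still has `tier_weighting = false` (there is no audit column to tell user opt-outs from the old default). Users can still opt out via Settings → Memory backend.
- Default flipped to `true` in `parseSettingsRow`, in-memory `upsertSettings`, CRDB `upsertSettings`, and the route GET fallback.

Tests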
- `tier-weights.test.ts` updated for additive semantics: all 3 calibrations, override composition, brief-reply downweight, back-compat aliases.
- `rrf.test.ts` tier-weight section rewritten. A new case verifies that a weak-match authored page does NOT leapfrog a strong primary with additive + gate.
- `tier-ablation-eval` guardrail bars tightened: `received_content ≥ 0.55` (hash-trick), `≥ 0.75` (real embeddings). Both bars sit below the newly measured values (0.58 / 0.83).

Test plan
- `pnpm build --concurrency=1` → 35/35 packages
- `pnpm test` → 70/70 turbo tasks
- `RUN_REAL_EMBEDDING_EVAL=1 pnpm --filter @skytwin/memory-gbrain test -- tier-ablation-eval` → 2 pass, real-embedding numbers as above

What this unblocks
Subsequent phases of the #251 plan:
- `relationshipTier` axis (orthogonal, multiplicatively composable)
- `draft_email` candidate generator (the main payoff)

🤖 Generated with Claude Code