
feat(#251 Phase 1): Layer 2 additive rewrite + default-on flip #274

Merged
jayzalowitz merged 2 commits into main from jayzalowitz/251-phase1-additive
May 13, 2026

@jayzalowitz

Owner

Summary

Phase 1.1 + 1.2 of the multi-phase #251 plan. Replaces Layer 2's multiplicative tier weighting with additive bonuses, validates the change against both hash-trick and real-embedding evals, and flips the tier-weighting toggle on by default for new and existing users.

Headline result (real embeddings, Ollama nomic-embed-text)

| Metric | pure-RRF | Phase-0 multiplier | Phase 1 additive |
|---|---|---|---|
| user_behavior MRR (n=3) | 0.667 | 1.000 | 1.000 |
| received_content MRR (n=3) | 1.000 | 0.537 | 0.833 |
| neutral MRR (n=1) | 1.000 | 1.000 | 1.000 |
| aggregate MRR primary | 0.857 | 0.804 | 0.929 |

received_content MRR recovered from 0.537 to 0.833. Aggregate MRR primary rose above the pure-RRF baseline (0.929 vs 0.857) because Layer 2 lifts user_behavior queries beyond what pure RRF achieves. The remaining 0.833-vs-1.000 gap on received_content is the q4 case (the user wrote a reply about the alert), which is defensible product behavior.
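The aggregate figure is consistent with pooling all seven queries rather than averaging the three category means. A minimal TypeScript sketch reproduces the Phase 1 column, assuming every primary lands at rank 1 except received_content q4 at rank 2 (the PR does not list per-query ranks); `meanReciprocalRank` is a hypothetical helper, not the eval's code.

```typescript
// Mean reciprocal rank over a set of queries. Each entry is the 1-based rank
// of the query's primary hit, or null if the primary never surfaced (scores 0).
function meanReciprocalRank(ranks: ReadonlyArray<number | null>): number {
  if (ranks.length === 0) return 0;
  let total = 0;
  for (const r of ranks) total += r === null ? 0 : 1 / r;
  return total / ranks.length;
}

// Hypothetical per-query ranks for the Phase 1 column: every primary at
// rank 1 except received_content q4, assumed to land at rank 2.
const userBehavior = [1, 1, 1];
const receivedContent = [1, 1, 2];
const neutral = [1];

const receivedMrr = meanReciprocalRank(receivedContent); // 2.5 / 3
// Pooled over all 7 queries rather than averaging the three category means.
const aggregateMrr = meanReciprocalRank([...userBehavior, ...receivedContent, ...neutral]); // 6.5 / 7

console.log(receivedMrr.toFixed(3), aggregateMrr.toFixed(3)); // 0.833 0.929
```

The same pooling gives 6/7 ≈ 0.857 for the pure-RRF column, since its user_behavior MRR of 0.667 contributes 2 out of 3.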

Phase 1.1: additive rewrite

  • tierBonus(metadata, calibration) replaces tierMultiplier. Returns an additive bonus (~±0.005 in the normal band) sized to flip close calls (rank-1 vs rank-2 raw diff ~0.0003) without leapfrogging strong matches.
  • Promote-only configuration. Authored tiers get a positive bonus; received tiers are 0. The real-embedding eval showed that any negative bonus pushes legitimate primary hits below distractors on queries without an authored alternative. The product intent of Layer 2 is "prefer authored on close calls," not "suppress received" — promote-only delivers the former without the latter.
  • HIDDEN_SENTINEL / PINNED_BOOST constants make userOverride composition explicit. hidden returns Number.NEGATIVE_INFINITY; the RRF fold drops it.
  • Floor-ratio gate retained at 0.85. Real embedders produce spurious vector similarity between topically unrelated content (any two "work emails about technical topics" cluster together). Without the gate, q5's "GitHub Actions CI failed" query returned q1's Series B authored content above the primary. With the gate, that cross-query leak goes away.
  • Back-compat aliases. tierMultiplier / buildTierWeightFn re-exported as deprecated aliases of tierBonus / buildTierBonusFn so internal callers keep working through this PR.
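The bullets above can be sketched in TypeScript. This is a hypothetical illustration, not the code in tier-weights.ts: the tier names, the `PROMOTE_ONLY` table and the value of `PINNED_BOOST` are assumptions, and the real `tierBonus` also takes a calibration argument that is omitted here.

```typescript
// Hypothetical tier names; the real authoring tiers may differ.
type AuthoringTier = "authored" | "authored_reply" | "received" | "received_bulk";

interface PageMetadata {
  tier: AuthoringTier;
  userOverride?: "pinned" | "hidden";
}

// The RRF fold treats this exact value as "drop the page".
const HIDDEN_SENTINEL = Number.NEGATIVE_INFINITY;
// Assumed magnitude for illustration only: large enough to outrank any raw
// RRF score at k = 60 (a page ranked first in two lists scores 2/61 ≈ 0.033).
const PINNED_BOOST = 1;

// Promote-only: authored tiers get a small positive bonus, received tiers 0.
const PROMOTE_ONLY: Record<AuthoringTier, number> = {
  authored: 0.005,
  authored_reply: 0.005,
  received: 0,
  received_bulk: 0,
};

// Additive bonus that the fold adds to rrfScore (never multiplies into it).
function tierBonus(meta: PageMetadata): number {
  if (meta.userOverride === "hidden") return HIDDEN_SENTINEL;
  const base = PROMOTE_ONLY[meta.tier];
  return meta.userOverride === "pinned" ? base + PINNED_BOOST : base;
}

/** @deprecated Use tierBonus. Alias kept so existing callers still compile. */
const tierMultiplier = tierBonus;
```

Making the alias point at the additive function (rather than keeping the old multiplier) means every caller switches semantics in one place while the old name still resolves.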

Phase 1.2: default-on flip

  • Migration 044 flips the brain_settings.tier_weighting default to true and backfills every existing row that has tier_weighting = false (the schema cannot distinguish an explicit opt-out from the old default). Users can still opt out via Settings → Memory backend.
  • All in-code defaults updated to match: parseSettingsRow, in-memory upsertSettings, CRDB upsertSettings, route GET fallback.
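One way to keep those four default sites in agreement is a single shared constant. A minimal TypeScript sketch, assuming the tier_weighting column may be absent or NULL; `TIER_WEIGHTING_DEFAULT` and `resolveTierWeighting` are hypothetical names, not the PR's code.

```typescript
// Hypothetical shape; the real brain_settings row has more columns.
interface BrainSettingsRow {
  tier_weighting?: boolean | null;
}

// Single source of truth for the default, shared by the row parser,
// both upsert paths, and the GET fallback so they cannot drift apart.
const TIER_WEIGHTING_DEFAULT = true;

// A missing row or NULL column falls back to the default (now true);
// an explicit false is an opt-out and is preserved.
function resolveTierWeighting(row: BrainSettingsRow | undefined): boolean {
  return row?.tier_weighting ?? TIER_WEIGHTING_DEFAULT;
}
```

Using `??` rather than `||` matters here: `||` would turn an explicit `false` back into `true` and silently undo a user's opt-out.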

Tests

  • 19 cases in tier-weights.test.ts updated for additive semantics, covering all 3 calibrations, override composition, the brief-reply downweight, and the back-compat aliases.
  • rrf.test.ts tier-weight section rewritten. New case verifies a weak-match authored page does NOT leapfrog a strong primary with additive + gate.
  • tier-ablation-eval guardrail bars tightened: received_content ≥ 0.55 (hash-trick), ≥ 0.75 (real embeddings). Both bars sit below the newly measured values (0.58 / 0.83), leaving some headroom.

Test plan

  • pnpm build --concurrency=1 → 35/35 packages
  • pnpm test → 70/70 turbo tasks
  • RUN_REAL_EMBEDDING_EVAL=1 pnpm --filter @skytwin/memory-gbrain test -- tier-ablation-eval → 2 pass, real-embedding numbers as above

What this unblocks

Subsequent phases of the #251 plan:

  • Phase 2: relationshipTier axis (orthogonal, multiplicatively composable)
  • Phase 3: cross-channel tier (calendar + GitHub)
  • Phase 4: draft_email candidate generator (the main payoff)
  • Phase 5: end-to-end loop + dashboard polish

🤖 Generated with Claude Code

Phase 1.1 + 1.2 of the multi-phase plan.

# 1.1 — additive rewrite

Replaces multiplicative tier weighting (`score *= weight`) with additive
bonuses (`score += bonus`). The real-embedding ablation in PR #272
showed multiplicative was structurally bounded — a 1.5×/0.8× swing
(1.875× ratio) let weak-overlap authored content leapfrog strong
primary hits regardless of relevance. Additive bonuses (~±0.005 in
the normal band) can flip close calls but cannot leapfrog a match
whose raw lead over the boosted page exceeds the bonus.
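The difference between the two schemes can be checked with a little arithmetic. A minimal TypeScript sketch, assuming a single ranked list fused with the default k = 60 and ignoring the floor-ratio gate; the helper names are hypothetical.

```typescript
// Single-list RRF contribution at 1-based rank r with the default k = 60.
const K = 60;
const rrf = (rank: number): number => 1 / (K + rank);

// Phase 0: authored score ×1.5, received score ×0.8 (a 1.875× swing).
function multiplicativeFlips(receivedRaw: number, authoredRaw: number): boolean {
  return authoredRaw * 1.5 > receivedRaw * 0.8;
}

// Phase 1 promote-only: authored +0.005, received +0.
const AUTHORED_BONUS = 0.005;
function additiveFlips(receivedRaw: number, authoredRaw: number): boolean {
  return authoredRaw + AUTHORED_BONUS > receivedRaw;
}

// Close call: received at rank 1 vs authored at rank 2 (raw gap ≈ 0.0003).
console.log(multiplicativeFlips(rrf(1), rrf(2)), additiveFlips(rrf(1), rrf(2))); // true true
// Wide gap: received at rank 1 vs authored at rank 40 (raw gap ≈ 0.0064).
console.log(multiplicativeFlips(rrf(1), rrf(40)), additiveFlips(rrf(1), rrf(40))); // true false
```

In this single-list model the multiplicative swing lets an authored page at rank 54 or better overtake a rank-1 received page. The ungated additive bonus narrows that to rank 27, which is part of why the floor-ratio gate is still needed.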

Promote-only configuration: only authored tiers get a positive bonus;
all received tiers are 0. Trying any negative bonus pushed legitimate
primary hits on `received_content` queries below distractors. The
product intent is "prefer authored on close calls," not "suppress
received" — promote-only gives the former without the latter.

Floor-ratio gate retained (default 0.85). Real embedders give
non-trivial cross-query vector similarity; without the gate,
authored content from unrelated queries leaks into the candidate
pool and gets boosted past legitimate primaries.
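A gated fold of this kind might look like the TypeScript sketch below. It assumes the gate compares each page's raw score against 0.85 × the best raw score among visible pages and that hidden pages are dropped before gating; `Candidate` and `applyGatedBonus` are hypothetical names, and the 1/75 score stands in for a weak cross-query hit.

```typescript
interface Candidate {
  id: string;
  rrfScore: number; // raw fused RRF score, before any tier bonus
  bonus: number;    // output of the tier-bonus hook; -Infinity means hidden
}

const HIDDEN_SENTINEL = Number.NEGATIVE_INFINITY;

// Drop hidden pages, then add each page's bonus only when its raw score is
// at least floorRatio × the best raw score. Returns pages sorted best-first.
function applyGatedBonus(candidates: Candidate[], floorRatio = 0.85): Candidate[] {
  const visible = candidates.filter((c) => c.bonus !== HIDDEN_SENTINEL);
  if (visible.length === 0) return [];
  const floor = floorRatio * Math.max(...visible.map((c) => c.rrfScore));
  return visible
    .map((c) => (c.rrfScore >= floor ? { ...c, rrfScore: c.rrfScore + c.bonus } : c))
    .sort((a, b) => b.rrfScore - a.rrfScore);
}

const pool: Candidate[] = [
  { id: "primary", rrfScore: 1 / 61, bonus: 0 },                 // received, rank 1
  { id: "unrelated-authored", rrfScore: 1 / 75, bonus: 0.005 },  // weak cross-query hit
  { id: "hidden", rrfScore: 1 / 62, bonus: HIDDEN_SENTINEL },
];

console.log(applyGatedBonus(pool).map((c) => c.id).join(","));    // primary,unrelated-authored
console.log(applyGatedBonus(pool, 0).map((c) => c.id).join(",")); // unrelated-authored,primary
```

Passing `floorRatio = 0` disables the gate, and the unrelated authored page then climbs past the primary, which is the leak described above.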

Files:
- tier-weights.ts: `tierBonus` / `buildTierBonusFn` (additive).
  `tierMultiplier` / `buildTierWeightFn` re-exported as deprecated
  aliases for back-compat.
- rrf.ts: applies bonus additively, NEGATIVE_INFINITY sentinel for
  hidden, 0.85 floor-ratio gate.

# 1.2 — flip default-on

Phase 1.1 cleared the eval bar:

  user_behavior MRR       0.667 → 1.000  (preserved)
  received_content MRR    1.000 → 0.833  (real embeddings)
                                  → 0.583  (hash-trick floor)
  aggregate MRR primary   0.857 → 0.929  (above pure-RRF baseline)

Files:
- Migration 044: ALTER DEFAULT true + backfill existing rows.
- parseSettingsRow / in-memory + CRDB upsert / route GET — all default
  flipped to true.

Tests (98 pass, 70 turbo tasks green):
- tier-weights.test.ts: 19 cases updated for additive semantics, all 3
  calibrations, override composition, back-compat aliases.
- rrf.test.ts: new "weak-match doesn't leapfrog" case; existing cases
  reformulated for additive bonus.
- tier-ablation-eval bars tightened: received_content ≥ 0.55
  (hash-trick), ≥ 0.75 (real embeddings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 01:27

Copilot AI left a comment


Pull request overview

This PR implements Phase 1 of issue #251 by changing Layer 2 authoring-tier weighting from multiplicative factors to additive bonuses in the RRF fold, and flips brain_settings.tier_weighting to be enabled by default (including a backfill migration).

Changes:

  • Replaced multiplicative tier weighting with additive tierBonus semantics (including PINNED_BOOST and a hidden sentinel) and updated RRF fold logic accordingly.
  • Flipped tier-weighting defaults to “on” across persistence + API fallback paths, and added a DB migration to set the default and backfill existing rows.
  • Updated/rewrote unit tests and eval guardrails to match additive behavior and tightened expected floors.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| packages/memory-gbrain/src/tests/tier-ablation-eval.test.ts | Updates eval guardrail commentary and floors for additive semantics. |
| packages/memory-gbrain-crdb-adapter/src/tier-weights.ts | Rewrites tier weighting into additive bonuses; adds pinned/hidden semantics; adds deprecated aliases. |
| packages/memory-gbrain-crdb-adapter/src/rrf.ts | Changes tier hook from multiplier to additive bonus; retains floor-ratio gating and hidden sentinel handling. |
| packages/memory-gbrain-crdb-adapter/src/repository.ts | Flips default insert/fallback behavior for tier_weighting to true. |
| packages/memory-gbrain-crdb-adapter/src/in-memory-repository.ts | Flips in-memory settings default for tier_weighting to true. |
| packages/memory-gbrain-crdb-adapter/src/tests/tier-weights.test.ts | Rewrites tests for additive tier bonuses, pinned/hidden behavior, and aliases. |
| packages/memory-gbrain-crdb-adapter/src/tests/rrf.test.ts | Updates tier-weight tests for additive behavior and adds a “weak authored shouldn’t leapfrog strong primary” case. |
| packages/db/src/migrations/044-brain-tier-weighting-default-on.sql | Sets DB default to true and backfills existing rows. |
| CHANGELOG.md | Documents Phase 1 additive rewrite and default-on flip, including headline eval numbers. |
| apps/api/src/routes/memory-config.ts | Changes GET fallback for tierWeighting to default true for fresh users. |


Comment on lines +36 to +37
```ts
* Per-page additive bonus to apply to rrfScore. Returns `Number.
* NEGATIVE_INFINITY` to drop the page (hidden override).
```

Owner Author

Addressed in ec674f3. Joined onto one line.

Comment on lines 151 to 155
```ts
// Drop pages that were explicitly hidden or pushed to or below zero
// by aggressive demote bonuses on already-weak pages. Keeping the
// `> 0` filter preserves the prior contract that hidden pages don't
// appear in results.
entries = entries.filter((h) => h.rrfScore > 0);
```

Owner Author

Addressed in ec674f3. The post-bonus filter is now rrfScore !== Number.NEGATIVE_INFINITY — only the explicit sentinel drops a page. Ordinary negative bonuses reorder without changing inclusion. JSDoc on TierWeightFn now documents this contract explicitly.
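The contract change can be seen side by side in a small TypeScript sketch. The two predicates follow the filters described in this thread; the page scores, including the large negative bonus on the demoted page, are made up for illustration.

```typescript
const HIDDEN_SENTINEL = Number.NEGATIVE_INFINITY;

interface ScoredPage {
  id: string;
  rrfScore: number; // score after the additive bonus has been applied
}

// Before ec674f3: any page pushed to or below zero disappeared, so a large
// ordinary negative bonus silently changed inclusion.
const keepBefore = (p: ScoredPage): boolean => p.rrfScore > 0;

// After ec674f3: only the explicit hidden sentinel removes a page;
// negative bonuses reorder but never drop.
const keepAfter = (p: ScoredPage): boolean => p.rrfScore !== HIDDEN_SENTINEL;

const afterBonus: ScoredPage[] = [
  { id: "strong", rrfScore: 0.0328 },
  { id: "demoted", rrfScore: 0.0125 - 0.02 }, // hypothetical large negative bonus
  { id: "hidden", rrfScore: HIDDEN_SENTINEL },
];

console.log(afterBonus.filter(keepBefore).map((p) => p.id).join(",")); // strong
console.log(afterBonus.filter(keepAfter).map((p) => p.id).join(","));  // strong,demoted
```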

Comment on lines +10 to +18
```sql
-- Two changes here:
-- 1. Default for new rows flips to TRUE so fresh users get the
-- Phase-1.1 retrieval shape out of the box.
-- 2. Existing rows that still have `tier_weighting = false` AND have
-- never been explicitly set by the user get migrated up to TRUE.
-- We can't tell "user said no" vs "default applied" from the schema
-- alone, but the prior default was false for everyone, so any row
-- with the default value is opt-in-by-default candidate. Users who
-- want to opt out can flip it back via Settings → Memory backend.
```

Owner Author

Addressed in ec674f3 by updating the comment to match the actual behavior. The SQL was correct (unconditional flip); the comment was wrong. New comment makes the trade-off explicit: we don't have a "set by user" audit column, the prior default was load-bearing on a now-fixed bug, the dashboard toggle remains for anyone who wants to opt back out. Notes that a future audit-column migration could preserve opt-outs if needed.

Comment on lines +124 to +133
```ts
it('does NOT let a weak-match authored page leapfrog a strong primary', () => {
// This is the bug PR #272 surfaced and PR #_ (this PR) is fixing.
// rank-1 newsletter at score 1/(60+1)=0.0164. rank-10 authored at
// 1/(60+10)=0.0143. With multiplicative weighting (1.5× vs 0.8×)
// the rank-10 authored at 0.0143*1.5=0.0214 would beat the rank-1
// newsletter at 0.0164*0.8=0.0131. With additive ±0.005 it can't:
// 0.0143+0.005=0.0193 vs 0.0164-0.005=0.0114. Wait — additive DOES
// flip this. The point isn't "never flip" but "don't flip when the
// gap is large enough that flipping is wrong." Build a fixture
// with a wider raw gap and verify additive holds.
```

Owner Author

Addressed in ec674f3. Replaced the exploratory scratch notes with a clean explanation: the fixture's strong primary (rank 1 in both lists ≈ 0.0328), the weak distractor (rank 20 in text only ≈ 0.0125), and the gate that prevents the bonus from applying at that depth (0.85 × 0.0328 = 0.0279 threshold).
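The fixture numbers in that reply can be checked directly. A minimal TypeScript sketch, assuming the fixture fuses a vector list and a text list with k = 60, as the quoted scores imply; the variable names are hypothetical.

```typescript
// RRF contribution of one list at 1-based rank r, k = 60.
const K = 60;
const rrf = (rank: number): number => 1 / (K + rank);

// Strong primary: rank 1 in both the vector and the text list.
const strongPrimary = rrf(1) + rrf(1); // 2/61
// Weak authored distractor: rank 20 in the text list only.
const weakDistractor = rrf(20); // 1/80
// Floor-ratio gate: bonuses only apply at or above 0.85 × the top score.
const gateThreshold = 0.85 * strongPrimary;

console.log(strongPrimary.toFixed(4), weakDistractor.toFixed(4), gateThreshold.toFixed(4)); // 0.0328 0.0125 0.0279
console.log(weakDistractor >= gateThreshold); // false: the bonus never applies to the distractor
```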

```ts
*
* Additive bonuses fix this. RRF scores live in the 0.005–0.033 range at
* default `rrfK=60`; bonuses of ±0.005 are large enough to flip close
* calls (rank-1 vs rank-2 raw, diff ~0.001) but small enough that a
```

Owner Author

Addressed in ec674f3. Changed the docstring to 0.0003 to match the table.

Five findings, all valid:

1. JSDoc had "Number. NEGATIVE_INFINITY" split across two lines —
   reads awkwardly in generated docs. Joined.

2. Inclusion-semantics drift: the post-bonus filter was `rrfScore > 0`,
   which silently dropped pages with sufficiently-negative bonuses
   alongside the intended NEGATIVE_INFINITY-sentinel drops. Tightened
   the filter to only remove the sentinel; negative bonuses now reorder
   without changing inclusion. Documented in the TierWeightFn JSDoc.

3. Migration 044's comment claimed "only rows that were never explicitly
   toggled" get backfilled, but the SQL unconditionally flips all
   tier_weighting=false rows. We don't have a "set by user" audit
   column to distinguish defaults from opt-outs, so the honest fix is
   to update the comment — clarifies that this IS an unconditional
   opt-in. Notes that a future audit column could preserve opt-outs
   if it becomes important.

4. The "doesn't leapfrog" test in rrf.test.ts had exploratory scratch
   notes including a "PR #_" placeholder and a self-contradicting "Wait
   — additive DOES flip this" line. Replaced with a clean explanation
   of the fixture being asserted, the actual rank/score numbers, and
   the load-bearing role of the 0.85 floor-ratio gate.

5. tier-weights.ts docstring said "rank-1 vs rank-2 RRF diff is ~0.001"
   but the table below shows 0.0164 vs 0.0161 = 0.0003. Corrected
   to 0.0003.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz merged commit 0e6b651 into main May 13, 2026
8 checks passed
@jayzalowitz jayzalowitz deleted the jayzalowitz/251-phase1-additive branch May 13, 2026 01:46