techempower-org
diff --git a/FORK_CHANGELOG.md b/FORK_CHANGELOG.md
Lines changed: 50 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 74 additions & 73 deletions
diff --git a/docs/fork-changes.yaml b/docs/fork-changes.yaml
Lines changed: 52 additions & 0 deletions
diff --git a/docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.json b/docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.json
Lines changed: 186 additions & 0 deletions
@@ -18,6 +18,56 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ---
 
 
+## [2026-05-28]
+
+
+### Added
+
+
+- **RRF vs convex-blend rerank — A/B measurement on our corpus (#162)** ([`TBD`](https://github.com/techempower-org/mempalace/commit/TBD))
+  Closes the measurement promised by [#162][162] (the harness landed
+  in #247 — this PR runs it). Adds a self-contained ``--mine-corpus``
+  mode to ``scripts/eval_fusion_ab.py`` that mines the named
+  directory into a fresh local ChromaDB palace and runs the
+  convex-vs-RRF A/B against it. Doesn't touch the production daemon
+  or share GPU capacity with real callers; the prior
+  ``--i-know-the-backfill-is-done`` gate stays on the
+  ``--palace-path`` mode for future daemon-side wiring.
+
+  The probe-set loader now accepts both the v2 dict shape
+  (``{"_meta": ..., "probes": [{query, expected, why}, ...]}``,
+  which the only checked-in probe file ``scripts/probes_v2_git_derived.json``
+  uses) and the legacy list-of-lists shape. This matches the
+  multi-encoder eval harness's loader so probe sets are
+  interchangeable between the two.
+
+  Findings and per-probe data are in
+  ``docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.md``
+  (the human-readable summary) and the companion ``.json`` (raw
+  ranks and deltas for follow-up analysis). **One-line:** RRF
+  underperforms the convex blend on this corpus: MRR 0.4075 →
+  0.3758 (−0.0318), Recall@10 52% → 47% (−5 pp), 20 regressions
+  to 10 improvements. The convex blend's vector-heavy weighting
+  (0.6/0.4) is doing real work; RRF ignores score magnitudes, so it
+  loses strong vector signal: 9 of the 20 regressions were rank 1
+  under convex. Convex stays the default; ``fusion_mode="rrf"``
+  remains available as the explicit opt-in for callers who want it.
+
+  Not in scope:
+
+  * Running against the production palace. The daemon's
+    ``/search/hybrid`` hard-codes ``candidate_strategy="hybrid"``
+    and doesn't forward a ``fusion_mode`` body field, so RRF can't
+    be driven remotely today. Filed forward as a palace-daemon
+    change if the A/B result motivates it.
+  * Sweeping the RRF ``k`` smoothing constant. Default 60 (Cormack
+    2009); a sweep is a follow-up if RRF is competitive enough to
+    be worth refining.
+
+  *Tests:* 29 — tests/test_eval_fusion_ab.py (probe-loader accepts v2 dict + legacy list shapes, rejects scalars and dicts-without-probes-list; main rejects --mine-corpus + --palace-path together; main refuses --palace-path without --i-know-the-backfill-is-done; pre-existing run-orchestration + scoring-math tests retained)
+  *Files:* `scripts/eval_fusion_ab.py`, `tests/test_eval_fusion_ab.py`, `docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.md`, `docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.json`
+
+
 ## [2026-05-27]
 
 
 
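The changelog entry above contrasts a 0.6/0.4 convex blend with RRF at the default `k = 60`. The sketch below illustrates how the two fusion rules can disagree on a rank-1 hit. It is not the code in `scripts/eval_fusion_ab.py`: the function names, the `lex` name for the non-vector signal, the assumption that both scores are already normalised to [0, 1], and the toy scores are all hypothetical.

```python
from typing import Dict, List


def convex_blend(vec: Dict[str, float], lex: Dict[str, float],
                 w_vec: float = 0.6) -> List[str]:
    """Order ids by w_vec * vector score + (1 - w_vec) * other score."""
    ids = set(vec) | set(lex)
    fused = {i: w_vec * vec.get(i, 0.0) + (1.0 - w_vec) * lex.get(i, 0.0)
             for i in ids}
    # Highest fused score first; ties broken by id for determinism.
    return sorted(ids, key=lambda i: (-fused[i], i))


def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Order ids by the sum over rankings of 1 / (k + rank), rank from 1."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda i: (-fused[i], i))


# Toy scores (made up): "a" is a very strong vector hit but weak on the
# other signal; "b" and "c" are the reverse.
vec = {"a": 0.95, "b": 0.40, "c": 0.38}
lex = {"a": 0.10, "b": 0.80, "c": 0.75}

convex_order = convex_blend(vec, lex)
rrf_order = rrf([
    sorted(vec, key=vec.get, reverse=True),  # vector ranking: a, b, c
    sorted(lex, key=lex.get, reverse=True),  # other ranking:  b, c, a
])
print(convex_order)
print(rrf_order)
```

In this toy case the convex blend keeps `a` first because its vector score is far ahead of the rest, while RRF only sees that `a` is first in one list and last in the other, so `b` overtakes it. That is the same shape as the rank 1 → 2 regressions recorded in the JSON.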
@@ -24,6 +24,58 @@
 
 entries:
 
+  - id: rrf-vs-hybrid-rerank-ab-run
+    date: 2026-05-28
+    bucket: Added
+    commit: TBD
+    area: Search
+    summary: "RRF vs convex-blend rerank — A/B measurement on our corpus (#162)"
+    tests: "29 — tests/test_eval_fusion_ab.py (probe-loader accepts v2 dict + legacy list shapes, rejects scalars and dicts-without-probes-list; main rejects --mine-corpus + --palace-path together; main refuses --palace-path without --i-know-the-backfill-is-done; pre-existing run-orchestration + scoring-math tests retained)"
+    files:
+      - scripts/eval_fusion_ab.py
+      - tests/test_eval_fusion_ab.py
+      - docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.md
+      - docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.json
+    body: |
+      Closes the measurement promised by [#162][162] (the harness landed
+      in #247 — this PR runs it). Adds a self-contained ``--mine-corpus``
+      mode to ``scripts/eval_fusion_ab.py`` that mines the named
+      directory into a fresh local ChromaDB palace and runs the
+      convex-vs-RRF A/B against it. Doesn't touch the production daemon
+      or share GPU capacity with real callers; the prior
+      ``--i-know-the-backfill-is-done`` gate stays on the
+      ``--palace-path`` mode for future daemon-side wiring.
+
+      The probe-set loader now accepts both the v2 dict shape
+      (``{"_meta": ..., "probes": [{query, expected, why}, ...]}``,
+      which the only checked-in probe file ``scripts/probes_v2_git_derived.json``
+      uses) and the legacy list-of-lists shape. This matches the
+      multi-encoder eval harness's loader so probe sets are
+      interchangeable between the two.
+
+      Findings and per-probe data are in
+      ``docs/research/2026-05-28-rrf-vs-hybrid-rerank-ab.md``
+      (the human-readable summary) and the companion ``.json`` (raw
+      ranks and deltas for follow-up analysis). **One-line:** RRF
+      underperforms the convex blend on this corpus: MRR 0.4075 →
+      0.3758 (−0.0318), Recall@10 52% → 47% (−5 pp), 20 regressions
+      to 10 improvements. The convex blend's vector-heavy weighting
+      (0.6/0.4) is doing real work; RRF ignores score magnitudes, so it
+      loses strong vector signal: 9 of the 20 regressions were rank 1
+      under convex. Convex stays the default; ``fusion_mode="rrf"``
+      remains available as the explicit opt-in for callers who want it.
+
+      Not in scope:
+
+      * Running against the production palace. The daemon's
+        ``/search/hybrid`` hard-codes ``candidate_strategy="hybrid"``
+        and doesn't forward a ``fusion_mode`` body field, so RRF can't
+        be driven remotely today. Filed forward as a palace-daemon
+        change if the A/B result motivates it.
+      * Sweeping the RRF ``k`` smoothing constant. Default 60 (Cormack
+        2009); a sweep is a follow-up if RRF is competitive enough to
+        be worth refining.
+
   - id: cli-bulk-move-relocation
     date: 2026-05-27
     bucket: Added
 
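The entry body says the probe-set loader accepts the v2 dict shape and the legacy list-of-lists shape, and rejects scalars and dicts without a `probes` list. A minimal sketch of that behaviour follows. The function name `normalize_probes` is hypothetical, and the legacy row layout `[query, expected]` is an assumption, since the diff does not show it.

```python
from typing import Any, List, Tuple


def normalize_probes(raw: Any) -> List[Tuple[str, str]]:
    """Return (query, expected) pairs from either probe-file shape."""
    if isinstance(raw, dict):
        probes = raw.get("probes")
        if not isinstance(probes, list):
            raise ValueError("dict-shaped probe file needs a 'probes' list")
        # v2 shape: each probe is {"query": ..., "expected": ..., "why": ...}
        return [(p["query"], p["expected"]) for p in probes]
    if isinstance(raw, list):
        # Legacy shape (assumed layout): each row is [query, expected, ...]
        return [(row[0], row[1]) for row in raw]
    raise ValueError(f"unsupported probe-set type: {type(raw).__name__}")


v2 = {"_meta": {}, "probes": [{"query": "q1", "expected": "f.py", "why": "x"}]}
legacy = [["q1", "f.py"]]
print(normalize_probes(v2) == normalize_probes(legacy))
```

Normalising both shapes to one tuple form keeps the scoring code independent of the file format, which is what would let two harnesses share probe sets.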
@@ -0,0 +1,186 @@
+{
+  "label_a": "convex",
+  "label_b": "rrf",
+  "metrics_a": {
+    "n_probes": 200,
+    "mrr": 0.4075,
+    "recall_at_5": 0.47,
+    "recall_at_10": 0.52,
+    "found": 104
+  },
+  "metrics_b": {
+    "n_probes": 200,
+    "mrr": 0.3758,
+    "recall_at_5": 0.45,
+    "recall_at_10": 0.47,
+    "found": 94
+  },
+  "delta": {
+    "mrr": -0.0318,
+    "recall_at_5": -0.02,
+    "recall_at_10": -0.05
+  },
+  "improved": [
+    {
+      "query": "Plainto_tsquery + ILIKE fallback for underscore identifiers",
+      "rank_a": 2,
+      "rank_b": 1
+    },
+    {
+      "query": "Add --mode session for per-session manifest drawers",
+      "rank_a": 3,
+      "rank_b": 2
+    },
+    {
+      "query": "Quote ChromaBackend annotation for Python 3.9 compatibility",
+      "rank_a": null,
+      "rank_b": 9
+    },
+    {
+      "query": "Add hook_verbatim_mode toggle for transcript ingest",
+      "rank_a": null,
+      "rank_b": 5
+    },
+    {
+      "query": "Atomic write to prevent partial corruption on crash",
+      "rank_a": 5,
+      "rank_b": 3
+    },
+    {
+      "query": "Reconfigure stdio to UTF-8 on Windows",
+      "rank_a": 6,
+      "rank_b": 5
+    },
+    {
+      "query": "Avoid false hnsw divergence fallback",
+      "rank_a": 4,
+      "rank_b": 3
+    },
+    {
+      "query": "Harden HNSW startup preflight",
+      "rank_a": null,
+      "rank_b": 2
+    },
+    {
+      "query": "Add multi-agent support roadmap (Claude Code, OpenCode, Cursor, Aider, Gemini CLI, Codex CLI, Warp)",
+      "rank_a": 5,
+      "rank_b": 3
+    },
+    {
+      "query": "Paginate closet_llm col.get (#1073)",
+      "rank_a": null,
+      "rank_b": 6
+    }
+  ],
+  "regressed": [
+    {
+      "query": "Mempalace rooms subcommand + example config.yaml",
+      "rank_a": 10,
+      "rank_b": null
+    },
+    {
+      "query": "Emit only canonical rooms per hybrid-search-taxonomy spec",
+      "rank_a": 6,
+      "rank_b": null
+    },
+    {
+      "query": "Use pg_available_extensions.name (not extname) in preflight",
+      "rank_a": 1,
+      "rank_b": 2
+    },
+    {
+      "query": "MEMPALACE_KG_BACKEND routing — sqlite default, age opt-in (Phase 2.4)",
+      "rank_a": 1,
+      "rank_b": 2
+    },
+    {
+      "query": "Identify lock holder + exit non-zero on contention",
+      "rank_a": 6,
+      "rank_b": null
+    },
+    {
+      "query": "Retry tool_search once on Chroma \"Error finding id\" transient (#1315)",
+      "rank_a": 1,
+      "rank_b": null
+    },
+    {
+      "query": "Fix ruff bugbear and silent-except findings",
+      "rank_a": 7,
+      "rank_b": null
+    },
+    {
+      "query": "Clamp similarity to [0,1] to avoid negative values",
+      "rank_a": 10,
+      "rank_b": null
+    },
+    {
+      "query": "Handle null JSON-RPC request payloads safely",
+      "rank_a": 5,
+      "rank_b": null
+    },
+    {
+      "query": "Clamp effective_distance to valid cosine range [0, 2]",
+      "rank_a": 1,
+      "rank_b": null
+    },
+    {
+      "query": "Preflight SQLite integrity before rebuild",
+      "rank_a": 3,
+      "rank_b": 4
+    },
+    {
+      "query": "Verify write roundtrip before bailout",
+      "rank_a": 3,
+      "rank_b": null
+    },
+    {
+      "query": "Don't write chunking defaults in cfg.init()",
+      "rank_a": 3,
+      "rank_b": null
+    },
+    {
+      "query": "Extract Windows UTF-8 reconfigure into shared helper",
+      "rank_a": 1,
+      "rank_b": 2
+    },
+    {
+      "query": "Per-stream stdio errors policy on Windows",
+      "rank_a": 7,
+      "rank_b": null
+    },
+    {
+      "query": "Address Copilot review on #1306",
+      "rank_a": 1,
+      "rank_b": null
+    },
+    {
+      "query": "Basename source_file in tool_get_drawer responses",
+      "rank_a": 9,
+      "rank_b": null
+    },
+    {
+      "query": "Close active backend before rollback restore",
+      "rank_a": 1,
+      "rank_b": 2
+    },
+    {
+      "query": "Split get_or_create_collection on reopen (follow-up to #1262)",
+      "rank_a": 1,
+      "rank_b": null
+    },
+    {
+      "query": "Address release review feedback",
+      "rank_a": 1,
+      "rank_b": 2
+    }
+  ],
+  "candidate_strategy": "hybrid",
+  "n_results": 10,
+  "timing_secs": {
+    "convex": 32.21,
+    "rrf": 29.51
+  },
+  "palace_path": "/tmp/fusion_ab_palace_sqz0u5_f",
+  "mined_corpus": "/home/jp/Projects/memorypalace/.claude/worktrees/feat-162-hybrid-rrf-ab/mempalace",
+  "drawer_count": 3413
+}
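The JSON above stores a 1-based rank per probe, with `null` meaning the expected document was not in the top `n_results`. The sketch below shows how MRR, Recall@k, and the improved/regressed split can be derived from such ranks, assuming the standard definitions. The harness's actual scoring code is not shown in this diff, and the helper names are hypothetical.

```python
import math
from typing import List, Optional

Rank = Optional[int]  # 1-based rank, or None when the target was not found


def mrr(ranks: List[Rank]) -> float:
    """Mean reciprocal rank; a miss contributes 0."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


def recall_at(ranks: List[Rank], k: int) -> float:
    """Fraction of probes whose target appears at rank <= k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def classify(rank_a: Rank, rank_b: Rank) -> str:
    """Compare mode B against mode A; a miss counts as worse than any rank."""
    a = math.inf if rank_a is None else rank_a
    b = math.inf if rank_b is None else rank_b
    if b < a:
        return "improved"
    if b > a:
        return "regressed"
    return "unchanged"


ranks = [1, 2, None, 5]  # toy data, not taken from the run
print(mrr(ranks), recall_at(ranks, 5), classify(1, None))
```

Under these definitions Recall@10 equals `found / n_probes` when `n_results` is 10, which matches the file: 104/200 = 0.52 for convex and 94/200 = 0.47 for RRF.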