refactor(searcher): retire kind= filter — structural split made it inert
The `kind=` post-filter and the `max(n*20, 100)` over-fetch hack were
transitional safety nets put in place 2026-04-25 when the main collection
still carried Stop-hook auto-save checkpoint drawers that dominated
vector top-N. Phases A–E of the checkpoint collection split (2026-04-25
→ 2026-04-26) moved all checkpoints to `mempalace_session_recovery`;
empirical check on the canonical 150,891-drawer production palace finds
0 drawers with `topic=checkpoint` and 0 with `topic=auto-save` in the
main collection (763 in `mempalace_session_recovery`). The filter was
filtering nothing.
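A minimal sketch of why the post-filter had become a no-op, under stated assumptions: drawers are modeled as plain dicts, and `is_checkpoint`, `post_filter`, `old_fetch_size`, and `new_fetch_size` are hypothetical stand-ins (the removed helpers' real signatures are not shown in this commit). The topic values, the `"CHECKPOINT:"` text prefix, and the two fetch-size formulas are the ones quoted on this page.

```python
# Hypothetical stand-ins for the removed read-side helpers.
CHECKPOINT_TOPICS = {"checkpoint", "auto-save"}  # topics named in the commit message


def is_checkpoint(drawer: dict) -> bool:
    """Treat a drawer as a checkpoint if its topic or its text prefix says so."""
    topic = (drawer.get("metadata") or {}).get("topic")
    text = drawer.get("text") or ""
    return topic in CHECKPOINT_TOPICS or text.startswith("CHECKPOINT:")


def post_filter(drawers: list) -> list:
    """Drop checkpoint drawers from a result list."""
    return [d for d in drawers if not is_checkpoint(d)]


def old_fetch_size(n: int) -> int:
    """Over-fetch used while checkpoints still lived in the main collection."""
    return max(n * 20, 100)


def new_fetch_size(n: int) -> int:
    """Standard over-fetch restored by this commit."""
    return n * 3


# After the split the main collection holds no checkpoint drawers,
# so the filter returns its input unchanged.
main_collection_hits = [
    {"text": "Design notes on the miner", "metadata": {"topic": "design"}},
    {"text": "Meeting summary", "metadata": None},
]
filtered = post_filter(main_collection_hits)
```

With no checkpoint drawers in the input, `post_filter` is the identity, and for `n = 5` the pull size drops from 100 to 15.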
Deletes:
- `_CHECKPOINT_TOPICS` from `searcher.py` (moved to `palace.py` next to
  `_SESSION_RECOVERY_COLLECTION` — write-side routing in
  `tool_diary_write` still needs it; read-side does not)
- `_is_checkpoint_drawer` and `_apply_kind_text_filter` post-filter
- `kind=` parameter on `search_memories`, `build_where_filter`, `search`
  (CLI delegate), and `mempalace_search` MCP tool (input_schema entry
  removed)
- The `max(n*20, 100)` over-fetch hack — back to standard `n_results * 3`
- `TestCheckpointFilter` (9 tests) in `tests/test_searcher.py`
`migrate.py` now imports `_CHECKPOINT_TOPICS` from `palace.py` instead
of carrying its own duplicate. `layers.py` calls drop their now-unused
`kind="all"` argument.
Tests: 1500 passing (was 1510 — TestCheckpointFilter's 9 + 1 misc).
Companion change in palace-daemon strips `kind=` from `/search` and
`/context` HTTP routes. Production verified 0 checkpoints in main
before deletion.
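The write-side routing that survives this commit could look roughly like the sketch below. This is an illustration, not the code in `palace.py`: `collection_for_topic` and `_MAIN_COLLECTION` are hypothetical names, and the constant values are assumed from the topics and collection names mentioned on this page.

```python
# Assumed values: topics and collection names as quoted on this page.
_CHECKPOINT_TOPICS = frozenset({"checkpoint", "auto-save"})
_SESSION_RECOVERY_COLLECTION = "mempalace_session_recovery"
_MAIN_COLLECTION = "mempalace_drawers"  # hypothetical constant name


def collection_for_topic(topic):
    """Hypothetical helper: pick the target collection for a diary write."""
    if topic in _CHECKPOINT_TOPICS:
        return _SESSION_RECOVERY_COLLECTION
    return _MAIN_COLLECTION


checkpoint_target = collection_for_topic("checkpoint")
content_target = collection_for_topic("design-notes")
```

Because checkpoints are routed away at write time, the read path never sees them and needs no filter of its own.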
CLAUDE.md: 2 additions & 2 deletions
@@ -50,10 +50,10 @@ Ruff for linting (`ruff check`), line length 100, target Python 3.9.
18. ~~**fix: PID file guard prevents stacking mine processes**~~ — **merged upstream via #1023 in v3.3.2.** Includes the Windows `os.kill` → `OpenProcess` cross-platform fix. No longer fork-ahead.
19. **fix: `.claude-plugin/` venv-aware Python resolution** — hooks (`mempal-stop-hook.sh`, `mempal-precompact-hook.sh`) and `.mcp.json` resolve Python in this order: `MEMPALACE_PYTHON` env → `$PLUGIN_ROOT/venv/bin/python3` → system `python3`. Upstream's `5fe0c1c` + `be9214a` (fatkobra) and `9f5b8f5` (Pim) regressed to PATH-only lookups and bare `"mempalace-mcp"` command, which break editable dev installs where `mempalace`/`mempalace-mcp` only live in the repo venv. Documented here so future `upstream/develop` merges surface the conflict rather than silently re-regress. Attempted via #1115 on 2026-04-22; withdrew 2026-04-23 as premature pending #1069 arbitration — CI correctly caught the #942 PATH-only contract violation. Re-submit after bensig's direction on #1069.
20. ~~**fix: `_tokenize` None-document guard**~~ — **merged upstream via #1198 on 2026-04-26.** No longer fork-ahead.
-
21. **feat: `kind` filter on `search_memories` excludes Stop-hook checkpoints by default** (commits `8d02835` → `3d85739` → `398f42f` → `f9f5cc4`, 2026-04-25) — Stop-hook auto-save diary entries (topic=checkpoint, text starting `"CHECKPOINT:"`) were dominating MCP search results because they're short, word-dense, and outrank substantive content under cosine similarity. New `kind` parameter on `search_memories` and `mempalace_search` MCP tool: `"content"` (default, excludes checkpoints), `"checkpoint"` (only checkpoints, recovery/audit), `"all"` (no filter, pre-2026-04-25 behavior). **Two architecture corrections during the same day:** (a) the where-clause filter (`topic $nin [...]`) tripped a ChromaDB 1.5.x filter-planner bug — `Internal error: Error finding id` on every kind=content vector query — so the exclusion moved to post-filter only (`398f42f`); (b) vector top-N is dominated by checkpoints on this palace (top-10 hits all CHECKPOINT entries on probe queries), so post-filter alone empties the result set without aggressive over-fetch — pull size raised to `max(n*20, 100)` for kind != "all" (`f9f5cc4`). Post-filter checks both `topic` metadata and text-prefix shape; coverage equivalent to the original belt-and-suspenders without the chromadb bug. Result dicts now surface `topic`. 9 tests in `TestCheckpointFilter`. Companion fix in [`jphein/palace-daemon`](https://github.com/jphein/palace-daemon) commit `dd8894c` standardizes all hook clients on `topic="checkpoint"` (was `topic="auto-save"` in `clients/hook.py`). Structural fix still pending: stop indexing checkpoints as searchable drawers (separate session-recovery table). Upstream PR pending.
+
21. ~~**feat: `kind` filter on `search_memories` excludes Stop-hook checkpoints by default**~~ — **deleted 2026-04-27 as transitional/inert.** The structural split (Phases A–E, see row 23) moved all checkpoints to `mempalace_session_recovery`; production has 0 checkpoints in `mempalace_drawers`, so the filter was filtering nothing. Removed `_CHECKPOINT_TOPICS` from `searcher.py`, `_is_checkpoint_drawer`, `_apply_kind_text_filter`, the `max(n*20, 100)` over-fetch hack (back to `n_results * 3`), and the `kind=` parameter on `search_memories` / `mempalace_search` / palace-daemon `/search` & `/context`. Write-side `_CHECKPOINT_TOPICS` (topic→collection routing in `tool_diary_write`) lives in `palace.py` now alongside `_SESSION_RECOVERY_COLLECTION`. `TestCheckpointFilter` (9 tests) deleted.
22. ~~**fix: `palace_graph.build_graph` skips None metadata**~~ — **merged upstream via #1201 on 2026-04-26.** No longer fork-ahead.
-
23. **feat: checkpoint collection split — phases A–C** (commit `e266365`, 2026-04-25) — Promoted from "future work" to "necessary" by 2026-04-25 Cat 9 A/B (`kind=all` 632 tokens/Q vs `kind=content` 3 tokens/Q on the canonical 151K palace; over-fetch=100 inadequate, structural fix non-optional). **Phase A:** new `_SESSION_RECOVERY_COLLECTION` constant + `get_session_recovery_collection()` in `palace.py` (mirrors `get_collection`'s shape — cosine, num_threads=1). **Phase B:** `tool_diary_write` routes `topic in _CHECKPOINT_TOPICS` to the dedicated `mempalace_session_recovery` collection, everything else stays in `mempalace_drawers`; new `_get_session_recovery_collection()` in `mcp_server.py` with parallel cache. **Phase C:** new `tool_session_recovery_read` MCP handler reads recovery collection only with optional filters `session_id`, `agent`, `since`, `until`, `wing`, `limit`; `session_id` added as optional metadata field on `tool_diary_write` so the new tool can filter by Claude Code session. Registered in `TOOLS` dict, documented in `website/reference/mcp-tools.md`. 12 new tests across `tests/test_session_recovery.py` + `TestCheckpointRouting` + `TestSessionRecoveryRead`. Design + plan at `docs/superpowers/specs/2026-04-25-checkpoint-collection-split.md` and `docs/superpowers/plans/2026-04-25-checkpoint-collection-split-impl.md`. **Phases D (data migration of ~640 existing checkpoints out of main collection) and E (palace-daemon `lifespan` auto-migrate + `mempalace repair --mode reorganize`) deferred** — multi-day work, gated on a separate go-ahead. Once D lands and the canonical-palace re-run shows the predicted `kind=all` ≈ `kind=content` token convergence, the `kind=` post-filter and over-fetch hack become deletable. **Update 2026-04-26:** phase D shipped — `migrate_checkpoints_to_recovery()` in `mempalace/migrate.py`, idempotent walk that moves topic in `_CHECKPOINT_TOPICS` drawers from main → recovery while preserving IDs and metadata. 
Wired into `mempalace repair --mode reorganize` (CLI dispatch in `cli.py` chooses between `rebuild` (HNSW from sqlite) and `reorganize` (this new path)). PreCompact hook also incorporated — `hook_precompact` now writes a recovery marker via `_save_diary_direct` mirroring Stop, so a context-compaction event leaves a queryable timestamp in the recovery collection. 6 new migration tests in `test_migrate.py::TestMigrateCheckpointsToRecovery`. **Phase E shipped** in palace-daemon commit [`034023c`](https://github.com/jphein/palace-daemon/commit/034023c) on 2026-04-26 — `lifespan` calls `migrate_checkpoints_to_recovery()` in an executor on startup, gated behind `PALACE_AUTO_MIGRATE_CHECKPOINTS=1` (default on), with `ImportError` fallthrough so upstream-shaped installs without `mempalace.migrate` still start cleanly. Canonical 151K palace migrated 667 checkpoints on 2026-04-26 10:24:09 PDT. **Cleanup phase pending** — once Cat 9 convergence (currently 974/1267 tokens/Q kind=all vs kind=content) is judged acceptable, delete `_CHECKPOINT_TOPICS`, `_apply_kind_text_filter`, the `max(n*20, 100)` over-fetch hack, and the `kind=` parameter on `search_memories` / `mempalace_search` / daemon `/search` & `/context` routes.
+
23. **feat: checkpoint collection split — phases A–C** (commit `e266365`, 2026-04-25) — Promoted from "future work" to "necessary" by 2026-04-25 Cat 9 A/B (`kind=all` 632 tokens/Q vs `kind=content` 3 tokens/Q on the canonical 151K palace; over-fetch=100 inadequate, structural fix non-optional). **Phase A:** new `_SESSION_RECOVERY_COLLECTION` constant + `get_session_recovery_collection()` in `palace.py` (mirrors `get_collection`'s shape — cosine, num_threads=1). **Phase B:** `tool_diary_write` routes `topic in _CHECKPOINT_TOPICS` to the dedicated `mempalace_session_recovery` collection, everything else stays in `mempalace_drawers`; new `_get_session_recovery_collection()` in `mcp_server.py` with parallel cache. **Phase C:** new `tool_session_recovery_read` MCP handler reads recovery collection only with optional filters `session_id`, `agent`, `since`, `until`, `wing`, `limit`; `session_id` added as optional metadata field on `tool_diary_write` so the new tool can filter by Claude Code session. Registered in `TOOLS` dict, documented in `website/reference/mcp-tools.md`. 12 new tests across `tests/test_session_recovery.py` + `TestCheckpointRouting` + `TestSessionRecoveryRead`. Design + plan at `docs/superpowers/specs/2026-04-25-checkpoint-collection-split.md` and `docs/superpowers/plans/2026-04-25-checkpoint-collection-split-impl.md`. **Phases D (data migration of ~640 existing checkpoints out of main collection) and E (palace-daemon `lifespan` auto-migrate + `mempalace repair --mode reorganize`) deferred** — multi-day work, gated on a separate go-ahead. Once D lands and the canonical-palace re-run shows the predicted `kind=all` ≈ `kind=content` token convergence, the `kind=` post-filter and over-fetch hack become deletable. **Update 2026-04-26:** phase D shipped — `migrate_checkpoints_to_recovery()` in `mempalace/migrate.py`, idempotent walk that moves topic in `_CHECKPOINT_TOPICS` drawers from main → recovery while preserving IDs and metadata. 
Wired into `mempalace repair --mode reorganize` (CLI dispatch in `cli.py` chooses between `rebuild` (HNSW from sqlite) and `reorganize` (this new path)). PreCompact hook also incorporated — `hook_precompact` now writes a recovery marker via `_save_diary_direct` mirroring Stop, so a context-compaction event leaves a queryable timestamp in the recovery collection. 6 new migration tests in `test_migrate.py::TestMigrateCheckpointsToRecovery`. **Phase E shipped** in palace-daemon commit [`034023c`](https://github.com/jphein/palace-daemon/commit/034023c) on 2026-04-26 — `lifespan` calls `migrate_checkpoints_to_recovery()` in an executor on startup, gated behind `PALACE_AUTO_MIGRATE_CHECKPOINTS=1` (default on), with `ImportError` fallthrough so upstream-shaped installs without `mempalace.migrate` still start cleanly. Canonical 151K palace migrated 667 checkpoints on 2026-04-26 10:24:09 PDT. **Cleanup phase shipped 2026-04-27** — empirical check on production showed 0 checkpoints in `mempalace_drawers` (763 in `mempalace_session_recovery`), so the kind= filter was provably inert. Deleted in row 21 above.
27. **perf: batch ChromaDB inserts in miner (cherry-pick of upstream #1085)** (commit `6be6fff`, 2026-04-26) — Cherry-picked @midweste's [#1085](https://github.com/MemPalace/mempalace/pull/1085) "batch ChromaDB inserts in miner — 10-30x faster mining". Upstream PR #1085 is still **OPEN** as of 2026-04-26 (created 2026-04-21, base=develop, not yet merged) — verified via `gh pr view 1085 --repo MemPalace/mempalace`. We cherry-picked the commit ahead of merge so the fork can use it now; this row clears when #1085 merges into develop and we next sync. We don't file a competing fork-side PR — the proposal is @midweste's. New `_build_drawer()` helper builds id+document+metadata in one shot; new `add_drawers()` batch-insert function takes the full chunk list and sub-batches at `DRAWER_UPSERT_BATCH_SIZE` (one chromadb upsert + one ONNX embedding forward-pass per sub-batch instead of per-chunk). `process_file` now calls `add_drawers` directly. Hoists `datetime.now()` and `os.path.getmtime()` to file-level (2 syscalls per file instead of 2N). **Conflict resolution:** fork already had a fork-only `_build_drawer_metadata` + an outer batch loop in `process_file`; upstream's clean structure supersedes both. Kept fork's `DRAWER_UPSERT_BATCH_SIZE=1000` (more conservative than upstream's 5000 for embedding-pass memory headroom); aliased upstream's `CHROMA_BATCH_LIMIT` to point at it so any code/test referencing either name sees the same value. 74/74 miner+convo_miner tests pass; full suite 1366/1366. Becomes a no-op when #1085 merges into upstream develop and we next sync develop→main.
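The sub-batching described in row 27 can be sketched as follows. This is an illustration under stated assumptions, not the miner's actual code: `sub_batches` and `add_drawers_sketch` are hypothetical names, and `upsert` is a plain callable standing in for the real ChromaDB upsert. Only `DRAWER_UPSERT_BATCH_SIZE = 1000` is taken from the row above.

```python
from typing import Callable, Iterator, List, Sequence

DRAWER_UPSERT_BATCH_SIZE = 1000  # fork value quoted in row 27


def sub_batches(items: Sequence, size: int = DRAWER_UPSERT_BATCH_SIZE) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def add_drawers_sketch(chunks: Sequence, upsert: Callable[[List], None]) -> int:
    """Hypothetical batch insert: one `upsert` call per sub-batch, not per chunk.

    Returns the number of upsert calls made.
    """
    calls = 0
    for batch in sub_batches(chunks):
        upsert(batch)
        calls += 1
    return calls


# Usage: 2,500 chunks become three upsert calls instead of 2,500.
received: List[List] = []
call_count = add_drawers_sketch(list(range(2500)), received.append)
batch_sizes = [len(batch) for batch in received]
```

A smaller batch size trades a few extra calls for lower peak memory per embedding pass, which matches the stated reason for keeping 1000 rather than upstream's 5000.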
README.md: 4 additions & 5 deletions
@@ -8,7 +8,7 @@
---
-
This fork tracks `upstream/develop` through the 2026-04-27 sync and runs in production on a 151,478-drawer palace behind [palace-daemon](https://github.com/jphein/palace-daemon) at `disks.jphe.in:8085`. It carries 17 fork-ahead changes that compose with — not replace — bensig's release direction; four landed upstream on 2026-04-26 (#1173, #1177, #1198, #1201). 1,510 tests pass on `main`. The new things here are *what we've learned*, not just what we've fixed.
+
This fork tracks `upstream/develop` through the 2026-04-27 sync and runs in production on a 151,478-drawer palace behind [palace-daemon](https://github.com/jphein/palace-daemon) at `disks.jphe.in:8085`. It carries 16 fork-ahead changes that compose with — not replace — bensig's release direction; four landed upstream on 2026-04-26 (#1173, #1177, #1198, #1201). 1,500 tests pass on `main`. The new things here are *what we've learned*, not just what we've fixed.
## What just shipped
@@ -54,7 +54,7 @@ The deeper read on local-first AI memory: the sovereignty argument lands in cour
Three bands of work, all instances of the principles above. Detail rows in the [appendix](#fork-change-inventory) at the bottom.
-
- **Structural retrieval fixes (Principle 1, Principle 2).** Multi-collection split moves Stop-hook checkpoints to a dedicated `mempalace_session_recovery` collection — physically absent from `mempalace_search`, queryable via the new `mempalace_session_recovery_read` MCP tool. PreCompact incorporated. Auto-migrates on first daemon restart. The transitional `kind=` filter and over-fetch hack become deletable next release. `drawer_id` surfacing on every search/diary/recovery hit so callers can build citation popovers and follow-ups.
+
- **Structural retrieval fixes (Principle 1, Principle 2).** Multi-collection split moves Stop-hook checkpoints to a dedicated `mempalace_session_recovery` collection — physically absent from `mempalace_search`, queryable via the new `mempalace_session_recovery_read` MCP tool. PreCompact incorporated. Auto-migrates on first daemon restart. The transitional `kind=` filter and over-fetch hack are gone (2026-04-27) — the structural fix made them inert. `drawer_id` surfacing on every search/diary/recovery hit so callers can build citation popovers and follow-ups.
- **Single-writer architecture (Principle 3).** [palace-daemon](https://github.com/jphein/palace-daemon) is the only process that opens the palace; clients connect over HTTP. ChromaDB 1.5.x's HNSW concurrency hazards (`#974`/`#965`/`#823` family) become structurally impossible. Cold-start integrity sniff-test on segment metadata files prevents `quarantine_stale_hnsw` from destroying healthy indexes during async-flush lag. Cherry-pick of upstream [#1085](https://github.com/MemPalace/mempalace/pull/1085) for 10–30× mining speedup; cherry-pick of upstream-PR-#1094 for boundary-level None-metadata coercion that closes a per-site-guard family.
- **Deterministic hook saves (Principles 1+2+3 compose).** Silent saves bypass auto-memory conflicts entirely — the LLM is no longer in the save path, so `decision: "block"` race conditions and Claude's auto-memory winning over MCP tools both go away. Save marker advances only after confirmed write. `systemMessage` notification surfaces results. PreCompact writes a recovery-collection marker before mining + compaction so context-boundary events leave a queryable timestamp.
@@ -100,7 +100,7 @@ A Stop hook fires every 15 messages in Claude Code, writes directly to `mempalac
When the HNSW index is genuinely degraded (rare, post-fix), the same call returns `warnings: ["vector search returned 0 of 5 requested; filled 5 from sqlite+BM25 keyword match"]` with hits tagged `"matched_via": "sqlite_bm25_fallback"` — data is never silently hidden.
-
After the 2026-04-26 migration, the example queries from a week ago all return content rather than checkpoint word-soup. `kind=all` is now equivalent to `kind=content` in practice; the parameter survives one more release as a safety net, then retires.
+
After the 2026-04-26 migration, the example queries from a week ago all return content rather than checkpoint word-soup. The `kind=` parameter retired 2026-04-27 — the structural split made it inert.
|**Search**| Surface `drawer_id` in `mempalace_search` results, `mempalace_diary_read` entries, and `mempalace_session_recovery_read` payload. ChromaDB primary key was returned but never plumbed into the result-building loop. Defensive zip-with-id-pad for test mocks. | PR pending — fork commit [`9a8bb77`](https://github.com/jphein/mempalace/commit/9a8bb77); upstream [#1219](https://github.com/MemPalace/mempalace/pull/1219) (@pepo72) is the narrower searcher-only equivalent. |`searcher.py`, `mcp_server.py`, `tests/...`, `website/reference/mcp-tools.md`|
|**Reliability**|`hook_precompact` writes a session-recovery checkpoint marker before mining + compaction. Mirrors `hook_stop`'s `_save_diary_direct` call; same routing path (recovery collection, queryable by `session_id`). | Bundled with phase D in [`42817d7`](https://github.com/jphein/mempalace/commit/42817d7)|`mempalace/hooks_cli.py`|
-
|**Search**|`kind=` filter on `search_memories` excludes Stop-hook checkpoints by default. Three values: `"content"` (default), `"checkpoint"`, `"all"`. Post-filter only (chromadb 1.5.x `$nin`/`$in` filter-planner bug); over-fetch `max(n*20, 100)` for non-`"all"`. **Transitional** — becomes deletable next release after the structural split (above) ships. | PR pending — fork commits `8d02835` → `f9f5cc4`|`searcher.py`, `mcp_server.py`|
|**Performance**| Cherry-picked upstream [#1085](https://github.com/MemPalace/mempalace/pull/1085) (@midweste) — batch ChromaDB inserts in miner. New `_build_drawer()` + `add_drawers()`. Reported 10–30× mining speedup. | Cherry-pick of open #1085 — fork commit [`6be6fff`](https://github.com/jphein/mempalace/commit/6be6fff). Becomes a no-op when #1085 merges. |`mempalace/miner.py`|
|**Reliability**| Cherry-picked upstream [#1094](https://github.com/MemPalace/mempalace/pull/1094) — coerce None metadatas at chromadb boundary. Closes the per-site-guard family of None-metadata bugs (#999, #1198, #1201) at one site instead of N. | Cherry-pick of open #1094 — fork commit [`43d728d`](https://github.com/jphein/mempalace/commit/43d728d)|`backends/chroma.py`, `tests/test_backends.py`|
|**CLI**|`mempalace purge --wing/--room` via `collection.delete(where=...)`. Earlier nuke-and-rebuild draft predicated on #521's race; @igorls's review traced the stack — race is on the upsert path, not delete-by-where. Simpler version preserves embedding fn, no rmtree window, routes through `ChromaBackend`. |[#1087](https://github.com/MemPalace/mempalace/pull/1087), rewritten 2026-04-26 per review |`cli.py`, `tests/test_cli.py`|