feat(mcp): persist + return drawer metadata by rglaubitz · Pull Request #1 · rglaubitz/mempalace

rglaubitz · 2026-05-26T08:38:19Z

Summary

add optional metadata to mempalace_add_drawer
persist caller metadata as one JSON blob in drawer metadata storage
return decoded metadata on mempalace_search hits, with legacy/no-metadata drawers returning metadata: null
bump package/version badge/lock to 3.4.0 and tag v3.4.0

Audax Track C context

This unblocks rglaubitz/audax-apex Track C: MempalaceIngestSink needs to pass planner provenance through the MemPalace boundary so the Phase 9.5 retrieval driver can read fields like source_class, sensitivity, injectable, reference_only, and derived_from from search hits.

Migration / storage approach

MemPalace currently stores drawers in ChromaDB per-document metadata. This PR stores the entire caller-owned metadata dict as a single JSON string under the metadata key. Chroma metadata is schema-less, so there is no SQLite ALTER TABLE; the lazy migration behavior is effectively read-compatible: existing drawers simply lack the key and decode to metadata: null on search.

Base branch note

Per the dispatch brief, this branch starts from audax-apex's pinned MemPalace SHA 1888b671e2051565480c70e1bab699c1796eef7f (origin/develop in this fork) and targets main.

Verification

RED: new metadata passthrough tests failed before implementation (unexpected keyword argument 'metadata', missing search metadata key)
uv run pytest tests/test_mcp_server.py::TestWriteTools::test_add_drawer_with_metadata_round_trips_to_search tests/test_mcp_server.py::TestWriteTools::test_add_drawer_without_metadata_round_trips_none tests/test_mcp_server.py::TestWriteTools::test_legacy_drawer_without_metadata_key_searches_with_metadata_none -q
uv run pytest tests/test_mcp_server.py tests/test_searcher.py -q -> 101 passed
uv run pytest tests/test_version_consistency.py -q -> 2 passed
uv run pytest -q -> 1492 passed, 1 skipped, 106 deselected
uv run ruff check .
manual temp-palace smoke confirmed audax-apex-style metadata round-trips through add + search

…lace#1299) `mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x does not persist the EF identity with the collection, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop hook ingest path bound `mempalace.embedding.get_embedding_function()`. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon, per MemPalace#1299) the default EF's lazy ONNX provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. Reads worked because `col.get(ids=...)` and metadata fetches don't invoke the EF; the auto-ingest path worked because mining routes through the backend abstraction. Diary writes were the consistent failure surface. Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`) and pass it into both reopen branches. Falls back to the chromadb default only if `mempalace.embedding.get_embedding_function` itself raises. Regression test patches the chromadb client class to capture `embedding_function=` on every `get_collection` / `create_collection` call from `_get_collection(create=True)` and `_get_collection()`, and fails if any call omits it. Follow-up to MemPalace#1262 / MemPalace#1289 (which fixed the metadata-mismatch SIGSEGV path); this addresses the EF-mismatch SIGSEGV path on the same surface.

- Resolve the EF inside the two reopen branches that actually call `client.get_collection` / `client.create_collection`, so warm-cache reads stay zero-cost (no `MempalaceConfig()` / `_resolve_providers` on every tool call). - Reuse `ChromaBackend._resolve_embedding_function()` instead of duplicating its try/except + log message + None-fallback. - Reword the inline + CHANGELOG explanation to clarify that ChromaDB 1.x persists the EF *identity* (its `name()`) but not the *instance/ configuration* — `mempalace.embedding` documents this and spoofs `name()` to `"default"` precisely so the identity check passes; the bug was the *provider list* (lazy ONNX selection) silently differing.

MemPalace#1076) (MemPalace#1077) Shell splits hook command on whitespace after variable expansion, breaking paths with spaces (e.g. C:\Users\Richard M on Windows). Wrapping the path in double quotes preserves the token boundary. Fixes the reported Stop/PreCompact pair in .claude-plugin/hooks/hooks.json and applies the same fix to .codex-plugin/hooks.json (SessionStart/Stop/ PreCompact), which carries the identical bug.

…ad them (MemPalace#1244) `cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed` collection, but every read path (`palace.get_closets_collection`, `searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for non-mined palaces (or any palace where the user ran `mempalace compress` expecting to backfill the closet/index layer), the compressed output was silently invisible — written to a collection nothing else opens. Fix the writer rather than renaming the readers: "closets" is the user-visible feature name baked into the public API (`get_closets_collection`), the searcher hybrid path, repair/HNSW diagnostics, and docs. Renaming the readers would churn 15+ call sites and the README for no benefit. The compressed AAAK strings are exactly what closets are conceptually — compact pointers scanned by an LLM to locate the right drawer — so they belong in `mempalace_closets`. Tests: - Update `test_cmd_compress_stores_results` to assert the collection name passed to `get_or_create_collection` is `mempalace_closets`. - Add `test_cmd_compress_output_readable_via_get_closets_collection`: end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress, then read back via the same `get_closets_collection` helper that palace.py / searcher use. Regression test for the wrong-collection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI requires ruff format --check on the whole touched file. Pre-existing drift, no logic change.

…alace#1243) `tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name` (which preserves case), while `tool_diary_read` filtered by exact match — so writing as "Claude" and reading as "claude" silently returned zero rows. Both endpoints now lowercase `agent_name` immediately after sanitization. The default per-agent wing slug is also stable across casings since it's derived from the same normalized form. Behavior change: entries written prior to this fix under mixed-case agent names will not match the new lowercase filter; documented under v3.3.5 in CHANGELOG with a `mempalace repair` pointer. Adds a regression test (`test_diary_read_case_insensitive_agent`) and updates the existing `test_diary_write_and_read` to assert the new lowercase agent identity. Closes MemPalace#1243 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…event SIGSEGV on stale HNSW (MemPalace#1121, MemPalace#1132, MemPalace#1263) PR MemPalace#1173 wired quarantine_stale_hnsw into the static make_client() helper but not into the instance _client() method. As a result every non-MCP entry point (CLI mining, search, repair, status) — which all use get_collection / _get_or_create_collection / _client() — skipped the cold-start quarantine pass and could SIGSEGV on a stale HNSW segment left over from a partial flush, replicated palace, or crashed-mid-write. Refactor: extract the (_fix_blob_seq_ids + gated quarantine_stale_hnsw) pre-open pass into a single private static helper ChromaBackend._prepare_palace_for_open(). Both make_client() and _client() now route through it, so the _quarantined_paths once-per- palace-per-process gate is preserved (no runtime thrash on hot paths) and behaviour stays identical — the fix is purely about extending the existing protection to the path that was missing it. Tests: - test_client_quarantines_corrupt_segment_on_first_open mirrors the existing make_client test and verifies _client() actually renames a corrupt segment on first open. - test_client_quarantines_only_on_first_call_per_palace verifies the cache gate prevents re-running quarantine across repeated _client() calls — important because _client() is hit on every backend op. Closes MemPalace#1121. Closes MemPalace#1132. Closes MemPalace#1263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI requires whole-file format on touched files; pre-existing drift only.

@icciaaron

The MCP `mempalace_status` tool was returning the server's absolute `_config.palace_path` to any connected client on both the main (ChromaDB-backed) path and the sqlite fallback path that runs when HNSW divergence is detected (MemPalace#1222). On a single-user local deployment this is self-disclosure, but in nested-agent or multi-server MCP topologies the client is a separate trust domain and the absolute path has no documented client-side use. Clients that legitimately need the palace path continue to have three documented channels: the `MEMPALACE_PALACE_PATH` env var (primary) or its legacy `MEMPAL_PALACE_PATH` alias, the `~/.mempalace/config.json` file, and the `--palace` CLI flag on most subcommands. Also corrects stale docs that claimed `mempalace_reconnect` returned a `palace_path` field; the code returns `{success, message, drawers, vector_disabled[, vector_disabled_reason]}` on success, plus a no-palace shape and an exception shape. - mempalace/mcp_server.py: drop palace_path from tool_status() and _tool_status_via_sqlite() result dicts - website/reference/mcp-tools.md: update documented return shapes for mempalace_status (fix) and mempalace_reconnect (stale-docs correction) Authored-by: Aaron Salsitz (ICCI LLC, @icciaaron). Claude Code was used as an authoring and review-orchestration tool, with human-in-the-loop oversight at every step: Aaron wrote the prompts, reviewed each draft, called for three independent review passes (drafting / post-rebase technical / CISA-aligned disclosure-leak), and verified the final patch behavior before commit.

The MCP `mempalace_get_drawer` tool returned the entire raw drawer metadata blob to any connected client, and the `source_file` field in that blob is the absolute filesystem path written by the miners (`miner.py`, `convo_miner.py` — `source_file = str(filepath)`). On a single-user local deployment this is self-disclosure, but in nested-agent or multi-server MCP topologies the client is a separate trust domain and the host's directory layout has no documented client-side use. Mirror the mitigation that `searcher.search_memories()` already applies on its own return path: reduce `source_file` to its basename via `Path(source_file).name` before handing the metadata to the client. Citations still work — the directory layout does not leak. Companion to #1 (omit palace_path from tool_status). Same threat class, different surface: - mempalace_status — palace dir path → fixed in #1 - mempalace_get_drawer — per-drawer source_file path → this PR Other read tools were audited and do not leak host paths: - mempalace_search — already basenames source_file - mempalace_list_drawers — returns wing/room/preview only - mempalace_diary_read — date/timestamp/topic/content only - mempalace_reconnect — success/message/drawers only - mempalace_kg_* — entity/predicate strings, counts - mempalace_check_duplicate — wing/room/preview only Changes: - mempalace/mcp_server.py: tool_get_drawer() now basenames metadata.source_file - tests/test_mcp_server.py: regression test asserting the absolute path and its parent directory do not appear anywhere in the response - website/reference/mcp-tools.md: clarify the documented return shape

…ing pool Default search behavior is unchanged. Opt-in candidate_strategy="union" also pulls top-K BM25-only candidates from sqlite FTS5 and merges them into the rerank pool, catching docs with strong BM25 signal that the vector index didn't surface in the over-fetch window. Motivation ---------- The current hybrid path gathers candidates from the vector index only (n_results * 3 over-fetch), then BM25-reranks within them. When the query embeds close to the wrong content semantically, the right doc never enters the rerank pool — *no matter how wide the over-fetch*. Tested on a ~6K-document mixed corpus (knowledge prose + short structured records): at *30x* over-fetch (~5% of the corpus) the target doc still didn't surface for narrative-shaped queries targeting terminology guides. Wider over-fetch isn't the answer; widening the pool's *source* is. Concrete failure mode: a narrative-shaped query embeds close to records sharing the same operational vocabulary (other narrative entries in the corpus). A terminology / style guide is BM25-strong for the query (rare keywords the guide repeats) but vector-distant. Vector-only candidates don't include it; BM25 never gets to rerank it. The hybrid path produces 0.00 recall on a probe that pure BM25 alone scores 1.00 — the hybrid is worse than its component on the same input. Behavior change --------------- * New parameter ``candidate_strategy: str = "vector"`` on ``search_memories``. - ``"vector"`` (default): historical behavior, no change. - ``"union"``: also fetch top ``n_results * 3`` candidates via the existing ``_bm25_only_via_sqlite`` helper, dedupe by source_file, merge into the rerank pool. BM25-only candidates carry ``distance=None`` so they're scored on BM25 contribution alone (vec_sim coerces to 0). * ``_hybrid_rank`` now handles ``distance=None`` explicitly, scoring such candidates as vector-unknown (vec_sim=0) rather than treating it as max-distance via shim. * New strategies register via ``_CANDIDATE_MERGERS``; dispatch is in ``_apply_candidate_strategy`` so ``search_memories`` stays under the C901 complexity ceiling. Bench numbers (~6K-doc internal mixed corpus, recall@10, 5 probes spanning policy-exception lookup, temporal-decay, style retrieval, set-difference, and pattern-recognition): baseline ("vector") "union" policy-exception probe 0.00 0.50 +0.50 temporal-decay probe 0.17 0.50 +0.33 style-retrieval probe 0.00 1.00 +1.00 (PASSES) set-difference probe 0.00–0.06 0.06–0.09 ~ pattern-recog probe 0.64 (stable) 0.50–0.71 variance, typ. +0.07 macro recall 0.16–0.17 0.51–0.56 +0.34 to +0.40 The pattern-recog variance points at a related issue worth a separate PR: ``_hybrid_rank`` computes BM25 IDF over the candidate set. Adding new candidates re-normalizes BM25 for *existing* candidates non-monotonically. Stable corpus-wide BM25 would remove this. Out of scope here. Tests ----- ``tests/test_hybrid_candidate_union.py`` (6 tests, all pass): - default behavior unchanged (explicit ``"vector"`` matches default) - ``"union"`` surfaces a BM25-strong vector-distant doc - ``"union"`` doesn't drop docs ``"vector"`` would have found - empty-palace handling - invalid ``candidate_strategy`` raises - ``_hybrid_rank`` tolerates ``distance=None`` Existing ``test_hybrid_search.py`` (5) and ``test_searcher.py`` (27) pass. Performance note ---------------- Each ``"union"`` query adds one sqlite open + FTS5 MATCH + metadata fetch (via the existing ``_bm25_only_via_sqlite`` helper, which already runs as the ``vector_disabled`` fallback path so the code is well-trodden). Per-query overhead is small but unmeasured at corpus scale. Default stays ``"vector"`` until a maintainer characterizes the cost.

- Dedup union candidates by (full_path, chunk_index), not basename — two files sharing a basename in different dirs no longer collide, and a vector hit on chunk N of a file no longer blocks BM25 from contributing chunk M of the same file. - Validate candidate_strategy at the top of search_memories so invalid values fail consistently, not only when the call routes through the vector path. - Trim hits back to n_results after the union+rerank pool grows; preserves the existing search_memories size contract that the MCP limit parameter is built on. - Skip BM25-only injection when max_distance > 0.0; BM25-only candidates carry distance=None and would silently bypass the caller's strict vector-distance threshold. Adds 4 tests covering: validation under vector_disabled, n_results trim, max_distance honoring, and basename-collision dedup.

…emPalace#1314) `tool_kg_add` previously accepted only `valid_from` and `source_closet`, silently dropping `valid_to`, `source_file`, and `source_drawer_id` at the MCP boundary. Backfilling already-ended historical facts therefore collapsed to "still current," and adapter provenance never reached the SQLite layer even though `KnowledgeGraph.add_triple` already supported every column. `tool_kg_invalidate` returned the literal string `"today"` whenever the caller omitted `ended`, hiding the actual stamped date from anyone trying to verify what got persisted. Changes: - Extend `tool_kg_add` signature + MCP input_schema with `valid_to`, `source_file`, `source_drawer_id`; forward all of them to `_kg.add_triple` and to the WAL log. - Resolve `ended` to `date.today().isoformat()` in `tool_kg_invalidate` before logging / returning, so the response always reports the actual date stored in `valid_to`. - Add regression tests for valid_to round-trip, source_file / source_drawer_id provenance, and the resolved-ended-date contract. - Leave TODO(MemPalace#1283) markers so the open ISO-8601 validation PR can drop `validate_iso_date` over `valid_from` / `valid_to` / `ended` cleanly. The underlying `KnowledgeGraph.add_triple` already accepted these kwargs (RFC 002 §5.5) — only the MCP edge needed wiring up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The repo's anti-jargon meta-test bans §N markers outside the sources/backends allowlist. mcp_server.py isn't allowlisted, so the "RFC 002 §5.5" references added in this PR turned the test red. Trim to "RFC 002" — section number isn't load-bearing for the description.

cmd_init was instantiating MempalaceConfig() unconditionally, ignoring args.palace and always writing the palace under ~/.mempalace. Mirror the env-var pattern used by mcp_server.py (and consistent with how cmd_mine / cmd_status / cmd_search resolve --palace) so every downstream read of cfg.palace_path inside cmd_init — Pass 0, cfg.init(), and the post-init mine — routes to the user-specified location. Adds tests/test_cli.py::test_cmd_init_honors_palace_flag covering the regression: asserts Pass 0 receives the --palace value (not ~/.mempalace) and that MEMPALACE_PALACE_PATH is set in os.environ. Closes MemPalace#1313.

monkeypatch.delenv(name, raising=False) on a missing key registers no undo entry, so the env var cmd_init writes leaked into test_config_from_file on Python 3.13 / Windows / macOS. Prime the slot with setenv before delenv so teardown rolls back the write.

rglaubitz

Review — request changes

Three blockers before merge, all narrowly scoped and patchable on this branch:

Contract correctness

F01 — tool_get_drawer.metadata and tool_search.results[i].metadata return different shapes for the same field name. tool_search decodes the caller's opaque metadata via decode_drawer_metadata; tool_get_drawer returns the raw Chroma metadata dict (with the caller payload still a JSON string under an inner metadata key). Downstream consumer (audax-apex Track C T3) is normalizing hits from search; if any path also goes through get_drawer, they'll silently get a different shape. Fix in one place, document the intended response shape in either case. See inline comment on mempalace/mcp_server.py for the recommended shape.

Backwards-compat / silent data loss

F10 — mempalace_compressed → mempalace_closets rename (MemPalace#1244) has no migration. palace.get_closets_collection reads only from the new name; existing palaces with compressed-drawer data become silently unreadable. mempalace repair --mode does not have a closets-rename mode (verified — only legacy and max-seq-id). Either add a migration mode, fall back to the old name on empty-new, or document the breakage as "re-mine required" in the CHANGELOG entry.
F11 — Diary case-insensitivity fix (MemPalace#1243) CHANGELOG says "run mempalace repair if you need to migrate legacy diary metadata", but no diary-case mode exists in mempalace repair. The promised remediation does not ship with this PR. Either implement the mode, or amend the CHANGELOG.

Scope note: this PR is effectively develop → main + the T2 metadata passthrough on top — at least 11 distinct changes bundled together. If main is the release branch and develop is the integration branch, this ships a 3.4.0 release with T2 piggybacking, not a feature PR. Worth confirming framing before merging.

MED findings (action recommended, not blocking)

F02 — metadata={} and metadata=None are distinguishable on round-trip (encode({}) → "{}", decoded back to {}). Document this in the API ref; T3 sink should send None for "no provenance," never {}.
F03 — decode_drawer_metadata silently returns None on JSON parse failure. A corrupted-metadata drawer becomes indistinguishable from a legacy drawer with no metadata key. Under a downstream policy that treats None as "legacy, allow as fallback," a corrupted injectable=False drawer silently becomes injectable. At minimum logger.warning with drawer context; consider a sentinel like {"_corrupted": True} so downstream policy can fail closed.
F06 — First-write-wins for metadata works via the deterministic drawer-ID idempotency path (mcp_server.py:856-861) but is untested. Add a test: same wing+room+content twice with different metadata, second metadata must be silently dropped without an error.
F14 — cmd_init mutates os.environ["MEMPALACE_PALACE_PATH"] in production code (not just tests) and never restores. Benign for one-shot CLI; leaks across calls if invoked twice in one process.
F15 — _get_collection EF-binding (MemPalace#1299 fix) no-ops if ChromaBackend._resolve_embedding_function() returns None (kwargs dict ends up empty, falls back to the old buggy call without embedding_function=). When does the resolver return None? If never in production, surface the assumption inline; if sometimes (dev fallback), the fix doesn't fully cover.
F17 — No test asserts that tool_get_drawer and tool_search return the same metadata shape on the same drawer. Would catch F01.

LOW / cosmetic

F04 — decode_drawer_metadata accepts dict passthrough (drawer_metadata.py:22-23). Defensive branch for test fixtures only; asymmetric with encoder. Drop or document.
F05 — json.dumps on deeply-nested dict raises RecursionError, not caught by except (TypeError, ValueError) in tool_add_drawer. Caller sees an uncaught exception rather than {success: False, error: ...}.
F07 — Opaque metadata payload is a new leak surface (callers can stash absolute paths inside). MemPalace cannot scrub it; document that callers are responsible.
F08 — tool_kg_add.source_file accepted verbatim; later returned by tool_kg_query. Same threat class as the tool_get_drawer basename-redaction fix this PR ships. Mirror or document.
F09 — except Exception as e: return {"error": str(e)} patterns may leak palace paths / Chroma internals in error strings. Pre-existing.
F12 — tool_kg_invalidate WAL log format changed (resolved date now logged, not raw input). Note in CHANGELOG.
F16 — ChromaBackend._quarantined_paths class-level mutable set, no lock. _prepare_palace_for_open now called from both make_client and _client. Benign for single-process MCP server; defense-in-depth threading.Lock if multi-threaded callers ever appear.
F19–F24 — Test gaps on edge cases: metadata={} distinct-from-None, non-dict-error response, non-JSON-serializable values, size bounds, legacy mempalace_compressed readability, pre-fix diary case-mismatch repair.

Methodology: adversarial review across seven attack lanes (contract conformance, encoding correctness, info leakage, backwards compat, concurrency, test coverage, scope discipline). Hoare-style preconditions / postconditions / invariants per public API.

KG schema migration (valid_to, source_file, source_drawer_id columns) was verified clean — knowledge_graph.py:112-113 auto-applies ALTER TABLE. Candidate-union BM25 mode (searcher.py) was reviewed and the dedup / max_distance interaction looks correct.

rglaubitz · 2026-05-26T17:31:16Z

-            "metadata": meta,
+            "wing": safe_meta.get("wing", ""),
+            "room": safe_meta.get("room", ""),
+            "metadata": safe_meta,


F01 (HIGH) — metadata shape drift between get_drawer and search

This returns metadata = dict(meta), i.e., the full Chroma metadata dict (wing, room, source_file, chunk_index, added_by, filed_at, plus the caller's metadata JSON STRING under an inner key).

Compare to searcher.py:836:

"metadata": decode_drawer_metadata(meta.get(DRAWER_METADATA_FIELD)),

search returns the decoded caller dict under metadata. So:

tool_search()["results"][0]["metadata"]["source_class"] → works.

tool_get_drawer(id)["metadata"]["source_class"] → KeyError. Consumer must do json.loads(get_drawer(id)["metadata"]["metadata"])["source_class"].

The leak-probe test below (test_get_drawer_does_not_leak_absolute_source_file_path) asserts result["metadata"]["source_file"] == "notes.md" — confirming consumers currently use get_drawer.metadata as the system metadata dict. That convention conflicts directly with the new metadata field the Track C contract expects.

Recommended fix (pick one, document):

Decode the inner metadata in tool_get_drawer and return as a sibling of wing / room. System fields become top-level keys; metadata is the caller dict (matching search). This is closer to the audax-apex contract.

Keep the current get_drawer shape but rename the caller field everywhere (drawer_metadata? passthrough?).

Either way, add a test that asserts shape parity between get_drawer and search for the same drawer (F17).

Threat-model framing: same-user-agent. The concern is API consistency for the T3 consumer, not adversarial access. Don't add sanitization/validation; just align the shape.

rglaubitz · 2026-05-26T17:31:19Z

    if not args.dry_run:
        try:
-            comp_col = backend.get_or_create_collection(palace_path, "mempalace_compressed")
+            comp_col = backend.get_or_create_collection(palace_path, "mempalace_closets")


F10 (HIGH) — mempalace_compressed → mempalace_closets rename has no migration

palace.get_closets_collection (palace.py:68) reads from mempalace_closets. Existing palaces written by 3.3.x have compressed-drawer data in mempalace_compressed — they become silently invisible to the reader after this rename.

Verified: mempalace repair --mode only accepts legacy and max-seq-id (cli.py:1197-1202). No closets-rename migration mode ships with this PR.

Fix options (pick one):

Add --mode closets-rename to mempalace repair: rename the collection in place, or copy old→new and drop the old.

Make get_closets_collection fall back to mempalace_compressed if mempalace_closets is absent or empty, with a one-time warning log.

Document the breakage explicitly in the [3.3.5] CHANGELOG entry: "Existing compressed-drawer data is not auto-migrated; run a fresh mempalace compress to repopulate the new collection."

Currently neither the CHANGELOG nor the diff acknowledges the loss path. For a 3.4.0 minor bump that's a surprise.

rglaubitz · 2026-05-26T17:31:20Z


+## [3.3.5] — unreleased
+
+### Bug Fixes


F11 (HIGH) — promised migration tool does not exist

This entry advertises mempalace repair as the migration path for legacy mixed-case diary entries. Verified against cli.py:1197-1202: the only --mode values accepted are legacy and max-seq-id. No diary-case mode exists.

Pre-fix diary entries under agent_name = "Claude" are unreachable under the new .lower() filter (mcp_server.py:1212, 1276), and the recommended remediation does not ship.

Fix options (pick one):

Implement --mode diary-case: scan diary drawers, lowercase the stored agent metadata in place, optionally fix the default-wing slug derived from it.

Amend this CHANGELOG entry: drop the mempalace repair claim, replace with "legacy entries written under mixed-case agent names are not migrated; re-write the affected entries or accept the loss."

Either resolution is fine; the current state is a CHANGELOG promise the codebase can't honor.

F03 now logs corrupted drawer metadata decode failures through the mempalace_mcp logger, truncating the stored value to value_head so operators get an identifier without leaking the full payload. F15 is Case B: ChromaBackend._resolve_embedding_function() can return None because it catches any exception from get_embedding_function() and returns None after logging; ChromaBackend.get_collection() mirrors the same conditional kwargs behavior and does not repair that case. For the MCP _get_collection call site, treat a None EF as a fail-closed condition and never reopen a collection without embedding_function=, so the MemPalace#1299 default-EF regression path cannot silently reappear there. Validation: uv run pytest -q; uv run ruff check .

igorls and others added 21 commits May 26, 2026 01:59

style: ruff format cli.py (MemPalace#1244)

b0884cf

CI requires ruff format --check on the whole touched file. Pre-existing drift, no logic change.

style: ruff format tests/test_cli.py (PR MemPalace#1319)

34e4203

style: ruff format tests/test_mcp_server.py (PR MemPalace#1323)

ab9287e

style: ruff format touched files (PR MemPalace#1322)

9b6a5e2

CI requires whole-file format on touched files; pre-existing drift only.

style: ruff format tests/test_backends.py (PR MemPalace#1322)

3103749

style: ruff format test_mcp_server.py (PR MemPalace#1320)

062f351

feat(mcp): persist drawer metadata

7a926ae

rglaubitz force-pushed the codex/metadata-passthrough branch from 0d68ac4 to 7a926ae Compare May 26, 2026 09:03

rglaubitz marked this pull request as ready for review May 26, 2026 09:03

rglaubitz commented May 26, 2026

View reviewed changes

rglaubitz added 2 commits May 26, 2026 10:39

fix: address metadata passthrough review blockers

020a455

rglaubitz merged commit bfd2fb2 into main May 26, 2026

rglaubitz deleted the codex/metadata-passthrough branch May 26, 2026 18:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): persist + return drawer metadata#1

feat(mcp): persist + return drawer metadata#1
rglaubitz merged 23 commits into
mainfrom
codex/metadata-passthrough

rglaubitz commented May 26, 2026

Uh oh!

rglaubitz left a comment

Uh oh!

rglaubitz May 26, 2026

Uh oh!

rglaubitz May 26, 2026

Uh oh!

rglaubitz May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

rglaubitz commented May 26, 2026

Summary

Audax Track C context

Migration / storage approach

Base branch note

Verification

Uh oh!

rglaubitz left a comment

Choose a reason for hiding this comment

Review — request changes

MED findings (action recommended, not blocking)

LOW / cosmetic

Uh oh!

rglaubitz May 26, 2026

Choose a reason for hiding this comment

Uh oh!

rglaubitz May 26, 2026

Choose a reason for hiding this comment

Uh oh!

rglaubitz May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants