fix(mcp): tolerate None metadata cells (#1426)#1445
Conversation
ChromaDB's col.get() and col.query() can return None for the metadata
cell of a partially-flushed row, or for any row written without metadata
in older formats. The MCP handler then crashes with:
AttributeError: 'NoneType' object has no attribute 'get'
This crashed the path before the embeddings_queue cleanup step, so the
queue grew without bound while writes kept appearing successful.
Add a small _safe_meta() boundary helper that coerces None (or any other
non-dict) to {}, and route every direct metadatas[i]/metadatas[0] read in
mcp_server through it: tool_get_drawer, tool_list_drawers,
tool_update_drawer, tool_delete_drawer (audit-log path),
tool_check_duplicate, and tool_diary_read.
The contract — *metadata is always a dict by the time it leaves the
boundary* — is documented on the helper and self-documents at each call
site.
Regression coverage: tests/test_mcp_server.py::TestNoneMetadataSafety
covers the helper and each affected handler with a stub collection that
returns None metadatas (Chroma rejects None at write time, so we can't
reproduce upstream state through the real backend).
Closes MemPalace#1426.
There was a problem hiding this comment.
Code Review
This pull request introduces a _safe_meta helper function to robustly handle None values in ChromaDB metadata, preventing AttributeError crashes in various tool functions. It also includes a new suite of regression tests to ensure metadata safety. Review feedback highlights that the documents field is similarly vulnerable to None values in several locations, which could cause TypeError during string operations like slicing for logs or previews. It is recommended to coerce these document values to empty strings to ensure consistency and prevent runtime errors.
| deleted_meta = _safe_meta( | ||
| existing.get("metadatas", [{}])[0] if existing.get("metadatas") else {} | ||
| ) |
There was a problem hiding this comment.
While this correctly handles None metadata, the deleted_content variable on line 1055 is also vulnerable to being None in the same "partially-flushed row" scenario. If deleted_content is None, the _wal_log call at line 1064 will raise a TypeError when attempting to slice it (deleted_content[:200]).
Additionally, the deleted_meta assignment can be simplified using a more idiomatic Python pattern to handle None, empty lists, or missing keys.
deleted_meta = _safe_meta((existing.get("metadatas") or [{}])[0])| drawers = [] | ||
| for i, did in enumerate(result["ids"]): | ||
| meta = result["metadatas"][i] | ||
| meta = _safe_meta(result["metadatas"][i]) |
There was a problem hiding this comment.
| return {"success": False, "error": f"Drawer not found: {drawer_id}"} | ||
|
|
||
| old_meta = existing["metadatas"][0] | ||
| old_meta = _safe_meta(existing["metadatas"][0]) |
| if not result["ids"]: | ||
| return {"error": f"Drawer not found: {drawer_id}"} | ||
| meta = result["metadatas"][0] | ||
| meta = _safe_meta(result["metadatas"][0]) |
There was a problem hiding this comment.
The doc variable assigned on the next line (line 1121) can also be None if the row is partially flushed. While it doesn't cause an immediate crash here, it results in a null value in the tool output, which may be unexpected for the client. For consistency with tool_check_duplicate and to ensure robustness, please coerce it to an empty string.
| # Combine and sort by timestamp | ||
| entries = [] | ||
| for doc, meta in zip(results["documents"], results["metadatas"]): | ||
| meta = _safe_meta(meta) |
There was a problem hiding this comment.
) CI pins ruff>=0.4.0,<0.5 (resolves 0.4.10); the PR's two new assert blocks were laid out in ruff-0.5+ style, so `ruff format --check .` failed. Reformat with ruff 0.4.10 (the exact CI version) — assertion layout only, no semantic change. `ruff check .` + `ruff format --check .` both pass under 0.4.10.
CI installed `ruff>=0.4.0,<0.5` (resolves 0.4.10) while contributors run modern ruff (0.5+). ruff 0.5 changed assert-message formatting, so valid code formatted by a current ruff fails CI's `ruff format --check .` on layout alone. This blocked MemPalace#1438, MemPalace#1445, and MemPalace#1528 — each needing a manual reformat-and-push cycle with no actual code defect. Pin an exact, modern ruff in both pyproject dev lists and in ci.yml so CI and `pip install -e ".[dev]"` format/lint identically. No new `ruff check` violations under the existing E/F/W/C901 select.
Bumps version 3.3.5 → 3.3.6 across pyproject.toml, version.py, plugin manifests (.claude-plugin/plugin.json, .claude-plugin/marketplace.json, .codex-plugin/plugin.json), README badge, and uv.lock. Flips CHANGELOG.md from ``[Unreleased]`` to ``[3.3.6] — 2026-05-24`` and backfills the major user-facing entries that landed without changelog entries during the cycle: Features: - MemPalace#1555 office-document mining via --mode extract + virtual line numbers - MemPalace#1584 surgical closet pointers with date+line locators (Tier 6a) - MemPalace#1558 + MemPalace#1560 within-wing hallways (entity co-occurrence graph) - MemPalace#1565 cross-wing tunnels auto-promoted from hallways - MemPalace#1578 Hebbian potentiation + Ebbinghaus decay on hallways/tunnels - MemPalace#1236 API-tool transcripts auto-route to wing_api - MemPalace#711 hooks.auto_save toggle for silent-mode sessions - MemPalace#1605 COCA content-word filter for entity detection - MemPalace#1557 case-insensitive entity matching at mine time - MemPalace#1483 multilingual embeddings (embeddinggemma-300m) by default Bug Fixes (selected, user-visible): - MemPalace#1540 silent data loss in three unchunked upsert sites - MemPalace#1538 paragraph chunker oversized chunks - MemPalace#1554 per-file chunk cap too low for transcripts - MemPalace#1562 Windows hook subprocess/ChromaDB deadlock - MemPalace#1529 create_tunnel corrupted hyphenated wing names - MemPalace#1424 save-hook truncated hyphenated project folders - MemPalace#1383 KG cache duplicated graphs for symlinked/cased paths - MemPalace#1466 silent symlink skip now logged - MemPalace#1441 macOS stock-bash 3.2 hook compatibility - MemPalace#1500 / MemPalace#1513 structured JSON-RPC errors on bad MCP input - MemPalace#1523 VACUUM + FTS5 rebuild after repair - MemPalace#1548 FTS5 validation at end of mine - plus MemPalace#1216, MemPalace#1408, MemPalace#1438, MemPalace#1439, MemPalace#1445, MemPalace#1452, MemPalace#1459, MemPalace#1461, MemPalace#1466, MemPalace#1470, MemPalace#1477, MemPalace#1485, MemPalace#1500, MemPalace#1513, MemPalace#1528, MemPalace#1532, MemPalace#1543, MemPalace#1546, MemPalace#1585 Performance: - MemPalace#1474 convo miner pre-fetches mined-set - MemPalace#1487 rebuild_index progress callback - MemPalace#1530 MCP cold-start diagnostics + opt-in warmup Lint passes (ruff 0.15.14); mempalace-mcp entry point alignment verified per RELEASING.md.
Summary
Fix
AttributeError: 'NoneType' object has no attribute 'get'raised by several MCP handlers when ChromaDB returnsNonein a metadata cell. The crash happens before theembeddings_queuecleanup step, so the queue keeps growing while writes appear successful — exactly the failure mode described in the issue.Closes #1426.
What's changing
Add a small
_safe_meta()boundary helper inmempalace/mcp_server.pythat coercesNone(or any other non-dict) to{}, and route every directmetadatas[i]/metadatas[0]indexed read through it:tool_check_duplicate(was already usingor {}; switched to the helper for consistency)tool_delete_drawer— audit-log path (this is the one that bricked the cleanup step in the issue)tool_get_drawertool_list_drawerstool_update_drawertool_diary_readThe contract — metadata is always a dict by the time it leaves the boundary — is documented on the helper and self-documents at each call site.
Why a helper instead of inline
or {}x or {}looks the same but coercesFalse,0,"", and[]to{}too, which would silently swallow truly malformed rows.isinstance(meta, dict) else {}is precise about what we tolerate (None / non-dict garbage) without masking other shape bugs that should surface as test failures. Centralizing the coercion also gives one obvious place to log/alert if we want to track how often it fires in the wild.Test plan
New regression class
tests/test_mcp_server.py::TestNoneMetadataSafety(5 tests):_safe_metaunit test — None,{}, dict, str, list all coerce correctly.tool_get_drawerwithmetadatas=[None]— returns empty wing/room, no crash.tool_list_drawersmixed[None, {...}]— first drawer empty, second intact.tool_update_drawerwith None old metadata + new wing — succeeds, new wing slotted in cleanly.tool_delete_draweraudit-log path with None metadata — reaches thecol.delete(...)call (the exact behavior bricked in the issue).Each test uses a stub collection because Chroma's write path rejects
Noneat insert time, so we can't reproduce the upstream state through the real backend — mocking_get_collectionlets us assert handler tolerance for the failure mode that actually shows up in production.ruff check+ruff format --checkclean.Out of scope (filed in the issue as "Proposed Fixing Paths")
Noneat insert time on current versions; the rows that trigger [BUG] MemPalace MCP Handler AttributeError when metadata is None (Queue Stuck) #1426 in the wild are pre-existing partially-flushed entries from older formats or interrupted writes. Read-side defense is what unblocks affected users today.DELETE FROM embeddings_queuecleanup script. The issue mentions running it as a mitigation; that's a separate operational tool (probably belongs nearrepair.py), not in scope here.Happy to follow up with either if maintainers want them folded into the same PR or split out.