fix: stale-PID timeout, MCP idle exit, structured errors (#1552)#1562
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces stale process detection and an idle watchdog to prevent resource leaks, particularly on Windows. PID files now include a Unix timestamp, allowing the system to reclaim slots if a process exceeds a configurable timeout. Additionally, the MCP server now features a background watchdog that terminates the process after a period of inactivity. Error handling has been enhanced to return structured data, including error classes and exception messages. Feedback focuses on a discrepancy between the documentation and implementation regarding environment variable parsing, where returning zero is preferred over a default value to ensure safety. There is also a critical concern regarding the accidental removal of multi-tenancy test assertions in test_mcp_server.py.
There was a problem hiding this comment.
Pull request overview
This PR addresses Windows deadlock/lockfile accumulation problems (Issue #1552) by adding time-based stale-PID reclamation for mine subprocess hooks, introducing MCP server idle auto-exit, and improving MCP JSON-RPC error payloads with structured metadata so callers can distinguish error types programmatically.
Changes:
- Record
{pid} {unix_timestamp}in mine PID files and treat long-running (alive) mines as stale afterMEMPALACE_MINE_TIMEOUT_HOURS(default 2h). - Add an MCP idle watchdog intended to terminate the server after
MEMPALACE_MCP_IDLE_HOURS(default 8h;0disables). - Extend
_internal_tool_error()to optionally attach{error_class, message}in JSON-RPC errordata, and returnerror_classforMineAlreadyRunningintool_sync.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
mempalace/hooks_cli.py |
Adds timestamped PID slots + timeout-based stale detection for hook-spawned mines. |
mempalace/miner.py |
Updates PID-slot cleanup to handle the new {pid ts} format safely. |
mempalace/mcp_server.py |
Adds idle-exit watchdog and structured JSON-RPC internal error payloads. |
tests/test_hooks_cli.py |
Updates and expands tests for new PID slot format and timeout behavior. |
tests/test_mcp_server.py |
Adds structured-error tests (but currently weakens an existing tenant-isolation test). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add MEMPALACE_MINE_TIMEOUT_HOURS (default 2h): PID files now record
'{pid} {unix_timestamp}'; _mine_already_running() treats alive-but-old
processes as stale, unblocking queued mines after a ChromaDB hang.
Backward-compatible: bare-PID files (old format) treated as stale.
- Add MEMPALACE_MCP_IDLE_HOURS (default 8h): daemon watchdog thread in
mcp_server calls sys.exit(0) after the configured idle period, preventing
accumulation of stale server processes holding ChromaDB/HNSW file handles.
Set to 0 to disable.
- Enrich _internal_tool_error() with optional exc parameter: adds
data: {error_class, message} to JSON-RPC error body so callers can
distinguish lock contention, ChromaDB transients, and segfaults without
scraping the message string. MineAlreadyRunning handler in tool_sync()
adds error_class: 'LockHeldByOtherProcess' to the result dict.
- Update miner._cleanup_mine_pid_file() to parse first whitespace token
as PID (handles both old and new PID file formats).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Disable mine and MCP idle timeouts when env values are invalid. - Use bare-PID slot file mtime as the compatibility timestamp instead of treating old slots as infinitely stale. - Treat malformed PID-slot timestamps as stale without crashing hook execution. - Use process-level idle watchdog termination so stale MCP servers actually release file handles. - Restore tenant-isolation assertions accidentally removed from the KG cache test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bumps version 3.3.5 → 3.3.6 across pyproject.toml, version.py, plugin manifests (.claude-plugin/plugin.json, .claude-plugin/marketplace.json, .codex-plugin/plugin.json), README badge, and uv.lock. Flips CHANGELOG.md from ``[Unreleased]`` to ``[3.3.6] — 2026-05-24`` and backfills the major user-facing entries that landed without changelog entries during the cycle: Features: - MemPalace#1555 office-document mining via --mode extract + virtual line numbers - MemPalace#1584 surgical closet pointers with date+line locators (Tier 6a) - MemPalace#1558 + MemPalace#1560 within-wing hallways (entity co-occurrence graph) - MemPalace#1565 cross-wing tunnels auto-promoted from hallways - MemPalace#1578 Hebbian potentiation + Ebbinghaus decay on hallways/tunnels - MemPalace#1236 API-tool transcripts auto-route to wing_api - MemPalace#711 hooks.auto_save toggle for silent-mode sessions - MemPalace#1605 COCA content-word filter for entity detection - MemPalace#1557 case-insensitive entity matching at mine time - MemPalace#1483 multilingual embeddings (embeddinggemma-300m) by default Bug Fixes (selected, user-visible): - MemPalace#1540 silent data loss in three unchunked upsert sites - MemPalace#1538 paragraph chunker oversized chunks - MemPalace#1554 per-file chunk cap too low for transcripts - MemPalace#1562 Windows hook subprocess/ChromaDB deadlock - MemPalace#1529 create_tunnel corrupted hyphenated wing names - MemPalace#1424 save-hook truncated hyphenated project folders - MemPalace#1383 KG cache duplicated graphs for symlinked/cased paths - MemPalace#1466 silent symlink skip now logged - MemPalace#1441 macOS stock-bash 3.2 hook compatibility - MemPalace#1500 / MemPalace#1513 structured JSON-RPC errors on bad MCP input - MemPalace#1523 VACUUM + FTS5 rebuild after repair - MemPalace#1548 FTS5 validation at end of mine - plus MemPalace#1216, MemPalace#1408, MemPalace#1438, MemPalace#1439, MemPalace#1445, MemPalace#1452, MemPalace#1459, MemPalace#1461, MemPalace#1466, MemPalace#1470, MemPalace#1477, MemPalace#1485, MemPalace#1500, MemPalace#1513, MemPalace#1528, MemPalace#1532, MemPalace#1543, MemPalace#1546, MemPalace#1585 Performance: - MemPalace#1474 convo miner pre-fetches mined-set - MemPalace#1487 rebuild_index progress callback - MemPalace#1530 MCP cold-start diagnostics + opt-in warmup Lint passes (ruff 0.15.14); mempalace-mcp entry point alignment verified per RELEASING.md.
Fixes #1552
Changes