Skip to content

fix: stale-PID timeout, MCP idle exit, structured errors (#1552)#1562

Merged
igorls merged 3 commits into
developfrom
igorls/fix-subprocess-hook-chromadb-deadlock-wi
May 20, 2026
Merged

fix: stale-PID timeout, MCP idle exit, structured errors (#1552)#1562
igorls merged 3 commits into
developfrom
igorls/fix-subprocess-hook-chromadb-deadlock-wi

Conversation

@igorls

@igorls igorls commented May 20, 2026

Copy link
Copy Markdown
Member

Fixes #1552

Changes

  • Mine stale-PID timeout (hooks_cli.py, miner.py): PID files now record {pid} {unix_timestamp}. _mine_already_running() treats alive-but-old processes as stale after MEMPALACE_MINE_TIMEOUT_HOURS (default 2h). Backward-compatible with bare-PID format.
  • MCP idle auto-exit (mcp_server.py): Daemon watchdog thread exits after MEMPALACE_MCP_IDLE_HOURS idle (default 8h; 0=disabled), preventing accumulation of stale server processes holding ChromaDB file handles.
  • Structured error codes (mcp_server.py): _internal_tool_error() now accepts optional exc and adds data: {error_class, message} to JSON-RPC error bodies. MineAlreadyRunning returns error_class: LockHeldByOtherProcess.
  • Tests: 9 new tests across test_hooks_cli.py and test_mcp_server.py.

@igorls igorls requested a review from milla-jovovich as a code owner May 20, 2026 15:33
Copilot AI review requested due to automatic review settings May 20, 2026 15:33

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces stale process detection and an idle watchdog to prevent resource leaks, particularly on Windows. PID files now include a Unix timestamp, allowing the system to reclaim slots if a process exceeds a configurable timeout. Additionally, the MCP server now features a background watchdog that terminates the process after a period of inactivity. Error handling has been enhanced to return structured data, including error classes and exception messages. Feedback focuses on a discrepancy between the documentation and implementation regarding environment variable parsing, where returning zero is preferred over a default value to ensure safety. There is also a critical concern regarding the accidental removal of multi-tenancy test assertions in test_mcp_server.py.

Comment thread mempalace/hooks_cli.py Outdated
Comment thread tests/test_mcp_server.py
Comment thread mempalace/mcp_server.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Windows deadlock/lockfile accumulation problems (Issue #1552) by adding time-based stale-PID reclamation for mine subprocess hooks, introducing MCP server idle auto-exit, and improving MCP JSON-RPC error payloads with structured metadata so callers can distinguish error types programmatically.

Changes:

  • Record {pid} {unix_timestamp} in mine PID files and treat long-running (alive) mines as stale after MEMPALACE_MINE_TIMEOUT_HOURS (default 2h).
  • Add an MCP idle watchdog intended to terminate the server after MEMPALACE_MCP_IDLE_HOURS (default 8h; 0 disables).
  • Extend _internal_tool_error() to optionally attach {error_class, message} in JSON-RPC error data, and return error_class for MineAlreadyRunning in tool_sync.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
mempalace/hooks_cli.py Adds timestamped PID slots + timeout-based stale detection for hook-spawned mines.
mempalace/miner.py Updates PID-slot cleanup to handle the new {pid ts} format safely.
mempalace/mcp_server.py Adds idle-exit watchdog and structured JSON-RPC internal error payloads.
tests/test_hooks_cli.py Updates and expands tests for new PID slot format and timeout behavior.
tests/test_mcp_server.py Adds structured-error tests (but currently weakens an existing tenant-isolation test).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_mcp_server.py
Comment thread mempalace/mcp_server.py Outdated
Comment thread mempalace/hooks_cli.py Outdated
Comment thread mempalace/hooks_cli.py Outdated
Comment thread mempalace/hooks_cli.py
@igorls igorls merged commit 2e29014 into develop May 20, 2026
6 checks passed
igorls and others added 3 commits May 20, 2026 15:31
- Add MEMPALACE_MINE_TIMEOUT_HOURS (default 2h): PID files now record
  '{pid} {unix_timestamp}'; _mine_already_running() treats alive-but-old
  processes as stale, unblocking queued mines after a ChromaDB hang.
  Backward-compatible: bare-PID files (old format) treated as stale.
- Add MEMPALACE_MCP_IDLE_HOURS (default 8h): daemon watchdog thread in
  mcp_server calls sys.exit(0) after the configured idle period, preventing
  accumulation of stale server processes holding ChromaDB/HNSW file handles.
  Set to 0 to disable.
- Enrich _internal_tool_error() with optional exc parameter: adds
  data: {error_class, message} to JSON-RPC error body so callers can
  distinguish lock contention, ChromaDB transients, and segfaults without
  scraping the message string. MineAlreadyRunning handler in tool_sync()
  adds error_class: 'LockHeldByOtherProcess' to the result dict.
- Update miner._cleanup_mine_pid_file() to parse first whitespace token
  as PID (handles both old and new PID file formats).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Disable mine and MCP idle timeouts when env values are invalid.
- Use bare-PID slot file mtime as the compatibility timestamp instead of treating old slots as infinitely stale.
- Treat malformed PID-slot timestamps as stale without crashing hook execution.
- Use process-level idle watchdog termination so stale MCP servers actually release file handles.
- Restore tenant-isolation assertions accidentally removed from the KG cache test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
arnoldwender pushed a commit to arnoldwender/mempalace that referenced this pull request May 24, 2026
Bumps version 3.3.5 → 3.3.6 across pyproject.toml, version.py, plugin
manifests (.claude-plugin/plugin.json, .claude-plugin/marketplace.json,
.codex-plugin/plugin.json), README badge, and uv.lock. Flips CHANGELOG.md
from ``[Unreleased]`` to ``[3.3.6] — 2026-05-24`` and backfills the
major user-facing entries that landed without changelog entries during
the cycle:

Features:
- MemPalace#1555 office-document mining via --mode extract + virtual line numbers
- MemPalace#1584 surgical closet pointers with date+line locators (Tier 6a)
- MemPalace#1558 + MemPalace#1560 within-wing hallways (entity co-occurrence graph)
- MemPalace#1565 cross-wing tunnels auto-promoted from hallways
- MemPalace#1578 Hebbian potentiation + Ebbinghaus decay on hallways/tunnels
- MemPalace#1236 API-tool transcripts auto-route to wing_api
- MemPalace#711 hooks.auto_save toggle for silent-mode sessions
- MemPalace#1605 COCA content-word filter for entity detection
- MemPalace#1557 case-insensitive entity matching at mine time
- MemPalace#1483 multilingual embeddings (embeddinggemma-300m) by default

Bug Fixes (selected, user-visible):
- MemPalace#1540 silent data loss in three unchunked upsert sites
- MemPalace#1538 paragraph chunker oversized chunks
- MemPalace#1554 per-file chunk cap too low for transcripts
- MemPalace#1562 Windows hook subprocess/ChromaDB deadlock
- MemPalace#1529 create_tunnel corrupted hyphenated wing names
- MemPalace#1424 save-hook truncated hyphenated project folders
- MemPalace#1383 KG cache duplicated graphs for symlinked/cased paths
- MemPalace#1466 silent symlink skip now logged
- MemPalace#1441 macOS stock-bash 3.2 hook compatibility
- MemPalace#1500 / MemPalace#1513 structured JSON-RPC errors on bad MCP input
- MemPalace#1523 VACUUM + FTS5 rebuild after repair
- MemPalace#1548 FTS5 validation at end of mine
- plus MemPalace#1216, MemPalace#1408, MemPalace#1438, MemPalace#1439, MemPalace#1445, MemPalace#1452, MemPalace#1459, MemPalace#1461, MemPalace#1466,
  MemPalace#1470, MemPalace#1477, MemPalace#1485, MemPalace#1500, MemPalace#1513, MemPalace#1528, MemPalace#1532, MemPalace#1543, MemPalace#1546, MemPalace#1585

Performance:
- MemPalace#1474 convo miner pre-fetches mined-set
- MemPalace#1487 rebuild_index progress callback
- MemPalace#1530 MCP cold-start diagnostics + opt-in warmup

Lint passes (ruff 0.15.14); mempalace-mcp entry point alignment
verified per RELEASING.md.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Subprocess hook architecture causes ChromaDB cold-init deadlock + STATUS_ACCESS_VIOLATION segfaults on Windows (multi-session)

2 participants