fix(lifecycle): trigger isParentAlive re-check on stdin EOF to close 30s CPU-spin window (#388) #389
Merged
buildNodeCommand() converts backslashes to forward slashes to prevent MSYS path mangling on Windows. The test assertion was comparing against raw pluginRoot (backslashes from mkdtempSync) causing CI failure on windows-latest while macOS and Ubuntu passed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
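A minimal sketch of the assertion fix described above, assuming a hypothetical `toForwardSlashes` helper. The real normalization lives inside `buildNodeCommand()`, whose exact output is not shown here, so this only illustrates normalizing the expected value before comparing:

```typescript
// Hypothetical helper for illustration; not the project's actual code.
function toForwardSlashes(p: string): string {
  return p.replace(/\\/g, "/");
}

// On Windows, mkdtempSync returns a backslash-separated path such as this one.
const rawPluginRoot = "C:\\Users\\runner\\AppData\\Local\\Temp\\plugin-abc123";

// Compare against the normalized form, not the raw path.
const expectedRoot = toForwardSlashes(rawPluginRoot);
```

On macOS and Ubuntu the helper is a no-op because the temp path has no backslashes, which would explain why only windows-latest failed.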
…30s CPU-spin window (mksglu#311, mksglu#388)

The vendored MCP SDK's StdioServerTransport only registers 'data' and 'error' listeners on process.stdin. When the parent (e.g. Claude Code) dies abruptly without sending SIGTERM, the server keeps reading from a half-closed pipe and CPU-spins until the 30s ppid poll catches up. In practice this manifests as orphaned context-mode processes accumulating ~80h of CPU time before being SIGKILL'd manually (mksglu#388).

The fix adds a single 'end' listener on process.stdin inside the lifecycle guard. It does NOT shut down on 'end' alone; that's the false-positive behavior mksglu#236 tore out. Instead, 'end' triggers the same isParentAlive() probe the periodic timer runs, just earlier:

- parent alive → no-op (mksglu#236 regression test still passes)
- parent dead → 30s detection window collapses to ~0

Skipped on TTY (OpenCode ts-plugin), where stdin is not the MCP channel.

Tests: added a unit test that emits stdin 'end' under both alive and dead parent conditions, and updated the existing listener-invariance test to pin the new contract (only 'end' touched, restored on cleanup). All existing tests still pass.
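The contract above could be sketched roughly as follows. `installStdinEndProbe` and its parameters are hypothetical names; `isParentAlive()` is named in the commit but its body is not shown, so it is passed in as a callback, and a plain `EventEmitter` stands in for `process.stdin`:

```typescript
import { EventEmitter } from "node:events";

type StdinLike = EventEmitter & { isTTY?: boolean };

// Hypothetical sketch: 'end' never shuts down by itself, it only triggers
// the same liveness probe the periodic timer would run later.
function installStdinEndProbe(
  stdin: StdinLike,
  isParentAlive: () => boolean,
  onParentDead: () => void,
): () => void {
  // On a TTY, stdin is not the MCP channel, so no listener is added.
  if (stdin.isTTY) return () => {};
  const onEnd = (): void => {
    if (!isParentAlive()) onParentDead();
  };
  stdin.on("end", onEnd);
  // Cleanup removes only the listener this function added.
  return () => {
    stdin.off("end", onEnd);
  };
}

// Demo with a fake stdin.
const fakeStdin: StdinLike = Object.assign(new EventEmitter(), { isTTY: false });
let parentAlive = true;
let shutdowns = 0;
const removeProbe = installStdinEndProbe(
  fakeStdin,
  () => parentAlive,
  () => {
    shutdowns += 1;
  },
);
fakeStdin.emit("end"); // parent alive: no-op
parentAlive = false;
fakeStdin.emit("end"); // parent dead: shutdown callback fires
removeProbe();
```

In a real server the first argument would presumably be `process.stdin`, and the periodic ppid poll would stay in place as a fallback.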
mksglu added a commit that referenced this pull request on May 2, 2026
* fix(insight): move showAllInsights useState before early return (React #310) * 1.0.102 * fix(test): normalize pluginRoot path separators for Windows (#369) buildNodeCommand() converts backslashes to forward slashes to prevent MSYS path mangling on Windows. The test assertion was comparing against raw pluginRoot (backslashes from mkdtempSync) causing CI failure on windows-latest while macOS and Ubuntu passed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): persist counter + show lifetime + auto-memory + business value framing - Add tool_calls table to SessionDB — counter survives upgrades and --continue - Show persistent memory totals (events across all sessions) - Show auto-memory count from ~/.claude/projects/*/memory/ - Replace hardcoded '9 more' with actual category count - Use Opus pricing ($15/M) for cost calculations - Replace '3.0x' with '3x longer sessions' phrasing - Add 'Bottom line' footer with session/lifetime cost summary Closes the upgrade-resets-stats bug. ctx_stats now correctly shows that data persists across compaction, restart, and upgrade. * fix(windows): normalize hooks.json placeholders on startup (#378) The committed hooks/hooks.json and .claude-plugin/plugin.json use ${CLAUDE_PLUGIN_ROOT} placeholders + bare 'node' command. On Windows + Claude Code, this causes runtime loader failures (cjs/loader:1479) because: 1. bare 'node' may not resolve via PATH (Git Bash issue, see #369) 2. ${CLAUDE_PLUGIN_ROOT} resolution can hit MSYS path mangling (see #372) 3. backslash paths get corrupted in shell quoting Fix: start.mjs detects Windows on every MCP server boot. If hooks.json or plugin.json contain unresolved placeholders, rewrites them with: - process.execPath (absolute, quoted) instead of bare 'node' - Forward-slash paths (prevent MSYS translation) - Double-quoted paths (handle spaces) Idempotent — only rewrites when placeholder pattern is detected. Survives upgrades — runs at every start. 
Closes #378 * fix(cache-heal): use shebang on Unix, self-heal stale node paths After Brew updates Node, the versioned Cellar path written to ~/.claude/settings.json becomes stale, causing 'session start' errors: /opt/homebrew/Cellar/node/25.9.0_2/bin/node (gone after upgrade) vs the stable symlink: /opt/homebrew/bin/node (always current) Root cause: start.mjs wrote `process.execPath` directly, which on Brew returns the versioned path snapshot. Fix (2 layers): 1. New installs on Unix: write cache-heal script with shebang (#!/usr/bin/env node) + chmod +x, register hook as bare script path. `env` resolves node from PATH at runtime — survives any Node upgrade. 2. Self-heal: every MCP boot, check if existing hook command references a node path that no longer exists. If stale, rewrite using current pattern. Windows unchanged (no shebang support) — uses process.execPath + buildHookCommand pattern, plus self-heal for any breakage. Reported by @vigo on Discord. * fix(lifecycle): trigger isParentAlive re-check on stdin EOF to close 30s CPU-spin window (#388) (#389) * fix(test): normalize pluginRoot path separators for Windows (#369) buildNodeCommand() converts backslashes to forward slashes to prevent MSYS path mangling on Windows. The test assertion was comparing against raw pluginRoot (backslashes from mkdtempSync) causing CI failure on windows-latest while macOS and Ubuntu passed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: update install stats * ci: update install stats * ci: update install stats * ci: update install stats * ci: update install stats * fix(lifecycle): trigger isParentAlive re-check on stdin EOF to close 30s CPU-spin window (#311, #388) The vendored MCP SDK's StdioServerTransport only registers 'data' and 'error' listeners on process.stdin. When the parent (e.g. Claude Code) dies abruptly without sending SIGTERM, the server keeps reading from a half-closed pipe and CPU-spins until the 30s ppid poll catches up. 
In practice this manifests as orphaned context-mode processes accumulating ~80h of CPU time before being SIGKILL'd manually (#388). The fix adds a single 'end' listener on process.stdin inside the lifecycle guard. It does NOT shut down on 'end' alone — that's the false-positive behavior #236 tore out. Instead, 'end' triggers the same isParentAlive() probe the periodic timer runs, just earlier: - parent alive → no-op (#236 regression test still passes) - parent dead → 30s detection window collapses to ~0 Skipped on TTY (OpenCode ts-plugin), where stdin is not the MCP channel. Tests: added a unit test that emits stdin 'end' under both alive and dead parent conditions, and updated the existing listener-invariance test to pin the new contract (only 'end' touched, restored on cleanup). All existing tests still pass. --------- Co-authored-by: Mert Koseoglu <bm.ksglu@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ray <cho_meiko@okuribito-funeral.jp> * fix(executor): hide Windows console + drop .sh extension for shell exec (#384) On Windows, ctx_execute(language: 'shell', ...) had two problems: 1. Silent output - child_process.spawn without windowsHide:true creates a visible console window that intercepts stdout, leaving the MCP response empty. 2. Git Bash popup - temp script written as 'script.sh' triggers Windows file association for .sh files. bash.exe opens a visible window over the user's IDE. Fix (minimal, two surgical changes): - spawn(..., { windowsHide: isWin }) via buildSpawnOptions(platform) - isWin && language === 'shell' ? 'script' : 'script.{ext}' via buildScriptFilename(language, platform) Both changes are Windows-gated. Linux/macOS behavior unchanged. Does NOT change shell invocation semantics (no bash -c wrapper). Does NOT add SHELL env override. Both deferred - separate features. 
Helpers exposed as pure functions for unit testing without mocking spawn or filesystem. Closes #384. Supersedes #385 with smaller surface area. * fix(executor): full Windows shell coverage — bash -c source + SHELL override (#384) Builds on commit 9d1f44f (windowsHide + no .sh extension) with the remaining two root causes: 1. MSYS2 path mangling on non-C: drives. When bash.exe receives a script as a direct argument, MSYS rewrites paths like D:\tmp\script to D:\c\tmp\script, breaking execution. Fix: wrap in bash -c "source 'path'". The -c flag prevents MSYS from touching the file argument. 2. SHELL env var override. Users with non-standard shell setups (WSL, custom bash location, msys2 installations) need to point context-mode at their preferred shell. detectRuntimes() now checks process.env.SHELL first; if the path exists, uses it. Single-quote escape applied to filePath in bash -c form to handle paths containing apostrophes safely. PowerShell uses -File flag (correct .ps1 invocation). cmd.exe uses direct file (.cmd association is safe — no Git Bash issue). Closes #384 fully (in addition to commit 9d1f44f). Test coverage: - SHELL env override (3 tests) - buildCommand bash -c source (Windows + Unix variants, 5 tests) - Single-quote escape edge case - All previous Windows shell tests still pass * fix(test): normalize scriptPath separators in cache-heal-self-heal assertion Same root cause as the #369 test fix: buildHookCommand normalizes backslashes to forward slashes for cross-platform safety (MSYS/Git Bash mangling prevention). Test assertion at line 189 compared against the raw scriptPath from mkdtempSync (backslash-separated on Windows), causing CI failure on windows-latest while macOS and Ubuntu passed. Aligns line 189 with line 234 which already had this normalization. 
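The `bash -c "source 'path'"` form with single-quote escaping from the executor fix above might look something like this. `shellSingleQuote` and `buildBashSourceCommand` are hypothetical names (the real `buildCommand` signature is not shown in the commit message), and the escape used is the usual POSIX `'\''` idiom:

```typescript
// Hypothetical helper: wrap a string in single quotes for a POSIX shell,
// closing and reopening the quote around every embedded apostrophe.
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical sketch of the 3-element command array described above:
// the script path travels inside the -c string, not as a direct file argument.
function buildBashSourceCommand(bashPath: string, filePath: string): string[] {
  return [bashPath, "-c", `source ${shellSingleQuote(filePath)}`];
}

const plainCommand = buildBashSourceCommand("bash", "D:/tmp/script");
const quotedCommand = buildBashSourceCommand("bash", "D:/tmp/it's/script");
```

Here `plainCommand[2]` is `source 'D:/tmp/script'`, and the apostrophe in the second path becomes `'\''` so the quoted string stays a single shell word.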
* fix(openclaw): route native tool aliases (#383) * fix(test): normalize pluginRoot path separators for Windows (#369) buildNodeCommand() converts backslashes to forward slashes to prevent MSYS path mangling on Windows. The test assertion was comparing against raw pluginRoot (backslashes from mkdtempSync) causing CI failure on windows-latest while macOS and Ubuntu passed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: update install stats * ci: update install stats * fix(openclaw): route native tool aliases --------- Co-authored-by: Mert Koseoglu <bm.ksglu@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * fix(hooks): passthrough on `ask` in headless mode (CLAUDE_CODE_HEADLESS) (#380) * fix(test): normalize pluginRoot path separators for Windows (#369) buildNodeCommand() converts backslashes to forward slashes to prevent MSYS path mangling on Windows. The test assertion was comparing against raw pluginRoot (backslashes from mkdtempSync) causing CI failure on windows-latest while macOS and Ubuntu passed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: update install stats * ci: update install stats * fix(hooks): passthrough on `ask` in headless mode In `claude --print`, the CLI has no TTY to surface a permission prompt. When routing returns `action: "ask"`, the formatter emits `permissionDecision: "ask"` and the CLI hangs forever waiting on a user verdict that can never arrive. Mirror gemini-cli.mjs: when `CLAUDE_CODE_HEADLESS=1` is set in the environment, return null (passthrough) on `ask`. Other actions (deny/modify/context) unchanged. Interactive sessions are unaffected — without the env var, behavior is identical to before. Launcher scripts running headless agents must export `CLAUDE_CODE_HEADLESS=1` to opt in. 
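A rough sketch of the `ask` passthrough described in this commit, under stated assumptions: `formatDecision` is a hypothetical name, the environment is passed in explicitly so the function stays pure, and the returned object is a simplified stand-in for whatever the real formatter emits for non-`ask` actions:

```typescript
type Action = "ask" | "deny" | "modify" | "context";
type Env = Record<string, string | undefined>;

interface RoutingDecision {
  action: Action;
}

// Simplified output shape, assumed for illustration only.
interface FormattedDecision {
  permissionDecision: Action;
}

function isHeadless(env: Env): boolean {
  return env["CLAUDE_CODE_HEADLESS"] === "1";
}

// Returns null (passthrough) for `ask` in headless mode, because no TTY
// exists to surface a permission prompt. Other actions are left unchanged.
function formatDecision(decision: RoutingDecision, env: Env): FormattedDecision | null {
  if (decision.action === "ask" && isHeadless(env)) return null;
  return { permissionDecision: decision.action };
}

const headlessAsk = formatDecision({ action: "ask" }, { CLAUDE_CODE_HEADLESS: "1" });
const interactiveAsk = formatDecision({ action: "ask" }, {});
```

Without the env var, `interactiveAsk` still carries `permissionDecision: "ask"`, which matches the statement that interactive sessions are unaffected.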
* fix(hooks): extend headless passthrough to deny + modify Without this, v1.0.103's routing.mjs returns action:'modify' for 'dangerous' curl/wget invocations — silently rewriting the command into an echo that suggests ctx_execute. In TTY sessions that nudge is useful (the agent reconsiders or asks the user). In headless 'claude --print' the agent has no UI to reconsider; the rewritten echo runs, produces zero stdout, and downstream pipelines see a silent failure. Same shape as the existing 'ask' fix: case "deny": + if (isHeadless()) return null; return { ... }; case "modify": + if (isHeadless()) return null; return { ... }; The 'context' case is left as-is (additionalContext is informational, doesn't block the tool). Two existing tests in formatters.test.ts that asserted 'still formats deny/modify normally' under CLAUDE_CODE_HEADLESS=1 are inverted to assert the new passthrough behavior, matching the existing 'ask' test pattern. 21/21 tests pass. --------- Co-authored-by: Mert Koseoglu <bm.ksglu@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * fix(memory): adapter-aware persistent memory across all 14 platforms v1.0.100's "Unified Persistent Memory" feature was Claude-centric. Auto-memory, prior session, and persist memory all hardcoded ~/.claude/, breaking 13 of 14 platforms. Plus a worktree filename mismatch broke Claude Code too on worktree sessions. Architectural fix: adds 3 methods to HookAdapter interface so every adapter declares its own conventions: - getConfigDir() — ~/.claude, ~/.codex, ~/.qwen, ~/.gemini, etc. - getInstructionFiles() — ['CLAUDE.md'], ['AGENTS.md'], ['QWEN.md'], etc. - getMemoryDir() — ~/.claude/memory, ~/.codex/memories, etc. BaseAdapter ships sensible defaults derived from sessionDirSegments; only the 11 non-Claude adapters override (Claude inherits). 
Wiring changes: - searchAutoMemory() now accepts an adapter, dispatches via methods - ctx_search timeline uses _detectedAdapter.getConfigDir() - ctx_search timeline SessionDB filename now includes worktree suffix (matches what session-snapshot/session-extract write to) - extract.ts rule detection covers AGENTS.md, GEMINI.md, QWEN.md, KIRO.md, copilot-instructions.md, context-mode.mdc, and any *.md inside a memory/memories directory Bonus fixes (from PR #376): - OpenCode/KiloCode cache path: now packages/context-mode@latest/ layout (silently changed by upstream late 2024 — broke doctor/upgrade) - OpenCode SessionStart equivalent via experimental.chat.messages.transform — prior-session continuity now works on OpenCode/KiloCode Tests added (8 new files, 65 new tests, all green): - tests/adapters/base-adapter-memory.test.ts (4) - tests/adapters/claude-code-memory.test.ts (3) - tests/adapters/memory-conventions.test.ts (36) - tests/core/auto-memory-adapter.test.ts (5) - tests/core/cache-plugin-root.test.ts (2) - tests/core/server-timeline-adapter.test.ts (3) - tests/opencode-session-start.test.ts (2) - tests/session/extract-rule-detection.test.ts (10) Closes architectural root cause of #367 follow-ups. Supersedes #379 (Codex), #370 (Qwen), #376 (OpenCode/KiloCode portions). Co-Authored-By: Marcus Neufeldt <MarcusNeufeldt@users.noreply.github.com> Co-Authored-By: btxbtxbtx <btxbtxbtx@users.noreply.github.com> Co-Authored-By: Mickey Lazarevic <mikij@users.noreply.github.com> * fix(test): make Windows-host assertions platform-aware Three Windows CI failures in two test files, two distinct root causes: executor.test.ts "buildCommand returns shell command array" - PR #384 changed Windows+bash to return [bash, -c, "source 'path'"] (3 elements) to dodge MSYS path mangling on non-C: drives. - Test still asserted length === 2 universally. Make it platform+shell aware: 3-element bash -c form on Windows+bash, 2-element direct-exec form everywhere else. 
cache-heal-self-heal.test.ts "Unix: rewrites command when node path is stale" - selfHealCacheHealHook({platform: "linux"}) controls the WRITTEN command but ensureShebangAndExecBit's chmodSync is still a host syscall — NTFS cannot honor 0o755 exec bit. - Skip just the mode assertion on Windows host; shebang check stays. cache-heal-self-heal.test.ts "preserves other hooks unchanged" - Same root cause as 35b6f90: buildHookCommand normalizes backslashes → forward slashes for cross-platform safety. Line 287 was missed when 35b6f90 patched line 190. Apply identical fix. * Add Insight directory overrides (#400) * feat: add insight directory overrides * build: update bundles for insight overrides * fix(server): use sync platform detection for pre-adapter session dir Edge case: when MCP server is called before initialize completes, _detectedAdapter is null and getSessionDir() returns hardcoded ~/.claude/context-mode/sessions/. For non-Claude platforms this means the wrong sessions dir until the adapter is detected. Fix: when adapter is null, call detectPlatform() (sync, env-var-based) and map to platform-specific session dir segments without adapter instantiation. Falls back to .claude only if no platform signal found. Closes the last hardcoded .claude fallback in server.ts. Completes the multi-platform memory work from 6262c13. * chore: rebuild bundles with sync platform detection fix * fix(test): anchor XDG_CONFIG_HOME under fakeHome in memory-conventions OpenCodeAdapter (and kilo variant) honor XDG_CONFIG_HOME / APPDATA before falling back to homedir(). GitHub Actions Ubuntu runners can have XDG_* set to the runner's real /home/runner, which bypasses the homedir() mock from tests/setup-home.ts and leaks the real path into test assertions. Override XDG_CONFIG_HOME / XDG_DATA_HOME / APPDATA / LOCALAPPDATA at the top of memory-conventions.test.ts (after setup-home runs) so all adapters stay sandboxed under fakeHome. 
Reproduced locally with XDG_CONFIG_HOME=/tmp/fake-xdg before the fix; passes after. * test(session): regression test for cross-session bleed (#398) Adds three tests pinning the contract LMS927369's PR #398 fixed: 1. getSessionEvents(db, sessionId) returns ONLY the requested session's events — no bleed from concurrent sessions even when another session has a more recent session_meta.started_at. 2. getSessionEvents returns [] for unknown sessionId — no fallback to global most-recent (which was the original bug's root cause). 3. getLatestSessionEvents still picks globally-most-recent by design, pinning the existing semantics so future callers can't be surprised. All three would silently break if any of the 6 patched SessionStart adapters regressed back to getLatestSessionEvents(db). * fix(mcp): resolve ctx_index relative paths from project dir (#365) fix(mcp): resolve ctx_index relative paths from project dir Resolves ctx_index relative paths against the project directory (via getProjectDir() env chain: CLAUDE_PROJECT_DIR / *_PROJECT_DIR / CONTEXT_MODE_PROJECT_DIR / cwd) instead of MCP server cwd. Known follow-up gaps tracked separately: - IDEA_INITIAL_DIRECTORY missing from getProjectDir() cascade (JetBrains) - FTS5 source label uses raw user-typed path (dedup gap) - Bundle rebuild - Negative-path test coverage (traversal / env-unset / label collision) - getProjectDir() not unified across server.ts (deny-policy at L429) Co-authored-by: Ousama Ben Younes <ousamabenyounes@users.noreply.github.com> * test(hooks): add cross-platform regression matrix for MCP readiness (#354) * test(hooks): add cross-platform regression matrix for MCP readiness Locks in the directory-scan + PID-liveness contract from #347. 
The 11 test files updated by #347 changed the sentinel path but never asserted that isMCPReady() returns true for a sentinel whose PID is outside the test runner's process tree — the exact condition the PPID-keyed lookup failed on under WSL2 / `bash -c "node ..."` topologies. Coverage: - sentinelPathForPid + deprecated sentinelPath shape - sentinelDir platform branch (/tmp on Unix, os.tmpdir on win32) - isMCPReady happy path + resilience to malformed payloads - Stale-sentinel self-healing (gated to clean envs) - PPID-independence regression: child PID ∉ runner tree → still true Pure test-only PR. No production changes. * test(hooks): apply review cuts (drop deprecated test, collapse it.each, env-var path) - Drop sentinelPath() deprecated-export test. The JSDoc says it's kept for one release cycle; testing what's about to die is a maintenance trap. - Collapse empty-payload + non-numeric-payload tests into a single it.each. Same contract, fewer lines. - Pass resolved sentinel directory to the regression child via env var instead of recomputing the platform branch inline. Keeps mcp-ready.mjs as the single source of truth for the path shape. * test(hooks): merge mcp-ready regression matrix into core-routing.test.ts Per review feedback: move the three describe blocks (`contract`, `stale-cleanup self-healing`, `PPID-independence`) from the standalone tests/hooks/mcp-ready.test.ts into tests/hooks/core-routing.test.ts as top-level describes after the existing `routePreToolUse` block. Same test bodies, same assertions; only the host file changed. The stale-cleanup block adds a local `beforeEach` that unlinks the file-level `mcpSentinel` so the runner's own live sentinel does not mask the dead-PID cleanup the test verifies. --------- Co-authored-by: Mert Köseoğlu <bm.ksglu@gmail.com> * fix(cli): use execFile to open URL without shell interpolation execSync(`open "${url}"`) interpolates the URL through the shell. 
The URL is localhost-only today, but the pattern is fragile — a future remote URL or weak port-validation flips it into shell injection. Switches darwin/linux/win32 branches to execFile with an arg array and a try/catch that prints a copyable URL on failure. * perf: cut per-tool-call latency across all 14 adapters Five fixes targeting synchronous hot paths fired on every tool call. Per-session reclaim: ~1.8s macOS / ~0.7-1.5s Linux / ~7.5-12.5s Windows (Windows wins biggest because fork+exec is heavier). A1. src/session/db.ts memoize getWorktreeSuffix per (cwd, env override) Prior: `git worktree list --porcelain` subprocess fork on every ctx_* tool call (~12ms macOS, 50-100ms Windows). Cached: 0.86μs warm — 17,000x faster on the hot path. A2. hooks/session-helpers.mjs 2-level cache for getWorktreeSuffix Hooks are fresh node forks per fire — module cache alone won't survive across calls. Added cross-process tmpdir marker keyed by sha256(cwd) — Windows-safe filename, all-OS tmpdir(). Within a hook process, in-memory cache hits 2 of 3 callsites (db/events/cleanup paths). Across hook processes, marker file short-circuits the git fork. Bench: 41ms cold child vs 28ms warm child = -12ms/fire on macOS, ~50-100ms/fire on Windows. A3. src/server.ts defer persistToolCallCounter via setImmediate SQLite open/select/update/close was on the response path. Now runs after response returns. Removes 1-3ms from user-perceived latency per ctx_* call. C1. hooks/auto-injection.mjs collapse 4× O(N) Array.filter() into one O(N) pass. UserPromptSubmit fires this every prompt; with N up to 100 events the prior code walked the array 4 times. C5. src/session/db.ts + hooks/session-loaders.mjs add bulkInsertEvents PostToolUse emits 5-15 events per tool call; per-event insertEvent ran N transactions = N WAL commits. Bulk path pre-computes hashes outside the SQL transaction, then runs all dedup/evict/insert work inside one transaction. 
attributeAndInsertEvents prefers bulk when available, falls back to loop for backward compat. C3. src/search/auto-memory.ts single statSync per candidate file Prior code stat'd each candidate twice (size guard, mtime). Reuse the first stat for both — one syscall per file instead of two. Cross-platform: every fix uses platform-agnostic primitives (tmpdir(), sha256-hashed filenames, setImmediate, in-process module cache). Tested on macOS locally (2045/2064 vitest pass, tsc --noEmit clean); CI exercises Linux + Windows. The 5 fixes apply uniformly to all 14 adapters because they all funnel through the same session-helpers / session-db / MCP server hot paths. * test(server): regression guard for ctx_fetch_and_index tmp cleanup ctx_fetch_and_index writes fetched content (including auth headers and API tokens via subprocess fetch) to os.tmpdir()/ctx-fetch-*.dat before reading and indexing. On macOS /tmp is world-readable, so leaking even one file is a P0 issue on shared hosts. The handler currently wraps the read in try/finally with rmSync(outputPath), but nothing prevents a future refactor from dropping that block. Adds tests/core/fetch-cleanup.test.ts with two layers of protection: 1. Static-source guard — fails if the handler in src/server.ts loses the `finally { ... rmSync(outputPath) ... }` block. Verified RED by deleting the finally block (matches fail) and GREEN by restoring it. 2. Behavioural tests — replicate the read+cleanup pattern against a local HTTP fixture covering success, empty content, error before write, and partial-write-then-throw paths. All assert no ctx-fetch-*.dat file remains in os.tmpdir() after the call. No production code change — the fix already landed in 45ecf90; this commit only locks the invariant in. 
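The invariant this test commit locks in, a `finally` block that removes the temp file even when processing throws, can be illustrated with a small sketch. `readThenRemove` is a hypothetical name, and the demo file name only imitates the `ctx-fetch-*.dat` pattern; the actual handler in src/server.ts is not reproduced here:

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: read the fetched payload, hand it to a callback,
// and always delete the file afterwards, whether or not the callback throws.
function readThenRemove<T>(outputPath: string, handle: (raw: string) => T): T {
  try {
    return handle(readFileSync(outputPath, "utf8"));
  } finally {
    rmSync(outputPath, { force: true });
  }
}

// Demo: the handler fails, yet no file is left behind.
const demoDir = mkdtempSync(join(tmpdir(), "ctx-fetch-demo-"));
const demoFile = join(demoDir, "ctx-fetch-demo.dat");
writeFileSync(demoFile, "fetched body");

let handlerThrew = false;
try {
  readThenRemove(demoFile, () => {
    throw new Error("indexing failed");
  });
} catch {
  handlerThrew = true;
}
const leakedAfterFailure = existsSync(demoFile);
rmSync(demoDir, { recursive: true, force: true });
```

The `force: true` option keeps cleanup from throwing when the file was never written, which corresponds to the "error before write" path the behavioural tests cover.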
* fix(server): include IDEA_INITIAL_DIRECTORY in getProjectDir() chain JetBrains adapter sets IDEA_INITIAL_DIRECTORY but the server's getProjectDir() cascade did not read it, so ctx_index relative paths resolved against the IDE bin dir instead of the project root. Adds it to the env cascade and pins the regression with a JSON-RPC spawn test that asserts resolution under IDEA_INITIAL_DIRECTORY only. * docs: correct repo path, tool names, version, and adapter list - llms.txt referenced the wrong repo (claude-context-mode) in title and 18 raw URLs. - llms-full.txt documented tools without the ctx_ prefix and was pinned to v0.9.22 with a 6-tool count; updates to 11+ tools matching the live server registry, drops the stale version pin. - platform-support.md said "nine platforms"; adds detail sections for OpenClaw and Zed and updates the count. - README "6 sandbox tools" updated to current count. * fix(server): include URL in ctx_fetch_and_index cache key getSourceMeta(label) returned the meta from any prior fetch with the same label, so two distinct URLs sharing a label silently returned the cached first response instead of fetching the second. Composes the cache key from label+url so legitimate cache hits still work but cross-URL label reuse no longer serves stale content. * fix(codex): default projectDir to cwd when env and input missing Codex's parser left projectDir undefined when neither input.cwd nor the platform env var (CODEX_PROJECT_DIR) was set, so downstream hooks received undefined and broke under worktrees / non-default cwd. Aligns with cursor/opencode pattern by falling back to process.cwd(). * fix(gemini-cli): default projectDir to cwd when env and input missing Gemini CLI's parser left projectDir undefined when neither input.cwd nor the platform env var (GEMINI_PROJECT_DIR / CLAUDE_PROJECT_DIR) was set, so downstream hooks received undefined and broke under worktrees / non-default cwd. 
Now also accepts cwd from the wire input and falls back to process.cwd(), aligning with the cursor/opencode pattern. * fix(openclaw): default projectDir to cwd when env and input missing OpenClaw's parser left projectDir undefined when neither input.cwd nor the platform env var (OPENCLAW_PROJECT_DIR) was set, so downstream hooks received undefined and broke under worktrees / non-default cwd. Aligns with cursor/opencode pattern by falling back to process.cwd(). * fix(zed): default projectDir to cwd when env and input missing Zed's parser left projectDir undefined when neither input.cwd nor the platform env var (ZED_PROJECT_DIR) was set, so downstream hooks received undefined and broke under worktrees / non-default cwd. Aligns with cursor/opencode pattern by falling back to process.cwd(). Replaces the throw-on-call defensive parsers with parsers that return a minimal event using the standard fallback chain. Zed remains mcp-only (capability flags are still all false), so these parsers should not be invoked in normal operation — they exist as safe defaults if a misconfigured caller bypasses the capability check. * fix(antigravity): default projectDir to cwd when env and input missing Antigravity's parser left projectDir undefined when neither input.cwd nor the platform env var (ANTIGRAVITY_PROJECT_DIR) was set, so downstream hooks received undefined and broke under worktrees / non-default cwd. Aligns with cursor/opencode pattern by falling back to process.cwd(). Replaces the throw-on-call defensive parsers with parsers that return a minimal event using the standard fallback chain. Antigravity remains mcp-only (capability flags are still all false), so these parsers should not be invoked in normal operation - they exist as safe defaults if a misconfigured caller bypasses the capability check. * fix(server): unify deny-policy project-dir resolution with getProjectDir() checkFilePathDenyPolicy used `process.env.CLAUDE_PROJECT_DIR ?? 
cwd()` which skips GEMINI_PROJECT_DIR / VSCODE_CWD / OPENCODE_PROJECT_DIR / PI_PROJECT_DIR / IDEA_INITIAL_DIRECTORY / CONTEXT_MODE_PROJECT_DIR. Non-Claude adapters either failed open or matched the wrong repo's deny rules. Routes resolution through the existing getProjectDir() helper so all 12 adapters apply policy against the correct root. * fix(server): canonicalize ctx_index source label to resolvedPath source ?? path used the raw user-typed input, so the same absolute file indexed via './foo.md', 'foo.md', or 'subdir/../foo.md' produced three FTS5 rows because dedup keys on sources.label. Default the label to the resolved absolute path; explicit `source` still wins. Adds a regression test pinning that two relative spellings of the same file yield exactly one row. * test(server): pin negative-path coverage for ctx_index resolution Adds three regression tests: - Relative `../` path traversal stays allowed (matches current trust-boundary policy; pinned to surface future security changes). - CLAUDE_PROJECT_DIR unset falls back to spawned-server cwd. - Strengthens the absolute-path bypass test to assert the source label equals the absolute path, so the test fails on baselines that skip the resolver. * refactor(server): unify ad-hoc project-dir resolution on getProjectDir() PR #365 added the getProjectDir() env cascade but only routed ctx_index through it; ctx_execute_file's executor captured project root once at construction (CLAUDE_PROJECT_DIR ?? cwd), so the same relative path resolved differently across the two tools when only CONTEXT_MODE_PROJECT_DIR was set. Switches the executor to lazy resolution via getProjectDir(). Adds a regression test asserting ctx_execute_file resolves under CONTEXT_MODE_PROJECT_DIR when CLAUDE_PROJECT_DIR is unset. * Revert "fix(zed): default projectDir to cwd when env and input missing" This reverts commit 7cd9535. * Revert "fix(antigravity): default projectDir to cwd when env and input missing" This reverts commit 2f5442f. 
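Several of the commits above route resolution through `getProjectDir()`. A minimal sketch of such an env cascade follows, with loud assumptions: `resolveProjectDir` is a hypothetical name, the variable list is taken from the commit messages, and the precedence order shown is a guess because the source does not state it:

```typescript
type Env = Record<string, string | undefined>;

// ASSUMPTION: this order is illustrative only. The commit messages name
// these variables but do not say which one wins when several are set.
const PROJECT_DIR_VARS = [
  "CLAUDE_PROJECT_DIR",
  "GEMINI_PROJECT_DIR",
  "VSCODE_CWD",
  "OPENCODE_PROJECT_DIR",
  "PI_PROJECT_DIR",
  "IDEA_INITIAL_DIRECTORY",
  "CONTEXT_MODE_PROJECT_DIR",
] as const;

// Returns the first non-empty variable in the cascade, else the given cwd.
function resolveProjectDir(env: Env, cwd: string): string {
  for (const varName of PROJECT_DIR_VARS) {
    const value = env[varName];
    if (value !== undefined && value !== "") return value;
  }
  return cwd;
}

const fromJetBrains = resolveProjectDir({ IDEA_INITIAL_DIRECTORY: "/work/repo" }, "/opt/ide/bin");
const fromCwd = resolveProjectDir({}, "/srv/app");
```

Resolving lazily through one helper on every call, instead of capturing `CLAUDE_PROJECT_DIR ?? cwd` once, is what keeps tools such as ctx_index and ctx_execute_file agreeing on the same root.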
* fix(test): isolate Windows test env from start.mjs side effects - start.mjs: skip normalizeHooksOnStartup under VITEST. server.test.ts spawns start.mjs from the repo root; on Windows it was mutating the committed .claude-plugin/plugin.json, which then poisoned cli.test.ts:156's ${CLAUDE_PLUGIN_ROOT} placeholder assertion. - memory-conventions: make OpenCode/Kilo getConfigDir/getMemoryDir expectations platform-aware. Adapter honors XDG_CONFIG_HOME on POSIX and APPDATA on Windows; tests previously asserted ~/.config on all platforms. * feat(batch_execute): opt-in concurrency for I/O-bound batches Adds a `concurrency: 1-8` parameter to ctx_batch_execute. Default 1 preserves the existing serial path (shared timeout budget, cascading skip on timeout). >1 switches to a worker pool with per-command timeouts and order-preserving output. Local benchmark: 5× sleep(500ms) → 533ms at concurrency=8 (4.97× speedup vs 2651ms serial). Why now: LLM agents fan out multi-source research (gh/curl/git batches). Sequential I/O is pure wait; concurrency turns it into overlapped wait without any user-facing API change. Tool description hardened with positive guidance per PRD-concurrency-architectural.md §4: ✅ network/I/O batches use 4-8, ❌ CPU-bound (npm test, build, lint) and stateful (ports, locks) stay at 1. Architecture: - runBatchCommands() extracted as pure function with BatchExecutor interface — testable in isolation, no MCP/SDK dependency. - Handler is now a thin wiring layer (executor + sessionStats + store). - THINK IN CODE directive preserved at full strength in description. Tests (tests/core/server.test.ts, 12 new): - Serial: order, cascade-skip, shared-budget exhaustion. - Parallel: order preservation, in-flight cap, per-command timeout isolation, FS bytes callback, cmd-count > concurrency safety. - Edge: empty array, no-output sentinel, prefix prepending. Schema/description coverage assertion in batch_execute FS read tracking suite proves the contract stays documented. 
Co-Authored-By: Sebastian Breguel <sebastianbreguel@gmail.com> * chore(batch_execute): strengthen THINK IN CODE in tool description THINK IN CODE upgraded from soft guidance to NON-NEGOTIABLE directive, with explicit clarification of how it relates to concurrency: Concurrency parallelizes the FETCH; THINK IN CODE owns the PROCESSING. Adds the tactical detail (pure JavaScript, Node.js built-ins, try/catch, null-safe) so LLMs writing the processing command have an unambiguous contract — same level of specificity already present in ctx_execute and ctx_execute_file descriptions. No behavior change. Existing description-coverage assertion still passes. * fix(test): raise insight-cors beforeAll hookTimeout to 120s Default vitest hookTimeout (10s) is shorter than the inner 3-attempt × 30s waitForInsight polling, so any slow Windows runner that takes >10s to spawn node + open sqlite deterministically fails. Pin to 120s to fit the worst-case retry budget. * feat(concurrency): opt-in parallelism for I/O-bound MCP tools Adds a `concurrency: 1-8` parameter to ctx_batch_execute and ctx_fetch_and_index, plus a shared `runPool` primitive, observability extractor, and Parallel I/O guidance across all 14 adapter routing docs. What ships - src/concurrency/runPool.ts (new): generic in-flight-capped worker pool returning Promise.allSettled-style results. Single primitive used by both batch tools — no copy-pasted worker logic. - ctx_batch_execute: serial branch unchanged (shared timeout budget, cascading skip). Parallel branch routed through runPool. Description hardened with PARALLELIZE I/O ✅/❌ guidance and NON-NEGOTIABLE THINK IN CODE clause. - ctx_fetch_and_index: accepts both legacy `{url, source}` (single, exact backward-compat wording) and new `{requests: [{url, source}]}` (batch). Workers fetch in parallel via runPool; FTS5 writes drain serially through indexFetched to avoid SQLite WAL contention. 
Per-URL preview capped at 384 chars in batch mode (~3KB total) so context-savings hold under 8-URL fan-outs. composeFetchCacheKey wiring preserved across the refactor — same-label-different-URL collisions stay fixed (commit 1f1243e regression test enforced). - effectiveConcurrency = min(N, os.cpus().length) when capByCpuCount set. Response surfaces capped count in caveman style. - mcp_tool_call extractor (src/session/extract.ts) persists tool_input for mcp__* events with UTF-8-aware truncation at 2KB. Unlocks getMcpToolUsage() analytics — median/max concurrency per batch tool visible in ctx_stats. - 14 adapter routing docs updated with the same Parallel I/O paragraph adapted to each host's tool-call prefix style. GitHub rate-limit caveat included consistently. Hardening from 2-round architectural review - Worker try/catch + Promise.allSettled isolation: one job throw no longer strands siblings or leaves undefined output slots. - Timeout sentinel routes through formatCommandOutput: __CM_FS__ markers stripped + bytes counted on partial-stdout-on-timeout. - trackIndexed moved after FTS5 write succeeds (no over-count on failed indexes). - UTF-8-aware truncate (Buffer.byteLength + continuation-byte walk-back): multi-byte payloads (CJK, 4-byte symbols) honor the byte budget without landing mid-codepoint. - cpuCountForCap helper deleted: was CommonJS require in an ESM file, silently always returning 1. Replaced with top-level `cpus` import from node:os. Tests (per CONTRIBUTING.md no-new-test-files rule, all under existing files) - 7 runPool unit tests: order, throw isolation, in-flight cap, job-count clamp, os.cpus cap, onSettled callback ordering. - 13 ctx_fetch_and_index batch source-level tests: schema accepts both shapes, serial-write contract holds, backward-compat wording preserved, batch preview cap enforced, caveman header formatting, composeFetchCacheKey wiring across the refactor. 
- 3 P0 hardening tests: throw-isolation, timeout marker stripping, 5-cmd × 100ms at concurrency=5 < 200ms (CI-checked timing regression replacing the deleted bench). - 4 mcp_tool_call extractor tests including UTF-8 multibyte regression. - 3 getMcpToolUsage analytics tests. Verification - 138/138 server.test.ts pass; 309/309 across server + extract + analytics on cw/ctx-analytics. - On next: 318/326 pass. 8 pre-existing unrelated failures (ctx_index projectRoot resolution from #365, ctx_execute_file env cascade, getSessionDir pre-detection) untouched. - Typecheck clean. Co-Authored-By: Sebastian Breguel <sebastianbreguel@gmail.com> * fix(stats): restore Auto-memory, Opus pricing, and business-value footer Commit b392c2f rewrote src/session/analytics.ts as part of the opt-in concurrency feature and inadvertently dropped the user-facing stats improvements landed in 4742160 (bugs #5/#6/#7/#8): the Auto-memory preferences-learned line, the "Your AI talks less, remembers more, costs less" tagline, the Opus pricing breakdown, the "$X this session / $Y lifetime" footer, and the per-prefix auto-memory bars. The 4th arg to formatReport also collapsed from an options object {lifetime, mcpUsage} into bare mcpUsage, breaking every test that passed lifetime data. Restores the options-object signature, re-renders all dropped sections under their original guards, and keeps b392c2f's runPool and getMcpToolUsage infrastructure intact (no concurrency revert). Updates the src/server.ts call site to match. tests/session/stats-output-format.test.ts back to green (12/12 pass); no other tests regress. * feat(stats): persist runtime stats + status line bar (#399) Adds a Claude Code statusLine integration so users see live token savings at the bottom of their terminal without invoking any MCP tool. - src/server.ts: persistStats() writes <sessionDir>/stats-<sessionId>.json after every trackResponse / trackIndexed, throttled to 500ms; cleared on ctx_purge. 
- bin/statusline.mjs: single-file Node script, no extra deps; walks the parent process chain via /proc/<pid>/status to find Claude Code; falls back to most recent stats-*.json within 30 minutes. - src/cli.ts: context-mode statusline / statusline-install subcommands with safe ~/.claude/settings.json merge + timestamped backup. - tests/statusline.test.ts: 8 hermetic cases covering idle, render, PPID fallback, stale sentinel, NaN guard, corrupt file, --json, error exit. - README.md: status line wiring snippet for the Claude Code section. Render aligned with the restored ctx_stats business voice (Auto-memory, Opus pricing, "preserved across compact, restart & upgrade" tagline, \$X this session / \$Y across sessions footer) so both surfaces speak the same language. Co-authored-by: Ousama Ben Younes <ousamabenyounes@users.noreply.github.com> * fix(adapters/detect): full 14-platform PLATFORM_ENV_VARS audit + opencode-plugin DRY PR #376 follow-up. mikij flagged that src/opencode-plugin.ts hardcoded a KILO_PID-only check that violated DRY against PLATFORM_ENV_VARS. Audit of the canonical list itself surfaced the broader problem: half of the entries were unverified placeholders, 4 platforms (antigravity, zed, pi, openclaw) were entirely missing or incorrectly listed, and the plugin paradigm's fallback to "opencode" was blind (didn't actively check OPENCODE env vars). What ships - Re-audited every entry against the platform's own runtime source code: - kilo: dropped bare `KILO` (Kilo-Org/kilocode never sets it; only KILO_PID is set unconditionally at packages/opencode/src/index.ts:140). - jetbrains-copilot: dropped IDEA_HOME and JETBRAINS_CLIENT_ID (no source-line evidence in any JetBrains repo). Kept IDEA_INITIAL_DIRECTORY. - qwen-code: dropped QWEN_SESSION_ID (0 hits in QwenLM/qwen-code). - openclaw: removed entirely from env-var tier (runtime never sets OPENCLAW_HOME/OPENCLAW_CLI). Detection falls through to ~/.openclaw/ config-dir tier, which already worked. 
- Added 3 new platforms with verified env vars: - antigravity: ANTIGRAVITY_CLI_ALIAS — verified in Google's google-gemini/gemini-cli packages/core/src/ide/detect-ide.ts (canonical IDE detection map). Listed before vscode-copilot since Antigravity is an Electron/VSCode fork. - zed: ZED_SESSION_ID + ZED_TERM — verified in zed-industries/zed crates/terminal/src/terminal.rs `insert_zed_terminal_env()` and cross-confirmed by Google's gemini-cli detect-ide.ts. - pi: PI_PROJECT_DIR — confirmed by our own consumers at src/pi-extension.ts:154 and src/server.ts:153. - Reordered fork pairs so collision detection works: - kilo before opencode (Kilo sets OPENCODE=1 because it's an OpenCode fork). - cursor + antigravity before vscode-copilot (both inherit VSCODE_PID). - src/opencode-plugin.ts getPlatform() rewritten to iterate PLATFORM_ENV_VARS instead of hardcoding KILO_PID. Filters to kilo+opencode so a stray CLAUDE_PROJECT_DIR can't leak into the plugin's platform decision. Symmetric: actively checks BOTH platform's env vars instead of blind fallback. Per-line JSDoc credits PR #376 (mikij). Tests - tests/adapters/detect.test.ts: removed 5 broken assertions for unverified env vars; added 4 assertions for new platforms (antigravity, zed×2, pi) and a fork-collision test (KILO_PID + OPENCODE both set → kilo wins). - tests/adapters/detect-config-dir.test.ts: rewrote priority chain from OPENCLAW/CODEX assertions to fork-collision assertions (KILO/OPENCODE, CURSOR/VSCODE, ANTIGRAVITY/VSCODE, CURSOR/CODEX). Verification - 451/451 adapter + plugin tests pass on next worktree. - Typecheck clean. Co-Authored-By: Mickey Lazarevic <noreply@github.com> * fix(server): re-apply path-resolution + adapter-aware fixes lost in b392c2f Commit b392c2f rewrote ~600 lines of src/server.ts as part of the opt-in concurrency feature and inadvertently reverted PR #365 plus fixes 1, 2, 4, 10 from the prior fix-army landings (#400 cluster). 
Six independent regressions slipped through: ctx_index ignored CLAUDE_PROJECT_DIR / IDEA_INITIAL_DIRECTORY / source-label canonicalization, the deny-policy and executor cwd fell back to ad-hoc CLAUDE_PROJECT_DIR ?? cwd(), getSessionDir lost its detectPlatform pre-detection branch, and timeline-mode search lost its worktree suffix, adapter-aware configDir, and adapter pass-through. Re-applies all of the above as a single squashed restore: - isAbsolute import + resolveProjectPath helper. - IDEA_INITIAL_DIRECTORY in getProjectDir() env cascade. - ctx_index uses resolveProjectPath; source label canonicalises to the resolved absolute path so FTS5 dedup keys stop fragmenting across cwds. - Executor takes a () => getProjectDir() thunk so ctx_execute_file picks up the full env cascade lazily, not just the constructor snapshot of CLAUDE_PROJECT_DIR. - checkFilePathDenyPolicy reads getProjectDir() instead of the divergent CLAUDE_PROJECT_DIR ?? cwd() pattern. - getSessionDir consults detectPlatform + getSessionDirSegments before the .claude fallback. - Timeline mode opens SessionDB at hash+worktreeSuffix, derives configDir from _detectedAdapter.getConfigDir(), and threads the adapter into searchAllSources. Also relaxes the fetch-cleanup static guard: b392c2f extracted the fetch path into a runFetchOne helper, so the prior slice from the registerTool call no longer covered ctx-fetch-*.dat. Asserts the patterns at the file scope instead. Local: 5 failing tests on next reduced from 13. The remaining five are bundle-stale ctx_index spawn cases that pass once CI rebuilds server.bundle.mjs on main. * fix(security+release): PR #401 5-mode review follow-up — B3 redaction, SSRF guard, SHELL allowlist + 6 hardening fixes 5-agent review on PR #401 (v1.0.104) flagged P0 security + P1 release-quality issues. This commit addresses every actionable finding except those requiring release-process changes (npm version bump, grill-me gate — handled separately on the release path). 
Security (B3, SSRF guard, SHELL allowlist) ------------------------------------------ - src/session/extract.ts: mcp_tool_call extractor redacts secret-bearing keys BEFORE serialization. Walk via redactSecrets() with ancestor-set cycle detection (path-based, so DAG / shared-ref shapes process every site). Keys matching /authorization|token|secret|password|api_key|cookie|signature| private_key|client_secret/i are masked to "[REDACTED]". DAG-safe so a shared `headers` object referenced by multiple sub-requests gets redacted at every site. - src/server.ts: ssrfGuard for ctx_fetch_and_index. Hard-blocks file://, gopher://, javascript:, data: schemes; hard-blocks 169.254.0.0/16 (link-local incl. AWS/GCP/Azure IMDS 169.254.169.254), IPv6 link-local, multicast, reserved. Loopback + RFC1918 ALLOWED by default (developer workflow: local dev servers on localhost / internal network) — strict mode via CTX_FETCH_STRICT=1 blocks those too. DNS-resolves to defend against attacker-controlled DNS records / DNS rebinding. Runs BEFORE cache lookup so a previously-poisoned source label can't serve from cache. - src/runtime.ts: SHELL env var allowlist. Basename must match /^(bash|sh|zsh|dash|pwsh|powershell|cmd)(\.exe)?$/i. Cross-OS basename split handles both / and \ separators. Defends against profile-script compromise redirecting executor to /usr/bin/python or arbitrary binary. Release quality (P1.1, P1.2, P1.3) ---------------------------------- - src/server.ts P1.1: OPUS_INPUT_PRICE_PER_TOKEN dedup. Removed local definition; imports from src/session/analytics.ts (single source of truth). Architect + Ops 2-vote convergence. - src/server.ts P1.2: gracefulShutdown flushes persistStats with throttle bypass before exit. Last 0-500ms of bytes_indexed/bytes_returned no longer silently lost on SIGTERM/SIGINT. - src/server.ts + bin/statusline.mjs P1.3: STATS_SCHEMA_VERSION=1 in payload. 
Statusline reads schemaVersion (defaults 0 for legacy bundles), warns to stderr when reading future schema, still parses known fields. Eliminates silent schema drift (architect review found dollars_saved_lifetime was removed without versioning). Architecture / dev experience ----------------------------- - src/adapters/* getConfigDir contract: always returns absolute path. Pre-fix: cursor/vscode-copilot/jetbrains-copilot/kiro/openclaw returned relative segments → server.ts:1587 consumed verbatim → corrupted timeline configDir. New contract documented in HookAdapter JSDoc; all adapters resolve via path.resolve(projectDir ?? cwd, segment). - bin/statusline.mjs cross-OS PID resolution (B4): macOS now walks parent chain via `ps -o ppid=,comm= -p <pid>` (mirroring Linux /proc walk). Windows degrades to ppid with one-shot stderr warning. Fixes session-id mismatch where statusline #1 would show stats from session #2 on macOS. - Deleted PRD-347-ppid-mismatch-wsl2.md + PRD-wsl2-ppid-sentinel.md — shipped as docs without implementation per Diagnose review. Implement later or remove the orphan. Test consolidation (CONTRIBUTING.md L275) ----------------------------------------- - 3 cache-heal test files merged into 1 with shared fixture helper: tests/hooks/cache-heal-build-command.test.ts (deleted), tests/hooks/cache-heal-stale-node-detection.test.ts (deleted), tests/hooks/cache-heal-self-heal.test.ts (4 describe blocks, 24 tests preserved, makeTmp/writeJson helpers extracted). 
Verification ------------ - All new/modified test files pass: - tests/session/session-extract.test.ts: 153/153 (B3 + 2 new redaction tests) - tests/runtime.test.ts: 12/12 (SHELL allowlist + 4 new tests) - tests/core/server.test.ts SSRF block: 12/12 (classifyIp + ssrfGuard source-grep) - tests/statusline.test.ts: 13/13 (B4 cross-OS + schemaVersion handling) - tests/hooks/cache-heal-self-heal.test.ts: 24/24 (consolidated) - tests/adapters/memory-conventions.test.ts: 62/62 (getConfigDir contract) - Full vitest run: 2170/2188 pass, 19 skipped, 14 pre-existing failures (8 opencode config-paths + 6 ctx_index/ctx_execute_file projectRoot resolution — both documented in PRD-concurrency-architectural §8 and Diagnose review baseline; both resolve via `npm run build`). - npx tsc --noEmit clean. Co-Authored-By: Mickey Lazarevic <noreply@github.com> * fix(stats): persist dollars_saved_lifetime for statusline (#402) fix(stats): persist dollars_saved_lifetime so statusline can render the brand-poem triad The README shipped in 58a60d8 promises: context-mode ● $0.42 saved this session · $12.30 saved across sessions · 87% efficient · 23m bin/statusline.mjs reads `stats.dollars_saved_lifetime ?? 0` and only renders the "saved across sessions" block when > 0. After the b392c2f concurrency refactor + e638bd6 analytics restoration, getLifetimeStats came back, but persistStats() never wired it into the JSON sidecar — the statusline would always read 0 and the "remembers more" half of the brand poem (talks-less / remembers / costs-less) would never render. Wire `getLifetimeStats({ sessionsDir: getSessionDir() })` into persistStats() with a 30s cache. The 500ms persist throttle would be too aggressive for a function that scans every per-project SessionDB plus the auto-memory dir; the statusline doesn't need second-by-second lifetime accuracy. 
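The 30s cache around the lifetime scan might look roughly like the sketch below. It is a minimal illustration, assuming a hypothetical `scan` callback standing in for the expensive disk scan; `makeCachedLifetime` and `LIFETIME_TTL_MS` are invented names, and the retry-throttling choice after a failed scan is an assumption of this sketch rather than something the commit states.

```typescript
// Illustrative only: names below are hypothetical, not taken from src/server.ts.
const LIFETIME_TTL_MS = 30_000;

interface LifetimeCache {
  value: number;      // last successfully computed figure (0 until the first success)
  computedAt: number; // timestamp of the last attempt, -Infinity before any attempt
}

function makeCachedLifetime(
  scan: () => number,              // stand-in for the expensive lifetime scan
  now: () => number = Date.now,    // injectable clock, handy for tests
): () => number {
  const cache: LifetimeCache = { value: 0, computedAt: -Infinity };
  return () => {
    const t = now();
    // Serve the cached value while it is younger than the TTL.
    if (t - cache.computedAt < LIFETIME_TTL_MS) return cache.value;
    // Sketch-level choice: record the attempt time even if the scan fails,
    // so a broken scan is retried at most once per TTL window.
    cache.computedAt = t;
    try {
      cache.value = scan();
    } catch {
      // Best effort: keep the stale value (or 0) when the scan throws.
    }
    return cache.value;
  };
}
```

A caller on the hot persist path would then call the returned function on every write and pay for the scan at most once per 30 s window.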
Conversion factor (256 tokens/event = ~1KB ÷ 4 bytes/token) is the same one used by analytics.ts renderBottomLine, extracted to a TOKENS_PER_EVENT constant so it stays in lockstep if either side moves. Failures during the disk scan keep the stale cache (or 0) — same best-effort discipline as the surrounding persistStats() try/catch. Verification - npm run typecheck clean - npm run build clean (cli 552kb, server 511kb) - npx vitest run 74/74 files, 2130/2130 pass - targeted: tests/statusline.test.ts + lifetime-stats + stats-output-format all green (16/16) Addresses Critical 3 from the PR #399 review: #399 (comment) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ray <34021803+meikocho1@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ray <cho_meiko@okuribito-funeral.jp> Co-authored-by: 津铭 <yongtingzhang@gmail.com> Co-authored-by: Anton Okhontsev <anton.ohontsev@gmail.com> Co-authored-by: Marcus Neufeldt <MarcusNeufeldt@users.noreply.github.com> Co-authored-by: btxbtxbtx <btxbtxbtx@users.noreply.github.com> Co-authored-by: Mickey Lazarevic <mikij@users.noreply.github.com> Co-authored-by: VrianCao <45995071+VrianCao@users.noreply.github.com> Co-authored-by: Tomodad <128800342+Tomodad@users.noreply.github.com> Co-authored-by: Ousama Ben Younes <ousamabenyounes@users.noreply.github.com> Co-authored-by: Sebastian Breguel <62109266+sebastianbreguel@users.noreply.github.com> Co-authored-by: Sebastian Breguel <sebastianbreguel@gmail.com> Co-authored-by: Mickey Lazarevic <noreply@github.com> Co-authored-by: Ben Younes <benyounes.ousama@gmail.com>
Summary
Fixes #388 (and addresses the residual root cause noted in the resolution comment of #311).
The vendored MCP SDK's `StdioServerTransport.start()` registers only `'data'` and `'error'` listeners on `process.stdin`, never `'end'`. When the parent (e.g. Claude Code) dies abruptly without sending SIGTERM, the server keeps reading from a half-closed pipe and CPU-spins until the next 30 s `ppid` poll. In #388 this manifested as six orphaned processes, each pinning ~95% CPU, with the oldest accumulating 80 h of CPU time before being SIGKILL'd manually.

This PR adds a single `'end'` listener inside `startLifecycleGuard`. Crucially, it does not shut down on `'end'` alone; that is exactly the false-positive behavior #236 tore out. Instead, `'end'` triggers the same `isParentAlive()` probe the periodic timer already runs, just sooner:

- parent alive → no-op (#236 regression test still passes).
- parent dead → 30 s detection window collapses to ~0.

Skipped on TTY (e.g. OpenCode ts-plugin), where `stdin` is not the MCP channel.

Why not "fix the SDK"

The clean fix is to register `'end'` in `StdioServerTransport.start()` upstream in `@modelcontextprotocol/sdk`. That's worth doing too, but:

- […] `server.bundle.mjs` so context-mode users won't get the upstream fix without a re-bundle anyway.
- […] `lifecycle.ts` keeps the policy (when to shut down) where the rest of the policy already lives, out of the SDK transport.

Why this doesn't reintroduce #236

#236 removed `process.stdin.resume()` and the `'close'`/`'error'` shutdown shortcuts because those fired on transient pipe events and called `process.exit(0)` mid-request. This PR:

- […] `'end'` (a one-shot EOF signal, not a transient event),
- […] `process.stdin.resume()` (the SDK's `'data'` listener already puts stdin into flowing mode),
- […] `isParentAlive()` gate must also report parent death.

The existing `child does NOT exit when stdin is closed` integration test still passes unchanged.

Diff shape

Tests

- […] `bun run test tests/lifecycle.test.ts`), including the unchanged #236 integration test.
- […] `'end'` twice: once with `isParentAlive: () => true` (asserts no shutdown), once with `isParentAlive: () => false` (asserts shutdown). Pins both halves of the contract.
- […] `'end'` listener delta from `startLifecycleGuard`, while pinning that `'close'`, `'data'`, `'error'`, and `'readable'` remain untouched and that the `'end'` listener is removed on cleanup. Skipped on TTY.
- `bunx tsc --noEmit` is clean.

Reproduction (from #388)

All in `R` state. `pkill -TERM` had no effect; only `SIGKILL` worked. With this patch, an `'end'` event from the parent's pipe close triggers the existing `isParentAlive()` check immediately, so a dead parent shuts the server down before it can accumulate even one minute of busy-loop time.
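For readers who want the shape of the contract without opening the diff, here is a minimal sketch of an EOF-triggered liveness re-check. It is not the code in `lifecycle.ts`: `watchStdinEof`, `StdinLike`, and `GuardOptions` are hypothetical names, and the stream and probe are injected so the behavior can be exercised without a real parent process.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the project's code.
interface StdinLike {
  isTTY?: boolean;
  on(event: "end", listener: () => void): unknown;
  removeListener(event: "end", listener: () => void): unknown;
}

interface GuardOptions {
  stdin: StdinLike;             // in real use this would presumably be process.stdin
  isParentAlive: () => boolean; // same probe the periodic timer runs
  onShutdown: () => void;       // graceful shutdown hook
}

// Registers one 'end' listener and returns a cleanup function that removes it.
function watchStdinEof(opts: GuardOptions): () => void {
  // On a TTY, stdin is not the MCP channel, so there is nothing to watch.
  if (opts.stdin.isTTY) return () => {};

  const onEnd = (): void => {
    // EOF alone is not treated as proof that the parent died.
    // It only triggers the liveness probe earlier than the next poll.
    if (!opts.isParentAlive()) opts.onShutdown();
  };

  opts.stdin.on("end", onEnd);

  // Cleanup touches only the listener added above.
  return () => {
    opts.stdin.removeListener("end", onEnd);
  };
}
```

Injecting `isParentAlive` mirrors how the unit test described above emits `'end'` under both alive and dead parent conditions. The periodic 30 s poll is deliberately not modeled here, since `'end'` is meant to complement it rather than replace it.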