fix(codegraph): stop leaking detached MCP daemons and indexing the whole drive (#3747) #3755

Merged
esengine merged 2 commits into main-v2 from fix/mcp-codegraph-process-leak
Jun 10, 2026
@esengine

Owner

Problem

#3747: after a session, dozens of orphaned codegraph serve --mcp node processes survived (~1.7 GB), and one instance launched with cwd C:\ indexed the entire system drive (~1 GB).

Two independent Reasonix-side root causes.

1. The job object lost the race against the launcher

Each stdio MCP server starts as cmd.exe -> node.exe; CodeGraph's shim re-parents the node daemon off the launcher. TrackTree assigned the process to its kill-on-close Job Object after cmd.Start() returned — but a fast .cmd shim can exec node and exit before that assignment lands, leaving node.exe orphaned in no job. KillTracked (and an abrupt reasonix exit, which relies on the job handle closing) then can't reap it, so every failed/closed handshake leaks a daemon.

Fix: proc.StartTracked creates the child suspended, assigns it to the job while it is still frozen, then resumes it — so the launcher and every descendant it spawns are captured before any code runs. The process is always resumed (even if assignment fails) so a child can never be left wedged suspended.

2. A filesystem-root cwd indexed the whole volume

CodeGraph is cwd-aware; when the project root resolved to C:\ it walked the entire drive. codegraph.IndexableRoot now rejects drive roots, UNC share roots, /, and an empty root — checked at both spawn sites (boot and the /codegraph connect path) before launching serve.
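A root check of this kind can be written as a pure string function. The following is a minimal sketch under stated assumptions, not the actual `codegraph.IndexableRoot`: the name `indexableRoot`, its boolean signature, and the decision to treat paths purely as strings (so Windows-style paths can be checked on any platform) are all choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// indexableRoot is a hypothetical stand-in for codegraph.IndexableRoot.
// It reports whether root looks like a real project directory rather than
// a filesystem root. Paths are handled as plain strings.
func indexableRoot(root string) bool {
	p := strings.ReplaceAll(strings.TrimSpace(root), `\`, "/")
	if p == "" {
		return false // empty root
	}
	if strings.HasPrefix(p, "//") {
		// UNC path: //server/share is the share root, so at least one
		// more path component is required.
		parts := strings.FieldsFunc(p[2:], func(r rune) bool { return r == '/' })
		return len(parts) > 2
	}
	trimmed := strings.TrimRight(p, "/")
	if trimmed == "" {
		return false // "/" (or only slashes)
	}
	if len(trimmed) == 2 && trimmed[1] == ':' {
		c := trimmed[0]
		if (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') {
			return false // drive root such as C:\ or C:
		}
	}
	return true
}

func main() {
	for _, root := range []string{`C:\`, `\\server\share`, "/", "", `C:\src\project`} {
		fmt.Printf("%q indexable: %v\n", root, indexableRoot(root))
	}
}
```

A caller would run this check at each spawn site and skip launching `serve` when it returns false.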

Tests

  • proc: job assignment + reap through StartTracked; a deterministic resume check (a suspended child that must run to exit 7, so a missed resume fails instead of hanging).
  • codegraph: IndexableRoot rejects C:\, \\server\share, /, and "".

Verified locally on Windows 11: golangci-lint clean, go vet, and the proc / codegraph / plugin / boot / control package tests pass.

Not in this PR

Dedup of concurrent codegraph spawns and an idle-timeout watchdog (#3747 items 2-3) are follow-ups. With reaping now reliable, leaked handles are reaped on close and on exit, which removes the surviving-orphan harm; dedup and the idle timeout are optimizations on top.

Closes #3747

reasonix added 2 commits June 9, 2026 19:31
… MCP children are reaped

A Windows stdio MCP launcher (cmd.exe -> node.exe, as the CodeGraph
daemon re-parents itself off its shim) raced the job-object assignment:
the job was assigned only after Start returned, so a fast shim could exec
its grandchild and exit before the assignment, leaving node.exe orphaned
in no job. KillTracked and an abrupt reasonix exit then both missed it,
and dozens of codegraph daemons leaked past a session (#3747).

Create the child suspended, assign it to the job, then resume — so every
descendant is captured before the launcher can spawn anything.

fix(codegraph): refuse to index a filesystem root

When the workspace root resolved to a drive root (C:\), CodeGraph's
cwd-aware serve --mcp walked the entire volume — C:\Windows, Program
Files, everything — pinning ~1GB of RAM (#3747). Reject filesystem roots
(and an empty root) at both spawn sites before launching serve.
@esengine esengine requested a review from SivanCola as a code owner June 10, 2026 02:32
@github-actions github-actions Bot added labels Jun 10, 2026: v2 (Go rewrite (1.x), main-v2 branch, active development); agent (Core agent loop: internal/agent, internal/control); mcp (MCP servers / plugins: internal/plugin, codegraph); config (Configuration & setup: internal/config)
@esengine esengine merged commit 2818888 into main-v2 Jun 10, 2026
14 checks passed
@esengine esengine deleted the fix/mcp-codegraph-process-leak branch June 10, 2026 02:44
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
…ole drive (esengine#3747) (esengine#3755)

* fix(proc): assign the job object before the launcher runs so detached MCP children are reaped

A Windows stdio MCP launcher (cmd.exe -> node.exe, as the CodeGraph
daemon re-parents itself off its shim) raced the job-object assignment:
the job was assigned only after Start returned, so a fast shim could exec
its grandchild and exit before the assignment, leaving node.exe orphaned
in no job. KillTracked and an abrupt reasonix exit then both missed it,
and dozens of codegraph daemons leaked past a session (esengine#3747).

Create the child suspended, assign it to the job, then resume — so every
descendant is captured before the launcher can spawn anything.

* fix(codegraph): refuse to index a filesystem root

When the workspace root resolved to a drive root (C:\), CodeGraph's
cwd-aware serve --mcp walked the entire volume — C:\Windows, Program
Files, everything — pinning ~1GB of RAM (esengine#3747). Reject filesystem roots
(and an empty root) at both spawn sites before launching serve.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
Development

Successfully merging this pull request may close these issues.

MCP process leak: 35 orphaned codegraph processes (1.7GB) + C:\ drive indexer (1GB)
