fix(codegraph): stop leaking detached MCP daemons and indexing the whole drive (#3747) by esengine · Pull Request #3755 · esengine/DeepSeek-Reasonix

esengine · 2026-06-10T02:32:50Z

Problem

#3747: after a session, dozens of orphaned codegraph serve --mcp node processes survived (~1.7 GB), and one instance launched with cwd C:\ indexed the entire system drive (~1 GB).

Two independent Reasonix-side root causes.

1. The job object lost the race against the launcher

Each stdio MCP server starts as cmd.exe -> node.exe; CodeGraph's shim re-parents the node daemon off the launcher. TrackTree assigned the process to its kill-on-close Job Object after cmd.Start() returned — but a fast .cmd shim can exec node and exit before that assignment lands, leaving node.exe orphaned in no job. KillTracked (and an abrupt reasonix exit, which relies on the job handle closing) then can't reap it, so every failed/closed handshake leaks a daemon.

Fix: proc.StartTracked creates the child suspended, assigns it to the job while it is still frozen, then resumes it — so the launcher and every descendant it spawns are captured before any code runs. The process is always resumed (even if assignment fails) so a child can never be left wedged suspended.

2. A filesystem-root cwd indexed the whole volume

CodeGraph is cwd-aware; when the project root resolved to C:\ it walked the entire drive. codegraph.IndexableRoot now rejects drive roots, UNC share roots, /, and an empty root — checked at both spawn sites (boot and the /codegraph connect path) before launching serve.
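A rough sketch of the kind of check this describes, using plain string handling so it behaves the same on any host OS. `looksIndexable` is a hypothetical helper, not the real `codegraph.IndexableRoot`, whose signature and exact rules are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// looksIndexable reports whether root names a directory below a volume root.
// It returns false for an empty root, "/", drive roots such as C:\, and UNC
// share roots such as \\server\share. Hypothetical helper for illustration.
func looksIndexable(root string) bool {
	// Normalize separators so Windows and POSIX forms share one code path.
	s := strings.ReplaceAll(strings.TrimSpace(root), `\`, "/")
	if s == "" {
		return false
	}
	// UNC path: //server/share needs at least one component after the share.
	if strings.HasPrefix(s, "//") {
		parts := strings.FieldsFunc(s, func(r rune) bool { return r == '/' })
		return len(parts) > 2
	}
	s = strings.TrimRight(s, "/")
	if s == "" { // the input was "/" (or only slashes)
		return false
	}
	// A bare drive designator ("C:") is what remains of "C:\" after trimming.
	if len(s) == 2 && s[1] == ':' {
		return false
	}
	return true
}

func main() {
	roots := []string{`C:\`, `\\server\share`, "/", "", `C:\src\app`, "/home/dev/app"}
	for _, root := range roots {
		fmt.Printf("%q indexable=%v\n", root, looksIndexable(root))
	}
}
```

A caller would run a check like this at each spawn site and skip launching serve when it returns false.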

Tests

proc: job assignment + reap through StartTracked; a deterministic resume check (a suspended child that must run to exit 7, so a missed resume fails instead of hanging).
codegraph: IndexableRoot rejects C:\, \\server\share, /, and "".

Verified locally on Windows 11: golangci-lint clean, go vet, and the proc / codegraph / plugin / boot / control package tests pass.

Not in this PR

Dedup of concurrent codegraph spawns and an idle-timeout watchdog (#3747 items 2-3) are follow-ups. With reaping now reliable, leaked processes are reaped on close and on exit, which removes the surviving-orphan harm; dedup and an idle timeout are optimizations on top.

Closes #3747

… MCP children are reaped

A Windows stdio MCP launcher (cmd.exe -> node.exe, as the CodeGraph daemon re-parents itself off its shim) raced the job-object assignment: the job was assigned only after Start returned, so a fast shim could exec its grandchild and exit before the assignment, leaving node.exe orphaned in no job. KillTracked and an abrupt reasonix exit then both missed it, and dozens of codegraph daemons leaked past a session (#3747). Create the child suspended, assign it to the job, then resume, so every descendant is captured before the launcher can spawn anything.

When the workspace root resolved to a drive root (C:\), CodeGraph's cwd-aware serve --mcp walked the entire volume — C:\Windows, Program Files, everything — pinning ~1GB of RAM (#3747). Reject filesystem roots (and an empty root) at both spawn sites before launching serve.

…ole drive (esengine#3747) (esengine#3755)

* fix(proc): assign the job object before the launcher runs so detached MCP children are reaped

A Windows stdio MCP launcher (cmd.exe -> node.exe, as the CodeGraph daemon re-parents itself off its shim) raced the job-object assignment: the job was assigned only after Start returned, so a fast shim could exec its grandchild and exit before the assignment, leaving node.exe orphaned in no job. KillTracked and an abrupt reasonix exit then both missed it, and dozens of codegraph daemons leaked past a session (esengine#3747). Create the child suspended, assign it to the job, then resume, so every descendant is captured before the launcher can spawn anything.

* fix(codegraph): refuse to index a filesystem root

When the workspace root resolved to a drive root (C:\), CodeGraph's cwd-aware serve --mcp walked the entire volume (C:\Windows, Program Files, everything), pinning ~1GB of RAM (esengine#3747). Reject filesystem roots (and an empty root) at both spawn sites before launching serve.

Co-authored-by: reasonix <reasonix@deepseek.com>

reasonix added 2 commits June 9, 2026 19:31

esengine requested a review from SivanCola as a code owner June 10, 2026 02:32

github-actions Bot added the v2, agent, mcp, and config labels Jun 10, 2026

esengine merged commit 2818888 into main-v2 Jun 10, 2026
14 checks passed

esengine deleted the fix/mcp-codegraph-process-leak branch June 10, 2026 02:44

This was referenced Jun 10, 2026

fix(proc): reap codegraph's process tree off Windows via process groups #3787

Merged

fix(macOS): reap entire codegraph process tree on exit (Setpgid + negative-PID kill) #3735

Closed

Bernardxu123 mentioned this pull request Jun 10, 2026

[Meta] Issues grouped review report: categorized by module & sorted by priority #3275

Open
