Skip to content

fix(codegraph): shorten shared daemon idle cleanup safely / 安全缩短共享 daemon 空闲清理#4332

Merged
SivanCola merged 2 commits into
esengine:main-v2from
heshuimu:fix/codegraph-daemon-leak
Jun 14, 2026
Merged

fix(codegraph): shorten shared daemon idle cleanup safely / 安全缩短共享 daemon 空闲清理#4332
SivanCola merged 2 commits into
esengine:main-v2from
heshuimu:fix/codegraph-daemon-leak

Conversation

@heshuimu

@heshuimu heshuimu commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

CodeGraph serve --mcp starts a root-scoped shared daemon that detaches into its own process group. The launcher/proxy can be killed by Reasonix cleanup, but the daemon intentionally outlives any single MCP client and exits only after its idle timer once the last client disconnects.

This PR keeps that shared-client lifecycle intact while making Reasonix-launched daemons short-lived after use.

Changes:

  • Set CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1000 on the built-in CodeGraph MCP spec, so a daemon started by Reasonix exits quickly after the last client disconnects instead of lingering for the default 5 minutes.
  • Do not call KillDaemon(root) from normal ctrl.Close() cleanup. Closing a controller now only tears down Reasonix’s proxy connection; CodeGraph’s own refcount decides when the shared daemon may exit.
  • Keep KillDaemon(root) for explicit test/diagnostic cleanup, but harden it to read .codegraph/daemon.pid and require the daemon socket hello to match the recorded pid/socket/protocol before sending a direct kill. This avoids killing an unrelated process after stale PID reuse.
  • Update E2E cleanup comments and add focused coverage for lock parsing, PID-reuse protection, matching-socket cleanup, and the boot-time idle-timeout env contract.

Verification:

  • go test ./internal/codegraph
  • go test ./internal/boot
  • git diff --check

Authorship note: @heshuimu is the primary author of the original PR. @SivanCola contributed the follow-up hardening commit that preserves CodeGraph shared-daemon semantics and guards explicit daemon cleanup against stale PID reuse.

CodeGraph serve --mcp starts a daemon that calls setsid(2) to detach
into its own process group.  On Unix, KillTree sends SIGKILL to the
launcher's process group (syscall.Kill(-pid, SIGKILL)) — but the
daemon has already moved to a new group, so it survives and runs until
its 5-minute idle timeout fires.  (Windows is unaffected: the Job
Object with KILL_ON_JOB_CLOSE captures all descendants regardless of
reparenting.)

Fix:

- codegraph.DaemonPID(root) parses the last daemon PID from
  .codegraph/daemon.log
- codegraph.KillDaemon(root) sends a direct SIGKILL to that PID
- boot.Build chains KillDaemon into the cleanup closure so every
  ctrl.Close() path — normal exit, model switch, tab rebuild,
  desktop shutdown — reaps the escaped daemon
- Both E2E tests now clean up their daemons; a new E2E test
  (TestE2ECodegraphKillDaemon) proves the daemon survives
  process-group kill and dies to the explicit KillDaemon call
@github-actions github-actions Bot added v2 Go rewrite (1.x) — main-v2 branch, active development mcp MCP servers / plugins (internal/plugin, codegraph) config Configuration & setup (internal/config) labels Jun 13, 2026
Use CodeGraph’s idle-timeout knob instead of killing the root-scoped daemon from controller cleanup, so active clients on the same project keep their shared daemon alive.

Harden explicit daemon cleanup by reading the structured daemon.pid lockfile and requiring a matching socket hello before sending a direct kill, preventing stale log/PID reuse from killing unrelated processes.
@SivanCola SivanCola changed the title fix(codegraph): kill daemon that escapes process-group cleanup fix(codegraph): shorten shared daemon idle cleanup safely / 安全缩短共享 daemon 空闲清理 Jun 14, 2026
@SivanCola

Copy link
Copy Markdown
Collaborator

Follow-up hardening pushed in edd4996f.

What changed:

  • Replaced the controller-cleanup hard kill with CodeGraph's own idle-timeout knob: Reasonix-launched daemons now use CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1000, so they exit shortly after the last client disconnects while still staying alive for other active clients on the same project root.
  • Kept KillDaemon(root) only for explicit test/diagnostic cleanup, and hardened it to require a structured .codegraph/daemon.pid record plus a matching daemon socket hello before sending a direct kill. This avoids stale-log/PID-reuse kills.
  • Added focused tests for the idle-timeout env contract and safe explicit daemon cleanup.

Verified:

  • go test ./internal/codegraph
  • go test ./internal/boot
  • git diff --check

@SivanCola SivanCola left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approve

@SivanCola

Copy link
Copy Markdown
Collaborator

@codex review

@SivanCola SivanCola merged commit 9a30fcc into esengine:main-v2 Jun 14, 2026
14 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edd4996fc3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/boot/boot.go
esengine pushed a commit that referenced this pull request Jun 14, 2026
…4363)

Extract a shared codegraph.MCPSpec helper so the built-in CodeGraph MCP spec (daemon idle-timeout env, codegraph_ prefix strip, low-priority, read-only tool names) stays identical across normal boot, token-economy on-demand enablement, and manual MCP reconnect. The on-demand and reconnect paths previously drifted and omitted the idle-timeout env (and prefix/priority), which this centralizes. Follow-up to #4332.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config Configuration & setup (internal/config) mcp MCP servers / plugins (internal/plugin, codegraph) v2 Go rewrite (1.x) — main-v2 branch, active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants