fix(windows): prevent conhost.exe zombie leak + fix agentEntry type narrowing#30060

Open
edincampara wants to merge 1 commit into openclaw:main from edincampara:fix/windows-conhost-leak

Conversation

@edincampara
Contributor

@edincampara edincampara commented Feb 28, 2026

Changes

1. fix(windows): prevent conhost.exe zombie process leak

On Windows, the gateway accumulates hundreds of zombie `conhost.exe` (Windows Console Host) processes over time, one per cron execution. These are never reaped and consume significant RAM.

Root cause: In `src/infra/process-respawn.ts`, `restartGatewayProcessWithFreshPid()` spawns a detached child with `stdio: 'inherit'`. On Windows this causes Node.js to allocate a `conhost.exe` for the child's console I/O. When `child.unref()` is called, the Node event loop detaches from the child, but the `conhost.exe` stays alive as a zombie indefinitely.

Observed impact (real production system, 9 cron jobs, ~7h uptime):

  • 404 zombie `conhost.exe` processes
  • 3.5 GB RAM consumed by zombies
  • Periodic CPU spikes
  • Gateway required manual restart to recover

Fix: Add `windowsHide: true` to the `spawn()` options. This suppresses console window (and `conhost.exe`) allocation on Windows entirely. It is a no-op on macOS/Linux, so there is zero cross-platform risk.

```ts
// src/infra/process-respawn.ts
const child = spawn(process.execPath, args, {
  env: process.env,
  detached: true,
  stdio: 'inherit',
  windowsHide: true, // prevents conhost.exe allocation on Windows
});
```
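
For readers who want to try the pattern outside the gateway, here is a minimal self-contained sketch of the same spawn-then-unref flow. The helper names `buildRespawnOptions` and `respawnDetached` are hypothetical and are not taken from `process-respawn.ts`. The sketch assumes the behavior described above, namely that `windowsHide: true` is what keeps the console host from lingering.

```ts
import { spawn } from "node:child_process";
import type { ChildProcess, SpawnOptions } from "node:child_process";

// Hypothetical helper: collects the options for a detached respawn in one place
// so the Windows-specific flag is easy to spot and to unit-test.
function buildRespawnOptions(env: NodeJS.ProcessEnv): SpawnOptions {
  return {
    env,
    detached: true, // child keeps running after the parent exits
    stdio: "inherit", // child shares the parent's stdio handles
    windowsHide: true, // hides the console window on Windows; ignored on other platforms
  };
}

// Hypothetical wrapper: spawns a fresh Node process and lets the parent's
// event loop stop waiting for it.
function respawnDetached(args: string[]): ChildProcess {
  const child = spawn(process.execPath, args, buildRespawnOptions(process.env));
  child.unref();
  return child;
}
```

Keeping the options in a pure function means the presence of `windowsHide` can be checked in a test without spawning anything.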


2. fix(types): guard agentEntry against false before accessing heartbeat

Pre-existing TypeScript error caught by CI:

```
src/gateway/server-cron.ts(197,24): error TS2339: Property 'heartbeat' does not exist on type 'false | AgentConfig'
```

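`agentEntry` comes from an `Array.find()` lookup and, per the error above, is typed `false | AgentConfig`, so it can be `false` when no matching agent is found. Optional chaining only short-circuits on `null` and `undefined`, so the spread `...agentEntry?.heartbeat` doesn't narrow the `false` case.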

Fix: Explicit type guard before spreading:

```ts
// before
...agentEntry?.heartbeat,

// after
...(agentEntry && agentEntry !== false ? agentEntry.heartbeat : undefined),
```
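
The narrowing issue can be reproduced in isolation. The sketch below is illustrative only: the `HeartbeatConfig` and `AgentConfig` shapes and the `findAgent` and `resolveHeartbeat` helpers are assumptions, not the code in `server-cron.ts`. It shows one way a lookup can end up typed as `false | AgentConfig | undefined`, and how a truthiness check narrows it before spreading.

```ts
// Hypothetical shapes, for illustration only.
interface HeartbeatConfig {
  every?: string;
  target?: string;
}

interface AgentConfig {
  id: string;
  heartbeat?: HeartbeatConfig;
}

// `A && B` evaluates to `false` when the left side is false, so this lookup
// can produce `false` in addition to the `undefined` that find() returns.
function findAgent(
  agents: AgentConfig[] | undefined,
  id: string,
): false | AgentConfig | undefined {
  return Array.isArray(agents) && agents.find((a) => a.id === id);
}

function resolveHeartbeat(
  defaults: HeartbeatConfig,
  agentEntry: false | AgentConfig | undefined,
): HeartbeatConfig {
  return {
    ...defaults,
    // `agentEntry?.heartbeat` would not compile here: `?.` skips only null and
    // undefined, so `false` still reaches the property access.
    // A truthiness check removes `false` and `undefined` in one step.
    ...(agentEntry ? agentEntry.heartbeat : undefined),
  };
}
```

Spreading `undefined` into an object literal adds no properties, so the defaults pass through unchanged when no agent matches.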


Tested on

  • Windows 10 22H2, Node.js v24.13.0
  • Before conhost fix: 404 zombies @ 3.5 GB after 7h with 9 active cron jobs
  • After conhost fix: 0 `conhost.exe` accumulation
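
One possible way to gather before/after numbers like these is to tally the CSV output of `tasklist /FI "IMAGENAME eq conhost.exe" /FO CSV /NH`. The parser below is a sketch, not the method used for the measurements above. The name `tallyTasklistCsv` is hypothetical, and it assumes the usual five-column `tasklist` CSV layout (image name, PID, session name, session number, memory usage).

```ts
interface ProcessTally {
  count: number;
  memoryKb: number;
}

// Hypothetical helper: counts rows for one image name in `tasklist` CSV output
// and sums the memory column (reported in KB, e.g. "5,432 K").
function tallyTasklistCsv(csv: string, imageName: string): ProcessTally {
  let count = 0;
  let memoryKb = 0;
  for (const line of csv.split(/\r?\n/)) {
    // Every field is double-quoted; the memory field may itself contain commas,
    // so fields are extracted by their quotes rather than by splitting on ",".
    const fields = line.match(/"([^"]*)"/g)?.map((f) => f.slice(1, -1)) ?? [];
    const name = fields[0] ?? "";
    const mem = fields[4] ?? "";
    if (fields.length < 5 || name.toLowerCase() !== imageName.toLowerCase()) {
      continue; // skips blank lines and the "INFO: No tasks..." message
    }
    count += 1;
    // Drop thousands separators and the unit suffix before converting.
    memoryKb += Number(mem.replace(/\D/g, "")) || 0;
  }
  return { count, memoryKb };
}
```

The command output could be captured with `execFileSync` from `node:child_process` and passed straight to this function. Sampling it periodically would show whether the count grows with each cron run.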

@greptile-apps
Contributor

greptile-apps bot commented Feb 28, 2026

Greptile Summary

Adds windowsHide: true to spawn options in restartGatewayProcessWithFreshPid() to prevent conhost.exe process leaks on Windows. The combination of detached: true + stdio: "inherit" + unref() causes Node.js to allocate a Console Host process on Windows that never gets reaped, accumulating hundreds of zombie processes over time (404 processes / 3.5GB RAM observed in production after 7 hours with 9 cron jobs).

  • Fixed by adding windowsHide: true to the spawn options, which suppresses console window allocation and is a no-op on macOS/Linux
  • Added detailed comment explaining the root cause and solution
  • Other spawn calls in the codebase either use stdio: "ignore" (which doesn't create console windows) or already include windowsHide: true

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The fix is minimal (one line + comment), uses the standard Node.js solution for this exact problem, and is explicitly a no-op on non-Windows platforms. The change is well-documented and addresses a real production issue without introducing any side effects or breaking changes.
  • No files require special attention

Last reviewed commit: e3c2676

@openclaw-barnacle openclaw-barnacle bot added the gateway Gateway runtime label Feb 28, 2026
@edincampara edincampara changed the title from "fix(windows): prevent conhost.exe zombie process leak on Windows" to "fix(windows): prevent conhost.exe zombie leak + fix agentEntry type narrowing" Feb 28, 2026

@nikolasdehor nikolasdehor left a comment

The windowsHide: true fix is clean, well-documented with production data (404 zombies, 3.5 GB), and is a no-op on non-Windows. One note: the agentEntry type narrowing conflicts with PRs #30048 and #30063; #30048's ternary approach should take precedence. Otherwise LGTM.

…dows

On Windows, spawning a detached child process with stdio:'inherit' causes
Node.js to allocate a conhost.exe (Windows Console Host) for the child.
When child.unref() is called, the Node event loop detaches but the conhost
process is never reaped -- it stays alive as a zombie.

With cron jobs running nightly, these accumulate rapidly. In production we
observed 404 zombie conhost.exe processes after ~7 hours, consuming 3.5 GB
of RAM and causing CPU spikes when they all become schedulable simultaneously.

Fix: add windowsHide:true to the spawn options in restartGatewayProcessWithFreshPid().
This suppresses console window (and conhost.exe) allocation on Windows entirely.
It is a no-op on macOS and Linux, so there is zero cross-platform risk.

Tested on: Windows 10 22H2, Node.js v24.13.0
Before: 404 conhost.exe @ 3.5 GB RAM after 7h uptime with 9 cron jobs
After:  0 conhost.exe accumulation
@edincampara edincampara force-pushed the fix/windows-conhost-leak branch from 92a8b59 to 36591e0 Compare March 1, 2026 00:39
@openclaw-barnacle openclaw-barnacle bot removed the gateway Gateway runtime label Mar 1, 2026
@edincampara
Contributor Author

Thanks @nikolasdehor! Agreed on the type narrowing — I've already rebased and dropped those commits since #30048's approach is cleaner. The branch now has only the windowsHide fix on top of latest main, conflict-free.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale Marked as stale due to inactivity label Mar 6, 2026
Labels

size: XS stale Marked as stale due to inactivity

2 participants