Summary
On a host where the OpenClaw gateway runs continuously and openclaw agent is invoked many times (in our case, once per inbound email via a third-party Microsoft Graph bridge), MCP stdio server children spawned for the local-bridge MCP integration are never reaped. They accumulate at roughly 66 MB RSS each with no upper bound until the gateway is restarted.
The MCP server itself is a textbook stdio implementation using @modelcontextprotocol/sdk@^1.12.0:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
const server = new McpServer({ name: 'local-bridge-mcp', version: '0.1.0' });
// ... server.registerTool(...) ×N ...
const transport = new StdioServerTransport();
await server.connect(transport);
The same server.js works correctly under Claude Code and other MCP hosts — the children exit cleanly when the host closes our stdin or sends a shutdown request, and Node exits naturally because nothing is left holding the event loop. So the MCP server code is not the issue.
Root cause (hypothesis)
The leak is an OpenClaw-side lifecycle bug, not an MCP-protocol or SDK-side bug. The diagnostic that points there:
$ ps -A -o pid,ppid,etime,command | grep -E 'openclaw-(agent|infer|gateway)'
10828 10826 58:54 openclaw-infer
11260 11257 39:09 openclaw-agent
11463 1 28:58 openclaw-gateway
The first two are one-shot CLI invocations that should have exited within seconds of finishing a single agent turn, yet they are still alive many minutes after the work completes; the third is the long-running gateway daemon. Because the CLI processes don't exit, they don't close the stdio pipes to their MCP server children. The children stay alive with their stdio transport connected, waiting on stdin forever, exactly as the MCP SDK design says they should.
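To make "staying alive for many minutes" checkable, here is a minimal sketch that converts the etime column of the ps output above into seconds and flags one-shot CLI processes that have outlived the 60-second --timeout used in the reproduction. The helper name parseEtime and the threshold constant are illustrative assumptions, not part of OpenClaw.

```javascript
// Convert a ps etime field ([[dd-]hh:]mm:ss) to seconds.
function parseEtime(etime) {
  const dash = etime.indexOf('-');
  const days = dash === -1 ? 0 : Number(etime.slice(0, dash));
  const clock = (dash === -1 ? etime : etime.slice(dash + 1)).split(':').map(Number);
  while (clock.length < 3) clock.unshift(0); // pad a missing hours field
  const [hours, minutes, seconds] = clock;
  return ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
}

// Lines copied from the ps output above: pid, ppid, etime, command.
const psOutput = `10828 10826 58:54 openclaw-infer
11260 11257 39:09 openclaw-agent
11463 1 28:58 openclaw-gateway`;

// Assumption: a one-shot CLI run should not outlive its own --timeout 60.
const TIMEOUT_S = 60;

const lingering = psOutput
  .split('\n')
  .map((line) => line.trim().split(/\s+/))
  .map(([pid, ppid, etime, ...command]) => ({
    pid: Number(pid),
    ppid: Number(ppid),
    ageS: parseEtime(etime),
    command: command.join(' '),
  }))
  // The gateway is a daemon and is expected to be long-lived, so skip it.
  .filter((p) => /^openclaw-(agent|infer)$/.test(p.command) && p.ageS > TIMEOUT_S);

console.log(lingering);
```

On the sample above this flags the openclaw-infer and openclaw-agent processes (3534 s and 2349 s old) and leaves the gateway out.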
lsof on a leaked MCP child confirms its stdin pipe is still actively connected to the parent's writing end:
$ lsof -p <leaked-mcp-pid>
node 12719 <user> 4 PIPE 0xc6b48af03ed5df52 16384 ->0x799250810c02e91f
node 12719 <user> 5 PIPE 0x799250810c02e91f 16384 ->0xc6b48af03ed5df52
Reproduction
- OpenClaw 2026.4.23 (a979721) installed and running as ai.openclaw.gateway LaunchAgent on macOS (Apple Silicon).
- An MCP server registered in ~/.openclaw/openclaw.json under mcp.servers.local-bridge using stdio transport. The server's only behavior is to register a few tools that proxy (via httpJson) to a localhost HTTP service.
- Repeatedly invoke openclaw agent --message ... --json --timeout 60 (we do this from a third-party bridge, but a shell loop reproduces).
- Observe that:
  - Every openclaw agent invocation lingers as an openclaw-agent process for many minutes after returning.
  - Each invocation also leaves behind one node .../mcp-server/server.js child of either the gateway or the agent process.
  - ps -A | grep '[m]cp-server' | wc -l grows monotonically.
  - Total RSS climbs by ~66 MB per turn.
Observed scale
On a host that has processed ~17 inbound emails since the last gateway restart:
$ ps -A -o pid,rss,etime,command | grep '[m]cp-server/server.js' \
| awk '{rss+=$2; n++} END {printf "%d processes, total RSS: %.1f MB\n", n, rss/1024}'
17 processes, total RSS: 1130.2 MB
15 of the 17 children are direct children of the gateway daemon; the other 2 are children of the abandoned openclaw-agent/openclaw-infer parents shown above.
Expected behavior
After an agent turn completes:
- Any one-shot openclaw agent / openclaw infer CLI invocation exits, returning its result.
- Any spawned MCP server children have their stdin closed (or receive a shutdown request), see EOF, and exit naturally.
- The parent reaps the child via wait() so it doesn't become a zombie.
- The long-running openclaw gateway daemon does the same per turn: spawn, use, shutdown, reap.
This is what other MCP hosts (Claude Code, Cursor, etc.) do with the same server.js, and it's the contract the MCP SDK assumes.
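The expected host-side sequence can be sketched as follows. This is a hypothetical helper, not OpenClaw's actual code: the name shutdownStdioChild and the graceMs default are assumptions. As we read the MCP spec's stdio lifecycle, shutdown is signalled by closing the child's stdin and escalating to SIGTERM and then SIGKILL if the child does not exit, so the sketch follows that order. Node reaps the child itself once the exit event fires.

```javascript
import { spawn } from 'node:child_process';

// Hypothetical helper: shut down one stdio child by closing its stdin,
// then escalate if it does not exit in time. Assumes child.stdin is a pipe.
function shutdownStdioChild(child, { graceMs = 2000 } = {}) {
  return new Promise((resolve) => {
    // Already exited: nothing left to do.
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve({ code: child.exitCode, signal: child.signalCode });
      return;
    }
    const termTimer = setTimeout(() => child.kill('SIGTERM'), graceMs);
    const killTimer = setTimeout(() => child.kill('SIGKILL'), graceMs * 2);
    child.once('exit', (code, signal) => {
      clearTimeout(termTimer);
      clearTimeout(killTimer);
      resolve({ code, signal });
    });
    child.stdin.end(); // the child sees EOF on its stdin
  });
}

// Usage with a stand-in child that, like a stdio MCP server, stays alive
// only as long as its stdin is open.
const child = spawn(process.execPath, ['-e', 'process.stdin.resume()'], {
  stdio: ['pipe', 'ignore', 'inherit'],
});
const result = await shutdownStdioChild(child);
console.log(result);
```

Calling something like this once per turn (or once per gateway session, if children are meant to be reused) would keep the number of live MCP children bounded.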
Workaround we deployed (defensive, not a fix)
We added a hard max-lifetime + parent-watch in our MCP server to cap the leak:
const MAX_LIFETIME_MS = parseInt(process.env.MCP_MAX_LIFETIME_MS || '600000', 10);
setTimeout(() => {
console.error(`[mcp] max lifetime ${MAX_LIFETIME_MS}ms reached, exiting`);
process.exit(0);
}, MAX_LIFETIME_MS).unref();
setInterval(() => {
try { process.kill(process.ppid, 0); }
catch { console.error('[mcp] parent gone, exiting'); process.exit(0); }
}, 30_000).unref();
Both timers call .unref() so they don't keep the event loop alive on their own. This bounds the leak per child but doesn't address the underlying lifecycle issue in OpenClaw; that has to be fixed upstream.
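One caveat on the parent-watch: once the original parent dies, the child is reparented (typically to PID 1), so process.kill(process.ppid, 0) probes the new parent. As far as we can tell, for an unprivileged user that probe throws EPERM and the bare catch treats it as "parent gone", but under root it would succeed and the check would never fire. A variant that remembers the parent PID at startup avoids relying on that. This is a minimal sketch, not what we deployed; isOrphaned and startParentWatch are hypothetical names.

```javascript
// Hypothetical variant of the parent-watch: remember the parent PID at
// startup and treat any later change as "parent gone".
function isOrphaned(initialPpid, currentPpid) {
  return currentPpid !== initialPpid;
}

function startParentWatch({
  intervalMs = 30_000,
  getPpid = () => process.ppid, // injectable so the check can be tested
  onOrphaned = () => {
    console.error('[mcp] parent gone, exiting');
    process.exit(0);
  },
} = {}) {
  const initialPpid = getPpid();
  const timer = setInterval(() => {
    if (isOrphaned(initialPpid, getPpid())) {
      clearInterval(timer);
      onOrphaned();
    }
  }, intervalMs);
  timer.unref(); // don't keep the event loop alive just for the watch
  return timer;
}
```

In server.js this would be a single startParentWatch() call before server.connect(transport). It still only covers a parent that has died; a parent that lingers with the pipe open, as in this report, is caught by the max-lifetime timer instead.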
Environment
- OpenClaw 2026.4.23 (a979721) installed via npm i -g openclaw
- macOS (Apple Silicon Mac mini)
- Node.js >=20
- @modelcontextprotocol/sdk@^1.12.0
- Gateway running as ai.openclaw.gateway LaunchAgent on port 18789
Why this matters
For an agent that runs continuously (e.g., embedded in any kind of always-on automation such as email, calendar, or cron-like flows), the leak is unbounded. At ~66 MB per turn and steady traffic of even tens of turns per day, the host runs out of RAM in a week or two without external intervention. The workaround above is acceptable as belt and suspenders, but the right fix has to be on the host (OpenClaw) side: either send shutdown per the MCP spec, or close stdio and reap the child after each turn.
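The "week or two" estimate can be reproduced with a few lines of arithmetic. The per-child figure comes from the measurement above (1130.2 MB across 17 processes); the 16 GB of memory available to leak into and the traffic levels are assumptions chosen for illustration, not measurements from our host.

```javascript
// Measured above: 17 leaked children, 1130.2 MB total RSS.
const mbPerTurn = 1130.2 / 17; // roughly 66.5 MB leaked per agent turn

// Days until the leak alone consumes availableMb at a steady turn rate.
function daysUntilExhausted(availableMb, turnsPerDay, perTurnMb = mbPerTurn) {
  return availableMb / (turnsPerDay * perTurnMb);
}

// Assumption: about 16 GB available to leak into.
const availableMb = 16 * 1024;
for (const turnsPerDay of [20, 50]) {
  const days = daysUntilExhausted(availableMb, turnsPerDay);
  console.log(`${turnsPerDay} turns/day -> ${days.toFixed(1)} days`);
}
// 20 turns/day -> 12.3 days
// 50 turns/day -> 4.9 days
```

In practice the host degrades earlier than this, since the estimate ignores everything else resident in memory.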