openclaw agent / openclaw infer CLI processes don't exit; MCP stdio server children orphaned and accumulating (~66 MB each) #71457

### Summary
On a host where the OpenClaw gateway runs continuously and `openclaw agent` is invoked many times (in our case, once per inbound email via a third-party Microsoft Graph bridge), MCP stdio server children spawned for the local-bridge MCP integration are never reaped. They accumulate at roughly **66 MB RSS each** with no upper bound until the gateway is restarted.

The MCP server itself is a textbook stdio implementation using `@modelcontextprotocol/sdk@^1.12.0`:

```js
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
const server = new McpServer({ name: 'local-bridge-mcp', version: '0.1.0' });
// ... server.registerTool(...) ×N ...
const transport = new StdioServerTransport();
await server.connect(transport);
```

The same `server.js` works correctly under Claude Code and other MCP hosts: the children exit cleanly when the host closes their stdin, and Node exits naturally because nothing is left holding the event loop. So the MCP server code is not the issue.

### Root cause (hypothesis)
We believe the leak is an OpenClaw-side lifecycle bug rather than an MCP-protocol or SDK bug. The diagnostic that points there:

```
$ ps -A -o pid,ppid,etime,command | grep -E 'openclaw-(agent|infer|gateway)'
10828 10826   58:54 openclaw-infer
11260 11257   39:09 openclaw-agent
11463     1   28:58 openclaw-gateway
```

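`openclaw-agent` and `openclaw-infer` are one-shot CLI invocations that **should have exited within seconds** of finishing a single agent turn, yet they have been alive for 39 and 59 minutes respectively, long after the work completed. (`openclaw-gateway` is the long-running daemon; its PPID of 1 is expected for a launchd-managed LaunchAgent, but it likewise never shuts down the MCP children it spawns per turn.) Because these processes don't close the stdio pipes to their MCP server children, the children keep waiting on a stdin that never reaches EOF and stay alive forever, exactly as the MCP stdio transport is designed to behave.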
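To catch this from a monitoring script rather than by eye, a small parser over the same `ps -A -o pid,ppid,etime,command` output can flag one-shot CLI processes that have outlived a threshold. This is a hypothetical helper: the names `etimeToSeconds`/`findLingeringCli` and the 120 s default are our assumptions, not OpenClaw APIs. It deliberately ignores `openclaw-gateway`, which is supposed to be long-lived.

```javascript
'use strict';

// Convert a ps(1) "etime" value ([[dd-]hh:]mm:ss) to seconds.
function etimeToSeconds(etime) {
  const [daysPart, clock] = etime.includes('-') ? etime.split('-') : ['0', etime];
  const fields = clock.split(':').map(Number);
  while (fields.length < 3) fields.unshift(0); // pad to [hh, mm, ss]
  const [h, m, s] = fields;
  return Number(daysPart) * 86400 + h * 3600 + m * 60 + s;
}

// Parse `ps -A -o pid,ppid,etime,command` output and return one-shot CLI
// processes (openclaw-agent / openclaw-infer) older than maxAgeSec.
// Assumption: 120 s is a generous upper bound for a single agent turn.
function findLingeringCli(psText, maxAgeSec = 120) {
  const results = [];
  for (const line of psText.split('\n')) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
    if (!m) continue; // header or blank line
    const [, pid, ppid, etime, command] = m;
    if (!/^openclaw-(agent|infer)\b/.test(command)) continue; // skip the gateway daemon
    const ageSec = etimeToSeconds(etime);
    if (ageSec > maxAgeSec) {
      results.push({ pid: Number(pid), ppid: Number(ppid), ageSec, command });
    }
  }
  return results;
}
```

Fed the `ps` output above, it reports PIDs 10828 and 10828's sibling 11260 (ages 3534 s and 2349 s) and skips the gateway.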

`lsof` on a leaked MCP child shows its pipe descriptors still open (excerpt; fd 0 is the one to check to confirm stdin is still connected to the parent's write end):

```
$ lsof -p <leaked-mcp-pid>
node    12719 <user>   4     PIPE 0xc6b48af03ed5df52     16384  ->0x799250810c02e91f
node    12719 <user>   5     PIPE 0x799250810c02e91f     16384  ->0xc6b48af03ed5df52
```

### Reproduction
1. OpenClaw 2026.4.23 (`a979721`) installed and running as `ai.openclaw.gateway` LaunchAgent on macOS (Apple Silicon).
2. An MCP server registered in `~/.openclaw/openclaw.json` under `mcp.servers.local-bridge` using stdio transport. The server only registers a few tools that proxy requests (via an `httpJson` helper) to a localhost HTTP service.
3. Repeatedly invoke `openclaw agent --message ... --json --timeout 60` (we do this from a third-party bridge, but a plain shell loop reproduces it).
4. Observe that:
   - Every `openclaw agent` invocation lingers as an `openclaw-agent` process for many minutes after returning.
   - Each invocation also leaves behind one `node .../mcp-server/server.js` child of either the gateway or the agent process.
   - `ps -A | grep '[m]cp-server' | wc -l` grows monotonically.
   - Total RSS climbs by ~66 MB per turn.
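The observations above can be scripted so each turn's effect is visible. This is a minimal sketch under stated assumptions: `summarizeMcpChildren` and `reproduce` are hypothetical names, `'ping'` is a placeholder message, and the MCP children are identified by the `mcp-server/server.js` path used in this report. It reads `ps -A -o pid,ppid,rss,command` (RSS is reported in KiB on macOS and Linux) and returns the child count, total RSS, and a per-parent breakdown.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');

// Summarize `ps -A -o pid,ppid,rss,command` output: number of MCP server
// children, their total RSS in MB (ps reports KiB), and counts per parent PID.
function summarizeMcpChildren(psText, pattern = /mcp-server\/server\.js/) {
  let count = 0;
  let rssKb = 0;
  const byParent = {};
  for (const line of psText.split('\n')) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\d+)\s+(.*)$/);
    if (!m || !pattern.test(m[4])) continue; // header, blank, or unrelated process
    count += 1;
    rssKb += Number(m[3]);
    byParent[m[2]] = (byParent[m[2]] ?? 0) + 1;
  }
  return { count, totalRssMb: rssKb / 1024, byParent };
}

// Hypothetical driver: run `turns` agent invocations and print the MCP child
// count after each. The spawnSync timeout guards against the CLI itself
// never exiting, which is part of the bug being reproduced.
function reproduce(turns = 5) {
  for (let i = 1; i <= turns; i += 1) {
    spawnSync('openclaw', ['agent', '--message', 'ping', '--json', '--timeout', '60'], {
      stdio: 'ignore',
      timeout: 120_000,
    });
    const ps = spawnSync('ps', ['-A', '-o', 'pid,ppid,rss,command'], { encoding: 'utf8' }).stdout;
    const { count, totalRssMb } = summarizeMcpChildren(ps);
    console.log(`turn ${i}: ${count} MCP children, ${totalRssMb.toFixed(1)} MB total RSS`);
  }
}
```

On an affected host, `reproduce(10)` should show the count rising by one per turn; on a host that shuts its MCP children down correctly it should stay flat.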

### Observed scale
On a host that has processed ~17 inbound emails since the last gateway restart:

```
$ ps -A -o pid,rss,etime,command | grep '[m]cp-server/server.js' \
    | awk '{rss+=$2; n++} END {printf "%d processes, total RSS: %.1f MB\n", n, rss/1024}'
17 processes, total RSS: 1130.2 MB
```

15 of the 17 children are direct children of the gateway daemon; the other 2 are children of the abandoned `openclaw-agent`/`openclaw-infer` parents shown above.

### Expected behavior
After an agent turn completes:
1. Any one-shot `openclaw agent` / `openclaw infer` CLI invocation exits, returning its result.
2. Any spawned MCP server children have their stdin closed, see EOF, and exit naturally (the MCP stdio lifecycle uses SIGTERM, then SIGKILL, only as fallbacks for servers that don't).
3. The parent reaps the child via `wait()` so it doesn't become a zombie.
4. The long-running `openclaw gateway` daemon does the same for every turn: spawn, use, shut down, reap.

This is what other MCP hosts (Claude Code, Cursor, etc.) do with the same `server.js`, and it's the contract the MCP SDK assumes.
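The host-side steps above map onto the stdio shutdown sequence in the MCP lifecycle spec: close the server's stdin, wait, then escalate to SIGTERM and finally SIGKILL. Below is a minimal sketch using Node's `child_process`, assuming the host spawned the server with a piped stdin; `shutdownStdioChild`, `waitForExit`, and the 2 s timeouts are hypothetical, not OpenClaw code. Node (via libuv) reaps the child before emitting `'exit'`, so waiting for that event also covers step 3.

```javascript
'use strict';

// Resolve with { code, signal } once the child exits, or with null after
// `ms` milliseconds (pass null to wait indefinitely).
function waitForExit(child, ms) {
  if (child.exitCode !== null || child.signalCode !== null) {
    return Promise.resolve({ code: child.exitCode, signal: child.signalCode });
  }
  return new Promise((resolve) => {
    let timer = null;
    const onExit = (code, signal) => {
      if (timer) clearTimeout(timer);
      resolve({ code, signal });
    };
    child.once('exit', onExit);
    if (ms !== null) {
      timer = setTimeout(() => {
        child.off('exit', onExit);
        resolve(null);
      }, ms);
    }
  });
}

// Close stdin so a well-behaved stdio server sees EOF and exits on its own;
// escalate to SIGTERM, then SIGKILL, only if it lingers.
async function shutdownStdioChild(child, { graceMs = 2000, termMs = 2000 } = {}) {
  child.stdin?.end();
  let result = await waitForExit(child, graceMs);
  if (result) return { how: 'eof', ...result };

  child.kill('SIGTERM');
  result = await waitForExit(child, termMs);
  if (result) return { how: 'sigterm', ...result };

  child.kill('SIGKILL');
  return { how: 'sigkill', ...(await waitForExit(child, null)) };
}
```

With a server like the one in this report, the first step is normally enough; the signal fallbacks only matter for servers that hold the event loop open after EOF.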

### Workaround we deployed (defensive, not a fix)
We added a hard max-lifetime + parent-watch in our MCP server to cap the leak:

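```js
// Hard cap on each child's lifetime (default 10 minutes, override via env).
const requested = Number.parseInt(process.env.MCP_MAX_LIFETIME_MS ?? '', 10);
const MAX_LIFETIME_MS = Number.isFinite(requested) && requested > 0 ? requested : 600_000;
setTimeout(() => {
  console.error(`[mcp] max lifetime ${MAX_LIFETIME_MS}ms reached, exiting`);
  process.exit(0);
}, MAX_LIFETIME_MS).unref();

// Parent watch: if the spawning process dies we are re-parented (to launchd,
// PID 1, on macOS), so process.ppid changes. Comparing against the PPID we
// started with avoids probing PID 1, which succeeds when running as root.
const ORIGINAL_PPID = process.ppid;
setInterval(() => {
  if (process.ppid !== ORIGINAL_PPID) {
    console.error('[mcp] parent gone, exiting');
    process.exit(0);
  }
}, 30_000).unref();
```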

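`.unref()` on both timers keeps them from holding the event loop open on their own. Note that the parent watch only fires once the parent actually dies; in the failure mode above the parents (the gateway, or a lingering `openclaw-agent`/`openclaw-infer`) stay alive, so in practice the lifetime cap is what bounds the leak. Neither addresses the underlying lifecycle issue in OpenClaw, which has to be fixed upstream.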
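One drawback of a hard lifetime cap is that it also kills a server that a long gateway session is still actively using. A hypothetical alternative is an idle timeout that resets whenever the host sends anything on stdin. `createIdleWatchdog` and the 5-minute budget below are our assumptions; the sketch presumes it is attached after `await server.connect(transport)`, so the SDK's own stdin listener is already in place and both listeners receive every chunk.

```javascript
'use strict';

// Hypothetical idle watchdog: call onIdle if no data arrives on `stream` for
// idleMs. Every inbound chunk (i.e. any JSON-RPC traffic) re-arms the timer.
function createIdleWatchdog({ stream, idleMs, onIdle }) {
  let timer = null;
  const arm = () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(onIdle, idleMs);
    timer.unref(); // never keep the event loop alive on its own
  };
  stream.on('data', arm);
  arm();
  return {
    stop() {
      clearTimeout(timer);
      stream.off('data', arm);
    },
  };
}

// Example wiring in server.js (assumption: 5-minute idle budget):
// createIdleWatchdog({
//   stream: process.stdin,
//   idleMs: 300_000,
//   onIdle: () => { console.error('[mcp] idle, exiting'); process.exit(0); },
// });
```

Like the lifetime cap, this only bounds the damage; a host that keeps stdin open but silent still leaves the child alive until the idle budget runs out.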

### Environment
- OpenClaw `2026.4.23 (a979721)` installed via `npm i -g openclaw`
- macOS (Apple Silicon Mac mini)
- Node.js `>=20`
- `@modelcontextprotocol/sdk@^1.12.0`
- Gateway running as `ai.openclaw.gateway` LaunchAgent on port 18789

### Why this matters
For an agent that runs continuously (e.g., embedded in always-on automation such as email, calendar, or cron-like flows), the leak is unbounded. At ~66 MB per turn, even 30 turns a day adds roughly 2 GB of resident memory per day, so the host runs out of RAM within a week or two without external intervention. The workaround above is acceptable as belt-and-suspenders, but the real fix belongs on the host (OpenClaw) side: after each turn, close each MCP child's stdin as the MCP stdio lifecycle prescribes (escalating to SIGTERM/SIGKILL if it doesn't exit) and reap it.
