
fix(codex): prevent gateway crash when app-server subprocess terminates abruptly #67947

Merged
openperf merged 2 commits into openclaw:main from openperf:fix/67886-gateway-epipe-crash
Apr 17, 2026


openperf commented Apr 17, 2026

Member

Summary

  • Problem: When the codex-acp subprocess terminates abruptly (e.g. misconfigured binary, interactive prompt on stdout, deprecated config error), CodexAppServerClient.writeMessage() at extensions/codex/src/app-server/client.ts:215 writes a JSON-RPC payload to the dead subprocess's stdin. Node.js emits an asynchronous write EPIPE error on the stream, and because no error event handler is registered on child.stdin, the error propagates as an unhandled exception that crashes the entire OpenClaw gateway daemon — taking down all connected channels.

  • Root Cause: Three compounding defects in CodexAppServerClient:

    1. No error handler on child.stdin (constructor, line 57–77): the constructor registers error/exit handlers on the transport (child) itself, but never on child.stdin. When the pipe breaks, the stdin stream emits an error event that has no listener, causing Node.js to throw it as an uncaught exception.
    2. writeMessage() does not check this.closed (line 214–216): even after the client is marked closed by closeWithError(), async code paths like handleServerRequest() can still resume and call writeMessage(), triggering writes to a dead pipe.
    3. closeWithError() performs incomplete cleanup (line 298–304): unlike close(), which closes the readline interface and shuts down the transport, closeWithError() only sets closed = true and rejects pending requests. It leaves the readline interface active (so further incoming lines can still trigger writes) and the transport open.
  • Fix: Three targeted changes that address each root cause:

    1. Attach an error event handler on child.stdin in the constructor that routes errors through closeWithError(), preventing unhandled exceptions.
    2. Guard writeMessage() with a this.closed check, preventing writes to dead pipes from async continuations.
    3. Align closeWithError() cleanup with close(): close the readline interface and shut down the transport.
  • What changed:

    • extensions/codex/src/app-server/transport.ts: added an optional on method to the stdin type so that error event registration works across the stdio, websocket, and test transports.
    • extensions/codex/src/app-server/client.ts: added stdin error handler in constructor; added closed guard in writeMessage(); added lines.close() and closeCodexAppServerTransport() to closeWithError().
    • extensions/codex/src/app-server/client.test.ts: added two tests — one verifying stdin EPIPE is caught and pending requests are rejected gracefully; one verifying notify() after child exit does not attempt writes.
  • What did NOT change (scope boundary):

    • close() behavior is unchanged — it already performed full cleanup.
    • request() call-site error handling is unchanged — its existing try/catch around writeMessage() continues to work.
    • WebSocket transport (transport-websocket.ts) is unchanged — its custom Writable stdin already buffers writes safely.
    • No changes to handleLine(), handleResponse(), handleNotification(), or any RPC protocol logic.
    • No changes to the transport lifecycle in closeCodexAppServerTransport().
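Taken together, the three fixes can be pictured with a heavily simplified client. This is a sketch under stated assumptions, not the actual CodexAppServerClient: SketchClient, StdinLike, FakeStdin, and the closeLines/closeTransport callbacks are hypothetical stand-ins for the real client, the stdin type, the readline interface, and closeCodexAppServerTransport(). It also assumes request() rejects immediately once the client is closed and that writeMessage() silently drops writes after close.

```typescript
// Hypothetical stdin shape: write() plus an optional, error-only on().
interface StdinLike {
  write(chunk: string): unknown;
  on?: (event: "error", listener: (err: Error) => void) => unknown;
}

interface Pending {
  resolve: (value: unknown) => void;
  reject: (err: Error) => void;
}

// Hypothetical, heavily simplified client; only the three fixes are modeled.
class SketchClient {
  private closed = false;
  private nextId = 1;
  private readonly pending = new Map<number, Pending>();

  constructor(
    private readonly stdin: StdinLike,
    private readonly closeLines: () => void, // stands in for closing the readline interface
    private readonly closeTransport: () => void, // stands in for transport teardown
  ) {
    // Fix 1: a stdin "error" (e.g. EPIPE) closes the client instead of crashing the process.
    this.stdin.on?.("error", (err) => this.closeWithError(err));
  }

  request(method: string): Promise<unknown> {
    if (this.closed) return Promise.reject(new Error("client is closed"));
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.writeMessage({ jsonrpc: "2.0", id, method });
    });
  }

  notify(method: string): void {
    this.writeMessage({ jsonrpc: "2.0", method });
  }

  private writeMessage(message: object): void {
    // Fix 2: async continuations that resume after close must not touch the dead pipe.
    if (this.closed) return;
    this.stdin.write(`${JSON.stringify(message)}\n`);
  }

  closeWithError(err: Error): void {
    if (this.closed) return; // idempotent: only the first close does anything
    this.closed = true;
    this.pending.forEach(({ reject }) => reject(err));
    this.pending.clear();
    // Fix 3: the same teardown that close() performs.
    this.closeLines();
    this.closeTransport();
  }
}

// Fake stdin that records writes and lets the caller inject a stream error.
class FakeStdin implements StdinLike {
  readonly written: string[] = [];
  private errorListener?: (err: Error) => void;

  write(chunk: string): boolean {
    this.written.push(chunk);
    return true;
  }

  on(_event: "error", listener: (err: Error) => void): this {
    this.errorListener = listener;
    return this;
  }

  // Like EventEmitter: an "error" with no listener is thrown.
  emitError(err: Error): void {
    if (!this.errorListener) throw err;
    this.errorListener(err);
  }
}

const stdin = new FakeStdin();
const teardown: string[] = [];
const client = new SketchClient(stdin, () => teardown.push("lines"), () => teardown.push("transport"));
let initError: Error | undefined;
const init = client.request("initialize").catch((err: Error) => {
  initError = err;
});
stdin.emitError(new Error("write EPIPE")); // handled: rejects "initialize" and tears down
client.notify("initialized"); // ignored: the client is already closed
```

This mirrors the two new tests in spirit: the injected EPIPE rejects the pending initialize request instead of escaping, and a later notify() produces no second write.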

Reproduction

  1. Configure OpenClaw with a codex-acp binary that fails during initialization (e.g. a binary that writes an interactive prompt or error text to stdout, then exits immediately).
  2. Start the gateway: openclaw gateway
  3. The gateway attempts to send a JSON-RPC initialize payload to the dead subprocess's stdin.
  4. Before fix: the gateway crashes with Error: write EPIPE as an unhandled exception.
  5. After fix: the client closes gracefully, rejects the pending initialize request, and the gateway continues running.
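The crash in step 4 follows from a general Node.js rule: streams are EventEmitters, and emitting "error" on an emitter with no "error" listener throws the error. The sketch below uses a bare EventEmitter as a stand-in for child.stdin (an assumption for illustration; the real EPIPE is emitted by Node's stream internals on a later tick, where nothing catches it, so it surfaces as an uncaught exception).

```typescript
import { EventEmitter } from "node:events";

// Emits an EPIPE-like error on `stream` and reports whether emit() threw.
function emitEpipe(stream: EventEmitter): "thrown" | "handled" {
  try {
    stream.emit("error", new Error("write EPIPE"));
    return "handled";
  } catch {
    return "thrown";
  }
}

// No "error" listener: emit() throws the error (the pre-fix situation).
const bare = new EventEmitter();
console.log(emitEpipe(bare)); // thrown

// With a listener, the same event is simply delivered to it.
const guarded = new EventEmitter();
const seen: string[] = [];
guarded.on("error", (err: Error) => {
  seen.push(err.message);
});
console.log(emitEpipe(guarded)); // handled
```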

Risk / Mitigation

  • Risk: closeWithError() now calls closeCodexAppServerTransport() which sends SIGTERM/SIGKILL to the child process. If the child has already exited (the common case), the signal is silently ignored. If the child is still running after a stdin error, terminating it is the correct behavior.
  • Mitigation: Both close() and closeWithError() are guarded by if (this.closed) return, so only one can execute — no double-cleanup risk. The closeCodexAppServerTransport() force-kill timer is unref()'d, so it cannot keep the process alive. Two new tests validate the exact crash scenario and the post-exit write guard.
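The mitigation relies on two properties: a shared closed flag makes teardown run at most once, and the force-kill timer is unref()'d. A minimal sketch follows; shutDownChild, ClosableSketch, KillableChild, the exitCode/signalCode checks, and the 1000 ms grace period are all assumptions, not the real closeCodexAppServerTransport().

```typescript
import { setTimeout } from "node:timers";

// Hypothetical subset of the members Node's ChildProcess exposes.
interface KillableChild {
  exitCode: number | null;
  signalCode: string | null;
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
}

// Hypothetical helper: ask politely, then force-kill after a grace period.
function shutDownChild(child: KillableChild, graceMs = 1000): void {
  if (child.exitCode !== null || child.signalCode !== null) return; // already exited
  child.kill("SIGTERM");
  const forceKill = setTimeout(() => {
    if (child.exitCode === null && child.signalCode === null) child.kill("SIGKILL");
  }, graceMs);
  // unref(): the pending timer alone cannot keep the event loop (and process) alive.
  forceKill.unref();
}

class ClosableSketch {
  private closed = false;
  constructor(private readonly child: KillableChild) {}

  close(): void {
    if (this.closed) return;
    this.closed = true;
    shutDownChild(this.child);
  }

  closeWithError(_err: Error): void {
    if (this.closed) return; // whichever close runs first wins; the other is a no-op
    this.closed = true;
    shutDownChild(this.child);
  }
}

const signals: string[] = [];
const fakeChild: KillableChild = {
  exitCode: null,
  signalCode: null,
  kill(signal) {
    signals.push(signal);
    this.signalCode = signal; // pretend the child died on SIGTERM
    return true;
  },
};
const sketch = new ClosableSketch(fakeChild);
sketch.closeWithError(new Error("write EPIPE"));
sketch.close(); // no-op: already closed
console.log(signals); // [ 'SIGTERM' ]
```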

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway
  • Codex extension
  • Plugin SDK (transport type)

Linked Issue/PR

Fixes #67886


greptile-apps Bot commented Apr 17, 2026

Contributor

Greptile Summary

Fixes a gateway crash caused by unhandled EPIPE errors when the codex-acp subprocess terminates abruptly. Three targeted changes address three compounding defects: registering a stdin error handler in the constructor, guarding writeMessage() with a this.closed check, and aligning closeWithError() cleanup with the existing close() path (readline close + transport teardown). Two new tests validate the exact crash scenario and the post-exit write guard.

Confidence Score: 5/5

Safe to merge — fixes a real gateway crash with targeted, well-tested changes and no regressions to existing behavior.

All three root causes are addressed correctly. closeWithError() is idempotent (guarded by if (this.closed) return), so the new transport teardown call is safe even when exit fires alongside an EPIPE error. The optional chaining on child.stdin.on?.() correctly skips registration for transports that don't expose on. No P0/P1 findings; remaining observations are cosmetic.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(codex): handle stdin EPIPE in CodexA..."


rexname commented Apr 17, 2026


Jejak (Indonesian: "trace")

openperf force-pushed the fix/67886-gateway-epipe-crash branch 2 times, most recently from a750b29 to 34c947e, April 17, 2026 15:24
…eway crash

When the codex-acp subprocess terminates abruptly, writing to its dead
stdin pipe emits an unhandled EPIPE error that crashes the entire
gateway daemon.

Root cause: writeMessage() wrote to child.stdin without checking the
closed state, child.stdin had no error event handler, and closeWithError()
did not clean up the readline interface or transport — leaving the client
in a half-closed state that allowed further writes.

Fix: attach an error handler on child.stdin, guard writeMessage() against
writes after close, and align closeWithError() cleanup with close().
…#67947

- transport.ts: narrow stdin.on? from (event: string) to (event: "error")
  so the error-handler contract is explicit and misuse is caught at compile time
- CHANGELOG.md: add Fixes entry for openclaw#67947 under ## Unreleased
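The narrowed contract and the optional call (child.stdin.on?.(), which the review notes skips transports without on) can be sketched as follows. AppServerStdinSketch and attachStdinErrorHandler are hypothetical names, not the actual transport.ts type or client code; the point is that a transport without on is skipped, and any event other than "error" fails to compile.

```typescript
// Hypothetical stdin type mirroring the narrowed contract:
// `on` is optional and only accepts the "error" event.
type AppServerStdinSketch = {
  write(chunk: string): unknown;
  on?: (event: "error", listener: (error: Error) => void) => unknown;
};

function attachStdinErrorHandler(
  stdin: AppServerStdinSketch,
  onError: (error: Error) => void,
): boolean {
  // Optional call: transports that do not expose `on` are skipped rather than throwing.
  stdin.on?.("error", onError);
  // stdin.on?.("data", () => {}); // would not compile: "data" is not assignable to "error"
  return stdin.on !== undefined;
}

const received: Error[] = [];
const registered: Array<(error: Error) => void> = [];

const writeOnly: AppServerStdinSketch = { write: () => true };
const evented: AppServerStdinSketch = {
  write: () => true,
  on: (_event, listener) => {
    registered.push(listener);
  },
};

const skipped = attachStdinErrorHandler(writeOnly, (e) => received.push(e)); // false
const attached = attachStdinErrorHandler(evented, (e) => received.push(e)); // true
registered.forEach((listener) => listener(new Error("write EPIPE"))); // delivered, not thrown
```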
openperf force-pushed the fix/67886-gateway-epipe-crash branch from 34c947e to b2ec11b, April 17, 2026 15:28
openperf merged commit 0b3d876 into openclaw:main Apr 17, 2026
7 checks passed
kvnkho pushed a commit to kvnkho/openclaw that referenced this pull request Apr 17, 2026
…es abruptly (openclaw#67947)

Fixes openclaw#67886. Handles stdin EPIPE in CodexAppServerClient by attaching an error handler, guarding writeMessage against writes after close, and aligning closeWithError cleanup with close.
Mquarmoc pushed a commit to Mquarmoc/openclaw that referenced this pull request Apr 20, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
globalcaos pushed a commit to globalcaos/tinkerclaw that referenced this pull request May 13, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026

Development

Successfully merging this pull request may close these issues.

Gateway crashes with unhandled write EPIPE when CodexAppServerClient receives unparseable stdout

2 participants