Gateway crashes with EPIPE on LaunchAgent restart, causes exponential throttle #4632

@ninjaboy

Description

The gateway process spontaneously crashes with Uncaught exception: Error: write EPIPE and stops working. After the crash, macOS launchd applies exponential backoff to restarts, leaving the gateway down for hours with no automatic recovery.

The user has to manually run clawdbot doctor --fix every time to bring the bot back to life.

Steps to Reproduce

  1. Install Clawdbot with LaunchAgent (default setup)
  2. Use the bot normally
  3. At some point, the gateway crashes with EPIPE and stops responding
  4. It does not come back on its own — launchd throttles restart attempts after repeated crashes
  5. Only clawdbot doctor --fix (or manual restart) brings it back

This happens repeatedly and unpredictably. In our case, the gateway died at 22:43 and did not come back until 09:06 the next day: an outage of more than 10 hours, despite KeepAlive: true being set in the LaunchAgent.

Root Cause

The gateway writes to stdout/stderr after the pipe has been closed (during process shutdown or launchd restart). This triggers an uncaught EPIPE exception → crash → exit code 1. Repeated crashes cause macOS launchd to throttle KeepAlive restarts exponentially (the service had runs = 11 and last exit code = 1).

Additionally, every restart produces unhandled AbortError rejections from pending fetch requests being cancelled, which may contribute to the crash loop.
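As a rough illustration of the AbortError side of this, the sketch below shows one way cancelled requests could be kept from becoming unhandled rejections. The names isAbortError, ignoreAbort, and createShutdownSignal are hypothetical and are not taken from Clawdbot's code; the sketch also assumes the gateway cancels its requests through an AbortSignal, which the log suggests but does not confirm.

```javascript
// Hypothetical helper: true for the rejection produced when a request is
// cancelled through an AbortSignal (its `name` is "AbortError").
function isAbortError(err) {
  return Boolean(err) && typeof err === "object" && err.name === "AbortError";
}

// Hypothetical wrapper: await a request and resolve to undefined if it was
// aborted, instead of leaving an unhandled rejection behind.
async function ignoreAbort(promise) {
  try {
    return await promise;
  } catch (err) {
    if (isAbortError(err)) return undefined; // expected while shutting down
    throw err; // anything else is still a real failure
  }
}

// Hypothetical: returns an AbortSignal that fires when the process receives
// SIGTERM. Registering a SIGTERM listener disables Node's default exit on
// that signal, so the caller is still responsible for exiting afterwards.
function createShutdownSignal(proc = process) {
  const controller = new AbortController();
  proc.once("SIGTERM", () => controller.abort());
  return controller.signal;
}

// Usage sketch (the URL is a placeholder):
//   const signal = createShutdownSignal();
//   await ignoreAbort(fetch("https://example.invalid/poll", { signal }));
```

Any rejection that is not an AbortError is rethrown on purpose, so real network failures stay visible.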

Error Logs

[clawdbot] Uncaught exception: Error: write EPIPE
    at afterWriteDispatched (node:internal/stream_base_commons:159:15)
    at writeGeneric (node:internal/stream_base_commons:150:3)
    at Socket._writeGeneric (node:net:971:11)
    at Socket._write (node:net:983:8)
    at writeOrBuffer (node:internal/streams/writable:570:12)
    at _write (node:internal/streams/writable:499:10)
    at Writable.write (node:internal/streams/writable:508:10)
    at console.value (node:internal/console/constructor:298:16)
    at console.log (node:internal/console/constructor:384:26)
[clawdbot] Unhandled promise rejection: AbortError: This operation was aborted
    at node:internal/deps/undici/undici:13502:13

Gateway restart timeline showing exponential throttle:

22:43:08 — listening (PID 66511)
[10+ hour gap — gateway dead, launchd throttled]
09:06:33 — listening (PID 7341) ← only after manual doctor --fix
09:08:13 — restart
09:08:29 — restart  
09:08:53 — restart
[3+ hour gap — throttled again]
12:24:20 — listening ← after another doctor --fix

Workaround

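Add ThrottleInterval with a value of 5 to the LaunchAgent plist. This key overrides launchd's default throttling policy (by default, a job is not spawned more than once every 10 seconds) and is intended here to keep the restart delay to about 5 seconds: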

<key>ThrottleInterval</key>
<integer>5</integer>
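If the plist is produced by code rather than edited by hand, the same key could be emitted at generation time. The following is only a sketch: renderLaunchAgentPlist and escapeXml are hypothetical names, the ProgramArguments values are placeholders, and Clawdbot's real generator may be structured differently.

```javascript
// Escape the characters that are special in XML text nodes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical generator (not Clawdbot's actual code): renders a LaunchAgent
// plist with KeepAlive enabled and an explicit ThrottleInterval.
function renderLaunchAgentPlist({ label, programArguments, throttleInterval = 5 }) {
  if (!Number.isInteger(throttleInterval) || throttleInterval < 0) {
    throw new RangeError("throttleInterval must be a non-negative integer");
  }
  const args = programArguments
    .map((arg) => `    <string>${escapeXml(arg)}</string>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    "<dict>",
    "  <key>Label</key>",
    `  <string>${escapeXml(label)}</string>`,
    "  <key>ProgramArguments</key>",
    "  <array>",
    args,
    "  </array>",
    "  <key>KeepAlive</key>",
    "  <true/>",
    "  <key>ThrottleInterval</key>",
    `  <integer>${throttleInterval}</integer>`,
    "</dict>",
    "</plist>",
    "",
  ].join("\n");
}

// Usage sketch; the argument list below is a placeholder, not the real command.
//   renderLaunchAgentPlist({
//     label: "com.clawdbot.gateway",
//     programArguments: ["/path/to/node", "/path/to/gateway.js"],
//   });
```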

Suggested Fix

  1. Handle EPIPE on stdout/stderr: process.stdout.on("error", () => {}) or equivalent in the logging layer
  2. Catch AbortError on pending fetch requests during shutdown
  3. Add ThrottleInterval to the generated LaunchAgent plist by default so crashes don't cause hours of downtime
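The first suggestion could look roughly like the sketch below. ignoreEpipe is a hypothetical helper, not an existing Clawdbot or Node.js API, and it has not been verified that this covers every write path in the gateway's logging layer. It relies on the fact that an "error" event on a Node stream is only thrown as an uncaught exception when no listener is attached.

```javascript
// Hypothetical helper: attach an "error" listener so that EPIPE on a stream
// is swallowed instead of surfacing as an uncaught exception.
// Errors other than EPIPE go to onOtherError, which rethrows by default so
// that unrelated failures are not silently hidden.
function ignoreEpipe(stream, onOtherError = (err) => { throw err; }) {
  stream.on("error", (err) => {
    if (err && err.code === "EPIPE") return; // the reader is gone; drop the write
    onOtherError(err);
  });
  return stream;
}

// Install as early as possible during startup, before anything is logged.
ignoreEpipe(process.stdout);
ignoreEpipe(process.stderr);
```

Filtering on err.code rather than using an empty handler keeps other stream errors visible, at the cost of a few extra lines compared with process.stdout.on("error", () => {}).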

Environment

  • Clawdbot: 2026.1.24-3
  • macOS: 15.6.1 (arm64)
  • Node: 23.7.0
  • LaunchAgent: com.clawdbot.gateway with KeepAlive: true

Metadata

Labels: bug, stale
