[Bug]: Gateway crashes with Attempted to reconnect zombie connection after disconnecting first and is auto-restarted by systemd #65009

@rivluc

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

My OpenClaw gateway process intermittently crashes with:

Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)

The process exits, then systemd --user automatically restarts openclaw-gateway.service.

The crash appears to originate in the gateway heartbeat / reconnect path. It is not caused by a machine reboot.

Environment

  • OpenClaw version: v2026.3.12
    (I updated the gateway, but the issue still occurred afterward)
  • Host: Linux
  • Service managed by: systemd --user
  • Channel in use: Discord
  • Model: openai-codex/gpt-5.4

Observed behavior
The gateway runs normally for a while, then crashes with an uncaught exception. systemd notices the failure and restarts the service.

Representative crash sequence:

[openclaw] Uncaught exception: Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)
    at Object.reconnectCallback (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:211/217:...)
    at Timeout.sendHeartbeat [as _onTimeout] (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/utils/heartbeat.ts:31:...)

Then:

openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
openclaw-gateway.service: Failed with result 'exit-code'.
openclaw-gateway.service: Scheduled restart job ...
Starting openclaw-gateway.service - OpenClaw Gateway ...
Started openclaw-gateway.service - OpenClaw Gateway ...

What I checked

  • This is not a host reboot. The machine stayed on the same boot.
  • I checked for OOM / kernel panic / thermal issues and did not find evidence that those caused the restart.
  • The crash appears to come from OpenClaw itself, specifically the heartbeat / reconnect logic.
  • I checked whether cron jobs posting to Discord were the immediate cause. I did not find strong evidence that Discord posting is the root cause. At most, activity may expose the bug, but the fatal event is in heartbeat/reconnect handling.

Steps to reproduce

  1. Run OpenClaw gateway as a systemd --user service on Linux with Discord enabled.
  2. Keep the gateway running for several hours under normal use, including cron activity and Discord messaging.
  3. Intermittently, the gateway crashes with: Attempted to reconnect zombie connection after disconnecting first
  4. systemd then auto-restarts the service.

Expected behavior

If a connection is stale/dead, the gateway should:

  • ignore the stale connection
  • establish a fresh one if needed
  • log a warning/error
  • continue running

It should not throw an uncaught exception that kills the whole gateway process.
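The expected behavior above can be sketched roughly as follows. This is a minimal illustration, not the actual @buape/carbon or OpenClaw code: `GatewaySupervisor`, `Connection`, and `handleMissedHeartbeat` are hypothetical names invented for this example. The idea is that a heartbeat timer firing for a connection that was already torn down is logged and ignored rather than treated as a fatal error.

```typescript
// Hypothetical sketch: none of these names come from @buape/carbon or OpenClaw.
type ConnectionState = "connected" | "disconnected";

interface Connection {
  id: number;
  state: ConnectionState;
}

class GatewaySupervisor {
  private nextId = 1;
  current: Connection = { id: 0, state: "connected" };
  warnings: string[] = [];

  private open(): Connection {
    return { id: this.nextId++, state: "connected" };
  }

  // Called by a heartbeat timer that believes `conn` has stopped responding.
  handleMissedHeartbeat(conn: Connection): Connection {
    if (conn.id !== this.current.id || conn.state === "disconnected") {
      // Late timer for a connection that was already torn down:
      // log a warning and ignore it instead of throwing.
      this.warnings.push(`ignored stale heartbeat for connection ${conn.id}`);
      // Establish a fresh connection only if there is no live one.
      if (this.current.state === "disconnected") {
        this.current = this.open();
      }
      return this.current;
    }
    // The live connection stopped responding: replace it.
    conn.state = "disconnected";
    this.current = this.open();
    this.warnings.push(
      `connection ${conn.id} unresponsive; opened connection ${this.current.id}`,
    );
    return this.current;
  }
}

const supervisor = new GatewaySupervisor();
const first = supervisor.current;
const second = supervisor.handleMissedHeartbeat(first); // genuine reconnect
const third = supervisor.handleMissedHeartbeat(first); // late timer for the old connection
console.log(second.id, third.id, supervisor.warnings);
```

In this sketch the second call for the old connection returns the already-open replacement and records a warning, so the process keeps running.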

OpenClaw version

v2026.3.12

Operating system

Ubuntu 24.04.4 LTS (GNU/Linux 6.8.0-107-generic x86_64)

Install method

npm global

Model

openai-codex/gpt-5.4 / anthropic/claude-opus-4.6 / anthropic/claude-sonnet-4.6

Provider / routing chain

Discord -> OpenClaw gateway -> openai-codex/gpt-5.4

Additional provider/model setup details

No response

Logs, screenshots, and evidence

2026-04-11T18:30:54+00:00 [openclaw] Uncaught exception: Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)
    at Object.reconnectCallback (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:211:...)
    at Timeout.sendHeartbeat [as _onTimeout] (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/utils/heartbeat.ts:31:...)
2026-04-11T18:30:55+00:00 openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
2026-04-11T18:31:00+00:00 openclaw-gateway.service: Scheduled restart job, restart counter is at 8.
2026-04-11T18:31:16+00:00 Started openclaw-gateway.service - OpenClaw Gateway (v2026.3.12).
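To see how often this happens over a longer window, journal output like the excerpt above could be summarized with a small script. The following is only a sketch: `summarizeJournal` and `CrashSummary` are hypothetical names, and the regular expressions are assumptions fitted to the lines shown here, so they may need adjusting for other journal formats.

```typescript
// Sketch only: the patterns below are assumptions based on the log excerpt above.
interface CrashSummary {
  crashes: string[]; // timestamps of uncaught-exception lines
  restartCounter: number | null; // highest "restart counter is at N" value seen
}

function summarizeJournal(text: string): CrashSummary {
  const summary: CrashSummary = { crashes: [], restartCounter: null };
  for (const line of text.split("\n")) {
    // Timestamp is the first whitespace-delimited token on the line.
    const timestamp = line.match(/^(\S+) \[openclaw\] Uncaught exception: /)?.[1];
    if (timestamp !== undefined) {
      summary.crashes.push(timestamp);
    }
    const counter = line.match(/restart counter is at (\d+)/)?.[1];
    if (counter !== undefined) {
      summary.restartCounter = Math.max(summary.restartCounter ?? 0, Number(counter));
    }
  }
  return summary;
}

const sample = [
  "2026-04-11T18:30:54+00:00 [openclaw] Uncaught exception: Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)",
  "2026-04-11T18:30:55+00:00 openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE",
  "2026-04-11T18:31:00+00:00 openclaw-gateway.service: Scheduled restart job, restart counter is at 8.",
].join("\n");

const summary = summarizeJournal(sample);
console.log(summary);
```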

Impact and severity

Impact:

  • Gateway process crashes intermittently during normal operation
  • Messaging becomes temporarily unavailable until systemd restarts the service
  • In-flight replies or tool actions may fail or be interrupted
  • Reduces reliability of cron jobs and Discord interactions
  • Creates noisy restart churn and makes the system hard to trust for unattended use

Severity: High

Additional information

  • The crash appears to be in the gateway connection layer, not in the model provider itself.
  • systemd --user auto-restart masked the failure somewhat by bringing the service back quickly, but the crash is still user-visible because in-flight replies can fail.
  • At least one run showed a Discord-side symptom shortly before a crash:
    • discord final reply failed: AbortError: This operation was aborted
    • This may be related, or just a downstream symptom of connection instability.
  • After updating, I also observed what is likely a separate compatibility warning from the Codex plugin registration path:
    • [plugins] codex failed during register from /usr/lib/node_modules/openclaw/dist/extensions/codex/index.js: TypeError: api.registerAgentHarness is not a function
    • This did not appear to be the direct crash trigger. The actual fatal event was still the zombie reconnect exception. It may indicate a version mismatch worth checking.
  • The gateway logs also showed repeated Bonjour advertiser warnings before and after restarts:
    • watchdog detected non-announced service; attempting re-advertise
    • This is probably secondary, but worth mentioning in case it points to broader connection lifecycle issues.
  • The crash is intermittent and cannot be reproduced on demand.
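On the `api.registerAgentHarness is not a function` warning mentioned above: if it does come from a plugin/host version mismatch, one common way to avoid the TypeError is to feature-detect the method before calling it. The sketch below is purely illustrative. `PluginApi` and `registerCodex` are hypothetical, the argument passed to `registerAgentHarness` is an assumption, and only the method name is taken from the log line.

```typescript
// Hypothetical plugin API shape; only the method name comes from the log line above.
interface PluginApi {
  registerAgentHarness?: (name: string) => void;
}

function registerCodex(api: PluginApi): "registered" | "skipped" {
  if (typeof api.registerAgentHarness !== "function") {
    // Older host without the method: degrade gracefully instead of throwing.
    console.warn("host lacks registerAgentHarness; skipping harness registration");
    return "skipped";
  }
  api.registerAgentHarness("codex"); // the argument is an assumption for illustration
  return "registered";
}

// Simulated hosts: one without the method, one with it.
const registeredNames: string[] = [];
const olderHost: PluginApi = {};
const newerHost: PluginApi = {
  registerAgentHarness: (name: string) => {
    registeredNames.push(name);
  },
};

const olderResult = registerCodex(olderHost);
const newerResult = registerCodex(newerHost);
console.log(olderResult, newerResult, registeredNames);
```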

Labels

bug (Something isn't working), bug:crash (Process/app exits unexpectedly or hangs)
