Bug type
Crash (process/app exits or hangs)
Beta release blocker
No
Summary
My OpenClaw gateway process intermittently crashes with:
Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)
The process exits, then systemd --user automatically restarts openclaw-gateway.service.
This appears to be happening in the gateway heartbeat / reconnect path, not as a full machine reboot.
Environment
- OpenClaw version: v2026.3.12 (I updated the gateway, but the issue still occurred afterward)
- Host: Linux
- Service managed by: systemd --user
- Channel in use: Discord
- Model: openai-codex/gpt-5.4
Observed behavior
The gateway runs normally for a while, then crashes with an uncaught exception. systemd notices the failure and restarts the service.
Representative crash sequence:
[openclaw] Uncaught exception: Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)
at Object.reconnectCallback (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:211/217:...)
at Timeout.sendHeartbeat [as _onTimeout] (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/utils/heartbeat.ts:31:...)
Then:
openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
openclaw-gateway.service: Failed with result 'exit-code'.
openclaw-gateway.service: Scheduled restart job ...
Starting openclaw-gateway.service - OpenClaw Gateway ...
Started openclaw-gateway.service - OpenClaw Gateway ...
What I checked
- This is not a host reboot. The machine stayed on the same boot.
- I checked for OOM / kernel panic / thermal issues and did not find evidence that those caused the restart.
- The crash appears to come from OpenClaw itself, specifically the heartbeat / reconnect logic.
- I checked whether cron jobs posting to Discord were the immediate cause. I did not find strong evidence that Discord posting is the root cause. At most, activity may expose the bug, but the fatal event is in heartbeat/reconnect handling.
Steps to reproduce
- Run OpenClaw gateway as a systemd --user service on Linux with Discord enabled.
- Keep the gateway running for several hours under normal use, including cron activity and Discord messaging.
- Intermittently, the gateway crashes with: Attempted to reconnect zombie connection after disconnecting first
- systemd then auto-restarts the service.
Expected behavior
If a connection is stale/dead, the gateway should:
- ignore the stale connection
- establish a fresh one if needed
- log a warning/error
- continue running
It should not throw an uncaught exception that kills the whole gateway process.
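As a rough illustration of the behavior described above, here is a minimal sketch. It is not the actual @buape/carbon or OpenClaw code: every name in it (GatewaySession, Connection, onMissedHeartbeat) is hypothetical, and it assumes each connection can be compared against the session's current one to decide whether a late heartbeat timer is stale.

```typescript
// Hypothetical types: none of these names come from OpenClaw or @buape/carbon.
type Logger = (message: string) => void;

interface Connection {
  id: number;      // generation counter, incremented on every (re)connect
  closed: boolean; // set once the connection has been torn down
}

class GatewaySession {
  private current: Connection | null = null;
  private nextId = 1;
  private warn: Logger;

  constructor(warn: Logger) {
    this.warn = warn;
  }

  // Open a fresh connection, marking any previous one as closed.
  connect(): Connection {
    if (this.current) this.current.closed = true;
    this.current = { id: this.nextId++, closed: false };
    return this.current;
  }

  // Intentional disconnect: no connection is current afterwards.
  disconnect(): void {
    if (this.current) this.current.closed = true;
    this.current = null;
  }

  // Called when a heartbeat for `conn` goes unacknowledged.
  // Returns the connection to use from now on (null if disconnected).
  onMissedHeartbeat(conn: Connection): Connection | null {
    const isStale = conn.closed || conn !== this.current;
    if (isStale) {
      // A timer left over from an old connection: log and ignore, never throw.
      this.warn(`ignoring heartbeat from stale connection #${conn.id}`);
      return this.current;
    }
    this.warn(`connection #${conn.id} missed a heartbeat; reconnecting`);
    return this.connect();
  }
}
```

In this sketch a heartbeat that fires after a disconnect only produces a warning. Whether a stale timer should also trigger a fresh connection after an intentional disconnect is a design choice left open here.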
Actual behavior
The gateway runs normally for a while, then crashes with an uncaught exception. systemd notices the failure and restarts the service.
Representative crash sequence:
[openclaw] Uncaught exception: Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)
at Object.reconnectCallback (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:211/217:...)
at Timeout.sendHeartbeat [as _onTimeout] (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/utils/heartbeat.ts:31:...)
Then:
openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
openclaw-gateway.service: Failed with result 'exit-code'.
openclaw-gateway.service: Scheduled restart job ...
Starting openclaw-gateway.service - OpenClaw Gateway ...
Started openclaw-gateway.service - OpenClaw Gateway ...
OpenClaw version
v2026.3.12
Operating system
Ubuntu 24.04.4 LTS (GNU/Linux 6.8.0-107-generic x86_64)
Install method
npm global
Model
openai-codex/gpt-5.4 / anthropic/claude-opus-4.6 / anthropic/claude-sonnet-4.6
Provider / routing chain
Discord -> OpenClaw gateway -> openai-codex/gpt-5.4
Additional provider/model setup details
No response
Logs, screenshots, and evidence
2026-04-11T18:30:54+00:00 [openclaw] Uncaught exception: Error: Attempted to reconnect zombie connection after disconnecting first (this shouldn't be possible)
at Object.reconnectCallback (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:211:...)
at Timeout.sendHeartbeat [as _onTimeout] (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/src/plugins/gateway/utils/heartbeat.ts:31:...)
2026-04-11T18:30:55+00:00 openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
2026-04-11T18:31:00+00:00 openclaw-gateway.service: Scheduled restart job, restart counter is at 8.
2026-04-11T18:31:16+00:00 Started openclaw-gateway.service - OpenClaw Gateway (v2026.3.12).
Impact and severity
Impact:
- Gateway process crashes intermittently during normal operation
- Messaging becomes temporarily unavailable until systemd restarts the service
- In-flight replies or tool actions may fail or be interrupted
- Reduces reliability of cron jobs and Discord interactions
- Creates noisy restart churn and makes the system hard to trust for unattended use
Severity: High
Additional information
- The crash appears to be in the gateway connection layer, not in the model provider itself.
- systemd --user auto-restart masked the failure somewhat by bringing the service back quickly, but the crash is still user-visible because in-flight replies can fail.
- At least one run showed a Discord-side symptom shortly before a crash:
discord final reply failed: AbortError: This operation was aborted
This may be related, or it may just be a downstream symptom of connection instability.
- After updating, I also observed a likely separate compatibility warning from the Codex plugin registration path:
[plugins] codex failed during register from /usr/lib/node_modules/openclaw/dist/extensions/codex/index.js: TypeError: api.registerAgentHarness is not a function
This did not appear to be the direct crash trigger, but it may indicate a version mismatch worth checking. The actual fatal event was still the zombie reconnect exception.
- The gateway logs also showed repeated Bonjour advertiser warnings before/after restarts:
watchdog detected non-announced service; attempting re-advertise
These are probably secondary, but worth mentioning in case they point to broader connection lifecycle issues.
- The repro is not deterministic on demand.
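For context on why this error is fatal: the trace shows it being thrown from inside a timer callback (Timeout.sendHeartbeat). In Node.js, an exception thrown there has no caller to catch it, so it surfaces as an uncaught exception. Below is a minimal sketch of containing such errors at the timer boundary. guardTimerCallback is a hypothetical helper, not an OpenClaw or Carbon API, and the thrown message is shortened for illustration.

```typescript
// Hypothetical helper; not part of OpenClaw or @buape/carbon.
// Wraps a timer callback so a thrown error is logged instead of escaping.
function guardTimerCallback(
  name: string,
  callback: () => void,
  logError: (message: string) => void,
): () => void {
  return () => {
    try {
      callback();
    } catch (error) {
      const detail = error instanceof Error ? error.message : String(error);
      logError(`${name} failed: ${detail}`);
    }
  };
}

// Usage: the throwing callback is contained and logged.
const errors: string[] = [];
const safeHeartbeat = guardTimerCallback(
  "sendHeartbeat",
  () => {
    throw new Error("Attempted to reconnect zombie connection");
  },
  (message) => {
    errors.push(message);
  },
);
safeHeartbeat(); // does not throw; the failure is recorded in `errors`
// In a long-running service the wrapped function would be what gets scheduled,
// e.g. setInterval(safeHeartbeat, intervalMs) with an assumed intervalMs.
```

Catching at the timer boundary only keeps the process alive. The reconnect logic would still need to decide what to do with the dead connection.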