Bug type
Regression (worked before, now fails)
Summary
After upgrading from v2026.3.11 to v2026.3.24, the gateway crashes roughly every 35 minutes. The health-monitor for the Discord channel detects a stale socket and restarts the channel, and the resulting disconnect goes through a reconnection path that raises an uncaught exception. Zero crashes occurred across 5+ days on v2026.3.11. On v2026.3.24, 16 crashes occurred in a single day.
Steps to reproduce
- Install v2026.3.24 with Discord channel enabled (single guild, allowlist-only)
- Gateway runs normally for ~30-35 minutes
- Discord health-monitor detects a stale WebSocket (no events within staleSocketMinutes, default 30)
- Health-monitor calls stopChannel() → triggers onAbort()
- onAbort() sets gateway.options.reconnect = { maxAttempts: 0 } then calls gateway.disconnect()
- WebSocket closes with code 1005 ("No Status Received")
- handleClose(1005) → handleReconnectionAttempt() → checks reconnectAttempts (0) >= maxAttempts (0) → true
- Emits an "error" event carrying new Error("Max reconnect attempts (0) reached after code 1005")
- The error is never caught → entire Node.js process crashes
- systemd restarts → cycle repeats every ~35 minutes
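The sequence above can be sketched in isolation. This is a minimal reproduction under stated assumptions, not the actual OpenClaw code: ReproGateway and its abort() method are hypothetical stand-ins, the error message is reconstructed from the log line in this report, and the sketch assumes no "error" listener is attached to the emitter (in Node.js, emitting "error" on an EventEmitter with no listener throws the error).

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical, simplified stand-in for the gateway plugin (not the real class).
class ReproGateway {
  constructor() {
    // Assumption: nothing listens for "error" on this emitter.
    this.emitter = new EventEmitter();
    this.options = { reconnect: { maxAttempts: 5 } };
    this.reconnectAttempts = 0;
  }

  // Mirrors the check quoted in this report.
  handleReconnectionAttempt(code) {
    // The default of 5 only applies when maxAttempts is undefined, so 0 is kept.
    const { maxAttempts = 5 } = this.options.reconnect ?? {};
    if (this.reconnectAttempts >= maxAttempts) {
      // With no "error" listener, emit() throws this Error synchronously.
      this.emitter.emit(
        "error",
        new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`)
      );
      return;
    }
    this.reconnectAttempts += 1;
  }

  handleClose(code) {
    this.handleReconnectionAttempt(code);
  }

  // Simulates the onAbort path: zero the budget, then the socket closes with 1005.
  abort() {
    this.options.reconnect = { maxAttempts: 0 };
    this.handleClose(1005);
  }
}

// Returns the message of the error that escapes, or null if nothing was thrown.
function reproduce() {
  const gateway = new ReproGateway();
  try {
    gateway.abort();
    return null;
  } catch (err) {
    return err.message;
  }
}

console.log(reproduce());
```

In the real gateway the throw happens inside a WebSocket close callback, so there is no surrounding try/catch and the error reaches the process-level uncaught exception handler.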
Expected behavior
Health-monitor should gracefully restart the Discord channel without crashing the gateway process.
Actual behavior
The onAbort handler sets maxAttempts: 0 before disconnecting. The WebSocket close handler then fires and immediately triggers the max-attempts error path (0 >= 0 is true), emitting an uncaught exception that crashes the entire Node.js process.
OpenClaw version
2026.3.24 (upgraded from 2026.3.11)
Operating system
Ubuntu 24.04 LTS (Linux 6.18.7 x64)
Install method
npm global
Model
anthropic/claude-opus-4-6 / anthropic/claude-sonnet-4-6
Provider / routing chain
openclaw -> anthropic (direct)
Additional provider/model setup details
Bug is in Discord WebSocket lifecycle management, not model-specific.
Logs, screenshots, and evidence
[health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)
[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)
    at WebSocket.<anonymous> (provider-CAlWEl41.js:3307:9)

Root cause in provider-CAlWEl41.js:
Line 6952, where onAbort sets:
    gateway.options.reconnect = { maxAttempts: 0 };
Lines 3316-3318, where the reconnection handler checks:
    const { maxAttempts = 5 } = this.options.reconnect ?? {};
    if (this.reconnectAttempts >= maxAttempts) {
        this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached...`));

Crash frequency data:
• Mar 17-24 (v2026.3.11): 0 crashes across 5+ days
• Mar 25 (v2026.3.24): 16 crashes in one day, every ~35 min
Impact and severity
High. The gateway crashes every ~35 minutes. All running subagent sessions are disrupted or killed. Subagent completion announce-back fails after restart ("Outbound not configured for channel: telegram"). Long-running subagent tasks (30-75 min) have a near-zero chance of completing.
Additional information
Suggested fixes:
- (Preferred) Set a flag to suppress the close handler rather than manipulating maxAttempts. lifecycleStopping already exists on line 6944, so handleClose could check it and skip the reconnection path during an intentional stop
- Set maxAttempts to a sentinel value that handleReconnectionAttempt treats as "intentional shutdown, don't emit error"
- Catch the error in the health-monitor's restart flow so it doesn't propagate as uncaught
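A rough sketch of the preferred option, under stated assumptions: FlaggedGateway and stop() are hypothetical names, and lifecycleStopping is modeled here as an instance property. In the bundled file the flag may live in a different scope, so it would first need to be made visible to the close handler. The string return values exist only to make the behavior observable in this sketch.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical, simplified gateway illustrating a shutdown flag (not the real class).
class FlaggedGateway {
  constructor() {
    this.emitter = new EventEmitter();
    this.options = { reconnect: { maxAttempts: 5 } };
    this.reconnectAttempts = 0;
    // Assumption: the stop flag is reachable from handleClose.
    this.lifecycleStopping = false;
  }

  handleReconnectionAttempt(code) {
    const { maxAttempts = 5 } = this.options.reconnect ?? {};
    if (this.reconnectAttempts >= maxAttempts) {
      this.emitter.emit(
        "error",
        new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`)
      );
      return "failed";
    }
    this.reconnectAttempts += 1;
    return "reconnecting";
  }

  handleClose(code) {
    // An intentional stop closes the socket on purpose: skip the reconnection
    // path entirely, so no "max attempts" error is emitted.
    if (this.lifecycleStopping) {
      return "ignored";
    }
    return this.handleReconnectionAttempt(code);
  }

  // Intentional shutdown: set the flag and leave maxAttempts untouched.
  stop() {
    this.lifecycleStopping = true;
    return this.handleClose(1005);
  }
}

console.log(new FlaggedGateway().stop());
```

With this shape, an unexpected close still goes through the normal reconnection budget, while a health-monitor restart no longer depends on maxAttempts: 0 to prevent reconnects.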
Workaround: Disable both the Discord channel (channels.discord.enabled: false) and the Discord plugin (plugins.entries.discord.enabled: false).
Note: Disabling only the channel exposes a secondary issue. With the Discord plugin still enabled (plugins.entries.discord.enabled: true), message-action-discovery still tries to resolve the Discord token SecretRef, causing a separate crash ("Unhandled promise rejection: channels.discord.token: unresolved SecretRef").