Description
Summary
When the Slack socket mode WebSocket connection is lost due to a transient DNS failure (e.g., network change, WiFi dropout), the gateway process stays alive but the Slack channel becomes permanently unresponsive until a manual openclaw gateway stop && openclaw gateway install.
The Slack WebClient retries individual API calls indefinitely (observed 2800+ retries), but the underlying WebSocket is never re-established.
This is the same root cause as #13506 (WhatsApp), which was fixed in PR #9727 — but the fix was only applied to the WhatsApp channel, not Slack.
Log Evidence
Gateway 2026.2.24, macOS, socket mode, OpenAI Codex provider.
Healthy startup (21:41 UTC):
[slack] socket mode connected
[slack] users resolved: U087GL2J2PQ→U087GL2J2PQ
DNS failure begins (21:46 UTC) — agent completes LLM run but can't deliver reply:
[WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
[WARN] bolt-app http request failed getaddrinfo ENOTFOUND slack.com
[WARN] socket-mode:SlackWebSocket:1 A pong wasn't received from the server before the timeout of 5000ms!
[WARN] web-api:WebClient:125 http request failed getaddrinfo ENOTFOUND slack.com
DNS resolves again (verified via nslookup slack.com on host), but gateway never reconnects. WebClient retry counter climbs to 2800+ over the next hour. Socket mode connection is dead.
Only fix: openclaw gateway stop && openclaw gateway install
Root Cause
The Slack channel monitor (src/slack/monitor.ts, visible in bundled reply-Cx57rl6c.js:38654) has no reconnect loop:
try {
  await app.start();
  runtime.log?.("slack socket mode connected");
  // Blocks forever: no reconnect on socket death
  await new Promise((resolve) => {
    opts.abortSignal?.addEventListener("abort", () => resolve(), { once: true });
  });
} finally {
  await app.stop().catch(() => void 0);
}
Once app.start() succeeds and the socket later dies, recovery depends entirely on Bolt SDK internals. When the SDK's SocketModeClient fails to re-establish the WebSocket (known issues: slackapi/node-slack-sdk#1495, slackapi/bolt-js#1151), the channel is permanently dead.
Compare with the WhatsApp fix in PR #9727, which wraps the equivalent listenerFactory() call in a retry loop with backoff and maxAttempts.
Expected Behavior
After a transient DNS failure resolves, the Slack socket mode connection should be re-established automatically using the same backoff/retry strategy as WhatsApp (#9727).
Reproduction
- Start gateway with Slack in socket mode
- Verify slack socket mode connected in logs
- Simulate DNS failure (e.g., switch WiFi networks, or temporarily block slack.com in /etc/hosts)
- Restore DNS
- Observe: WebClient retries individual API calls but socket mode never reconnects
Environment
- OpenClaw: 2026.2.24
- Node: 24.13.0
- macOS Darwin 25.3.0
- Slack mode: socket
- Trigger: network change (inflight WiFi → airport WiFi)
Suggested Fix
Apply the same strategy as PR #9727 to the Slack channel monitor: wrap the app.start() + socket lifetime in a reconnect loop with exponential backoff and maxAttempts. On socket death or unrecoverable Bolt SDK error, tear down and retry the full app.start() cycle.
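A minimal sketch of what such a loop could look like, written against a hypothetical Session interface rather than the real Bolt or OpenClaw types. The names Session, ReconnectOptions, backoffDelay, and runWithReconnect are illustrative assumptions, as is the idea that socket death can be surfaced as a promise (waitForDeath). How that signal is derived from the Bolt SDK is left open here, and this is not the code from PR #9727.

```typescript
// Hypothetical abstraction over one app.start() / app.stop() cycle.
interface Session {
  start(): Promise<void>;
  // Assumed to resolve when the socket is considered dead; wiring this to
  // the SDK's disconnect/error signals is outside this sketch.
  waitForDeath(): Promise<void>;
  stop(): Promise<void>;
}

interface ReconnectOptions {
  maxAttempts: number; // consecutive failures tolerated before giving up
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number; // upper bound on the backoff delay
  abortSignal?: AbortSignal;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
function backoffDelay(failures: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, failures - 1));
}

async function runWithReconnect(
  createSession: () => Session,
  opts: ReconnectOptions,
): Promise<"aborted" | "gave_up"> {
  const signal = opts.abortSignal;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  // Resolves once on abort so both the wait and the backoff sleep can be cut short.
  const aborted = new Promise<void>((resolve) => {
    if (signal?.aborted) resolve();
    signal?.addEventListener("abort", () => resolve(), { once: true });
  });

  let failures = 0;
  while (!signal?.aborted) {
    const session = createSession();
    try {
      await session.start();
      failures = 0; // a successful start resets the backoff
      await Promise.race([session.waitForDeath(), aborted]);
    } catch {
      // start() or waitForDeath() rejected; treated like a dead socket.
    } finally {
      // Always tear down fully before retrying the whole cycle.
      await session.stop().catch(() => undefined);
    }
    if (signal?.aborted) break;
    failures += 1; // a failed start or a socket death counts as one failure
    if (failures >= opts.maxAttempts) return "gave_up";
    const delay = backoffDelay(failures, opts.baseDelayMs, opts.maxDelayMs);
    await Promise.race([sleep(delay), aborted]);
  }
  return "aborted";
}
```

In this sketch the counter resets after every successful start, so maxAttempts only bounds consecutive failed starts. A connection that repeatedly connects and then dies is retried indefinitely at the base delay. Whether that matches the WhatsApp behavior in #9727 would need to be checked against that PR.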