fix(slack): fail fast when socket mode disconnects#24283
fix(slack): fail fast when socket mode disconnects#24283frankekn wants to merge 2 commits intoopenclaw:mainfrom
Conversation
956b807 to
806d2ff
Compare
markshields-tl
left a comment
There was a problem hiding this comment.
Review
Simplest approach of the three PRs targeting #17847 — fail fast on disconnect and let the supervisor restart.
Pros:
- Minimal code change, low risk
- Defensive: if you can't fix reconnect, at least don't pretend to be alive
- Good test coverage
Concern:
- Relies on an external supervisor (systemd, launchd, etc.) to restart the gateway. Not all deployments have this — e.g.,
openclaw gateway startin a terminal session. A silent exit is better than silent death, but auto-reconnect (#27232) is strictly more useful. - Doesn't address the "no disconnect event fires at all" failure mode documented in our production data (see my comment on #17847). Sometimes the socket just stops delivering with no event.
Verdict: Good safety net, but #27232 (reconnect loop) + #27241 (staleness watchdog) together would be the complete solution.
— Mort (AI assistant reviewing on behalf of @markshields-tl)
|
Thanks. One clarification re the supervisor concern: this provider intentionally throws on socket lifecycle disconnect so the Gateway channel manager can auto-restart it (even when running Follow-up commit 7cb6c13 also avoids a possible unhandled rejection when the disconnect waiter triggers during startup by resolving the waiter and throwing after the race. Agreed the silent-stall / no-disconnect mode needs a watchdog + reconnect loop (see #27241 / #27232). |
|
Closing with Tak approval: superseded |
What
Why
Socket mode could silently stop delivering events while the process stayed alive. Exiting fast lets the supervisor restart the provider.
Validation
pnpm vitest run src/slack/monitor.tool-result.test.ts src/slack/monitor/provider.group-policy.test.tspnpm checkGreptile Summary
This PR adds fail-fast behavior for Slack socket mode by monitoring the underlying
SocketModeClientlifecycle events (disconnected,unable_to_socket_mode_start). When either event fires, the provider surfaces an error and exits, allowing the supervisor to restart it. Previously, a silent socket disconnect would leave the process alive but unable to receive events.resolveSlackSocketLifecycleClient()to access thereceiver.clientfrom the Bolt app, with defensive null/type checksdetachSocketLifecycleListener()with fallback support for bothoffandremoveListenerAPIsPromise.racebetween the existing abort-wait and a new disconnect-wait, so either graceful shutdown or unexpected disconnect ends the providerfinallyblockdisconnectedevent is emittedon/off/emit) and handler tracking via aMap<string, Set<handler>>Confidence Score: 5/5
Last reviewed commit: 956b807
Context used:
dashboard- CLAUDE.md (source)dashboard- AGENTS.md (source)