Fix Slack socket health stale detection#69833
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 166ae81efe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile SummaryThis PR fixes quiet Slack workspaces being incorrectly marked stale by separating socket liveness from inbound event activity: connect no longer seeds Confidence Score: 4/5Safe to merge; one P2 finding about clientOptions placement in SocketModeReceiver worth addressing before the next release. All findings are P2 or lower. The one flagged concern (clientOptions going to installerOptions rather than the top-level SocketModeReceiver field) is unlikely to cause issues in practice because the App constructor still receives the correct options for web API calls, but it could silently miss proxy/TLS settings for the socket transport if those are ever configured. extensions/slack/src/monitor/provider.ts — review the clientOptions placement in createSlackBoltApp for SocketModeReceiver. Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/slack/src/monitor/provider.ts
Line: 676-682
Comment:
**`clientOptions` routed to `installerOptions`, not the top-level receiver field**
`SocketModeReceiverOptions` has a top-level `clientOptions: SocketModeOptions` for configuring the underlying `@slack/socket-mode` transport. The `installerOptions.clientOptions` slot is a `WebClientOptions` for the OAuth installer's web client, which is inactive here (no OAuth flow). As a result, options from `resolveSlackWebClientOptions()` land in the installer slot and never reach the socket transport. The `App` constructor still receives them directly so all handler-side Slack API calls are fine, but any proxy or TLS settings meant for the socket connection would be silently ignored.
Consider passing `clientOptions` at the top level of `SocketModeReceiver` instead of inside `installerOptions`.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix: make slack socket health event-driv..." | Re-trigger Greptile |
fee029a to
38a4e3f
Compare
38a4e3f to
02c89ff
Compare
97a6f1a to
205d5ae
Compare
a212f8e to
3657f5f
Compare
a13f7c7 to
2bf80d8
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2bf80d8dd4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (Array.isArray(target) && Array.isArray(patch)) { | ||
| const merged = target.map((entry) => structuredClone(entry)); | ||
| const indexById = new Map<string, number>(); | ||
| for (const [index, entry] of merged.entries()) { | ||
| if (isObjectWithStringId(entry)) { |
There was a problem hiding this comment.
Match gateway array merge semantics in noop patch detection
isConfigPatchNoopForSnapshot can misclassify real config.patch updates as no-ops because applyQaMergePatch always takes this array-merge path and preserves the existing array when the patch array is empty. In gateway config.patch, non-id arrays are replaced (not merged), so a patch like {"tools":{"deny":[]}} over {"tools":{"deny":["image_generate"]}} is a real change; here it is treated as unchanged and runConfigMutation skips the RPC, leaving stale config that can taint subsequent QA scenarios.
Useful? React with 👍 / 👎.
| if (scenarioState.subagentFanoutPhase === 2 && prompt) { | ||
| scenarioState.subagentFanoutPhase = 3; | ||
| return fanoutCompleteReply; |
There was a problem hiding this comment.
Scope fanout completion reply to fanout-specific turns
This completion branch is now triggered by any non-empty prompt when subagentFanoutPhase === 2, even if the request is unrelated to fanout. Because scenarioState is shared across requests in the mock server, an interrupted fanout can leave phase 2 set and the next unrelated prompt will incorrectly receive subagent-1: ok\nsubagent-2: ok, polluting parity outputs.
Useful? React with 👍 / 👎.
| Object.assign(status, createConnectedChannelStatusPatch(at)); | ||
| } | ||
| if (options?.transportActivity) { | ||
| Object.assign(status, createTransportActivityStatusPatch(at)); |
There was a problem hiding this comment.
Clear stale transport timestamp on reconnect without sync proof
When reconnecting from a disconnected state, noteConnected sets connected via createConnectedChannelStatusPatch but only refreshes lastTransportActivityAt for SYNCING. If the prior transport timestamp is old, it is carried into the new connected lifecycle and can immediately trip stale-socket health checks before the first successful /sync, causing avoidable restart churn.
Useful? React with 👍 / 👎.
| export type ChannelStatusAdapter<ResolvedAccount, Probe = unknown, Audit = unknown> = { | ||
| defaultRuntime?: ChannelAccountSnapshot; | ||
| skipStaleSocketHealthCheck?: boolean; | ||
| /** Runtime `mode` values where `lastEventAt` can prove connected socket liveness. */ | ||
| staleSocketHealthCheckModes?: readonly string[]; | ||
| buildChannelSummary?: BivariantCallback< |
There was a problem hiding this comment.
Restore removed stale-socket fields in public status adapter
Removing skipStaleSocketHealthCheck and staleSocketHealthCheckModes from ChannelStatusAdapter is a source-compatible break for third-party channel plugins that still provide those properties in status object literals. After this change, those plugins fail TypeScript excess-property checks on upgrade even though no major-version SDK break is being introduced.
Useful? React with 👍 / 👎.
| const healthPolicy: ChannelHealthPolicy = { | ||
| channelId, | ||
| now, | ||
| staleEventThresholdMs: timing.staleEventThresholdMs, | ||
| channelConnectGraceMs: timing.channelConnectGraceMs, | ||
| skipStaleSocketCheck: channelPluginStatus?.skipStaleSocketHealthCheck, | ||
| staleSocketHealthCheckModes: channelPluginStatus?.staleSocketHealthCheckModes, | ||
| }; |
There was a problem hiding this comment.
Preserve plugin stale-socket override wiring in health monitor
The monitor now evaluates health without reading channel plugin status overrides, so plugin-defined stale-socket exemptions are silently ignored at runtime. For plugins that relied on those overrides, connected accounts can now be restarted for stale-socket in modes they intentionally excluded, which is a behavioral regression independent of type-level cleanup.
Useful? React with 👍 / 👎.
Summary
Slack Socket Mode was using
lastEventAtfor two different meanings at once: real inbound Slack activity and socket liveness. That caused quiet but healthy Slack workspaces to be marked stale. This PR makes Slack health rely on actual socket disconnects instead of inactivity windows, while keeping the generic stale-socket behavior unchanged for other channels.What Changed
skipStaleSocketHealthCheckthrough the computed status adapter and enable it for Slack onlylastEventAton Slack connect; connect status now setsconnected,lastConnectedAt,healthState: "healthy", and clearslastErrorSocketModeReceiverwithautoReconnectEnabled: false, passing sharedclientOptionsthroughinstallerOptions.clientOptionsso OpenClaw owns reconnect behaviorapp.start()to avoid the startup race where the socket can disconnect before the waiter is attachedtrackEvent()through real inbound Slack activity paths: messages, reactions, member/channel/pin events, block actions, modal submit and close, slash commands, slash arg options, and slash arg actionsWhy
Testing
Fully tested locally.
pnpm checkpnpm buildpnpm exec vitest run --config test/vitest/vitest.extension-slack.config.tsnode scripts/run-vitest.mjs run --config test/vitest/vitest.gateway-methods.config.ts src/gateway/server-methods/send.test.tsReferences
AI Disclosure