Skip to content

Fix Slack socket health stale detection#69833

Merged
steipete merged 15 commits intoopenclaw:mainfrom
bek91:fix/slack-socket-health
Apr 22, 2026
Merged

Fix Slack socket health stale detection#69833
steipete merged 15 commits intoopenclaw:mainfrom
bek91:fix/slack-socket-health

Conversation

@bek91
Copy link
Copy Markdown
Contributor

@bek91 bek91 commented Apr 21, 2026

Summary

Slack Socket Mode was using lastEventAt for two different meanings at once: real inbound Slack activity and socket liveness. That caused quiet but healthy Slack workspaces to be marked stale. This PR makes Slack health rely on actual socket disconnects instead of inactivity windows, while keeping the generic stale-socket behavior unchanged for other channels.

What Changed

  • propagate skipStaleSocketHealthCheck through the computed status adapter and enable it for Slack only
  • stop seeding lastEventAt on Slack connect; connect status now sets connected, lastConnectedAt, healthState: "healthy", and clears lastError
  • switch Slack Socket Mode to an explicit SocketModeReceiver with autoReconnectEnabled: false, passing shared clientOptions through installerOptions.clientOptions so OpenClaw owns reconnect behavior
  • install the disconnect waiter before app.start() to avoid the startup race where the socket can disconnect before the waiter is attached
  • wire trackEvent() through real inbound Slack activity paths: messages, reactions, member/channel/pin events, block actions, modal submit and close, slash commands, slash arg options, and slash arg actions
  • keep the send-route follow-up fix so derived Slack thread routes update the mirrored session key and pass the derived thread id to delivery
  • preserve mixed-case Slack session-key recovery with case-insensitive base-session comparison
  • preserve numeric derived thread ids for non-Slack channels so inferred topic/thread delivery does not regress

Why

  • quiet Slack workspaces should remain healthy when the socket is connected
  • Slack-specific reconnect and liveness handling should not change generic channel health behavior
  • real inbound activity should update event timestamps everywhere relevant instead of relying on fake connect-time activity
  • derived Slack thread routes must continue to deliver into the resolved thread rather than the channel root

Testing

Fully tested locally.

  • pnpm check
  • pnpm build
  • pnpm exec vitest run --config test/vitest/vitest.extension-slack.config.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.gateway-methods.config.ts src/gateway/server-methods/send.test.ts

References

AI Disclosure

  • AI-assisted: Codex was used to help implement and validate this change.

@openclaw-barnacle openclaw-barnacle Bot added channel: slack Channel integration: slack gateway Gateway runtime size: L labels Apr 21, 2026
@bek91 bek91 marked this pull request as ready for review April 21, 2026 20:33
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 166ae81efe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/slack/src/monitor/provider.ts Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR fixes quiet Slack workspaces being incorrectly marked stale by separating socket liveness from inbound event activity: connect no longer seeds lastEventAt, health state is driven by actual socket disconnect events, and skipStaleSocketHealthCheck: true is propagated through the status adapter for Slack only. Supporting changes include switching to an explicit SocketModeReceiver with autoReconnectEnabled: false, installing the disconnect waiter before app.start() to close a startup race, and threading trackEvent() through all real inbound Slack activity paths (messages, interactions, slash commands, modals). The send-route fix ensures derived Slack thread session keys and their thread IDs reach the delivery layer correctly.

Confidence Score: 4/5

Safe to merge; one P2 finding about clientOptions placement in SocketModeReceiver worth addressing before the next release.

All findings are P2 or lower. The one flagged concern (clientOptions going to installerOptions rather than the top-level SocketModeReceiver field) is unlikely to cause issues in practice because the App constructor still receives the correct options for web API calls, but it could silently miss proxy/TLS settings for the socket transport if those are ever configured.

extensions/slack/src/monitor/provider.ts — review the clientOptions placement in createSlackBoltApp for SocketModeReceiver.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/slack/src/monitor/provider.ts
Line: 676-682

Comment:
**`clientOptions` routed to `installerOptions`, not the top-level receiver field**

`SocketModeReceiverOptions` has a top-level `clientOptions: SocketModeOptions` for configuring the underlying `@slack/socket-mode` transport. The `installerOptions.clientOptions` slot is a `WebClientOptions` for the OAuth installer's web client, which is inactive here (no OAuth flow). As a result, options from `resolveSlackWebClientOptions()` land in the installer slot and never reach the socket transport. The `App` constructor still receives them directly so all handler-side Slack API calls are fine, but any proxy or TLS settings meant for the socket connection would be silently ignored.

Consider passing `clientOptions` at the top level of `SocketModeReceiver` instead of inside `installerOptions`.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: make slack socket health event-driv..." | Re-trigger Greptile

Comment thread extensions/slack/src/monitor/provider.ts
@steipete steipete force-pushed the fix/slack-socket-health branch from fee029a to 38a4e3f Compare April 22, 2026 02:46
@steipete steipete requested a review from a team as a code owner April 22, 2026 02:46
@steipete steipete force-pushed the fix/slack-socket-health branch from 38a4e3f to 02c89ff Compare April 22, 2026 02:58
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation channel: discord Channel integration: discord channel: matrix Channel integration: matrix channel: telegram Channel integration: telegram channel: whatsapp-web Channel integration: whatsapp-web app: web-ui App: web-ui agents Agent runtime and tooling channel: qqbot maintainer Maintainer-authored PR labels Apr 22, 2026
steipete added a commit to bek91/openclaw that referenced this pull request Apr 22, 2026
@steipete steipete force-pushed the fix/slack-socket-health branch from 97a6f1a to 205d5ae Compare April 22, 2026 03:11
@steipete steipete force-pushed the fix/slack-socket-health branch from a212f8e to 3657f5f Compare April 22, 2026 04:23
@openclaw-barnacle openclaw-barnacle Bot removed channel: whatsapp-web Channel integration: whatsapp-web agents Agent runtime and tooling channel: qqbot labels Apr 22, 2026
@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts agents Agent runtime and tooling extensions: tokenjuice Changes to the bundled tokenjuice extension labels Apr 22, 2026
@steipete steipete force-pushed the fix/slack-socket-health branch from a13f7c7 to 2bf80d8 Compare April 22, 2026 07:48
@steipete steipete merged commit bee2e0f into openclaw:main Apr 22, 2026
101 checks passed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bf80d8dd4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +120 to +124
if (Array.isArray(target) && Array.isArray(patch)) {
const merged = target.map((entry) => structuredClone(entry));
const indexById = new Map<string, number>();
for (const [index, entry] of merged.entries()) {
if (isObjectWithStringId(entry)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match gateway array merge semantics in noop patch detection

isConfigPatchNoopForSnapshot can misclassify real config.patch updates as no-ops because applyQaMergePatch always takes this array-merge path and preserves the existing array when the patch array is empty. In gateway config.patch, non-id arrays are replaced (not merged), so a patch like {"tools":{"deny":[]}} over {"tools":{"deny":["image_generate"]}} is a real change; here it is treated as unchanged and runConfigMutation skips the RPC, leaving stale config that can taint subsequent QA scenarios.

Useful? React with 👍 / 👎.

Comment on lines +749 to +751
if (scenarioState.subagentFanoutPhase === 2 && prompt) {
scenarioState.subagentFanoutPhase = 3;
return fanoutCompleteReply;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Scope fanout completion reply to fanout-specific turns

This completion branch is now triggered by any non-empty prompt when subagentFanoutPhase === 2, even if the request is unrelated to fanout. Because scenarioState is shared across requests in the mock server, an interrupted fanout can leave phase 2 set and the next unrelated prompt will incorrectly receive subagent-1: ok\nsubagent-2: ok, polluting parity outputs.

Useful? React with 👍 / 👎.

Comment on lines 62 to +65
Object.assign(status, createConnectedChannelStatusPatch(at));
}
if (options?.transportActivity) {
Object.assign(status, createTransportActivityStatusPatch(at));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear stale transport timestamp on reconnect without sync proof

When reconnecting from a disconnected state, noteConnected sets connected via createConnectedChannelStatusPatch but only refreshes lastTransportActivityAt for SYNCING. If the prior transport timestamp is old, it is carried into the new connected lifecycle and can immediately trip stale-socket health checks before the first successful /sync, causing avoidable restart churn.

Useful? React with 👍 / 👎.

Comment on lines 173 to 175
export type ChannelStatusAdapter<ResolvedAccount, Probe = unknown, Audit = unknown> = {
defaultRuntime?: ChannelAccountSnapshot;
skipStaleSocketHealthCheck?: boolean;
/** Runtime `mode` values where `lastEventAt` can prove connected socket liveness. */
staleSocketHealthCheckModes?: readonly string[];
buildChannelSummary?: BivariantCallback<
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore removed stale-socket fields in public status adapter

Removing skipStaleSocketHealthCheck and staleSocketHealthCheckModes from ChannelStatusAdapter is a source-compatible break for third-party channel plugins that still provide those properties in status object literals. After this change, those plugins fail TypeScript excess-property checks on upgrade even though no major-version SDK break is being introduced.

Useful? React with 👍 / 👎.

Comment on lines 127 to 132
const healthPolicy: ChannelHealthPolicy = {
channelId,
now,
staleEventThresholdMs: timing.staleEventThresholdMs,
channelConnectGraceMs: timing.channelConnectGraceMs,
skipStaleSocketCheck: channelPluginStatus?.skipStaleSocketHealthCheck,
staleSocketHealthCheckModes: channelPluginStatus?.staleSocketHealthCheckModes,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve plugin stale-socket override wiring in health monitor

The monitor now evaluates health without reading channel plugin status overrides, so plugin-defined stale-socket exemptions are silently ignored at runtime. For plugins that relied on those overrides, connected accounts can now be restarted for stale-socket in modes they intentionally excluded, which is a behavioral regression independent of type-level cleanup.

Useful? React with 👍 / 👎.

medikoo pushed a commit to medikoo/openclaw that referenced this pull request Apr 24, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
zhonghe0615 pushed a commit to zhonghe0615/openclaw that referenced this pull request May 7, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling app: web-ui App: web-ui channel: discord Channel integration: discord channel: matrix Channel integration: matrix channel: slack Channel integration: slack channel: telegram Channel integration: telegram docs Improvements or additions to documentation extensions: qa-lab extensions: tokenjuice Changes to the bundled tokenjuice extension gateway Gateway runtime maintainer Maintainer-authored PR scripts Repository scripts size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Slack Socket Mode stale-socket health uses inbound app-event recency as transport liveness

2 participants