Skip to content

fix(slack): harden thread continuation gating#75630

Open
soichiyo wants to merge 6 commits intoopenclaw:mainfrom
soichiyo:feat/slack-thread-continuity-phase1
Open

fix(slack): harden thread continuation gating#75630
soichiyo wants to merge 6 commits intoopenclaw:mainfrom
soichiyo:feat/slack-thread-continuity-phase1

Conversation

@soichiyo
Copy link
Copy Markdown

@soichiyo soichiyo commented May 1, 2026

This is a follow-up to #58731 after the SDK-backed Slack thread participation persistence landed on main. I rebased onto the current state-store path, dropped the competing JSON participation store from this branch, and removed the unrelated Dockerfile assertion from the original branch.

This also closes the loop on the symptom in #75969: unresolved Slack thread replies are now marked ambiguous and skipped before they can be routed as root channel messages.

Changes:

  • use the existing SDK-backed thread participation lookup for mention gating
  • remove the unreliable child-ts conversations.replies fallback path
  • preserve cached unresolved replies as _ambiguousThreadReply
  • drop ambiguous thread replies before they can be treated as root channel messages
  • wire threadContinuation through the legacy mention-gating helpers
  • add focused regression coverage for the review feedback

Validation:

  • pnpm test extensions/slack/src/sent-thread-cache.test.ts extensions/slack/src/monitor/message-handler/prepare.test.ts extensions/slack/src/monitor/monitor.thread-resolution.test.ts extensions/slack/src/monitor.threading.missing-thread-ts.test.ts src/channels/mention-gating.test.ts

Real behavior proof

  • Behavior or issue addressed: Slack replies with parent_user_id but no resolvable thread_ts should not continue downstream as ordinary root channel messages. The resolver should mark them ambiguous, including the cached-null path, so prepareSlackMessage can drop them before root routing.
  • Real environment tested: Local OpenClaw development checkout on macOS, rebased to openclaw/openclaw@31a710c5a6, running the production Slack thread resolver from this PR head (1573a992afdaa87e382681a7f65416f2a66dc80d) with Node v25.9.0.
  • Exact steps or command run after this patch: Ran this after the rebase and push:
node --import tsx -e '
import { createSlackThreadTsResolver } from "./extensions/slack/src/monitor/thread-resolution.ts";
const historyCalls = [];
const message = { type: "message", user: "U1", text: "follow up", ts: "201.000", parent_user_id: "U2", channel: "C123", channel_type: "channel" };
const resolver = createSlackThreadTsResolver({
  client: { conversations: { history: async (args) => { historyCalls.push(args); return { messages: [{ ts: "201.000" }] }; } } },
  cacheTtlMs: 60000,
  maxSize: 5,
});
const first = await resolver.resolve({ message, source: "message" });
const second = await resolver.resolve({ message, source: "message" });
console.log(JSON.stringify({
  first: { thread_ts: first.thread_ts ?? null, ambiguous: first._ambiguousThreadReply === true },
  second: { thread_ts: second.thread_ts ?? null, ambiguous: second._ambiguousThreadReply === true },
  historyCalls: historyCalls.length,
}, null, 2));
'
  • Evidence after fix: Terminal output from the production resolver run:
{
  "first": {
    "thread_ts": null,
    "ambiguous": true
  },
  "second": {
    "thread_ts": null,
    "ambiguous": true
  },
  "historyCalls": 1
}
  • Observed result after fix: The first unresolved lookup and the cached-null lookup both return ambiguous: true with no thread_ts, and the history lookup is only called once. The focused prepare-path regression also verifies that a message carrying _ambiguousThreadReply returns null instead of being routed as a root message.
  • What was not tested: No live Slack workspace message was sent from this checkout; the live evidence above exercises the production resolver path locally, with the downstream prepare drop covered by focused regression coverage.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 1, 2026

Codex review: needs maintainer review before merge.

Summary
The PR marks Slack replies with parent_user_id but unresolved thread_ts as ambiguous, drops those messages before root-channel handling, wires deprecated mention-gating helpers for thread continuation, adds focused regressions, and updates the changelog.

Reproducibility: yes. Source inspection on current main shows a Slack reply with parent_user_id, no resolvable thread_ts, and a null or failed history lookup is returned as the original message, and prepareSlackMessage has no ambiguity drop before root handling.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal output from a local OpenClaw checkout running the production Slack resolver and showing fresh plus cached unresolved lookups marked ambiguous.

Next step before merge
No narrow automated repair remains; maintainer review and exact-head validation are the remaining actions before merge.

Security
Cleared: The diff is limited to Slack runtime gating/logging, focused tests, shared mention-gating compatibility, and changelog text with no dependency, CI, permission, package, or secret-handling changes.

Review details

Best possible solution:

Land this PR after maintainer review and exact-head validation, keeping the fix scoped to Slack thread resolution, prepare-time ambiguity dropping, and compatibility mention-gating helpers.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows a Slack reply with parent_user_id, no resolvable thread_ts, and a null or failed history lookup is returned as the original message, and prepareSlackMessage has no ambiguity drop before root handling.

Is this the best way to solve the issue?

Yes. Marking unresolved resolver results as ambiguous and dropping them in Slack prepare is the narrowest maintainable fix, and it avoids reintroducing the unreliable conversations.replies fallback from the earlier branch.

What I checked:

  • Current main returns unresolved replies unchanged: createSlackThreadTsResolver returns the original message when a cached unresolved lookup is null or a history lookup cannot resolve thread_ts, so the downstream code cannot distinguish it from a real root message. (extensions/slack/src/monitor/thread-resolution.ts:90, 83aad863fd77)
  • Current main has no ambiguity drop in prepare: prepareSlackMessage proceeds directly into resolveInboundMentionDecision after computing mention policy; there is no _ambiguousThreadReply check before root-message routing. (extensions/slack/src/monitor/message-handler/prepare.ts:510, 83aad863fd77)
  • Existing participation state is available: Current main already has SDK-backed Slack thread participation lookup and memory backfill through hasSlackThreadParticipationWithPersistence, which the PR reuses instead of adding a competing store. (extensions/slack/src/sent-thread-cache.ts:138, 83aad863fd77)
  • PR diff targets the gap: The PR adds _ambiguousThreadReply, returns it for fresh and cached unresolved thread lookups, drops ambiguous replies in prepareSlackMessage, and adds regression coverage for resolver, prepare, missing-thread, and mention-gating paths. (extensions/slack/src/monitor/thread-resolution.ts:17, 1573a992afda)
  • Contributor supplied after-fix terminal proof: The PR body includes a local terminal run against PR head 1573a992afdaa87e382681a7f65416f2a66dc80d showing both fresh and cached unresolved resolver calls return ambiguous: true; the proof label was updated to proof: supplied. (1573a992afda)
  • Exact-head checks are mostly green: GitHub check-runs for the exact head show successful latest real-behavior-proof and many successful checks, with only older superseded cancelled/failure proof runs in the aggregate list. (1573a992afda)

Likely related people:

  • amknight: Authored the merged SDK-backed Slack thread participation persistence work that this PR reuses for mention gating. (role: recent feature-history owner; confidence: high; commits: d0ec3d1f09b0; files: extensions/slack/src/sent-thread-cache.ts, extensions/slack/src/monitor/message-handler/prepare.ts, extensions/slack/src/monitor/message-handler/dispatch.ts)
  • steipete: Recent history shows repeated maintenance of Slack prepare/thread-resolution behavior and shared inbound mention policy around this path. (role: recent Slack and shared gating maintainer; confidence: high; commits: 6b525023d4d2, 05eda57b3c72, 7934a2390c06; files: extensions/slack/src/monitor/thread-resolution.ts, extensions/slack/src/monitor/message-handler/prepare.ts, src/channels/mention-gating.ts)
  • vincentkoc: Recent Slack hot-path performance commits changed the same prepare-time participation lookup area now touched by the PR. (role: recent adjacent maintainer; confidence: medium; commits: ac74a928456d, c0302512d4dd; files: extensions/slack/src/monitor/message-handler/prepare.ts)

Remaining risk / open question:

  • No live Slack workspace message was sent from this checkout; the supplied proof is a local terminal run of the production resolver plus focused regression coverage.
  • This PR addresses the unresolved thread_ts ambiguity path, while broader Slack thread/session linkage symptoms remain separately tracked at [Bug]: slack responses delivered to wrong thread or outside of thread #75969.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 83aad863fd77.

Re-review progress:

@soichiyo soichiyo force-pushed the feat/slack-thread-continuity-phase1 branch from de52097 to 91894a0 Compare May 1, 2026 12:23
Copy link
Copy Markdown
Contributor

@martingarramon martingarramon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static review (no fresh checkout / pnpm test). Diff reads clean against the stated goal. A few notes:

Tie to #75969: the user symptom there ("thread message arrives into channel root") matches the new _ambiguousThreadReply → return null path in extensions/slack/src/monitor/message-handler/prepare.ts:493-505 exactly. Worth surfacing in the PR body for an extra close-loop.

One out-of-scope change: extensions/slack/src/monitor/message-handler/prepare.ts:612 drops as AckReactionScope | undefined to as AckReactionScope. Unrelated to the PR title — separate cleanup, or upstream guarantee change?

Possible double-count: src/channels/mention-gating.ts:205-212 adds bot_thread_participant via threadContinuation, but the call site in extensions/slack/src/monitor/message-handler/prepare.ts still passes it directly via implicitMentionKinds. If both fire, matchedKinds could list the kind twice — a test for this case or a call-site dedupe is recommended.

logVerbose guard drops in thread-resolution.ts: safe — logVerbose self-gates at src/globals.ts:10-23. No log-spam regression.

CI red: the 10 failures look PR-base-stale — extensions/matrix/src/matrix/monitor/handler.ts:1834-1835 errors (liveDmAllowFrom, dmScope-on-wrong-type) don't reproduce on current origin/main; matrix-side commits 7e8d95b413 and f6a1d70080 likely settled the typing. A rebase + push should green most of them.

CI will be authoritative; the read above spans every Slack-side branch I can reason about. Since src/channels/mention-gating.ts is shared infra, would also want regression coverage on at least one other channel (Discord/Telegram) before relying on threadContinuation cross-channel.

@soichiyo soichiyo force-pushed the feat/slack-thread-continuity-phase1 branch from b9d3b13 to 7c1b2fa Compare May 3, 2026 10:58
@soichiyo
Copy link
Copy Markdown
Author

soichiyo commented May 3, 2026

Thanks for the review!

I updated the PR body to call out the #75969 symptom explicitly.

On the threadContinuation note: Slack currently passes the participation signal through implicitMentionKinds; the threadContinuation field is for the deprecated compatibility helpers. That helper path now maps it to bot_thread_participant, and the core decision still dedupes matched implicit kinds.

The AckReactionScope cast cleanup came from the lint/type path while rebasing. Happy to split or revert that small cleanup if you’d prefer keeping this PR strictly Slack-thread scoped.

Focused tests pass on the rebased head:
pnpm test extensions/slack/src/sent-thread-cache.test.ts extensions/slack/src/monitor/message-handler/prepare.test.ts extensions/slack/src/monitor/monitor.thread-resolution.test.ts extensions/slack/src/monitor.threading.missing-thread-ts.test.ts src/channels/mention-gating.test.ts

soichiyo and others added 6 commits May 8, 2026 09:59
…ase 1)

Thread continuation in Slack channels with requireMention was fragile
because participation state lived only in a 24-hour in-memory cache
that was lost on process restart. This caused threads to silently
drop after gateway updates.

Phase 1 changes:
- Add persistent thread participation store (JSON file alongside
  session store) so restart no longer loses thread continuation
- Add structured diagnostic logs at mention gating, thread
  continuation decisions, and participation recording to make
  silent failures traceable
- Elevate thread_ts resolution failure from verbose-only to
  always-visible warning since it causes silent degradation
- Pass storePath through PreparedSlackMessage so dispatch can
  persist participation without re-resolving the path

Note: pre-commit tsgo blocked by pre-existing matrix test errors
(5 errors in extensions/matrix/ unrelated to this change).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on gating (Phase 2)

Phase 2A — thread_ts resolution improvement:
- Add conversations.replies as bounded fallback when conversations.history
  fails to resolve thread_ts for messages with parent_user_id
- Mark unresolvable messages as _ambiguousThreadReply instead of silently
  treating them as root messages
- Log ambiguous thread reply status in mention skip reasons

Phase 2B — gating separation:
- Split implicitMention into two distinct signals:
  - implicitMention: bot is the thread parent (root-message signal)
  - threadContinuation: bot has previously replied (participation signal)
- Add threadContinuation as a first-class parameter to mention gating
  so root-message gating and thread-continuation gating are independently
  visible in logs and code
- Backward compatible: threadContinuation is optional, existing callers
  (Teams, Google Chat, etc.) are unaffected

Note: pre-commit tsgo blocked by pre-existing matrix test errors
(5 errors in extensions/matrix/ unrelated to this change).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@soichiyo soichiyo force-pushed the feat/slack-thread-continuity-phase1 branch from 7c1b2fa to 1573a99 Compare May 8, 2026 01:02
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: slack Channel integration: slack proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants