[Bug]: Slack channel final replies silently dropped on 2026.5.3 when agent emits [thinking,text]-only turn under visibleReplies="message_tool" #77320

@pandore

Summary

On OpenClaw 2026.5.3 with messages.groupChat.visibleReplies = "message_tool" (the schema-migration default applied by doctor --fix) and a Slack-backed agent (claude-cli/claude-opus-4-7), an assistant turn that produces thinking → text → end_turn with zero mcp__openclaw__message tool calls completes cleanly, persists the final text to the claude-cli transcript, and is silently dropped by the Slack delivery layer. From the user's perspective the agent goes silent in the Slack thread — indistinguishable from "still thinking" or "stuck".

The current contract is consistent with visibleReplies = "message_tool" as documented, but:

  1. There is no log line in gateway.log indicating that delivery was suppressed. The failure is invisible without grepping the claude-cli per-session JSONL.
  2. There is no runtime safety net (auto-promote, hard diagnostic, or fallback) when a group-chat turn closes without the required tool call.
  3. This is the Slack analog of three Telegram-side bugs: #76554 ("Telegram forum topic final replies generated but not delivered on 2026.5.2"), #76828 ("deliveryMode="final_only" silently drops long tool-loop replies when no phase=final chunk is emitted"), and #66459 ("Telegram transcript contains final reply but no outbound send occurs for [thinking,text] turn on 2026.4.11"). Those were closed (or remain open) against Telegram-specific code paths.

The agent calls mcp__openclaw__message correctly on most turns (~5/6 on the affected agent during the observation window). The failure mode is concentrated on [thinking,text]-only turn shapes — typically meta-discussion or short conversational replies where the model doesn't reach for tools. This is the same turn shape called out as still-broken in open issue #66459.

Environment

  • OpenClaw: 2026.5.3 (verified via openclaw status → app 2026.5.3)
  • Runtime: Node v24.15.0, Linux 6.8.0-87-generic, Hetzner CPX32
  • Channel: Slack workspace T9FGQDB9D, group channel C0B0XQ03ZN3 (is_group_chat: true), thread reply
  • Provider/model: claude-cli/claude-opus-4-7 (claude-cli v2.1.116 / claude-cli session metadata reports version 2.1.126 for the inner SDK)
  • Agent: podavach-shopify
  • claude-cli SDK session via OpenClaw bundled backend
  • Active config keys (relevant):
    {
      "messages": { "groupChat": { "visibleReplies": "message_tool" } },
      "channels": {
        "slack": {
          "accounts": {
            "podavach-shopify": {
              "dm": { "enabled": true, "groupEnabled": true },
              "replyToMode": "off",
              "replyToModeByChatType": { "direct": "...", "group": "...", "channel": "..." },
              "requireMention": true,
              "streaming": null
            }
          }
        }
      }
    }

Reproduction

Pre-existing thread in a Slack group channel where the agent has already replied successfully via mcp__openclaw__message earlier in the day.

  1. User posts a meta question in the thread that doesn't naturally invite tool use (e.g. asking the agent how its scheduling/triggers work, or a short conversational follow-up).
  2. Agent's claude-cli session enqueues the inbound message normally:
    queue-operation enqueue → dequeue → user message → assistant turn
    
  3. Assistant emits the turn as: one thinking content block (encrypted signature, ~1700 output tokens), then one text content block, then stop_reason: end_turn. No tool_use blocks.
  4. The text content is persisted to ~/.claude/projects/<cwd-mangled>/<sessionId>.jsonl.
  5. No corresponding outbound entry appears in the OpenClaw delivery-mirror file ~/.openclaw/agents/<agent>/sessions/<sessionId>-topic-<thread_ts>.jsonl.
  6. No chat.postMessage / Slack API send appears in journalctl --user -u openclaw-gateway.service for that channel ID in the relevant window.
  7. No warning, error, or diagnostic log line is emitted indicating that delivery was suppressed.
  8. Slack thread shows the user's message and nothing after.

Evidence — observed turn shape

Successful prior turn in the same session, same thread (mcp__openclaw__message was called):

12:16:50.568  assistant  thinking
12:17:02.535  assistant  tool_use:  mcp__openclaw__message      ← delivers to Slack
12:17:03.310  user       tool_result                            ← delivery confirmed
12:17:07.388  assistant  text:      "[…cropped agent reply…]"   ✅ DELIVERED

Failed turn ~3 minutes later in the same session, same thread (no tool call):

12:20:09       inbound:  Slack message in #website, was_mentioned=true,
                         reply_to_id=1777885573.897639, sender=Oleksii Nikitin (U9FAR6MHC),
                         message_id=1777897207.975269
12:20:10.008   queue-operation enqueue   sessionId=75dbc854-5235-4ede-8d17-9d205f557ea5
12:20:10.009   queue-operation dequeue
12:20:33.602   assistant  thinking            (1767 output tokens, encrypted signature)
                                              requestId=req_011CahWzSDh5DUAQLfksZeju
                                              msg_id=msg_01K2j7n2pm8u7mgnB99GZhq7
12:20:42.169   assistant  text:      "[…cropped agent reply…]"
                                              same requestId, same msg_id
12:20:42.169   end_turn   (stop_reason: end_turn, no tool_use blocks emitted)

Same requestId and msg_id on both thinking and text entries — this is one Anthropic API response, split into two transcript records as streaming chunks landed. The turn is structurally complete and clean.
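The manual comparison we ran can be sketched as a small script. This is a minimal sketch, not OpenClaw or claude-cli code: find_text_only_turns is a hypothetical helper, and the field names it reads (a top-level type of "user" or "assistant", a requestId, and message.content as a list of blocks typed thinking / text / tool_use / tool_result) are assumptions about the claude-cli JSONL layout. A turn is taken to run from one real user message to the next; tool_result records do not start a new turn, so the text that follows a successful mcp__openclaw__message call is not flagged.

```python
import json
from typing import Any, Dict, Iterable, List, Optional

MESSAGE_TOOL = "mcp__openclaw__message"


def _blocks(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a record's content blocks, tolerating plain-string content."""
    message = record.get("message")
    if not isinstance(message, dict):
        return []
    content = message.get("content") or []
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return [b for b in content if isinstance(b, dict)]


def find_text_only_turns(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Flag turns that produced text but never called the message tool."""
    flagged: List[Dict[str, Any]] = []
    turn: Optional[Dict[str, Any]] = None

    def close(t: Optional[Dict[str, Any]]) -> None:
        if t and t["text_blocks"] > 0 and t["message_tool_calls"] == 0:
            flagged.append(t)

    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(record, dict):
            continue
        kind = record.get("type")
        blocks = _blocks(record)
        if kind == "user":
            # A tool_result is part of the running turn, not a new one.
            if any(b.get("type") == "tool_result" for b in blocks):
                continue
            close(turn)
            turn = {
                "started_at": record.get("timestamp"),
                "request_ids": [],
                "text_blocks": 0,
                "message_tool_calls": 0,
            }
        elif kind == "assistant" and turn is not None:
            request_id = record.get("requestId")
            if request_id and request_id not in turn["request_ids"]:
                turn["request_ids"].append(request_id)
            for block in blocks:
                if block.get("type") == "text" and str(block.get("text", "")).strip():
                    turn["text_blocks"] += 1
                elif block.get("type") == "tool_use" and block.get("name") == MESSAGE_TOOL:
                    turn["message_tool_calls"] += 1
    # The last turn is also checked; if the session is still running,
    # treat a hit on the final turn as provisional.
    close(turn)
    return flagged
```

To use it, open the per-session file under ~/.claude/projects/<cwd-mangled>/<sessionId>.jsonl and pass the file object to find_text_only_turns; each returned entry carries the request IDs to search for in the gateway log.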

Delivery-mirror file for the affected thread (last entry far predates the failed turn):

~/.openclaw/agents/podavach-shopify/sessions/aa828b52-167b-41e8-b6de-ff2e8970ee75-topic-1777885573.897639.jsonl
last entry: 2026-05-04T09:47:03.096Z   (~2.5h before failed turn)
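
As a cross-check on the timeline: Slack message timestamps are Unix epoch seconds with a fractional suffix, so the inbound message_id can be converted to UTC and compared with the last mirror entry. The sketch below uses only values quoted in this report; slack_ts_to_utc is a hypothetical helper name.

```python
from datetime import datetime, timezone


def slack_ts_to_utc(ts: str) -> datetime:
    """Convert a Slack ts such as '1777897207.975269' to an aware UTC datetime."""
    seconds, _, fraction = ts.partition(".")
    # Pad or trim the fractional part to microseconds without float rounding.
    micros = int((fraction + "000000")[:6])
    base = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return base.replace(microsecond=micros)


inbound = slack_ts_to_utc("1777897207.975269")  # message_id of the failed turn
last_mirror_entry = datetime(2026, 5, 4, 9, 47, 3, 96000, tzinfo=timezone.utc)

print(inbound.isoformat())          # 2026-05-04T12:20:07.975269+00:00
print(inbound - last_mirror_entry)  # 2:33:04.879269
```

The converted time lands two seconds before the 12:20:09 inbound record, which suggests the transcript timestamps are UTC and matches the "~2.5h" gap noted above.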

Gateway journal for the channel ID during 12:19:00 – 12:35:00 window:

$ journalctl --user -u openclaw-gateway.service \
    --since "2026-05-04 12:19:00" --until "2026-05-04 12:35:00" \
    | grep -iE "C0B0XQ03ZN3|c0b0xq03zn3|deliver|chat\.post|sendMessage"
(no output)

Recurring stuck-session diagnostics earlier in the day on the same thread (different sessions / earlier turns) suggest this is not a one-off:

[diagnostic] stuck session: sessionId=podavach-shopify
sessionKey=agent:podavach-shopify:slack:channel:c0b0xq03zn3:thread:1777885573.897639
state=processing age=136s queueDepth=1
reason=queued_work_without_active_run classification=stale_session_state

Each of these is followed by [diagnostic] stuck session recovery skipped: reason=active_reply_work action=keep_lane. The gateway sees that work is in progress and stays out of the way, which is the right thing to do. But when the work finishes with no tool call, nothing fills the gap.

What it is NOT

Expected behavior

When a group-chat turn closes under messages.groupChat.visibleReplies = "message_tool" with zero mcp__openclaw__message tool calls AND a non-empty final text content block, the runtime should do at least one of:

(a) Auto-promote on turn-end. Detect this turn shape and dispatch the final assistant text block to the originating chat using the channel's natural delivery path (same path DMs use). Preserves operator intent (model is still encouraged to call the tool) without dropping the reply.

(b) Hard diagnostic. Emit a gateway-log line at warn or error level:

[stream] turn closed with 0 message-tool emissions in group chat;
visibleReplies=message_tool suppressed N text chunks
sessionId=<id> sessionKey=<key> requestId=<req>

This lets operators grep for the failure rather than discovering it days later in a user complaint. Currently the failure is completely silent in the gateway log; we had to compare mcp__openclaw__message tool_use entries against assistant text entries in the per-session claude-cli JSONL to find it.

(c) Per-channel/runtime hint surfaced to the agent. For sessions destined to group-chat threads under visibleReplies = "message_tool", attach a system-prompt hint like requiredDeliverySurface: "tool:mcp__openclaw__message" so the agent does not have to relearn the contract on every turn.

Today the runtime appears to do none of these.
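
To make option (b) concrete, here is a minimal sketch of the turn-end check that would produce the diagnostic line. Nothing here is taken from the OpenClaw codebase: ClosedTurn and suppressed_reply_diagnostic are hypothetical names, and the fields are assumptions about what the runtime knows when a turn closes. The same predicate could gate option (a).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClosedTurn:
    """Hypothetical summary of a finished assistant turn."""
    session_id: str
    session_key: str
    request_id: str
    is_group_chat: bool
    visible_replies: str        # value of messages.groupChat.visibleReplies
    message_tool_calls: int     # count of mcp__openclaw__message tool_use blocks
    text_chunks: Sequence[str]  # assistant text blocks emitted in the turn


def suppressed_reply_diagnostic(turn: ClosedTurn) -> Optional[str]:
    """Return a log line if the turn's text was suppressed, else None."""
    if not turn.is_group_chat or turn.visible_replies != "message_tool":
        return None
    if turn.message_tool_calls > 0:
        return None  # the agent delivered through the tool; nothing was lost
    suppressed = [chunk for chunk in turn.text_chunks if chunk.strip()]
    if not suppressed:
        return None  # empty turn, nothing to report
    return (
        "[stream] turn closed with 0 message-tool emissions in group chat; "
        f"visibleReplies={turn.visible_replies} "
        f"suppressed {len(suppressed)} text chunks "
        f"sessionId={turn.session_id} sessionKey={turn.session_key} "
        f"requestId={turn.request_id}"
    )
```

Returning the line instead of logging it directly keeps the check side-effect free, so the caller can choose the level (warn or error) and the logger.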

Suggested fix direction

In order of operator-utility-per-engineering-cost:

  1. Hard diagnostic (cheapest, highest debugging-value). One log line whenever a group-chat turn closes with zero mcp__openclaw__message tool calls under visibleReplies = "message_tool". No behavior change, just observability. Solves the "silently lost reply" UX class even before any auto-promote work.
  2. Auto-promote-last-text on turn-end for group chats under visibleReplies = "message_tool". Mirrors the suggested fix in #76828 (auto-promote-last-commentary on final_only). Preserves the explicit-tool ergonomic for normal flows; covers the long tail of [thinking,text]-only meta turns.
  3. Per-account opt-in fallback mode visibleReplies = "message_tool_or_last_text" so operators can opt into the safety net without giving up the explicit-tool default fleet-wide.
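
Items 2 and 3 could combine roughly as follows. This is a sketch under stated assumptions, not a proposal for exact names: the per-account visibleReplies key under channels.slack.accounts.<account> is hypothetical, "message_tool_or_last_text" is the mode proposed above and does not exist today, and the "message_tool" fallback is assumed from what doctor --fix applies.

```python
from typing import Any, Mapping, Optional, Sequence

FALLBACK_MODE = "message_tool_or_last_text"  # proposed value, not an existing one


def _dig(config: Mapping[str, Any], *keys: str) -> Any:
    """Walk nested mappings, returning None when any level is missing."""
    node: Any = config
    for key in keys:
        if not isinstance(node, Mapping):
            return None
        node = node.get(key)
    return node


def effective_visible_replies(
    config: Mapping[str, Any], account: str, default: str = "message_tool"
) -> str:
    """Per-account override (hypothetical key) wins over the global setting."""
    override = _dig(config, "channels", "slack", "accounts", account, "visibleReplies")
    if isinstance(override, str) and override:
        return override
    global_mode = _dig(config, "messages", "groupChat", "visibleReplies")
    return global_mode if isinstance(global_mode, str) and global_mode else default


def text_to_promote(
    mode: str, message_tool_calls: int, text_chunks: Sequence[str]
) -> Optional[str]:
    """Return the last non-empty text chunk when the fallback should fire."""
    if mode != FALLBACK_MODE or message_tool_calls > 0:
        return None
    for chunk in reversed(text_chunks):
        if chunk.strip():
            return chunk
    return None
```

Under this shape the fleet-wide default stays "message_tool", and only accounts that set the override would have their last text block promoted when the tool is never called.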

Related issues

Notes / scope

  • Reproduced on a real workload, not a synthetic test. Single agent across a Slack channel migration cohort. Not yet swept across the other 6 agents on the same host but the schema migration applied identically across all 7.
  • Reproduces on [thinking,text]-only turn shapes; cannot trivially force the model into this shape, so a deterministic minimal repro is not yet attached. Happy to capture more transcripts if useful.
  • Workspaces use claude-cli SDK sessions; OpenAI-Codex and native-Anthropic providers were not tested for this surface on 2026.5.3.
