
sessions_send announce retry blocks agent session for ~6 minutes on channel errors #53204

@claudiobedino

Problem

When an agent uses sessions_send with timeoutSeconds > 0 to query another agent, and the announce step fails (e.g. after a gateway restart when channels are temporarily unavailable), the announce retry loop blocks the entire agent session for several minutes.

Observed behavior

  1. Agent A calls sessions_send(sessionKey, message, timeoutSeconds: 60) targeting Agent B
  2. Agent B responds successfully
  3. OpenClaw attempts the "announce" step (posting the reply to Agent A's chat channel)
  4. Announce fails with "Unknown channel: telegram" (channel not yet re-registered after gateway restart)
  5. The gateway makes up to 4 attempts with exponential backoff between them (5s → 10s → 20s delays), and each attempt waits up to 90s before timing out
  6. During this entire retry window (~6 minutes), Agent A's session is blocked — no inbound messages are processed
  7. From the user's perspective, the agent appears dead/unresponsive
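
The blocking pattern in steps 5 to 7 can be modelled in a few lines. This is a minimal sketch assuming every attempt runs to the full 90s timeout; `RetryPolicy`, `blockedWindowMs` and `inboundWaitMs` are hypothetical names, not OpenClaw internals, and the numbers are taken from the logs below.

```typescript
// Hypothetical model of the announce retry behaviour described above.
// Names are illustrative, not OpenClaw's; the numbers come from the issue's logs.

interface RetryPolicy {
  maxAttempts: number;      // total attempts, including the first
  attemptTimeoutMs: number; // per-attempt gateway timeout
  backoffMs: number[];      // delay before attempt 2, 3, 4, ...
}

const OBSERVED_POLICY: RetryPolicy = {
  maxAttempts: 4,
  attemptTimeoutMs: 90_000,
  backoffMs: [5_000, 10_000, 20_000],
};

// Worst case: every attempt hits the full timeout, so the announce holds the
// session for the sum of all timeouts plus all backoff delays.
function blockedWindowMs(policy: RetryPolicy): number {
  let elapsed = 0;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    if (attempt > 1) elapsed += policy.backoffMs[attempt - 2] ?? 0;
    elapsed += policy.attemptTimeoutMs;
  }
  return elapsed;
}

// Assuming the session handles its inbox serially, a user message that arrives
// while the announce is still retrying waits for the rest of the window.
function inboundWaitMs(policy: RetryPolicy, arrivalMs: number): number {
  return Math.max(0, blockedWindowMs(policy) - arrivalMs);
}

console.log(`session blocked for ${blockedWindowMs(OBSERVED_POLICY) / 1000}s`);
// prints "session blocked for 395s"
console.log(`message sent 60s in waits ${inboundWaitMs(OBSERVED_POLICY, 60_000) / 1000}s`);
// prints "message sent 60s in waits 335s"
```

Plugging a 15s per-attempt timeout into the same model (suggestion 2 below) shrinks the worst case from 395s to 95s.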

Log evidence

22:36:29 [warn] Subagent announce completion direct announce agent call transient failure, retrying 2/4 in 5s: gateway timeout after 90000ms
22:38:04 [warn] Subagent announce completion direct announce agent call transient failure, retrying 3/4 in 10s: gateway timeout after 90000ms
22:39:44 [warn] Subagent announce completion direct announce agent call transient failure, retrying 4/4 in 20s: gateway timeout after 90000ms
22:41:23 [ws] ⇄ res ✗ agent errorCode=UNAVAILABLE errorMessage=Error: Unknown channel: telegram
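
The timestamps line up with that schedule. Reading each warning as the moment the previous attempt failed, and assuming the final [ws] line is the result of attempt 4 (attempt 1's start is not logged and is inferred from its 90s timeout):

```latex
\begin{aligned}
\text{attempt 1: } & \approx \text{22:34:59} \to \text{22:36:29} && (90\,\text{s timeout})\\
\text{attempt 2: } & \text{22:36:29} + 5\,\text{s} = \text{22:36:34} \to \text{22:38:04} && (90\,\text{s timeout})\\
\text{attempt 3: } & \text{22:38:04} + 10\,\text{s} = \text{22:38:14} \to \text{22:39:44} && (90\,\text{s timeout})\\
\text{attempt 4: } & \text{22:39:44} + 20\,\text{s} = \text{22:40:04} \to \text{22:41:23} && (\text{UNAVAILABLE after } 79\,\text{s})\\
\text{total: } & \text{22:41:23} - \text{22:34:59} \approx 384\,\text{s} \approx 6.4\,\text{min}
\end{aligned}
```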

Impact

  • Agent becomes completely unresponsive for ~6 minutes
  • User messages during this window are either dropped or queued silently
  • In a multi-agent setup with frequent config changes (22 gateway restarts in one day via config.patch), this happens regularly

Environment

  • OpenClaw 2026.3.22 (4dcc39c)
  • Multi-agent setup (5 agents, Telegram channel with 4 bot accounts)
  • Gateway restarts triggered by config.patch from Control UI

Suggested improvements

  1. Non-blocking announce: Run the announce step asynchronously so it does not block the agent session from processing new inbound messages
  2. Shorter announce timeout: The 90s gateway timeout per retry attempt is very long; a configurable timeout (e.g. 10-15s) would limit the blast radius
  3. Circuit breaker: If the channel is known-unavailable (e.g. post-restart), skip or defer the announce rather than retrying against a dead channel
  4. Agent-side control: Allow agents to opt into ANNOUNCE_SKIP behavior via sessions_send parameters (e.g. announce: false) rather than requiring the target agent to reply with the magic string
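
Suggestions 1 to 4 could fit together roughly as below. This is a minimal sketch, not OpenClaw code: `Announcer`, `ChannelCircuit`, `withTimeout`, `AnnounceFn` and the `announce` flag on `AnnounceJob` are hypothetical names, and `send` stands in for whatever actually posts to a channel. Failed or skipped-for-now jobs land in `deferred`, which a real implementation would presumably flush once the channel re-registers.

```typescript
// Hypothetical sketch of suggestions 1 to 4; none of these names are OpenClaw APIs.

type AnnounceFn = (channel: string, text: string) => Promise<void>;

interface AnnounceJob {
  channel: string;
  text: string;
  announce?: boolean; // suggestion 4: caller can opt out (hypothetical flag)
}

// Suggestion 2: bound each attempt with a short, configurable timeout.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`announce timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Suggestion 3: once a channel fails, skip it for a cooldown instead of
// retrying against a channel that is known to be down.
class ChannelCircuit {
  private openUntil = new Map<string, number>();
  constructor(private readonly cooldownMs: number, private readonly now: () => number = Date.now) {}

  isOpen(channel: string): boolean {
    return (this.openUntil.get(channel) ?? 0) > this.now();
  }
  trip(channel: string): void {
    this.openUntil.set(channel, this.now() + this.cooldownMs);
  }
  reset(channel: string): void {
    this.openUntil.delete(channel);
  }
}

// Suggestion 1: the session hands the job off and returns immediately;
// delivery (and any failure) happens in the background.
class Announcer {
  readonly deferred: AnnounceJob[] = [];

  constructor(
    private readonly send: AnnounceFn,
    private readonly circuit: ChannelCircuit,
    private readonly timeoutMs = 10_000,
  ) {}

  schedule(job: AnnounceJob): void {
    if (job.announce === false) return; // opted out: nothing to post
    void this.deliver(job); // deliberately not awaited
  }

  private async deliver(job: AnnounceJob): Promise<void> {
    if (this.circuit.isOpen(job.channel)) {
      this.deferred.push(job); // channel known down: defer, don't retry now
      return;
    }
    try {
      await withTimeout(this.send(job.channel, job.text), this.timeoutMs);
      this.circuit.reset(job.channel);
    } catch {
      this.circuit.trip(job.channel); // open the circuit for this channel
      this.deferred.push(job);
    }
  }
}
```

The key design choice is that `schedule` never awaits delivery, so a dead channel costs the session nothing; the circuit breaker then keeps repeated timeouts from piling up in the background while the channel is down.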

Current workaround

Using timeoutSeconds: 0 (fire-and-forget) + sessions_history to read the response avoids the announce step entirely, but loses the synchronous request-reply convenience.
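
The workaround can be wrapped in a small helper. It is a sketch under assumptions: `SessionTools`, `sessionsSend`, `sessionsHistory`, `HistoryEntry` and `askWithoutAnnounce` are hypothetical TypeScript stand-ins for the `sessions_send` and `sessions_history` tools, and it assumes history entries carry a timestamp from the same clock as the caller.

```typescript
// Hypothetical shapes; the real sessions_send / sessions_history tools may differ.
interface HistoryEntry {
  role: "user" | "assistant";
  text: string;
  ts: number; // assumed: milliseconds on the same clock as the caller
}

interface SessionTools {
  sessionsSend(sessionKey: string, message: string, timeoutSeconds: number): Promise<void>;
  sessionsHistory(sessionKey: string): Promise<HistoryEntry[]>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Send with timeoutSeconds: 0 (no announce step), then poll the target's
// history for the first assistant reply newer than the send.
async function askWithoutAnnounce(
  tools: SessionTools,
  sessionKey: string,
  message: string,
  { pollMs = 2_000, deadlineMs = 60_000, now = Date.now } = {},
): Promise<string | undefined> {
  const sentAt = now();
  await tools.sessionsSend(sessionKey, message, 0); // fire-and-forget
  const deadline = sentAt + deadlineMs;
  while (now() < deadline) {
    const history = await tools.sessionsHistory(sessionKey);
    const reply = history.find((e) => e.role === "assistant" && e.ts >= sentAt);
    if (reply) return reply.text;
    await sleep(pollMs);
  }
  return undefined; // no reply in time; the caller decides whether to retry
}
```

Polling costs at most one poll interval of extra latency, and on a timeout the calling agent decides what to do instead of the announce retry loop holding its session.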
