[Bug] Telegram gateway drops in-flight messages on sendChatAction network failure during hot reload #71429

@sync-tachikoma

Summary

During gateway hot reload, sendChatAction calls fail with network errors and any in-flight outbound message that has not yet been delivered to Telegram is silently dropped. The agent appears to have responded (text exists in internal INFO stream and is visible in webchat UI), but the user on Telegram never receives the message.

Environment

  • OpenClaw runtime: 2026.4.20
  • Channel: telegram (long-poll bot mode)
  • Trigger: gateway hot reload during a live agent turn
  • Affected agent: agent:kusanagi:telegram:direct:<redacted> (Anthropic Claude Opus 4.7)

Repro

  1. Start a long-poll Telegram bot via the OpenClaw gateway
  2. Captain sends a message that triggers a multi-tool agent turn (~10s+)
  3. Trigger gateway hot reload while the agent is mid-response
  4. Observe sendChatAction retry/failure log entries (≥10; in our case, 12 consecutive failures at 2026-04-24 21:01:25 KST)
  5. Outcome: the agent's final text appears in webchat UI (internal INFO stream) but never arrives at the Telegram user

Expected

Either (a) in-flight outbound messages are queued to a durable buffer and re-sent post-reload, or (b) the turn is aborted and a clear failure is surfaced to both the UI and the agent itself so it can retry.
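
To make option (b) concrete, here is a minimal sketch of what "retry exhaustion becomes a hard turn failure" could look like. It is not OpenClaw code: `TurnDeliveryError`, `TurnNotifier`, and `call_with_retries` are hypothetical names, and network failures are assumed to surface as `OSError` subclasses. Backoff between attempts is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class TurnDeliveryError(Exception):
    """Hypothetical error raised when an outbound call exhausts its retries."""


@dataclass
class TurnNotifier:
    # Stand-in for the two ends that should learn about the failure:
    # the webchat UI and the agent that produced the turn.
    ui_notices: List[str] = field(default_factory=list)
    agent_notices: List[str] = field(default_factory=list)

    def fail(self, reason: str) -> None:
        self.ui_notices.append(reason)
        self.agent_notices.append(reason)


def call_with_retries(
    send: Callable[[], None], max_attempts: int, notifier: TurnNotifier
) -> None:
    """Try `send` up to `max_attempts` times; never fail silently."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            send()
            return  # delivered, nothing to report
        except OSError as exc:  # assumption: network errors are OSError subclasses
            last_error = exc
    # Retries exhausted: notify both ends, then fail the turn explicitly.
    reason = f"delivery failed after {max_attempts} attempts: {last_error}"
    notifier.fail(reason)
    raise TurnDeliveryError(reason)
```

The point of the sketch is only that exhaustion ends in an exception plus a notice to both sides, so the agent can decide to retry instead of assuming the message arrived.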

Actual

Silent drop. The internal stream and external delivery diverge: the UI shows the response, but the Telegram user never receives it. The user experiences a "ghost" turn, where the agent did the work but the answer never arrived. Captain has to manually copy the response from the UI back to Telegram or restart the conversation.

Impact

  • User-facing reliability: silent message loss is the worst possible failure mode
  • Cross-channel inconsistency: webchat UI and Telegram diverge
  • Recovery cost: human (Captain) becomes the manual relay

Suggested fix direction

  • Persist outbound queue to disk (small append-only journal per channel) — survives reload
  • On reload, drain queue before resuming new turns
  • Treat sendChatAction retry exhaustion as a hard turn failure with explicit notification to both ends
  • Optional: expose a gateway.outbound.queue.depth metric so operators can detect backlog
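
The journal, drain, and depth ideas above can be sketched roughly as follows. This is an illustration under assumptions, not the gateway's actual design: `OutboundJournal`, its record format (JSON lines with `enqueue`/`ack` ops), and the `send(chat_id, text)` callback are all hypothetical.

```python
import json
import os
from typing import Callable, Dict, List


class OutboundJournal:
    """Hypothetical append-only journal of outbound messages for one channel."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _append(self, record: Dict) -> None:
        # Append one JSON line and force it to disk so it survives a reload.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def enqueue(self, msg_id: str, chat_id: str, text: str) -> None:
        self._append({"op": "enqueue", "id": msg_id, "chat_id": chat_id, "text": text})

    def ack(self, msg_id: str) -> None:
        # Record that delivery was confirmed; earlier lines are never rewritten.
        self._append({"op": "ack", "id": msg_id})

    def pending(self) -> List[Dict]:
        # Replay the journal: enqueued messages minus acknowledged ones, in order.
        if not os.path.exists(self.path):
            return []
        queued: Dict[str, Dict] = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                rec = json.loads(line)
                if rec["op"] == "enqueue":
                    queued[rec["id"]] = rec
                elif rec["op"] == "ack":
                    queued.pop(rec["id"], None)
        return list(queued.values())

    def depth(self) -> int:
        # Value an operator-facing queue-depth metric could report.
        return len(self.pending())

    def drain(self, send: Callable[[str, str], None]) -> int:
        # Called after reload, before new turns resume. If `send` raises,
        # the message stays pending and is retried on the next drain.
        delivered = 0
        for rec in self.pending():
            send(rec["chat_id"], rec["text"])
            self.ack(rec["id"])
            delivered += 1
        return delivered
```

Note that this gives at-least-once delivery: a crash between `send` and `ack` would resend the message on the next drain, which seems preferable to a silent drop. A real implementation would also need journal compaction, which is left out here.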

Cross-references

  • NOW-003 (a) — auto-compaction issue (related context-loss family)
  • YET-004 — /halt gateway interception feature request (same gateway boundary)
  • YET-003 — plugin-level security enforcement (related gateway/plugin work)

Notes

This is reported as part of OpenClaw multi-agent operating model hardening (Captain JS — Kusanagi/Tachikoma/Togusa/Bato 4-agent setup). Happy to share full log excerpt and run-time config on request (will redact tokens).
