Skip to content

openclaw update run mid-turn causes total message loss on Telegram (and likely Discord) #71178

@HeilbronAILabs

Description

@HeilbronAILabs

openclaw update run mid-turn causes total message loss on Telegram (and likely Discord)

Summary

When openclaw update is executed while an agent is mid-turn in a messaging channel (Telegram in this report; likely same for Discord), every assistant message generated during that turn is lost and never reaches the user. There is no fallback notification, no post-restart bridge message, and no surfacing of the orphaned turn after the gateway comes back up.

This is particularly pernicious because the agent itself can trigger the update in response to a user request — self-immolating its own response pipeline.

Environment

  • OpenClaw: 2026.4.21 → 2026.4.22 (upgrade observed)
  • macOS 26.4.1, node 25.9.0, pnpm install
  • Gateway under LaunchAgent ai.openclaw.gateway
  • Channel: Telegram, direct chat
  • Agent: claude-cli backend, model claude-opus-4-7
  • Date: 2026-04-24

Timeline (real incident)

Time (EDT) Event
09:17:52 User message arrives, dispatched: cli exec: provider=claude-cli model=opus promptChars=568
09:18:18 Agent generates assistant text #1 (locally buffered, not flushed)
09:18:52 Agent generates assistant text #2 (locally buffered, not flushed)
09:18:53 Agent begins openclaw update tool chain (status → dry-run → backup → update --yes)
09:19:13 Agent generates assistant text #3 (locally buffered, not flushed)
09:19:51 typing TTL reached (2m); stopping typing indicator
09:20:25 gateway signal SIGTERM received (from the in-progress update)
09:20:27 Gateway restarts on 2026.4.22
09:28:59 User sends follow-up "are you there?" — cli session reset: reason=system-prompt fires because the new version has a different system prompt; prior transcript orphaned

Outbound Telegram messages to the user between 09:17 and 09:28: zero. Confirmed by gateway log — first [telegram] sendMessage ok chat=... after the incident is in the new post-restart session.

User experience: message sent → typing indicator → typing disappears after ~2m → 9+ minutes of silence → user asks "are you there?".

Root causes (three distinct issues)

1. claude-cli buffers all assistant text until turn end

In claude-cli live-session mode, assistant text generated during a turn is not flushed to the channel until the turn fully completes. When the turn is killed mid-tool-use (here, by the gateway SIGTERM during the update), all buffered assistant text is lost — even though it was already generated and stored in the transcript.

Three complete assistant messages existed in the session transcript but were never delivered.

2. Gateway SIGTERM during agent turn has no graceful-drain

The update flow issues SIGTERM to the gateway as part of "Restart gateway service and run doctor checks". There is no detection of in-flight agent turns, no drain/wait, and no fallback user-facing notification ("your message was received but processing was interrupted by an upgrade").

3. System-prompt-change session reset orphans prior turn with no bridge

When the agent restarts on a new version whose system prompt differs, the affected session hits cli session reset: reason=system-prompt. The prior session's user message and any partial/buffered response are orphaned silently. No bridge message is sent on first post-reset activity ("your previous message was interrupted; please resend or continue").

Reproduction

  1. Send a message to a Telegram/Discord channel bound to an agent.
  2. Within the agent's response, have it run openclaw update --yes (or do it externally while the agent is mid-turn).
  3. Observe: gateway restarts, agent's buffered assistant text is never sent to the channel, user sees typing-indicator-then-silence.
  4. After restart, send a new message. Observe session reset with reason=system-prompt and no bridge to the prior turn.

Suggested fixes (in priority order)

  1. Flush assistant text in claude-cli as it's generated, not at turn end. At minimum, any assistant-message chunks preceding a tool call should be flushed to the channel before the tool executes. This alone would eliminate 90% of the user-visible impact.
  2. Graceful drain on gateway SIGTERM. Before tearing down the gateway for an update, detect active agent turns and either (a) wait up to N seconds for completion, or (b) emit a fallback "your request was interrupted by an upgrade — please retry" to the source channel.
  3. Post-reset bridge message. When a session is reset with reason=system-prompt and there is an orphaned unanswered user message in the prior transcript, emit a bridge message on first activity: "I was interrupted by an upgrade; your previous message was: ... please resend if needed."
  4. Agent-guidance warning. When an agent issues openclaw update and the source channel is the gateway's own messaging channel, either refuse (requires external execution) or warn explicitly.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions