
Gateway restart can silently abort an in-flight Discord turn, with no automatic recovery message to the user #69249

@slideshow-dingo

Summary

When the OpenClaw gateway restarts during an active Discord-backed turn, the in-flight reply can abort without any visible recovery message being posted to the channel. From the user's perspective, the assistant appears to ghost them until they manually ping again.

Environment

  • OpenClaw version: OpenClaw 2026.4.15 (041266a)
  • Surface: Discord
  • Session type: main Discord session (agent:main:discord:*)
  • Model in affected run: openai-codex/gpt-5.4
  • Host: Linux

Expected behavior

If a gateway restart interrupts an in-flight Discord turn, OpenClaw should do one of these automatically:

  1. resume the interrupted turn safely, or
  2. post a user-visible recovery message into the original channel, for example:
    • "Sorry, that turn was interrupted by an internal restart. I’m back now. Reply with continue if you want me to resume."

The user should not have to manually ping the bot just to learn that the previous turn died.

Actual behavior

  • the active turn aborts mid-flight
  • the gateway comes back healthy
  • no visible recovery message is posted into the affected Discord channel
  • the conversation appears stuck until the user sends another message

Reproduction outline

I do not have a minimal, isolated reproduction yet, but the behavior appears reproducible with this sequence:

  1. start a reply in a Discord-backed main session
  2. while the reply is still running, restart the gateway
  3. observe that the active turn aborts
  4. observe that the gateway later returns healthy
  5. observe that the original Discord channel does not receive a recovery/apology/resume message

Evidence from the incident

In the affected session transcript, the interrupted run recorded a prompt error with an aborted operation, and the gateway restart timestamp matched the abort window.

Representative evidence from the local incident:

  • session transcript recorded openclaw:prompt-error
  • error text: This operation was aborted
  • user-visible symptom: silence in Discord until the user pinged again

I am intentionally not pasting private local paths or full local logs here, but I can provide redacted excerpts if maintainers want them.

Why this matters

A gateway restart is operationally recoverable, but the current behavior makes it look like the assistant silently stopped responding. That turns an internal restart into a user-facing ghosting incident.

Suspected area

This looks like a recovery gap after an interrupted in-flight turn, not just a Discord transport issue.

The failure mode seems to be:

  • active turn is interrupted by gateway restart
  • runtime records the turn as aborted
  • gateway comes back up
  • no mechanism posts a recovery message into the original chat session
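
To illustrate the gap, here is a minimal sketch of one way a runtime could notice such turns after a restart: record each turn before it starts, remove the record when it completes, and treat whatever is left at startup as interrupted. This is not OpenClaw's actual code. `InFlightJournal`, its methods, the JSON file layout, and the example session key and channel ids are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


class InFlightJournal:
    """Hypothetical on-disk record of turns that have started but not finished."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, entries: dict) -> None:
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written journal behind.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(entries, fh)
        os.replace(tmp, self.path)

    def begin(self, turn_id: str, session_key: str, channel_id: str) -> None:
        """Record a turn before any work on it starts."""
        entries = self._load()
        entries[turn_id] = {"session_key": session_key, "channel_id": channel_id}
        self._save(entries)

    def finish(self, turn_id: str) -> None:
        """Clear the record once the reply has been delivered."""
        entries = self._load()
        entries.pop(turn_id, None)
        self._save(entries)

    def interrupted(self) -> dict:
        """Entries still present at startup belong to turns that never finished."""
        return self._load()
```

With something like this in place, a restarted process could call `interrupted()` once at startup and know which channels are owed a follow-up, instead of only having an abort recorded in the transcript.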

Suggested fixes

One or more of these would solve the user-facing problem:

  1. persist enough metadata about in-flight user-visible turns to detect interrupted replies across restart
  2. on startup, detect recently aborted turns that have no visible assistant follow-up
  3. automatically post a recovery message into the original channel/session
  4. optionally expose interrupted turns as a first-class recoverable state instead of just surfacing an internal abort
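
Fixes 2 and 3 could be combined roughly as in the sketch below. It assumes the runtime can produce a list of recent turn records on startup. `TurnRecord`, its fields, the status strings, the 15-minute window, and the `send` callback are all assumptions made for illustration and do not reflect OpenClaw's real data model or Discord API. The recovery text is the example wording from the Expected behavior section.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set

RECOVERY_TEXT = (
    "Sorry, that turn was interrupted by an internal restart. I’m back now. "
    "Reply with continue if you want me to resume."
)


@dataclass(frozen=True)
class TurnRecord:
    """Hypothetical persisted summary of one turn."""
    turn_id: str
    channel_id: str
    status: str              # assumed values: "completed" or "aborted"
    ended_at: float          # Unix seconds
    has_visible_reply: bool  # True if the user saw any assistant follow-up


def find_orphaned_turns(
    turns: Iterable[TurnRecord],
    now: float,
    already_notified: Set[str],
    max_age_s: float = 900.0,
) -> List[TurnRecord]:
    """Return recently aborted turns that never produced a visible reply."""
    return [
        t for t in turns
        if t.status == "aborted"
        and not t.has_visible_reply
        and now - t.ended_at <= max_age_s
        and t.turn_id not in already_notified
    ]


def recover_on_startup(
    turns: Iterable[TurnRecord],
    send: Callable[[str, str], None],
    notified: Set[str],
    now: float,
) -> List[TurnRecord]:
    """Post one recovery message per affected channel and mark turns as handled."""
    orphans = find_orphaned_turns(turns, now, notified)
    seen_channels: Set[str] = set()
    for turn in orphans:
        if turn.channel_id not in seen_channels:
            send(turn.channel_id, RECOVERY_TEXT)
            seen_channels.add(turn.channel_id)
        # Remember the turn so a second restart does not notify again.
        notified.add(turn.turn_id)
    return orphans
```

The `notified` set would need to be persisted for the deduplication to survive a second restart. The age limit is there so that a long outage does not result in apologies for conversations the user has already abandoned.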

Additional note

I added a local workaround that scans for aborted turns with no visible follow-up and posts a recovery message. That mitigates the symptom locally, but it feels like the platform should handle this natively.

Metadata

Assignees

No one assigned

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
