Slack long-running runs can appear silent when media delivery partially fails and recovery refuses replay #83165
Status: Open
Labels:
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
- clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.

Bug report draft: Slack long-running runs can appear silent after partial media delivery failure
Title
Slack long-running runs can appear silent when media delivery partially fails and recovery refuses replay
Version / environment
- … f066dd2)
- gpt-5.5
- `agents.defaults.contextInjection = "continuation-skip"`
- `agents.defaults.compaction.mode = "safeguard"`
- `agents.defaults.compaction.reserveTokensFloor = 40000`
- `agents.defaults.compaction.timeoutSeconds = 240`
- `agents.defaults.timeoutSeconds = 900`

Summary
Some long-running Slack sessions can look as if they silently stopped responding. The underlying work may have completed or partially completed, but the final Slack delivery, when it includes media, can end up in a `send_attempt_started` / `partial delivery failure (bestEffort)` state. After a gateway restart, delivery recovery refuses blind replay for safety, so no user-facing failure or recovery message ever appears in the Slack thread.
Separately, status/progress visibility can disappear during these failures, which makes it hard for the user to tell whether the run is still active, was aborted, or only failed during delivery.
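To make the failure path concrete, here is a minimal TypeScript sketch of a recovery pass that behaves the way the logs describe. All type and function names (`PendingDelivery`, `recoverPending`, the `replay` callback, and the `queued` state) are hypothetical stand-ins, not OpenClaw's actual code; only the `send_attempt_started` state and the `lastError` text come from the logs. It shows how three entries stuck in `send_attempt_started` yield the logged "0 recovered, 3 failed" result with nothing posted to the thread.

```typescript
// Hypothetical entry shape. "send_attempt_started" comes from the logs;
// "queued" is an assumed state for entries that never started sending.
type RecoveryState = "queued" | "send_attempt_started";

interface PendingDelivery {
  id: string;
  recoveryState: RecoveryState;
  lastError?: string;
}

interface RecoverySummary {
  recovered: number;
  failed: number;
}

// An entry whose send may already have reached Slack is never replayed
// blindly: it is counted as failed and nothing is posted to the thread.
function recoverPending(
  entries: PendingDelivery[],
  replay: (entry: PendingDelivery) => boolean,
): RecoverySummary {
  const summary: RecoverySummary = { recovered: 0, failed: 0 };
  for (const entry of entries) {
    if (entry.recoveryState === "send_attempt_started") {
      // Refuse blind replay without adapter reconciliation.
      summary.failed += 1;
      continue;
    }
    if (replay(entry)) {
      summary.recovered += 1;
    } else {
      summary.failed += 1;
    }
  }
  return summary;
}

// Three entries in the state seen in the logs.
const pending: PendingDelivery[] = ["a", "b", "c"].map(
  (id): PendingDelivery => ({
    id,
    recoveryState: "send_attempt_started",
    lastError: "partial delivery failure (bestEffort)",
  }),
);

console.log(recoverPending(pending, () => true)); // → { recovered: 0, failed: 3 }
```

Refusing replay is a reasonable default, since the first attempt may already have posted part of the media; the gap is that the `failed` branch ends without any user-visible signal.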
Observed evidence
During one evening of usage, logs showed multiple reliability/observability symptoms:
- `recoveryState: "send_attempt_started"`
- `lastError: "partial delivery failure (bestEffort)"`
- `Found 3 pending delivery entries — starting recovery`
- `delivery state is send_attempt_started; refusing blind replay without adapter reconciliation`
- `Delivery recovery complete: 0 recovered, 3 failed, 0 skipped (max retries), 0 deferred (backoff)`
- `draining 2 active task(s) and 1 active embedded run(s) before restart with timeout 300000ms`
- `[responses] ... message=Request was aborted`
- `fetch timeout reached; aborting operation`
- `internal_server_error` / HTTP 502 from image/model providers
- `[timeout-compaction] compaction did not reduce context ... falling through to normal handling`
- `[pi] discarded invalid tool result middleware output for message`
- `Tool output unavailable due to post-processing error`
- `long-running session ... queued_behind_active_work ... activeWorkKind=model_call ... recovery=none`

Actual behavior
From the Slack user perspective:
Expected behavior
OpenClaw should make these failure modes visible and recoverable without risking duplicate spam:
- … `send_attempt_started` and cannot be safely replayed, post or queue a small fallback notice such as:

Why this matters
The current behavior makes completed or partially completed runs indistinguishable from hung runs. This is especially painful for long-running image/video workflows where the final payload often includes media and the user depends on Slack progress/status to know whether to wait, retry, or inspect logs.
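The fallback notice requested under Expected behavior could look roughly like the TypeScript sketch below. It assumes a hypothetical `post` callback for Slack, a hypothetical per-entry dedupe key, and an in-memory `Set` standing in for durable storage; the notice wording is placeholder text, not the wording from the original request. What it illustrates is that the notice never resends the payload, and the dedupe key keeps a second recovery pass (for example after another restart) from posting it again.

```typescript
// Hypothetical record for a delivery that recovery refused to replay.
interface StuckDelivery {
  id: string;
  threadId: string;
  recoveryState: "send_attempt_started";
}

// Placeholder wording; the real text would be a product decision.
const FALLBACK_NOTICE =
  "The final reply for this run may not have been fully delivered. " +
  "It was not resent automatically to avoid duplicates.";

// Posts at most one fallback notice per stuck delivery and returns how many
// notices this pass posted. `sentNotices` stands in for a durable store.
function notifyStuckDeliveries(
  entries: StuckDelivery[],
  sentNotices: Set<string>,
  post: (threadId: string, text: string) => void,
): number {
  let posted = 0;
  for (const entry of entries) {
    const key = `fallback:${entry.id}`;
    if (sentNotices.has(key)) continue; // user was already told once
    post(entry.threadId, FALLBACK_NOTICE);
    sentNotices.add(key); // recorded only after the post call returns
    posted += 1;
  }
  return posted;
}

// Usage: two recovery passes (e.g. two restarts) over the same stuck entry.
const sent = new Set<string>();
const thread: string[] = [];
const stuck: StuckDelivery[] = [
  { id: "d1", threadId: "t1", recoveryState: "send_attempt_started" },
];
const record = (threadId: string, text: string): void => {
  thread.push(`${threadId}: ${text}`);
};
const firstPass = notifyStuckDeliveries(stuck, sent, record);
const secondPass = notifyStuckDeliveries(stuck, sent, record);
console.log(firstPass, secondPass, thread.length); // 1 0 1
```

Because the key is recorded only after `post` returns, a post that throws leaves the entry eligible for a later pass; the cost is that a crash between posting and recording could still produce one duplicate notice, which is far smaller than replaying the media.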
Privacy note
This report intentionally omits local paths, channel IDs, user IDs, tokens, and media names. Full local logs can be provided privately if needed.