Slack long-running runs can appear silent when media delivery partially fails and recovery refuses replay

# Bug report draft: Slack long-running runs can appear silent after partial media delivery failure

## Title
Slack long-running runs can appear silent when media delivery partially fails and recovery refuses replay

## Version / environment
- OpenClaw: 2026.5.12 (`f066dd2`)
- Runtime: macOS / LaunchAgent gateway
- Channel: Slack Socket Mode
- Model route: OpenAI Responses-compatible provider, `gpt-5.5`
- Context window observed: 200k
- Relevant config:
  - `agents.defaults.contextInjection = "continuation-skip"`
  - `agents.defaults.compaction.mode = "safeguard"`
  - `agents.defaults.compaction.reserveTokensFloor = 40000`
  - `agents.defaults.compaction.timeoutSeconds = 240`
  - `agents.defaults.timeoutSeconds = 900`
  - Slack status reactions enabled

## Summary
Some long-running Slack sessions can appear to stop responding without any notice. The underlying work may have completed or partially completed, but final Slack delivery involving media can enter a `send_attempt_started` / `partial delivery failure (bestEffort)` state. After a gateway restart, delivery recovery refuses blind replay for safety, and no user-facing failure or recovery message is posted to the Slack thread.

Separately, status/progress visibility can disappear during these failures, making it hard for the user to know whether the run is still active, aborted, or only failed during delivery.

## Observed evidence
During one evening of usage, logs showed multiple reliability/observability symptoms:

- 3 failed delivery queue entries from the same day with:
  - `recoveryState: "send_attempt_started"`
  - `lastError: "partial delivery failure (bestEffort)"`
  - payloads consisting of Slack text plus local media attachment(s)
- On gateway restart, recovery logged:
  - `Found 3 pending delivery entries — starting recovery`
  - `delivery state is send_attempt_started; refusing blind replay without adapter reconciliation`
  - `Delivery recovery complete: 0 recovered, 3 failed, 0 skipped (max retries), 0 deferred (backoff)`
- Gateway restart happened while work was active:
  - `draining 2 active task(s) and 1 active embedded run(s) before restart with timeout 300000ms`
- Additional logs around the same period included:
  - `[responses] ... message=Request was aborted`
  - `fetch timeout reached; aborting operation`
  - upstream `internal_server_error` / HTTP 502 from image/model providers
  - `[timeout-compaction] compaction did not reduce context ... falling through to normal handling`
  - `[pi] discarded invalid tool result middleware output for message`
  - `Tool output unavailable due to post-processing error`
  - `long-running session ... queued_behind_active_work ... activeWorkKind=model_call ... recovery=none`
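
For triage, the recovery summary line quoted above can be tallied across restarts with a small script. This is a minimal sketch that assumes the summary always has the exact shape of the one line shown in this report; the function names are hypothetical and not part of OpenClaw.

```python
import re
from typing import Dict, Iterable, Optional

# Pattern derived from the single summary line quoted in this report.
# Other OpenClaw versions may word it differently (assumption).
SUMMARY = re.compile(
    r"Delivery recovery complete: (\d+) recovered, (\d+) failed, "
    r"(\d+) skipped \(max retries\), (\d+) deferred \(backoff\)"
)


def parse_recovery_summary(line: str) -> Optional[Dict[str, int]]:
    """Return the four counters from a recovery summary line, or None."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    recovered, failed, skipped, deferred = map(int, match.groups())
    return {
        "recovered": recovered,
        "failed": failed,
        "skipped": skipped,
        "deferred": deferred,
    }


def count_failed(lines: Iterable[str]) -> int:
    """Sum the 'failed' counter over every summary line found in lines."""
    total = 0
    for line in lines:
        summary = parse_recovery_summary(line)
        if summary is not None:
            total += summary["failed"]
    return total
```

A non-zero result from `count_failed` over a gateway log would indicate deliveries that recovery gave up on and that the Slack user may never have seen.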

## Actual behavior
From the Slack user perspective:
- A long-running task may appear to stop responding.
- Progress/status indicators may no longer show useful state.
- If final delivery partially fails, the user may not see a final success, a final error, or a recovery notice.
- After restart, recovery refuses blind replay, which is understandable, but the user is not clearly informed that delivery was left unresolved.

## Expected behavior
OpenClaw should make these failure modes visible and recoverable without risking duplicate spam:

1. If Slack delivery enters `send_attempt_started` and cannot be safely replayed, post or queue a small fallback notice such as:
   - “A previous reply may have partially failed during Slack delivery. It was not replayed automatically to avoid duplicates. Run a recovery command or inspect the delivery queue.”
2. Provide adapter reconciliation for Slack deliveries where possible:
   - check whether a message/file was actually posted before refusing replay permanently;
   - if ambiguous, expose a clear manual recovery action.
3. Keep status/progress visible for long-running sessions even if the final delivery path fails.
4. Treat post-processing / invalid tool result middleware errors as observable diagnostic events, not silent status loss.
5. For media delivery, consider a safer two-phase pattern:
   - send text/status first;
   - upload media second;
   - if media upload fails, leave the text reply visible with a retry/recovery hint.
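
Items 1, 2, and 5 can be sketched together. This is an illustration only, not OpenClaw's implementation: `Entry`, `recover`, `deliver_two_phase`, and the callbacks `was_posted`, `replay`, `notify`, `send_text`, and `upload_media` are hypothetical names, and no real Slack API calls are made. Only the `send_attempt_started` value and the notice wording come from this report.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Wording adapted from item 1 above.
FALLBACK_NOTICE = (
    "A previous reply may have partially failed during Slack delivery. "
    "It was not replayed automatically to avoid duplicates."
)


@dataclass
class Entry:
    """Hypothetical queue entry; only recovery_state mirrors the logged field."""
    text: str
    media: List[str] = field(default_factory=list)
    recovery_state: str = "send_attempt_started"


def recover(
    entry: Entry,
    was_posted: Callable[[Entry], Optional[bool]],
    replay: Callable[[Entry], None],
    notify: Callable[[str], None],
) -> str:
    """Reconcile first; replay only when the adapter is sure nothing was posted.

    was_posted returns True, False, or None when the adapter cannot tell.
    """
    posted = was_posted(entry)
    if posted is True:
        return "already_delivered"
    if posted is False:
        replay(entry)
        return "replayed"
    # Ambiguous: keep refusing blind replay, but make the refusal visible.
    notify(FALLBACK_NOTICE)
    return "notice_posted"


def deliver_two_phase(
    entry: Entry,
    send_text: Callable[[str], None],
    upload_media: Callable[[str], None],
) -> List[str]:
    """Send text first, then media; return the media items that failed.

    A failure in send_text is not caught here and propagates to the caller.
    """
    send_text(entry.text)
    failed: List[str] = []
    for item in entry.media:
        try:
            upload_media(item)
        except Exception:  # broad on purpose in a sketch; narrow it in real code
            failed.append(item)
    if failed:
        send_text(f"{len(failed)} attachment(s) failed to upload and can be retried.")
    return failed
```

The design point is that every branch ends in something the Slack user can see: either the original reply, a replay that reconciliation has shown to be safe, or a short notice. Duplicate posts stay impossible in the ambiguous case because only the notice is sent.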

## Why this matters
The current behavior makes completed or partially completed runs indistinguishable from hung runs. This is especially painful for long-running image/video workflows where the final payload often includes media and the user depends on Slack progress/status to know whether to wait, retry, or inspect logs.

## Privacy note
This report intentionally omits local paths, channel IDs, user IDs, tokens, and media names. Full local logs can be provided privately if needed.


Slack long-running runs can appear silent when media delivery partially fails and recovery refuses replay #83165
