[Bug]: Missing delivery queue monitoring leads to periodic gateway hangs #24353

@smile-xuc

Description

Summary

OpenClaw currently has no built-in monitoring for failed delivery queue items. When a message fails to send (e.g., a Telegram message exceeds the 4096-character limit), the failure causes:

  1. Periodic gateway hangs - Failed messages retry every 15-30 minutes, blocking the gateway
  2. No alerts - Users cannot detect queue jams proactively
  3. Manual debugging required - Users must inspect logs and filesystem to identify issues

Steps to reproduce

  1. Send a message longer than 4096 characters via Telegram (e.g., a detailed project report)
  2. Observe the error in logs:
    [delivery-recovery] Recovery time budget exceeded — 1 entries deferred to next restart
    
  3. Wait 15-30 minutes
  4. The gateway becomes unresponsive while the failed message is retried
  5. The cycle repeats indefinitely until someone intervenes manually
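
As a possible sender-side workaround for step 1, an oversize message can be split before it reaches the queue. The following is a minimal sketch, not OpenClaw code: `splitForTelegram` and `TELEGRAM_MAX_LENGTH` are hypothetical names, and it assumes the limit can be checked against JavaScript string length (UTF-16 code units), which may not match exactly how Telegram counts characters.

```typescript
// Hypothetical helper, not part of OpenClaw.
// Splits text into chunks no longer than `limit` so each fits in one message.
const TELEGRAM_MAX_LENGTH = 4096;

function splitForTelegram(text: string, limit: number = TELEGRAM_MAX_LENGTH): string[] {
  if (limit < 2) throw new RangeError("limit must be at least 2");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    // Prefer breaking at the last newline inside the window, if there is one.
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) cut = limit;
    // Avoid cutting a surrogate pair (e.g. an emoji) in half.
    const code = rest.charCodeAt(cut - 1);
    if (cut > 1 && code >= 0xd800 && code <= 0xdbff) cut -= 1;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Joining the chunks reproduces the original text, so nothing is dropped. This only sidesteps the trigger; it does not address the missing queue monitoring described above.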

Expected behavior

  • ✅ Users should be notified when messages fail to deliver
  • ✅ Failed items should have configurable retry limits
  • ✅ Stuck messages should be auto-moved to dead-letter queue after N retries
  • ✅ Gateway should provide queue status via CLI or API
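
To make the requested behavior concrete, here is a rough sketch of a retry limit with a dead-letter list and a status summary. Every name in it (`DeliveryTracker`, `maxRetries`, `recordFailure`, `status`) is an assumption for illustration; the real OpenClaw delivery queue is not shown in this report and may be structured differently.

```typescript
// Hypothetical sketch, not OpenClaw's actual queue implementation.
interface FailedEntry {
  id: string;
  attempts: number;
  lastError: string;
}

interface QueueStatus {
  pending: number;
  deadLettered: number;
}

class DeliveryTracker {
  private readonly maxRetries: number;
  private readonly pending = new Map<string, FailedEntry>();
  private readonly deadLetter: FailedEntry[] = [];

  constructor(maxRetries: number) {
    if (maxRetries < 1) throw new RangeError("maxRetries must be at least 1");
    this.maxRetries = maxRetries;
  }

  // Records one failed attempt and reports what the caller should do next.
  recordFailure(id: string, error: string): "retry" | "dead-letter" {
    const entry = this.pending.get(id) ?? { id, attempts: 0, lastError: "" };
    entry.attempts += 1;
    entry.lastError = error;
    if (entry.attempts >= this.maxRetries) {
      // Retry budget exhausted: stop retrying and park the entry.
      this.pending.delete(id);
      this.deadLetter.push(entry);
      return "dead-letter";
    }
    this.pending.set(id, entry);
    return "retry";
  }

  // Clears an entry once it has been delivered.
  recordSuccess(id: string): void {
    this.pending.delete(id);
  }

  // Summary that a CLI command or API endpoint could expose.
  status(): QueueStatus {
    return { pending: this.pending.size, deadLettered: this.deadLetter.length };
  }
}
```

With `new DeliveryTracker(3)`, the third `recordFailure` call for the same id returns `"dead-letter"`, and that entry no longer counts as pending. A notification hook could fire at the same point, which would cover the first expectation as well.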

Actual behavior

  • ❌ No notification when messages fail
  • ❌ Infinite retry loop (or until manual intervention)
  • ❌ No CLI command to check queue status
  • ❌ Gateway hangs periodically during retry attempts

OpenClaw version

2026.2.22-2

Operating system

macOS 26.3 (arm64)

Install method

Node: 25.6.1

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

Labels

  • bug (Something isn't working)
  • stale (Marked as stale due to inactivity)
