[Feature]: Add TTL/Expiry for Delivery Queue Messages #16555

@rohan-ixlayer

Summary

Add a configurable TTL for delivery queue messages so that stale or orphaned entries do not flood channels on gateway restart.

Problem to solve

The delivery queue (introduced in v2026.2.13) persists outbound messages to disk indefinitely. When the gateway restarts, recoverPendingDeliveries() attempts to re-deliver ALL queued messages regardless of age, causing "message dumps" where stale/orphaned messages flood channels.

This is particularly problematic for users with daily session resets or who experience crashes: messages accumulate overnight and are replayed in a burst the next morning. There is currently no native way to prevent this, short of running external cleanup scripts.

Proposed solution

Add a configurable TTL that recoverPendingDeliveries() checks before attempting replay:

Config Schema:

{
  "messages": {
    "delivery": {
      "maxAgeMs": 7200000,  // Suggested value: 2 hours (unset = no age check)
      "expireAction": "move-to-failed"  // or "skip" or "delete"
    }
  }
}
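
For illustration, the proposed keys could be typed and validated roughly as follows. This is a sketch, not existing code: ExpireAction, DeliveryTtlConfig, and resolveDeliveryTtl are hypothetical names, rejecting a non-positive maxAgeMs is an assumption, and an unset maxAgeMs is treated as "no age check".

```typescript
// Hypothetical types mirroring the proposed config keys; not part of the current codebase.
type ExpireAction = "move-to-failed" | "skip" | "delete";

interface DeliveryTtlConfig {
  maxAgeMs?: number;
  expireAction?: ExpireAction;
}

interface ResolvedDeliveryTtl {
  maxAgeMs: number | undefined; // undefined means "no age check"
  expireAction: ExpireAction;
}

const EXPIRE_ACTIONS: ExpireAction[] = ["move-to-failed", "skip", "delete"];

// Apply defaults and validate values that may come from an untyped JSON file.
function resolveDeliveryTtl(raw: DeliveryTtlConfig | undefined): ResolvedDeliveryTtl {
  const maxAgeMs = raw?.maxAgeMs;
  // Assumption: a TTL of zero or less is a config mistake, not "expire everything".
  if (maxAgeMs !== undefined && (!isFinite(maxAgeMs) || maxAgeMs <= 0)) {
    throw new Error(`messages.delivery.maxAgeMs must be a positive number, got ${maxAgeMs}`);
  }
  const expireAction = raw?.expireAction ?? "move-to-failed";
  if (EXPIRE_ACTIONS.indexOf(expireAction) === -1) {
    throw new Error(`messages.delivery.expireAction must be one of: ${EXPIRE_ACTIONS.join(", ")}`);
  }
  return { maxAgeMs, expireAction };
}
```

With no delivery section at all, resolveDeliveryTtl(undefined) returns an undefined maxAgeMs, so existing installs would see no change.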

Behavior:
When processing queue entries:

  1. Read the enqueuedAt timestamp from each JSON file
  2. Calculate age: Date.now() - enqueuedAt
  3. If age > maxAgeMs, apply expireAction:
    • "skip" → Ignore during recovery (leave file in place)
    • "delete" → Delete the file silently
    • "move-to-failed" → Move to failed/ folder with .expired suffix (recommended default)

Why 2 Hours?

  • Retry schedule completes in ~12.5 minutes (5 attempts)
  • 2 hours ≈ 10x safety margin, longer than any legitimate delivery delay
  • Shorter than typical daily reset cycles

Backward Compatibility:
Default maxAgeMs: undefined preserves current behavior (no age check). Users opt in by setting the config value.
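
The age check described above could be sketched as a pure decision function, kept separate from any file I/O. Only enqueuedAt, maxAgeMs, and the three expireAction values come from this proposal; QueuedDelivery, decideRecovery, and tallyDecisions are hypothetical names, and the real recoverPendingDeliveries() may be structured quite differently.

```typescript
// Hypothetical names throughout; a sketch of the proposed check, not the real implementation.
type ExpireAction = "move-to-failed" | "skip" | "delete";
type RecoveryDecision = "replay" | ExpireAction;

interface QueuedDelivery {
  enqueuedAt: number; // epoch milliseconds, as read from the entry's JSON file
}

// Decide what recovery should do with one queue entry.
function decideRecovery(
  entry: QueuedDelivery,
  nowMs: number,
  maxAgeMs: number | undefined,
  expireAction: ExpireAction
): RecoveryDecision {
  // No TTL configured: keep the current behavior and replay everything.
  if (maxAgeMs === undefined) return "replay";
  const ageMs = nowMs - entry.enqueuedAt;
  // Strictly greater than, matching "If age > maxAgeMs".
  return ageMs > maxAgeMs ? expireAction : "replay";
}

// Count decisions for a batch, for example to report how many entries expired.
function tallyDecisions(
  entries: QueuedDelivery[],
  nowMs: number,
  maxAgeMs: number | undefined,
  expireAction: ExpireAction
): Record<RecoveryDecision, number> {
  const tally: Record<RecoveryDecision, number> = {
    replay: 0,
    "move-to-failed": 0,
    skip: 0,
    delete: 0,
  };
  for (const entry of entries) {
    tally[decideRecovery(entry, nowMs, maxAgeMs, expireAction)] += 1;
  }
  return tally;
}
```

Passing nowMs in, rather than calling Date.now() inside, keeps the check deterministic under test. The caller would pass Date.now() once per recovery run.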

Alternatives considered

1. Clear queue on daily reset - Too aggressive, loses legitimate retry attempts still in progress

2. External cron cleanup - Works but shouldn't be the user's responsibility for basic message hygiene

3. Increase retry attempts - Doesn't solve the staleness issue, just delays it

Impact

Affected: Users with messaging integrations (especially iMessage, Discord, Telegram), particularly those using daily session resets or experiencing crashes

Severity: Annoying - Causes user confusion and message pollution but doesn't break core functionality

Frequency: Daily for users with session resets; intermittent for crash recovery scenarios

Consequence: Message dumps create poor UX, users receive bursts of stale/duplicate messages on gateway restart

Evidence/examples

Observed in production (v2026.2.13):

  • Users report morning "message dumps" to messaging channels after daily session resets
  • Queue directory (~/.openclaw/delivery-queue/) grows over time without manual cleanup
  • recoverPendingDeliveries() replays all entries on gateway startup regardless of age
  • No native config option to skip replay of old messages

Retry Schedule Reference:

  • Attempts: 5s → 25s → 2min → 10min (max 5 attempts)
  • Total window: ~12.5 minutes
  • Current behavior: Messages persist indefinitely beyond retry window
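
As a quick check of the arithmetic, assuming the four listed values are the waits between the five attempts, the window and the margin offered by a 2-hour TTL work out as follows (variable names are illustrative only):

```typescript
// Assumption: these are the delays between consecutive attempts, in milliseconds.
const retryDelaysMs: number[] = [5 * 1000, 25 * 1000, 2 * 60 * 1000, 10 * 60 * 1000];

// Sum of all waits: 5s + 25s + 120s + 600s = 750s.
const retryWindowMs = retryDelaysMs.reduce((sum, delay) => sum + delay, 0);
const retryWindowMinutes = retryWindowMs / 60000; // 12.5

// Proposed TTL of 2 hours, compared against the retry window.
const proposedMaxAgeMs = 2 * 60 * 60 * 1000; // 7200000
const safetyFactor = proposedMaxAgeMs / retryWindowMs; // about 9.6
```

So the "10x" margin is, more precisely, about 9.6x the full retry window.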

Additional information

Implementation Notes:

  • Must remain backward-compatible with existing config keys
  • Default maxAgeMs: undefined preserves current behavior (no TTL check)
  • Recommend move-to-failed as safer default over delete for debugging
  • Related to delivery queue feature introduced in v2026.2.13

Optional Enhancement:
Consider logging when messages are expired (e.g., "Skipped 5 expired messages (age > 2h)") for visibility
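
A log line in that shape could be built with something like the following. Both helper names are hypothetical, and the wording simply mirrors the example message above; the actual logger and message format are up to the maintainers.

```typescript
// Hypothetical helper: render a TTL as whole hours when it divides evenly, else minutes.
function formatDuration(ms: number): string {
  const minutes = Math.round(ms / 60000);
  if (minutes >= 60 && minutes % 60 === 0) return `${minutes / 60}h`;
  return `${minutes}m`;
}

// Hypothetical helper: one summary line per recovery run instead of one per entry.
function formatExpirySummary(count: number, maxAgeMs: number): string {
  const noun = count === 1 ? "message" : "messages";
  return `Skipped ${count} expired ${noun} (age > ${formatDuration(maxAgeMs)})`;
}
```

For example, formatExpirySummary(5, 7200000) yields "Skipped 5 expired messages (age > 2h)".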

Labels: enhancement (New feature or request)
