Bug: Discord MESSAGE_CREATE listener blocks on long AI runs causing stuck restarts #33570

@theotarr

Summary

The Discord MESSAGE_CREATE listener appears to synchronously wait for full AI execution (including tools), which blocks message intake for minutes. This triggers slow-listener detection and health-monitor restarts.

Steps to reproduce

  1. Run OpenClaw with Discord enabled.
  2. Use a configuration that permits concurrent work (example from context):
    "agents": {
      "defaults": {
        "maxConcurrent": 4,
        "subagents": { "maxConcurrent": 8 }
      }
    }
  3. Send a Discord message that triggers a long AI run (10+ minutes).
  4. Send additional Discord messages while the first run is still executing.
  5. Observe listener latency and eventual health-monitor restart in logs.

Expected behavior

The MESSAGE_CREATE listener should return within milliseconds of enqueueing work. Long AI runs should execute asynchronously, without blocking intake of new Discord events.
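
To illustrate the expected shape, here is a minimal sketch of a listener that starts the run without awaiting it. All names (`DiscordMessage`, `runAgent`, `onMessageCreate`, `inFlight`) are hypothetical stand-ins, since this issue does not show OpenClaw's actual listener code.

```typescript
// Hypothetical stand-ins; not OpenClaw's real types or functions.
type DiscordMessage = { id: string; content: string };

// IDs of runs that have started but not yet finished.
const inFlight = new Set<string>();

// Stand-in for a long AI run (model calls plus tools).
async function runAgent(_msg: DiscordMessage): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
}

// Non-blocking listener: starts the run and returns without awaiting it.
function onMessageCreate(msg: DiscordMessage): void {
  inFlight.add(msg.id);
  // Deliberately not awaited. The catch keeps a failed run from becoming
  // an unhandled rejection; the finally clears the bookkeeping entry.
  void runAgent(msg)
    .catch((err) => console.error(`run failed for ${msg.id}`, err))
    .finally(() => inFlight.delete(msg.id));
}
```

With this shape the listener's duration no longer depends on how long the run takes, so a 600-second run would not register as a 600-second listener call.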

Actual behavior

Incoming Discord events block behind long-running tasks. Slow-listener alarms fire repeatedly, the health monitor marks Discord as stuck, and restarts occur while work is still in progress.

Relevant log snippets (from context)

15:58 - embedded run timeout: runId=... timeoutMs=600000
15:58 - Profile anthropic:default timed out. Trying next account...
15:58 - Slow listener detected: DiscordMessageListener: MESSAGE_CREATE: 601.7 seconds
16:05 - Slow listener detected: DiscordMessageListener: MESSAGE_CREATE: 206.3 seconds
16:46 - health-monitor: restarting (reason: stuck)
16:53 - Slow listener detected: MESSAGE_CREATE: 35.6 seconds
17:05 - Slow listener detected: MESSAGE_CREATE: 47 seconds
17:11 - Slow listener detected: MESSAGE_CREATE: 82.6 seconds
17:11 - Slow listener detected: MESSAGE_CREATE: 103.4 seconds
17:22 - typing TTL reached (2m); stopping typing indicator

Suggested fix

  • Refactor Discord MESSAGE_CREATE handling to fire-and-forget into an internal async queue.
  • Decouple event ingestion from run execution; ACK/queue immediately, process on worker(s).
  • Add backpressure + queue metrics (queue depth, oldest age) and avoid treating active long-running work as listener deadlock.
  • Keep typing indicator alive for queued/long tasks (or emit explicit “still processing” status updates).
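
The queueing, backpressure, and metrics suggestions above could be sketched roughly as follows. This is an illustration under stated assumptions, not OpenClaw's implementation: `RunQueue`, `QueueMetrics`, and `isStuck` are hypothetical names, and the concurrency limit is assumed to come from something like `agents.defaults.maxConcurrent`.

```typescript
// Hypothetical sketch; none of these names come from OpenClaw.
type Job = { id: string; enqueuedAt: number; run: () => Promise<void> };
type QueueMetrics = { depth: number; active: number; oldestAgeMs: number };

class RunQueue {
  private waiting: Job[] = [];
  private active = 0;
  private maxConcurrent: number;
  private maxDepth: number;

  constructor(maxConcurrent: number, maxDepth: number) {
    this.maxConcurrent = maxConcurrent;
    this.maxDepth = maxDepth;
  }

  // Returns immediately: true if accepted, false if the queue is full (backpressure).
  enqueue(id: string, run: () => Promise<void>): boolean {
    if (this.waiting.length >= this.maxDepth) return false;
    this.waiting.push({ id, enqueuedAt: Date.now(), run });
    this.drain();
    return true;
  }

  // Queue depth, number of active runs, and age of the oldest waiting job.
  metrics(now: number = Date.now()): QueueMetrics {
    const oldest = this.waiting[0];
    return {
      depth: this.waiting.length,
      active: this.active,
      oldestAgeMs: oldest ? now - oldest.enqueuedAt : 0,
    };
  }

  // Start waiting jobs until the concurrency limit is reached.
  private drain(): void {
    while (this.active < this.maxConcurrent && this.waiting.length > 0) {
      const job = this.waiting.shift()!;
      this.active++;
      Promise.resolve()
        .then(() => job.run())
        .catch((err) => console.error(`job ${job.id} failed`, err))
        .finally(() => {
          this.active--;
          this.drain();
        });
    }
  }
}

// Assumed health rule: flag the queue only when a job has waited too long
// to start, not because an active run is taking a long time.
function isStuck(m: QueueMetrics, maxWaitMs: number): boolean {
  return m.depth > 0 && m.oldestAgeMs > maxWaitMs;
}
```

A listener would call `enqueue` and return; when it gets `false` it can reply that the bot is busy instead of blocking. The health monitor would then watch `metrics()` rather than listener duration.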
