
Memory compaction blocks main processing lane, causing unresponsive bot for 10+ minutes #53008

@petiblaster

Bug Report

What happened

A session memory compaction (memoryFlush) run started and hung for the full 10-minute timeout (600,000ms). During this time, the main processing lane was blocked (totalActive=1), which meant all inbound Telegram messages were queued and never processed. The bot appeared completely unresponsive for 10+ minutes until the user manually restarted the gateway.
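The following small TypeScript simulation of a FIFO lane makes the head-of-line blocking concrete. It is a hypothetical model, not OpenClaw's scheduler. `Job`, `simulateLane`, and the Telegram message's 5 s arrival time and 2 s handling time are illustrative assumptions. Only the 600,000 ms compaction duration comes from the report.

```typescript
// Hypothetical model of a FIFO processing lane; not OpenClaw's actual scheduler.
interface Job {
  id: string;
  arrivalMs: number; // when the job was enqueued
  durationMs: number; // how long it occupies a slot once started
}

// Returns the start time of every job when at most `maxConcurrent` jobs run at once.
function simulateLane(jobs: Job[], maxConcurrent: number): Map<string, number> {
  if (maxConcurrent < 1) throw new RangeError("maxConcurrent must be >= 1");
  // freeAt[i] = time at which slot i becomes available again
  const freeAt: number[] = new Array<number>(maxConcurrent).fill(0);
  const starts = new Map<string, number>();
  const byArrival = [...jobs].sort((a, b) => a.arrivalMs - b.arrivalMs);
  for (const job of byArrival) {
    // FIFO: the next job in arrival order takes whichever slot frees up first
    let slot = 0;
    for (let i = 1; i < freeAt.length; i++) {
      if (freeAt[i] < freeAt[slot]) slot = i;
    }
    const start = Math.max(job.arrivalMs, freeAt[slot]);
    starts.set(job.id, start);
    freeAt[slot] = start + job.durationMs;
  }
  return starts;
}

// Compaction hangs for the full 600,000 ms timeout; a Telegram message arrives 5 s later.
const incident: Job[] = [
  { id: "compaction", arrivalMs: 0, durationMs: 600_000 },
  { id: "telegram-msg", arrivalMs: 5_000, durationMs: 2_000 },
];

const waitSerial = simulateLane(incident, 1).get("telegram-msg")! - 5_000; // one shared slot
const waitSplit = simulateLane(incident, 2).get("telegram-msg")! - 5_000; // a second slot, e.g. a separate lane
console.log({ waitSerial, waitSplit });
```

With a single slot the message waits 595,000 ms (about 10 minutes) behind the hung compaction. With a second slot it starts as soon as it arrives.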

Timeline (from journalctl)

16:36:24 — compaction start (embedded run, runId f3237661...)
16:46:24 — TIMEOUT after 600000ms → aborted
           'using current snapshot: timed out during compaction'
           'compaction promise rejected (no waiter): AbortError: Unsubscribed during compaction'
16:46:24–16:51 — new runs spawned but totalActive stayed at 1-2, queue still backed up
16:51:19 — user manually sent SIGTERM to recover

Expected behavior

  • Memory compaction should not block the main processing lane for inbound messages
  • If compaction takes too long, it should be backgrounded or deprioritized so user messages can still be processed
  • A 10-minute hard timeout is too long for something that blocks interactivity
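Here is a minimal sketch of the first expected behavior. It assumes compaction can be scheduled as an independent async task. A hypothetical `Lane` class (not OpenClaw's API) caps in-flight work per lane. Compaction is submitted to its own `background` lane, so a pending compaction never occupies the `main` lane's only slot. The `deferred` helper stands in for a compaction run that has not finished yet.

```typescript
// Hypothetical minimal lane: FIFO queue, at most `maxConcurrent` tasks in flight.
// Not OpenClaw's API; a sketch of the "separate lane" idea.
class Lane {
  readonly name: string;
  private readonly maxConcurrent: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(name: string, maxConcurrent: number) {
    this.name = name;
    this.maxConcurrent = maxConcurrent;
  }

  get totalActive(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiting.length;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.active++;
        // Promise.resolve().then(task) also turns a synchronous throw into a rejection
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            this.active--;
            this.waiting.shift()?.(); // start the next queued task, if any
          });
      };
      if (this.active < this.maxConcurrent) start();
      else this.waiting.push(start);
    });
  }
}

// Resolvable placeholder standing in for a compaction run that is still going.
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

const mainLane = new Lane("main", 1);
const backgroundLane = new Lane("background", 1);

const compactionDone = deferred();
const compaction = backgroundLane.run(() => compactionDone.promise); // still pending
const reply = mainLane.run(async () => "reply to inbound Telegram message");

reply.then((text) => console.log(`${mainLane.name}: ${text}`)); // not blocked by compaction
compaction.then(() => console.log(`${backgroundLane.name}: compaction finished`));
```

The design choice here is isolation rather than priority. The main lane keeps a concurrency of 1, so inbound messages are still handled one at a time in arrival order, while compaction can run as long as it needs without holding that slot.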

Suggested improvements

  1. Run compaction in a separate lane/worker so it doesn't block message processing
  2. Reduce the compaction timeout (or make it configurable)
  3. Add a health-monitor check that detects when the main lane is blocked for too long and auto-recovers
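Suggestions 2 and 3 could look roughly like the following sketch. `runWithTimeout`, `laneNeedsRecovery`, `LaneSnapshot`, and both threshold constants are hypothetical names with illustrative values, not existing OpenClaw settings. The timeout races the task against an `AbortSignal`, so the caller regains control even if the task ignores the signal. It also attaches a handler to the task promise, so a late `AbortError` is not left with no waiter (compare the `compaction promise rejected (no waiter)` line in the timeline).

```typescript
// Hypothetical defaults; OpenClaw's real configuration keys and values may differ.
const COMPACTION_TIMEOUT_MS = 60_000; // illustrative; the incident used a fixed 600_000
const MAX_LANE_BUSY_MS = 120_000; // illustrative watchdog threshold

// (2) Configurable timeout: abort `task` through its AbortSignal after `timeoutMs`
// and hand control back to the caller even if the task ignores the signal.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = COMPACTION_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error(`timed out after ${timeoutMs}ms`)),
    timeoutMs,
  );
  try {
    const aborted = new Promise<never>((_, reject) => {
      controller.signal.addEventListener("abort", () => reject(controller.signal.reason), {
        once: true,
      });
    });
    const work = task(controller.signal);
    // Give a late rejection (e.g. an AbortError raised after the timeout) a waiter,
    // so it is not reported as an unhandled rejection.
    work.catch(() => {});
    return await Promise.race([work, aborted]);
  } finally {
    clearTimeout(timer);
  }
}

// (3) Health check: flag a lane that has been busy longer than `maxBusyMs`
// while inbound work is queued behind it.
interface LaneSnapshot {
  totalActive: number;
  queued: number;
  busySinceMs: number | null; // start time of the current in-flight work
}

function laneNeedsRecovery(
  s: LaneSnapshot,
  nowMs: number,
  maxBusyMs: number = MAX_LANE_BUSY_MS,
): boolean {
  if (s.totalActive === 0 || s.queued === 0 || s.busySinceMs === null) return false;
  return nowMs - s.busySinceMs > maxBusyMs;
}

// Stand-in for a compaction run that never finishes on its own but honors abort.
const hungCompaction = (signal: AbortSignal): Promise<string> =>
  new Promise<string>((_, reject) => {
    signal.addEventListener("abort", () => reject(signal.reason), { once: true });
  });

// A lane busy since t=0 with queued messages, checked at t=600,000 ms.
console.log(laneNeedsRecovery({ totalActive: 1, queued: 3, busySinceMs: 0 }, 600_000));
```

What "recovery" should mean (aborting the in-flight run, moving it to a background lane, or restarting the gateway) is a product decision. The check only reports that the lane has been busy too long while messages are waiting.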

Environment

  • OpenClaw v2026.3.13 (upgraded to v2026.3.22 after incident)
  • Host: WSL2 Linux 6.6.87.2
  • Model: anthropic/claude-opus-4-6
  • Channel: Telegram

Metadata


Assignees: No one assigned

Labels:
  • P1 (High-priority user-facing bug, regression, or broken workflow)
  • clawsweeper:fix-shape-clear (ClawSweeper found a clear likely implementation shape for this issue)
  • clawsweeper:needs-maintainer-review (ClawSweeper marked this issue as needing maintainer review before automation)
  • clawsweeper:needs-product-decision (ClawSweeper marked this issue as needing a product or behavior decision)
  • clawsweeper:no-new-fix-pr (ClawSweeper does not recommend queueing a new automated fix PR for this issue)
  • clawsweeper:source-repro (ClawSweeper found a high-confidence source-level issue reproduction)
  • impact:crash-loop (Crash, hang, restart loop, or process-level availability failure)
  • impact:message-loss (Channel message delivery can be lost, duplicated, or misrouted)
  • impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt)
  • issue-rating: 🦞 diamond lobster (Very strong issue quality with high-confidence source-level or clear reproduction)
  • stale (Marked as stale due to inactivity)

Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
