
[Bug]: Compaction-retry re-injects cron/system prompts as new user input, causing infinite tool-call loops #66126

@mafresh-max

Description


Summary

After auto-compaction succeeds in an isolated cron-job session, the system retries the original prompt as if it were a fresh user request. This causes the agent to re-execute the same (failing) tool calls, which grow context again, trigger another compaction, and create an infinite loop.

Environment

  • OpenClaw: 2026.4.11
  • Model: openai-codex/gpt-5.4 (primary), minimax-portal/MiniMax-M2.7-highspeed (fallback, cron default)
  • OS: macOS Darwin 25.2.0 (Apple Silicon)
  • Setup: Multi-channel Discord with ~20 internal cron jobs, active-memory plugin enabled, QMD enabled

Repro Steps

  1. Have a cron job that triggers a tool-use action (e.g. exec call to run a Python script)
  2. The tool call fails repeatedly (in our case: exec preflight rejects complex shell invocations)
  3. Each failed attempt adds error messages to context
  4. Context exceeds threshold → auto-compaction triggers
  5. After compaction, the log reports: "auto-compaction succeeded for openai-codex/gpt-5.4; retrying prompt"
  6. The retried prompt re-injects the original cron trigger
  7. Agent believes it's a new task → re-executes the same failing tool calls
  8. Context grows again → overflow → compaction → retry → infinite loop
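Here is a minimal Python model of the cycle in the steps above. It is not OpenClaw code. Session, run_tool, naive_compaction_retry, the CONTEXT_LIMIT of 10 messages and the script name report.py are all hypothetical stand-ins. The model shows that when the prompt is re-injected as fresh input after compaction, each cycle reproduces the same deterministic failure.

```python
from dataclasses import dataclass, field

CONTEXT_LIMIT = 10  # hypothetical threshold (in messages) that triggers auto-compaction


@dataclass
class Session:
    messages: list = field(default_factory=list)
    compactions: int = 0


def run_tool(command: str) -> str:
    # Deterministic failure: the same command is rejected the same way every time.
    return f"exec failed: exec preflight: {command}"


def naive_compaction_retry(session: Session, prompt: str, max_cycles: int) -> list:
    """Re-inject the original prompt after each compaction, as the report describes."""
    failures = []
    for _ in range(max_cycles):  # capped only so this demo terminates
        session.messages.append(("user", prompt))  # looks like a brand-new task
        while len(session.messages) < CONTEXT_LIMIT:
            result = run_tool("python3 report.py")  # hypothetical script
            session.messages.append(("tool", result))
            failures.append(result)
        # Overflow -> auto-compaction -> "retrying prompt"
        session.messages = [("summary", "compacted history")]
        session.compactions += 1
    return failures
```

With `max_cycles=3` the model compacts 3 times and records 25 failures, all of them the same string. Only the artificial cap ends it.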

Evidence from logs

10:04:46 [context-overflow-diag] messages=171 compactionAttempts=0
10:04:46 context overflow detected (attempt 1/3); attempting auto-compaction
10:05:45 exec failed: exec preflight: complex interpreter invocation detected
10:05:49 exec failed: exec preflight (same command, variant 2)
10:05:53 exec failed: exec preflight (variant 3)
10:06:46 exec failed: exec preflight (variant 4)
10:07:23 exec failed: exec preflight (different script, same pattern)
10:07:26 exec failed: exec preflight (variant 6)
10:07:29 exec failed: exec preflight (variant 7)
10:07:34 exec failed: exec preflight (variant 8)
10:08:34 [context-overflow-diag] messages=116 compactionAttempts=0  ← DIFFERENT channel, same session pressure
10:08:38 "no real conversation messages to summarize; writing compaction boundary to suppress re-trigger loop"
10:08:39 auto-compaction succeeded; retrying prompt  ← RE-INJECTION
10:09:31 auto-compaction succeeded; retrying prompt  ← ANOTHER RE-INJECTION
10:09:36 [context-overflow-diag] messages=125 compactionAttempts=1  ← GROWING AGAIN

The system itself recognizes the loop ("suppress re-trigger loop") but still proceeds to "retrying prompt", which restarts the cycle.

24 compaction retries occurred in a single day, 2 of them with explicit "suppress re-trigger loop" markers.
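The log suggests that the compaction outcome and the retry decision are not connected. The hypothetical guard below sketches the expected behavior, in which a compaction that only writes a boundary ends the cycle instead of retrying the prompt. CompactionOutcome, CompactionResult, should_retry_prompt and MAX_COMPACTIONS_PER_TURN are illustrative names. The limit of 3 is only borrowed from the "attempt 1/3" log line.

```python
from dataclasses import dataclass
from enum import Enum


class CompactionOutcome(Enum):
    SUMMARIZED = "summarized"        # real conversation content was summarized
    BOUNDARY_ONLY = "boundary_only"  # "no real conversation messages to summarize"


@dataclass
class CompactionResult:
    outcome: CompactionOutcome
    attempts_for_turn: int


MAX_COMPACTIONS_PER_TURN = 3  # hypothetical; mirrors the "attempt 1/3" counter


def should_retry_prompt(result: CompactionResult) -> bool:
    """Hypothetical guard: the suppress marker must also suppress the retry."""
    if result.outcome is CompactionOutcome.BOUNDARY_ONLY:
        # A boundary was written to stop re-triggering; retrying would restart the cycle.
        return False
    return result.attempts_for_turn < MAX_COMPACTIONS_PER_TURN
```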

Impact

  • All Discord channels become unresponsive during compaction-retry cycles (30-120s per cycle)
  • Lane waits exceed 116 seconds (lane wait exceeded: lane=main waitedMs=116215)
  • Multiple channels cascade into context overflow simultaneously
  • The agent appears "hung" to the user for minutes at a time

Root Cause Analysis

The compaction-retry mechanism treats the post-compaction state as "prompt needs to be retried" without tracking that:

  1. The original prompt already produced tool-call outputs (even if they failed)
  2. The failures are deterministic (same input → same preflight rejection)
  3. The retry will produce the same result, growing context identically

There is no "reply obligation ledger" — after compaction, the system loses knowledge of what was already attempted and what failed.
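The missing memory can be sketched as a record of failed calls that survives compaction. This is a minimal Python sketch under stated assumptions: FailureMemory and call_fingerprint are hypothetical names, and a tool call is assumed to be identified by its tool name plus JSON-serializable arguments.

```python
import hashlib
import json
from typing import Dict, Optional


def call_fingerprint(tool: str, args: dict) -> str:
    """Stable key for a tool call: same tool + same arguments -> same key."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class FailureMemory:
    """Hypothetical record of failed calls, meant to be kept across compaction."""

    def __init__(self) -> None:
        self._failed: Dict[str, str] = {}

    def record_failure(self, tool: str, args: dict, error: str) -> None:
        self._failed[call_fingerprint(tool, args)] = error

    def known_failure(self, tool: str, args: dict) -> Optional[str]:
        # Returns the earlier error if this exact call already failed, else None.
        return self._failed.get(call_fingerprint(tool, args))
```

Exact fingerprints only catch identical repeats. The log also shows the agent trying variants of the same command, so a real fix would likely need a coarser signal as well, such as a per-turn count of preflight rejections.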

Suggested Fix (from our agent's own analysis)

Safe Patch (immediate):
Compaction-retry must not re-inject the current user turn as new user input. Instead, use fields such as retryOfTurnId, originalMessageIds, and turnAttempt to distinguish retries from fresh inputs. The agent should see "this is retry #2 of the same prompt", not "here's a new task."
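Here is a minimal sketch of the safe patch using a Python turn record. TurnInput, as_retry and render_for_agent are hypothetical names. The snake_case fields simply mirror the proposed retryOfTurnId, originalMessageIds and turnAttempt. A retry keeps the original message IDs and carries an attempt counter, so the agent is told it is resuming a turn rather than handed a new task.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class TurnInput:
    text: str
    original_message_ids: Tuple[str, ...]
    retry_of_turn_id: Optional[str] = None  # None means a fresh inbound turn
    turn_attempt: int = 1


def as_retry(turn_id: str, turn: TurnInput) -> TurnInput:
    # Same text and message IDs; only the retry metadata changes.
    return replace(turn, retry_of_turn_id=turn_id, turn_attempt=turn.turn_attempt + 1)


def render_for_agent(turn: TurnInput) -> str:
    if turn.retry_of_turn_id is None:
        return turn.text
    ids = ", ".join(turn.original_message_ids)
    return (
        f"[attempt {turn.turn_attempt} of turn {turn.retry_of_turn_id}, resumed after "
        f"compaction; messages {ids} were already attempted] {turn.text}"
    )
```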

Proper Fix:
Keep a Reply-Obligation-Ledger per inbound message. collect may bundle messages, but internally the system should track:

  • Which message_ids are still open
  • What was already answered (even with errors)
  • What remains open after compaction

After compaction, resume a stable turn-state rather than reconstructing a raw prompt.
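The ledger could look roughly like the following sketch. It is not a proposed implementation. ReplyObligationLedger, Obligation and its three states are hypothetical names that match the bullets above: open, answered, and answered with errors.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List


class Obligation(Enum):
    OPEN = "open"
    ANSWERED = "answered"
    ANSWERED_WITH_ERROR = "answered_with_error"  # attempted, failed, and reported


@dataclass
class ReplyObligationLedger:
    entries: Dict[str, Obligation] = field(default_factory=dict)

    def open_messages(self, message_ids: Iterable[str]) -> None:
        # A bundled turn opens one obligation per inbound message; existing entries are kept.
        for mid in message_ids:
            self.entries.setdefault(mid, Obligation.OPEN)

    def mark(self, message_id: str, status: Obligation) -> None:
        self.entries[message_id] = status

    def still_open(self) -> List[str]:
        return sorted(m for m, s in self.entries.items() if s is Obligation.OPEN)

    def resume_state(self) -> Dict[str, str]:
        # Plain-data snapshot, intended (by assumption) to be stored with the compaction summary.
        return {m: s.value for m, s in self.entries.items()}
```

After compaction, the resumed turn would read still_open() instead of re-deriving obligations from a reconstructed prompt. Re-opening an already answered message has no effect because open_messages uses setdefault.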

Queue Hardening:
When a run enters a tool cascade or compaction, temporarily suspend collect or set "no further coalescing" for that turn.
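The queue-hardening idea can be sketched as a per-turn switch on bundling. TurnCollector and its methods are hypothetical. The sketch assumes that collect works by appending new inbound messages to the running turn. The report implies this but does not spell it out.

```python
from typing import List


class TurnCollector:
    """Hypothetical collect-style bundler with a "no further coalescing" switch."""

    def __init__(self) -> None:
        self.active_turn: List[str] = []  # message IDs bundled into the running turn
        self.deferred: List[str] = []     # arrivals held back for the next turn
        self.coalescing_open = True

    def on_inbound(self, message_id: str) -> None:
        if self.coalescing_open:
            self.active_turn.append(message_id)
        else:
            self.deferred.append(message_id)

    def on_tool_cascade_or_compaction(self) -> None:
        # Freeze the running turn's inputs so a retry sees the same set of messages.
        self.coalescing_open = False

    def finish_turn(self) -> List[str]:
        # Deferred messages become the next turn's bundle; coalescing reopens.
        finished = self.active_turn
        self.active_turn, self.deferred = self.deferred, []
        self.coalescing_open = True
        return finished
```

Messages that arrive during the cascade are not lost. They become the next turn's bundle, so a post-compaction retry still sees exactly the inputs the turn started with.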

Workaround

Preventing the initial tool-call failure (fixing the exec preflight issue for our scripts) reduces context growth and makes compaction less likely, but does not fix the underlying architectural issue. Any sufficiently long tool-call sequence can trigger the same loop.

Test Matrix

Scenario: Cron triggers a tool call that fails → compaction → retry
  • Expected: Agent recognizes the retry and does NOT re-execute the same failing calls
  • Actual: Agent re-executes them identically; loop

Scenario: Compaction with "suppress re-trigger loop"
  • Expected: Loop stops
  • Actual: Loop continues via "retrying prompt"

Scenario: Multiple channels during compaction-retry
  • Expected: Other channels respond normally
  • Actual: All channels blocked (lane wait)

Metadata


Assignees

No one assigned

    Labels

    • P1 (High-priority user-facing bug, regression, or broken workflow.)
    • clawsweeper:fix-shape-clear (ClawSweeper found a clear likely implementation shape for this issue.)
    • clawsweeper:needs-maintainer-review (ClawSweeper marked this issue as needing maintainer review before automation.)
    • clawsweeper:needs-product-decision (ClawSweeper marked this issue as needing a product or behavior decision.)
    • clawsweeper:no-new-fix-pr (ClawSweeper does not recommend queueing a new automated fix PR for this issue.)
    • clawsweeper:source-repro (ClawSweeper found a high-confidence source-level issue reproduction.)
    • impact:crash-loop (Crash, hang, restart loop, or process-level availability failure.)
    • impact:message-loss (Channel message delivery can be lost, duplicated, or misrouted.)
    • impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt.)
    • issue-rating: 🦞 diamond lobster (Very strong issue quality with high-confidence source-level or clear reproduction.)
    • stale (Marked as stale due to inactivity)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
