[Bug]: Compaction-retry re-injects cron/system prompts as new user input, causing infinite tool-call loops #66126
Closed as not planned
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
- clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
- clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
- issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
- stale: Marked as stale due to inactivity
Summary
After auto-compaction succeeds in an isolated cron-job session, the system retries the original prompt as if it were a fresh user request. This causes the agent to re-execute the same (failing) tool calls, which grow context again, trigger another compaction, and create an infinite loop.
Environment
Repro Steps
- `exec` call to run a Python script)
- `exec preflight` rejects complex shell invocations)
- `auto-compaction succeeded for openai-codex/gpt-5.4; retrying prompt`
Evidence from logs
The system itself recognizes the loop (`suppress re-trigger loop`) but still proceeds with `retrying prompt`, which restarts the cycle.
24 compaction-retries in a single day, 2 with explicit `suppress re-trigger loop` markers.
Impact
- `lane wait exceeded: lane=main waitedMs=116215`)
Root Cause Analysis
The compaction-retry mechanism treats the post-compaction state as "prompt needs to be retried" without tracking that:
There is no "reply obligation ledger" — after compaction, the system loses knowledge of what was already attempted and what failed.
Suggested Fix (from our agent's own analysis)
Safe Patch (immediate):
Compaction-retry must not re-inject the current user turn as new user input. Instead: use `retryOfTurnId`, `originalMessageIds`, `turnAttempt` to distinguish retries from fresh inputs. The agent should see "this is retry #2 of the same prompt" not "here's a new task."
Proper Fix:
Reply-Obligation-Ledger per inbound message. `collect` may bundle messages, but internally track:
- `message_ids` are still open
After compaction, resume a stable turn-state rather than reconstructing a raw prompt.
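The Safe Patch and Proper Fix above could be sketched together roughly as follows. This is a minimal illustration, not the project's actual code: only the field names `retryOfTurnId`, `originalMessageIds`, and `turnAttempt` come from this issue. The `ReplyObligationLedger` class, the `TurnInput` shape, the status values, and every method name are hypothetical. The sketch also assumes a message whose attempt already failed is recorded as `failed` rather than left open, so it is not retried blindly.

```typescript
// Hypothetical sketch. Only retryOfTurnId, originalMessageIds, and turnAttempt
// are taken from the issue text; all other names are invented for illustration.
type ObligationStatus = "open" | "answered" | "failed";

interface TurnInput {
  kind: "fresh" | "retry";      // lets the agent tell a retry from a new task
  turnId: string;
  retryOfTurnId?: string;       // set only on retries
  originalMessageIds: string[]; // inbound messages this turn still owes a reply to
  turnAttempt: number;          // 1 for the first attempt, 2 for the first retry, ...
}

class ReplyObligationLedger {
  private readonly status = new Map<string, ObligationStatus>();
  private readonly attempts = new Map<string, number>();

  // Record inbound messages (possibly bundled by `collect`) as open obligations.
  open(messageIds: string[]): void {
    for (const id of messageIds) {
      if (!this.status.has(id)) this.status.set(id, "open");
    }
  }

  // Mark a message as handled, either successfully or as a known failure.
  resolve(messageId: string, outcome: "answered" | "failed"): void {
    this.status.set(messageId, outcome);
  }

  // Message ids that still have no recorded outcome.
  openIds(): string[] {
    const ids: string[] = [];
    this.status.forEach((s, id) => {
      if (s === "open") ids.push(id);
    });
    return ids;
  }

  // After compaction: resume the same turn instead of re-injecting a raw prompt.
  // Returns null when nothing is still open, so no retry is issued at all.
  resumeAfterCompaction(turnId: string): TurnInput | null {
    const stillOpen = this.openIds();
    if (stillOpen.length === 0) return null;
    const attempt = (this.attempts.get(turnId) ?? 1) + 1;
    this.attempts.set(turnId, attempt);
    return {
      kind: "retry",
      turnId: `${turnId}#${attempt}`,
      retryOfTurnId: turnId,
      originalMessageIds: stillOpen,
      turnAttempt: attempt,
    };
  }
}
```

In this sketch the ledger would live outside the compacted transcript, so the record of what was attempted and what failed survives compaction. A caller would check for `null` before scheduling any retry, and pass the `retry` envelope to the agent in place of the original prompt text.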
Queue Hardening:
When a run enters a tool cascade or compaction, temporarily suspend `collect` or set "no further coalescing" for that turn.
Workaround
Preventing the initial tool-call failure (fixing the `exec preflight` issue for our scripts) reduces context growth and makes compaction less likely, but does not fix the underlying architectural issue. Any sufficiently long tool-call sequence can trigger the same loop.
Test Matrix
- `suppress re-trigger loop`
- `retrying prompt`