Bug: Compaction causes Pi runtime deadlock — agent freezes across all channels after summary generation #84777
Open
Labels
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `clawsweeper:needs-live-repro`: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- `clawsweeper:needs-maintainer-review`: ClawSweeper marked this issue as needing maintainer review before automation.
- `clawsweeper:no-new-fix-pr`: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- `impact:crash-loop`: Crash, hang, restart loop, or process-level availability failure.
- `impact:message-loss`: Channel message delivery can be lost, duplicated, or misrouted.
- `impact:session-state`: Session, memory, transcript, context, or agent state can drift or corrupt.
- `issue-rating: 🐚 platinum hermit`: Good issue quality with a plausible reproduction path needing some confirmation.
Summary
Compaction causes a Pi runtime deadlock after summary generation, freezing ALL channels of the affected agent. The gateway remains healthy (no crash, no error log), but the agent stops responding on every channel until the session is rebuilt.
OpenClaw 2026.5.18 (50a2481) · macOS 15.6 · Node 23.11.0 · Apple Silicon
Reproduction Pattern (3 days, 4 occurrences)
- `reserveTokensFloor: 80000`
- […] `/compact`)
- `.reset` backup created
- `/compact` returns "skipped: session was already compacted recently"
- `/new` (fresh session) restores function
Timeline (Latest Incident)
All times UTC+8 (Beijing):
- `.reset` backup created (1.5MB, 570 lines)
Evidence
Compaction entry in transcript (last entry before deadlock)
{ "type": "compaction", "timestamp": "2026-05-21T03:00:56.816Z", "summary": "## Goal\n- Investigate...", "tokensBefore": 182767, "fromHook": false }
The summary was well-formed, with Goal, Progress, Next Steps, read-files, and modified-files; its quality is fine.
File state after deadlock
Session directory contains:
- `xxx.jsonl.reset.<timestamp>`: backup created at reset (1.5MB)
- `xxx.checkpoint.<uuid>.jsonl`: pre-compaction checkpoint (611KB)
- `xxx.trajectory.jsonl`: full trajectory (10MB)
- `xxx.trajectory-path.json`: pointer
Missing: No compacted `.json` successor file was ever created.
Multi-channel confirmation
When the Feishu channel froze, the WeCom channel of the same agent also stopped responding within minutes. A different agent on the same gateway continued working normally, confirming the deadlock is agent-scoped, not gateway-scoped.
Gateway health
- `gateway.err.log`: zero errors for the incident day
- `gateway.log`: stopped writing on May 19 (2 days before incident), either log rotation or a logging bug
Configuration Context
{ "agents": { "defaults": { "compaction": { "reserveTokensFloor": 80000, "midTurnPrecheck": { "enabled": true } } } } }
Key:
- `reserveTokensFloor: 80000` on a 262k context window → compaction triggers at ~70% (182k tokens), much earlier than default (24k reserve → ~91% trigger).
- `truncateAfterCompaction` was not set (default `false`): in-place rewrite mode.
- `notifyUser` was not set (default `false`).
Hypothesis
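As a quick check on the trigger points quoted in the configuration notes, the arithmetic can be sketched as below. This assumes compaction fires once usage exceeds the context window minus `reserveTokensFloor`, which is how the report describes it; `compactionTrigger` is a hypothetical helper, not an OpenClaw API, and the 262k window is taken as 262000 tokens for simplicity.

```typescript
// Hypothetical helper for illustration only; not part of OpenClaw.
// Assumption: compaction triggers when used tokens exceed
// (contextWindow - reserveTokensFloor).
function compactionTrigger(contextWindow: number, reserveTokensFloor: number) {
  const triggerTokens = contextWindow - reserveTokensFloor;
  const triggerPercent = Math.round((triggerTokens / contextWindow) * 100);
  return { triggerTokens, triggerPercent };
}

// Values from the report: 262k window, 80k reserve vs. the 24k default.
const reported = compactionTrigger(262000, 80000);
const defaults = compactionTrigger(262000, 24000);

console.log(reported); // { triggerTokens: 182000, triggerPercent: 69 }
console.log(defaults); // { triggerTokens: 238000, triggerPercent: 91 }
```

Under that assumption, the `tokensBefore: 182767` value in the compaction entry sits just past the 182000-token threshold, which is consistent with an automatic trigger at roughly 70% of the window.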
Compaction summary generation succeeds, but the subsequent transcript write/rotation step fails silently. Since `truncateAfterCompaction` is `false`, OpenClaw uses in-place transcript rewrite. The failure leaves the Pi runtime's event loop in an inconsistent state: an async file operation doesn't resolve, blocking the entire agent's message processing queue. This would explain:
- `/new` fixes it (creates fresh Pi runtime + fresh transcript)
The `reserveTokensFloor: 80000` setting (causing frequent early compactions) and the gateway restart shortly before compaction may be contributing factors: a restart may leave session state slightly inconsistent when the next auto-compaction fires.
Workaround Applied
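The failure mode proposed in the hypothesis (one async step that never settles stalling a per-agent serial queue) can be reproduced in isolation. The sketch below is a toy model, not OpenClaw or Pi runtime code, and it is not the workaround the reporter applied: `SerialQueue`, `withTimeout`, and `stuckWrite` are hypothetical names, and the assumption that messages are processed through a promise-chained queue is mine.

```typescript
// Toy model only: none of these names come from OpenClaw or the Pi runtime.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Each task starts only after the previous task has settled.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain alive even if a task rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).then(
    (value) => { clearTimeout(timer); return value; },
    (err) => { clearTimeout(timer); throw err; },
  );
}

// Stands in for a transcript write whose promise never settles.
const stuckWrite = (): Promise<void> => new Promise<void>(() => {});
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<{ unguarded: string; guarded: string }> {
  // Unguarded: the hung step never settles, so the next task never starts.
  const blocked = new SerialQueue();
  void blocked.enqueue(stuckWrite);
  const unguarded = await Promise.race([
    blocked.enqueue(async () => "handled"),
    sleep(50).then(() => "still waiting"),
  ]);

  // Guarded: the hung step rejects after 50 ms and the queue moves on.
  const freed = new SerialQueue();
  freed.enqueue(() => withTimeout(stuckWrite(), 50)).catch(() => undefined);
  const guarded = await freed.enqueue(async () => "handled");

  return { unguarded, guarded };
}

demo().then((outcome) => console.log(outcome));
```

In this model the unguarded queue reports "still waiting" while the guarded one reports "handled". A timeout of this kind only releases the queue; it does not cancel the underlying operation or repair a half-written transcript, so it would be a diagnostic aid rather than a fix.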
- `reserveTokensFloor`: 80000 → 24000 (default)
- `truncateAfterCompaction`: false → true
- `notifyUser`: false → true
Related