Bug Report: Gateway becomes completely unresponsive after compaction triggers
Issue Description
After a compaction event fires on the main session, the Gateway stops responding to all webchat messages. Messages are sent but never receive a reply, and a Gateway restart is required to recover.
Environment:
- Platform: Ubuntu (VM on Tencent Cloud)
- OpenClaw version: v2026.5.2
- Node version: v24.14.1
- Memory: 3.7 GB total
- Channel: webchat (direct conversation, not Feishu)
Compaction config:
"compaction": {
  "truncateAfterCompaction": true,
  "maxActiveTranscriptBytes": "10mb"
}
Steps to Reproduce
- Main session runs for an extended period with significant conversation history
- Compaction triggers (context overflow or byte threshold reached)
- User sends a message via webchat after the system prompt about compaction appears
- Gateway becomes completely unresponsive — no reply, no error, no feedback
- Only a Gateway restart (systemctl --user restart openclaw-gateway) restores functionality
Actual Behavior
After compaction triggers:
- Compaction itself completes successfully, in about 30 seconds (the log shows [compaction] rotated active transcript after compaction)
- Subsequent messages get no response
- The session shows state=processing queueDepth=1 reason=queued_behind_active_work for extended periods
- The log shows agent cleanup timed out events
- Liveness warnings report high eventLoopDelayMaxMs values (up to 3454 ms)
- WebSocket requests (sessions.list, chat.history) keep working for other sessions, but the main session stays stuck
Expected Behavior
Messages sent during/after compaction should either:
- Be processed after compaction completes, or
- Return an error message indicating the session is busy with compaction
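The two expected outcomes can be illustrated with a minimal per-session serial lane. This is a hypothetical sketch, assuming the lane runs one job at a time in arrival order; SessionLane, its phase field, rejectWhileCompacting, and the SESSION_BUSY code are made-up names, not OpenClaw APIs (in particular, this is not enqueueCommandInLane). With default settings a queued message runs as soon as the compaction job settles (option 1); with rejectWhileCompacting it gets an immediate busy error instead (option 2). If the compaction job never settles, the next message waits forever with queueDepth stuck at 1, which resembles the symptom above.

```javascript
"use strict";

// Hypothetical per-session lane: jobs run strictly one at a time, in FIFO order.
// Illustration only; this is not OpenClaw's enqueueCommandInLane.
class SessionLane {
  constructor({ rejectWhileCompacting = false } = {}) {
    this.rejectWhileCompacting = rejectWhileCompacting;
    this.phase = "idle"; // "idle" or "preflight_compacting"
    this.pending = 0; // jobs enqueued but not yet started
    this.tail = Promise.resolve(); // settles when the last queued job settles
  }

  // Jobs not yet started (compare queueDepth in the gateway logs).
  get queueDepth() {
    return this.pending;
  }

  enqueue(job) {
    // Expected behavior, option 2: fail fast instead of queuing behind compaction.
    if (this.rejectWhileCompacting && this.phase === "preflight_compacting") {
      const err = new Error("session is busy with compaction, retry later");
      err.code = "SESSION_BUSY";
      return Promise.reject(err);
    }
    this.pending += 1;
    const result = this.tail.then(() => {
      this.pending -= 1;
      return job();
    });
    // Keep the chain going if this job rejects. A job that never settles,
    // however, blocks every job queued after it.
    this.tail = result.catch(() => {});
    return result;
  }

  // Run a compaction as an ordinary lane job, flagging the phase while it runs.
  compact(compactFn) {
    return this.enqueue(async () => {
      this.phase = "preflight_compacting";
      try {
        return await compactFn();
      } finally {
        this.phase = "idle";
      }
    });
  }
}
```

In this sketch option 1 is simply the default behavior of a serial lane. The reported stuck state would correspond to the compaction job, or a later step in the same turn, never settling, so the queue never drains.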
Log Evidence
# Stuck session during compaction window
12:36:04 long-running session: sessionId=main sessionKey=agent:main:main state=processing age=125s queueDepth=1 reason=queued_behind_active_work classification=long_running
12:36:34 long-running session: sessionId=main sessionKey=agent:main:main state=processing age=155s queueDepth=1 reason=queued_behind_active_work
12:37:21 [compaction] rotated active transcript after compaction (sessionKey=agent:main:main)
12:39:37 long-running session: sessionId=main sessionKey=agent:main:main state=processing age=135s queueDepth=0 reason=active_work classification=long_running
# Cleanup timeouts during the stuck period
12:30:36 agent cleanup timed out: runId=... sessionId=... step=pi-trajectory-flush timeoutMs=
12:33:05 agent cleanup timed out: runId=... sessionId=... step=pi-trajectory-flush timeoutMs=
# Event loop delays
12:21:55 liveness warning: reasons=event_loop_delay interval=31s eventLoopDelayP99Ms=897.1 eventLoopDelayMaxMs=1784.7
12:22:55 liveness warning: reasons=event_loop_delay,cpu interval=30s eventLoopDelayP99Ms=1891.6 eventLoopDelayMaxMs=3454 eventLoopUtilization=0.852
12:24:56 liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=30s eventLoopDelayP99Ms=3259 eventLoopDelayMaxMs=3259 eventLoopUtilization=0.999
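An eventLoopUtilization of 0.999 means the event loop was busy for about 99.9% of the 30 s sampling interval. To reproduce comparable numbers outside the Gateway, the sketch below uses Node's built-in perf_hooks.monitorEventLoopDelay and performance.eventLoopUtilization. It is an assumption that the liveness warnings are computed the same way; sampleEventLoop and blockFor are hypothetical helpers.

```javascript
"use strict";
const { monitorEventLoopDelay, performance } = require("node:perf_hooks");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: measure event-loop delay and utilization while `work` runs,
// in the same units as the liveness warnings (milliseconds, and a 0..1 ratio).
async function sampleEventLoop(work, { resolutionMs = 10 } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  // Let the sampler tick at least once so it has a baseline timestamp.
  await sleep(resolutionMs * 2);
  const eluStart = performance.eventLoopUtilization();
  try {
    await work();
    // Yield so the sampler can record the delay caused by a final blocking stretch.
    await sleep(resolutionMs * 2);
  } finally {
    histogram.disable();
  }
  const elu = performance.eventLoopUtilization(eluStart);
  return {
    // Histogram values are reported in nanoseconds.
    eventLoopDelayP99Ms: histogram.percentile(99) / 1e6,
    eventLoopDelayMaxMs: histogram.max / 1e6,
    eventLoopUtilization: elu.utilization,
  };
}

// Simulates a long synchronous stretch on the main thread (e.g. CPU-heavy work).
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else on the event loop can run meanwhile
  }
}
```

Wrapping a suspect code path in sampleEventLoop could help show whether that step alone produces multi-second delays like the 3454 ms maximum logged above.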
Preliminary Root Cause Analysis
The issue appears to be in how messages are handled during/after preflightCompaction:
- preflightCompaction executes synchronously inside the session lane (via enqueueCommandInLane in agent-runner.runtime-DCKwkWFL.js, line ~1742)
- setPhase("preflight_compacting") is set, but the phase has no timeout protection
- During compaction, new webchat messages arrive and are queued (queueDepth=1)
- After compaction completes, the queued messages do not appear to be dequeued and processed
- The session remains in state=processing indefinitely
The replyOperation.abortSignal passed to compaction does not have a timeout that would interrupt a slow compaction.
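The missing timeout could look roughly like the sketch below. It assumes the compaction step accepts an AbortSignal, as the report describes for replyOperation.abortSignal, and combines that signal with a deadline using Node's built-in AbortSignal.timeout() and AbortSignal.any() (both available in Node v24). compactWithDeadline, COMPACTION_TIMEOUT_MS, and the compact(signal) callback shape are hypothetical, not OpenClaw code.

```javascript
"use strict";

// Hypothetical default; the logs show a normal compaction finishing in about 30 s.
const COMPACTION_TIMEOUT_MS = 120_000;

// Run compact(signal), but give up once parentSignal aborts or the deadline passes.
// Rejects with the combined signal's reason (a TimeoutError DOMException on timeout).
async function compactWithDeadline(compact, parentSignal, timeoutMs = COMPACTION_TIMEOUT_MS) {
  const signals = [AbortSignal.timeout(timeoutMs)];
  if (parentSignal) signals.push(parentSignal);
  const signal = AbortSignal.any(signals);
  signal.throwIfAborted();

  let onAbort;
  const aborted = new Promise((_, reject) => {
    onAbort = () => reject(signal.reason);
    signal.addEventListener("abort", onAbort, { once: true });
  });
  try {
    // Race, so a compaction that ignores the signal still cannot hold the lane forever.
    return await Promise.race([compact(signal), aborted]);
  } finally {
    signal.removeEventListener("abort", onAbort);
  }
}
```

Racing against the abort promise releases the lane even if the compaction code ignores the signal. The trade-off is that the abandoned work may keep running in the background, so a real fix would still need to ensure a late transcript rotation cannot overwrite newer session state.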
Possible Related Factors
- Session file: /root/.openclaw/agents/main/sessions/b84aa148-4a29-4f2f-94e5-9b7296aabbf3.jsonl
- Context overflow events: "Context overflow: estimated context size exceeds safe threshold during tool loop" (compaction attempts: 0, meaning preflight compaction did not run before the overflow)
- Compaction succeeded but did not prevent the stuck state
- Recovery required a full Gateway restart, not just a session reset
Tags
bug compaction session-lane webchat v2026.5.2