Bug Description
Gateway WebSocket calls (sessions.list, sessions.send, sessions_history, etc.) time out after 10 s while the event loop is blocked by compaction.
Environment
- Version: OpenClaw 2026.5.2 (8b2a6e5)
- Node: v22.22.2
- Host: Linux 6.17, 10GB RAM, 4-core
- Gateway: ws://127.0.0.1:18902, bind=lan, mode=local
- Agents configured: 9 agents (main, pm1, director1, frontend1, backend1, qa1, clerk, devops1, reviewer)
Symptoms
- sessions.list / sessions_send timeout: tool calls to sessions_list, sessions_send, and sessions_history all fail with "gateway timeout after 10000ms".
- Event loop severely blocked: the log shows repeated liveness warnings with eventLoopDelayP99Ms up to 15000ms+ and eventLoopUtilization=1.
- sessions.list response time: normally ~800ms, spiking to 72,826ms or more during a compaction stall, at which point callers time out.
- Root cause: compaction of a large transcript (the main session, which has an extensive history) blocks the event loop for 10-15 seconds, so all WebSocket requests queue up and time out.
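For anyone trying to reproduce the delay figures outside the gateway, here is a minimal sketch of how a synchronous block shows up in Node's built-in event loop delay histogram. It is not OpenClaw code, and whether the gateway's liveness check uses this API is an assumption. `blockFor`, `measureStall`, and the 200 ms stall are hypothetical stand-ins for a long compaction step.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Busy-wait that stands in for a long synchronous compaction step (hypothetical).
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: no timers, I/O callbacks, or WebSocket handlers can run here
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function measureStall(
  blockMs: number,
): Promise<{ p99Ms: number; maxMs: number }> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  await sleep(50); // let the sampler take a few baseline readings
  blockFor(blockMs); // the event loop is frozen for this long
  await sleep(50); // let the sampler observe how late its timer fired
  histogram.disable();
  // The histogram reports values in nanoseconds.
  return {
    p99Ms: histogram.percentile(99) / 1e6,
    maxMs: histogram.max / 1e6,
  };
}

measureStall(200).then(({ p99Ms, maxMs }) => {
  console.log(`p99=${p99Ms.toFixed(1)}ms max=${maxMs.toFixed(1)}ms`);
});
```

The reported maximum should be at least roughly the length of the stall, which is consistent with a 10-15 second compaction pass producing delay values in the 15000ms range.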
Log Evidence
[2026-05-03T11:06:40.759+08:00] liveness warning: reasons=event_loop_delay,cpu eventLoopDelayP99Ms=1756.4 eventLoopDelayMaxMs=2115 eventLoopUtilization=0.858
[2026-05-03T11:06:54.565+08:00] [tools] sessions_list failed: gateway timeout after 10000ms
[2026-05-03T11:11:56.319+08:00] agent cleanup timed out: runId=... sessionId=...
[2026-05-03T11:18:43.947+08:00] sessions.list 101231ms
[2026-05-03T11:18:49.607+08:00] sessions.usage 89006ms
Current Compaction Config
agents.defaults.compaction.maxActiveTranscriptBytes: "15mb"
agents.defaults.compaction.truncateAfterCompaction: true
Impact
- Inter-agent communication (main → pm1, main → director1) is completely broken during compaction
- Heartbeat / cron tasks fail when they depend on sessions_list
- Gateway probe/status commands fail (openclaw status hangs)
Questions / Requests
- Can compaction be made non-blocking (run in background thread/worker)?
- Is there a way to limit compaction CPU impact so it does not freeze the gateway?
- Should sessions.list / sessions.usage have their own timeout/queue management, separate from compaction stalls?
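To illustrate the second question, here is a minimal sketch of one way to keep a long pass from freezing the loop: process the transcript in chunks and yield with `setImmediate` between them, so queued WebSocket requests can be served in the gaps. It assumes nothing about how OpenClaw compaction actually works. The `Entry` shape, `compactCooperatively`, and the "summary" logic are hypothetical placeholders.

```typescript
// Hypothetical transcript entry; the real transcript format is not shown in this report.
interface Entry {
  role: string;
  text: string;
}

// Resolve on the next check phase so pending I/O callbacks get a turn first.
const yieldToEventLoop = () =>
  new Promise<void>((resolve) => setImmediate(resolve));

// Placeholder compaction: keep the last `keepLast` entries verbatim and
// collapse everything older into a single summary entry.
async function compactCooperatively(
  entries: Entry[],
  keepLast: number,
  chunkSize = 1000,
): Promise<Entry[]> {
  const cutoff = Math.max(0, entries.length - keepLast);
  if (cutoff === 0) return entries.slice();

  let summarizedChars = 0;
  for (let start = 0; start < cutoff; start += chunkSize) {
    const end = Math.min(start + chunkSize, cutoff);
    for (const entry of entries.slice(start, end)) {
      summarizedChars += entry.text.length; // stand-in for the real per-entry work
    }
    await yieldToEventLoop(); // bound the longest uninterrupted block to one chunk
  }

  const summary: Entry = {
    role: "system",
    text: `[compacted ${cutoff} entries, ${summarizedChars} chars]`,
  };
  return [summary, ...entries.slice(cutoff)];
}

const demo: Entry[] = Array.from({ length: 5000 }, () => ({
  role: "user",
  text: "x",
}));
compactCooperatively(demo, 10).then((result) => {
  console.log(result.length, result[0].text);
});
```

Yielding only shortens the longest continuous block. The total CPU cost stays on the main thread, so moving the work into a `node:worker_threads` worker, as the first question suggests, would isolate the gateway more completely.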