Bug Description
The session gets stuck after compaction triggers, and the gateway becomes unresponsive. All channels (Control UI, Feishu, new sessions) stop responding. Only a gateway restart resolves it.
Environment
- OpenClaw Version: 2026.2.9
- Platform: WSL2 (Ubuntu)
- Gateway Mode: Local (loopback)
- Model: MiniMax-M2.1
- Compaction Mode: safeguard
Reproduction Steps
- Session runs normally for extended period (multiple hours/days)
- Compaction triggers automatically (when context approaches limit)
- Gateway becomes completely unresponsive:
  - Control UI: Shows connection error / "(no output)"
  - Feishu: Messages fail to send
  - New sessions: Cannot establish connection
- Only workaround: Manual gateway restart (openclaw gateway restart)
Technical Details
Timeline from Gateway Logs
# Compaction started
{"subsystem":"agent/embedded","1":"embedded run compaction start: runId=33d6ae56-5065-45c1-8eae-13b8d669bf8e"}
[Timestamp: 2026-02-10T10:56:53.317Z]
# Compaction retry triggered
{"subsystem":"agent/embedded","1":"embedded run compaction retry: runId=33d6ae56-5065-45c1-8eae-13b8d669bf8e"}
[Timestamp: 2026-02-10T10:57:30.652Z]
# TIMEOUT after 600 seconds (10 minutes)
{"subsystem":"agent/embedded","1":"embedded run timeout: runId=33d6ae56-5065-45c1-8eae-13b8d669bf8e timeoutMs=600000"}
[Timestamp: 2026-02-10T11:04:39.975Z]
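The gaps between the three log entries above can be checked with a short script. This is only a sketch over the timestamps quoted in this report; the helper and variable names are made up for illustration. One thing it shows: the timeout entry lands about 467 seconds after the compaction start, not 600, so the 600000 ms budget presumably started counting before compaction began (for example at the start of the embedded run). The logs quoted here do not confirm that.

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    # Timestamps in the report are ISO 8601 with a trailing "Z" (UTC).
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Timestamps copied from the gateway log excerpts above.
events = {
    "compaction_start": parse_ts("2026-02-10T10:56:53.317Z"),
    "compaction_retry": parse_ts("2026-02-10T10:57:30.652Z"),
    "run_timeout": parse_ts("2026-02-10T11:04:39.975Z"),
}

def elapsed_s(first: str, second: str) -> float:
    # Seconds between two named events.
    return (events[second] - events[first]).total_seconds()

start_to_retry = elapsed_s("compaction_start", "compaction_retry")
start_to_timeout = elapsed_s("compaction_start", "run_timeout")
retry_to_timeout = elapsed_s("compaction_retry", "run_timeout")

print(f"start -> retry:   {start_to_retry:.3f} s")
print(f"start -> timeout: {start_to_timeout:.3f} s")
print(f"retry -> timeout: {retry_to_timeout:.3f} s")
```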
Additional Symptoms
After the timeout, stale cron job running markers were found:
{"module":"cron","1":{"jobId":"9c06c09b-e9b4-40df-8626-22c43ec0cd37","runningAtMs":1770726600004},"2":"cron: clearing stale running marker on startup"}
{"module":"cron","1":{"jobId":"7d2b3ecd-5e78-4fc3-aeb2-f4d559a033f0","runningAtMs":1770726600004},"2":"cron: clearing stale running marker on startup"}
These markers are cleared on the subsequent gateway restart, which indicates that the previous shutdown was unclean.
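To place the markers on the same timeline as the gateway logs, the runningAtMs value can be converted to a UTC timestamp. The sketch below assumes runningAtMs is a Unix epoch value in milliseconds, which the field name suggests but the logs do not state; ms_to_utc is a hypothetical helper. Both markers carry the same value, which decodes to 12:30:00.004 UTC on 2026-02-10, i.e. after the 11:04:39 timeout entry.

```python
from datetime import datetime, timezone

# runningAtMs as it appears in both cron log entries above.
running_at_ms = 1770726600004

def ms_to_utc(ms: int) -> datetime:
    # Assumption: the value is Unix epoch milliseconds.
    seconds, millis = divmod(ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)

running_at = ms_to_utc(running_at_ms)
print(running_at.isoformat(timespec="milliseconds"))
# prints 2026-02-10T12:30:00.004+00:00
```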
Impact
- Severity: High - Complete service disruption
- User Experience: Gateway completely frozen, requires manual intervention
- Recovery: Manual gateway restart is the only known workaround
- Frequency: Reproduced multiple times in the same session
Suggested Investigation Areas
- Compaction timeout handling: The 600-second timeout appears to hang rather than gracefully fail
- State cleanup: Stale running markers not being cleared during/after timeout
- Message queue: Incoming messages not being processed during compaction
- Channel reconnection: WebSocket connections not recovering after compaction failure
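For the first two areas, the behavior one would hope for is a timeout that cancels the stuck work and always clears its running state. The sketch below illustrates that pattern in Python with asyncio; it is not OpenClaw code, and running_markers, slow_compaction, and run_with_timeout are hypothetical names invented for the example.

```python
import asyncio

# Hypothetical stand-in for whatever "running" state the gateway tracks.
running_markers = set()

async def slow_compaction() -> str:
    # Simulates a compaction call that does not finish within the budget.
    await asyncio.sleep(10)
    return "compacted"

async def run_with_timeout(run_id: str, timeout_s: float) -> str:
    running_markers.add(run_id)
    try:
        # wait_for cancels the inner task once the timeout elapses.
        return await asyncio.wait_for(slow_compaction(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fail the run cleanly instead of leaving the caller blocked.
        return "timed_out"
    finally:
        # The marker is cleared on success, timeout, or error alike.
        running_markers.discard(run_id)

result = asyncio.run(run_with_timeout("example-run", timeout_s=0.05))
print(result, running_markers)
```

The point of the finally block is that cleanup does not depend on which path the run takes, so a timed-out run cannot leave a stale marker behind.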
Workaround
Manual gateway restart: openclaw gateway restart
Logs
Full gateway logs available at: /tmp/openclaw/openclaw-2026-02-10.log
Note: Logs are overwritten on gateway restart, so capture immediately after reproduction.
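Since the only workaround is a restart and the restart overwrites the log, it helps to copy the file somewhere safe first. A minimal sketch using only the Python standard library; preserve_log and the backup directory are hypothetical, and only the log path comes from this report.

```python
import shutil
import time
from pathlib import Path

def preserve_log(log_path: str, dest_dir: str) -> Path:
    # Copy the log to dest_dir under a timestamped name so that a later
    # gateway restart cannot overwrite the captured evidence.
    src = Path(log_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = dest / f"{src.stem}.{stamp}{src.suffix}"
    shutil.copy2(src, target)  # copy2 also preserves file modification times
    return target
```

For example, calling preserve_log("/tmp/openclaw/openclaw-2026-02-10.log", "/path/to/backups") before running openclaw gateway restart keeps a copy of the log from the hung session.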
Additional Context
This appears to be related to issue #11140 (HEARTBEAT_OK accumulates) as compaction is involved in session context management.