Compaction causes gateway to hang, requiring manual restart #13379

## Bug Description

**After compaction triggers, the session gets stuck and the gateway becomes unresponsive. All channels (Control UI, Feishu, new sessions) stop responding. Only a gateway restart resolves it.**

## Environment

- **OpenClaw Version:** 2026.2.9
- **Platform:** WSL2 (Ubuntu)
- **Gateway Mode:** Local (loopback)
- **Model:** MiniMax-M2.1
- **Compaction Mode:** safeguard

## Reproduction Steps

1. A session runs normally for an extended period (multiple hours or days)
2. Compaction triggers automatically (when the context approaches its limit)
3. Gateway becomes completely unresponsive:
   - Control UI: Shows connection error / "(no output)"
   - Feishu: Messages fail to send
   - New sessions: Cannot establish connection
4. **Only workaround:** Manual gateway restart (`openclaw gateway restart`)

## Technical Details

### Timeline from Gateway Logs

```
# Compaction started
{"subsystem":"agent/embedded","1":"embedded run compaction start: runId=33d6ae56-5065-45c1-8eae-13b8d669bf8e"}
[Timestamp: 2026-02-10T10:56:53.317Z]

# Compaction retry triggered
{"subsystem":"agent/embedded","1":"embedded run compaction retry: runId=33d6ae56-5065-45c1-8eae-13b8d669bf8e"}
[Timestamp: 2026-02-10T10:57:30.652Z]

# TIMEOUT after 600 seconds (10 minutes)
{"subsystem":"agent/embedded","1":"embedded run timeout: runId=33d6ae56-5065-45c1-8eae-13b8d669bf8e timeoutMs=600000"}
[Timestamp: 2026-02-10T11:04:39.975Z]
```
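As a cross-check on the timeline, the intervals between the three logged timestamps can be computed directly. The sketch below uses only the timestamps quoted above. Reading `timeoutMs` as measured from the start of the run (rather than from compaction start) is an assumption; the logs shown here do not confirm it.

```python
from datetime import datetime, timedelta

def parse(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

compaction_start = parse("2026-02-10T10:56:53.317Z")
compaction_retry = parse("2026-02-10T10:57:30.652Z")
run_timeout = parse("2026-02-10T11:04:39.975Z")

# Seconds from compaction start to the retry and to the timeout.
start_to_retry_s = (compaction_retry - compaction_start).total_seconds()
start_to_timeout_s = (run_timeout - compaction_start).total_seconds()

# ASSUMPTION: the 600000 ms timer started when the run started.
implied_run_start = run_timeout - timedelta(milliseconds=600_000)

print(f"start -> retry:   {start_to_retry_s:.3f} s")
print(f"start -> timeout: {start_to_timeout_s:.3f} s")
print(f"implied run start: {implied_run_start.isoformat()}")
```

The timeout is logged about 467 seconds after compaction start, not 600. If the timer is per run, the run had already been active for roughly 133 seconds when compaction began.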

### Additional Symptoms

After the timeout, stale cron job running markers were found in the log:
```
{"module":"cron","1":{"jobId":"9c06c09b-e9b4-40df-8626-22c43ec0cd37","runningAtMs":1770726600004},"2":"cron: clearing stale running marker on startup"}
{"module":"cron","1":{"jobId":"7d2b3ecd-5e78-4fc3-aeb2-f4d559a033f0","runningAtMs":1770726600004},"2":"cron: clearing stale running marker on startup"}
```

These markers were cleared on the subsequent gateway restart, which indicates that the previous shutdown was unclean.
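To place the stale markers on the same timeline as the compaction logs, the `runningAtMs` value can be converted to UTC. This is a minimal sketch that assumes `runningAtMs` is a Unix epoch timestamp in milliseconds, which the field name suggests but the logs do not state.

```python
from datetime import datetime, timedelta, timezone

running_at_ms = 1770726600004  # identical in both cron log entries above

# Split into whole seconds and milliseconds to avoid float rounding.
running_at = (
    datetime.fromtimestamp(running_at_ms // 1000, tz=timezone.utc)
    + timedelta(milliseconds=running_at_ms % 1000)
)

# Timestamp of the "embedded run timeout" log entry.
run_timeout = datetime(2026, 2, 10, 11, 4, 39, 975000, tzinfo=timezone.utc)
gap_s = (running_at - run_timeout).total_seconds()

print(running_at.isoformat())
print(f"{gap_s:.3f} s after the run timeout")
```

Under that assumption, both jobs were marked as running at 12:30:00 UTC, about 85 minutes after the run timeout was logged. This suggests, but does not prove, that the cron scheduler was still starting jobs after the timeout and that those jobs never completed.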

## Impact

- **Severity:** High - Complete service disruption
- **User Experience:** Gateway completely frozen, requires manual intervention
- **Recovery:** Manual gateway restart is the only known workaround
- **Frequency:** Reproduced multiple times in the same session

## Suggested Investigation Areas

1. **Compaction timeout handling:** After the 600-second run timeout fires, the gateway appears to stay hung rather than failing gracefully
2. **State cleanup:** Stale running markers not being cleared during/after timeout
3. **Message queue:** Incoming messages not being processed during compaction
4. **Channel reconnection:** WebSocket connections not recovering after compaction failure
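The first two areas describe a general pattern: a timed-out operation should fail cleanly and release any "running" state it holds. The sketch below illustrates that pattern with `asyncio`. It is not OpenClaw's code; `running_markers`, `run_with_timeout`, and `stuck_compaction` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical in-memory stand-in for persisted "running" markers.
running_markers: set[str] = set()

async def run_with_timeout(
    job_id: str,
    work: Callable[[], Awaitable[str]],
    timeout_s: float,
) -> Optional[str]:
    running_markers.add(job_id)
    try:
        # wait_for cancels the wrapped task when the timeout expires.
        return await asyncio.wait_for(work(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fail gracefully: report no result instead of blocking callers.
        return None
    finally:
        # Always clear the marker, on success, timeout, or error.
        running_markers.discard(job_id)

async def stuck_compaction() -> str:
    # Simulates an operation that never finishes in time.
    await asyncio.sleep(3600)
    return "done"

result = asyncio.run(run_with_timeout("compaction", stuck_compaction, 0.05))
print(result, running_markers)
```

With cleanup in a `finally` block, the marker is released even when the operation times out, so no stale state is left for the next startup to clear.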

## Workaround

Manual gateway restart:
```bash
openclaw gateway restart
```

## Logs

Full gateway logs available at: `/tmp/openclaw/openclaw-2026-02-10.log`

**Note:** Logs are overwritten on gateway restart, so capture immediately after reproduction.
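Because the log is overwritten on restart, it helps to copy it elsewhere before running the workaround. The helper below is a small sketch of that step; `preserve_log` and the destination directory are hypothetical, and only the source path comes from this report.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

def preserve_log(log_path: str, dest_dir: str) -> Path:
    """Copy the log to dest_dir under a timestamped name and return the new path."""
    src = Path(log_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = dest / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, target)  # copy2 also preserves file timestamps
    return target
```

For example, calling `preserve_log("/tmp/openclaw/openclaw-2026-02-10.log", "saved-logs")` before `openclaw gateway restart` keeps a copy of the log from the hung state.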

## Additional Context

This may be related to issue #11140 (HEARTBEAT_OK accumulates), since compaction is part of session context management.
