Bug: Compaction causes Pi runtime deadlock — agent freezes across all channels after summary generation

## Summary

Compaction deadlocks the Pi runtime after summary generation, freezing ALL channels of the affected agent. The gateway remains healthy (no crash, no error log entries), but the agent stops responding on every channel until the session is rebuilt with `/new`.

OpenClaw 2026.5.18 (50a2481) · macOS 15.6 · Node 23.11.0 · Apple Silicon

## Reproduction Pattern (3 days, 4 occurrences)

1. Agent session accumulates tokens to ~182k/262k (~70%) — compaction threshold determined by `reserveTokensFloor: 80000`
2. Auto-compaction triggers (or manual `/compact`)
3. Compaction summary is generated successfully (verified in transcript)
4. `.reset` backup created
5. **Post-compaction: no new messages written to transcript** — session goes silent
6. All channels of the same agent freeze (confirmed: Feishu + WeCom both unresponsive)
7. Other agents on same gateway unaffected
8. `/compact` returns "skipped: session was already compacted recently"
9. Gateway restart does NOT recover — agent still unresponsive
10. Only `/new` (fresh session) restores function
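
Steps 3 to 5 suggest a simple external check: the transcript's last record is a `compaction` entry and the file has not grown since. The sketch below is one way to script that check, assuming the live transcript is a JSONL file with one JSON object per line. `last_entry`, `looks_stalled`, and the 300-second idle threshold are hypothetical names and values chosen for illustration, not part of OpenClaw.

```python
import json
import os
import time


def last_entry(transcript_path):
    """Return the last non-empty JSONL record in the transcript, or None."""
    last = None
    with open(transcript_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                last = line
    return json.loads(last) if last else None


def looks_stalled(transcript_path, idle_seconds=300, now=None):
    """True when the transcript ends on a compaction entry and has been idle.

    idle_seconds is an arbitrary illustrative threshold, not an OpenClaw setting.
    """
    entry = last_entry(transcript_path)
    if not entry or entry.get("type") != "compaction":
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(transcript_path) >= idle_seconds
```

A positive result only means the session matches the pattern in this report; an agent that is simply idle after compaction would look the same, so the check is most useful right after sending a message that gets no reply.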

## Timeline (Latest Incident)

All times UTC+8 (Beijing):

| Time | Event |
|------|-------|
| 10:54 | Gateway auto-restarted by launchd (kickstart) |
| 10:58 | User message processed normally |
| 11:00 | Auto-compaction triggered (182,767 tokens) |
| 11:00 | Compaction summary generated (comprehensive, well-structured) |
| 11:00 | `.reset` backup created (1.5MB, 570 lines) |
| 11:00+ | **No new messages in transcript** — deadlock |
| 11:05 | Session marked as reset |
| 11:08 | User rebuilt session → new session works |

## Evidence

### Compaction entry in transcript (last entry before deadlock)

```json
{
  "type": "compaction",
  "timestamp": "2026-05-21T03:00:56.816Z",
  "summary": "## Goal\n- Investigate...",
  "tokensBefore": 182767,
  "fromHook": false
}
```

The summary was well-formed, with Goal, Progress, Next Steps, read-files, and modified-files sections, so summary quality does not appear to be the problem.
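
To line the transcript entry up with the timeline (which uses UTC+8), the `timestamp` field can be converted as in this small sketch. `to_beijing` is a hypothetical helper name; only the timestamp value comes from the transcript.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # UTC+8, the zone used in the timeline


def to_beijing(iso_utc):
    """Convert an ISO 8601 UTC timestamp with a trailing 'Z' to UTC+8."""
    # Replace 'Z' explicitly: datetime.fromisoformat accepts 'Z' only on Python 3.11+.
    parsed = datetime.fromisoformat(iso_utc.replace("Z", "+00:00"))
    return parsed.astimezone(BEIJING)


local = to_beijing("2026-05-21T03:00:56.816Z")
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2026-05-21 11:00:56
```

This places the compaction entry at 11:00:56 local time, consistent with the 11:00 rows in the timeline.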

### File state after deadlock

Session directory contains:
- `xxx.jsonl.reset.<timestamp>` — backup created at reset (1.5MB)
- `xxx.checkpoint.<uuid>.jsonl` — pre-compaction checkpoint (611KB)
- `xxx.trajectory.jsonl` — full trajectory (10MB)
- `xxx.trajectory-path.json` — pointer

**Missing:** No compacted `.json` successor file was ever created.
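
For anyone triaging a similar session directory, a rough way to group files by role is sketched below. The patterns mirror the file names listed above; the `live` role (a plain `<id>.jsonl` transcript) is an assumption inferred from the `.jsonl.reset.` backup name, and `classify` and `inventory` are hypothetical helpers.

```python
import re

# Patterns taken from the file names listed in this report. "live" is an
# assumption (a plain `<id>.jsonl` transcript), inferred from the
# `.jsonl.reset.` backup name. Order matters: specific patterns come first.
ROLE_PATTERNS = [
    ("reset_backup", re.compile(r"\.jsonl\.reset\..+$")),
    ("checkpoint", re.compile(r"\.checkpoint\..+\.jsonl$")),
    ("trajectory", re.compile(r"\.trajectory\.jsonl$")),
    ("trajectory_pointer", re.compile(r"\.trajectory-path\.json$")),
    ("live", re.compile(r"\.jsonl$")),
]


def classify(name):
    """Return the role of a session file based on its name."""
    for role, pattern in ROLE_PATTERNS:
        if pattern.search(name):
            return role
    return "unknown"


def inventory(names):
    """Group file names by role so that an absent role is easy to spot."""
    roles = {}
    for name in names:
        roles.setdefault(classify(name), []).append(name)
    return roles
```

Calling `inventory(os.listdir(session_dir))` on a session directory (after `import os`) returns a dict keyed by role; any role without an entry is simply absent from the result.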

### Multi-channel confirmation

When Feishu froze, the WeCom channel of the same agent also stopped responding within minutes. A different agent on the same gateway continued working normally, which indicates the deadlock is agent-scoped, not gateway-scoped.

### Gateway health

- `gateway.err.log`: zero errors for the incident day
- `gateway.log`: stopped writing on May 19 (2 days before the incident); cause unconfirmed, possibly log rotation or a logging bug
- Gateway process: healthy, no crash
- Other agents: fully functional

## Configuration Context

```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "reserveTokensFloor": 80000,
        "midTurnPrecheck": { "enabled": true }
      }
    }
  }
}
```

Key point: `reserveTokensFloor: 80000` on a 262k context window means compaction triggers at ~70% (262k - 80k = 182k tokens), much earlier than with the default reserve (24k → ~91% trigger).
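
The percentages can be checked with a short calculation. It assumes the trigger point is simply the context window minus the reserve, which matches the figures in this report but has not been confirmed against OpenClaw source; `compaction_trigger` is a hypothetical helper.

```python
def compaction_trigger(context_window, reserve_tokens):
    """Return (trigger_tokens, fraction_of_window) for a given reserve.

    Assumes trigger = context_window - reserve_tokens, which matches the
    figures in this report but is not confirmed from OpenClaw source.
    """
    trigger = context_window - reserve_tokens
    return trigger, trigger / context_window


for reserve in (80_000, 24_000):
    trigger, fraction = compaction_trigger(262_000, reserve)
    print(f"reserve={reserve}: trigger at {trigger} tokens ({fraction:.1%})")

# reserve=80000: trigger at 182000 tokens (69.5%)
# reserve=24000: trigger at 238000 tokens (90.8%)
```

The observed `tokensBefore` value of 182,767 sits just above the computed 182,000 threshold, as expected for a compaction that fires on the first check after the limit is crossed.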

- `truncateAfterCompaction` was not set (default `false`), so compaction ran in in-place rewrite mode.
- `notifyUser` was not set (default `false`).

## Hypothesis

**Compaction summary generation succeeds, but the subsequent transcript write/rotation step fails silently.** Since `truncateAfterCompaction` is `false`, OpenClaw rewrites the transcript in place. The failure appears to leave the Pi runtime in an inconsistent state: an async file operation never resolves, which blocks the agent's entire message-processing queue. This would explain:

1. Agent-level deadlock (Pi runtime blocked, not gateway)
2. No gateway errors (the runtime is waiting indefinitely rather than crashing, so nothing is logged)
3. Gateway restart doesn't help (the broken session state persists on disk)
4. `/new` fixes it (creates fresh Pi runtime + fresh transcript)

The `reserveTokensFloor: 80000` (causing frequent early compactions) and gateway restart shortly before compaction may be contributing factors — restart may leave session state slightly inconsistent when the next auto-compaction fires.
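
The hypothesized mechanism can be illustrated outside OpenClaw. The sketch below is not OpenClaw or Pi runtime code (it is Python `asyncio` standing in for a Node event loop); it only shows how one awaited operation that never settles stalls a serial per-agent queue without raising anything, while a second agent's queue on the same loop keeps working. All names are hypothetical.

```python
import asyncio


async def worker(queue, handled):
    """Process one agent's jobs strictly in order, like a per-agent queue."""
    while True:
        name, job = await queue.get()
        await job()  # a job that never settles stalls this queue here
        handled.append(name)
        queue.task_done()


async def never_settles():
    # Stands in for a file operation whose completion never arrives.
    await asyncio.get_running_loop().create_future()


async def quick():
    await asyncio.sleep(0)


async def demo():
    handled_a, handled_b = [], []
    queue_a, queue_b = asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(worker(queue_a, handled_a)),
        asyncio.create_task(worker(queue_b, handled_b)),
    ]
    # Agent A: the post-compaction write hangs, then two channel messages arrive.
    for item in [("compaction-write", never_settles), ("feishu", quick), ("wecom", quick)]:
        queue_a.put_nowait(item)
    # Agent B shares the event loop but has its own queue.
    queue_b.put_nowait(("other-agent", quick))

    await asyncio.sleep(0.1)
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return handled_a, handled_b


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([], ['other-agent'])
```

Agent A handles nothing, agent B is unaffected, and no exception surfaces, which matches points 1 and 2. It does not model point 3; the restart behavior would still depend on what is persisted on disk.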

## Workaround Applied

- `reserveTokensFloor`: 80000 → 24000 (default)
- `truncateAfterCompaction`: false → true
- `notifyUser`: false → true
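
Expressed against the config shown earlier, the workaround might look like the sketch below. Placing `truncateAfterCompaction` and `notifyUser` next to `reserveTokensFloor` under `agents.defaults.compaction` is an assumption; this report does not state where those keys live. `apply_workaround` is a hypothetical helper.

```python
import copy
import json

# The three changes listed above. Placing truncateAfterCompaction and
# notifyUser next to reserveTokensFloor is an assumption; the report does
# not say where they live in the config tree.
WORKAROUND = {
    "reserveTokensFloor": 24000,
    "truncateAfterCompaction": True,
    "notifyUser": True,
}


def apply_workaround(config):
    """Return a copy of config with the workaround merged into compaction settings."""
    patched = copy.deepcopy(config)
    compaction = (
        patched.setdefault("agents", {})
        .setdefault("defaults", {})
        .setdefault("compaction", {})
    )
    compaction.update(WORKAROUND)
    return patched


original = {
    "agents": {
        "defaults": {
            "compaction": {
                "reserveTokensFloor": 80000,
                "midTurnPrecheck": {"enabled": True},
            }
        }
    }
}
print(json.dumps(apply_workaround(original), indent=2))
```

The merge leaves unrelated settings such as `midTurnPrecheck` untouched and does not modify the original dict.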

## Related

- Model: deepseek-v4-pro (262k context)
- Previous occurrences: same pattern observed on May 19 and May 20
- Session reset files preserved for debugging if needed


Bug: Compaction causes Pi runtime deadlock — agent freezes across all channels after summary generation #84777
