Compaction circuit breaker: fallback model + force-truncate after repeated failures #45686

## Problem

When the primary model (e.g. Opus) is degraded or overloaded, context compaction can enter a death spiral:

1. Context hits the limit and compaction triggers.
2. Compaction sends the full context to the model, and the request times out after 10 minutes (`timeoutMs=600000`).
3. The failure kills Telegram polling ("Unsubscribed during compaction").
4. The next inbound message triggers another compaction attempt.
5. The cycle repeats indefinitely.

This blocked all message processing for over an hour tonight (2026-03-13, ~9:25 PM to ~10:30 PM ET). The gateway stayed healthy the entire time, but the agent session was completely unresponsive.

From the error log:
```
[agent/embedded] embedded run timeout: runId=... timeoutMs=600000
[agent/embedded] using current snapshot: timed out during compaction
[telegram] Restarting polling after unhandled network error: Unsubscribed during compaction
[telegram] polling runner stopped (unhandled network error); restarting in 23.29s
[diagnostic] lane wait exceeded: lane=session:agent:main:main waitedMs=600160 queueAhead=0
```

This sequence repeated 5+ times in a row.
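For anyone checking whether they hit the same loop, here is a rough sketch for counting the relevant lines in a log. The two marker strings are copied from the excerpt above; the helper name `count_compaction_failures` is made up for illustration, and the exact log wording may differ between versions.

```python
# Marker strings copied from the log excerpt in this issue.
COMPACTION_TIMEOUT = "timed out during compaction"
POLLING_RESTART = "Restarting polling after unhandled network error: Unsubscribed during compaction"


def count_compaction_failures(log_lines):
    """Return (compaction_timeouts, polling_restarts) found in the given lines."""
    timeouts = 0
    restarts = 0
    for line in log_lines:
        if POLLING_RESTART in line:
            restarts += 1
        elif COMPACTION_TIMEOUT in line:
            timeouts += 1
    return timeouts, restarts
```

Passing it the lines of a log file (for example `open(path).read().splitlines()`) gives a quick count of how many times the cycle ran. Several timeouts, each paired with a polling restart, would match the pattern described here.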

## Proposed solutions

1. **Fallback model for compaction**: If compaction fails twice in a row with the primary model, retry with a faster or larger-context fallback (e.g. Sonnet, which is cheaper and, depending on the configuration, may offer a bigger context window). This could be configurable, e.g. `agents.defaults.compactionFallbackModel`.

2. **Force-truncate after N failures**: If compaction fails 3 times in total (including the fallback attempt), hard-truncate the context by dropping the oldest messages rather than retrying indefinitely. This is lossy but better than total unresponsiveness.

3. **Don't block Telegram polling during compaction**: The compaction failure currently crashes the polling connection. Compaction should not take down the channel transport. Even if the agent can't respond yet, it should still be receiving messages.

4. **Expose compaction health in gateway status**: `openclaw gateway status` should show if compaction is currently running, how many times it's failed, and whether the session is effectively stuck.
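To make the escalation in proposals 1, 2 and 4 concrete, here is a minimal sketch of the breaker logic. It is not based on OpenClaw's actual code: `CompactionBreaker`, `force_truncate`, the status field names, and the thresholds are all hypothetical, and character count stands in for a real tokenizer.

```python
from dataclasses import dataclass

FALLBACK_AFTER = 2  # failures before switching to the fallback model (assumed)
TRUNCATE_AFTER = 3  # total failures before giving up on compaction (assumed)


@dataclass
class CompactionBreaker:
    primary_model: str
    fallback_model: str
    failures: int = 0
    running: bool = False

    def next_action(self):
        """Return ("compact", model) or ("truncate", None) for the next attempt."""
        if self.failures >= TRUNCATE_AFTER:
            return ("truncate", None)
        if self.failures >= FALLBACK_AFTER:
            return ("compact", self.fallback_model)
        return ("compact", self.primary_model)

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0

    def status(self):
        """Fields a status command could report (names are hypothetical)."""
        return {
            "compaction_running": self.running,
            "compaction_failures": self.failures,
            "breaker_open": self.failures >= TRUNCATE_AFTER,
        }


def force_truncate(messages, max_tokens, count_tokens=len):
    """Drop the oldest messages until the remainder fits within max_tokens.

    count_tokens defaults to len (character count) as a placeholder.
    """
    kept = list(messages)
    total = sum(count_tokens(m) for m in kept)
    while kept and total > max_tokens:
        total -= count_tokens(kept.pop(0))
    return kept
```

The caller would ask `next_action()` before each attempt, call `record_failure()` on a timeout, and fall through to `force_truncate` once the breaker opens. A real implementation would probably want to pin the system prompt and the most recent turns rather than dropping strictly from the front.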

## Environment
- OpenClaw 2026.3.12
- Model: anthropic/claude-opus-4-6
- Channel: Telegram
- OS: macOS (Darwin 25.3.0, arm64)
