Bug: Preemptive context overflow silently kills embedded sessions without notifying user

**OpenClaw version:** 2026.5.19-beta.1 (ba9034b)
**Gateway:** running (systemd)
**Affected agent(s):** agent:marcus:main, agent:jordan:main, agent:neon (multiple agents same day)
**Channel:** webchat, telegram
**First seen:** 2026-05-19 08:45 UTC
**Last seen:** 2026-05-20 12:20 UTC

---

## Description

Embedded agent sessions hit a terminal error (`Context overflow: estimated context size exceeds safe threshold during tool loop`) in the middle of a tool loop. **The session then dies silently with no user notification.** The error is logged but never delivered to the user's channel, and the session stays stuck in the `processing` state for hours until the gateway is manually restarted.

Expected: user sees `Context overflow: prompt too large for the model. Try /reset` immediately.
Actual: silence, frozen session, no message delivered, no self-healing.

---

## Steps to Reproduce

1. Have an embedded agent session engaged in a long tool-heavy task
2. Session accumulates messages in a tool loop (context at ~50%, well within 200K token window)
3. Preemptive context overflow check fires — **before** any model call
4. Session enters `embedded_run_agent_end` with error
5. Error is logged but **never delivered to the channel**
6. Session stays frozen in `processing` state
7. Stuck-session recovery fires hours later but only releases the lane and sends no notification
8. The user must manually restart the gateway

---

## Expected Behavior

1. Error message delivered to the user's channel immediately
2. Session auto-resets or clearly signals the user to reset
3. No silently dead sessions for hours without notification
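
As a rough illustration of the behavior listed above, the sketch below decides what a runner could do when a run fails: always leave the `processing` state, and always produce a notice for the user's channel. Every name in it (`planFailureHandling`, `isContextOverflow`, the `session` shape, the `"error"` state) is hypothetical and not taken from the OpenClaw codebase. Only the notice text comes from the `embedded_run_agent_end` log entry in this report.

```javascript
// Hypothetical sketch: these names are illustrative, not OpenClaw APIs.
const OVERFLOW_NOTICE =
  "Context overflow: prompt too large for the model. " +
  "Try /reset (or /new) to start a fresh session, or use a larger-context model.";

// Treat any error whose message starts with "Context overflow:" as an overflow.
function isContextOverflow(err) {
  return err instanceof Error && err.message.startsWith("Context overflow:");
}

// Pure decision function: given a failed run, say which state the session
// should move to and which message should be delivered to which channel.
function planFailureHandling(session, err) {
  const overflow = isContextOverflow(err);
  return {
    nextState: "error", // never leave the session in "processing"
    channel: session.channel,
    notice: overflow ? OVERFLOW_NOTICE : `Run failed: ${err.message}`,
    suggestReset: overflow,
  };
}
```

Keeping the decision separate from the actual delivery call makes the "never stay in `processing`" rule easy to unit-test without a live channel.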

---

## Actual Behavior

1. Session ends with `embedded_run_agent_end` + error
2. `compactionAttempts=0` — preemptive check short-circuits before in-attempt compaction
3. Error **never reaches the channel**
4. Session stays frozen in `processing` state
5. Stuck-session recovery fires hours later but only releases the lane
6. No self-healing

---

## Environment

| Field | Value |
|-------|-------|
| OS | Linux 7.0.0-15-generic (x64) |
| Node.js | v24.15.0 |
| OpenClaw | 2026.5.19-beta.1 (ba9034b) |
| Install location | /usr/local/bin/openclaw |
| Gateway bind | loopback (127.0.0.1:18789) |
| Provider | minimax |
| Model | minimax/MiniMax-M2.7 |
| Fallbacks | yes (MiniMax-M2.5) |
| Compaction mode | safeguard |

---

## Relevant Logs

```
// PREEMPTIVE OVERFLOW — fires before model call, no token count measured
{"subsystem":"agent/embedded","1":"[context-overflow-diag] sessionKey=agent:marcus:main provider=minimax/MiniMax-M2.7 compactionAttempts=0 observedTokens=unknown error=Context overflow: estimated context size exceeds safe threshold during tool loop."}

// SESSION ENDS
{"subsystem":"agent/embedded","1":{"event":"embedded_run_agent_end","isError":true,"error":"Context overflow: prompt too large for the model. Try /reset (or /new) to start a fresh session, or use a larger-context model."}}

// COMPACTION RUNS BUT SESSION NEVER RESUMES
{"subsystem":"agent/embedded","1":"[compaction] rotated active transcript after compaction (sessionKey=agent:marcus:main)"}

// HOURS LATER — stuck session detected but recovery is insufficient
{"subsystem":"diagnostic","1":"stuck session: ...lastProgressAge=29351s terminalProgressStale=true recovery=checking"}
{"subsystem":"diagnostic","1":"stuck session recovery outcome: status=released action=release_lane ... released=0"}
```
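
For triage, the `[context-overflow-diag]` entries above can be pulled apart with a few lines of JavaScript. This is a minimal sketch that assumes the log format is exactly as shown in this report (a JSON object whose `"1"` field holds a space-separated `key=value` string ending in `error=...`). `parseOverflowDiag` is a hypothetical helper, not part of OpenClaw.

```javascript
// Parse one gateway log line and extract the key=value fields of a
// [context-overflow-diag] entry. Returns null for any other line.
function parseOverflowDiag(line) {
  let entry;
  try {
    entry = JSON.parse(line);
  } catch {
    return null; // not JSON, e.g. a "// ..." annotation line
  }
  const msg = entry["1"];
  if (typeof msg !== "string" || !msg.startsWith("[context-overflow-diag]")) {
    return null;
  }
  const fields = {};
  // error= comes last and contains spaces, so split it off first.
  const marker = " error=";
  const errorAt = msg.indexOf(marker);
  const head = errorAt === -1 ? msg : msg.slice(0, errorAt);
  if (errorAt !== -1) {
    fields.error = msg.slice(errorAt + marker.length);
  }
  for (const match of head.matchAll(/(\w+)=(\S+)/g)) {
    fields[match[1]] = match[2]; // all values are kept as strings
  }
  return fields;
}
```

Applied to the first log line above, this yields `sessionKey`, `provider`, `compactionAttempts`, `observedTokens`, and `error` as string fields.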

---

## Root Cause Analysis

The issue is in the embedded Pi runner's **preemptive** context overflow handling (`selection-BpjGe-Y0.js`):

1. `PREEMPTIVE_OVERFLOW_RATIO = 0.9` is **hardcoded** — no config path, not tunable
2. For MiniMax-M2.7 (200K token context): `maxContextChars = 200,000 × 4 × 0.9 = 720,000 chars`
3. The preemptive check fires during a tool loop **before** sending to the model, even when actual token count is well within limits
4. `compactionAttempts=0` — preemptive check short-circuits before the in-attempt compaction path is reached
5. Error is not delivered to the channel
6. Session never resumes
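
The arithmetic in step 2 can be checked directly. The sketch below reproduces the threshold formula quoted in this report and then asks how many characters per token a session would need to average to trip it. The helper names are hypothetical, and the 100,000-token figure is only the report's "~50% of 200K" estimate, not a measured value.

```javascript
// Constants as quoted in this report; the function names are illustrative.
const CHARS_PER_TOKEN = 4;
const PREEMPTIVE_OVERFLOW_RATIO = 0.9;

// Character budget the preemptive check compares against.
function maxContextChars(contextWindowTokens) {
  return Math.floor(contextWindowTokens * CHARS_PER_TOKEN * PREEMPTIVE_OVERFLOW_RATIO);
}

// Average chars-per-token a transcript of `actualTokens` would need
// before its character count exceeds the budget.
function charsPerTokenToTrip(contextWindowTokens, actualTokens) {
  return maxContextChars(contextWindowTokens) / actualTokens;
}

const limit = maxContextChars(200000); // 720000
const ratioAtHalf = charsPerTokenToTrip(200000, 100000); // about 7.2
```

Under these assumptions, a session holding roughly 100,000 tokens would need to average more than about 7.2 characters per token to exceed the 720,000-character budget. That is well above the 4 characters per token the formula itself assumes, which is consistent with the character estimate and the real token usage diverging.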

**Code locations:**
- `selection-BpjGe-Y0.js:9325` — `PREEMPTIVE_OVERFLOW_RATIO = .9` hardcoded
- `selection-BpjGe-Y0.js:9495` — `maxContextChars = Math.floor(contextWindowTokens * 4 * 0.9)`
- `selection-BpjGe-Y0.js:9537-9538` — throws error without running compaction
- `pi-embedded-BpxGOwmb.js` — in-attempt compaction never reached
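
One possible shape for a fix, offered only as a sketch: make the ratio an option and give compaction a chance before throwing. This is not the code in `selection-BpjGe-Y0.js`. The function `checkContextBudget`, its options, and the `compact` callback are all hypothetical, and the sketch only assumes the 4-characters-per-token formula quoted above.

```javascript
// Hypothetical sketch of a compaction-first budget check.
const DEFAULT_PREEMPTIVE_OVERFLOW_RATIO = 0.9;

function checkContextBudget(estimatedChars, contextWindowTokens, options = {}) {
  const {
    ratio = DEFAULT_PREEMPTIVE_OVERFLOW_RATIO, // would come from config
    compact = null, // (chars) => smaller char count after compaction
    maxCompactionAttempts = 1,
  } = options;

  const limit = Math.floor(contextWindowTokens * 4 * ratio);
  let chars = estimatedChars;
  let compactionAttempts = 0;

  // Try compaction before giving up, instead of throwing immediately.
  while (chars > limit && compact !== null && compactionAttempts < maxCompactionAttempts) {
    chars = compact(chars);
    compactionAttempts += 1;
  }

  if (chars > limit) {
    const err = new Error(
      "Context overflow: estimated context size exceeds safe threshold during tool loop."
    );
    err.compactionAttempts = compactionAttempts; // surfaced for diagnostics
    throw err;
  }
  return { chars, limit, compactionAttempts };
}
```

With this shape, a log line showing `compactionAttempts=0` would only occur when no compaction callback is available, rather than on every preemptive trip.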

---

## Additional Context

**Multiple agents hit this on 2026-05-19:**

| Time (UTC) | Agent | Channel |
|-----------|-------|---------|
| 08:45 | neon:telegram:direct | telegram |
| 17:44 | jordan:main | webchat |
| 18:09 | neon:telegram:direct | telegram |
| 22:52 | marcus:main | webchat |

**Key evidence this is NOT actual context exhaustion:**
- `compactionAttempts=0`: the preemptive check fired, not a model rejection
- `observedTokens=unknown`: no token count was measured
- Session at ~50% context: well within the 200K-token window
- Error fires during the tool loop, before any model call
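
The first two signals in the list above can be turned into a small heuristic for sorting overflow reports. This is a sketch under the assumption that the diagnostic fields keep the names and values shown in the logs in this report. `classifyOverflow` and its return labels are hypothetical.

```javascript
// Heuristic only: classify a context-overflow diagnostic by whether a real
// token count was observed and whether compaction was ever attempted.
function classifyOverflow(diag) {
  const attempts = Number(diag.compactionAttempts);
  const tokensMeasured =
    diag.observedTokens !== undefined && diag.observedTokens !== "unknown";

  if (attempts === 0 && !tokensMeasured) {
    return "preemptive-estimate"; // tripped on an estimate, before any model call
  }
  if (tokensMeasured) {
    return "measured"; // a real token count backs the overflow
  }
  return "indeterminate";
}
```

For the entries in this report (`compactionAttempts=0`, `observedTokens=unknown`) the function returns `"preemptive-estimate"`.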

---

## Severity

**High.** Agents die silently with no user notification and no self-healing, recovery requires a manual gateway restart, and the failure recurs across multiple agents.


Bug: Preemptive context overflow silently kills embedded sessions without notifying user #84536