Bug: Context compression fails to trigger on API disconnect, causing death spiral in gateway sessions

## Bug Report

### Description

Gateway (Telegram) sessions can grow to 212k+ tokens / 700+ messages, causing Anthropic API connection failures ("Server disconnected without sending a response"). Three layers of compression defense all fail simultaneously, creating an unrecoverable death spiral.

**Model:** claude-opus-4.6 with 200k context limit
**Config:** compression threshold = 0.85 (170k tokens trigger point)

### Steps to Reproduce

1. Use a gateway (Telegram) session with claude-opus-4.6
2. Accumulate a long conversation history approaching the 200k context limit
3. Context grows past the limit, causing API disconnect
4. Send another message — session never recovers

### Expected Behavior

When the API disconnects due to oversized context, compression should be triggered automatically before the next retry or message, bringing the context back within limits.

### Actual Behavior

API disconnects → no usage data saved → compression not triggered → next message reloads same bloated history → API disconnects again → repeat forever. User must manually delete the session to recover.

### Root Cause Analysis

Three defense layers exist, and all three fail simultaneously:

**Layer 1: Agent in-loop compression (`run_agent.py` line 6487)**
- Checks `should_compress()` after each tool call
- Relies on `last_prompt_tokens` from the previous API response usage data
- **FAILURE:** When API disconnects due to oversized context, no usage data is returned, so `last_prompt_tokens` stays at its old (stale) value. `should_compress()` never triggers.

**Layer 2: Preflight compression (`run_agent.py` line 5245)**
- Runs once at start of `run_conversation()` using rough estimate (`len(str(msg))//4`)
- **FAILURE:** Only runs once at entry. If tools within a single conversation turn inflate context past the limit, this does not re-check.

**Layer 3: Gateway Session Hygiene (`gateway/run.py` lines 1636-1834)**
- Pre-agent check when Telegram message arrives
- Uses `session_entry.last_prompt_tokens` (real API value) when available, falls back to rough estimate with 1.4x safety factor on threshold
- Hygiene threshold = 0.85 x 200k = 170k
- **FAILURE:** When previous agent call disconnected without returning usage, `last_prompt_tokens` = 0. Falls back to rough estimate mode. But the rough estimate with 1.4x inflated threshold (238k) makes it HARDER to trigger, not easier — the direction is inverted. The 1.4x factor was meant to compensate for rough estimate overestimation on tool-heavy content, but it makes the safety net less sensitive.

### Key Files Involved

- `agent/context_compressor.py` — ContextCompressor class, should_compress(), compress()
- `run_agent.py` lines 5235-5270 (preflight), 6479-6492 (in-loop check)
- `gateway/run.py` lines 1636-1834 (session hygiene)
- `agent/model_metadata.py` line 793 — estimate_messages_tokens_rough()

### Suggested Fixes

1. **API error handler compression:** In the API connection error catch block, use rough estimate to proactively trigger compression before retry
2. **Fix gateway hygiene threshold direction:** The 1.4x factor inflates the THRESHOLD (making it less sensitive) — it should deflate the threshold or inflate the estimate instead
3. **Hard message count safety valve:** If messages > 500, force compression regardless of token estimates
4. **Persistent token tracking:** Store a "last known good" prompt_tokens that persists even when API calls fail, so subsequent checks are not blind

### Environment

- Model: claude-opus-4.6 (200k context limit)
- Platform: Telegram gateway
- Compression threshold config: 0.85

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Context compression fails to trigger on API disconnect, causing death spiral in gateway sessions #2153

Bug Report

Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Root Cause Analysis

Key Files Involved

Suggested Fixes

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Bug: Context compression fails to trigger on API disconnect, causing death spiral in gateway sessions #2153

Description

Bug Report

Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Root Cause Analysis

Key Files Involved

Suggested Fixes

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions