Bug Report
Description
Gateway (Telegram) sessions can grow to 212k+ tokens / 700+ messages, causing Anthropic API connection failures ("Server disconnected without sending a response"). Three layers of compression defense all fail simultaneously, creating an unrecoverable death spiral.
Model: claude-opus-4.6 with 200k context limit
Config: compression threshold = 0.85 (170k tokens trigger point)
Steps to Reproduce
- Use a gateway (Telegram) session with claude-opus-4.6
- Accumulate a long conversation history approaching the 200k context limit
- Context grows past the limit, causing API disconnect
- Send another message — session never recovers
Expected Behavior
When the API disconnects due to oversized context, compression should be triggered automatically before the next retry or message, bringing the context back within limits.
Actual Behavior
API disconnects → no usage data saved → compression not triggered → next message reloads same bloated history → API disconnects again → repeat forever. User must manually delete the session to recover.
Root Cause Analysis
Three defense layers exist, and all three fail simultaneously:
Layer 1: Agent in-loop compression (run_agent.py line 6487)
- Checks
should_compress() after each tool call
- Relies on
last_prompt_tokens from the previous API response usage data
- FAILURE: When API disconnects due to oversized context, no usage data is returned, so
last_prompt_tokens stays at its old (stale) value. should_compress() never triggers.
Layer 2: Preflight compression (run_agent.py line 5245)
- Runs once at start of
run_conversation() using rough estimate (len(str(msg))//4)
- FAILURE: Only runs once at entry. If tools within a single conversation turn inflate context past the limit, this does not re-check.
Layer 3: Gateway Session Hygiene (gateway/run.py lines 1636-1834)
- Pre-agent check when Telegram message arrives
- Uses
session_entry.last_prompt_tokens (real API value) when available, falls back to rough estimate with 1.4x safety factor on threshold
- Hygiene threshold = 0.85 x 200k = 170k
- FAILURE: When previous agent call disconnected without returning usage,
last_prompt_tokens = 0. Falls back to rough estimate mode. But the rough estimate with 1.4x inflated threshold (238k) makes it HARDER to trigger, not easier — the direction is inverted. The 1.4x factor was meant to compensate for rough estimate overestimation on tool-heavy content, but it makes the safety net less sensitive.
Key Files Involved
agent/context_compressor.py — ContextCompressor class, should_compress(), compress()
run_agent.py lines 5235-5270 (preflight), 6479-6492 (in-loop check)
gateway/run.py lines 1636-1834 (session hygiene)
agent/model_metadata.py line 793 — estimate_messages_tokens_rough()
Suggested Fixes
- API error handler compression: In the API connection error catch block, use rough estimate to proactively trigger compression before retry
- Fix gateway hygiene threshold direction: The 1.4x factor inflates the THRESHOLD (making it less sensitive) — it should deflate the threshold or inflate the estimate instead
- Hard message count safety valve: If messages > 500, force compression regardless of token estimates
- Persistent token tracking: Store a "last known good" prompt_tokens that persists even when API calls fail, so subsequent checks are not blind
Environment
- Model: claude-opus-4.6 (200k context limit)
- Platform: Telegram gateway
- Compression threshold config: 0.85
Bug Report
Description
Gateway (Telegram) sessions can grow to 212k+ tokens / 700+ messages, causing Anthropic API connection failures ("Server disconnected without sending a response"). Three layers of compression defense all fail simultaneously, creating an unrecoverable death spiral.
Model: claude-opus-4.6 with 200k context limit
Config: compression threshold = 0.85 (170k tokens trigger point)
Steps to Reproduce
Expected Behavior
When the API disconnects due to oversized context, compression should be triggered automatically before the next retry or message, bringing the context back within limits.
Actual Behavior
API disconnects → no usage data saved → compression not triggered → next message reloads same bloated history → API disconnects again → repeat forever. User must manually delete the session to recover.
Root Cause Analysis
Three defense layers exist, and all three fail simultaneously:
Layer 1: Agent in-loop compression (
run_agent.pyline 6487)should_compress()after each tool calllast_prompt_tokensfrom the previous API response usage datalast_prompt_tokensstays at its old (stale) value.should_compress()never triggers.Layer 2: Preflight compression (
run_agent.pyline 5245)run_conversation()using rough estimate (len(str(msg))//4)Layer 3: Gateway Session Hygiene (
gateway/run.pylines 1636-1834)session_entry.last_prompt_tokens(real API value) when available, falls back to rough estimate with 1.4x safety factor on thresholdlast_prompt_tokens= 0. Falls back to rough estimate mode. But the rough estimate with 1.4x inflated threshold (238k) makes it HARDER to trigger, not easier — the direction is inverted. The 1.4x factor was meant to compensate for rough estimate overestimation on tool-heavy content, but it makes the safety net less sensitive.Key Files Involved
agent/context_compressor.py— ContextCompressor class, should_compress(), compress()run_agent.pylines 5235-5270 (preflight), 6479-6492 (in-loop check)gateway/run.pylines 1636-1834 (session hygiene)agent/model_metadata.pyline 793 — estimate_messages_tokens_rough()Suggested Fixes
Environment