Skip to content

Bug: Context compression fails to trigger on API disconnect, causing death spiral in gateway sessions #2153

@hypotyposis

Description

@hypotyposis

Bug Report

Description

Gateway (Telegram) sessions can grow to 212k+ tokens / 700+ messages, causing Anthropic API connection failures ("Server disconnected without sending a response"). Three layers of compression defense all fail simultaneously, creating an unrecoverable death spiral.

Model: claude-opus-4.6 with 200k context limit
Config: compression threshold = 0.85 (170k tokens trigger point)

Steps to Reproduce

  1. Use a gateway (Telegram) session with claude-opus-4.6
  2. Accumulate a long conversation history approaching the 200k context limit
  3. Context grows past the limit, causing API disconnect
  4. Send another message — session never recovers

Expected Behavior

When the API disconnects due to oversized context, compression should be triggered automatically before the next retry or message, bringing the context back within limits.

Actual Behavior

API disconnects → no usage data saved → compression not triggered → next message reloads same bloated history → API disconnects again → repeat forever. User must manually delete the session to recover.

Root Cause Analysis

Three defense layers exist, and all three fail simultaneously:

Layer 1: Agent in-loop compression (run_agent.py line 6487)

  • Checks should_compress() after each tool call
  • Relies on last_prompt_tokens from the previous API response usage data
  • FAILURE: When API disconnects due to oversized context, no usage data is returned, so last_prompt_tokens stays at its old (stale) value. should_compress() never triggers.

Layer 2: Preflight compression (run_agent.py line 5245)

  • Runs once at start of run_conversation() using rough estimate (len(str(msg))//4)
  • FAILURE: Only runs once at entry. If tools within a single conversation turn inflate context past the limit, this does not re-check.

Layer 3: Gateway Session Hygiene (gateway/run.py lines 1636-1834)

  • Pre-agent check when Telegram message arrives
  • Uses session_entry.last_prompt_tokens (real API value) when available, falls back to rough estimate with 1.4x safety factor on threshold
  • Hygiene threshold = 0.85 x 200k = 170k
  • FAILURE: When previous agent call disconnected without returning usage, last_prompt_tokens = 0. Falls back to rough estimate mode. But the rough estimate with 1.4x inflated threshold (238k) makes it HARDER to trigger, not easier — the direction is inverted. The 1.4x factor was meant to compensate for rough estimate overestimation on tool-heavy content, but it makes the safety net less sensitive.

Key Files Involved

  • agent/context_compressor.py — ContextCompressor class, should_compress(), compress()
  • run_agent.py lines 5235-5270 (preflight), 6479-6492 (in-loop check)
  • gateway/run.py lines 1636-1834 (session hygiene)
  • agent/model_metadata.py line 793 — estimate_messages_tokens_rough()

Suggested Fixes

  1. API error handler compression: In the API connection error catch block, use rough estimate to proactively trigger compression before retry
  2. Fix gateway hygiene threshold direction: The 1.4x factor inflates the THRESHOLD (making it less sensitive) — it should deflate the threshold or inflate the estimate instead
  3. Hard message count safety valve: If messages > 500, force compression regardless of token estimates
  4. Persistent token tracking: Store a "last known good" prompt_tokens that persists even when API calls fail, so subsequent checks are not blind

Environment

  • Model: claude-opus-4.6 (200k context limit)
  • Platform: Telegram gateway
  • Compression threshold config: 0.85

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions