Problem
When an LLM provider rejects a request with a context overflow error (e.g., "request (66540 tokens) exceeds the available context size (65536 tokens)"), the session fails the turn with an error message instead of automatically compacting and retrying.
Root Cause
Compaction only triggers after a successful LLM response returns InputTokenCount exceeding the CompactionTokenLimit threshold (LlmSessionActor.cs:1350-1358). When the provider rejects the call with a 400 error, no usage stats are returned — LlmCallFailed fires → FailCurrentTurn() → user sees an error.
The system detects context overflow via IsContextOverflowError() (LlmSessionActor.cs:1647-1660) but uses it only to craft the error message, not as a compaction trigger.
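Because the rejected call returns no usage stats, the error text is the only place the token counts appear. A minimal sketch of extracting them follows. ContextOverflowParser and TryParse are hypothetical names, not the existing IsContextOverflowError() implementation, and the pattern assumes the exact message shape quoted in this issue; other providers will likely word the error differently.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only. It is NOT the existing
// IsContextOverflowError() in LlmSessionActor.cs.
public static class ContextOverflowParser
{
    // Assumes the message shape quoted in this issue:
    // "request (66540 tokens) exceeds the available context size (65536 tokens)"
    private static readonly Regex Pattern = new Regex(
        @"request \((\d+) tokens\) exceeds the available context size \((\d+) tokens\)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true and fills both counts when the message matches the pattern.
    public static bool TryParse(string errorMessage, out int requestTokens, out int contextTokens)
    {
        requestTokens = 0;
        contextTokens = 0;
        if (string.IsNullOrEmpty(errorMessage)) return false;

        Match match = Pattern.Match(errorMessage);
        if (!match.Success) return false;

        return int.TryParse(match.Groups[1].Value, out requestTokens)
            && int.TryParse(match.Groups[2].Value, out contextTokens);
    }
}
```

If something like this were used, the parsed context size could also expose a mismatch between the configured ContextWindowTokens and the limit the server actually enforces.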
Impact
- Session becomes permanently stuck once the context grows beyond the model's context window
- Especially affects self-hosted models (llama-server), where the configured ContextWindowTokens may not match the actual server limit
- User must manually start a new thread, which is terrible UX
Expected Behavior
When IsContextOverflowError() returns true in the LlmCallFailed handler:
- Trigger emergency compaction (same path as CompactionTriggered)
- After compaction completes, retry the failed turn from the buffer
- If the retry after compaction still overflows, fail the turn with the error message
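The steps above amount to "compact and retry at most once per turn". A minimal sketch of that decision follows, kept separate from the actor so the guard against retry loops is easy to see. FailureAction, OverflowRecoveryPolicy, and OnTurnStarted are hypothetical names introduced for illustration; the real handler is in LlmSessionActor.cs and may be structured differently.

```csharp
using System;

// Hypothetical types for illustration; not part of the existing codebase.
public enum FailureAction
{
    FailTurn,
    CompactAndRetry
}

public sealed class OverflowRecoveryPolicy
{
    // True once emergency compaction has been attempted for the current turn.
    private bool _compactedForCurrentTurn;

    // Call when a new turn starts so each turn gets one recovery attempt.
    public void OnTurnStarted() => _compactedForCurrentTurn = false;

    // isContextOverflow is assumed to be the result of IsContextOverflowError().
    public FailureAction OnLlmCallFailed(bool isContextOverflow)
    {
        // Non-overflow failures keep today's behavior.
        if (!isContextOverflow) return FailureAction.FailTurn;

        // A second overflow after compaction means recovery did not help.
        if (_compactedForCurrentTurn) return FailureAction.FailTurn;

        _compactedForCurrentTurn = true;
        return FailureAction.CompactAndRetry;
    }
}
```

Under this sketch, the LlmCallFailed handler would start compaction and replay the buffered turn when it gets CompactAndRetry, and call FailCurrentTurn() otherwise.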
Relevant Code
LlmSessionActor.cs:650-658 — LlmCallFailed handler (just fails, no recovery)
LlmSessionActor.cs:1647-1660 — IsContextOverflowError() detection
LlmSessionActor.cs:1470-1473 — ShouldCompact() only checks _lastInputTokenCount
SessionConfig.cs:136-138 — CompactionTokenLimit calculation
Incident
Session D0AC6CKBK5K/1774021483_588239 (0.7.0) reported this exact error.
Additional Consideration
The user also asked about proactive compaction before sending — estimating token count pre-flight and compacting if the estimate exceeds the limit, rather than waiting for the provider to reject. The naive totalChars / 4 estimator exists (LlmSessionActor.cs:1719-1733) but isn't used pre-flight.
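A pre-flight check could look roughly like the sketch below. PreflightEstimator, EstimateTokens, and ShouldCompactBeforeSend are hypothetical names; only the totalChars / 4 heuristic comes from this issue, and the existing estimator in LlmSessionActor.cs may differ in what it counts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the existing estimator.
public static class PreflightEstimator
{
    // Naive heuristic named in this issue: roughly four characters per token.
    public static int EstimateTokens(IEnumerable<string> messages)
    {
        long totalChars = 0;
        foreach (string message in messages)
        {
            totalChars += message?.Length ?? 0;
        }
        return (int)Math.Min(int.MaxValue, totalChars / 4);
    }

    // True when the estimate exceeds the limit, i.e. compaction should run
    // before the request is sent rather than after the provider rejects it.
    public static bool ShouldCompactBeforeSend(IEnumerable<string> messages, int compactionTokenLimit)
        => EstimateTokens(messages) > compactionTokenLimit;
}
```

The characters-per-token ratio varies with the tokenizer and the content, so the estimate can land below the real count. A pre-flight check like this would complement the overflow-triggered recovery rather than replace it.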