
bug: no auto-compaction on context overflow — session fails instead of recovering #314

@Aaronontheweb

Description

Problem

When an LLM provider rejects a request with a context overflow error (e.g., request (66540 tokens) exceeds the available context size (65536 tokens)), the session fails the turn with an error message instead of automatically compacting and retrying.

Root Cause

Compaction is only triggered after a successful LLM response reports an InputTokenCount above the CompactionTokenLimit threshold (LlmSessionActor.cs:1350-1358). When the provider rejects the call with a 400 error, no usage stats are returned, so LlmCallFailed fires → FailCurrentTurn() → the user sees an error.

The system already detects context overflow via IsContextOverflowError() (lines 1647-1660), but uses it only to craft the error message, not as a compaction trigger.
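A minimal sketch of overflow detection that also recovers the numbers from the error, assuming the provider's text matches the llama-server message quoted above. The helper names `parse_context_overflow` and `is_context_overflow_error` and the regex are hypothetical, not the actual IsContextOverflowError() implementation, and real providers phrase this error differently, so a production check would likely match several patterns. Returning the reported server limit could also help when the configured ContextWindowTokens disagrees with the real one.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern modeled on the error quoted in this issue:
#   "request (66540 tokens) exceeds the available context size (65536 tokens)"
_OVERFLOW_RE = re.compile(
    r"request \((\d+) tokens\) exceeds the available context size \((\d+) tokens\)",
    re.IGNORECASE,
)


def parse_context_overflow(message: str) -> Optional[Tuple[int, int]]:
    """Return (requested_tokens, context_limit) if the message reports a
    context overflow, else None. Hypothetical helper for illustration."""
    match = _OVERFLOW_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_context_overflow_error(message: str) -> bool:
    """Boolean form, analogous in spirit to IsContextOverflowError()."""
    return parse_context_overflow(message) is not None


msg = "request (66540 tokens) exceeds the available context size (65536 tokens)"
print(parse_context_overflow(msg))  # (66540, 65536)
```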

Impact

  • The session becomes permanently stuck once its context grows beyond the model's window
  • This especially affects self-hosted models (llama-server), where the configured ContextWindowTokens may not match the actual server limit
  • The user must manually start a new thread, which is terrible UX

Expected Behavior

When IsContextOverflowError() returns true in the LlmCallFailed handler:

  1. Trigger emergency compaction (same as CompactionTriggered)
  2. After compaction completes, retry the failed turn from the buffer
  3. If the retry still overflows after compaction, fail the turn with the error message
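The three steps above amount to a small retry state machine. Below is a language-neutral sketch in Python rather than the actor's C#; `run_turn`, the `send`/`compact` callables, and the `ContextOverflow`/`TurnFailed` exceptions are hypothetical stand-ins for the LLM call, the compaction path, and FailCurrentTurn(). The important design point is that the retry happens at most once, so a turn that still overflows after compaction fails instead of looping.

```python
from typing import Callable, List


class ContextOverflow(Exception):
    """Hypothetical: an LLM call failure that IsContextOverflowError() would flag."""


class TurnFailed(Exception):
    """Hypothetical: the turn was failed (the analogue of FailCurrentTurn())."""


def run_turn(
    history: List[str],
    send: Callable[[List[str]], str],
    compact: Callable[[List[str]], List[str]],
) -> str:
    """Send one turn; on overflow, compact once and retry once."""
    try:
        return send(history)  # normal path; non-overflow errors propagate unchanged
    except ContextOverflow:
        pass  # overflow: try to recover instead of failing the turn immediately

    # 1. Emergency compaction (same effect as CompactionTriggered).
    compacted = compact(history)

    # 2. Retry the failed turn once, from the compacted buffer.
    try:
        return send(compacted)
    except ContextOverflow as err:
        # 3. Compaction + retry still overflows: fail with the error message.
        raise TurnFailed(f"context overflow persists after compaction: {err}") from err


# Usage with fake stand-ins: the "model" accepts at most 3 messages.
MAX_MESSAGES = 3


def fake_send(history: List[str]) -> str:
    if len(history) > MAX_MESSAGES:
        raise ContextOverflow(f"{len(history)} messages > {MAX_MESSAGES}")
    return "ok"


def keep_last_two(history: List[str]) -> List[str]:
    return history[-2:]


print(run_turn(list("abcde"), fake_send, keep_last_two))  # ok
```

In the actor itself this would be message-driven rather than a try/except, for example with a per-turn flag (hypothetical) recording that an emergency compaction has already been attempted.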

Relevant Code

  • LlmSessionActor.cs:650-658 - LlmCallFailed handler (just fails, no recovery)
  • LlmSessionActor.cs:1647-1660 - IsContextOverflowError() detection
  • LlmSessionActor.cs:1470-1473 - ShouldCompact() only checks _lastInputTokenCount
  • SessionConfig.cs:136-138 - CompactionTokenLimit calculation

Incident

Session D0AC6CKBK5K/1774021483_588239 (0.7.0) reported this exact error.

Additional Consideration

The user also asked about proactive compaction before sending: estimating the token count pre-flight and compacting if the estimate exceeds the limit, rather than waiting for the provider to reject the request. The naive totalChars / 4 estimator already exists (LlmSessionActor.cs:1719-1733) but isn't used pre-flight.
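A minimal sketch of such a pre-flight check, assuming the same totalChars / 4 heuristic. `estimate_tokens`, `should_compact_preflight`, and the `safety_margin` parameter are hypothetical; the margin leaves headroom because chars/4 is only a rough approximation and can undercount, which matters when the overflow is as narrow as 66540 vs 65536 tokens.

```python
from typing import Sequence


def estimate_tokens(messages: Sequence[str]) -> int:
    """Naive estimate mirroring the existing heuristic: total characters / 4."""
    total_chars = sum(len(m) for m in messages)
    return total_chars // 4


def should_compact_preflight(
    messages: Sequence[str],
    compaction_token_limit: int,
    safety_margin: float = 0.9,
) -> bool:
    """Compact before sending if the estimate exceeds a fraction of the limit.

    safety_margin < 1.0 (hypothetical) leaves headroom for estimation error.
    """
    return estimate_tokens(messages) > compaction_token_limit * safety_margin


history = ["x" * 200_000, "y" * 40_000]
print(estimate_tokens(history))  # 60000
print(should_compact_preflight(history, compaction_token_limit=65536))  # True
```

A pre-flight check like this would complement rather than replace the reactive overflow path, since any estimate can still be wrong.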

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests