Description
When context_budget_tokens is set too tight (e.g. an 18K budget with a 12K+ system prompt), the compaction logic enters a multi-turn loop: every single turn triggers compaction, compaction summaries grow the context further, and eventually the API returns 400 (context too large).
Reproduction
Config: context_budget_tokens = 18000, compaction_threshold = 0.65, auto_budget = false, provider = openai
Run 7+ turns with tool calls.
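For convenience, the same settings as a config-file fragment (assuming a TOML config file; the file name and exact layout are guesses, only the four keys above come from the report):

```toml
# Repro settings from this report (layout is illustrative)
context_budget_tokens = 18000
compaction_threshold = 0.65
auto_budget = false
provider = "openai"
```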
Observed Behavior
cached_tokens=15371, threshold=11700 → should_compact=true — compaction #1 fires: 40 messages → summary_tokens=3104. After: cached_tokens=12329 (still > threshold). should_compact=true again — compaction #2 fires: 2 messages → summary_tokens=2755. After: cached_tokens=12217 (still > threshold).
The root problem: compaction summaries are injected back as system messages. With a very tight budget, the system prompt + injected summaries alone exceed the threshold, so every turn triggers compaction even when there are almost no messages left to compact.
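The trigger condition can be checked numerically. A minimal sketch (the `should_compact` signature here is illustrative, not the real implementation) shows that with budget 18000 and threshold 0.65 the trigger point is 11700 tokens, which the ~12K system prompt alone already exceeds, so no amount of message compaction can get below it:

```rust
// Illustrative threshold check mirroring the logged behavior; the real
// should_compact() in the project may have a different signature.
fn should_compact(cached_tokens: u32, budget: u32, threshold: f64) -> bool {
    (cached_tokens as f64) > (budget as f64) * threshold
}

fn main() {
    let (budget, threshold) = (18_000u32, 0.65); // 18_000 * 0.65 = 11_700 trigger point
    // Observed values from the logs:
    assert!(should_compact(15_371, budget, threshold)); // initial turn: fires
    assert!(should_compact(12_329, budget, threshold)); // after compaction #1: still fires
    assert!(should_compact(12_217, budget, threshold)); // after compaction #2: still fires
    // A ~12K system prompt alone is above the trigger point, so compaction
    // of ordinary messages can never bring usage below threshold:
    assert!(should_compact(12_000, budget, threshold));
}
```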
Secondary bug: compaction #2 produced a 2755-token summary for only 2 messages — the summary is larger than what it replaced, making the context worse.
Debug dump evidence
Request #3 (failed): 3 system messages (18630 + 2059 + 13957 chars), tool output 15799 chars, max_tokens=4096. At a rough ~4 chars/token, these items alone are ≈12.6K tokens, plus the 4096-token response reservation; the total context clearly exceeds the 18K token budget.
Expected Behavior
After compaction, if cached_tokens is still above threshold, emit WARN "context compaction could not reduce usage below threshold (compacted N messages, still at M/B tokens)" and stop attempting further compaction for that session; surface "Stopping: context window is nearly full" to the user.
OR: add a post-compaction cooldown: skip should_compact() for the next N turns after a successful compaction.
Compaction summaries should be bounded: if summary_tokens > freed_tokens, the compaction was counterproductive; log WARN and don't apply it.
Severity
Medium — only affects users with extremely tight context_budget_tokens settings (well below the default). With auto_budget = true (the default), this doesn't occur; only manual tight budgets hit this edge case.
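The guards proposed under Expected Behavior could look roughly like this. All names here (`CompactionState`, `after_compaction`, `apply_summary`, the 3-turn cooldown, the freed-token figures) are hypothetical, not the project's actual API:

```rust
// Hypothetical per-session state for the two loop-breaking guards.
struct CompactionState {
    cooldown_turns: u32, // skip should_compact() while > 0 (cooldown guard)
    gave_up: bool,       // set once compaction failed to get below threshold
}

/// Summary-bound guard: reject a summary that costs more than it frees.
fn apply_summary(summary_tokens: u32, freed_tokens: u32) -> bool {
    if summary_tokens >= freed_tokens {
        eprintln!("WARN: compaction counterproductive (summary {summary_tokens} >= freed {freed_tokens} tokens); not applying");
        return false;
    }
    true
}

/// Warn-and-stop + cooldown guards: after a compaction pass, decide whether
/// to keep compacting this session.
fn after_compaction(state: &mut CompactionState, cached_tokens: u32, budget: u32, threshold: f64) {
    if (cached_tokens as f64) > (budget as f64) * threshold {
        eprintln!("WARN: context compaction could not reduce usage below threshold (still at {cached_tokens}/{budget} tokens)");
        state.gave_up = true; // stop attempting further compaction
    } else {
        state.cooldown_turns = 3; // hypothetical cooldown after a successful pass
    }
}

fn main() {
    // Compaction #2 from the logs: a 2755-token summary for 2 small messages
    // (the freed-token count is a hypothetical stand-in).
    assert!(!apply_summary(2_755, 112)); // rejected as counterproductive

    let mut state = CompactionState { cooldown_turns: 0, gave_up: false };
    after_compaction(&mut state, 12_217, 18_000, 0.65);
    assert!(state.gave_up); // 12_217 is still above the 11_700 trigger point
}
```

Rejecting oversized summaries is the cheapest of the three fixes, but the give-up flag is what actually breaks the infinite loop when the budget is unsatisfiable to begin with.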