fix(cli): cap per-turn compaction attempts #9344
When every compaction round still overflowed the model context, SessionPrompt.runLoop would keep calling compaction forever and report the turn as completed. Cap attempts at three per turn and surface exhaustion as a ContextOverflowError on the assistant message with TurnClose reason=error.
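The cap described above can be sketched roughly as follows. This is a minimal illustration, not the actual `SessionPrompt.runLoop` code: `runTurn`, `StepResult`, the callback shapes, and the synchronous control flow are all assumptions made for the example. Only the limit of three, the `ContextOverflowError` name, and the `reason=error` close come from the description.

```typescript
// Sketch only: runTurn, StepResult and the callback shapes are hypothetical,
// not the actual SessionPrompt.runLoop implementation.
const MAX_COMPACTION_ATTEMPTS = 3;

class ContextOverflowError extends Error {
  constructor(attempts: number) {
    super(`context still overflows after ${attempts} compaction attempts`);
    this.name = "ContextOverflowError";
  }
}

type StepResult = "ok" | "overflow";
type TurnClose =
  | { reason: "completed" }
  | { reason: "error"; error: ContextOverflowError };

function runTurn(step: () => StepResult, compact: () => void): TurnClose {
  let attempts = 0;
  for (;;) {
    // A successful model call ends the turn normally.
    if (step() === "ok") return { reason: "completed" };
    // Still overflowing after the allowed compactions: stop and report an error.
    if (attempts >= MAX_COMPACTION_ATTEMPTS) {
      return { reason: "error", error: new ContextOverflowError(attempts) };
    }
    attempts += 1;
    compact();
  }
}
```

With a `step` that always overflows, `compact` runs three times and the turn closes with `reason: "error"`. A `step` that recovers after one or two compactions still closes as `completed`.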
High-level issues on the cloud repo side
There is one function on the cloud side that is implicated in this bug:
It has three distinct problems, only one of which matters for the infinite-loop bug, but all of which deserve attention: Problem 1 — it rewrites any upstream 4xx/5xx into
| Commit | PR | Author | Date | What it did |
|---|---|---|---|---|
| bc8179c70 initial commit | n/a | Remon Oldenbeuving (remonoldenbeuving) | 2026-02-04 | The makeErrorReadable function with the overflow-rewrite block was already present in the initial commit of this cloud repo. Predates this bug report by months. |
| c4ff5bebb fix(llm-gateway): add context-length exceeded error translation for Kilo free models | n/a | Igor Šćekić (iscekic) | 2026-03-03 | Ported the same logic into the (now-retired) llm-gateway Cloudflare Worker. Commit message explicitly calls the web app version "the reference". |
None of the recent cloud PRs introduced the buggy rewrite. It has been the dominant error-translation path for Kilo-exclusive models since February.
The amplifiers — what made the rewrite fire frequently in the last week
| PR | Commit | Author | Date | Effect on overflow frequency |
|---|---|---|---|---|
| #2491 feat(proxy): add error_type zod enum to all LLM proxy error responses | dff71cbac | AI-authored via kilo-code-bot (no human named in the PR body) | 2026-04-16 | Added error_type enum everywhere. Did not change the rewrite logic, so neutral for the bug. |
| #2509 Route 10% of kilo-auto/free to Step Flash | ae6033fa3 | Christiaan Arnoldus (chrarnoldus) | 2026-04-16 | 10% of kilo-auto/free sessions now hit a different backing model; any transient upstream error there gets rewritten via the same path. |
| #2518 Update Claude Opus model IDs and names to 4.7 | 48cb77744 | Christiaan Arnoldus (chrarnoldus) | 2026-04-16 | Name bump, minor. |
| #2526 Add xhigh output effort / verbosity for Opus 4.7 | 1353dc14d | Christiaan Arnoldus (chrarnoldus) | 2026-04-16 | New xhigh/max variants inflate request max_tokens. estimateTokenCount adds that directly, so the check flips true sooner. |
| #2502 feat(auto): replace kilo-auto/small backing with Gemma 4 | 416ca73a9 | AI-authored by anthropic/claude-opus-4.6, merged via kilo-code-bot | 2026-04-17 | New backing models for kilo-auto/small. |
| #2576 Enable reasoning summaries by default | dc74d3b46 | Christiaan Arnoldus (chrarnoldus) | 2026-04-20 | Every reasoning request now carries thinking.display: 'summarized' / reasoning.summary: 'auto'. Responses are larger → next turn's input is larger → estimateTokenCount creeps up. |
| #2621 Disable Trinity Large Thinking free and notify affected users | f9302ea43 | AI-drafted via "Kilo for Slack" at request of Ari Messer, merged via kilo-code-bot | 2026-04-20 | Pushed a block of users off Trinity (262k ctx) onto Kilo Auto Free (minimax-m2.5:free, 204.8k ctx, the tightest of the exclusive models). |
Existing cloud-repo issues about this
I searched Kilo-Org/cloud for open issues about context_length_exceeded, makeErrorReadable, or estimateTokenCount. There are none. The bug is tracked entirely on the CLI side (Kilo-Org/kilocode#9285 by Zindaar, confirmed by visonforcoding). The cloud repo does not have an issue filed for the wrong-error-transformation problem yet.
Diagram
flowchart TD
U[Upstream provider] -->|any 4xx or 5xx:<br/>500, 502, 503, 429, 400, ...| M[makeErrorReadable]
M --> C1{BYOK?}
C1 -->|yes| R1[BYOK-specific message]
C1 -->|no| C2{Kilo-exclusive model?}
C2 -->|no| Pass[pass upstream error through]
C2 -->|yes| C3["estimateTokenCount >= context_length?<br/>(JSON.stringify/4 + max_tokens)"]
C3 -->|no| C4{Stealth model?}
C3 -->|yes| Rewrite["REWRITE to<br/>error_type: context_length_exceeded<br/>status: upstream status"]
Rewrite --> CLI[CLI classifies as<br/>context_overflow]
CLI --> Compact[auto-compact]
Compact --> U
style Rewrite fill:#fee,stroke:#c00
style C3 fill:#ffe,stroke:#cc0
The red box is where we're wrongly transforming. The yellow box is where the transformation decision is made on a badly inflated heuristic. Together they produce the loop that PR #9344 now caps on the CLI side.
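To make the inflation in the yellow box concrete, here is a rough sketch of the heuristic as the diagram states it (`JSON.stringify` length divided by 4, plus `max_tokens`). The request shape, the choice to serialize only `messages`, and the function bodies are assumptions for illustration; the real `estimateTokenCount` may differ in detail.

```typescript
// Assumed request shape and function bodies; only the formula
// (JSON.stringify length / 4 + max_tokens) comes from the diagram.
interface ChatRequest {
  messages: { role: string; content: string }[];
  max_tokens?: number;
}

function estimateTokenCount(req: ChatRequest): number {
  // JSON punctuation and key names are counted along with the actual text.
  const chars = JSON.stringify(req.messages).length;
  return Math.ceil(chars / 4) + (req.max_tokens ?? 0);
}

function wouldRewrite(req: ChatRequest, contextLength: number): boolean {
  return estimateTokenCount(req) >= contextLength;
}

const small: ChatRequest = { messages: [{ role: "user", content: "hi" }] };
// The serialized form is 32 characters, so a 2-character prompt is estimated at 8 tokens.
const base = estimateTokenCount(small);
// Raising max_tokens shifts the estimate by the same amount, whatever the input size.
const withBudget = estimateTokenCount({ ...small, max_tokens: 64_000 });
```

Under these assumptions, a request whose `max_tokens` alone equals the model's context length already satisfies the check, so any upstream failure on that request would be relabelled as an overflow.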
Recommended cloud-side follow-ups (not fixed by #9344)
- Gate the rewrite on upstream error content, not just status + size. Only rewrite if the upstream body actually contains an ambiguous overflow signature (e.g. matches `/maximum context|context.*length|token.*exceed/i` or is empty/generic), not for every 4xx/5xx.
- Fix the over-counting estimate. Either (a) use an actual tokenizer (tiktoken / model-specific) before rewriting, or (b) widen the trigger threshold to `context_length * 1.5` to compensate for JSON overhead.
- Don't preserve the upstream status code on rewrite. If we're confident enough to call it overflow, return 413 (the canonical overflow status) so clients can rely on status code alone.
- Add a cloud-repo issue tracking this; right now the only record is on the CLI side.
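The first and third follow-ups could look something like the sketch below. It is illustrative only: `UpstreamError`, `looksLikeOverflow`, and `translateError` are hypothetical names, the "generic" body case is left out because the list does not define it, and the real `makeErrorReadable` signature is not shown here.

```typescript
// Hypothetical helper names; only the regex and the 413 status come from the list above.
const OVERFLOW_SIGNATURE = /maximum context|context.*length|token.*exceed/i;

interface UpstreamError {
  status: number;
  body: string;
}

interface TranslatedError {
  status: number;
  error_type?: "context_length_exceeded";
}

function looksLikeOverflow(err: UpstreamError): boolean {
  const body = err.body.trim();
  // An empty body is treated as ambiguous; a non-empty body must match the signature.
  return body === "" || OVERFLOW_SIGNATURE.test(body);
}

function translateError(
  err: UpstreamError,
  estimatedTokens: number,
  contextLength: number,
): TranslatedError {
  if (looksLikeOverflow(err) && estimatedTokens >= contextLength) {
    // Confident enough to call it overflow: use 413 rather than the upstream status.
    return { status: 413, error_type: "context_length_exceeded" };
  }
  // Anything else passes through with its original status.
  return { status: err.status };
}
```

With this gate, a 503 whose body reads "upstream unavailable" keeps its status even when the size estimate is over the limit, so the CLI would not attempt compaction for it.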
Trim pre-summary history for any completed summary (not only those whose parent has a compaction part) and strip image/PDF attachments from historical turns once a summary exists. This stops the outgoing request from re-shipping multi-MB base-64 attachments on every follow-up turn, which was causing gateway body-size rejections and cascading compaction loops even after PR Kilo-Org#9344's attempt cap kicked in.
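As a rough illustration of that trimming, the sketch below drops everything before the most recent summary and removes image/PDF parts from every kept message except the current one. The `Message` and `Part` shapes, the `summary` flag, and the function names are assumptions made for the example and are not taken from the CLI source.

```typescript
// Hypothetical message model; the CLI's real types are not shown in this PR.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; mime: string; url: string };

interface Message {
  id: string;
  summary?: boolean; // true for a completed summary message (assumed flag)
  parts: Part[];
}

function isHeavyAttachment(part: Part): boolean {
  return (
    part.type === "file" &&
    (part.mime.startsWith("image/") || part.mime === "application/pdf")
  );
}

function trimForRequest(history: Message[]): Message[] {
  // Locate the most recent summary, if any.
  let summaryIndex = -1;
  history.forEach((message, index) => {
    if (message.summary) summaryIndex = index;
  });
  if (summaryIndex === -1) return history;

  // Drop pre-summary history, then strip attachments from all but the current turn.
  const kept = history.slice(summaryIndex);
  const current = kept.length - 1;
  return kept.map((message, index) =>
    index === current
      ? message
      : { ...message, parts: message.parts.filter((p) => !isHeavyAttachment(p)) },
  );
}
```

The current turn keeps its attachments so a freshly uploaded image still reaches the model; only older turns are slimmed down.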
fix(cli): cap per-turn compaction attempts
Why
If a model kept saying the conversation was too big after every compaction, the chat would get stuck in a forever "busy" loop and eventually look like it finished normally even though nothing happened.
What changed
Each turn now keeps track of how many times it has tried to shrink the conversation. After three tries, the chat stops looping, marks the turn as an error, and shows a clear "context overflow" message instead of pretending everything was fine. Before the third try, everything behaves exactly as it did before, so normal overflow-then-recover cases still work. The error also reaches anything listening for turn-close events, so tools and UIs now see "error" instead of "completed" for these stuck turns.
How to test
`{"error":{"code":"context_length_exceeded"}}`.
From `packages/opencode/`, run `bun test test/kilocode/session-compaction-cap.test.ts`; all five cases should pass.