fix(cli): cap per-turn compaction attempts by alex-alecu · Pull Request #9344 · Kilo-Org/kilocode

alex-alecu · 2026-04-22T06:54:40Z

Why

If a model kept saying the conversation was too big after every compaction, the chat would get stuck in a forever "busy" loop and eventually look like it finished normally even though nothing happened.

What changed

Each turn now keeps track of how many times it has tried to shrink the conversation. After three tries, the chat stops looping, marks the turn as an error, and shows a clear "context overflow" message instead of pretending everything was fine. Before the third try, everything behaves exactly as it did before, so normal overflow-then-recover cases still work. The error also reaches anything listening for turn-close events, so tools and UIs now see "error" instead of "completed" for these stuck turns.

How to test

Point the CLI at a fake or local provider that always replies with HTTP 400 {"error":{"code":"context_length_exceeded"}}.
Start a new chat and send any message.
The chat should try to compact a few times, then stop and show a red "context overflow" error on the last assistant message instead of hanging on "busy".
From packages/opencode/, run bun test test/kilocode/session-compaction-cap.test.ts — all five cases should pass.

When every compaction round still overflowed the model context, SessionPrompt.runLoop would keep calling compaction forever and report the turn as completed. Cap attempts at three per turn and surface exhaustion as a ContextOverflowError on the assistant message with TurnClose reason=error.
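The cap described above can be sketched as follows. This is a minimal sketch, not the actual `SessionPrompt.runLoop` code. `runTurn`, `CallResult`, `TurnClose`, and `MAX_COMPACTION_ATTEMPTS` are hypothetical names, and `ContextOverflowError` is a stand-in class. The loop is synchronous for brevity, whereas the real loop awaits provider calls.

```typescript
// Hypothetical sketch of a per-turn compaction cap; names are illustrative only.
const MAX_COMPACTION_ATTEMPTS = 3;

class ContextOverflowError extends Error {
  readonly attempts: number;
  constructor(attempts: number) {
    super(`context overflow: request still too large after ${attempts} compaction attempts`);
    this.name = "ContextOverflowError";
    this.attempts = attempts;
  }
}

type CallResult = "ok" | "overflow";
type TurnClose =
  | { reason: "completed" }
  | { reason: "error"; error: ContextOverflowError };

// Synchronous for brevity; a real loop would await the provider call.
function runTurn(callModel: () => CallResult, compact: () => void): TurnClose {
  let attempts = 0; // counted per turn, so every new turn starts from zero
  while (true) {
    if (callModel() === "ok") return { reason: "completed" };
    // Still overflowing: stop once the cap is reached instead of looping forever.
    if (attempts >= MAX_COMPACTION_ATTEMPTS) {
      return { reason: "error", error: new ContextOverflowError(attempts) };
    }
    attempts++;
    compact();
  }
}
```

If the provider always overflows, `compact` runs three times and the turn closes with `reason: "error"`. A single overflow followed by a successful call still completes normally, which matches the "overflow-then-recover" case the PR keeps intact.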

alex-alecu · 2026-04-22T12:16:07Z

High-level issues on the cloud repo side

There is one function on the cloud side that is implicated in this bug:

apps/web/src/lib/ai-gateway/llm-proxy-helpers.ts:168 — makeErrorReadable()

It has three distinct problems, only one of which matters for the infinite-loop bug, but all of which deserve attention:

Problem 1 — it rewrites any upstream 4xx/5xx into `context_length_exceeded`

The structure of the check is:

```ts
if (response.status < 400) return undefined;             // only run on errors
if (isUserByok) { ... }                                  // BYOK branch
const model = kiloExclusiveModels.find(m => m.public_id === requestedModel);
if (model) {
  const estimatedTokenCount = estimateTokenCount(request);
  if (estimatedTokenCount >= model.context_length) {
    // REWRITE: replace upstream error with a synthesized context_length_exceeded
    return NextResponse.json(
      { error, error_type: ProxyErrorType.context_length_exceeded, message: error },
      { status: response.status }
    );
  }
}
```

The stated intent (in the comment immediately above) is:

Sometimes we get generic or nonsensical errors when the context length is exceeded (such as "Internal Server Error" or "No allowed providers are available for the selected model")

In other words, the goal is to translate ambiguous overflow errors into a clearer message.

In practice there is no check that the upstream error was actually overflow-related. The rewrite fires for:

Genuine overflows ✅ correct
Provider outages (Novita/Minimax 500/502/503) ❌ wrong
Rate limits (429) ❌ wrong
Parse errors / malformed upstream responses ❌ wrong
Any other 4xx/5xx that happens to land on a large request ❌ wrong

So yes, we are wrongly transforming errors. Any time a Kilo-exclusive model has an upstream hiccup on a moderately sized request, the client sees context_length_exceeded instead of the real cause. The CLI then does what it is designed to do for that signal: compact and retry. If the upstream keeps failing, the cloud keeps translating and the CLI keeps compacting, which is exactly the loop PR #9344 caps.
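The missing check is easier to see in a stripped-down model of the current decision. This is a hypothetical simplification, not the real `makeErrorReadable`. It covers only the non-BYOK, Kilo-exclusive path and takes the token estimate as an input. The upstream body is accepted but never read, per the description above.

```typescript
// Hypothetical simplification of the current non-BYOK, Kilo-exclusive path.
function currentCheckRewrites(
  upstreamStatus: number,
  _upstreamBody: string, // accepted but never consulted when deciding to rewrite
  estimatedTokens: number,
  contextLength: number,
): boolean {
  if (upstreamStatus < 400) return false; // only runs on errors
  return estimatedTokens >= contextLength; // size is the only other gate
}

const samples: Array<[number, string]> = [
  [503, "Service Unavailable"],
  [429, "Rate limit exceeded"],
  [502, "Bad Gateway"],
];
for (const [status, body] of samples) {
  // Each of these becomes context_length_exceeded once the request is large enough.
  console.log(status, currentCheckRewrites(status, body, 210_000, 204_800)); // true
}
```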

Problem 2 — the token estimate is badly inaccurate and over-counts

```ts
function estimateTokenCount(request: GatewayRequest) {
  return Math.round(JSON.stringify(request).length / 4 + (getMaxTokens(request) ?? 0));
}
```

Two issues:

JSON.stringify(...).length / 4: the "4 chars ≈ 1 token" heuristic is for plaintext English. JSON carries 50–100% overhead from quotes, braces, escape sequences, tool-definition scaffolding, cache-control markers, etc. So the "character side" of the estimate over-counts real tokens by roughly 1.5–2×.
Adding max_tokens: the model's context_length is the total budget (input + output), and max_tokens is the output budget. Adding them is formally correct, but clients typically send max_tokens = model.max_completion_tokens (the model's advertised max output). For minimax/minimax-m2.5:free that's 131_072 out of a 204_800 context, so the comparison effectively becomes "is the estimated input alone ≥ 73,728 tokens?" With the 2× over-count, an actual input of roughly 37k tokens is already enough to trigger the rewrite.

Net effect: the rewrite fires on inputs that would actually fit in the model. On kilo-auto/free (minimax-m2.5:free, 204.8k context, 131k max output) this is very easy to reach after just a few tool calls.
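The arithmetic above can be checked in a few lines. The constants are the minimax/minimax-m2.5:free figures from the text. `overcount` is the assumed 1.5–2× inflation of the JSON character estimate, and the helper names are hypothetical. `estimateFromJsonLength` applies the same formula as `estimateTokenCount`, but takes the JSON length directly.

```typescript
// Figures for minimax/minimax-m2.5:free quoted in the text.
const CONTEXT_LENGTH = 204_800;
const MAX_COMPLETION_TOKENS = 131_072; // what clients typically send as max_tokens

// Same formula as estimateTokenCount, taking the serialized JSON length directly.
function estimateFromJsonLength(jsonLength: number, maxTokens: number): number {
  return Math.round(jsonLength / 4 + maxTokens);
}

// Estimated input tokens at which `estimate >= context_length` starts to hold.
function estimatedInputThreshold(contextLength: number, maxTokens: number): number {
  return contextLength - maxTokens;
}

// Real input tokens that trip the check if the estimate over-counts by `overcount`.
function realInputThreshold(contextLength: number, maxTokens: number, overcount: number): number {
  return Math.ceil(estimatedInputThreshold(contextLength, maxTokens) / overcount);
}

console.log(estimatedInputThreshold(CONTEXT_LENGTH, MAX_COMPLETION_TOKENS)); // 73728
console.log(realInputThreshold(CONTEXT_LENGTH, MAX_COMPLETION_TOKENS, 2)); // 36864
console.log(realInputThreshold(CONTEXT_LENGTH, MAX_COMPLETION_TOKENS, 1.5)); // 49152
// 300,000 characters of request JSON is already enough to trip the check.
console.log(estimateFromJsonLength(300_000, MAX_COMPLETION_TOKENS) >= CONTEXT_LENGTH); // true
```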

Problem 3 — the rewrite preserves the upstream status code

```ts
return NextResponse.json(
  { error, error_type: ProxyErrorType.context_length_exceeded, message: error },
  { status: response.status }        // <-- upstream status preserved
);
```

So a 500 upstream error becomes a 500 response with error.code === "context_length_exceeded". That shape confuses the CLI's error classifier. parseAPICallError treats statusCode === 413 as one overflow signal, which a 500 does not match, but it also treats body.error.code === "context_length_exceeded" as another, and this path satisfies that check regardless of the underlying status. The effect: the CLI treats a genuine 500 outage as recoverable via compaction.
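Here is a hypothetical mirror of the two signals named above. It is not the CLI's actual `parseAPICallError`, only the OR of the two checks the paragraph describes. It shows why the 500 status does not help: either signal alone is enough.

```typescript
// Hypothetical mirror of the two overflow signals; not the CLI's real classifier.
interface ApiErrorLike {
  statusCode?: number;
  body?: { error?: { code?: string } };
}

function treatedAsOverflow(err: ApiErrorLike): boolean {
  const statusSignal = err.statusCode === 413;
  const codeSignal = err.body?.error?.code === "context_length_exceeded";
  // Either signal alone triggers compaction, so the rewritten body wins even on a 500.
  return statusSignal || codeSignal;
}

console.log(treatedAsOverflow({ statusCode: 500, body: { error: { code: "context_length_exceeded" } } })); // true
console.log(treatedAsOverflow({ statusCode: 500, body: { error: { code: "internal_error" } } })); // false
```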

Timeline / PRs on the cloud side

The bug itself — the transformation code

| Commit | PR | Author | Date | What it did |
| --- | --- | --- | --- | --- |
| `bc8179c70 initial commit` | n/a | Remon Oldenbeuving (remonoldenbeuving) | 2026-02-04 | The `makeErrorReadable` function with the overflow-rewrite block was already present in the initial commit of this cloud repo. Predates this bug report by months. |
| `c4ff5bebb fix(llm-gateway): add context-length exceeded error translation for Kilo free models` | n/a | Igor Šćekić (iscekic) | 2026-03-03 | Ported the same logic into the (now-retired) `llm-gateway` Cloudflare Worker. Commit message explicitly calls the web app version "the reference". |

None of the recent cloud PRs introduced the buggy rewrite. It has been the dominant error-translation path for Kilo-exclusive models since February.

The amplifiers — what made the rewrite fire frequently in the last week

| PR | Commit | Author | Date | Effect on overflow frequency |
| --- | --- | --- | --- | --- |
| #2491 `feat(proxy): add error_type zod enum to all LLM proxy error responses` | `dff71cbac` | AI-authored via `kilo-code-bot` (no human named in the PR body) | 2026-04-16 | Added `error_type` enum everywhere. Did not change the rewrite logic, so neutral for the bug. |
| #2509 `Route 10% of kilo-auto/free to Step Flash` | `ae6033fa3` | Christiaan Arnoldus (chrarnoldus) | 2026-04-16 | 10% of `kilo-auto/free` sessions now hit a different backing model; any transient upstream error there gets rewritten via the same path. |
| #2518 `Update Claude Opus model IDs and names to 4.7` | `48cb77744` | Christiaan Arnoldus (chrarnoldus) | 2026-04-16 | Name bump, minor. |
| #2526 `Add xhigh output effort / verbosity for Opus 4.7` | `1353dc14d` | Christiaan Arnoldus (chrarnoldus) | 2026-04-16 | New `xhigh`/`max` variants inflate request `max_tokens`. `estimateTokenCount` adds that directly, so the check flips true sooner. |
| #2502 `feat(auto): replace kilo-auto/small backing with Gemma 4` | `416ca73a9` | AI-authored by `anthropic/claude-opus-4.6`, merged via `kilo-code-bot` | 2026-04-17 | New backing models for `kilo-auto/small`. |
| #2576 `Enable reasoning summaries by default` | `dc74d3b46` | Christiaan Arnoldus (chrarnoldus) | 2026-04-20 | Every reasoning request now carries `thinking.display: 'summarized'` / `reasoning.summary: 'auto'`. Responses are larger → next turn's input is larger → `estimateTokenCount` creeps up. |
| #2621 `Disable Trinity Large Thinking free and notify affected users` | `f9302ea43` | AI-drafted via "Kilo for Slack" at request of Ari Messer, merged via `kilo-code-bot` | 2026-04-20 | Pushed a block of users off Trinity (262k ctx) onto Kilo Auto Free (minimax-m2.5:free, 204.8k ctx, the tightest of the exclusive models). |

Existing cloud-repo issues about this

I searched Kilo-Org/cloud for open issues about context_length_exceeded, makeErrorReadable, or estimateTokenCount. There are none. The bug is tracked entirely on the CLI side (Kilo-Org/kilocode#9285 by Zindaar, confirmed by visonforcoding). The cloud repo does not have an issue filed for the wrong-error-transformation problem yet.

Diagram

```mermaid
flowchart TD
    U[Upstream provider] -->|any 4xx or 5xx:<br/>500, 502, 503, 429, 400, ...| M[makeErrorReadable]

    M --> C1{BYOK?}
    C1 -->|yes| R1[BYOK-specific message]
    C1 -->|no| C2{Kilo-exclusive model?}
    C2 -->|no| Pass[pass upstream error through]
    C2 -->|yes| C3["estimateTokenCount >= context_length?<br/>(JSON.stringify/4 + max_tokens)"]
    C3 -->|no| C4{Stealth model?}
    C3 -->|yes| Rewrite["REWRITE to<br/>error_type: context_length_exceeded<br/>status: upstream status"]

    Rewrite --> CLI[CLI classifies as<br/>context_overflow]
    CLI --> Compact[auto-compact]
    Compact --> U

    style Rewrite fill:#fee,stroke:#c00
    style C3 fill:#ffe,stroke:#cc0
```

The red box is where we're wrongly transforming. The yellow box is where the transformation decision is made on a badly inflated heuristic. Together they produce the loop that PR #9344 now caps on the CLI side.

Recommended cloud-side follow-ups (not fixed by #9344)

Gate the rewrite on upstream error content, not just status + size. Only rewrite if the upstream body actually contains an ambiguous overflow signature (e.g. matches /maximum context|context.*length|token.*exceed/i or is empty/generic) — not for every 4xx/5xx.
Fix the over-counting estimate. Either (a) use an actual tokenizer (tiktoken / model-specific) before rewriting, or (b) widen the trigger threshold to context_length * 1.5 to compensate for JSON overhead.
Don't preserve the upstream status code on rewrite. If we're confident enough to call it overflow, return 413 (the canonical overflow status) so clients can rely on status code alone.
Add a cloud-repo issue tracking this — right now the only record is on the CLI side.
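The first three follow-ups can be combined into a minimal sketch. It assumes the gateway has the upstream body as text and an estimate from `estimateTokenCount`. The regex is the one suggested above, and the generic messages are the examples quoted in the `makeErrorReadable` comment. `decideRewrite`, the 1.5 slack default, and passing 429s through untouched are illustrative choices, not existing cloud code.

```typescript
// Signature suggested in the follow-up list.
const OVERFLOW_SIGNATURE = /maximum context|context.*length|token.*exceed/i;
// Generic messages the makeErrorReadable comment says sometimes stand in for an overflow.
const GENERIC_OVERFLOW_STANDINS = [
  "internal server error",
  "no allowed providers are available for the selected model",
];

function looksLikeAmbiguousOverflow(upstreamBody: string): boolean {
  const text = upstreamBody.trim().toLowerCase();
  if (text === "") return true; // empty body: nothing better to show the user
  if (OVERFLOW_SIGNATURE.test(text)) return true;
  return GENERIC_OVERFLOW_STANDINS.some((m) => text.includes(m));
}

type RewriteDecision =
  | { rewrite: false }
  | { rewrite: true; status: 413; errorType: "context_length_exceeded" };

function decideRewrite(
  upstreamStatus: number,
  upstreamBody: string,
  estimatedTokens: number,
  contextLength: number,
  slack = 1.5, // option (b): widen the threshold to offset JSON overhead
): RewriteDecision {
  if (upstreamStatus < 400) return { rewrite: false };
  // Rate-limit messages often mention tokens and could match the regex,
  // so this sketch passes 429s through untouched.
  if (upstreamStatus === 429) return { rewrite: false };
  if (!looksLikeAmbiguousOverflow(upstreamBody)) return { rewrite: false };
  if (estimatedTokens < contextLength * slack) return { rewrite: false };
  // Confident enough to call it overflow: use the canonical 413 status.
  return { rewrite: true, status: 413, errorType: "context_length_exceeded" };
}
```

With this gate, a 503 "Service Unavailable" passes through unchanged however large the request is. Only an empty, generic, or overflow-worded error on a request well past the context length becomes a 413.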

Trim pre-summary history for any completed summary (not only those whose parent has a compaction part) and strip image/PDF attachments from historical turns once a summary exists. This stops the outgoing request from re-shipping multi-MB base64 attachments on every follow-up turn, which was causing gateway body-size rejections and cascading compaction loops even after PR Kilo-Org#9344's attempt cap kicked in.
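The trimming in this commit message can be sketched against a hypothetical message shape (`Message`, `Part`, a `summary` flag). That shape does not match the CLI's real session types. The sketch keeps history from the latest completed summary onward and drops image/PDF parts from every turn except the newest one. Treating "historical" as "everything but the newest turn" is an assumption.

```typescript
// Hypothetical shapes for illustration; the CLI's real session types differ.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; mime: string; data: string }; // data: base64 payload

interface Message {
  role: "user" | "assistant";
  parts: Part[];
  summary?: boolean; // true on a completed compaction summary
}

function isImageOrPdf(part: Part): boolean {
  return part.type === "file" && (part.mime.startsWith("image/") || part.mime === "application/pdf");
}

function trimForRequest(history: Message[]): Message[] {
  // Locate the most recent completed summary.
  let lastSummary = -1;
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i]?.summary) {
      lastSummary = i;
      break;
    }
  }
  if (lastSummary === -1) return history; // no summary yet: send everything

  // Everything before the summary is already covered by it.
  const kept = history.slice(lastSummary);
  // Strip image/PDF parts from historical turns; keep the newest turn intact.
  return kept.map((msg, i) =>
    i === kept.length - 1 ? msg : { ...msg, parts: msg.parts.filter((p) => !isImageOrPdf(p)) },
  );
}
```

The original history array is not mutated. Stripped turns are shallow copies with a filtered `parts` list.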


kilo-code-bot mentioned this pull request on Apr 22, 2026 in #9354, fix(cli): simpler compaction cap to prevent infinite busy loop (closed).

markijbema approved these changes on Apr 22, 2026.

alex-alecu merged commit 740c2b4 into main on Apr 22, 2026 (19 checks passed).

alex-alecu deleted the fix/infinite-compact branch on April 22, 2026 at 13:10.

slamj1 pushed a commit to slamj1/kilocode that referenced this pull request on May 16, 2026: feat(app): Add ability to select project directory text to web (Kilo-Org#9344) (bfa986d)

jliounis pushed a commit to jliounis/kilocode that referenced this pull request on May 18, 2026: feat(app): Add ability to select project directory text to web (Kilo-Org#9344) (1c6c1bd)

jliounis pushed a commit to jliounis/kilocode that referenced this pull request on May 18, 2026: Merge pull request Kilo-Org#9344 from Kilo-Org/fix/infinite-compact (f441dc1), fix(cli): cap per-turn compaction attempts


alex-alecu merged 1 commit into main from fix/infinite-compact.
