Kimi for Coding rate-limit 429 surfaces as misleading 'engine overloaded' death loop #740

@Astro-Han

What happened?

When a Kimi for Coding plan account hits Moonshot's rate limit on api.kimi.com/coding/v1, the server returns HTTP 429 with error.type: "rate_limit_error" and error.message: "The engine is currently overloaded, please try again later". PawWork treats this as a generic retryable error and retries up to 10 times with exponential backoff (~5 minutes total wall clock). During that time the toast shows the literal server message, which wrongly implies that the engine is overloaded rather than that the user has hit a per-plan rate limit. The result is a 5-minute "death loop" with a misleading explanation. No retry within that window can succeed, because Moonshot's rate limit on this plan resets over a much longer window (we observed full recovery only after ~20 minutes).
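To make the wall-clock cost concrete, here is a rough sketch of how 10 retries with capped exponential backoff add up. The backoff constants below are placeholders chosen for illustration, not PawWork's actual values, and the function names are hypothetical. The per-request rejection time is taken from the 6686-11235 ms server-timing range reported under Diagnostics.

```typescript
// Hypothetical backoff parameters; PawWork's real values may differ.
const INITIAL_DELAY_MS = 2_000;
const BACKOFF_FACTOR = 2;
const MAX_DELAY_MS = 30_000;
const MAX_RETRIES = 10;

// Delay before retry number `attempt` (1-based), capped at MAX_DELAY_MS.
function retryDelayMs(attempt: number): number {
  const uncapped = INITIAL_DELAY_MS * Math.pow(BACKOFF_FACTOR, attempt - 1);
  return Math.min(uncapped, MAX_DELAY_MS);
}

// Total time spent across all retries: each retry sleeps for its backoff
// delay, then waits `serverRejectMs` for the server to answer with the 429.
function totalWallClockMs(serverRejectMs: number): number {
  let total = 0;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    total += retryDelayMs(attempt) + serverRejectMs;
  }
  return total;
}

// With ~9 s per rejected request, the loop lasts 300 s in total.
console.log(totalWallClockMs(9_000) / 1000, "seconds");
```

Under these assumed parameters the whole retry budget is spent in about 5 minutes, well short of the ~20-minute recovery we observed, so none of the retries has a realistic chance of succeeding.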

Which area seems affected?

Model harness, prompts, tools, or session mechanics

How much does this affect you?

Makes a workflow harder, but there is a workaround

Steps to reproduce

  1. Sign in to a Kimi for Coding plan account in PawWork (provider kimi-for-coding)
  2. Select model Kimi K2.6 (k2p6)
  3. Send any prompt that triggers a multi-tool round (e.g. "Read the project structure")
  4. Repeat 2-3 times in close succession, or use a quota-strained account
  5. Observe the "engine is currently overloaded" toast and the retry counter climbing 1/10, 2/10, ..., for ~5 minutes before final failure

What did you expect to happen?

Fail fast on rate_limit_error: do not retry (Moonshot's plan-level rate limit will not reset within the retry window), and show the user a clear message that they have hit the Kimi for Coding plan rate limit and may want to wait or switch to another model. The current implementation hides the actual cause behind retries on what is fundamentally a quota wall.

PawWork version

dev (preview build, version string starts with 0.0.0-...)

OS version

macOS 15

Can you reproduce it again?

Yes, every time

Diagnostics

Confirmed root cause (from sidecar AI_APICallError payload during investigation on 2026-05-18):

url:          https://api.kimi.com/coding/v1/messages
statusCode:   429
error.type:   rate_limit_error
error.message: The engine is currently overloaded, please try again later
server-timing: dur=6686-11235ms  (server processes the request, then rejects)

The "engine is currently overloaded" wording is Moonshot's own message body. It is NOT a real overload signal; it is how Moonshot phrases its plan rate-limit response. Other client projects have documented the same behavior (gsd-build/gsd-2#4640, earendil-works/pi#3585, MoonshotAI/kimi-cli#907).
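Since the message text is unreliable, detection has to key on the status code and error.type instead. A minimal sketch of that check follows. The interface and function names are hypothetical, and the assumption that the raw JSON body is available as a string field called responseBody is mine, not confirmed by the payload dump above.

```typescript
// Structural subset of the error payload quoted above. These names are
// hypothetical and are not PawWork or AI SDK identifiers.
interface ApiErrorPayload {
  statusCode?: number;
  responseBody?: string; // raw JSON text returned by the server (assumed field)
}

interface ProviderErrorBody {
  error?: { type?: string; message?: string };
}

// Returns true only for HTTP 429 responses whose body declares
// error.type === "rate_limit_error". The message text is ignored on purpose.
function isRateLimitError(payload: ApiErrorPayload): boolean {
  if (payload.statusCode !== 429 || !payload.responseBody) return false;
  try {
    const body = JSON.parse(payload.responseBody) as ProviderErrorBody | null;
    return body?.error?.type === "rate_limit_error";
  } catch {
    // Non-JSON bodies (for example an HTML error page) are not classified here.
    return false;
  }
}

const observed: ApiErrorPayload = {
  statusCode: 429,
  responseBody: JSON.stringify({
    error: {
      type: "rate_limit_error",
      message: "The engine is currently overloaded, please try again later",
    },
  }),
};
console.log(isRateLimitError(observed)); // true
```

Matching on the type rather than the word "overloaded" keeps a genuine overload response from another provider out of this branch.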

Ruled out causes (from same investigation):

  • Protocol gap: Not the cause. PR #739 ("fix(provider): widen Kimi anthropic-SDK thinking match to K2.x") widened the K2.x thinking transform so K2.6 now correctly sends thinking: { type: enabled, budget_tokens: 16000 }, but the 429 persists with the correct payload.
  • User-Agent filtering: Not the cause. Side-by-side tests with two UA values (opencode/<dev-version-containing-"anthropic"-substring> vs opencode/0.0.91) on the same account showed similar 429 patterns; the UA-based filter hypothesis from external issues did not reproduce on the account we tested.
  • Tool schema sanitization (Moonshot $ref siblings, tuple items): Not applicable to this 429, because the request payload is well-formed.

Proposed fix surface:

  1. packages/opencode/src/session/retry.ts: when parsing the JSON error body, if error.type === "rate_limit_error", return undefined (not retryable) instead of the current "Rate Limited" string. The change belongs near the json.error?.code.includes("rate_limit") branch (lines ~95-103 as of d2cf1c86d).
  2. Toast / retry dialog text: when the surfaced error is a rate_limit_error from a Kimi family provider, replace the literal "engine is currently overloaded" with a PawWork-authored message that names the cause ("Kimi for Coding rate limit reached. Wait a few minutes or switch to another model.") in both English and Chinese.
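The two changes above could look roughly like the sketch below. It is not the code in retry.ts: the function names, the ProviderError shape, the "Server Error" label, and the prefix check used to recognize a Kimi family provider are all assumptions made for illustration. Only the English message is included; the Chinese copy is left to the implementer.

```typescript
// Hypothetical parsed form of the provider's JSON error body.
type ProviderError = { type?: string; code?: string; message?: string };

// Returns a short label when the error is worth retrying, or undefined to
// fail fast. Only the two types named in this issue are handled; the other
// branches of the real classifier are omitted.
function retryReason(error: ProviderError): string | undefined {
  if (error.type === "rate_limit_error") return undefined; // quota wall: do not retry
  if (error.type === "server_error") return "Server Error"; // transient: keep retrying
  return undefined;
}

// Picks the text shown in the toast. The startsWith("kimi") test is an
// assumed way to recognize a Kimi family provider such as kimi-for-coding.
function userFacingMessage(providerId: string, error: ProviderError): string {
  if (error.type === "rate_limit_error" && providerId.startsWith("kimi")) {
    return "Kimi for Coding rate limit reached. Wait a few minutes or switch to another model.";
  }
  return error.message ?? "Unknown provider error";
}

const kimiError: ProviderError = {
  type: "rate_limit_error",
  message: "The engine is currently overloaded, please try again later",
};
console.log(retryReason(kimiError)); // undefined
console.log(userFacingMessage("kimi-for-coding", kimiError));
```

Keeping the retry decision and the message selection in separate functions means the toast copy can change without touching retry semantics for other providers.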

Out of scope for this issue:

  • Automatic provider fallback (e.g. silently switch to volcengine Kimi). Rejected in design discussion: too many auth/model-id permutations, leave the choice to the user.
  • Reducing retry count to 2-3 as a middle ground. Rejected: the underlying quota wall does not reset in seconds, so any retry on rate_limit_error is wasted time and misleading UX. Fail fast is cleaner than partial retry.
  • Any change to the OpenAI-compatible Kimi paths (volcengine plan, moonshot.cn, moonshot.ai); those paths already have their own reasoning_content handling and are not affected by this UX bug.

Acceptance criteria:

  • A request that triggers Moonshot rate_limit_error shows a Kimi-rate-limit-specific toast within seconds (no exponential retry climb).
  • The original misleading "engine is currently overloaded" wording is not shown to the user from PawWork UI for this error type.
  • Existing retry behavior for transient server_error / 5xx / true overload is unchanged.
  • Test coverage in test/session/retry.test.ts locks in: rate_limit_error is not retryable; server_error still is.

Labels: bug (Something isn't working), harness (Model harness, prompts, tool descriptions, and session mechanics)
