What happened?
When a Kimi for Coding plan account hits Moonshot's rate limit on api.kimi.com/coding/v1, the server returns HTTP 429 with error.type: "rate_limit_error" and error.message: "The engine is currently overloaded, please try again later". PawWork treats this as a generic retryable error and retries up to 10 times with exponential backoff (about 5 minutes of wall-clock time). Throughout, the toast shows the literal server message, which implies the engine is overloaded rather than that the user has hit a per-plan rate limit. The result is a 5-minute "death loop" with a misleading explanation, and no retry in that window can succeed: Moonshot's rate limit on this plan resets on a much longer window (we observed full recovery only after ~20 minutes).
Which area seems affected?
Model harness, prompts, tools, or session mechanics
How much does this affect you?
Makes a workflow harder, but there is a workaround
Steps to reproduce
Sign in to a Kimi for Coding plan account in PawWork (provider kimi-for-coding)
Select model Kimi K2.6 (k2p6)
Send any prompt that triggers a multi-tool round (e.g. "Read the project structure")
Repeat 2-3 times in close succession, or use a quota-strained account
Observe the "engine is currently overloaded" toast and the retry counter climbing 1/10, 2/10, ..., for ~5 minutes before final failure
What did you expect to happen?
Fail fast on rate_limit_error: do not retry (Moonshot's plan-level rate limit will not reset within the retry window), and show a clear message telling the user they have hit the Kimi for Coding plan rate limit and may want to wait or switch to another model. The current implementation hides the actual cause behind retries against what is fundamentally a quota wall.
PawWork version
dev (preview build, version string starts with 0.0.0-...)
OS version
macOS 15
Can you reproduce it again?
Yes, every time
Diagnostics
Confirmed root cause (from the sidecar AI_APICallError payload captured during the investigation on 2026-05-18):
url: https://api.kimi.com/coding/v1/messages
statusCode: 429
error.type: rate_limit_error
error.message: The engine is currently overloaded, please try again later
server-timing: dur=6686-11235ms (the server processes each request for several seconds, then rejects it)
The "engine is currently overloaded" wording is Moonshot's own message body. It is NOT a real overload signal; it is how Moonshot phrases its plan rate-limit response. Other client projects have documented the same behavior (gsd-build/gsd-2#4640, earendil-works/pi#3585, MoonshotAI/kimi-cli#907).
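Because the message text is misleading, the only reliable discriminator in this payload is error.type. A minimal TypeScript sketch, assuming the body has the shape { error: { type, message } } implied by the captured fields; isPlanRateLimit is a hypothetical helper, not existing PawWork code:

```typescript
// Assumed body shape, based on the captured error.type / error.message fields.
interface ApiErrorBody {
  error?: { type?: string; message?: string };
}

// Hypothetical helper: classify by error.type and deliberately ignore error.message,
// because Moonshot reuses the "engine is currently overloaded" text for plan rate limits.
function isPlanRateLimit(rawBody: string): boolean {
  let parsed: ApiErrorBody | null;
  try {
    parsed = JSON.parse(rawBody) as ApiErrorBody | null;
  } catch {
    return false; // Non-JSON bodies are not classified here.
  }
  return parsed?.error?.type === "rate_limit_error";
}

// The body observed on api.kimi.com/coding/v1 during the investigation.
const captured = JSON.stringify({
  error: {
    type: "rate_limit_error",
    message: "The engine is currently overloaded, please try again later",
  },
});

console.log(isPlanRateLimit(captured)); // true
```

Matching on the message string instead would misclassify this response as a genuine overload, which is exactly the confusion described above.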
Ruled out causes (from the same investigation):
User-Agent filtering: Not the cause. Side-by-side tests with two UA values (opencode/<dev-version-containing-"anthropic"-substring> vs opencode/0.0.91) on the same account showed similar 429 patterns; the UA-based filter hypothesis from external issues did not reproduce on the account we tested.
Tool schema sanitization (Moonshot $ref siblings, tuple items): Not applicable to this 429; the request payload is well-formed.
Proposed fix surface:
packages/opencode/src/session/retry.ts: when parsing the JSON error body, if error.type === "rate_limit_error", return undefined (not retryable) instead of the current "Rate Limited" string. The change would sit around the json.error?.code.includes("rate_limit") branch (lines ~95-103 as of d2cf1c86d).
Toast / retry dialog text: when the surfaced error is a rate_limit_error from a Kimi-family provider, replace the literal "engine is currently overloaded" text with a PawWork-authored message that names the cause ("Kimi for Coding rate limit reached. Wait a few minutes or switch to another model.") in both English and Chinese.
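The two proposed changes can be sketched together as a classifier plus a toast-text selector. This is a hedged illustration only: ErrorJson, KIMI_PROVIDERS, retryable, and toastMessage are hypothetical names, the non-rate-limit branches are assumptions about retry.ts, and the Chinese string is an illustrative translation:

```typescript
// Hypothetical sketch of the proposed retry.ts and toast changes; not PawWork's actual source.
interface ErrorJson {
  error?: { type?: string; code?: string; message?: string };
}

// Assumption: only the Anthropic-compatible kimi-for-coding provider is in scope.
const KIMI_PROVIDERS = new Set(["kimi-for-coding"]);

// User-facing text; the Chinese string is an illustrative translation.
const KIMI_RATE_LIMIT_MESSAGE = {
  en: "Kimi for Coding rate limit reached. Wait a few minutes or switch to another model.",
  zh: "已达到 Kimi for Coding 速率限制。请等待几分钟，或切换到其他模型。",
} as const;

// Returns a retry reason, or undefined when the error must not be retried.
// 5xx status-code and overload handling are assumed to live elsewhere and are omitted.
function retryable(json: ErrorJson): string | undefined {
  const err = json.error;
  // Proposed: a plan rate limit will not clear inside the backoff window, so fail fast.
  if (err?.type === "rate_limit_error") return undefined;
  // Unchanged (assumed shape): transient server-side errors stay retryable.
  if (err?.type === "server_error") return "Server Error";
  // Existing branch for other rate-limit codes, kept as-is.
  if (err?.code?.includes("rate_limit")) return "Rate Limited";
  return undefined;
}

// Chooses the toast text: a PawWork-authored message for Kimi rate limits,
// otherwise the server's own message.
function toastMessage(providerID: string, json: ErrorJson, locale: "en" | "zh"): string {
  if (KIMI_PROVIDERS.has(providerID) && json.error?.type === "rate_limit_error") {
    return KIMI_RATE_LIMIT_MESSAGE[locale];
  }
  return json.error?.message ?? "Unknown error";
}
```

One design question the sketch leaves open: the Anthropic API also reports 429s as error.type "rate_limit_error", so scoping the fail-fast branch to Kimi providers (as toastMessage does) may be safer than changing retryable for every provider.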
Out of scope for this issue:
Automatic provider fallback (e.g. silently switching to volcengine Kimi). Rejected in design discussion: there are too many auth/model-id permutations, so the choice is left to the user.
Reducing the retry count to 2-3 as a middle ground. Rejected: the underlying quota wall does not reset within seconds, so any retry on rate_limit_error wastes time and prolongs the misleading UX. Failing fast is cleaner than partial retry.
Any change to the OpenAI-compatible Kimi paths (volcengine plan, moonshot.cn, moonshot.ai); those paths already have their own reasoning_content handling and are not affected by this UX bug.
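The fail-fast position above amounts to letting a classifier short-circuit the backoff loop, rather than trimming the attempt count. A minimal sketch; withRetry, its parameters, and the 1 s base delay are hypothetical and do not reflect PawWork's actual retry schedule:

```typescript
// Hypothetical retry loop: a classifier decides, before every sleep, whether to keep going.
async function withRetry<T>(
  attempt: () => Promise<T>,
  classify: (err: unknown) => string | undefined, // undefined = not retryable
  maxAttempts = 10,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      const reason = classify(err);
      // Fail fast: non-retryable errors (e.g. rate_limit_error) surface after one attempt.
      if (reason === undefined || i >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** (i - 1));
    }
  }
}
```

With a classifier like the one proposed for retry.ts, a rate_limit_error makes a single attempt and throws immediately, while server_error keeps the full exponential schedule.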
Acceptance criteria:
A request that triggers Moonshot rate_limit_error shows a Kimi-rate-limit-specific toast within seconds (no exponential retry climb).
The original misleading "engine is currently overloaded" wording is not shown to the user anywhere in the PawWork UI for this error type.
Existing retry behavior for transient server_error / 5xx / true overload is unchanged.
Test coverage in test/session/retry.test.ts locks in: rate_limit_error is not retryable; server_error still is.
Also ruled out (same investigation):
Thinking payload: Not the cause. The request carries thinking: { type: enabled, budget_tokens: 16000 }, but the 429 persists with the correct payload.
Related:
d2cf1c86d: widened the K2.x thinking transform. It fixed a separate latent gap, not this one.
25ecf0af6 (#25888): added server_is_overloaded to the retryable cases. Already cherry-picked; also not the fix for this issue.