Kimi for Coding rate-limit 429 surfaces as misleading 'engine overloaded' death loop #740

### What happened?

When a Kimi for Coding plan account hits Moonshot's rate limit on `api.kimi.com/coding/v1`, the server returns `HTTP 429` with `error.type: "rate_limit_error"` and `error.message: "The engine is currently overloaded, please try again later"`. PawWork treats this as a generic retryable error and retries up to 10 times with exponential backoff (~5 minutes of wall-clock time). Throughout, the toast shows the literal server message, which wrongly implies that the engine is overloaded when the user has in fact hit a per-plan rate limit.

The result is a 5-minute "death loop" with a misleading explanation. None of those retries can succeed, because Moonshot's rate limit on this plan resets on a much longer window: we observed full recovery only after ~20 minutes.
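To make the "~5 minutes" figure concrete, here is a rough sketch of how a capped exponential backoff adds up over 10 retries. The three constants are illustrative assumptions, not values read from PawWork's source, and `backoffDelayMs` / `totalBackoffMs` are hypothetical helper names.

```typescript
// Illustrative assumptions only; the real schedule may use different values.
const INITIAL_DELAY_MS = 2_000;
const BACKOFF_FACTOR = 2;
const MAX_DELAY_MS = 30_000;

// Delay before retry number `attempt` (1-based), capped at MAX_DELAY_MS.
function backoffDelayMs(attempt: number): number {
  return Math.min(INITIAL_DELAY_MS * BACKOFF_FACTOR ** (attempt - 1), MAX_DELAY_MS);
}

// Total time spent sleeping across `retries` consecutive failed retries.
function totalBackoffMs(retries: number): number {
  let total = 0;
  for (let attempt = 1; attempt <= retries; attempt++) {
    total += backoffDelayMs(attempt);
  }
  return total;
}

// 2 + 4 + 8 + 16 + 6 * 30 = 210 seconds of pure waiting under these assumptions.
console.log(`${totalBackoffMs(10) / 1000}s of backoff across 10 retries`);
```

Under these assumed values the sleeps alone take 3.5 minutes. Each rejected request also spends several seconds on the server before the 429 comes back, which would bring the total into the ballpark of the observed ~5 minutes, still far short of the ~20-minute recovery window.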

### Which area seems affected?

Model harness, prompts, tools, or session mechanics

### How much does this affect you?

Makes a workflow harder, but there is a workaround

### Steps to reproduce

1. Sign in to a Kimi for Coding plan account in PawWork (provider `kimi-for-coding`)
2. Select model Kimi K2.6 (`k2p6`)
3. Send any prompt that triggers a multi-tool round (e.g. "Read the project structure")
4. Repeat 2-3 times in close succession, or use a quota-strained account
5. Observe the "engine is currently overloaded" toast and the retry counter climbing 1/10, 2/10, ..., for ~5 minutes before final failure

### What did you expect to happen?

Fail fast on `rate_limit_error`: do not retry (Moonshot's plan-level rate limit will not reset within the retry window), and show a clear message telling the user that they have hit the Kimi for Coding plan rate limit and may want to wait or switch to another model. The current implementation hides the actual cause behind retries against what is fundamentally a quota wall.

### PawWork version

dev (preview build, version string starts with `0.0.0-...`)

### OS version

macOS 15

### Can you reproduce it again?

Yes, every time

### Diagnostics

**Confirmed root cause** (from sidecar `AI_APICallError` payload during investigation on 2026-05-18):

```
url:          https://api.kimi.com/coding/v1/messages
statusCode:   429
error.type:   rate_limit_error
error.message: The engine is currently overloaded, please try again later
server-timing: dur=6686-11235ms  (server processes the request, then rejects)
```

The "engine is currently overloaded" wording is Moonshot's own message body. It is NOT a real overload signal; it is how Moonshot phrases its plan rate-limit response. Other client projects have documented the same surface (gsd-build/gsd-2#4640, badlogic/pi-mono#3585, MoonshotAI/kimi-cli#907).
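Since the message text is unreliable, a client has to classify this response by status code and `error.type` alone. A minimal sketch of that check follows; `ApiCallFailure`, `parseErrorBody`, and `isRateLimitError` are hypothetical names, and the assumption that the raw body is available as a JSON string is mine, not something confirmed by the payload above.

```typescript
// Hypothetical shape of a failed call, mirroring the fields logged above.
interface ApiCallFailure {
  statusCode: number;
  responseBody?: string; // raw JSON text returned by the server (assumed)
}

interface ErrorBody {
  error?: { type?: string; message?: string };
}

// Parse defensively: a non-JSON body must never throw inside the classifier.
function parseErrorBody(raw: string | undefined): ErrorBody | undefined {
  if (!raw) return undefined;
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? (parsed as ErrorBody) : undefined;
  } catch {
    return undefined;
  }
}

// Decide from status and error.type only; error.message is deliberately ignored.
function isRateLimitError(failure: ApiCallFailure): boolean {
  const body = parseErrorBody(failure.responseBody);
  return failure.statusCode === 429 && body?.error?.type === "rate_limit_error";
}

const sample: ApiCallFailure = {
  statusCode: 429,
  responseBody: JSON.stringify({
    error: {
      type: "rate_limit_error",
      message: "The engine is currently overloaded, please try again later",
    },
  }),
};
console.log(isRateLimitError(sample)); // true
```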

**Ruled-out causes** (from the same investigation):

- Protocol gap: Not the cause. PR #739 widened the K2.x thinking transform so K2.6 now correctly sends `thinking: { type: enabled, budget_tokens: 16000 }`, but the 429 persists with the correct payload.
- User-Agent filtering: Not the cause. Side-by-side tests with two UA values (`opencode/<dev-version-containing-"anthropic"-substring>` vs `opencode/0.0.91`) on the same account showed similar 429 patterns; the UA-based filter hypothesis from external issues did not reproduce on the account we tested.
- Tool schema sanitization (Moonshot `$ref` siblings, tuple `items`): Not applicable to this 429, because the request payload is well-formed.

**Proposed fix surface**:

1. `packages/opencode/src/session/retry.ts`: when parsing the JSON error body, if `error.type === "rate_limit_error"`, return `undefined` (not retryable) instead of the current `"Rate Limited"` string. The change sits roughly at the `json.error?.code.includes("rate_limit")` branch (lines ~95-103 as of `d2cf1c86d`).
2. Toast / retry dialog text: when the surfaced error is a `rate_limit_error` from a Kimi-family provider, replace the literal `"engine is currently overloaded"` with a PawWork-authored message that names the cause ("Kimi for Coding rate limit reached. Wait a few minutes or switch to another model.") in both English and Chinese.
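The two changes above could look roughly like the sketch below. It is not the actual `retry.ts` code: `retryReason`, `toastText`, the `Locale` type, the retry labels other than `"Rate Limited"`, and the exact Chinese wording are all assumptions made for illustration, and matching the provider by the literal id `kimi-for-coding` is just one possible way to detect a Kimi-family provider.

```typescript
// Hypothetical names throughout; the real retry.ts may be structured differently.
type Locale = "en" | "zh";

interface ErrorBody {
  error?: { type?: string; code?: string; message?: string };
}

// Returns a short label when the error should be retried, or undefined to fail fast.
function retryReason(body: ErrorBody): string | undefined {
  const type = body.error?.type ?? "";
  const code = body.error?.code ?? "";
  // Proposed change: a plan-level rate limit is a quota wall, so do not retry.
  if (type === "rate_limit_error") return undefined;
  // Behavior described in the issue as current: code-based rate limits stay retryable.
  if (/rate_limit/.test(code)) return "Rate Limited";
  if (type === "server_error") return "Server Error";
  if (code === "server_is_overloaded") return "Provider is overloaded";
  return undefined;
}

// PawWork-authored copy; the zh string is an illustrative translation.
const RATE_LIMIT_TOAST: Record<Locale, string> = {
  en: "Kimi for Coding rate limit reached. Wait a few minutes or switch to another model.",
  zh: "已达到 Kimi for Coding 速率限制。请等待几分钟或切换到其他模型。",
};

// Replace the server's wording only for the Kimi rate-limit case.
function toastText(providerID: string, body: ErrorBody, locale: Locale): string {
  if (providerID === "kimi-for-coding" && body.error?.type === "rate_limit_error") {
    return RATE_LIMIT_TOAST[locale];
  }
  return body.error?.message ?? "Unknown error";
}

const kimi429: ErrorBody = {
  error: {
    type: "rate_limit_error",
    message: "The engine is currently overloaded, please try again later",
  },
};
console.log(retryReason(kimi429)); // undefined
console.log(toastText("kimi-for-coding", kimi429, "en"));
```

Keeping the retry decision and the toast copy in separate functions means the fail-fast rule applies to every provider that returns `rate_limit_error`, while the rewritten message stays scoped to Kimi.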

**Out of scope for this issue**:

- Automatic provider fallback (e.g. silently switch to volcengine Kimi). Rejected in design discussion: there are too many auth/model-id permutations, so the choice is left to the user.
- Reducing retry count to 2-3 as a middle ground. Rejected: the underlying quota wall does not reset in seconds, so any retry on `rate_limit_error` is wasted time and misleading UX. Fail fast is cleaner than partial retry.
- Any change to the OpenAI-compatible Kimi paths (volcengine plan, moonshot.cn, moonshot.ai); those paths already have their own reasoning_content handling and are not affected by this UX bug.

**Acceptance criteria**:

- A request that triggers Moonshot `rate_limit_error` shows a Kimi-rate-limit-specific toast within seconds (no exponential retry climb).
- The original misleading "engine is currently overloaded" wording is not shown to the user in the PawWork UI for this error type.
- Existing retry behavior for transient `server_error` / 5xx / true overload is unchanged.
- Test coverage in `test/session/retry.test.ts` locks in: `rate_limit_error` is not retryable; `server_error` still is.
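The last criterion can be expressed as a small table of cases. The sketch below is framework-agnostic so that it runs on its own; in `test/session/retry.test.ts` the same cases would be written with the project's test runner. `retryReason` here is a hypothetical stand-in for the function under test, and `failedCases` is an illustrative helper.

```typescript
interface ErrorBody {
  error?: { type?: string; code?: string; message?: string };
}

// Stand-in for the function under test (assumed behavior after the fix).
function retryReason(body: ErrorBody): string | undefined {
  if (body.error?.type === "rate_limit_error") return undefined;
  if (body.error?.type === "server_error") return "Server Error";
  if (body.error?.code === "server_is_overloaded") return "Provider is overloaded";
  return undefined;
}

interface RetryCase {
  name: string;
  body: ErrorBody;
  retryable: boolean;
}

const cases: RetryCase[] = [
  {
    name: "Kimi plan rate limit fails fast",
    body: {
      error: {
        type: "rate_limit_error",
        message: "The engine is currently overloaded, please try again later",
      },
    },
    retryable: false,
  },
  { name: "server_error is still retried", body: { error: { type: "server_error" } }, retryable: true },
  { name: "true overload is still retried", body: { error: { code: "server_is_overloaded" } }, retryable: true },
];

// Names of the cases whose retry decision differs from the expectation.
function failedCases(): string[] {
  return cases
    .filter((c) => (retryReason(c.body) !== undefined) !== c.retryable)
    .map((c) => c.name);
}

console.log(failedCases()); // an empty array means every case behaves as expected
```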

**Related**:

- PR #739 (merged `d2cf1c86d`): widened K2.x thinking transform — fixed a separate latent gap, not this one
- Upstream `25ecf0af6` (#25888): added `server_is_overloaded` to retryable cases — already cherry-picked, also not the fix for this

