fix(client): translate DeepSeek 429 into a concurrency-limit hint by esengine · Pull Request #1526 · esengine/DeepSeek-Reasonix

esengine · 2026-05-22T06:26:55Z

Summary

Translate DeepSeek 429 into a user-friendly concurrency-limit hint, and clarify that the rateLimit.rpm config is a client-side self-throttle (not a DeepSeek-enforced cap). Closes #1522.

Scope decision (intentionally narrow)

The "new rate limit" docs the issue references are concurrency-based: 500 in-flight requests per account for deepseek-v4-pro, 2500 for deepseek-v4-flash, summed across all API keys on the account. A single-user CLI doing one turn at a time can never approach this — only really hit when:

the user has 3+ reasonix processes sharing one key, or
a `spawn_subagent` parallel fan-out plus nested subagents piles up beyond 500 in-flight requests

So this PR is only about telling the user what is happening when a 429 does fire. Explicitly not in scope:

Implementing client-side concurrency tracking: pointless when DeepSeek already enforces the limit server-side and our single-turn loop cannot saturate it on its own.
Adding `user_id` parameter support: useful for multi-tenant scenarios (dashboard, cc-connect bridge), but per RFC #410 (Chat-platform bridge: Feishu / WeWork / WeChat), chat-bridge work lives outside core. Reasonix-core stays single-user.
Keep-alive parsing: already handled (the comment at client.ts:117-126 confirms that SSE comments and non-stream empty lines are invisible to our parsers).
10-minute server-side queue timeout: already covered by the existing 11-minute client timeout.
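The keep-alive point can be illustrated with a minimal sketch. It is not the code in `client.ts`; the function name `extractSseData` is hypothetical, and it only handles `data:` fields. It shows why SSE comment lines (lines starting with `:`) and blank lines never reach a payload parser.

```typescript
// Hypothetical sketch, not Reasonix's client.ts: collects `data:` payloads from a
// chunk of SSE text while skipping comment lines and blank lines.
function extractSseData(chunk: string): string[] {
  const payloads: string[] = [];
  for (const line of chunk.split(/\r?\n/)) {
    // Blank lines only separate events; they carry no payload.
    if (line === "") continue;
    // Lines starting with ":" are SSE comments (e.g. a keep-alive ping).
    if (line.startsWith(":")) continue;
    if (line.startsWith("data:")) {
      // The SSE format drops a single space after the field colon.
      const value = line.slice("data:".length);
      payloads.push(value.startsWith(" ") ? value.slice(1) : value);
    }
    // Other fields (event:, id:, retry:) are ignored in this sketch.
  }
  return payloads;
}
```

For the non-stream case, assuming the padding is plain whitespace, `JSON.parse` already tolerates leading and trailing newlines around a JSON body, so empty lines are harmless there too.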

What this PR does

1. Friendly 429 error message

`src/loop/errors.ts`: add a 429 → `t("errors.concurrency429", { inner })` case alongside the existing 401/402/422/400 cases.

`src/i18n/{EN,zh-CN}.ts` + `types.ts`: new `errors.concurrency429` key. The message names the actual caps (500 pro / 2500 flash), identifies the likely cause (another reasonix process on the same key, or a fan-out that overshot), and points at the remediation (wait and retry, reduce parallelism, or request a higher cap at platform.deepseek.com).
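A minimal sketch of the status-to-hint dispatch, assuming a simple `{placeholder}` template format. The `messages` table, the `t()` stand-in, and `friendlyApiError` are hypothetical; the real implementation lives in `src/loop/errors.ts` and `src/i18n/*.ts` and may differ.

```typescript
// Hypothetical sketch: names and template format are illustrative only.
type Params = Record<string, string>;

const messages: Record<string, string> = {
  "errors.concurrency429":
    "DeepSeek concurrency limit hit (429): {inner}. " +
    "The account has too many in-flight requests (cap: 500 for v4-pro, 2500 for v4-flash, " +
    "summed across API keys account-wide). Wait a few seconds and retry, reduce parallelism, " +
    "or request a higher cap at https://platform.deepseek.com.",
};

// Minimal stand-in for an i18n `t()` helper: looks up a key and fills {placeholders}.
function t(key: string, params: Params = {}): string {
  const template = messages[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => params[name] ?? match);
}

function friendlyApiError(status: number, inner: string): string {
  switch (status) {
    case 429:
      return t("errors.concurrency429", { inner });
    // 401/402/422/400 would map to their own keys in the same way.
    default:
      // Unknown statuses keep the raw "DeepSeek <status>: <reason>" shape.
      return `DeepSeek ${status}: ${inner}`;
  }
}
```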

Before:

DeepSeek 429: {"error":{"message":"Too Many Requests, please reduce concurrency"}}

After:

DeepSeek concurrency limit hit (429): Too Many Requests, please reduce concurrency. The account has too many in-flight requests (cap: 500 for v4-pro, 2500 for v4-flash, summed across API keys account-wide). Usually means another Reasonix process is sharing the same key, or a parallel subagent fan-out overshot. Wait a few seconds and retry, reduce parallelism, or request a higher cap at https://platform.deepseek.com.
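Getting from the Before string to the After string requires pulling `error.message` out of the JSON body. One way to do that is sketched below, assuming the raw error has the `DeepSeek <status>: <body>` shape shown above; `parseRawApiError` and `ParsedApiError` are hypothetical names, not Reasonix exports.

```typescript
// Hypothetical helper: splits a raw "DeepSeek <status>: <body>" string and pulls
// error.message out of a JSON body, falling back to the raw body text.
interface ParsedApiError {
  status: number;
  inner: string;
}

function parseRawApiError(raw: string): ParsedApiError | null {
  const match = /^DeepSeek (\d{3}): ([\s\S]*)$/.exec(raw);
  if (match === null) return null;
  const status = Number(match[1]);
  const body = (match[2] ?? "").trim();
  try {
    const parsed = JSON.parse(body) as { error?: { message?: unknown } } | null;
    const message = parsed?.error?.message;
    if (typeof message === "string") return { status, inner: message };
  } catch {
    // Not JSON: fall through and use the body verbatim.
  }
  return { status, inner: body };
}
```

Falling back to the raw body keeps non-JSON errors (for example a proxy's plain-text error page) readable instead of dropping them.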

2. Clarify `rateLimit.rpm` docstring

`src/config.ts`: `RateLimitConfig.rpm` is a client-side self-throttle paced by a min-interval timer; it never mapped to a DeepSeek-enforced cap (DeepSeek has never published an RPM ceiling). The docstring now spells that out so users don't assume that setting `rpm: 60` does anything DeepSeek-related.
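To make the distinction concrete, here is a sketch of what a min-interval self-throttle does, assuming `rpm` is converted into a fixed gap between request starts. `MinIntervalThrottle`, `reserve`, and `acquire` are illustrative names, not the actual consumer of `RateLimitConfig`. Nothing in it observes DeepSeek's responses or the account's in-flight count.

```typescript
// Hypothetical sketch of an rpm-style self-throttle: spaces out request starts
// on the client only; it never consults the server.
class MinIntervalThrottle {
  private readonly intervalMs: number;
  private readonly now: () => number;
  private nextSlotMs = 0;

  constructor(rpm: number, now: () => number = Date.now) {
    if (!(rpm > 0)) throw new RangeError("rpm must be a positive number");
    // e.g. rpm: 60 gives a 1000 ms gap between request starts.
    this.intervalMs = 60_000 / rpm;
    this.now = now;
  }

  /** Reserves the next start slot and returns how long the caller should wait (ms). */
  reserve(): number {
    const t = this.now();
    const start = Math.max(t, this.nextSlotMs);
    this.nextSlotMs = start + this.intervalMs;
    return start - t;
  }

  /** Waits until the reserved slot before letting the caller send its request. */
  async acquire(): Promise<void> {
    const waitMs = this.reserve();
    if (waitMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Injecting `now` keeps the pacing logic testable without real timers. Because it only spaces out request starts, it does nothing to bound how many requests are in flight at once, which is the quantity DeepSeek's 429 actually reflects.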

Test Plan

`tests/loop-error.test.ts`: new case asserting that `DeepSeek 429: {...}` produces a message containing "concurrency limit", "500", "2500", the inner reason, and the platform URL
`npx vitest run tests/loop-error.test.ts tests/comment-policy.test.ts`: 38 pass
`npx tsc --noEmit`: clean
`npx biome check` on touched files: clean
`npm run verify` via pre-push hook: green

Test Plan (manual)

Not run: there is no easy way to trigger a 429 from a single CLI process against DeepSeek's real concurrency cap without spinning up 500+ pro requests. The unit test covers the error-formatting path, which is the only thing this PR changes.

DeepSeek's actual rate limit is concurrency-based, not RPM (500 for v4-pro, 2500 for v4-flash, account-wide across API keys). When the account exceeds it, the server returns HTTP 429, which currently surfaces as a raw `DeepSeek 429: {...body}` string: fine for a maintainer to read, useless for a user trying to figure out why. Add a friendly message that names the actual cap, identifies the likely cause (another reasonix process on the same key, or a parallel subagent fan-out that overshot), and points at the remediation (wait + retry, reduce parallelism, or request expansion). Same shape as the existing 401/402/422/400 cases. Also clarify the `rateLimit.rpm` config docstring: it's a client-side self-throttle for politeness on shared infra, not a DeepSeek-enforced limit. DeepSeek has never had a published RPM cap. Closes #1522

…se (#1565)

* chore(release): 0.49.0 - static-history TUI, queued steers, Bing default, lifecycle plans

Headline themes:
- TUI: Static-history renderer is the only path; virtual-viewport layers removed (#1529 stages 1-4)
- Chat: queued mid-turn steer handling so input mid-render doesn't drop or fight the live frame (#1501)
- Web search: default switches to Bing; dashboard engine switcher; Mojeek dropped (#1558)
- Plans: lifecycle evidence summaries surface why a plan is ready to accept (#1500)
- Desktop: native OS notifications for approvals + completion (#1519)
- i18n: CLI command output (/mcp /sessions /prune /theme) + approval-prompt labels translated (#1524, #1560)
- Security: SSRF block in web_fetch (#1544), edit-snapshot path containment (#1454), shell redirect sandbox (#1457), Task integrity guardrail (#1516)
- Tools: per-turn dispatch-rate limit (#1356); run_command discourages shell-based edits (#1514)
- Client: DeepSeek 429 → concurrency-limit hint (#1526); timeoutMs honored with AbortSignal (#1535); --no-proxy opt-out for direct route (#1507)
- Files: read/edit/restore preserves source encoding (GB18030 / UTF-8 BOM) (#1518)
- Context: pinned constraints survive folds + full tail capture (#1515, #1552)
- Refactor: lifecycle risk policy extracted into its own module (#1557)

See CHANGELOG for the full list.

* fix(context): align fold summary prefix with main agent for cache reuse

The summarizer call was sending a bespoke "You compress conversation history" system prompt and no tools, guaranteeing a 0% cache hit against the main agent's just-cached prefix. Reshape the request so system + tools + head bytes mirror the live agent's last call; the only novel bytes are the trailing summarize instruction. Skill-pin handling now collects bodies read-only instead of stubbing mid-head, so the cache prefix stays unbroken. The summarize instruction names pinned skills so the model knows not to paraphrase their bodies (which we append verbatim regardless).

Measured on a real session at 48.7K prompt tokens:
OLD shape: 0.0% cache hit → $0.145 per fold
NEW shape: 99.6% cache hit → $0.015 per fold
saving: 89.6% per fold

* tools: add fold-cache shape + live benchmarks

bench-fold-cache-shape.mjs replays real session jsonls, simulates OLD vs NEW summary-call shapes at the fold point, and reports byte-level shared-prefix with the main agent's preceding request. Pure local; no API required.

bench-fold-cache-live.mjs sends one priming + two summary calls to DeepSeek and reports prompt_cache_hit_tokens / cost for each shape. Used to confirm the shape change actually translates to API-side cache hits.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
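The byte-level shared-prefix figure that bench-fold-cache-shape.mjs is described as reporting can be approximated as below. This is a hypothetical sketch, not that script: it compares the UTF-8 bytes of two serialized request bodies, and `sharedPrefixBytes` / `sharedPrefixRatio` are illustrative names. Byte overlap is only a proxy, since the server reports cache hits in tokens (`prompt_cache_hit_tokens`), not bytes, which is why the live benchmark is still needed to confirm actual hits.

```typescript
// Hypothetical sketch: length of the common byte prefix of two request bodies.
// It counts raw UTF-8 bytes, so a mismatch inside a multi-byte character still
// credits that character's leading bytes.
function sharedPrefixBytes(a: string, b: string): number {
  const encoder = new TextEncoder();
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  const limit = Math.min(x.length, y.length);
  let i = 0;
  while (i < limit && x[i] === y[i]) i++;
  return i;
}

// Fraction of the next request's bytes that repeat the previous request's prefix.
function sharedPrefixRatio(previousRequest: string, nextRequest: string): number {
  const total = new TextEncoder().encode(nextRequest).length;
  return total === 0 ? 0 : sharedPrefixBytes(previousRequest, nextRequest) / total;
}
```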

esengine merged commit c9f8bc1 into main May 22, 2026
4 checks passed

esengine deleted the fix/deepseek-429-concurrency-message branch May 22, 2026 06:30


esengine merged 1 commit into main from fix/deepseek-429-concurrency-message
