fix(serve): Add prompt queue backpressure #5033
Add per-session prompt admission limits across the bridge, REST and ACP entrypoints, and SDK clients. The server now rejects full prompt queues before returning accepted semantics, advertises the active limit through capabilities, and documents the behavior with focused tests and design artifacts. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Close the mocked SSE stream explicitly in the pending prompt cap test so cleanup does not rely on abort-driven stream cancellation timing in CI. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Reject accepted subscription prompts if the event stream has already ended, and make the prompt-cap tests wait for the pending registration before closing or injecting SSE frames. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Map server-side prompt_queue_full responses to DaemonPendingPromptLimitError for both blocking and non-blocking prompt calls, include the session id in the local limit error, and cross-reference the duplicated default prompt cap constants. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
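One of the commits above maps server-side prompt_queue_full responses to DaemonPendingPromptLimitError for both blocking and non-blocking prompt calls, so a caller sees the same error type whether the local guard or the server rejected the prompt. The sketch below illustrates that idea under assumptions. PendingPromptLimitError is a hypothetical stand-in whose constructor and fields may differ from the SDK's real class. It also assumes the REST body carries code: "prompt_queue_full", as described in the PR body.

```typescript
// Hypothetical stand-in for the SDK's typed limit error (shape assumed).
class PendingPromptLimitError extends Error {
  readonly sessionId: string;
  readonly origin: "local" | "server";

  constructor(sessionId: string, origin: "local" | "server") {
    super(`pending prompt limit reached for session ${sessionId} (${origin})`);
    this.name = "PendingPromptLimitError";
    this.sessionId = sessionId;
    this.origin = origin;
  }
}

// Throws the typed error for a 503 prompt_queue_full reply. Any other
// response is left to the caller's normal error handling.
function throwIfPromptQueueFull(sessionId: string, status: number, body: unknown): void {
  if (status !== 503 || typeof body !== "object" || body === null) return;
  if ((body as { code?: unknown }).code === "prompt_queue_full") {
    throw new PendingPromptLimitError(sessionId, "server");
  }
}
```

Routing both the blocking and the non-blocking call paths through one helper like this keeps the mapping in a single place.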
wenshao
left a comment
Test single comment anchor
    )
  );
} catch (err) {
  if (deadlineTimer !== undefined) clearTimeout(deadlineTimer);
  () => undefined,
  () => undefined,
);
result.finally(releasePromptSlot).catch(() => {});
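The last line of the excerpt ties the prompt slot release to the prompt's settlement. Below is a minimal standalone sketch of that admission pattern. PromptAdmissionGate and PromptQueueFullError are hypothetical names, not the PR's actual bridge code. The sketch relies on three points:

- The cap check and its synchronous throw happen before any promise is returned.
- `.finally()` releases the slot whether the prompt resolves or rejects.
- The trailing `.catch(() => {})` only silences the derived promise that `.finally()` returns, because that promise re-rejects with the original reason.

```typescript
// Hypothetical error raised when a session already holds `limit` pending prompts.
class PromptQueueFullError extends Error {
  readonly sessionId: string;
  readonly limit: number;

  constructor(sessionId: string, limit: number) {
    super(`prompt queue full for session ${sessionId} (limit ${limit})`);
    this.name = "PromptQueueFullError";
    this.sessionId = sessionId;
    this.limit = limit;
  }
}

// Hypothetical per-session admission gate; a null limit disables the cap.
class PromptAdmissionGate {
  private readonly pending = new Map<string, number>();
  private readonly maxPending: number | null;

  constructor(maxPending: number | null) {
    this.maxPending = maxPending;
  }

  pendingCount(sessionId: string): number {
    return this.pending.get(sessionId) ?? 0;
  }

  // Throws synchronously when the session is full, before any work starts.
  submit<T>(sessionId: string, run: () => Promise<T>): Promise<T> {
    const count = this.pendingCount(sessionId);
    if (this.maxPending !== null && count >= this.maxPending) {
      throw new PromptQueueFullError(sessionId, this.maxPending);
    }
    this.pending.set(sessionId, count + 1);

    let released = false;
    const releasePromptSlot = () => {
      if (released) return; // guard against double release
      released = true;
      const next = this.pendingCount(sessionId) - 1;
      if (next > 0) this.pending.set(sessionId, next);
      else this.pending.delete(sessionId);
    };

    let result: Promise<T>;
    try {
      result = run();
    } catch (err) {
      releasePromptSlot(); // run() threw before returning a promise
      throw err;
    }
    // Release on either outcome; swallow only the derived promise's rejection.
    result.finally(releasePromptSlot).catch(() => {});
    return result;
  }
}

// Usage: five prompts that never settle fill the default cap of 5.
const gate = new PromptAdmissionGate(5);
const neverSettles = () => new Promise<string>(() => {});
for (let i = 0; i < 5; i++) void gate.submit("s1", neverSettles);

let sixthRejected = false;
try {
  void gate.submit("s1", neverSettles);
} catch (err) {
  sixthRejected = err instanceof PromptQueueFullError; // thrown synchronously
}

// A prompt that fails still gives its slot back.
const failed = gate.submit("s2", () => Promise.reject(new Error("turn failed")));
```

The usage also shows the failure mode raised in the review comment below. If a prompt in "s1" never settles, its slot is never released.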
[Suggestion] The FIXME(stage-2) at bridge.ts:2808-2815 acknowledges that the bridge has no absolute prompt deadline. A wedged agent that ignores cancel() while keeping the channel alive can hold the prompt promise open indefinitely. In that case releasePromptSlot never fires and pendingPromptCount stays permanently elevated. Five such hangs fill the default cap and brick the session, and they can accumulate over time with a flaky LLM provider. The session then has no self-healing and no runtime inspection surface (see the related gap at bridge.ts:228: pendingPromptCount is not exposed anywhere).
The design doc states this is out of scope ("existing bridge follow-up remains separate from admission backpressure"). However, this PR introduces an admission cap whose effectiveness depends on slot release, so at minimum a stall warning would help on-call. If pendingPromptCount has stayed at the cap longer than some threshold (e.g. 2× promptDeadlineMs or a separate --prompt-stall-timeout-ms), emit a daemonLog.warn with the session ID and current count.
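The suggested stall warning could take roughly the following shape. This is a sketch under assumptions: PromptStallMonitor, its update/check methods and the injected warn callback are hypothetical, and nothing here is in the PR. In the daemon, warn would presumably forward to daemonLog.warn. The monitor records when a session's pending count first reaches the cap. It warns once per stall episode if the count is still at the cap after the threshold.

```typescript
// Hypothetical stall detector; names and threshold policy are illustrative.
type StallWarn = (message: string, fields: Record<string, unknown>) => void;

interface AtCapEntry {
  since: number; // time the session first reached the cap
  count: number; // latest pending prompt count
  warned: boolean; // warn once per stall episode
}

class PromptStallMonitor {
  private readonly atCap = new Map<string, AtCapEntry>();
  private readonly cap: number;
  private readonly stallThresholdMs: number;
  private readonly warn: StallWarn;

  constructor(cap: number, stallThresholdMs: number, warn: StallWarn) {
    this.cap = cap;
    this.stallThresholdMs = stallThresholdMs;
    this.warn = warn;
  }

  // Call whenever a session's pending count changes (admit or release).
  update(sessionId: string, pendingCount: number, now: number): void {
    if (pendingCount < this.cap) {
      this.atCap.delete(sessionId); // dropping below the cap ends the episode
      return;
    }
    const entry = this.atCap.get(sessionId);
    if (entry) entry.count = pendingCount;
    else this.atCap.set(sessionId, { since: now, count: pendingCount, warned: false });
  }

  // Call periodically, for example from a timer.
  check(now: number): void {
    for (const [sessionId, entry] of this.atCap) {
      if (entry.warned || now - entry.since < this.stallThresholdMs) continue;
      entry.warned = true;
      this.warn("pending prompts stuck at cap", {
        sessionId,
        pendingPromptCount: entry.count,
        stalledForMs: now - entry.since,
      });
    }
  }
}

// Usage with the suggested 2x promptDeadlineMs threshold (values illustrative).
const warnings: string[] = [];
const promptDeadlineMs = 60_000;
const monitor = new PromptStallMonitor(5, 2 * promptDeadlineMs, (message, fields) => {
  warnings.push(`${message} session=${String(fields.sessionId)} count=${String(fields.pendingPromptCount)}`);
});
monitor.update("s1", 5, 0);
monitor.check(60_000); // stalled for 60s: below threshold
monitor.check(120_000); // stalled for 120s: warns
monitor.check(180_000); // same episode: no second warning
```

Timestamps are passed in rather than read from a clock, so the sketch can be exercised deterministically.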
— qwen3.7-max via Qwen Code /review
Local real-run verification (maintainer, macOS): backpressure works end-to-end on every surface; needs a rebase + two merge-time notes
I built PR head
Bottom line: the admission cap works exactly as documented on every surface I could reach: REST 503 shape, ACP
Environment
Static checks
The two subscription-mode tests I flagged as hanging/failing in my first review are fixed by the
Live behavior matrix (all real daemons, real prompts)
The SDK 503→typed-error mapping (
Control experiment: the odd ACP settle shapes are pre-existing, not introduced by this PR
When I fired 6 blocking
Merge-time notes (the actionable bits)
Not covered by this run
Recommendation
Functionally this is mergeable: every advertised behavior was verified end-to-end against real daemons, including the cross-surface cap accounting that is hardest to get right. Gate merge on: (1) rebase to resolve the
@qwen-code /triage
Thanks for the PR! Template looks good ✓

On direction: this is a natural and well-motivated follow-up to #4490. Prompt deadlines bounded execution time but left admission unbounded. That gap is real, and closing it with a per-session cap mirrors the existing

On approach: the three-layer design is sound, and each layer earns its existence. The bridge is the authoritative gate, REST/ACP map rejections to a stable 503 shape, and the SDK local guard fails early. The SDK guard prevents wasteful SSE connections, not just redundant HTTP requests. The

One observation:

Minor nit: the PR title says

Moving on to code review. 🔍

— Qwen Code · qwen3.7-max
Code review
Reviewed the full diff (23 files, +1421/-87). The implementation is clean and well-structured; no critical blockers found.
Independent proposal comparison: my baseline was "add a
Key correctness points verified in code:
Non-blocking observations:
Test results
Unit tests (all passed)
Total: 955/955 passed, 0 failures. Typecheck passed for all packages.
Real-scenario testing
Default cap (5):
Per-file unit test results:

| Test file | Tests | Result |
|---|---|---|
| acp-bridge/bridge.test.ts | 261 | ✅ All passed |
| cli/serve/server.test.ts | 429 | ✅ All passed |
| cli/serve/acpHttp/transport.test.ts | 92 | ✅ All passed |
| sdk-typescript/DaemonClient.test.ts + DaemonSessionClient.test.ts | 173 | ✅ All passed |

Real-scenario summary: all three scenarios (default cap = 5, disabled cap = 0, invalid value -5) behave as the design doc describes. The limits field is additive to capabilities, and a null sentinel means the cap is disabled.
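A client that wants to size its own queue from the advertised limit has to treat null as "no cap". It may also want to tolerate a missing field, since limits is additive. Below is a minimal sketch. The response interface models only the field path named in the PR (/capabilities.limits.maxPendingPromptsPerSession), and advertisedPromptCap is a hypothetical helper.

```typescript
// Assumed response shape: only the field path named in the PR is modeled.
interface CapabilitiesResponse {
  limits?: {
    maxPendingPromptsPerSession?: number | null; // null: cap disabled
  };
}

// Returns the advertised cap, or undefined when the cap is disabled (null)
// or the server does not report the field at all.
function advertisedPromptCap(caps: CapabilitiesResponse): number | undefined {
  const value = caps.limits?.maxPendingPromptsPerSession;
  return typeof value === "number" ? value : undefined;
}

const examples = [
  advertisedPromptCap({ limits: { maxPendingPromptsPerSession: 5 } }), // 5
  advertisedPromptCap({ limits: { maxPendingPromptsPerSession: null } }), // undefined
  advertisedPromptCap({}), // undefined
];
```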
— Qwen Code · qwen3.7-max
Final verdict: ✅ Approve
Stepping back, this is a solid, well-executed infrastructure PR. The motivation is clear: #4490 added prompt deadlines but left the admission door wide open, and this PR closes that gap with the minimal effective change.
The counter-and-release pattern is the simplest correct solution. The three-layer design (bridge gate → REST/ACP mapping → SDK early-fail) is sound, with each layer earning its existence rather than being speculative. The slot release logic is defensive (double-release guard,
955 tests pass and typecheck is clean. Real-scenario testing confirms that the capabilities endpoint advertises the limit correctly in the default (5), disabled (null), and invalid (boot rejection) cases.
My independent proposal matched the PR's approach; I didn't find a simpler path it missed. The scope feels right: the SDK local guard adds complexity but prevents wasteful SSE connections, not just redundant HTTP round-trips.
Two non-blocking notes for the author's awareness (neither blocks the merge):
Approving. 🚢
— Qwen Code · qwen3.7-max
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
qqqys
left a comment
Prior critical feedback remains resolved on the current head; I did not find a new critical blocker in this pass.
Log synchronous prompt queue rejections, document the sync admission contract, clean up SDK prompt-slot release, and cover the reviewed backpressure edge cases. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Reduce the generated daemon client bundle slightly and raise the browser daemon SDK bundle budget to 116 KiB so the PR merge ref has practical headroom. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@qwen-code /triage
Thanks for the PR! Template looks good ✓ All required headings are present, with a bilingual body.

On direction: this is a natural and necessary follow-up to #4490. Prompt deadlines bounded execution time but left admission unbounded: a single session could enqueue unlimited accepted prompts, accumulating promises, SSE waits, and closures. Adding an explicit admission cap with retryable semantics is the right fix. The area is squarely within

On approach: the layered defense (bridge admission → REST 503 → ACP error data → SDK local reservation → capabilities advertisement) feels proportionate to the problem. Each layer has a clear role: the bridge is authoritative, the SDK is an early-fail guard, and capabilities lets clients self-configure. The default of 5 is reasonable for single-user and small-team usage, and the escape hatch (

One observation: the PR bundles the design doc, E2E test plan, docs update, and implementation in a single commit set. That is thorough but makes the diff large (+1677 across 24 files). The implementation itself is well-scoped, and the extras are all justified by project conventions.

Moving on to code review. 🔍

— Qwen Code · qwen3.7-max
Code review
Reviewed the full diff (24 files, +1677/-89). The implementation is careful and well-structured.

Independent proposal: for unbounded prompt queues, I'd add a per-session counter in the bridge, throw synchronously when the cap is hit, and map that to HTTP 503 in the REST route. Optionally, I'd add SDK-side local accounting as an early-fail guard. That's essentially what this PR does.

Findings: The bridge changes are solid. The REST route wraps
The SDK-side
No critical blockers found. The validation logic is duplicated across three layers (bridge constructor,
Minor observation: a few error messages were shortened (

Test results
All focused unit tests pass: 973 tests, 0 failures. Typecheck clean. New tests specifically validate: sixth prompt rejected at default cap, slot release after success/failure/abort/SSE-end, pre-aborted prompts skip the counter,

No TUI changes: this is non-UI serve/bridge/SDK behavior, so tmux testing is N/A per the PR's own test plan.

— Qwen Code · qwen3.7-max
Final verdict: ✅ Approve
Stepping back, this is a well-designed, well-tested addition to the serve layer's load-management story. The motivation is clear (unbounded admission after #4490), the implementation matches the stated design, and test coverage is thorough across all four affected packages.
My independent proposal (per-session counter, synchronous throw, HTTP 503 mapping, optional SDK guard) closely matches the PR's approach. The PR exceeds it in polish: idempotent slot release, pre-abort fast-path, capabilities advertisement, stable ACP error data, and the
The scope is large in file count (24), but each file has a clear purpose. The implementation code itself is tight; the bulk comes from tests and documentation, which is the right ratio for a load-management feature. The three-layer validation (bridge, runQwenServe, CLI) follows established project patterns.
973 tests pass. Typecheck clean. No correctness issues found in the code review.
LGTM, ships the feature cleanly. ✅
— Qwen Code · qwen3.7-max
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
What this PR does
This PR adds per-session prompt admission backpressure for qwen serve. The bridge now owns the authoritative pending prompt cap. It defaults the cap to 5 accepted-but-unsettled prompts per session and rejects overflow before the REST route returns accepted semantics. REST responses use a stable 503 shape with Retry-After: 5, ACP JSON-RPC exposes stable prompt_queue_full error data, and /capabilities advertises the active numeric limit, or null when disabled. The CLI and embedded serve options can configure or disable the cap. The TypeScript SDK adds local per-session reservation so clients fail early, before opening unbounded temporary SSE waits.
The change also documents the behavior in the serve user guide and adds design and E2E test-plan artifacts for the prompt queue cap semantics.
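The REST and ACP surfaces described above translate the same bridge rejection into two wire shapes. The sketch below shows that translation under loud assumptions. QueueFullInfo, toRestReply and toJsonRpcError are hypothetical. Apart from code: "prompt_queue_full", the REST body fields are illustrative. The JSON-RPC error code is an assumed value from the implementation-defined server-error range, and the PR's real code may differ.

```typescript
// Hypothetical description of a rejected admission (not the PR's type).
interface QueueFullInfo {
  sessionId: string;
  limit: number;
}

interface RestReply {
  status: number;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

// REST: 503 with Retry-After: 5 and a stable code. No promptId is
// included because the prompt was never accepted.
function toRestReply(info: QueueFullInfo): RestReply {
  return {
    status: 503,
    headers: { "Retry-After": "5" },
    body: {
      code: "prompt_queue_full",
      message: `session ${info.sessionId} already has ${info.limit} pending prompts`,
    },
  };
}

// Assumed code from the JSON-RPC server-error range (-32000 to -32099).
const ASSUMED_QUEUE_FULL_RPC_CODE = -32000;

// ACP: a JSON-RPC error whose data carries a stable errorKind.
function toJsonRpcError(id: number | string, info: QueueFullInfo) {
  return {
    jsonrpc: "2.0" as const,
    id,
    error: {
      code: ASSUMED_QUEUE_FULL_RPC_CODE,
      message: "prompt queue full",
      data: { errorKind: "prompt_queue_full", sessionId: info.sessionId },
    },
  };
}
```

Clients can branch on the stable code and errorKind strings rather than parsing messages.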
Why it's needed
After #4490, prompt deadlines limited accepted REST prompt execution time but did not bound admission into the bridge prompt FIFO or SDK-side pending prompt tracking. A single session could still enqueue unbounded accepted prompts, accumulating promises, SSE waits, closures, and eventual agent work. This PR adds an explicit admission cap so overload is rejected with retryable semantics instead of being accepted and left to accumulate.
Reviewer Test Plan
How to verify
Run the focused bridge tests and confirm that:
- the sixth default-cap prompt is synchronously rejected;
- slots release after success or failure;
- pre-aborted prompts do not occupy a slot;
- branchSession does not count against the prompt cap.

Run the focused CLI serve tests and confirm that:
- when the bridge rejects admission, REST returns 503, Retry-After: 5, and code: "prompt_queue_full", with no promptId;
- /capabilities.limits.maxPendingPromptsPerSession reports the default, explicit value, and disabled state;
- invalid programmatic limits are rejected at serve boot.

Run the focused ACP HTTP tests and confirm that session/prompt returns stable JSON-RPC error data with errorKind: "prompt_queue_full" instead of the generic internal shape.

Run the focused TypeScript SDK tests and confirm that:
- local prompt cap rejection does not issue a fetch;
- reservations release on completion, turn errors, caller abort, and SSE end;
- pre-aborted prompts do not reserve a slot.
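The SDK checks above suggest a client-side reservation table along these lines. The checks are: no fetch on local rejection, release on completion, turn error, caller abort and SSE end, and no reservation for pre-aborted prompts. This is a minimal sketch under assumptions. LocalPromptReservations and its methods are hypothetical, not the SDK's API. The sketch throws plain Error values, whereas the SDK uses DaemonPendingPromptLimitError, which includes the session id.

```typescript
// Hypothetical client-side reservation table (not the SDK's real API).
class LocalPromptReservations {
  private readonly counts = new Map<string, number>();
  private readonly limit: number | null; // null: no local cap

  constructor(limit: number | null) {
    this.limit = limit;
  }

  pending(sessionId: string): number {
    return this.counts.get(sessionId) ?? 0;
  }

  // Returns an idempotent release function, or throws before any
  // network request would be made.
  reserve(sessionId: string, signal?: AbortSignal): () => void {
    if (signal?.aborted) {
      // Pre-aborted: fail without touching the counter.
      throw new Error("prompt aborted before it was sent");
    }
    const current = this.pending(sessionId);
    if (this.limit !== null && current >= this.limit) {
      throw new Error(`local prompt cap (${this.limit}) reached for session ${sessionId}`);
    }
    this.counts.set(sessionId, current + 1);

    let released = false;
    return () => {
      if (released) return; // calling release twice is a no-op
      released = true;
      const next = this.pending(sessionId) - 1;
      if (next > 0) this.counts.set(sessionId, next);
      else this.counts.delete(sessionId);
    };
  }
}

// Usage with a cap of 2.
const reservations = new LocalPromptReservations(2);
const releaseA = reservations.reserve("s1");
reservations.reserve("s1");

let capped = false;
try {
  reservations.reserve("s1");
} catch {
  capped = true; // third reservation rejected locally
}

releaseA();
releaseA(); // idempotent: does not free a second slot

const controller = new AbortController();
controller.abort();
let abortedEarly = false;
try {
  reservations.reserve("s1", controller.signal);
} catch {
  abortedEarly = true; // no slot consumed
}
```

Because release is idempotent, a client can call it from every exit path (completion, turn error, abort, SSE end) without double-counting.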
Evidence (Before & After)
N/A. This is non-UI serve/bridge/SDK behavior.
Tested on
Environment (optional)
Local macOS worktree with Node.js v26.0.0. Validation commands run:
cd packages/acp-bridge && npx vitest run src/bridge.test.ts
cd packages/cli && npx vitest run src/serve/server.test.ts src/serve/acpHttp/transport.test.ts
cd packages/sdk-typescript && npx vitest run test/unit/DaemonClient.test.ts test/unit/DaemonSessionClient.test.ts
npm run build && npm run typecheck
git diff --check
Risk & Scope
--max-pending-prompts-per-session 0 or the equivalent embedded option.
limits.
Linked Issues
References #4490.
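The cap can be disabled with --max-pending-prompts-per-session 0 or the equivalent embedded option. Reviewers confirmed that an invalid value such as -5 is rejected at boot and that /capabilities reports null when the cap is disabled. Below is a sketch of one way to normalize that option. It assumes a hypothetical resolveMaxPendingPrompts helper and treats only non-negative integers as valid. The PR's exact validation rules are not shown here.

```typescript
const DEFAULT_MAX_PENDING_PROMPTS_PER_SESSION = 5;

// Hypothetical normalizer: undefined -> default, 0 -> disabled (null),
// positive integer -> that cap, anything else -> boot-time error.
function resolveMaxPendingPrompts(raw: number | undefined): number | null {
  if (raw === undefined) return DEFAULT_MAX_PENDING_PROMPTS_PER_SESSION;
  if (!Number.isInteger(raw) || raw < 0) {
    throw new RangeError(`maxPendingPromptsPerSession must be a non-negative integer, got ${raw}`);
  }
  return raw === 0 ? null : raw;
}
```

Returning null for the disabled case matches the sentinel that /capabilities advertises, so the same value can flow from configuration into the capabilities response.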