Skip to content

fix(serve): Add prompt queue backpressure#5033

Merged
doudouOUC merged 8 commits into
QwenLM:mainfrom
doudouOUC:fix/prompt-queue-backpressure
Jun 13, 2026
Merged

fix(serve): Add prompt queue backpressure#5033
doudouOUC merged 8 commits into
QwenLM:mainfrom
doudouOUC:fix/prompt-queue-backpressure

Conversation

@doudouOUC

Copy link
Copy Markdown
Collaborator

What this PR does

This PR adds per-session prompt admission backpressure for qwen serve. The bridge now owns the authoritative pending prompt cap, defaults it to 5 accepted-but-unsettled prompts per session, and rejects overflow before the REST route returns accepted semantics. REST responses use a stable 503 shape with Retry-After: 5, ACP JSON-RPC exposes stable prompt_queue_full error data, and /capabilities advertises the active numeric limit or null when disabled. The CLI and embedded serve options can configure or disable the cap, and the TypeScript SDK adds local per-session reservation so clients fail early before opening unbounded temporary SSE waits.

The change also documents the behavior in the serve user guide and adds design and E2E test-plan artifacts for the prompt queue cap semantics.

Why it's needed

After #4490, prompt deadlines limited accepted REST prompt execution time but did not bound admission into the bridge prompt FIFO or SDK-side pending prompt tracking. A single session could still enqueue unbounded accepted prompts, accumulating promises, SSE waits, closures, and eventual agent work. This PR adds an explicit admission cap so overload is rejected with retryable semantics instead of being accepted and left to accumulate.

Reviewer Test Plan

How to verify

Run the focused bridge tests and confirm the sixth default-cap prompt is synchronously rejected, slots release after success or failure, pre-aborted prompts do not occupy a slot, and branchSession does not count against the prompt cap.

Run the focused CLI serve tests and confirm REST returns 503, Retry-After: 5, code: "prompt_queue_full", and no promptId when the bridge rejects admission; confirm /capabilities.limits.maxPendingPromptsPerSession reports the default, explicit value, and disabled state; confirm invalid programmatic limits are rejected at serve boot.

Run the focused ACP HTTP tests and confirm session/prompt returns stable JSON-RPC error data with errorKind: "prompt_queue_full" instead of the generic internal shape.

Run the focused TypeScript SDK tests and confirm local prompt cap rejection does not issue fetch, reservations release on completion, turn errors, caller abort, and SSE end, and pre-aborted prompts do not reserve a slot.

Evidence (Before & After)

N/A. This is non-UI serve/bridge/SDK behavior.

Tested on

OS Status
🍏 macOS ✅ tested
🪟 Windows ⚠️ not tested
🐧 Linux ⚠️ not tested

Environment (optional)

Local macOS worktree with Node.js v26.0.0. Validation commands run: cd packages/acp-bridge && npx vitest run src/bridge.test.ts; cd packages/cli && npx vitest run src/serve/server.test.ts src/serve/acpHttp/transport.test.ts; cd packages/sdk-typescript && npx vitest run test/unit/DaemonClient.test.ts test/unit/DaemonSessionClient.test.ts; npm run build && npm run typecheck; git diff --check.

Risk & Scope

  • Main risk or tradeoff: The default server-side cap of 5 is a behavior change for clients that intentionally enqueue many prompts to one session; operators can disable it with --max-pending-prompts-per-session 0 or the equivalent embedded option.
  • Not validated / out of scope: This does not implement a hard kill for a non-cooperative agent whose prompt promise never settles; that existing bridge follow-up remains separate from admission backpressure.
  • Breaking changes / migration notes: No API removals. New typed errors and capability fields are additive; older clients can ignore limits.

Linked Issues

References #4490.

中文说明

What this PR does

这个 PR 为 qwen serve 增加了按 session 维度的 prompt admission backpressure。bridge 现在作为权威入口维护 pending prompt cap,默认每个 session 最多 5 个已接受但尚未 settle 的 prompt,并在 REST 返回 accepted 语义之前拒绝溢出请求。REST 响应使用稳定的 503 形状并带 Retry-After: 5,ACP JSON-RPC 暴露稳定的 prompt_queue_full 错误数据,/capabilities 会广告当前启用的数值限制,禁用时返回 null。CLI 和嵌入式 serve options 都可以配置或禁用该 cap,TypeScript SDK 也增加了本地 per-session reservation,让客户端在打开无限临时 SSE wait 之前更早失败。

本次变更还更新了 serve 用户文档,并增加了 prompt queue cap 语义的设计文档和 E2E 测试计划产物。

Why it's needed

#4490 之后,prompt deadline 只限制了已接受 REST prompt 的执行时间,并没有限制进入 bridge prompt FIFO 或 SDK pending prompt tracking 的 admission。单个 session 仍然可以无限 enqueue 已接受的 prompt,持续堆积 promise、SSE wait、闭包以及后续 agent work。这个 PR 增加显式 admission cap,让过载请求以可重试语义被拒绝,而不是先接受再持续累积。

Reviewer Test Plan

How to verify

运行聚焦 bridge 测试,确认默认 cap 下第 6 个 prompt 会同步拒绝,slot 会在成功或失败后释放,pre-aborted prompt 不占用 slot,并且 branchSession 不计入 prompt cap。

运行聚焦 CLI serve 测试,确认 bridge 拒绝 admission 时 REST 返回 503Retry-After: 5code: "prompt_queue_full",且不返回 promptId;确认 /capabilities.limits.maxPendingPromptsPerSession 能反映默认值、显式值和禁用状态;确认非法程序化 limit 会在 serve 启动阶段被拒绝。

运行聚焦 ACP HTTP 测试,确认 session/prompt 返回稳定 JSON-RPC error data,其中包含 errorKind: "prompt_queue_full",而不是 generic internal 形状。

运行聚焦 TypeScript SDK 测试,确认本地 prompt cap 拒绝不会发出 fetch,reservation 会在完成、turn error、调用方 abort 和 SSE end 时释放,并且 pre-aborted prompt 不会 reserve slot。

Evidence (Before & After)

N/A。这是非 UI 的 serve、bridge 和 SDK 行为变更。

Tested on

OS Status
🍏 macOS ✅ tested
🪟 Windows ⚠️ not tested
🐧 Linux ⚠️ not tested

Environment (optional)

本地 macOS worktree,Node.js v26.0.0。已运行验证命令:cd packages/acp-bridge && npx vitest run src/bridge.test.tscd packages/cli && npx vitest run src/serve/server.test.ts src/serve/acpHttp/transport.test.tscd packages/sdk-typescript && npx vitest run test/unit/DaemonClient.test.ts test/unit/DaemonSessionClient.test.tsnpm run build && npm run typecheckgit diff --check

Risk & Scope

  • Main risk or tradeoff: 默认服务端 cap 为 5,这会改变那些有意向同一个 session enqueue 大量 prompt 的客户端行为;运维方可以通过 --max-pending-prompts-per-session 0 或对应嵌入式 option 禁用。
  • Not validated / out of scope: 本次不实现对非协作 agent 的硬 kill;如果 prompt promise 永不 settle,已有 bridge follow-up 仍然独立于 admission backpressure。
  • Breaking changes / migration notes: 没有移除 API。新增 typed errors 和 capability fields 都是 additive;旧客户端可以忽略 limits

Linked Issues

References #4490.

doudouOUC and others added 2 commits June 12, 2026 16:21
Add per-session prompt admission limits across the bridge, REST and ACP entrypoints, and SDK clients. The server now rejects full prompt queues before returning accepted semantics, advertises the active limit through capabilities, and documents the behavior with focused tests and design artifacts.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Close the mocked SSE stream explicitly in the pending prompt cap test so cleanup does not rely on abort-driven stream cancellation timing in CI.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Reject accepted subscription prompts if the event stream has already ended, and make the prompt-cap tests wait for the pending registration before closing or injecting SSE frames.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Comment thread packages/sdk-typescript/test/unit/DaemonSessionClient.test.ts
Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts
Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts
Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts Outdated
doudouOUC and others added 2 commits June 12, 2026 18:43
Map server-side prompt_queue_full responses to DaemonPendingPromptLimitError for both blocking and non-blocking prompt calls, include the session id in the local limit error, and cross-reference the duplicated default prompt cap constants.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@doudouOUC doudouOUC marked this pull request as ready for review June 12, 2026 11:58
Copilot AI review requested due to automatic review settings June 12, 2026 11:58

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

qqqys
qqqys previously approved these changes Jun 12, 2026

@qqqys qqqys left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prior critical subscription-mode test failure appears resolved on head 3045f0a; I did not find any new critical blocker in this pass.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test single comment anchor

)
);
} catch (err) {
if (deadlineTimer !== undefined) clearTimeout(deadlineTimer);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test F1 anchor

Comment thread packages/cli/src/serve/server.ts
Comment thread packages/acp-bridge/src/bridge.ts
() => undefined,
() => undefined,
);
result.finally(releasePromptSlot).catch(() => {});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The FIXME(stage-2) at bridge.ts:2808-2815 acknowledges that no absolute prompt deadline exists at the bridge level. A wedged agent that ignores cancel() while keeping the channel alive can hold the prompt promise open indefinitely — releasePromptSlot never fires and pendingPromptCount stays permanently elevated. Five such hangs (which can accumulate over time with a flaky LLM provider) brick the session with no self-healing and no runtime inspection surface (per the related gap at bridge.ts:228pendingPromptCount is not exposed anywhere).

The design doc states this is out of scope ("existing bridge follow-up remains separate from admission backpressure"), but given the PR introduces the admission cap whose effectiveness depends on slot release, at minimum a stall warning would help oncall: if pendingPromptCount has been at the cap for longer than some threshold (e.g. 2× promptDeadlineMs or a separate --prompt-stall-timeout-ms), emit a daemonLog.warn with the session ID and current count.

— qwen3.7-max via Qwen Code /review

Comment thread packages/acp-bridge/src/bridgeErrors.ts
Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts
Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts
Comment thread packages/sdk-typescript/src/daemon/DaemonSessionClient.ts
Comment thread packages/acp-bridge/src/bridge.test.ts
Comment thread packages/sdk-typescript/test/unit/DaemonClient.test.ts
Comment thread packages/sdk-typescript/test/unit/DaemonSessionClient.test.ts
@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Local real-run verification (maintainer, macOS) — backpressure works end-to-end on every surface; needs a rebase + two merge-time notes

I built PR head 3045f0ac2, started real qwen serve daemons in tmux under four configurations (default cap, --max-pending-prompts-per-session 3, 0 = disabled, invalid values), and drove real concurrent prompts against them over raw REST, raw ACP WebSocket JSON-RPC, and the TypeScript SDK — plus a control run against a main build to separate this PR's behavior from pre-existing semantics.

Bottom line: the admission cap works exactly as documented on every surface I could reach — REST 503 shape, ACP prompt_queue_full error data, shared bridge-level accounting, SDK local reservation, SDK 503→typed-error mapping, capabilities advertisement, boot validation, and slot release all verified live. The two earlier Critical test-stability issues are fixed (CI is green; suites pass locally). What stands between this and merge: the branch is CONFLICTING with main (one test file), and the PR silently raises the SDK browser-bundle budget to a value with 3 bytes of headroom.

Environment

  • macOS 26.5 (arm64), Node v22.22.2; PR head 3045f0ac2 (5 commits, branch is 19 behind main); daemon = npm run bundlenode dist/cli.js serve; clients = raw fetch, raw WebSocket, and this branch's SDK via tsx. Real model turns (excess prompts settled via POST /session/:id/cancel).

Static checks

Check Result
Root npm run build ✅ green
acp-bridge focused suite ✅ 255/255
Full sdk-typescript suite ✅ 766/766
CLI server.test.ts + acpHttp/transport.test.ts ✅ 497/497
tsc --noEmit (sdk + cli) ✅ clean
CI on head ✅ all green (Lint, Test ×3 platforms, CodeQL)

The two subscription-mode tests I flagged as hanging/failing in my first review are fixed by the test(sdk): stabilize … commits — the suites now pass deterministically.

Live behavior matrix (all real daemons, real prompts)

Scenario Expected Observed
GET /capabilities (default) limits.maxPendingPromptsPerSession: 5 5
GET /capabilities (…=3 / …=0) 3 / null 3 / null
GET /capabilities on main build no limits field (additive) ✅ absent
REST: 7 concurrent prompts, one session 5×202 + 2×503 ✅ exactly {202:5, 503:2}
503 response shape Retry-After: 5, code:"prompt_queue_full", limit, pendingCount, sessionId, no promptId ✅ all fields exact; accepted ones all carry promptId
Slot release post-cancel prompt admitted ✅ 202 after cancel
Explicit cap 3: 4 concurrent 3×202 + 1×503 (limit:3) ✅ exact
Disabled (0): 8 concurrent all 202 {202:8}
Invalid -1 / 1.5 at boot refuse to start ✅ exit 1, clear message (must be a non-negative integer (0 / Infinity = unlimited))
ACP WS: 6 concurrent session/prompt 6th rejected with typed data ✅ JSON-RPC error data: {errorKind:"prompt_queue_full", limit:5, pendingCount:5, sessionId}
REST+ACP share one cap fill 5 via REST → 6th via ACP rejected prompt_queue_full with pendingCount:5
SDK prompt() ×6 concurrent 6th rejected locally, no fetch DaemonPendingPromptLimitError(limit:5, pendingCount:5); exactly 5 prompt POSTs issued
SDK promptNonBlocking() ×6 bypasses local cap (by design), server 503 → typed error ✅ 5 accepted, 6th throws DaemonPendingPromptLimitError mapped from the 503 body
SDK maxPendingPromptsPerSession: 2 3rd rejected locally, 2 fetches ✅ exact (limit:2 in the error)

The SDK 503→typed-error mapping (fix(sdk): map prompt queue full responses) directly addresses my earlier review suggestion and works live; promptNonBlocking not reserving locally is a defensible design (its callers manage their own concurrency) now that the server rejection surfaces as the same typed error.

Control experiment — the odd ACP settle shapes are pre-existing, not this PR

When I fired 6 blocking session/prompt calls concurrently over one WS connection, the non-winning prompts settled as 1×stopReason:"cancelled" + 4×generic Internal error (details: "Request was aborted."). To attribute this correctly I ran the identical probe against a main build: same shape — [0] cancelled, [1..4] internal, [5] end_turn. So concurrent blocking ACP prompts already interrupt each other on main (last writer wins, displaced ones settle as cancelled/internal); this PR only changes the over-cap prompt from "wins by interrupting" to "rejected with prompt_queue_full" — which matches the Risk note in the PR description. The generic internal shape for displaced prompts is a pre-existing wart worth a follow-up, not a defect of this PR.

Merge-time notes (the actionable bits)

  1. Rebase required. GitHub reports the branch CONFLICTING; locally the conflict is confined to packages/cli/src/serve/acpHttp/transport.test.ts (other touched files merge clean). Needs a rebase onto current main (19 commits behind) before merge.
  2. Undeclared bundle-budget raise with 3 bytes of headroom. The PR bumps MAX_DAEMON_BROWSER_BUNDLE_BYTES from 114 * 1024 to 115 * 1024 in packages/sdk-typescript/scripts/build.js — not mentioned anywhere in the PR description. The built browser bundle is 117,757 bytes against the new 117,760 limit. The raise itself is defensible (the SDK gains real reservation logic), but (a) it should be declared and (b) 3 bytes of headroom guarantees the next daemon-SDK change trips the gate and gets to make this same silent bump. Suggest landing it at a round value with honest headroom (e.g. 118 or 120 * 1024) in this PR.
  3. (Minor, from my earlier review) A bridge-level PromptQueueFullError rejection still emits no dedicated daemon log line; the generic route log does record status=503 at WARN with the sessionId, so operators aren't blind — but a dedicated line (matching the WorkspaceMismatchError precedent) remains nice-to-have. Non-blocking.

Not covered by this run

DaemonSessionClient subscription-mode reservation/release paths are covered by the (now stable) unit suites only — I did not drive them against a live daemon. Windows/Linux untested here (CI covers both).

Recommendation

Functionally this is mergeable: every advertised behavior verified end-to-end against real daemons, including the cross-surface cap accounting that's hardest to get right. Gate merge on: (1) rebase to resolve the transport.test.ts conflict with CI re-green, and (2) either declare the bundle-budget raise in the PR description or land it with real headroom. Happy to re-verify after the rebase if anything in the conflict region looks risky.

中文版(Chinese version)

本地真实运行验证(维护者,macOS)—— backpressure 在所有暴露面端到端可用;合并前需 rebase + 两个注意项

我构建了 PR head 3045f0ac2,在 tmux 里以四种配置启动真实 qwen serve(默认 cap、--max-pending-prompts-per-session 30 禁用、非法值),用原生 REST、原生 ACP WebSocket JSON-RPC、TypeScript SDK 三种客户端真实并发打 prompt——并额外对 main 构建做了对照实验,以区分本 PR 的行为与既有语义。

结论:admission cap 在我能触达的每个暴露面上都与文档严格一致——REST 503 形状、ACP prompt_queue_full 错误数据、bridge 层共享计数、SDK 本地预占、SDK 503→typed error 映射、capabilities 广告、启动校验、slot 释放,全部实测通过。早前两个 Critical 测试稳定性问题已修复(CI 绿,本地套件通过)。距合并还差:分支与 main 存在冲突(一个测试文件),且 PR 未声明地把 SDK 浏览器包预算调到只剩 3 字节余量的值。

环境

  • macOS 26.5 (arm64),Node v22.22.2;PR head 3045f0ac2(5 个 commit,落后 main 19 个);daemon = npm run bundlenode dist/cli.js serve;客户端 = 原生 fetch、原生 WebSocket、本分支 SDK 经 tsx。真实模型 turn(多余 prompt 通过 POST /session/:id/cancel settle)。

静态检查

检查项 结果
npm run build ✅ 绿
acp-bridge 聚焦套件 ✅ 255/255
完整 sdk-typescript 套件 ✅ 766/766
CLI server.test.ts + acpHttp/transport.test.ts ✅ 497/497
tsc --noEmit(sdk + cli) ✅ 干净
head 上的 CI ✅ 全绿(Lint、三平台 Test、CodeQL)

我第一轮 review 标记为挂起/失败的两个 subscription-mode 测试,已被 test(sdk): stabilize … 两个 commit 修复——套件现在确定性通过。

真实行为矩阵(全部真实 daemon、真实 prompt)

场景 预期 实测
GET /capabilities(默认) limits.maxPendingPromptsPerSession: 5 5
GET /capabilities(…=3 / …=0 3 / null 3 / null
main 构建的 GET /capabilities limits 字段(additive) ✅ 不存在
REST:同 session 7 并发 5×202 + 2×503 ✅ 精确 {202:5, 503:2}
503 响应形状 Retry-After: 5code:"prompt_queue_full"limitpendingCountsessionIdpromptId ✅ 字段全部精确;被接受的都带 promptId
slot 释放 cancel 后再发可入 ✅ cancel 后 202
显式 cap 3:4 并发 3×202 + 1×503(limit:3 ✅ 精确
禁用(0):8 并发 全 202 {202:8}
启动时非法 -1 / 1.5 拒绝启动 ✅ exit 1,信息清晰(must be a non-negative integer (0 / Infinity = unlimited)
ACP WS:6 并发 session/prompt 第 6 个以 typed data 拒绝 ✅ JSON-RPC error data: {errorKind:"prompt_queue_full", limit:5, pendingCount:5, sessionId}
REST+ACP 共享同一 cap REST 塞满 5 个 → ACP 第 6 个被拒 prompt_queue_fullpendingCount:5
SDK prompt() ×6 并发 第 6 个本地拒绝,不发 fetch DaemonPendingPromptLimitError(limit:5, pendingCount:5);prompt POST 恰好只发 5
SDK promptNonBlocking() ×6 绕过本地 cap(设计如此),服务端 503 → typed error ✅ 5 个接受,第 6 个抛出由 503 body 映射来的 DaemonPendingPromptLimitError
SDK maxPendingPromptsPerSession: 2 第 3 个本地拒,只发 2 个 fetch ✅ 精确(错误中 limit:2

SDK 的 503→typed error 映射(fix(sdk): map prompt queue full responses)直接回应了我此前的 review 建议且实测生效;promptNonBlocking 不做本地预占是站得住的设计(其调用方自管并发),且服务端拒绝现在以同一 typed error 浮出。

对照实验 —— ACP 上奇怪的 settle 形状是既有行为,不是本 PR 的

对同一 WS 连接并发发 6 个阻塞式 session/prompt 时,未胜出的 prompt 以 1×stopReason:"cancelled" + 4×generic Internal error(details: "Request was aborted.")settle。为正确归因,我对 main 构建跑了完全相同的探针:形状一致——[0] cancelled, [1..4] internal, [5] end_turn。即并发阻塞 ACP prompt 在 main 上本来就互相打断(后到者赢,被顶掉的以 cancelled/internal settle);本 PR 只是把超 cap 的 prompt 从"靠打断获胜"改为"以 prompt_queue_full 拒绝"——与 PR 描述的 Risk 段一致。被顶掉 prompt 的 generic internal 形状是值得后续跟进的既有毛刺,不是本 PR 的缺陷。

合并注意项(可执行部分)

  1. 需要 rebase。 GitHub 标记分支 CONFLICTING;本地确认冲突仅限 packages/cli/src/serve/acpHttp/transport.test.ts(其余触碰文件可自动合并)。合并前需 rebase 到当前 main(落后 19 个 commit)。
  2. 未声明的包预算上调,余量仅 3 字节。 PR 把 packages/sdk-typescript/scripts/build.jsMAX_DAEMON_BROWSER_BUNDLE_BYTES114 * 1024 调到 115 * 1024——PR 描述只字未提。构建出的浏览器包 117,757 字节,对上限 117,760。上调本身说得通(SDK 新增了真实预占逻辑),但 (a) 应当声明,(b) 3 字节余量保证下一个碰 daemon SDK 的变更必然爆闸门、然后同样悄悄上调。建议本 PR 内就落在一个有真实余量的整数值(如 118120 * 1024)。
  3. (次要,出自我此前 review) bridge 层 PromptQueueFullError 拒绝时仍无专门的 daemon 日志行;通用 route 日志会以 WARN 记录 status=503 和 sessionId,运维不至于失明——但补一条专门日志(对齐 WorkspaceMismatchError 先例)依然更好。不阻塞。

本轮未覆盖

DaemonSessionClient subscription 模式的预占/释放路径仅由(现已稳定的)单测覆盖——我没有对真实 daemon 驱动它们。Windows/Linux 未在本机测试(CI 已覆盖)。

建议

功能上可合并:所有声明行为都对真实 daemon 端到端验证通过,包括最容易出错的跨暴露面共享计数。合并前置:(1) rebase 解决 transport.test.ts 冲突并让 CI 重新绿;(2) 在 PR 描述中声明包预算上调,或改为留出真实余量的值。rebase 后若冲突区域有任何风险点,我乐意复验。

wenshao
wenshao previously approved these changes Jun 12, 2026
@wenshao wenshao dismissed stale reviews from qqqys and themself via 69ab353 June 12, 2026 22:05
@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Thanks for the PR!

Template looks good ✓

On direction: this is a natural and well-motivated follow-up to #4490. Prompt deadlines bounded execution time but left admission unbounded — that gap is real, and closing it with a per-session cap mirrors the existing --max-sessions / --max-connections pattern operators already understand. No CHANGELOG reference needed; the area is clearly within serve's load-management scope.

On approach: the three-layer design (bridge as authoritative gate, REST/ACP mapping to stable 503 shape, SDK local guard for early-fail) is sound and each layer earns its existence. The SDK guard prevents wasteful SSE connections, not just redundant HTTP requests. The branchSession exemption from the prompt cap is the right call — it's serialized on the same FIFO but isn't a prompt.

One observation: DEFAULT_MAX_PENDING_PROMPTS_PER_SESSION = 5 is duplicated across bridge.ts, server.ts, and DaemonClient.ts with "keep in sync" comments. This is a pragmatic choice to avoid cross-package coupling, but it's the kind of constant that drifts silently. Not a blocker — just flagging it for awareness.

Minor nit: the PR title says fix(serve) but this reads more like a feature addition (feat(serve)). Doesn't block anything.

Moving on to code review. 🔍

中文说明

感谢贡献!

模板完整 ✓

方向:这是 #4490 的自然延续。prompt deadline 限制了执行时间但没有约束 admission,这个缺口是真实的。用 per-session cap 来填补,和已有的 --max-sessions / --max-connections 模式一致。无需 CHANGELOG 引用,该领域明显在 serve 的负载管理范围内。

方案:三层设计(bridge 作为权威入口、REST/ACP 映射到稳定的 503 形状、SDK 本地 guard 提前失败)合理,每层都有存在的理由。SDK guard 阻止的是浪费的 SSE 连接,而不仅仅是冗余的 HTTP 请求。branchSession 不计入 prompt cap 是正确的——它在同一个 FIFO 上串行但不是 prompt。

一个观察:DEFAULT_MAX_PENDING_PROMPTS_PER_SESSION = 5bridge.tsserver.tsDaemonClient.ts 中重复,带有 "keep in sync" 注释。这是为了避免跨包耦合的务实选择,但这种常量容易悄悄偏移。不阻断——只是提醒注意。

小问题:PR 标题是 fix(serve) 但更像是功能新增(feat(serve))。不影响任何判断。

进入代码审查 🔍

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Code review

Reviewed the full diff (23 files, +1421/-87). The implementation is clean and well-structured — no critical blockers found.

Independent proposal comparison: my baseline was "add a pendingPromptCount counter to SessionEntry, check before enqueue, release on settlement, map to 503 in REST." The PR matches this exactly and adds sensible layers (SDK local guard, capabilities advertisement, ACP error mapping). No simpler path was missed.

Key correctness points verified in code:

  • sendPrompt signature change from async to sync (returning Promise) is correct and intentional — it must throw synchronously for the REST route to catch before returning 202 Accepted.
  • Slot release via result.finally(releasePromptSlot) with a promptSlotReleased guard ensures exactly-once release across all settlement paths (success, failure, abort).
  • Pre-aborted signals rejected before consuming a slot — avoids wasted capacity.
  • branchSession correctly exempted from the prompt cap — serialized on the same FIFO but not a prompt.
  • REST 503 response shape: Retry-After: 5, code: "prompt_queue_full", no promptId — matches the design doc.
  • ACP JSON-RPC error has stable data.errorKind: "prompt_queue_full" — clients can match on it.
  • SDK reservePromptSlot + releasePromptSlot pattern mirrors the bridge, with releases on turn_complete, turn_error, caller abort, SSE end, and fetch failure.

Non-blocking observations:

  • Error messages shortened in a few places ('SSE stream ended', 'No SSE body', 'timeout', 'subscription active'). These are test-visible changes that any downstream consumers matching on message text should be aware of. Not a bug — just a heads-up.
  • Browser bundle size limit bumped 114→115 KB — reasonable for the added logic.

Test results

Unit tests (all passed)

Test file Tests Result
acp-bridge/bridge.test.ts 261 ✅ all passed
cli/serve/server.test.ts 429 ✅ all passed
cli/serve/acpHttp/transport.test.ts 92 ✅ all passed
sdk-typescript/DaemonClient.test.ts + DaemonSessionClient.test.ts 173 ✅ all passed

Total: 955/955 passed, 0 failures. Typecheck passed for all packages.

Real-scenario testing

Default cap (5) — GET /capabilities

$ node packages/cli/dist/index.js serve --port 0
qwen serve: daemon log → /home/runner/.qwen/debug/daemon/serve-5272-918fec44.log
qwen serve: session reaper started (interval 60000ms, idle threshold 1800000ms)
qwen serve listening on http://127.0.0.1:39699 (mode=http-bridge, workspace=.../triage)
qwen serve: bound to workspace ".../triage"
qwen serve: bearer auth disabled (loopback default). Set QWEN_SERVER_TOKEN to enable.
qwen serve: /acp WebSocket transport enabled on /acp

$ curl -s http://127.0.0.1:39699/capabilities | jq '.limits'
{
  "maxPendingPromptsPerSession": 5
}

Disabled cap (0) — GET /capabilities

$ node packages/cli/dist/index.js serve --port 0 --max-pending-prompts-per-session 0
qwen serve listening on http://127.0.0.1:38081 (mode=http-bridge, workspace=.../triage)

$ curl -s http://127.0.0.1:38081/capabilities | jq '.limits'
{
  "maxPendingPromptsPerSession": null
}

Invalid value rejected at boot

$ node packages/cli/dist/index.js serve --port 0 --max-pending-prompts-per-session -5
qwen serve: --max-pending-prompts-per-session must be a non-negative integer (0 / Infinity = unlimited).
Exit code: 1

All three scenarios behave as specified in the design doc. The limits field is additive to the capabilities envelope (older daemons simply omit it), and the null sentinel for disabled matches the documented contract.

中文说明

代码审查

审查了全部 diff(23 个文件,+1421/-87)。实现干净、结构良好——没有发现阻断性问题。

独立方案对比: 我的基线方案是"给 SessionEntrypendingPromptCount 计数器,入队前检查,settle 时释放,REST 映射为 503。" PR 完全匹配并增加了合理的层次(SDK 本地 guard、capabilities 广告、ACP 错误映射)。没有遗漏更简单的路径。

关键正确性验证:

  • sendPromptasync 改为同步返回 Promise 是正确且必要的——需要同步抛错以便 REST 路由在返回 202 Accepted 之前捕获。
  • slot 释放通过 result.finally(releasePromptSlot)promptSlotReleased guard 确保所有结算路径(成功、失败、中止)恰好释放一次。
  • 预中止信号在消耗 slot 之前被拒绝——避免浪费容量。
  • branchSession 正确豁免——在同一 FIFO 上串行但不是 prompt。
  • REST 503 响应:Retry-After: 5code: "prompt_queue_full"、无 promptId——与设计文档一致。
  • ACP JSON-RPC 错误有稳定的 data.errorKind: "prompt_queue_full"
  • SDK 的 reservePromptSlot + releasePromptSlot 模式镜像 bridge,在所有结算路径上释放。

非阻断观察:

  • 部分错误信息被缩短('SSE stream ended''No SSE body''timeout''subscription active')。对按文本匹配的下游消费者有影响。不是 bug——仅提醒。
  • 浏览器包大小限制从 114→115 KB——合理。

测试结果

单元测试(全部通过)

测试文件 测试数 结果
acp-bridge/bridge.test.ts 261 ✅ 全部通过
cli/serve/server.test.ts 429 ✅ 全部通过
cli/serve/acpHttp/transport.test.ts 92 ✅ 全部通过
sdk-typescript/DaemonClient.test.ts + DaemonSessionClient.test.ts 173 ✅ 全部通过

总计:955/955 通过,0 失败。 Typecheck 全部通过。

真实场景测试

三个场景(默认 cap=5、禁用 cap=0、非法值 -5)均按设计文档工作。limits 字段对 capabilities 是 additive 的,null 哨兵值表示禁用。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Final verdict: ✅ Approve

Stepping back — this is a solid, well-executed infrastructure PR. The motivation is clear: #4490 added prompt deadlines but left the admission door wide open, and this PR closes that gap with the minimal effective change.

The counter-and-release pattern is the simplest correct solution. The three-layer design (bridge gate → REST/ACP mapping → SDK early-fail) is sound, with each layer earning its existence rather than being speculative. The slot release logic is defensive (double-release guard, .finally for all settlement paths) without being paranoid.

955 tests pass, typecheck clean, real-scenario testing confirms the capabilities endpoint advertises the limit correctly in default (5), disabled (null), and invalid (boot rejection) cases.

My independent proposal matched the PR's approach — I didn't find a simpler path it missed. The scope feels right: the SDK local guard adds complexity but prevents wasteful SSE connections, not just redundant HTTP round-trips.

Two non-blocking notes for the author's awareness (neither blocks the merge):

  • The shortened error messages ('SSE stream ended', 'No SSE body', etc.) are a minor public-contract change for anyone matching on message text.
  • The DEFAULT_MAX_PENDING_PROMPTS_PER_SESSION = 5 constant is duplicated in three places with "keep in sync" comments — worth a thought on whether a shared constants module would be cleaner long-term.

Approving. 🚢

中文说明

最终判定:✅ 批准

这是一个扎实的基础设施 PR。动机清晰:#4490 增加了 prompt deadline 但没有约束 admission,本 PR 用最小有效变更填补了这个缺口。

计数器和释放模式是最简单的正确方案。三层设计(bridge gate → REST/ACP 映射 → SDK 提前失败)合理,每层都有存在的理由。slot 释放逻辑防御性强(双重释放 guard、.finally 覆盖所有结算路径)但不过度。

955 个测试通过,typecheck 干净,真实场景测试确认 capabilities 端点在默认(5)、禁用(null)和非法值(启动拒绝)情况下均正确工作。

我的独立方案与 PR 方案一致——没有找到更简单的路径。范围合理:SDK 本地 guard 增加了复杂度但阻止了浪费的 SSE 连接。

两个非阻断提醒(不影响合并):

  • 缩短的错误信息是对外合约的微小变更。
  • DEFAULT_MAX_PENDING_PROMPTS_PER_SESSION = 5 在三处重复——长期来看可考虑共享常量模块。

批准。🚢

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

qqqys
qqqys previously approved these changes Jun 13, 2026

@qqqys qqqys left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior critical feedback remains resolved on the current head; I did not find a new critical blocker in this pass.

Log synchronous prompt queue rejections, document the sync admission contract, clean up SDK prompt-slot release, and cover the reviewed backpressure edge cases.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@doudouOUC doudouOUC dismissed stale reviews from qqqys and qwen-code-ci-bot via 143a7bc June 13, 2026 08:32
Reduce the generated daemon client bundle slightly and raise the browser daemon SDK bundle budget to 116 KiB so the PR merge ref has practical headroom.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@wenshao

wenshao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Thanks for the PR!

Template looks good ✓ — all required headings present with bilingual body.

On direction: this is a natural and necessary follow-up to #4490. Prompt deadlines bounded execution time but left admission unbounded — a single session could enqueue unlimited accepted prompts, accumulating promises, SSE waits, and closures. Adding an explicit admission cap with retryable semantics is the right fix. The area is squarely within qwen serve's core mission as a daemon load-management feature.

On approach: the layered defense (bridge admission → REST 503 → ACP error data → SDK local reservation → capabilities advertisement) feels proportionate to the problem. Each layer has a clear role: bridge is authoritative, SDK is early-fail guard, capabilities lets clients self-configure. The default of 5 is reasonable for single-user / small-team usage and the escape hatch (--max-pending-prompts-per-session 0) is clean. branchSession correctly excluded from the prompt cap.

One observation: the PR bundles design doc, E2E test plan, docs update, and implementation in a single commit set — that's thorough but makes the diff large (+1677 across 24 files). The implementation itself is well-scoped; the extras are all justified by project conventions.

Moving on to code review. 🔍

中文说明

感谢贡献!

模板完整 ✓ — 所有必需的标题都存在,附带双语正文。

方向:这是 #4490 的自然且必要的后续。prompt deadline 限制了执行时间,但没有约束 admission——单个 session 可以无限 enqueue 已接受的 prompt,堆积 promise、SSE wait 和闭包。增加显式 admission cap 并提供可重试语义是正确的做法。这个领域完全在 qwen serve 作为 daemon 负载管理功能的核心使命范围内。

方案:分层防御(bridge admission → REST 503 → ACP error data → SDK 本地 reservation → capabilities 广告)与问题的规模相称。每一层都有明确的角色:bridge 是权威入口,SDK 是提前失败的保护,capabilities 让客户端自动配置。默认值 5 对单用户/小团队使用是合理的,逃逸阀(--max-pending-prompts-per-session 0)也很干净。branchSession 正确地不计入 prompt cap。

一个观察:PR 把设计文档、E2E 测试计划、文档更新和实现打包在同一个提交集中——很全面但也让 diff 变大了(24 个文件 +1677 行)。实现本身范围合理;额外内容都符合项目惯例。

进入代码审查 🔍

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Code review

Reviewed the full diff (24 files, +1677/-89). The implementation is careful and well-structured.

Independent proposal: for unbounded prompt queues, I'd add a per-session counter in the bridge, throw synchronously when the cap is hit, map that to HTTP 503 in the REST route, and optionally add SDK-side local accounting as early-fail. That's essentially what this PR does.

Findings:

The bridge changes are solid. sendPrompt correctly stays non-async so admission failures throw before the HTTP route returns 202. The slot-release logic uses an idempotent promptSlotReleased guard and attaches via result.finally(releasePromptSlot) — exactly once, no leaks. Pre-aborted signals are rejected before incrementing the counter. The PromptQueueFullError class is clean with the right fields.

The REST route wraps bridge.sendPrompt() in a try/catch (since it's no longer async) and maps PromptQueueFullError to 503 with Retry-After: 5. The ACP dispatch adds a dedicated case returning stable errorKind: "prompt_queue_full" data. Both look correct.

The SDK-side reservePromptSlot is a clean reservation system — idempotent release, proper cleanup on all exit paths (completion, turn_error, abort, SSE end, fetch failure). DaemonSessionClient.prompt() correctly uses client.reservePromptSlot() for the subscription path.

No critical blockers found. The validation logic is duplicated across three layers (bridge constructor, runQwenServe, CLI serve command) but that's the established project pattern for serve options and each layer validates independently.

Minor observation: a few error messages were shortened ('SSE stream ended without turn completion''SSE stream ended', 'The operation timed out''timeout'). These are cosmetic and don't affect correctness.

Test results

All focused unit tests pass:

Bridge tests:        267 passed (267)
CLI serve tests:     429 passed (429)
ACP HTTP tests:       92 passed (92)
SDK DaemonClient:    158 passed (158)
SDK SessionClient:    27 passed (27)
                    ─────────────
                    973 total, 0 failures

Typecheck:           clean (all packages)

New tests specifically validate: sixth prompt rejected at default cap, slot release after success/failure/abort/SSE-end, pre-aborted prompts skip the counter, branchSession excluded from cap, disabled cap (0/Infinity), invalid cap rejection at boot, REST 503 shape, ACP stable error data, SDK local rejection without fetch, SDK reservation release on all exit paths.

No TUI changes — this is non-UI serve/bridge/SDK behavior, so tmux testing is N/A per the PR's own test plan.

中文说明

代码审查

审查了完整 diff(24 个文件,+1677/-89)。实现仔细且结构良好。

独立方案: 对于无界 prompt 队列,我会在 bridge 中添加 per-session 计数器,达到上限时同步抛出异常,在 REST 路由中映射为 HTTP 503,并可选择在 SDK 端添加本地计数作为提前失败。这正是这个 PR 所做的。

发现:

bridge 改动可靠。sendPrompt 正确地保持非 async,使 admission 失败在 HTTP 路由返回 202 之前抛出。slot 释放逻辑使用幂等的 promptSlotReleased 守卫,通过 result.finally(releasePromptSlot) 附加——恰好一次,无泄漏。pre-aborted 信号在递增计数器之前被拒绝。PromptQueueFullError 类干净且包含正确字段。

REST 路由用 try/catch 包装 bridge.sendPrompt()(因为它不再是 async),将 PromptQueueFullError 映射为 503 + Retry-After: 5。ACP dispatch 添加专用 case 返回稳定的 errorKind: "prompt_queue_full" 数据。两者看起来都正确。

SDK 端的 reservePromptSlot 是干净的 reservation 系统——幂等释放,在所有退出路径上正确清理(完成、turn_error、abort、SSE 结束、fetch 失败)。DaemonSessionClient.prompt() 正确地为订阅路径使用 client.reservePromptSlot()

未发现关键阻塞问题。验证逻辑在三层(bridge 构造函数、runQwenServe、CLI serve 命令)中重复,但这是 serve 选项的既定项目模式,每层独立验证。

测试结果

所有聚焦单元测试通过:973 个测试,0 个失败。Typecheck 干净。

无 TUI 变更——这是非 UI 的 serve/bridge/SDK 行为,tmux 测试不适用。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Final verdict: ✅ Approve

Stepping back — this is a well-designed, well-tested addition to the serve layer's load-management story. The motivation is clear (unbounded admission after #4490), the implementation matches the stated design, and the test coverage is thorough across all four affected packages.

My independent proposal (per-session counter, synchronous throw, HTTP 503 mapping, optional SDK guard) matches the PR's approach closely. The PR exceeds it in polish: idempotent slot release, pre-abort fast-path, capabilities advertisement, stable ACP error data, and the branchSession exclusion — all details I'd have wanted but might not have thought of upfront.

The scope is large in file count (24) but each file has a clear purpose. The implementation code itself is tight — the bulk comes from tests and documentation, which is the right ratio for a load-management feature. The three-layer validation (bridge, runQwenServe, CLI) follows established project patterns.

973 tests pass. Typecheck clean. No correctness issues found in the code review.

LGTM, ships the feature cleanly. ✅

中文说明

最终决定:✅ 批准

退一步看——这是对 serve 层负载管理功能的一个设计良好、测试充分的补充。动机清晰(#4490 后无界 admission),实现符合设计声明,测试覆盖在所有四个受影响的包中都很充分。

我的独立方案(per-session 计数器、同步抛出、HTTP 503 映射、可选 SDK 守卫)与 PR 方案高度一致。PR 在细节上超越了我的方案:幂等 slot 释放、pre-abort 快速路径、capabilities 广告、稳定的 ACP error data、以及 branchSession 排除——这些都是我会想要但可能不会 upfront 想到的细节。

范围在文件数量上较大(24 个),但每个文件都有明确的目的。实现代码本身很紧凑——大部分来自测试和文档,这对于负载管理功能来说是正确的比例。三层验证(bridge、runQwenServe、CLI)遵循既定的项目模式。

973 个测试通过。Typecheck 干净。代码审查中未发现正确性问题。

LGTM,干净地交付了功能。✅

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

@qqqys qqqys left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior critical feedback remains resolved on head 08eb11b; the new follow-up commits address the logging and bundle-headroom items, and I did not find a new critical blocker in this pass.

@doudouOUC doudouOUC self-assigned this Jun 13, 2026
@doudouOUC doudouOUC merged commit acb0275 into QwenLM:main Jun 13, 2026
28 checks passed
@doudouOUC doudouOUC deleted the fix/prompt-queue-backpressure branch June 13, 2026 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants