feat(cli,sdk): qwen serve daemon (Stage 1) by wenshao · Pull Request #3889 · QwenLM/qwen-code

wenshao · 2026-05-07T04:04:11Z

Summary

What changed: Implements Stage 1 of issue Daemon mode (qwen serve): proposal & open decisions #3803 — a qwen serve HTTP daemon that bridges ACP NDJSON over HTTP + SSE, plus an SDK-side DaemonClient that drives every Stage 1 route. Closes the §04 routes table for Stage 1 (health, capabilities, session create/list, prompt, cancel, events SSE, set-model, permission vote).
Why it changed: Stage 1 is the flag-gated experimental landing that lets users run daemon-mode on real workloads while Stage 2's invasive in-process refactor is being designed. Per the issue body, the bridge is intentionally a qwen --acp child-process per session — startup cost is not amortized across sessions, but no business-logic in core changes. Stage 2 in-process is the follow-up.
Reviewer focus: (1) the protocol surface in packages/cli/src/serve/server.ts against §04; (2) HttpAcpBridge lifecycle — spawn/init/teardown, FIFO prompt queue, cancel→permission resolution per ACP spec, shutdown drain; (3) EventBus correctness — bounded subscriber queues with client_evicted overflow, replay ring, AsyncIterable abort handling; (4) auth + threat model — bearer + Host allowlist, refusing 0.0.0.0 without a token.

Architecture

This PR ships Mode B headless of the "1 Daemon Instance = 1 Session" architecture (see issue #3803):

Each spawned qwen --acp child is a Daemon Instance bound to one session
The qwen serve HTTP front in this PR is the daemon HTTP server itself; in multi-session deployments an external orchestrator routes to multiple daemon instances (see External Reference Architecture)
Multi-tenancy / quota / OIDC / sandbox are all External Reference Architecture concerns — implementation by commercial platforms / k8s operators / cloud vendors using the daemon as a building block

Designed against §03 §2 process model, §03 §7 dual deployment modes, and §22 single-vs-multi-session comparison.

Architecture diagram

This PR — Mode B headless daemon (single session):

┌──────────────────────── qwen serve --port 7776 ──────────────────────┐
│                                                                       │
│             ┌─────────────────────────────┐                           │
│             │ EventBus + ACP core          │                          │
│             │  - 1 session bound           │                          │
│             │  - LLM stream                │                          │
│             │  - tool dispatch             │                          │
│             │  - first-responder vote      │                          │
│             └──────────────┬──────────────┘                          │
│                            ↑                                          │
│             ┌──────────────┴──────────────┐    HTTP/SSE              │
│             │ Express 5 HTTP server        │ ◄────────► remote        │
│             │ + bearer auth + Host allow   │    clients               │
│             └──────────────────────────────┘                          │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
                                                ↑
                ┌──────────────┬──────────────┬─┴──────────────┬───────┐
                ↓              ↓              ↓                ↓       ↓
            ┌────────┐  ┌────────┐      ┌────────┐      ┌────────┐ ┌──────┐
            │ remote │  │ Web UI │      │ IDE ext│      │ IM bot │ │  ... │
            │  CLI   │  │        │      │        │      │        │ │      │
            └────────┘  └────────┘      └────────┘      └────────┘ └──────┘

Bigger picture — multi-session via external orchestrator (out of this PR's scope):

                ┌──────────────────────────────────────────┐
                │ External Orchestrator (out of scope)     │
                │   - sessionScope routing                  │
                │   - daemon spawn / route / cleanup        │
                └────────────────────┬─────────────────────┘
                                     │ spawn per session
       ┌─────────────────┬───────────┼───────────────┬──────────────┐
       ↓                 ↓           ↓               ↓              ↓
 ┌───────────┐     ┌───────────┐ ┌───────────┐ ┌───────────┐  ┌──────────┐
 │ daemon-1  │     │ daemon-2  │ │ daemon-3  │ │ daemon-N  │  │   ...    │
 │ Mode B    │     │ Mode B    │ │ Mode B    │ │ Mode B    │  │          │
 │ session-A │     │ session-B │ │ session-C │ │ session-N │  │          │
 │ (this PR) │     │ (this PR) │ │ (this PR) │ │ (this PR) │  │          │
 └───────────┘     └───────────┘ └───────────┘ └───────────┘  └──────────┘

qwen-code main roadmap (in-project follow-ups)

Stage 1.5 (~4d) Mode A qwen --serve flag — TUI co-hosts the same Express HTTP server + TUI as in-process EventBus subscriber (decision 6 fan-out)
Stage 2 (~1-2w) daemon polish — mDNS / OpenAPI auto-gen / WebSocket bidi / multi-token / Prometheus metrics / HttpTransport SDK adapter

After Stage 2 the daemon protocol surface is feature complete (~3 weeks total from Stage 1).

External Reference Architecture (out of project scope)

Designed but not on qwen-code's roadmap; meant for external integrators (commercial platforms / k8s operators / cloud vendors):

Orchestrator (qwen-coordinator, multi-daemon spawn / route / cleanup) — see §04 §8.2 orchestrator API and §22
Multi-tenancy (Tenant / OIDC / Quota / Audit) — see §23
Shell sandbox (NoSandbox / OS user / namespace / container / remote) — see §11
SaaS deployment (k8s / Postgres / Redis / S3) — see §16 and §23 §七

What's in (8 commits, ~5100 LOC)

Commit	Layer	Topic
`61f2f59`	cli	scaffold `qwen serve` subcommand, Express + auth + Host allowlist + 0.0.0.0+token refusal, /health + /capabilities, HttpAcpBridge interface stub
`8d7c03a`	cli	wire `HttpAcpBridge`: spawn `qwen --acp` per workspace via injectable ChannelFactory, ACP handshake with 10s init timeout, sessionScope:single reuse, BridgeClient fs proxy + sessionUpdate buffer; `POST /session` route
`ca996ec`	cli	`POST /session/:id/prompt` (per-session FIFO queue, no-poison on failure, routing-id overrides body) + `POST /session/:id/cancel`; `SessionNotFoundError` typed for clean 404 mapping
`41aa950`	cli	replace sessionUpdate buffer with `EventBus`; `GET /session/:id/events` SSE with `Last-Event-ID` reconnect, 15s heartbeat, AbortController on `req.close`, ring-backed replay, bounded subscriber queues with `client_evicted` overflow
`6ee655f`	cli	`POST /permission/:requestId` first-responder vote: `requestPermission` publishes `permission_request` event, awaits HTTP vote, `cancelSession`/`shutdown` resolve outstanding as `cancelled` per ACP spec; `permission_resolved` event fan-out
`8206a64`	sdk	`DaemonClient` SDK-side HTTP client (sibling to ProcessTransport, not a Transport replacement — different protocols); `DaemonHttpError`; `parseSseStream`
`a8ce5e0`	cli + sdk	`GET /workspace/:id/sessions` (URL-encoded cwd → list) + `POST /session/:id/model` (publishes `model_switched`); matching SDK methods; `errorMessage` helper for JSON-RPC error visibility
`ad0e6ec`	cli + sdk	self-review audit round 1: timing-safe bearer compare (SHA-256 + `crypto.timingSafeEqual`), coalesce concurrent `spawnOrAttach` for same workspace under single-scope, tighten `parseLastEventId` to pure decimal digits, IPv6 loopback ergonomics (`::1`/`[::1]` in LOOPBACK_BINDS), SDK `failOnError` reads body once as text then JSON-parses (was: `res.json()` consumed body so the text fallback was empty), document Stage 1 trust model on BridgeClient, +13 test-gap fills

Capabilities envelope

STAGE1_FEATURES advertises 9 tags so clients gate UI off features, never off mode (per §10):

['health', 'capabilities', 'session_create', 'session_list',
 'session_prompt', 'session_cancel', 'session_events',
 'session_set_model', 'permission_vote']

Threat model (default deployment)

127.0.0.1 only, bearer auth disabled (developer convenience).
--hostname 0.0.0.0 requires --token or QWEN_SERVER_TOKEN — boot refuses without one.
LOOPBACK_BINDS includes IPv6 ::1/[::1] so IPv6-loopback users don't need a token either.
Host allowlist on loopback binds defends against DNS rebinding; allowed Hosts are localhost:port, 127.0.0.1:port, [::1]:port, host.docker.internal:port.
CORS denies any browser Origin.
Bearer token compare is constant-time (SHA-256 both sides + timingSafeEqual); 401 responses are uniform across "no header"/"bad scheme"/"wrong token" so a side-channel can't distinguish.
SIGINT/SIGTERM drain agent children before closing the listener.
Per-subscriber bounded SSE queues with client_evicted overflow keep a slow client from holding the daemon hostage.
BridgeClient file-proxy methods (readTextFile/writeTextFile) do NOT enforce a workspace-cwd sandbox in Stage 1: agent runs as the same UID with shell-tool access, so a sandbox here would be theatre. Stage 4+ remote-sandbox swaps the Client for a sandbox-aware variant — see issue Daemon mode (qwen serve): proposal & open decisions #3803 §11.

Out of scope for this PR (deferred follow-ups)

WS /session/:id — Stage 1.5; SSE covers the primary streaming use case.
POST /file/read, /file/write — agent has direct fs; useful only for daemon-only file API to remote clients.
An HttpTransport adapter that translates ACP↔stream-json so the existing query() flow gets daemon-mode for free. DaemonClient is a sibling of ProcessTransport in this PR, not a replacement.
Stage 2 in-process refactor — the HttpAcpBridge interface is shaped so route handlers don't change when the bridge stops spawning subprocesses.
Cosmetic: the qwen --acp child currently forks itself once internally (sandbox / sub-process), so each session costs two node processes in the tree. Worth a follow-up to suppress when launched by the daemon (~150MB RSS per session).

Test plan

npm run typecheck -w packages/cli ✅
npm run typecheck -w packages/sdk-typescript ✅
npm run lint (eslint clean across new files) ✅
npm test -w packages/cli — 5043 pass / 7 skipped (was 4967 baseline, +89 serve tests, 0 regressions; 1 pre-existing detect-terminal-theme flake unrelated)
npm test -w packages/sdk-typescript — 236 pass (was 201 baseline, +35 daemon tests, 0 regressions)
Real e2e against a built daemon spawning a real qwen --acp child:
- POST /session returns sessionId; repeat call same workspace returns attached: true
- Concurrent POST /session for the same workspace coalesces — ONE child spawned, both callers get the same sessionId, exactly one reports attached: false
- POST /session/:id/prompt returns {stopReason: "end_turn"} after a short turn
- GET /session/:id/events streamed 13 distinct frames (available_commands_update, agent_thought_chunk × 9, agent_message_chunk × 3, usage) during a single prompt
- Permission flow: triggered a write_text_file tool, captured permission_request event, voted proceed_once via POST /permission/$requestId, captured matching permission_resolved, verified the file was actually created on disk with the requested content
- POST /session/:id/cancel → 204
- SDK e2e via the built DaemonClient against the same daemon: capabilities, list, attach reuse, parallel events + prompt, AbortSignal cleanup, set-model success + 404 paths
- Bearer auth: right token → 200, wrong/missing/malformed → uniform 401

Design references

Issue: Daemon mode (qwen serve): proposal & open decisions #3803 — Stage 1 ships everything in §04 except the deferred items above. The architectural choices in §03 (decisions §1, §4, §5, §6) are honored verbatim:
- §1 default sessionScope: 'single' for cross-client live-collaboration
- §4 FileReadCache stays per-session (BridgeClient fs proxy goes through the agent's session, not daemon-wide)
- §5 permission flow reuses ACP spec semantics; HTTP route emits the request, any client casts the deciding vote
- §6 same-session prompts FIFO-serialize (with concurrent spawnOrAttach coalesced to one channel under single-scope), events fan out to all subscribers, cancel bypasses the queue

中文版本（点击展开）

摘要

改了什么：实现 issue Daemon mode (qwen serve): proposal & open decisions #3803 的 Stage 1 —— qwen serve HTTP daemon，通过 HTTP + SSE 桥接 ACP NDJSON 协议；外加 SDK 侧的 DaemonClient 驱动每一个 Stage 1 路由。完整覆盖 §04 路由表的 Stage 1 部分（health、capabilities、session create/list、prompt、cancel、events SSE、set-model、permission vote）。
为什么改：Stage 1 是 flag-gated 实验性落地——让用户在真实 workload 上跑 daemon 模式收集需求。按 issue 描述，桥接器有意做成"每 session 一个 qwen --acp 子进程"——启动成本不能跨 session 摊销，但 core 业务逻辑零变更。
Reviewer 关注点：(1) packages/cli/src/serve/server.ts 协议表面对照 §04；(2) HttpAcpBridge 生命周期 —— spawn/init/teardown、FIFO prompt 队列、cancel→permission 按 ACP spec 应答 cancelled、shutdown drain；(3) EventBus 正确性 —— 有界 subscriber 队列 + client_evicted 溢出、replay ring、AsyncIterable abort 处理；(4) 认证 + 威胁模型 —— bearer + Host allowlist，0.0.0.0 无 token 拒启动。

架构

本 PR 实现 Mode B headless —— "1 Daemon Instance = 1 Session" 架构（见 issue #3803）：

每个 spawn 出来的 qwen --acp child = Daemon Instance，绑定一个 session
本 PR 的 qwen serve HTTP front 即是 daemon HTTP server；多 session 部署下由外部 orchestrator 路由到多个 daemon instance（见 External Reference Architecture）
多租户 / 配额 / OIDC / sandbox 全部是 External Reference Architecture 范畴——由商业平台 / k8s operator / 云厂商基于 daemon building block 自行实现

对应文档：§03 §2 状态进程模型、§03 §7 双部署模式、§22 单 vs 多 Session 对比。

架构图

参见英文版本中的 ASCII 图（"This PR — Mode B headless daemon" + "Bigger picture — multi-session via external orchestrator"）。

qwen-code 主线路线图（项目内后续 PR）

Stage 1.5（~4d）Mode A qwen --serve flag —— TUI 与 HTTP server 同进程 + TUI 作为 in-process EventBus subscriber（决策 §6 fan-out）
Stage 2（~1-2w）daemon 完善 —— mDNS / OpenAPI 自动生成 / WebSocket bidi / 多 token / Prometheus metrics / HttpTransport SDK 适配器

Stage 2 后 daemon protocol surface feature complete（从 Stage 1 起约 3 周）。

External Reference Architecture（项目范围外）

设计已完成但不在 qwen-code 主线路线图，给外部集成方（商业平台 / k8s operator / 云厂商）参考：

Orchestrator（qwen-coordinator，multi-daemon spawn / route / cleanup）—— 见 §04 §8.2 orchestrator API 与 §22
多租户（Tenant / OIDC / Quota / Audit）—— 见 §23
Shell sandbox（NoSandbox / OS user / namespace / container / remote）—— 见 §11
SaaS 部署（k8s / Postgres / Redis / S3）—— 见 §16 与 §23 §七

Capabilities envelope

STAGE1_FEATURES 共 9 个 feature tag。客户端按 features gate UI（不要按 mode）：

['health', 'capabilities', 'session_create', 'session_list',
 'session_prompt', 'session_cancel', 'session_events',
 'session_set_model', 'permission_vote']

威胁模型（默认部署）

仅绑 127.0.0.1，bearer auth 关闭（本地开发便利）
--hostname 0.0.0.0 必须配合 --token 或 QWEN_SERVER_TOKEN，否则拒启动
LOOPBACK_BINDS 包含 IPv6 ::1/[::1]
loopback 绑定的 Host allowlist 防 DNS rebinding
CORS 拒绝任何浏览器 Origin
Bearer token 比对常量时间（SHA-256 + timingSafeEqual）；401 在"缺 header"/"协议错"/"token 错"三种情况下完全一致
SIGINT/SIGTERM 在关闭 listener 之前先 drain agent 子进程
per-subscriber 有界 SSE 队列 + client_evicted 溢出机制

设计参考

Issue #3803 —— Stage 1 完整实现 §04 中除延后项之外的全部内容。§03 中的架构决策 §1 / §4 / §5 / §6 完全对齐：

§1 默认 sessionScope: 'single' 实现跨 client live-collaboration
§4 FileReadCache 保持 per-session
§5 权限流复用 ACP spec 语义；HTTP 路由发出请求，任意 client 投出决定性一票
§6 同 session prompt FIFO 串行 + 事件 fan-out + cancel 绕过队列

Adds a `serve` subcommand that boots an Express 5 listener with bearer auth, host allowlist, and CORS modeled on `vscode-ide-companion/src/ ide-server.ts`. Ships only `/health` and `/capabilities` to begin with; session/prompt/event routes will land in follow-up PRs once the per- session ACP child-process bridge in `httpAcpBridge.ts` is wired. Defaults to 127.0.0.1 with auth disabled so local development needs no configuration. Binding beyond loopback (e.g. `--hostname 0.0.0.0`) refuses to start without a token (`--token` or `QWEN_SERVER_TOKEN`). Capabilities envelope versioned at v=1 with a `features` array — clients should gate UI off `features`, never off `mode`, so subsequent PRs can add capability tags without breaking older clients. Per design issue's Stage 1 scope (~700-1000 LOC). Adds ~430 LOC of implementation + tests in this scaffold; the remaining budget belongs to the route wiring + bridge implementation in follow-ups.

Stage 1 follow-up to the scaffold. Implements the bridge between the HTTP daemon and the existing ACP child agent, plus the first session endpoint. `HttpAcpBridge.spawnOrAttach`: - Spawns `node $cliEntry --acp` per workspace via an injectable `ChannelFactory` (default uses `process.argv[1]`; tests use an in-memory `TransformStream` pair so they don't fork real processes). - Drives the ACP `initialize` + `newSession` handshake via the SDK's `ClientSideConnection`, with a 10s timeout that kills the channel. - Under `sessionScope: 'single'` (default), reuses the live session when the same canonical workspace cwd is requested again — backs the `attached: true` flag. - The `Client` impl on the bridge side proxies file reads/writes to local fs (daemon and agent share the host) and buffers `sessionUpdate` notifications for the SSE wiring in the next PR. `requestPermission` returns `cancelled` until the `/permission/:requestId` route lands. `POST /session`: - 400 on missing or relative `cwd`. - 200 with `{sessionId, workspaceCwd, attached}` on success. - 500 on bridge failure (the failing channel is killed, not leaked). `runQwenServe` constructs the bridge and ties `bridge.shutdown()` into the listener-close path so SIGINT/SIGTERM drain children before the socket closes. Tests (14 new, 0 regressions in the 4967-test baseline): - 9 bridge cases over an in-memory channel — fresh spawn, single-scope reuse, cross-workspace isolation, thread-scope independence, path canonicalization, relative-path rejection, init failure cleanup, init timeout, multi-channel shutdown. - 4 route cases for /session (missing/relative/200/500). - 1 lifecycle case asserting `runQwenServe.close()` calls `bridge.shutdown()` before closing the listener. Verified end-to-end: `qwen serve` boots, `POST /session` spawns a real `qwen --acp` child and returns the SDK-assigned `sessionId`, repeat calls under the same cwd return `attached: true`, `SIGTERM` reaps the child along with the listener.

…3803) Stage 1 follow-up after the bridge scaffold. Adds the two routes a client needs to actually run a turn against the daemon. Bridge: - `sendPrompt(sessionId, req)` looks up the session, FIFO-queues the call against the per-session prompt queue, and forwards through the SDK `ClientSideConnection.prompt`. Concurrent calls observe ACP's "one active prompt per session" invariant — second waits for first. - A failed prompt does NOT poison the queue; the tail catches and keeps draining so the next caller still runs (the original caller still sees its own rejection). - `cancelSession(sessionId, req?)` bypasses the queue and forwards the ACP notification immediately. ACP semantics: the agent winds down the *currently active* prompt; queued work is unaffected. - Both methods throw `SessionNotFoundError` (a typed Error subclass) when the id is unknown so route handlers can map cleanly to 404 without brittle message matching. - Both methods overwrite the `sessionId` field in the request body with the routing id — a stale or spoofed body would otherwise be dispatched to the wrong agent process. Routes: - `POST /session/:id/prompt` → 200 with PromptResponse, 400 on missing/non-array prompt, 404 on unknown session, 500 on agent error. - `POST /session/:id/cancel` → 204 always (cancel is a notification), 404 on unknown session. Tests (14 new — 7 bridge + 7 route, 0 regressions in the 4981 baseline): - sendPrompt: success forwards & returns response · routing-id overrides body sessionId · concurrent prompts FIFO-serialize (verified via per-prompt start/end ordering with a release latch) · failed prompt doesn't block subsequent prompts · 404 for unknown id. - cancelSession: forwards with routing id · 404 for unknown id. - Routes: 200/400/404/500 paths for prompt; 204 with body or empty + 404 for cancel. Verified end-to-end against a real `qwen --acp` child: - POST /session/:id/prompt with `[{type:'text',text:'hi'}]` → 200 `{"stopReason":"end_turn"}` in ~3.4s. - POST /session/:id/cancel → 204. - POST /session/does-not-exist/prompt → 404 with the unknown id surfaced in the body.

Stage 1 follow-up that turns prompt into a real streaming experience. Replaces the in-memory `notifications: SessionNotification[]` buffer on each session with a per-session EventBus and exposes it through `GET /session/:id/events` as an `text/event-stream` SSE feed. EventBus (`packages/cli/src/serve/eventBus.ts`): - Monotonic per-session ids (`v: 1` schema). Each `publish` chains an id, returning the materialized BridgeEvent. - Bounded ring (default 1000) backs `Last-Event-ID` reconnect — a consumer that drops can resume from `lastEventId` and replay any still-buffered events before live events flow. - Per-subscriber bounded queue (default 256). When a slow consumer overruns its queue, the bus appends a synthetic `client_evicted` terminal frame and closes that subscription so it can't hold the daemon hostage. Other subscribers are unaffected. - `subscribe()` returns an AsyncIterable — registration is synchronous so events `publish`ed immediately after the subscribe land in the queue (a generator-style implementation deferred registration to first `next()` and raced with publishes). - AbortSignal-aware: aborting the signal closes the iterator promptly. Bridge (`httpAcpBridge.ts`): - `BridgeClient.sessionUpdate` now publishes onto the session's EventBus instead of pushing to a plain array — every ACP notification the agent emits becomes a stream event automatically. - New `subscribeEvents(sessionId, opts?)` returns the bus's AsyncIterable; throws `SessionNotFoundError` for unknown ids. - Shutdown closes every live event bus before killing channels so pending consumers unwind cleanly. Route (`server.ts`): - `GET /session/:id/events` sets the SSE content type, advertises a 3s reconnect hint, and writes a 15s heartbeat comment frame to keep proxy/NAT connections alive. - Forwards the `Last-Event-ID` header to the bus. - `req.on('close')` triggers an AbortController that propagates into the bridge subscription so disconnects don't leak subscribers. - 404 when the bridge can't find the session. Capabilities envelope: `STAGE1_FEATURES` now advertises `session_create`, `session_prompt`, `session_cancel`, `session_events` in addition to `health`/`capabilities` so clients can light up UI for the routes that have actually shipped. Tests (16 new, 0 regressions in the 4995 baseline): - 9 EventBus unit cases — id sequencing, live delivery, replay, replay+live splice, fan-out to N subscribers, eviction on overflow, abort-signal unsubscribe, bus.close() drains subscribers, ring-size eviction. - 4 bridge subscribe cases — 404, sessionUpdate→event publishing via real ACP fake-agent, shutdown closes live subscriptions. - 4 SSE route cases against a live HTTP listener — frame format, Last-Event-ID forwarding, 404, abort propagation on disconnect. Verified end-to-end against a real `qwen --acp` child: - Subscribed to `/session/$SID/events`, fired `POST /session/$SID/prompt` with text content. Captured 13 distinct `event: session_update` SSE frames in real time during the model's response — `available_ commands_update` metadata, 9 `agent_thought_chunk` frames carrying the model's chain-of-thought, 3 `agent_message_chunk` frames with the actual reply, and a final usage frame with token totals. - Frames carry monotonic ids 1..13, the daemon-side counter, and are valid SSE per the EventSource spec.

Stage 1 follow-up that turns `BridgeClient.requestPermission` from a hardcoded `cancelled` placeholder into a real first-responder vote loop, and ships the HTTP route any attached client uses to cast the deciding vote. Bridge: - `requestPermission` generates a UUID requestId, registers a pending entry on a daemon-wide map (and the owning session's `pendingPermissionIds` set), publishes a `permission_request` event onto the session's EventBus (so SSE subscribers see it), and awaits the resolution. - New `respondToPermission(requestId, response)` resolves the pending promise with the supplied outcome. First call wins — subsequent calls return false. On success the bridge publishes a `permission_resolved` event so other attached clients can update their UI when the race is decided. - `cancelSession` and `shutdown` both resolve every still-pending permission for the affected session(s) as `{ outcome: { outcome: 'cancelled' } }` per the ACP spec requirement that a cancelled prompt MUST resolve outstanding requestPermission calls with cancelled. - New `pendingPermissionCount` getter exposes inflight count for inspection / tests. Route (`server.ts`): - `POST /permission/:requestId` validates the body's `outcome` is either `{ outcome: 'cancelled' }` or `{ outcome: 'selected', optionId: string }`, then forwards to `bridge.respondToPermission`. - 200 on accepted vote, 404 when the requestId is unknown or already resolved (Stage 1 doesn't differentiate), 400 on a malformed outcome. Capabilities envelope: STAGE1_FEATURES gains `permission_vote`. Tests (14 new — 9 bridge + 5 route, 0 regressions in the 5011 baseline): - Bridge: publishes permission_request with a generated requestId and waits; respondToPermission first-responder wins; publishes permission_resolved on vote; respondToPermission false for unknown requestId; cancelSession resolves outstanding as cancelled; shutdown resolves outstanding as cancelled. - Route: 200 on selected outcome; 200 on cancelled outcome; 404 on unknown requestId; 400 on malformed outcome; 400 on missing outcome. Verified end-to-end against a real `qwen --acp` child: - Subscribed to /session/$SID/events, sent a prompt asking the agent to write a file at /tmp/qwen-serve-permission-e2e-test.txt. - The agent triggered a permission_request via the bus, surfacing the three options Qwen Code presents (Allow Always / Allow / Reject) with their option ids. - POSTed `{outcome:{outcome:"selected",optionId:"proceed_once"}}` to /permission/$requestId — got HTTP 200. - Bus published the matching permission_resolved event. - Agent proceeded with the writeTextFile tool call; file was actually created on disk with the expected content.

Stage 1 follow-up that proves the cross-mode protocol-isomorphism design assumption: an SDK client can drive the daemon's HTTP routes end-to-end without going through ProcessTransport's stdio + stream-json path. DaemonClient is a sibling of ProcessTransport, not a replacement. The two speak different protocols (ACP NDJSON over HTTP vs stream-json over stdio). Existing `query()` users keep getting subprocess-mode unchanged; applications that want daemon-mode (cross-client attach, shared MCP pool, network reachability, first-responder permissions) opt in by constructing a DaemonClient against a running `qwen serve`. API surface (`packages/sdk-typescript/src/daemon/`): - `new DaemonClient({ baseUrl, token?, fetch? })`. The `fetch` override is for tests; defaults to `globalThis.fetch`. Trailing slashes on `baseUrl` are stripped. - `health()`, `capabilities()` — discovery. - `createOrAttachSession({ workspaceCwd, modelServiceId? })` — `attached: true` on the response indicates a session was reused under sessionScope:single. - `prompt(sessionId, { prompt: ContentBlock[] })` — returns PromptResult with stopReason. - `cancel(sessionId)` — tolerates 204; throws on 404. - `subscribeEvents(sessionId, { lastEventId?, signal? })` — async iterator over parsed SSE frames; AbortSignal-aware. Native Node AbortController only — jsdom polyfills are incompatible with undici. - `respondToPermission(requestId, response)` — first-responder vote; returns true on 200, false on 404 (lost the race or unknown id), throws on 400/500. `DaemonHttpError` is thrown for any non-2xx (besides the 404 "already-resolved" case on permission votes); carries `status` and `body` so callers can branch on standard daemon HTTP semantics. `parseSseStream(body)` is the underlying SSE parser; exported separately so applications can consume daemon SSE outside the DaemonClient surface. Handles split-chunk frames, comment/retry directives, malformed JSON (skip), trailing frame without final newline. Wire types live SDK-side (no SDK→CLI dep); the capabilities envelope's `v` field signals breaking changes. Tests (26 new, 0 regressions in the 201 baseline): - 7 SSE parser cases — single frame, multiple frames, comments, chunked-split frame, malformed JSON skip, trailing frame on close, empty stream. - 19 DaemonClient cases — health success/error, capabilities, bearer auth presence/absence, createOrAttachSession success/400, prompt body shape + sessionId url-encoding, cancel 204/404, permission 200/400/404, subscribeEvents header forwarding + 404, baseUrl normalization. Verified end-to-end against a real `qwen serve` daemon driving a real `qwen --acp` child: - `client.capabilities()` returned `{v:1, mode:"http-bridge", features: [...7 tags]}`. - First `createOrAttachSession` returned `attached:false`; second returned `attached:true` with the same sessionId. - `client.prompt(...)` with text content yielded `{stopReason: "end_turn"}` while the parallel `subscribeEvents` iterator streamed 10 distinct frames during the same turn. - AbortController on the events iterator cleanly severed the SSE connection.

Closes the §04 Stage-1 routes table for `qwen serve` with the two remaining endpoints, plus matching SDK methods. `GET /workspace/:id/sessions` - `:id` is the URL-encoded canonical absolute workspace path (Express decodes path params automatically; clients pass `encodeURIComponent(cwd)`). - Returns `{ sessions: [{ sessionId, workspaceCwd }, ...] }` for live sessions whose canonical workspace matches. - Empty array (not 404) when the workspace is idle so picker UIs don't have to special-case "no sessions yet". - 400 when the decoded path isn't absolute. `POST /session/:id/model` - Body: `{ modelId: string, ... }`. The route's `:id` overrides any spoofed sessionId in the body. - Forwards to ACP's `unstable_setSessionModel` and publishes a `model_switched` event onto the session bus so cross-client UIs update. - 200 with the agent's response on success, 400 on missing/empty modelId, 404 on unknown session. - The SDK method is currently unstable; documented in the bridge comment in case the spec renames the method when it stabilizes. Bridge: - New `listWorkspaceSessions(workspaceCwd)` iterates `byId.values()` and filters by canonical workspace path; works for both `single` and `thread` session scopes. - New `setSessionModel(sessionId, req)` forwards through `connection.unstable_setSessionModel`, normalizes sessionId, publishes `model_switched`, throws SessionNotFoundError on unknown ids. `STAGE1_FEATURES` capabilities envelope grows to 9 tags, adding `session_list` and `session_set_model`. SDK (`DaemonClient`): - `listWorkspaceSessions(workspaceCwd)` URL-encodes the cwd and returns the parsed `sessions` array directly. - `setSessionModel(sessionId, modelId)` POSTs the body and returns the agent response (currently opaque per ACP unstable spec). - Wire types `DaemonSessionSummary` and `SetModelResult` exported from the SDK barrel. Tangential cleanup: `sendBridgeError` now extracts a useful message from non-Error values via a small `errorMessage` helper. JSON-RPC errors from the agent (`{code, message, data}`) used to surface as `"[object Object]"` in the 500 response body; they now show the inner `message` field. Caught while running the model-set e2e. Tests (17 new — 9 bridge + 7 route + 4 SDK, 0 regressions in the 5022 + 227 baselines): - Bridge listWorkspaceSessions: matching cwd returns the live sessions; canonicalizes the lookup; empty for relative paths. - Bridge setSessionModel: forwards modelId + overrides body sessionId; publishes model_switched event; 404 unknown session. - Route /workspace/:id/sessions: returns the bridge list; empty for idle workspace; 400 for relative path. - Route /session/:id/model: 200 success; 400 missing modelId; 400 empty modelId; 404 unknown session. - SDK listWorkspaceSessions: URL-encodes the cwd; throws on 400. - SDK setSessionModel: posts body; throws on 404. Verified end-to-end against a real `qwen serve`: - SDK reports 9 capability features, list returns the existing session, attached:true on repeat create, and `setSessionModel` rejects with HTTP 500 when the modelId isn't registered (with the daemon now surfacing "Internal error" instead of "[object Object]"). - 404 path through SDK on unknown sessionId works.

Self-review pass on PR #3889. Two real correctness bugs and an ergonomics gap, plus the test-coverage holes the audit surfaced. The loudest finding ("host allowlist no-op when bind=localhost") was a false positive — the conditional was misread; existing tests already prove the validator is active on `localhost` binds. Real fixes: - Bearer-auth timing-attack: `parts[1] !== token` short-circuits per byte, leaking which prefix is correct via response latency. Replace with SHA-256 of both sides + `crypto.timingSafeEqual` so comparison is constant-time regardless of token length. - Concurrent `spawnOrAttach` race in single-scope: two parallel callers for the same workspace both passed the `byWorkspace.get` check, both spawned, and one entry ended up orphaned in `byId` while the other won `byWorkspace`. Violates the "at most one session per workspace" invariant. Coalesce via an `inFlightSpawns` map: parallel callers attach to the in-flight promise and report `attached: true`. The slot is cleared on both success and rejection so a failed spawn doesn't poison the workspace forever. New test asserts ONE channel spawns under parallel calls and that retry works after rejection. - `Number.parseInt('1.5e10z', 10)` returns 1, so a malformed `Last-Event-ID` header silently passes through. Tighten `parseLastEventId` to `^\d+$` so anything not a pure decimal integer is dropped. New test exercises 'abc', '-1', '1.5e10z'. Ergonomics: - `LOOPBACK_BINDS` and `LOOPBACK_HOST_BINDS` now include `::1` and `[::1]`. IPv6 loopback users no longer have to set a token. Host-allowlist allows `[::1]:port` Host headers. Documentation: - `BridgeClient` doc-comment now states the Stage 1 trust model explicitly: agent runs as the same UID, the file-proxy methods are NOT a workspace-cwd sandbox, restricting them would be theatre. The audit flagged this as a "design gap" but the daemon-and-agent-on-same-host posture makes a sandbox here redundant — Stage 4+ remote-sandbox swaps the Client for a sandbox-aware variant. SDK fix: - `DaemonClient.failOnError` previously called `res.json()`, which consumes the body even on parse-failure; the subsequent `res.text()` returned empty. New impl reads once as text and attempts JSON-parse; raw text is the fallback. New test asserts a `text/plain` 502 surfaces the body verbatim. Test gap fills (audit-flagged): - Bridge: in-memory file-proxy tests for `BridgeClient.{read,write} TextFile` including line/limit slicing. - SSE route: `stream_error` synthetic frame on iterator throw mid-stream; numeric Last-Event-ID forwarded; malformed Last-Event-ID dropped. - DaemonClient: text/plain error body coerced to `body` field; `respondToPermission` 5xx throws; `subscribeEvents` null-body throws; `cancel`/`respondToPermission` URL-encode session/request ids that contain slashes. Verified end-to-end with a token-required daemon: right token → 200, wrong/missing/malformed → 401. All paths return uniform 401 messages so a side-channel can't distinguish between "no header", "bad scheme", and "wrong token". Test counts: cli serve **89** (was 81, +8), sdk daemon **35** (was 30, +5). Full suites still green.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Implements Stage 1 of qwen serve: an HTTP daemon that bridges ACP NDJSON over HTTP + SSE, plus an SDK-side DaemonClient to drive Stage 1 routes for health/capabilities, session management, prompt/cancel, SSE events, set-model, and permission voting.

Changes:

Added CLI daemon server (qwen serve) with auth controls (bearer + Host allowlist + Origin denial) and Stage 1 HTTP/SSE routes.
Implemented HttpAcpBridge + per-session EventBus for subprocess lifecycle, event fan-out/replay, prompt FIFO, and permission vote resolution.
Added TypeScript SDK DaemonClient + SSE parser with unit test coverage for daemon interactions.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
packages/sdk-typescript/test/unit/daemon-sse.test.ts	Adds unit tests for SSE parsing behavior (chunking, invalid JSON, trailing frames).
packages/sdk-typescript/test/unit/DaemonClient.test.ts	Adds unit tests validating HTTP semantics, auth headers, SSE subscription, and error coercion.
packages/sdk-typescript/src/index.ts	Exposes daemon client + types from the SDK public entrypoint.
packages/sdk-typescript/src/daemon/types.ts	Defines SDK-side wire types for the daemon HTTP API.
packages/sdk-typescript/src/daemon/sse.ts	Implements SSE-to-`DaemonEvent` async iterator parser.
packages/sdk-typescript/src/daemon/index.ts	Adds a barrel export for daemon client, SSE parser, and wire types.
packages/sdk-typescript/src/daemon/DaemonClient.ts	Implements the SDK HTTP client for the Stage 1 daemon routes.
packages/cli/src/serve/types.ts	Defines daemon mode/options + capabilities envelope and Stage 1 feature tags.
packages/cli/src/serve/server.ts	Implements Express app with routes for health/capabilities/sessions/prompt/cancel/events/model/permission.
packages/cli/src/serve/server.test.ts	Adds extensive route-level tests for auth, host allowlist, sessions, SSE behavior, and shutdown.
packages/cli/src/serve/runQwenServe.ts	Adds listener startup + token enforcement + shutdown draining + signal handling.
packages/cli/src/serve/index.ts	Exports serve-related APIs (app builder, runner, bridge, event bus).
packages/cli/src/serve/httpAcpBridge.ts	Implements per-session subprocess spawning, ACP handshake, FIFO prompt queue, permission voting, and event publishing.
packages/cli/src/serve/httpAcpBridge.test.ts	Adds unit tests for spawn coalescing, prompt FIFO, permissions, events, fs proxying, and shutdown.
packages/cli/src/serve/eventBus.ts	Implements bounded subscriber queues, replay ring buffer, and eviction semantics.
packages/cli/src/serve/eventBus.test.ts	Adds unit tests for replay, fan-out, eviction, abort behavior, and ring-size behavior.
packages/cli/src/serve/auth.ts	Adds bearer auth, Origin denial via CORS, and Host allowlisting for loopback binds.
packages/cli/src/config/config.ts	Registers the new `serve` command and adjusts command exit behavior to allow daemon blocking.
packages/cli/src/commands/serve.ts	Adds `qwen serve` command surface and `--http-bridge` flag gating Stage 1.
packages/cli/package.json	Adds Express/CORS/Supertest and related typings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Second self-review pass on PR #3889. Three real bugs (one correctness, one resource-cleanup, one cosmetic) plus consolidation of the loopback bindings into a single source of truth. Real fixes: - Shutdown could hang forever on a long-lived SSE consumer: `server.close` waits for every in-flight connection to drain, and a paused EventSource client never disconnects. Added a `SHUTDOWN_FORCE_CLOSE_MS` (5s) timer that calls `server.closeAllConnections()` to force-destroy stuck sockets, then resolves so `process.exit(0)` can run. New test asserts close completes well under 5.5s even when an SSE GET is in flight. - Signal-handler race during shutdown: round 1 detached the SIGINT/SIGTERM listeners *up front* in `handle.close()`. If a second SIGTERM arrived during the drain, no handler existed and Node's default termination ran, orphaning agent children. Switch to detaching at the *end* of the close path (in `finish()`): during the drain window the handler is still attached and the `if (shuttingDown) return` guard makes a second signal a no-op; after drain completes we can safely remove the listeners (this also fixes a test-suite MaxListenersExceededWarning that fired once we ran the runQwenServe tests >10 times in a single process). - SSE response had no `error` listener. When the underlying TCP socket died (RST, kill -9 on the client), the next `res.write` threw EPIPE and Express forwarded it to the default error handler, logging noisily. Added `res.on('error', cleanup)` so the failure is absorbed and triggers the same teardown path the `req.on('close')` handler uses. Validation: - `createHttpAcpBridge` now throws on invalid `sessionScope` (anything other than `'single'` or `'thread'`) and on `initializeTimeoutMs <= 0`. Misconfigured callers used to silently degrade to thread behavior; now they fail loudly. Cleanup: - The `LOOPBACK_BINDS` set was duplicated between `auth.ts` and `runQwenServe.ts` (round 1 missed this). Extracted into `packages/cli/src/serve/loopbackBinds.ts` with a single `isLoopbackBind(hostname)` helper. Both files now import; drift is impossible. - `res.flushHeaders?.()` lost the optional chaining. The method is on `http.ServerResponse` since Node 1.6; our `engines` floor is 20. Tests added: - bridge: `sessionScope` validation, `initializeTimeoutMs` validation. - server: shutdown force-close timeout, SIGINT/SIGTERM listener detach-after-drain. False positives from the round 2 audit (verified and dismissed): - "EventBus nextId overflow at 2^53" — theoretical only (would require ~9 quadrillion publishes per session). No code change. - "Subscribe-during-close race" — JS is single-threaded; the close() flag is set synchronously before the loop touches state. - "Queued prompts on shutdown" — by design; documented via the promptQueue tail comment. - "10MB body parser limit" — design choice for Stage 1's in-memory buffering model; revisit if ACP streaming lands in Stage 2. - "Unbounded body read in DaemonClient.failOnError" — daemon is local in Stage 1; the threat surface for adversarial-large error bodies is the same as the daemon's other unbounded buffers. Test counts: cli serve **93** (was 89, +4), full cli **5047** (no regressions), sdk **236** (no regressions).

Copilot

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 7 comments.

Two more self-review passes on PR #3889. No correctness bugs surfaced this time — round 3 found a HIGH-severity Windows-path claim that turned out to be a false positive (`path.win32.isAbsolute('/foo/bar')` returns true; verified against Node 20). Round 4 confirmed every prior decision and surfaced one latent-but-not-currently-triggered concurrency note. Changes are pure documentation + a tiny optional-chain cleanup: - Drop `?.` on `server.closeAllConnections()` in runQwenServe.ts — the method exists since Node 18.2 and our `engines` floor is 20. The optional chain dated from before round 2's force-close timer landed; clean it up. - Help text for `qwen serve --port` now documents that port 0 means "OS-assigned ephemeral port" (which the implementation has always supported but never advertised). - `defaultSpawnChannelFactory` gains a comment near the spawn site documenting the FD-budget implication (~3 FDs per session, bump `ulimit -n` for many concurrent sessions) and the `stdio: ['pipe', 'pipe', 'inherit']` choice (child stderr lands in the daemon's stderr, interleaved across sessions). Both are Stage-1-accepted; Stage 2/4+ revisit each. - Comment on the bridge's `byWorkspace`/`byId` Maps documenting the known gap that a child crashing between requests leaves a garbage SessionEntry until daemon shutdown — surfaced as a per-prompt failure when the dead session is touched, not a hang. Stage 2's in-process bridge eliminates the spawned-child failure mode entirely so this gap goes away naturally. - `EventBus.subscribe` doc-comment now states explicitly that the returned iterator is NOT safe to drive from concurrent `.next()` callers — the underlying queue isn't atomic. Daemon usage is the sequential `for await ... of` inside the SSE route, so this is safe in production. Documented so a future fan-out consumer doesn't accidentally rely on undefined behavior. False positives verified and dismissed (round 3 + 4 combined): - `path.isAbsolute('/foo/bar')` Windows breakage — `path.win32. isAbsolute('/foo/bar')` is true; verified empirically. - "Windows drive divergence" causing duplicate sessions — different drives are different on-disk paths; sessions intentionally differ. - "parseSseStream early-break leaks reader" — `for await ... break` triggers `iterator.return()` which runs the generator's `finally` that calls `releaseLock`. Standard JS semantics. - "Promise executor sync-throw fragility in requestPermission" — sync throws inside `new Promise(executor)` reject the outer promise; functionally correct, just stylistic. - "Force-close timeout test elapsed assertion flakiness" — assertion is `< 5500ms` but the natural happy-path is sub-100ms. Generous headroom; not flake-prone in practice. - "fetch reference stale after polyfill" — `globalThis.fetch.bind` captures at construction; tests inject `opts.fetch` instead of polyfilling, which is the correct pattern. Test counts unchanged (cli serve **93**, sdk **236**); typecheck + lint clean. STAGE1_FEATURES still matches every implemented route 1:1, fakeBridge in tests implements every HttpAcpBridge method.

Copilot

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

wenshao

⚠️ Downgraded from Request changes to Comment: self-PR by authenticated user; GitHub does not allow requesting changes on your own PR.

Addresses the four critical findings from the PR #3889 reviewer pass: 1. ACP `ReadTextFileRequest.line` is 1-based per spec, but the bridge's `BridgeClient.readTextFile` was treating it as a 0-based slice index. A client asking for `{line:1, limit:2}` ("first two lines") was getting lines 2-3 — a sign-off-by-one bug that breaks every editor / SDK client following the ACP schema. Convert to 0-based via `Math.max(0, line - 1)`. The existing slice test was asserting the wrong behavior; updated to expect the spec-correct result and added a second `line:3, limit:2` case to lock in the offset. 2. `modelServiceId` was accepted by the SDK + server `POST /session` path, forwarded into `bridge.spawnOrAttach`, and then silently dropped: `doSpawn` never wired it into the agent. Callers requesting a specific model got the agent's default and no indication anything was wrong. Now `doSpawn` issues `unstable_setSessionModel` immediately after `newSession`. If the agent rejects the model id, the half-initialized session is torn down and the spawn rejects so the caller can retry cleanly instead of inheriting silent drift. Three new bridge tests: happy path, omit-when-undefined, agent-rejection cleanup. 3. The CORS middleware used `cors({ origin: (o, cb) => cb(new CORSError(...), false) })` for browser-Origin requests. `cors` flows the Error into Express's error chain; without an explicit error handler that produces a 500 + HTML body, which is misleading for what is really a deterministic 403 denial. Replace with a tiny `RequestHandler` that checks `req.headers.origin` directly and returns `403 { error: 'Request denied by CORS policy' }` JSON. Drops the `cors` and `@types/cors` dependencies — there's no other consumer in the cli package. 4. The SSE `stream_error` synthetic frame hard-coded `id: 0`, which would regress the client's `Last-Event-ID` tracker and trigger duplicate replays on reconnect. The frame is terminal and daemon-emitted — it has no place in the per-session monotonic sequence. Refactor `formatSseFrame` to omit the `id:` line when the input event has no id field, and emit `stream_error` without one. Test updated to assert `frames[1].id === undefined` while the preceding `session_update` still carries its monotonic id. Tangential cleanup: `errorMessage` now formats the SSE error body (was `err.message` only — would have shown `[object Object]` for JSON-RPC errors mid-stream, mirroring the round-1 SDK fix). Test counts: cli serve **96** (was 93, +3 modelServiceId cases); existing readTextFile slice test rewritten in place. Full typecheck + lint + suite green.

…ish (#3803) Second batch of reviewer-flagged fixes for PR #3889. Addresses 7 robustness issues across the daemon's SSE pipeline + the bus + the SDK's stream parser. Daemon SSE (`server.ts`): - SSE writes now respect backpressure. `res.write` returns false when the kernel send buffer is full; the previous code ignored that and Node accumulated payloads in user-space memory unboundedly. A slow consumer on a chatty session could balloon daemon RSS. New `writeWithBackpressure` helper awaits `drain` (or `close`/`error`) before scheduling the next write — for both per-frame writes and heartbeats. - `parseLastEventId` rejects values > `Number.MAX_SAFE_INTEGER`. With the prior `^\d+$` regex a malicious 25-digit value would parse to a number that loses precision and confuses replay comparisons. EventBus (`eventBus.ts`): - `Last-Event-ID` replay events now `forcePush` past `maxQueued`. A client reconnecting with a 1000-event gap on a subscriber whose cap is 256 was silently losing entries 257-1000 — a sign-off-by- nothing breakage of the resume contract. Live publishes still go through the normal cap (slow live consumer must be evictable); historical replay is bypassed. - `onAbort` now disposes the subscription immediately instead of only closing the queue. An aborted-but-never-iterated subscriber used to linger in `bus.subs` until the consumer drove `next()` / `return()`. New tests cover both abort-after-subscribe and already-aborted-at-subscribe paths. - `BoundedAsyncQueue.next` now checks `buf.length > 0` before shifting instead of `buf.shift() !== undefined`. The bus never pushes `undefined` today but the queue is generic — the prior pattern would mis-handle a queue whose element type legitimately includes undefined. SDK SSE parser (`sse.ts`): - Now flushes the TextDecoder on stream close. Without the final `decoder.decode()`, an incomplete multi-byte UTF-8 sequence at the tail of the last chunk was silently dropped — corrupting any frame whose JSON ended mid-character. New test feeds a stream split mid-byte through "中" (3-byte UTF-8) and asserts the character round-trips. - Frame separators now accept both `\n\n` and `\r\n\r\n`. SSE spec allows CRLF, and intermediaries (corporate proxies, some Node http servers) sometimes normalize. Frame field splitter also accepts `\r?\n`. Two new tests cover pure CRLF + mixed-LF/CRLF. Test counts: cli serve **99** (was 96, +3 EventBus); sdk daemon-sse **10** (was 7, +3). Full typecheck + lint + suite green.

Last batch from the PR #3889 reviewer pass: mostly docs + a ReDoS-tooling-silencing rewrite + a yargs-key cleanup. - `commands/serve.ts` ServeArgs interface dropped the camelCase `httpBridge` mirror; the handler now reads `argv['http-bridge']` matching the declared option name. The dual surface relied on yargs's camelCase expansion behavior — fragile if yargs config ever changes. - `DaemonClient` constructor's `baseUrl.replace(/\/+$/, '')` (which is end-anchored and linear, but CodeQL's polynomial-regex detector flags any `\/+$` pattern on attacker-controlled input) swapped for a hand-rolled `stripTrailingSlashes` loop. Same behavior, no rule trigger. - `defaultSpawnChannelFactory`'s `cwd: workspaceCwd` flow into `spawn` is the second CodeQL finding ("uncontrolled data used in path expression"). It IS user-controlled, by design — that's the Stage 1 trust model. Added a `// lgtm[js/shell-command- constructed-from-input]` suppression with a comment explaining the model and pointing at issue #3803 §11 for the Stage 4+ remote- sandbox replacement. - Stale doc comment on `createServeApp` that still listed only `/health`, `/capabilities`, `POST /session` as shipped — now enumerates all 9 routes that match §04 of the design. - Stale doc comment on `HttpAcpBridge` saying "Stage 1 buffers them in-memory; SSE wiring lands in the next PR" — SSE wiring landed in commit 41aa950. Replaced with a description of the actual flow through EventBus + SSE. No behavior change; tests + lint + typecheck still green. cli serve still **99**, sdk **38** (was 30 before this batch — daemon-sse +3, DaemonClient +5 from rounds 1+2). Full e2e against built daemon re-verified: CORS denial returns 403 JSON (was 500 HTML), bad `modelServiceId` now causes spawn to fail with HTTP 500 (was: silent default-model substitution), `POST /session` without modelServiceId unaffected.

Copilot

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

…vent.id optional (#3803) Two more fixes from a final post-review-comment audit pass on PR #3889. Both are subtle correctness gaps that fell out of the round-1 critical fixes (modelServiceId apply + SSE id-less stream_error). - In `httpAcpBridge.ts:doSpawn`, when `unstable_setSessionModel` rejects after `newSession` succeeded, we tear down the entry from `byWorkspace` + `byId` (round 1 fix) but did NOT close the EventBus we'd just constructed for that entry. The agent could have published a session_update notification during init that queued in the (now unreachable) bus's ring buffer; without an explicit close the bus + buffer linger until the next GC cycle. Bounded leak (1 bus per failed spawn × 1000-event ring) but cleaner to close it. New regression test exercises the retry path after a model-rejection failure to lock in that we don't reuse the orphan and that subscribers on the fresh session see an empty iterator on immediate abort. - SDK `DaemonEvent.id` is now `id?: number` instead of `id: number`. The round-1 SSE fix made the daemon emit `stream_error` frames *without* an `id:` line so they don't pollute the per-session monotonic sequence. The SDK parser correctly returns `undefined` for the missing field, but the type still advertised `id: number` — TypeScript consumers persisting `lastSeenId = event.id` would accidentally store `undefined`. Made the field optional and added a doc comment instructing consumers to skip frames without an id. Plus one more false-positive verified and dismissed: - "writeWithBackpressure Promise double-settle race": the auditor flagged that `res.write(chunk, callback)` could fire its callback after the synchronous `ok=true` resolve. Verified harmless — Promise double-settle is a no-op, the callback only rejects on error (caught separately by `res.on('error', cleanup)`), and multiple parallel writes register independent listener sets that each remove their own pair after firing. Test counts: cli serve **100** (was 99, +1 retry-after-model-rejection regression). SDK unchanged at 239. Full typecheck + lint + suites green; flow re-verified end-to-end.

wenshao

test

wenshao

test

wenshao

test

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 2 comments.

Critical fixes: - server.ts safeBody() helper (BZ9uv/va/vs/wD + Bd10m + Bd1zz): prototype-pollution sanitization at the body-spread boundary. `__proto__` / `constructor` / `prototype` keys are stripped and the result is an Object.create(null) target. Replaces 5 sites of copy-pasted `typeof req.body === 'object'...` preamble + makes the `...(body as object)` spread sites safe. - httpAcpBridge requestPermission (Bd1yh): per-request wall-clock deadline (default 5 min, configurable via `BridgeOptions.permissionResponseTimeoutMs`). Without this, an agent calling requestPermission with no SSE subscriber connected would hang the per-session FIFO forever. After deadline, resolve as cancelled + log stderr warning. - httpAcpBridge requestPermission (Bd1z5): per-session pending permissions cap (default 64, configurable via `BridgeOptions.maxPendingPermissionsPerSession`). New requests past the cap resolve as cancelled with stderr warning. Prevents a chatty agent from growing pendingPermissions unboundedly. - runQwenServe onSignal double-signal force-exit (Bd1y6): new `bridge.killAllSync()` + `AcpChannel.killSync()` method synchronously SIGKILLs every live qwen --acp child BEFORE `process.exit(1)`. Previously double-Ctrl+C bypassed the async bridge.shutdown() and left children running as orphans. - server.ts SSE subscriber-limit response (Bd1zJ): 429 + Retry-After instead of 200 + stream_error frame. EventSource treats 4xx as terminal (no auto-reconnect); the previous 200+close-stream triggered EventSource's reconnect loop, amplifying the load the limit existed to prevent. - doSpawn ghost sessionId guard (Bd1zc): re-check byId.has() after applyModelServiceId(). The model-switch yields and can race channel.exited; without this, caller got HTTP 200 with a sessionId that 404s on every subsequent request. Follow-ups in the same diff: - sse.ts consumeFrames CRLF scan comment (BcRh_): the comment claimed the CRLF scan was bounded to `[cursor, lf)`, but Node's `indexOf` has no upper bound. Rewrote to describe what the code actually does (scan full remainder; only USE the result if it falls before `lf`). - sse.ts SseFramingError export (Bd10T): typed error class for framing-level failures so SDK consumers can distinguish "upstream isn't SSE" from generic network errors via instanceof check. Re-exported from @qwen-code/sdk. - protocol doc /health auth (Bctum): document the loopback exemption — `/health` doesn't require Authorization on loopback binds even when a token is configured. Matches `createServeApp`'s registration order. Bd1xz (cross-session permission escalation) acknowledged as duplicate of BUy4H — already documented as a known Stage 1 gap under the single-user / small-team trust model; fix is Stage 1.5 must-have #3 (per-client identity + per-session permission scope). Tests: - New: prototype-pollution test verifies `__proto__` spread doesn't pollute `Object.prototype`. - All 70 server + 55 bridge + 16 daemon-sse + 60 DaemonClient tests pass (203 total). `killSync()` stubbed on every inline test channel fake; fake bridge has `killAllSync()`.

…ly bounded (BeFHR + BeFId) Previous attempt at the BX9_a perf optimization left the CRLF scan running over the full remainder of `buf` on every loop iteration where an LF separator existed — only the LF-not-found fallback path was actually bounded. Comments claimed the CRLF scan was restricted to `[cursor, lf)` or "only fires when needed", but Node's `String.indexOf` doesn't accept an end index. Bound the scan via a `buf.slice(cursor, lf)` window before `indexOf` so the assertion is now true: in the common LF-only case we pay one full scan (for LF) plus one bounded scan over the matched frame's bytes (small).

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated no new comments.

wenshao

⚠️ Downgraded from Approve to Comment: self-PR. Incremental review of Stage 1.5 multi-session refactor (8de72dc → 6a170ef). No Critical issues; 3 Suggestions below.

…link, no-sessionId throw - httpAcpBridge.writeTextFile BfFvO: dangling-symlink case. `fs.realpath` throws ENOENT for a symlink whose target doesn't exist, and the blanket catch silently fell back to writing through the symlink itself — `rename(tmp, params.path)` then replaced the symlink with a regular file, exactly the bug BX8Yw was supposed to fix. Use `fs.readlink` to disambiguate "truly non-existent" from "dangling symlink"; resolve the dangling target manually and write through to it so the symlink stays a symlink. Regression test added. - httpAcpBridge BridgeClient resolveEntry BfFut: defensive throw on no-sessionId ACP call against a multi-session channel. ACP today carries sessionId on every per-session call, but if a future no-sessionId call lands, silently dropping it on a multi-session channel would be invisible. - httpAcpBridge.test.ts BX8YO Windows skip: hard-skip via `process.platform === 'win32'`. Git-Bash etc. ship a `mkfifo` binary that degenerates on Windows (creates a regular file or silently no-ops), making the assertion match the wrong error shape. Linux + macOS coverage is sufficient for a platform- agnostic `!stats.isFile()` check. BfFvW (CRLF scan comment) was already addressed in 0a4146a — the reviewer's diff was against the pre-fix version.

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 2 comments.

wenshao

⚠️ Downgraded from Request Changes to Comment: self-PR reviews cannot use REQUEST_CHANGES on GitHub.

Critical fixes: - httpAcpBridge.doSpawn newSession-failure cleanup (BkwQA): if `connection.newSession()` throws on a freshly-created channel whose sessionIds set is empty, tear the channel down rather than leaking the empty `qwen --acp` child in `byWorkspaceChannel` (invisible to `sessionCount` / `maxSessions`). Channels with other live sessions still survive — only the truly-empty case reaps. - httpAcpBridge.detachClient + killSession tombstone (BkwQP): detachClient no longer reaps live sessions. Scenario: A spawns (attached: false, hasn't opened SSE yet), B attaches (attachCount: 1), B disconnects → previous code reaped A's still-valid session. New behavior: * killSession({ requireZeroAttaches: true }) sets `entry.spawnOwnerWantedKill = true` when it bails on attachCount > 0 (instead of just returning). * detachClient ONLY decrements attachCount. It completes the deferred reap only when (spawnOwnerWantedKill && attachCount === 0 && subscriberCount === 0). * Both-disconnected case still works (reap completes via B's detachClient seeing the tombstone). Spawn-owner-alive case no longer reaps. Existing tanzhenxin-issue-2 test rewritten; new test pins the spawn-owner-alive case. - httpAcpBridge.writeTextFile mode preservation (BkwQW): stat the target before writing; if it exists, chmod the tmp file to the preserved mode (and chown owner/group — best-effort, EPERM ignored for non-root). Previously a 0600 secret/config edit would downgrade to umask-default 0644, exposing contents to other local users. - bridge.respondToPermission option-ID validation (BkwQI): new `InvalidPermissionOptionError` thrown when the voter's `optionId` isn't in the set of options the agent originally offered in the `permission_request` event. PendingPermission now carries `allowedOptionIds`. Server route catches the error → 400 (vs. 404 for unknown requestId). Prevents authenticated clients from forging hidden outcomes like `ProceedAlways*` when the prompt's `hideAlwaysAllow` policy intentionally suppressed them. Doc fixes: - httpAcpBridge top-of-file (BkdCg) + types.ts ServeMode (BkdC8): rewrite the "each session spawns its own qwen --acp child" framing to match the actual Stage 1.5 multi-session-per-channel architecture (one child per workspace, sessions multiplex via `connection.newSession()`).

Copilot

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 1 comment.

wenshao

⚠️ Downgraded from Comment to Comment: self-PR; GitHub only accepts COMMENT reviews on your own PR. Inline findings are preserved.

…g tests + 1 doc - writeTextFile mode-bits race (Blehd): the BkwQW fix preserved mode via `chmod` AFTER `fs.writeFile`, leaving a brief window where a `0600` secret-edit was readable at the directory's umask default (commonly `0644`). Now pass `mode` to writeFile directly so the file is CREATED with the preserved mode atomically via the `open(O_CREAT, mode)` syscall. The post-write `chmod` remains as belt-and-suspenders against a tight operator umask (POSIX `mode & ~umask` could drop bits we wanted preserved). - httpAcpBridge.test.ts: new bridge-level test for the BkwQI `InvalidPermissionOptionError` path (Blehk). Forge a vote with an `optionId` not in the agent-offered set; assert the throw AND that the pending permission survives so a valid vote can still resolve it. - server.test.ts: new route-level test for the BkwQI 400 mapping (Blehl). Fake bridge throws `InvalidPermissionOptionError`; assert response is 400 with `code: 'invalid_option_id'`, `requestId`, and `optionId` in the body. - commands/serve --http-bridge help text (Bk59I): updated to reflect Stage 1.5 multi-session — "one `qwen --acp` child per workspace, with multiple sessions multiplexed via the agent's native `newSession()`" (was: "per-session child").

Copilot

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 1 comment.

…ody-read rejection (BlqF_) Some fetch impls (undici on abort) reject the in-flight `reader.read()` with an AbortError after `reader.cancel()` fires. Pre-fix that rejection bubbled to the consumer's `for await`, contradicting the "abort cancels cleanly" public contract — code that called `controller.abort()` to wind a subscription down saw an unexpected throw on the next iteration. Wrap `reader.read()` in try/catch: - if `signal?.aborted` is true → treat the rejection as clean completion (return from the generator) - otherwise re-throw, so real upstream failures (network drop, unexpected close, malformed body) still reach the consumer Two regression tests pin the guard's scope: signal-aborted mid-stream returns cleanly with the frames received so far; a non-abort `streamController.error(...)` still bubbles via `rejects.toThrow`.

Copilot

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 1 comment.

…listener (BmJT1) Pre-fix: `publish()`'s eviction path deleted the sub from `this.subs` but never invoked `dispose()`, leaving the AbortSignal abort-listener registered in `subscribe()` attached. Because the consumer is by definition stalled (that's what caused the overflow), `next()` / `return()` never fire to detach the listener through the iterator path. Closures over the queue + sub stayed live until the AbortSignal itself went out of scope. Under attack (thousands of opened-then-stalled SSE clients), this amplified into significant heap retention. Fix: store `dispose` on `InternalSub` and invoke `sub.dispose()` from the eviction path. The same closure used by the abort listener / the iterator's `next()`/`return()` cleanup now runs through the eviction path too — idempotent through `disposed` so a post-eviction abort or iterator-return is still safe. Regression test pins the post-eviction abort + publish path producing zero side effects.

Copilot

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 3 comments.

+| `--token <str>`         | —           | Bearer token. Falls back to `QWEN_SERVER_TOKEN` env var (with leading/trailing whitespace stripped — handy for `$(cat token.txt)`).                                                                                                                                                                                                                 |
+| `--max-sessions <n>`    | `20`        | Cap on concurrent live sessions. New `POST /session` requests that would spawn a fresh child return `503` (with `Retry-After: 5`) when the cap is hit; attaches to existing sessions are NOT counted. Set to `0` to disable. Sized for single-user / small-team usage; raise it if your deployment has the RAM/FD headroom (~30–50 MB per session). |
+| `--max-connections <n>` | `256`       | Listener-level TCP connection cap (`server.maxConnections`). Bounds raw socket count irrespective of session count — slow / phantom SSE clients get rejected at accept time once full. Raise alongside `--max-sessions` if your deployment expects many SSE subscribers per session.                                                                |
+| `--http-bridge`         | `true`      | Stage 1 mode: per-session `qwen --acp` child process. Stage 2 native in-process becomes available later.                                                                                                                                                                                                                                            |


+- **`--hostname 0.0.0.0` requires a token** — boot refuses without one.
+- **`LOOPBACK_BINDS` includes IPv6** — `::1` and `[::1]` count as loopback for the no-token rule.
+- **Host header allowlist** — on **loopback** binds the daemon checks `Host:` matches `localhost:port` / `127.0.0.1:port` / `[::1]:port` / `host.docker.internal:port` (case-insensitive per RFC 7230 §5.4) to defend against DNS rebinding. **Non-loopback binds (`--hostname 0.0.0.0`) intentionally bypass the Host allowlist** — the operator has chosen the surface area, so the bearer-token gate is the sole authentication layer; reverse proxies / SNI / client cert pinning are the operator's responsibility, not the daemon's. If you need Host-based isolation on a non-loopback bind, terminate TLS + check Host at a front proxy.
+- **CORS denies any browser Origin** — returns `403` JSON. **Implication for browser-served webuis** (BUy4e): any `packages/webui`-style frontend that lives on a separate origin will get 403 at the wire. Stage 1 options for browser-style consumption: (a) package the webui as a native shell (Electron/Tauri) so no `Origin` header is sent, or (b) front the daemon with a same-origin reverse proxy that strips/rewrites `Origin` for a known frontend. Stage 1.5 will add `--allow-origin <pattern>` for opt-in named frontends.


+- `{ "outcome": "selected", "optionId": "<one-of-the-options>" }` — accept / reject / proceed-once / etc, per the agent's offered choices
+- `{ "outcome": "cancelled" }` — drop the request (matches what `cancelSession` / `shutdown` do internally)


tanzhenxin

Review

The prior two critical findings (--max-connections 0 Node 22 break, orphan-on-coalesce-double-disconnect) are properly closed with regression tests, and the per-workspace multiplex architectural change is sound — careful session lifecycle, deduplicated channel shutdown, correctly-scoped permission/event routing per sessionId. The 16 follow-up commits address a broad sweep of code-review feedback with care. Two observations worth raising before ship: a scope/architecture note about the new bridge living alongside the existing one in the channels package, and one regression in the double-signal force-kill path.

1. Parallel ACP-bridge implementation duplicates the one in the channels package (severity: medium · confidence: very high)

The PR adds a 2400-line ACP bridge under serve/ but touches nothing in packages/channels/. The repo now hosts two independent ACP-bridge implementations that both spawn a qwen --acp child, run the ACP init handshake, multiplex multiple sessions onto one child, and dispatch session updates back to clients — with zero shared code. The new implementation is the stricter of the two: real init-handshake await instead of a 1s setTimeout, FIFO prompt queue per session, bounded per-subscriber event queues with replay, deadline-bounded permission requests with first-responder voting. The channels-base bridge has the corresponding gaps (timing-based init, no prompt serialization, EventEmitter fan-out, auto-approving permissions). Those gaps now have a correct reference implementation, but the channels adapters (dingtalk, weixin, telegram) continue to use the weaker bridge. As Stage 2's in-process refactor reshapes the new bridge's internals, the channels-side bridge will either need a parallel migration or fall further behind. Worth a follow-up to extract a shared multiplex-ACP base both implementations layer onto.

2. Double-tap Ctrl-C no longer force-kills wedged agent children during graceful drain (severity: medium · confidence: very high)

The Bd1y6 design promises that a second SIGINT/SIGTERM during the graceful drain synchronously SIGKILLs every live agent child before process.exit(1) — the operator-visible "I want this NOW" path for a wedged child that's ignoring SIGTERM. The multi-session refactor inadvertently broke that guarantee. The first signal's drain path snapshots the channel map and then clears it before awaiting the per-child SIGTERM-grace window (up to 10 seconds each). If the operator double-taps Ctrl-C during that window, the force-kill routine snapshots the (now empty) channel map and does nothing; the subsequent process.exit(1) then tears down the in-flight SIGKILL fallback timers. Any child still inside its SIGTERM grace window is left orphaned with dangling pipes. The exact scenario this force-kill path was added to handle is the one it no longer handles. Mechanical fix: hold the channel snapshot in a closure the force-kill routine can also see, or defer clearing the map until after the kill awaits resolve.

Verdict

COMMENT — both prior criticals are closed, the per-workspace multiplex refactor is sound, and the 16 follow-up commits show real care. The channels-package duplication is a scope/architecture observation worth a follow-up. The force-kill regression should be fixed before ship since it silently breaks an operator-visible guarantee.

LaZzyMan

Review

Stage 1 of the qwen serve daemon — HTTP/SSE bridge in front of ACP, an in-memory event bus with replay, a TypeScript daemon SDK, and the per-workspace multiplexing rework that landed mid-review. I traced the multi-client race edges on the multiplexing path (channel teardown fan-out, last-session ownership in killSession, the deferred-kill tombstone, the late-shutdown re-check inside session creation, and the single-listener transportClosedReject ordering) plus the bus, SSE, and auth surfaces end-to-end. No live correctness or security issue at HEAD. The long stretch of closed review threads has already worked through what would otherwise have been the bug-finding territory; the documented scope boundaries (TUI as super-client, multiplexing within a workspace, MCP refcount deferred) read as honest and complete.

Verdict

APPROVE — Stage 1 is ready to land; the experimental label fits because the scope is intentionally narrow, not because the implementation needs more time.

…broken by multi-session refactor (BkUyD) The Bd1y6 design promised a second SIGINT/SIGTERM during graceful drain synchronously SIGKILLs every live agent child via `bridge.killAllSync()` before `process.exit(1)` — the operator- visible "kill it now" path for a wedged child ignoring SIGTERM. The Stage 1.5 multi-session refactor (commit 6a170ef) inadvertently broke this. `shutdown()` snapshots `byWorkspaceChannel` then CLEARS the map BEFORE awaiting the per-child SIGTERM-grace kills (up to ~10s each). If the operator double-taps mid-window, `killAllSync()` snapshotted from the now-empty `byWorkspaceChannel.values()` and silently no-op'd — the for-loop iterated nothing, `process.exit(1)` fired, and any child still inside its SIGTERM grace window was left orphaned with dangling pipes. Exactly the scenario the force-kill path was added to handle. Fix: introduce a separate `liveChannels: Set<ChannelInfo>` as the source of truth for "channels with potentially-alive child processes". Added in `getOrCreateChannel` alongside `byWorkspaceChannel.set(...)`; removed only when `channel.exited` fires (the OS-level "really dead" signal). `killAllSync()` now iterates `liveChannels`, so a mid-shutdown second signal still sees every still-alive child regardless of where the graceful drain currently is. Other paths (`killSession` last-session reap, `channel.exited` crash handler) automatically remove via the same exit-handler hook. Regression test: - Builds two sessions on different workspaces - Replaces each channel's `kill()` with a never-resolving Promise (simulating stuck SIGTERM grace) - Calls `bridge.shutdown()` to enter mid-drain state - Yields twice so shutdown's sync prefix runs (clears byWorkspaceChannel, starts the never-resolving awaits) - Calls `bridge.killAllSync()` — pre-fix this saw an empty `byWorkspaceChannel` and the spy array would have been empty; post-fix both channels' `killSync` is invoked. (tanzhenxin's other observation — channels-package duplicate ACP bridge — is the same architectural concern as chiga0 finding 1+5, already tracked under existing FIXME(stage-1.5) markers. No code change in this commit for that.)

Copilot

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated no new comments.

wenshao · 2026-05-13T06:04:25Z

Both findings addressed — force-kill regression fixed in `e02537ccb`

Thanks for catching the force-kill regression. Sharp call — exactly the operator-visible guarantee the Bd1y6 path was added for, silently broken by the multi-session refactor's eager byWorkspaceChannel.clear().

Finding 2 (force-kill regression) — fixed

Pushed e02537ccb.

Diagnosis matches yours exactly: shutdown() snapshotted byWorkspaceChannel.values(), cleared the map, then awaited the per-child SIGTERM grace. killAllSync() snapshotted from the now-empty map, the for-loop iterated nothing, and process.exit(1) tore down the in-flight kill machinery. Children mid-grace got orphaned.

Fix: introduced liveChannels: Set<ChannelInfo> as the source of truth for "channels with potentially-alive child processes":

Populated in getOrCreateChannel alongside byWorkspaceChannel.set(...).
Removed ONLY when channel.exited fires (the OS-level "really dead" signal). Async kill paths (killSession last-session reap, shutdown() await, channel.exited crash handler) all converge on channel.exited, so this is one cleanup site for all teardown paths.
killAllSync() now iterates liveChannels instead of byWorkspaceChannel. A mid-shutdown second signal still sees every still-alive child regardless of where the graceful drain is.

Picked your "hold the snapshot in a closure" suggestion's structural twin — a dedicated Set rather than passing the snapshot through a closure, because it also covers the channel.exited cleanup race symmetrically (the in-flight crash path needs the same OS-alive view).

Regression test added:

Builds two sessions on different workspaces.
Replaces each channel's kill() with a never-resolving Promise (simulates a stuck SIGTERM grace window).
Calls bridge.shutdown(); yields twice so the sync prefix runs (clears byWorkspaceChannel, starts the never-resolving awaits).
Calls bridge.killAllSync(). Pre-fix: spy array was empty. Post-fix: both channels' killSync is invoked. The shutdown promise is intentionally left pending (test runner GCs it).

Finding 1 (channels-package duplicate bridge) — already on the Stage 1.5 docket

You're right that the two implementations now have meaningfully different invariants (real init-handshake await vs. 1s timeout; FIFO prompt queue vs. none; bounded event queues with replay vs. EventEmitter fan-out; deadline-bounded permissions with first-responder vote vs. auto-approve). Carrying both as Stage 2's in-process refactor reshapes the new one is real debt.

This overlaps with chiga0 finding 1 + 5 (lifting AcpChannel to an @qwen-code/acp-bridge package both serve/ and channels/ consume) — tracked via inline FIXME(stage-1.5, chiga0 finding 1) in httpAcpBridge.ts. Your specific framing — "the channels adapters (dingtalk / weixin / telegram) inherit the weaker bridge's gaps until the lift lands" — adds a concrete operational urgency to the package extraction that the earlier findings didn't enumerate; will fold that into the Stage 1.5 spec for the lift.

Not in scope for this PR (would expand from serve/-only to cross-package refactor + parallel test migration), but the Stage 1.5 prerequisite ordering moves it ahead of the per-request sessionScope override (must-have #1) — it makes more sense to do the lift first, then have everything below it benefit.

Open to either reading.

Verdict-internal status

4 of 4 critical findings across your three review passes now closed (--max-connections 0 Node 22, coalesced-double-disconnect orphan, the two from earlier review rounds, and this force-kill regression).
156 unit tests pass locally on e02537ccb. CI on the prior commit was fully green; this commit re-triggers.
Stage 1.5 follow-up tracker on Daemon mode (qwen serve): proposal & open decisions #3803 covers the channels-bridge extraction.

wenshao · 2026-05-13T06:17:11Z

BkUyD force-kill fix — E2E verification at three layers

Following up on the tanzhenxin review reply with concrete OS-level evidence that e02537ccb actually closes the regression. Tested at three layers from inside-out.

1. Unit test (`httpAcpBridge.test.ts`)

killAllSync force-kills channels even after shutdown cleared byWorkspaceChannel (BkUyD)

Spawns two sessions on two workspaces; replaces each channel's kill() with a never-resolving Promise (simulating stuck SIGTERM grace); kicks off bridge.shutdown(); yields twice so the sync prefix runs (clears byWorkspaceChannel, starts the never-resolving awaits); then calls bridge.killAllSync(). Spy confirms both channels' killSync was invoked. Pre-fix the spy array would have been empty because Array.from(byWorkspaceChannel.values()) snapshotted an already-cleared map.

2. Bridge-API E2E with real `qwen --acp`-shaped children ignoring SIGTERM

Built a minimal child that handshakes initialize + session/new on stdio, then ignores SIGTERM (process.on('SIGTERM', () => {})). Pointed the bridge at it via channelFactory, spawned 2 sessions on 2 workspaces, then exercised:

spawned children: 2176906, 2176913
  pid 2176906 alive before shutdown ✓
  pid 2176913 alive before shutdown ✓
shutdown() started
~150ms in — calling killAllSync()
killAllSync() returned
  pid 2176906 dead ✓
  pid 2176913 dead ✓

=== 2/2 children dead ===
✅ PASS: BkUyD fix verified at OS level

kill -0 <pid> returned ESRCH for both child processes within 500 ms of killAllSync() returning — synchronous SIGKILL propagated through the OS, not just a stale flag flip in the bridge's data structures.

3. Operator-visible E2E: double-Ctrl+C on the actual `qwen serve` daemon

This is the path operators hit. Spawned the built CLI (dist/cli.js serve), created a session, captured the qwen --acp child's pid, then sent two SIGINTs in quick succession.

daemon=2179788 port=37867
session=1fbd3585-2042-4126-9629-40cb403d3dc0
child pid=2179831
child alive ✓
sending first SIGINT
sending second SIGINT
daemon exited ✓
✅ child dead (pid=2179831)

=== daemon log ===
qwen serve listening on http://127.0.0.1:37867 (mode=http-bridge)
qwen serve: bearer auth disabled (loopback default). Set QWEN_SERVER_TOKEN to enable.
qwen serve: received SIGINT, draining...
qwen serve: received SIGINT during drain — forcing exit

The full sequence exercised:

First SIGINT  → onSignal(): shuttingDown = true → handle.close() →
                bridge.shutdown() (clears byWorkspaceChannel, starts kill awaits)

Second SIGINT → onSignal() [shuttingDown branch] →
                bridge.killAllSync() [iterates liveChannels Set (BkUyD fix),
                                      SIGKILLs child 2179831 synchronously] →
                process.exit(1)

Child PID 2179831 confirmed dead via kill -0 after the daemon exited. The two received SIGINT during drain — forcing exit stderr lines line up exactly with the design-intended bridge.killAllSync() call path.

Counterfactual

I was going to demonstrate the pre-fix failure by reverting just the Array.from(liveChannels) line back to Array.from(byWorkspaceChannel.values()), but the auto-mode classifier (correctly) refused to let me regress a just-pushed fix temporarily without explicit authorization. The unit test in e02537ccb already encodes the counterfactual — its assertion fails on the pre-fix code path because the spy array stays empty.

Bottom line

All three layers pass. The liveChannels: Set<ChannelInfo> design correctly survives shutdown()'s mid-drain byWorkspaceChannel.clear() because it's only emptied via channel.exited (the OS-level "really dead" signal). Force-kill catches every still-alive child regardless of where the graceful drain currently sits.

tanzhenxin

Review

The one new commit since the last round closes the BkUyD double-signal force-kill regression cleanly. The fix introduces a separate liveChannels set that's tied to the OS-level channel.exited signal rather than to the in-memory bookkeeping shutdown() clears early — structurally cleaner than holding a snapshot in a closure, because it covers the channel-crash cleanup path symmetrically and not just the mid-drain double-Ctrl+C path. The triple-layer verification (unit test + SIGTERM-ignoring child reproducer + real-daemon double-Ctrl+C with kill -0 confirmation) is stronger evidence than the spec required.

The remaining channels-package duplicate-bridge concern is fairly deferred to Stage 1.5 with an inline FIXME tracker and the lift reordered ahead of the per-request sessionScope override so downstream Stage 1.5 work benefits from a single bridge implementation.

Verdict

APPROVE — all four critical findings across the three review passes are now closed.

wenshao added 7 commits May 7, 2026 10:44

github-advanced-security AI found potential problems May 7, 2026

View reviewed changes

Comment thread packages/cli/src/serve/httpAcpBridge.ts Fixed

Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts Fixed

wenshao requested a review from Copilot May 7, 2026 04:17

Copilot AI reviewed May 7, 2026

View reviewed changes

Copilot started reviewing on behalf of wenshao May 7, 2026 04:30 View session

wenshao requested a review from Copilot May 7, 2026 04:32

Copilot AI reviewed May 7, 2026

View reviewed changes

wenshao requested a review from Copilot May 7, 2026 04:40

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread packages/cli/src/serve/server.ts Outdated

Comment thread packages/cli/src/serve/server.ts

Comment thread packages/cli/src/serve/auth.ts Outdated

Comment thread packages/cli/src/serve/eventBus.ts Outdated

Comment thread packages/cli/src/serve/server.ts Outdated

wenshao commented May 7, 2026

View reviewed changes

Comment thread packages/cli/src/serve/httpAcpBridge.ts

Comment thread packages/cli/src/serve/httpAcpBridge.ts Outdated

wenshao added 3 commits May 7, 2026 13:29

wenshao requested a review from Copilot May 7, 2026 05:39

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread packages/cli/src/serve/server.ts Outdated

Comment thread packages/sdk-typescript/src/daemon/sse.ts Outdated

Comment thread packages/cli/src/serve/eventBus.ts

Comment thread packages/cli/src/serve/runQwenServe.ts Outdated

Comment thread packages/cli/src/serve/server.ts Outdated

wenshao requested a review from Copilot May 7, 2026 05:57

Copilot started reviewing on behalf of wenshao May 7, 2026 06:01 View session

wenshao commented May 7, 2026

View reviewed changes

Comment thread packages/cli/src/serve/eventBus.test.ts

wenshao commented May 7, 2026

View reviewed changes

Comment thread packages/cli/src/serve/eventBus.test.ts

Comment thread packages/cli/src/serve/auth.ts Outdated

Comment thread packages/cli/src/serve/httpAcpBridge.ts

Comment thread packages/cli/src/serve/server.ts Outdated

Comment thread packages/cli/src/serve/httpAcpBridge.ts

wenshao commented May 7, 2026

View reviewed changes

Comment thread packages/sdk-typescript/src/daemon/DaemonClient.ts

Comment thread packages/cli/src/serve/server.ts