refactor(serve): 1 daemon = 1 workspace (#3803 §02)#4113
Conversation
Stage 1 shipped with M-workspaces-per-daemon routing (`byWorkspaceChannel` Map keyed by request `cwd`). The §02 architectural revision in `docs/comparison/qwen-code-daemon-design/02-architectural-decisions.md` narrows the bridge to 1 daemon = 1 workspace × N sessions: each daemon binds to one canonical workspace path at boot; `POST /session` with a mismatched `cwd` returns 400 `workspace_mismatch`. Multi-workspace deployments run multiple daemon processes (one per workspace, supervised externally — systemd / docker-compose / k8s / `qwen-coordinator`). Bridge state collapses from maps to single optional slots: - `byWorkspaceChannel: Map<string, ChannelInfo>` → `channelInfo?: ChannelInfo` - `inFlightChannelSpawns: Map<string, Promise>` → `inFlightChannelSpawn?: Promise` - `byWorkspace: Map<string, SessionEntry>` → `defaultEntry?: SessionEntry` - `liveChannels: Set<ChannelInfo>` → not needed; `channelInfo` is the live reference, cleared only by `channel.exited` (preserves the tanzhenxin BkUyD invariant that `killAllSync` finds a target mid-SIGTERM-grace) `BridgeOptions.boundWorkspace` becomes required. `WorkspaceMismatchError` is thrown from `spawnOrAttach` when the request's canonical cwd doesn't match the bound path, translated to 400 `workspace_mismatch` (with both paths in the body) by the route layer. `CapabilitiesEnvelope.workspaceCwd` surfaces the bound path so clients pre-flight check + omit `cwd` from `POST /session` (it falls back to the bound workspace). A new `--workspace <path>` CLI flag lets operators override `process.cwd()` at boot. The previous `--http-bridge` / `--multi-workspace` opt-in was never shipped; nothing changes for default users running `qwen serve` in their project directory. Removed code path: ~150 LOC of multi-workspace map machinery in `httpAcpBridge.ts` plus the test cases that exercised it. Test surgery: - New `makeBridge()` helper in `httpAcpBridge.test.ts` injects `boundWorkspace: WS_A` by default; tests that need a different bind (the mismatch test) pass it explicitly. - `does NOT reuse across workspaces` → `rejects cross-workspace requests with WorkspaceMismatchError` (the new semantics under §02). - `shutdown kills every live channel` retargeted to single-channel multi-session shutdown. - `killAllSync force-kills channels even after shutdown cleared byWorkspaceChannel (BkUyD)` retargeted to single-channel: the invariant is the same (channel reference must outlive eager shutdown clearing), the surface is just smaller. - `listWorkspaceSessions` cross-workspace assertion now expects empty for the un-bound path. - `--max-sessions` cap test uses two thread-scope sessions on `WS_A` instead of WS_A + WS_B. Closes #3803 §02.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Two correctness fixes + four doc/test polish items surfaced by the multi-agent review of #4113: 1. `killSession` → `spawnOrAttach` race (Critical). After killing the last session, `channel.kill()` runs through a 5s SIGTERM grace before SIGKILL. During that window a concurrent `spawnOrAttach` used to hit `ensureChannel`, find `channelInfo` still set, and reuse the dying transport — either landing the caller with a sessionId that 404s on every follow-up once `channel.exited` fires, or hanging until the newSession timeout. Fix: add an `isDying: boolean` flag on `ChannelInfo`, set synchronously by `killSession` / `doSpawn`-newSession-failure / `shutdown` BEFORE awaiting `channel.kill()`. `ensureChannel` treats a dying channel as absent and spawns a fresh one. The tanzhenxin BkUyD invariant ("`channelInfo` reference must outlive the kill-await for `killAllSync` mid-grace") is preserved — we set `isDying` but don't clear `channelInfo` until the OS reaps the child via `channel.exited`. A regression test in `httpAcpBridge.test.ts` pins the invariant: a never-resolving `kill()` keeps the SIGTERM grace open while a concurrent spawn verifies the factory was called twice (two distinct handles). 2. `boundWorkspace` canonicalization divergence (Critical). `server.ts` and `runQwenServe.ts` each computed `opts.workspace ?? process.cwd()` independently. The bridge canonicalized that string via `realpathSync.native` (resolving symlinks, case-folding on case-insensitive filesystems); the callers retained the raw form. On macOS HFS+ / APFS or any symlinked path, `/capabilities.workspaceCwd` advertised one spelling while the bridge enforced against another — clients echoing the advertised path back saw `POST /session` succeed but the response carry a different `workspaceCwd`. Fix: export `canonicalizeWorkspace` from `httpAcpBridge.ts` and call it once in `runQwenServe` (after the existence check) and once in `createServeApp`. Both paths land on the same canonical form; the bridge's own re-canonicalize is now a no-op (idempotent). 3. Reject `--workspace` pointing at non-existent directories at boot (Suggestion). `canonicalizeWorkspace`'s ENOENT fallback to `path.resolve` previously let the daemon boot pointed at a path that didn't exist; every `POST /session` then spawned a `qwen --acp` child with that cwd and the agent failed with an opaque ENOENT. Now `runQwenServe` `statSync`s the bound path at boot and rejects "directory does not exist" / "not a directory" with a clear message. 4. Stale docstrings (Nice to have). `types.ts` `ServeMode` JSDoc said "one `qwen --acp` child PER WORKSPACE" — directly contradicted the new `workspace` field's doc in the same file. `commands/serve.ts` `--http-bridge` description said "per workspace" — directly contradicted the `--workspace` flag's help in the same yargs builder. Both updated to "per daemon (the daemon binds to ONE workspace at boot)". 5. Stale `byWorkspace` comment references (Nice to have). `server.ts:188` ("orphaned in byId / byWorkspace") and `httpAcpBridge.test.ts:1210` ("still in byId/byWorkspace at the moment of crash") referenced the removed Map. Updated to `defaultEntry`. 6. `/capabilities` curl example in the Authentication section of `docs/users/qwen-serve.md` was missing the new `workspaceCwd` field — the Quickstart's curl example was updated but the parallel one in the auth section was not. Synced. Tests added: - `killSession marks the channel dying so concurrent spawnOrAttach gets a fresh channel` — pins fix (1). - `--workspace flows end-to-end and surfaces on /capabilities` — exercises the runQwenServe → server.ts → bridge plumbing that no prior test covered. - `rejects --workspace pointing at a non-existent directory` and `rejects --workspace pointing at a regular file` — pin fix (3). - `rejects relative --workspace at boot` — covers the absoluteness check that exists but was untested. Net: +238 / -24 across 8 files. All 149 serve tests pass.
…lure coverage Round-2 review of #4113 caught three follow-up issues introduced by or left open after round-1's fixes: 1. **BkUyD invariant overwrite race (Critical).** Round-1's `isDying` flag lets `ensureChannel` skip a dying channel and spawn a fresh one. When the fresh spawn completes, `channelInfo = info` overwrote the dying channel's reference — leaving NO global pointer to it. `killAllSync()` then iterated only `channelInfo` (the fresh one) and missed the dying child entirely. A double-Ctrl+C arriving mid-SIGTERM-grace would call `process.exit(1)` before the dying child's per-channel SIGKILL escalation timer fired, orphaning the child. Restore a `aliveChannels: Set<ChannelInfo>` (parallel to the original Stage 1 design, but justified by single-workspace too). Entries added in `ensureChannel`, removed by each channel's `channel.exited` handler. `killAllSync` iterates the SET, not the single attach-target slot. `shutdown` does the same — snapshots every alive channel and kills each, not just the current `channelInfo`. New regression test pins the invariant: spawn → killSession (channel marked dying, kill hangs) → spawnOrAttach (fresh channel overwrites `channelInfo`) → `killAllSync` — expect BOTH channels' `killSync` to fire. Pre-fix only the fresh one would have fired. 2. **Windows-fragile test path.** The new `rejects --workspace pointing at a regular file` test used `new URL(import.meta.url).pathname` to get a path to the test file. On Windows that returns `/C:/path/...` (leading slash); `fs.statSync` then resolves it as path-from-current-drive-root, fails with ENOENT, and the test sees the "does not exist" error message instead of the expected "not a directory" branch. CI runs `windows-latest`. Fix: `fileURLToPath(import.meta.url)` from `node:url`. 3. **doSpawn newSession-failure isDying path was untested.** The round-1 fix added `ci.isDying = true` to both `killSession` AND `doSpawn`'s newSession-failure catch, but only the killSession path had a regression test. Added a parallel one for the doSpawn path: thread-scope bridge with a `newSessionImpl` that throws on the first call → captures the rejection without awaiting it (the bridge's `await ci.channel.kill()` hangs in the test), yields enough cycles for the `isDying = true` sync prefix to settle, then confirms (a) the next `spawnOrAttach` produces a fresh channel and (b) `killAllSync` finds both channels in `aliveChannels`. Also added a `newSessionImpl` option to the test FakeAgent — the existing `initializeThrows` hook covered handshake-time failures, but post-init `newSession` rejections (auth, bad config, mid-init crashes) had no test affordance. All 151 serve tests pass.
Round-3 review caught that the SDK example doc was the only one of the three serve-related docs that the §02 refactor didn't touch. Updated: - Boot log example now shows the `, workspace=/path/to/your-project` suffix that `runQwenServe` emits after the §02 changes. - The "Hello daemon" example now reads `caps.workspaceCwd` off `/capabilities` and passes it back as `workspaceCwd` on session creation — illustrating the documented pre-flight pattern, not a hand-written literal that may not match the daemon's actual bind. - Shared-session example makes the prerequisite explicit: the daemon must be bound to `/work/repo` (via `--workspace` or `cd`); under §02 two clients can only share a session if they're both hitting a daemon already bound to that workspace. - New "Workspace mismatch" section shows how to handle the `400 workspace_mismatch` error class: catching `DaemonHttpError`, branching on `body.code`, surfacing `boundWorkspace` / `requestedWorkspace` for the operator. This is a new error class SDK consumers' error handlers should branch on. No code changes; docs only.
…orkspace
Round-4 review caught one type-drift gap + a set of integration-test
assumptions that the §02 refactor invalidated.
**SDK type drift.** `DaemonCapabilities` in
`packages/sdk-typescript/src/daemon/types.ts` was the SDK-side mirror
of `CapabilitiesEnvelope` on the daemon side. The §02 PR added
`workspaceCwd: string` to the daemon envelope (and the round-3 doc
example reads `caps.workspaceCwd` off the SDK client) but the SDK
type wasn't updated. A TypeScript consumer copying the doc snippet
verbatim would hit `TS2339 'workspaceCwd' does not exist on type
'DaemonCapabilities'`. The wire field is present so JS consumers
wouldn't notice — but the SDK is marketed as a TypeScript quickstart,
so this is a real onboarding break.
Fix: add `workspaceCwd: string` to `DaemonCapabilities` (parallel to
`DaemonSession.workspaceCwd` which is already there). The SDK unit
test for `client.capabilities()` was updated to put the new field
in the mocked response.
**Integration tests.** `qwen-serve-routes.test.ts` spawns a real
`qwen serve` daemon in `beforeAll`. Three breakages exposed:
1. The daemon was launched without `--workspace`, so it inherited
the test runner's `cwd`. Tests then POST `workspaceCwd: REPO_ROOT`
assuming the daemon is bound to the repo root — true when run via
`npm test` from the repo, brittle from IDEs / launchers that have
a different `cwd`. Added `'--workspace', REPO_ROOT` to the spawn
args so the bound workspace is deterministic regardless of where
the test runner is launched.
2. The `bad modelServiceId` test used `cwd: '/tmp'`. Under §02 this
would now return 400 workspace_mismatch before the session was
spawned. Switched to `REPO_ROOT` and softened the `attached`
assertion (REPO_ROOT may already have a session from earlier
tests in the suite under sessionScope:single).
3. Added three new integration tests pinning the §02 surface
end-to-end through a real daemon process:
- `rejects cross-workspace cwd with 400 workspace_mismatch` —
posts `/tmp` and asserts the full structured error body
(`code`, `boundWorkspace`, `requestedWorkspace`).
- `omits cwd → falls back to bound workspace` — posts an empty
body and asserts the response's `workspaceCwd` matches REPO_ROOT
(verifies the runQwenServe → createServeApp → bridge fallback
plumbing).
- `GET /capabilities surfaces workspaceCwd` — asserts the new
SDK type field is populated correctly off the wire.
All 422 unit tests pass (cli serve + sdk). Integration tests
typecheck clean.
wenshao
left a comment
There was a problem hiding this comment.
[Critical] The streaming integration suite still starts qwen serve without --workspace, but the tests create sessions with workspaceCwd: REPO_ROOT. With the new single-workspace default, an omitted --workspace binds the daemon to process.cwd(), so running the documented integration workflow from integration-tests can make these tests fail with 400 workspace_mismatch before they exercise streaming behavior. Please pass --workspace, REPO_ROOT in integration-tests/cli/qwen-serve-streaming.test.ts, matching the routes suite.
— gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Test coverage gaps (not mapped to specific diff lines):
- canonicalizeWorkspace is a now-exported API called by 3 modules, but has zero dedicated unit tests in
httpAcpBridge.test.ts(symlink resolution, case-insensitive FS, ENOENT fallback, idempotency). - boundWorkspace TypeError path in
createHttpAcpBridge(httpAcpBridge.ts:992) is untested — all tests pass absolute paths viamakeBridge(). - Empty bridge shutdown (zero sessions, no channel) is untested — every test calls
spawnOrAttachbeforeshutdown(). workspace_mismatchresponse body (code,boundWorkspace,requestedWorkspace) parsing is untested inDaemonClient.test.ts.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Process the 7 inline /review comments on PR #4113: - C1+C3 (SDK): make `DaemonCapabilities.workspaceCwd` and `CreateSessionRequest.workspaceCwd` optional in the SDK types. `workspaceCwd` is an additive field on the v=1 envelope per #3803 §02; the protocol's "bump v only on incompatible changes" stance is honored by leaving the field optional at the type level. `DaemonClient.createOrAttachSession` now omits `cwd` from the body when `workspaceCwd` isn't passed, matching the PR description's "SDK accepts bound path or none". Adds a unit test pinning the empty-body shape. - C2 (docs/users/qwen-serve.md): the `--http-bridge` row described the pre-§02 per-session model; updated to reflect one child per daemon with N sessions multiplexed via ACP `newSession()`. - C4 (server.ts): `WorkspaceMismatchError` was silently 400'ing without a stderr breadcrumb, leaving operators blind to cross-workspace routing drift. Mirrors the SessionLimitExceeded /InvalidPermissionOption observability pattern. - C5 (server.test.ts): the `/capabilities` fallback test compared `res.body.workspaceCwd` against raw `process.cwd()`; on macOS default tmpdir flows (`/var/folders/...` → `/private/var/...`) the canonicalize-once route value diverges. Use `realpathSync.native(process.cwd())` to match the route's canonicalization. - C6 (server.ts): the cwd-not-absolute error said "cwd is required and must be an absolute path" but cwd is now optional under §02. Tightened wording to "must be an absolute path when provided". - C7 (runQwenServe.ts): the `statSync` catch only wrapped ENOENT with a friendly diagnostic; EACCES / EPERM (typical for SIP-protected dirs on macOS or root-owned paths the daemon's UID can't traverse) re-threw as raw `SystemError`. Wrap both codes with a `--workspace`-context message so the boot failure points at the flag the operator set. Docs: quickstart shows the explicit-pass-or-omit options side by side; protocol reference notes `workspaceCwd` is additive to v=1.
Windows CI failed on this PR's two new tests because returns (drive-relative absolute), so the route's canonicalize step diverged from the hardcoded literal. Mirror the WS_A/WS_B pattern already used in httpAcpBridge.test.ts: define WS_BOUND / WS_DIFFERENT via `path.resolve(path.sep, …)` and use the constants everywhere. The 400 workspace_mismatch test would still have passed (mock controls both throw + assertion) but I aligned it for consistency. Failures from CI run 25806528710: expected 'D:\work\bound' to be '/work/bound' (Object.is) Affected tests: - createServeApp > GET /capabilities > reports the bound workspace - createServeApp > POST /session > 200 when cwd is omitted
wenshao
left a comment
There was a problem hiding this comment.
Review submitted via API.
Four new inline findings from the latest /review pass:
- N1 (integration-tests/cli/qwen-serve-routes.test.ts) — Critical:
the `workspace_mismatch` assertion compared `requestedWorkspace`
against the literal `'/tmp'`, but the bridge canonicalizes via
`realpathSync.native` and on macOS `/tmp` is a symlink to
`/private/tmp`. Compare against `realpathSync.native('/tmp')` so
the assertion is portable.
- N2 (packages/cli/src/serve/types.ts):
`CapabilitiesEnvelope.workspaceCwd: string` (server side) diverged
from the SDK's `DaemonCapabilities.workspaceCwd?: string`. Made the
server type optional too — matches the SDK, matches the protocol
doc's "additive to v=1" framing, doesn't change runtime emission
(the post-§02 server still always populates the field).
- N3 + N4 (packages/cli/src/serve/server.ts + sdk-typescript/.../DaemonClient.ts):
the route's `cwd` validation treated every non-string body value
(`null`, `123`, `{}`, `[]`) the same as omitted, silently falling
back to `boundWorkspace`. That hid client/orchestrator
serialization bugs as "session attached to wrong workspace".
Now the route uses `'cwd' in body` to detect presence and rejects
presence-but-not-a-string with `400 'cwd must be a string absolute
path when provided'`. Empty string still hits the existing
`path.isAbsolute` branch ("must be an absolute path when
provided"), so an SDK caller passing `workspaceCwd: ''` no longer
silently lands in the daemon's bound workspace.
SDK side: reverted my conditional spread to `cwd: req.workspaceCwd`
unconditional. `JSON.stringify` strips `undefined` automatically
(so omitted `workspaceCwd` becomes "no `cwd` key" on the wire, as
before), but empty-string is now forwarded verbatim and the server's
400 surfaces the bug instead of the SDK swallowing it. Added a unit
test pinning the empty-string-forwarded shape.
Server tests:
- `400 when cwd is present but not a string` covers null / number /
object / array via a sub-loop.
- `400 when cwd is the empty string` pins the isAbsolute path.
bridge: 73/73; server: 80/80 (was 78, +2 new); SDK: 40/40 (was 39,
+1 empty-string test). tsc clean for SDK and PR-touched CLI files.
CI lint failed with packages/cli/src/serve/server.ts:199:9 prefer-const: 'cwd' is never reassigned. The wave-4 rewrite split the original 'let cwd; if (!cwd) cwd = boundWorkspace' into a single ternary, which removes the only mutation path; the variable should be const accordingly.
wenshao
left a comment
There was a problem hiding this comment.
wenshao
left a comment
There was a problem hiding this comment.
wenshao
left a comment
There was a problem hiding this comment.
No Critical code issues found. Build and all targeted tests pass locally. The TS2339 error from tsc is environment-related (missing node_modules in review worktree). The PR author confirms npx tsc --noEmit -p integration-tests/tsconfig.json is clean for qwen-serve-*.
— deepseek-v4-pro via Qwen Code /review
…-v4-pro) Five new inline findings; M1 was already resolved in 1c7f5f0. - M2 (httpAcpBridge.ts): drop the dead `ChannelInfo.workspaceCwd` field. Pre-§02 it was the routing key for `byWorkspaceChannel.get`; after the §02 collapse all reads target `SessionEntry.workspaceCwd` and `ChannelInfo.workspaceCwd` was only written, never read. Per- channel storage also suggests variance the "1 daemon = 1 workspace" model forbids. Removing the field encodes the single-workspace invariant in the type itself; left a stub comment so future readers don't reintroduce it. - M3 (httpAcpBridge.ts): fast-path `canonicalizeWorkspace` when `req.workspaceCwd === boundWorkspace`. The §02 recommended client flow is `caps.workspaceCwd` → POST `cwd: caps.workspaceCwd`, and the omit-cwd route in server.ts synthesizes the same equality. Both hit the equality check and skip the sync `realpathSync.native` syscall. Non-equal inputs fall through to the full canonicalize (clients sending `/work/./bound`, mixed casing on case-insensitive FS, symlink aliases) so correctness is unchanged. - M4 (httpAcpBridge.ts): operator stderr breadcrumb in the `channel.exited` handler. An agent crash (OOM / segfault) used to be silent on the daemon side — the child-stderr forwarder caught whatever the child wrote before dying (often nothing on SIGKILL/segfault), and SSE subscribers saw `session_died` frames but operators reading `qwen serve`'s own output had no signal that the agent process was gone. Log code+signal+affected-session-count so the line is the canonical "agent disappeared" indicator. - M5 (server.ts): documentation-only. The reviewer wanted `createServeApp` to validate `opts.workspace` exists + is a directory (currently only `runQwenServe` does). Trade-off: doing that breaks 4 existing tests which pass synthetic `/work/bound` on purpose to exercise route-layer behavior without a real directory. Deferred the helper extraction; added a JSDoc note pinning the contract so future entry points binding `createServeApp` to user input know to replicate the validation. - M6 (runQwenServe.ts): pass the already-canonical `boundWorkspace` into `createServeApp` via `opts.workspace`. `canonicalizeWorkspace` is idempotent so the server-side recanonicalize is a no-op today, but if a future refactor ever makes it non-idempotent the values the route advertises on `/capabilities` and the bridge enforces would diverge — landing clients in a "/capabilities says X, POST /session/X returns workspace_mismatch" contradiction. Removes the drift risk. bridge: 73/73; server: 80/80; tsc clean for PR-touched files.
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅
This is the fourth review round on this PR. Previous rounds (deepseek-v4-pro, gpt-5.5) identified issues around workspace canonicalization, error message accuracy, lint warnings, test portability, and SDK type alignment — all addressed in commits 4d41fce, c922a4e, 1c7f5f0, and 67a937a.
Deterministic checks: tsc clean, eslint clean, build passes, 168 serve tests pass, 258 SDK tests pass. CI: 8/8 checks green.
— qwen-latest-series-invite-beta-v17 via Qwen Code /review
|
Follow-up from automated review — observability suggestions for 3 AM oncall scenarios (non-blocking, can be addressed in a follow-up): 1. .catch((err) => { writeStderrLine(`qwen serve: reap error for ${session.sessionId}: ${String(err)}`); });2. 3. entry.channel.exited.then((info) => { throw new Error(`agent channel closed mid-request (session ${entry.sessionId}, code=${info?.exitCode}, signal=${info?.signalCode})`); });4. Initialize timeout lacks child diagnostics ( 5. Shutdown has no progress logging ( All of these are nice-to-have improvements — the current code handles all correctness cases properly. These are purely about making production debugging easier. |
Follow-up: Observability suggestions (non-blocking)From a "3 AM oncall" audit — these are nice-to-have improvements that would make production debugging easier. All correctness cases are handled properly. 1. swallows cleanup errors (server.ts L263, L276)The disconnect reap paths catch all errors with zero logging. If .catch((err) => {
writeStderrLine(`qwen serve: reap error for ${session.sessionId}: ${String(err)}`);
});2.
|
Follow-up: Observability suggestions (non-blocking)From a "3 AM oncall" audit — these are nice-to-have improvements that would make production debugging easier. All correctness cases are handled properly. 1.
|
1 similar comment
Follow-up: Observability suggestions (non-blocking)From a "3 AM oncall" audit — these are nice-to-have improvements that would make production debugging easier. All correctness cases are handled properly. 1.
|
wenshao
left a comment
There was a problem hiding this comment.
Review Summary
5 Suggestions found. No Critical issues. All findings are defensive/maintainability concerns — no functional bugs. Core logic ('cwd' in body check, fast-path equality, SDK transparent-forwarding) is correct.
Findings that could not be mapped to diff lines (pre-existing code affected by this PR's context):
1. canonicalizeWorkspace catch-all swallows all fs errors (httpAcpBridge.ts:2255)
The catch {} silently swallows all errors from realpathSync.native (EACCES, ELOOP, EIO), not just ENOENT. Production path is safe (boot-time statSync validation), but future entry points calling createServeApp directly could boot with a non-canonical path. Consider narrowing catch to ENOENT only or adding writeStderrLine warnings for non-ENOENT errors.
2. JSON.stringify implicit undefined-stripping (DaemonClient.ts:279)
JSON.stringify({ cwd: req.workspaceCwd, ... }) relies on undefined being stripped to implement "omit cwd". A future replacer or body middleware would silently send {"cwd": null}. Consider explicit body construction: const body: Record<string, unknown> = {}; if (req.workspaceCwd !== undefined) body.cwd = req.workspaceCwd;
3. Missing liveness recheck after applyModelServiceId in attach paths (httpAcpBridge.ts:1575)
doSpawn performs byId.has() after model init, but non-merge attach paths return directly after applyModelServiceId without rechecking. If the child crashes during that await, the attach path returns a zombie session (HTTP 200, but all subsequent requests 404). Consider adding a byId.has() recheck mirroring doSpawn's Bd1zc pattern.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Two new inline findings: - O1 (server.ts): the POST /session route uses `'cwd' in body` against `safeBody`'s `Object.create(null)` output to distinguish "client omitted cwd" from "client sent cwd". The semantics quietly couple to `safeBody`'s literal strip list (`__proto__/constructor/prototype`). If a future maintainer adds a user-facing key (e.g. `cwd`) to that strip list, the route's presence-check would silently flip to "absent → fallback", masking the bug as "wrong workspace bound." Extracted `PROTOTYPE_POLLUTION_KEYS: ReadonlySet<string>` as a named module-scope constant; safeBody uses `.has()` on it (behavior unchanged); the route's comment now cross-references the const so the coupling is documented at both ends. The const's JSDoc spells out what to do if the strip set ever has to grow into user-key territory. - O2 (sdk-typescript): `DaemonCapabilities.workspaceCwd` is `string | undefined` (additive to v=1; pre-§02 daemons omit). SDK consumers that pass it into a `string` context get a TS strict error or, against an old daemon, a runtime `Cannot read properties of undefined`. Added a `requireWorkspaceCwd` helper + `DaemonCapabilityMissingError` so consumers can opt into an actionable `DaemonCapabilities.workspaceCwd is missing — introduced in #3803 §02 …` error instead. Exported both from `@qwen-code/sdk`'s top-level module + the `daemon/` sub-module. Unit tests cover populated, missing, and empty-string inputs. bridge: 73/73; server: 80/80; SDK DaemonClient: 43/43 (was 40, +3 new requireWorkspaceCwd cases). tsc clean for SDK and PR-touched CLI files.
doudouOUC
left a comment
There was a problem hiding this comment.
[Suggestion] Potential log injection via err.requested on server.ts:906-908.
writeStderrLine here interpolates err.requested directly into a single log line. The value flows from POST /session body → canonicalizeWorkspace(req.workspaceCwd) → WorkspaceMismatchError. Both path.resolve and realpathSync.native preserve control characters (newlines, CR, NUL) inside path segments — they only normalize separators, ../., and resolve symlinks. So a body like:
{"cwd": "/some/path\nqwen serve: FAKE LOG LINE injected by attacker"}(where \n is a literal newline in the JSON string) will produce a stderr breadcrumb that looks like two valid daemon log lines to any line-based log shipper.
err.bound is safe (canonicalized once at boot from --workspace/process.cwd()), but err.requested is user-input-derived. Bearer-token gate bounds the abuse to authenticated callers, but logs are often shipped to shared infra (Splunk / Loki / SIEM) where line-based parsing makes injection particularly cheap to weaponize.
Two reasonable mitigations:
JSON.stringify(err.requested)in the log line — escapes control chars and quotes the value, making the injection visible:writeStderrLine( `qwen serve: workspace_mismatch (POST /session): ` + `daemon bound to ${JSON.stringify(err.bound)}, rejected ${JSON.stringify(err.requested)}`, );
- Add a
containsControlChars(p)reject branch incanonicalizeWorkspaceitself so paths with\n/\r/\0400 at the route boundary before reaching the bridge.
(1) is the smaller change and fixes this specific log line; (2) is the more thorough fix and would also defend any future code that interpolates a workspace path into structured output.
wenshao
left a comment
There was a problem hiding this comment.
Overall this is a well-executed architectural simplification. Strong test coverage for the race conditions (BkUyD, cold-spawn-window, dying-channel overlap). A few inline suggestions below — mostly comment verbosity and minor cleanup.
wenshao
left a comment
There was a problem hiding this comment.
Overall a well-executed architectural simplification with strong test coverage for the race conditions (BkUyD, cold-spawn-window, dying-channel overlap). Approve with minor suggestions on comment verbosity, redundant syscalls, and edge cases (see inline comments).
--Opus-4.7 via Claude Code /review
…-v4-pro)
Eight new inline findings; six applied, two deferred-with-reply.
- P1 (httpAcpBridge.ts init-failure isDying comment): my comment
overstated what `info.isDying` accomplishes on the init-failure
path — concurrent `ensureChannel()` callers don't bypass via
`isDying`, they coalesce on `inFlightChannelSpawn` and observe the
same rejection. Reworded to describe the actual cross-path
invariant marker.
- P2 (server.ts workspace_mismatch log injection): doudouOUC flagged
log injection via `err.requested` (user-controlled). `path.resolve`
+ `realpathSync.native` preserve control chars in path segments,
so a body `{"cwd": "/legit/path\nqwen serve: FAKE LOG"}` would
emit two valid-looking daemon log lines on stderr — weaponizing
line-based log shippers (Splunk / Loki / journald → SIEM).
`JSON.stringify` both `err.bound` and `err.requested` in the log
line escapes control chars + quotes the values, making any
injection attempt visible-as-quoted-noise rather than forged-line.
Bound is operator-controlled and inherently safe but quoted
symmetrically for readability. The defense-in-depth alternative
(reject control chars in canonicalizeWorkspace) is deferred —
this single log site was the actionable interpolation; future
workspace-path-into-stderr / -JSON / -templated-SQL flows can pick
up the rejection if they ship.
- P3 (httpAcpBridge.test.ts): refactor the cross-workspace
WorkspaceMismatchError test to a single `.catch((e) => e)` capture
rather than firing the rejection twice (once for the `rejects
.toBeInstanceOf` matcher, once for the field assertions). Logic
unchanged.
- P4 (httpAcpBridge.ts channel.exited log): the `qwen serve:
channel exited (...)` line fired on every channel exit including
planned shutdown — alarming for operators who Ctrl+C'd a healthy
daemon. Guarded with `if (!shuttingDown)` so the planned-shutdown
case (operator already saw `received SIGINT, draining...`) stays
silent. The killSession path (last session leaves, daemon stays
up — no top-level context line) still logs, since the line is the
only signal that the cleanup actually ran.
- P5 (httpAcpBridge.ts): light trim of the "pre-fix" narrative
voice in two comment blocks (cold-spawn ensureChannel layout +
BkUyD killAllSync aliveChannels iteration). Kept the invariant
explanations — those carry maintenance value — dropped the
"pre-fix the code did X" framing that's review-context not
future-reader context.
- P6 (server.ts + runQwenServe.ts): `createServeApp` now accepts a
pre-canonicalized `deps.boundWorkspace` to skip its own
`canonicalizeWorkspace` syscall when the caller (runQwenServe)
already did the work. Replaces my earlier `{...opts, workspace:
boundWorkspace}` opts-mutation hack — cleaner separation of
concerns + drops one `realpathSync.native` per boot. Direct
callers (tests, embeds) that omit `deps.boundWorkspace` still get
the in-body canonicalize path.
- P8 (httpAcpBridge.ts): defensive `aliveChannels.size > 2`
warning. The set is intentionally multi-entry to cover the
killSession-then-spawnOrAttach overlap window (size 2 is
legitimate). Anything higher implies a `channel.exited` handler
never fired for a prior channel — a real leak we'd otherwise
catch only as gradually-growing RSS. The warning surfaces it the
moment it happens.
- P7 (CreateSessionRequest.workspaceCwd optional): deferred with
reply rationale. Making the field optional is the §02 design
("SDK accepts bound path or none"); the JSDoc already explains
the omit-vs-explicit choice; Stage 1 has no shipping SDK
consumers so there's no breakage to call out in a changelog file.
No code change.
bridge: 74/74 (cross-workspace test refactor + behavioral assertions
unchanged); server: 80/80; SDK 43/43. tsc clean for PR-touched
files.
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: self-PR. — gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No high-confidence Critical findings. Three additional Suggestion-level items couldn't be anchored to specific diff lines (the routes/exports they reference weren't modified in this PR but interact with the §02 contract):
1. packages/cli/src/serve/index.ts — barrel re-exports
The serve barrel re-exports SessionNotFoundError but not WorkspaceMismatchError, SessionLimitExceededError, InvalidPermissionOptionError, or the now-exported canonicalizeWorkspace. WorkspaceMismatchError is part of the public daemon error contract documented in docs/developers/qwen-serve-protocol.md; the asymmetry with SessionNotFoundError reads as an oversight rather than a deliberate cap. External consumers (the @qwen-code/acp-bridge package extraction noted as a chiga0 follow-up) can only reach these symbols via deep imports of ./httpAcpBridge.js.
2. packages/cli/src/serve/server.ts:426-439 — GET /workspace/:id/sessions contract asymmetry
This route silently returns 200 { sessions: [] } for any non-bound workspace (because bridge.listWorkspaceSessions filters all entries against boundWorkspace). Meanwhile POST /session rejects the same cross-workspace input with 400 workspace_mismatch. A session-picker UI debugging "why don't my sessions show up?" can't distinguish "right daemon but no sessions yet" from "queried the wrong daemon." Consider rejecting with 400 workspace_mismatch (matching the POST shape) when canonicalizeWorkspace(req.params.id) !== boundWorkspace — or at minimum add a stderr breadcrumb so the operator-side log has a trace.
3. packages/cli/src/serve/httpAcpBridge.ts:1898-1911 — listWorkspaceSessions missing M3 fast-path
This method unconditionally calls canonicalizeWorkspace(workspaceCwd) on every GET /workspace/:id/sessions, even though under §02 the canonical answer is always either boundWorkspace or empty. The M3 equality fast-path that was added to spawnOrAttach (line 1597-1600) wasn't propagated to this sibling. Session-picker UIs polling/refreshing on each connect pay the realpathSync.native syscall per request unnecessarily:
listWorkspaceSessions(workspaceCwd) {
if (!path.isAbsolute(workspaceCwd)) return [];
const key =
workspaceCwd === boundWorkspace
? boundWorkspace
: canonicalizeWorkspace(workspaceCwd);
if (key !== boundWorkspace) return [];
// ...
}— claude-opus-4-7 via Claude Code /qreview
- canonicalizeWorkspace: narrow catch to ENOENT only, propagate other filesystem errors - listWorkspaceSessions: add fast-path string equality to avoid realpathSync on every poll - GET /workspace/:id/sessions: return 400 workspace_mismatch for cross-workspace queries - SessionNotFoundError: accept optional extra message; clarify agent-crash-on-spawn case - requireWorkspaceCwd: distinguish empty-string (post-§02 bug) from absent (pre-§02 daemon)
wenshao
left a comment
There was a problem hiding this comment.
Remaining gaps (not in inline comments):
- SDK
DaemonClient.test.tslacks a test case forworkspace_mismatchresponse body fields (code,boundWorkspace,requestedWorkspace). The quickstart doc tells users to branch onbody.code === 'workspace_mismatch', but no SDK unit test verifies thatDaemonHttpError.bodyexposes these fields.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Wave-5 commit 0c6e963 ("apply auto-fixes from /review (#4113)") added a 400 workspace_mismatch reject path to GET /workspace/:id/sessions for cross-workspace queries, but the existing two happy-path tests queried `/work/a` / `/work/idle` against an unbound daemon (which falls back to `process.cwd()`). Both turned to 400 in CI. Bind the daemon to WS_BOUND in both happy-path tests and query the same path. Add a third regression test that pins the §02 cross-workspace rejection contract — `code: workspace_mismatch`, both paths in the body, bridge.listCalls untouched (no silent fallback regression). Brings server.test.ts from 80 → 82 tests, all passing.
Six new inline findings; five applied, one defer-with-reply.
- Q1 (httpAcpBridge.ts + server.ts + tests): cwd length amplification
through WorkspaceMismatchError. The error constructor interpolates
`requested` into `.message` TWICE; `sendBridgeError` echoes it on
stderr (now JSON.stringify-wrapped); `res.json` echoes it again — a
~10 MB `cwd` body (right under express.json's 10 MB cap) would
amplify to ~60 MB per request × maxConnections (default 256). On
loopback-default-no-token deployments this is pre-auth. Added
`MAX_WORKSPACE_PATH_LENGTH = 4096` (Linux PATH_MAX); route rejects
oversized `cwd` with a 400 BEFORE the bridge is touched, and the
`WorkspaceMismatchError` constructor truncates `requested` as
defense-in-depth for non-route callers (tests, embeds, future
entry points that throw the error directly). Three new tests pin
the route 400, the constructor truncation, and the normal-path
passthrough.
- Q2 + Q5 (httpAcpBridge.ts docs): the `channelInfo` declaration
comment + `ChannelInfo.sessionIds` JSDoc + `ChannelInfo.isDying`
JSDoc all overstated when `channelInfo` is cleared. Post-§02 the
BkUyD invariant is "ONLY `channel.exited` clears `channelInfo`"
— teardown initiators (killSession last-session-leaving,
doSpawn-newSession-failure, ensureChannel init-failure/late-
shutdown, shutdown) set `isDying = true` but LEAVE `channelInfo`
pointing at the dying channel until OS reap, so `killAllSync`
can still reach it through `aliveChannels`. A future maintainer
reading the old phrasing might "fix" killSession to also clear
`channelInfo` and silently break the double-Ctrl+C force-kill
path. Rewrote all three sites to describe the actual invariant +
enumerate the 5 isDying set-sites + spell out the BkUyD rationale
in one place (the `isDying` JSDoc) that other comments point at.
- Q3 (runQwenServe.ts): the "listening on …" boot summary goes to
stdout but every other operational diagnostic (bearer auth, the
workspace_mismatch breadcrumb, channel-exited, bridge errors) goes
to stderr. Operators capturing only stderr (systemd / docker / k8s
default) miss the `workspace=` indicator, which is the single
piece of information they need most when triaging §02 migration
issues. Added a `qwen serve: bound to workspace "X"` stderr line
alongside the stdout one — keeps stdout untouched (integration
tests + scripts parse it) while making the breadcrumb visible to
stderr-only log shippers. `JSON.stringify` the boundWorkspace
value (operator-controlled but cheap defense-in-depth against any
future flow that lands a control char in the path).
- Q4 (integration-tests/tsconfig.json): the `paths` entry resolved
`@qwen-code/sdk` to the SDK's built `dist/` directory; `dist/` is
gitignored and stale dist (no `npm run build` first) yields TS2339
errors on the integration tests' imports of new SDK fields.
Pointed `paths` at SDK source instead — `tsc -p
integration-tests/tsconfig.json` no longer requires a prior
rebuild. The vitest config's runtime alias still resolves to
`dist/index.mjs` so the actual test execution exercises the
published-bundle shape; this paths entry only affects type
resolution.
- Q6 (httpAcpBridge.ts): `createHttpAcpBridge` constructor called
`canonicalizeWorkspace(opts.boundWorkspace)` even when the caller
(`runQwenServe`) had already canonicalized and threaded the same
value through `deps.boundWorkspace` into `createServeApp`. Two
independent `realpathSync.native` calls can theoretically diverge
on NFS-transient / mid-rename filesystems, landing the bridge with
a canonical form different from what `/capabilities` advertises
and from `createServeApp`'s view. Dropped the bridge's
re-canonicalize; kept `path.isAbsolute` (structural, not a
syscall); documented the caller contract on `BridgeOptions
.boundWorkspace` ("MUST be pre-canonicalized; tests/embeds call
`canonicalizeWorkspace` first"). Tests use
`path.resolve(path.sep, ...)` which is already canonical-or-
fallback for non-existent paths, so no test changes needed.
bridge: 76/76 (was 74, +2 WorkspaceMismatchError truncation tests);
server: 82/82 (was 80, +2 length cap + the auto-applied helper).
tsc clean for SDK, CLI PR-touched files, and integration-tests'
qwen-serve-*.
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Targeted re-review against the latest five commits since my previous round. Both prior blockers — the streaming integration test now pinning the daemon to the repo root, and the cold-spawn window that previously allowed a half-handshaking channel to escape the alive-channels set — are closed with regression tests pinning each. The four prior minor items are either fixed or explicitly tradeoff-deferred with the deferral rationale captured in JSDoc, which is the right call for items at this size.
The new bound-workspace stderr breadcrumb, the cwd-length cap for the mismatch error path, the log-injection defang on the stderr breadcrumb, and the narrowed ENOENT-only catch in canonicalize all read as unprompted hardening rather than reactive fixes — net positive on hostile-input handling.
Verdict
APPROVE — both prior blockers closed with regression tests, deferred items have explicit tradeoff rationale.
[PR#4113](QwenLM/qwen-code#4113) refactor(serve): 1 daemon = 1 workspace (#3803 §02) 已合入: - merge commit 790f2d04 - merged 2026-05-15 04:44:37 UTC - 最终 +2051/-434 across 13 files(比 OPEN 时 +1121/-374 多了 ~900 LOC: round-1 symlink canonicalization fix + 更完整 tests + isDying flag / aliveChannels rename / overlap-race coverage) 全系列 4 文件状态调整: - §01 PR#4113 状态 OPEN → ✅ MERGED 2026-05-15;Stage 演进表拆为"已 ship 部分"vs"剩余" - §02 §2 PR#4113 引用从 OPEN 改为 MERGED 2026-05-15 + merge commit hash - §06 PR#4113 段落标题 🔧 → ✅;时间线开头加 PR#4113 ship 标记;1.5a 拆为 "已 ship 部分" vs "剩余" - README PR#4113 follow-up 块 OPEN → ✅;Stage 进展表 + 已合并 PR 表反映 MERGED 状态 Stage 1.5a 当前实际进度:multi-workspace 移除 ✅ + chiga0 10 must-haves 剩 9 项(#10 already shipped)+ Mode A 1.5b + daemon-side state CRUD 1.5c + AcpChannel lift 1.5-prereq 待开。
* refactor(serve): 1 daemon = 1 workspace (QwenLM#3803 §02) Stage 1 shipped with M-workspaces-per-daemon routing (`byWorkspaceChannel` Map keyed by request `cwd`). The §02 architectural revision in `docs/comparison/qwen-code-daemon-design/02-architectural-decisions.md` narrows the bridge to 1 daemon = 1 workspace × N sessions: each daemon binds to one canonical workspace path at boot; `POST /session` with a mismatched `cwd` returns 400 `workspace_mismatch`. Multi-workspace deployments run multiple daemon processes (one per workspace, supervised externally — systemd / docker-compose / k8s / `qwen-coordinator`). Bridge state collapses from maps to single optional slots: - `byWorkspaceChannel: Map<string, ChannelInfo>` → `channelInfo?: ChannelInfo` - `inFlightChannelSpawns: Map<string, Promise>` → `inFlightChannelSpawn?: Promise` - `byWorkspace: Map<string, SessionEntry>` → `defaultEntry?: SessionEntry` - `liveChannels: Set<ChannelInfo>` → not needed; `channelInfo` is the live reference, cleared only by `channel.exited` (preserves the tanzhenxin BkUyD invariant that `killAllSync` finds a target mid-SIGTERM-grace) `BridgeOptions.boundWorkspace` becomes required. `WorkspaceMismatchError` is thrown from `spawnOrAttach` when the request's canonical cwd doesn't match the bound path, translated to 400 `workspace_mismatch` (with both paths in the body) by the route layer. `CapabilitiesEnvelope.workspaceCwd` surfaces the bound path so clients pre-flight check + omit `cwd` from `POST /session` (it falls back to the bound workspace). A new `--workspace <path>` CLI flag lets operators override `process.cwd()` at boot. The previous `--http-bridge` / `--multi-workspace` opt-in was never shipped; nothing changes for default users running `qwen serve` in their project directory. Removed code path: ~150 LOC of multi-workspace map machinery in `httpAcpBridge.ts` plus the test cases that exercised it. Test surgery: - New `makeBridge()` helper in `httpAcpBridge.test.ts` injects `boundWorkspace: WS_A` by default; tests that need a different bind (the mismatch test) pass it explicitly. - `does NOT reuse across workspaces` → `rejects cross-workspace requests with WorkspaceMismatchError` (the new semantics under §02). - `shutdown kills every live channel` retargeted to single-channel multi-session shutdown. - `killAllSync force-kills channels even after shutdown cleared byWorkspaceChannel (BkUyD)` retargeted to single-channel: the invariant is the same (channel reference must outlive eager shutdown clearing), the surface is just smaller. - `listWorkspaceSessions` cross-workspace assertion now expects empty for the un-bound path. - `--max-sessions` cap test uses two thread-scope sessions on `WS_A` instead of WS_A + WS_B. Closes QwenLM#3803 §02. * fix(serve): address review findings on the §02 refactor Two correctness fixes + four doc/test polish items surfaced by the multi-agent review of QwenLM#4113: 1. `killSession` → `spawnOrAttach` race (Critical). After killing the last session, `channel.kill()` runs through a 5s SIGTERM grace before SIGKILL. During that window a concurrent `spawnOrAttach` used to hit `ensureChannel`, find `channelInfo` still set, and reuse the dying transport — either landing the caller with a sessionId that 404s on every follow-up once `channel.exited` fires, or hanging until the newSession timeout. Fix: add an `isDying: boolean` flag on `ChannelInfo`, set synchronously by `killSession` / `doSpawn`-newSession-failure / `shutdown` BEFORE awaiting `channel.kill()`. `ensureChannel` treats a dying channel as absent and spawns a fresh one. The tanzhenxin BkUyD invariant ("`channelInfo` reference must outlive the kill-await for `killAllSync` mid-grace") is preserved — we set `isDying` but don't clear `channelInfo` until the OS reaps the child via `channel.exited`. A regression test in `httpAcpBridge.test.ts` pins the invariant: a never-resolving `kill()` keeps the SIGTERM grace open while a concurrent spawn verifies the factory was called twice (two distinct handles). 2. `boundWorkspace` canonicalization divergence (Critical). `server.ts` and `runQwenServe.ts` each computed `opts.workspace ?? process.cwd()` independently. The bridge canonicalized that string via `realpathSync.native` (resolving symlinks, case-folding on case-insensitive filesystems); the callers retained the raw form. On macOS HFS+ / APFS or any symlinked path, `/capabilities.workspaceCwd` advertised one spelling while the bridge enforced against another — clients echoing the advertised path back saw `POST /session` succeed but the response carry a different `workspaceCwd`. Fix: export `canonicalizeWorkspace` from `httpAcpBridge.ts` and call it once in `runQwenServe` (after the existence check) and once in `createServeApp`. Both paths land on the same canonical form; the bridge's own re-canonicalize is now a no-op (idempotent). 3. Reject `--workspace` pointing at non-existent directories at boot (Suggestion). `canonicalizeWorkspace`'s ENOENT fallback to `path.resolve` previously let the daemon boot pointed at a path that didn't exist; every `POST /session` then spawned a `qwen --acp` child with that cwd and the agent failed with an opaque ENOENT. Now `runQwenServe` `statSync`s the bound path at boot and rejects "directory does not exist" / "not a directory" with a clear message. 4. Stale docstrings (Nice to have). `types.ts` `ServeMode` JSDoc said "one `qwen --acp` child PER WORKSPACE" — directly contradicted the new `workspace` field's doc in the same file. `commands/serve.ts` `--http-bridge` description said "per workspace" — directly contradicted the `--workspace` flag's help in the same yargs builder. Both updated to "per daemon (the daemon binds to ONE workspace at boot)". 5. Stale `byWorkspace` comment references (Nice to have). `server.ts:188` ("orphaned in byId / byWorkspace") and `httpAcpBridge.test.ts:1210` ("still in byId/byWorkspace at the moment of crash") referenced the removed Map. Updated to `defaultEntry`. 6. `/capabilities` curl example in the Authentication section of `docs/users/qwen-serve.md` was missing the new `workspaceCwd` field — the Quickstart's curl example was updated but the parallel one in the auth section was not. Synced. Tests added: - `killSession marks the channel dying so concurrent spawnOrAttach gets a fresh channel` — pins fix (1). - `--workspace flows end-to-end and surfaces on /capabilities` — exercises the runQwenServe → server.ts → bridge plumbing that no prior test covered. - `rejects --workspace pointing at a non-existent directory` and `rejects --workspace pointing at a regular file` — pin fix (3). - `rejects relative --workspace at boot` — covers the absoluteness check that exists but was untested. Net: +238 / -24 across 8 files. All 149 serve tests pass. * fix(serve): BkUyD overwrite race + Windows-fragile test + doSpawn-failure coverage Round-2 review of QwenLM#4113 caught three follow-up issues introduced by or left open after round-1's fixes: 1. **BkUyD invariant overwrite race (Critical).** Round-1's `isDying` flag lets `ensureChannel` skip a dying channel and spawn a fresh one. When the fresh spawn completes, `channelInfo = info` overwrote the dying channel's reference — leaving NO global pointer to it. `killAllSync()` then iterated only `channelInfo` (the fresh one) and missed the dying child entirely. A double-Ctrl+C arriving mid-SIGTERM-grace would call `process.exit(1)` before the dying child's per-channel SIGKILL escalation timer fired, orphaning the child. Restore a `aliveChannels: Set<ChannelInfo>` (parallel to the original Stage 1 design, but justified by single-workspace too). Entries added in `ensureChannel`, removed by each channel's `channel.exited` handler. `killAllSync` iterates the SET, not the single attach-target slot. `shutdown` does the same — snapshots every alive channel and kills each, not just the current `channelInfo`. New regression test pins the invariant: spawn → killSession (channel marked dying, kill hangs) → spawnOrAttach (fresh channel overwrites `channelInfo`) → `killAllSync` — expect BOTH channels' `killSync` to fire. Pre-fix only the fresh one would have fired. 2. **Windows-fragile test path.** The new `rejects --workspace pointing at a regular file` test used `new URL(import.meta.url).pathname` to get a path to the test file. On Windows that returns `/C:/path/...` (leading slash); `fs.statSync` then resolves it as path-from-current-drive-root, fails with ENOENT, and the test sees the "does not exist" error message instead of the expected "not a directory" branch. CI runs `windows-latest`. Fix: `fileURLToPath(import.meta.url)` from `node:url`. 3. **doSpawn newSession-failure isDying path was untested.** The round-1 fix added `ci.isDying = true` to both `killSession` AND `doSpawn`'s newSession-failure catch, but only the killSession path had a regression test. Added a parallel one for the doSpawn path: thread-scope bridge with a `newSessionImpl` that throws on the first call → captures the rejection without awaiting it (the bridge's `await ci.channel.kill()` hangs in the test), yields enough cycles for the `isDying = true` sync prefix to settle, then confirms (a) the next `spawnOrAttach` produces a fresh channel and (b) `killAllSync` finds both channels in `aliveChannels`. Also added a `newSessionImpl` option to the test FakeAgent — the existing `initializeThrows` hook covered handshake-time failures, but post-init `newSession` rejections (auth, bad config, mid-init crashes) had no test affordance. All 151 serve tests pass. * docs(serve): update daemon-client-quickstart for §02 single-workspace Round-3 review caught that the SDK example doc was the only one of the three serve-related docs that the §02 refactor didn't touch. Updated: - Boot log example now shows the `, workspace=/path/to/your-project` suffix that `runQwenServe` emits after the §02 changes. - The "Hello daemon" example now reads `caps.workspaceCwd` off `/capabilities` and passes it back as `workspaceCwd` on session creation — illustrating the documented pre-flight pattern, not a hand-written literal that may not match the daemon's actual bind. - Shared-session example makes the prerequisite explicit: the daemon must be bound to `/work/repo` (via `--workspace` or `cd`); under §02 two clients can only share a session if they're both hitting a daemon already bound to that workspace. - New "Workspace mismatch" section shows how to handle the `400 workspace_mismatch` error class: catching `DaemonHttpError`, branching on `body.code`, surfacing `boundWorkspace` / `requestedWorkspace` for the operator. This is a new error class SDK consumers' error handlers should branch on. No code changes; docs only. * feat(sdk,test): align SDK types + integration tests with §02 single-workspace Round-4 review caught one type-drift gap + a set of integration-test assumptions that the §02 refactor invalidated. **SDK type drift.** `DaemonCapabilities` in `packages/sdk-typescript/src/daemon/types.ts` was the SDK-side mirror of `CapabilitiesEnvelope` on the daemon side. The §02 PR added `workspaceCwd: string` to the daemon envelope (and the round-3 doc example reads `caps.workspaceCwd` off the SDK client) but the SDK type wasn't updated. A TypeScript consumer copying the doc snippet verbatim would hit `TS2339 'workspaceCwd' does not exist on type 'DaemonCapabilities'`. The wire field is present so JS consumers wouldn't notice — but the SDK is marketed as a TypeScript quickstart, so this is a real onboarding break. Fix: add `workspaceCwd: string` to `DaemonCapabilities` (parallel to `DaemonSession.workspaceCwd` which is already there). The SDK unit test for `client.capabilities()` was updated to put the new field in the mocked response. **Integration tests.** `qwen-serve-routes.test.ts` spawns a real `qwen serve` daemon in `beforeAll`. Three breakages exposed: 1. The daemon was launched without `--workspace`, so it inherited the test runner's `cwd`. Tests then POST `workspaceCwd: REPO_ROOT` assuming the daemon is bound to the repo root — true when run via `npm test` from the repo, brittle from IDEs / launchers that have a different `cwd`. Added `'--workspace', REPO_ROOT` to the spawn args so the bound workspace is deterministic regardless of where the test runner is launched. 2. The `bad modelServiceId` test used `cwd: '/tmp'`. Under §02 this would now return 400 workspace_mismatch before the session was spawned. Switched to `REPO_ROOT` and softened the `attached` assertion (REPO_ROOT may already have a session from earlier tests in the suite under sessionScope:single). 3. Added three new integration tests pinning the §02 surface end-to-end through a real daemon process: - `rejects cross-workspace cwd with 400 workspace_mismatch` — posts `/tmp` and asserts the full structured error body (`code`, `boundWorkspace`, `requestedWorkspace`). - `omits cwd → falls back to bound workspace` — posts an empty body and asserts the response's `workspaceCwd` matches REPO_ROOT (verifies the runQwenServe → createServeApp → bridge fallback plumbing). - `GET /capabilities surfaces workspaceCwd` — asserts the new SDK type field is populated correctly off the wire. All 422 unit tests pass (cli serve + sdk). Integration tests typecheck clean. * fix(serve): address /review feedback from gpt-5.5 + deepseek-v4-pro Process the 7 inline /review comments on PR QwenLM#4113: - C1+C3 (SDK): make `DaemonCapabilities.workspaceCwd` and `CreateSessionRequest.workspaceCwd` optional in the SDK types. `workspaceCwd` is an additive field on the v=1 envelope per QwenLM#3803 §02; the protocol's "bump v only on incompatible changes" stance is honored by leaving the field optional at the type level. `DaemonClient.createOrAttachSession` now omits `cwd` from the body when `workspaceCwd` isn't passed, matching the PR description's "SDK accepts bound path or none". Adds a unit test pinning the empty-body shape. - C2 (docs/users/qwen-serve.md): the `--http-bridge` row described the pre-§02 per-session model; updated to reflect one child per daemon with N sessions multiplexed via ACP `newSession()`. - C4 (server.ts): `WorkspaceMismatchError` was silently 400'ing without a stderr breadcrumb, leaving operators blind to cross-workspace routing drift. Mirrors the SessionLimitExceeded /InvalidPermissionOption observability pattern. - C5 (server.test.ts): the `/capabilities` fallback test compared `res.body.workspaceCwd` against raw `process.cwd()`; on macOS default tmpdir flows (`/var/folders/...` → `/private/var/...`) the canonicalize-once route value diverges. Use `realpathSync.native(process.cwd())` to match the route's canonicalization. - C6 (server.ts): the cwd-not-absolute error said "cwd is required and must be an absolute path" but cwd is now optional under §02. Tightened wording to "must be an absolute path when provided". - C7 (runQwenServe.ts): the `statSync` catch only wrapped ENOENT with a friendly diagnostic; EACCES / EPERM (typical for SIP-protected dirs on macOS or root-owned paths the daemon's UID can't traverse) re-threw as raw `SystemError`. Wrap both codes with a `--workspace`-context message so the boot failure points at the flag the operator set. Docs: quickstart shows the explicit-pass-or-omit options side by side; protocol reference notes `workspaceCwd` is additive to v=1. * fix(serve/test): make /work/bound literals Windows-portable Windows CI failed on this PR's two new tests because returns (drive-relative absolute), so the route's canonicalize step diverged from the hardcoded literal. Mirror the WS_A/WS_B pattern already used in httpAcpBridge.test.ts: define WS_BOUND / WS_DIFFERENT via `path.resolve(path.sep, …)` and use the constants everywhere. The 400 workspace_mismatch test would still have passed (mock controls both throw + assertion) but I aligned it for consistency. Failures from CI run 25806528710: expected 'D:\work\bound' to be '/work/bound' (Object.is) Affected tests: - createServeApp > GET /capabilities > reports the bound workspace - createServeApp > POST /session > 200 when cwd is omitted * fix(serve): address second /review round (gpt-5.5 + deepseek-v4-pro) Four new inline findings from the latest /review pass: - N1 (integration-tests/cli/qwen-serve-routes.test.ts) — Critical: the `workspace_mismatch` assertion compared `requestedWorkspace` against the literal `'/tmp'`, but the bridge canonicalizes via `realpathSync.native` and on macOS `/tmp` is a symlink to `/private/tmp`. Compare against `realpathSync.native('/tmp')` so the assertion is portable. - N2 (packages/cli/src/serve/types.ts): `CapabilitiesEnvelope.workspaceCwd: string` (server side) diverged from the SDK's `DaemonCapabilities.workspaceCwd?: string`. Made the server type optional too — matches the SDK, matches the protocol doc's "additive to v=1" framing, doesn't change runtime emission (the post-§02 server still always populates the field). - N3 + N4 (packages/cli/src/serve/server.ts + sdk-typescript/.../DaemonClient.ts): the route's `cwd` validation treated every non-string body value (`null`, `123`, `{}`, `[]`) the same as omitted, silently falling back to `boundWorkspace`. That hid client/orchestrator serialization bugs as "session attached to wrong workspace". Now the route uses `'cwd' in body` to detect presence and rejects presence-but-not-a-string with `400 'cwd must be a string absolute path when provided'`. Empty string still hits the existing `path.isAbsolute` branch ("must be an absolute path when provided"), so an SDK caller passing `workspaceCwd: ''` no longer silently lands in the daemon's bound workspace. SDK side: reverted my conditional spread to `cwd: req.workspaceCwd` unconditional. `JSON.stringify` strips `undefined` automatically (so omitted `workspaceCwd` becomes "no `cwd` key" on the wire, as before), but empty-string is now forwarded verbatim and the server's 400 surfaces the bug instead of the SDK swallowing it. Added a unit test pinning the empty-string-forwarded shape. Server tests: - `400 when cwd is present but not a string` covers null / number / object / array via a sub-loop. - `400 when cwd is the empty string` pins the isAbsolute path. bridge: 73/73; server: 80/80 (was 78, +2 new); SDK: 40/40 (was 39, +1 empty-string test). tsc clean for SDK and PR-touched CLI files. * fix(serve): use const cwd in POST /session (prefer-const lint) CI lint failed with packages/cli/src/serve/server.ts:199:9 prefer-const: 'cwd' is never reassigned. The wave-4 rewrite split the original 'let cwd; if (!cwd) cwd = boundWorkspace' into a single ternary, which removes the only mutation path; the variable should be const accordingly. * fix(serve): address third /review round (gpt-5.5 + glm-5.1 + deepseek-v4-pro) Five new inline findings; M1 was already resolved in 1c7f5f0. - M2 (httpAcpBridge.ts): drop the dead `ChannelInfo.workspaceCwd` field. Pre-§02 it was the routing key for `byWorkspaceChannel.get`; after the §02 collapse all reads target `SessionEntry.workspaceCwd` and `ChannelInfo.workspaceCwd` was only written, never read. Per- channel storage also suggests variance the "1 daemon = 1 workspace" model forbids. Removing the field encodes the single-workspace invariant in the type itself; left a stub comment so future readers don't reintroduce it. - M3 (httpAcpBridge.ts): fast-path `canonicalizeWorkspace` when `req.workspaceCwd === boundWorkspace`. The §02 recommended client flow is `caps.workspaceCwd` → POST `cwd: caps.workspaceCwd`, and the omit-cwd route in server.ts synthesizes the same equality. Both hit the equality check and skip the sync `realpathSync.native` syscall. Non-equal inputs fall through to the full canonicalize (clients sending `/work/./bound`, mixed casing on case-insensitive FS, symlink aliases) so correctness is unchanged. - M4 (httpAcpBridge.ts): operator stderr breadcrumb in the `channel.exited` handler. An agent crash (OOM / segfault) used to be silent on the daemon side — the child-stderr forwarder caught whatever the child wrote before dying (often nothing on SIGKILL/segfault), and SSE subscribers saw `session_died` frames but operators reading `qwen serve`'s own output had no signal that the agent process was gone. Log code+signal+affected-session-count so the line is the canonical "agent disappeared" indicator. - M5 (server.ts): documentation-only. The reviewer wanted `createServeApp` to validate `opts.workspace` exists + is a directory (currently only `runQwenServe` does). Trade-off: doing that breaks 4 existing tests which pass synthetic `/work/bound` on purpose to exercise route-layer behavior without a real directory. Deferred the helper extraction; added a JSDoc note pinning the contract so future entry points binding `createServeApp` to user input know to replicate the validation. - M6 (runQwenServe.ts): pass the already-canonical `boundWorkspace` into `createServeApp` via `opts.workspace`. `canonicalizeWorkspace` is idempotent so the server-side recanonicalize is a no-op today, but if a future refactor ever makes it non-idempotent the values the route advertises on `/capabilities` and the bridge enforces would diverge — landing clients in a "/capabilities says X, POST /session/X returns workspace_mismatch" contradiction. Removes the drift risk. bridge: 73/73; server: 80/80; tsc clean for PR-touched files. * fix(serve,sdk): address fourth /review round (deepseek-v4-pro x2) Two new inline findings: - O1 (server.ts): the POST /session route uses `'cwd' in body` against `safeBody`'s `Object.create(null)` output to distinguish "client omitted cwd" from "client sent cwd". The semantics quietly couple to `safeBody`'s literal strip list (`__proto__/constructor/prototype`). If a future maintainer adds a user-facing key (e.g. `cwd`) to that strip list, the route's presence-check would silently flip to "absent → fallback", masking the bug as "wrong workspace bound." Extracted `PROTOTYPE_POLLUTION_KEYS: ReadonlySet<string>` as a named module-scope constant; safeBody uses `.has()` on it (behavior unchanged); the route's comment now cross-references the const so the coupling is documented at both ends. The const's JSDoc spells out what to do if the strip set ever has to grow into user-key territory. - O2 (sdk-typescript): `DaemonCapabilities.workspaceCwd` is `string | undefined` (additive to v=1; pre-§02 daemons omit). SDK consumers that pass it into a `string` context get a TS strict error or, against an old daemon, a runtime `Cannot read properties of undefined`. Added a `requireWorkspaceCwd` helper + `DaemonCapabilityMissingError` so consumers can opt into an actionable `DaemonCapabilities.workspaceCwd is missing — introduced in QwenLM#3803 §02 …` error instead. Exported both from `@qwen-code/sdk`'s top-level module + the `daemon/` sub-module. Unit tests cover populated, missing, and empty-string inputs. bridge: 73/73; server: 80/80; SDK DaemonClient: 43/43 (was 40, +3 new requireWorkspaceCwd cases). tsc clean for SDK and PR-touched CLI files. * fix(serve): address tanzhenxin REQUEST_CHANGES (cold-spawn + streaming-test bind) Two findings from the CHANGES_REQUESTED review on PR QwenLM#4113. - T1 (integration-tests/cli/qwen-serve-streaming.test.ts) — high severity: the daemon spawn in `beforeAll` did not pass `--workspace REPO_ROOT`, so under §02 the daemon bound to whatever cwd the test runner was invoked from. Every later `createOrAttachSession({ workspaceCwd: REPO_ROOT })` then 400'd with `workspace_mismatch`, and the entire file — child-crash recovery, multi-client first-responder permission, Last-Event-ID resume — silently no-op'd once `SKIP_LLM_TESTS` was unset. The sibling `qwen-serve-routes.test.ts` got the same fix earlier in this PR; this file was missed in that pass. Added the flag with a comment pointing at the rationale so the omission can't recur. - T2 (packages/cli/src/serve/httpAcpBridge.ts) — medium severity: cold-spawn window orphans the agent child on double-Ctrl+C. The `qwen --acp` child exists from the moment `channelFactory` spawns it, but pre-fix the bridge only added the channel to `aliveChannels` AFTER `connection.initialize()` returned. During the up-to-`initTimeoutMs` (default 10s) handshake window `aliveChannels` was empty, and a double-Ctrl+C in that window played out as: first SIGINT entered `shutdown()` and awaited the in-flight spawn; second SIGINT called `killAllSync()` against an empty set; `process.exit(1)` orphaned the child. Same class of bug the BkUyD invariant set out to close — the post-init overwrite race was covered, the pre-init handshake window wasn't. Fix: move `info` creation + `aliveChannels.add(info)` + the `channel.exited` handler registration BEFORE the `initialize` await. Init-failure / late-shutdown / child-crash-during-handshake all converge on the same cleanup path: mark `isDying = true`, `await channel.kill()`, let the exited handler `aliveChannels .delete(info)` once the OS reaps the process. `channelInfo` (the attach target) is still assigned LAST so `ensureChannel`'s fast-path never returns a still-handshaking channel. Regression test: `killAllSync force-kills the channel during the initialize handshake` uses a bespoke factory whose agent's `initialize` never resolves and asserts `killAllSync` fires killSync against the channel during the handshake window. Pre-fix the test would observe an empty `killSyncCalls` array. bridge: 74/74 (was 73, +1 cold-spawn test); server: 80/80; tsc clean for PR-touched files. * fix(serve): address third /review round (gpt-5.5 + glm-5.1 + deepseek-v4-pro) Eight new inline findings; six applied, two deferred-with-reply. - P1 (httpAcpBridge.ts init-failure isDying comment): my comment overstated what `info.isDying` accomplishes on the init-failure path — concurrent `ensureChannel()` callers don't bypass via `isDying`, they coalesce on `inFlightChannelSpawn` and observe the same rejection. Reworded to describe the actual cross-path invariant marker. - P2 (server.ts workspace_mismatch log injection): doudouOUC flagged log injection via `err.requested` (user-controlled). `path.resolve` + `realpathSync.native` preserve control chars in path segments, so a body `{"cwd": "/legit/path\nqwen serve: FAKE LOG"}` would emit two valid-looking daemon log lines on stderr — weaponizing line-based log shippers (Splunk / Loki / journald → SIEM). `JSON.stringify` both `err.bound` and `err.requested` in the log line escapes control chars + quotes the values, making any injection attempt visible-as-quoted-noise rather than forged-line. Bound is operator-controlled and inherently safe but quoted symmetrically for readability. The defense-in-depth alternative (reject control chars in canonicalizeWorkspace) is deferred — this single log site was the actionable interpolation; future workspace-path-into-stderr / -JSON / -templated-SQL flows can pick up the rejection if they ship. - P3 (httpAcpBridge.test.ts): refactor the cross-workspace WorkspaceMismatchError test to a single `.catch((e) => e)` capture rather than firing the rejection twice (once for the `rejects .toBeInstanceOf` matcher, once for the field assertions). Logic unchanged. - P4 (httpAcpBridge.ts channel.exited log): the `qwen serve: channel exited (...)` line fired on every channel exit including planned shutdown — alarming for operators who Ctrl+C'd a healthy daemon. Guarded with `if (!shuttingDown)` so the planned-shutdown case (operator already saw `received SIGINT, draining...`) stays silent. The killSession path (last session leaves, daemon stays up — no top-level context line) still logs, since the line is the only signal that the cleanup actually ran. - P5 (httpAcpBridge.ts): light trim of the "pre-fix" narrative voice in two comment blocks (cold-spawn ensureChannel layout + BkUyD killAllSync aliveChannels iteration). Kept the invariant explanations — those carry maintenance value — dropped the "pre-fix the code did X" framing that's review-context not future-reader context. - P6 (server.ts + runQwenServe.ts): `createServeApp` now accepts a pre-canonicalized `deps.boundWorkspace` to skip its own `canonicalizeWorkspace` syscall when the caller (runQwenServe) already did the work. Replaces my earlier `{...opts, workspace: boundWorkspace}` opts-mutation hack — cleaner separation of concerns + drops one `realpathSync.native` per boot. Direct callers (tests, embeds) that omit `deps.boundWorkspace` still get the in-body canonicalize path. - P8 (httpAcpBridge.ts): defensive `aliveChannels.size > 2` warning. The set is intentionally multi-entry to cover the killSession-then-spawnOrAttach overlap window (size 2 is legitimate). Anything higher implies a `channel.exited` handler never fired for a prior channel — a real leak we'd otherwise catch only as gradually-growing RSS. The warning surfaces it the moment it happens. - P7 (CreateSessionRequest.workspaceCwd optional): deferred with reply rationale. Making the field optional is the §02 design ("SDK accepts bound path or none"); the JSDoc already explains the omit-vs-explicit choice; Stage 1 has no shipping SDK consumers so there's no breakage to call out in a changelog file. No code change. bridge: 74/74 (cross-workspace test refactor + behavioral assertions unchanged); server: 80/80; SDK 43/43. tsc clean for PR-touched files. * fix(serve): apply auto-fixes from /review (QwenLM#4113) - canonicalizeWorkspace: narrow catch to ENOENT only, propagate other filesystem errors - listWorkspaceSessions: add fast-path string equality to avoid realpathSync on every poll - GET /workspace/:id/sessions: return 400 workspace_mismatch for cross-workspace queries - SessionNotFoundError: accept optional extra message; clarify agent-crash-on-spawn case - requireWorkspaceCwd: distinguish empty-string (post-§02 bug) from absent (pre-§02 daemon) * fix(serve/test): bind workspace explicitly in GET /workspace tests Wave-5 commit 0c6e963 ("apply auto-fixes from /review (QwenLM#4113)") added a 400 workspace_mismatch reject path to GET /workspace/:id/sessions for cross-workspace queries, but the existing two happy-path tests queried `/work/a` / `/work/idle` against an unbound daemon (which falls back to `process.cwd()`). Both turned to 400 in CI. Bind the daemon to WS_BOUND in both happy-path tests and query the same path. Add a third regression test that pins the §02 cross-workspace rejection contract — `code: workspace_mismatch`, both paths in the body, bridge.listCalls untouched (no silent fallback regression). Brings server.test.ts from 80 → 82 tests, all passing. * fix(serve,sdk): address fourth /review round (deepseek-v4-pro x2) Six new inline findings; five applied, one defer-with-reply. - Q1 (httpAcpBridge.ts + server.ts + tests): cwd length amplification through WorkspaceMismatchError. The error constructor interpolates `requested` into `.message` TWICE; `sendBridgeError` echoes it on stderr (now JSON.stringify-wrapped); `res.json` echoes it again — a ~10 MB `cwd` body (right under express.json's 10 MB cap) would amplify to ~60 MB per request × maxConnections (default 256). On loopback-default-no-token deployments this is pre-auth. Added `MAX_WORKSPACE_PATH_LENGTH = 4096` (Linux PATH_MAX); route rejects oversized `cwd` with a 400 BEFORE the bridge is touched, and the `WorkspaceMismatchError` constructor truncates `requested` as defense-in-depth for non-route callers (tests, embeds, future entry points that throw the error directly). Three new tests pin the route 400, the constructor truncation, and the normal-path passthrough. - Q2 + Q5 (httpAcpBridge.ts docs): the `channelInfo` declaration comment + `ChannelInfo.sessionIds` JSDoc + `ChannelInfo.isDying` JSDoc all overstated when `channelInfo` is cleared. Post-§02 the BkUyD invariant is "ONLY `channel.exited` clears `channelInfo`" — teardown initiators (killSession last-session-leaving, doSpawn-newSession-failure, ensureChannel init-failure/late- shutdown, shutdown) set `isDying = true` but LEAVE `channelInfo` pointing at the dying channel until OS reap, so `killAllSync` can still reach it through `aliveChannels`. A future maintainer reading the old phrasing might "fix" killSession to also clear `channelInfo` and silently break the double-Ctrl+C force-kill path. Rewrote all three sites to describe the actual invariant + enumerate the 5 isDying set-sites + spell out the BkUyD rationale in one place (the `isDying` JSDoc) that other comments point at. - Q3 (runQwenServe.ts): the "listening on …" boot summary goes to stdout but every other operational diagnostic (bearer auth, the workspace_mismatch breadcrumb, channel-exited, bridge errors) goes to stderr. Operators capturing only stderr (systemd / docker / k8s default) miss the `workspace=` indicator, which is the single piece of information they need most when triaging §02 migration issues. Added a `qwen serve: bound to workspace "X"` stderr line alongside the stdout one — keeps stdout untouched (integration tests + scripts parse it) while making the breadcrumb visible to stderr-only log shippers. `JSON.stringify` the boundWorkspace value (operator-controlled but cheap defense-in-depth against any future flow that lands a control char in the path). - Q4 (integration-tests/tsconfig.json): the `paths` entry resolved `@qwen-code/sdk` to the SDK's built `dist/` directory; `dist/` is gitignored and stale dist (no `npm run build` first) yields TS2339 errors on the integration tests' imports of new SDK fields. Pointed `paths` at SDK source instead — `tsc -p integration-tests/tsconfig.json` no longer requires a prior rebuild. The vitest config's runtime alias still resolves to `dist/index.mjs` so the actual test execution exercises the published-bundle shape; this paths entry only affects type resolution. - Q6 (httpAcpBridge.ts): `createHttpAcpBridge` constructor called `canonicalizeWorkspace(opts.boundWorkspace)` even when the caller (`runQwenServe`) had already canonicalized and threaded the same value through `deps.boundWorkspace` into `createServeApp`. Two independent `realpathSync.native` calls can theoretically diverge on NFS-transient / mid-rename filesystems, landing the bridge with a canonical form different from what `/capabilities` advertises and from `createServeApp`'s view. Dropped the bridge's re-canonicalize; kept `path.isAbsolute` (structural, not a syscall); documented the caller contract on `BridgeOptions .boundWorkspace` ("MUST be pre-canonicalized; tests/embeds call `canonicalizeWorkspace` first"). Tests use `path.resolve(path.sep, ...)` which is already canonical-or- fallback for non-existent paths, so no test changes needed. bridge: 76/76 (was 74, +2 WorkspaceMismatchError truncation tests); server: 82/82 (was 80, +2 length cap + the auto-applied helper). tsc clean for SDK, CLI PR-touched files, and integration-tests' qwen-serve-*. (cherry picked from commit 790f2d0)
Four test calls still referenced the old upstream name after the 1-daemon=1-workspace cherry-pick (QwenLM#4113). Update them to match the renamed export. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Stage 1 (PR #3889, merged) shipped
qwen servewith M-workspaces-per-daemon routing — a daemon process hosted sessions for arbitrarily many workspaces, keyed by the request'scwd. The §02 architectural revision indocs/comparison/qwen-code-daemon-design/02-architectural-decisions.mdnarrows that to 1 daemon = 1 workspace × N sessions: every daemon binds to one canonical workspace path at boot, andPOST /sessionwith a mismatchedcwdreturns400 workspace_mismatch. Multi-workspace deployments run multiple daemon processes (one per workspace, supervised by systemd / docker-compose / k8s / aqwen-coordinatorreference orchestrator).Closes issue #3803 §02.
Why this change is needed now
Stage 1 was deliberately small-scope, but the routing layer had hidden coupling that gets worse under Stage 2 (in-process) and Stage 4+ (sandboxed):
acpAgent.ts:601doesthis.settings = loadSettings(cwd)on everynewSession— same-process cross-workspace sessions stomp on each other's settings.Running one daemon per workspace also gives operators the clean failure-isolation story they already expect from systemd / k8s: a wedged workspace doesn't take down the others.
Scope: what this PR does NOT change
Only the HTTP daemon (
qwen serve) is affected. The PR diff touches zero files inpackages/cli/src/acp-integration/*orpackages/cli/src/commands/acp.ts. Concretely:qwen --acpdirect ACP clients (Zed / Cursor / custom stdio client)newSession({cwd, mcpServers})still accepts a per-sessioncwd. Multi-cwd sessions on one ACP child are still supported at the protocol layer, subject to the pre-existingloadSettings(cwd)cross-workspace interaction noted above (not introduced by this PR).POST /sessionwith a mismatchedcwdreturns400 workspace_mismatchcarrying bothboundWorkspaceandrequestedWorkspaceso an orchestrator-aware client can route correctly.DaemonClient(TypeScript SDK)DaemonCapabilities.workspaceCwd: stringadded;createOrAttachSessionaccepts the bound path or none (route falls back). Cross-workspace POST surfaces asDaemonHttpErrorwithbody.code === 'workspace_mismatch'.qwen --acpchild viaconnection.newSession()— same process, same OAuth state, same file/hierarchy caches.Note that pre-PR Stage 1 daemon also never multiplexed cross-cwd sessions on a single child — it spawned a separate child per workspace precisely because of the
loadSettings(cwd)issue. The §02 refactor collapses the now-redundant per-workspace routing layer; the per-cwd isolation goal is preserved by running separate daemon processes externally.What changed
Bridge state — maps collapse to single optional slots (
httpAcpBridge.ts):byWorkspaceChannel: Map<string, ChannelInfo>channelInfo?: ChannelInfo+aliveChannels: Set<ChannelInfo>inFlightChannelSpawns: Map<string, Promise>inFlightChannelSpawn?: PromisebyWorkspace: Map<string, SessionEntry>defaultEntry?: SessionEntryliveChannels: Set<ChannelInfo>(BkUyD invariant)aliveChannels— entries removed only when each channel'schannel.exitedfires, sokillAllSyncfinds both dying and fresh attach-target during akillSession-then-spawnOrAttachoverlapA
ChannelInfo.isDying: booleanflag separates "channel is being torn down" from "channel is gone."ensureChannelskips dying channels and spawns a fresh one;killAllSync/shutdowniteratealiveChannelsso neither the dying nor the freshly attached channel is missed.API:
BridgeOptions.boundWorkspace: stringis now required.WorkspaceMismatchErroris thrown fromspawnOrAttachwhen the request's canonical cwd doesn't match the bound path. The route layer translates to400 workspace_mismatchwith both paths in the body.CapabilitiesEnvelope.workspaceCwd: stringsurfaces the bound path so clients pre-flight check + omitcwdfromPOST /session(the route falls back to the bound workspace).--workspace <path>CLI flag overridesprocess.cwd()at boot. The flag is validated at boot — absolute path required, must exist, must be a directory (otherwise the agent would fail at child-spawn time with an opaque ENOENT).canonicalizeWorkspaceis exported and called once each byrunQwenServe,createServeApp, and the bridge — so/capabilities.workspaceCwd, thePOST /sessionfallback, and the bridge's mismatch check all agree on a single canonical form (no symlink / case-insensitive-FS drift between what the daemon advertises and what it enforces).SDK (
packages/sdk-typescript):DaemonCapabilitiesnow includesworkspaceCwd: string, mirroring the wire shape. Without this, TypeScript consumers copying the new quickstart snippet would hitTS2339.Tests:
makeBridge()helper inhttpAcpBridge.test.tsdefaultsboundWorkspace: WS_A; tests that need a different bind (the mismatch test) pass it explicitly.does NOT reuse across workspaces→ rewritten asrejects cross-workspace requests with WorkspaceMismatchError.shutdown kills every live channelretargeted to single-channel multi-session shutdown.killAllSync force-kills channels...(BkUyD)retargeted to single-channel — the invariant is the same (channel reference must outlive eager shutdown clearing), the surface is just smaller. A separatekillAllSync force-kills BOTH the dying channel AND the fresh attach-targettest pins the overlap-race scenario.killSession marks the channel dying so concurrent spawnOrAttach gets a fresh channelanddoSpawn newSession-failure marks the empty channel dying so the next spawn gets a fresh onepin theisDyingflag's two set-sites.listWorkspaceSessionscross-workspace assertion now expects empty for the un-bound path.--max-sessionscap test uses two thread-scope sessions onWS_Ainstead ofWS_A+WS_B.server.test.tsgainsrunQwenServeE2E tests:--workspaceflows through to/capabilities, rejects non-existent / non-directory / relative workspace paths at boot.integration-tests/cli/qwen-serve-routes.test.tsdaemon is now launched with--workspace REPO_ROOTfor deterministic binding; three new integration tests pin theworkspace_mismatchbody shape, cwd-omission fallback, and/capabilities.workspaceCwdover a real daemon process.Docs:
docs/users/qwen-serve.md— adds the--workspaceflag row to the CLI table, rewrites "Multi-session & remote deployment" as "Multi-session & multi-workspace deployment" with the multiple-daemon-processes pattern, updates the Quickstart to omitcwdfromPOST /session.docs/developers/qwen-serve-protocol.md— addsWorkspaceMismatchErrorto the common error shapes,workspaceCwdto the/capabilitiesenvelope, flipsPOST /session'scwdfield from required to optional.docs/developers/examples/daemon-client-quickstart.md— example now readscaps.workspaceCwdand passes it back asworkspaceCwdon session creation. New "Workspace mismatch" section shows theDaemonHttpError/body.code === 'workspace_mismatch'branching pattern.Net LOC: +1121 / -374 across 13 files. Bridge implementation shrinks ~150 LOC after removing the routing-map machinery; tests + docs + SDK type update account for the rest.
Test plan
vitest run packages/cli/src/serve/httpAcpBridge.test.ts— 73/73 passingvitest run packages/cli/src/serve/server.test.ts— 78/78 passingvitest run packages/sdk-typescript— 256/256 passingnpx tsc --noEmit -p packages/cli/tsconfig.json— cleannpx tsc --noEmit -p packages/sdk-typescript/tsconfig.json— cleannpx tsc --noEmit -p integration-tests/tsconfig.json— clean forqwen-serve-*(pre-existing errors in other integration tests unrelated to this PR)vitest runon packages/cli — only failure is an unrelateddoctorChecks.test.ts > should pass Node.js version check for v22+(env runs Node 20)qwen serve --workspace /tmp/qwen-serve-e2e-ws+ curl POST /session with matching cwd → 200; with mismatched cwd → 400workspace_mismatchwith both paths in body (verified locally — see "Local E2E verification" below)/capabilitiesreturnsworkspaceCwd: "/tmp/qwen-serve-e2e-ws"; client can omitcwdonPOST /session(route falls back, response carries bound path)qwen --acpchild was reaped, not orphaned)Local E2E verification (12 scenarios)
Built the cli (
cd packages/cli && npm run build) and ran against the resultingdist/index.jsbinary. Highlights:Boot-time
--workspacevalidation:Symlink canonicalization (the round-1 fix). With
/tmp/qwen-symlink-ws → /tmp/qwen-real-target, bootingqwen serve --workspace /tmp/qwen-symlink-ws:BkUyD double-SIGINT:
Breaking changes
BridgeOptions.boundWorkspacebecomes required. External consumers ofcreateHttpAcpBridgedirectly (rather thanrunQwenServe/createServeApp) must now pass it. None of the merged code paths in this repo lack it; the change is a TypeScript-level compile error if you do hit it.POST /sessionwith a cross-workspacecwdnow returns400 workspace_mismatchinstead of silently spawning a newqwen --acpchild in that other workspace. Clients that pointed one daemon at multiple project directories need to start one daemon per directory and route accordingly.--http-bridgeis unchanged; there was no shipped--multi-workspaceopt-in to remove (the design doc's deprecated terms —Workspace Bridge/ChannelInfo/byWorkspaceChannel— were internal implementation details, not user-facing flags).Direct ACP clients (
qwen --acpover stdio) are NOT affected — see "Scope: what this PR does NOT change" above.Follow-ups (not in this PR)
Per the deprecated-terms list in §02 doc and the chiga0 / tanzhenxin review backlog:
@qwen-code/acp-bridgepackage extraction (chiga0 must-have pre-release: fix ci #1) — lifts the channel + transport seams out of the daemon so the agent's TUI in-process variant + the daemon's HTTP variant share one primitive.sessionScopeoverride onPOST /session(chiga0 must-have Where is the config saved? #2) — daemon-wide setting is too coarse for IDE extensions that want a private session per editor window.loadSession/unstable_resumeSessionover HTTP (chiga0 must-have 如何自定义密钥文件 .env可能与其他文件冲突 #3).简体中文版本(点击展开)
摘要
Stage 1(PR #3889,已合入)提交的
qwen serve采用 M-workspaces-per-daemon 路由 —— 一个 daemon 进程按请求的cwd为 key 服务任意多个 workspace。§02 架构修订(详见docs/comparison/qwen-code-daemon-design/02-architectural-decisions.md)将其收窄为 1 daemon = 1 workspace × N sessions:每个 daemon 在启动时绑定一个 canonical workspace 路径,POST /session携带不匹配的cwd时返回400 workspace_mismatch。多 workspace 部署改为多个 daemon 进程(每个 workspace 一个,由 systemd / docker-compose / k8s /qwen-coordinator参考编排器管理)。Closes #3803 §02。
为什么现在做这个改动
Stage 1 是有意做小范围,但路由层有隐性耦合,到 Stage 2(in-process)和 Stage 4+(sandboxed)会更糟:
acpAgent.ts:601每次newSession都会this.settings = loadSettings(cwd)—— 同进程跨 workspace 的 session 会互踩 settings。让 daemon 一个 workspace 也能给运维一个干净的故障隔离故事,跟 systemd / k8s 的预期一致:一个 workspace wedge 不会拖垮其他的。
范围说明:这个 PR 不改的部分
只有 HTTP daemon(
qwen serve)受影响。PR diff 在packages/cli/src/acp-integration/*或packages/cli/src/commands/acp.ts一个文件都没动。具体:qwen --acp直连 ACP 客户端(Zed / Cursor / 自写 stdio 客户端)newSession({cwd, mcpServers})仍然接受每 session 的cwd。一个 ACP child 多个不同 cwd 的 session 在协议层仍然支持(前置限制:loadSettings(cwd)跨 workspace 互踩,这个不是本 PR 引入的)。POST /session带不匹配cwd返回400 workspace_mismatch,body 携带boundWorkspace+requestedWorkspace,orchestrator 客户端可以据此路由。DaemonClient(TypeScript SDK)DaemonCapabilities.workspaceCwd: string;createOrAttachSession接受绑定路径或省略(route fallback)。跨 workspace POST 通过DaemonHttpError抛出,body.code === 'workspace_mismatch'。connection.newSession()仍然共享一个qwen --acpchild —— 同一进程、同一 OAuth、同一文件 / 层级缓存。注意 PR 前 Stage 1 daemon 本来就 不在一个 child 上多路复用跨 cwd session —— 它对每个 workspace 单独 spawn 一个 child,恰恰是因为上面的
loadSettings(cwd)问题。§02 把这个早已冗余的 per-workspace 路由层删掉;per-cwd 隔离的目标改由"在外部跑多个 daemon 进程"来达成。改动内容
Bridge 状态 —— 多个 Map 收敛为单 optional slot(
httpAcpBridge.ts):byWorkspaceChannel: Map<string, ChannelInfo>channelInfo?: ChannelInfo+aliveChannels: Set<ChannelInfo>inFlightChannelSpawns: Map<string, Promise>inFlightChannelSpawn?: PromisebyWorkspace: Map<string, SessionEntry>defaultEntry?: SessionEntryliveChannels: Set<ChannelInfo>(BkUyD 不变式)aliveChannels—— 仅在每个 channel 的channel.exited触发时移除,所以killAllSync能同时找到killSession-then-spawnOrAttach重叠期间的 dying channel 和 fresh attach-targetChannelInfo.isDying: booleanflag 区分"channel 正在 tear down"和"channel 已经消失"。ensureChannel跳过 dying channel 并 spawn 一个新的;killAllSync/shutdown遍历aliveChannels,保证 dying 和新挂载的 channel 都不会漏掉。API:
BridgeOptions.boundWorkspace: string现在 required。WorkspaceMismatchError由spawnOrAttach抛出,当请求 cwd canonicalize 后与 bound 路径不匹配时。Route 层转换为400 workspace_mismatch,body 携带两条路径。CapabilitiesEnvelope.workspaceCwd: string暴露 bound 路径,客户端可以预检 + 在POST /session中省略cwd(route 会 fallback 到 bound workspace)。--workspace <path>CLI flag 覆盖 boot 时的process.cwd()。Flag 在 boot 时校验 —— 必须 absolute、必须存在、必须是 directory(否则 agent 在 child spawn 时会得到一个让人摸不着头脑的 ENOENT)。canonicalizeWorkspace导出,由runQwenServe、createServeApp和 bridge 各调用一次 ——/capabilities.workspaceCwd、POST /sessionfallback、bridge 的 mismatch 检查全部归约到同一 canonical form(不会因 symlink / 大小写不敏感 FS 在 daemon 对外宣告值和内部判断值之间产生 drift)。SDK(
packages/sdk-typescript):DaemonCapabilities增加workspaceCwd: string,与 wire 形状对齐。否则 TypeScript 消费者照搬新 quickstart 代码会撞TS2339。测试:
httpAcpBridge.test.ts新增makeBridge()helper 默认boundWorkspace: WS_A;需要其他绑定路径的测试(mismatch test)显式传。does NOT reuse across workspaces→ 改写为rejects cross-workspace requests with WorkspaceMismatchError。shutdown kills every live channel重定向到单 channel 多 session shutdown。killAllSync force-kills channels...(BkUyD)重定向到单 channel —— 不变式相同(channel 引用必须 outlive eager shutdown clearing),surface area 缩小。另有独立的killAllSync force-kills BOTH the dying channel AND the fresh attach-target测试覆盖 overlap-race 场景。killSession marks the channel dying so concurrent spawnOrAttach gets a fresh channel和doSpawn newSession-failure marks the empty channel dying so the next spawn gets a fresh one钉住isDyingflag 的两个 set 点。listWorkspaceSessions跨 workspace 断言现在期望未 bound 路径返回空。--max-sessionscap 测试改用两个 thread-scope session onWS_A,而不是WS_A+WS_B。server.test.ts增加runQwenServeE2E 测试:--workspace流经到/capabilities、boot 拒绝不存在 / 非目录 / 相对 workspace 路径。integration-tests/cli/qwen-serve-routes.test.tsdaemon 现在启动时带--workspace REPO_ROOT以获得确定性绑定;三个新 integration test 在真实 daemon 进程上覆盖workspace_mismatchbody 形状、cwd 省略 fallback、/capabilities.workspaceCwd。文档:
docs/users/qwen-serve.md—— CLI table 增加--workspaceflag 一行,将"Multi-session & remote deployment"改写为"Multi-session & multi-workspace deployment"并描述多 daemon 进程模式,Quickstart 改为在POST /session省略cwd。docs/developers/qwen-serve-protocol.md—— 在 common error shapes 增加WorkspaceMismatchError,在/capabilitiesenvelope 增加workspaceCwd,将POST /session的cwd字段从必填改为可选。docs/developers/examples/daemon-client-quickstart.md—— 示例改为读caps.workspaceCwd后传回workspaceCwd作为 session 创建参数。新增 "Workspace mismatch" 段落展示DaemonHttpError/body.code === 'workspace_mismatch'分支处理。净 LOC:13 个文件 +1121 / -374。Bridge 实现移除约 150 LOC(路由 map 机制),其余分布在测试 + 文档 + SDK 类型更新。
测试计划
vitest run packages/cli/src/serve/httpAcpBridge.test.ts—— 73/73 通过vitest run packages/cli/src/serve/server.test.ts—— 78/78 通过vitest run packages/sdk-typescript—— 256/256 通过npx tsc --noEmit -p packages/cli/tsconfig.json—— cleannpx tsc --noEmit -p packages/sdk-typescript/tsconfig.json—— cleannpx tsc --noEmit -p integration-tests/tsconfig.json——qwen-serve-*clean(其他 integration test 的预先错误与本 PR 无关)vitest run—— 唯一失败是无关的doctorChecks.test.ts > should pass Node.js version check for v22+(dev box 跑 Node 20)qwen serve --workspace /tmp/wsA+ curl POST /session with matching cwd → 200;with mismatched cwd → 400workspace_mismatchbody 带两条路径/capabilities返回workspaceCwd: "/tmp/wsA";客户端可以在POST /session省略cwd破坏性变更
BridgeOptions.boundWorkspace现在 required。createHttpAcpBridge的外部调用方(即不经runQwenServe/createServeApp的)必须显式传。仓库内已合入的代码都符合;如果你的代码不符合,会在 TypeScript 层得到 compile error。POST /session带跨 workspacecwd现在返回400 workspace_mismatch,而不是默默 spawn 一个qwen --acpchild 服务那个 workspace。把一个 daemon 指向多个项目目录的客户端需要改为每个目录起一个 daemon 并相应路由。--http-bridge不变;没有要移除的--multi-workspace用户级 flag(设计文档里的废弃术语 ——Workspace Bridge/ChannelInfo/byWorkspaceChannel—— 都是内部实现细节,不是用户可见 flag)。直连 ACP 客户端(
qwen --acpover stdio)不受影响 —— 详见上方"范围说明:这个 PR 不改的部分"。后续工作(不在本 PR 中)
按 §02 文档的废弃术语列表和 chiga0 / tanzhenxin review backlog:
@qwen-code/acp-bridge包提取(chiga0 must-have pre-release: fix ci #1)—— 把 channel + transport 抽象从 daemon 里 lift 出来,让 agent TUI in-process 变体和 daemon HTTP 变体共享一个原语。POST /session上的 per-requestsessionScope覆盖(chiga0 must-have Where is the config saved? #2)—— daemon 级设置对希望"每窗口一个私有 session"的 IDE 扩展太粗。loadSession/unstable_resumeSessionHTTP 暴露(chiga0 must-have 如何自定义密钥文件 .env可能与其他文件冲突 #3)。