fix(core): retry API request on model-unloaded errors for local model servers#8
Open
B-A-M-N wants to merge 328 commits into
Open
fix(core): retry API request on model-unloaded errors for local model servers#8B-A-M-N wants to merge 328 commits into
B-A-M-N wants to merge 328 commits into
Conversation
There was a problem hiding this comment.
1 issue found across 31 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/core/openaiContentGenerator/pipeline.ts">
<violation number="1" location="packages/core/src/core/openaiContentGenerator/pipeline.ts:565">
P2: Streaming retry result is not wrapped with `wrapStreamWithRetry`. When a streaming request initially fails at creation time with a model-unloaded error and the retry succeeds, the returned generator bypasses the `wrapStreamWithRetry` wrapping that the normal path applies. Iteration-time errors in the retry stream will propagate unhandled.</violation>
</file>
Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/core/openaiContentGenerator/pipeline.ts">
<violation number="1" location="packages/core/src/core/openaiContentGenerator/pipeline.ts:604">
P1: Buffering all chunks until stream completion defeats the purpose of streaming. Every streaming response is now fully collected before any data reaches the consumer, so users won't see incremental text. This is the equivalent of a non-streaming request from a UX perspective.
Consider yielding chunks immediately and only applying buffering (or a different retry strategy) when the target is known to be a local/JIT model server, or accept that in the rare retry case some chunks may have already been yielded and handle it at a higher level.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
ccc8eb3 to
5ad4d20
Compare
…nLM#4269) * refactor(serve/fs): glob audit hashes workspace + emits pattern Closes PR QwenLM#4250 follow-up #4. Hashing the per-call cwd for glob audit produced a different pathHash for every subdirectory glob without giving operators any actionable difference (raw paths are privacy-gated). Replace the hash basis with the bound workspace itself and surface the literal pattern on a new schema field, so every glob row carries a stable workspace marker and a per-call pattern. The pattern field also fires on parse_error denials (path-escape patterns, non-relative patterns) so audit consumers debugging a production glob rejection can see the exact rejected pattern without needing QWEN_AUDIT_RAW_PATHS=1. * feat(serve): safe workspace file read routes (QwenLM#4175 PR 19) Add four read-only HTTP routes that consume PR 18's per-request WorkspaceFileSystem boundary: - GET /file?path=... text content + meta (encoding/BOM/lineEnding) - GET /list?path=... directory entries (name/kind/ignored) - GET /glob?pattern=... workspace-relative match paths - GET /stat?path=... file/directory metadata The routes share one error envelope (sendFsError) that maps FsError.kind through the boundary's existing DEFAULT_STATUS_BY_KIND table to a typed JSON response. All four 200 responses set Cache-Control: no-store and X-Content-Type-Options: nosniff so a browser-adjacent client cannot cache or sniff source content. Routes are advertised under a single workspace_file_read capability tag — the four endpoints share the same backing boundary and the same failure shape, so per-route tags would force four simultaneous registry entries with no operator-meaningful difference between them. Mutating routes will ship in PR 20 under their own workspace_file_write tag. Trust gate is unchanged: read intents pass on untrusted workspaces per PR 18's policy.ts. Auth follows the global bearer flow only; read routes never run mutate(), since none of them mutate state. * feat(serve): runQwenServe injects fsFactory + emit pipeline Closes PR QwenLM#4250 follow-up #2. runQwenServe now constructs a WorkspaceFileSystem factory from the bound workspace, threads its emit hook through to the read routes, and exposes the trust snapshot via deps.trustedWorkspace. Test additions pin the wiring contract: - audit events emitted on success / denial flow back through the test-supplied fsAuditEmit hook - deps.fsFactory override is honored (built-in default does not silently shadow injection) - trust snapshot defaults to true (operator-chosen workspace) - trust=false routes through to the boundary and trips untrusted_workspace on write intents Default emit stays a stderr warning so a wiring regression that drops events remains visible. PR 21's SSE fan-out will replace the default with a workspace-scoped event channel. * fixup(serve): address PR QwenLM#4269 round-1 review feedback Closes 8 findings from Copilot inline review + Codex review on PR QwenLM#4269 (5 P0, 3 P1): P0 (correctness / privacy / operations) - runQwenServe.ts: throttle the default fsAuditEmit by reusing the exported `createDefaultFsAuditEmit` from server.ts. The earlier per-event `writeStderrLine` would print one line for every /file/list/glob/stat audit event under normal traffic. Now warns once + every 100th drop with payload context, so a wiring regression is still visible without flooding logs. (Copilot runQwenServe.ts:316; Codex runQwenServe.ts:305) - routes/workspaceFileRead.ts: probe glob with maxResults+1 and trim, so `truncated` reflects whether the boundary actually had more matches. Earlier `length === maxResults` heuristic false-positived when the workspace happened to hold exactly N matches. (Copilot workspaceFileRead.ts:399) - routes/workspaceFileRead.ts: glob `relMatches` now flows through the shared `workspaceRelative` helper. Root match (`pattern=.`) renders as "." rather than the empty string `path.relative` returns; helper also covers the boundWorkspace-undefined edge case so the route no longer carries its own fallback branch. (Copilot workspaceFileRead.ts:388; review summary HIGH-1) - fs/audit.ts: `pattern` field now rides on the same privacy gate as `relPath` / `message`. Glob patterns commonly carry workspace-relative or absolute path fragments (`src/secrets/*.env`, rejected `/Users/alice/ws/**`), so emitting them in privacy mode bypassed the same redaction the other path-bearing fields honor. Operators wanting full forensic context opt in via QWEN_AUDIT_RAW_PATHS=1. (Codex audit.ts:249) - routes/workspaceFileRead.ts: cwd resolves with intent='list' rather than 'glob'. The orchestrator's `recordAndWrap` auto-derives `data.pattern` from `intent === 'glob'`, which turned cwd-resolution failures into rows where the cwd string masqueraded as the glob pattern (`?cwd=../outside` → `pattern: ../outside` in audit). Switching to 'list' is the correct semantic shape (cwd is a directory we intend to walk) with identical trust + path-resolution behavior. (Codex workspaceFileSystem.ts:941) P1 (cosmetic / comment accuracy) - server.test.ts: `honors deps.fsFactory override` test comment rewritten to match the actual failure mode (a regression would 404 on a.txt, not 200 against package.json). (Copilot server.test.ts:3219) - routes/workspaceFileRead.ts: `limit` error message uses the MAX_LIST_ENTRIES constant instead of the literal 2000. (review summary MEDIUM) - fs/audit.ts: expanded the JSDoc explaining why the AuditPublisher request types Omit four fields and pass `pattern` through. (review summary MEDIUM) Test additions / adjustments - audit.test.ts: split the existing pattern tests into raw-paths and privacy-default cases; added two new privacy-mode assertions that strip pattern under default config. - workspaceFileSystem.test.ts: harness accepts `includeRawPaths`; glob audit suite runs with raw paths to observe `pattern`; new `glob audit privacy default` suite asserts pattern + relPath are stripped without the env opt-in. - workspaceFileRead.test.ts: new GET /glob cases for the truncated edge case (count == maxResults → false; count > maxResults → true) and root-match normalization. Not adopted (with rationale) - review summary HIGH-2 (glob pathHash uses boundWorkspace): this is the deliberate follow-up #4 contract from PR 18; pattern is the per-call signal, pathHash is the workspace marker. - review summary MEDIUM-1 (parseIntInRange three-state return): matches `parseMaxQueuedQuery` in server.ts; consistency wins. - review summary LOW-1/2/3 (capabilities comment length, CSP header, reverse truncated:false assertion): rationale already documented in code, CSP belongs in a hardening PR, the reverse assertion already exists. 518/518 serve tests pass; typecheck + eslint clean within src/serve/. * fix(serve): address workspace file read review Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): tighten workspace file read review followups Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* feat(serve): add /demo debug page for qwen serve daemon Add a self-contained HTML debug page at GET /demo that provides a browser-based UI for exercising all daemon routes: session create/attach, prompt send/cancel, SSE event streaming, model switching, permission voting, and health/capabilities checks. Also add a same-origin exemption middleware (before the CORS deny layer) so browser fetch calls from the demo page pass through while external Origins remain blocked. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(serve): address CR feedback for /demo page security and robustness - Fix XSS: build permission buttons with DOM APIs instead of innerHTML - Fix SSE: move currentEvent outside read loop for cross-chunk frames - Fix SSE: handle stream end (flush trailing buffer, update UI status) - Security: move /demo route behind hostAllowlist and bearerAuth guards - Security: add host.docker.internal to same-origin Origin allowlist - Add Auth Token input and include Authorization header in API/SSE calls - Add try/catch to /demo route handler with writeStderrLine logging - Check API result before removing permission card from UI - Add 7 tests for /demo route and Origin-stripping middleware * fix(serve): move /demo before bearerAuth so browsers can reach it Browsers cannot attach Authorization headers on address-bar navigation, so /demo behind bearerAuth was unreachable when --token was set. Move the /demo route after CORS + Host allowlist but before bearerAuth. The static HTML shell contains no secrets; all API/SSE routes remain bearer-protected and the in-page token input authenticates them. * feat(serve): show 401 token hint on demo page When an API call returns 401 Unauthorized, highlight the Auth Token input field with a yellow border and display a hint message guiding the user to enter their bearer token. Applies to both API calls and SSE connections. The hint auto-dismisses after 6 seconds. * fix(serve): address round-2 CR feedback for /demo security - Loopback-gate /demo like /health: pre-auth on loopback, post-auth on non-loopback. Prevents unauthenticated access on public interfaces. - Add X-Frame-Options: DENY + CSP frame-ancestors to /demo response to prevent clickjacking via cross-origin iframe embedding. - Cache selfOrigins Set in Origin-stripping middleware (rebuild only when port changes) instead of allocating per-request. - Clear pendingPerms + reset currentAssistantBubble on session switch to prevent stale permission cards from the previous session. - Update tests: loopback vs non-loopback /demo auth, anti-clickjacking headers, rename CORS test, add localhost and [::1] origin coverage. * fix(serve/demo): address CR round-2 feedback for demo page UX - Replace innerHTML with DOM APIs in addLog() to prevent XSS - Add MAX_LOG_ENTRIES=500 pruning to prevent unbounded DOM growth - Add concurrent-send guard (promptInFlight) to prevent double submits - Show error feedback in Chat tab when sendPrompt or createSession fails - Disable prompt controls on SSE failure (both catch and non-OK paths) - Add toolCall context detail (command/input/path/diff) to permission cards --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…responses (QwenLM#4279) `workspaceRelative` returned the platform-native separator from `path.relative`, leaking backslashes into `/file`, `/stat`, `/list`, and `/glob` response paths on Windows. Surfaced as a Windows-only CI failure in the `GET /glob > scopes glob matches to cwd` test (`['sub\\inside.ts']` vs expected `['sub/inside.ts']`). Always emit POSIX-style separators so SDK consumers see the same shape across platforms. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
PRs QwenLM#4249 (workspace memory + agents CRUD) and QwenLM#4269 (workspace file read routes) added `workspace_memory`, `workspace_agents`, and `workspace_file_read` to `SERVE_CAPABILITY_REGISTRY` and updated the unit-level `EXPECTED_STAGE1_FEATURES` in `packages/cli/src/serve/server.test.ts`, but missed the matching integration-test expectation. The E2E `qwen serve — capabilities envelope > advertises all baseline capabilities` assertion has been failing on `main` since those PRs landed. Append the three tags in the same positions as `SERVE_CAPABILITY_REGISTRY` and the unit-level constant (`workspace_memory` + `workspace_agents` after `workspace_providers`, `workspace_file_read` after `mcp_guardrails`). No production code changes — same shape as QwenLM#4268.
…M#4255) * feat(serve): auth device-flow route Implements issue #4175 Wave 4 PR 21. Brokers OAuth 2.0 Device Authorization Grant (RFC 8628) through the `qwen serve` daemon so a remote SDK client can trigger a Qwen-account login whose tokens land on the **daemon** filesystem, not on the client. The daemon polls the IdP itself; the client's only job is to display the verification URL + user code. Runtime locality (#4175 §11): the daemon NEVER spawns a browser or calls `open(url)` — even when running locally. Static-source grep test fails the build on `node:child_process` / `open` / `xdg-open` / `shell.openExternal` / `execa` / `shelljs` / `process.spawn` and their dynamic-import / require variants. - `POST /workspace/auth/device-flow` — strict mutation gate; returns 201 fresh / 200 idempotent take-over with `attached: true`. Per per-`providerId` singleton: a second POST while pending takes over rather than allocating a new `device_code`. - `GET /workspace/auth/device-flow/:id` — public state read. Pending entries echo `userCode/verificationUri/expiresAt/intervalMs`; terminal entries (5-min grace) drop them and surface `status/errorKind/hint`. - `DELETE /workspace/auth/device-flow/:id` — strict; idempotent (terminal → 204 no-op; unknown → 404). - `GET /workspace/auth/status` — pending flows + supported providers snapshot. v1 stub for `providers: []` (populated in fold-in 1). `DeviceFlowRegistry` (`packages/cli/src/serve/auth/deviceFlow.ts`) is the in-memory state holder: - per-`providerId` singleton with idempotent take-over - workspace-wide cap of 4 active flows (abuse defense) - 5-min terminal grace so SDK reconnects can still observe results - TTL sweeper evicts grace-expired entries every 30s - in-flight `Promise` map coalesces concurrent `start()` calls so two parallel POSTs don't double-allocate IdP `device_code` - `transitionTerminal` returns `boolean` so caller-side emit/audit guard prevents sweeper × poll-tick double-fire - `dispose()` wired into `runQwenServe.close()`'s shutdown drain; cancels `provider.poll()` mid-flight via `cancelController`, records `lost_success` audit when an IdP-minted token is dropped by transition `DeviceFlowProvider` interface accepts `start({signal})` + `poll(state, {signal})`. `QwenOAuthDeviceFlowProvider` wraps the existing `QwenOAuth2Client.requestDeviceAuthorization` / `pollDeviceToken` primitives directly (NOT `authWithQwenDeviceFlow`, which calls `open(url)`). PKCE is provider-required by Qwen but optional in the interface for future non-PKCE providers. `success.persist()` writes to disk FIRST, then updates the in-process client — a failed disk write no longer leaves the daemon with a zombie in-memory token. Maps RFC 8628 errors via an anchored regex (`^Device token poll failed: (expired_token|access_denied|invalid_grant)`) so an `error_description` containing one of those literals can't mis-classify an unrelated upstream error. `BrandedSecret<T extends string>` holds the `device_code` and PKCE verifier. Earlier draft used `new String()` wrapper which leaked through `+` / template literals (`Symbol.toPrimitive` → `valueOf` returned the primitive). Final shape: frozen plain object + `WeakMap` indirection + 4-way redaction (`toString` / `toJSON` / `Symbol.toPrimitive` / numeric coercion → `'[redacted]'` or `NaN`) + `unique symbol` brand. 6 leak-path tests: `JSON.stringify` / `String()` / concat / template / `+x` / reveal-roundtrip. 5 new daemon events (workspace-scoped, fanned out to every active session bus via `bridge.broadcastWorkspaceEvent`): - `auth_device_flow_started` — `{deviceFlowId, providerId, expiresAt}` (no userCode/verificationUri — see PR 21 design §3) - `auth_device_flow_throttled` — `{deviceFlowId, intervalMs}`, emitted only on upstream `slow_down` interval bumps - `auth_device_flow_authorized` — `{deviceFlowId, providerId, expiresAt?, accountAlias?}`; `accountAlias` is best-effort non-PII (never email/phone) - `auth_device_flow_failed` — `{deviceFlowId, errorKind, hint?}` with `errorKind ∈ {expired_token, access_denied, invalid_grant, upstream_error, persist_failed}` - `auth_device_flow_cancelled` — `{deviceFlowId}` (DELETE on pending) Workspace-scoped reducer `reduceDaemonAuthEvent` produces `DaemonAuthState { flows: Partial<Record<ProviderId, ...>> }` — parallel to `reduceDaemonSessionEvent`. Session reducer no-ops on auth events (workspace-scoped state belongs in its own reducer). `bridge.broadcastWorkspaceEvent` is intentionally distinct from PR 16's `publishWorkspaceEvent` to avoid merge conflict; collapses to the shared helper as a fold-in once #4249 lands (~25 LoC). `@qwen-code/sdk` (`packages/sdk-typescript/`): - 4 new `DaemonClient` methods: `startDeviceFlow`, `getDeviceFlow`, `cancelDeviceFlow`, `getAuthStatus` — typed against the wire shapes, errors mapped through the existing `DaemonHttpError`. - High-level `client.auth` getter (lazy `DaemonAuthFlow` singleton) exposes a `start(...).awaitCompletion()` shape mirroring `gh auth login`'s UX: print code first, let the SDK consumer decide where to open the browser. `awaitCompletion` polls GET on the daemon-supplied `intervalMs`, honors `slow_down` bumps, and fall-back-recovers from 404 (entry evicted post-grace). POST + DELETE flow through PR 15's `mutate({strict: true})` — 401 `token_required` on token-less loopback defaults. GET routes use only the global `bearerAuth`. Every state transition (`started/authorized/failed/cancelled/expired/lost_success`) records a structured stderr breadcrumb (`[serve] auth.device-flow: provider=... deviceFlowId=abc12... clientId=... status=...`) since `mutate()` doesn't carry an audit hook — events alone aren't enough since SDK can silently drop them; stderr → journald/docker logs is the unfalsifiable record. `auth_device_flow` advertised unconditionally on `/capabilities.features`. Supported providers list lives on `/workspace/auth/status` to keep the registry descriptor uniform. - `packages/core/src/qwen/qwenOAuth2.ts`: - exports `cacheQwenCredentials` (was a private function; needed by the daemon's device-flow registry) - `cacheQwenCredentials` now calls `SharedTokenManager.clearCache()` after writing, folding what was previously a paired call site at L820+L829. Idempotent change. - file mode `0o600` on `oauth_creds.json` (was default 0o666 + umask). Mirrors opencode's `auth/index.ts`. - `packages/cli/src/serve/runQwenServe.ts`: device-flow registry `dispose()` wired into the shutdown drain (BEFORE `bridge.shutdown()`). - `auth/deviceFlow.test.ts` — 21 tests: BrandedSecret leak paths, state machine (slow_down / success / error), terminal grace, concurrent-start coalescing, dispose, cancel idempotency, static- source grep against browser-spawn primitives. - `server.test.ts` — 10 device-flow integration tests: POST 201/200 take-over, strict 401, 400 `unsupported_provider`, GET / DELETE / `/workspace/auth/status`, 502 `upstream_error` mapping, sweeper-driven auto-expiry with controlled clock, capability advertisement. - `daemonEvents.test.ts` — 5 SDK reducer tests: type guards, per- provider state projection, `failed` always → `status: 'error'` (errorKind carries the kind, including new `persist_failed`), session reducer no-ops on auth events. 369/369 serve + SDK tests pass; typecheck + `eslint --max-warnings 0` clean across 14 PR 21 files. - [x] Independently mergeable (depends only on merged PR 4 / PR 7 / PR 12 / PR 15) - [x] Backward compatible (4 new routes + 1 capability tag + 5 typed events + 4 SDK helpers; existing routes/events untouched) - [x] Default off (capability advertised but no client is forced to use it; CLI `qwen` OAuth flow unchanged) - [x] `qwen serve` Stage 1 routes / SDK behavior preserved - [x] Gradual migration (v1 only `qwen-oauth`; future providers register through the `DeviceFlowProvider` interface) - [x] Reversible (revert removes 4 routes + 1 tag + 5 events with no schema migration) - [x] Tests-first (28 new tests across 3 layers) - Inline `bridge.broadcastWorkspaceEvent` → fold-in to PR 16 (#4249) `publishWorkspaceEvent` once that lands - `/workspace/auth/status` vs PR 12 `/workspace/providers` boundary — separate route in v1; merge alternative discussed - Wave 4 PRs 17/19/20 should adopt the same mutate-strict + workspace event-fan-out pattern 5 items from pre-PR specialist passes parked for a focused follow-up: `DeviceFlowEntry` discriminated union, single-source SDK status / ProviderId unions, `awaitCompletion` memoization, broadcast-100%-fail stderr elevation, SDK 404 → `not_found_or_evicted` errorKind. Refs: #4175 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 round-1 review feedback Eleven items from copilot-pull-request-reviewer's round-1 pass on #4255 — 4 inline threads + 7 from the PR-level review summary. ## Adopted (11 items, code/doc changes) - **`lastSeenAt` → `lastSeenEventId`** (`events.ts`, `DaemonDeviceFlowReducerState`). The field was set from `rawEvent.id` (SSE event id) but documented as "epoch ms" — a real semantic mismatch that would mislead consumers into time-based logic against a monotonic counter. Rename + tighten the JSDoc to describe it as an event-id counter; reducer cases updated. - **`DEVICE_FLOW_EXPIRY_GRACE_MS = 30_000` extracted** in `DaemonAuthFlow.ts` (was a magic number on `start.expiresAt + 30_000`). `AwaitCompletionOptions.timeoutMs` doc now describes the actual grace-past-expiry behavior + the rationale (clock skew + daemon sweeper interval + network latency) instead of the wrong "defaults to expiresAt - Date.now()" claim. - **Explicit `chmod 0o600`** in `cacheQwenCredentials` after every write. `fs.writeFile`'s `mode` only applies on file creation; a pre-existing `oauth_creds.json` written under a broader umask kept its old permissions across upgrades. The chmod now tightens it on every write; chmod failure (Windows / hardened FS) surfaces via `debugLogger.warn` instead of silently dropping the invariant. - **`SharedTokenManager.clearCache()` failure now logs** `debugLogger.warn` (was a silent `try { } catch { }`). In production a swallowed clearCache means in-process callers serve stale credentials until the SharedTokenManager mtime watcher catches up — a recoverable degradation worth a log line. - **Protocol doc** lists `persist_failed` in the `auth_device_flow_failed.errorKind` union (was added to the type but missed in the doc). - **`pollDeviceToken({signal})`** plumbed through `IQwenOAuth2Client` interface + `QwenOAuth2Client` impl + the Qwen device-flow provider. Cancel / dispose during a slow IdP response now aborts the in-flight HTTP socket immediately instead of waiting for the upstream timeout. Two new registry tests assert `cancel()` / `dispose()` propagate abort to the signal observed by `provider.poll`. - **`revealSecret` error message** clarified: was "secret has been GC-evicted" (impossible — WeakMap doesn't evict reachable keys). Now points at the actual reachable failure modes (forged shape / serialize+reparse losing the WeakMap binding). - **`transitionTerminal` JSDoc** clarifies that the PRIMARY guard against late timer secret leaks is the `entry.status !== 'pending'` check at the top of `runPollTick`; secret-clearing here is defense-in-depth. - **`DeviceFlowErrorKind` JSDoc'd per variant** so consumers can tell when each fires (RFC 8628 distinctions + `persist_failed` vs `upstream_error` boundary). - **Stale "PR 16 / PR 21 §3" temporal references** in `DaemonAuthFlow.ts:124` rephrased to be timeless ("workspace-scoped events fan out through whatever session buses happen to be live" — no PR number references that rot when those PRs merge). ## Not adopted (4 items, replied to in-thread) - **`authWithQwenDeviceFlow` browser-launch separation** — correct architectural advice but out of #4255 scope (would refactor a CLI auth UX module that PR 21 only touched additively). Tracked as a Wave 5 follow-up. - **Copyright header year range** — repo-wide convention "2025"; not introduced by this PR. - **Spread `...(x ? {x} : {})` → `x: x ?? undefined`** — the two are not semantically equivalent. The current form omits the key entirely on falsy `x`; the suggested form always includes the key. Tests assert object shape and would break under the change. - **Eager `client.auth` getter** — public API boundary. Lazy construction matches `DaemonSessionClient` precedent + saves the module load for SDK consumers that never touch auth. Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-1 review feedback 15 items from @wenshao's review batches on #4255. Catches a handful of real bugs that the earlier round (commit 3d9f082f5) didn't surface. ## Critical fixes - **C1 — `pollUntilTerminal` providerId pass-through** (`DaemonAuthFlow.ts:185`). The synthetic 404 fallback hardcoded `providerId: 'qwen-oauth'`; the parent `awaitCompletion` already receives the real providerId via `start.providerId` but `pollUntilTerminal`'s parameter type stripped it. Add the field to the param type, propagate. - **C2 — open `errorKind` allowlist** (`events.ts`). The closed 5-value union in the type guard silently dropped any `failed` event whose errorKind the daemon added without mirroring SDK-side (e.g. a future `rate_limited`). The flow's reducer state would never transition to terminal, leaving SDK consumers stuck on `pending` forever. Open the union with `(string & {})` and accept any non-empty string in the runtime guard. Updated test asserts forward-compat behavior + still rejects the truly-malformed empty-string case. - **C3 — `persist()` timeout + signal** (`deviceFlow.ts`). A wedged disk I/O (NFS stall, encrypted-volume contention) without bounds would pin the entry in `pending` until the upstream `expires_in` elapsed (potentially minutes). The registry now passes its `cancelController.signal` AND arms a hard `DEVICE_FLOW_PERSIST_TIMEOUT_MS = 30_000` timer; persist failure surfaces as `persist_failed` immediately. The `DeviceFlowPollResult` `success` variant signature changed to `persist({signal})`. - **C4 — cancel × success race rollback** (`deviceFlow.ts` + Qwen provider). Today, if `cancel()` transitions while `persist()` is in flight, the credentials get written but the flow's status is `cancelled`. User sees cancelled, daemon disk has a valid token. `DeviceFlowPollResult.success` gains an optional `unpersist()` callback the registry calls when `transitionTerminal(authorized)` fails — the Qwen provider wires it to `clearQwenCredentials()`. Rollback failure is audited but not propagated (re-running auth would overwrite anyway). - **C5 — don't `unref()` the `awaitCompletion` sleep timer** (`DaemonAuthFlow.ts`). On a standalone Node CLI/script doing just `client.auth.start().awaitCompletion()`, the unref'd between-poll timer was the only event-loop handle, so Node could exit before the user finished authorization. The poll wait is foreground work the caller explicitly awaits — keep it ref'd. ## Information-leak fixes - **S1 — sanitize `persist_failed` hint**. `err.message` from `cacheQwenCredentials` embeds the full `~/.qwen/oauth_creds.json` path. Broadcast via SSE, that path leaks the daemon's home layout to every connected session subscriber. Replace user-facing hint with `"credentials could not be written to the daemon filesystem — check disk space and permissions"`; full err goes to stderr audit only. - **S2 — sanitize upstream `pollDeviceToken` hint**. The class embedded the entire raw IdP response body (which can be an HTML error page from a reverse proxy) into the thrown message. Same broadcast leak path. Replace upstream-error hint with `"unexpected response from identity provider"`; RFC 8628 errors use `"Qwen IdP returned ${kind}"`. ## Cleanup / forward-compat - **D1 — drop duplicate `clearCache()`** at `qwenOAuth2.ts:840`. The paired call became redundant once `cacheQwenCredentials` folded the clearCache in (PR #4255 fold-in 1). The fold-in 1 message said this would be done; the duplicate slipped through. - **S3 — drop unused `DeviceFlowNotFoundError`** (`deviceFlow.ts`). Exported but never imported; route handlers do inline 404 JSON. - **S4 — single-source SDK status / errorKind unions** (`types.ts`). `DaemonAuthDeviceFlowSdkStatus` / `DaemonAuthDeviceFlowSdkErrorKind` were parallel literal copies of the canonical events.ts definitions — drift waiting to happen. Now imported + aliased as type-only re-exports. - **S5 — broadcast 100% fail elevates to stderr** (`httpAcpBridge.ts`). Per-session bus failures stay debug-only, but a broadcast where EVERY session bus refused is operationally interesting (clients won't see the event). Track success / fail counts; `writeStderrLine` when `successCount === 0`. - **S6 — `this.disposed` check after `await provider.start()`** (`deviceFlow.ts`). `dispose()` mid-start would orphan the freshly- inserted entry (`schedulePoll` guards on `disposed` so no poll fires; the entry never transitions). Throw post-await if disposed. - **W1 — thread `signal` into `requestDeviceAuthorization`** (`qwenOAuth2.ts` + Qwen provider). `start()` had the same cancellation gap that `pollDeviceToken` had — a slow device-authorization request couldn't be aborted during shutdown. Now plumbed end-to-end. - **W2 — split `invalid_request` from `unsupported_provider`** (`server.ts`). Conflating them surfaced misleading remediation hints to SDK consumers branching on `code` ("this provider isn't supported here" when the real cause was a serializer dropping the field). Bad-shape now returns `code: 'invalid_request'`; unknown-but-well-formed stays `unsupported_provider`. - **W3 — drop never-populated `accountAlias`** (Qwen provider). The field was wired through types / events / reducer / audit but the Qwen IdP's token response doesn't carry one (no `name` / `email` / `sub`). Returning only `{expiresAt}` makes the field type-honestly absent rather than always-undefined. Future provider with an alias-bearing response can populate it. - **W4 — `DaemonAuthFlow` JSDoc accuracy**. Doc claimed "first attempts to consume an SSE event stream … falls back to GET-based polling"; actual is GET-only with SSE as a real-time hint for clients already subscribed to a session stream. - **W5 — clearer unit arithmetic** in interval normalization. The `(_INTERVAL_MS / 1000) * 1000` cancelation hid the s↔ms boundary; expanded form makes both branches unit-explicit. ## Test changes - `daemonEvents.test.ts` updated to match the now-OPEN errorKind union (forward-compat assertion + empty-string still rejected). - `deviceFlow.test.ts` `FakeProvider.poll` aligned with the new `persist({signal})` signature + optional `unpersist`. ## Validation - `npm run typecheck --workspace packages/cli --workspace packages/sdk-typescript --workspace packages/core` — clean - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 368/368 - `npx eslint --max-warnings 0` over the 11 PR 21 surface files — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-2 review feedback 10 new threads from @wenshao's second deep-review pass on #4255. Verified status: 5 real issues, 1 improvement, 3 stale (already fixed; comments lagged), 1 false alarm (typecheck demonstrably clean). ## Critical fixes - **fold-in 2 C4 REVERSED**: when `provider.poll()` returns success AND `cancel()` / `dispose()` transitioned the entry mid-`persist()`, the registry now FORCES the entry to `authorized` and keeps the on-disk credentials. The earlier rollback (`unpersist()`) wasted the user's IdP approval because the RFC 8628 `device_code` is single-use — re-running the flow would force them through the whole browser-prompt + paste-code dance again for a click whose intent was likely "stop the wait" rather than "undo my already- completed approval". Aligns with gh CLI / Auth0 SDK / git- credential-manager. Audit captures the race via `hint: 'lost_success_kept ...'`. `DeviceFlowPollResult.success.unpersist` field + Qwen provider's `clearQwenCredentials` rollback removed. - **#1 GET /workspace/auth/device-flow/:id strict gate**: this GET surfaces `userCode` / `verificationUri` for pending entries, which on the loopback no-token default were readable by any local process. POST + DELETE were already strict; aligning GET closes the information-disclosure asymmetry. `/workspace/auth/status` stays bearer-only (its `pendingDeviceFlows` entries intentionally omit `userCode`). - **#2 `inFlightStarts` hard timeout**: a hung `provider.start()` (network partition, unresponsive IdP) used to leave the per- `providerId` slot in `inFlightStarts` occupied forever, blocking every subsequent POST until daemon restart. New `DEVICE_FLOW_START_TIMEOUT_MS = 30_000` arms a timer that `cancelController.abort()`s the start; the rejected promise unwinds through the `try/finally` clearing the slot. - **#10 chain-completing the C3 persist-timeout**: the earlier C3 fix armed a 30s timer that fired `cancelController.abort()` then `await result.persist({signal})`, but the chain ended at the registry boundary — `cacheQwenCredentials` didn't take a signal, so `fs.writeFile` couldn't be aborted. Now `cacheQwenCredentials` accepts an optional `{signal}` and threads it into `fs.writeFile(..., {signal})` (Node native). The Qwen provider's `persist({signal})` forwards the entry's `cancelController.signal` end-to-end. ## Improvement (#4): 404 fallback errorKind `pollUntilTerminal`'s 404 catch used to synthesize `{status: 'expired'}` for ALL evicted entries — conflating "your flow expired during your disconnect", "the daemon was restarted", and "your deviceFlowId was wrong". Now returns `status: 'error'` + `errorKind: 'not_found_or_evicted'` + a `hint` so SDK consumers branching on errorKind can distinguish. ## Information leak (#9): start() path raw IdP message S2 (fold-in 2) sanitized `poll()`'s upstream-error hint, but `start()` still embedded the raw `err.message` (full IdP response, potentially HTML from a reverse proxy / WAF) into the `UpstreamDeviceFlowError` that flowed to SDK clients via the 502. Now uses static messages for the SDK-visible errors; raw detail goes through `writeStderrLine` for operator audit only. Mirrors S2's approach. ## Stale comments cleaned (#5, #7) `qwenDeviceFlowProvider.ts:177` claimed `cacheQwenCredentials` "doesn't currently take a signal — that's a follow-up". After #10 above, that's no longer true; the comment is replaced with the actual end-to-end signal-threading note. ## Not adopted (1 false alarm) - Thread on `types.ts:330` claimed type-only-import-after- declarations breaks `tsc` and fails `daemonEvents.test.ts:670` with TS2345. Demonstrably false: `npx tsc -p packages/sdk-typescript/tsconfig.json --noEmit` exits 0; `daemonEvents.test.ts` is the post-fold-in-2 file with the open-allowlist assertion (test 28/28 passes). The reviewer may have been looking at a transient state during their analysis. ## Validation - `npm run typecheck --workspace packages/cli --workspace packages/sdk-typescript --workspace packages/core` — clean - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398 pass - `npx eslint --max-warnings 0` over the PR 21 surface — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-3 review feedback 5 new threads from the third deep-review pass on #4255. 3 real issues fixed; 1 stale (already done in fold-in 3); 1 deferred as non-blocking design suggestion. - **A — `expiresIn` / `interval` non-finite guard** (`deviceFlow.ts`). The provider contract types both as `number`, but a misbehaving / future provider could hand `undefined` / `NaN` / `Infinity`. `Math.max(0, NaN) * 1000` is `NaN`, then `now() + NaN` is `NaN`, then `now >= NaN` is always `false` — the sweeper would NEVER evict the entry, pinning an upstream `device_code` slot until daemon restart. Same hazard on `interval * 1000` (NaN → `setTimeout(NaN)` fires immediately, Infinity → scheduler clamps to TIMEOUT_MAX). Now both fields go through `Number.isFinite(x) && x > 0`; missing/bad values fall back to RFC 8628's recommended ceilings (10 min for expiry, 5s for interval). - **D — typed `app.locals` accessor** (`deviceFlow.ts` + writer/reader call sites). The `app.locals['deviceFlowRegistry']` string key was shared between `createServeApp` (writer) and `runQwenServe` (reader); a typo on either side would compile cleanly and the shutdown dispose call would silently no-op, leaving polling timers running until the `unref()` rescue. New `setDeviceFlowRegistry(app, registry)` / `getDeviceFlowRegistry(app)` pair gives both call sites type-checked access; the string literal is encapsulated in one module. - **E — `UnsupportedDeviceFlowProviderError` docstring** (`deviceFlow.ts`). After fold-in 2's W2 fix split `invalid_request` from `unsupported_provider`, the route layer screens unknown ids against `DEVICE_FLOW_SUPPORTED_PROVIDERS` before reaching the registry — so this error is now reachable ONLY on a daemon-internal invariant violation (id is declared supported but not registered in the runtime provider map). Docstring + thrown message updated to reflect that this branch signals a programmer error, not user input. - **B** claimed `cacheQwenCredentials(credentials)` doesn't forward signal to `fs.writeFile`. Verified: fold-in 3 (#10) at `qwenDeviceFlowProvider.ts:204` calls `cacheQwenCredentials(credentials, { signal: persistOpts.signal })` and the core helper threads it into `fs.writeFile(..., {mode, signal})`. The reviewer was looking at the comment block above (lines 174-181) without scrolling to the actual call site. - **C — SDK `cancelDeviceFlow` lossy 204/404 collapse**. Suggested returning `{existed: boolean; alreadyTerminal: boolean}` instead of resolving void on both 204 and 404. Real signal-loss but tagged "[非阻塞]" by the reviewer; changing requires a daemon route shape change (200 + body instead of 204) which is better as a focused follow-up PR. Acknowledged in-thread; deferred to a fold-in PR after #4255 lands. - `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}` - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398 - `npx eslint --max-warnings 0` over the PR 21 surface — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-4 review feedback 4 threads from the fourth review pass on #4255. 3 adopted + 1 deferred (out-of-scope rename of PR 15's `mutate` helper). ## Adopted ### #1 — `persistInFlight` flag suppresses cancel × persist event-stream UX trap When `provider.poll()` returns success and we await `persist()`, a concurrent `cancel()` would synchronously transition the entry to `cancelled` and emit `auth_device_flow_cancelled` — then `persist()` resolves and (per fold-in 3 C4) force-overrides to `authorized` + emits `auth_device_flow_authorized`. The reducer state correctly last-write-wins on `authorized`, but DIRECT event-stream consumers (close-dialog handlers, telemetry, UI cleanup) race onto an unmounted UI when the second event lands. Now: while persist is in-flight, `cancel()` and the sweeper SKIP the state transition + event emit. They register intent (set `cancelRequestedDuringPersist=true` for cancel; sweeper just no-ops) and let the persist resolution decide: - persist succeeds → `authorized` (IdP wins per fold-in 3 C4) - persist fails AND cancel was requested → `cancelled` - persist fails AND `now >= expiresAt` → `expired` / `expired_token` - persist fails otherwise → `error` / `persist_failed` Result: at most one terminal event per flow. Imperative SSE consumers no longer see oscillating terminal states. Audit captures the race (`hint: 'lost_success_kept ...'`) for incident-response correlation. ### #2 — `revealSecret` → `unsafeRevealSecret` rename The earlier JSDoc claimed "the `unsafeReveal_` naming is intentional: greppable in code review, easy to allowlist in lint rules, hard to invoke by accident" — but the actual function was named `revealSecret`. The promised safety properties didn't exist; a code reviewer wouldn't single out `revealSecret` as suspicious, and a `no-restricted-syntax` ESLint rule wouldn't flag it. Renamed to `unsafeRevealSecret` so the JSDoc-promised "greppable" / "lintable" property is now actually true. Two call sites in the Qwen provider + 4 test references updated. Internal symbol; not exposed through the SDK package. ### #4 — `QwenOAuthPollError` typed class replaces substring regex The earlier RFC 8628 error mapper used an anchored regex against the thrown error message text — an implicit cross-file string contract between `qwenOAuth2.ts` (throws) and `qwenDeviceFlowProvider.ts` (matches). If `qwenOAuth2.ts` ever changed its message format, ALL RFC 8628 errors (`expired_token` / `access_denied` / `invalid_grant`) would silently fall through to `upstream_error` — wrong errorKind flowing through telemetry with no test or type-system check to catch the drift. Now `QwenOAuth2Client.pollDeviceToken` throws a structured `QwenOAuthPollError extends Error` with `oauthError` / `description` / `status` fields. The provider branches on `instanceof QwenOAuthPollError` and reads `.oauthError` directly via a dedicated `mapRfc8628OAuthCode(code)` switch. The drift hazard is gone: a future code change that touches the typed class will fail tsc until both sides are updated. Message format preserved for any pre-existing log-parsing / substring matchers. ## Not adopted ### #3 — `mutate({strict:true})` semantic awkwardness on GET Reviewer correctly noted that `mutate` is named for state-changing routes, but `GET /workspace/auth/device-flow/:id` uses it for an information-disclosure defense (only reachable code path is reading state). Suggested rename: `mutate` → `strictHttpGate`. Deferred: the rename touches PR 15's helper which has many call sites in `server.ts` and is shared infrastructure for Wave 4 PRs 17/19/20. PR 21 is the first / only consumer of the strict-on-GET form so far; widening the rename to a Wave 4 follow-up keeps the fold-in scope tight. Replied in-thread. ## Validation - `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}` - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 544/544 - `npx eslint --max-warnings 0` over the PR 21 surface — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-5 review feedback Five small adopt items from the round-5 review pass; one stale thread already addressed in b5b77ee90 (fold-in 5). #2 — `as const` + derived type for DEVICE_FLOW_SUPPORTED_PROVIDERS so adding/removing a provider id requires touching exactly ONE site. Mirrors `SERVE_ERROR_KINDS` / `ServeErrorKind` in `status.ts`. #3 — Clarify `DEVICE_FLOW_EXPIRY_GRACE_MS` JSDoc to distinguish the daemon's 30s SWEEP cadence (what the grace tracks) from the 5-min TERMINAL_GRACE_MS reconnect window (which awaitCompletion does NOT need to wait through). #4 — Add `@remarks` block on `DeviceFlowProvider.poll()` warning future provider authors that thrown `err.message` flows verbatim into the SSE-broadcast `auth_device_flow_failed` hint, and must be sanitized. Two equally-correct paths documented (typed `error` result vs sanitized thrown message). #5 — Truncate raw IdP detail in `qwenDeviceFlowProvider.ts` stderr audit lines to 2 KiB. WAFs / reverse proxies can return MB-sized HTML error pages, and container log aggregators (Loki, Fluent Bit, Stackdriver) typically truncate or drop lines past 4-32 KiB — losing the useful prefix downstream. 2 KiB retains structured JSON envelopes while staying well below every aggregator's per-line cap. #6 — Track latest `originatorClientId` on per-provider singleton take-over via new `entry.lastOriginatorClientId` field + `recordTakeover()` helper. When a second SDK client posts `POST /workspace/auth/device-flow` for an already-pending provider (or one being created in `inFlightStarts`) with a different `initiatorClientId`, an audit breadcrumb records the take-over so incident response can correlate "client A started, client B took over at 12:34". Event-routing intentionally still uses the original `initiatorClientId` (events are workspace-broadcast and changing the originator field mid-flow would break SDK reducers that key on it). Two new tests cover the differing-id audit + same-id no-op. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-6 review feedback Six "Critical" findings from a gpt-5.5 /review pass — all real liveness/correctness defects in the daemon's auth device-flow path and the SDK's awaitCompletion polling loop. #1 — Make `provider.start()` timeout authoritative via `Promise.race` in `DeviceFlowRegistry.doStart`. The earlier shape only ABORTED the signal on timeout; a provider that ignores `signal` (non-abortable I/O, future implementer who forgets to thread it to `fetch`) would leave the `await` hanging until daemon restart, pinning the `inFlightStarts` slot for that providerId. Race against a rejecting timer makes the timeout authoritative regardless of provider cooperation; abort still fires for cooperative cleanup. #2 — Same shape for `result.persist()` in the success branch of `runPollTick`. A future provider whose persist performs non-abortable steps (mkdir/chmod/mv outside the abortable fs.writeFile path) would otherwise hang the poll tick until process restart. Race against rejecting timer; rejection maps to `persist_failed`. #3 — Clamp `expiresIn` and `interval` upper bounds. Previous `Number.isFinite + > 0` guards stopped NaN/Infinity but a finite extreme like `1e12` was still accepted — pinning the per-provider singleton for ~30,000 years (`expires_in`) or scheduling a TIMEOUT_MAX-clamped poll that never fires within `expiresAt` (`interval`). Two new constants (`DEVICE_FLOW_MAX_EXPIRES_IN_SEC = 3600`, `DEVICE_FLOW_MAX_INTERVAL_MS = 60_000`) cap the worst case. #4 — Extract `getDeviceFlowOrSynthetic404(...)` helper in `DaemonAuthFlow.ts` and route BOTH the loop body and the timeout-ceiling final read through it. Previously the ceiling read went directly through `client.getDeviceFlow` and a 404 at the boundary (entry evicted just as the timeout fired) would reject with `DaemonHttpError(404)` instead of returning the structured `{ status: 'error', errorKind: 'not_found_or_evicted' }` that the rest of `awaitCompletion` promises. #5 — Validate `AwaitCompletionOptions.timeoutMs` and `pollOverrideMs` with `Number.isFinite + > 0`. NaN slipped past the previous `?? default` form (NaN is truthy-ish in that position) and produced a `ceiling` of `NaN` (loop runs forever — `now >= NaN` always false) or a `setTimeout(NaN)` (Node clamps to 1ms — tight polling loop). Sanitize to `undefined` so the documented defaults take effect. #6 — Thread `signal` into `DaemonClient.getDeviceFlow` and forward to `fetchWithTimeout` (which already composes caller + timeout signals). awaitCompletion now passes `opts.signal` from both GET sites. Without this, an `awaitCompletion` caller that aborts mid- poll could not cancel an in-flight stalled GET; it would have to wait for the daemon-side `fetchTimeoutMs` (30s default) to fire. Four new tests in `deviceFlow.test.ts` pin the new behaviors: hanging-start timeout (#1), hanging-persist → persist_failed (#2), extreme-expiresIn clamp (#3), extreme-interval clamp (#3). FakeProvider gained a `startHangs` flag for the non-cooperative provider scenario. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-7 review feedback Two findings from a DeepSeek /review pass; both small but legitimate defense-in-depth gaps. #1 — `runPollTick`'s catch block forwarded `err.message` verbatim into the SSE-broadcast `hint`. The provider's `@remarks` contract (fold-in 6 #4) requires throwers to sanitize, but if violated the unbounded raw payload would reach every SSE subscriber. Added `DEVICE_FLOW_POLL_HINT_MAX_LEN = 256` + `truncatePollHint()`, applied to the catch's `result.hint`. Full raw `err.message` is still routed to the audit trail (`audit?.record({hint: 'provider.poll() threw (raw): ...'})`) so operator visibility for incident response is preserved. Belt-and-suspenders: the contract is now structurally enforced rather than relying on every future provider author to read the JSDoc. #2 — `updateMatchingFlow` (and the `started`/`authorized` handlers in `reduceDaemonAuthEvent`) unconditionally overwrote state without comparing `rawEvent.id` against the existing flow's `lastSeenEventId`. The field's JSDoc documented it as a monotonic counter to prevent stale frames from overwriting newer state, but the code didn't enforce that contract. SSE reconnect with `Last-Event-ID < terminal-frame-id` would replay older frames; if any of them were for the same `deviceFlowId` (e.g. a delayed `failed` arriving after `authorized`) the stale frame would overwrite the terminal. Daemon-side `transitionTerminal` makes the exact reachable scenario thin, but the documented contract should match the code. Threaded `rawEventId` into `updateMatchingFlow` and added the gate there + in the `started` and `authorized` handlers (the two cases that don't go through `updateMatchingFlow`). Synthetic frames without an envelope `id` (`rawEventId === undefined`) bypass the gate — they originate inside SDK reducer machinery and aren't subject to replay ordering. Three new tests pin the contracts: - `runPollTick catch truncates the SSE hint and preserves raw on the audit (fold-in 8 #1)` — `pollThrowsWith` flag on FakeProvider models a non-conforming provider; SSE hint < 400 chars + contains 'truncated'; audit hint contains the full 4_000-char raw. - `reduceDaemonAuthEvent rejects out-of-order frames (fold-in 8 #2 monotonicity)` — stale `failed`(id=7) does NOT overwrite `authorized`(id=10); stale `started`(id=4) for a different flow also rejected. - `reduceDaemonAuthEvent passes synthetic frames (no envelope id) through the gate` — SDK-internal frames without `id` are honored. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-8 review feedback Twelve correctness + structural fixes from a wenshao + DeepSeek + gpt-5.5 review pass. Tests deferred to fold-in 10 (separate, larger commit). CRITICAL CORRECTNESS #7 — `provider.persist()` Promise.race could publish `persist_failed` to SSE while a non-cooperative provider was still committing credentials to disk. Added an independent tracker on the original persist promise: if the race timed out (`persistTimedOut === true`) AND the underlying persist later resolved successfully, audit a `lost_success_after_timeout` breadcrumb so operators see the inconsistency. Tightened the persist `@remarks` contract to require signal honoring end-to-end. Qwen provider already complies (fold-in 3 #10); this is forward-defense for future providers. #11 — auth surface (`DaemonAuthFlow`, `reduceDaemonAuthEvent`, `createDaemonAuthState`, `DEVICE_FLOW_EXPIRY_GRACE_MS`, all event / data / state types) was re-exported from `src/daemon/index.ts` but NEVER from the published SDK entry `src/index.ts`. SDK consumers got `undefined` for everything except `client.auth.start()` (which traveled through the already-exported `DaemonClient`). Added the missing exports and pinned via `daemon-public-surface.test.ts`. #12 — `core/src/qwen/qwenOAuth2.ts:373`'s `debugLogger.debug('Device authorization result:', result)` writes the raw `device_code` (RFC 8628 bearer-equivalent credential) to stderr / journald, bypassing the `BrandedSecret` redaction layer. Pre-existing on main but PR 21 expanded the exposure surface. Sanitized to log only `{ ok, expires_in }` on success / `{ ok, error }` on error. #13 — `runPollTick` success-branch persist-failure × past-`expiresAt` classified as `expired_token` instead of `persist_failed`, routing operators toward "tell user to retry" (RFC 8628 expiry) when the actual root cause was disk I/O. Reclassified to `persist_failed` with a `persist_also_failed_past_expiry` audit hint to preserve the timing detail for incident response. SMALL CORRECTNESS #1 — `runPollTick` catch hint replaced with a STATIC bounded message ("provider.poll() failed; see daemon audit log for details"). The fold-in 8 truncated-prefix approach could still leak the first 256 chars of provider-templated raw text including secret material. Full raw still routed to audit channel for operator visibility. #5 — `cancellerClientId` field added to `DeviceFlowEntry`; deferred- cancel branch in `cancel()` now stamps it on the entry, and the persist-resolution `cancelled` event publish uses `entry.cancellerClientId ?? entry.initiatorClientId`. SSE consumers that suppress self-emitted events can now attribute the cancel correctly. #6 — `AwaitCompletionOptions.timeoutMs === 0` (the documented "settle immediately, return current daemon view" contract) was treated as falsy by the `?` ternary, falling back to the default. `sanitizePositiveMs` now takes an `allowZero` opt-in; the ceiling computation uses `!== undefined` instead of truthy check. #8 — `EventBus.publish()` returns `undefined` for closed buses (it does NOT throw). `broadcastWorkspaceEvent` previously counted that path as success, hiding the all-buses-dropped operator alarm. Folded the closed-bus-as-failure check into the canonical `publishWorkspaceEvent` (see #X below). #9 — start-timeout Promise.race rejected with a plain `Error`, falling through `sendBridgeError` to a generic 500. Switched to `UpstreamDeviceFlowError` so a hung IdP correctly surfaces as 502 (matching the envelope every other IdP start failure uses). STRUCTURAL #3 — Three identical `transitionTerminal + publish + audit` expired_token blocks in `runPollTick`/`sweep`/(removed by #13) deduplicated into a private `expireEntry()` helper. Future event- shape changes are now a one-edit operation. #X — PR 16 (#4249) merged on 2026-05-18 06:27Z. Per the inline comment at httpAcpBridge.ts:501, PR 21's `broadcastWorkspaceEvent` was kept distinct only to avoid the merge conflict; once PR 16 landed, it became a fold-in candidate. Folded the closed-bus + all-failed-stderr-escalation operator-visibility features (PR 21's S5 + fold-in 9 #8) INTO `publishWorkspaceEvent`; dropped `broadcastWorkspaceEvent` from the bridge interface + impl + test mocks. PR 21's deviceFlowEventSink now calls `bridge.publishWorkspaceEvent` — single canonical workspace fan-out. DOC #16 — Added a "Cross-client take-over" paragraph to `docs/users/qwen-serve.md` explaining that two clients on the same daemon for the same provider get the per-provider singleton with `attached: true`/`false` distinguishing them; no separate event fires (both eventually observe the same `auth_device_flow_authorized`). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-9 review feedback Two small non-blocking items from the round-9 pass; defensive shape + docs only. The 4 deferred test-coverage threads (#1-4 of round-8) are still tracked for fold-in 10. #6 — `lastSeenEventId` typed `number` with `?? 0` defaults in the `auth_device_flow_started` reducer case. The daemon-side `EventBus` assigns ids ≥ 1 so the `0` sentinel has no real-traffic meaning, but the monotonic gate (`rawEventId <= flow.lastSeenEventId`) would reject any future SDK-internal synthetic frame using `id: 0`. Changed the field type to `number | undefined` and dropped the `?? 0` from the started case. The `updateMatchingFlow` / `auth_device_flow_authorized` guards already short-circuit on `existing.lastSeenEventId !== undefined`, so undefined is safe end-to-end. Existing 34 reducer tests still pass unchanged. #7 — Added `@remarks` block to `DeviceFlowErrorKind.persist_failed`'s JSDoc explaining the lost-success retry UX. When fold-in 9 #7's `lost_success_after_timeout` audit fires (non-conforming provider violates signal contract; disk write succeeds after registry published `persist_failed`), a naive SDK retry hits the IdP a second time with a fresh `device_code` and prompts the user twice — but the first credential set is already valid. JSDoc now documents the mitigation: SDK consumers writing retry logic on `persist_failed` should call `client.auth.getStatus()` BEFORE re-prompting; operators can grep stderr/audit for `lost_success_after_timeout` to detect occurrences. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * test(serve): fold-in 10 — auth device-flow test bundle (#4255) Lands the four deferred test-coverage items the round-8 review flagged (and round-9 re-surfaced) as a hard merge prerequisite. Net +41 tests across registry / SDK helper / client HTTP / HTTP route layers. #1 — `deviceFlow.test.ts` `persist failure paths` describe (3 tests, +3). The success arm's three terminal mappings — pure `persist_failed`, `cancelled` (cancel during persist), and `persist_failed` past `expiresAt` (the fold-in 9 #13 reclassification with `persist_also_failed_past_expiry` audit hint) — were 0-covered. Now pinned. Test #2 also asserts the fold-in 9 #5 cancellerClientId routing on the deferred `cancelled` event. #2 — new `DaemonAuthFlow.test.ts` (+14 tests). Mock DaemonClient with sequenced `getDeviceFlow` replies. Covers happy-path polling → `authorized`; `slow_down`-driven `intervalMs` bump firing `onThrottled`; `signal.abort()` rejection; `signal` propagation through `client.getDeviceFlow` (fold-in 7 #6); `timeoutMs` ceiling final-read; `timeoutMs:0` immediate-return (round-9 #6); NaN/Infinity → `sanitizePositiveMs` fallback to default ceiling (fold-in 7 #5); 404 → synthetic `error`/`not_found_or_evicted` (fold-in 3 #4) at BOTH the loop body AND the timeoutMs ceiling read (fold-in 7 #4); non-404 DaemonHttpError rethrown; `cancel()` and top-level `status()`/`cancel()` wrappers forward correctly. #3 — `DaemonClient.test.ts` `device-flow methods` describe (+11 tests). POSTs `/workspace/auth/device-flow` happy path + clientId header + body shape; 200/201 acceptance; non-2xx → `DaemonHttpError`. GETs URL-encode the deviceFlowId; forward `opts.signal` to `fetchWithTimeout`'s composed signal (fold-in 7 #6 — verified by aborting caller signal and observing the fetch's signal flip to `aborted`); 404 throws. DELETEs swallow 204 + 404 (idempotent, mirrors `closeSession`); non- 204/404 throws. `getAuthStatus` plain GET. `client.auth` lazy-instantiated singleton. #4 — `server.test.ts` 5 supplementary contract tests (+5). The existing 8 `it()`s cover happy paths + take-over + 401 POST + DELETE pending/terminal/unknown + 502 upstream + sweeper. This commit plugs gaps: 400 `invalid_request` for missing / non-string providerId (fold-in W2 split this from `unsupported_provider`); 409 `too_many_active_flows` (via injected fake registry); 401 `token_required` on DELETE without bearer; the asymmetric GET posture (`/workspace/auth/device-flow/:id` IS strict-gated to prevent peer-process userCode shoulder-surf; `/workspace/auth/status` stays read-only because its `pendingDeviceFlows` entries intentionally redact `userCode`). Validation: cli serve 631/631 (+8 from #1, #4); sdk 384/384 (+25 from #2, #3, +/- some pre-existing churn). Typecheck + lint clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(qwen): atomic temp+chmod+rename in cacheQwenCredentials (PR #4255 round-11 #2) gpt-5.5 /review flagged a real correctness/security gap: the post-write `chmod` ordering left a window where freshly-written credentials could land in a broadly-readable existing `oauth_creds.json` before the chmod tightened it. On POSIX, a chmod failure additionally degraded to a debug warning while the broadly-readable tokens stayed on disk. New shape mirrors the standard atomic-write idiom: 1. Write `${filePath}.tmp.${pid}.${randomUUID()}` with `mode: 0o600`. The temp path doesn't exist beforehand, so the `mode` flag actually applies on creation (it doesn't on existing files, which was the root of the original race). 2. Defensive `chmod` on the temp file. POSIX failure is now a HARD ERROR (refuses to publish broad-perm credentials to the canonical filename). Windows logs a debug breadcrumb and proceeds, since chmod is a no-op on most NTFS volumes (perms go through ACLs). 3. Atomic `fs.rename` over `filePath`. The canonical path is ALWAYS at `0o600` from the moment it contains the new tokens; readers see either the old creds or the new creds, never a partially-written or broadly-readable state. 4. Best-effort `fs.unlink` of the temp file on any failure path so failed writes don't leave `.tmp.<pid>.<uuid>` litter on disk. Test mock in `qwenOAuth2.test.ts` extended with `chmod` + `rename` no-op stubs so the existing 158 core/qwen tests still pass; no test behavior change beyond the mock surface. Validation: typecheck clean (cli + core + sdk-typescript); core qwen 158/158; cli serve 643/643; sdk 384/384. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao + gpt-5.5 round-12 review feedback Eight findings from a wenshao + gpt-5.5 /review pass: 1 critical correctness, 2 real defensive defects, 4 edge cases / minor hardening, 1 test gap. All adopted. CRITICAL CORRECTNESS #1 CzSpN — `dispose()` race: after `await provider.poll(...)` the post-await guard checked only `entry.status !== 'pending'`, NOT `this.disposed`. `dispose()` clears the registry maps and aborts the entry's signal but doesn't mutate `entry.status`, so a provider whose poll already resolved (or doesn't honor abort) could enter the success branch and call `result.persist({...})` — committing credentials on a shutting-down daemon. Added the `if (this.disposed) return;` guard symmetric with the top-of-method check. REAL DEFENSIVE DEFECTS #2 Cy_ZG — sync-throw escape: the `result.persist({signal})` call happens BEFORE the `new Promise` constructor that captures it (`persistTracker` is closed-over inside the constructor). A non-conforming provider whose persist throws synchronously (e.g. top-of-function validation) would escape past the outer `try/catch (await new Promise(...))` and become an `unhandledRejection` since `runPollTick` is fire-and-forget via `void`. Wrapped the persist invocation in a try/catch that routes the sync-throw into the same `persistError` branch. #3 CzSpe — runtime provider map: provider validation hardcoded `DEVICE_FLOW_SUPPORTED_PROVIDERS` even though `deps.deviceFlowProviders` is the documented extension hook for tests/future providers. Switched both POST validation and `/workspace/auth/status` `supportedDeviceFlowProviders` to derive from `deviceFlowProviderMap.keys()` — single source of truth matches the registry's `resolveProvider`. EDGE CASES / MINOR HARDENING #4 Cy_Y9 — `slow_down` re-clamp: `intervalMs += SLOW_DOWN_BUMP_MS` can push past `DEVICE_FLOW_MAX_INTERVAL_MS` (the bound that keeps `setTimeout` from clamping to TIMEOUT_MAX). Wrapped in `Math.min(MAX_INTERVAL_MS, ...)` symmetric with the doStart clamp. #5 Cy_ZF — `expiresInSec` lower bound: `0.5` was finite-positive and produced `expiresAt = now() + 500 ms` — first poll (clamped at ≥1 s) fires AFTER expiresAt → flow expires before any user could authorize. Added `DEVICE_FLOW_MIN_EXPIRES_IN_SEC = 30` (RFC 8628 §3.2 calls 5–30 minutes "reasonable"; sub-30s is non-compliant). #6 CzHOK — take-over response privacy: `initiatorClientId` was echoed to ANY take-over POST caller, including those with no `X-Qwen-Client-Id` header. Bearer-gated already, but the asymmetry "anonymous caller learns who started it" violated the no-header-as-privacy-signal contract. Now only echoed when the caller's id matches the entry's initiator. #7 CzSpd — production audit visibility: production audit sink dropped `line.hint`, but the registry uses hints for operator-only breadcrumbs (`provider.poll() threw (raw)...`, `lost_success_after_timeout`, `persist_also_failed_past_expiry`, take-over correlation, `deferred (persist in flight; ...)`). The documented troubleshooting trail was invisible in production stderr. Now included with a 1 KiB bound + JSON-quoted so multi- word hints stay parseable. TEST GAP #8 Cy_ZH — `lost_success_after_timeout` audit: the fold-in 9 #7 split-brain detector for non-cooperative providers had no test pinning it. Added a controllable `latePersist` Promise + test that drives poll → success → enters persist race → fires PERSIST_TIMEOUT (registry publishes persist_failed) → resolves persist late → asserts the lost_success audit fires. Validation: typecheck + lint clean; cli serve 644/644 (+1 from the new test); sdk-typescript 384/384. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): close concurrent multi-provider cap bypass (PR #4255 round-13 #1) gpt-5.5 /review caught a real workspace-wide cap bypass: `countActive()` only counted entries already installed in `byProvider`, but the cap check at the top of `start()` runs before any provider's `inFlightStarts` slot completes `provider.start()`. A burst of fresh starts for `DEVICE_FLOW_MAX_CONCURRENT + 1` distinct providers all run synchronously to the cap check (each `start()` is async but runs to its first await — the await happens AFTER the cap check), all observe `count === 0` (no `byProvider` entries installed yet), and all pass — eventually installing more than the documented four pending flows. Fix: include `inFlightStarts.size` in `countActive()`. The two maps are disjoint by construction (the existing-pending fast-path catches any provider with both), so simple addition cannot double-count. The second concurrent caller sees count=1, the third count=2, …, and the (MAX+1)th caller is rejected with `TooManyActiveDeviceFlowsError`. Test: `caps at DEVICE_FLOW_MAX_CONCURRENT under CONCURRENT distinct-provider starts`. Fires `MAX+1` concurrent starts via `Promise.allSettled`, asserts exactly `MAX` fulfilled + exactly 1 rejected with the typed error. Pre-fix this test fails (all `MAX+1` succeed); post-fix it passes. Validation: typecheck clean across all 4 workspaces; deviceFlow.test.ts 35/35 (was 34); cli serve 645/645. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…wenLM#4280) * feat(serve): add workspace file write/edit routes Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): bind file hashes to text snapshots Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): tighten read-bytes snapshot and create-mode publish - readBytesWindow: re-stat the open fd after read and require unchanged ino+size+mtime before emitting the response. Mirrors the hardened text-snapshot path so the full-window hash can no longer pair with bytes that drifted under in-place rewrite or append. Surface drift as retryable hash_mismatch. - atomicWriteTextResolvedFile: reject a symlinked parent up-front as defense-in-depth ahead of the parent-fd publish follow-up referenced by assertInodeStableAfterRead. - atomicWriteTextResolvedFile: publish create-mode writes via link()+unlink() instead of rename(). POSIX rename() overwrites an existing regular file, so a racing external process could break the public create contract; link() returns EEXIST atomically and is portable across POSIX/NTFS. The early assertCreateTargetAbsent check stays for friendlier errors on the non-racing path. --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…wenLM#4175 Wave 4 PR 17) (QwenLM#4282) * feat(core): introduce TrustGateError for setApprovalMode (QwenLM#4175 Wave 4 PR 17) Adds a named subclass `TrustGateError` thrown by `Config.setApprovalMode` when the requested mode would grant privileged tool autonomy in a folder the user has not marked as trusted. Daemon mutation routes can now recognize this rejection class without depending on message text. Extends `mapDomainErrorToErrorKind` in `packages/cli/src/serve/status.ts` to map `TrustGateError → 'auth_env_error'`. Matches by `err.name` rather than `instanceof` because cross-package bundling can produce duplicate class instances where `instanceof` returns false. Test covers both the real class and a name-synthesized instance. Foundation for the `POST /session/:id/approval-mode` route landing in a follow-up commit in this PR. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * feat(core): add disabledTools workspace setting (QwenLM#4175 Wave 4 PR 17) Introduces a per-workspace skip-registration mechanism for tool names, distinct from `permissions.deny` (which keeps the tool registered and blocks invocation). Tools listed in `disabledTools` are not registered at all and never appear in `/tools`, `getAllTools()`, or function-call discovery — both built-ins and MCP-discovered tools flow through `ToolRegistry.registerTool` / `registerFactory`, so gating there covers every registration path. - `ConfigParameters.disabledTools?: string[]` (frozen into a `ReadonlySet` at Config construction; queried via `Config.getDisabledTools()`) - `ToolRegistry.registerTool` and `ToolRegistry.registerFactory` skip when the tool name is in the disabled set, with a debug log line - New `settings.tools.disabled: string[]` (UNION merge across scopes), wired from `loadCliConfig` into ConfigParameters - Tests pin the contract: skip at register, lazy factory skip, and the "next refresh" semantic (already-registered tools are unaffected by a subsequent toggle — the disabled set is consulted at register time, not at lookup time) Foundation for the `POST /workspace/tools/:name/enable` route in a follow-up commit; the bridge will write the settings file directly, and the next ACP child spawn will pick up the change. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * feat(serve): add session approval-mode mutation route (QwenLM#4175 Wave 4 PR 17) Adds POST /session/:id/approval-mode — the first strict-gated session mutation surface introduced in Wave 4 alongside PR 16 / PR 21. Remote clients can switch a live session's approval mode (plan / default / auto-edit / yolo) without touching the user's host CLI. Routing: - Route handler validates `mode` against the closed `APPROVAL_MODES` enum and an optional `persist: boolean` flag (400 on either) - Bridge `setSessionApprovalMode` forwards through the new `qwen/control/session/approval_mode` ACP extMethod (introduced in a new `SERVE_CONTROL_EXT_METHODS` namespace) so the change lands inside the ACP child's per-session `Config` - `persist: true` writes `tools.approvalMode` to workspace settings via a new `BridgeOptions.persistApprovalMode` callback wired in `runQwenServe`. Default is ephemeral so a remote caller does not pollute the user's host settings unless asked Trust gate translation: - ACP child catches `TrustGateError` from `Config.setApprovalMode` and re-raises as a JSON-RPC error with `data.errorKind: 'trust_gate'` - Bridge detects the structured payload and re-instantiates the typed `TrustGateError` (since the class name does not survive the wire) - `sendBridgeError` translates to HTTP 403 with the closed PR-13 `errorKind: 'auth_env_error'` taxonomy SDK additions: - `DaemonClient.setSessionApprovalMode(sessionId, mode, opts?, clientId?)` mirrors the route shape and forwards `X-Qwen-Client-Id` - New `DaemonApprovalMode` literal union and `DAEMON_APPROVAL_MODES` const tuple; `DaemonApprovalModeResult` for the route response - New `approval_mode_changed` typed event on `DaemonControlEvent`, reducer integration on `DaemonSessionViewState` (`approvalMode` / `approvalModeChangedCount` / `lastApprovalModeChange`) - Drift detector `approvalMode.test.ts` walks core's `ApprovalMode` enum and fails CI if `APPROVAL_MODES` or `DAEMON_APPROVAL_MODES` drift in either direction New capability tag `session_approval_mode_control` (always-on, since v1). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * feat(serve): add workspace tool toggle route (QwenLM#4175 Wave 4 PR 17) Adds POST /workspace/tools/:name/enable — strict-gated mutation route that toggles a tool name in the workspace's `tools.disabled` settings list. Pure file IO + workspace-scoped event fan-out; no ACP roundtrip. - Bridge `setWorkspaceToolEnabled(toolName, enabled, originatorClientId)` invokes the new `BridgeOptions.persistDisabledTools` callback. The default `runQwenServe` wires it to `loadSettings(workspace).setValue( 'tools.disabled', merged)` with a fresh load on each call so concurrent edits from other writers stay safe across the read/modify/write window - New private `broadcastWorkspaceEvent` helper fan-outs to every live session SSE bus, swallowing per-bus errors so a single torn-down session can't block its peers. Naming mirrors PR 21 QwenLM#4255 (the post- PR-16 fold-in will collapse the two helpers) - Unknown tool names are accepted: the daemon has no authoritative tool registry to validate against (built-ins live inside the ACP child, MCP tools are discovered post-spawn). Pre-disabling a not-yet-installed MCP tool is a legitimate use case - Live ACP children retain already-registered tools — the toggle takes effect on the next ACP child spawn (`tools.disabled` is consulted at Config construction time, gated in ToolRegistry.registerTool by PR 17 commit 2) SDK additions: - `DaemonClient.setWorkspaceToolEnabled(toolName, enabled, clientId?)` with URL-encoded tool name - `DaemonToolToggleResult` + `DaemonToolToggledEvent` typed event, reducer integration on `DaemonSessionViewState` (`toolToggleCount` / `lastToolToggle`) - `asKnownDaemonEvent` runtime guard for `tool_toggled` AND `approval_mode_changed` (the latter was missed in commit 3 — without this entry the events were silently filed as `unrecognizedKnownEvent` by `reduceDaemonSessionEvent`, never reaching the typed reducer cases) New capability tag `workspace_tool_toggle` (always-on, since v1). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * feat(serve): add workspace init route (QwenLM#4175 Wave 4 PR 17) Adds POST /workspace/init — strict-gated mutation route that scaffolds an empty `QWEN.md` (or whatever `getCurrentGeminiMdFilename()` returns under `--memory-file-name` overrides) at the daemon's bound workspace root. Mechanical only — does NOT invoke the LLM. Clients that want AI-driven content fill should follow up with POST /session/:id/prompt. Behavior: - Default refuses to overwrite when the target file exists with non- whitespace content; the bridge throws `WorkspaceInitConflictError` which the route translates to HTTP 409 `workspace_init_conflict` with the resolved path + size in the body - `body: {force: true}` overwrites unconditionally; response carries `action: 'overwrote'` vs `'created'` so SDK consumers can render the difference - Whitespace-only existing content is treated as absent (no 409), matching the local `/init` slash command's behavior so a half- broken init left with an empty file doesn't trap the user - Pure file IO + workspace-scoped event fan-out — no ACP roundtrip; works regardless of whether an ACP child is alive - Fan-outs `workspace_initialized` event with `{path, action}` to every live session SSE bus via the `broadcastWorkspaceEvent` helper introduced in commit 4 SDK additions: - `DaemonClient.initWorkspace(opts?, clientId?)` with conditional body emission (omits `force` unless explicitly true so older daemons that reject unknown body fields stay compatible) - `DaemonInitWorkspaceResult` + `DaemonWorkspaceInitializedEvent` typed event with runtime guard (`isWorkspaceInitializedData`), reducer integration on `DaemonSessionViewState` (`workspaceInitCount` / `lastWorkspaceInit`) New typed error class `WorkspaceInitConflictError` exported from `packages/cli/src/serve/index.ts` so direct embeds can match it via `instanceof`. New capability tag `workspace_init` (always-on, since v1). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * feat(serve): add MCP server restart route with budget guard (QwenLM#4175 Wave 4 PR 17) Adds POST /workspace/mcp/:server/restart — strict-gated mutation route that performs a single-server MCP restart through the ACP child's `McpClientManager.discoverMcpToolsForServer`. Pre-checks the live budget snapshot from PR 14 v1 (QwenLM#4247) so a restart on a budget-saturated workspace returns a soft refusal rather than triggering a `BudgetExhaustedError` cascade through the discovery loop. Decision logic (ACP-side, in `qwen/control/workspace/mcp/restart` extMethod): - Server not in `getMcpServers()` → JSON-RPC `resourceNotFound` → HTTP 404 - Server in `excludedMcpServers` → 200 with `{skipped:true, reason:'disabled'}` - `manager.isServerDiscovering(name)` → 200 with `{reason:'in_flight'}` - Mode is `enforce`, server not in `reservedSlots`, total ≥ budget → 200 with `{reason:'budget_would_exceed'}` - Otherwise: `discoverMcpToolsForServer(name, config)`, return `{restarted:true, durationMs}` Soft refusals still return 200 because the route understood the request and reached a deterministic answer about why no restart happened. Only hard "we cannot answer" cases (unknown server, no live ACP child) escalate to non-2xx. This mirrors PR 14 v1's discovery-time refusal contract: refusals don't throw, they get recorded. Bridge: - New `restartMcpServer(serverName, originatorClientId)` forwards through the new `SERVE_CONTROL_EXT_METHODS.workspaceMcpRestart` extMethod against the live `liveChannelInfo()` channel - Throws `SessionNotFoundError` (mapped to HTTP 404) when no ACP child is alive — restart inherently requires a live `McpClientManager` instance - Fan-outs `mcp_server_restarted` (success) or `mcp_server_restart_refused` (skip) to every live session SSE bus Core: - New public `McpClientManager.isServerDiscovering(serverName): boolean` — reads `serverDiscoveryPromises.has(name)` so the daemon can short-circuit a redundant restart with `skipped:in_flight` instead of awaiting the original discovery promise (HTTP latency stays bounded) SDK additions: - `DaemonClient.restartMcpServer(serverName, clientId?)` with URL-encoded server name - `DaemonMcpRestartResult` discriminated union, two new typed events (`DaemonMcpServerRestartedEvent`, `DaemonMcpServerRestartRefusedEvent`) with runtime guards, reducer integration on `DaemonSessionViewState` (`mcpRestartCount` / `lastMcpRestart` / `mcpRestartRefusedCount` / `lastMcpRestartRefused`) New capability tag `workspace_mcp_restart` (always-on, since v1). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(serve): mutation control routes protocol section (QwenLM#4175 Wave 4 PR 17) Adds a "Mutation: approval, tools, init, MCP restart" section to the developer protocol doc covering all four PR 17 routes: - POST /session/:id/approval-mode — `{mode, persist?}` request, four closed-enum modes, trust-gate 403 with `errorKind: 'auth_env_error'`, `approval_mode_changed` SSE event (session-scoped) - POST /workspace/tools/:name/enable — `{enabled}` request, unknown names accepted, "next-spawn semantics" call-out, `tool_toggled` SSE event (workspace-scoped fan-out) - POST /workspace/init — `{force?}` request, scaffold-only contract (no LLM call), 409 with `path` + `existingSize` body when the target exists with non-whitespace content, `workspace_initialized` SSE event (workspace-scoped) - POST /workspace/mcp/:server/restart — empty body, soft-skip decision table (in_flight / disabled / budget_would_exceed), `mcp_server_restarted` and `mcp_server_restart_refused` SSE events Capability list at the top of the file updated with the four new tags (and a missing-from-PR-13 fix for `workspace_env` / `workspace_preflight`). User-facing `qwen-serve.md` gains a one-line "Remote runtime control" bullet under "What it gives you" pointing to the four routes and clarifying that `/workspace/init` is mechanical only. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): fold-in 1 — wenshao + gpt-5.5 review (QwenLM#4175 Wave 4 PR 17) Addresses 5 critical / 4 high / 2 medium items from QwenLM#4282 review. CI blocker (wenshao H1) - Move `approvalMode.test.ts` from `packages/cli/src/acp-integration/` to `packages/sdk-typescript/test/unit/approval-mode-drift.test.ts`. The CLI package has no `@qwen-code/sdk` dep and the tsconfig has no path mapping for it, so `tsc --build` failed `Cannot find module '@qwen-code/sdk'` on Lint + Test (mac/linux/windows). The SDK package is the right host: it already depends on `@qwen-code/qwen-code-core`, and the test pins the SDK ↔ core contract directly. Also drop the tautological `APPROVAL_MODES contains every ApprovalMode enum value` check — `APPROVAL_MODES` is defined as `Object.values(ApprovalMode)` in core, so that assertion can never fire. Critical (gpt-5.5 via wenshao /review) - C1 (`initWorkspace` path traversal): `getCurrentGeminiMdFilename()` is settings-controlled. A daemon configured with `context.fileName: "../outside.md"` could resolve outside `boundWorkspace` and let this strict-gated mutation create or truncate a file outside the workspace boundary. Resolve and verify the joined path stays within `boundWorkspace`; reject otherwise. - C2 (`X-Qwen-Client-Id` forgery): the 3 workspace mutation routes (`/workspace/init`, `/workspace/tools/:name/enable`, `/workspace/mcp/:server/restart`) accepted any syntactically valid client id and stamped it onto fan-out events without checking `bridge.knownClientIds()`. Mirrors the inline validation pattern PR 16 already uses for `/workspace/memory` and `/workspace/agents`. Add `parseAndValidateWorkspaceClientId` shared helper in `server.ts` (collapses with PR 16's pattern when the Wave-4-wide DRY refactor lands). - C3 (MCP restart budget under-count): the pre-check used `accounting.total >= budget`, but enforce-mode capacity is reserved by `tryReserveSlot` via `reservedSlots` (which counts configured + in-flight + disconnected slot holders). `total` only counts CONNECTED, so a restart on a budget-saturated workspace passed the pre-check while the manager refused internally and the route reported `restarted: true`. Mirror the manager's policy by checking `reservedSlots.length`. - C4 (false `restarted: true` on broken MCP): `discoverMcpToolsForServer` catches reconnect/discovery errors internally (logs and resolves void), so the route reported `restarted: true` while the server stayed disconnected. After the call, verify the live `getMCPServerStatus(name)` is `MCPServerStatus.CONNECTED`; throw a structured JSON-RPC error otherwise. New typed bridge error `McpServerRestartFailedError` → HTTP 502 with `errorKind: 'protocol_error'`. - C5 (unknown MCP server falls through as 500): the agent-side `RequestError.resourceNotFound` was not specially handled by `sendBridgeError`, so a typo in the server name returned 500 indistinguishable from an internal daemon failure. Re-raise with structured `data.errorKind: 'mcp_server_not_found'`; bridge re-instantiates as `McpServerNotFoundError`; route maps to a stable 404 with `code: 'mcp_server_not_found'` and `serverName` in the body. High (wenshao) - H2 (`persistDisabledTools` scope leak): the callback read `fresh.merged.tools?.disabled` (UNION across System / SystemDefaults / User / Workspace) and wrote the result back into `SettingScope.Workspace`, copying entries from higher scopes into the workspace file on the first toggle. Subsequent removals at the originating scope (e.g. User) would no longer take effect. Read from the WORKSPACE-scope `LoadedSettings` only via `fresh.forScope(SettingScope.Workspace).settings.tools?.disabled`. - H3 (silent persist no-op): `setSessionApprovalMode` with `persist: true` returned HTTP 200 + `persisted: false` when no `persistApprovalMode` callback was wired, indistinguishable from "hook ran but failed" or genuine `persisted: true`. Throw asymmetrically with the sibling `setWorkspaceToolEnabled` (which already throws in the same situation). - H4 (whitespace-only init clobber): `/workspace/init` overwrote a whitespace-only `QWEN.md` with `action: 'created'` despite `force` not being passed, destroying the user's whitespace content (template, half-written init, intentional newline) without a signal. Treat existing-and-whitespace-only as a no-op; return `action: 'noop'` and skip the write. Adds `'noop'` to the discriminator union on `DaemonInitWorkspaceResult` and the `workspace_initialized` event payload. Medium - M1 (SDK `clientId` position consistency): the four new mutation helpers placed `clientId` inconsistently (4th vs 3rd vs 2nd). Fold `clientId` into the trailing options bag for all four. Matches the existing `context: { clientId }` argument the bridge layer already uses internally; reduces caller boilerplate for callers that always stamp clientId for audit. - M2 (dead `instanceof String` branch): drop the no-op `instanceof String` clause in `setSessionApprovalMode`'s wire-error reconstruction — `Error.message` is always a primitive string. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * chore(vscode): regenerate settings.schema.json for tools.disabled (QwenLM#4175 PR 17 fold-in) Picked up by `Check settings schema is up-to-date` lint step (the only red CI step on `3f63ad435`). PR 17 commit 2 added `tools.disabled` to `packages/cli/src/config/settingsSchema.ts` but didn't run `npm run generate:settings-schema`, so the JSON-schema mirror used by the VSCode IDE companion drifted. Regenerating now picks up the new entry verbatim — no behavior change. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): fold-in 2 — gpt-5.5 + deepseek review (QwenLM#4175 Wave 4 PR 17) Addresses 3 critical / 3 suggestion items from QwenLM#4282 round-2 review. Critical (gpt-5.5) - CV1 (`initWorkspace` symlink escape): the textual `withinWorkspace` check on the joined path doesn't see through symlinks. A `QWEN.md` symlink inside the workspace pointing outside it would still get followed by `fs.readFile` / `writeFile`; under `force: true` the route would truncate the external target, and a dangling symlink could create outside the workspace. Add an `lstat(target)` check before the read/write and reject when `isSymbolicLink()`. The proper long-term fix routes through PR 18's `WorkspaceFileSystem` boundary (chain-aware resolution + audit hooks); tracked under the SV2 TODO comment below. - CV2 (MCP restart timeout vs MCP discovery deadline): bridge raced against `initTimeoutMs` (10s) but `McpClientManager`'s per-server discovery deadline can be up to 5 minutes (`MAX_DISCOVERY_TIMEOUT_MS = 300_000`). A valid restart returned HTTP timeout to the client while the ACP child kept reconnecting in the background, leaving daemon and client state divergent. Add a dedicated `MCP_RESTART_TIMEOUT_MS = 300_000` constant and use it for the bridge race. The bridge race remains a safety net against a wedged ACP channel; per-server discovery deadlines stay owned by the manager. - CV3 (`disabledTools` rename ordering bug): the gate ran on `tool.name` BEFORE the MCP collision-rename branch. An MCP tool that collided with a lazy factory and got renamed via `asFullyQualifiedTool()` (e.g. `structured_output` → `mcp__rogue-server__structured_output`) bypassed the disabled set if the operator disabled the renamed-and-exposed name. Re-check `isToolDisabled` after the rename, before inserting into `this.tools`. New regression test pins the contract. Suggestion - SV1 (deepseek): cap `:name` path parameter at 256 chars so an extremely long tool name can't bloat the workspace settings file. Mirrors `MAX_CLIENT_ID_LENGTH = 128` and `MAX_WORKSPACE_PATH_LENGTH = 4096` siblings. - SV2 (deepseek): `initWorkspace` uses `node:fs/promises` directly instead of routing through `WorkspaceFileSystem`. Bridge layer doesn't have `fsFactory` plumbed today (PR 18 boundary is per-request inside `createServeApp`); a separate plumbing PR will hoist it into `BridgeOptions`. Added a FIXME pointing to that follow-up. CV1's symlink reject covers the immediate boundary-escape concern. - SV3 (gpt-5.5): the daemon stamps `originatorClientId` on the SSE envelope, but reducer snapshots stored only `event.data`. Consumers of `lastApprovalModeChange` / `lastToolToggle` / `lastWorkspaceInit` / `lastMcpRestart{,Refused}` couldn't tell whether the mutation originated from themselves. New `mergeOriginator` helper copies the envelope's `originatorClientId` onto the stored snapshot when `data.originatorClientId` is unset (the daemon does not currently populate `data.originatorClientId`, but the field exists on the Data interfaces — preserve it if a future daemon version does). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): fold-in 3 — gpt-5.5 round-3 review (QwenLM#4175 Wave 4 PR 17) Addresses 2 suggestion items from QwenLM#4282 round-3 review (post-rebase onto PR 21). - C7 (`docs/developers/qwen-serve-protocol.md`): protocol doc showed built-in display labels (`Bash`, `Read`, `Write`) as disable-able, but `ToolRegistry.isToolDisabled` checks the actual registered tool name. The shell tool registers as `run_shell_command`, so a `POST /workspace/tools/Bash/enable {enabled:false}` would persist + emit `tool_toggled` while the next session still registers `run_shell_command`. Updated the doc to use the canonical registry name in the example body and added a⚠️ block explaining that names must match the registry's exposed identifier exactly. The daemon route deliberately does not alias-resolve (it accepts unknown names for forward-looking MCP pre-disable, so any alias map would be incomplete). - C8 (`packages/sdk-typescript/test/unit/daemonEvents.test.ts`): the 5 PR 17 reducer cases (`approval_mode_changed`, `tool_toggled`, `workspace_initialized`, `mcp_server_restarted`, `mcp_server_restart_refused`) had no SDK-side coverage. Added 7 tests covering happy-path counter + last-snapshot accumulation, malformed-payload rejection (rounds through `asKnownDaemonEvent → undefined` and increments `unrecognizedKnownEventCount` rather than the event-specific counter), all 3 refused-reason literals, the `noop` action literal added in fold-in 1, and the `mergeOriginator` precedence rule (data-level wins over envelope-level when both present). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): fold-in 4 — qwen-latest review (QwenLM#4175 Wave 4 PR 17) Round-4 reviewer adoption (qwen-latest-series-invite-beta-v28): - C1: hoist `persistApprovalMode` guard before the ACP roundtrip so a missing callback no longer leaves the daemon's mode shifted while the caller observes a 500 (httpAcpBridge.ts). - C2: serialize `persistApprovalMode` and `persistDisabledTools` through a per-workspace promise chain (`withSettingsLock`) so concurrent toggles can't lose updates in the read-modify-write window (runQwenServe.ts). - C3: trim `toolName` before persisting in `/workspace/tools/:name/enable` so the write path matches `loadCliConfig`'s `.trim()` on read. Re-validates empty-after-trim with 400 `invalid_tool_name`. - S1: cap `serverName` at `MAX_SERVER_NAME_LENGTH=256` on `/workspace/mcp/:server/restart` for parity with the tool-toggle cap. - S2: when `persist:true` succeeds, mirror `approval_mode_changed` via `broadcastWorkspaceEvent` so peer sessions in the same workspace observe the new default before their next ACP child spawn. - S3: `'noop'` added to `FakeBridge.initWorkspaceImpl` return type. - S5: `qwen-serve-protocol.md` action enumeration now includes `'noop'` and notes how the SSE event mirrors the response action. S4 (sync IO inside async persist callbacks) is acknowledged but deferred — `loadSettings` is the project-wide read path and the H2 fold-in already restricted us to workspace-scope-only consumption, keeping the sync window bounded. Fully eliminating it requires swapping `loadSettings` to async across the CLI, which is out of scope. 7 new tests: - server.test.ts × 3: tool-name trim, whitespace-only 400, server-name 256 cap. - httpAcpBridge.test.ts × 4: pre-call guard ordering for persist:true (no callback), persist:false bypasses guard, persist:true broadcasts to peer sessions, persist:false stays session-scoped. Typecheck clean across cli / sdk-typescript / core. 1599/1599 unit tests pass.
…QwenLM#4175 PR 22a) (QwenLM#4295) * refactor(acp-bridge): create skeleton + lift zero-coupling primitives (QwenLM#4175 PR 22a) First slice of QwenLM#4175 Wave 5 PR 22 (`refactor(serve): extract acp bridge primitives`). Lifts the three primitives from `packages/cli/src/serve/` that have zero coupling to the rest of `serve/`, so PR 22a can land ahead of PR 17 (QwenLM#4282) and PR 14b (QwenLM#4271) without merge-conflict risk on `httpAcpBridge.ts`. Also seeds the `PermissionMediator` interface contract that PR 24 will implement (4 strategies — first-responder / designated / consensus / local-only — replacing the inline first-responder voting in `BridgeClient.requestPermission`). What moves - `cli/src/serve/eventBus.ts` (578 LOC, no internal imports) → `packages/acp-bridge/src/eventBus.ts` - `cli/src/serve/inMemoryChannel.ts` (73 LOC, only depends on `@agentclientprotocol/sdk`) → `packages/acp-bridge/src/inMemoryChannel.ts` - `httpAcpBridge.ts:638-677` `AcpChannel` / `AcpChannelExitInfo` / `ChannelFactory` types → `packages/acp-bridge/src/channel.ts` - Both test files moved alongside their sources What's new - `packages/acp-bridge/` package (`@qwen-code/acp-bridge`, internal, not published to npm yet) - `packages/acp-bridge/src/permission.ts` — type-only `PermissionMediator` / `PermissionPolicy` / `PermissionResolution` / `PermissionRequestRecord` / `PermissionVote` / `PermissionVoteOutcome`. No implementation; PR 24 fills it in. - `packages/cli/src/serve/eventBus.ts` and `inMemoryChannel.ts` are now one-line re-export wrappers for backward compat. - `httpAcpBridge.ts:638-677` is now a one-line `import type` + `export type` re-export. Backward compatibility - All existing relative imports (`./eventBus.js`, `./inMemoryChannel.js`, `../serve/eventBus.js`) keep resolving via the wrappers — no churn for the 8 importer sites in `cli/src/serve/` plus `cli/src/commands/serve.ts:14`. - `httpAcpBridge.ts` continues to export `AcpChannel` / `AcpChannelExitInfo` / `ChannelFactory` so any external consumer is unaffected. - Zero `/capabilities` payload changes, zero HTTP route changes, zero SDK behavior changes, zero spawn-site behavior changes. What's not in this slice (deferred to PR 22b after PR 17/14b land) - `BridgeClient` + `createHttpAcpBridge` factory + `defaultSpawnChannelFactory` (~3000 LOC, lines 1082-3770 + 4106-4307 of `httpAcpBridge.ts`). - Channel and VSCode-IDE-companion own-spawn migrations. Tests - `npm test --workspace @qwen-code/acp-bridge` — 28/28 pass (eventBus 20 + inMemoryChannel 8) at the new location. - `cd packages/cli && npx vitest run src/serve/` — 567/567 pass (no regressions). - `cd packages/cli && npx tsc --noEmit` clean. References - QwenLM#4175 Wave 5 PR 22 row - QwenLM#3803 chiga0's "Stage 1.5-prereq AcpChannel lift" - `httpAcpBridge.ts:679-696` FIXME for the four `PermissionMediator` strategies this slice declares * fix(acp-bridge): minimize package-lock.json diff to acp-bridge entries only The previous commit ran `npm install --ignore-scripts`, which npm 11 treated as license to renormalize peer-dependency markers across the entire lockfile and resolve `@types/react@18.3.28`, `@types/react-dom@18.3.7`, and `@types/prop-types@15.7.15` away from their pinned versions. CI's `npm ci` then refused to install because the manifests no longer matched the lockfile. Restored package-lock.json to origin/main and surgically added only the three entries the new package actually requires: - `node_modules/@qwen-code/acp-bridge` workspace symlink - `packages/acp-bridge` workspace manifest snapshot - `@qwen-code/acp-bridge` listed under `packages/cli`'s dependencies `npm ci --no-audit --ignore-scripts` now succeeds (1453 packages, no warnings about stale lockfile entries). Re-verified `acp-bridge` package tests (28/28 pass) and `tsc --build` clean. * fix(acp-bridge): address PR 22a review feedback Three review-driven fixes: 1. **wenshao**: removed `src/**/*.test.ts` from `tsconfig.json` exclude so the moved `eventBus.test.ts` and `inMemoryChannel.test.ts` regain typecheck coverage. Pre-fix, a future change to `BridgeEvent` / `SubscribeOptions` shape would only fail at runtime; post-fix `tsc --noEmit` catches the mismatch. Matches `packages/core/tsconfig.json`'s pattern (no test exclude; emitted test artefacts in dist/ are accepted convention). 2. **Copilot**: corrected the FIXME line citation in `permission.ts` and `README.md` from `1144-1154` to the actual range `1096-1106` (the four-strategy FIXME inside `BridgeClient.requestPermission`). Verified via grep against current `httpAcpBridge.ts`. 3. **Copilot review summary**: added a "Imports — root vs subpaths" section to README.md explaining when to use the barrel root (`@qwen-code/acp-bridge`) vs per-module subpaths (`/eventBus`, `/inMemoryChannel`, `/channel`, `/permission`), and added `@see` JSDoc pointers in `cli/src/serve/eventBus.ts` and `inMemoryChannel.ts` wrappers to the implementation files for the design-rationale comments that moved to acp-bridge. Verification: 28/28 acp-bridge tests + 567/567 cli serve tests pass; `tsc --noEmit` clean across both packages including the moved test files. Declined (replied on the PR): - Move `@agentclientprotocol/sdk` to `peerDependencies` — sound advice in general but not yet relevant; package is internal (`files: ["dist"]`, no npm publish), so dedupe is automatic through monorepo file: links. Will revisit during PR 28 (npm alpha publish).
QwenLM#4299) (QwenLM#4300) Replace regex-on-`.message` classification in `mapDomainErrorToErrorKind` with two new typed error classes living next to `BridgeTimeoutError` in `status.ts`: - `BridgeChannelClosedError(context)` — replaces three `throw new Error('agent channel closed …')` sites in `httpAcpBridge.ts` (`getTransportClosedReject`, `getChannelClosedReject`, restore-time `transportClosed`). The `context` field preserves the legacy message wording verbatim so log greps and existing diagnostics keep working. - `MissingCliEntryError()` — replaces the generic `Error` thrown by `defaultSpawnChannelFactory` when neither `QWEN_CLI_ENTRY` nor `process.argv[1]` resolves. Message bytes preserved. `mapDomainErrorToErrorKind` now branches on `instanceof` for these two kinds, and the regex fallbacks (plus the TODO that called them out) are removed entirely — there are no remaining throw sites we don't control. The test suite gains a regression case: wrapping or foreign errors whose `.message` happens to contain "agent channel closed" or "Cannot determine CLI entry path" no longer misclassify (returns `undefined`, the correct behavior per the issue). Bridge-internal classes don't cross the JSON-RPC boundary into the ACP child, so `instanceof` symmetry is preserved at every consumer of `mapDomainErrorToErrorKind` (3 sites in `httpAcpBridge.ts`, 6 in `acpAgent.ts`). Closes QwenLM#4299 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
QwenLM#4306) Two regressions introduced by QwenLM#4271 (MCP guardrail push events) had been failing every main E2E run since the PR landed. Both fixes are in integration tests; no source changes. 1. `qwen serve — capabilities envelope > advertises all baseline capabilities`. `mcp_guardrail_events` was added to `SERVE_CAPABILITY_REGISTRY` and to the unit baseline list (`packages/cli/src/serve/server.test.ts:119`) but not to the integration test's hand-maintained list. Same drift class as QwenLM#4268 / QwenLM#4284. Fix: append the tag in registry order. 2. `MCP child amplification (P1 baseline) > clientCount matches external pgrep observation`. The test (added by QwenLM#4271, never passed CI) asserted `pgrep_observed === MCP_SERVERS_CONFIGURED`, ignoring that an ACP child runs TWO `Config` objects — bootstrap (`runAcpAgent` → `config.initialize`) + per-session (`newSessionConfig` → `config.initialize`) — each with its own `McpClientManager`. After one session, pgrep observes 2×N grandchildren while `/workspace/mcp` snapshot (`buildWorkspaceMcpStatus(this.config)`) reads only the bootstrap manager (=N). Fix: encode the 2× architectural amplification literally so a future follow-up that unifies the managers fails this assertion and forces an explicit update; keep `clientCount === MCP_SERVERS_CONFIGURED` and the original `clientCount ≤ pgrep` over-report guard intact. Verified locally: both tests pass on first attempt (no retries) via `vitest run --root ./integration-tests cli/qwen-serve-routes.test.ts cli/qwen-serve-baseline.test.ts`. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…og/span consistency (QwenLM#4302) * fix(telemetry): Phase 1.5 polish — fallback order, abort-as-result, log/span consistency Follow-up to QwenLM#4126 (Phase 1: unify span creation paths). Four small fixes surfaced in the late review rounds; tracked in QwenLM#4212. 1. session-tracing fallback order — add resolveParentContext() helper that mirrors tracer.ts:getParentContext(): when interactionContext / toolContext ALS is empty, prefer the active OTel span over the synthetic session-root fallback. Without this, an LLM or tool call nested inside another OTel span (subagent inside a tool, instrumented HTTP path, etc.) re-parented to session root and flattened the trace tree. Three sites unified. 2. tool.execution span misreporting cancellation as success — when a tool observes signal.aborted and resolves with a normal ToolResult (no .error), the execution sub-span used to close as success while the parent tool span ended cancelled. Snapshot signal.aborted once after await, mirror it into endToolExecutionSpan, and stamp a sanitized error reason (Tool execution cancelled by user / Tool execution failed) so trace backends can distinguish the two without cross-referencing the parent. 3. loggingContentGenerator test mock missing API_CALL_ABORTED_SPAN_STATUS_MESSAGE — production code imports it; Vitest returned undefined and quietly masked the abort branch in tests. 4. Stream idle timeout vs api_response success log — when the 5-min idle timeout closes the LLM span as failed, skip the post-loop safelyLogApiResponse / safelyLogOpenAIInteraction / addModelOutputAttributes so telemetry doesn't carry a "success" api_response log alongside a timed-out span. Closes QwenLM#4212. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): address QwenLM#4302 review — catch-path gating, exec span cancelled status, sync tags Four follow-ups from the QwenLM#4302 review on top of the Phase 1.5 polish: 1. loggingContentGenerator.ts catch block — gate safelyLogApiError + safelyLogOpenAIInteraction on !spanEndedByTimeout. The success path was gated in the original PR but the error path wasn't, so a downstream throw after idle timeout would still emit api_error alongside a timed-out span — the same contradictory-telemetry pair the timeout fix targets. 2. endToolExecutionSpan now accepts a `cancelled` discriminator. Without it, success: false unconditionally sets SpanStatusCode.ERROR while the parent tool span uses setToolSpanCancelled (UNSET) — child shows ERROR, parent shows cancelled, trace backends filtering for errors false-positive on user cancels. Cancelled exec spans now keep status UNSET like the parent while still recording success: false / error reason attributes. 3. coreToolScheduler.ts catch block now snapshots signal.aborted into a local `aborted` constant and passes it through endToolExecutionSpan + the if-branch — same idiom as the success path (QwenLM#4212), eliminating the divergent style. 4. SYNC tags on tracer.ts:getParentContext and session-tracing.ts: resolveParentContext so future drift between the two parent-resolution paths surfaces in grep before it flattens the trace tree. Tests: - session-tracing.test.ts — endToolExecutionSpan cancelled: true keeps status UNSET; sanity check that the non-cancelled path still maps success: false to ERROR. - coreToolScheduler.test.ts — extends the live-output cancellation test and the catch-after-abort test to assert cancelled: true; updates the success/failed-result tests to expect cancelled: false; adds a cancelled-stays-falsy-on-real-exception test. - loggingContentGenerator.test.ts — new test that fires the idle timeout then throws downstream and verifies neither logApiError nor logInteraction fires. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…enLM#4273) * feat(cli): emit active goal stream events * feat(cli): support goal in non-interactive mode * fix(core): dedupe active goal stream updates * test(core): stabilize prompt hook duration test * fix(core): handle active goal stop hook abort * fix(core): keep active goal sync on aborted stop hook * fix(core): emit active goal before stop abort
) (QwenLM#4265) The guard at the top of `submitQuery` cleared `pendingRetryErrorItem` via `clearRetryCountdown()` without first persisting it to history, so any new user turn — including informational slash commands like `/status`, `/about`, or `/help` — would silently discard a visible API error. The user could not refer back to the failure they were investigating. The inline comment already described the intended behavior ("Commit any pending retry error to history (without hint) since the user is starting a new conversation turn") but the commit step was missing. Add the `addItem` call before clearing, stripping the now-stale "Press Ctrl+Y to retry" hint so the persisted history entry reads cleanly. Fixes QwenLM#4169
…ider seam (QwenLM#4175 PR 22b/2) (QwenLM#4304) * refactor(acp-bridge): lift BridgeOptions + introduce DaemonStatusProvider seam (QwenLM#4175 PR 22b/2) Design slice of QwenLM#4175 Wave 5 PR 22b. Lifts the bridge's construction contract (`BridgeOptions`) to `@qwen-code/acp-bridge` and introduces a new `DaemonStatusProvider` interface so the bridge factory no longer hard-imports daemon-host-specific helpers. The mechanical bulk lift of `BridgeClient` + spawn factory + `createHttpAcpBridge` factory closure (~3000 LOC) lands in PR 22b/3 against this stable contract. What's lifted - `BridgeOptions` interface (~150 LOC) from `httpAcpBridge.ts:189-325` → `acp-bridge/src/bridgeOptions.ts`. Adds new optional `statusProvider?: DaemonStatusProvider` field; everything else preserved verbatim. - `buildDaemonPreflightCells` + `safeCheck` (~210 LOC) deleted from `httpAcpBridge.ts:4002-4214` → moved to new `cli/src/serve/daemonStatusProvider.ts`. Logic byte-for-byte identical (slot-by-slot allocation, `Promise.allSettled` fan-out, classified error cells); only the call-site changes from `await buildDaemonPreflightCells(boundWorkspace)` to `await opts.statusProvider?.getDaemonPreflightCells(boundWorkspace) ?? []`. What's new - `DaemonStatusProvider` interface in `acp-bridge/src/bridgeOptions.ts`: two methods (`getEnvStatus`, `getDaemonPreflightCells`) covering the full set of daemon-host cells the bridge currently delegates. Direct positional args (boundWorkspace + acpChannelLive); both methods return `Promise<>` so future async impls (config-file reads, network probes) get the seam without breaking the bridge. No abort signal — YAGNI; future async needs can pull in a typed extension. Single combined interface (vs split into two) because env + preflight are conceptually one daemon-host status surface and the production impl ships them as one factory. - `createDaemonStatusProvider()` in `cli/src/serve/daemonStatusProvider.ts`: thin wrapper that calls `buildEnvStatusFromProcess` (still in `envSnapshot.ts`) and the moved `buildDaemonPreflightCells` body. Stateless; production callers can re-allocate per-request without performance cost. Bridge factory body changes - `getWorkspaceEnvStatus`: routes through `opts.statusProvider?.getEnvStatus(...)`. When `statusProvider` is omitted (Mode A in-process consumers, tests that don't care about env cells) the bridge returns an idle envelope `{ v, workspaceCwd, initialized: true, acpChannelLive, cells: [] }` matching the "queryable but empty" pattern PR 12 / 13 established for diagnostic routes. - `getWorkspacePreflightStatus`: routes the daemon-half through `opts.statusProvider?.getDaemonPreflightCells(...) ?? []`. ACP-side cells (auth / mcp_discovery / skills / providers / tool_registry / egress) keep being fetched separately via the existing `requestWorkspaceStatus` extMethod path; the bridge stitches the two halves together as before. Wiring - `runQwenServe.ts` passes `statusProvider: createDaemonStatusProvider()` into the bridge factory by default. - `httpAcpBridge.test.ts`: `makeBridge` helper now defaults to providing the production status provider so existing 4 env / preflight tests (which exercise the bridge's delegation path) keep seeing populated cells. Tests that want to exercise the no-provider idle fallback can override with `{ statusProvider: undefined }`. - `acp-bridge/package.json`: adds `./bridgeOptions` subpath export. - `acp-bridge/src/index.ts`: barrel re-exports `bridgeOptions.js`. Backward compatibility - `httpAcpBridge.ts` still re-exports `BridgeOptions` and `DaemonStatusProvider` types for callers via `from './httpAcpBridge.js'` (existing pattern from PR 22b/1). server.ts:31, runQwenServe.ts:16, workspaceAgents.ts, workspaceMemory.ts all keep resolving without churn. - Zero `/capabilities`, route, SDK, or behavior changes when the default `runQwenServe` is in play (production wiring is identical to pre-PR; the default `createDaemonStatusProvider()` reproduces the pre-injection cell shapes byte-for-byte). - Direct embeds / tests that omit `statusProvider` see new idle fallback semantics; this is the documented Mode A integration pattern and matches PR 12 / 13's "idle status is queryable" contract. Verification - `cd packages/core && npx tsc --build` clean - `cd packages/acp-bridge && npx tsc --noEmit` clean - `cd packages/cli && npx tsc --noEmit` clean (no errors in `src/serve/`; pre-existing tsconfig-excluded `mcp.test.ts` / `check-i18n.ts` errors unrelated to this PR) - `cd packages/cli && npx vitest run src/serve/` — 701/701 pass (16 test files; 4 env / preflight tests reuse the production provider via updated `makeBridge` helper) - `npm ci --dry-run` clean (no lockfile drift; new file imports resolve via existing `@qwen-code/qwen-code-core` and `@qwen-code/acp-bridge` workspace deps already present from PR 22b/1) Remaining PR 22b checkpoints - 22b/3 (mechanical bulk lift): `BridgeClient` (~400 LOC) + `defaultSpawnChannelFactory` (~250 LOC) + `createHttpAcpBridge` factory closure (~2240 LOC) + 5064-LOC test file move + final `httpAcpBridge.ts` shim. Zero new design decisions — factory consumes `DaemonStatusProvider` per the contract this PR froze. - 22b' (small follow-up): `WorkspaceFileSystem` injection into `BridgeClient.writeTextFile/readTextFile`. Closes PR 18 ws.ts:613 TOCTOU thread. - QwenLM#4299: `mapDomainErrorToErrorKind` regex → typed errors (parallel follow-up tracked from PR 22b/1 review). * docs(acp-bridge): catch README up to PR 22b/1 + 22b/2 module additions Self-review found the package README still listed only PR 22a's four modules (`eventBus`, `inMemoryChannel`, `channel`, `permission`) and hadn't been updated for the eight new ones added across the two follow-up slices. The "What's here today" section now covers all nine modules; consumers reading the README to figure out what to import know the full surface. Targeted minimal update — major restructure (drop the obsolete "What's not here yet" section + the "PR 22a (this)" status table) deferred to PR 22b/3 when the package surface stabilizes. Modules added to the list: - `status` (PR 22b/1) — wire contract status types + 27-symbol acp-integration import surface; mapDomainErrorToErrorKind classifier (regex → typed after QwenLM#4299/QwenLM#4300). - `workspacePaths` (PR 22b/1) — canonicalizeWorkspace + MAX_WORKSPACE_PATH_LENGTH. - `bridgeErrors` (PR 22b/1) — 11 typed bridge Error subclasses. - `bridgeTypes` (PR 22b/1) — BridgeSession*, HttpAcpBridge interface. - `bridgeOptions` (PR 22b/2) — BridgeOptions + DaemonStatusProvider seam. * fix(acp-bridge): address PR 22b/2 review feedback Four review-driven fixes from wenshao + gpt-5.5 + Copilot: 1. **wenshao [Critical] @ httpAcpBridge.ts** — `getEnvStatus` and `getDaemonPreflightCells` calls now wrapped in try/catch with stderr-logged fallback to the idle envelope / empty cells. The pre-injection `buildDaemonPreflightCells` used `Promise.allSettled` internally and was effectively unthrowable; the route's "daemon cells always render" invariant would have silently broken for any custom `DaemonStatusProvider` that throws. Fallback preserves the route-level guarantee structurally. 2. **wenshao/gpt-5.5 [Suggestion] @ httpAcpBridge.ts** — wired `createDaemonStatusProvider()` into the default bridge constructed by `createServeApp()` (in `server.ts`). Pre-fix, direct embeds / tests that didn't inject `deps.bridge` would silently lose the daemon env + preflight cells the default server app reported pre-injection. Now symmetric with `runQwenServe.ts`. 3. **wenshao [Suggestion] @ httpAcpBridge.ts** — extracted `createIdleEnvStatus(workspaceCwd, acpChannelLive)` helper into `acp-bridge/src/status.ts` (matches existing `createIdleAcpPreflightCells` pattern). Single construction site for `ServeWorkspaceEnvStatus`'s idle shape — future optional fields added to the production builder in `envSnapshot.ts buildEnvStatusFromProcess` won't silently diverge from the bridge's fallback. 4. **wenshao [Suggestion] @ httpAcpBridge.test.ts** — added 4 new tests covering the previously-untested provider-omitted / provider-throws paths: - `returns idle env envelope when statusProvider is omitted` - `returns empty daemon preflight cells when statusProvider is omitted` - `falls back to idle env envelope when statusProvider.getEnvStatus throws` - `falls back to empty daemon cells when statusProvider.getDaemonPreflightCells throws` Verification - `cd packages/core && npx tsc --build` clean - `cd packages/acp-bridge && npx tsc --build` clean - `cd packages/cli && npx tsc --noEmit` clean (no errors in `src/serve/`) - `cd packages/cli && npx vitest run src/serve/httpAcpBridge.test.ts` — 165/165 pass (4 new tests + 161 existing) - `cd packages/cli && npx vitest run src/serve/` — 705/705 pass on deterministic isolated runs; full-suite has known flakes unrelated to this PR (server.test.ts state pollution; pre-existing) Declined - **Copilot @ httpAcpBridge.ts:200** — `import type` after `export` statements claim against `import/first` ESLint rule. Verified `npx eslint src/serve/httpAcpBridge.ts` exits 0 — the rule isn't enforced on this file. False positive. * fix(acp-bridge): SkillError detection survives cross-package bundling (QwenLM#4298 review) Wenshao [Suggestion] on the merged PR QwenLM#4298 (QwenLM#4298 thread r3262781757): the same cross-package class-duplication problem the function already documents and works around for `TrustGateError` via `.name` matching applies to `SkillError` — and the existing code uses raw `instanceof SkillError`, which silently returns `false` when bundling produces duplicate class instances (a real risk once `acp-bridge` ever gets consumed outside the monorepo or under a bundler that doesn't dedupe `file:` workspace deps). Fix mirrors the `TrustGateError` pattern: `instanceof SkillError || .name === 'SkillError'`. When the synthesized branch fires we cast to read `.code` (TS can't narrow through the OR), then dispatch to the same `SKILL_PARSE_CODES` / `SKILL_FILE_CODES` lookup the genuine branch already used. Folding into PR 22b/2 rather than a standalone tiny PR because: - `status.ts` is already touched by 22b/2 (added `createIdleEnvStatus`) - the change is 4 LOC + 1 regression test, well below scope - one fewer review cycle Regression test in `status.test.ts` constructs synthesized SkillError instances (`Object.assign(new Error(...), { name: 'SkillError', code: ... })`) that fail `instanceof` but carry the right `.name` + `.code`, then asserts classification still resolves to `parse_error` / `missing_file` / `undefined`. Mirrors the existing `TrustGateError` regression test that pinned this exact pattern. Verification - `npx tsc --build` clean (core + acp-bridge) - `cd packages/cli && npx tsc --noEmit` clean - `cd packages/acp-bridge && npx vitest run` — all tests pass including new SkillError cross-bundle regression - `cd packages/cli && npx vitest run src/serve/httpAcpBridge.test.ts` — 161 pass * docs(acp-bridge): clarify env vs preflight fallback asymmetry (self-review) Self-review fold-in. Inline comment at the no-provider branch of `getWorkspacePreflightStatus` explains why the fallback shape differs from `getWorkspaceEnvStatus`: env is a single envelope (idle helper); preflight is the union of daemon-locality + ACP-locality cells stitched below, so an empty daemon slice IS the right fallback (the ACP slice fills in independently). Comment-only; zero behavior change. Caught during the comprehensive self-review pass between commits — the asymmetry is correct but a future maintainer might find it confusing without context.
…wenLM#4172) * docs: add async memory recall design spec and implementation plan * refactor(core): introduce MemoryPrefetchHandle, replace pendingRecallAbortController field * refactor(core): fire memory recall as non-blocking prefetch with settledAt flag * refactor(core): replace blocking await with zero-wait settledAt poll at UserQuery consume point Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(core): inject recalled memory on first ToolResult when UserQuery consume point misses Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(core): replace pendingRecallAbortController with pendingMemoryPrefetch in all cleanup paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(memory): remove 1s AbortSignal.timeout from relevanceSelector — caller controls lifetime Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(core): update auto-memory tests for async prefetch pattern — drop fake timers and deadline references Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(core): add ToolResult inject test — memory injected on first ToolResult when recall settles after UserQuery Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(core): address codex review findings on async memory recall Three findings fixed: 1. Abort previous prefetch before installing a new one (line 1059): A new UserQuery/Cron used to overwrite pendingMemoryPrefetch without aborting the old controller, leaking an unbounded background recall now that the 1s side-query timeout is gone. 2. Move the UserQuery consume poll AFTER the async reminder setup: ensureTool + listSubagents are awaited between the old poll location and the final assembly, so recalls that settled during those awaits used to be missed (and a tool-less turn never got a ToolResult retry). The poll now runs immediately before requestToSend assembly, and unshifts memory to the front of systemReminders to preserve ordering. 3. Append memory after functionResponse on ToolResult turns: The Qwen API requires the functionResponse part to immediately follow the model's functionCall (see lines 1209-1213). Prepending memory text risked breaking that pairing on the native Gemini path. Appending keeps the pair intact on Gemini and produces the same OpenAI output (text becomes a separate user message after the tool messages). Tests: - Updated ToolResult inject test to assert memory index > functionResponse - Added abort-previous-prefetch test (mid-flight UserQuery aborts old handle) 224/224 tests pass; tsc clean on changed files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(core): add JSDoc + clarifying comments per review feedback Annotations only, no behavior change: - MemoryPrefetchHandle: full JSDoc covering lifecycle (create → consume → discard) - UserQuery consume site: explain why we unshift (front of systemReminders) - ToolResult inject site: reference hasPendingToolCall pattern instead of brittle line numbers when citing the Qwen functionCall/Response constraint - relevanceSelector.ts: explain why the side-query has no inline timeout (caller controls lifetime via MemoryPrefetchHandle.controller) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(core): bridge caller abort signal into memory prefetch + doc accuracy fixes Behavior fix (addresses copilot review on client.ts:1071): - When the parent sendMessageStream signal aborts (user Ctrl-C / Esc), the prefetch controller now aborts too. Previously the recall side-query would keep running until a later cleanup (next UserQuery / /clear / etc), wasting fast-model tokens on work whose result no one would consume. - Listener uses { once: true } and is also removed in the promise's finally() so a long-lived parent signal doesn't accumulate listeners across many turns under normal completion. - Edge case: if signal is already aborted when fire runs, abort the controller synchronously instead of attaching a listener. Test: - New regression guard: "should abort the pending prefetch when the caller signal aborts" — verifies the abort handler installed on the recall side fires once the parent signal aborts. Doc accuracy (addresses copilot review on the design spec): - ToolResult inject: was documented as "prepend", actual implementation appends to preserve functionCall/functionResponse pairing. Updated both the prose summary and the code sample. - Cleanup section: was documented as 6 abort-locations including the "post-consume clear"; the consume sites don't actually abort (the promise has already settled). Reorganized as 5 abort-and-clear sites + 2 clear-only sites with the distinction made explicit. - Fire path snippet: added the abort-previous-prefetch line and the caller-signal bridge so the spec matches the current implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(core): consolidate memory-prefetch lifecycle + safety nets per round-3 review Architectural (root-cause fix for cleanup-path sibling drift): - New private cancelPendingMemoryPrefetch() consolidates the abort+clear idiom (was duplicated across 6 sites). Logs at debug when discarding a settled-but-unconsumed handle so missing-memory scenarios are diagnosable. - New private tryConsumeMemoryPrefetch() consolidates the consume-and-mark-consumed dance (was duplicated UserQuery + ToolResult). - All existing cleanup sites + the two newly-flagged early-return sites (LoopDetected, Error) now use the helper; future early-returns can rely on the finally-block safety net. - sendMessageStream try-finally now uses a `normalCompletion` flag: only the bottom-of-try return path preserves the prefetch (intentional — next ToolResult turn may consume it); every other exit (uncaught exception, abnormal early-return) goes through cancelPendingMemoryPrefetch in finally. Diagnostics: - Restored AbortError debug log in fire-path catch (was silent after removing the deadline mechanism; aborts now come from 4+ sources so a trace is valuable). - Updated stale "deadline" log in recall.ts to reflect current abort sources (caller signal / new UserQuery / cleanup / 30 s safety timeout). Safety net: - Added 30 s ceiling in relevanceSelector via AbortSignal.any(...). Generous enough that normal ~1 s recalls don't trip it; bounds zombie side-queries if the model API hangs and the caller never aborts. Replaces the uncancellable `new AbortController().signal` fallback that would have left callerless invocations running indefinitely. Doc sync: - Design doc updated: UserQuery consume code sample now shows `unshift` (matches implementation) with an inline note on the prepend-vs-append contrast. Tests: - New regression guard: resetChat aborts pending prefetch and clears the handle. - New regression guard: LoopDetected mid-stream aborts pending prefetch and clears the handle (catches the sibling-drift bug this round caught). 227/227 tests pass; tsc clean on changed files. Declined from this round: - `await Promise.resolve()` after fire path: defensive — current code has multiple natural microtask drains before consume point. Added comment documenting the dependency instead. - Renaming `settledAt: number | null` to `settled: boolean`: timestamp has diagnostic value for future instrumentation; current consumers' null-check usage is documented in the JSDoc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): correct getLastLoopType mock return type — null, not undefined CI tsc --build (stricter than --noEmit) caught: src/core/client.test.ts(2996,65): error TS2345: Argument of type 'undefined' is not assignable to parameter of type 'LoopType | null'. getLastLoopType()'s contract returns LoopType | null; the test mock was returning undefined. Switched to null to match the type. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(core): preserve memory prefetch across hook/next-speaker continuations + accurate recall abort log Round-4 review findings (self-inflicted regression from round-3): 1. Preserve pending prefetch on `return hookTurn` (Stop-hook continuation) and `return continueTurn` (next-speaker continuation). The round-3 `normalCompletion = true` was only set at the bottom-of-try `return turn`, leaving these two recursive-yield paths to trip the finally cleanup. When the inner Hook turn produced tool calls, the subsequent ToolResult turn found `pendingMemoryPrefetch === undefined` and memory was silently dropped. 2. recall.ts catch log distinguishes caller-driven aborts (heuristic genuinely skipped below) from the 30s safety-net timeout in relevanceSelector (the caller's signal is NOT aborted by that path, so the heuristic fallback actually runs). Regression guard added: - "should PRESERVE the pending prefetch when next-speaker continueTurn returns" — was red before this commit, green after. 258/258 tests pass; tsc --build clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…rktreeExitDialog, three-mode --resume restore (QwenLM#4174) * docs(worktree): update design doc — split Phase C/D, add Future section - Phase C: session persistence + hooksPath + StatusLine + WorktreeExitDialog - Phase D: --worktree CLI flag + symlinkDirectories - Future: sparse checkout, .worktreeinclude, tmux, PR reference parsing - Feature comparison table updated with Phase A/B completion status Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(worktree): add Phase C implementation plan 8 tasks: WorktreeSession sidecar storage, hooksPath setup, EnterWorktree/ExitWorktree session wiring, useWorktreeSession hook, Footer display, --resume context injection, WorktreeExitDialog. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(worktree): update Phase C plan after claude-code comparison - WorktreeSession: add originalHeadCommit field - hooksPath: add .husky/ detection + skip-if-already-set logic - StatusLine payload: expand worktree field to match claude-code schema - WorktreeExitDialog: load dirty state on mount, display counts in dialog - UIState.activeWorktree: add originalCwd, originalBranch, originalHeadCommit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(worktree): add WorktreeSession sidecar storage New worktreeSessionService.ts exposes read/write/clear functions for the sidecar JSON file at <chatsDir>/<sessionId>.worktree.json. SessionService gains getWorktreeSessionPath() so callers don't need to know the layout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): configure core.hooksPath after worktree creation createUserWorktree() now sets `core.hooksPath` inside the new worktree to the main repo's hooks directory (.husky preferred, .git/hooks fallback) so commits inside the worktree run the same pre-commit checks as the main repo. Mirrors claude-code's performPostCreationSetup logic — skips the subprocess when the value already matches to avoid ~14ms spawn overhead. Failures are non-fatal: the worktree is still usable without hooks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): persist WorktreeSession sidecar in EnterWorktreeTool After creating a worktree, EnterWorktreeTool now writes a sidecar JSON file at <chatsDir>/<sessionId>.worktree.json with the full session state (slug, paths, branches, original HEAD SHA). --resume reads this in Phase C task 7 to restore worktree context. Best-effort: write failures don't abort the creation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): clear WorktreeSession sidecar in ExitWorktreeTool After successful keep or remove, ExitWorktreeTool now clears the sidecar JSON file iff its slug matches the worktree being exited. The slug check prevents wiping the sidecar when the user exits a worktree that isn't currently tracked (multiple worktrees on disk, sidecar tracks one). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): expose active worktree via useWorktreeSession + UIState New useWorktreeSession hook watches the sidecar JSON file (created by EnterWorktreeTool, deleted by ExitWorktreeTool) and returns the current WorktreeSession or null. AppContainer wires it into a new UIState.activeWorktree field consumed by Footer (Task 6) and WorktreeExitDialog (Task 8). A showWorktreeExitDialog state placeholder is added too, hardcoded false until Task 8 wires the dialog trigger. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): show active worktree in Footer + StatusLine payload Footer renders `⎇ <branch> (<slug>)` when activeWorktree != null, but only when the user has no custom statusline (their script likely handles it from the stdin payload itself). useStatusLine's StatusLineCommandInput gains a `worktree` field with {name, path, branch, original_cwd, original_branch} — matches claude-code's schema so statusline scripts can be shared across both CLIs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): inject context hint on --resume when worktree is active On --resume, if the session has a WorktreeSession sidecar, append an INFO history item pointing the model at the worktree path so it continues using it for file operations. Stale sidecars (worktree dir deleted out-of-band) are cleaned up so the Footer indicator doesn't go stale. qwen-code can't process.chdir() the way claude-code does because Config.targetDir is immutable; the context hint is the equivalent behavioral cue. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): add WorktreeExitDialog with dirty-state inspection WorktreeExitDialog renders when the user double-presses Ctrl+C inside a worktree. On mount it runs `git status --porcelain` and `git rev-list --count <originalHeadCommit>..HEAD` to show how many uncommitted files and new commits the user would discard by choosing "Remove". The dialog never auto-removes — every exit goes through explicit user confirmation per requirements. handleExit in AppContainer intercepts the second-press quit when activeWorktree is set and shows the dialog instead. A new UIAction handleWorktreeExit(choice) routes the user's choice through removal (via GitWorktreeService.removeUserWorktree) + sidecar cleanup + /quit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(worktree): add Phase C E2E test plan Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(worktree): fix E2E test plan sidecar path + jq selector - sidecar lives at ~/.qwen/projects/<sanitized-cwd>/chats/, not ~/.qwen/tmp/<hash>/ - qwen --output-format json emits a JSON array, not NDJSON — jq needs .[] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): add showWorktreeExitDialog to dialogsVisible Phase C task 8 introduced showWorktreeExitDialog state and the dialog render in DialogManager, but missed adding the flag to the dialogsVisible OR expression. DefaultAppLayout only renders DialogManager when dialogsVisible is true, so the dialog was never shown — second Ctrl+C in a worktree silently absorbed instead of triggering the prompt. Caught by Group E E2E tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(worktree): extend --resume context restore to headless + ACP modes Phase C task 7 originally placed the worktree-restore logic in AppContainer.tsx (TUI only). E2E Group C exposed that headless and ACP modes never run AppContainer, so stale sidecars accumulate and the model loses worktree context after --resume. Refactor to a shared `restoreWorktreeContext` helper in core, then wire the three entry points: - TUI (AppContainer): keep historyManager.addItem(INFO) UX, route via the helper. - Headless (nonInteractiveCli): prepend the notice as a system-reminder block on the user prompt; emit a `worktree_restored` system message to the JSON adapter so SDK consumers can react. - ACP (Session.pendingWorktreeNotice): set by acpAgent.loadSession on resume, consumed and cleared exactly once on the next #executePrompt. All three modes call the same helper, so stale-sidecar cleanup is consistent. Helper covers: missing sidecar, live worktree dir, deleted worktree dir, regular file at worktreePath, malformed JSON. 5 new unit tests for restoreWorktreeContext (13/13 pass total). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(worktree): add ACP-mode integration tests for --resume context Covers: - acpAgent.worktree.test.ts (3 tests): loadSession sets pendingWorktreeNotice only when worktree dir is live, clears stale sidecar otherwise, swallows restoreWorktreeContext errors. - Session.worktree.test.ts (4 tests): #executePrompt prepends the system-reminder block exactly once on first prompt, clears the pending notice, second prompt sees no leakage, no-op when nothing was set. E2E via real ACP protocol is impractical without a Zed client; these tests cover the integration boundaries directly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(worktree): clarify hooksPath comment + pendingWorktreeNotice one-shot rationale Two doc-only fixes from PR QwenLM#4174 review: - gitWorktreeService.ts: previous hooksPath comment overstated the optimization (claimed claude-code's ~14ms saving but we still do a read subprocess). Rewrite to be explicit: write-skip only, read retained, parseGitConfigValue's full optimization deliberately not ported because the read happens once per worktree creation. - Session.ts: pendingWorktreeNotice doc now explains why it's one-shot (after the first prompt the worktree path is already in conversation context; re-injecting would clutter history without adding signal). No behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): add getResumedSessionData to nonInteractiveCli mock Config CI surfaced TypeError: config.getResumedSessionData is not a function across 12 tests in nonInteractiveCli.test.ts. The Phase C ada0837 commit added a worktree-restore call in the headless path that probes config.getResumedSessionData(); the mock Config never had that method. Return undefined to short-circuit the restore block — these tests don't exercise --resume. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address PR QwenLM#4174 reviewer findings Bundled response to the two review rounds. Per-thread replies follow. CORE — worktree sidecar robustness (Findings 3252368644, 3252368651, 3255171690): - atomicWriteJSON instead of fs.writeFile (no more half-written sidecar after a crash) - readWorktreeSession now schema-validates the parsed object and returns null on missing/wrong-type fields instead of propagating undefined into consumers - restoreWorktreeContext clears the sidecar on JSON parse failure / read I/O error so a corrupted file doesn't block every subsequent --resume CORE — hooksPath setup (Finding 3252368645): - configureHooksPath distinguishes ENOENT (benign "candidate not present") from real stat errors (EACCES/EIO/ENOTDIR); the latter are warn-logged so a silently-degraded hooksPath is visible to operators CLI — handleWorktreeExit Remove path (Findings 3252368637, 3252368640 a+b): - Anchor GitWorktreeService at activeWorktree.originalCwd (the captured repo root), not config.getTargetDir() — fixes monorepo-subdirectory launches where the worktree lives under the repo root but getTargetDir points at a subpackage - Check removeUserWorktree return value; on failure, leave the sidecar intact so --resume can recover (previous code cleared it regardless) - Pass forceDeleteBranch:true to honour the dialog's "discards N commits" label — without it `git branch -d` refused unmerged commits and the branch was silently preserved CLI — useWorktreeSession watcher (Finding 3252368648): - Normalize fs.watch filename via toString() so the Linux-Buffer code path triggers reloads (previous comparison silently never matched) - Treat null filename as "unknown, reload to be safe" (recursive watchers on some platforms emit events without a payload) CLI — WorktreeExitDialog (Findings 3252368650, 3255171694): - execGit now correctly reads numeric exit codes from .code/.status (NodeJS.ErrnoException.code is a string for spawn errors, number for subprocess exits); previous typeof === 'number' check always missed - Dialog body shows an "⚠ Could not measure worktree state (...)" banner when git status / rev-list failed, so the user doesn't see a misleading "0 files, 0 commits" before choosing Remove CLI — closeAnyOpenDialog (Round 2 review body): - Wire WorktreeExitDialog into the standard dialog-dismissal path so Ctrl+C dismisses it the same way it dismisses every other dialog TEST FIXES — vitest timeouts: - Real git invocations + user-global hooks (e.g. trustup post-commit webhooks) can take 10–20s per setUp on CI. Bump testTimeout + hookTimeout to 30s for the three integ test suites that spawn git (Phase B/C worktree integ tests) so the suite isn't flaky. NEW TESTS: - worktreeSessionService.test: 3 new cases covering malformed JSON, missing required fields, wrong-type fields, malformed sidecar cleanup, partial sidecar cleanup (16 total, up from 13). - useWorktreeSession.test.tsx: 4 new cases — null when no sidecar, parsed sidecar at mount, reacts to delete, reacts to creation. - WorktreeExitDialog.test.tsx: 1 new case — loading frame renders before git probes resolve. (Async dialog states tested via E2E — vi.mock of execFile in ink-testing-library doesn't fire mock impl reliably.) - nonInteractiveCli.test: 3 new "Phase C --resume" cases — system-reminder injection on live worktree, no injection when sidecar absent, stale sidecar cleanup when worktree dir is gone. DECLINED FINDINGS (replied on threads): - 3252368642 (Dialog Keep clears sidecar) — declined-design. Dialog Keep = "exit app, keep worktree for next --resume"; tool Keep = "I'm done with this worktree". Intentionally different semantics. - 3252368643 (originalHeadCommit base branch) — false-positive. There is no base_branch parameter; getCurrentCommitHash() returns HEAD which equals the tip of the current branch (== baseBranch in createUserWorktree). - 3252368640 part c (bypass safety guards) — declined-design. The dialog IS the safety affordance for this path — it shows dirty-state counts and asks for explicit user confirmation before removal. - 3255171696 (DialogManager async fire-and-forget) — false-positive. handleSlashCommand('/quit') is inside the await chain in handleWorktreeExit, so the described race ("process.exit before remove completes") cannot occur. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): correct linter-mangled imports in useWorktreeSession.test Pre-commit hook auto-fixed imports collapsed value imports (writeWorktreeSession, clearWorktreeSession) into an `import type` block, breaking runtime resolution. Split back into value + type imports. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): normalize path separators for Windows in worktree session integ Windows CI failure: `repoRoot` from Node's `fs.mkdtemp` returns backslash-separated paths (`C:\Users\runneradmin\…`), but `originalCwd` in the sidecar comes from `getRepoTopLevel()` which delegates to `git rev-parse --show-toplevel` — git on Windows returns forward slashes (`C:/Users/runneradmin/…`). The Windows-only assertion `expect(originalCwd).toBe(repoRoot)` was comparing two different representations of the same canonical path and rightly failed on `Object.is` equality. Compare via path.normalize on both sides so the assertion holds across platforms without changing the runtime path (originalCwd still records git's output verbatim, which is what consumers expect since other places in the codebase that read `getRepoTopLevel()` also work with that shape). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address PR QwenLM#4174 round 4 findings Finding #3256237933 (Critical, follow-up to #3252368640 part 1): handleWorktreeExit silently /quit'd when removeUserWorktree returned {success:false}, contradicting the user's intent after they clicked "Remove worktree and branch (discards N commits, M files)". Now surfaces an ERROR history item with the underlying error message and STAYS in the session so the user can decide what to do (retry via exit_worktree, fix the lock/permission/corruption issue, or quit anyway). Same treatment applied to the hard-failure catch block — previously it caught the throw and proceeded to /quit with no log; now it emits the error and stays alive. Finding #3256236050 (Nit): originalCwd field name implies "user's launch cwd" but actually stores `getRepoTopLevel()` (different in monorepo subdir launches — the gap closed by #3252368637). Renaming the field would force on-disk migration of every existing sidecar (every active --resume breaks until users wipe the old file). Doc-only fix: WorktreeSession.originalCwd now carries an explicit JSDoc explaining the semantics and warning consumers expecting process.cwd() to NOT use this field. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address PR QwenLM#4174 round 5 findings Finding #3256241831 (Nit, but awareness UX): the built-in `⎇` indicator used to disappear whenever `statusLineLines.length > 0`, on the assumption that the user's custom statusline rendered worktree itself. That assumption is unsafe — scripts written before Phase C don't know about `payload.worktree`, scripts can deliberately ignore the field, and partial scripts may render some fields but not worktree. In any of those cases the user sees no worktree UI while having an active worktree, risking destructive operations in the wrong cwd. New behavior: indicator shows by default regardless of statusline. Added an opt-out setting `ui.hideBuiltinWorktreeIndicator` (default false) for users whose custom statusline already renders worktree and want to avoid duplication. Finding #3256239608 (Nit): `fs.watch` in useWorktreeSession holds an inode handle to `chatsDir` at mount time. If the directory is deleted out-of-band (manual cleanup, antivirus quarantine, reset scripts) and recreated, the watcher does NOT re-attach to the new inode and the Footer indicator stops reacting to sidecar changes. Reviewer explicitly accepted this as a documented limitation rather than adding polling-fallback or error-event-handler complexity for an edge case that doesn't arise in normal use. Added a JSDoc block on the hook explaining the limitation and pointing to the future fix shapes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(worktree): regenerate settings.schema.json for hideBuiltinWorktreeIndicator CI Lint step caught that the JSON schema mirror in packages/vscode-ide-companion was out of date after adding the new ui.hideBuiltinWorktreeIndicator setting in 80f9cb4. Regenerated via `npm run generate:settings-schema`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address PR QwenLM#4174 round 6 findings Critical fixes: - #3259975247: TUI dialog Remove now reads the in-worktree session marker and refuses to delete a worktree owned by a different session — same ownership guard ExitWorktreeTool already applies. Stale/copied sidecars can no longer destroy another session's work. - #3259975249: TUI --resume queues a one-shot pendingWorktreeNotice ref consumed by handleFinalSubmit; the user's first prompt is prefixed with the same <system-reminder> block headless/ACP use. Previously only the INFO history item showed in the transcript (UI-only), so resumed models could silently edit the parent checkout. - #3259975245: exit_worktree action='keep' no longer clears the sidecar. `keep` means "preserve the worktree for later"; clearing the persisted binding broke --resume / Footer / WorktreeExitDialog for kept worktrees. Now matches the Dialog keep semantics. Test updated to assert preservation instead of clearing. - ACP unstable_resumeSession parity: factored the worktree restore block into #restoreWorktreeOnResume() and called from both loadSession() and unstable_resumeSession(). ACP clients using resume no longer miss the worktree context. Suggestion-level fixes: - #3259975237: configureHooksPath now resolves the canonical hooks dir via `git rev-parse --git-common-dir` instead of constructing `<sourceRepoPath>/.git/hooks`. The construction assumed .git is a directory, but when Qwen runs from a linked worktree it's a file pointing at the real gitdir → ENOTDIR → silent no-hooks worktree. - #3259975242: only writes core.hooksPath when the key is unset. A non-empty inherited or user-configured value is preserved instead of being silently replaced. - #3256839787: restoreWorktreeContext adds a structural invariant check — worktreePath must live under <originalCwd>/.qwen/worktrees/. A tampered/copied sidecar pointing at an arbitrary existing dir is rejected and cleared so the model can't be redirected. Tests: - worktreeSessionService.test: 17/17 (added prefix-escape rejection case + restructured the existing live-worktree case to satisfy the new structural invariant). - exit-worktree.session.integ.test: rewrote keep test to assert preservation (matches new behavior). - nonInteractiveCli.test: updated fixture worktreeDir to live under <originalCwd>/.qwen/worktrees/ for the prefix invariant. - All other suites pass without modification. Test coverage gap acknowledgement (no comment_id reply): per-handler unit tests for handleWorktreeExit + dialog post-load states remain covered by the E2E Group E suite in docs/e2e-tests/worktree-phase-c.md. The execFile mock path in ink-testing-library still doesn't deliver async useEffect state transitions reliably, so unit testing those states adds more harness than signal; deferring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…enLM#4219) (QwenLM#4262) * fix(core): apply defaultModalities() on env-var-only model config (QwenLM#4219) When qwen-code is configured only via env vars (OPENAI_API_KEY / OPENAI_BASE_URL / OPENAI_MODEL) with no modelProviders entry, resolveGenerationConfig() never invoked defaultModalities(), so generationConfig.modalities stayed undefined for image-capable models. The two other config paths (modelRegistry.resolveModelConfig and modelsConfig.applyResolvedModelDefaults) already call it. This aligns the env-var-only path with both so multimodal models like qwen3.6-35b-a3b correctly accept @image attachments. Fixes QwenLM#4219 * test(core): lock modalities fallback invariants on env-var-only path Address review feedback on PR QwenLM#4262: - Strengthen the positive regression test to also assert video:true and source kind ('computed'), matching the source-tracking convention used elsewhere in this file and catching regex regressions in modalityDefaults. - Add negative case: unknown model → modalities resolves to {} (text-only), never undefined — the key invariant introduced by the fix. - Add negative case: explicit settings.generationConfig.modalities is not clobbered by the fallback (lock the `=== undefined` guard). - Extend the fallback's comment to document the undefined → {} semantic so future maintainers don't reintroduce `modalities === undefined` branches. No behavior change. * test(core): pin Qwen OAuth modalities auto-detect for coder-model Round-2 review feedback on QwenLM#4262: `resolveGenerationConfig` is shared by both the OpenAI/env-var-only path and `resolveQwenOAuthConfig`, which passes `resolvedModel` (defaults to 'coder-model') as modelId. So the new modalities fallback also activates for Qwen OAuth — a real behavior change (was undefined, now { image: true, video: true }). The change is desired (coder-model supports vision per the existing warning text in resolveQwenOAuthConfig), but no test pinned it down. Add a regression test so future MODALITY_PATTERNS edits can't silently shift Qwen OAuth behavior.
… consumer (QwenLM#4308) * fix(cli): block Windows Tab approval-mode toggle when input has a Tab consumer Closes QwenLM#4171. On Windows, Shift+Tab is indistinguishable from a bare Tab in many terminals, so useAutoAcceptIndicator accepts a bare Tab as the approval-mode cycle shortcut. To avoid double-firing with the input area, AppContainer passes a `shouldBlockTab` callback that suppresses the cycle when the input has its own Tab handler. Until now that callback only tracked the autocomplete dropdown (`shouldShowSuggestions`). When the buffer was empty and the followup prompt-suggestion ("input prediction") was visible, pressing Tab on Windows accepted the suggestion *and* cycled approval mode at the same time — the exact behaviour reported in QwenLM#4171. The mid-input ghost-text and reverse/command-search paths had the same gap. Broaden the signal: compute `hasTabConsumer` from every Tab consumer inside InputPrompt — autocomplete dropdown, followup suggestion, mid-input ghost text, reverse-search, command-search — and feed that into `shouldBlockTab`. A single Tab keystroke now triggers exactly one action on Windows; macOS and Linux behaviour is unchanged. Tests cover the four states (followup visible, ghost text visible, autocomplete visible, idle). * fix(cli): tighten hasTabConsumer, add unmount cleanup + tests (QwenLM#4308 review) Three review findings on PR QwenLM#4308 addressed together — all touch the same `hasTabConsumer` signal surface exposed from InputPrompt to AppContainer. 1. **Tighten signal semantics (Copilot)**: drop the standalone `reverseSearchActive || commandSearchActive` terms. When those overlays have matches, their `showSuggestions` flag already flows into `shouldShowSuggestions` and Tab is consumed via `ACCEPT_SUGGESTION_REVERSE_SEARCH`. When they're active without matches, Tab is NOT consumed — including the bare flags misrepresented the signal as "Tab consumer present" when it really meant "modal overlay open". `hasTabConsumer` now strictly matches its name. 2. **useEffect cleanup on unmount (wenshao)**: previously, if any Tab consumer was active when InputPrompt unmounted (e.g. streaming begins while autocomplete is open), AppContainer's `hasTabConsumer` state retained the stale `true` value and kept blocking Windows Tab approval-mode cycling for the entire unmount window. Effect now resets to `false` on cleanup. The pre-existing code had the same gap with one trigger; expanding to 3 triggers materially raised the likelihood. 3. **JSDoc on prop name (wenshao)**: `onSuggestionsVisibilityChange` now carries broader "Tab consumer" semantics than the name suggests. Cross-file rename across UIActionsContext + Composer + AppContainer is too much churn for QwenLM#4308's scope; add JSDoc on the prop declaration documenting the broader signal and that the name is retained for backward compatibility. 4. **Test coverage (wenshao)**: add two tests — autocomplete dismissal reports `false` (true→false transition); unmount-while-active reports `false` (cleanup regression guard). * fix(cli): split Tab-consumer signal so it doesn't hide Footer (QwenLM#4308 review) Self-inflicted regression caught by wenshao: the previous round broadened `onSuggestionsVisibilityChange` from "autocomplete dropdown visible" to "any Tab consumer present", but Composer.tsx was using that same callback for a different purpose — hiding the Footer / KeyboardShortcuts when the dropdown would overlap their vertical space. As a result, followup prompt suggestions and mid-input ghost text (both inline within the input box, neither competing for vertical space) were also hiding the Footer on every platform. Split into two signals: - `onSuggestionsVisibilityChange` — narrow, autocomplete dropdown only. Kept local to Composer for Footer hiding. Restored to pre-PR semantics; no cleanup-on-unmount needed (the entire conditional in Composer.tsx is already gated by `uiState.isInputActive`, which goes false when InputPrompt unmounts). - `onTabConsumerChange` — broad, any input-side Tab consumer (autocomplete + followup + ghost text). Plumbed through UIActionsContext to AppContainer's `hasTabConsumer` state → useAutoAcceptIndicator's `shouldBlockTab`. Retains the cleanup-on-unmount wenshao added last round (the broad signal IS read while InputPrompt is unmounted). Tests: - All 6 broad-signal regression tests renamed to assert `onTabConsumerChange`. - 3 new narrow-signal regression tests pin that `onSuggestionsVisibilityChange` does NOT fire `true` for followup or ghost text. Catches the exact shape of my regression.
* feat(core): extend cross-auth fast models to agents * fix(core): tighten cross-auth model resolution fallbacks When a forked-agent caller passes a selector that cannot resolve (e.g. `fast` with no fast model configured), fall back to the parent session model instead of forwarding the raw selector string to the provider. Matches the subagent path, where unresolvable selectors mean "inherit parent". In BaseLlmClient.createContentGeneratorForModel, do not cache the unregistered-model fallback. getCurrentContentGenerator() reads the runtime view from AsyncLocalStorage, which can differ between calls; caching would pin the first call's view-bound generator under the selector key and reuse it on later calls after that view has unwound. * docs(core): drop stale getFastModelForSideQuery from sideQuery JSDoc The function was removed when fast-model resolution collapsed onto getFastModel(); the JSDoc fallback chain still mentioned it.
B-A-M-N
pushed a commit
that referenced
this pull request
May 20, 2026
…#4247) * feat(serve): MCP client guardrails (QwenLM#4175 Wave 3 PR 14) Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`. Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'|'error'` + `refusedCount` so operator visibility isn't blocked. * fixup(serve): address PR 14 review (QwenLM#4247) findings 1-7 Addresses Codex + Copilot review feedback on QwenLM#4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision). 1. **Single-server rediscovery bypass** — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode. 2. **Empty `budgets[]` when mode=off** — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell. 3. **runQwenServe validation + env leakage** — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_*` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children. 4. **Disabled-vs-refused precedence + stale refusal log** — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag. 5. **Incremental remove-before-reserve ordering** — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot. 6. **`scope` forward-compat type widening** — `'workspace' | (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract. 7. **Test comment alignment** — fix "With budget=1" comment to match `clientBudget: 2` code. Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve). * docs(serve): clarify v1 snapshot-based budget warning detection (QwenLM#4247) Address github-actions review-summary finding (I) on PR QwenLM#4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts. * fixup(serve): address PR 14 review round 2 (QwenLM#4247 wenshao) Addresses @wenshao review on PR QwenLM#4247. Three critical safety fixes + four suggestion-level improvements. Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime): - C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot. - C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched). - (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup 597f011.) Suggestion: - S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement. - S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`. - S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected). - S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs. (S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.) + 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean. * fixup(serve): address PR 14 review round 3 (QwenLM#4247 wenshao second pass) Addresses @wenshao's second review pass on PR QwenLM#4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications. Code: - R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic. - R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row. - R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file. - R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 ba3e3fe S4 — reply-only on the new thread). Docs: - R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap. - R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit. - R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window. + 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 4 (QwenLM#4247 wenshao second + third pass) Addresses @wenshao's second + third review passes on PR QwenLM#4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads. Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs): - Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect. - Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside. - Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' | 'workspace' | (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel +⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint). Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929: - Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity. - Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode. - Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect). - Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation). + 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 5 (QwenLM#4247 wenshao fourth pass) Addresses @wenshao's fourth review pass on PR QwenLM#4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads). Critical fixes: - R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test. - R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test. Reviewer findings deferred / already done (replied + resolved): - R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup ba3e3fe added the try/catch with the weReservedSlot cleanup pattern. - R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup ba3e3fe renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic. - R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): proposed fix is semantically a no-op for integer liveCount — `liveCount >= 0.75` and `liveCount >= Math.ceil(0.75) === 1` give identical results when liveCount is an integer. The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; design tradeoff, not implementation bug. 46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean. * fixup(serve): address PR 14 review round 6 (QwenLM#4247 wenshao fifth pass) Addresses @wenshao's fifth review pass on PR QwenLM#4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak). Critical fixes: - R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside. - R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface: - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`. - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string | undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state. - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub). - `runQwenServe` closes over per-handle overrides; never touches `process.env`. + 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 7 (QwenLM#4247 wenshao sixth pass) Addresses @wenshao's sixth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor. Critical: - R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log. Real bugs: - R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method. - R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup. - R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now. - R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`. Polish: - R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log. - R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals. + 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 8 (QwenLM#4247 wenshao seventh pass) Addresses @wenshao's seventh review pass on PR QwenLM#4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred. Critical: - R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports. Soundness: - R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case). - R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth. - R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught. Deferred: - R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup. + 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 9 (QwenLM#4247 wenshao eighth pass) Addresses @wenshao's eighth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination). Critical / real bugs: - R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket. - R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior. - R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern. Operator-visibility: - R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `_BUDGET=N` would silently boot with enforcement off and `mcp_guardrails` capability advertised — operator only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern. Doc fixes: - R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one. - R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both. Not adopted: - R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding. - R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit 7228813) had already closed. Reviewer was on the pre-R8 commit view. + 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean. * fixup(serve): address PR 14 review round 10 (QwenLM#4247 wenshao ninth pass) Two non-blocking 🟢 nits — both adopted for symmetry / explicitness. - R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths. - R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since QwenLM#3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change. 57/57 core tests pass. Typecheck + lint clean.
…LM#4848) - Remove broken memory.md entry (file lives in features/, not configuration/) - Add missing doc entries: hooks, auto-mode, status-line, scheduled-tasks, worktree - Extract hooks from the 'Advanced' bucket into its own config category row with a proper reference to docs/features/hooks.md (1300-line comprehensive doc) - Add auto-mode.md reference to the Tool Approval config category This ensures the qc-helper bundled skill can locate the correct documentation when answering user questions about configuration, hooks, and features.
QwenLM#4906) * feat(telemetry): inject TRACEPARENT env var into shell child processes When `outboundCorrelation.propagateTraceContext` is enabled, inject a W3C `TRACEPARENT` environment variable into all shell child processes (Bash tool, hooks, monitor) so that CLI tools and Python scripts can participate in distributed tracing. - Extract shared trace-context module from debugLogger's private helpers - Use OTel's `isSpanContextValid` and `INVALID_TRACEID` for validation - Gate injection via module-level setter (set in initializeTelemetry, reset in shutdownTelemetry) - Update settingsSchema description to document shell env var behavior * fix: clear TRACEPARENT when enabled but no trace context available When propagation is enabled but getTraceContext() returns null (e.g., during initialization before any span exists), explicitly set TRACEPARENT to empty string to prevent stale/foreign values inherited from process.env from leaking into child processes. * chore: regenerate settings.schema.json after description update * fix: address wenshao review round 1 - Remove ZERO_TRACE_ID from barrel export (no external consumers) - Clear TRACESTATE alongside TRACEPARENT when context unavailable - Add zero-spanId rejection test (isSpanContextValid contract) - Add getSessionRootTraceContext error path test - Add sdk.test.ts assertions for setShellTracePropagation wiring * fix: always clear TRACESTATE when overriding TRACEPARENT Clear TRACESTATE unconditionally when propagation is enabled, not only when trace context is null. Prevents stale vendor state from the parent environment pairing with qwen-code's own traceparent.
The two Windows-targeted dev.js launcher tests added in QwenLM#4728 mock existsSync with forward-slash suffix matching and assert spawn args via forward-slash stringContaining. On a real windows-latest runner dev.js builds these paths with path.join, which yields backslashes, so the existsSync mock never matches, dev.js takes the bare tsx.cmd shell fallback, and both tests fail. On macOS/Linux the platform() mock plus real forward-slash joins keep them green, which is why main CI has been red only on the Windows job since 423cac1. Normalize separators in both the existsSync mocks and the received spawn arguments before asserting, so the tests pass on every host OS. Same content as the fix bundled into QwenLM#4840's merge commit 0e104e1, extracted into a standalone test-only change so main recovers without waiting on a core-behavior PR; both merge cleanly in either order. Co-authored-by: qqqys <qys177@gmail.com>
… docs site (QwenLM#4916) Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com>
…ild controllers (QwenLM#4810) * fix(core): isolate OpenAI SDK abort listener leak with per-request child controllers OpenAI SDK v5.11.0's `fetchWithTimeout` (client.mjs:324) adds an abort listener on the caller's signal without `{once: true}` or cleanup on request completion. PR QwenLM#4366 removed the `raiseAbortListenerCap` band-aid (which had silenced the warning by setting maxListeners to Infinity) under the assumption that `createChildAbortController` in agent-core covered it. However, the SDK's internal listener leak was never addressed — it accumulates on whichever signal reaches the SDK across retries and rounds. Fix: wrap the signal passed to `client.chat.completions.create()` in a per-request `createChildAbortController`. The SDK's leaked listener stays on the short-lived child signal. The `finally` block aborts the child, triggering reverse-cleanup that removes the parent listener. For streaming, the cleanup runs when the async generator is fully consumed or abandoned. Closes QwenLM#4423 * fix(core): skip child controller when no parent signal is provided Tests that don't pass an abortSignal expect `signal: undefined` to reach the SDK. Only create the per-request child when a parent signal exists. * fix(core): capture narrowed abort controller for closure use TS does not propagate narrowing into the nested `drainThenCleanup` generator, so referencing `perRequestAc` inside the closure failed with TS18048. Capture the narrowed value in a local const after the guard so the finally block sees a non-nullable controller. * fix(core): address review — guard streaming create() with try/catch + add cleanup tests 1. Wrap the streaming `client.chat.completions.create()` call in try/catch so a network/DNS/proxy error still aborts the per-request child controller (same pattern as the non-streaming path). 2. Add three cleanup-invariant tests: - child signal aborted after full stream consumption - child signal aborted after consumer break - child signal aborted when SDK create() throws * fix(test): correct inaccurate comment and remove trailing whitespace - Update error-handling test comment: the error does propagate to the consumer via the async generator, not handled only internally - Remove trailing whitespace on blank line in early-break test --------- Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com>
…gent() (QwenLM#4721) (QwenLM#4732) * feat(core): register Workflow tool name (P1) * feat(core): isWorkflowsEnabled config gate with env-var override (P1) * feat(core): stripExportMeta helper for workflow sandbox (P1) * feat(core): createWorkflowSandbox with determinism stubs (P1) * test(core): cover workflow sandbox phase/log/agent primitives (P1) * feat(core): WorkflowOrchestrator with injectable dispatch (P1) * feat(core): WorkflowOrchestrator production dispatch via AgentHeadless (P1) * feat(core): WorkflowTool wraps WorkflowOrchestrator (P1) * feat(core): register WorkflowTool behind isWorkflowsEnabled gate (P1) * feat(core): export WorkflowTool from package index (P1) * fix(core): harden workflow sandbox + tighten agent() opts surface (P1) SEC-C1: deep-null-proto + hardenClosure blocks args/closure realm-escape PoC. SEC-C2: vm 30s timeout kills sync infinite loops. UP-C1: agent() throws on unsupported opts (schema/isolation/model/agentType). UP-I1: keep verbatim subagent system prompt comment. ARCH-C1: thread AbortSignal into buildProductionDispatch → subagent.execute(). UP-C2: llmContent carries script result verbatim; metadata moves to returnDisplay. SEC-I1: add WORKFLOW to EXCLUDED_TOOLS_FOR_SUBAGENTS to prevent recursive fan-out. SEC-I2: cap logs[] at 10 000 lines with a truncation marker. REUSE-I1: use ToolErrorType.EXECUTION_FAILED in workflow error returns. TST: add security PoC tests, unique-runId, dispatch-rejection, llmContent-unwrap. TST-I1: remove setter/getter tautology test from config.workflows.test.ts. * refactor(core): decouple WorkflowOrchestrator from Config (P1) - Extract WORKFLOW_SUBAGENT_SYSTEM_PROMPT into workflow-prompts.ts - Lift buildProductionDispatch() into exported createProductionDispatch(config, signal?) - WorkflowOrchestrator constructor now takes dispatch directly: (dispatch: WorkflowAgentDispatch) - Remove WorkflowOrchestratorOptions interface - WorkflowToolOptions.orchestratorOverrides replaced by WorkflowToolOptions.dispatch - WorkflowToolInvocation.execute() calls createProductionDispatch() when no override is set - Tests updated: orchestrator tests inject dispatch directly; production-dispatch tests moved to createProductionDispatch describe block * fix(core): Math proxy hardening, subagent prompt verbatim, Date.now throw, phases cap, test fidelity (P1) * fix(core): construct Math+Date in vm realm, sever proto chains on injected closures (P1) * fix(core): sever Array.prototype on args, cap deep-null-proto recursion, consolidate WorkflowAgentResult (P1) * fix(core): stub parallel/pipeline/workflow/budget globals with P1-unsupported errors * fix(core): harden budget inner functions, regression-test anti-recursion + args threading * fix(core): P2/P5 forward-compat injection seams + document error.stack limitation (P1) * fix(core): vm-realm wrap async sandbox globals + stripExportMeta hardening (PR QwenLM#4732 R1) Closes T1/T8/T14: thrown Errors and async-function Promises used to leak the host Function constructor through their prototype chains. PoC: agent('x').constructor.constructor('return process')() try { throw } catch(e) { e.constructor.constructor('return process')() } Build every async/sync global (agent, parallel, pipeline, workflow, budget, console, phase, log, args) inside the vm-realm via the existing init script. Host only exposes a primitive bridge that the init script reads once and deletes from globalThis. Both rejection and resolution paths cross the boundary as vm-realm values. Closes T2: deepNullProto used to setPrototypeOf(null) on array args, breaking for-of / .map / .filter / spread / destructuring. Replaced with vm-realm JSON.parse of an args string — arrays retain vm-realm Array.prototype methods. Closes T13: runtime allowlist on agent() opts catches typos like 'scema'. Closes T9/T16/T17: stripExportMeta now recognises //, /* */, and regex literals; throws on unbalanced braces instead of silently returning ''. Closes T6: validateArgs rejects functions, BigInt, circular refs, and over-deep nesting (previously functions silently disappeared). Closes T5: regression test for console.log/warn/error → getLogs routing. * fix(core): change WorkflowTool export to type-only (PR QwenLM#4732 R1 T3) Production callers use Config.createToolRegistry's registerLazy path which dynamic-imports './tools/workflow/workflow.js'. The barrel export at index.ts previously forced eager evaluation of the workflow.js → workflow-orchestrator → workflow-sandbox → node:vm module chain for every consumer of @qwen-code/core, even when workflows are disabled. Sibling tool exports (AgentTool, SkillTool) are type-only; align WorkflowTool with the same pattern. SDK consumers can still annotate types; instantiation happens through the registry, not the barrel. * fix(core): subagent terminateMode + bounded runConfig + disallowedTools + failure context + defensive serialization (PR QwenLM#4732 R1) Closes T10: runReasoningLoop returns terminateMode = CANCELLED|MAX_TURNS| TIMEOUT|ERROR rather than throwing. Without checking it, await agent(...) resolved to '' on user cancel and the workflow kept looping. Now check getTerminateMode() after execute() and throw on non-GOAL — mirrors AgentTool's existing handling. Closes T11: workflow subagents previously ran with runConfig: {} (no max_turns / max_time_minutes guard) and tools: ['*'] without disallowedTools. A single agent() could loop the model indefinitely. Bound to 50 turns / 10 minutes; add disallowedTools: [SEND_MESSAGE, EXIT_PLAN_MODE] to mirror upstream Tg8 — defense in depth with the §XmO system prompt. Closes T19: phases / logs accumulated before a script failure used to be discarded with the sandbox instance. WorkflowExecutionError carries them through the rejection so the user-visible display can show what ran. Closes T12 / T18: defensive serialization. A successful workflow returning a BigInt or circular value used to be reported as 'Workflow failed: Converting circular structure to JSON' because JSON.stringify was inside the try block. safeStringifyResult / safeStringifyDisplayPayload degrade to a clear placeholder so a serialization issue doesn't masquerade as a run failure. Closes T4: regression test for ToolErrorType.EXECUTION_FAILED assertion. Closes T7: vi as vitest alias removed (now matches every other test file). * chore(core): add missing @license headers + remove stale config-session-env reference (PR QwenLM#4732 R1) Closes T20: 6 of 9 new workflow files were missing the standard @license Apache-2.0 header. Add Qwen-style header (matches sibling tools/agent/agent.ts and others) to: workflow-sandbox.ts, workflow-sandbox.test.ts, workflow-orchestrator.ts, workflow-orchestrator.test.ts, workflow-prompts.ts, workflow.test.ts. Closes T21: config.workflows.test.ts and config.workflow-registration.test.ts both contained 'mirrors config-session-env.test.ts' in a setup comment, but that file does not exist in the repo. Drop the dangling reference. * chore(core): clean up stray rebase conflict marker (PR QwenLM#4732) * fix(core): sever sandboxGlobals proto + add async wall-clock timeout (PR QwenLM#4732 R2) Closes T22: sandboxGlobals was a plain host-realm Object literal. Its prototype chain reached host Object → host Function → host process, bypassing every per-global hardening measure. PoC confirmed leak via `globalThis.constructor.constructor('return process')()` returning host process before fix. Fix: Object.setPrototypeOf(null) on both sandboxGlobals and the bridge object before vm.createContext. Regression tests cover both globalThis.constructor and implicit-this escape paths. Closes T23: vm.runInContext timeout only covers synchronous execution. Once the async IIFE yields its first await, the watchdog disarms and `return new Promise(() => {})` hangs forever. Fix: wrap in Promise.race with a wall-clock timeout (default 30 minutes, configurable via SandboxOptions.maxWallClockMs or QWEN_CODE_MAX_WORKFLOW_SECONDS env var). This is a permanent defense-in-depth — not P1-only: P2/P3/P5 all add resource caps measured in agent-calls or tokens, but a 0-token / 0-agent hang requires a wall-clock cap. Documented limitation: an in-script async microtask loop continues consuming microtasks after the outer wall-clock rejects (node:vm provides no way to halt async execution). In production the workflow surface returns the timeout error and the vm context becomes unreferenced; the leaked microtask loop is a host-process concern that requires worker_threads-level isolation (out of P1 scope). * fix(core): pre-sanitize non-serializable result before display payload (PR QwenLM#4732 R3) Sibling drift of the R1 T12/T18 fix. safeStringifyResult already degrades per-field when the script's `result` is a BigInt / circular value, so llmContent survives. But the success-path display payload wraps {runId, phases, logs, result} in a single JSON.stringify — one bad `result` collapsed the whole display to the generic "(display payload not JSON-serializable)" string and the user lost the runId (needed for log correlation), the accumulated phases, AND the logs. Pre-sanitize `result` only; runId / phases / logs are always serializable. Add regression test that scripts a circular `result` with a `phase()` in front: assertions check runId, the phase, and the non-JSON-serializable placeholder all appear in returnDisplay, and that the atomic-failure fallback string does NOT appear. RED at 10:56:23 → fix → GREEN at 10:56:48. 14/14 workflow.test, 109/109 across the workflow test suite, typecheck silent. * refactor(core): push display-payload per-field fallback into the helper (PR QwenLM#4732) Post-R3 /simplify pass. The R3 fix special-cased `result` at the call site by pre-probing JSON.stringify and substituting a placeholder. Four review angles (reuse / simplification / efficiency / altitude) all converged on the same root cause: per-field degradation belongs in `safeStringifyDisplayPayload`, not duplicated at every caller. - Altitude: the bug ("all-or-nothing stringify is too coarse") names a property of the helper; the fix now lives in the helper. Any new payload field that becomes non-serializable in a future round (`metrics: bigint`, etc.) is handled without a fresh call-site patch. - Reuse: the third try/JSON.stringify probe in this file is gone; the helper owns the probe. - Simplification: call site reverts to the pre-R3 clean shape. - Efficiency: success path is back to one stringify per payload. Helper behavior: happy path → 1 stringify (unchanged) one field fails → walk top-level keys, probe each, replace failing value with `(non-JSON-serializable value of type X)`, re-stringify; total 2 stringifies of payload + N field probes. Fall through to the original generic fallback if the sanitized re-stringify also fails. R3 regression test (`execute() preserves runId/phases/logs in returnDisplay when result is non-JSON-serializable`) passes unchanged — it tests observable behavior, not the implementation site. 109/109 across the workflow suite, typecheck silent. * fix(core): honest description + meta-strip anchor + wall-clock cancellation + stray gitignore (PR QwenLM#4732 R4) Four fixes from R4 review: T32 (workflow.ts tool description) — P1 description claimed "sequential only" but `Promise.all([agent(), agent()])` bypasses the claim because the vm cannot intercept JS built-ins. Rewrite the description to be honest: P1 ships sequential primitives only (no parallel/pipeline); Promise.all spawns concurrent subagents that share Config and may race on file edits. Matches upstream Claude Code behavior — they also expose Promise.all without enforcement. T33 (workflow-sandbox.ts stripExportMeta) — drop the `/m` flag on the anchor regex. With `/m`, a template literal containing `\nexport const meta = {\n` triggered a false match, and the brace walker ripped content out of the string body, silently corrupting the script. Per design intent ("required first statement of every script") meta must be file-start; anchoring there closes the corruption surface. Adds two RED-confirmed regression tests (template literal + leading code) and a sanity test for leading whitespace. T35 (packages/core/.gitignore) — removed. The `.qwen/computer-use/` entry was stray scope pollution committed accidentally in R1's license- header cleanup (`d118c55f8`) and unrelated to the Workflow P1 surface. T40 (sandbox.ts + orchestrator.ts + workflow.ts) — completes the R2 wall-clock defense. When the timer fires the sandbox now `abort()`s a caller-supplied AbortController BEFORE rejecting; the controller's signal is also threaded into `createProductionDispatch`, so in-flight subagent.execute() calls see the cancellation and stop burning tokens. Without this, R2's "30 min wall-clock" still let subagents run for up to their internal `max_time_minutes: 10` after the user-side timed out. WorkflowTool.execute now derives `dispatchController`, forwards caller signal abort to it, passes its signal to dispatch and the controller itself to `orchestrator.run({abortOnTimeout})`. A `finally` block aborts the controller on natural completion (cancel any straggler subagent) and detaches the caller-signal listener to avoid leaks. Adds two sandbox unit tests (RED-confirmed): controller IS aborted on timeout, controller is NOT aborted on normal completion. 114/114 workflow suite tests pass, typecheck silent. * refactor(core): use createChildAbortController for T40 dispatch-signal bridge (PR QwenLM#4732) Post-R4 /simplify pass. Four review angles converged on one finding: the T40 manual AbortController-bridging at the call site re-implements `createChildAbortController` from `packages/core/src/utils/abortController.ts:61`, which is already the project-idiomatic helper for this exact pattern (used at agent-headless.ts:231, agent-interactive.ts:157, agent-core.ts:603 & 952). The replacement collapses 5 lines of imperative listener wiring at `workflow.ts:execute()` head + 1 line of finally cleanup into a single `createChildAbortController(signal)` call, plus inherits the helper's hardenings: - WeakRef on the parent (so a long-lived caller signal doesn't pin the child controller) - Auto-removal of the parent listener when the child fires (covers both the wall-clock-fire path and the natural-completion path) - Default 50-listener cap via `setMaxListeners` No behavior change at the API boundary — the wall-clock `abortOnTimeout` contract and the test assertions for T40's two cases (controller IS aborted on wall-clock; controller is NOT aborted on normal completion) all still hold. The /simplify "altitude" finding (push the bridging into the orchestrator) is deferred — that would change WorkflowOrchestrator's constructor/run signature and is outside this PR's scope. Also trims a redundant inline comment at workflow-orchestrator.ts:172 (the `abortOnTimeout: req.abortOnTimeout` line; the field's type comment at line 71-81 already explains the contract). 114/114 workflow suite tests pass, typecheck silent. * chore(core): compress over-weight T40 comments after createChildAbortController refactor (PR QwenLM#4732) The previous commit moved the bridging logic into the helper, so the inline comments restating the helper's contract (parent forwarding, already-aborted fast path, WeakRef, auto-removal) became redundant — that's the helper's job to document. Compress to the load-bearing semantics: the child controller sees both caller-driven and wall-clock-driven aborts, and `finally` cancels stragglers on normal completion. No code change. * chore(core): align copyright header to Qwen on 2 PR-new test files (PR QwenLM#4732 R7 F4) Both files were derived from a template (the stale `config-session-env.test.ts` reference cleaned up in R1 T21) and retained the upstream `Copyright 2025 Google LLC` header. The other six new workflow source/test files in this PR carry `Copyright 2025 Qwen`. Align for same-PR consistency. Per DragonnZhang R7 F4. No behavior change; header text only. --------- Co-authored-by: tanzhenxin <tanzhenxing1987@gmail.com>
…ain requests (QwenLM#4925) The mid-turn queue drain request sent after every tool batch awaited the client's response with no deadline. A client that silently drops unknown methods instead of rejecting with JSON-RPC -32601 never answers, wedging every tool-calling prompt turn. Race the drain against a 2s timeout and latch the feature off after three consecutive timeouts. Also make the integration-test ACP client spec-conforming: reply -32601 to unknown agent->client requests instead of ignoring them.
…wenLM#4923) The greeting-responder example in the agent tool description could mislead some models into launching a subagent for simple greetings instead of responding with plain text. Remove the example and its associated agent description entry to eliminate this risk.
…M#4872) (QwenLM#4881) * feat(ci): add auto-generated CHANGELOG.md synced from releases (QwenLM#4872) Add a Keep a Changelog CHANGELOG.md that is fully derived from the project's GitHub Releases, so users no longer have to dig through commit history to see what changed between versions. - scripts/generate-changelog.js: fetch releases via the gh CLI, keep only stable vX.Y.Z tags (nightly/preview omitted), and re-group each release's auto-generated "What's Changed" list into Added/Changed/Fixed/Performance/ Documentation/Other sections by the conventional-commit prefix every PR title uses. Zero runtime deps (node builtins + gh). - release.yml: regenerate and commit CHANGELOG.md on stable releases; the commit rides the existing release-branch PR into main. - package.json: add 'npm run changelog'. - .prettierignore: skip the generated file. - Seed CHANGELOG.md from the current stable release history. Closes QwenLM#4872 * ci(release): make CHANGELOG regeneration non-blocking Add continue-on-error to the CHANGELOG step. Its only realistic failures (the gh API read and the git push) are transient, and the generator rebuilds from the full release history each run, so a skipped update self-heals on the next stable release. This keeps a changelog hiccup from blocking the version-bump PR to main that follows. * fix(ci): harden changelog generator per review Address review findings on the changelog generator: - Command injection: fetchReleasesJsonl built a shell string with the repo interpolated; switch to execFileSync (no shell) plus an owner/name format check so --repo can never be a shell payload. - Empty-response clobber: if the API returns zero stable releases (rate limit, auth, 5xx), refuse to overwrite CHANGELOG.md and exit 1 instead of committing a header-only stub. Paired with set -euo pipefail in the workflow so the failure is visible and the non-blocking step self-heals next release. - Arg parsing: switch getArgs() to parseArgs() (already imported) so a --dryrun typo errors instead of silently overwriting, and -h works as documented. - Breaking changes: capture the conventional-commit ! marker and prefix the entry with **BREAKING** (the repo uses feat()!:/refactor()!: in practice). - Bot authors: ENTRY_RE now accepts a trailing [bot] so GitHub App authors (e.g. @dependabot[bot]) are not dropped from release notes. Regenerates CHANGELOG.md (two entries now flagged **BREAKING**). * refactor(ci): simplify changelog generator Quality-only cleanup (output is byte-identical; 23 tests green): - Single source of truth for sections: derive TYPE_TO_SECTION and SECTION_ORDER from one SECTIONS list so they can't drift. - Parse each entry once: formatRelease now reuses the categorize() result for the noise check, section lookup, and formatEntry (was parsed up to 3x). - Drop the redundant sortKey field; sort directly from version. - Collapse the duplicated double-.map() setup in the selectStableReleases test.
…king (QwenLM#4779) * docs(stats): add dashboard design spec and implementation plan Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(stats): add cross-session usage tracking service Add usageHistoryService to core with JSONL-based persistence, session replay from chat history with sessionId deduplication, time-range aggregation, and per-model/tool/file breakdown including latency fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(stats): add stats data service and ASCII chart utilities Add statsDataService for delta calculations, efficiency metrics, tool leaderboard, and heatmap/trend data. Add asciiCharts with braille line chart (Bresenham rendering) and GitHub-style contribution heatmap. Includes 38 unit tests covering both modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(stats): implement interactive /stats dashboard Add three-tab dialog: Session (live metrics), Activity (KPIs, heatmap, braille token trend chart, project ranking), and Efficiency (cache rate, tool success, latency cards, tool leaderboard, model comparison table). Supports tab/shift-tab navigation, r to cycle time ranges (all/month/ week/today), left/right to pan months in the trend chart, esc to close. Persist usage on /clear for accurate cross-session tracking. Update statsCommand tests for new dialog behavior and clearCommand tests for telemetry mock. Update /stats documentation in commands.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(i18n): add stats dashboard translations for all locales Add translations for stats dashboard UI strings in zh, zh-TW, ca, de, fr, ja, pt, ru. Add stats keys to en.js baseline and mustTranslateKeys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): use terminal default background for inactive heatmap cells Intensity 0 cells (no activity) now render without backgroundColor, inheriting the terminal's native background instead of a hardcoded color that renders incorrectly across different terminal themes. Also fix green gradient direction: brighter = more activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): use dot markers for inactive heatmap cells Inactive cells render as '··' with no background color instead of colored blocks, matching common contribution graph designs. Active cells keep their green gradient backgrounds. Fix gradient direction so brighter green = more activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): restore full session stats from original StatsDisplay Add back Session ID, Success Rate with color thresholds, User Agreement rate, Performance breakdown (Wall Time, Agent Active, API Time %, Tool Time %), and full token counts that were present in the original exit screen but missing from the new Session tab. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): address PR review — timezone bugs, double-count, cleanup - Fix monthOffset overflow: setDate(1) before subtracting months to prevent day-count overflow (e.g. Mar 31 → Feb) - Fix UTC date-parse off-by-one: append 'T00:00:00' to date-only strings in calculateStreaks and HeatmapView fmtDate - Fix current session double-counted after rebuild: deduplicate by sessionId when injecting live session into loadStatsData - Remove unused bodyWidth prop from SessionTab - Remove 13 unused i18n keys (Overview, Favorite model, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): add NaN guard, catch unhandled promise, fix useEffect race - Guard against malformed chat records with NaN timestamps in rebuild - Add .catch() to loadStatsData promise to prevent TUI crash - Add stale flag to useEffect to prevent race on rapid range cycling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(i18n): sync locale files with en.js baseline for CI check Add 19 missing translations to zh-TW.js, remove extra keys from zh.js and zh-TW.js that were deleted from en.js in prior cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(stats): use fake timers in getPreviousRangeBounds tests The test compared new Date() in the assertion against new Date() inside the function, which could differ by 1ms across a millisecond boundary. Pin system time to prevent flaky CI failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(i18n): restore Session and Success keys removed in error These keys are still referenced by t('Session') in stats-helpers.tsx and t('Success') in StatsEfficiencyTab.tsx. Add to en.js baseline and restore zh/zh-TW translations. * fix(stats): address R3 review — malformed record guard, arrow fix, error state - Skip malformed records in aggregateUsage (missing tools/files/models) - Use Object.create(null) to prevent prototype pollution on model names - Fix Avg Latency delta arrow direction (▼ for decrease, ▲ for increase) - Clamp fmtSuccessBar to prevent RangeError on corrupt data - Add error state UI when loadStatsData fails * fix(stats): address R4 review — DST fix, token consistency, tests, cleanup - Fix DST bug in getPreviousRangeBounds('today') using setDate - Unify token counting: project ranking uses totalTokens (same as KPI) - Add clearCommand tests for persistSessionUsage with/without activity - Remove dead code (unreachable sorted.length check) - Fix heatmap legend to use dot markers matching grid cells - Add 'Failed to load stats' i18n key to en/zh/zh-TW * fix(stats): include thoughtsTokens in totalTokens fallback calculation * Revert "feat(input): move physical cursor to visual cursor for IME input (QwenLM#4652)" This reverts commit 77458ad. * fix(stats): prevent Yoga layout from compressing StatsDialog content Add flexShrink={0} to the outer Box of StatsDialog so that Yoga's default flex-shrink behavior does not compress KPI cards, charts, and tables when the dialog exceeds the available terminal height. The parent container in DefaultAppLayout already applies height + overflow="hidden" to clip overflow — this fix ensures content retains its natural size instead of being squeezed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(stats): add debug logging to catch blocks and strengthen test assertion - Add debugLogger to all catch blocks in usageHistoryService.ts for observability when file I/O or parsing fails - Strengthen test assertion from toContain('Session duration') to toContain('Session duration: 0s') to verify the zero-duration fallback --------- Co-authored-by: a.ran <benguanran.bgr@alibaba-inc.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…M#4932) The `env` command is a command proxy that can execute arbitrary commands with side effects (e.g., `env open -a Calculator`, `env rm -rf ...`). Including it in the read-only allowlist bypasses the user confirmation prompt, enabling prompt-injection attacks to achieve arbitrary code execution without user interaction. Remove `env` from both `READ_ONLY_ROOT_COMMANDS` sets (regex-based checker in shellReadOnlyChecker.ts and AST-based checker in shellAstParser.ts) so all `env` invocations require user confirmation. Fixes QwenLM#4930
…nLM#4946) The CronScheduler fired jobs immediately when the current minute matched the cron expression, because lastFiredAt was undefined and the double-fire guard was bypassed. Combined with /loop's intentional immediate execution (SKILL.md step 3), this caused duplicate execution of the prompt. Initialize lastFiredAt to the creation minute-start so the scheduler treats it as already-fired and waits for the next matching cron window.
…M#4945) (QwenLM#4949) When the proportional branch dominated auto (e.g. 60K context window), rawHard fell below auto, causing hard to degrade to auto. This meant compaction only triggered at the very last moment with no gap between auto and hard tiers. Change hard from max(rawHard, auto) to min(window, max(rawHard, auto + HARD_BUFFER)), guaranteeing hard > auto for all non-zero windows while preserving behavior for large windows where rawHard already dominates. Closes QwenLM#4945
…machine (QwenLM#4903) * fix(installer): auto-detect SYSTEM account and default PATH scope to machine When the Windows standalone installer runs as the SYSTEM account (SID S-1-5-18), which is common for ECS Workbench SSM, WinRM, and scheduled tasks, the default --path-scope is now 'machine' instead of 'user'. Previously, persisting PATH to the SYSTEM user's HKCU hive had no effect on new sessions spawned by the SSM agent service, because the agent inherits its environment at service start time and never re-reads HKCU. Writing to the Machine PATH (HKLM) ensures new sessions pick up the qwen shim immediately. Also adds: - --repair-path flag to fix PATH for an existing install without reinstalling - --path-scope user|machine CLI option and QWEN_INSTALL_PATH_SCOPE env var - Uninstaller now cleans both User and Machine PATH entries - CI validation step for hosted installation assets Resolves QwenLM#4901 * test(installer): add Windows self-hosted qwen smoke * fix(installer): address Windows PATH review feedback * refactor(installer): remove redundant YAML validation, deduplicate test fixtures - Remove windows-self-hosted-qwen-smoke.yml (operational tool, not part of fix) - Revert sync-release-to-oss.yml validate step (redundant with JS behavior guards) - Extract duplicated test fixture strings into shared constants - Pass normalized path to Remove-PathEntry to avoid double Get-NormalizedPath - Fix stale comment referencing "user PATH" when scope is now configurable * chore: drop unrelated dev.test.js changes from installer PR
…alars (QwenLM#4870) * fix(skills): use full YAML parser for frontmatter to support block scalars The custom `yaml-parser.ts` does not handle YAML block scalar syntax (`>` folded, `|` literal). When a SKILL.md uses `description: >` with indented continuation lines, only the `>` character is captured as the description value. Switch both skill loading paths (`skill-load.ts` and `skill-manager.ts`) to use the `yaml` npm package (already installed for hooks parsing) as the primary parser, with a fallback to the simple parser for malformed YAML that the strict parser rejects. This also simplifies `skill-manager.ts` by removing the special-case re-parse for `hooks:` — the full parser now handles all fields uniformly. Closes QwenLM#4869 * fix(skills): address review feedback — explicit dep, null guard, shared parser - Add `yaml` to `packages/core/package.json` explicit dependencies - Move the full-YAML-with-fallback logic into `yaml-parser.ts` as the single `parse()` entry point, eliminating the duplicated wrapper - Add null/type guard so `yaml.parse('')` (returns null) falls back cleanly instead of throwing TypeError - Add debug logging when the full parser fails and fallback triggers - Tighten `|` block scalar test to use `toBe` instead of `toContain` - Move block scalar tests to `yaml-parser.test.ts` (no mock interference) - Add `>-` strip chomping test * chore: make parseSimple private, remove stale yaml-dep comment - `parseSimple` is an internal fallback, no external caller needs it — stop exporting it via `index.ts`'s `export *` - Remove the outdated "yaml package would need to be added" comment in `subagent-manager.ts` since `yaml` is now an explicit dependency * test: add comprehensive edge-case tests, use YAML 1.2 core schema - Use `schema: 'core'` to prevent date-like strings (e.g. `2024-01-01`) from being coerced into Date objects, which would break parse→stringify roundtrips in claude-converter and subagent-manager - Add yaml-parser unit tests for: Date non-coercion, bare keys returning null, explicit null/tilde, yes/no as strings (YAML 1.2), empty input, comment-only input, malformed YAML fallback - Expand skill-load real-parser integration tests: literal block scalar, strip-chomped folded, allowedTools array, complex mixed frontmatter, malformed YAML graceful fallback * fix: use loose null check for optional frontmatter fields `yaml.parse` returns `null` for bare keys (`hooks:` with no value), while the old simple parser returned `''`. The existing `!== undefined` guards don't catch null, which would cause a TypeError in parseHooksConfig and a misleading error for allowedTools. Change to `!= null` (catches both null and undefined) in both skill-load.ts and skill-manager.ts. * refactor: normalize null values in yaml parser instead of caller-side guards Move the null→undefined normalization into `parse()` via `stripNullValues` so that callers keep using strict `!== undefined` checks. This avoids the loose `!= null` pattern and fixes the contract for all five call sites (skill-load, skill-manager, subagent-manager, claude-converter, rulesDiscovery) in one place. * fix(yaml-parser): address review round 2 — security hardening and dedup - Prevent prototype pollution via Object.create(null) in both stripNullValues and parseSimple (fixes __proto__ injection vector) - Recursively sanitize nested objects (Date/Uint8Array/null/__proto__) - Upgrade fallback/non-object log level from debug to warn - Filter !!timestamp/!!binary explicit tags (belt-and-suspenders) - Extract shared parseAllowedToolsField helper to deduplicate skill-load.ts and skill-manager.ts - Add runtime type guard for hooksRaw in skill-manager.ts - Fix fallback tests to use genuinely invalid YAML input * fix(yaml-parser): apply stripNullValues to parseSimple fallback path The fallback path returned parseSimple() output directly, bypassing stripNullValues(). This caused inconsistent output: the main path stripped null values so callers could use `!== undefined`, but the fallback preserved them. Wrap the fallback return in stripNullValues() and add a test verifying null-stripping consistency across both paths.
…n for sleep chains (QwenLM#4948) When the model tries `sleep 5 && cmd`, the old error message said 'split follow-up commands into a separate invocation' but omitted the `# intentional-sleep: <reason>` syntax. The model would then try standalone `sleep 5`, get blocked a second time, and only then learn the escape hatch. Now the first rejection for non-standalone sleep tells the model both steps: split into two calls and use the intentional-sleep comment. This reduces failures from 2 to 1. Also reuses the already-computed `strippedCommand` variable instead of calling `stripShellWrapper` a second time.
Fixes QwenLM#4904. The qwen3.7-plus model was available in the Standard API Key provider but missing from the Coding Plan provider. Added it to both MODELSTUDIO_MODELS (alibaba-coding-plan.ts) and ALIBABA_SUBSCRIPTION_MODELS (subscriptionPlanDefinitions.ts) to keep them in sync. Co-authored-by: 俊良 <zzj542558@alibaba-inc.com>
Graduate cron/loop from experimental opt-in to enabled-by-default. Flip env var polarity from QWEN_CODE_ENABLE_CRON to QWEN_CODE_DISABLE_CRON for users who want to opt out. Update integration tests, docs, and VS Code schema accordingly.
…d tool calls (QwenLM#4936) The model sometimes answered the old prompts without ever calling the shell tool, timing out waitForToolCall and burning ~230s per CI run with retries. Prompts now explicitly require a run_shell_command call, assertions key off recorded telemetry (blocked call with success: false and Monitor guidance in the error) instead of model narration, and failures print debug info.
…ict backends (QwenLM#4917) OpenAI Chat Completions only permits text on `role:"tool"` messages, so an image read via read_file — the only image path available to a subagent — was embedded there and silently dropped by strict OpenAI-compatible backends (doubao / new-api / LM Studio). The model never saw the image and returned content unrelated to it (QwenLM#4876). Permissive backends (e.g. DashScope) happen to parse it, which is why the same model worked for the main agent via @-image (role:"user") but not for the subagent via read_file (role:"tool"). Flip the runtime default of splitToolMedia to true so tool-returned media is lifted into a follow-up role:"user" message — spec-compliant and visible to all backends. Opt out via generationConfig.splitToolMedia = false. Also: - modalityDefaults: recognize ByteDance Doubao (Seed chat + *vision/*vl => image; seedance/seedream generation models => text-only). - settingsSchema + docs: default true, description corrected to cover the built-in read_file (not only MCP tools). Tests: pipeline default-true regression, modalityDefaults doubao cases, converter opt-out wording.
…QwenLM#4852) When a single logical line is hard-wrapped across multiple visual lines, pressing left-arrow at the start of a wrapped visual row could land on a position that mapped back to the same logical cursor, making the cursor appear stuck. Detect this case and step one column further left so the visual cursor actually advances.
…+ maxTurns wiring + color allowlist (CC 2.1.168 parity) (QwenLM#4842) * docs: add declarative agents port design doc for QwenLM#4821 * feat(core): add declarative agent frontmatter schema constants and 9 new fields Adds agent-frontmatter-schema.ts as the single source of truth for the CC 2.1.168 declarative-agent enum constants (EFFORT_VALUES, PERMISSION_MODE_VALUES, MEMORY_VALUES, ISOLATION_VALUES, COLOR_VALUES) and lenient parsers (parseEffort, parseMaxTurns, parseBackground, parseStringOrArray) that mirror DL7's warn-and-drop posture. Also adds a permissionModeToApprovalMode bridge to map CC's permission modes onto qwen-code's existing approvalMode semantics. Extends SubagentConfig with 9 optional fields carried verbatim from CC frontmatter: permissionMode, effort, maxTurns, skills, initialPrompt, memory, isolation, mcpServers, hooks. Runtime semantics for the metadata-only fields are deferred to follow-up PRs; this lands the data carrier. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * feat(core): parse and serialize 9 new declarative agent frontmatter fields Extends parseSubagentContent and serializeSubagent to round-trip the 9 CC 2.1.168 fields previously added to the SubagentConfig type. Parsing follows DL7's lenient warn-and-drop posture (vs the existing strict throw posture used for approvalMode), so a Claude Code agent file with an invalid optional field still parses with that field dropped — matching CC behavior so users can drop CC agent files into .qwen/agents/ unchanged. Adds a permissionMode → approvalMode bridge: when frontmatter has permissionMode but no approvalMode, the bridge resolves the approvalMode at parse time using the same mapping as claude-converter.ts. If both are set, approvalMode wins (already-explicit values take precedence over inferred ones). Tests: - 22 new parser cases covering happy paths, invalid drops, permissionMode bridge precedence, color allowlist (with auto sentinel preserved), and loose-validation passthrough for mcpServers / hooks. - 9 new serializer cases asserting round-trip and omit-when-unset behavior. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * feat(core): promote top-level maxTurns to runConfig.max_turns at convert time Wires the new top-level `maxTurns` field into the existing runtime configuration pipeline so the value declared in agent frontmatter actually limits the agent's turn budget at run time. When both the top-level `maxTurns` and the legacy nested `runConfig.max_turns` are set, the top-level field wins (more specific and matches the CC frontmatter shape upstream agent files use). When only the nested field is set, behavior is unchanged. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * refactor(core): use shared permissionMode bridge and parseStringOrArray in claude-converter Replaces the inline claudeToQwenMode table and the local parseStringOrArray helper in claude-converter.ts with the shared exports from agent-frontmatter-schema.ts. One source of truth for the CC ↔ qwen mapping keeps the two import paths (.qwen/agents/*.md frontmatter and Claude plugin import) in sync — when the upstream CC schema changes, only one table needs updating. Adds explicit tests for the bridge mapping (six known modes + unknown fallback). The fallback now happens at the call site rather than inside the table, but the observable behavior is unchanged. Refs QwenLM#4821 * docs(subagents): document CC-compatible frontmatter fields Adds a Claude Code Compatibility Fields section to the user-facing subagents reference, covering the 9 new frontmatter fields landed in this PR. The intent is for users with existing Claude Code agents to know they can drop the files into `.qwen/agents/` and have them parse identically. Refs QwenLM#4821 * refactor(core): reuse parseBackground in declarative agent loader Replaces the inline boolean-or-string lenient parse for the background field with the shared parseBackground util added in the schema module. Same observable behavior; one fewer place to keep in sync when the upstream shape changes. Refs QwenLM#4821 * fix(core): address adversarial review of declarative agents PR Four self-review findings caught and fixed before opening the PR: 1. **mcpServers/hooks round-trip claim was broken.** The local yaml-parser only formats one level of nesting, so serializing an agent with nested mcpServers or hooks would mangle the value into '[object Object]'. The fields are still carried verbatim in memory (read path is fine), but the serializer no longer emits them — losing the field through '/agents' edits is strictly safer than corrupting it. Documented the limitation in docs/users/features/sub-agents.md. 2. **approvalMode throw vs permissionMode lenience asymmetry.** The pre-existing approvalMode parser threw on invalid values, killing the entire agent file. With the new permissionMode bridge, a CC-imported file with 'permissionMode: bypassPermissions' + 'approvalMode: tpyo' would reject the file instead of dropping the typo and using the bridge. Demoted approvalMode to the same DL7-parity warn-and-drop posture all the new fields use; the bridge now runs when approvalMode is invalid. 3. **parseEffort missed numeric strings.** CC's DL7 falls back to parseInt for non-enum effort strings so 'effort: "5"' (quoted YAML) round-trips like 'effort: 5'. Added the same fallback and tests for floats / partial numeric strings / valid numeric strings. 4. **convertAgentFiles stripped 6/9 of the new fields.** The Claude plugin import path read frontmatter into a fixed ClaudeAgentConfig shape, so 'effort', 'maxTurns', 'initialPrompt', 'memory', 'isolation', 'mcpServers', and 'background' were silently dropped when installing a CC plugin agent. Build the rewritten frontmatter from the original keys first, then overlay the converter's transformations for the keys it owns — unknown CC fields now passthrough verbatim, future-proofing against later CC additions. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * fix(core): round-2 self-review fixes for declarative agents PR Second adversarial review surfaced 5 more findings; all fixed: 1. **convertAgentFiles passthrough corrupted nested objects.** Round-1 fixed serializeSubagent to skip emitting mcpServers/hooks (the local yaml-parser collapses nested values to '[object Object]'). The plugin-import passthrough I added to claude-converter.ts had the same mangle bug — a CC plugin agent with nested mcpServers got written to disk as 'mcpServers:\n - [object Object]'. Now skips both fields via a NESTED_FIELDS_NOT_ROUND_TRIPPABLE set. 2. **parseEffort accepted 0 and negative integers.** Docs say 'positive integer' and parseMaxTurns already rejects <= 0. parseEffort didn't, accepting '0' / '-5' / 0 / -1 as valid efforts. Aligned both helpers and added tests for the boundary values. 3. **Function name collision: permissionModeToApprovalMode.** packages/core/src/ tools/agent/agent.ts has a module-private permissionModeToApprovalMode that maps the qwen PermissionMode enum to ApprovalMode enum (different domain entirely). My new exported helper used the same name; IDE auto-import would silently return undefined for every qwen enum value. Renamed the export to claudePermissionModeToApprovalMode and updated 4 callers. 4. **color: 'auto' round-trip asymmetry.** Round-1 added a test pinning that the parser preserves the legacy 'auto' sentinel. But serializeSubagent already had a pre-existing 'skip emit when auto' branch, so a parse → serialize → parse cycle dropped the sentinel. The CLI helpers (shouldShowColor / getColorForDisplay) already treat 'auto' identically to undefined, so normalizing 'auto' to undefined at parse time has no downstream effect AND makes round-trip cleanly idempotent. 5. **Converter approvalMode precedence diverged from loader.** When a CC source file had both 'permissionMode: bypassPermissions' AND 'approvalMode: default', the convertClaudeAgentConfig path wrote 'approvalMode: yolo' (bridge wins). The loader's rule is 'approvalMode wins over bridge'. Extended ClaudeAgentConfig with an approvalMode field and gated the bridge emit on 'source approvalMode is unset', aligning convert and load precedence. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * fix(core): round-3 self-review fixes — drift, API hygiene, converter contract Three quality findings from the round-3 adversarial pass (after rounds 1+2 fixed 9 issues each). All low-severity but cheap; the runtime was correct, the file-on-disk + public-API surface needed tightening. 1. **Stale runConfig.max_turns duplicated on every write.** Top-level maxTurns was promoted in 9b903ee but the serializer still emitted runConfig.max_turns verbatim. A file with both fields kept both indefinitely. The runtime already prefers top-level (so behavior is correct), but a third-party tool or future viewer reading the legacy nested field would act on stale data. Prune nested max_turns from the serialized runConfig when top-level maxTurns is set; drop the runConfig block entirely if pruning leaves it empty. Three new tests pin the prune, the drop-empty-block path, and the no-top-level fallback. 2. **14 internal symbols leaked into the public package API.** The PR re-exported every helper from agent-frontmatter-schema.ts via subagents/index.ts. grep against packages/cli + packages/vscode-ide-companion finds zero cross-package consumers — both internal users (subagent-manager, claude-converter) already import directly from the schema file. Locking constant names like EFFORT_VALUES and parseEffort in the public API would constrain follow-up PRs (e.g. when js-yaml lands and the schema shape changes). Removed the entire export block; left a comment explaining the choice so a future PR knows the bar for re-introducing it. 3. **convertClaudeAgentConfig silently dropped a standalone approvalMode.** Round-2 gated the permissionMode bridge on !claudeAgent.approvalMode for precedence parity with the loader, but never added the explicit copy for approvalMode-only callers. Result: caller passes {approvalMode: 'plan'} → output has no approvalMode at all. The in-tree convertAgentFiles path recovered via the unowned-key passthrough, but exported direct-call surface needs the explicit copy. Two new tests pin both the standalone case and the precedence-when-both case. Also: pinned yaml-parser limitation tests (added in the round-3 e2e independent check) document that the lightweight parser cannot round-trip nested mcpServers/hooks; documentation + types comments updated to call out the carve-out so users know the field works only for flat forms until js-yaml lands. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * refactor(core): shrink declarative agents PR to vertically-sliced v1 Reduces this PR to ship only the fields that have an end-to-end runtime path today: permissionMode (bridges to existing approvalMode), maxTurns (wires into runConfig.max_turns), and a tightened color allowlist. Everything else gets deferred to follow-up PRs that first land the prerequisite infra they need. Removed: - effort (no model-layer effort param in qwen providers) - mcpServers / hooks (need nested-aware YAML parser — yaml-parser.ts is hand-rolled and only handles 1 level) - memory (qwen's auto-memory has no user/project/local scope distinction) - isolation (workflow PR QwenLM#4732 owns the runtime) - initialPrompt (needs --agent CLI flag, no main-session-agent infra) - skills (needs SkillManager consumption of config.skills) This drops 7 fields from SubagentConfig + types.ts, deletes their parser / serializer / test code, deletes the unused enum constants and helpers from agent-frontmatter-schema.ts (EFFORT_VALUES, EFFORT_ALIASES, MEMORY_VALUES, ISOLATION_VALUES, parseEffort, isMemory, isIsolation), and trims the user docs to the supported field set with a pointer to the design doc for the deferred fields. Why ship the design doc alongside such a narrow v1: the full reverse-engineering record in docs/declarative-agents-port.md (DL7/Ig5/GN/_Y constants, error messages, schema parity decisions, coordination matrix with workflow PR QwenLM#4732) stays load-bearing for the follow-up PRs. A status table at the top discloses what's deferred and why. Net: -559 LOC across 7 files. Final PR is permissionMode bridge + maxTurns wiring + color allowlist + design reference doc + the yaml-parser nested-limitation pin tests added during round-3 review. Refs QwenLM#4821 QwenLM#4721 QwenLM#4732 * refactor(core): revert round-X residuals to hit v1's actual minimum LOC The previous shrink left round-1/2/3 fixes in place that were originally for the wider 9-field scope; with the carry-only fields gone they're no longer needed. Reverting them all back to pre-PR behaviour: - approvalMode strict throw (round-1 demoted to lenient; defer the symmetry fix to a separate PR if wanted) - runConfig.max_turns prune on serialize (round-3 cleanup; defer) - color: 'auto' normalised to undefined (round-2 round-trip fix; defer) - claude-converter NESTED_FIELDS_NOT_ROUND_TRIPPABLE + CONVERTER_OWNED_KEYS + passthrough loop (round-1/2 — all for the deleted carry-only fields) - ClaudeAgentConfig.approvalMode + precedence gating (round-3) - parseBackground in shared schema module (the dedup wasn't pulling weight) - parseStringOrArray in shared schema module (same) The shared bridge claudePermissionModeToApprovalMode + the disambiguated name stay — converter and loader use the same map and the name collision risk is real. Net: -336 LOC vs the prior shrink commit. * docs: sync user-facing docs to actual v1 behaviour The previous shrink left two stale claims in docs/users/features/sub-agents.md that referenced behaviour reverted in the round-X-revert commit: - maxTurns row claimed 'the legacy nested value is pruned from the on-disk file on save to avoid two sources of truth' — the prune was reverted - color row claimed 'auto is accepted on read and normalised to undefined for round-trip parity' — the normalisation was reverted; auto is preserved as-is for backward compat Updated both rows to match what the parser actually does now. The design doc (docs/declarative-agents-port.md) is unchanged: its top-line disclaimer already labels the body as reference material for follow-up PRs, so the references to the wider plan (D7 bridge, P1-P4 phases) are accurate-as-reference even though some details are deferred. * fix(core): use Map.get in claudePermissionModeToApprovalMode (prototype-chain footgun) The PERMISSION_MODE_TO_APPROVAL_MODE table was a plain object accessed with `[key]`. Calling claudePermissionModeToApprovalMode('__proto__') walked the prototype chain and returned Object.prototype — a non-string value that violated the declared return type. The current loader path was safe because isPermissionMode() filtered prototype-key strings before the bridge ran, but the exported function itself was a latent footgun for any future caller that didn't know to pre-filter. Switched the lookup table to a Map so prototype keys cannot be reached. Added explicit tests for __proto__, constructor, hasOwnProperty, toString. Refs QwenLM#4821 * fix(core): address round-2 review on declarative agents PR QwenLM#4842 Two findings: 1. `serializeSubagent` wrote both `permissionMode` and the bridge-derived `approvalMode` into the same frontmatter block. On the next load the parser takes `approvalMode` (explicit wins over bridge), so `permissionMode` silently became dead frontmatter — a user editing it later in the file would be ignored. Skip the `permissionMode` emit when `approvalMode` is also being written; `permissionMode` still round-trips when it's the user's only intent. Two new tests pin both branches. 2. `subagents/index.ts` had a NOTE comment referencing `EFFORT_VALUES` and `parseEffort` as example schema helpers that intentionally aren't re-exported. Those symbols were removed during the v1 scope shrink (637b8b7) and don't exist in the current module. Updated the comment to reference `claudePermissionModeToApprovalMode`, `parseMaxTurns`, and `isPermissionMode`, which are the actual current contents. Sibling-drift note: the same dual-emit pattern exists for `maxTurns` vs `runConfig.max_turns` — round-X revert (70a876d) explicitly deferred the `runConfig.max_turns` prune. Not addressed here per that earlier decision. Refs QwenLM#4842
* fix(core): microcompact hook continuations * test(core): cover hook microcompaction edge cases * test(core): cover hook checkpoint fallback * chore(core): refresh hook microcompaction checks * fix(core): keep hook microcompaction cleanup best-effort * test(core): cover hook microcompaction checkpoint state * test(core): cover hook microcompaction review cases * fix(core): retain hook checkpoint after microcompaction failure * fix(core): clarify hook microcompaction comment * fix(core): preserve hook checkpoint on cron no-op
…dination (QwenLM#4844) * feat(core): add Agent Team foundation (experimental, flag-gated) First stage of re-porting the Agent Team feature (originally PR QwenLM#2886) onto current main. The branch had diverged 362 commits behind a parallel rewrite of the agent runtime, so the feature is being re-applied stage by stage rather than merged. This stage lands the self-contained agents/team/ subsystem (TeamManager, mailbox, identity, tasks, leader permission bridge, test-utils) plus the team_create/team_delete and task_create/task_update/task_list tools, with the additive plumbing they need: - Config: TeamManager/TeamContext accessors, cleanupTeamRuntime, and isAgentTeamEnabled (settings or QWEN_CODE_ENABLE_AGENT_TEAM=1). - Tool registry: team/task tools registered lazily, gated on the flag. - Runtime hooks: completeOnIdle for one-shot teammates; args on the agent approval event; teammate-aware tool exclusion sets. - Backend types: TeamAgentHandle, optional getAgent, completeOnIdle. - New experimental.agentTeam setting. Everything is gated behind the experimental flag and inert by default. Build is green; team unit tests and all touched-file regressions pass. * feat(core): add send_message team routing (experimental) Stage 1 of the Agent Team re-port. Extends the send_message tool so it can route to a teammate (or "*" for broadcast) via TeamManager in addition to its existing background-task path, and supports the shutdown_request control message (leader-only). Recipient selection is a oneOf over `to`/`task_id`. Layered on top of main's classifier integration: send_message keeps its 'ask' default permission and forwards the routing fields + message to the AUTO classifier, since the message is an instruction the recipient executes. Re-adds the team-lifecycle E2E test, which now passes end to end (create -> tasks -> messages -> list -> update -> delete). * feat(core): let the Agent tool spawn named teammates (experimental) Stage 3 of the Agent Team re-port. Adds the `name` parameter to the Agent tool: when a team is active and a name is given, the call routes through TeamManager.spawnTeammate instead of launching a one-shot subagent. Without a team the call is rejected up front rather than silently falling back. The tool description advertises team coordination only when the experimental flag is on. Ported onto main's rewritten Agent tool. * feat(cli): render team_result/task_list tool displays (experimental) Stage 5 (partial) of the Agent Team re-port. Teaches the ToolMessage result renderer about the TeamResultDisplay and TaskListResultDisplay shapes so the team/task tools' output is shown via their returnDisplay text instead of a stringified object, and adds a JSON.stringify safeguard for any other non-string display object. The remaining CLI wiring (nonInteractiveCli + useGeminiStream team drivers, permissionController.handleTeammateApproval) is coupled to the turn-loop Teammate handling and will land together with Stage 4. * feat(core): treat teammate messages as top-level turns (experimental) Stage 4 of the Agent Team re-port. Adds SendMessageType.Teammate and includes it in isTopLevelInteraction so that a teammate message delivered to the leader resets the loop detector and opens an interaction span, the same as a user/cron/notification turn. Per the agreed minimal integration, teammate turns deliberately do NOT run the UserQuery/Cron block — they don't bump commit attribution, aren't recorded as user messages, and don't trigger auto-memory prefetch. That keeps the edit to main's restructured turn loop to a single condition. * feat(cli): drive teammates from the headless run loop (experimental) Stage 5 of the Agent Team re-port. Wires the non-interactive/headless run loop to the active team: it subscribes to TeamManager changes, drains teammate messages into the leader's conversation as SendMessageType.Teammate turns, waits for teammate activity when the leader has no pending tool calls, and routes teammate tool-approval requests through the session's permission channel (SDK in stream-json mode; YOLO/cancel fallback otherwise). Adds PermissionController.handleTeammateApproval and exposes it on the ControlService permission facade. Ported onto main's restructured run loop (which added its own cron/notification drain mechanism). * feat(cli): drive teammates from the interactive turn loop (experimental) Final Stage 5 piece of the Agent Team re-port. Wires the interactive (TUI) useGeminiStream hook to the active team: it subscribes to TeamManager, queues teammate messages, and drains them into the conversation as SendMessageType.Teammate turns when idle, guarded against racing the notification drain. Treats teammate turns like user/cron for image-format checks and new-prompt stats. Ported onto main's rewritten hook. * fix(core): declare proper-lockfile dependency for the team subsystem The Agent Team mailbox and task files import proper-lockfile, but the dependency was never declared in package.json, so a clean `npm ci` (as CI runs) failed to resolve the module — cascading into implicit-any and possibly-undefined errors in the same files. It built locally only because the working tree's node_modules already had the package from an earlier install. Adds proper-lockfile to packages/core dependencies and @types/proper-lockfile to root devDependencies (matching the original feature branch), and regenerates the lockfile. Build and typecheck are clean. * fix(team): address review findings on the agent-team subsystem (experimental) Triaged the unresolved review threads from the superseded PR QwenLM#2886 against the re-ported code and applied the valid fixes: - task_update(status:'deleted') now enforces the same ownership guard as updateTask, so a teammate cannot delete another teammate's task. - Completing a task and adding a blocks edge in the same call no longer leaves the dependent permanently blocked by the just-completed task. - listTasks treats a momentarily-empty (mid-create) task file as a create in flight and skips it instead of quarantining and losing the task. - Fire-and-forget coordination calls (flush, auto-claim, unassign, poll) log rejections instead of surfacing as unhandled rejections. - pollLeaderInbox re-checks the leader callback after the awaited read so a detach during the read cannot throw or drop the batch. - scanIdleAgentsForTasks skips teammates with a pending shutdown. - broadcast uses allSettled so one terminated recipient does not fail the whole broadcast. - Hybrid tool-response+teammate turns reset the loop detector, preventing a false LoopDetected when a polling leader merges teammate messages. - useGeminiStream drains its teammate queue on a manager swap; the join event carries the teammate model for the UI tab label. - Removed dead consumeUnreadByType and an unreachable ENOENT branch. Verified: core unit tests (incl. new regressions for the delete guard, the complete+addBlocks re-block, and the empty-file create race) plus live L3 (3-agent) and L4 (4-agent) E2E, both clean. * fix(core): close ownership TOCTOU and lock-ordering hazard in agent-team tasks deleteTask checked ownership against a pre-lock read, so a concurrent claimTask/updateTask could reassign the owner between the check and the unlink — silently destroying another teammate's task. Acquire the lock first, then re-read and re-check ownership inside it before unlinking, mirroring updateTask. Reciprocal edge cleanup now runs after the lock is released (never holding two per-task locks at once) but before the single tasks-updated notification, so no listener observes a phantom blocker. blockTask issued its two updateTask writes via Promise.all; two calls over the same pair in opposite directions could deadlock on per-task locks. Serialize the writes to remove the lock-ordering hazard. * fix(core): harden agent-team message handling and auto-claim - Cap per-agent pending messages (MAX_PENDING_MESSAGES). The queue only drains when its recipient goes IDLE, so an unbounded queue let a single looping teammate balloon a busy teammate's memory; sendMessage now applies backpressure once the cap is reached. - Wrap auto-claimed task content (subject/description, authored by another agent) in a <task_content> envelope with a defensive instruction so it is treated as data, not as instructions to obey. - Surface fire-and-forget coordination failures (flush, auto-claim, unassign) to the leader's conversation. They were only logged via a namespaced debug logger, i.e. invisible in production, despite mapping to silent stuck-teammate / stuck-task symptoms. * fix(core): require approval for agent-team task_create/task_update A task's subject/description becomes the prompt an idle teammate auto-claims and executes with full tool access — the same privileged-sink shape as send_message. Both tools inherited the base default 'allow', which short-circuits the classifier in AUTO mode. Override getDefaultPermission to 'ask' so that injection path stays under the classifier / human-in-the-loop, matching send_message. * docs(core): correct completeOnIdle JSDoc for team teammates The JSDoc cited team teammates as the use-case for completeOnIdle:true, but teammates set it to false so they settle to IDLE (not COMPLETED) and stay alive for follow-up messages and auto-claim. Document the actual semantics and the invariant the leader's wait loop relies on. * fix(core): harden agent-team leader callback and task envelope - fireAndForget: wrap leaderMessageCallback in try/catch so a throwing callback cannot re-introduce the unhandled rejection the wrapper exists to prevent (enforces the documented 'must not throw from this catch'). - tryAutoClaimTask: nonce-tag the <task_content> envelope with the per-session envelopeNonce (same pattern as formatLeaderEnvelope) so a teammate-authored description cannot forge the closing tag and break out of the protected zone via a </task_content> payload. * fix(core): make deleteTask edge cleanup resilient to partial failure Use Promise.allSettled (was Promise.all) for post-unlink edge cleanup so a single failing dependent (corrupt JSON, EACCES, lock exhaustion) no longer skips notifyTasksUpdated for the dependents that were cleaned. Without this their blockedBy is cleared but scanIdleAgentsForTasks never re-runs, leaving them stuck idle with no recovery (the task file is already unlinked, so a retry returns false). Per-failure warnings are logged. * fix(cli): mount useTeamInProcess so teammate tabs render The hook bridging team TEAMMATE_JOINED events to agent-tab registration (useTeamInProcess) was authored but never mounted in AgentViewProvider — only useArenaInProcess was. As a result teammate tabs never registered and the teammate tab bar never appeared during in-process team runs. Mount useTeamInProcess alongside useArenaInProcess, and label teammate tabs by name rather than model (teammates inherit the leader's model, so a model label collapses to a generic "teammate" and is identical across the team). Add a regression test asserting the provider mounts the team bridge. * test(terminal-capture): add agent-team feature demo + capture fixes Add a standalone streaming demo of the agent-team feature that captures the full lifecycle and the teammate tab navigation into a single GIF (scenarios/agent-team-demo.ts). Supporting engine fixes: - capture(): scroll the xterm viewport to the live bottom before screenshotting, so a capture taken after an idle period shows the current state instead of stale top-of-buffer scrollback. - scenario-runner: skip scenarios/*.ts files with no default export (driver scripts that guard their own entrypoint), so batch runs don't choke. * fix(core): serialize in-process mailbox writers to fix Windows lock flakiness The concurrent-write test fired 10 writeMessage() calls at one inbox, each contending for the same proper-lockfile lock with a fixed, non-randomized backoff. On Windows, slower fs syscalls let the tail writers exhaust the retry budget before winning the lock, throwing ELOCKED ("Lock file is already being held") — a flaky failure that alternated pass/fail across CI runs. Add a per-inbox in-process Mutex (async-mutex, the pattern already used in jsonl-utils and writeContextFile) so same-process writers serialize in memory and only one reaches for the file lock at a time. The proper-lockfile lock stays inside the mutex to preserve cross-process safety between agent processes. Also randomize the lock backoff to de-synchronize genuine cross-process contenders. * feat(team): render teammate reports as a compact notification line A teammate's report was injected into the leader's conversation as a raw <teammate_message_<nonce>> envelope and rendered verbatim as a user bubble — a large, scaffolding-heavy block on screen for what is often the biggest payload in the feature. Adopt the two-text split the notification queue already uses: the full nonce-tagged envelope still goes to the leader's model, but the user now sees a compact "● <name> reported back" line in its place. The verbatim USER bubble is suppressed for SendMessageType.Teammate exactly as it is for Cron, and coordination-error notices get the same treatment. The leader callback now delivers both the model text and a display string built in TeamManager (where the structured sender/summary live), so the UI never parses the envelope. Headless is unchanged — it ignores the extra arg. * fix(terminal-capture): widen agent-team-demo Phase C budget so the GIF doesn't cut off The leader sits idle (no Main-view output) while teammates read their files, so Phase C captures no frames until a report lands — making maxPolls the real wall-clock budget. At 80 polls (~112s) a slow second scout could exhaust it before reporting, ending the GIF mid-run. Bump to 200 polls (~5min) so the capture outlasts the slowest scout plus the combined summary and delete; the loop still exits early on `deleted` and idle polls capture no frames, so the GIF doesn't bloat. * fix(core): separate task-content nonce; forward send_message summary Address two review findings on the agent-team messaging path: - The <task_content_…> envelope reused envelopeNonce — the per-session nonce the leader trusts to authenticate <teammate_message_…> blocks. Because the task-content prompt is delivered to the claiming teammate, a teammate could learn the nonce and forge a leader-trusted envelope. Use a dedicated taskContentNonce so the leader-trust nonce stays secret from teammates. - The SendMessage 'summary' param was dropped between the tool and the mailbox, so the leader UI always showed the '{name} reported back' fallback. Thread summary through sendMessage → writeMessage so it reaches formatLeaderDisplay. Adds regression tests for both. * fix(core,cli): harden agent-team messaging per review round 4 - task-content envelope uses a fresh per-claim nonce instead of a shared per-session one, so a teammate that learns one task's nonce can't forge a later task's closing tag to inject the next claimant. - team_delete wraps manager.cleanup() in try/catch and always resets the Config team state, so a cleanup failure no longer permanently wedges team_create for the rest of the session. - unassignTeammateTasks uses Promise.allSettled so one corrupt/locked task file no longer strands the remaining tasks on a terminated teammate; the caller's re-scan still fires. - non-interactive teammate-approval responses .catch() rejections to avoid an unhandledRejection if the teammate terminates mid-approval. - setupEventBridge warns when the backend can't provide an agent handle or event emitter instead of returning silently. * fix(core): don't let a failed dependent unblock abort task completion unblockDependents used Promise.all, so a single dependent failing (corrupt JSON, EACCES, lock exhaustion) rejected out of updateTask before the completed status was persisted — the task stayed in_progress on disk while already-processed dependents were unblocked, leaving the dependency graph inconsistent. Switch to Promise.allSettled with a debug warning per failure, mirroring the best-effort edge cleanup in deleteTask and unassignTeammateTasks. * fix(core): quarantine corrupt teammate inboxes; skip task scan when no agent is idle Review round 7. A corrupt teammate inbox previously made every writeMessage/consumeUnread re-throw on the same file, so the teammate could never receive another message (including shutdown requests) — while the leader inbox already self-healed via quarantine. readInboxRaw now renames the corrupt file to .corrupt-{ts} and continues on a fresh inbox; the leader-side offset clamps to 0 if the inbox shrank behind the poller so messages are re-surfaced rather than silently skipped. scanIdleAgentsForTasks now checks for idle members before reading the task board, avoiding a full tasks-directory scan on every task update while all agents are busy. Also document the restrictsOwnership field enumeration hazard and the intentional metadata/activeForm exclusion. * fix(core): re-check task ownership under the lock when unassigning a terminated teammate Review round 8. unassignTeammateTasks snapshotted in_progress tasks and then blind-wrote {status: pending, owner: null} per task, so a leader reassignment (or the dying teammate's final completion) landing between the snapshot and the per-task lock was silently reverted. Releases now go through an in-lock compare-and-set that skips the task when its owner or status no longer matches the snapshot. Also isolate task-update listeners (one throwing listener no longer starves the rest) and drop the lone const enum for subsystem consistency. * fix(core): harden team task file layer against partial writes and transient I/O - createTask claims the ID with an empty O_EXCL placeholder and fills it via temp-file + rename, so concurrent readers never see partial JSON (which the quarantine would have destroyed mid-create) - listTasks quarantines only on parse failures; transient read errors (EMFILE/EIO/EACCES) skip the file for one round instead of renaming a healthy task away, and the read fan-out is capped at 16 - updateTask / claimTask / releaseOwnedTask guard the in-lock readFile against ENOENT (resetTaskList and the quarantine rename run without per-task locks), mirroring deleteTask - cover releaseOwnedTask's three defensive branches with tests * fix(core): drain messages enqueued during the IDLE transition; settle abort on idle agents - a message enqueued from inside the synchronous IDLE STATUS_CHANGE emit (TeamManager's flush) landed after the run loop's final empty check while `processing` was still true — enqueueMessage would not restart the loop and the message stranded in a dead queue; the loop now re-checks the queue after `processing` flips false - abort() on an idle/initializing agent only set the signal: no loop was running to observe it, so the agent never reached a terminal status and allTeammatesTerminated()-style gates never fired; abort now settles CANCELLED directly when no loop is in flight - regression tests drive the real AgentInteractive (stub model, real loop) through send-during-idle-emit and abort-while-idle * test(core): align FakeAgent queue and abort semantics with AgentInteractive FakeAgent modeled a friendlier runtime than the one that ships: enqueueMessage processed inline (no queue, no processing flag, resurrecting terminal agents via unconditional RUNNING), which is exactly what masked the flush-into-dead-queue bug. It now queues while a round is in flight, drains before settling IDLE, drops messages after abort()/shutdown() like the real drained queue, and never resurrects a terminal agent. * fix(core): surface spawn failures, handle shutdown_rejected, envelope peer messages - spawnTeammate now checks the agent's status after spawnAgent resolves: start() reports chat-creation failure via FAILED without throwing, so the leader was told the teammate joined while sends were accepted into a queue that could never flush; a failed spawn now rolls back and surfaces the reason (with a terminal-status replay in setupEventBridge for the attach race) - shutdown_rejected now clears _shutdownPending: a teammate that declined once stayed excluded from auto-claim and kill-armed on any later "shutdown_approved" mention - peer-to-peer deliveries get a fresh-nonce envelope like leader deliveries, closing inline leader impersonation between teammates (deliberately not the leader-trust nonce, which must never reach teammate context) * fix(core): exclude workflow tool from teammates The teammate ALS identity propagates into anything a teammate spawns, so prepareTools() keeps choosing the teammate exclusion set for nested agents — without WORKFLOW in it, a teammate-launched workflow re-arms the O(k^n) recursive fan-out the subagent exclusion set prevents. * fix(core): make task tools visible to permission review; reject dependency cycles - task_create / task_update now project their content (subject, description, status, owner, edges) to the AUTO classifier — the base '' sentinel projected to an empty object, so the classifier ruled on task_create({}) and the 'ask' override was blind; the interactive confirmation now shows the description (truncated), since that text is what a claiming teammate executes - task_update rejects self-edges and dependency cycles instead of silently persisting a graph that auto-claim can never unblock - regression tests pin the 'ask' default and a non-empty classifier projection for both tools * fix(core): reclaim stale teams on team_create instead of wedging the name Nothing deletes team dirs on normal exit (only an explicit team_delete does), so every Ctrl+C, completed headless run, or crash permanently wedged the team name behind createTeamFile's wx-exclusive create, with manual rm -rf as the default recovery. team_create now records the owner identity (leadSessionId + leadPid) and, on EEXIST, reclaims the team when the recorded lead process is gone (or is this process); only a live concurrent owner keeps the name refused. * fix(cli): pass teammate envelopes straight to the model, skipping shell/@/slash preprocessing Teammate envelopes are model-authored text already rendered as a notification line by the teammate drain, but they still flowed through the user-input preprocessing: with shell mode active a teammate report was EXECUTED as a shell command, and a leading / or an @path was reinterpreted against the leader's session. They now early-return like Notification. * fix(core): exclude Teammate from UserPromptSubmit hooks and record it in chat history Teammate envelopes are machine-driven re-entries like Cron and Notification: user-authored UserPromptSubmit hooks must not fire on (or block) internal coordination traffic. They also never reached any chat-recording path — record them like notifications so a resumed session restores the same compact info line the live UI rendered. * fix(cli): stop teammate-approval rejections from escaping as unhandled rejections The stream-json listener voided handleTeammateApproval's promise while the handler's own error path re-issues a respond() that can reject (teammate terminated mid-request) — an unhandledRejection that can take down an SDK session. The call site now catches like its headless siblings, and the controller's catch-path respond(Cancel) is wrapped so the method never rejects out of its own error path. * test(core): add getSessionId to team-lifecycle mock config
… servers - Add isModelUnloadedError() matching 'model is unloaded' pattern - Add delayWithAbortCheck() with jittered delay and abort signal support - Add retry logic in executeWithErrorHandling catch block for both streaming and non-streaming paths - Create fresh RequestContext for each retry (clean parser state) - Apply redactProxyError to all error paths (merges safely with main) - Include runtimeDiagnostics.recordOpenAIWireRequest in retry paths - Add console.warn for model-unloaded retries (production visibility) - Re-throw StreamContentError in processStreamWithLogging for higher-level TPM throttling retry
f3e4971 to
b197df5
Compare
Comment on lines
+467
to
+634
| // Verify each pre-checked level's file is actually gone via | ||
| // `fs.access`; only fan out the event for confirmed removals. | ||
| // If at least one level still has its file, return 500 with | ||
| // the residual list so callers can act. | ||
| const removed: SubagentConfig[] = []; | ||
| const remaining: SubagentConfig[] = []; | ||
| for (const found of existingAtLevels) { | ||
| if (!found.filePath) { | ||
| // Synthetic / no-file entries (impossible at project / | ||
| // user levels, defensive guard) treat as "no verification | ||
| // possible" → assume removed to match legacy behavior. | ||
| removed.push(found); | ||
| continue; | ||
| } | ||
| try { | ||
| await fs.access(found.filePath); | ||
| // Still present → unlink failed silently. | ||
| remaining.push(found); | ||
| } catch { | ||
| // Any access error (typically ENOENT) means the file is | ||
| // gone — count as successfully removed. | ||
| removed.push(found); | ||
| } | ||
| } | ||
|
|
||
| if (remaining.length > 0) { | ||
| writeStderrLine( | ||
| `qwen serve: DELETE /workspace/agents/${safeLogValue(agentType)} partial — ` + | ||
| `removed=${removed.map((r) => r.level).join(',') || 'none'} ` + | ||
| `remaining=${remaining | ||
| .map((r) => `${r.level}:${r.filePath}`) | ||
| .join(',')}`, | ||
| ); | ||
| // Still publish events for files we DID remove so subscribers | ||
| // get partial-success signals — but emit them BEFORE the 500 | ||
| // so a client reading the response can correlate. | ||
| for (const found of removed) { | ||
| const evtLevel: 'project' | 'user' = | ||
| found.level === 'project' ? 'project' : 'user'; | ||
| deps.bridge.publishWorkspaceEvent({ | ||
| type: 'agent_changed', | ||
| data: { | ||
| change: 'deleted', | ||
| name: found.name, | ||
| level: evtLevel, | ||
| }, | ||
| ...(originatorClientId ? { originatorClientId } : {}), | ||
| }); | ||
| } | ||
| res.status(500).json({ | ||
| error: | ||
| `Failed to delete every level of subagent "${agentType}" — ` + | ||
| `${remaining.length} level(s) still have their file on disk`, | ||
| code: 'agent_delete_partial', | ||
| name: agentType, | ||
| removedLevels: removed.map((r) => r.level), | ||
| remainingLevels: remaining.map((r) => r.level), | ||
| }); | ||
| return; | ||
| } | ||
|
|
||
| // Emit one event per level that was deleted so subscribers using | ||
| // event metadata for toasts/audit/echo-suppression see the | ||
| // complete picture. Without this split, an unscoped DELETE that | ||
| // removed both project AND user shadows would publish only one | ||
| // event with one level — misleading the receiver about which | ||
| // file(s) actually went away. | ||
| if (existingAtLevels.length === 0) { | ||
| // `deleteSubagent` succeeded with no pre-checked level — could | ||
| // happen if a file landed between the loadSubagent check and | ||
| // the unlink. Emit a single best-effort event with the level | ||
| // hint we know. | ||
| const fallbackLevel: 'project' | 'user' = | ||
| scopedLevel === 'user' ? 'user' : 'project'; | ||
| deps.bridge.publishWorkspaceEvent({ | ||
| type: 'agent_changed', | ||
| data: { | ||
| change: 'deleted', | ||
| name: agentType, | ||
| level: fallbackLevel, | ||
| }, | ||
| ...(originatorClientId ? { originatorClientId } : {}), | ||
| }); | ||
| } else { | ||
| for (const found of removed) { | ||
| const evtLevel: 'project' | 'user' = | ||
| found.level === 'project' ? 'project' : 'user'; | ||
| deps.bridge.publishWorkspaceEvent({ | ||
| type: 'agent_changed', | ||
| data: { | ||
| change: 'deleted', | ||
| name: found.name, | ||
| level: evtLevel, | ||
| }, | ||
| ...(originatorClientId ? { originatorClientId } : {}), | ||
| }); | ||
| } | ||
| } | ||
| res.status(204).end(); | ||
| }, |
Comment on lines
+20
to
+22
| promptId?: string; | ||
| sessionId?: string; | ||
| } |
Comment on lines
+495
to
+499
| spawn('cmd.exe', ['/c', scriptPath], { | ||
| detached: true, | ||
| stdio: 'ignore', | ||
| windowsHide: true, | ||
| }).unref(); |
| case 'emacs': | ||
| return { | ||
| command: executable, | ||
| args: [filePath], |
| case 'zed': { | ||
| return { | ||
| command: executable, | ||
| args: [filePath, '--wait'], |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes QwenLM#3802
Local model servers like LM Studio support Just-In-Time (JIT) model loading — they load the model into memory when they receive the actual chat completion request. If the model is not currently loaded, the server returns an error (e.g. "Model is unloaded") instead of loading it on demand.
The ContentGenerationPipeline now detects model-unloaded errors and retries the request once before surfacing the error to the user. This gives the server a second chance to load the model.
Changes
isModelUnloadedError()to detect JIT-loading error patterns (model is unloaded,model unloaded). Avoids matching permanent errors like "model not found" which indicate misconfiguration.executeWithErrorHandlingfor both streaming and non-streaming pathswrapStreamWithRetry()to catch iteration-time model-unloaded errors during streaming (e.g.error_finishSSE chunks), matching non-streaming retry behaviorwarnon retry,infoon success)processStreamWithLoggingno longer callshandleErrorbefore re-throwing — letswrapStreamWithRetrydecide whether to retry or propagateTests
All 41 tests pass. Added 9 new test cases:
Summary by cubic
Retries chat requests once on “model is unloaded” from local model servers and ships broad reliability and UX upgrades across the runtime, daemon, and install flows.
New Features
--approval-mode auto) that auto‑approves safe calls and falls back to manual on risk; exposed via daemon API and SDK; IDE/TUI updated./demodebug page. ACP bridge split into@qwen-code/acp-bridgewith typed options/status seams.OTEL_RESOURCE_ATTRIBUTES/settings, and opt‑intelemetry.metrics.includeSessionId.notebook_edit), per‑turn/diffdialog, external editor preference for Ctrl+X, active goal stream events, worktree Phase C (session persistence, hooksPath, footer indicator, resume hints).Bug Fixes
tool_usebefore a drop, repair/hoist danglingtool_useon resume/retry, dedupe late results, and roll back partials on retried attempts so history stays valid.structuredClonewith shallow copies to prevent OOM, make memory recall async with safe aborts, and dedupe Gemini recovery continuations (no repeated tables/thoughts, correct thought/text order).Written for commit b197df5. Summary will update on new commits.