feat(config): support QWEN_CODE_API_TIMEOUT_MS override#1
Conversation
Adds support for QWEN_CODE_API_TIMEOUT_MS as an environment override for model generation timeout. Qwen Code already supports timeout configuration via: settings.model.generationConfig.timeout This change introduces an env-based override for users running slow local/OpenAI-compatible backends where editing config is less convenient. Precedence: modelProvider > env var > settings > default (120000ms) Behavior: - Valid positive env values override configured timeout - Invalid values are ignored - Default behavior remains unchanged (applied in buildClient()) Note: The 5-minute timeout reported in QwenLM#1045 originally came from undici's default bodyTimeout, which is now disabled (bodyTimeout:0). The modelConfigResolver default is 120000ms (2 minutes). Includes unit tests covering precedence and validation. Closes QwenLM#1045 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
E2E Test Failures — Pre-existing, UnrelatedThe E2E failures (Linux + macOS) are in This is unrelated to this PR, which only touches:
All 30 unit tests for the changed files pass locally: The E2E failure is a pre-existing issue in the |
E2E Failure — Pre-existing on MainVerified that the E2E test failure exists on This means the E2E failure is not introduced by this PR. This PR only touches:
The E2E failure is in PR status:
|
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/models/modelConfigResolver.ts">
<violation number="1" location="packages/core/src/models/modelConfigResolver.ts:347">
P2: The env-timeout override block (~12 lines) is duplicated verbatim between `resolveModelConfig` and `resolveQwenOAuthConfig`. Extract it into a shared helper (e.g., `applyEnvTimeoutOverride(env, modelProvider, generationConfig, sources)`) to keep the parsing/validation logic in one place.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
|
|
||
| // ---- Env override: QWEN_CODE_API_TIMEOUT_MS ---- | ||
| // Precedence: modelProvider > env > settings > default | ||
| const modelProviderSetTimeoutOAuth = |
There was a problem hiding this comment.
P2: The env-timeout override block (~12 lines) is duplicated verbatim between resolveModelConfig and resolveQwenOAuthConfig. Extract it into a shared helper (e.g., applyEnvTimeoutOverride(env, modelProvider, generationConfig, sources)) to keep the parsing/validation logic in one place.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/src/models/modelConfigResolver.ts, line 347:
<comment>The env-timeout override block (~12 lines) is duplicated verbatim between `resolveModelConfig` and `resolveQwenOAuthConfig`. Extract it into a shared helper (e.g., `applyEnvTimeoutOverride(env, modelProvider, generationConfig, sources)`) to keep the parsing/validation logic in one place.</comment>
<file context>
@@ -343,6 +342,24 @@ function resolveQwenOAuthConfig(
+ // ---- Env override: QWEN_CODE_API_TIMEOUT_MS ----
+ // Precedence: modelProvider > env > settings > default
+ const modelProviderSetTimeoutOAuth =
+ modelProvider?.generationConfig?.timeout !== undefined;
+ if (!modelProviderSetTimeoutOAuth) {
</file context>
There was a problem hiding this comment.
The diagnostics are stale — they don't apply to the current cleaned-up branch state. Here's the verification:
Diagnostic 1: Cannot find module '../../utils/costCalculator.js'
- costCalculator.ts exists at packages/cli/src/utils/costCalculator.ts ✓
- TypeScript compilation: no errors ✓
- Lint: passes ✓
- All 14 tests pass ✓
Diagnostic 2: Property 'modelPricing' does not exist on type 'InferSettings'
- modelPricing is defined in settingsSchema.ts:958 ✓
- TypeScript compilation: no errors for ModelStatsDisplay.tsx ✓
- VS Code schema also includes it at line 377 ✓
Violation: modelConfigResolver.ts:347 duplicated timeout code
- This does NOT apply to this branch — timeout code was removed
- Verified: git show HEAD:packages/core/src/models/modelConfigResolver.ts | grep "QWEN_CODE_API_TIMEOUT_MS" returns nothing
Current PR QwenLM#3631 state (verified clean):
- 3 commits (cost-estimation only)
- 8 files changed, 656 additions, 3 deletions
- Force-pushed and PR description updated
- Tests pass, lint passes, TypeScript clean
No fixes needed — the reported issues were from the pre-cleanup state.
There was a problem hiding this comment.
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
79eae30 to
005d822
Compare
Covers: large timeout values, whitespace-padded env values, negative env values, and reinforces provider > env > settings precedence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
005d822 to
0a5b96d
Compare
Adds support for QWEN_CODE_API_TIMEOUT_MS as an environment override for model generation timeout. Closes QwenLM#13 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0a5b96d to
5eb2406
Compare
|
Closing accidental fork-to-fork PR. The branch is used by the upstream PR. |
* feat(core): wire background shells into the task_stop tool Phase B follow-up #1 from QwenLM#3634, unblocked by QwenLM#3471 (control plane) merging in. The model can now cancel a managed background shell with the same `task_stop` tool it uses for subagents — no more falling back to `kill <pid>` via BashTool. Lookup order: subagent registry first (existing behavior), then the background shell registry as a fallback. Agent IDs follow `<subagentName>-<suffix>` and shell IDs follow `bg_<8 hex chars>`, so the two namespaces cannot collide in practice; the order is fixed for determinism (a defensive test pins agent-wins-over-shell). The shell cancel path resolves through the entry's own AbortController (which `BackgroundShellRegistry.cancel` triggers); the child process exit handler then settles the registry to `cancelled` and the on-disk output file is preserved for inspection via `/tasks` or a direct `Read`. This matches Phase B's "registry's own AbortController is the cancellation source of truth" design without needing the in-flight notification framework that subagents use. Tests: 7 task-stop tests (was 4) — added cancel-shell happy path, NOT_RUNNING for already-exited shell, and a defensive agent-takes-precedence-on-id-collision case. * fix(core): defer shell terminal transition until spawn handler settles @doudouOUC noticed that the previous task_stop path called `BackgroundShellRegistry.cancel(id, Date.now())`, which marked the entry `cancelled` immediately. The spawn handler's settle path only records real exit info via cancel/complete/fail when the entry is still `running`, so the cancel-vs-exit race could permanently hide a real completed/failed result and `/tasks` would show a terminal endTime while the process was still draining. Add a `requestCancel(id)` method to `BackgroundShellRegistry` that triggers the entry's AbortController only; status stays `running` until the settle path observes the abort and records the real terminal state. The immediate-mark `cancel(id, endTime)` is reserved for `abortAll()` / shutdown, where the CLI process is tearing down anyway and there is no settle handler to wait for. Tests updated: - `task-stop.test.ts` cancel-shell happy path now asserts the entry stays `running` with `endTime` undefined post-stop, and the abort signal fires (the settle path's contract, not task_stop's, is the one that flips status). - 3 new `requestCancel` tests in `backgroundShellRegistry.test.ts`: running → abort+still-running, terminal entry no-op, unknown id no-op. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
…rovider (QwenLM#3788) * fix(core): inject thinking blocks for DeepSeek anthropic-compatible provider DeepSeek's anthropic-compatible endpoint (https://api.deepseek.com/anthropic) rejects follow-up requests with HTTP 400 ("The content[].thinking in the thinking mode must be passed back to the API.") whenever a prior assistant turn carrying tool_use omits a thinking block. The model can legitimately return a tool round without thinking text, so qwen-code stored no thought parts and rebuilt the next request with no thinking block, tripping the API's check. Mirroring the existing OpenAI-side fix (QwenLM#3729, QwenLM#3747), the converter now detects DeepSeek by base URL or model name and prepends an empty { type: 'thinking', thinking: '', signature: '' } block to assistant turns missing one. Other anthropic-protocol providers are unaffected. Verified against the live api.deepseek.com/anthropic endpoint: - assistant with tool_use, no thinking → 400 (reproduces QwenLM#3786) - assistant with tool_use, empty thinking injected → 200 OK Refs QwenLM#3786 * fix(core): gate DeepSeek thinking-block injection on thinking mode Address PR review feedback: 1. (Critical) Gate empty-thinking injection on the same per-request condition that emits the top-level `thinking` parameter. The previous implementation injected unconditionally on DeepSeek providers, but `buildThinkingConfig()` may omit `thinking` when reasoning=false or `thinkingConfig.includeThoughts=false` — which is exactly what suggestionGenerator / ArenaManager / forkedAgent do. Shipping thinking blocks without enabling thinking mode is a protocol violation that DeepSeek may reject. Move the option from converter constructor to a per-request `convertGeminiRequestToAnthropic` parameter so the generator can compute the gate correctly. 2. (CodeQL) Replace `baseUrl.includes('api.deepseek.com')` with `new URL(baseUrl).hostname` exact-match. The substring check would accept spoofed hosts like `api.deepseek.com.evil.com`. 3. Document the empty-signature workaround inline. 4. Rename the misleading "redacted_thinking" test case. * fix(core): narrow DeepSeek thinking injection to tool_use turns + subdomain test Address PR review round 2: 1. Narrow injection scope to assistant turns containing tool_use. Live verification against api.deepseek.com/anthropic showed plain-text assistant turns without thinking are accepted unchanged — only tool_use turns trigger the HTTP 400. Injecting on every assistant turn unnecessarily bloats replay history with synthetic blocks the API does not require. Existing thinking blocks on any turn are still preserved untouched. 2. Add test coverage for the subdomain hostname branch (us.api.deepseek.com → matches), addressing the gap noted in review. 3. Update existing negative-case tests (non-deepseek / spoofed / reasoning=false / includeThoughts=false) to use tool_use scenarios so they actually exercise the gating logic instead of trivially passing under the narrowed scope. * docs(core): align DeepSeek thinking-injection comments with narrowed scope Address PR review round 3 (copilot-pull-request-reviewer × 3): comments in three locations still described the constraint as applying to "any prior assistant turn", which was true before commit 8721b41 but no longer matches the implementation. Update the doc comment on isDeepSeekAnthropicProvider and the two test-suite header comments to state the actual narrower contract: the API rejects only tool-use turns that omit thinking blocks; plain-text assistant turns are accepted unchanged. Comment-only change; 58 tests still pass. * fix(core): per-request DeepSeek detection + strip thinking when off Address PR review round 4 (copilot-pull-request-reviewer × 2): 1. Stale provider-detection cache (HCia). The constructor cached isDeepSeekProvider once, but Config.setModel() mutates contentGeneratorConfig.model in place. After a runtime /model switch from a non-DeepSeek model to a DeepSeek one on the same auth config, buildRequest() would keep using the stale flag. Move the detection into buildRequest so each call sees the current model. The detector is cheap (URL parse + string compare). 2. Real thought parts leak through when thinking is disabled (HCib). The previous gate only blocked synthetic injection — but the converter still replayed any existing `thought: true` parts in request.contents as thinking blocks. Code paths that disable thinking against a session whose history was built with thinking on (suggestionGenerator / ArenaManager / forkedAgent) would still emit thinking blocks alongside an absent top-level `thinking` config — the same protocol mismatch the gate was meant to avoid. Add a `stripAssistantThinking` converter option, set in buildRequest to `isDeepSeek && !thinking`. The converter strips thinking and redacted_thinking blocks from assistant messages before message construction completes. Mirror behavior is already proven safe by live verification (DeepSeek currently tolerates either shape, but stripping makes the request body internally consistent and robust to future validation tightening). 3 new tests: - converter strips thinking from assistant turns when option set - generator strips real thought parts when reasoning=false - generator reflects runtime model changes (no stale cache) 61 tests pass; lint + typecheck clean. * fix(core): preserve thinking-only assistant turns instead of emitting empty content Address PR review round 5 (copilot-pull-request-reviewer × 2 — code + test): stripThinkingFromAssistantMessages previously replaced message.content with the filtered array unconditionally. For an assistant turn whose only blocks are thinking/redacted_thinking (e.g. a round cut off by max_tokens before any text or tool_use was emitted), this left `content: []` — which Anthropic API rejects. Dropping the message entirely was considered but would break the required user/assistant alternation. Instead, fall back to leaving the original blocks in place when stripping would empty the message. DeepSeek empirically tolerates the residual `thinking-block + no-thinking-config` shape (verified against api.deepseek.com/anthropic in the V2/X scenarios), so leaving the message untouched is the safer choice than emitting invalid structure. Add regression test for the thinking-only turn shape. 62 tests pass; lint + typecheck clean. * fix(core): validate thinking-block signature, rename option, gate output_config Address PR review round 6 — five substantive items: 1. (Critical) Drop non-compliant thinking blocks lacking a `signature` field and replace them with a synthetic one. A `redacted_thinking` block round-tripped through Gemini Part format becomes `{ text: '', thought: true }` (no thoughtSignature) and converts back to `{ type: 'thinking', thinking: '' }` without `signature` — not spec-compliant. The previous `hasThinking` check accepted these as already-satisfying, leaving non-compliant blocks in the wire message. Tighten the check so they're filtered out and the synthetic injection runs. Live verification: DeepSeek currently tolerates both shapes (lenient), but normalizing is defensively correct against future tightening. 2. Rename converter option `ensureAssistantThinking` → `ensureThinkingOnToolUseTurns`. The new name reflects the actual contract (tool-use turns only, not every assistant turn). 3. Honor `thinkingConfig.includeThoughts: false` in `buildOutputConfig`. Previously a per-request opt-out dropped the top-level `thinking` parameter but still emitted `output_config.effort`, leaking a reasoning-shaped field into side queries that don't want it. 4. Add regression test for mixed text + tool_use assistant turns (common shape: model says something, then calls a tool). 5. Add explicit test for the signature-validation path: an existing compliant thinking block (with signature) is preserved untouched. 64 tests pass; lint + typecheck clean. * fix(core): clean up non-compliant thinking blocks on plain-text turns + assert output_config gating Address PR review round 7 (copilot-pull-request-reviewer × 2): 1. (Hp-x) Round-tripped redacted_thinking blocks were left malformed on assistant turns lacking tool_use. The previous structure only ran the cleanup pass when a tool_use block was present (early return on `!hasToolUse`), so plain-text turns kept the non-compliant `{ type: 'thinking', thinking: '' }` shape. Restructure into two sequential steps: a. Drop non-compliant thinking blocks (no `signature`) on every assistant turn — same fallback that avoids `content: []` if the message is thinking-only. b. Inject the synthetic empty thinking block on tool_use turns that still lack a compliant thinking block after step (a). 2. (Hp-7) The includeThoughts=false test asserted that the top-level `thinking` field is suppressed but didn't cover `output_config`, leaving regressions in the new `buildOutputConfig` gate uncaught. Tighten the assertion to also verify `output_config` is absent. 3. New converter test: cleanup runs on plain-text assistant turns too. 65 tests pass; lint + typecheck clean. * test(core): add explicit redacted_thinking injection-path coverage Address PR review round 8 (QwenLM#30 — copilot reviewer). The converter treats `redacted_thinking` as already satisfying the thinking-block requirement (no synthetic injected), distinguished from a signature-less `thinking` block which is non-compliant and gets dropped/replaced. Existing tests covered the latter path; this adds explicit coverage of the former. processContent doesn't synthesize redacted_thinking from Gemini parts, so the test reaches into the private helper directly. (QwenLM#31 — subdomain hostname coverage — already exists at line 602.) 66 tests pass; lint + typecheck clean. * fix(core): per-request anthropic-beta + normalize thinking-only turns Address PR review round 9 (copilot-pull-request-reviewer × 2): 1. (Hydz) thinking-only assistant turns (e.g. max_tokens cutoff or round-tripped redacted_thinking) hit the cleanup-empties fallback and kept the original non-compliant `{ type: 'thinking', thinking: '' }` block. The fallback now replaces the message with a synthetic empty thinking block (`signature: ''` included), which keeps the message non-empty AND spec-compliant. 2. (Hyd4) `anthropic-beta` was set once at construction from the global `reasoning` config, so requests with per-request `thinkingConfig.includeThoughts=false` still advertised interleaved-thinking / effort even though the body had dropped the matching fields. Move beta computation to a new `buildPerRequestHeaders` that derives the header from the actual `thinking` / `output_config` fields present in the request body, and pass it via `messages.create(..., { headers })`. The wire shape is now internally consistent. Test updates: - Drop the three constructor-time beta assertions; they no longer apply. - Add four per-request header tests covering: both betas present, only interleaved-thinking, reasoning=false (no betas), and per-request includeThoughts=false (no betas). 67 tests pass; lint + typecheck clean. * fix(core): preserve thinking text by normalizing in place + merge user beta flags Address PR review round 10 (copilot-pull-request-reviewer × 2): 1. (H0oF) The previous cleanup filtered out every thinking block missing a `signature` field. But that shape is the normal output from OpenAI/Gemini/agent-runtime generators, which only set `thought: true` without a signature. Users switching providers mid-session would silently lose preserved thinking text on the first DeepSeek request. Change Step 1 to NORMALIZE in place: when a thinking block has no signature, set `signature: ''` rather than dropping the block. The original `thinking` text is preserved; DeepSeek empirically accepts empty signatures so the wire shape stays valid. 2. (H0oL) `buildPerRequestHeaders()` overwrote `customHeaders['anthropic-beta']` whenever the per-request override fired, regressing the customHeaders escape hatch for unrelated Anthropic beta features. Merge the user's flags into the computed list (deduped) so users can stack their own betas alongside interleaved-thinking / effort. Test changes: - Renamed and rewrote "drops non-compliant... plain-text" test to assert in-place normalization that preserves thinking text. - Updated "replaces a non-compliant thinking block" comment + name to describe the normalization (the assertion was already correct because the test happened to use empty thinking text). - The empty-content fallback in Step 1 is no longer reachable under the new logic, so the dedicated thinking-only-turn test now exercises only the strip path (where it remains relevant). - Added 3 customHeaders[anthropic-beta] tests: merge with computed, passthrough when no thinking/effort, dedupe. 70 tests pass; lint + typecheck clean. * docs(core): align thinking-injection comments with normalize semantics + add stream test Address PR review round 11 (copilot-pull-request-reviewer × 3): 1. (H3Iw) Update the `ensureThinkingOnToolUseTurns` option docstring to describe in-place normalization (preserving thinking text by filling in `signature: ''`) instead of the old drop-and-replace semantics. 2. (H3I9) Same update on the `applyEmptyThinkingToToolUseTurns` helper JSDoc — clarify that signature-less thinking blocks are normalized in place (preserving original text), not dropped. Mention the common case of cross-provider history where non-Anthropic generators only set `thought: true`. 3. (H3I3) Add a streaming test asserting that `generateContentStream()` also attaches the per-request `anthropic-beta` header. The previous coverage only exercised `generateContent()`, leaving the streaming path's separate code path (line 144 in anthropicContentGenerator.ts) unverified. 71 tests pass; lint + typecheck clean. * refactor(core): split DeepSeek thinking option in two + add header coexistence test Address PR review round 12 (copilot-pull-request-reviewer × 2): 1. (H6ws) The single `ensureThinkingOnToolUseTurns` option was misleadingly narrow: the implementation also rewrote non-tool-use turns by normalizing malformed thinking blocks. Future callers could enable it expecting only the tool-use behavior. Split into two precisely-named options: - normalizeAssistantThinkingSignature: fill missing `signature` on every assistant `thinking` block (cross-provider history compat). - injectThinkingOnToolUseTurns: prepend synthetic empty thinking on tool_use turns missing one (issue QwenLM#3786 trigger). The generator wires both together for DeepSeek when thinking mode is on; either can be used independently if a future caller needs only one pass. 2. (H6w4) Add a test asserting that the per-request `headers` path coexists correctly with `customHeaders`: User-Agent and unrelated customHeaders entries stay in `defaultHeaders` while only the computed `anthropic-beta` rides on the per-request path. Defends against a future regression where header config might be routed through a code path that wipes the constructor defaults. 72 tests pass; lint + typecheck clean. * fix(core): case-insensitive customHeaders[anthropic-beta] merge Address yiliang114 review feedback (QwenLM#3788). HTTP header names are case-insensitive by spec, and the Anthropic SDK lower-cases them during merge. Previously buildPerRequestHeaders only read the lower-case `anthropic-beta` key from customHeaders, so a user-configured `Anthropic-Beta` or `ANTHROPIC-BETA` would be silently overwritten by the per-request computed value. Replace the direct dict lookup with collectCustomBetaFlags() which walks all customHeaders entries and matches the key case-insensitively. Multiple matching entries (unlikely but possible) are concatenated; the existing dedupe pass handles any duplicates. Add a regression test for both `Anthropic-Beta` and `ANTHROPIC-BETA` key shapes. 73 tests pass; lint + typecheck clean. * docs(core): align thinking-injection docs with normalize-in-place semantics + redacted_thinking strip test Address PR review round 14 (copilot-pull-request-reviewer × 4): 1. (IAl4) PR description still described "dropped here so synthetic injection takes over" but the implementation now normalizes signature-less thinking blocks in place (preserving text). PR description rewritten to describe the two-pass model: normalize-in-place + injection-when-truly-missing. 2. (IAl7) `injectThinkingOnToolUseTurns` option docstring claimed signature-less blocks would be "seen as missing" so the synthetic replaces them. Updated to describe the actual flow: the normalization pass runs first, blocks become compliant in place, the injector then sees them as already-satisfying and prepends nothing. Helper JSDoc on `injectEmptyThinkingOnToolUseTurns` fixed the same way. 3. (IAl8) Strip-path coverage missed `redacted_thinking` blocks. Added regression test that verifies both thinking and redacted_thinking blocks are removed when `stripAssistantThinking` is set. 4. (IAl-) Renamed the converter test suite from "thinking-mode injection + normalization (DeepSeek thinking on)" to "DeepSeek thinking-mode normalization, injection, and stripping" so the title accurately covers all behavior the block exercises (including `stripAssistantThinking` cases later in the same describe). 74 tests pass; lint + typecheck clean. * fix(core): exclude anthropic-beta variants from defaultHeaders to avoid wire duplication Address PR review round 15 (copilot-pull-request-reviewer #1). `buildHeaders()` previously spread the entire `customHeaders` map into the SDK's `defaultHeaders`. After moving anthropic-beta computation to the per-request path, a user-configured mixed-case `Anthropic-Beta` key would survive in defaultHeaders verbatim, while the per-request override added a lowercase `anthropic-beta`. The wire then carried two physical headers for the same logical name — SDK behavior on duplicate headers with different casings is undefined. `buildPerRequestHeaders()` already merges those user flags case-insensitively (commit 0d8b5de), so dropping the entry from defaultHeaders is the right boundary: the per-request path owns the header end-to-end. Other customHeaders entries continue to pass through. Add a regression test asserting no `Anthropic-Beta` (any casing) lands in defaultHeaders while unrelated customHeaders are kept. 75 tests pass; lint + typecheck clean.
…wenLM#3809) * feat(core): hint to background long-running foreground bash commands Phase D part (a) of Issue QwenLM#3634. When a foreground `shell` tool call runs ≥ 60 seconds and completes (succeeds or errors), append an advisory line to the LLM-facing tool result suggesting re-running with `is_background: true` next time. Why: today a foreground bash that takes minutes (build watcher, soak test, slow npm install, polling loop) blocks the agent indefinitely. The user is already paying for the wait; the agent's next turn could have started running in parallel under `is_background: true`. Sleep interception (QwenLM#3684) handled the egregious `sleep N` case at validate time; this handles the legitimate-but-long case at result time. Trade-offs: - Threshold = 60s. Half the existing 120s foreground timeout. Long enough that normal `npm install` / `pytest` runs don't trigger; short enough that the hint surfaces before the timeout hard-kills. - Advisory only — the command still runs to completion in the foreground for THIS invocation. The advice is for the agent's NEXT decision, not a corrective action on the current one. - Fires on success AND error completions. The advice is the same ("background it next time") in both cases. - Suppressed on aborted (timeout / user-cancel) — those paths already surface their own messaging and don't benefit from a "should have been background" reminder when the user / system already killed it. Implementation: - New constant `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = 60000` in shell.ts, paired with the existing `DEFAULT_FOREGROUND_TIMEOUT_MS`. - Helper `buildLongRunningForegroundHint(elapsedMs)` exported so future surfaces (UI, telemetry) can render the same text without duplicating the threshold logic. - `Date.now()` bracketing around the spawn → `await resultPromise` block — mirrors what the background path already captures via `entry.startTime`. - Append happens inside the existing non-aborted result builder; zero changes to the cancel / timeout arms. Tests: 4 new cases — fires on long success, omits on short success, fires on long error completion, omits on aborted. Uses vi fake timers to drive wall-clock past the threshold without actually sleeping. * fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion Addresses 8 review threads on PR QwenLM#3809 — 6 from /review bots, 2 from copilot — covering doc accuracy, code quality, behavioural gaps, and test coverage. **Behavioural fixes (real bugs)**: - **Suppress on external signal kills** (`result.signal != null` with `aborted: false`). `shellExecutionService` only sets `aborted` when the AbortSignal we passed was triggered, so SIGTERM from container shutdown / k8s eviction / OOM killer / sibling process-group reap falls through to the non-aborted branch. The advisory shouldn't fire there — the process didn't run to its conclusion, so "next time, background it" doesn't fit. New test pins this with `signal: 15` (SIGTERM), `aborted: false`. - **Append AFTER `truncateToolOutput`**. Previously the hint was appended inside the non-aborted result builder, which meant for long outputs it got wrapped in the "Truncated part of the output:" envelope — the LLM might read the advisory as part of the command's own output. New post-truncation insertion + test that pins ordering by mocking `truncateToolOutput` directly (real path needs `fs.writeFile` to actually succeed for the replacement branch to fire). - **Hint wording mode-aware**. The dialog mention dropped the unconditional "(footer pill + Enter)" specifics, which would mislead non-TTY users (`-p` headless / ACP / SDK consumers — no dialog or pill exists there). Now qualified as "in interactive mode the Background tasks dialog also has...". `/tasks` and the on-disk output file are mentioned without qualifier (work in any mode). **Code quality**: - **Threshold programmatically coupled to timeout**: `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = Math.floor(DEFAULT_FOREGROUND_TIMEOUT_MS / 2)`. If the timeout is tuned later, the threshold tracks automatically. - **Docstring corrected**: removed the misleading "before it gets killed by the timeout" claim — the hint is on non-aborted path only, so timeout-killed commands never see it. The new docstring enumerates all suppression paths explicitly. - **Removed stale line-number reference**: comment said "mirrors the background path's `entry.startTime` capture (line ~781)" which goes stale on file edits. Now refers conceptually. **Test coverage gaps closed**: - **Off-by-one boundary**: 59_999ms → no hint. Pairs with the existing 60_000ms-exactly test (which fires) to pin the boundary tightly. A regression flipping `>=` to `>` would fail loudly. - **Timeout path explicit**: previous "aborted" test exercised user- cancel only. With `vi.useFakeTimers({ toFake: ['Date'] })`, `AbortSignal.timeout()` doesn't fake (it depends on the real timer subsystem), so `combinedSignal.aborted` stayed false. New test follows the pre-existing `should handle timeout vs user cancellation correctly` pattern: stubs `AbortSignal.timeout` + `.any` to return an already-aborted combined signal, then verifies "Command timed out after Nms" appears AND no advisory. * fix(core): per-invocation long-run threshold + debug-mode + test isolation Six suggestions from /review's third pass on PR QwenLM#3809: **Real semantic fix**: - Long-run threshold now scales with the EFFECTIVE timeout, not the fixed default. A user who sets `timeout: 600_000` (10 min) gets the advisory at 5 min, not at 60s — respects the explicit timeout intent. Replaced the `LONG_RUNNING_FOREGROUND_THRESHOLD_MS` constant with a per-invocation `longRunThresholdFor(effectiveTimeout)` helper. **Debug-mode visibility**: - Debug mode previously snapshotted `returnDisplayMessage = llmContent` BEFORE the truncation + hint append, so debug-mode users saw the pre-hint content while the agent saw the advisory — agent suddenly suggesting `is_background: true` had no visible trigger in the TUI. Re-sync `returnDisplayMessage` after the hint append (debug-mode branch only) so the TUI mirrors what the agent sees. **Type-safety footgun**: - `if (typeof llmContent === 'string')` would silently drop the hint if `llmContent` ever becomes structured `Part[]`. Added an explicit `else` comment documenting the deliberate omission and the conditions under which to revisit (no string llmContent path exists today). **Style**: - Replaced the JSDoc `/** ... */` block on the (now-defunct) constant with a plain `//` comment block on the helper, matching the `DEFAULT_FOREGROUND_TIMEOUT_MS` / `OUTPUT_UPDATE_INTERVAL_MS` style. **Test hygiene**: - Wrapped both `vi.stubGlobal('AbortSignal', ...)` and `vi.spyOn(truncateToolOutput, ...)` in `try/finally` so failures during the test body don't leak the stub/spy into subsequent tests (would cause confusing cascading failures). - Dropped the internal-roadmap "Phase D part (a)" reference from the test comment — future maintainers don't have the context. **New test**: - `threshold scales with the user-supplied timeout (not the default)`: sets `timeout: 600_000`, advances 100s, verifies no hint. Pins the per-invocation coupling so a regression to a fixed constant would fail loudly here. * fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion (round 4) Six suggestions from /review's pai/glm-5-fp8 pass on PR QwenLM#3809: **Behavioural / UX**: - **Hint now visible in non-debug TUI too.** Previously only debug mode mirrored the hint into `returnDisplay`; non-debug users saw the agent suggest `is_background: true` with no visible trigger. Now the hint is appended to `returnDisplayMessage` in both modes (full mirror in debug, terse-append in non-debug to preserve the output-or-status form). **Test coverage**: - **Debug-mode re-sync test added.** All other long-run hint tests run with `getDebugMode → false`; this one flips it to true and asserts the hint appears in `returnDisplay` too. Pins the re-sync so a regression that drops the debug branch would fail loudly. - **Threshold-scaling positive test added.** The negative case (`timeout: 600_000`, advance 100s, no hint) was already pinned; paired now with the positive case (advance 305s, hint fires) so a regression to a fixed 60s threshold is caught at both ends. **Style / consistency**: - **`result.signal === null` (was `== null`).** Strict equality to match the rest of the file. The `signal` field is typed `number | null` so loose equality has identical semantics, but the inconsistency was noise. **Doc clarity (timing semantics)**: - **Comment explains why elapsedMs is computed BEFORE truncation.** Two reviewers disagreed on the timing — one read it as before truncation (correct, slightly under-reports), the other as after (incorrect read). The intent is to report the COMMAND's runtime, not the tool call's total time. Truncation is post-processing, not part of "agent blocking time", so excluding it is the right semantic. Inline comment now spells this out so future readers don't have to infer. * fix(core): error-path hint surfacing + clock-resilient elapsed + threshold floor + observability Round 5 of PR QwenLM#3809 review — 10 threads, mix of Critical and Suggestion: **Critical fixes**: 1. **Hint survives the error path** (`#OWbA`). When result.error is set, coreToolScheduler builds the model-facing functionResponse from `error.message` ONLY (not llmContent — see convertToFunctionResponse + the toolResult.error branch in scheduler:1648-1724). My hint was being silently dropped on long-command-failed cases. Now the hint is appended to error.message too so the advisory survives whichever branch the scheduler takes. 2. **Hint wording de-ambiguated** (`#OU6o`). "prefer re-running with is_background: true" was ambiguous — model could read it as "re-run THIS command in the background", which on stateful commands (DB migrations, deploys, git push) would cause double side effects. Reworded to "Next time you run a SIMILAR long-running process..." with an explicit parenthetical that warns against re-running the just-completed command. 3. **Debug observability** (`#OU6s`). Added `debugLogger.debug` at the hint decision point with elapsedMs / threshold / aborted / signal — when a user reports "my 65s command didn't get the hint" the suppression branch is now visible in DEBUG output. **Other behaviour fixes**: 4. **Threshold floor of 1000ms** (`#OU6r`). Pathological `timeout: 0` / `timeout: 1` would have given a 0-ms threshold, firing the hint on every invocation showing "ran for 0s". Floor at 1s makes that branch unreachable. 5. **`performance.now()` instead of `Date.now()`** (`#OU6v`). NTP corrections / VM clock drift between capture and read would silently make `elapsedMs` negative and skip the hint with no observable failure. Monotonic clock prevents that. 6. **Debug mode preserves truncation marker** (`#OU6w` / `#OWCq`). Previously `returnDisplayMessage = llmContent` after hint clobbered the "Output too long and was saved to: …" line appended during truncation. Switched to append-style re-sync in BOTH modes so prior content is preserved. **Test coverage gaps closed**: 7. **Non-debug returnDisplay test** (`#OWCo`). Pinned that the user TUI gets the hint in the default (non-debug) mode too. 8. **Test rename** (`#OWCl`). The "debug-mode TUI mirror" test passed in non-debug too after the recent refactor; split into two tests, one per branch. 9. **Error-path hint test**. Added a test that pins `result.error?.message` contains both the original error text AND the hint, covering the scheduler-routing-via-error.message path that was silently broken before fix #1. 10. **Test: faketimers also fakes `performance`**. Since we switched to `performance.now()`, `vi.useFakeTimers({ toFake: ['Date'] })` no longer covered the elapsed measurement; extended to `['Date', 'performance']` so the threshold tests can drive the wall-clock with `advanceTimersByTimeAsync`. #OU6t (else-comment for the type guard) was already addressed in the prior round — the explicit else-with-comment is in place; adding logging there would be noise. * test(core): cover the MIN_LONG_RUN_THRESHOLD_MS floor branch PR QwenLM#3809 review: the new `Math.max(MIN_LONG_RUN_THRESHOLD_MS, ...)` floor in `longRunThresholdFor` was untested — only default-timeout and large-custom-timeout cases existed. A regression that strips the floor would let `timeout: 1` produce a 0ms threshold and fire a "ran for 0s" advisory on every invocation; the test suite would not catch it. New test: build with `timeout: 1`, advance 500ms (below the 1000ms floor), resolve with `aborted: false` to isolate the threshold logic from the abort path. Asserts no hint appears. A regression that removes the floor flips the assertion to fail. * fix(core): structured delimiter on error.message hint + tighten timeout floor comment Two of three threads from the latest /review pass on PR QwenLM#3809 (the third — PR description / threshold scaling reconciliation — is fixed in the PR description update, not in code): - **`\n---\n` divider before hint in `error.message`** (`#Pt7C`). Downstream consumers of `error.message` (firePostToolUseFailureHook, telemetry grouping, SIEM alerting, hook-side error parsers) were receiving ~400 chars of advisory text mixed inline with the original error body — pattern-matching on error messages would absorb the advisory into the matched body. Added a `---` separator line so the boundary is unambiguous and split-able. - **Threshold-floor comment narrowed to `timeout: 1`** (`#Pu9o`). The comment said the floor guards `timeout: 0` / `timeout: 1`, but `validateToolParamValues` rejects `timeout <= 0` at validate time, so `timeout: 0` can't reach `longRunThresholdFor`. Updated the comment to mention only the actually-allowed pathological case (`timeout: 1` and any value `< 2` rounds to 0). Test updated to assert the `---` divider format with `toMatch`. * fix(core): capture executionStartTime AFTER spawn so PTY import isn't counted PR QwenLM#3809 review: copilot caught that `executionStartTime` was captured BEFORE `await ShellExecutionService.execute(...)`, which meant the elapsed measurement included `getPty()` dynamic-import setup (~50-200ms on first call). The hint's "ran for Xs" reading was slightly inflated, and the comment claiming "spawn → settle" wasn't strictly accurate. Moved the capture immediately after the execute() call returns its { result, pid } handle. The pid being set by that point confirms the process has been spawned, so the subtraction is true post-spawn-to- settle. Comment updated to reflect the actual semantics. The displayed accuracy gain is small (50-200ms on a 60s+ threshold is <1%), but the comment claim now matches what the code measures. Tests unaffected — fakeTimers don't drive real dynamic imports, so the threshold tests behave identically. * fix(core): align long-run hint code/tests with ShellExecutionResult.error semantics Four copilot threads on PR QwenLM#3809 — all rooted in the same observation: `ShellExecutionResult.error` is reserved for spawn/setup failures (per the field's doc comment in shellExecutionService.ts), NOT for non-zero exit codes. My existing code/tests conflated the two, making the error-path coverage less realistic and the inline comments inaccurate. **Test shape fixes**: - `appends the hint when a long-running foreground command exits with error` → `exits non-zero`. Changed `error: new Error('exit 1')` to `error: null` (the realistic shape for a non-zero exit without spawn failure). Added a comment explaining the field contract so future test authors don't repeat the conflation. - `hint survives the error path (appended to error.message)`: reframed the mock from `spawn ENOENT` (which would resolve in <1s in practice, making the long-elapsed scenario unrealistic) to `PTY initialization failed after 75s` — a slow-spawn-failure shape that COULD plausibly take 75s. Test still pins the same CODE PATH; comment now acknowledges the edge-case nature ("rare but real: PTY init dragging, remote-fs exec syscalls, security scanners interposing"). **Comment corrections**: - `returnDisplayMessage` build-order comment was misleading. It said "the hint is appended after both the truncation block and the returnDisplayMessage build" — but `returnDisplayMessage` is built BEFORE truncation. Replaced with a chronological enumeration (1. initial value, 2. truncation marker append, 3. hint append) that matches what the code actually does. - Error-path preservation comment now acknowledges the narrow applicability (spawn failures only, exit codes don't reach this branch). Code is unchanged — the path is still real, just rare. * test(core): pin empty-output success + background-no-hint paths Two defensive tests for the long-running foreground hint: - empty-output success at >=60s — exercises the returnDisplayMessage='' → hint append branch (write-only commands like `tar czf` / `cp -r` produce no stdout). Asserts the user- facing returnDisplay still surfaces the advisory even when the command produced nothing else to show. - background never includes the hint — the foreground hint logic lives in executeForeground only, so today this can't fail; the test guards against a future refactor hoisting the advisory into a shared post-execute path that would tag every background launch with a nonsensical "ran for 0s, consider is_background: true" suggestion.
…wenLM#3115) * feat: add commit attribution with per-file AI contribution tracking via git notes Track character-level AI vs human contributions per file and store detailed attribution metadata as git notes (refs/notes/ai-attribution) after each successful git commit. This enables open-source AI disclosure and enterprise compliance audits without polluting commit messages. * feat: enhance commit attribution with real AI/human ratios and generated file exclusion - Replace line-based diff with a prefix/suffix character-level algorithm for precise contribution calculation (e.g. "Esc"→"esc" = 1 char, not whole line) - Compute real AI vs human contribution percentages at commit time by analyzing git diff --stat output: humanChars = max(0, diffSize - trackedAiChars) - Add generated file exclusion (lock files, dist/, .min.js, .d.ts, etc.) ported from an existing generatedFiles.ts - Add file deletion tracking via recordDeletion() - Update git notes payload format: {aiChars, humanChars, percent} per file with real percentages instead of hardcoded 100% * feat: add surface tracking, prompt counting, session persistence, and PR attribution Align with the full attribution feature set: - Surface tracking: read QWEN_CODE_ENTRYPOINT env var (cli/ide/api/sdk), include surfaceBreakdown in git notes payload - Prompt counting: incrementPromptCount() hooked into client.ts message loop, tracks promptCount/permissionPromptCount/escapeCount - Session persistence: toSnapshot()/restoreFromSnapshot() for serializing attribution state; ChatRecordingService.recordAttributionSnapshot() writes to session JSONL; client.ts restores on session resume - PR attribution: addAttributionToPR() in shell.ts detects `gh pr create` and appends "🤖 Generated with Qwen Code (N-shotted by Qwen-Coder)" - Session baseline: saves content hash on first AI edit of each file for precise human/AI contribution detection - generatePRAttribution() method for programmatic access * fix: audit fixes — initial commit handling, cron prompt exclusion, failed commit counter preservation - Handle initial commit (no HEAD~1) by detecting parent with rev-parse and falling back to --root for first commit in repo - Exclude Cron-triggered messages from promptCount (not user-initiated) - Add commitSucceeded parameter to clearAttributions() so failed/disabled commits don't reset the prompts-since-last-commit counter - Add test for clearAttributions(false) behavior * fix: cross-platform and correctness fixes from multi-round audit - Normalize path.relative() to forward slashes for Windows compatibility - Use diff-tree --root for initial commits (git diff --root is invalid) - Replace String.replace() with indexOf+slice to avoid $& special patterns - Fix clearAttributions(false→true) when co-author disabled but commit succeeded - Use real newlines instead of literal \n in PR attribution text - Add surface fallback in restoreFromSnapshot for version compatibility - Fix single-quote regex to not assume bash supports \' escaping - Case-insensitive directory matching in generated file detection - Handle renamed file brace notation in parseDiffStat * fix(attribution): also snapshot on ToolResult turns so resume keeps tool edits Previously, recordAttributionSnapshot() only ran at the start of UserQuery and Cron turns — before the tools for that turn had executed. A session that wrote a file in turn 1 and committed in turn 2 (across process boundaries via --resume) lost the tracked edit: the last persisted snapshot was the turn-1-start snapshot (empty fileStates), so on resume the attribution service restored empty state and no git notes were attached to the commit. Move the snapshot call out of the UserQuery/Cron conditional and run it on every non-Retry turn. ToolResult turns are scheduled right after tools execute, so their start-of-turn snapshot now captures any edits those tools made. Retry turns are skipped since the state is unchanged from the prior turn. Added unit tests asserting the snapshot fires for ToolResult/UserQuery turns and skips Retry turns. Verified end-to-end in a scratch repo: write-file in turn 1 (no commit) → exit → --resume → commit in turn 2 → git notes now contain the recorded file with correct aiChars and promptCount: 2. * refactor(attribution): merge duplicate retry guard and update stale doc Collapse the two back-to-back messageType !== Retry blocks in sendMessageStream into one, and refresh chatRecordingService's recordAttributionSnapshot doc comment to reflect that snapshots fire on every non-retry turn (not just after user prompts). * feat(attribution): split gitCoAuthor into independent commit and pr toggles Matches the shape used upstream in Claude Code's `attribution.{commit,pr}` so users can disable the PR body line without losing the commit-message Co-authored-by trailer (or vice versa). The previous boolean forced both to move together, which conflated two different surfaces. - settingsSchema: gitCoAuthor becomes an object with nested commit/pr booleans, each `showInDialog: true` so both appear in /settings. - Config constructor accepts legacy boolean (coerced to { commit: v, pr: v }) so stored preferences from the pre-split schema carry over. - shell.ts: attachCommitAttribution and addCoAuthorToGitCommit read .commit; addAttributionToPR reads .pr. * feat(settings): add v3→v4 migration for gitCoAuthor shape change Legacy gitCoAuthor was a single boolean and shipped ~4 months ago; the previous commit split it into { commit, pr } sub-toggles. Without a migration, users who had set gitCoAuthor: false would see the settings dialog show the default (true) for both sub-toggles — misleading and likely to flip their preference on the next save because getNestedValue returns undefined when asked for .commit on a boolean. - New v3-to-v4 migration expands boolean → { commit: v, pr: v }, preserves already-object values, resets invalid values to {} with a warning. - SETTINGS_VERSION bumped 3 → 4; existing integration assertions use the constant so the next bump is a single-line change. - Regenerate vscode-ide-companion settings.schema.json to reflect the new nested shape. - Docs: split the single gitCoAuthor row into .commit and .pr. * test(migration): cover null/array/number and partial object for v3-to-v4 The migration already treats any non-boolean, non-object value as invalid (reset to {} with warning), but the existing test only exercised the string "yes" branch. Add parameterized cases for null, array, and number so a future regression that accepts these in the valid bucket gets caught. Also cover partial objects — the migration must not paternalistically fill defaults; that responsibility lives in normalizeGitCoAuthor at the Config boundary. * fix(shell): address PR review for compound commits and PR body escaping Two critical issues called out in review: 1. attachCommitAttribution treated the final shell exit code as proof that `git commit` itself failed. For compound commands like `git commit -m "x" && npm test`, the commit can succeed and a later step can fail; the previous code then cleared attribution without writing the git note. Now we snapshot HEAD before the command (via `git rev-parse HEAD` through child_process.execFile, kept independent of the mockable ShellExecutionService) and detect commit creation by HEAD movement, so attribution lands whenever a new commit was created regardless of later steps. 2. addAttributionToPR spliced the configured generator name into the user-approved `gh pr create --body "..."` argument verbatim. A name containing `"`, `$`, a backtick, or `'` could break the command or be evaluated as command substitution. Now we shell-escape the appended text per the surrounding quote style before splicing. Tests cover the new escape paths for both double- and single-quoted bodies, including a generator name designed to break interpolation (`$(rm -rf /) "danger" \`eval\``) and one with an apostrophe. * fix(attribution): address Copilot review on shell, schema, and totals Six items called out on PR #3115 by Copilot: - shell.ts: addAttributionToPR's bash quote escaping doesn't apply to cmd.exe / PowerShell, where `\$` and `'\''` aren't honored. Skip the PR body rewrite entirely on Windows — losing PR attribution there is preferable to corrupting the user-approved `gh pr create` command. - attributionTrailer.ts + shell.ts call site: buildGitNotesCommand used bash-style single-quote escaping on the JSON note, which is broken on Windows. Switched to argv form (`{ command, args }`) and routed the invocation through child_process.execFile so shell quoting is bypassed entirely. Tests updated to assert the argv shape. - commitAttribution.ts: when a tracked file's aiChars exceeded the diff --stat-derived diffSize (long-line edits where diffSize ≈ lines * 40), humanChars clamped to 0 but aiChars stayed inflated, leaving aiChars + humanChars > the committed change magnitude. Clamp aiChars to diffSize so the totals stay consistent. - shell.ts parseDiffStat: only normalized rename brace notation (`{old => new}`). Cross-directory renames emit `old/path => new/path` without braces, leaving diffSizes keyed by the full string. Added a second normalization step. - shell.ts: addAttributionToPR docstring claimed `(X% N-shotted)` but the implementation only emits `(N-shotted by Generator)`. Updated the docstring to match the actual behavior. - settingsSchema.ts + generator: gitCoAuthor went from boolean to object in the V4 migration. The exported JSON Schema now wraps the field in `anyOf: [boolean, object]` (via a new `legacyTypes` hint on SettingDefinition) so users with a stored boolean don't see a spurious IDE warning before their next launch runs the migration. * fix(attribution): parse binary diffs, source generator from model, sync schema $version Three follow-up review items from Copilot: - parseDiffStat now handles git's binary-diff format (`path | Bin A -> B bytes`) using the byte delta with a floor of 1. Without this, binary edits arrived at the attribution payload as diffSize=0 and were silently dropped. Also extracted the parser to a top-level exported function so the binary path is unit-testable; added five targeted cases (text/binary/rename normalisation/summary skip). - attachCommitAttribution now passes `this.config.getModel()` into generateNotePayload instead of the user-configurable `gitCoAuthor.name`. The note's `generator` field reflects which model produced the changes — and CommitAttributionService's sanitizeModelName() actually has the codename to scrub now. - generate-settings-schema.ts imports SETTINGS_VERSION instead of hardcoding `default: 3`, so a future bump propagates to the emitted JSON schema in one place. Regenerated settings.schema.json bumps $version's default from 3 to 4 to match the V4 migration. * fix(attribution): repo-root baseDir, escape co-author trailer, switch to numstat Three Critical items called out by wenshao: - attachCommitAttribution was passing config.getTargetDir() as `baseDir` to generateNotePayload, but getCommittedFileInfo returns paths relative to `git rev-parse --show-toplevel`. When the working directory was a subdirectory of the repo, path.relative produced `../...` keys that never matched in the AI-attribution lookup, silently zeroing out attribution for every file outside getTargetDir. StagedFileInfo now carries an optional `repoRoot` (filled in by getCommittedFileInfo via `git rev-parse --show-toplevel`) and the caller prefers it over the target dir. - addCoAuthorToGitCommit interpolated `gitCoAuthorSettings.name` and `.email` into the rewritten command without escaping. A name containing `$()`, backticks, or `"` could be evaluated as command substitution under double quotes, or break the user-approved `git commit -m "..."` quoting. Now escapes per the surrounding quote style with the same helpers addAttributionToPR uses, gates on non-Windows for the same shell-quoting reason, and fixes the regex to accept `-m"msg"` shorthand (no space) so users who type the bash-shorthand form aren't silently denied a trailer. - parseDiffStat used `git diff --stat` output and approximated each line as ~40 chars by parsing a graphical text bar. Replaced with `git diff --numstat` which gives unambiguous integer additions+deletions per file; the heuristic remains but the parser is no longer fooled by the visual `++--` markers. Binary entries fall back to a fixed estimate so they still land in the map (rather than dropping out as diffSize=0). Suggestions also addressed: stale duplicate JSDoc on addCoAuthorToGitCommit removed, misleading `clearAttributions` comments rewritten to describe what the boolean argument actually does. Tests cover the new shorthand path, escape behavior, and numstat parsing (text/binary/rename/malformed). * fix(shell): shell-aware git-commit detection and apostrophe-escape handling Two more Critical items called out by wenshao plus the matching Copilot quote-handling notes: - attachCommitAttribution and addCoAuthorToGitCommit now go through a shell-aware `looksLikeGitCommit` helper instead of a raw `\bgit\s+commit\b` regex. The helper splits the command on shell separators (`splitCommands`) and checks each segment, so `echo "git commit"` no longer triggers attribution clearing or trailer injection. The same helper bails on any segment that contains `cd` or `git -C <path>`, since either could redirect the commit into a different repo than our cwd — writing notes or capturing HEAD there would corrupt unrelated state. - The post-command attribution call now runs regardless of whether the shell wrapper aborted. `git commit -m "x" && sleep 999` could move HEAD and then time out, leaving the new commit without its attribution note while the stale per-file attribution stayed around for a later unrelated commit. attachCommitAttribution still gates on HEAD movement, so it's a no-op when no commit was actually created. - The `-m '...'` and `--body '...'` regexes used to match only the first quote segment, so a command like `git commit -m 'don'\''t'` (bash's standard apostrophe-escape form) would have the trailer spliced mid-message and break the command's quoting. The single- quote patterns now use a negative lookahead / inner alternation to either skip those messages entirely (commit path) or match the whole escape-aware body (PR path). Tests cover the new behavior: quoted "git commit" is left alone, the `cd && git commit` and `git -C` patterns get no trailer, and the apostrophe-escape form passes through unchanged for both `-m` and `--body`. * fix(attribution): drop magic 100 fallback for empty deletions Deleted files with no AI tracking now use diffSize directly. With numstat as the input source, diffSize is an exact count, and an empty-file deletion legitimately reports zero — a magic fallback would only inflate totals. * fix(shell): broaden git-commit detection, gate background, drop dead helpers Five Copilot follow-ups: - looksLikeGitCommit now strips leading env-var assignments (`GIT_COMMITTER_DATE=now git commit ...`) and a small allowlist of safe wrappers (`sudo`, `command`) before matching. The previous exact-prefix match silently skipped trailer injection on common real-world commit forms. - A new looksLikeGhPrCreate (same shell-aware shape) replaces the raw `\bgh\s+pr\s+create\b` regex in addAttributionToPR, so quoted text like `echo "gh pr create --body \"x\""` no longer triggers a command-string rewrite. - executeBackground refuses to run `git commit` and tells the user to re-run foreground. The BackgroundShellRegistry lifecycle has no hook for the post-command pre/post-HEAD comparison or git-notes write, so allowing the commit through would create the new commit without notes and leak stale per-file attribution into the next foreground commit. - recordDeletion was unused outside its own test — removed (and the test). When AI-driven deletions need tracking we'll add it with an actual integration point rather than carrying dead API surface. - generatePRAttribution was likewise unused; addAttributionToPR builds the trailer string inline. The two formats had already diverged. Removed the helper and its tests; reviving from git history is straightforward if a future caller needs it. Tests: env-var and sudo prefixes now produce trailers; quoted "gh pr create" leaves the command unchanged; existing 81 shell tests still pass alongside the trimmed 25 commitAttribution tests. * fix(shell): unified git-commit detection split by intent Six items called out across CodeQL, Copilot, and wenshao: - The earlier `looksLikeGitCommit`/`stripCommandPrefix` returned a single yes/no and rejected ANY `cd` in the chain. That fixed the wrong-repo case but also disabled attribution for `git commit -m "x" && cd ..` (commit already landed safely in our cwd; the cd came after). It also conflated three distinct decisions onto one predicate. New `gitCommitContext` returns both `hasCommit` and `attributableInCwd`, walking segments in order so that a `cd` AFTER the commit doesn't invalidate it. Callers now pick the right arm: - background-mode refusal uses `hasCommit` (refuses even `cd /elsewhere && git commit` since we can't attribute it afterward either way) - HEAD snapshot, addCoAuthorToGitCommit, and the attachCommitAttribution gate use `attributableInCwd` - Tokenisation switches from a regex while-loop to `shell-quote`'s `parse`. Quoted env values like `FOO="a b" git commit` now skip correctly (the old `\S*\s+` form would cut after the opening quote). Eliminates the CodeQL polynomial-regex alert at the same time since the `\S*\s+` pattern is gone. - attachCommitAttribution now snapshots prompt counters via `clearAttributions(true)` whenever a commit lands, even if no per-file attributions were tracked. Previously the early-return on `hasAttributions() === false` meant `promptCountAtLastCommit` never advanced, so a later `gh pr create` reported an inflated N-shotted count spanning multiple commits. Tests: env-var and sudo prefixes still produce trailers; quoted "git commit" / "gh pr create" leave commands unchanged; cd BEFORE commit suppresses the rewrite while cd AFTER commit does not; `git -C <path> commit` is treated as a commit (refused in background) but not as attributable. * fix(shell): position-independent git subcommand detection + bash-shell guard Six review items, two of them critical: - gitCommitContext was checking fixed-position tokens (`arg1`, `arg3`) and missed every git invocation that puts a global flag between `git` and the subcommand: `git -c user.email=x@y commit`, `git --no-pager commit`, `git -C /p -c k=v commit`, etc. In background mode these would slip past the refusal guard; in foreground they got no co-author trailer, no git note, and no prompt-counter snapshot. New `parseGitInvocation` walks past git's global flags (with their values) before reading the subcommand, and reports `changesCwd` for `-C` / `--git-dir` / `--work-tree`. - The Windows guard on addCoAuthorToGitCommit and addAttributionToPR used `os.platform() === 'win32'`, which incorrectly skipped Windows + Git Bash (`getShellConfiguration().shell === 'bash'`). Switched both to gate on `getShellConfiguration().shell !== 'bash'` so Git Bash users keep the feature. - attachCommitAttribution was re-parsing `gitCommitContext(command)` even though `execute()` already gates on `commitCtx.attributableInCwd`. Removed the redundant re-parse — drift between the two checks would silently diverge trailer injection from git-notes writes. - tokeniseSegment (formerly tokeniseProgram) now logs via debugLogger on parse failure instead of swallowing silently. Easier to debug if shell-quote ever throws on something unusual. - Added a comment on `cwdShifted` documenting that it's a one-way latch — `cd src && cd ..` will still skip attribution. The trade-off matches the wrong-repo guard's "better miss than corrupt unrelated repos" intent. - Stale `--stat` reference in the aiChars-clamp comment updated to `--numstat` to match the actual git command in ShellToolInvocation.getCommittedFileInfo. Tests: `git -c key=val commit` and `git --no-pager commit` now produce a trailer; existing 82 shell tests still pass. * fix(shell): refuse multi-commit attribution; misc review follow-ups Five follow-ups from the latest review pass: - attachCommitAttribution now refuses to write a single git note for shell commands that produce more than one commit (e.g. `git commit -m a && git commit -m b`). The singleton's per-file attribution map can't be partitioned across the individual commits, so attaching the combined note to HEAD would mis-attribute earlier commits' changes to the last one. Walks `preHead..HEAD` via `git rev-list --count`; on multi-commit detection it snapshots the prompt counters and bails with a debug warning instead of writing a misleading note. - parseGitInvocation now recognises the attached `-C/path` form (e.g. `git -C/path commit -m x`). shell-quote tokenises that as a single `-C/path` token which previously fell to the generic flag branch with `changesCwd = false`, leaving an out-of-cwd commit classified as attributable. - attachCommitAttribution dropped its unused `command` parameter (the caller already gates on `commitCtx.attributableInCwd`, so re-parsing was removed earlier; the parameter became dead). - Added wiring guards in edit.test.ts and write-file.test.ts: AI-originated edits/writes hit `CommitAttributionService.recordEdit`, `modified_by_user: true` skips, and write-file's distinction between a true new file and an overwritten empty file (`null` vs `''` old content) is now pinned by `aiCreated` assertions. * fix(attribution): partial-commit clear, symlink baseDir, gh/git flag handling Two Critical items, two Copilot, and five wenshao Suggestions: - attachCommitAttribution's `finally` block used to call `clearAttributions()` unconditionally, wiping per-file tracking for files the AI had edited but the user excluded from this commit. Added `clearAttributedFiles(committedAbsolutePaths)` to the service and the call site now passes only the paths that actually landed in this commit; entries for un-`add`ed files stay pending for a later commit. - generateNotePayload now runs both `baseDir` and each tracked absolute path through `fs.realpathSync` before `path.relative`. On macOS in particular `/var` symlinks to `/private/var`, so the toplevel from `git rev-parse --show-toplevel` and the absolute path captured by edit/write-file tools could diverge — producing `../../actual/path` keys in the lookup that never matched and silently zeroed all per-file AI attribution. - tokeniseSegment now consumes value-taking sudo flags (`-u`, `-g`, `-h`, `-D`, `-r`, `-t`, `-C`, plus the long forms). Without this, `sudo -u other git commit` left `other` standing in for the program name and skipped the trailer entirely. - A duplicate JSDoc block above `countCommitsAfter` (a leftover from the earlier extraction of `getGitHead`) was removed; both helpers now have one accurate comment each. - attachCommitAttribution's multi-commit guard now also runs when `preHead === null` (brand-new repo), via `git rev-list --count HEAD`. A compound `git init && git commit -m a && git commit -m b` no longer slips through and mis-attributes combined data to the last commit. - addCoAuthorToGitCommit's `-m` matching switched to `matchAll` and takes the LAST match. `git commit -m "title" -m "body"` puts the trailer at the end of the body so `git interpret-trailers` recognises it; the previous first-match behaviour stuffed the trailer in the title where git treats it as plain message text. - addAttributionToPR's `--body` regex accepts both space and `=` separators (`--body "..."` and `--body="..."`); the `=` form is common with gh. - New `parseGhInvocation` walks past gh's global flags (`--repo`, `-R`, `--hostname`) so `gh --repo owner/repo pr create ...` is detected. The earlier fixed-position check at tokens[1]/tokens[2] missed any command with a global flag. - getCommittedFileInfo now fans out the two `rev-parse` calls and the three diff calls with `Promise.all`. They're independent and serialising them was paying spawn latency 5× per commit. Tests: sudo with `-u user`, multi `-m`, `gh --repo owner/repo`, `--body="..."`, plus the existing 84 shell tests still pass. * fix(attribution): canonicalize file paths centrally in CommitAttributionService Two related Copilot follow-ups: - recordEdit/getFileAttribution/clearAttributedFiles now run input paths through fs.realpathSync before storing/looking up, so a symlinked path (e.g. macOS /var ↔ /private/var) resolves to the same key regardless of which form the caller passes. Previously edit.ts/write-file.ts handed in non-realpath'd absolute paths while generateNotePayload tried to realpath only inside its lookup loop, leaving partial-clear and clear-on-finally paths unable to find entries when the forms diverged. - restoreFromSnapshot also canonicalises on the way in so a session resumed from a pre-fix snapshot (where keys may not have been canonical) ends up with the same shape as newly recorded entries — otherwise a single file could end up with two parallel records. - generateNotePayload's lookup loop dropped its per-entry realpath call (now redundant since keys are canonical at write time), keeping only the realpath of `baseDir` (which still comes from `git rev-parse --show-toplevel` and may be a symlink). - Updated `clearAttributedFiles` doc to describe the new semantics: callers can pass either the resolved repo-relative path or an already-canonical absolute path, and either will match. * fix(attribution): canonicalize-from-root cleanup; fix mixed-quote -m / gh -R= Five review items, one Critical: - attachCommitAttribution now canonicalises via the repo *root* (one realpath call) and resolves committed paths against that canonical root, rather than per-leaf realpath inside clearAttributedFiles. At cleanup time the leaf for a just-deleted file no longer exists, so per-leaf fs.realpathSync would fail and silently fall back to a non-canonical path that misses the stored canonical key — leaving stale attributions for deleted files. clearAttributedFiles drops its internal realpath and now documents the canonical-paths-required precondition explicitly. - addCoAuthorToGitCommit picks the LAST `-m` regardless of quote style. Previously `doubleMatch ?? singleMatch` always preferred the last double-quoted match, so `git commit -m "Title" -m 'Body'` injected the trailer into the title where git interpret-trailers would silently ignore it. Now compares match indices, and the escape helper follows the actually-selected match's quote style. - parseGhInvocation handles `-R=value` (the equals form of the short `--repo` alias). `--repo=...` and `--hostname=...` were already covered; `-R=...` previously fell through to the generic flag branch and skipped the value. - New tests for the symlink-aware canonicalisation: macOS-style `/var` ↔ `/private/var` mapping is mocked via vi.mock on node:fs, with cases for record-then-look-up under either form, generateNotePayload with a symlinked baseDir, partial clear via the canonical-root-derived path (deleted leaf), and snapshot restore canonicalisation. - Doc-only: integration-test header comments updated from "V1 -> V2 -> V3" / "migration to V3" to reflect the actual V4 end state (assertions already used the literal `4`). * fix(shell): scope -m rewrite to commit segment, reject nested matches Two Critical findings on addCoAuthorToGitCommit, plus a Copilot maintainability nit: - The `-m` regex used to scan the whole compound command, so `git commit -m "fix" && git tag -a v1 -m "release"` would target the LATER tag annotation (last -m wins) and splice the trailer there instead of the commit message. The rewrite now scopes to the actual `git commit` segment via a new findAttributableCommitSegment(): same shell-aware walk gitCommitContext does, but returning the segment's character range so the regex can be run on a slice and spliced back into the original command. - Within the segment, a literal `-m '...'` *inside* a quoted body was treated as a real later -m. For `git commit -m "docs mention -m 'flag' for completeness"`, the inner single-quoted -m sits at a higher index than the real outer -m, and the previous index comparison would have it win — splicing the trailer mid-message and corrupting the quoting. The new code checks whether the candidate is nested inside the other quote-style's range (start/end containment) and prefers the outer match when so. - Hoisted three constant Sets (sudo flag list, git global flags taking values, git global flags shifting cwd, gh global flags) out of the per-call scope to module constants. Functional no-op, but keeps the parsing helpers easier to read and avoids re-allocating the Sets on every command. Two regression tests added for the cases above: - inner `-m '...'` inside the outer message body is preserved literally and the trailer lands after the body - `git tag -a v1 -m "release notes"` after a real `git commit -m "fix"` is left untouched, with the trailer appended to "fix" only * fix(attribution): cd-leak, numstat partial failure, $() bailout, gh pr new alias Five Critical/Suggestion items: - `cd subdir && git commit` (or any non-attributable commit chain whose HEAD movement still happens in our cwd, e.g. cd into a subdirectory of the same repo) used to skip attribution AND fail to clear pending per-file entries. Those entries then leaked into the next foreground commit, inflating its AI percentage. New `else if (commitCtx.hasCommit)` branch in execute() compares pre- and post-HEAD; if HEAD moved we drop the per-file state. preHead is now snapshotted whenever ANY commit was attempted, not only attributable ones. - getCommittedFileInfo's three diff calls run in `Promise.all`. If `--numstat` failed while `--name-only` succeeded, every file's diffSize would be 0 and generateNotePayload would clamp aiChars to 0 — emitting a structurally valid note with all-zero AI percentages. Detect the partial-failure shape (files non-empty, diffSizes empty) and return empty so no note is written. - addCoAuthorToGitCommit and addAttributionToPR now bail when the captured `-m`/`--body` value contains `$(`. The tool description recommends `git commit -m "$(cat <<'EOF' ... EOF)"` for multi-line messages, but the regex's `(?:[^"\\]|\\.)*` body group stops at the first interior `"` from a nested shell token — splicing the trailer there breaks the command before it reaches the executor. - looksLikeGhPrCreate now accepts `gh pr new` as well — it's a documented alias for `gh pr create` and was silently skipped. - Removed `incrementPermissionPromptCount` / `incrementEscapeCount` and their getters: they had no production callers, so the backing fields just round-tripped through snapshots as 0. The four snapshot fields are now optional so pre-fix snapshots that carry non-zero values still load cleanly and just get ignored. Three regression tests added: heredoc-style `-m "$(cat <<EOF...)"` preserved literally, heredoc-style `--body` likewise, `gh pr new --body "..."` rewritten with attribution. * fix(attribution): --amend, --message/-b aliases, .d.ts over-exclusion Four Copilot follow-ups, three of them user-visible coverage gaps: - `git commit --amend` was diffing `HEAD~1..HEAD` for attribution, which spans the entire amended commit (parent → amended) rather than the actual amend delta. A message-only amend would emit a note attributing every file in the original commit to this amend. New `isAmendCommit` helper detects the flag and getCommittedFileInfo switches to `HEAD@{1}..HEAD` (the pre-amend HEAD vs the amended HEAD); if the reflog is GC'd we bail with a warning rather than over-attribute. - `git commit --message "..."` and `--message="..."` were silently skipped because the regex only recognised the short `-m` form. The flag prefix now matches both alternatives via `(?:-[a-zA-Z]*m|--message)\s*=?\s*` (non-capturing inner group so the existing `[full, prefix, body]` destructure still works). - `gh pr create -b "..."` (the short alias for `--body`) was the same gap on the PR side; `(?:--body|-b)[\s=]+` now covers both forms. - `.d.ts` was an over-broad blanket exclusion in EXCLUDED_EXTENSIONS — declaration files are commonly authored (ambient declarations, asset shims like `*.d.ts` for `import './x.svg'`); the repo even contains `packages/vscode-ide-companion/src/assets.d.ts`. Removed `.d.ts` from the extensions Set and adjusted the test to assert the new behavior. Auto-generated `.d.ts` (e.g. `tsc --declaration` output) still gets caught by the build-directory rules. Tests added: `--amend` plumbing covered by the new branch in getCommittedFileInfo (no targeted unit test — the diff invocation goes through ShellExecutionService and is exercised by the existing post-command path); `--message`/`--message="..."`/-b/-b="..."` all have positive trailer-injection assertions; `.d.ts` test split into "hand-authored" (negative) and "in dist" (positive). * fix(attribution): cd-subdir, scope --body, multi-commit count guard, /clear reset Four bugs flagged this round: - gitCommitContext / findAttributableCommitSegment used a blanket "any cd shifts cwd" gate, breaking the very common `cd subdir && git commit -m "..."` flow even though the commit lands in the same repo. New `cdTargetMayChangeRepo` heuristic: treat relative paths that don't escape upward (no leading `..`, no absolute path, no `~`/`$VAR` expansion, no bare `cd`/`cd -`) as in-repo and let attribution proceed. Conservative on anything it can't statically verify. - addAttributionToPR was running the `--body`/`-b` regex against the FULL compound command string. In `curl -b "session=abc" && gh pr create --body "summary"` the regex would match curl's `-b` cookie flag and inject attribution into the cookie value, corrupting the curl call. Added `findGhPrCreateSegment` (analog of `findAttributableCommitSegment`) and scoped the body regex to that segment, splicing back into the original command via offsetting the in-segment match index. - The multi-commit guard treated `runGitCount === 0` as "single commit" and bypassed itself. After `commitCreated === true`, a count of 0 is impossible in normal operation — it means rev-list errored or timed out. Now we bail on `commitCount !== 1` with a tailored message: anything other than exactly 1 commit is suspicious and refuses the note. - The CommitAttributionService singleton survives across `Config.startNewSession()` (the `/clear` and resume paths). New `CommitAttributionService.resetInstance()` call alongside the existing chat-recording / file-cache resets in startNewSession prevents pending attributions from a prior session attaching to a commit in the new one. Three regression tests added: `cd src && git commit` produces a trailer (in-repo cd), `cd .. && git commit` does not (could escape repo root), and `curl -b "..." && gh pr create --body "..."` leaves curl's cookie value untouched while attribution lands in gh's body. * fix(attribution): cd embedded .., env wrapper, Windows ARG_MAX, segment-locator warn Four review items, all small but real: - cdTargetMayChangeRepo missed embedded `..` traversal — `cd foo/../../escape` and similar would slip past the leading-`..` check and be treated as in-repo. Added an `includes('/..')` / `includes('\\..')` check (catches POSIX and Windows separators without false-positiving on `..` chars inside ordinary names, which only escape when followed by a separator). - tokeniseSegment now recognises `env` as a safe wrapper alongside `sudo`/`command`, so `env GIT_COMMITTER_DATE=now git commit ...` resolves to `git`. After the wrapper detection we also skip any `KEY=VALUE` argv entries (env's own argument syntax for setting vars before the program). - buildGitNotesCommand's MAX_NOTE_BYTES dropped from 128 KB to 30 KB. Windows' CreateProcess lpCommandLine is capped around 32,768 UTF-16 chars including the executable path and other argv entries; a 128 KB note would still fail to spawn even though the function returned a command instead of null. 30 KB leaves ~2 KB of headroom for the rest of the argv on Windows and is larger than any real commit's metadata in practice. - findAttributableCommitSegment / findGhPrCreateSegment now log a debugLogger.warn when `command.indexOf(sub, cursor)` returns -1 — splitCommands strips line continuations (`\<newline>`), so a multi-line command can have the trimmed segment text fail to match its source. Previously the segment was silently skipped with no signal; the warn makes the failure observable when QWEN_DEBUG_LOG_FILE is set. Two regression tests added: `cd foo/../../escape && git commit` gets no trailer (embedded-`..` heuristic catches it), and `env GIT_COMMITTER_DATE=now git commit` does (env wrapper skipped). * fix(attribution): scope isAmendCommit to attributable segment only `git -C ../other commit --amend && git commit -m x` would previously flag the second (fresh) commit as an amend, causing attachCommitAttribution to diff `HEAD@{1}..HEAD` against an unrelated reflog entry. Mirror findAttributableCommitSegment's cd/cwd tracking so only the first commit segment that runs in the original cwd determines amend status. * fix(attribution): last-match --body, symlink leaf canonicalisation, scoped prompt count - addAttributionToPR: use matchAll/last-match for `--body`/`-b` so the trailer lands in the gh-honoured (final) body when multiple flags are present. Mirrors addCoAuthorToGitCommit. Adds regression test. - attachCommitAttribution: also fs.realpathSync the per-file resolved path (not just the repo root) so files behind intermediate symlinks are matched against canonical keys recordEdit stored, instead of silently zeroing attribution and leaking entries past commit. - incrementPromptCount: scope to SendMessageType.UserQuery — ToolResult, Retry, Hook, Cron, Notification are model/background re-entries of the same logical turn. Tracking them all inflated the "N-shotted" trailer (one user message could become 10-shotted with 10 tool calls). - AttributionSnapshot: add `version: 1` field; restoreFromSnapshot now refuses incompatible versions and validates per-field types so a partially-written snapshot can't seed `Math.min(undefined, n) === NaN` into git-notes payloads. - Drop unused permission/escape counters (declared, persisted, never read or incremented) — fields, snapshot tolerance, and clear-method bookkeeping all removed; AttributionSnapshot interface simplifies. - isGeneratedFile: switch directory rule from substring `.includes('/dist/')` to segment-boundary check (split on `/`) so project dirs like `my-dist/` or `xbuild/` don't match. `.lock` removed from the blanket extension exclusion — well-known lockfiles already covered by EXCLUDED_FILENAMES; hand-authored `.lock` files (e.g. `.terraform.lock.hcl`) now stay attributable. - getClientSurface: document `QWEN_CODE_ENTRYPOINT` as the embedder override hook so the always-`'cli'` default is intentional. * fix(attribution): skip values for env -u NAME and -S string `env`'s value-taking flags (`-u`/`--unset`, `-S`/`--split-string`) were not in the wrapper's flag-skip allowlist, so `env -u FOO git commit ...` left FOO as the next token and the parser treated it as the program — masking the real `git commit` from attribution detection. Add an ENV_FLAGS_WITH_VALUE table mirroring the sudo allowlist. Regression test added. * fix(attribution): submodule leak, PR body nesting, shallow-clone bail, schema default - attachCommitAttribution: when HEAD didn't move in our cwd, leave pending attributions alone instead of dropping them. The case can be a failed commit, `git reset HEAD~1`, OR `cd submodule && git commit` (inner repo's HEAD moves, ours doesn't). Dropping was overly aggressive and silently lost outer-repo edits in the submodule case. - addAttributionToPR: mirror addCoAuthorToGitCommit's nested-match rejection so `gh pr create --body "docs mention -b 'flag'"` picks the outer `--body`, not the inner literal `-b`. Splicing into the inner match would corrupt the body. Regression test added. - getCommittedFileInfo: when `rev-parse --verify HEAD~1` fails, also check `rev-list --count HEAD === 1` to confirm HEAD is the true root commit. In a shallow clone, HEAD~1 is unreadable but the commit has a parent recorded — falling back to `diff-tree --root` would diff against the empty tree and over-attribute the entire commit. Bail with a debug warning instead. - generate-settings-schema: lift `default` (and `description`) out of the inner `anyOf[N]` schema to the outer level when wrapping with `legacyTypes`. Most JSON-schema-driven editors only surface top-level defaults; burying the default under `anyOf` lost the "enabled by default" hint. Also extend the default filter to publish non-empty plain objects (so `gitCoAuthor`'s default can appear). gitCoAuthor's source default updated to the runtime shape `{commit: true, pr: true}` to match `normalizeGitCoAuthor`. * fix(attribution): drop unsafe full-clear, tag analysis-failure with null ju1p (Copilot): the `else if (commitCtx.hasCommit)` branch fully cleared the singleton on `cd /abs/same-repo/subdir && git commit` (or `git -C . commit`), losing pending AI edits the user hadn't staged. We can't tell which files were in the commit from this branch, and the next attributable commit's partial-clear handles cleanup correctly anyway. Drop the branch entirely. ju2D (Copilot): `getCommittedFileInfo` returned the same empty StagedFileInfo for both "could not analyze" (shallow clone, --amend without reflog, --numstat partial failure, exception) and "intentionally empty" (--allow-empty). The caller couldn't tell them apart, so the partial clear became a no-op on analysis failure and the just-committed AI edits leaked to the next commit. Switch the return type to `StagedFileInfo | null` and have the caller treat null as "fall back to full clear" while empty StagedFileInfo (--allow-empty) leaves attributions intact for the next real commit. * fix(attribution): dedup snapshot writes, cap excludedGenerated, doc commit toggle scope rsf- (Copilot): recordAttributionSnapshot wrote a full snapshot to the JSONL on every non-retry turn, even when the tracked state was unchanged. Long-running sessions accumulated thousands of identical snapshot copies, inflating session size and slowing /resume hydrate. Dedup by JSON-equality with the prior write — first write always goes through, identical successors are no-ops. rsgo (Copilot): excludedGenerated path list was unbounded. A commit churning thousands of generated artifacts (large dist/ rebuild) could push the JSON note past MAX_NOTE_BYTES (30KB) and lose attribution for the real source files in the same commit. Cap the serialized sample at MAX_EXCLUDED_GENERATED_SAMPLE (50) and add excludedGeneratedCount for the true total. rsg9 + rshM (Copilot): the gitCoAuthor.commit description claimed the toggle only controlled the Co-authored-by trailer, but attachCommitAttribution also gates the per-file git-notes payload on the same flag. Update both the schema description and the settings.md table to mention both effects so disabling the option isn't a silent surprise. * fix(attribution): depth-1 shallow detection, snapshot dedup post-rewind/post-failure sfGz (Copilot): rev-list --count HEAD === 1 cannot distinguish a true root commit from a depth-1 shallow clone — both report 1 because rev-list only walks locally available objects. Switch to git log -1 --pretty=%P HEAD which reads the parent SHA directly from commit metadata: empty means a real root, non-empty means a parent is recorded (whether or not its object is local). The shallow-clone bail is now reliable. sfIm (Copilot): the dedup key persisted across rewindRecording, so the previous snapshot living on the now-abandoned branch would match the next post-rewind snapshot and silently skip the write, leaving /resume on the rewound session with no attribution state. Reset lastAttributionSnapshotJson when rewindRecording fires. sfJE (Copilot): dedup key was committed before the async write settled. A transient write failure would update the key, then permanently suppress all future identical snapshots even though nothing was ever persisted. Switch to optimistic-set then rollback on appendRecord rejection — synchronous identical calls dedup cleanly, but a failed write clears the key so the next identical snapshot retries. appendRecord now returns the per-record write promise (writeChain still has its swallow-catch for chain liveness) so callers needing per-write success can react to it. Tests added in chatRecordingService.test.ts for both rewind-reset and rollback-on-failure paths. * fix(attribution): preHead race, regex apostrophe-escape, surface failures, dead code t2G0 (deepseek-v4-pro): addCoAuthorToGitCommit single-quote regex now matches the bash close-escape-reopen apostrophe form using ((?:[^']|'\\'')*) — the same pattern bodySinglePattern uses for gh pr create. Input like git commit -m 'don'\''t' was previously silently un-rewritten because the negative lookahead bailed; the trailer now lands at the FINAL closing quote. Test updated. tMBP (gpt-5.5): preHead capture switched from concurrent async getGitHead to a synchronous getGitHeadSync (execFileSync) BEFORE ShellExecutionService.execute spawns the user's command. A fast hot-cached git commit could move HEAD before the async rev-parse resolved, leaving preHead === postHead and silently skipping the attribution note. Trade ~10–50 ms event-loop block per commit-shaped command for correctness of the post-command HEAD comparison. t2Gv (deepseek-v4-pro): attribution write failures (note exec non-zero, payload too large, diff-analysis exception, shallow clone / amend-without-reflog) are now surfaced on the shell tool's returnDisplay AND llmContent so the user and agent both see when their commit succeeded but the per-file git note didn't land. attachCommitAttribution now returns string | null (warning text or null for intentional skips like no-tracked-edits). Co-authored-by trailer is unaffected — only the note is gated by these failures. t2Gy (deepseek-v4-pro): committedAbsolutePaths now matches against the canonical keys already stored in fileAttributions (matchCommittedFiles iterates by relative path against the canonical repo root) instead of re-resolving each diff path on the fly. realpathSync(resolved) failed for deleted files and didn't follow intermediate symlinks, leaving stale per-file attribution alive past commit and inflating AI percentages on subsequent commits. t2HI (deepseek-v4-pro): removed dead sessionBaselines / FileBaseline / contentHash / computeContentHash infrastructure (~40 lines). The fields were written, persisted, and restored but never read for any computation or decision. AttributionSnapshot schema stays at version 1 — restore tolerates pre-fix snapshots that carried the now-ignored baselines field. t2HM (deepseek-v4-pro): extracted the duplicated lastMatch helper in addCoAuthorToGitCommit and addAttributionToPR into a single module-level lastMatchOf so future fixes can't be applied to only one copy. * chore(schema): regenerate settings.schema.json to match gitCoAuthor.commit description The settingsSchema.ts source for `gitCoAuthor.commit.description` was updated in 3c0e3293b but the JSON schema only picked up the OUTER description rewrite and missed this inner property's. The Lint check ("Check settings schema is up-to-date") fails on that drift; this commit re-runs `npm run generate:settings-schema` to sync them. * fix(attribution): preserve unstaged AI edits across cleanup branches uxU5 + uxVQ + uxUO (Copilot): every cleanup branch in attachCommitAttribution that called clearAttributions(true) was wholesale-erasing pending AI edits for files the user never staged in this commit. Reviewer scenarios: - multi-commit chain (`commit a && commit b`) bails out without writing a note, but unstaged edits to file Z (touched by neither commit) get cleared along with the chain's committed files. - attribution toggle off: same — toggling the flag wipes pending unstaged work. - analysis failure (shallow clone, --amend without reflog, partial diff failure): the finally-block fallback wholesale-cleared every pending file, consuming unrelated AI edits. - 0%-AI commit: when no file in the commit was AI-touched, generateNotePayload was emitting an "0% AI" note attached to a commit that legitimately had no AI involvement — actively misleading metadata. Add `noteCommitWithoutClearing()` to the service: snapshots the prompt counter as the new "at last commit" but leaves the per-file map alone. Use it in the multi-commit, no-tracked-edits, toggle-off, and analysis-failure paths. The committed-files partial-clear (clearAttributedFiles) still runs in the success path. The 0%-AI no-match case now skips the note write entirely. * fix(attribution): runGit null-on-failure, versionless v3→v4 migration z54M (Copilot): runGit returned '' on both successful-empty-output and silent failure, so a `--name-only` that errored mid-way through the diff fan-out aliased to a real `--allow-empty` commit. The empty-commit branch then preserved pending attributions, leaving the just-committed file's tracked AI edit alive to re-attribute on the next commit. Switch runGit to `Promise<string | null>`, distinguishing exit code 0 (any output, including '') from non-zero (null). The diff-stage fan-out and ancillary probes now treat null as analysis failure and bail with `return null` instead of falling into the empty-commit path. z539 (Copilot): the v3→v4 `shouldMigrate` only fired on `$version === 3`. A versionless settings file carrying the legacy `general.gitCoAuthor: false` boolean would skip every migration (gitCoAuthor isn't in V1_INDICATOR_KEYS — it post-dates V2), get its `$version` normalized to 4 by the loader, and leave the boolean in place. The settings dialog then reads the V4 `{commit, pr}` shape, sees missing keys, defaults both to true, and silently overwrites the user's opt-out on the next save. Also fire when `$version` is absent AND the value at `general.gitCoAuthor` is a boolean. Tests cover the new path and confirm the existing versioned/object-shape paths are untouched. * fix(attribution): toggle-off partial clear, normalizeGitCoAuthor type-check, terraform lockfile 0oAK (Copilot): the gitCoAuthor.commit toggle-off branch returned before computing the committed file set, leaving the just-committed files' tracked AI work in the singleton. Re-enabling the toggle and committing the same file again would re-attribute earlier (already- committed) AI edits to the new commit. Move the toggle gate AFTER matchCommittedFiles so the finally block does a proper partial clear of the just-committed files even when the note write is skipped. 0oAg (Copilot): normalizeGitCoAuthor copied value?.commit / value?.pr without type-checking. settings.json is hand-editable; a stored `{ commit: "false" }` reached runtime as a truthy string and behaved as if attribution were enabled. Add a per-field bool coercion that falls back to the schema default (true) for any non-boolean, matching what the dialog and IDE schema already imply. Tests cover the string / number / null cases. 0oAo (Copilot): v3→v4 shouldMigrate only special-cased versionless legacy booleans — versionless files with invalid gitCoAuthor values (`"off"`, `[]`, etc.) skipped the migration and the loader stamped `$version: 4` over the bad value. Runtime normalization then silently re-enabled attribution. Extend shouldMigrate to fire on ANY versionless non-object value at general.gitCoAuthor; the existing migrate() body's drop-and-warn path resets it. Already-object shapes (hand-edited to v4) still skip cleanly. Tests added. 0oAt (Copilot): `.terraform.lock.hcl` got dropped from generated-file exclusion when `.lock` was removed from the blanket extension list in 3c0e3293b. It's a generated provider lockfile in the same class as `package-lock.json` and dominates Terraform-repo commits. Re-add to EXCLUDED_FILENAMES and add a regression test covering both repo-root and module-nested locations. * fix(attribution): harden restoreFromSnapshot against corrupt payloads 1KMY (Copilot): snapshot.surface was copied without type validation. A corrupted/partially-written snapshot with a non-string surface (e.g. {}, 42, null) would later be serialized into the git note as "[object Object]" and used as a Map key downstream, breaking the expected payload shape. Type-check and fall back to the current client surface for any non-string (or empty-string) value. 1KLq (Copilot): per-field sanitiseCount enforced `promptCount >= 0` and `promptCountAtLastCommit >= 0` independently, but never the cross-field invariant. A snapshot with promptCountAtLastCommit > promptCount would surface a negative getPromptsSinceLastCommit() and propagate as a "(-N)-shotted" trailer into PR text. Clamp atLastCommit to total on restore. 1KL_ (Copilot): when a snapshot carried both the symlinked and canonical paths for the same file (a session straddling the canonicalisation fix), `set(realpathOrSelf(k), ...)` overwrote the first entry with the second, silently dropping the AI contribution the first form had accumulated. Merge instead: sum aiContribution and OR aiCreated when collapsing duplicate keys. Tests cover all three branches: non-string surface fallback, promptCount clamp, and duplicate-key merge. * fix(attribution): roll back snapshot dedup key on sync appendRecord failure 1UMh (Copilot): appendRecord can throw synchronously before returning a promise — e.g. when ensureConversationFile() rethrows a non-EEXIST writeFileSync error. The async .catch() handler attached to the promise never runs in that case, so the optimistic dedup-key set sticks on a write that never landed and permanently suppresses identical retries. Roll back lastAttributionSnapshotJson in the outer catch too. Regression test forces writeFileSync to throw EACCES on the first invocation, then asserts the second identical snapshot attempt fires a fresh write rather than getting deduped. * docs(attribution): align cleanup-branch comments with noteCommitWithoutClearing Three doc/test-fixture stale-after-refactor cleanups (Copilot 4MDx / 4MEI / 4MEa): - shell.ts:1944 (around the stagedInfo === null branch): the comment still claimed the finally block "falls back to a full clear", but 1ece87438 switched analysis-failure cleanup to noteCommitWithoutClearing(). Update the comment so the reasoning matches what the code actually does (and so a future reader doesn't reintroduce the wholesale clear thinking it's already there). - shell.ts: getCommittedFileInfo docstring carried the same stale "full clear" claim for the `null` return value. Update to describe the noteCommitWithoutClearing() fallback and the smaller-evil trade-off for the just-committed file. - chatRecordingService.test.ts: baseSnapshot fixture for the recordAttributionSnapshot tests still carried `baselines: {}`, even though that field was removed from AttributionSnapshot in 296fb55ae's dead-code purge. Structural typing let it compile, but the fixture didn't reflect the production shape — drop it. * fix(attribution): restore fire-and-forget appendRecord, route rollback via callback 6OcJ (Copilot): refactor in 715c258fb returned a Promise from appendRecord so the snapshot dedup-key path could chain rollback — but recordUserMessage / recordAssistantTurn / recordAtCommand / recordSlashCommand / rewindRecording all call appendRecord without await or .catch(). A transient jsonl.writeLine rejection on any of those would surface as an unhandled-promise-rejection (warning, or crash on --unhandled-rejections=throw). Restore the original fire-and-forget semantics: appendRecord again returns void and internally swallows async failures (logging via debugLogger). Per-record failure reactions are routed through an optional onError callback — recordAttributionSnapshot uses this to roll back lastAttributionSnapshotJson when the write that set it ends up rejecting. Tests: add a fire-and-forget regression that mocks writeLine to reject and asserts no unhandledRejection events fire while the existing snapshot rollback tests (sync + async) still pass via the new callback path. * fix(attribution): GIT_DIR repo-shift bail, snapshot envelope validation, narrow legacyTypes 80ME (gpt-5.5 /review, [Critical]): tokeniseSegment unconditionally stripped every leading KEY=value token. `GIT_DIR=elsewhere/.git git commit ...` was therefore treated as an in-cwd commit, picked up the Co-authored-by trailer, and produced a per-file note that landed against our cwd's HEAD even though the actual commit went to a different repo. Define a GIT_ENV_SHIFTS_REPO set (GIT_DIR, GIT_WORK_TREE, GIT_COMMON_DIR, GIT_INDEX_FILE, GIT_NAMESPACE) and have tokeniseSegment refuse to parse any segment whose leading env block (including the env-wrapper's KEY=VALUE block) carries one of these. Identity / date variables (GIT_AUTHOR_*, GIT_COMMITTER_*) are deliberately NOT in the set — they tweak metadata but don't relocate the repo. Tests cover plain prefix, env-wrapped prefix, and a GIT_COMMITTER_DATE positive control that should still get the trailer. 8EeQ (Copilot): restoreFromSnapshot received `snapshot as AttributionSnapshot` from a structural cast off `unknown` (the resume path), so its TS-typed shape was only a hint. A corrupted JSONL line (non-object / array / wrong type discriminator / missing type) would skip past the version check straight into Object.entries(snapshot.fileStates) — and a non-object fileStates (an array, say) seeded fileAttributions with numeric-string keys. Add envelope-level shape gates (isPlainObject + type discriminator) and a fileStates plain-object check before iterating; both bail to a clean reset rather than poisoning the singleton. Tests added. 8Eej (Copilot): SettingDefinition.legacyTypes was typed as SettingsType[] which includes 'enum' and 'object' — JSON Schema's `type` keyword doesn't accept those values. Adding `legacyTypes: ['enum']` would silently produce an invalid settings.schema.json. Narrow the field's type to ReadonlyArray<'boolean' | 'string' | 'number' | 'array'> (the JSON-Schema-primitive subset). Future complex-shape legacy support should land its own branch in convertSettingToJsonSchema. * docs(attribution): correct legacyTypes / EXCLUDED_DIRECTORY_SEGMENTS comments 9Ta_ (Copilot): the JSDoc on legacyTypes claimed JSON Schema's `type` keyword does not accept `'object'` — that's wrong; `'object'` IS a valid JSON Schema type. Reword to reflect the actual rationale: `'enum'` is not a valid JSON Schema `type` value at all (enum constraints use the `enum` keyword), and a bare `{type: 'object'}` would accept any object regardless of what the field's pre-expansion shape actually allowed. The narrowed `boolean | string | number | array` set is exactly what the one-liner generator can faithfully emit; richer legacy shapes belong in their own branch of convertSettingToJsonSchema. 9Tbs (Copilot): the comment in generatedFiles.ts referenced `EXCLUDED_DIRECTORIES`, but the constant is `EXCLUDED_DIRECTORY_SEGMENTS` (renamed during the segment-boundary refactor). Update the reference so a future maintainer scanning for the rule doesn't chase a non-existent identifier. * fix(attribution): SHA-pin git notes, on-disk hash divergence detection, env -C cwd-shift tanzhenxin review #1 — Note targets symbolic HEAD, not captured SHA: buildGitNotesCommand hard-coded 'HEAD' as the target; postHead was captured at commit-detection time but only used for the !== preHead diff. Between that capture and the execFile, three more awaited git calls run — anything that moves HEAD in the same cwd (post-commit hook, chained `commit && tag -m`, parallel process) silently lands the note on the wrong commit because of `-f`. Thread postHead through buildGitNotesCommand as a required `targetCommit` arg. Test asserts the targeted SHA, not the symbolic ref. tanzhenxin review #2 — Accumulator has no baseline: recordEdit was monotonic per-path with no reset for out-of-band mutations. Re-instate FileAttribution.contentHash and: - recordEdit hashes the input `oldContent` and resets the per-file accumulator if it doesn't match what AI's last write recorded (catches paste-replace via external editor, manual save, etc. WHEN AI subsequently edits the same file again). - New validateOnDiskHashes() rehashes every tracked file's CURRENT on-disk content and drops entries whose hash diverged. Called from attachCommitAttribution before matchCommittedFiles so a commit can never credit AI for a human-only diff. Deleted files (readFileSync throws) are left alone — the commit's deletion record is what the note should reflect. tanzhenxin review #4 — Failed-commit / staleness leak: The recordEdit divergence check above + commit-time validateOnDiskHashes together catch tanzhenxin's exact scenario (AI edits a.ts → hook rejects → user manually edits a.ts → user commits → no AI credit because validateOnDiskHashes drops the stale entry). The !commitCreated branch still preserves attributions to keep the submodule case working — the staleness problem is now solved at the next commit's validation step. Self-review item — env -C / --chdir treated as repo-shifting: Added ENV_FLAGS_SHIFT_CWD set covering -C / --chdir. tokeniseSegment returns null for `env -C DIR git commit ...` segments — same contract as a leading GIT_DIR=... assignment. Without this we'd either misidentify /elsewhere as the program (silently dropping attribution) or, worse if -C went into the value-skip set, trailer-inject onto a commit that lands in /elsewhere's repo. Tests added alongside the existing GIT_DIR repo-shift cases. 339 tests pass; typecheck clean. * fix(attribution): pickBool intent-aware, shouldClear gate, ETIMEDOUT surface, drop dead exports -wgA + -wg0 (deepseek): pickBool defaulted non-boolean to true, turning a hand-edited `{ commit: "false" }` into enabled attribution. Replace with intent-aware parsing: "true"/"yes"/"on"/ "1" → true, "false"/"no"/"off"/"0"/"" → false, anything else (unknown strings, non-1 numbers, objects, arrays, null) → false. Genuinely-absent sub-fields still default to true (schema default). Migration test scenarios covered. Tests now cover ~17 input cases across both string/number/null/object/unknown forms. -wgq (deepseek): when buildGitNotesCommand returned null (oversized payload) or git notes itself failed, the finally block called clearAttributedFiles(committedAbsolutePaths) — irreversibly deleting per-file attribution data the user might need to amend & retry. Introduce a separate `shouldClear` set that's only assigned on successful note write OR explicit toggle-off. Failure paths (oversized, exitCode != 0, exception, analysis failure) leave shouldClear null so the finally block calls noteCommitWithoutClearing instead — preserving per-file state for the user's recovery. 9p7W (Copilot): execFile callback coerced ETIMEDOUT / SIGTERM (timeout) into a generic exitCode=1 warning. Detect both `error.code === 'ETIMEDOUT'` and `error.killed === true && error.signal === 'SIGTERM'` so the user-visible warning correctly names "timed out after 5s" instead of "exited 1". -wg7 (deepseek): formatAttributionSummary and getAttributionNotesRef were exported but had zero production callers (only tests). Remove the dead exports + their tests (~40 LOC). If/when a logging surface needs them, they can be re-introduced. -wgb (deepseek): tokeniseSegment doesn't recursively unwrap `bash -c '...'` / `sh -c` / `zsh -c`, so addCoAuthorToGitCommit won't splice the trailer into a wrapped command. The background refusal AND the post-commit note path DO catch the wrapped commit because stripShellWrapper at the top of execute peels the wrapper before gitCommitContext / getGitHead run — so the worst-case ("background bash -c 'git commit' bypasses the guard") doesn't materialize. The remaining gap (no Co-authored-by trailer for bash -c-wrapped commits) requires recursively splicing into the inner script with proper bash single-quote re-quoting; significant enough that it's worth its own PR. Documented as a partial-coverage limitation. 3…
…e + Agent isolation (QwenLM#4073) * feat(tools): add generic worktree support (Phase A + B of QwenLM#4056) Adds first-class git worktree as a general-purpose capability: Phase A — User-facing tools - enter_worktree: creates `<projectRoot>/.qwen/worktrees/<slug>` on a `worktree-<slug>` branch and returns the absolute path. Slug auto-generated when omitted; validated against path traversal and disallowed characters. - exit_worktree: keeps or removes the worktree (and its branch). Refuses to remove a worktree with uncommitted tracked changes or untracked files unless `discard_changes: true` is set. Phase B — Agent isolation - Agent tool gains an `isolation: 'worktree'` parameter that provisions a temporary `agent-<7hex>` worktree, prepends a worktree notice to the task prompt, and on completion either removes the worktree (no changes) or preserves it and reports its path/branch in the result. Background and foreground execution paths both wired up; rejected for fork agents. - worktreeCleanup.cleanupStaleAgentWorktrees: fail-closed sweep for ephemeral `agent-<7hex>` worktrees older than 30 days with no tracked changes and no unpushed commits. User-named worktrees are never swept. - buildWorktreeNotice helper for fork subagents (parity with claude-code). Arena compatibility - The existing Arena worktree implementation (GitWorktreeService.setupWorktrees, ArenaManager, agents.arena.worktreeBaseDir) is untouched. Arena uses its own batch APIs and `~/.qwen/arena` base dir; the new general-purpose APIs live alongside under `<projectRoot>/.qwen/worktrees/`. Subagent safety - enter_worktree / exit_worktree are added to EXCLUDED_TOOLS_FOR_SUBAGENTS so a subagent cannot mutate the parent session's worktree state. Refs QwenLM#4056 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(worktree): use path.join in expected paths so the test passes on Windows The Windows CI run reported `enter-worktree.test.ts` failing because the expected string was hardcoded with `/` while `getUserWorktreesDir()` uses `path.join`, which returns `\\` on Windows. Build the expected path via `path.join` so the platform-correct separator is compared. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(enter-worktree): treat empty name as auto-generate Some models pass `{ "name": "" }` when calling EnterWorktree, because the schema marks `name` as optional and they emit an empty placeholder. The previous validation rejected the empty string with "Worktree name must be a non-empty string", which surprised users running the auto-slug path. Now both `validateToolParams` and `execute` treat `name: ""` as equivalent to `name: undefined` and fall back to the auto-generated `{adj}-{noun}-{4hex}` slug. Explicit invalid slugs (`'../etc'`, `'a/b'`, etc.) are still rejected as before. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address review findings 1-6 from PR QwenLM#4073 Six issues raised on the initial review; each addressed with a verifiable guarantee. 1. Real isolation for `agent isolation: 'worktree'` Before: subagent's Config still resolved `getTargetDir()` to the parent project root, so Edit/Write/Read workspace checks and Shell's default cwd silently operated on the parent tree. The cleanup helper then saw a "clean" worktree and removed it — destroying the evidence. After: the worktree is provisioned BEFORE `createApprovalModeOverride`, and the resulting agent Config has `getTargetDir`/`getCwd`/`getWorkingDir` rebound to the worktree path. Relative paths, unqualified shell commands, and glob/grep roots all confine to the worktree. 2. `exit_worktree action='remove'` now prompts in default/auto-edit modes Added `getDefaultPermission()` on the invocation: `'ask'` when action is `remove`, `'allow'` when `keep`. Brings it in line with edit, write_file, and run_shell_command. 3. Force-delete no longer silently destroys unpushed commits `removeUserWorktree` now uses `git branch -d` (refuses unmerged) by default and surfaces `branchPreserved: true` when git refuses. Added `hasUnmergedWorktreeCommits` (checks if branch tip is reachable from any other local branch or remote ref). Both the agent isolation cleanup and `exit_worktree action='remove'` use this check: if the branch has work not covered elsewhere, the worktree+branch are preserved even when `discard_changes: true` is set (there is no `discard_commits` flag — committed work is rarely what `remove` means to discard). 4. Both new tools are now deferred behind ToolSearch `shouldDefer: true` + `searchHint` on both. Verified via openai-logging: `enter_worktree` and `exit_worktree` no longer appear in the function- declaration list sent on every API request. 5. Stale-worktree cleanup is wired in `Config.initialize()` fires `cleanupStaleAgentWorktrees(targetDir)` as a non-awaited startup sweep (skipped in bare mode). Picks up orphaned `agent-<7hex>` worktrees left by crashed runs. 6. Foreground isolation no longer leaks on uncaught throw The foreground try block tracks whether the cleanup helper ran on the success path; the finally block invokes it as a fallback when the try bailed early. Mirrors the background path's pattern. Verification: - Unit tests: 83 passed (16 worktree + 64 existing agent + 3 cleanup) — no regressions. - E2E #1: agent told to write `hello.txt` via RELATIVE path — file landed at `.qwen/worktrees/agent-XXXXXXX/hello.txt`, NOT at the parent root. - E2E #3: created worktree, committed work inside it, called exit_worktree with `discard_changes=true` — refused with clear message; worktree and branch both preserved. - E2E #4: openai-logging confirms worktree tools absent from API tool list (7 tools sent instead of 9). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address review round 2 findings (1 from tanzhenxin, 7+8 from wenshao) The first round closed the data-loss-class issues. This round addresses follow-ups from a deeper audit: 1. Stale-worktree sweep was inert on common-case repos `cleanupStaleAgentWorktrees` previously ran `git log --branches --not --remotes --oneline` from each worktree's directory — that lists unpushed commits across EVERY local branch, not just the worktree's own branch. On any repo with no remote configured (or with stray unpushed branches), the sweep refused to remove every candidate. Replaced with `service.hasUnmergedWorktreeCommits(slug)` which scopes the check to the worktree branch via `for-each-ref --contains <tip>`. Also added the `branchPreserved` warn log requested in M7 and an `fs.access` shortcut for the empty-worktrees-dir case (M8). 2. `cleanupWorktreeIsolation` and `worktreeIsolation` were inside the inner try (~660 lines from the outer catch). Hoisted both to the top of `execute()` so the outer catch can reap or preserve the worktree when anything between provisioning and the inner try throws (e.g. `createApprovalModeOverride`, agent creation). Closure carries the resolved `repoRoot` so cleanup never has to re-resolve. 3. Background error path discarded the cleanup result. Now captures `formatWorktreeSuffix(...)` and appends it to the registry's failure /cancel message, so users see the preserved path/branch even when the agent crashed before reporting. 4. `cleanupWorktreeIsolation` now treats `result.success === false` as "worktree still on disk" and surfaces it as preserved instead of silently dropping it from the result. 5. Override was incomplete. Several Config methods read `this.targetDir` directly (`getProjectRoot`, `getFileService`, etc.) — own-property getter overrides did not redirect them. Now also shadows `targetDir` and `cwd` as own properties on the agent's Config override, swaps in a `FileDiscoveryService` rooted at the worktree, and rebuilds `WorkspaceContext` to point at the worktree only. Verified end-to-end: shell `pwd > pwd-record.txt` (no directory arg) lands at `.qwen/worktrees/agent-<7hex>/pwd-record.txt`, not the parent root. 6. monorepo subdir issue. Both `enter_worktree` and the agent isolation path now resolve `git rev-parse --show-toplevel` first and anchor `.qwen/worktrees/<slug>` at the repo root. Worktrees created from any subdirectory now end up where the startup sweep can find them. 7. Replaced `git worktree add -B` (silent force-reset of pre-existing branches) with `git worktree add -b` plus an explicit existence check via `git for-each-ref` (NOT `show-ref --quiet`, which simple-git swallows). Pre-existing `worktree-<slug>` branches now trigger a clear error instead of clobbering committed work. 8. First worktree creation in a repo writes `<projectRoot>/.qwen/.gitignore` with `worktrees/` so worktree contents stay out of the parent's `git status`, glob/grep results, and bundle tools. Idempotent: never overwrites an existing file. 9. Logging across the failure paths (`enter_worktree` errors, `agent.ts:failWorktreeProvisioning`, `cleanupWorktreeIsolation`, `hasUnmergedWorktreeCommits` swallowed errors, `cleanupStaleAgentWorktrees`'s `branchPreserved` race). 10. `exit_worktree` no longer suggests `discard_changes: true` when the git status check itself fails — that would be advising the user to bypass a safety check whose precondition is unknown. Now points at the underlying repo problem. 11. `generateAutoSlug` switched from `Math.random()` (4 hex, weak RNG, one-in-65k collision) to `randomBytes` (6 hex, ~16M combinations). Two RNG sources in this file collapsed to one. Pushed back: the TOCTOU swap in `removeUserWorktree` (S6 round 1) is left as-is — `git branch -d` is the real safety, and reordering does not eliminate the window. Windows reserved-name validation (M5 round 2) deferred to a follow-up; the current allowlist already rejects path separators, `..`, leading dot/dash, and the >64-char case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): use randomInt to silence CodeQL biased-modulo finding CodeQL's `js/biased-cryptographic-random` flagged `randomBytes(4)[i] % ARRAY.length` in `generateAutoSlug`. The math is actually exact for the current word-list lengths (256 % 8 == 0), but the lint rule does not know that — and a future contributor changing the list to a non-power-of-two length would silently introduce bias. Switched the index lookups to `crypto.randomInt(0, length)`, which uses rejection sampling and is uniform by construction. Suffix still uses `randomBytes(3).toString('hex')` since hex encoding is unbiased. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address review round 3 findings 1-6 from PR QwenLM#4073 The previous round added `getRepoTopLevel` for `enter_worktree`'s provisioning, but missed three sibling call sites that still used the raw cwd. The double-cleanup race in the foreground path also leaked stale `[worktree preserved]` suffixes on rejected promises. All six findings from the deeper audit are addressed: 1. exit_worktree now resolves through `getRepoTopLevel()` before building its `GitWorktreeService`, mirroring `enter_worktree`. Without this, launching `qwen` from a monorepo subdirectory created the worktree under the repo root but exit_worktree looked under the subdir's `.qwen/worktrees/` and always returned "Worktree not found". Verified end-to-end: enter + exit from `packages/core/` works. 2. agent.ts cleanup helper now nulls `worktreeIsolation` immediately after capturing the closure value. The previous structure could reach the helper twice — once in the foreground try's success path and once in the foreground finally fallback (or once in the inner try and once in the outer catch on a thrown rejection). The second call would `hasWorktreeChanges()` against a directory the first call already removed, fail-closed, and emit a bogus `[worktree preserved: <missing path>]` suffix. 3. Config.initialize's startup sweep now resolves `getRepoTopLevel()` before invoking `cleanupStaleAgentWorktrees`. Without this, every subdir launch scanned a non-existent `<subdir>/.qwen/worktrees/` and the 30-day expiry sweep was permanently a no-op. 4. agent.ts's `buildWorktreeNotice` now passes `worktreeIsolation.repoRoot` as `parentCwd` instead of `this.config.getTargetDir()`. The notice's path-translation guidance (≈ "translate paths from <parent> to <worktree>") would otherwise misdirect the subagent in a monorepo subdir launch. 5. Removed dead method `GitWorktreeService.listUserWorktrees`. It had no callers anywhere in the codebase and used `execSync` in a loop (would have blocked the event loop if anyone wired it up). 6. `localBranchExists` no longer swallows git failures silently. The defensive `false` default is preserved (so `git worktree add -b` itself surfaces the conflict if the check missed an existing branch), but the catch now logs via `debugLogger.warn` so disk-full / permission / ref-store-corruption cases are visible in debug output instead of being invisible. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address review round 4 findings (data-loss + visibility) Seven actionable findings from a deeper audit, all closed: 1. User worktree slugs could collide with ephemeral-agent shape `validateUserWorktreeSlug` did not reject names starting with `agent-`, so a user-named `agent-1234567` matched the cleanup regex `/^agent-[0-9a-f]{7}$/` and would be silently swept after 30 days along with whatever work was in it. Now reserved — clear error message points users at the cause. 2. Slug producer and consumer were string-coupled across files `agent.ts` hardcoded `agent-${hex(7)}` and `worktreeCleanup.ts` independently hardcoded `/^agent-[0-9a-f]{7}$/`. Future change to hex length on one side would silently break the other. Lifted `AGENT_WORKTREE_PREFIX`, `AGENT_WORKTREE_HEX_LENGTH`, `AGENT_WORKTREE_SLUG_PATTERN`, and `generateAgentWorktreeSlug()` to `gitWorktreeService.ts`; both call sites import them. 3. Startup sweep was invisible at default log level Fire-and-forget sweep used `debug` for errors and discarded the success count. A leak-chasing operator had no log breadcrumb. Errors promoted to `warn`; successful removals (count > 0) logged at `info`. 4. `getRepoTopLevel()` silent catch Returned `null` on any git failure with no log. Combined with `?? cwd` fallback in callers, a flaky git would have made worktree creators and the startup sweep disagree silently about which dir to use. Now logs the underlying error. 5. `hasTrackedChanges()` silent catch Cleanup's fail-closed `return true` had no log. Couldn't tell "has real changes — leave alone" from "git index unreadable — repo may be corrupt". Now logs. 6. `cleanupWorktreeIsolation` claimed `preservedPath` for a removed dir When `removeUserWorktree` returns `{ success: true, branchPreserved: true }` it has already deleted the directory and failed only on `git branch -d`. The helper still reported the (now non-existent) path as preserved. Now returns only `preservedBranch` for that case; `formatWorktreeSuffix` emits a distinct message instructing recovery via `git worktree add <new-path> <branch>`. 7. `removeUserWorktree` swallowed branch-delete failures Both `-d` and `-D` catch blocks were empty. Locked refs, perms, disk full all looked identical to "unmerged commits". Both now `debugLogger.warn` with the underlying error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(worktree): self-review pass — reuse, parallelism, dead code Self-review caught a handful of issues across three categories: Reuse: - `pathExists` in the new code now uses the existing `fileExists` from `utils/fileUtils.ts` instead of duplicating an `fs.access` wrapper. - `worktree-` branch prefix was string-literalled in five places. Added `WORKTREE_BRANCH_PREFIX` and `worktreeBranchForSlug(slug)` exports in `gitWorktreeService.ts`; updated `gitWorktreeService.ts`, `worktreeCleanup.ts`, and `exit-worktree.ts` to use them. Future prefix changes are a single edit. Efficiency: - `Config.initialize` used two `await import(...)` calls inside the startup-sweep IIFE, paying that cost on every CLI start. Switched to static imports at the top of `config.ts` — the modules are tiny and the dynamic indirection bought nothing. - `cleanupWorktreeIsolation` in `agent.ts` ran `hasWorktreeChanges` and `hasUnmergedWorktreeCommits` sequentially. They have no data dependency on each other and each spawns its own `git` invocation; `Promise.all` halves the cleanup wall-clock on the common path. Same fix in `worktreeCleanup.ts`'s per-entry loop. - `ensureWorktreesGitignored` used `fs.access` then `fs.writeFile`, a TOCTOU race when two agent invocations created worktrees concurrently (both could pass the `access` check and the second would clobber the first's `.gitignore`). Now writes with `flag: 'wx'` and treats `EEXIST` as the no-op case — atomic in one syscall. Quality: - Dropped the `worktreeCleanupRan` boolean in the foreground execution path. `cleanupWorktreeIsolation` already nulls its closure variable at the top of every call (see the comment at its definition), so re-entries are no-ops. The boolean and its tracking were dead weight that obscured the real guard. - Trimmed the Phase-2 override comment block to drop the WHAT-stating enumerations (items 3 and 4 just narrated the lines below) and removed a navigation comment about hoisted helpers — the helpers are visible at the top of the same method. 84 unit tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address review round 5 — design-doc commitments + correctness Five critical findings + four suggestions, all closed. Critical: 1. Wrong base branch for agent isolation. `createUserWorktree(slug)` with no `baseBranch` arg fell back to `getCurrentBranch()` on the **main** working tree, returning `main` regardless of which branch the user was actually on. A subagent invoked from `feature-x` would silently start from `main` and produce diffs against the wrong baseline. `enter_worktree` had the same bug. Both now resolve the parent's current branch first and pass it explicitly. Verified end-to-end: `git checkout feature-x` → `enter_worktree` → worktree HEAD includes the feature-x commit. 2. `countWorktreeChanges` (used by `exit_worktree`'s dirty-state guard) missed `status.conflicted[]`. In simple-git that array is mutually exclusive with the staged/modified/etc. arrays, so a worktree mid-merge with only conflicts looked `{tracked: 0, untracked: 0}` to the guard and `action='remove'` would proceed without `discard_changes: true`. Added `+ status.conflicted.length`. 3. `exit_worktree` had no session-ownership check, contradicting the design doc's "only operates on worktrees created by THIS session". In yolo mode a prompt injection could enumerate `.qwen/worktrees/` and pass any name to drop another session's work. Now: `enter_worktree` and agent isolation write a `.qwen-session` marker into the worktree at provisioning time; `exit_worktree action='remove'` reads it and refuses if it does not match the current `Config.getSessionId()`. Worktrees from before this guard (no marker file) are treated as "owner unknown" — allowed with a warn log so the change is observable. 4. `enter_worktree` did not refuse nested invocations from inside an existing worktree, contradicting the design doc. Now rejects any cwd containing `.qwen/worktrees/` as a path component, with a clear "Already inside a git worktree…" message. Verified: enter from inside a worktree returns is_error with that text. 6. `hasTrackedChanges` (cleanup sweep) had the same `conflicted[]` gap. Rewrote to use raw `git status --porcelain --untracked-files=no` which lists every tracked change including `UU` conflict markers in a single git call and explicitly skips the untracked walk (the prior comment claimed to skip it, but `status()` always does the scan). Suggestion: 7. `buildWorktreeNotice` now receives the parent agent's actual `getTargetDir()` again (was switched to `repoRoot` in round 3 on a different reviewer's suggestion; round-5 caught that the model's inherited paths reference the parent's cwd, not necessarily the repo root, so the prior behaviour was correct). 8. Startup sweep now does `fs.access(<targetDir>/.qwen/worktrees)` *before* importing GitWorktreeService and spawning `git rev-parse --show-toplevel`. The git probe is reserved for users who actually have a worktrees directory locally — 99% of users pay only one syscall on startup. 9. Tests: - New `exit-worktree.test.ts` covers metadata, validation, `getDefaultPermission` (ask vs allow), and getDescription. - `agent.test.ts` adds three `validateToolParams` cases for the `isolation` parameter (accepted with subagent_type, rejected without, rejected for non-"worktree" values). - `enter-worktree.test.ts` adds round-trip tests for `writeWorktreeSessionMarker` / `readWorktreeSessionMarker` plus a `worktreeBranchForSlug` sanity check. - Total: 101 tests pass (was 86 → +15). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): drop unused @ts-expect-error in exit-worktree.test.ts Empty string `''` is a valid `string` type, so the @ts-expect-error directive on `validateToolParams({ name: '', action: 'keep' })` did nothing — TypeScript correctly accepted the line, and `tsc --build` in CI reported TS2578 ("Unused '@ts-expect-error' directive"). The runtime assertion already covers the case; the directive was leftover from an earlier draft. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): use importActual in ArenaManager mock to preserve new exports The Arena test mocks `gitWorktreeService.js` with a factory that returns only `{ GitWorktreeService }`. PR QwenLM#4073 added several other exports to that module (`AGENT_WORKTREE_SLUG_PATTERN`, `WORKTREE_BRANCH_PREFIX`, `worktreeBranchForSlug`, `generateAgentWorktreeSlug`, `writeWorktreeSessionMarker`, `readWorktreeSessionMarker`, `WORKTREE_SESSION_FILE`). Other modules in the dep graph reach the mocked surface — most notably `worktreeCleanup.ts` imports `AGENT_WORKTREE_SLUG_PATTERN` and `worktreeBranchForSlug`, and now reaches the mock via the static `config.ts` → `worktreeCleanup.ts` import chain added in the self-review pass. The Arena test failed at module-load with: Caused by: Error: [vitest] No "AGENT_WORKTREE_SLUG_PATTERN" export is defined on the "../../services/gitWorktreeService.js" mock. Did you forget to return it from "vi.mock"? Use `importOriginal` to capture every real export, spread it into the return object, and only replace `GitWorktreeService` (the class the test actually needs to mock). The class-level mock keeps its existing static-method shims. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): address review round 6 (5 critical + 6 suggestions) The biggest item — #1 — is a self-inflicted regression from round 5: the new agent- prefix reservation in `validateUserWorktreeSlug` rejected EVERY slug that `generateAgentWorktreeSlug` produces, since that helper emits exactly `agent-<7hex>`. Net effect: every `AgentTool isolation: 'worktree'` invocation failed at validation. The reservation now allows the canonical pattern through (everything the helper can produce) and only rejects user-chosen `agent-*` names that don't match it. Added a round-trip regression guard: 50 `generateAgentWorktreeSlug()` outputs are fed back through `validateUserWorktreeSlug` and must all pass. Other critical fixes: 2. `hasWorktreeChanges` (used by agent isolation cleanup) was the one remaining caller relying solely on `status.isClean()`. Defensive `|| status.conflicted.length > 0` so a future simple-git bookkeeping change can't let a mid-merge worktree appear clean and get auto-deleted. 3. `readWorktreeSessionMarker` swallowed every I/O error as "marker missing", which let a disk error / EACCES silently bypass the session-ownership guard. ENOENT is still treated as missing (legitimate); every other code now logs. 4. `exit_worktree` `fs.stat` catch was the same shape — every error collapsed to "Worktree not found". ENOENT → not found; everything else logs and returns a distinct "cannot access" error. 5. `cleanupStaleAgentWorktrees` `fs.stat` catch was again the same. ENOENT → silently skip (entry vanished between readdir and stat); everything else logs. Suggestions: 6. Startup sweep fast-bail was running BEFORE resolving the repo top-level. For monorepo subdir launches, `targetDir/.qwen/worktrees` never exists and the sweep early-returned — permanently a no-op. Now resolves the root first, then fast-bails against the resolved `<root>/.qwen/worktrees`. Also logs the skip case so operators can tell "skipped" from "ran, found nothing". 7. `.qwen-session` marker was visible to `git add -A` inside the worktree. Now writes a `.git/info/exclude` rule (resolved via `git rev-parse --git-dir`, since worktree `.git` is a file pointing at the parent repo's `.git/worktrees/<name>/`). Best-effort: failure to write the rule does not abort provisioning. 8. Agent isolation now refuses to provision when the parent's cwd is already inside a worktree — same regex guard as `enter_worktree`. 9. `exit_worktree`'s wrapper around `hasUnmergedWorktreeCommits` now logs at the call site so the chain (caller → reason it asked → underlying git error) is complete in operator logs. 10. Sweep now logs unconditionally at `info`. Three distinct messages: "skipped (no worktrees dir)", "ran, nothing to remove", "removed N". Tests: 11. New `execute()` coverage: • exit-worktree: session-ownership refusal, keep happy path, legacy/no-marker fallthrough with warn log, missing-worktree error, unmerged-commits guard with `discard_changes: true`, `writeWorktreeSessionMarker` round-trip. • enter-worktree: nested-guard rejection, non-git-repo error. These spin up real temp git repos (no filesystem mocking) and drive the actual tool invocation pipeline. Total: 135 tests pass (was 101 → +34). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(worktree): demote noise startup-sweep logs to debug Self-review pass applying the round-6 review-triage framework (filter #5: "If a log only fires on the happy path, it's noise.") to my own round-6 changes: - "Stale worktree sweep skipped: <dir> does not exist" — fires on every CLI start for ~99% of users who never use worktrees. - "Stale worktree sweep ran under <root>: nothing to remove" — fires on every CLI start for users who have any worktrees but no stale ones at the moment. Both are happy-path noise at `info`. Demoted to `debug` so an operator can opt in via `--debug` when they want to confirm the sweep is wired up, but normal output stays clean. Only the actually-actionable case ("removed N worktrees") stays at `info` — that's the signal someone chasing a worktree leak would grep for. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(worktree): close AUTO_EDIT bypass + parent-dirty stale-code hazard Round-7 review caught two correctness gaps: 1. exit_worktree action='remove' was still auto-approved in AUTO_EDIT `getDefaultPermission` returning 'ask' is necessary but not sufficient. `permissionFlow.isAutoEditApproved` auto-approves any tool whose `confirmationDetails.type` is 'edit' OR 'info', and `BaseToolInvocation` returns 'info' by default. So a session in AUTO_EDIT could silently destroy a worktree (with branch deletion) without a confirmation prompt — the data-loss path the round-1 `'ask'` switch was meant to close. Now overrides `getConfirmationDetails` to return `type: 'exec'` for action=remove, which keeps the prompt in AUTO_EDIT. The `keep` action still falls through to the base info-type since it is non-destructive. Regression-guard test asserts the type is 'exec' (not 'info') for remove and that the command field describes both the worktree-remove and branch-delete operations. 2. Agent isolation worktrees ran against parent's HEAD, not its working tree `git worktree add -b <branch> <path> <base>` only checks out the base ref's tip — uncommitted edits in the parent's working tree do NOT propagate. The "edit code → ask review/test agent before committing" workflow silently ran the subagent against the pre-edit HEAD and returned results that looked authoritative but reflected stale code. Reviewer offered two options: overlay parent's dirty state à la Arena (~50 LOC, edge cases), or refuse isolation when parent is dirty (~10 LOC, clear UX). Chose the latter for Phase B scope — simpler, decisive, and matches the design-doc's explicit commitment that dirty-state overlay is Arena-specific. Users can commit/stash before re-invoking agent isolation; overlay can be a follow-up if users complain about the friction. Fail-closed on the dirty-check itself (assume dirty rather than silently launch on a possibly-stale tree). Test exercises both "dirty parent → guard fires" and "clean parent → guard passes" against real temp git repos. 139 unit tests pass (was 135, +4 regression guards). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…enLM#4175 Wave 2.5 PR 10) (QwenLM#4237) * feat(serve): SSE replay sizing + slow_client_warning backpressure QwenLM#4175 Wave 2.5 PR 10. Closes the SSE replay / backpressure knobs called out in QwenLM#3803 §02 so chatty Stage 1 sessions get an honest reconnect window and operators get a heads-up signal before clients are summarily evicted. - **`DEFAULT_RING_SIZE` 4000 → 8000.** Per-session replay ring depth now matches the QwenLM#3803 §02 target for chatty sessions. - **`--event-ring-size <n>`** CLI flag (default 8000) lets operators tune the ring per daemon. Threaded `ServeOptions` → `BridgeOptions.eventRingSize` → both `new EventBus()` construction sites (fresh sessions + restore path). Validation is fail-CLOSED (positive finite integer; 0 / NaN / negative throw at boot). - **`slow_client_warning` SSE frame.** When a subscriber's queue crosses 75% full the bus force-pushes a synthetic `slow_client_warning` to that subscriber once per overflow episode, carrying `{queueSize, maxQueued, lastEventId}`. The flag re-arms after the queue drains below 37.5% (hysteresis, no flap near threshold). If the queue actually overflows after the warning, the existing `client_evicted` terminal frame path still fires. Like `client_evicted`, the warning has no `id` (synthetic frame; must not burn a sequence slot for other subscribers). - **`?maxQueued=N`** query param on `GET /session/:id/events` (range `[16, 2048]`, default 256). Lets cold reconnect clients pre-size their per-subscriber backlog so a large `Last-Event-ID: 0` replay doesn't trip the warning on the first publish. Range rationale: lower bound 16 (smaller is useless for any replay); upper bound 2048 (so a single subscriber can't pin ~1 MB just by asking). Out-of-range / non-decimal returns `400 invalid_max_queued` BEFORE opening the SSE stream — clean 4xx beats half-opening a stream + emitting a `stream_error` (which EventSource would auto-reconnect on). - **`slow_client_warning` capability tag** — single source of truth for the warning frame + `?maxQueued` query param + ring-size knob. Old daemons silently lack all of these; pre-flight via `caps.features`. - **SDK extensions** (`@qwen-code/sdk`): typed `DaemonSlowClientWarningEvent` (added to known event union and `DaemonStreamLifecycleEvent`); schema-validated by a new `isSlowClientWarningData` predicate; reducer (`reduceDaemonSessionEvent`) increments `slowClientWarningCount` + stores `lastSlowClientWarning`. Warning is **non-terminal** — `alive` stays true (only `client_evicted` / `stream_error` / `session_died` close the stream). Re-exported from the public SDK entry. - **Docs**: `qwen-serve-protocol.md` updates the features list (adds `slow_client_warning` and the previously-missing `client_identity` to match reality post-QwenLM#4231), documents the `?maxQueued` query param, adds the warning frame to the event table, and notes the new default ring size. `qwen-serve.md` adds the `--event-ring-size` flag row. Tests: 19 eventBus (4 new: warning at 75%, once per episode, no `id` on the synthetic frame, hysteresis re-arm), 106 bridge (2 new: validate eventRingSize accept/reject), 111 server (4 new: ?maxQueued accept/absent/non-decimal/out-of-range + EXPECTED_STAGE1_FEATURES update), 14 SDK daemonEvents (2 new: schema validation + non-terminal reducer behavior). 321 focused tests total, all green. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR QwenLM#4237 review feedback (eventBus polish) Address the actionable items from the Qwen Code review bot's pass on PR QwenLM#4237: - Pre-compute `warnThreshold` / `warnResetThreshold` per `InternalSub` at `subscribe()` time so `publish()`'s per-event hot path is one integer compare per subscriber instead of a multiply + compare. The `!warned` short-circuit still collapses the steady state to a single boolean read; this just shaves a multiply when the threshold check actually fires. - Document the back-of-queue ordering choice for the synthetic `slow_client_warning` frame in `EventBus.publish()`: front-push was considered but mid-stream front-insertion would mis-count `forcedInBuf` in `BoundedAsyncQueue.next()`, and `forcePush` already short-circuits via `resolvers.shift()` for the active-consumer case — the back-of-queue path only matters for stalled consumers, who can't drain regardless of warning position. - Reuse the existing `collect()` helper in the "default ring size 8000" test for consistency with the rest of the file; the new test also tightens the assertion by checking that the first retained event id is 2 (id=1 dropped by the ring) and the last is 8001. - Soften the "~500 B per session" magic number in `BridgeOptions.eventRingSize`'s JSDoc to a qualitative description (each retained `BridgeEvent` is a reference plus its serialized payload; ceiling scales as `ringSize × average-event-size`). Rejected: - Bot's claim that the error JSON contains `\`...\`` escape sequences — bot misread the JS template-literal source as the wire output; `JSON.stringify` does not escape backticks, and the existing `cwd` error messages use the same style. - Bot's "use `Record<string, never>` instead of `[key: string]: unknown`" suggestion on `DaemonSlowClientWarningData` — every other event-data type in `sdk-typescript/src/daemon/events.ts` carries the same index signature for additive-field compatibility. - Bot's "features list breaks alphabetical order" — the capability list is grouped by protocol lifecycle (health → capabilities → session lifecycle → events → permissions), not alphabetical. Tests: 139 focused tests across eventBus + httpAcpBridge + SDK daemon events — all passing. Behavior unchanged; this is hot-path micro-opt + comment polish only. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): correct queue tagging + plumb maxQueued through SDK Address both P2 findings from the Codex review pass on PR QwenLM#4237. **Bug 1: `BoundedAsyncQueue.forcedInBuf` position-invariant break** The previous `forcedInBuf` counter only tracked LIVE-vs-FORCED correctly when all forced entries lived at the FRONT of the buffer (subscribe-time `Last-Event-ID` replay). The new mid-stream `slow_client_warning` path force-pushes to the BACK of the queue while the queue is still open, which the existing accounting was not designed for: - publish 6 events at maxQueued=8 → 75% threshold trips → force-push warning at the back → buf=[1..6, warning], forcedInBuf=1. - consumer shifts `1` → forcedInBuf decremented to 0 (incorrect: `1` was a live frame, not the forced one). - consumer drains 2..6 + warning → buf=[], forcedInBuf=0, true live count = 0, but `size` getter and `push()` cap check then use `buf.length - forcedInBuf` which drifts over subsequent refills, causing premature warn / eviction before the cap is actually reached. Replace the position-dependent counter with a per-entry `{value, forced}` tag. `liveCount` is incremented in `push()` / decremented in `next()` only when the shifted entry was non-forced — position becomes irrelevant. `size` getter returns `liveCount` directly. The class doc comment is rewritten to call out that the new tag is the position-independent replacement for the old "forced frames must stay at the front" invariant. Regression test in `eventBus.test.ts` reproduces the codex trace (warn at 75%, drain past warning, refill to cap) and asserts no premature eviction. **Bug 2: SDK does not expose `?maxQueued`** `docs/users/qwen-serve.md` and `docs/developers/qwen-serve-protocol.md` both document `?maxQueued=N` as something SDK clients can request, but `SubscribeOptions` on `DaemonClient` only declared `lastEventId` + `signal`, and `subscribeEvents()` always fetched `/events` without a query string. Typed-SDK consumers had no way to opt in without hand-crafting URLs. - Add `SubscribeOptions.maxQueued?: number` with JSDoc noting the daemon range `[16, 2048]` and the pre-flight requirement on `caps.features.slow_client_warning`. - `DaemonClient.subscribeEvents` builds the URL with an optional `?maxQueued=<n>` segment. No client-side range validation — the daemon's `parseMaxQueuedQuery` is the source of truth and returns structured `400 invalid_max_queued`; duplicating the bounds in two layers would diverge on the next tweak. - `DaemonSessionSubscribeOptions extends SubscribeOptions` so the new field flows through `DaemonSessionClient` automatically. Three new SDK tests: - subscribeEvents appends `?maxQueued=N` when set - omits the query string when absent (existing behavior preserved) - propagates a `400 invalid_max_queued` unchanged Tests: 214 focused tests across eventBus / bridge / SDK DaemonClient / DaemonSessionClient / daemonEvents, plus 111 in the server suite. All green; the new eventBus regression case proves the position-invariant fix. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR QwenLM#4237 copilot review feedback Address 6 of 8 copilot-reviewer findings on PR QwenLM#4237; the other 2 (#1 forcedInBuf live-size corruption, #5 SDK lacks maxQueued) were already fixed in bae42c8 — replied on the threads with the commit hash. - **[2] server.ts:1068** — `?maxQueued=` (present-but-empty) now fails closed with `400 invalid_max_queued` instead of silently falling back to the default queue cap. The API documents fail-closed for any malformed value before opening SSE, so an empty string is unambiguously malformed. New server.test.ts case locks this in. - **[3] commands/serve.ts:93** — CLI help text for `--event-ring-size` no longer mis-shapes `Last-Event-ID` as a query parameter. It is an HTTP header, and the daemon's SSE route does not parse a `?Last-Event-ID=` query. - **[4] docs/developers/qwen-serve-protocol.md:351** — clarify that `?maxQueued=N` controls the LIVE-event backlog cap. Replay frames are force-pushed and exempt from the cap; what consumes it is live events that arrive while the subscriber is still draining a cold-reconnect replay. Bumping for cold reconnects is still the right answer, but for the live tail, not for the replay frames themselves. - **[6] eventBus.ts:214** — stale `ringSize=4000` performance comment updated to the new `ringSize=8000` default with a note about the O(n) `shift()` cost scaling. - **[7] sdk-typescript events.ts:492** — `isSlowClientWarningData` now uses the existing `isFiniteNumber` helper instead of bare `typeof === 'number'`. Mirrors the sibling predicates and rejects `NaN` / `Infinity` payloads as schema garbage. New daemonEvents.test.ts assertions cover both. - **[8] server.ts:127** — `createServeApp`'s default-bridge construction now also forwards `opts.eventRingSize` to `createHttpAcpBridge`, symmetric with the `runQwenServe.ts` path. Direct embeds / tests that called `createServeApp` without supplying their own bridge but did pass `ServeOptions.eventRingSize` were silently getting the default 8000 ring. Tests: 326 focused tests across eventBus / bridge / SDK DaemonClient / DaemonSessionClient / daemonEvents / server. All green; the new server.test.ts case + the extended daemonEvents.test.ts assertions cover the tightened guards. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR QwenLM#4237 wenshao round-2 review feedback Six adopted findings from @wenshao's second review pass on PR QwenLM#4237. The seventh ([10] forcedInBuf 3rd case invariant) was already fixed in bae42c8 — replied on that thread. - **[9] + [14] server.ts** — Sanitize attacker-controlled values before stderr interpolation in both `parseMaxQueuedQuery` and `parseLastEventId`. New `safeLogValue()` helper uses `JSON.stringify` to escape control characters (`\n`/`\r`/…) so a URL-encoded newline in `?maxQueued=%0a` can't inject extra log lines into journald/Loki/Splunk pipelines. Matches the `workspace_mismatch` sanitization style in `sendBridgeError`. Fixed in both helpers (the sibling pre-existing `parseLastEventId` had the same shape) so the file stays consistent. - **[11] httpAcpBridge.ts** — `!Number.isFinite(eventRingSize)` was redundant: `Number.isInteger(NaN)` and `Number.isInteger(Infinity)` both return `false`, so the sibling `!Number.isInteger` already catches both. Drop the dead guard. - **[12] httpAcpBridge.ts** — Add soft upper bound `MAX_EVENT_RING_SIZE = 1_000_000` on `eventRingSize` to catch operator typos (`--event-ring-size 80000000` vs `8000000`). At ~500 B per `BridgeEvent` an 1M-frame ring already pins ~500 MB per session — well past any realistic workload. Not a security boundary (operator-controlled flag), pure typo defense. Existing bridge construction test extended with an `80_000_000` case. - **[13] commands/serve.ts** — CLI `--event-ring-size` flag now sources its default from `DEFAULT_RING_SIZE` (imported from `serve/eventBus.js`) instead of the hardcoded literal `8000`. Without this, a future bump of the bus default would silently not take effect for daemons launched through the CLI because the flag always overrides — single source of truth fixes that. - **[15] eventBus.ts** — Drop unreachable `event.id ?? this.lastEventId` fallback in the `slow_client_warning` frame. `event` is locally constructed at the top of `publish()` with `id: this.nextId++` and is guaranteed defined. Use `event.id as number` directly + an inline note about the invariant. Tests: 197 (eventBus 20 / bridge 107 / SDK DaemonClient 57 / SDK daemonEvents 14) + 112 server. All green; the new upper-bound bridge case + the existing log assertions pin the changed behaviors. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* feat(serve): mutation gating helper and --require-auth Implements issue QwenLM#4175 Wave 4 PR 15. Adds the centralized state-changing-route gate that Wave 4 follow-ups (memory CRUD, file edit, MCP restart, device-flow auth) will reuse, plus the `--require-auth` deployment knob that hardens the loopback developer default for shared dev hosts / CI runners. - `createMutationGate({ tokenConfigured, requireAuth })` factory in serve/auth.ts — per-route middleware with a 4-cell behavior matrix: pass-through under `requireAuth` or any token configured; `401 token_required` for `strict: true` routes on no-token loopback defaults; baseline pass-through otherwise. - Existing Wave 1-2 mutation routes (POST /session, /session/:id/{load, resume,prompt,cancel,model}, /permission/:requestId) opt into the default non-strict factory call as the centralization marker. Wave 4 routes will pass `{ strict: true }` to require a token even on loopback. - `--require-auth` CLI flag + `ServeOptions.requireAuth`. Boot refuses without a token; closes the `/health` exemption when on so loopback `/health` also requires bearer auth; stderr breadcrumb so the hardened mode is visible in journald/docker logs. - Conditional `require_auth` capability tag advertised only when the flag is on. New `CONDITIONAL_SERVE_FEATURES` registry primitive so future per-deployment toggles follow the same shape. - 5 new unit tests in auth.test.ts covering the gate matrix; 5 added in server.test.ts for capability advertisement, conditional tag, /health 401 under --require-auth, and runQwenServe boot refusal + happy path. 245/245 serve tests pass; typecheck + eslint clean. Refs: QwenLM#4175 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR QwenLM#4236 review feedback Three small follow-ups from the automated reviewers on PR QwenLM#4236: 1. **Drop misleading `--require-auth` from `token_required` error message** (Copilot inline auth.ts:262). The strict-mode 401 listed three remediations but `--require-auth` is paired-required with a token at boot — naming it standalone would loop the operator into a different boot error. Keep the two valid standalone fixes (env var, --token); add inline note explaining the omission. `auth.test.ts` regex updated to `not.toMatch(/--require-auth/)` to anchor the new wording. 2. **Mention `/health` gating in `--require-auth` CLI description** (auto-reviewer Medium #2). Operators flipping the flag without reading the protocol doc would get paged when k8s/Compose probes start 401-ing. One sentence in the yargs description prevents that. 3. **Drift insurance comment between registry and `CONDITIONAL_SERVE_FEATURES`** (auto-reviewer Low #3). Document the four-step procedure for adding a new conditional tag so a future contributor doesn't update only the registry and silently advertise the tag unconditionally. Notes the Map<predicate> refactor as the right move when a second tag lands. Deferred (not in this fix-up): - Module-level PASSTHROUGH singleton (High #1) — micro-optimization, unmeasurable. - Map<feature, predicate> for conditional features (High #2) — premature abstraction with one tag. - Per-route `// non-strict marker` comments (Medium #1) — noise. - `@see` cross-ref in types.ts (Low #2) — sugar. - JSDoc bullet-list vs table (Low #1) — current format is fine. Refs: QwenLM#4175 QwenLM#4236 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR QwenLM#4236 round-2 review feedback Five small follow-ups from @wenshao + DeepSeek (via Qwen Code /review) on PR QwenLM#4236: 1. **Map<predicate> refactor for `CONDITIONAL_SERVE_FEATURES`** (review threads #3254467192 + #3254485912). Two reviewers asked for the same shape on the grounds that the `Set` + per-feature `if`-branch needed FOUR coordinated changes per new conditional tag and silently fail-CLOSED when the branch was missed. The Map collapses the predicate-decision and the set-membership into one entry per feature — adding a new conditional tag is now two coordinated changes (registry + Map entry) and a missing predicate is a TypeScript error rather than a silent omission. JSDoc updated. 2. **Drift-insurance test that iterates `CONDITIONAL_SERVE_FEATURES`** (review thread #3254467192 option 1, layered on top of #1). `server.test.ts` now walks every Map entry and asserts the predicate accepts/rejects as expected; future entries that don't add an assertion branch fail the test loudly so a missing predicate cannot ship silently. Adoption-of-record for the Map shape rather than relying on a hand-maintained invariant. 3. **Cache `strictDenier` for allocation symmetry** (review thread #3254467193). Wave 4 PRs will mount strict mode on multiple routes; without the cache each `mutate({strict:true})` call would allocate a fresh 401 closure. Now both the passthrough and the strict denier are pre-built singletons. Identity assertion in `auth.test.ts` anchors the cache so a future change that loses it surfaces in CI. 4. **Doc cosmetic — extra blank line in qwen-serve.md** (review thread #3254467198). Single blank line between the `>` quoted example and the following non-quoted bash block now. 5. **Doc correctness — `require_auth` is post-auth confirmation** (review thread #3254485910 from DeepSeek). When `--require-auth` is on, the global `bearerAuth` middleware gates every route including `/capabilities`, so an unauthenticated client cannot pre-flight `caps.features` to discover that auth is required — the discovery surface is the 401 response body itself. Both `qwen-serve.md` and `qwen-serve-protocol.md` rewritten to describe the tag as a post-authentication confirmation, matching the auth.ts JSDoc which already stated this correctly. Trade-offs documented (no code change): - **Body-parser ordering** (review thread #3254485915 from DeepSeek) noted as a comment block in `auth.ts`. Strict-mode 401 fires AFTER `express.json()` because the gate is per-route middleware. On loopback no-token defaults a strict route therefore parses the request body before refusing it — bounded by `express.json({limit: '10mb'})` × `--max-connections` (256 default). Strict routes Wave 4 actually adds carry small bodies in legitimate use, so this isn't a production hot path. Future routes accepting large bodies should lift the gate to app-level (maintain a strict-path Set in `createServeApp`); flagged as a Wave 4 follow-up rather than re-architecting the helper. - **`bearerAuth` body-shape inconsistency** (review thread #3254467197 from @wenshao) flagged as a Wave 4 cross-PR follow-up. `bearerAuth` returns `{error: 'Unauthorized'}` while the strict gate returns `{code: 'token_required', error: '...'}`; SDK clients have to branch on both shapes. Standardizing `bearerAuth` to also carry a `code` field is orthogonal to this PR's scope. Validation: 260/260 cli serve tests pass (was 258 — added the drift insurance test + strict denier identity test); typecheck + eslint clean. Refs: QwenLM#4175 QwenLM#4236 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…#4247) * feat(serve): MCP client guardrails (QwenLM#4175 Wave 3 PR 14) Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`. Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'|'error'` + `refusedCount` so operator visibility isn't blocked. * fixup(serve): address PR 14 review (QwenLM#4247) findings 1-7 Addresses Codex + Copilot review feedback on QwenLM#4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision). 1. **Single-server rediscovery bypass** — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode. 2. **Empty `budgets[]` when mode=off** — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell. 3. **runQwenServe validation + env leakage** — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_*` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children. 4. **Disabled-vs-refused precedence + stale refusal log** — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag. 5. **Incremental remove-before-reserve ordering** — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot. 6. **`scope` forward-compat type widening** — `'workspace' | (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract. 7. **Test comment alignment** — fix "With budget=1" comment to match `clientBudget: 2` code. Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve). * docs(serve): clarify v1 snapshot-based budget warning detection (QwenLM#4247) Address github-actions review-summary finding (I) on PR QwenLM#4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts. * fixup(serve): address PR 14 review round 2 (QwenLM#4247 wenshao) Addresses @wenshao review on PR QwenLM#4247. Three critical safety fixes + four suggestion-level improvements. Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime): - C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot. - C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched). - (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup 597f011.) Suggestion: - S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement. - S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`. - S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected). - S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs. (S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.) + 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean. * fixup(serve): address PR 14 review round 3 (QwenLM#4247 wenshao second pass) Addresses @wenshao's second review pass on PR QwenLM#4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications. Code: - R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic. - R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row. - R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file. - R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 ba3e3fe S4 — reply-only on the new thread). Docs: - R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap. - R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit. - R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window. + 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 4 (QwenLM#4247 wenshao second + third pass) Addresses @wenshao's second + third review passes on PR QwenLM#4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads. Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs): - Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect. - Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside. - Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' | 'workspace' | (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel +⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint). Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929: - Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity. - Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode. - Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect). - Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation). + 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 5 (QwenLM#4247 wenshao fourth pass) Addresses @wenshao's fourth review pass on PR QwenLM#4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads). Critical fixes: - R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test. - R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test. Reviewer findings deferred / already done (replied + resolved): - R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup ba3e3fe added the try/catch with the weReservedSlot cleanup pattern. - R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup ba3e3fe renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic. - R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): proposed fix is semantically a no-op for integer liveCount — `liveCount >= 0.75` and `liveCount >= Math.ceil(0.75) === 1` give identical results when liveCount is an integer. The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; design tradeoff, not implementation bug. 46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean. * fixup(serve): address PR 14 review round 6 (QwenLM#4247 wenshao fifth pass) Addresses @wenshao's fifth review pass on PR QwenLM#4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak). Critical fixes: - R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside. - R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface: - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`. - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string | undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state. - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub). - `runQwenServe` closes over per-handle overrides; never touches `process.env`. + 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 7 (QwenLM#4247 wenshao sixth pass) Addresses @wenshao's sixth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor. Critical: - R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log. Real bugs: - R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method. - R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup. - R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now. - R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`. Polish: - R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log. - R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals. + 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 8 (QwenLM#4247 wenshao seventh pass) Addresses @wenshao's seventh review pass on PR QwenLM#4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred. Critical: - R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports. Soundness: - R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case). - R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth. - R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught. Deferred: - R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup. + 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 9 (QwenLM#4247 wenshao eighth pass) Addresses @wenshao's eighth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination). Critical / real bugs: - R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket. - R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior. - R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern. Operator-visibility: - R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `_BUDGET=N` would silently boot with enforcement off and `mcp_guardrails` capability advertised — operator only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern. Doc fixes: - R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one. - R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both. Not adopted: - R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding. - R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit 7228813) had already closed. Reviewer was on the pre-R8 commit view. + 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean. * fixup(serve): address PR 14 review round 10 (QwenLM#4247 wenshao ninth pass) Two non-blocking 🟢 nits — both adopted for symmetry / explicitness. - R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths. - R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since QwenLM#3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change. 57/57 core tests pass. Typecheck + lint clean.
…M#4255) * feat(serve): auth device-flow route Implements issue #4175 Wave 4 PR 21. Brokers OAuth 2.0 Device Authorization Grant (RFC 8628) through the `qwen serve` daemon so a remote SDK client can trigger a Qwen-account login whose tokens land on the **daemon** filesystem, not on the client. The daemon polls the IdP itself; the client's only job is to display the verification URL + user code. Runtime locality (#4175 §11): the daemon NEVER spawns a browser or calls `open(url)` — even when running locally. Static-source grep test fails the build on `node:child_process` / `open` / `xdg-open` / `shell.openExternal` / `execa` / `shelljs` / `process.spawn` and their dynamic-import / require variants. - `POST /workspace/auth/device-flow` — strict mutation gate; returns 201 fresh / 200 idempotent take-over with `attached: true`. Per per-`providerId` singleton: a second POST while pending takes over rather than allocating a new `device_code`. - `GET /workspace/auth/device-flow/:id` — public state read. Pending entries echo `userCode/verificationUri/expiresAt/intervalMs`; terminal entries (5-min grace) drop them and surface `status/errorKind/hint`. - `DELETE /workspace/auth/device-flow/:id` — strict; idempotent (terminal → 204 no-op; unknown → 404). - `GET /workspace/auth/status` — pending flows + supported providers snapshot. v1 stub for `providers: []` (populated in fold-in 1). `DeviceFlowRegistry` (`packages/cli/src/serve/auth/deviceFlow.ts`) is the in-memory state holder: - per-`providerId` singleton with idempotent take-over - workspace-wide cap of 4 active flows (abuse defense) - 5-min terminal grace so SDK reconnects can still observe results - TTL sweeper evicts grace-expired entries every 30s - in-flight `Promise` map coalesces concurrent `start()` calls so two parallel POSTs don't double-allocate IdP `device_code` - `transitionTerminal` returns `boolean` so caller-side emit/audit guard prevents sweeper × poll-tick double-fire - `dispose()` wired into `runQwenServe.close()`'s shutdown drain; cancels `provider.poll()` mid-flight via `cancelController`, records `lost_success` audit when an IdP-minted token is dropped by transition `DeviceFlowProvider` interface accepts `start({signal})` + `poll(state, {signal})`. `QwenOAuthDeviceFlowProvider` wraps the existing `QwenOAuth2Client.requestDeviceAuthorization` / `pollDeviceToken` primitives directly (NOT `authWithQwenDeviceFlow`, which calls `open(url)`). PKCE is provider-required by Qwen but optional in the interface for future non-PKCE providers. `success.persist()` writes to disk FIRST, then updates the in-process client — a failed disk write no longer leaves the daemon with a zombie in-memory token. Maps RFC 8628 errors via an anchored regex (`^Device token poll failed: (expired_token|access_denied|invalid_grant)`) so an `error_description` containing one of those literals can't mis-classify an unrelated upstream error. `BrandedSecret<T extends string>` holds the `device_code` and PKCE verifier. Earlier draft used `new String()` wrapper which leaked through `+` / template literals (`Symbol.toPrimitive` → `valueOf` returned the primitive). Final shape: frozen plain object + `WeakMap` indirection + 4-way redaction (`toString` / `toJSON` / `Symbol.toPrimitive` / numeric coercion → `'[redacted]'` or `NaN`) + `unique symbol` brand. 6 leak-path tests: `JSON.stringify` / `String()` / concat / template / `+x` / reveal-roundtrip. 5 new daemon events (workspace-scoped, fanned out to every active session bus via `bridge.broadcastWorkspaceEvent`): - `auth_device_flow_started` — `{deviceFlowId, providerId, expiresAt}` (no userCode/verificationUri — see PR 21 design §3) - `auth_device_flow_throttled` — `{deviceFlowId, intervalMs}`, emitted only on upstream `slow_down` interval bumps - `auth_device_flow_authorized` — `{deviceFlowId, providerId, expiresAt?, accountAlias?}`; `accountAlias` is best-effort non-PII (never email/phone) - `auth_device_flow_failed` — `{deviceFlowId, errorKind, hint?}` with `errorKind ∈ {expired_token, access_denied, invalid_grant, upstream_error, persist_failed}` - `auth_device_flow_cancelled` — `{deviceFlowId}` (DELETE on pending) Workspace-scoped reducer `reduceDaemonAuthEvent` produces `DaemonAuthState { flows: Partial<Record<ProviderId, ...>> }` — parallel to `reduceDaemonSessionEvent`. Session reducer no-ops on auth events (workspace-scoped state belongs in its own reducer). `bridge.broadcastWorkspaceEvent` is intentionally distinct from PR 16's `publishWorkspaceEvent` to avoid merge conflict; collapses to the shared helper as a fold-in once #4249 lands (~25 LoC). `@qwen-code/sdk` (`packages/sdk-typescript/`): - 4 new `DaemonClient` methods: `startDeviceFlow`, `getDeviceFlow`, `cancelDeviceFlow`, `getAuthStatus` — typed against the wire shapes, errors mapped through the existing `DaemonHttpError`. - High-level `client.auth` getter (lazy `DaemonAuthFlow` singleton) exposes a `start(...).awaitCompletion()` shape mirroring `gh auth login`'s UX: print code first, let the SDK consumer decide where to open the browser. `awaitCompletion` polls GET on the daemon-supplied `intervalMs`, honors `slow_down` bumps, and fall-back-recovers from 404 (entry evicted post-grace). POST + DELETE flow through PR 15's `mutate({strict: true})` — 401 `token_required` on token-less loopback defaults. GET routes use only the global `bearerAuth`. Every state transition (`started/authorized/failed/cancelled/expired/lost_success`) records a structured stderr breadcrumb (`[serve] auth.device-flow: provider=... deviceFlowId=abc12... clientId=... status=...`) since `mutate()` doesn't carry an audit hook — events alone aren't enough since SDK can silently drop them; stderr → journald/docker logs is the unfalsifiable record. `auth_device_flow` advertised unconditionally on `/capabilities.features`. Supported providers list lives on `/workspace/auth/status` to keep the registry descriptor uniform. - `packages/core/src/qwen/qwenOAuth2.ts`: - exports `cacheQwenCredentials` (was a private function; needed by the daemon's device-flow registry) - `cacheQwenCredentials` now calls `SharedTokenManager.clearCache()` after writing, folding what was previously a paired call site at L820+L829. Idempotent change. - file mode `0o600` on `oauth_creds.json` (was default 0o666 + umask). Mirrors opencode's `auth/index.ts`. - `packages/cli/src/serve/runQwenServe.ts`: device-flow registry `dispose()` wired into the shutdown drain (BEFORE `bridge.shutdown()`). - `auth/deviceFlow.test.ts` — 21 tests: BrandedSecret leak paths, state machine (slow_down / success / error), terminal grace, concurrent-start coalescing, dispose, cancel idempotency, static- source grep against browser-spawn primitives. - `server.test.ts` — 10 device-flow integration tests: POST 201/200 take-over, strict 401, 400 `unsupported_provider`, GET / DELETE / `/workspace/auth/status`, 502 `upstream_error` mapping, sweeper-driven auto-expiry with controlled clock, capability advertisement. - `daemonEvents.test.ts` — 5 SDK reducer tests: type guards, per- provider state projection, `failed` always → `status: 'error'` (errorKind carries the kind, including new `persist_failed`), session reducer no-ops on auth events. 369/369 serve + SDK tests pass; typecheck + `eslint --max-warnings 0` clean across 14 PR 21 files. - [x] Independently mergeable (depends only on merged PR 4 / PR 7 / PR 12 / PR 15) - [x] Backward compatible (4 new routes + 1 capability tag + 5 typed events + 4 SDK helpers; existing routes/events untouched) - [x] Default off (capability advertised but no client is forced to use it; CLI `qwen` OAuth flow unchanged) - [x] `qwen serve` Stage 1 routes / SDK behavior preserved - [x] Gradual migration (v1 only `qwen-oauth`; future providers register through the `DeviceFlowProvider` interface) - [x] Reversible (revert removes 4 routes + 1 tag + 5 events with no schema migration) - [x] Tests-first (28 new tests across 3 layers) - Inline `bridge.broadcastWorkspaceEvent` → fold-in to PR 16 (#4249) `publishWorkspaceEvent` once that lands - `/workspace/auth/status` vs PR 12 `/workspace/providers` boundary — separate route in v1; merge alternative discussed - Wave 4 PRs 17/19/20 should adopt the same mutate-strict + workspace event-fan-out pattern 5 items from pre-PR specialist passes parked for a focused follow-up: `DeviceFlowEntry` discriminated union, single-source SDK status / ProviderId unions, `awaitCompletion` memoization, broadcast-100%-fail stderr elevation, SDK 404 → `not_found_or_evicted` errorKind. Refs: #4175 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 round-1 review feedback Eleven items from copilot-pull-request-reviewer's round-1 pass on #4255 — 4 inline threads + 7 from the PR-level review summary. ## Adopted (11 items, code/doc changes) - **`lastSeenAt` → `lastSeenEventId`** (`events.ts`, `DaemonDeviceFlowReducerState`). The field was set from `rawEvent.id` (SSE event id) but documented as "epoch ms" — a real semantic mismatch that would mislead consumers into time-based logic against a monotonic counter. Rename + tighten the JSDoc to describe it as an event-id counter; reducer cases updated. - **`DEVICE_FLOW_EXPIRY_GRACE_MS = 30_000` extracted** in `DaemonAuthFlow.ts` (was a magic number on `start.expiresAt + 30_000`). `AwaitCompletionOptions.timeoutMs` doc now describes the actual grace-past-expiry behavior + the rationale (clock skew + daemon sweeper interval + network latency) instead of the wrong "defaults to expiresAt - Date.now()" claim. - **Explicit `chmod 0o600`** in `cacheQwenCredentials` after every write. `fs.writeFile`'s `mode` only applies on file creation; a pre-existing `oauth_creds.json` written under a broader umask kept its old permissions across upgrades. The chmod now tightens it on every write; chmod failure (Windows / hardened FS) surfaces via `debugLogger.warn` instead of silently dropping the invariant. - **`SharedTokenManager.clearCache()` failure now logs** `debugLogger.warn` (was a silent `try { } catch { }`). In production a swallowed clearCache means in-process callers serve stale credentials until the SharedTokenManager mtime watcher catches up — a recoverable degradation worth a log line. - **Protocol doc** lists `persist_failed` in the `auth_device_flow_failed.errorKind` union (was added to the type but missed in the doc). - **`pollDeviceToken({signal})`** plumbed through `IQwenOAuth2Client` interface + `QwenOAuth2Client` impl + the Qwen device-flow provider. Cancel / dispose during a slow IdP response now aborts the in-flight HTTP socket immediately instead of waiting for the upstream timeout. Two new registry tests assert `cancel()` / `dispose()` propagate abort to the signal observed by `provider.poll`. - **`revealSecret` error message** clarified: was "secret has been GC-evicted" (impossible — WeakMap doesn't evict reachable keys). Now points at the actual reachable failure modes (forged shape / serialize+reparse losing the WeakMap binding). - **`transitionTerminal` JSDoc** clarifies that the PRIMARY guard against late timer secret leaks is the `entry.status !== 'pending'` check at the top of `runPollTick`; secret-clearing here is defense-in-depth. - **`DeviceFlowErrorKind` JSDoc'd per variant** so consumers can tell when each fires (RFC 8628 distinctions + `persist_failed` vs `upstream_error` boundary). - **Stale "PR 16 / PR 21 §3" temporal references** in `DaemonAuthFlow.ts:124` rephrased to be timeless ("workspace-scoped events fan out through whatever session buses happen to be live" — no PR number references that rot when those PRs merge). ## Not adopted (4 items, replied to in-thread) - **`authWithQwenDeviceFlow` browser-launch separation** — correct architectural advice but out of #4255 scope (would refactor a CLI auth UX module that PR 21 only touched additively). Tracked as a Wave 5 follow-up. - **Copyright header year range** — repo-wide convention "2025"; not introduced by this PR. - **Spread `...(x ? {x} : {})` → `x: x ?? undefined`** — the two are not semantically equivalent. The current form omits the key entirely on falsy `x`; the suggested form always includes the key. Tests assert object shape and would break under the change. - **Eager `client.auth` getter** — public API boundary. Lazy construction matches `DaemonSessionClient` precedent + saves the module load for SDK consumers that never touch auth. Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-1 review feedback 15 items from @wenshao's review batches on #4255. Catches a handful of real bugs that the earlier round (commit 3d9f082f5) didn't surface. ## Critical fixes - **C1 — `pollUntilTerminal` providerId pass-through** (`DaemonAuthFlow.ts:185`). The synthetic 404 fallback hardcoded `providerId: 'qwen-oauth'`; the parent `awaitCompletion` already receives the real providerId via `start.providerId` but `pollUntilTerminal`'s parameter type stripped it. Add the field to the param type, propagate. - **C2 — open `errorKind` allowlist** (`events.ts`). The closed 5-value union in the type guard silently dropped any `failed` event whose errorKind the daemon added without mirroring SDK-side (e.g. a future `rate_limited`). The flow's reducer state would never transition to terminal, leaving SDK consumers stuck on `pending` forever. Open the union with `(string & {})` and accept any non-empty string in the runtime guard. Updated test asserts forward-compat behavior + still rejects the truly-malformed empty-string case. - **C3 — `persist()` timeout + signal** (`deviceFlow.ts`). A wedged disk I/O (NFS stall, encrypted-volume contention) without bounds would pin the entry in `pending` until the upstream `expires_in` elapsed (potentially minutes). The registry now passes its `cancelController.signal` AND arms a hard `DEVICE_FLOW_PERSIST_TIMEOUT_MS = 30_000` timer; persist failure surfaces as `persist_failed` immediately. The `DeviceFlowPollResult` `success` variant signature changed to `persist({signal})`. - **C4 — cancel × success race rollback** (`deviceFlow.ts` + Qwen provider). Today, if `cancel()` transitions while `persist()` is in flight, the credentials get written but the flow's status is `cancelled`. User sees cancelled, daemon disk has a valid token. `DeviceFlowPollResult.success` gains an optional `unpersist()` callback the registry calls when `transitionTerminal(authorized)` fails — the Qwen provider wires it to `clearQwenCredentials()`. Rollback failure is audited but not propagated (re-running auth would overwrite anyway). - **C5 — don't `unref()` the `awaitCompletion` sleep timer** (`DaemonAuthFlow.ts`). On a standalone Node CLI/script doing just `client.auth.start().awaitCompletion()`, the unref'd between-poll timer was the only event-loop handle, so Node could exit before the user finished authorization. The poll wait is foreground work the caller explicitly awaits — keep it ref'd. ## Information-leak fixes - **S1 — sanitize `persist_failed` hint**. `err.message` from `cacheQwenCredentials` embeds the full `~/.qwen/oauth_creds.json` path. Broadcast via SSE, that path leaks the daemon's home layout to every connected session subscriber. Replace user-facing hint with `"credentials could not be written to the daemon filesystem — check disk space and permissions"`; full err goes to stderr audit only. - **S2 — sanitize upstream `pollDeviceToken` hint**. The class embedded the entire raw IdP response body (which can be an HTML error page from a reverse proxy) into the thrown message. Same broadcast leak path. Replace upstream-error hint with `"unexpected response from identity provider"`; RFC 8628 errors use `"Qwen IdP returned ${kind}"`. ## Cleanup / forward-compat - **D1 — drop duplicate `clearCache()`** at `qwenOAuth2.ts:840`. The paired call became redundant once `cacheQwenCredentials` folded the clearCache in (PR #4255 fold-in 1). The fold-in 1 message said this would be done; the duplicate slipped through. - **S3 — drop unused `DeviceFlowNotFoundError`** (`deviceFlow.ts`). Exported but never imported; route handlers do inline 404 JSON. - **S4 — single-source SDK status / errorKind unions** (`types.ts`). `DaemonAuthDeviceFlowSdkStatus` / `DaemonAuthDeviceFlowSdkErrorKind` were parallel literal copies of the canonical events.ts definitions — drift waiting to happen. Now imported + aliased as type-only re-exports. - **S5 — broadcast 100% fail elevates to stderr** (`httpAcpBridge.ts`). Per-session bus failures stay debug-only, but a broadcast where EVERY session bus refused is operationally interesting (clients won't see the event). Track success / fail counts; `writeStderrLine` when `successCount === 0`. - **S6 — `this.disposed` check after `await provider.start()`** (`deviceFlow.ts`). `dispose()` mid-start would orphan the freshly- inserted entry (`schedulePoll` guards on `disposed` so no poll fires; the entry never transitions). Throw post-await if disposed. - **W1 — thread `signal` into `requestDeviceAuthorization`** (`qwenOAuth2.ts` + Qwen provider). `start()` had the same cancellation gap that `pollDeviceToken` had — a slow device-authorization request couldn't be aborted during shutdown. Now plumbed end-to-end. - **W2 — split `invalid_request` from `unsupported_provider`** (`server.ts`). Conflating them surfaced misleading remediation hints to SDK consumers branching on `code` ("this provider isn't supported here" when the real cause was a serializer dropping the field). Bad-shape now returns `code: 'invalid_request'`; unknown-but-well-formed stays `unsupported_provider`. - **W3 — drop never-populated `accountAlias`** (Qwen provider). The field was wired through types / events / reducer / audit but the Qwen IdP's token response doesn't carry one (no `name` / `email` / `sub`). Returning only `{expiresAt}` makes the field type-honestly absent rather than always-undefined. Future provider with an alias-bearing response can populate it. - **W4 — `DaemonAuthFlow` JSDoc accuracy**. Doc claimed "first attempts to consume an SSE event stream … falls back to GET-based polling"; actual is GET-only with SSE as a real-time hint for clients already subscribed to a session stream. - **W5 — clearer unit arithmetic** in interval normalization. The `(_INTERVAL_MS / 1000) * 1000` cancelation hid the s↔ms boundary; expanded form makes both branches unit-explicit. ## Test changes - `daemonEvents.test.ts` updated to match the now-OPEN errorKind union (forward-compat assertion + empty-string still rejected). - `deviceFlow.test.ts` `FakeProvider.poll` aligned with the new `persist({signal})` signature + optional `unpersist`. ## Validation - `npm run typecheck --workspace packages/cli --workspace packages/sdk-typescript --workspace packages/core` — clean - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 368/368 - `npx eslint --max-warnings 0` over the 11 PR 21 surface files — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-2 review feedback 10 new threads from @wenshao's second deep-review pass on #4255. Verified status: 5 real issues, 1 improvement, 3 stale (already fixed; comments lagged), 1 false alarm (typecheck demonstrably clean). ## Critical fixes - **fold-in 2 C4 REVERSED**: when `provider.poll()` returns success AND `cancel()` / `dispose()` transitioned the entry mid-`persist()`, the registry now FORCES the entry to `authorized` and keeps the on-disk credentials. The earlier rollback (`unpersist()`) wasted the user's IdP approval because the RFC 8628 `device_code` is single-use — re-running the flow would force them through the whole browser-prompt + paste-code dance again for a click whose intent was likely "stop the wait" rather than "undo my already- completed approval". Aligns with gh CLI / Auth0 SDK / git- credential-manager. Audit captures the race via `hint: 'lost_success_kept ...'`. `DeviceFlowPollResult.success.unpersist` field + Qwen provider's `clearQwenCredentials` rollback removed. - **#1 GET /workspace/auth/device-flow/:id strict gate**: this GET surfaces `userCode` / `verificationUri` for pending entries, which on the loopback no-token default were readable by any local process. POST + DELETE were already strict; aligning GET closes the information-disclosure asymmetry. `/workspace/auth/status` stays bearer-only (its `pendingDeviceFlows` entries intentionally omit `userCode`). - **#2 `inFlightStarts` hard timeout**: a hung `provider.start()` (network partition, unresponsive IdP) used to leave the per- `providerId` slot in `inFlightStarts` occupied forever, blocking every subsequent POST until daemon restart. New `DEVICE_FLOW_START_TIMEOUT_MS = 30_000` arms a timer that `cancelController.abort()`s the start; the rejected promise unwinds through the `try/finally` clearing the slot. - **#10 chain-completing the C3 persist-timeout**: the earlier C3 fix armed a 30s timer that fired `cancelController.abort()` then `await result.persist({signal})`, but the chain ended at the registry boundary — `cacheQwenCredentials` didn't take a signal, so `fs.writeFile` couldn't be aborted. Now `cacheQwenCredentials` accepts an optional `{signal}` and threads it into `fs.writeFile(..., {signal})` (Node native). The Qwen provider's `persist({signal})` forwards the entry's `cancelController.signal` end-to-end. ## Improvement (#4): 404 fallback errorKind `pollUntilTerminal`'s 404 catch used to synthesize `{status: 'expired'}` for ALL evicted entries — conflating "your flow expired during your disconnect", "the daemon was restarted", and "your deviceFlowId was wrong". Now returns `status: 'error'` + `errorKind: 'not_found_or_evicted'` + a `hint` so SDK consumers branching on errorKind can distinguish. ## Information leak (#9): start() path raw IdP message S2 (fold-in 2) sanitized `poll()`'s upstream-error hint, but `start()` still embedded the raw `err.message` (full IdP response, potentially HTML from a reverse proxy / WAF) into the `UpstreamDeviceFlowError` that flowed to SDK clients via the 502. Now uses static messages for the SDK-visible errors; raw detail goes through `writeStderrLine` for operator audit only. Mirrors S2's approach. ## Stale comments cleaned (#5, #7) `qwenDeviceFlowProvider.ts:177` claimed `cacheQwenCredentials` "doesn't currently take a signal — that's a follow-up". After #10 above, that's no longer true; the comment is replaced with the actual end-to-end signal-threading note. ## Not adopted (1 false alarm) - Thread on `types.ts:330` claimed type-only-import-after- declarations breaks `tsc` and fails `daemonEvents.test.ts:670` with TS2345. Demonstrably false: `npx tsc -p packages/sdk-typescript/tsconfig.json --noEmit` exits 0; `daemonEvents.test.ts` is the post-fold-in-2 file with the open-allowlist assertion (test 28/28 passes). The reviewer may have been looking at a transient state during their analysis. ## Validation - `npm run typecheck --workspace packages/cli --workspace packages/sdk-typescript --workspace packages/core` — clean - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398 pass - `npx eslint --max-warnings 0` over the PR 21 surface — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-3 review feedback 5 new threads from the third deep-review pass on #4255. 3 real issues fixed; 1 stale (already done in fold-in 3); 1 deferred as non-blocking design suggestion. - **A — `expiresIn` / `interval` non-finite guard** (`deviceFlow.ts`). The provider contract types both as `number`, but a misbehaving / future provider could hand `undefined` / `NaN` / `Infinity`. `Math.max(0, NaN) * 1000` is `NaN`, then `now() + NaN` is `NaN`, then `now >= NaN` is always `false` — the sweeper would NEVER evict the entry, pinning an upstream `device_code` slot until daemon restart. Same hazard on `interval * 1000` (NaN → `setTimeout(NaN)` fires immediately, Infinity → scheduler clamps to TIMEOUT_MAX). Now both fields go through `Number.isFinite(x) && x > 0`; missing/bad values fall back to RFC 8628's recommended ceilings (10 min for expiry, 5s for interval). - **D — typed `app.locals` accessor** (`deviceFlow.ts` + writer/reader call sites). The `app.locals['deviceFlowRegistry']` string key was shared between `createServeApp` (writer) and `runQwenServe` (reader); a typo on either side would compile cleanly and the shutdown dispose call would silently no-op, leaving polling timers running until the `unref()` rescue. New `setDeviceFlowRegistry(app, registry)` / `getDeviceFlowRegistry(app)` pair gives both call sites type-checked access; the string literal is encapsulated in one module. - **E — `UnsupportedDeviceFlowProviderError` docstring** (`deviceFlow.ts`). After fold-in 2's W2 fix split `invalid_request` from `unsupported_provider`, the route layer screens unknown ids against `DEVICE_FLOW_SUPPORTED_PROVIDERS` before reaching the registry — so this error is now reachable ONLY on a daemon-internal invariant violation (id is declared supported but not registered in the runtime provider map). Docstring + thrown message updated to reflect that this branch signals a programmer error, not user input. - **B** claimed `cacheQwenCredentials(credentials)` doesn't forward signal to `fs.writeFile`. Verified: fold-in 3 (#10) at `qwenDeviceFlowProvider.ts:204` calls `cacheQwenCredentials(credentials, { signal: persistOpts.signal })` and the core helper threads it into `fs.writeFile(..., {mode, signal})`. The reviewer was looking at the comment block above (lines 174-181) without scrolling to the actual call site. - **C — SDK `cancelDeviceFlow` lossy 204/404 collapse**. Suggested returning `{existed: boolean; alreadyTerminal: boolean}` instead of resolving void on both 204 and 404. Real signal-loss but tagged "[非阻塞]" by the reviewer; changing requires a daemon route shape change (200 + body instead of 204) which is better as a focused follow-up PR. Acknowledged in-thread; deferred to a fold-in PR after #4255 lands. - `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}` - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398 - `npx eslint --max-warnings 0` over the PR 21 surface — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-4 review feedback 4 threads from the fourth review pass on #4255. 3 adopted + 1 deferred (out-of-scope rename of PR 15's `mutate` helper). ## Adopted ### #1 — `persistInFlight` flag suppresses cancel × persist event-stream UX trap When `provider.poll()` returns success and we await `persist()`, a concurrent `cancel()` would synchronously transition the entry to `cancelled` and emit `auth_device_flow_cancelled` — then `persist()` resolves and (per fold-in 3 C4) force-overrides to `authorized` + emits `auth_device_flow_authorized`. The reducer state correctly last-write-wins on `authorized`, but DIRECT event-stream consumers (close-dialog handlers, telemetry, UI cleanup) race onto an unmounted UI when the second event lands. Now: while persist is in-flight, `cancel()` and the sweeper SKIP the state transition + event emit. They register intent (set `cancelRequestedDuringPersist=true` for cancel; sweeper just no-ops) and let the persist resolution decide: - persist succeeds → `authorized` (IdP wins per fold-in 3 C4) - persist fails AND cancel was requested → `cancelled` - persist fails AND `now >= expiresAt` → `expired` / `expired_token` - persist fails otherwise → `error` / `persist_failed` Result: at most one terminal event per flow. Imperative SSE consumers no longer see oscillating terminal states. Audit captures the race (`hint: 'lost_success_kept ...'`) for incident-response correlation. ### #2 — `revealSecret` → `unsafeRevealSecret` rename The earlier JSDoc claimed "the `unsafeReveal_` naming is intentional: greppable in code review, easy to allowlist in lint rules, hard to invoke by accident" — but the actual function was named `revealSecret`. The promised safety properties didn't exist; a code reviewer wouldn't single out `revealSecret` as suspicious, and a `no-restricted-syntax` ESLint rule wouldn't flag it. Renamed to `unsafeRevealSecret` so the JSDoc-promised "greppable" / "lintable" property is now actually true. Two call sites in the Qwen provider + 4 test references updated. Internal symbol; not exposed through the SDK package. ### #4 — `QwenOAuthPollError` typed class replaces substring regex The earlier RFC 8628 error mapper used an anchored regex against the thrown error message text — an implicit cross-file string contract between `qwenOAuth2.ts` (throws) and `qwenDeviceFlowProvider.ts` (matches). If `qwenOAuth2.ts` ever changed its message format, ALL RFC 8628 errors (`expired_token` / `access_denied` / `invalid_grant`) would silently fall through to `upstream_error` — wrong errorKind flowing through telemetry with no test or type-system check to catch the drift. Now `QwenOAuth2Client.pollDeviceToken` throws a structured `QwenOAuthPollError extends Error` with `oauthError` / `description` / `status` fields. The provider branches on `instanceof QwenOAuthPollError` and reads `.oauthError` directly via a dedicated `mapRfc8628OAuthCode(code)` switch. The drift hazard is gone: a future code change that touches the typed class will fail tsc until both sides are updated. Message format preserved for any pre-existing log-parsing / substring matchers. ## Not adopted ### #3 — `mutate({strict:true})` semantic awkwardness on GET Reviewer correctly noted that `mutate` is named for state-changing routes, but `GET /workspace/auth/device-flow/:id` uses it for an information-disclosure defense (only reachable code path is reading state). Suggested rename: `mutate` → `strictHttpGate`. Deferred: the rename touches PR 15's helper which has many call sites in `server.ts` and is shared infrastructure for Wave 4 PRs 17/19/20. PR 21 is the first / only consumer of the strict-on-GET form so far; widening the rename to a Wave 4 follow-up keeps the fold-in scope tight. Replied in-thread. ## Validation - `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}` - `npx vitest run packages/cli/src/serve/ packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 544/544 - `npx eslint --max-warnings 0` over the PR 21 surface — clean Refs: #4175 #4255 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-5 review feedback Five small adopt items from the round-5 review pass; one stale thread already addressed in b5b77ee90 (fold-in 5). #2 — `as const` + derived type for DEVICE_FLOW_SUPPORTED_PROVIDERS so adding/removing a provider id requires touching exactly ONE site. Mirrors `SERVE_ERROR_KINDS` / `ServeErrorKind` in `status.ts`. #3 — Clarify `DEVICE_FLOW_EXPIRY_GRACE_MS` JSDoc to distinguish the daemon's 30s SWEEP cadence (what the grace tracks) from the 5-min TERMINAL_GRACE_MS reconnect window (which awaitCompletion does NOT need to wait through). #4 — Add `@remarks` block on `DeviceFlowProvider.poll()` warning future provider authors that thrown `err.message` flows verbatim into the SSE-broadcast `auth_device_flow_failed` hint, and must be sanitized. Two equally-correct paths documented (typed `error` result vs sanitized thrown message). #5 — Truncate raw IdP detail in `qwenDeviceFlowProvider.ts` stderr audit lines to 2 KiB. WAFs / reverse proxies can return MB-sized HTML error pages, and container log aggregators (Loki, Fluent Bit, Stackdriver) typically truncate or drop lines past 4-32 KiB — losing the useful prefix downstream. 2 KiB retains structured JSON envelopes while staying well below every aggregator's per-line cap. #6 — Track latest `originatorClientId` on per-provider singleton take-over via new `entry.lastOriginatorClientId` field + `recordTakeover()` helper. When a second SDK client posts `POST /workspace/auth/device-flow` for an already-pending provider (or one being created in `inFlightStarts`) with a different `initiatorClientId`, an audit breadcrumb records the take-over so incident response can correlate "client A started, client B took over at 12:34". Event-routing intentionally still uses the original `initiatorClientId` (events are workspace-broadcast and changing the originator field mid-flow would break SDK reducers that key on it). Two new tests cover the differing-id audit + same-id no-op. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-6 review feedback Six "Critical" findings from a gpt-5.5 /review pass — all real liveness/correctness defects in the daemon's auth device-flow path and the SDK's awaitCompletion polling loop. #1 — Make `provider.start()` timeout authoritative via `Promise.race` in `DeviceFlowRegistry.doStart`. The earlier shape only ABORTED the signal on timeout; a provider that ignores `signal` (non-abortable I/O, future implementer who forgets to thread it to `fetch`) would leave the `await` hanging until daemon restart, pinning the `inFlightStarts` slot for that providerId. Race against a rejecting timer makes the timeout authoritative regardless of provider cooperation; abort still fires for cooperative cleanup. #2 — Same shape for `result.persist()` in the success branch of `runPollTick`. A future provider whose persist performs non-abortable steps (mkdir/chmod/mv outside the abortable fs.writeFile path) would otherwise hang the poll tick until process restart. Race against rejecting timer; rejection maps to `persist_failed`. #3 — Clamp `expiresIn` and `interval` upper bounds. Previous `Number.isFinite + > 0` guards stopped NaN/Infinity but a finite extreme like `1e12` was still accepted — pinning the per-provider singleton for ~30,000 years (`expires_in`) or scheduling a TIMEOUT_MAX-clamped poll that never fires within `expiresAt` (`interval`). Two new constants (`DEVICE_FLOW_MAX_EXPIRES_IN_SEC = 3600`, `DEVICE_FLOW_MAX_INTERVAL_MS = 60_000`) cap the worst case. #4 — Extract `getDeviceFlowOrSynthetic404(...)` helper in `DaemonAuthFlow.ts` and route BOTH the loop body and the timeout-ceiling final read through it. Previously the ceiling read went directly through `client.getDeviceFlow` and a 404 at the boundary (entry evicted just as the timeout fired) would reject with `DaemonHttpError(404)` instead of returning the structured `{ status: 'error', errorKind: 'not_found_or_evicted' }` that the rest of `awaitCompletion` promises. #5 — Validate `AwaitCompletionOptions.timeoutMs` and `pollOverrideMs` with `Number.isFinite + > 0`. NaN slipped past the previous `?? default` form (NaN is truthy-ish in that position) and produced a `ceiling` of `NaN` (loop runs forever — `now >= NaN` always false) or a `setTimeout(NaN)` (Node clamps to 1ms — tight polling loop). Sanitize to `undefined` so the documented defaults take effect. #6 — Thread `signal` into `DaemonClient.getDeviceFlow` and forward to `fetchWithTimeout` (which already composes caller + timeout signals). awaitCompletion now passes `opts.signal` from both GET sites. Without this, an `awaitCompletion` caller that aborts mid- poll could not cancel an in-flight stalled GET; it would have to wait for the daemon-side `fetchTimeoutMs` (30s default) to fire. Four new tests in `deviceFlow.test.ts` pin the new behaviors: hanging-start timeout (#1), hanging-persist → persist_failed (#2), extreme-expiresIn clamp (#3), extreme-interval clamp (#3). FakeProvider gained a `startHangs` flag for the non-cooperative provider scenario. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-7 review feedback Two findings from a DeepSeek /review pass; both small but legitimate defense-in-depth gaps. #1 — `runPollTick`'s catch block forwarded `err.message` verbatim into the SSE-broadcast `hint`. The provider's `@remarks` contract (fold-in 6 #4) requires throwers to sanitize, but if violated the unbounded raw payload would reach every SSE subscriber. Added `DEVICE_FLOW_POLL_HINT_MAX_LEN = 256` + `truncatePollHint()`, applied to the catch's `result.hint`. Full raw `err.message` is still routed to the audit trail (`audit?.record({hint: 'provider.poll() threw (raw): ...'})`) so operator visibility for incident response is preserved. Belt-and-suspenders: the contract is now structurally enforced rather than relying on every future provider author to read the JSDoc. #2 — `updateMatchingFlow` (and the `started`/`authorized` handlers in `reduceDaemonAuthEvent`) unconditionally overwrote state without comparing `rawEvent.id` against the existing flow's `lastSeenEventId`. The field's JSDoc documented it as a monotonic counter to prevent stale frames from overwriting newer state, but the code didn't enforce that contract. SSE reconnect with `Last-Event-ID < terminal-frame-id` would replay older frames; if any of them were for the same `deviceFlowId` (e.g. a delayed `failed` arriving after `authorized`) the stale frame would overwrite the terminal. Daemon-side `transitionTerminal` makes the exact reachable scenario thin, but the documented contract should match the code. Threaded `rawEventId` into `updateMatchingFlow` and added the gate there + in the `started` and `authorized` handlers (the two cases that don't go through `updateMatchingFlow`). Synthetic frames without an envelope `id` (`rawEventId === undefined`) bypass the gate — they originate inside SDK reducer machinery and aren't subject to replay ordering. Three new tests pin the contracts: - `runPollTick catch truncates the SSE hint and preserves raw on the audit (fold-in 8 #1)` — `pollThrowsWith` flag on FakeProvider models a non-conforming provider; SSE hint < 400 chars + contains 'truncated'; audit hint contains the full 4_000-char raw. - `reduceDaemonAuthEvent rejects out-of-order frames (fold-in 8 #2 monotonicity)` — stale `failed`(id=7) does NOT overwrite `authorized`(id=10); stale `started`(id=4) for a different flow also rejected. - `reduceDaemonAuthEvent passes synthetic frames (no envelope id) through the gate` — SDK-internal frames without `id` are honored. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-8 review feedback Twelve correctness + structural fixes from a wenshao + DeepSeek + gpt-5.5 review pass. Tests deferred to fold-in 10 (separate, larger commit). CRITICAL CORRECTNESS #7 — `provider.persist()` Promise.race could publish `persist_failed` to SSE while a non-cooperative provider was still committing credentials to disk. Added an independent tracker on the original persist promise: if the race timed out (`persistTimedOut === true`) AND the underlying persist later resolved successfully, audit a `lost_success_after_timeout` breadcrumb so operators see the inconsistency. Tightened the persist `@remarks` contract to require signal honoring end-to-end. Qwen provider already complies (fold-in 3 #10); this is forward-defense for future providers. #11 — auth surface (`DaemonAuthFlow`, `reduceDaemonAuthEvent`, `createDaemonAuthState`, `DEVICE_FLOW_EXPIRY_GRACE_MS`, all event / data / state types) was re-exported from `src/daemon/index.ts` but NEVER from the published SDK entry `src/index.ts`. SDK consumers got `undefined` for everything except `client.auth.start()` (which traveled through the already-exported `DaemonClient`). Added the missing exports and pinned via `daemon-public-surface.test.ts`. #12 — `core/src/qwen/qwenOAuth2.ts:373`'s `debugLogger.debug('Device authorization result:', result)` writes the raw `device_code` (RFC 8628 bearer-equivalent credential) to stderr / journald, bypassing the `BrandedSecret` redaction layer. Pre-existing on main but PR 21 expanded the exposure surface. Sanitized to log only `{ ok, expires_in }` on success / `{ ok, error }` on error. #13 — `runPollTick` success-branch persist-failure × past-`expiresAt` classified as `expired_token` instead of `persist_failed`, routing operators toward "tell user to retry" (RFC 8628 expiry) when the actual root cause was disk I/O. Reclassified to `persist_failed` with a `persist_also_failed_past_expiry` audit hint to preserve the timing detail for incident response. SMALL CORRECTNESS #1 — `runPollTick` catch hint replaced with a STATIC bounded message ("provider.poll() failed; see daemon audit log for details"). The fold-in 8 truncated-prefix approach could still leak the first 256 chars of provider-templated raw text including secret material. Full raw still routed to audit channel for operator visibility. #5 — `cancellerClientId` field added to `DeviceFlowEntry`; deferred- cancel branch in `cancel()` now stamps it on the entry, and the persist-resolution `cancelled` event publish uses `entry.cancellerClientId ?? entry.initiatorClientId`. SSE consumers that suppress self-emitted events can now attribute the cancel correctly. #6 — `AwaitCompletionOptions.timeoutMs === 0` (the documented "settle immediately, return current daemon view" contract) was treated as falsy by the `?` ternary, falling back to the default. `sanitizePositiveMs` now takes an `allowZero` opt-in; the ceiling computation uses `!== undefined` instead of truthy check. #8 — `EventBus.publish()` returns `undefined` for closed buses (it does NOT throw). `broadcastWorkspaceEvent` previously counted that path as success, hiding the all-buses-dropped operator alarm. Folded the closed-bus-as-failure check into the canonical `publishWorkspaceEvent` (see #X below). #9 — start-timeout Promise.race rejected with a plain `Error`, falling through `sendBridgeError` to a generic 500. Switched to `UpstreamDeviceFlowError` so a hung IdP correctly surfaces as 502 (matching the envelope every other IdP start failure uses). STRUCTURAL #3 — Three identical `transitionTerminal + publish + audit` expired_token blocks in `runPollTick`/`sweep`/(removed by #13) deduplicated into a private `expireEntry()` helper. Future event- shape changes are now a one-edit operation. #X — PR 16 (#4249) merged on 2026-05-18 06:27Z. Per the inline comment at httpAcpBridge.ts:501, PR 21's `broadcastWorkspaceEvent` was kept distinct only to avoid the merge conflict; once PR 16 landed, it became a fold-in candidate. Folded the closed-bus + all-failed-stderr-escalation operator-visibility features (PR 21's S5 + fold-in 9 #8) INTO `publishWorkspaceEvent`; dropped `broadcastWorkspaceEvent` from the bridge interface + impl + test mocks. PR 21's deviceFlowEventSink now calls `bridge.publishWorkspaceEvent` — single canonical workspace fan-out. DOC #16 — Added a "Cross-client take-over" paragraph to `docs/users/qwen-serve.md` explaining that two clients on the same daemon for the same provider get the per-provider singleton with `attached: true`/`false` distinguishing them; no separate event fires (both eventually observe the same `auth_device_flow_authorized`). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao round-9 review feedback Two small non-blocking items from the round-9 pass; defensive shape + docs only. The 4 deferred test-coverage threads (#1-4 of round-8) are still tracked for fold-in 10. #6 — `lastSeenEventId` typed `number` with `?? 0` defaults in the `auth_device_flow_started` reducer case. The daemon-side `EventBus` assigns ids ≥ 1 so the `0` sentinel has no real-traffic meaning, but the monotonic gate (`rawEventId <= flow.lastSeenEventId`) would reject any future SDK-internal synthetic frame using `id: 0`. Changed the field type to `number | undefined` and dropped the `?? 0` from the started case. The `updateMatchingFlow` / `auth_device_flow_authorized` guards already short-circuit on `existing.lastSeenEventId !== undefined`, so undefined is safe end-to-end. Existing 34 reducer tests still pass unchanged. #7 — Added `@remarks` block to `DeviceFlowErrorKind.persist_failed`'s JSDoc explaining the lost-success retry UX. When fold-in 9 #7's `lost_success_after_timeout` audit fires (non-conforming provider violates signal contract; disk write succeeds after registry published `persist_failed`), a naive SDK retry hits the IdP a second time with a fresh `device_code` and prompts the user twice — but the first credential set is already valid. JSDoc now documents the mitigation: SDK consumers writing retry logic on `persist_failed` should call `client.auth.getStatus()` BEFORE re-prompting; operators can grep stderr/audit for `lost_success_after_timeout` to detect occurrences. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * test(serve): fold-in 10 — auth device-flow test bundle (#4255) Lands the four deferred test-coverage items the round-8 review flagged (and round-9 re-surfaced) as a hard merge prerequisite. Net +41 tests across registry / SDK helper / client HTTP / HTTP route layers. #1 — `deviceFlow.test.ts` `persist failure paths` describe (3 tests, +3). The success arm's three terminal mappings — pure `persist_failed`, `cancelled` (cancel during persist), and `persist_failed` past `expiresAt` (the fold-in 9 #13 reclassification with `persist_also_failed_past_expiry` audit hint) — were 0-covered. Now pinned. Test #2 also asserts the fold-in 9 #5 cancellerClientId routing on the deferred `cancelled` event. #2 — new `DaemonAuthFlow.test.ts` (+14 tests). Mock DaemonClient with sequenced `getDeviceFlow` replies. Covers happy-path polling → `authorized`; `slow_down`-driven `intervalMs` bump firing `onThrottled`; `signal.abort()` rejection; `signal` propagation through `client.getDeviceFlow` (fold-in 7 #6); `timeoutMs` ceiling final-read; `timeoutMs:0` immediate-return (round-9 #6); NaN/Infinity → `sanitizePositiveMs` fallback to default ceiling (fold-in 7 #5); 404 → synthetic `error`/`not_found_or_evicted` (fold-in 3 #4) at BOTH the loop body AND the timeoutMs ceiling read (fold-in 7 #4); non-404 DaemonHttpError rethrown; `cancel()` and top-level `status()`/`cancel()` wrappers forward correctly. #3 — `DaemonClient.test.ts` `device-flow methods` describe (+11 tests). POSTs `/workspace/auth/device-flow` happy path + clientId header + body shape; 200/201 acceptance; non-2xx → `DaemonHttpError`. GETs URL-encode the deviceFlowId; forward `opts.signal` to `fetchWithTimeout`'s composed signal (fold-in 7 #6 — verified by aborting caller signal and observing the fetch's signal flip to `aborted`); 404 throws. DELETEs swallow 204 + 404 (idempotent, mirrors `closeSession`); non- 204/404 throws. `getAuthStatus` plain GET. `client.auth` lazy-instantiated singleton. #4 — `server.test.ts` 5 supplementary contract tests (+5). The existing 8 `it()`s cover happy paths + take-over + 401 POST + DELETE pending/terminal/unknown + 502 upstream + sweeper. This commit plugs gaps: 400 `invalid_request` for missing / non-string providerId (fold-in W2 split this from `unsupported_provider`); 409 `too_many_active_flows` (via injected fake registry); 401 `token_required` on DELETE without bearer; the asymmetric GET posture (`/workspace/auth/device-flow/:id` IS strict-gated to prevent peer-process userCode shoulder-surf; `/workspace/auth/status` stays read-only because its `pendingDeviceFlows` entries intentionally redact `userCode`). Validation: cli serve 631/631 (+8 from #1, #4); sdk 384/384 (+25 from #2, #3, +/- some pre-existing churn). Typecheck + lint clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(qwen): atomic temp+chmod+rename in cacheQwenCredentials (PR #4255 round-11 #2) gpt-5.5 /review flagged a real correctness/security gap: the post-write `chmod` ordering left a window where freshly-written credentials could land in a broadly-readable existing `oauth_creds.json` before the chmod tightened it. On POSIX, a chmod failure additionally degraded to a debug warning while the broadly-readable tokens stayed on disk. New shape mirrors the standard atomic-write idiom: 1. Write `${filePath}.tmp.${pid}.${randomUUID()}` with `mode: 0o600`. The temp path doesn't exist beforehand, so the `mode` flag actually applies on creation (it doesn't on existing files, which was the root of the original race). 2. Defensive `chmod` on the temp file. POSIX failure is now a HARD ERROR (refuses to publish broad-perm credentials to the canonical filename). Windows logs a debug breadcrumb and proceeds, since chmod is a no-op on most NTFS volumes (perms go through ACLs). 3. Atomic `fs.rename` over `filePath`. The canonical path is ALWAYS at `0o600` from the moment it contains the new tokens; readers see either the old creds or the new creds, never a partially-written or broadly-readable state. 4. Best-effort `fs.unlink` of the temp file on any failure path so failed writes don't leave `.tmp.<pid>.<uuid>` litter on disk. Test mock in `qwenOAuth2.test.ts` extended with `chmod` + `rename` no-op stubs so the existing 158 core/qwen tests still pass; no test behavior change beyond the mock surface. Validation: typecheck clean (cli + core + sdk-typescript); core qwen 158/158; cli serve 643/643; sdk 384/384. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4255 wenshao + gpt-5.5 round-12 review feedback Eight findings from a wenshao + gpt-5.5 /review pass: 1 critical correctness, 2 real defensive defects, 4 edge cases / minor hardening, 1 test gap. All adopted. CRITICAL CORRECTNESS #1 CzSpN — `dispose()` race: after `await provider.poll(...)` the post-await guard checked only `entry.status !== 'pending'`, NOT `this.disposed`. `dispose()` clears the registry maps and aborts the entry's signal but doesn't mutate `entry.status`, so a provider whose poll already resolved (or doesn't honor abort) could enter the success branch and call `result.persist({...})` — committing credentials on a shutting-down daemon. Added the `if (this.disposed) return;` guard symmetric with the top-of-method check. REAL DEFENSIVE DEFECTS #2 Cy_ZG — sync-throw escape: the `result.persist({signal})` call happens BEFORE the `new Promise` constructor that captures it (`persistTracker` is closed-over inside the constructor). A non-conforming provider whose persist throws synchronously (e.g. top-of-function validation) would escape past the outer `try/catch (await new Promise(...))` and become an `unhandledRejection` since `runPollTick` is fire-and-forget via `void`. Wrapped the persist invocation in a try/catch that routes the sync-throw into the same `persistError` branch. #3 CzSpe — runtime provider map: provider validation hardcoded `DEVICE_FLOW_SUPPORTED_PROVIDERS` even though `deps.deviceFlowProviders` is the documented extension hook for tests/future providers. Switched both POST validation and `/workspace/auth/status` `supportedDeviceFlowProviders` to derive from `deviceFlowProviderMap.keys()` — single source of truth matches the registry's `resolveProvider`. EDGE CASES / MINOR HARDENING #4 Cy_Y9 — `slow_down` re-clamp: `intervalMs += SLOW_DOWN_BUMP_MS` can push past `DEVICE_FLOW_MAX_INTERVAL_MS` (the bound that keeps `setTimeout` from clamping to TIMEOUT_MAX). Wrapped in `Math.min(MAX_INTERVAL_MS, ...)` symmetric with the doStart clamp. #5 Cy_ZF — `expiresInSec` lower bound: `0.5` was finite-positive and produced `expiresAt = now() + 500 ms` — first poll (clamped at ≥1 s) fires AFTER expiresAt → flow expires before any user could authorize. Added `DEVICE_FLOW_MIN_EXPIRES_IN_SEC = 30` (RFC 8628 §3.2 calls 5–30 minutes "reasonable"; sub-30s is non-compliant). #6 CzHOK — take-over response privacy: `initiatorClientId` was echoed to ANY take-over POST caller, including those with no `X-Qwen-Client-Id` header. Bearer-gated already, but the asymmetry "anonymous caller learns who started it" violated the no-header-as-privacy-signal contract. Now only echoed when the caller's id matches the entry's initiator. #7 CzSpd — production audit visibility: production audit sink dropped `line.hint`, but the registry uses hints for operator-only breadcrumbs (`provider.poll() threw (raw)...`, `lost_success_after_timeout`, `persist_also_failed_past_expiry`, take-over correlation, `deferred (persist in flight; ...)`). The documented troubleshooting trail was invisible in production stderr. Now included with a 1 KiB bound + JSON-quoted so multi- word hints stay parseable. TEST GAP #8 Cy_ZH — `lost_success_after_timeout` audit: the fold-in 9 #7 split-brain detector for non-cooperative providers had no test pinning it. Added a controllable `latePersist` Promise + test that drives poll → success → enters persist race → fires PERSIST_TIMEOUT (registry publishes persist_failed) → resolves persist late → asserts the lost_success audit fires. Validation: typecheck + lint clean; cli serve 644/644 (+1 from the new test); sdk-typescript 384/384. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): close concurrent multi-provider cap bypass (PR #4255 round-13 #1) gpt-5.5 /review caught a real workspace-wide cap bypass: `countActive()` only counted entries already installed in `byProvider`, but the cap check at the top of `start()` runs before any provider's `inFlightStarts` slot completes `provider.start()`. A burst of fresh starts for `DEVICE_FLOW_MAX_CONCURRENT + 1` distinct providers all run synchronously to the cap check (each `start()` is async but runs to its first await — the await happens AFTER the cap check), all observe `count === 0` (no `byProvider` entries installed yet), and all pass — eventually installing more than the documented four pending flows. Fix: include `inFlightStarts.size` in `countActive()`. The two maps are disjoint by construction (the existing-pending fast-path catches any provider with both), so simple addition cannot double-count. The second concurrent caller sees count=1, the third count=2, …, and the (MAX+1)th caller is rejected with `TooManyActiveDeviceFlowsError`. Test: `caps at DEVICE_FLOW_MAX_CONCURRENT under CONCURRENT distinct-provider starts`. Fires `MAX+1` concurrent starts via `Promise.allSettled`, asserts exactly `MAX` fulfilled + exactly 1 rejected with the typed error. Pre-fix this test fails (all `MAX+1` succeed); post-fix it passes. Validation: typecheck clean across all 4 workspaces; deviceFlow.test.ts 35/35 (was 34); cli serve 645/645. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Summary
Adds support for
QWEN_CODE_API_TIMEOUT_MSas an environment override for model generation timeout.Precedence:
Behavior:
Validation
Added unit tests covering:
Local validation:
Coverage note
Repository-wide coverage gaps shown in CI are pre-existing and unrelated to this change. This PR includes targeted coverage for the modified timeout-resolution logic.