Skip to content

merge(upstream): sync QwenLM/qwen-code upstream changes Apr 18-19 2026#1

Merged
TaimoorSiddiquiOfficial merged 7 commits into
mainfrom
merge/upstream-apr-19
Apr 19, 2026
Merged

merge(upstream): sync QwenLM/qwen-code upstream changes Apr 18-19 2026#1
TaimoorSiddiquiOfficial merged 7 commits into
mainfrom
merge/upstream-apr-19

Conversation

@TaimoorSiddiquiOfficial

Copy link
Copy Markdown
Owner

Upstream Sync — Apr 18-19, 2026

Syncs 6 upstream commits from QwenLM/qwen-code while preserving HopCode architecture.

Changes merged from upstream

Commit Description
4ee9ca9 feat(mcp): OSC 52 copy hotkey for OAuth authorization URL — press 'c' to copy URL during OAuth, works over SSH/tmux
afa7fc3 feat(cli): early input capture to prevent keystroke loss during startup
56e8227 perf(vscode): fix input lag in long conversations via React.memo
cd1be1c feat(vscode-ide-companion): agent execution tool display cards
a623655 fix(build): node --import tsx/esm instead of npx tsx — fixes bun compatibility
f340d95 ci(stale): enable 35+35 stale/close PR policy

Architecture preserved

  • All @hoptrendy/hopcode-core / @hoptrendy/webui import paths kept
  • HopCode branding, versioning (0.14.21), and repo URLs preserved
  • CI workflows unchanged — publish-npm.yml with VS Code Marketplace + Open VSX intact
  • Skills system and multi-provider auth model untouched

Conflict resolution

packages/vscode-ide-companion/.../toolcalls/index.tsx — kept @hoptrendy/webui, added ToolCallData to type imports (upstream addition needed for AgentToolCall support)

BZ-D and others added 7 commits April 18, 2026 20:22
…#3337) (QwenLM#3393)

When MCP OAuth authentication falls back to the "copy this URL into
your browser" path (e.g. remote/web terminal where the browser can't
auto-open), long URLs wrap across lines inside the bordered dialog and
the trailing │ border characters get selected alongside the URL,
forcing the user to manually strip them out before pasting.

Surface the URL on a dedicated event and let the user press 'c' to
push it to the local clipboard via an OSC 52 escape sequence. Works
through SSH and modern web terminals (iTerm2, Windows Terminal,
xterm.js-based emulators, tmux with set-clipboard, etc.) without a
subprocess, and falls back to a visible "copy the URL above manually"
hint when the terminal is not a TTY or OSC 52 is blocked.

Key points:
- OAuth provider emits OAUTH_AUTH_URL_EVENT carrying the full URL.
- AuthenticateStep listens, tracks it in state, and binds 'c' while
  authenticating (modifier/paste keys are filtered out).
- copyToClipboardViaOsc52 writes to stderr when it's a TTY,
  falls back to stdout, and wraps the sequence for tmux/GNU screen
  via DCS passthrough so multiplexed sessions still work.
- Honest feedback: distinct "copy request sent" / "cannot write to
  terminal" states with a short auto-revert so repeated presses reset
  the timer.

Fixes QwenLM#3337
…2590)

Preserve structured agent rawOutput through the VSCode session pipeline.

Render dedicated agent execution cards from shared webui components.
…LM#2550)

Extract message list into a React.memo component to prevent
re-rendering the entire chat history on every keystroke.

- Extract MessageList as a memoized component
- Wrap UserMessage, AssistantMessage, ThinkingMessage with React.memo
- Stabilize onFileClick callback with useCallback
- Remove console.log from render path
- Wrap handleToggleThinking with useCallback

Fixes QwenLM#2395

Made-with: Cursor
…tartup (QwenLM#3319)

* feat(cli): add early input capture to prevent keystroke loss during startup (QwenLM#3224)

Start raw mode stdin listening immediately after setRawMode(true), buffer
user input during REPL initialization (200-500ms), then replay it once
KeypressProvider is mounted. Prevents keystrokes typed before the REPL
is ready from being silently dropped.

- Filter out terminal response sequences (DA, DA2, OSC, DCS, APC)
  while preserving real user input (arrow keys, function keys, etc.)
- 64KB buffer limit for safety
- Replay via setImmediate() to ensure subscribers are registered first
- Disable via QWEN_CODE_DISABLE_EARLY_CAPTURE=1
- Add benchmark-startup.sh / benchmark-startup-simple.sh for baseline
  startup time measurement

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(cli): fix bugs and optimize early input capture

- Fix getAndClearCapturedInput resetting captured flag, preventing potential re-arm
- Fix passthrough mode replay bypassing paste marker handling in KeypressContext
- Optimize buffer storage from O(n^2) concat to chunked collection
- Optimize filterTerminalResponses to use pre-allocated Buffer instead of number[]
- Add atomic stopAndGetCapturedInput API to prevent two-step usage errors
- Remove unrelated benchmark shell scripts
- Add test for stopAndGetCapturedInput

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(cli): fix listener leak, silent failures, and error handling in early input capture

- Register cleanup for stdin listener in gemini.tsx to prevent orphaned
  listener on any error path before UI mounts
- Add try-catch and cancellation guard to setImmediate replay in
  KeypressContext to handle component unmount and replay errors gracefully
- Stop capture immediately and warn when buffer limit is reached instead
  of silently dropping data with a debug-level log
- Capture stdin reference at registration time so removeListener always
  operates on the correct stream instance
- Add debug log when early capture is skipped due to non-TTY stdin

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(cli): fix early input capture being lost under React StrictMode

Move stopAndGetCapturedInput() from inside KeypressProvider's useEffect
to before render() in startInteractiveUI. When DEBUG=1, React StrictMode
deliberately runs effect→cleanup→effect, causing the first mount to drain
the buffer and schedule a replay that the cleanup immediately cancels. The
second mount found an empty buffer, silently discarding startup keystrokes.

By draining once before render() and passing the bytes as a stable prop,
StrictMode remounts always read the same data and can schedule replay on
the second (stable) mount.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: handle split ESC prefixes in early input capture

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: conditionally flush pending startup capture bytes

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: drop incomplete escape sequences instead of replaying as user input

When capture stops with an incomplete ESC sequence in pendingTerminalResponse
(e.g. lone \x1b or \x1b[), classifyEscapeSequence returns 'incomplete'.
Previously shouldReplayPendingAtStop used !== 'terminal' which treated
incomplete sequences as user input. Changed to === 'user' so only
definitively-user input is replayed; ambiguous sequences are safely dropped.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…nLM#3237)

* fix(build): invoke tsx directly via node --import instead of npx

npx resolution breaks when scripts/build.js is invoked under bun
(bun's npx wrapper intercepts and runs tsx inside bun's runtime, where
tsx's CJS entry fails to resolve). Using 'node --import tsx/esm' skips
the npx layer entirely and works under both npm and bun invocation.

* fix(build): use node --import tsx/esm for generate:settings-schema script

Matches the approach taken in scripts/build.js so running
`bun run generate:settings-schema` directly bypasses bun's npx wrapper
and avoids the `Cannot find module './cjs/index.cjs'` tsx CJS failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…3375)

* ci(stale): enable 28+28 stale/close policy for pull requests

- Fix the repository guard so the workflow actually runs on
  QwenLM/qwen-code (it was previously gated to google-gemini/gemini-cli
  and never executed in this repo).
- Scope the behavior to pull requests for now; issue policy will be
  introduced separately once triage labels are in place.
- Mark a PR stale after 4 weeks without activity, then close it after
  another 4 weeks.
- Exempt pinned, security, status/blocked, status/on-hold, and
  status/ready-for-merge from auto-close.
- Remove the stale label automatically when activity resumes, and
  process the oldest PRs first on each run.

* ci(stale): loosen PR cadence from 28+28 to 35+35

Five weeks + five weeks gives contributors more slack around holidays
and busy periods, and reduces the first-run impact on the existing
backlog. The total window moves from 56 days to 70 days.

* ci(stale): move cron from 01:30 UTC to 00:30 UTC

Shift by one hour so results are ready before the Beijing work day
starts (08:30 local), while still avoiding the top of the hour (the
high-contention window for GitHub-hosted runners) and staying 30
minutes after release.yml at 00:00 UTC.

* ci(stale): drop redundant repo guard and document ops-per-run

- Remove the `github.repository == 'QwenLM/qwen-code'` job guard:
  scheduled runs are already disabled on forks by GitHub, and
  workflow_dispatch is manually-triggered so the guard adds no safety.
- Add a comment explaining the `operations-per-run: 100` rationale
  (rate-limit headroom given the ~150-PR backlog).
Merged 6 upstream commits while preserving HopCode architecture:

Features synced from upstream:
- feat(mcp): OSC 52 copy hotkey for OAuth authorization URL (QwenLM#3393)
  Press 'c' during OAuth to copy URL via terminal clipboard, works over SSH
- feat(cli): early input capture to prevent keystroke loss during startup (QwenLM#3319)
  Buffers keystrokes during REPL init, replays once UI is mounted
- perf(vscode): fix input lag in long conversations via React.memo (QwenLM#2550)
  MessageList/UserMessage/AssistantMessage wrapped with React.memo
- feat(vscode-ide-companion): agent execution tool display (QwenLM#2590)
  Render dedicated agent execution cards in webview
- fix(build): invoke tsx via node --import instead of npx (QwenLM#3237)
  Fixes bun compatibility for generate:settings-schema script
- ci(stale): enable 35+35 stale/close PR policy (QwenLM#3375)

Conflict resolution:
- packages/vscode-ide-companion/.../toolcalls/index.tsx: kept @hoptrendy/webui,
  added ToolCallData to imports (upstream added it)
- All @qwen-code/* import paths preserved as @hoptrendy/* (HopCode arch)
- HopCode branding, version, and CI workflows preserved

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 19, 2026 02:05
@ecc-tools

ecc-tools Bot commented Apr 19, 2026

Copy link
Copy Markdown

Analyzing 200 commits...

@ecc-tools

ecc-tools Bot commented Apr 19, 2026

Copy link
Copy Markdown

Analysis Complete

Generated ECC bundle from 7 commits | Confidence: 65%

View Pull Request #2

Repository Profile
Attribute Value
Language TypeScript
Framework Not detected
Commit Convention conventional
Test Directory mixed
Changed Files (30)
Metric Value
Files changed 30
Additions 1740
Deletions 138

Top hotspots

Path Status +/-
packages/cli/src/utils/earlyInputCapture.ts added +380 / -0
packages/cli/src/utils/earlyInputCapture.test.ts added +349 / -0
packages/vscode-ide-companion/src/webview/App.tsx modified +119 / -87
packages/webui/src/components/toolcalls/AgentToolCall.tsx added +156 / -0
packages/vscode-ide-companion/src/webview/components/messages/toolcalls/index.test.tsx added +151 / -0

Top directories

Directory Files Total changes
packages/cli/src/utils 3 730
packages/vscode-ide-companion/src/webview 1 206
packages/vscode-ide-companion/src/webview/components/messages/toolcalls 2 167
packages/webui/src/components/toolcalls 2 161
packages/vscode-ide-companion/src/webview/hooks 2 126
Likely Future Issues (2)
Severity Signal Why it may show up
MEDIUM User-facing UI changes may ship without browser coverage 12 user-facing UI paths changed; 0 browser or e2e coverage files changed
MEDIUM Dependency or CI drift could surface after merge package.json changed; CI/workflow files changed; no lockfile changes detected
  • User-facing UI changes may ship without browser coverage: The PR changes components, pages, or other user-facing UI files without touching any obvious browser or end-to-end coverage.
  • Dependency or CI drift could surface after merge: Package or workflow changes landed without an accompanying lockfile update, which often turns into CI or release noise later.
Suggested Follow-up Work (2)
Type Suggested title Targets
PR test: add browser coverage for packages/cli/src/ui/components/mcp/steps/AuthenticateStep.tsx + packages/cli/src/ui/contexts/KeypressContext.tsx packages/cli/src/ui/components/mcp/steps/AuthenticateStep.tsx, packages/cli/src/ui/contexts/KeypressContext.tsx
PR chore: refresh lockfile and validate CI after dependency updates .github/workflows/stale.yml, package.json
  • test: add browser coverage for packages/cli/src/ui/components/mcp/steps/AuthenticateStep.tsx + packages/cli/src/ui/contexts/KeypressContext.tsx: Backfill browser coverage before another user-facing UI change lands on the touched surface.
  • chore: refresh lockfile and validate CI after dependency updates: Package or workflow changes without a lockfile refresh tend to turn into noisy follow-up fixes after merge.

Copy-ready bodies

test: add browser coverage for packages/cli/src/ui/components/mcp/steps/AuthenticateStep.tsx + packages/cli/src/ui/contexts/KeypressContext.tsx

## Summary
- Add browser or end-to-end coverage for the recently changed user-facing surface.

## Why
- Backfill browser coverage before another user-facing UI change lands on the touched surface.

## Touched paths
- `packages/cli/src/ui/components/mcp/steps/AuthenticateStep.tsx`
- `packages/cli/src/ui/contexts/KeypressContext.tsx`

## Validation
- Add or extend browser / e2e coverage for the changed component, page, or flow.
- Exercise the visible user journey that depends on the touched UI surface.

chore: refresh lockfile and validate CI after dependency updates

## Summary
- Refresh the lockfile and rerun CI after the dependency or workflow changes in this PR.

## Why
- Package or workflow changes without a lockfile refresh tend to turn into noisy follow-up fixes after merge.

## Touched paths
- `.github/workflows/stale.yml`
- `package.json`

## Validation
- Refresh the lockfile in the same package manager used by the repo.
- Run the repo typecheck / test / CI entrypoints that depend on the updated package graph.
Detected Workflows (3)
Workflow Description
feature-development-with-tests-and-shared-components Implements a new feature or capability, including backend logic, UI components, shared types, and corresponding tests. Frequently involves updating both implementation and test files, as well as shared type definitions and documentation/design plans.
infrastructure-or-configuration-update Updates build scripts, CI/CD workflows, or project configuration to improve build reliability, compatibility, or automation (e.g., updating how scripts are invoked, changing workflow policies).
merge-upstream-with-conflict-resolution Merges upstream changes into the current fork, applies conflict resolution, and ensures project-specific customizations (such as branding or architecture) are preserved.
Generated Instincts (27)
Domain Count
git 4
code-style 9
architecture 1
testing 7
workflow 6

After merging, import with:

/instinct-import .claude/homunculus/instincts/inherited/HopCode-instincts.yaml

Files

  • .claude/ecc-tools.json
  • .claude/skills/HopCode/SKILL.md
  • .agents/skills/HopCode/SKILL.md
  • .agents/skills/HopCode/agents/openai.yaml
  • .claude/identity.json
  • .codex/config.toml
  • .codex/AGENTS.md
  • .codex/agents/explorer.toml
  • .codex/agents/reviewer.toml
  • .codex/agents/docs-researcher.toml
  • .claude/homunculus/instincts/inherited/HopCode-instincts.yaml
  • .claude/commands/feature-development-with-tests-and-shared-components.md
  • .claude/commands/infrastructure-or-configuration-update.md
  • .claude/commands/merge-upstream-with-conflict-resolution.md

ECC Tools | Everything Claude Code

@TaimoorSiddiquiOfficial TaimoorSiddiquiOfficial merged commit d17599e into main Apr 19, 2026
5 of 8 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b9d7806b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

* user input being lost during startup.
*/

import { createDebugLogger } from '@qwen-code/qwen-code-core';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Import early capture logger from the shipped core package

earlyInputCapture.ts imports createDebugLogger from @qwen-code/qwen-code-core, but this repo’s CLI package depends on @hoptrendy/hopcode-core (see packages/cli/package.json), so a normal install cannot resolve the new specifier. Because gemini.tsx imports this module at startup, interactive CLI startup can fail with module-resolution errors before UI initialization.

Useful? React with 👍 / 👎.

import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { ToolCallRouter } from './index.js';

vi.mock('@qwen-code/webui', async () => {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mock the same webui module that ToolCallRouter imports

This test mocks @qwen-code/webui, but ToolCallRouter now imports from @hoptrendy/webui (index.tsx), so the mock is never applied. The assertions in this file depend on mocked label output (for example agent:...), which makes the new test fail or become brittle because it exercises real components instead of the intended stubbed ones.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Syncs upstream changes from QwenLM/qwen-code (Apr 18–19, 2026) into this repo while keeping HopCode package/import architecture, adding: structured agent tool-call rendering, CLI startup input capture, OSC52 OAuth URL copy support, and a stale-PR workflow policy.

Changes:

  • Add structured rawOutput plumbing + dedicated AgentToolCall renderer for subagent execution progress/cards across webui + VS Code companion.
  • Add CLI “early input capture” to prevent keystroke loss during startup, with replay into the keypress layer.
  • Add OAuth authorization URL copy hotkey (OSC 52) and update stale workflow + build script.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
packages/webui/src/index.ts Re-exports new agent tool-call component/utilities and types.
packages/webui/src/components/toolcalls/shared/types.ts Adds agent execution types and rawOutput to ToolCallData.
packages/webui/src/components/toolcalls/shared/index.ts Re-exports new agent execution types.
packages/webui/src/components/toolcalls/index.ts Exports AgentToolCall and type-guards from toolcalls entrypoint.
packages/webui/src/components/toolcalls/AgentToolCall.tsx New dedicated UI card for structured agent execution output.
packages/webui/src/components/messages/UserMessage.tsx Memoizes user message rendering to reduce rerenders.
packages/webui/src/components/messages/ThinkingMessage.tsx Memoizes thinking message rendering to reduce rerenders.
packages/webui/src/components/messages/Assistant/AssistantMessage.tsx Memoizes assistant message rendering to reduce rerenders.
packages/webui/src/components/ChatViewer/ChatViewer.tsx Routes tool calls to AgentToolCall when rawOutput indicates agent execution.
packages/vscode-ide-companion/src/webview/hooks/useToolCalls.ts Preserves/merges rawOutput across tool-call updates.
packages/vscode-ide-companion/src/webview/hooks/useToolCalls.test.tsx Adds test coverage for rawOutput preservation in the hook.
packages/vscode-ide-companion/src/webview/components/messages/toolcalls/index.tsx Routes to AgentToolCall by inspecting rawOutput (not just kind).
packages/vscode-ide-companion/src/webview/components/messages/toolcalls/index.test.tsx Adds renderer routing tests for agent executions.
packages/vscode-ide-companion/src/webview/App.tsx Memoizes the message list to reduce input lag in long conversations.
packages/vscode-ide-companion/src/types/chatTypes.ts Adds rawOutput to tool-call update types.
packages/vscode-ide-companion/src/services/qwenSessionUpdateHandler.ts Forwards rawOutput from session updates into callbacks.
packages/vscode-ide-companion/src/services/qwenSessionUpdateHandler.test.ts Adds test coverage ensuring rawOutput forwarding.
packages/core/src/mcp/oauth-provider.ts Emits an auth URL event for OAuth flows (for CLI copy hotkey).
packages/core/src/index.ts Re-exports OAuth event constants.
packages/cli/src/utils/events.ts Adds oauth-auth-url event to CLI event enum.
packages/cli/src/utils/earlyInputCapture.ts New early-stdin capture module to buffer keystrokes during startup.
packages/cli/src/utils/earlyInputCapture.test.ts Adds unit tests for early input capture and filtering logic.
packages/cli/src/ui/contexts/KeypressContext.tsx Accepts and replays captured startup input into keypress pipeline.
packages/cli/src/ui/components/mcp/steps/AuthenticateStep.tsx Adds OSC52 copy-on-c hotkey and subscribes to OAuth URL event.
packages/cli/src/i18n/locales/zh.js Adds translations for the copy-to-clipboard UX strings.
packages/cli/src/i18n/locales/en.js Adds English strings for the copy-to-clipboard UX.
packages/cli/src/gemini.tsx Starts early input capture and drains it before React render (StrictMode-safe).
package.json Uses node --import tsx/esm for bun compatibility in settings schema generation.
docs/plans/2026-03-22-agent-tool-display-design.md Adds an implementation plan doc for the agent tool display work.
.github/workflows/stale.yml Updates stale PR policy and disables stale issue handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

appEvents.removeListener(AppEvent.OauthDisplayMessage, displayListener);
appEvents.removeListener(AppEvent.OauthAuthUrl, authUrlListener);
};
}, []);

Copilot AI Apr 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OAuth event subscription effect uses t(...) but has an empty dependency array. With react-hooks/exhaustive-deps enabled in this repo, this will raise a lint warning and can also capture a stale t implementation if locale/context ever changes. Include t in the dependency list (or refactor so the listeners don't close over changing values) to satisfy the hooks rule.

Suggested change
}, []);
}, [t]);

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +59
// Save original stdin
originalStdin = process.stdin;
originalIsTTY = process.stdin.isTTY ?? false;

// Create mock stdin
mockStdin = new PassThrough();
Object.defineProperty(process, 'stdin', {
value: mockStdin,
writable: true,
configurable: true,
});
Object.defineProperty(process.stdin, 'isTTY', {
value: true,
writable: true,
configurable: true,
});

delete process.env['QWEN_CODE_DISABLE_EARLY_CAPTURE'];
});

afterEach(() => {
resetCaptureState();

// Restore original stdin
Object.defineProperty(process, 'stdin', {
value: originalStdin,
writable: true,
configurable: true,
});
Object.defineProperty(process.stdin, 'isTTY', {
value: originalIsTTY,
writable: true,
configurable: true,
});

Copilot AI Apr 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test restores process.stdin.isTTY by writing a new boolean property descriptor. Other tests in this repo preserve and restore the original descriptor (e.g. packages/cli/src/gemini.test.tsx), which avoids errors if isTTY is non-writable/non-configurable in some Node/CI environments. Consider capturing Object.getOwnPropertyDescriptor(process.stdin, 'isTTY') and restoring that descriptor (or deleting the override when none existed) instead of forcing writable/configurable.

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +9
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add a dedicated VSCode/web UI display for Agent tool executions so subagent progress, summaries, and failures render from structured `rawOutput` instead of falling back to the generic tool card.

**Architecture:** Preserve ACP `rawOutput` through the VSCode session/update pipeline into `ToolCallData`, then let the shared web UI router detect `task_execution` payloads and render a dedicated `AgentToolCall` component. Keep the change shared in `packages/webui` so VSCode and `ChatViewer` stay aligned.

**Tech Stack:** TypeScript, React, Vitest, shared `@qwen-code/webui` tool-call components.

Copilot AI Apr 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plan doc contains tool-specific AI instructions ("For Claude... superpowers:executing-plans") and references @qwen-code/webui, but this repo’s public package import path is @hoptrendy/webui. To keep project documentation repo-agnostic and accurate, remove the AI-specific directive and update the package name/wording to match the HopCode packages used in the implementation.

Suggested change
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Add a dedicated VSCode/web UI display for Agent tool executions so subagent progress, summaries, and failures render from structured `rawOutput` instead of falling back to the generic tool card.
**Architecture:** Preserve ACP `rawOutput` through the VSCode session/update pipeline into `ToolCallData`, then let the shared web UI router detect `task_execution` payloads and render a dedicated `AgentToolCall` component. Keep the change shared in `packages/webui` so VSCode and `ChatViewer` stay aligned.
**Tech Stack:** TypeScript, React, Vitest, shared `@qwen-code/webui` tool-call components.
**Goal:** Add a dedicated VSCode/web UI display for Agent tool executions so subagent progress, summaries, and failures render from structured `rawOutput` instead of falling back to the generic tool card.
**Architecture:** Preserve ACP `rawOutput` through the VSCode session/update pipeline into `ToolCallData`, then let the shared web UI router detect `task_execution` payloads and render a dedicated `AgentToolCall` component. Keep the change shared in `packages/webui` so VSCode and `ChatViewer` stay aligned.
**Tech Stack:** TypeScript, React, Vitest, shared `@hoptrendy/webui` tool-call components.

Copilot uses AI. Check for mistakes.
* user input being lost during startup.
*/

import { createDebugLogger } from '@qwen-code/qwen-code-core';

Copilot AI Apr 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

earlyInputCapture.ts imports createDebugLogger from @qwen-code/qwen-code-core, but this repo consistently uses @hoptrendy/hopcode-core (and there are no other references to @qwen-code/qwen-code-core). This will fail to build/resolve. Switch the import to @hoptrendy/hopcode-core (or the local core entrypoint used elsewhere in the CLI package).

Suggested change
import { createDebugLogger } from '@qwen-code/qwen-code-core';
import { createDebugLogger } from '@hoptrendy/hopcode-core';

Copilot uses AI. Check for mistakes.
}

const minutes = Math.floor(durationMs / 60_000);
const seconds = Math.round((durationMs % 60_000) / 1000);

Copilot AI Apr 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatDuration() can produce an invalid "Xm 60s" display because seconds is computed with Math.round(...) (e.g. 119999ms -> 1m 60s). Use Math.floor for seconds (or normalize when rounding hits 60) so the minutes/seconds output stays within 0–59 seconds.

Suggested change
const seconds = Math.round((durationMs % 60_000) / 1000);
const seconds = Math.floor((durationMs % 60_000) / 1000);

Copilot uses AI. Check for mistakes.
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { ToolCallRouter } from './index.js';

vi.mock('@qwen-code/webui', async () => {

Copilot AI Apr 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test mocks @qwen-code/webui, but the router under test imports tool-call components from @hoptrendy/webui. As written, the mock won't apply and the test will exercise the real webui package instead (or fail if @qwen-code/webui doesn't exist). Update the vi.mock(...) target to @hoptrendy/webui to match the implementation import path.

Suggested change
vi.mock('@qwen-code/webui', async () => {
vi.mock('@hoptrendy/webui', async () => {

Copilot uses AI. Check for mistakes.
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Apr 24, 2026
The global ProxyAgent was overriding per-request dispatchers used by the
OpenAI SDK, causing 'API Error: Connection error. (cause: fetch failed)'.

- Remove setGlobalDispatcher(new ProxyAgent()) from channel/start.ts
- Remove unused undici import
- Bump version to 0.17.4 across all packages

Fixes #1
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Apr 24, 2026
When no proxy is set, return undefined instead of creating a default
undici Agent. This avoids potential compatibility issues with the
OpenAI SDK's default fetch behavior.

Fixes #1
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Apr 29, 2026
* feat(core): wire background shells into the task_stop tool

Phase B follow-up #1 from QwenLM#3634, unblocked by QwenLM#3471 (control plane) merging
in. The model can now cancel a managed background shell with the same
`task_stop` tool it uses for subagents — no more falling back to
`kill <pid>` via BashTool.

Lookup order: subagent registry first (existing behavior), then the
background shell registry as a fallback. Agent IDs follow
`<subagentName>-<suffix>` and shell IDs follow `bg_<8 hex chars>`, so the
two namespaces cannot collide in practice; the order is fixed for
determinism (a defensive test pins agent-wins-over-shell).

The shell cancel path resolves through the entry's own AbortController
(which `BackgroundShellRegistry.cancel` triggers); the child process
exit handler then settles the registry to `cancelled` and the on-disk
output file is preserved for inspection via `/tasks` or a direct `Read`.
This matches Phase B's "registry's own AbortController is the
cancellation source of truth" design without needing the in-flight
notification framework that subagents use.

Tests: 7 task-stop tests (was 4) — added cancel-shell happy path,
NOT_RUNNING for already-exited shell, and a defensive
agent-takes-precedence-on-id-collision case.

* fix(core): defer shell terminal transition until spawn handler settles

@doudouOUC noticed that the previous task_stop path called
`BackgroundShellRegistry.cancel(id, Date.now())`, which marked the entry
`cancelled` immediately. The spawn handler's settle path only records
real exit info via cancel/complete/fail when the entry is still
`running`, so the cancel-vs-exit race could permanently hide a real
completed/failed result and `/tasks` would show a terminal endTime
while the process was still draining.

Add a `requestCancel(id)` method to `BackgroundShellRegistry` that
triggers the entry's AbortController only; status stays `running` until
the settle path observes the abort and records the real terminal state.
The immediate-mark `cancel(id, endTime)` is reserved for `abortAll()` /
shutdown, where the CLI process is tearing down anyway and there is no
settle handler to wait for.

Tests updated:
- `task-stop.test.ts` cancel-shell happy path now asserts the entry
  stays `running` with `endTime` undefined post-stop, and the abort
  signal fires (the settle path's contract, not task_stop's, is the
  one that flips status).
- 3 new `requestCancel` tests in `backgroundShellRegistry.test.ts`:
  running → abort+still-running, terminal entry no-op, unknown id no-op.

---------

Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 5, 2026
…rovider (QwenLM#3788)

* fix(core): inject thinking blocks for DeepSeek anthropic-compatible provider

DeepSeek's anthropic-compatible endpoint
(https://api.deepseek.com/anthropic) rejects follow-up requests with
HTTP 400 ("The content[].thinking in the thinking mode must be passed
back to the API.") whenever a prior assistant turn carrying tool_use
omits a thinking block. The model can legitimately return a tool round
without thinking text, so qwen-code stored no thought parts and rebuilt
the next request with no thinking block, tripping the API's check.

Mirroring the existing OpenAI-side fix (QwenLM#3729, QwenLM#3747), the converter
now detects DeepSeek by base URL or model name and prepends an empty
{ type: 'thinking', thinking: '', signature: '' } block to assistant
turns missing one. Other anthropic-protocol providers are unaffected.

Verified against the live api.deepseek.com/anthropic endpoint:
- assistant with tool_use, no thinking → 400 (reproduces QwenLM#3786)
- assistant with tool_use, empty thinking injected → 200 OK

Refs QwenLM#3786

* fix(core): gate DeepSeek thinking-block injection on thinking mode

Address PR review feedback:

1. (Critical) Gate empty-thinking injection on the same per-request
   condition that emits the top-level `thinking` parameter. The previous
   implementation injected unconditionally on DeepSeek providers, but
   `buildThinkingConfig()` may omit `thinking` when reasoning=false or
   `thinkingConfig.includeThoughts=false` — which is exactly what
   suggestionGenerator / ArenaManager / forkedAgent do. Shipping
   thinking blocks without enabling thinking mode is a protocol
   violation that DeepSeek may reject. Move the option from converter
   constructor to a per-request `convertGeminiRequestToAnthropic`
   parameter so the generator can compute the gate correctly.

2. (CodeQL) Replace `baseUrl.includes('api.deepseek.com')` with
   `new URL(baseUrl).hostname` exact-match. The substring check would
   accept spoofed hosts like `api.deepseek.com.evil.com`.

3. Document the empty-signature workaround inline.

4. Rename the misleading "redacted_thinking" test case.

* fix(core): narrow DeepSeek thinking injection to tool_use turns + subdomain test

Address PR review round 2:

1. Narrow injection scope to assistant turns containing tool_use. Live
   verification against api.deepseek.com/anthropic showed plain-text
   assistant turns without thinking are accepted unchanged — only
   tool_use turns trigger the HTTP 400. Injecting on every assistant
   turn unnecessarily bloats replay history with synthetic blocks the
   API does not require. Existing thinking blocks on any turn are still
   preserved untouched.

2. Add test coverage for the subdomain hostname branch
   (us.api.deepseek.com → matches), addressing the gap noted in review.

3. Update existing negative-case tests (non-deepseek / spoofed /
   reasoning=false / includeThoughts=false) to use tool_use scenarios
   so they actually exercise the gating logic instead of trivially
   passing under the narrowed scope.

* docs(core): align DeepSeek thinking-injection comments with narrowed scope

Address PR review round 3 (copilot-pull-request-reviewer × 3): comments
in three locations still described the constraint as applying to "any
prior assistant turn", which was true before commit 8721b41 but no
longer matches the implementation. Update the doc comment on
isDeepSeekAnthropicProvider and the two test-suite header comments to
state the actual narrower contract: the API rejects only tool-use turns
that omit thinking blocks; plain-text assistant turns are accepted
unchanged.

Comment-only change; 58 tests still pass.

* fix(core): per-request DeepSeek detection + strip thinking when off

Address PR review round 4 (copilot-pull-request-reviewer × 2):

1. Stale provider-detection cache (HCia). The constructor cached
   isDeepSeekProvider once, but Config.setModel() mutates
   contentGeneratorConfig.model in place. After a runtime /model switch
   from a non-DeepSeek model to a DeepSeek one on the same auth config,
   buildRequest() would keep using the stale flag. Move the detection
   into buildRequest so each call sees the current model. The detector
   is cheap (URL parse + string compare).

2. Real thought parts leak through when thinking is disabled (HCib).
   The previous gate only blocked synthetic injection — but the
   converter still replayed any existing `thought: true` parts in
   request.contents as thinking blocks. Code paths that disable
   thinking against a session whose history was built with thinking on
   (suggestionGenerator / ArenaManager / forkedAgent) would still emit
   thinking blocks alongside an absent top-level `thinking` config —
   the same protocol mismatch the gate was meant to avoid.

   Add a `stripAssistantThinking` converter option, set in buildRequest
   to `isDeepSeek && !thinking`. The converter strips thinking and
   redacted_thinking blocks from assistant messages before message
   construction completes. Mirror behavior is already proven safe by
   live verification (DeepSeek currently tolerates either shape, but
   stripping makes the request body internally consistent and robust
   to future validation tightening).

3 new tests:
- converter strips thinking from assistant turns when option set
- generator strips real thought parts when reasoning=false
- generator reflects runtime model changes (no stale cache)

61 tests pass; lint + typecheck clean.

* fix(core): preserve thinking-only assistant turns instead of emitting empty content

Address PR review round 5 (copilot-pull-request-reviewer × 2 — code +
test):

stripThinkingFromAssistantMessages previously replaced message.content
with the filtered array unconditionally. For an assistant turn whose
only blocks are thinking/redacted_thinking (e.g. a round cut off by
max_tokens before any text or tool_use was emitted), this left
`content: []` — which Anthropic API rejects.

Dropping the message entirely was considered but would break the
required user/assistant alternation. Instead, fall back to leaving the
original blocks in place when stripping would empty the message.
DeepSeek empirically tolerates the residual `thinking-block +
no-thinking-config` shape (verified against api.deepseek.com/anthropic
in the V2/X scenarios), so leaving the message untouched is the safer
choice than emitting invalid structure.

Add regression test for the thinking-only turn shape.

62 tests pass; lint + typecheck clean.

* fix(core): validate thinking-block signature, rename option, gate output_config

Address PR review round 6 — five substantive items:

1. (Critical) Drop non-compliant thinking blocks lacking a `signature`
   field and replace them with a synthetic one. A `redacted_thinking`
   block round-tripped through Gemini Part format becomes
   `{ text: '', thought: true }` (no thoughtSignature) and converts
   back to `{ type: 'thinking', thinking: '' }` without `signature` —
   not spec-compliant. The previous `hasThinking` check accepted these
   as already-satisfying, leaving non-compliant blocks in the wire
   message. Tighten the check so they're filtered out and the
   synthetic injection runs. Live verification: DeepSeek currently
   tolerates both shapes (lenient), but normalizing is defensively
   correct against future tightening.

2. Rename converter option `ensureAssistantThinking` →
   `ensureThinkingOnToolUseTurns`. The new name reflects the actual
   contract (tool-use turns only, not every assistant turn).

3. Honor `thinkingConfig.includeThoughts: false` in `buildOutputConfig`.
   Previously a per-request opt-out dropped the top-level `thinking`
   parameter but still emitted `output_config.effort`, leaking a
   reasoning-shaped field into side queries that don't want it.

4. Add regression test for mixed text + tool_use assistant turns
   (common shape: model says something, then calls a tool).

5. Add explicit test for the signature-validation path: an existing
   compliant thinking block (with signature) is preserved untouched.

64 tests pass; lint + typecheck clean.

* fix(core): clean up non-compliant thinking blocks on plain-text turns + assert output_config gating

Address PR review round 7 (copilot-pull-request-reviewer × 2):

1. (Hp-x) Round-tripped redacted_thinking blocks were left malformed on
   assistant turns lacking tool_use. The previous structure only ran
   the cleanup pass when a tool_use block was present (early return on
   `!hasToolUse`), so plain-text turns kept the non-compliant
   `{ type: 'thinking', thinking: '' }` shape. Restructure into two
   sequential steps:
     a. Drop non-compliant thinking blocks (no `signature`) on every
        assistant turn — same fallback that avoids `content: []` if
        the message is thinking-only.
     b. Inject the synthetic empty thinking block on tool_use turns
        that still lack a compliant thinking block after step (a).

2. (Hp-7) The includeThoughts=false test asserted that the top-level
   `thinking` field is suppressed but didn't cover `output_config`,
   leaving regressions in the new `buildOutputConfig` gate uncaught.
   Tighten the assertion to also verify `output_config` is absent.

3. New converter test: cleanup runs on plain-text assistant turns too.

65 tests pass; lint + typecheck clean.

* test(core): add explicit redacted_thinking injection-path coverage

Address PR review round 8 (#30 — copilot reviewer). The converter
treats `redacted_thinking` as already satisfying the thinking-block
requirement (no synthetic injected), distinguished from a
signature-less `thinking` block which is non-compliant and gets
dropped/replaced. Existing tests covered the latter path; this adds
explicit coverage of the former.

processContent doesn't synthesize redacted_thinking from Gemini parts,
so the test reaches into the private helper directly. (#31 — subdomain
hostname coverage — already exists at line 602.)

66 tests pass; lint + typecheck clean.

* fix(core): per-request anthropic-beta + normalize thinking-only turns

Address PR review round 9 (copilot-pull-request-reviewer × 2):

1. (Hydz) thinking-only assistant turns (e.g. max_tokens cutoff or
   round-tripped redacted_thinking) hit the cleanup-empties fallback
   and kept the original non-compliant `{ type: 'thinking',
   thinking: '' }` block. The fallback now replaces the message with
   a synthetic empty thinking block (`signature: ''` included), which
   keeps the message non-empty AND spec-compliant.

2. (Hyd4) `anthropic-beta` was set once at construction from the global
   `reasoning` config, so requests with per-request
   `thinkingConfig.includeThoughts=false` still advertised
   interleaved-thinking / effort even though the body had dropped the
   matching fields. Move beta computation to a new
   `buildPerRequestHeaders` that derives the header from the actual
   `thinking` / `output_config` fields present in the request body, and
   pass it via `messages.create(..., { headers })`. The wire shape is
   now internally consistent.

Test updates:
- Drop the three constructor-time beta assertions; they no longer apply.
- Add four per-request header tests covering: both betas present,
  only interleaved-thinking, reasoning=false (no betas), and per-request
  includeThoughts=false (no betas).

67 tests pass; lint + typecheck clean.

* fix(core): preserve thinking text by normalizing in place + merge user beta flags

Address PR review round 10 (copilot-pull-request-reviewer × 2):

1. (H0oF) The previous cleanup filtered out every thinking block missing
   a `signature` field. But that shape is the normal output from
   OpenAI/Gemini/agent-runtime generators, which only set `thought:
   true` without a signature. Users switching providers mid-session
   would silently lose preserved thinking text on the first DeepSeek
   request. Change Step 1 to NORMALIZE in place: when a thinking block
   has no signature, set `signature: ''` rather than dropping the
   block. The original `thinking` text is preserved; DeepSeek
   empirically accepts empty signatures so the wire shape stays valid.

2. (H0oL) `buildPerRequestHeaders()` overwrote
   `customHeaders['anthropic-beta']` whenever the per-request override
   fired, regressing the customHeaders escape hatch for unrelated
   Anthropic beta features. Merge the user's flags into the computed
   list (deduped) so users can stack their own betas alongside
   interleaved-thinking / effort.

Test changes:
- Renamed and rewrote "drops non-compliant... plain-text" test to
  assert in-place normalization that preserves thinking text.
- Updated "replaces a non-compliant thinking block" comment + name to
  describe the normalization (the assertion was already correct because
  the test happened to use empty thinking text).
- The empty-content fallback in Step 1 is no longer reachable under
  the new logic, so the dedicated thinking-only-turn test now exercises
  only the strip path (where it remains relevant).
- Added 3 customHeaders[anthropic-beta] tests: merge with computed,
  passthrough when no thinking/effort, dedupe.

70 tests pass; lint + typecheck clean.

* docs(core): align thinking-injection comments with normalize semantics + add stream test

Address PR review round 11 (copilot-pull-request-reviewer × 3):

1. (H3Iw) Update the `ensureThinkingOnToolUseTurns` option docstring
   to describe in-place normalization (preserving thinking text by
   filling in `signature: ''`) instead of the old drop-and-replace
   semantics.

2. (H3I9) Same update on the `applyEmptyThinkingToToolUseTurns` helper
   JSDoc — clarify that signature-less thinking blocks are normalized
   in place (preserving original text), not dropped. Mention the
   common case of cross-provider history where non-Anthropic
   generators only set `thought: true`.

3. (H3I3) Add a streaming test asserting that
   `generateContentStream()` also attaches the per-request
   `anthropic-beta` header. The previous coverage only exercised
   `generateContent()`, leaving the streaming path's separate code
   path (line 144 in anthropicContentGenerator.ts) unverified.

71 tests pass; lint + typecheck clean.

* refactor(core): split DeepSeek thinking option in two + add header coexistence test

Address PR review round 12 (copilot-pull-request-reviewer × 2):

1. (H6ws) The single `ensureThinkingOnToolUseTurns` option was
   misleadingly narrow: the implementation also rewrote non-tool-use
   turns by normalizing malformed thinking blocks. Future callers
   could enable it expecting only the tool-use behavior. Split into
   two precisely-named options:
     - normalizeAssistantThinkingSignature: fill missing `signature`
       on every assistant `thinking` block (cross-provider history
       compat).
     - injectThinkingOnToolUseTurns: prepend synthetic empty thinking
       on tool_use turns missing one (issue QwenLM#3786 trigger).
   The generator wires both together for DeepSeek when thinking mode
   is on; either can be used independently if a future caller needs
   only one pass.

2. (H6w4) Add a test asserting that the per-request `headers` path
   coexists correctly with `customHeaders`: User-Agent and unrelated
   customHeaders entries stay in `defaultHeaders` while only the
   computed `anthropic-beta` rides on the per-request path. Defends
   against a future regression where header config might be routed
   through a code path that wipes the constructor defaults.

72 tests pass; lint + typecheck clean.

* fix(core): case-insensitive customHeaders[anthropic-beta] merge

Address yiliang114 review feedback (QwenLM#3788).

HTTP header names are case-insensitive by spec, and the Anthropic SDK
lower-cases them during merge. Previously buildPerRequestHeaders only
read the lower-case `anthropic-beta` key from customHeaders, so a
user-configured `Anthropic-Beta` or `ANTHROPIC-BETA` would be silently
overwritten by the per-request computed value.

Replace the direct dict lookup with collectCustomBetaFlags() which
walks all customHeaders entries and matches the key case-insensitively.
Multiple matching entries (unlikely but possible) are concatenated; the
existing dedupe pass handles any duplicates.

Add a regression test for both `Anthropic-Beta` and `ANTHROPIC-BETA`
key shapes.

73 tests pass; lint + typecheck clean.

* docs(core): align thinking-injection docs with normalize-in-place semantics + redacted_thinking strip test

Address PR review round 14 (copilot-pull-request-reviewer × 4):

1. (IAl4) PR description still described "dropped here so synthetic
   injection takes over" but the implementation now normalizes
   signature-less thinking blocks in place (preserving text). PR
   description rewritten to describe the two-pass model:
   normalize-in-place + injection-when-truly-missing.

2. (IAl7) `injectThinkingOnToolUseTurns` option docstring claimed
   signature-less blocks would be "seen as missing" so the synthetic
   replaces them. Updated to describe the actual flow: the
   normalization pass runs first, blocks become compliant in place,
   the injector then sees them as already-satisfying and prepends
   nothing. Helper JSDoc on `injectEmptyThinkingOnToolUseTurns` fixed
   the same way.

3. (IAl8) Strip-path coverage missed `redacted_thinking` blocks. Added
   regression test that verifies both thinking and redacted_thinking
   blocks are removed when `stripAssistantThinking` is set.

4. (IAl-) Renamed the converter test suite from "thinking-mode
   injection + normalization (DeepSeek thinking on)" to "DeepSeek
   thinking-mode normalization, injection, and stripping" so the
   title accurately covers all behavior the block exercises (including
   `stripAssistantThinking` cases later in the same describe).

74 tests pass; lint + typecheck clean.

* fix(core): exclude anthropic-beta variants from defaultHeaders to avoid wire duplication

Address PR review round 15 (copilot-pull-request-reviewer #1).

`buildHeaders()` previously spread the entire `customHeaders` map into
the SDK's `defaultHeaders`. After moving anthropic-beta computation to
the per-request path, a user-configured mixed-case `Anthropic-Beta`
key would survive in defaultHeaders verbatim, while the per-request
override added a lowercase `anthropic-beta`. The wire then carried two
physical headers for the same logical name — SDK behavior on duplicate
headers with different casings is undefined.

`buildPerRequestHeaders()` already merges those user flags
case-insensitively (commit 0d8b5de), so dropping the entry from
defaultHeaders is the right boundary: the per-request path owns the
header end-to-end. Other customHeaders entries continue to pass
through.

Add a regression test asserting no `Anthropic-Beta` (any casing) lands
in defaultHeaders while unrelated customHeaders are kept.

75 tests pass; lint + typecheck clean.
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 5, 2026
…wenLM#3809)

* feat(core): hint to background long-running foreground bash commands

Phase D part (a) of Issue QwenLM#3634. When a foreground `shell` tool call
runs ≥ 60 seconds and completes (succeeds or errors), append an
advisory line to the LLM-facing tool result suggesting re-running with
`is_background: true` next time.

Why: today a foreground bash that takes minutes (build watcher, soak
test, slow npm install, polling loop) blocks the agent indefinitely.
The user is already paying for the wait; the agent's next turn could
have started running in parallel under `is_background: true`. Sleep
interception (QwenLM#3684) handled the egregious `sleep N` case at validate
time; this handles the legitimate-but-long case at result time.

Trade-offs:
- Threshold = 60s. Half the existing 120s foreground timeout. Long
  enough that normal `npm install` / `pytest` runs don't trigger;
  short enough that the hint surfaces before the timeout hard-kills.
- Advisory only — the command still runs to completion in the
  foreground for THIS invocation. The advice is for the agent's NEXT
  decision, not a corrective action on the current one.
- Fires on success AND error completions. The advice is the same
  ("background it next time") in both cases.
- Suppressed on aborted (timeout / user-cancel) — those paths already
  surface their own messaging and don't benefit from a "should have
  been background" reminder when the user / system already killed it.

Implementation:
- New constant `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = 60000` in
  shell.ts, paired with the existing `DEFAULT_FOREGROUND_TIMEOUT_MS`.
- Helper `buildLongRunningForegroundHint(elapsedMs)` exported so
  future surfaces (UI, telemetry) can render the same text without
  duplicating the threshold logic.
- `Date.now()` bracketing around the spawn → `await resultPromise`
  block — mirrors what the background path already captures via
  `entry.startTime`.
- Append happens inside the existing non-aborted result builder;
  zero changes to the cancel / timeout arms.

Tests: 4 new cases — fires on long success, omits on short success,
fires on long error completion, omits on aborted. Uses vi fake timers
to drive wall-clock past the threshold without actually sleeping.

* fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion

Addresses 8 review threads on PR QwenLM#3809 — 6 from /review bots, 2 from
copilot — covering doc accuracy, code quality, behavioural gaps, and
test coverage.

**Behavioural fixes (real bugs)**:

- **Suppress on external signal kills** (`result.signal != null` with
  `aborted: false`). `shellExecutionService` only sets `aborted` when
  the AbortSignal we passed was triggered, so SIGTERM from container
  shutdown / k8s eviction / OOM killer / sibling process-group reap
  falls through to the non-aborted branch. The advisory shouldn't fire
  there — the process didn't run to its conclusion, so "next time,
  background it" doesn't fit. New test pins this with `signal: 15`
  (SIGTERM), `aborted: false`.

- **Append AFTER `truncateToolOutput`**. Previously the hint was
  appended inside the non-aborted result builder, which meant for
  long outputs it got wrapped in the "Truncated part of the output:"
  envelope — the LLM might read the advisory as part of the command's
  own output. New post-truncation insertion + test that pins ordering
  by mocking `truncateToolOutput` directly (real path needs
  `fs.writeFile` to actually succeed for the replacement branch to
  fire).

- **Hint wording mode-aware**. The dialog mention dropped the
  unconditional "(footer pill + Enter)" specifics, which would mislead
  non-TTY users (`-p` headless / ACP / SDK consumers — no dialog or
  pill exists there). Now qualified as "in interactive mode the
  Background tasks dialog also has...". `/tasks` and the on-disk
  output file are mentioned without qualifier (work in any mode).

**Code quality**:

- **Threshold programmatically coupled to timeout**:
  `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = Math.floor(DEFAULT_FOREGROUND_TIMEOUT_MS / 2)`.
  If the timeout is tuned later, the threshold tracks automatically.

- **Docstring corrected**: removed the misleading "before it gets
  killed by the timeout" claim — the hint is on non-aborted path
  only, so timeout-killed commands never see it. The new docstring
  enumerates all suppression paths explicitly.

- **Removed stale line-number reference**: comment said "mirrors the
  background path's `entry.startTime` capture (line ~781)" which goes
  stale on file edits. Now refers conceptually.

**Test coverage gaps closed**:

- **Off-by-one boundary**: 59_999ms → no hint. Pairs with the existing
  60_000ms-exactly test (which fires) to pin the boundary tightly. A
  regression flipping `>=` to `>` would fail loudly.

- **Timeout path explicit**: previous "aborted" test exercised user-
  cancel only. With `vi.useFakeTimers({ toFake: ['Date'] })`,
  `AbortSignal.timeout()` doesn't fake (it depends on the real timer
  subsystem), so `combinedSignal.aborted` stayed false. New test
  follows the pre-existing `should handle timeout vs user cancellation
  correctly` pattern: stubs `AbortSignal.timeout` + `.any` to return
  an already-aborted combined signal, then verifies "Command timed out
  after Nms" appears AND no advisory.

* fix(core): per-invocation long-run threshold + debug-mode + test isolation

Six suggestions from /review's third pass on PR QwenLM#3809:

**Real semantic fix**:
- Long-run threshold now scales with the EFFECTIVE timeout, not the
  fixed default. A user who sets `timeout: 600_000` (10 min) gets the
  advisory at 5 min, not at 60s — respects the explicit timeout
  intent. Replaced the `LONG_RUNNING_FOREGROUND_THRESHOLD_MS` constant
  with a per-invocation `longRunThresholdFor(effectiveTimeout)` helper.

**Debug-mode visibility**:
- Debug mode previously snapshotted `returnDisplayMessage = llmContent`
  BEFORE the truncation + hint append, so debug-mode users saw the
  pre-hint content while the agent saw the advisory — agent suddenly
  suggesting `is_background: true` had no visible trigger in the TUI.
  Re-sync `returnDisplayMessage` after the hint append (debug-mode
  branch only) so the TUI mirrors what the agent sees.

**Type-safety footgun**:
- `if (typeof llmContent === 'string')` would silently drop the hint
  if `llmContent` ever becomes structured `Part[]`. Added an explicit
  `else` comment documenting the deliberate omission and the conditions
  under which to revisit (no string llmContent path exists today).

**Style**:
- Replaced the JSDoc `/** ... */` block on the (now-defunct) constant
  with a plain `//` comment block on the helper, matching the
  `DEFAULT_FOREGROUND_TIMEOUT_MS` / `OUTPUT_UPDATE_INTERVAL_MS` style.

**Test hygiene**:
- Wrapped both `vi.stubGlobal('AbortSignal', ...)` and
  `vi.spyOn(truncateToolOutput, ...)` in `try/finally` so failures
  during the test body don't leak the stub/spy into subsequent tests
  (would cause confusing cascading failures).
- Dropped the internal-roadmap "Phase D part (a)" reference from the
  test comment — future maintainers don't have the context.

**New test**:
- `threshold scales with the user-supplied timeout (not the default)`:
  sets `timeout: 600_000`, advances 100s, verifies no hint. Pins the
  per-invocation coupling so a regression to a fixed constant would
  fail loudly here.

* fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion (round 4)

Six suggestions from /review's pai/glm-5-fp8 pass on PR QwenLM#3809:

**Behavioural / UX**:
- **Hint now visible in non-debug TUI too.** Previously only debug
  mode mirrored the hint into `returnDisplay`; non-debug users saw
  the agent suggest `is_background: true` with no visible trigger.
  Now the hint is appended to `returnDisplayMessage` in both modes
  (full mirror in debug, terse-append in non-debug to preserve the
  output-or-status form).

**Test coverage**:
- **Debug-mode re-sync test added.** All other long-run hint tests
  run with `getDebugMode → false`; this one flips it to true and
  asserts the hint appears in `returnDisplay` too. Pins the re-sync
  so a regression that drops the debug branch would fail loudly.
- **Threshold-scaling positive test added.** The negative case
  (`timeout: 600_000`, advance 100s, no hint) was already pinned;
  paired now with the positive case (advance 305s, hint fires) so a
  regression to a fixed 60s threshold is caught at both ends.

**Style / consistency**:
- **`result.signal === null` (was `== null`).** Strict equality to
  match the rest of the file. The `signal` field is typed
  `number | null` so loose equality has identical semantics, but the
  inconsistency was noise.

**Doc clarity (timing semantics)**:
- **Comment explains why elapsedMs is computed BEFORE truncation.**
  Two reviewers disagreed on the timing — one read it as before
  truncation (correct, slightly under-reports), the other as after
  (incorrect read). The intent is to report the COMMAND's runtime,
  not the tool call's total time. Truncation is post-processing,
  not part of "agent blocking time", so excluding it is the right
  semantic. Inline comment now spells this out so future readers
  don't have to infer.

* fix(core): error-path hint surfacing + clock-resilient elapsed + threshold floor + observability

Round 5 of PR QwenLM#3809 review — 10 threads, mix of Critical and Suggestion:

**Critical fixes**:

1. **Hint survives the error path** (`#OWbA`). When result.error is
   set, coreToolScheduler builds the model-facing functionResponse
   from `error.message` ONLY (not llmContent — see
   convertToFunctionResponse + the toolResult.error branch in
   scheduler:1648-1724). My hint was being silently dropped on
   long-command-failed cases. Now the hint is appended to
   error.message too so the advisory survives whichever branch the
   scheduler takes.

2. **Hint wording de-ambiguated** (`#OU6o`). "prefer re-running with
   is_background: true" was ambiguous — model could read it as
   "re-run THIS command in the background", which on stateful
   commands (DB migrations, deploys, git push) would cause double
   side effects. Reworded to "Next time you run a SIMILAR
   long-running process..." with an explicit parenthetical that
   warns against re-running the just-completed command.

3. **Debug observability** (`#OU6s`). Added `debugLogger.debug` at
   the hint decision point with elapsedMs / threshold / aborted /
   signal — when a user reports "my 65s command didn't get the
   hint" the suppression branch is now visible in DEBUG output.

**Other behaviour fixes**:

4. **Threshold floor of 1000ms** (`#OU6r`). Pathological
   `timeout: 0` / `timeout: 1` would have given a 0-ms threshold,
   firing the hint on every invocation showing "ran for 0s".
   Floor at 1s makes that branch unreachable.

5. **`performance.now()` instead of `Date.now()`** (`#OU6v`). NTP
   corrections / VM clock drift between capture and read would
   silently make `elapsedMs` negative and skip the hint with no
   observable failure. Monotonic clock prevents that.

6. **Debug mode preserves truncation marker** (`#OU6w` / `#OWCq`).
   Previously `returnDisplayMessage = llmContent` after hint
   clobbered the "Output too long and was saved to: …" line
   appended during truncation. Switched to append-style re-sync in
   BOTH modes so prior content is preserved.

**Test coverage gaps closed**:

7. **Non-debug returnDisplay test** (`#OWCo`). Pinned that the
   user TUI gets the hint in the default (non-debug) mode too.

8. **Test rename** (`#OWCl`). The "debug-mode TUI mirror" test
   passed in non-debug too after the recent refactor; split into
   two tests, one per branch.

9. **Error-path hint test**. Added a test that pins `result.error?.message`
   contains both the original error text AND the hint, covering
   the scheduler-routing-via-error.message path that was silently
   broken before fix #1.

10. **Test: faketimers also fakes `performance`**. Since we
    switched to `performance.now()`, `vi.useFakeTimers({ toFake:
    ['Date'] })` no longer covered the elapsed measurement;
    extended to `['Date', 'performance']` so the threshold tests
    can drive the wall-clock with `advanceTimersByTimeAsync`.

#OU6t (else-comment for the type guard) was already addressed in
the prior round — the explicit else-with-comment is in place;
adding logging there would be noise.

* test(core): cover the MIN_LONG_RUN_THRESHOLD_MS floor branch

PR QwenLM#3809 review: the new `Math.max(MIN_LONG_RUN_THRESHOLD_MS, ...)`
floor in `longRunThresholdFor` was untested — only default-timeout
and large-custom-timeout cases existed. A regression that strips the
floor would let `timeout: 1` produce a 0ms threshold and fire a
"ran for 0s" advisory on every invocation; the test suite would not
catch it.

New test: build with `timeout: 1`, advance 500ms (below the 1000ms
floor), resolve with `aborted: false` to isolate the threshold logic
from the abort path. Asserts no hint appears. A regression that
removes the floor flips the assertion to fail.

* fix(core): structured delimiter on error.message hint + tighten timeout floor comment

Two of three threads from the latest /review pass on PR QwenLM#3809 (the
third — PR description / threshold scaling reconciliation — is fixed
in the PR description update, not in code):

- **`\n---\n` divider before hint in `error.message`** (`#Pt7C`).
  Downstream consumers of `error.message` (firePostToolUseFailureHook,
  telemetry grouping, SIEM alerting, hook-side error parsers) were
  receiving ~400 chars of advisory text mixed inline with the
  original error body — pattern-matching on error messages would
  absorb the advisory into the matched body. Added a `---` separator
  line so the boundary is unambiguous and split-able.

- **Threshold-floor comment narrowed to `timeout: 1`** (`#Pu9o`).
  The comment said the floor guards `timeout: 0` / `timeout: 1`, but
  `validateToolParamValues` rejects `timeout <= 0` at validate time,
  so `timeout: 0` can't reach `longRunThresholdFor`. Updated the
  comment to mention only the actually-allowed pathological case
  (`timeout: 1` and any value `< 2` rounds to 0).

Test updated to assert the `---` divider format with `toMatch`.

* fix(core): capture executionStartTime AFTER spawn so PTY import isn't counted

PR QwenLM#3809 review: copilot caught that `executionStartTime` was
captured BEFORE `await ShellExecutionService.execute(...)`, which
meant the elapsed measurement included `getPty()` dynamic-import
setup (~50-200ms on first call). The hint's "ran for Xs" reading was
slightly inflated, and the comment claiming "spawn → settle" wasn't
strictly accurate.

Moved the capture immediately after the execute() call returns its
{ result, pid } handle. The pid being set by that point confirms the
process has been spawned, so the subtraction is true post-spawn-to-
settle. Comment updated to reflect the actual semantics.

The displayed accuracy gain is small (50-200ms on a 60s+ threshold
is <1%), but the comment claim now matches what the code measures.
Tests unaffected — fakeTimers don't drive real dynamic imports, so
the threshold tests behave identically.

* fix(core): align long-run hint code/tests with ShellExecutionResult.error semantics

Four copilot threads on PR QwenLM#3809 — all rooted in the same
observation: `ShellExecutionResult.error` is reserved for
spawn/setup failures (per the field's doc comment in
shellExecutionService.ts), NOT for non-zero exit codes. My existing
code/tests conflated the two, making the error-path coverage less
realistic and the inline comments inaccurate.

**Test shape fixes**:

- `appends the hint when a long-running foreground command exits
  with error` → `exits non-zero`. Changed `error: new Error('exit
  1')` to `error: null` (the realistic shape for a non-zero exit
  without spawn failure). Added a comment explaining the field
  contract so future test authors don't repeat the conflation.

- `hint survives the error path (appended to error.message)`:
  reframed the mock from `spawn ENOENT` (which would resolve in
  <1s in practice, making the long-elapsed scenario unrealistic)
  to `PTY initialization failed after 75s` — a slow-spawn-failure
  shape that COULD plausibly take 75s. Test still pins the same
  CODE PATH; comment now acknowledges the edge-case nature
  ("rare but real: PTY init dragging, remote-fs exec syscalls,
  security scanners interposing").

**Comment corrections**:

- `returnDisplayMessage` build-order comment was misleading. It
  said "the hint is appended after both the truncation block and
  the returnDisplayMessage build" — but `returnDisplayMessage` is
  built BEFORE truncation. Replaced with a chronological enumeration
  (1. initial value, 2. truncation marker append, 3. hint append)
  that matches what the code actually does.

- Error-path preservation comment now acknowledges the narrow
  applicability (spawn failures only, exit codes don't reach this
  branch). Code is unchanged — the path is still real, just rare.

* test(core): pin empty-output success + background-no-hint paths

Two defensive tests for the long-running foreground hint:

- empty-output success at >=60s — exercises the
  returnDisplayMessage='' → hint append branch (write-only commands
  like `tar czf` / `cp -r` produce no stdout). Asserts the user-
  facing returnDisplay still surfaces the advisory even when the
  command produced nothing else to show.

- background never includes the hint — the foreground hint logic
  lives in executeForeground only, so today this can't fail; the
  test guards against a future refactor hoisting the advisory into
  a shared post-execute path that would tag every background launch
  with a nonsensical "ran for 0s, consider is_background: true"
  suggestion.
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 9, 2026
…wenLM#3115)

* feat: add commit attribution with per-file AI contribution tracking via git notes

Track character-level AI vs human contributions per file and store
detailed attribution metadata as git notes (refs/notes/ai-attribution)
after each successful git commit. This enables open-source AI disclosure
and enterprise compliance audits without polluting commit messages.

* feat: enhance commit attribution with real AI/human ratios and generated file exclusion

- Replace line-based diff with a prefix/suffix character-level algorithm
  for precise contribution calculation (e.g. "Esc"→"esc" = 1 char, not whole line)
- Compute real AI vs human contribution percentages at commit time by analyzing
  git diff --stat output: humanChars = max(0, diffSize - trackedAiChars)
- Add generated file exclusion (lock files, dist/, .min.js, .d.ts, etc.)
  ported from an existing generatedFiles.ts
- Add file deletion tracking via recordDeletion()
- Update git notes payload format: {aiChars, humanChars, percent} per file
  with real percentages instead of hardcoded 100%

* feat: add surface tracking, prompt counting, session persistence, and PR attribution

Align with the full attribution feature set:
- Surface tracking: read QWEN_CODE_ENTRYPOINT env var (cli/ide/api/sdk),
  include surfaceBreakdown in git notes payload
- Prompt counting: incrementPromptCount() hooked into client.ts message
  loop, tracks promptCount/permissionPromptCount/escapeCount
- Session persistence: toSnapshot()/restoreFromSnapshot() for serializing
  attribution state; ChatRecordingService.recordAttributionSnapshot()
  writes to session JSONL; client.ts restores on session resume
- PR attribution: addAttributionToPR() in shell.ts detects `gh pr create`
  and appends "🤖 Generated with Qwen Code (N-shotted by Qwen-Coder)"
- Session baseline: saves content hash on first AI edit of each file
  for precise human/AI contribution detection
- generatePRAttribution() method for programmatic access

* fix: audit fixes — initial commit handling, cron prompt exclusion, failed commit counter preservation

- Handle initial commit (no HEAD~1) by detecting parent with rev-parse
  and falling back to --root for first commit in repo
- Exclude Cron-triggered messages from promptCount (not user-initiated)
- Add commitSucceeded parameter to clearAttributions() so failed/disabled
  commits don't reset the prompts-since-last-commit counter
- Add test for clearAttributions(false) behavior

* fix: cross-platform and correctness fixes from multi-round audit

- Normalize path.relative() to forward slashes for Windows compatibility
- Use diff-tree --root for initial commits (git diff --root is invalid)
- Replace String.replace() with indexOf+slice to avoid $& special patterns
- Fix clearAttributions(false→true) when co-author disabled but commit succeeded
- Use real newlines instead of literal \n in PR attribution text
- Add surface fallback in restoreFromSnapshot for version compatibility
- Fix single-quote regex to not assume bash supports \' escaping
- Case-insensitive directory matching in generated file detection
- Handle renamed file brace notation in parseDiffStat

* fix(attribution): also snapshot on ToolResult turns so resume keeps tool edits

Previously, recordAttributionSnapshot() only ran at the start of UserQuery
and Cron turns — before the tools for that turn had executed. A session
that wrote a file in turn 1 and committed in turn 2 (across process
boundaries via --resume) lost the tracked edit: the last persisted
snapshot was the turn-1-start snapshot (empty fileStates), so on resume
the attribution service restored empty state and no git notes were
attached to the commit.

Move the snapshot call out of the UserQuery/Cron conditional and run it
on every non-Retry turn. ToolResult turns are scheduled right after
tools execute, so their start-of-turn snapshot now captures any edits
those tools made. Retry turns are skipped since the state is unchanged
from the prior turn.

Added unit tests asserting the snapshot fires for ToolResult/UserQuery
turns and skips Retry turns.

Verified end-to-end in a scratch repo: write-file in turn 1 (no commit)
→ exit → --resume → commit in turn 2 → git notes now contain the
recorded file with correct aiChars and promptCount: 2.

* refactor(attribution): merge duplicate retry guard and update stale doc

Collapse the two back-to-back messageType !== Retry blocks in
sendMessageStream into one, and refresh chatRecordingService's
recordAttributionSnapshot doc comment to reflect that snapshots fire
on every non-retry turn (not just after user prompts).

* feat(attribution): split gitCoAuthor into independent commit and pr toggles

Matches the shape used upstream in Claude Code's `attribution.{commit,pr}`
so users can disable the PR body line without losing the commit-message
Co-authored-by trailer (or vice versa). The previous boolean forced both
to move together, which conflated two different surfaces.

- settingsSchema: gitCoAuthor becomes an object with nested commit/pr
  booleans, each `showInDialog: true` so both appear in /settings.
- Config constructor accepts legacy boolean (coerced to { commit: v, pr: v })
  so stored preferences from the pre-split schema carry over.
- shell.ts: attachCommitAttribution and addCoAuthorToGitCommit read .commit;
  addAttributionToPR reads .pr.

* feat(settings): add v3→v4 migration for gitCoAuthor shape change

Legacy gitCoAuthor was a single boolean and shipped ~4 months ago; the
previous commit split it into { commit, pr } sub-toggles. Without a
migration, users who had set gitCoAuthor: false would see the settings
dialog show the default (true) for both sub-toggles — misleading and
likely to flip their preference on the next save because getNestedValue
returns undefined when asked for .commit on a boolean.

- New v3-to-v4 migration expands boolean → { commit: v, pr: v },
  preserves already-object values, resets invalid values to {} with a
  warning.
- SETTINGS_VERSION bumped 3 → 4; existing integration assertions use the
  constant so the next bump is a single-line change.
- Regenerate vscode-ide-companion settings.schema.json to reflect the
  new nested shape.
- Docs: split the single gitCoAuthor row into .commit and .pr.

* test(migration): cover null/array/number and partial object for v3-to-v4

The migration already treats any non-boolean, non-object value as invalid
(reset to {} with warning), but the existing test only exercised the
string "yes" branch. Add parameterized cases for null, array, and number
so a future regression that accepts these in the valid bucket gets caught.
Also cover partial objects — the migration must not paternalistically
fill defaults; that responsibility lives in normalizeGitCoAuthor at the
Config boundary.

* fix(shell): address PR review for compound commits and PR body escaping

Two critical issues called out in review:

1. attachCommitAttribution treated the final shell exit code as proof
   that `git commit` itself failed. For compound commands like
   `git commit -m "x" && npm test`, the commit can succeed and a later
   step can fail; the previous code then cleared attribution without
   writing the git note. Now we snapshot HEAD before the command (via
   `git rev-parse HEAD` through child_process.execFile, kept independent
   of the mockable ShellExecutionService) and detect commit creation by
   HEAD movement, so attribution lands whenever a new commit was created
   regardless of later steps.

2. addAttributionToPR spliced the configured generator name into the
   user-approved `gh pr create --body "..."` argument verbatim. A name
   containing `"`, `$`, a backtick, or `'` could break the command or be
   evaluated as command substitution. Now we shell-escape the appended
   text per the surrounding quote style before splicing.

Tests cover the new escape paths for both double- and single-quoted
bodies, including a generator name designed to break interpolation
(`$(rm -rf /) "danger" \`eval\``) and one with an apostrophe.

* fix(attribution): address Copilot review on shell, schema, and totals

Six items called out on PR #3115 by Copilot:

- shell.ts: addAttributionToPR's bash quote escaping doesn't apply to
  cmd.exe / PowerShell, where `\$` and `'\''` aren't honored. Skip the
  PR body rewrite entirely on Windows — losing PR attribution there is
  preferable to corrupting the user-approved `gh pr create` command.

- attributionTrailer.ts + shell.ts call site: buildGitNotesCommand used
  bash-style single-quote escaping on the JSON note, which is broken on
  Windows. Switched to argv form (`{ command, args }`) and routed the
  invocation through child_process.execFile so shell quoting is bypassed
  entirely. Tests updated to assert the argv shape.

- commitAttribution.ts: when a tracked file's aiChars exceeded the diff
  --stat-derived diffSize (long-line edits where diffSize ≈ lines * 40),
  humanChars clamped to 0 but aiChars stayed inflated, leaving aiChars +
  humanChars > the committed change magnitude. Clamp aiChars to diffSize
  so the totals stay consistent.

- shell.ts parseDiffStat: only normalized rename brace notation
  (`{old => new}`). Cross-directory renames emit `old/path => new/path`
  without braces, leaving diffSizes keyed by the full string. Added a
  second normalization step.

- shell.ts: addAttributionToPR docstring claimed `(X% N-shotted)` but
  the implementation only emits `(N-shotted by Generator)`. Updated the
  docstring to match the actual behavior.

- settingsSchema.ts + generator: gitCoAuthor went from boolean to object
  in the V4 migration. The exported JSON Schema now wraps the field in
  `anyOf: [boolean, object]` (via a new `legacyTypes` hint on
  SettingDefinition) so users with a stored boolean don't see a spurious
  IDE warning before their next launch runs the migration.

* fix(attribution): parse binary diffs, source generator from model, sync schema $version

Three follow-up review items from Copilot:

- parseDiffStat now handles git's binary-diff format (`path | Bin A ->
  B bytes`) using the byte delta with a floor of 1. Without this,
  binary edits arrived at the attribution payload as diffSize=0 and
  were silently dropped. Also extracted the parser to a top-level
  exported function so the binary path is unit-testable; added five
  targeted cases (text/binary/rename normalisation/summary skip).

- attachCommitAttribution now passes `this.config.getModel()` into
  generateNotePayload instead of the user-configurable
  `gitCoAuthor.name`. The note's `generator` field reflects which
  model produced the changes — and CommitAttributionService's
  sanitizeModelName() actually has the codename to scrub now.

- generate-settings-schema.ts imports SETTINGS_VERSION instead of
  hardcoding `default: 3`, so a future bump propagates to the emitted
  JSON schema in one place. Regenerated settings.schema.json bumps
  $version's default from 3 to 4 to match the V4 migration.

* fix(attribution): repo-root baseDir, escape co-author trailer, switch to numstat

Three Critical items called out by wenshao:

- attachCommitAttribution was passing config.getTargetDir() as `baseDir`
  to generateNotePayload, but getCommittedFileInfo returns paths
  relative to `git rev-parse --show-toplevel`. When the working
  directory was a subdirectory of the repo, path.relative produced
  `../...` keys that never matched in the AI-attribution lookup,
  silently zeroing out attribution for every file outside getTargetDir.
  StagedFileInfo now carries an optional `repoRoot` (filled in by
  getCommittedFileInfo via `git rev-parse --show-toplevel`) and the
  caller prefers it over the target dir.

- addCoAuthorToGitCommit interpolated `gitCoAuthorSettings.name` and
  `.email` into the rewritten command without escaping. A name
  containing `$()`, backticks, or `"` could be evaluated as command
  substitution under double quotes, or break the user-approved
  `git commit -m "..."` quoting. Now escapes per the surrounding quote
  style with the same helpers addAttributionToPR uses, gates on
  non-Windows for the same shell-quoting reason, and fixes the regex
  to accept `-m"msg"` shorthand (no space) so users who type the
  bash-shorthand form aren't silently denied a trailer.

- parseDiffStat used `git diff --stat` output and approximated each
  line as ~40 chars by parsing a graphical text bar. Replaced with
  `git diff --numstat` which gives unambiguous integer
  additions+deletions per file; the heuristic remains but the parser
  is no longer fooled by the visual `++--` markers. Binary entries
  fall back to a fixed estimate so they still land in the map (rather
  than dropping out as diffSize=0).

Suggestions also addressed: stale duplicate JSDoc on
addCoAuthorToGitCommit removed, misleading `clearAttributions`
comments rewritten to describe what the boolean argument actually
does. Tests cover the new shorthand path, escape behavior, and
numstat parsing (text/binary/rename/malformed).

* fix(shell): shell-aware git-commit detection and apostrophe-escape handling

Two more Critical items called out by wenshao plus the matching Copilot
quote-handling notes:

- attachCommitAttribution and addCoAuthorToGitCommit now go through a
  shell-aware `looksLikeGitCommit` helper instead of a raw
  `\bgit\s+commit\b` regex. The helper splits the command on shell
  separators (`splitCommands`) and checks each segment, so `echo "git
  commit"` no longer triggers attribution clearing or trailer
  injection. The same helper bails on any segment that contains `cd`
  or `git -C <path>`, since either could redirect the commit into a
  different repo than our cwd — writing notes or capturing HEAD there
  would corrupt unrelated state.

- The post-command attribution call now runs regardless of whether the
  shell wrapper aborted. `git commit -m "x" && sleep 999` could move
  HEAD and then time out, leaving the new commit without its
  attribution note while the stale per-file attribution stayed around
  for a later unrelated commit. attachCommitAttribution still gates on
  HEAD movement, so it's a no-op when no commit was actually created.

- The `-m '...'` and `--body '...'` regexes used to match only the
  first quote segment, so a command like `git commit -m 'don'\''t'`
  (bash's standard apostrophe-escape form) would have the trailer
  spliced mid-message and break the command's quoting. The single-
  quote patterns now use a negative lookahead / inner alternation to
  either skip those messages entirely (commit path) or match the
  whole escape-aware body (PR path).

Tests cover the new behavior: quoted "git commit" is left alone, the
`cd && git commit` and `git -C` patterns get no trailer, and the
apostrophe-escape form passes through unchanged for both `-m` and
`--body`.

* fix(attribution): drop magic 100 fallback for empty deletions

Deleted files with no AI tracking now use diffSize directly. With
numstat as the input source, diffSize is an exact count, and an
empty-file deletion legitimately reports zero — a magic fallback would
only inflate totals.

* fix(shell): broaden git-commit detection, gate background, drop dead helpers

Five Copilot follow-ups:

- looksLikeGitCommit now strips leading env-var assignments
  (`GIT_COMMITTER_DATE=now git commit ...`) and a small allowlist of
  safe wrappers (`sudo`, `command`) before matching. The previous
  exact-prefix match silently skipped trailer injection on common
  real-world commit forms.

- A new looksLikeGhPrCreate (same shell-aware shape) replaces the raw
  `\bgh\s+pr\s+create\b` regex in addAttributionToPR, so quoted text
  like `echo "gh pr create --body \"x\""` no longer triggers a
  command-string rewrite.

- executeBackground refuses to run `git commit` and tells the user to
  re-run foreground. The BackgroundShellRegistry lifecycle has no
  hook for the post-command pre/post-HEAD comparison or git-notes
  write, so allowing the commit through would create the new commit
  without notes and leak stale per-file attribution into the next
  foreground commit.

- recordDeletion was unused outside its own test — removed (and the
  test). When AI-driven deletions need tracking we'll add it with an
  actual integration point rather than carrying dead API surface.

- generatePRAttribution was likewise unused; addAttributionToPR
  builds the trailer string inline. The two formats had already
  diverged. Removed the helper and its tests; reviving from git
  history is straightforward if a future caller needs it.

Tests: env-var and sudo prefixes now produce trailers; quoted
"gh pr create" leaves the command unchanged; existing 81 shell tests
still pass alongside the trimmed 25 commitAttribution tests.

* fix(shell): unified git-commit detection split by intent

Six items called out across CodeQL, Copilot, and wenshao:

- The earlier `looksLikeGitCommit`/`stripCommandPrefix` returned a
  single yes/no and rejected ANY `cd` in the chain. That fixed the
  wrong-repo case but also disabled attribution for `git commit -m
  "x" && cd ..` (commit already landed safely in our cwd; the cd
  came after). It also conflated three distinct decisions onto one
  predicate.

  New `gitCommitContext` returns both `hasCommit` and
  `attributableInCwd`, walking segments in order so that a `cd`
  AFTER the commit doesn't invalidate it. Callers now pick the right
  arm:
  - background-mode refusal uses `hasCommit` (refuses even
    `cd /elsewhere && git commit` since we can't attribute it
    afterward either way)
  - HEAD snapshot, addCoAuthorToGitCommit, and the
    attachCommitAttribution gate use `attributableInCwd`

- Tokenisation switches from a regex while-loop to `shell-quote`'s
  `parse`. Quoted env values like `FOO="a b" git commit` now skip
  correctly (the old `\S*\s+` form would cut after the opening
  quote). Eliminates the CodeQL polynomial-regex alert at the same
  time since the `\S*\s+` pattern is gone.

- attachCommitAttribution now snapshots prompt counters via
  `clearAttributions(true)` whenever a commit lands, even if no
  per-file attributions were tracked. Previously the early-return
  on `hasAttributions() === false` meant `promptCountAtLastCommit`
  never advanced, so a later `gh pr create` reported an inflated
  N-shotted count spanning multiple commits.

Tests: env-var and sudo prefixes still produce trailers; quoted
"git commit" / "gh pr create" leave commands unchanged; cd BEFORE
commit suppresses the rewrite while cd AFTER commit does not; `git
-C <path> commit` is treated as a commit (refused in background)
but not as attributable.

* fix(shell): position-independent git subcommand detection + bash-shell guard

Six review items, two of them critical:

- gitCommitContext was checking fixed-position tokens (`arg1`, `arg3`)
  and missed every git invocation that puts a global flag between
  `git` and the subcommand: `git -c user.email=x@y commit`,
  `git --no-pager commit`, `git -C /p -c k=v commit`, etc. In
  background mode these would slip past the refusal guard; in
  foreground they got no co-author trailer, no git note, and no
  prompt-counter snapshot. New `parseGitInvocation` walks past
  git's global flags (with their values) before reading the
  subcommand, and reports `changesCwd` for `-C` / `--git-dir` /
  `--work-tree`.

- The Windows guard on addCoAuthorToGitCommit and addAttributionToPR
  used `os.platform() === 'win32'`, which incorrectly skipped Windows
  + Git Bash (`getShellConfiguration().shell === 'bash'`). Switched
  both to gate on `getShellConfiguration().shell !== 'bash'` so Git
  Bash users keep the feature.

- attachCommitAttribution was re-parsing `gitCommitContext(command)`
  even though `execute()` already gates on `commitCtx.attributableInCwd`.
  Removed the redundant re-parse — drift between the two checks would
  silently diverge trailer injection from git-notes writes.

- tokeniseSegment (formerly tokeniseProgram) now logs via debugLogger
  on parse failure instead of swallowing silently. Easier to debug
  if shell-quote ever throws on something unusual.

- Added a comment on `cwdShifted` documenting that it's a one-way
  latch — `cd src && cd ..` will still skip attribution. The
  trade-off matches the wrong-repo guard's "better miss than corrupt
  unrelated repos" intent.

- Stale `--stat` reference in the aiChars-clamp comment updated to
  `--numstat` to match the actual git command in
  ShellToolInvocation.getCommittedFileInfo.

Tests: `git -c key=val commit` and `git --no-pager commit` now
produce a trailer; existing 82 shell tests still pass.

* fix(shell): refuse multi-commit attribution; misc review follow-ups

Five follow-ups from the latest review pass:

- attachCommitAttribution now refuses to write a single git note for
  shell commands that produce more than one commit (e.g.
  `git commit -m a && git commit -m b`). The singleton's per-file
  attribution map can't be partitioned across the individual commits,
  so attaching the combined note to HEAD would mis-attribute earlier
  commits' changes to the last one. Walks `preHead..HEAD` via
  `git rev-list --count`; on multi-commit detection it snapshots the
  prompt counters and bails with a debug warning instead of writing
  a misleading note.

- parseGitInvocation now recognises the attached `-C/path` form
  (e.g. `git -C/path commit -m x`). shell-quote tokenises that as a
  single `-C/path` token which previously fell to the generic flag
  branch with `changesCwd = false`, leaving an out-of-cwd commit
  classified as attributable.

- attachCommitAttribution dropped its unused `command` parameter
  (the caller already gates on `commitCtx.attributableInCwd`, so
  re-parsing was removed earlier; the parameter became dead).

- Added wiring guards in edit.test.ts and write-file.test.ts:
  AI-originated edits/writes hit `CommitAttributionService.recordEdit`,
  `modified_by_user: true` skips, and write-file's distinction
  between a true new file and an overwritten empty file (`null` vs
  `''` old content) is now pinned by `aiCreated` assertions.

* fix(attribution): partial-commit clear, symlink baseDir, gh/git flag handling

Two Critical items, two Copilot, and five wenshao Suggestions:

- attachCommitAttribution's `finally` block used to call
  `clearAttributions()` unconditionally, wiping per-file tracking
  for files the AI had edited but the user excluded from this
  commit. Added `clearAttributedFiles(committedAbsolutePaths)` to
  the service and the call site now passes only the paths that
  actually landed in this commit; entries for un-`add`ed files stay
  pending for a later commit.

- generateNotePayload now runs both `baseDir` and each tracked
  absolute path through `fs.realpathSync` before `path.relative`.
  On macOS in particular `/var` symlinks to `/private/var`, so the
  toplevel from `git rev-parse --show-toplevel` and the absolute
  path captured by edit/write-file tools could diverge — producing
  `../../actual/path` keys in the lookup that never matched and
  silently zeroed all per-file AI attribution.

- tokeniseSegment now consumes value-taking sudo flags (`-u`,
  `-g`, `-h`, `-D`, `-r`, `-t`, `-C`, plus the long forms). Without
  this, `sudo -u other git commit` left `other` standing in for
  the program name and skipped the trailer entirely.

- A duplicate JSDoc block above `countCommitsAfter` (a leftover
  from the earlier extraction of `getGitHead`) was removed; both
  helpers now have one accurate comment each.

- attachCommitAttribution's multi-commit guard now also runs when
  `preHead === null` (brand-new repo), via `git rev-list --count
  HEAD`. A compound `git init && git commit -m a && git commit -m b`
  no longer slips through and mis-attributes combined data to the
  last commit.

- addCoAuthorToGitCommit's `-m` matching switched to `matchAll` and
  takes the LAST match. `git commit -m "title" -m "body"` puts the
  trailer at the end of the body so `git interpret-trailers`
  recognises it; the previous first-match behaviour stuffed the
  trailer in the title where git treats it as plain message text.

- addAttributionToPR's `--body` regex accepts both space and
  `=` separators (`--body "..."` and `--body="..."`); the `=` form
  is common with gh.

- New `parseGhInvocation` walks past gh's global flags
  (`--repo`, `-R`, `--hostname`) so `gh --repo owner/repo pr
  create ...` is detected. The earlier fixed-position check at
  tokens[1]/tokens[2] missed any command with a global flag.

- getCommittedFileInfo now fans out the two `rev-parse` calls and
  the three diff calls with `Promise.all`. They're independent and
  serialising them was paying spawn latency 5× per commit.

Tests: sudo with `-u user`, multi `-m`, `gh --repo owner/repo`,
`--body="..."`, plus the existing 84 shell tests still pass.

* fix(attribution): canonicalize file paths centrally in CommitAttributionService

Two related Copilot follow-ups:

- recordEdit/getFileAttribution/clearAttributedFiles now run input
  paths through fs.realpathSync before storing/looking up, so a
  symlinked path (e.g. macOS /var ↔ /private/var) resolves to the
  same key regardless of which form the caller passes. Previously
  edit.ts/write-file.ts handed in non-realpath'd absolute paths
  while generateNotePayload tried to realpath only inside its
  lookup loop, leaving partial-clear and clear-on-finally paths
  unable to find entries when the forms diverged.

- restoreFromSnapshot also canonicalises on the way in so a
  session resumed from a pre-fix snapshot (where keys may not
  have been canonical) ends up with the same shape as newly
  recorded entries — otherwise a single file could end up with
  two parallel records.

- generateNotePayload's lookup loop dropped its per-entry realpath
  call (now redundant since keys are canonical at write time),
  keeping only the realpath of `baseDir` (which still comes from
  `git rev-parse --show-toplevel` and may be a symlink).

- Updated `clearAttributedFiles` doc to describe the new semantics:
  callers can pass either the resolved repo-relative path or an
  already-canonical absolute path, and either will match.

* fix(attribution): canonicalize-from-root cleanup; fix mixed-quote -m / gh -R=

Five review items, one Critical:

- attachCommitAttribution now canonicalises via the repo *root* (one
  realpath call) and resolves committed paths against that canonical
  root, rather than per-leaf realpath inside clearAttributedFiles.
  At cleanup time the leaf for a just-deleted file no longer exists,
  so per-leaf fs.realpathSync would fail and silently fall back to a
  non-canonical path that misses the stored canonical key — leaving
  stale attributions for deleted files.
  clearAttributedFiles drops its internal realpath and now documents
  the canonical-paths-required precondition explicitly.

- addCoAuthorToGitCommit picks the LAST `-m` regardless of quote
  style. Previously `doubleMatch ?? singleMatch` always preferred
  the last double-quoted match, so `git commit -m "Title" -m
  'Body'` injected the trailer into the title where git
  interpret-trailers would silently ignore it. Now compares match
  indices, and the escape helper follows the actually-selected
  match's quote style.

- parseGhInvocation handles `-R=value` (the equals form of the
  short `--repo` alias). `--repo=...` and `--hostname=...` were
  already covered; `-R=...` previously fell through to the generic
  flag branch and skipped the value.

- New tests for the symlink-aware canonicalisation: macOS-style
  `/var` ↔ `/private/var` mapping is mocked via vi.mock on
  node:fs, with cases for record-then-look-up under either form,
  generateNotePayload with a symlinked baseDir, partial clear via
  the canonical-root-derived path (deleted leaf), and snapshot
  restore canonicalisation.

- Doc-only: integration-test header comments updated from
  "V1 -> V2 -> V3" / "migration to V3" to reflect the actual V4
  end state (assertions already used the literal `4`).

* fix(shell): scope -m rewrite to commit segment, reject nested matches

Two Critical findings on addCoAuthorToGitCommit, plus a Copilot
maintainability nit:

- The `-m` regex used to scan the whole compound command, so
  `git commit -m "fix" && git tag -a v1 -m "release"` would target
  the LATER tag annotation (last -m wins) and splice the trailer
  there instead of the commit message. The rewrite now scopes to
  the actual `git commit` segment via a new
  findAttributableCommitSegment(): same shell-aware walk
  gitCommitContext does, but returning the segment's character
  range so the regex can be run on a slice and spliced back into
  the original command.

- Within the segment, a literal `-m '...'` *inside* a quoted body
  was treated as a real later -m. For
  `git commit -m "docs mention -m 'flag' for completeness"`, the
  inner single-quoted -m sits at a higher index than the real
  outer -m, and the previous index comparison would have it win —
  splicing the trailer mid-message and corrupting the quoting.
  The new code checks whether the candidate is nested inside the
  other quote-style's range (start/end containment) and prefers
  the outer match when so.

- Hoisted three constant Sets (sudo flag list, git global flags
  taking values, git global flags shifting cwd, gh global flags)
  out of the per-call scope to module constants. Functional
  no-op, but keeps the parsing helpers easier to read and avoids
  re-allocating the Sets on every command.

Two regression tests added for the cases above:
- inner `-m '...'` inside the outer message body is preserved
  literally and the trailer lands after the body
- `git tag -a v1 -m "release notes"` after a real
  `git commit -m "fix"` is left untouched, with the trailer
  appended to "fix" only

* fix(attribution): cd-leak, numstat partial failure, $() bailout, gh pr new alias

Five Critical/Suggestion items:

- `cd subdir && git commit` (or any non-attributable commit chain
  whose HEAD movement still happens in our cwd, e.g. cd into a
  subdirectory of the same repo) used to skip attribution AND fail
  to clear pending per-file entries. Those entries then leaked into
  the next foreground commit, inflating its AI percentage. New
  `else if (commitCtx.hasCommit)` branch in execute() compares pre-
  and post-HEAD; if HEAD moved we drop the per-file state. preHead
  is now snapshotted whenever ANY commit was attempted, not only
  attributable ones.

- getCommittedFileInfo's three diff calls run in `Promise.all`. If
  `--numstat` failed while `--name-only` succeeded, every file's
  diffSize would be 0 and generateNotePayload would clamp aiChars
  to 0 — emitting a structurally valid note with all-zero AI
  percentages. Detect the partial-failure shape (files non-empty,
  diffSizes empty) and return empty so no note is written.

- addCoAuthorToGitCommit and addAttributionToPR now bail when the
  captured `-m`/`--body` value contains `$(`. The tool description
  recommends `git commit -m "$(cat <<'EOF' ... EOF)"` for
  multi-line messages, but the regex's `(?:[^"\\]|\\.)*` body group
  stops at the first interior `"` from a nested shell token —
  splicing the trailer there breaks the command before it reaches
  the executor.

- looksLikeGhPrCreate now accepts `gh pr new` as well — it's a
  documented alias for `gh pr create` and was silently skipped.

- Removed `incrementPermissionPromptCount` / `incrementEscapeCount`
  and their getters: they had no production callers, so the backing
  fields just round-tripped through snapshots as 0. The four
  snapshot fields are now optional so pre-fix snapshots that carry
  non-zero values still load cleanly and just get ignored.

Three regression tests added: heredoc-style `-m "$(cat <<EOF...)"`
preserved literally, heredoc-style `--body` likewise, `gh pr new
--body "..."` rewritten with attribution.

* fix(attribution): --amend, --message/-b aliases, .d.ts over-exclusion

Four Copilot follow-ups, three of them user-visible coverage gaps:

- `git commit --amend` was diffing `HEAD~1..HEAD` for attribution,
  which spans the entire amended commit (parent → amended) rather
  than the actual amend delta. A message-only amend would emit a
  note attributing every file in the original commit to this
  amend. New `isAmendCommit` helper detects the flag and
  getCommittedFileInfo switches to `HEAD@{1}..HEAD` (the pre-amend
  HEAD vs the amended HEAD); if the reflog is GC'd we bail with a
  warning rather than over-attribute.

- `git commit --message "..."` and `--message="..."` were silently
  skipped because the regex only recognised the short `-m` form.
  The flag prefix now matches both alternatives via
  `(?:-[a-zA-Z]*m|--message)\s*=?\s*` (non-capturing inner group
  so the existing `[full, prefix, body]` destructure still works).

- `gh pr create -b "..."` (the short alias for `--body`) was the
  same gap on the PR side; `(?:--body|-b)[\s=]+` now covers both
  forms.

- `.d.ts` was an over-broad blanket exclusion in
  EXCLUDED_EXTENSIONS — declaration files are commonly authored
  (ambient declarations, asset shims like `*.d.ts` for
  `import './x.svg'`); the repo even contains
  `packages/vscode-ide-companion/src/assets.d.ts`. Removed `.d.ts`
  from the extensions Set and adjusted the test to assert the new
  behavior. Auto-generated `.d.ts` (e.g. `tsc --declaration`
  output) still gets caught by the build-directory rules.

Tests added: `--amend` plumbing covered by the new branch in
getCommittedFileInfo (no targeted unit test — the diff invocation
goes through ShellExecutionService and is exercised by the existing
post-command path); `--message`/`--message="..."`/-b/-b="..."` all
have positive trailer-injection assertions; `.d.ts` test split into
"hand-authored" (negative) and "in dist" (positive).

* fix(attribution): cd-subdir, scope --body, multi-commit count guard, /clear reset

Four bugs flagged this round:

- gitCommitContext / findAttributableCommitSegment used a blanket
  "any cd shifts cwd" gate, breaking the very common
  `cd subdir && git commit -m "..."` flow even though the commit
  lands in the same repo. New `cdTargetMayChangeRepo` heuristic:
  treat relative paths that don't escape upward (no leading `..`,
  no absolute path, no `~`/`$VAR` expansion, no bare `cd`/`cd -`)
  as in-repo and let attribution proceed. Conservative on anything
  it can't statically verify.

- addAttributionToPR was running the `--body`/`-b` regex against
  the FULL compound command string. In
  `curl -b "session=abc" && gh pr create --body "summary"` the
  regex would match curl's `-b` cookie flag and inject attribution
  into the cookie value, corrupting the curl call. Added
  `findGhPrCreateSegment` (analog of `findAttributableCommitSegment`)
  and scoped the body regex to that segment, splicing back into
  the original command via offsetting the in-segment match index.

- The multi-commit guard treated `runGitCount === 0` as "single
  commit" and bypassed itself. After `commitCreated === true`, a
  count of 0 is impossible in normal operation — it means
  rev-list errored or timed out. Now we bail on `commitCount !== 1`
  with a tailored message: anything other than exactly 1 commit
  is suspicious and refuses the note.

- The CommitAttributionService singleton survives across
  `Config.startNewSession()` (the `/clear` and resume paths). New
  `CommitAttributionService.resetInstance()` call alongside the
  existing chat-recording / file-cache resets in startNewSession
  prevents pending attributions from a prior session attaching to
  a commit in the new one.

Three regression tests added: `cd src && git commit` produces a
trailer (in-repo cd), `cd .. && git commit` does not (could escape
repo root), and `curl -b "..." && gh pr create --body "..."` leaves
curl's cookie value untouched while attribution lands in gh's body.

* fix(attribution): cd embedded .., env wrapper, Windows ARG_MAX, segment-locator warn

Four review items, all small but real:

- cdTargetMayChangeRepo missed embedded `..` traversal — `cd
  foo/../../escape` and similar would slip past the leading-`..`
  check and be treated as in-repo. Added an `includes('/..')` /
  `includes('\\..')` check (catches POSIX and Windows separators
  without false-positiving on `..` chars inside ordinary names,
  which only escape when followed by a separator).

- tokeniseSegment now recognises `env` as a safe wrapper alongside
  `sudo`/`command`, so `env GIT_COMMITTER_DATE=now git commit ...`
  resolves to `git`. After the wrapper detection we also skip any
  `KEY=VALUE` argv entries (env's own argument syntax for setting
  vars before the program).

- buildGitNotesCommand's MAX_NOTE_BYTES dropped from 128 KB to
  30 KB. Windows' CreateProcess lpCommandLine is capped around
  32,768 UTF-16 chars including the executable path and other argv
  entries; a 128 KB note would still fail to spawn even though
  the function returned a command instead of null. 30 KB leaves
  ~2 KB of headroom for the rest of the argv on Windows and is
  larger than any real commit's metadata in practice.

- findAttributableCommitSegment / findGhPrCreateSegment now log a
  debugLogger.warn when `command.indexOf(sub, cursor)` returns -1
  — splitCommands strips line continuations (`\<newline>`), so a
  multi-line command can have the trimmed segment text fail to
  match its source. Previously the segment was silently skipped
  with no signal; the warn makes the failure observable when
  QWEN_DEBUG_LOG_FILE is set.

Two regression tests added: `cd foo/../../escape && git commit`
gets no trailer (embedded-`..` heuristic catches it), and
`env GIT_COMMITTER_DATE=now git commit` does (env wrapper skipped).

* fix(attribution): scope isAmendCommit to attributable segment only

`git -C ../other commit --amend && git commit -m x` would previously
flag the second (fresh) commit as an amend, causing
attachCommitAttribution to diff `HEAD@{1}..HEAD` against an unrelated
reflog entry. Mirror findAttributableCommitSegment's cd/cwd tracking
so only the first commit segment that runs in the original cwd
determines amend status.

* fix(attribution): last-match --body, symlink leaf canonicalisation, scoped prompt count

- addAttributionToPR: use matchAll/last-match for `--body`/`-b` so the
  trailer lands in the gh-honoured (final) body when multiple flags are
  present. Mirrors addCoAuthorToGitCommit. Adds regression test.
- attachCommitAttribution: also fs.realpathSync the per-file resolved
  path (not just the repo root) so files behind intermediate symlinks
  are matched against canonical keys recordEdit stored, instead of
  silently zeroing attribution and leaking entries past commit.
- incrementPromptCount: scope to SendMessageType.UserQuery — ToolResult,
  Retry, Hook, Cron, Notification are model/background re-entries of
  the same logical turn. Tracking them all inflated the "N-shotted"
  trailer (one user message could become 10-shotted with 10 tool calls).
- AttributionSnapshot: add `version: 1` field; restoreFromSnapshot now
  refuses incompatible versions and validates per-field types so a
  partially-written snapshot can't seed `Math.min(undefined, n) === NaN`
  into git-notes payloads.
- Drop unused permission/escape counters (declared, persisted, never
  read or incremented) — fields, snapshot tolerance, and clear-method
  bookkeeping all removed; AttributionSnapshot interface simplifies.
- isGeneratedFile: switch directory rule from substring `.includes('/dist/')`
  to segment-boundary check (split on `/`) so project dirs like
  `my-dist/` or `xbuild/` don't match. `.lock` removed from the blanket
  extension exclusion — well-known lockfiles already covered by
  EXCLUDED_FILENAMES; hand-authored `.lock` files (e.g. `.terraform.lock.hcl`)
  now stay attributable.
- getClientSurface: document `QWEN_CODE_ENTRYPOINT` as the embedder
  override hook so the always-`'cli'` default is intentional.

* fix(attribution): skip values for env -u NAME and -S string

`env`'s value-taking flags (`-u`/`--unset`, `-S`/`--split-string`) were
not in the wrapper's flag-skip allowlist, so `env -u FOO git commit ...`
left FOO as the next token and the parser treated it as the program —
masking the real `git commit` from attribution detection. Add an
ENV_FLAGS_WITH_VALUE table mirroring the sudo allowlist. Regression
test added.

* fix(attribution): submodule leak, PR body nesting, shallow-clone bail, schema default

- attachCommitAttribution: when HEAD didn't move in our cwd, leave
  pending attributions alone instead of dropping them. The case can be
  a failed commit, `git reset HEAD~1`, OR `cd submodule && git commit`
  (inner repo's HEAD moves, ours doesn't). Dropping was overly
  aggressive and silently lost outer-repo edits in the submodule case.
- addAttributionToPR: mirror addCoAuthorToGitCommit's nested-match
  rejection so `gh pr create --body "docs mention -b 'flag'"` picks the
  outer `--body`, not the inner literal `-b`. Splicing into the inner
  match would corrupt the body. Regression test added.
- getCommittedFileInfo: when `rev-parse --verify HEAD~1` fails, also
  check `rev-list --count HEAD === 1` to confirm HEAD is the true
  root commit. In a shallow clone, HEAD~1 is unreadable but the commit
  has a parent recorded — falling back to `diff-tree --root` would
  diff against the empty tree and over-attribute the entire commit.
  Bail with a debug warning instead.
- generate-settings-schema: lift `default` (and `description`) out of
  the inner `anyOf[N]` schema to the outer level when wrapping with
  `legacyTypes`. Most JSON-schema-driven editors only surface
  top-level defaults; burying the default under `anyOf` lost the
  "enabled by default" hint. Also extend the default filter to
  publish non-empty plain objects (so `gitCoAuthor`'s default can
  appear). gitCoAuthor's source default updated to the runtime shape
  `{commit: true, pr: true}` to match `normalizeGitCoAuthor`.

* fix(attribution): drop unsafe full-clear, tag analysis-failure with null

ju1p (Copilot): the `else if (commitCtx.hasCommit)` branch fully
cleared the singleton on `cd /abs/same-repo/subdir && git commit`
(or `git -C . commit`), losing pending AI edits the user hadn't
staged. We can't tell which files were in the commit from this
branch, and the next attributable commit's partial-clear handles
cleanup correctly anyway. Drop the branch entirely.

ju2D (Copilot): `getCommittedFileInfo` returned the same empty
StagedFileInfo for both "could not analyze" (shallow clone, --amend
without reflog, --numstat partial failure, exception) and
"intentionally empty" (--allow-empty). The caller couldn't tell them
apart, so the partial clear became a no-op on analysis failure and
the just-committed AI edits leaked to the next commit. Switch the
return type to `StagedFileInfo | null` and have the caller treat
null as "fall back to full clear" while empty StagedFileInfo
(--allow-empty) leaves attributions intact for the next real commit.

* fix(attribution): dedup snapshot writes, cap excludedGenerated, doc commit toggle scope

rsf- (Copilot): recordAttributionSnapshot wrote a full snapshot to
the JSONL on every non-retry turn, even when the tracked state was
unchanged. Long-running sessions accumulated thousands of identical
snapshot copies, inflating session size and slowing /resume hydrate.
Dedup by JSON-equality with the prior write — first write always
goes through, identical successors are no-ops.

rsgo (Copilot): excludedGenerated path list was unbounded. A commit
churning thousands of generated artifacts (large dist/ rebuild)
could push the JSON note past MAX_NOTE_BYTES (30KB) and lose
attribution for the real source files in the same commit. Cap the
serialized sample at MAX_EXCLUDED_GENERATED_SAMPLE (50) and add
excludedGeneratedCount for the true total.

rsg9 + rshM (Copilot): the gitCoAuthor.commit description claimed
the toggle only controlled the Co-authored-by trailer, but
attachCommitAttribution also gates the per-file git-notes payload
on the same flag. Update both the schema description and the
settings.md table to mention both effects so disabling the option
isn't a silent surprise.

* fix(attribution): depth-1 shallow detection, snapshot dedup post-rewind/post-failure

sfGz (Copilot): rev-list --count HEAD === 1 cannot distinguish a
true root commit from a depth-1 shallow clone — both report 1
because rev-list only walks locally available objects. Switch to
git log -1 --pretty=%P HEAD which reads the parent SHA directly
from commit metadata: empty means a real root, non-empty means a
parent is recorded (whether or not its object is local). The
shallow-clone bail is now reliable.

sfIm (Copilot): the dedup key persisted across rewindRecording, so
the previous snapshot living on the now-abandoned branch would
match the next post-rewind snapshot and silently skip the write,
leaving /resume on the rewound session with no attribution state.
Reset lastAttributionSnapshotJson when rewindRecording fires.

sfJE (Copilot): dedup key was committed before the async write
settled. A transient write failure would update the key, then
permanently suppress all future identical snapshots even though
nothing was ever persisted. Switch to optimistic-set then rollback
on appendRecord rejection — synchronous identical calls dedup
cleanly, but a failed write clears the key so the next identical
snapshot retries. appendRecord now returns the per-record write
promise (writeChain still has its swallow-catch for chain liveness)
so callers needing per-write success can react to it. Tests added
in chatRecordingService.test.ts for both rewind-reset and
rollback-on-failure paths.

* fix(attribution): preHead race, regex apostrophe-escape, surface failures, dead code

t2G0 (deepseek-v4-pro): addCoAuthorToGitCommit single-quote regex now
matches the bash close-escape-reopen apostrophe form using
((?:[^']|'\\'')*) — the same pattern bodySinglePattern uses for
gh pr create. Input like git commit -m 'don'\''t' was previously
silently un-rewritten because the negative lookahead bailed; the
trailer now lands at the FINAL closing quote. Test updated.

tMBP (gpt-5.5): preHead capture switched from concurrent async
getGitHead to a synchronous getGitHeadSync (execFileSync) BEFORE
ShellExecutionService.execute spawns the user's command. A fast
hot-cached git commit could move HEAD before the async rev-parse
resolved, leaving preHead === postHead and silently skipping the
attribution note. Trade ~10–50 ms event-loop block per
commit-shaped command for correctness of the post-command HEAD
comparison.

t2Gv (deepseek-v4-pro): attribution write failures (note exec
non-zero, payload too large, diff-analysis exception, shallow
clone / amend-without-reflog) are now surfaced on the shell tool's
returnDisplay AND llmContent so the user and agent both see when
their commit succeeded but the per-file git note didn't land.
attachCommitAttribution now returns string | null (warning text or
null for intentional skips like no-tracked-edits). Co-authored-by
trailer is unaffected — only the note is gated by these failures.

t2Gy (deepseek-v4-pro): committedAbsolutePaths now matches against
the canonical keys already stored in fileAttributions
(matchCommittedFiles iterates by relative path against the
canonical repo root) instead of re-resolving each diff path
on the fly. realpathSync(resolved) failed for deleted files and
didn't follow intermediate symlinks, leaving stale per-file
attribution alive past commit and inflating AI percentages on
subsequent commits.

t2HI (deepseek-v4-pro): removed dead sessionBaselines /
FileBaseline / contentHash / computeContentHash infrastructure
(~40 lines). The fields were written, persisted, and restored but
never read for any computation or decision. AttributionSnapshot
schema stays at version 1 — restore tolerates pre-fix snapshots
that carried the now-ignored baselines field.

t2HM (deepseek-v4-pro): extracted the duplicated lastMatch helper
in addCoAuthorToGitCommit and addAttributionToPR into a single
module-level lastMatchOf so future fixes can't be applied to only
one copy.

* chore(schema): regenerate settings.schema.json to match gitCoAuthor.commit description

The settingsSchema.ts source for `gitCoAuthor.commit.description` was
updated in 3c0e3293b but the JSON schema only picked up the OUTER
description rewrite and missed this inner property's. The Lint check
("Check settings schema is up-to-date") fails on that drift; this
commit re-runs `npm run generate:settings-schema` to sync them.

* fix(attribution): preserve unstaged AI edits across cleanup branches

uxU5 + uxVQ + uxUO (Copilot): every cleanup branch in
attachCommitAttribution that called clearAttributions(true) was
wholesale-erasing pending AI edits for files the user never staged
in this commit. Reviewer scenarios:
- multi-commit chain (`commit a && commit b`) bails out without
  writing a note, but unstaged edits to file Z (touched by neither
  commit) get cleared along with the chain's committed files.
- attribution toggle off: same — toggling the flag wipes pending
  unstaged work.
- analysis failure (shallow clone, --amend without reflog, partial
  diff failure): the finally-block fallback wholesale-cleared
  every pending file, consuming unrelated AI edits.
- 0%-AI commit: when no file in the commit was AI-touched,
  generateNotePayload was emitting an "0% AI" note attached to a
  commit that legitimately had no AI involvement — actively
  misleading metadata.

Add `noteCommitWithoutClearing()` to the service: snapshots the
prompt counter as the new "at last commit" but leaves the per-file
map alone. Use it in the multi-commit, no-tracked-edits,
toggle-off, and analysis-failure paths. The committed-files
partial-clear (clearAttributedFiles) still runs in the success
path. The 0%-AI no-match case now skips the note write entirely.

* fix(attribution): runGit null-on-failure, versionless v3→v4 migration

z54M (Copilot): runGit returned '' on both successful-empty-output
and silent failure, so a `--name-only` that errored mid-way through
the diff fan-out aliased to a real `--allow-empty` commit. The
empty-commit branch then preserved pending attributions, leaving
the just-committed file's tracked AI edit alive to re-attribute on
the next commit. Switch runGit to `Promise<string | null>`,
distinguishing exit code 0 (any output, including '') from non-zero
(null). The diff-stage fan-out and ancillary probes now treat null
as analysis failure and bail with `return null` instead of falling
into the empty-commit path.

z539 (Copilot): the v3→v4 `shouldMigrate` only fired on
`$version === 3`. A versionless settings file carrying the legacy
`general.gitCoAuthor: false` boolean would skip every migration
(gitCoAuthor isn't in V1_INDICATOR_KEYS — it post-dates V2), get
its `$version` normalized to 4 by the loader, and leave the
boolean in place. The settings dialog then reads the V4
`{commit, pr}` shape, sees missing keys, defaults both to true, and
silently overwrites the user's opt-out on the next save. Also fire
when `$version` is absent AND the value at `general.gitCoAuthor`
is a boolean. Tests cover the new path and confirm the existing
versioned/object-shape paths are untouched.

* fix(attribution): toggle-off partial clear, normalizeGitCoAuthor type-check, terraform lockfile

0oAK (Copilot): the gitCoAuthor.commit toggle-off branch returned
before computing the committed file set, leaving the just-committed
files' tracked AI work in the singleton. Re-enabling the toggle and
committing the same file again would re-attribute earlier (already-
committed) AI edits to the new commit. Move the toggle gate AFTER
matchCommittedFiles so the finally block does a proper partial clear
of the just-committed files even when the note write is skipped.

0oAg (Copilot): normalizeGitCoAuthor copied value?.commit / value?.pr
without type-checking. settings.json is hand-editable; a stored
`{ commit: "false" }` reached runtime as a truthy string and behaved
as if attribution were enabled. Add a per-field bool coercion that
falls back to the schema default (true) for any non-boolean,
matching what the dialog and IDE schema already imply. Tests cover
the string / number / null cases.

0oAo (Copilot): v3→v4 shouldMigrate only special-cased versionless
legacy booleans — versionless files with invalid gitCoAuthor values
(`"off"`, `[]`, etc.) skipped the migration and the loader stamped
`$version: 4` over the bad value. Runtime normalization then
silently re-enabled attribution. Extend shouldMigrate to fire on ANY
versionless non-object value at general.gitCoAuthor; the existing
migrate() body's drop-and-warn path resets it. Already-object
shapes (hand-edited to v4) still skip cleanly. Tests added.

0oAt (Copilot): `.terraform.lock.hcl` got dropped from generated-file
exclusion when `.lock` was removed from the blanket extension list
in 3c0e3293b. It's a generated provider lockfile in the same class
as `package-lock.json` and dominates Terraform-repo commits. Re-add
to EXCLUDED_FILENAMES and add a regression test covering both
repo-root and module-nested locations.

* fix(attribution): harden restoreFromSnapshot against corrupt payloads

1KMY (Copilot): snapshot.surface was copied without type validation.
A corrupted/partially-written snapshot with a non-string surface
(e.g. {}, 42, null) would later be serialized into the git note as
"[object Object]" and used as a Map key downstream, breaking the
expected payload shape. Type-check and fall back to the current
client surface for any non-string (or empty-string) value.

1KLq (Copilot): per-field sanitiseCount enforced
`promptCount >= 0` and `promptCountAtLastCommit >= 0` independently,
but never the cross-field invariant. A snapshot with
promptCountAtLastCommit > promptCount would surface a negative
getPromptsSinceLastCommit() and propagate as a "(-N)-shotted"
trailer into PR text. Clamp atLastCommit to total on restore.

1KL_ (Copilot): when a snapshot carried both the symlinked and
canonical paths for the same file (a session straddling the
canonicalisation fix), `set(realpathOrSelf(k), ...)` overwrote the
first entry with the second, silently dropping the AI contribution
the first form had accumulated. Merge instead: sum aiContribution
and OR aiCreated when collapsing duplicate keys.

Tests cover all three branches: non-string surface fallback,
promptCount clamp, and duplicate-key merge.

* fix(attribution): roll back snapshot dedup key on sync appendRecord failure

1UMh (Copilot): appendRecord can throw synchronously before returning
a promise — e.g. when ensureConversationFile() rethrows a non-EEXIST
writeFileSync error. The async .catch() handler attached to the
promise never runs in that case, so the optimistic dedup-key set
sticks on a write that never landed and permanently suppresses
identical retries. Roll back lastAttributionSnapshotJson in the outer
catch too. Regression test forces writeFileSync to throw EACCES on
the first invocation, then asserts the second identical snapshot
attempt fires a fresh write rather than getting deduped.

* docs(attribution): align cleanup-branch comments with noteCommitWithoutClearing

Three doc/test-fixture stale-after-refactor cleanups (Copilot
4MDx / 4MEI / 4MEa):

- shell.ts:1944 (around the stagedInfo === null branch): the comment
  still claimed the finally block "falls back to a full clear", but
  1ece87438 switched analysis-failure cleanup to
  noteCommitWithoutClearing(). Update the comment so the reasoning
  matches what the code actually does (and so a future reader doesn't
  reintroduce the wholesale clear thinking it's already there).

- shell.ts: getCommittedFileInfo docstring carried the same stale
  "full clear" claim for the `null` return value. Update to describe
  the noteCommitWithoutClearing() fallback and the smaller-evil
  trade-off for the just-committed file.

- chatRecordingService.test.ts: baseSnapshot fixture for the
  recordAttributionSnapshot tests still carried `baselines: {}`,
  even though that field was removed from AttributionSnapshot in
  296fb55ae's dead-code purge. Structural typing let it compile,
  but the fixture didn't reflect the production shape — drop it.

* fix(attribution): restore fire-and-forget appendRecord, route rollback via callback

6OcJ (Copilot): refactor in 715c258fb returned a Promise from
appendRecord so the snapshot dedup-key path could chain rollback —
but recordUserMessage / recordAssistantTurn / recordAtCommand /
recordSlashCommand / rewindRecording all call appendRecord without
await or .catch(). A transient jsonl.writeLine rejection on any of
those would surface as an unhandled-promise-rejection (warning, or
crash on --unhandled-rejections=throw).

Restore the original fire-and-forget semantics: appendRecord again
returns void and internally swallows async failures (logging via
debugLogger). Per-record failure reactions are routed through an
optional onError callback — recordAttributionSnapshot uses this to
roll back lastAttributionSnapshotJson when the write that set it
ends up rejecting.

Tests: add a fire-and-forget regression that mocks writeLine to
reject and asserts no unhandledRejection events fire while the
existing snapshot rollback tests (sync + async) still pass via the
new callback path.

* fix(attribution): GIT_DIR repo-shift bail, snapshot envelope validation, narrow legacyTypes

80ME (gpt-5.5 /review, [Critical]): tokeniseSegment unconditionally
stripped every leading KEY=value token. `GIT_DIR=elsewhere/.git git
commit ...` was therefore treated as an in-cwd commit, picked up the
Co-authored-by trailer, and produced a per-file note that landed
against our cwd's HEAD even though the actual commit went to a
different repo. Define a GIT_ENV_SHIFTS_REPO set (GIT_DIR,
GIT_WORK_TREE, GIT_COMMON_DIR, GIT_INDEX_FILE, GIT_NAMESPACE) and
have tokeniseSegment refuse to parse any segment whose leading env
block (including the env-wrapper's KEY=VALUE block) carries one of
these. Identity / date variables (GIT_AUTHOR_*, GIT_COMMITTER_*) are
deliberately NOT in the set — they tweak metadata but don't relocate
the repo. Tests cover plain prefix, env-wrapped prefix, and a
GIT_COMMITTER_DATE positive control that should still get the trailer.

8EeQ (Copilot): restoreFromSnapshot received `snapshot as
AttributionSnapshot` from a structural cast off `unknown` (the
resume path), so its TS-typed shape was only a hint. A corrupted
JSONL line (non-object / array / wrong type discriminator / missing
type) would skip past the version check straight into
Object.entries(snapshot.fileStates) — and a non-object fileStates
(an array, say) seeded fileAttributions with numeric-string keys.
Add envelope-level shape gates (isPlainObject + type discriminator)
and a fileStates plain-object check before iterating; both bail to a
clean reset rather than poisoning the singleton. Tests added.

8Eej (Copilot): SettingDefinition.legacyTypes was typed as
SettingsType[] which includes 'enum' and 'object' — JSON Schema's
`type` keyword doesn't accept those values. Adding
`legacyTypes: ['enum']` would silently produce an invalid
settings.schema.json. Narrow the field's type to
ReadonlyArray<'boolean' | 'string' | 'number' | 'array'> (the
JSON-Schema-primitive subset). Future complex-shape legacy support
should land its own branch in convertSettingToJsonSchema.

* docs(attribution): correct legacyTypes / EXCLUDED_DIRECTORY_SEGMENTS comments

9Ta_ (Copilot): the JSDoc on legacyTypes claimed JSON Schema's
`type` keyword does not accept `'object'` — that's wrong; `'object'`
IS a valid JSON Schema type. Reword to reflect the actual rationale:
`'enum'` is not a valid JSON Schema `type` value at all (enum
constraints use the `enum` keyword), and a bare `{type: 'object'}`
would accept any object regardless of what the field's pre-expansion
shape actually allowed. The narrowed `boolean | string | number |
array` set is exactly what the one-liner generator can faithfully
emit; richer legacy shapes belong in their own branch of
convertSettingToJsonSchema.

9Tbs (Copilot): the comment in generatedFiles.ts referenced
`EXCLUDED_DIRECTORIES`, but the constant is `EXCLUDED_DIRECTORY_SEGMENTS`
(renamed during the segment-boundary refactor). Update the
reference so a future maintainer scanning for the rule doesn't
chase a non-existent identifier.

* fix(attribution): SHA-pin git notes, on-disk hash divergence detection, env -C cwd-shift

tanzhenxin review #1 — Note targets symbolic HEAD, not captured SHA:
buildGitNotesCommand hard-coded 'HEAD' as the target; postHead was
captured at commit-detection time but only used for the !== preHead
diff. Between that capture and the execFile, three more awaited git
calls run — anything that moves HEAD in the same cwd (post-commit
hook, chained `commit && tag -m`, parallel process) silently lands
the note on the wrong commit because of `-f`. Thread postHead
through buildGitNotesCommand as a required `targetCommit` arg.
Test asserts the targeted SHA, not the symbolic ref.

tanzhenxin review #2 — Accumulator has no baseline:
recordEdit was monotonic per-path with no reset for out-of-band
mutations. Re-instate FileAttribution.contentHash and:
- recordEdit hashes the input `oldContent` and resets the per-file
  accumulator if it doesn't match what AI's last write recorded
  (catches paste-replace via external editor, manual save, etc.
  WHEN AI subsequently edits the same file again).
- New validateOnDiskHashes() rehashes every tracked file's CURRENT
  on-disk content and drops entries whose hash diverged. Called
  from attachCommitAttribution before matchCommittedFiles so a
  commit can never credit AI for a human-only diff. Deleted files
  (readFileSync throws) are left alone — the commit's deletion
  record is what the note should reflect.

tanzhenxin review #4 — Failed-commit / staleness leak:
The recordEdit divergence check above + commit-time
validateOnDiskHashes together catch tanzhenxin's exact scenario
(AI edits a.ts → hook rejects → user manually edits a.ts → user
commits → no AI credit because validateOnDiskHashes drops the
stale entry). The !commitCreated branch still preserves
attributions to keep the submodule case working — the staleness
problem is now solved at the next commit's validation step.

Self-review item — env -C / --chdir treated as repo-shifting:
Added ENV_FLAGS_SHIFT_CWD set covering -C / --chdir. tokeniseSegment
returns null for `env -C DIR git commit ...` segments — same
contract as a leading GIT_DIR=... assignment. Without this we'd
either misidentify /elsewhere as the program (silently dropping
attribution) or, worse if -C went into the value-skip set,
trailer-inject onto a commit that lands in /elsewhere's repo. Tests
added alongside the existing GIT_DIR repo-shift cases.

339 tests pass; typecheck clean.

* fix(attribution): pickBool intent-aware, shouldClear gate, ETIMEDOUT surface, drop dead exports

-wgA + -wg0 (deepseek): pickBool defaulted non-boolean to true,
turning a hand-edited `{ commit: "false" }` into enabled
attribution. Replace with intent-aware parsing: "true"/"yes"/"on"/
"1" → true, "false"/"no"/"off"/"0"/"" → false, anything else
(unknown strings, non-1 numbers, objects, arrays, null) → false.
Genuinely-absent sub-fields still default to true (schema default).
Migration test scenarios covered. Tests now cover ~17 input cases
across both string/number/null/object/unknown forms.

-wgq (deepseek): when buildGitNotesCommand returned null (oversized
payload) or git notes itself failed, the finally block called
clearAttributedFiles(committedAbsolutePaths) — irreversibly
deleting per-file attribution data the user might need to amend &
retry. Introduce a separate `shouldClear` set that's only assigned
on successful note write OR explicit toggle-off. Failure paths
(oversized, exitCode != 0, exception, analysis failure) leave
shouldClear null so the finally block calls noteCommitWithoutClearing
instead — preserving per-file state for the user's recovery.

9p7W (Copilot): execFile callback coerced ETIMEDOUT / SIGTERM
(timeout) into a generic exitCode=1 warning. Detect both
`error.code === 'ETIMEDOUT'` and `error.killed === true &&
error.signal === 'SIGTERM'` so the user-visible warning correctly
names "timed out after 5s" instead of "exited 1".

-wg7 (deepseek): formatAttributionSummary and getAttributionNotesRef
were exported but had zero production callers (only tests). Remove
the dead exports + their tests (~40 LOC). If/when a logging surface
needs them, they can be re-introduced.

-wgb (deepseek): tokeniseSegment doesn't recursively unwrap
`bash -c '...'` / `sh -c` / `zsh -c`, so addCoAuthorToGitCommit
won't splice the trailer into a wrapped command. The background
refusal AND the post-commit note path DO catch the wrapped commit
because stripShellWrapper at the top of execute peels the wrapper
before gitCommitContext / getGitHead run — so the worst-case
("background bash -c 'git commit' bypasses the guard") doesn't
materialize. The remaining gap (no Co-authored-by trailer for
bash -c-wrapped commits) requires recursively splicing into the
inner script with proper bash single-quote re-quoting; significant
enough that it's worth its own PR. Documented as a partial-coverage
limitation.

3…
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 15, 2026
…e + Agent isolation (QwenLM#4073)

* feat(tools): add generic worktree support (Phase A + B of QwenLM#4056)

Adds first-class git worktree as a general-purpose capability:

Phase A — User-facing tools
- enter_worktree: creates `<projectRoot>/.qwen/worktrees/<slug>` on a
  `worktree-<slug>` branch and returns the absolute path. Slug auto-generated
  when omitted; validated against path traversal and disallowed characters.
- exit_worktree: keeps or removes the worktree (and its branch). Refuses to
  remove a worktree with uncommitted tracked changes or untracked files
  unless `discard_changes: true` is set.

Phase B — Agent isolation
- Agent tool gains an `isolation: 'worktree'` parameter that provisions a
  temporary `agent-<7hex>` worktree, prepends a worktree notice to the task
  prompt, and on completion either removes the worktree (no changes) or
  preserves it and reports its path/branch in the result. Background and
  foreground execution paths both wired up; rejected for fork agents.
- worktreeCleanup.cleanupStaleAgentWorktrees: fail-closed sweep for
  ephemeral `agent-<7hex>` worktrees older than 30 days with no tracked
  changes and no unpushed commits. User-named worktrees are never swept.
- buildWorktreeNotice helper for fork subagents (parity with claude-code).

Arena compatibility
- The existing Arena worktree implementation (GitWorktreeService.setupWorktrees,
  ArenaManager, agents.arena.worktreeBaseDir) is untouched. Arena uses its
  own batch APIs and `~/.qwen/arena` base dir; the new general-purpose APIs
  live alongside under `<projectRoot>/.qwen/worktrees/`.

Subagent safety
- enter_worktree / exit_worktree are added to EXCLUDED_TOOLS_FOR_SUBAGENTS
  so a subagent cannot mutate the parent session's worktree state.

Refs QwenLM#4056

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(worktree): use path.join in expected paths so the test passes on Windows

The Windows CI run reported `enter-worktree.test.ts` failing because the
expected string was hardcoded with `/` while `getUserWorktreesDir()` uses
`path.join`, which returns `\\` on Windows. Build the expected path via
`path.join` so the platform-correct separator is compared.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(enter-worktree): treat empty name as auto-generate

Some models pass `{ "name": "" }` when calling EnterWorktree, because the
schema marks `name` as optional and they emit an empty placeholder. The
previous validation rejected the empty string with "Worktree name must be
a non-empty string", which surprised users running the auto-slug path.

Now both `validateToolParams` and `execute` treat `name: ""` as equivalent
to `name: undefined` and fall back to the auto-generated `{adj}-{noun}-{4hex}`
slug. Explicit invalid slugs (`'../etc'`, `'a/b'`, etc.) are still rejected
as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review findings 1-6 from PR QwenLM#4073

Six issues raised on the initial review; each addressed with a verifiable
guarantee.

1. Real isolation for `agent isolation: 'worktree'`
   Before: subagent's Config still resolved `getTargetDir()` to the parent
   project root, so Edit/Write/Read workspace checks and Shell's default cwd
   silently operated on the parent tree. The cleanup helper then saw a
   "clean" worktree and removed it — destroying the evidence.
   After: the worktree is provisioned BEFORE `createApprovalModeOverride`,
   and the resulting agent Config has `getTargetDir`/`getCwd`/`getWorkingDir`
   rebound to the worktree path. Relative paths, unqualified shell
   commands, and glob/grep roots all confine to the worktree.

2. `exit_worktree action='remove'` now prompts in default/auto-edit modes
   Added `getDefaultPermission()` on the invocation: `'ask'` when action is
   `remove`, `'allow'` when `keep`. Brings it in line with edit, write_file,
   and run_shell_command.

3. Force-delete no longer silently destroys unpushed commits
   `removeUserWorktree` now uses `git branch -d` (refuses unmerged) by
   default and surfaces `branchPreserved: true` when git refuses. Added
   `hasUnmergedWorktreeCommits` (checks if branch tip is reachable from any
   other local branch or remote ref). Both the agent isolation cleanup and
   `exit_worktree action='remove'` use this check: if the branch has work
   not covered elsewhere, the worktree+branch are preserved even when
   `discard_changes: true` is set (there is no `discard_commits` flag —
   committed work is rarely what `remove` means to discard).

4. Both new tools are now deferred behind ToolSearch
   `shouldDefer: true` + `searchHint` on both. Verified via openai-logging:
   `enter_worktree` and `exit_worktree` no longer appear in the function-
   declaration list sent on every API request.

5. Stale-worktree cleanup is wired in
   `Config.initialize()` fires `cleanupStaleAgentWorktrees(targetDir)` as a
   non-awaited startup sweep (skipped in bare mode). Picks up orphaned
   `agent-<7hex>` worktrees left by crashed runs.

6. Foreground isolation no longer leaks on uncaught throw
   The foreground try block tracks whether the cleanup helper ran on the
   success path; the finally block invokes it as a fallback when the try
   bailed early. Mirrors the background path's pattern.

Verification:
- Unit tests: 83 passed (16 worktree + 64 existing agent + 3 cleanup) — no
  regressions.
- E2E #1: agent told to write `hello.txt` via RELATIVE path — file landed
  at `.qwen/worktrees/agent-XXXXXXX/hello.txt`, NOT at the parent root.
- E2E #3: created worktree, committed work inside it, called exit_worktree
  with `discard_changes=true` — refused with clear message; worktree and
  branch both preserved.
- E2E #4: openai-logging confirms worktree tools absent from API tool list
  (7 tools sent instead of 9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 2 findings (1 from tanzhenxin, 7+8 from wenshao)

The first round closed the data-loss-class issues. This round addresses
follow-ups from a deeper audit:

1. Stale-worktree sweep was inert on common-case repos
   `cleanupStaleAgentWorktrees` previously ran `git log --branches --not
   --remotes --oneline` from each worktree's directory — that lists
   unpushed commits across EVERY local branch, not just the worktree's
   own branch. On any repo with no remote configured (or with stray
   unpushed branches), the sweep refused to remove every candidate.
   Replaced with `service.hasUnmergedWorktreeCommits(slug)` which scopes
   the check to the worktree branch via `for-each-ref --contains <tip>`.
   Also added the `branchPreserved` warn log requested in M7 and an
   `fs.access` shortcut for the empty-worktrees-dir case (M8).

2. `cleanupWorktreeIsolation` and `worktreeIsolation` were inside the
   inner try (~660 lines from the outer catch). Hoisted both to the top
   of `execute()` so the outer catch can reap or preserve the worktree
   when anything between provisioning and the inner try throws (e.g.
   `createApprovalModeOverride`, agent creation). Closure carries the
   resolved `repoRoot` so cleanup never has to re-resolve.

3. Background error path discarded the cleanup result. Now captures
   `formatWorktreeSuffix(...)` and appends it to the registry's failure
   /cancel message, so users see the preserved path/branch even when
   the agent crashed before reporting.

4. `cleanupWorktreeIsolation` now treats `result.success === false` as
   "worktree still on disk" and surfaces it as preserved instead of
   silently dropping it from the result.

5. Override was incomplete. Several Config methods read `this.targetDir`
   directly (`getProjectRoot`, `getFileService`, etc.) — own-property
   getter overrides did not redirect them. Now also shadows `targetDir`
   and `cwd` as own properties on the agent's Config override, swaps in
   a `FileDiscoveryService` rooted at the worktree, and rebuilds
   `WorkspaceContext` to point at the worktree only. Verified
   end-to-end: shell `pwd > pwd-record.txt` (no directory arg) lands at
   `.qwen/worktrees/agent-<7hex>/pwd-record.txt`, not the parent root.

6. monorepo subdir issue. Both `enter_worktree` and the agent isolation
   path now resolve `git rev-parse --show-toplevel` first and anchor
   `.qwen/worktrees/<slug>` at the repo root. Worktrees created from
   any subdirectory now end up where the startup sweep can find them.

7. Replaced `git worktree add -B` (silent force-reset of pre-existing
   branches) with `git worktree add -b` plus an explicit existence
   check via `git for-each-ref` (NOT `show-ref --quiet`, which
   simple-git swallows). Pre-existing `worktree-<slug>` branches now
   trigger a clear error instead of clobbering committed work.

8. First worktree creation in a repo writes `<projectRoot>/.qwen/.gitignore`
   with `worktrees/` so worktree contents stay out of the parent's
   `git status`, glob/grep results, and bundle tools. Idempotent: never
   overwrites an existing file.

9. Logging across the failure paths (`enter_worktree` errors,
   `agent.ts:failWorktreeProvisioning`, `cleanupWorktreeIsolation`,
   `hasUnmergedWorktreeCommits` swallowed errors,
   `cleanupStaleAgentWorktrees`'s `branchPreserved` race).

10. `exit_worktree` no longer suggests `discard_changes: true` when the
    git status check itself fails — that would be advising the user to
    bypass a safety check whose precondition is unknown. Now points at
    the underlying repo problem.

11. `generateAutoSlug` switched from `Math.random()` (4 hex, weak RNG,
    one-in-65k collision) to `randomBytes` (6 hex, ~16M combinations).
    Two RNG sources in this file collapsed to one.

Pushed back: the TOCTOU swap in `removeUserWorktree` (S6 round 1) is
left as-is — `git branch -d` is the real safety, and reordering does
not eliminate the window. Windows reserved-name validation (M5 round 2)
deferred to a follow-up; the current allowlist already rejects path
separators, `..`, leading dot/dash, and the >64-char case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): use randomInt to silence CodeQL biased-modulo finding

CodeQL's `js/biased-cryptographic-random` flagged
`randomBytes(4)[i] % ARRAY.length` in `generateAutoSlug`. The math is
actually exact for the current word-list lengths (256 % 8 == 0), but
the lint rule does not know that — and a future contributor changing
the list to a non-power-of-two length would silently introduce bias.

Switched the index lookups to `crypto.randomInt(0, length)`, which uses
rejection sampling and is uniform by construction. Suffix still uses
`randomBytes(3).toString('hex')` since hex encoding is unbiased.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 3 findings 1-6 from PR QwenLM#4073

The previous round added `getRepoTopLevel` for `enter_worktree`'s
provisioning, but missed three sibling call sites that still used the
raw cwd. The double-cleanup race in the foreground path also leaked
stale `[worktree preserved]` suffixes on rejected promises. All six
findings from the deeper audit are addressed:

1. exit_worktree now resolves through `getRepoTopLevel()` before
   building its `GitWorktreeService`, mirroring `enter_worktree`. Without
   this, launching `qwen` from a monorepo subdirectory created the
   worktree under the repo root but exit_worktree looked under the
   subdir's `.qwen/worktrees/` and always returned "Worktree not found".
   Verified end-to-end: enter + exit from `packages/core/` works.

2. agent.ts cleanup helper now nulls `worktreeIsolation` immediately
   after capturing the closure value. The previous structure could
   reach the helper twice — once in the foreground try's success path
   and once in the foreground finally fallback (or once in the inner
   try and once in the outer catch on a thrown rejection). The second
   call would `hasWorktreeChanges()` against a directory the first
   call already removed, fail-closed, and emit a bogus
   `[worktree preserved: <missing path>]` suffix.

3. Config.initialize's startup sweep now resolves `getRepoTopLevel()`
   before invoking `cleanupStaleAgentWorktrees`. Without this, every
   subdir launch scanned a non-existent `<subdir>/.qwen/worktrees/`
   and the 30-day expiry sweep was permanently a no-op.

4. agent.ts's `buildWorktreeNotice` now passes
   `worktreeIsolation.repoRoot` as `parentCwd` instead of
   `this.config.getTargetDir()`. The notice's path-translation
   guidance (≈ "translate paths from <parent> to <worktree>") would
   otherwise misdirect the subagent in a monorepo subdir launch.

5. Removed dead method `GitWorktreeService.listUserWorktrees`. It had
   no callers anywhere in the codebase and used `execSync` in a loop
   (would have blocked the event loop if anyone wired it up).

6. `localBranchExists` no longer swallows git failures silently. The
   defensive `false` default is preserved (so `git worktree add -b`
   itself surfaces the conflict if the check missed an existing
   branch), but the catch now logs via `debugLogger.warn` so disk-full
   / permission / ref-store-corruption cases are visible in debug
   output instead of being invisible.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 4 findings (data-loss + visibility)

Seven actionable findings from a deeper audit, all closed:

1. User worktree slugs could collide with ephemeral-agent shape
   `validateUserWorktreeSlug` did not reject names starting with
   `agent-`, so a user-named `agent-1234567` matched the cleanup regex
   `/^agent-[0-9a-f]{7}$/` and would be silently swept after 30 days
   along with whatever work was in it. Now reserved — clear error
   message points users at the cause.

2. Slug producer and consumer were string-coupled across files
   `agent.ts` hardcoded `agent-${hex(7)}` and `worktreeCleanup.ts`
   independently hardcoded `/^agent-[0-9a-f]{7}$/`. Future change to
   hex length on one side would silently break the other. Lifted
   `AGENT_WORKTREE_PREFIX`, `AGENT_WORKTREE_HEX_LENGTH`,
   `AGENT_WORKTREE_SLUG_PATTERN`, and `generateAgentWorktreeSlug()` to
   `gitWorktreeService.ts`; both call sites import them.

3. Startup sweep was invisible at default log level
   Fire-and-forget sweep used `debug` for errors and discarded the
   success count. A leak-chasing operator had no log breadcrumb.
   Errors promoted to `warn`; successful removals (count > 0) logged
   at `info`.

4. `getRepoTopLevel()` silent catch
   Returned `null` on any git failure with no log. Combined with
   `?? cwd` fallback in callers, a flaky git would have made worktree
   creators and the startup sweep disagree silently about which dir to
   use. Now logs the underlying error.

5. `hasTrackedChanges()` silent catch
   Cleanup's fail-closed `return true` had no log. Couldn't tell
   "has real changes — leave alone" from "git index unreadable — repo
   may be corrupt". Now logs.

6. `cleanupWorktreeIsolation` claimed `preservedPath` for a removed dir
   When `removeUserWorktree` returns `{ success: true, branchPreserved:
   true }` it has already deleted the directory and failed only on
   `git branch -d`. The helper still reported the (now non-existent)
   path as preserved. Now returns only `preservedBranch` for that
   case; `formatWorktreeSuffix` emits a distinct message instructing
   recovery via `git worktree add <new-path> <branch>`.

7. `removeUserWorktree` swallowed branch-delete failures
   Both `-d` and `-D` catch blocks were empty. Locked refs, perms,
   disk full all looked identical to "unmerged commits". Both now
   `debugLogger.warn` with the underlying error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(worktree): self-review pass — reuse, parallelism, dead code

Self-review caught a handful of issues across three categories:

Reuse:
- `pathExists` in the new code now uses the existing `fileExists` from
  `utils/fileUtils.ts` instead of duplicating an `fs.access` wrapper.
- `worktree-` branch prefix was string-literalled in five places. Added
  `WORKTREE_BRANCH_PREFIX` and `worktreeBranchForSlug(slug)` exports in
  `gitWorktreeService.ts`; updated `gitWorktreeService.ts`,
  `worktreeCleanup.ts`, and `exit-worktree.ts` to use them. Future
  prefix changes are a single edit.

Efficiency:
- `Config.initialize` used two `await import(...)` calls inside the
  startup-sweep IIFE, paying that cost on every CLI start. Switched to
  static imports at the top of `config.ts` — the modules are tiny and
  the dynamic indirection bought nothing.
- `cleanupWorktreeIsolation` in `agent.ts` ran `hasWorktreeChanges` and
  `hasUnmergedWorktreeCommits` sequentially. They have no data
  dependency on each other and each spawns its own `git` invocation;
  `Promise.all` halves the cleanup wall-clock on the common path.
  Same fix in `worktreeCleanup.ts`'s per-entry loop.
- `ensureWorktreesGitignored` used `fs.access` then `fs.writeFile`, a
  TOCTOU race when two agent invocations created worktrees concurrently
  (both could pass the `access` check and the second would clobber the
  first's `.gitignore`). Now writes with `flag: 'wx'` and treats
  `EEXIST` as the no-op case — atomic in one syscall.

Quality:
- Dropped the `worktreeCleanupRan` boolean in the foreground execution
  path. `cleanupWorktreeIsolation` already nulls its closure variable
  at the top of every call (see the comment at its definition), so
  re-entries are no-ops. The boolean and its tracking were dead weight
  that obscured the real guard.
- Trimmed the Phase-2 override comment block to drop the WHAT-stating
  enumerations (items 3 and 4 just narrated the lines below) and
  removed a navigation comment about hoisted helpers — the helpers are
  visible at the top of the same method.

84 unit tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 5 — design-doc commitments + correctness

Five critical findings + four suggestions, all closed.

Critical:
1. Wrong base branch for agent isolation. `createUserWorktree(slug)` with
   no `baseBranch` arg fell back to `getCurrentBranch()` on the **main**
   working tree, returning `main` regardless of which branch the user
   was actually on. A subagent invoked from `feature-x` would silently
   start from `main` and produce diffs against the wrong baseline.
   `enter_worktree` had the same bug. Both now resolve the parent's
   current branch first and pass it explicitly. Verified end-to-end:
   `git checkout feature-x` → `enter_worktree` → worktree HEAD includes
   the feature-x commit.

2. `countWorktreeChanges` (used by `exit_worktree`'s dirty-state guard)
   missed `status.conflicted[]`. In simple-git that array is mutually
   exclusive with the staged/modified/etc. arrays, so a worktree
   mid-merge with only conflicts looked `{tracked: 0, untracked: 0}`
   to the guard and `action='remove'` would proceed without
   `discard_changes: true`. Added `+ status.conflicted.length`.

3. `exit_worktree` had no session-ownership check, contradicting the
   design doc's "only operates on worktrees created by THIS session".
   In yolo mode a prompt injection could enumerate `.qwen/worktrees/`
   and pass any name to drop another session's work. Now:
   `enter_worktree` and agent isolation write a `.qwen-session`
   marker into the worktree at provisioning time; `exit_worktree
   action='remove'` reads it and refuses if it does not match the
   current `Config.getSessionId()`. Worktrees from before this guard
   (no marker file) are treated as "owner unknown" — allowed with a
   warn log so the change is observable.

4. `enter_worktree` did not refuse nested invocations from inside an
   existing worktree, contradicting the design doc. Now rejects any
   cwd containing `.qwen/worktrees/` as a path component, with a
   clear "Already inside a git worktree…" message. Verified: enter
   from inside a worktree returns is_error with that text.

6. `hasTrackedChanges` (cleanup sweep) had the same `conflicted[]`
   gap. Rewrote to use raw `git status --porcelain --untracked-files=no`
   which lists every tracked change including `UU` conflict markers
   in a single git call and explicitly skips the untracked walk
   (the prior comment claimed to skip it, but `status()` always
   does the scan).

Suggestion:
7. `buildWorktreeNotice` now receives the parent agent's actual
   `getTargetDir()` again (was switched to `repoRoot` in round 3 on
   a different reviewer's suggestion; round-5 caught that the model's
   inherited paths reference the parent's cwd, not necessarily the
   repo root, so the prior behaviour was correct).

8. Startup sweep now does `fs.access(<targetDir>/.qwen/worktrees)`
   *before* importing GitWorktreeService and spawning `git
   rev-parse --show-toplevel`. The git probe is reserved for users
   who actually have a worktrees directory locally — 99% of users
   pay only one syscall on startup.

9. Tests:
   - New `exit-worktree.test.ts` covers metadata, validation,
     `getDefaultPermission` (ask vs allow), and getDescription.
   - `agent.test.ts` adds three `validateToolParams` cases for the
     `isolation` parameter (accepted with subagent_type, rejected
     without, rejected for non-"worktree" values).
   - `enter-worktree.test.ts` adds round-trip tests for
     `writeWorktreeSessionMarker` / `readWorktreeSessionMarker` plus
     a `worktreeBranchForSlug` sanity check.
   - Total: 101 tests pass (was 86 → +15).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): drop unused @ts-expect-error in exit-worktree.test.ts

Empty string `''` is a valid `string` type, so the @ts-expect-error
directive on `validateToolParams({ name: '', action: 'keep' })` did
nothing — TypeScript correctly accepted the line, and `tsc --build`
in CI reported TS2578 ("Unused '@ts-expect-error' directive"). The
runtime assertion already covers the case; the directive was leftover
from an earlier draft.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): use importActual in ArenaManager mock to preserve new exports

The Arena test mocks `gitWorktreeService.js` with a factory that
returns only `{ GitWorktreeService }`. PR QwenLM#4073 added several other
exports to that module (`AGENT_WORKTREE_SLUG_PATTERN`,
`WORKTREE_BRANCH_PREFIX`, `worktreeBranchForSlug`,
`generateAgentWorktreeSlug`, `writeWorktreeSessionMarker`,
`readWorktreeSessionMarker`, `WORKTREE_SESSION_FILE`).

Other modules in the dep graph reach the mocked surface — most
notably `worktreeCleanup.ts` imports `AGENT_WORKTREE_SLUG_PATTERN`
and `worktreeBranchForSlug`, and now reaches the mock via the static
`config.ts` → `worktreeCleanup.ts` import chain added in the
self-review pass. The Arena test failed at module-load with:

  Caused by: Error: [vitest] No "AGENT_WORKTREE_SLUG_PATTERN" export
  is defined on the "../../services/gitWorktreeService.js" mock. Did
  you forget to return it from "vi.mock"?

Use `importOriginal` to capture every real export, spread it into
the return object, and only replace `GitWorktreeService` (the class
the test actually needs to mock). The class-level mock keeps its
existing static-method shims.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 6 (5 critical + 6 suggestions)

The biggest item — #1 — is a self-inflicted regression from round 5:
the new agent- prefix reservation in `validateUserWorktreeSlug`
rejected EVERY slug that `generateAgentWorktreeSlug` produces, since
that helper emits exactly `agent-<7hex>`. Net effect: every
`AgentTool isolation: 'worktree'` invocation failed at validation.
The reservation now allows the canonical pattern through (everything
the helper can produce) and only rejects user-chosen `agent-*` names
that don't match it. Added a round-trip regression guard: 50
`generateAgentWorktreeSlug()` outputs are fed back through
`validateUserWorktreeSlug` and must all pass.

Other critical fixes:

2. `hasWorktreeChanges` (used by agent isolation cleanup) was the
   one remaining caller relying solely on `status.isClean()`.
   Defensive `|| status.conflicted.length > 0` so a future simple-git
   bookkeeping change can't let a mid-merge worktree appear clean and
   get auto-deleted.

3. `readWorktreeSessionMarker` swallowed every I/O error as "marker
   missing", which let a disk error / EACCES silently bypass the
   session-ownership guard. ENOENT is still treated as missing
   (legitimate); every other code now logs.

4. `exit_worktree` `fs.stat` catch was the same shape — every error
   collapsed to "Worktree not found". ENOENT → not found; everything
   else logs and returns a distinct "cannot access" error.

5. `cleanupStaleAgentWorktrees` `fs.stat` catch was again the same.
   ENOENT → silently skip (entry vanished between readdir and stat);
   everything else logs.

Suggestions:

6. Startup sweep fast-bail was running BEFORE resolving the repo
   top-level. For monorepo subdir launches, `targetDir/.qwen/worktrees`
   never exists and the sweep early-returned — permanently a no-op.
   Now resolves the root first, then fast-bails against the resolved
   `<root>/.qwen/worktrees`. Also logs the skip case so operators can
   tell "skipped" from "ran, found nothing".

7. `.qwen-session` marker was visible to `git add -A` inside the
   worktree. Now writes a `.git/info/exclude` rule (resolved via
   `git rev-parse --git-dir`, since worktree `.git` is a file
   pointing at the parent repo's `.git/worktrees/<name>/`).
   Best-effort: failure to write the rule does not abort
   provisioning.

8. Agent isolation now refuses to provision when the parent's cwd is
   already inside a worktree — same regex guard as `enter_worktree`.

9. `exit_worktree`'s wrapper around `hasUnmergedWorktreeCommits` now
   logs at the call site so the chain (caller → reason it asked →
   underlying git error) is complete in operator logs.

10. Sweep now logs unconditionally at `info`. Three distinct messages:
    "skipped (no worktrees dir)", "ran, nothing to remove", "removed N".

Tests:

11. New `execute()` coverage:
    • exit-worktree: session-ownership refusal, keep happy path,
      legacy/no-marker fallthrough with warn log, missing-worktree
      error, unmerged-commits guard with `discard_changes: true`,
      `writeWorktreeSessionMarker` round-trip.
    • enter-worktree: nested-guard rejection, non-git-repo error.
    These spin up real temp git repos (no filesystem mocking) and
    drive the actual tool invocation pipeline.

   Total: 135 tests pass (was 101 → +34).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(worktree): demote noise startup-sweep logs to debug

Self-review pass applying the round-6 review-triage framework
(filter #5: "If a log only fires on the happy path, it's noise.")
to my own round-6 changes:

- "Stale worktree sweep skipped: <dir> does not exist" — fires on
  every CLI start for ~99% of users who never use worktrees.
- "Stale worktree sweep ran under <root>: nothing to remove" —
  fires on every CLI start for users who have any worktrees but
  no stale ones at the moment.

Both are happy-path noise at `info`. Demoted to `debug` so an
operator can opt in via `--debug` when they want to confirm the
sweep is wired up, but normal output stays clean.

Only the actually-actionable case ("removed N worktrees") stays at
`info` — that's the signal someone chasing a worktree leak would
grep for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): close AUTO_EDIT bypass + parent-dirty stale-code hazard

Round-7 review caught two correctness gaps:

1. exit_worktree action='remove' was still auto-approved in AUTO_EDIT
   `getDefaultPermission` returning 'ask' is necessary but not
   sufficient. `permissionFlow.isAutoEditApproved` auto-approves any
   tool whose `confirmationDetails.type` is 'edit' OR 'info', and
   `BaseToolInvocation` returns 'info' by default. So a session in
   AUTO_EDIT could silently destroy a worktree (with branch deletion)
   without a confirmation prompt — the data-loss path the round-1
   `'ask'` switch was meant to close. Now overrides
   `getConfirmationDetails` to return `type: 'exec'` for action=remove,
   which keeps the prompt in AUTO_EDIT. The `keep` action still falls
   through to the base info-type since it is non-destructive.

   Regression-guard test asserts the type is 'exec' (not 'info') for
   remove and that the command field describes both the worktree-remove
   and branch-delete operations.

2. Agent isolation worktrees ran against parent's HEAD, not its
   working tree
   `git worktree add -b <branch> <path> <base>` only checks out the
   base ref's tip — uncommitted edits in the parent's working tree do
   NOT propagate. The "edit code → ask review/test agent before
   committing" workflow silently ran the subagent against the
   pre-edit HEAD and returned results that looked authoritative but
   reflected stale code.

   Reviewer offered two options: overlay parent's dirty state à la
   Arena (~50 LOC, edge cases), or refuse isolation when parent is
   dirty (~10 LOC, clear UX). Chose the latter for Phase B scope —
   simpler, decisive, and matches the design-doc's explicit
   commitment that dirty-state overlay is Arena-specific. Users can
   commit/stash before re-invoking agent isolation; overlay can be a
   follow-up if users complain about the friction.

   Fail-closed on the dirty-check itself (assume dirty rather than
   silently launch on a possibly-stale tree).

   Test exercises both "dirty parent → guard fires" and
   "clean parent → guard passes" against real temp git repos.

139 unit tests pass (was 135, +4 regression guards).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
(cherry picked from commit 609e05b)
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 23, 2026
…e + Agent isolation (QwenLM#4073)

* feat(tools): add generic worktree support (Phase A + B of QwenLM#4056)

Adds first-class git worktree as a general-purpose capability:

Phase A — User-facing tools
- enter_worktree: creates `<projectRoot>/.qwen/worktrees/<slug>` on a
  `worktree-<slug>` branch and returns the absolute path. Slug auto-generated
  when omitted; validated against path traversal and disallowed characters.
- exit_worktree: keeps or removes the worktree (and its branch). Refuses to
  remove a worktree with uncommitted tracked changes or untracked files
  unless `discard_changes: true` is set.

Phase B — Agent isolation
- Agent tool gains an `isolation: 'worktree'` parameter that provisions a
  temporary `agent-<7hex>` worktree, prepends a worktree notice to the task
  prompt, and on completion either removes the worktree (no changes) or
  preserves it and reports its path/branch in the result. Background and
  foreground execution paths both wired up; rejected for fork agents.
- worktreeCleanup.cleanupStaleAgentWorktrees: fail-closed sweep for
  ephemeral `agent-<7hex>` worktrees older than 30 days with no tracked
  changes and no unpushed commits. User-named worktrees are never swept.
- buildWorktreeNotice helper for fork subagents (parity with claude-code).

Arena compatibility
- The existing Arena worktree implementation (GitWorktreeService.setupWorktrees,
  ArenaManager, agents.arena.worktreeBaseDir) is untouched. Arena uses its
  own batch APIs and `~/.qwen/arena` base dir; the new general-purpose APIs
  live alongside under `<projectRoot>/.qwen/worktrees/`.

Subagent safety
- enter_worktree / exit_worktree are added to EXCLUDED_TOOLS_FOR_SUBAGENTS
  so a subagent cannot mutate the parent session's worktree state.

Refs QwenLM#4056

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(worktree): use path.join in expected paths so the test passes on Windows

The Windows CI run reported `enter-worktree.test.ts` failing because the
expected string was hardcoded with `/` while `getUserWorktreesDir()` uses
`path.join`, which returns `\\` on Windows. Build the expected path via
`path.join` so the platform-correct separator is compared.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(enter-worktree): treat empty name as auto-generate

Some models pass `{ "name": "" }` when calling EnterWorktree, because the
schema marks `name` as optional and they emit an empty placeholder. The
previous validation rejected the empty string with "Worktree name must be
a non-empty string", which surprised users running the auto-slug path.

Now both `validateToolParams` and `execute` treat `name: ""` as equivalent
to `name: undefined` and fall back to the auto-generated `{adj}-{noun}-{4hex}`
slug. Explicit invalid slugs (`'../etc'`, `'a/b'`, etc.) are still rejected
as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review findings 1-6 from PR QwenLM#4073

Six issues raised on the initial review; each addressed with a verifiable
guarantee.

1. Real isolation for `agent isolation: 'worktree'`
   Before: subagent's Config still resolved `getTargetDir()` to the parent
   project root, so Edit/Write/Read workspace checks and Shell's default cwd
   silently operated on the parent tree. The cleanup helper then saw a
   "clean" worktree and removed it — destroying the evidence.
   After: the worktree is provisioned BEFORE `createApprovalModeOverride`,
   and the resulting agent Config has `getTargetDir`/`getCwd`/`getWorkingDir`
   rebound to the worktree path. Relative paths, unqualified shell
   commands, and glob/grep roots all confine to the worktree.

2. `exit_worktree action='remove'` now prompts in default/auto-edit modes
   Added `getDefaultPermission()` on the invocation: `'ask'` when action is
   `remove`, `'allow'` when `keep`. Brings it in line with edit, write_file,
   and run_shell_command.

3. Force-delete no longer silently destroys unpushed commits
   `removeUserWorktree` now uses `git branch -d` (refuses unmerged) by
   default and surfaces `branchPreserved: true` when git refuses. Added
   `hasUnmergedWorktreeCommits` (checks if branch tip is reachable from any
   other local branch or remote ref). Both the agent isolation cleanup and
   `exit_worktree action='remove'` use this check: if the branch has work
   not covered elsewhere, the worktree+branch are preserved even when
   `discard_changes: true` is set (there is no `discard_commits` flag —
   committed work is rarely what `remove` means to discard).

4. Both new tools are now deferred behind ToolSearch
   `shouldDefer: true` + `searchHint` on both. Verified via openai-logging:
   `enter_worktree` and `exit_worktree` no longer appear in the function-
   declaration list sent on every API request.

5. Stale-worktree cleanup is wired in
   `Config.initialize()` fires `cleanupStaleAgentWorktrees(targetDir)` as a
   non-awaited startup sweep (skipped in bare mode). Picks up orphaned
   `agent-<7hex>` worktrees left by crashed runs.

6. Foreground isolation no longer leaks on uncaught throw
   The foreground try block tracks whether the cleanup helper ran on the
   success path; the finally block invokes it as a fallback when the try
   bailed early. Mirrors the background path's pattern.

Verification:
- Unit tests: 83 passed (16 worktree + 64 existing agent + 3 cleanup) — no
  regressions.
- E2E #1: agent told to write `hello.txt` via RELATIVE path — file landed
  at `.qwen/worktrees/agent-XXXXXXX/hello.txt`, NOT at the parent root.
- E2E #3: created worktree, committed work inside it, called exit_worktree
  with `discard_changes=true` — refused with clear message; worktree and
  branch both preserved.
- E2E #4: openai-logging confirms worktree tools absent from API tool list
  (7 tools sent instead of 9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 2 findings (1 from tanzhenxin, 7+8 from wenshao)

The first round closed the data-loss-class issues. This round addresses
follow-ups from a deeper audit:

1. Stale-worktree sweep was inert on common-case repos
   `cleanupStaleAgentWorktrees` previously ran `git log --branches --not
   --remotes --oneline` from each worktree's directory — that lists
   unpushed commits across EVERY local branch, not just the worktree's
   own branch. On any repo with no remote configured (or with stray
   unpushed branches), the sweep refused to remove every candidate.
   Replaced with `service.hasUnmergedWorktreeCommits(slug)` which scopes
   the check to the worktree branch via `for-each-ref --contains <tip>`.
   Also added the `branchPreserved` warn log requested in M7 and an
   `fs.access` shortcut for the empty-worktrees-dir case (M8).

2. `cleanupWorktreeIsolation` and `worktreeIsolation` were inside the
   inner try (~660 lines from the outer catch). Hoisted both to the top
   of `execute()` so the outer catch can reap or preserve the worktree
   when anything between provisioning and the inner try throws (e.g.
   `createApprovalModeOverride`, agent creation). Closure carries the
   resolved `repoRoot` so cleanup never has to re-resolve.

3. Background error path discarded the cleanup result. Now captures
   `formatWorktreeSuffix(...)` and appends it to the registry's failure
   /cancel message, so users see the preserved path/branch even when
   the agent crashed before reporting.

4. `cleanupWorktreeIsolation` now treats `result.success === false` as
   "worktree still on disk" and surfaces it as preserved instead of
   silently dropping it from the result.

5. Override was incomplete. Several Config methods read `this.targetDir`
   directly (`getProjectRoot`, `getFileService`, etc.) — own-property
   getter overrides did not redirect them. Now also shadows `targetDir`
   and `cwd` as own properties on the agent's Config override, swaps in
   a `FileDiscoveryService` rooted at the worktree, and rebuilds
   `WorkspaceContext` to point at the worktree only. Verified
   end-to-end: shell `pwd > pwd-record.txt` (no directory arg) lands at
   `.qwen/worktrees/agent-<7hex>/pwd-record.txt`, not the parent root.

6. monorepo subdir issue. Both `enter_worktree` and the agent isolation
   path now resolve `git rev-parse --show-toplevel` first and anchor
   `.qwen/worktrees/<slug>` at the repo root. Worktrees created from
   any subdirectory now end up where the startup sweep can find them.

7. Replaced `git worktree add -B` (silent force-reset of pre-existing
   branches) with `git worktree add -b` plus an explicit existence
   check via `git for-each-ref` (NOT `show-ref --quiet`, which
   simple-git swallows). Pre-existing `worktree-<slug>` branches now
   trigger a clear error instead of clobbering committed work.

8. First worktree creation in a repo writes `<projectRoot>/.qwen/.gitignore`
   with `worktrees/` so worktree contents stay out of the parent's
   `git status`, glob/grep results, and bundle tools. Idempotent: never
   overwrites an existing file.

9. Logging across the failure paths (`enter_worktree` errors,
   `agent.ts:failWorktreeProvisioning`, `cleanupWorktreeIsolation`,
   `hasUnmergedWorktreeCommits` swallowed errors,
   `cleanupStaleAgentWorktrees`'s `branchPreserved` race).

10. `exit_worktree` no longer suggests `discard_changes: true` when the
    git status check itself fails — that would be advising the user to
    bypass a safety check whose precondition is unknown. Now points at
    the underlying repo problem.

11. `generateAutoSlug` switched from `Math.random()` (4 hex, weak RNG,
    one-in-65k collision) to `randomBytes` (6 hex, ~16M combinations).
    Two RNG sources in this file collapsed to one.

Pushed back: the TOCTOU swap in `removeUserWorktree` (S6 round 1) is
left as-is — `git branch -d` is the real safety, and reordering does
not eliminate the window. Windows reserved-name validation (M5 round 2)
deferred to a follow-up; the current allowlist already rejects path
separators, `..`, leading dot/dash, and the >64-char case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): use randomInt to silence CodeQL biased-modulo finding

CodeQL's `js/biased-cryptographic-random` flagged
`randomBytes(4)[i] % ARRAY.length` in `generateAutoSlug`. The math is
actually exact for the current word-list lengths (256 % 8 == 0), but
the lint rule does not know that — and a future contributor changing
the list to a non-power-of-two length would silently introduce bias.

Switched the index lookups to `crypto.randomInt(0, length)`, which uses
rejection sampling and is uniform by construction. Suffix still uses
`randomBytes(3).toString('hex')` since hex encoding is unbiased.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 3 findings 1-6 from PR QwenLM#4073

The previous round added `getRepoTopLevel` for `enter_worktree`'s
provisioning, but missed three sibling call sites that still used the
raw cwd. The double-cleanup race in the foreground path also leaked
stale `[worktree preserved]` suffixes on rejected promises. All six
findings from the deeper audit are addressed:

1. exit_worktree now resolves through `getRepoTopLevel()` before
   building its `GitWorktreeService`, mirroring `enter_worktree`. Without
   this, launching `qwen` from a monorepo subdirectory created the
   worktree under the repo root but exit_worktree looked under the
   subdir's `.qwen/worktrees/` and always returned "Worktree not found".
   Verified end-to-end: enter + exit from `packages/core/` works.

2. agent.ts cleanup helper now nulls `worktreeIsolation` immediately
   after capturing the closure value. The previous structure could
   reach the helper twice — once in the foreground try's success path
   and once in the foreground finally fallback (or once in the inner
   try and once in the outer catch on a thrown rejection). The second
   call would `hasWorktreeChanges()` against a directory the first
   call already removed, fail-closed, and emit a bogus
   `[worktree preserved: <missing path>]` suffix.

3. Config.initialize's startup sweep now resolves `getRepoTopLevel()`
   before invoking `cleanupStaleAgentWorktrees`. Without this, every
   subdir launch scanned a non-existent `<subdir>/.qwen/worktrees/`
   and the 30-day expiry sweep was permanently a no-op.

4. agent.ts's `buildWorktreeNotice` now passes
   `worktreeIsolation.repoRoot` as `parentCwd` instead of
   `this.config.getTargetDir()`. The notice's path-translation
   guidance (≈ "translate paths from <parent> to <worktree>") would
   otherwise misdirect the subagent in a monorepo subdir launch.

5. Removed dead method `GitWorktreeService.listUserWorktrees`. It had
   no callers anywhere in the codebase and used `execSync` in a loop
   (would have blocked the event loop if anyone wired it up).

6. `localBranchExists` no longer swallows git failures silently. The
   defensive `false` default is preserved (so `git worktree add -b`
   itself surfaces the conflict if the check missed an existing
   branch), but the catch now logs via `debugLogger.warn` so disk-full
   / permission / ref-store-corruption cases are visible in debug
   output instead of being invisible.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 4 findings (data-loss + visibility)

Seven actionable findings from a deeper audit, all closed:

1. User worktree slugs could collide with ephemeral-agent shape
   `validateUserWorktreeSlug` did not reject names starting with
   `agent-`, so a user-named `agent-1234567` matched the cleanup regex
   `/^agent-[0-9a-f]{7}$/` and would be silently swept after 30 days
   along with whatever work was in it. Now reserved — clear error
   message points users at the cause.

2. Slug producer and consumer were string-coupled across files
   `agent.ts` hardcoded `agent-${hex(7)}` and `worktreeCleanup.ts`
   independently hardcoded `/^agent-[0-9a-f]{7}$/`. Future change to
   hex length on one side would silently break the other. Lifted
   `AGENT_WORKTREE_PREFIX`, `AGENT_WORKTREE_HEX_LENGTH`,
   `AGENT_WORKTREE_SLUG_PATTERN`, and `generateAgentWorktreeSlug()` to
   `gitWorktreeService.ts`; both call sites import them.

3. Startup sweep was invisible at default log level
   Fire-and-forget sweep used `debug` for errors and discarded the
   success count. A leak-chasing operator had no log breadcrumb.
   Errors promoted to `warn`; successful removals (count > 0) logged
   at `info`.

4. `getRepoTopLevel()` silent catch
   Returned `null` on any git failure with no log. Combined with
   `?? cwd` fallback in callers, a flaky git would have made worktree
   creators and the startup sweep disagree silently about which dir to
   use. Now logs the underlying error.

5. `hasTrackedChanges()` silent catch
   Cleanup's fail-closed `return true` had no log. Couldn't tell
   "has real changes — leave alone" from "git index unreadable — repo
   may be corrupt". Now logs.

6. `cleanupWorktreeIsolation` claimed `preservedPath` for a removed dir
   When `removeUserWorktree` returns `{ success: true, branchPreserved:
   true }` it has already deleted the directory and failed only on
   `git branch -d`. The helper still reported the (now non-existent)
   path as preserved. Now returns only `preservedBranch` for that
   case; `formatWorktreeSuffix` emits a distinct message instructing
   recovery via `git worktree add <new-path> <branch>`.

7. `removeUserWorktree` swallowed branch-delete failures
   Both `-d` and `-D` catch blocks were empty. Locked refs, perms,
   disk full all looked identical to "unmerged commits". Both now
   `debugLogger.warn` with the underlying error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(worktree): self-review pass — reuse, parallelism, dead code

Self-review caught a handful of issues across three categories:

Reuse:
- `pathExists` in the new code now uses the existing `fileExists` from
  `utils/fileUtils.ts` instead of duplicating an `fs.access` wrapper.
- `worktree-` branch prefix was string-literalled in five places. Added
  `WORKTREE_BRANCH_PREFIX` and `worktreeBranchForSlug(slug)` exports in
  `gitWorktreeService.ts`; updated `gitWorktreeService.ts`,
  `worktreeCleanup.ts`, and `exit-worktree.ts` to use them. Future
  prefix changes are a single edit.

Efficiency:
- `Config.initialize` used two `await import(...)` calls inside the
  startup-sweep IIFE, paying that cost on every CLI start. Switched to
  static imports at the top of `config.ts` — the modules are tiny and
  the dynamic indirection bought nothing.
- `cleanupWorktreeIsolation` in `agent.ts` ran `hasWorktreeChanges` and
  `hasUnmergedWorktreeCommits` sequentially. They have no data
  dependency on each other and each spawns its own `git` invocation;
  `Promise.all` halves the cleanup wall-clock on the common path.
  Same fix in `worktreeCleanup.ts`'s per-entry loop.
- `ensureWorktreesGitignored` used `fs.access` then `fs.writeFile`, a
  TOCTOU race when two agent invocations created worktrees concurrently
  (both could pass the `access` check and the second would clobber the
  first's `.gitignore`). Now writes with `flag: 'wx'` and treats
  `EEXIST` as the no-op case — atomic in one syscall.

Quality:
- Dropped the `worktreeCleanupRan` boolean in the foreground execution
  path. `cleanupWorktreeIsolation` already nulls its closure variable
  at the top of every call (see the comment at its definition), so
  re-entries are no-ops. The boolean and its tracking were dead weight
  that obscured the real guard.
- Trimmed the Phase-2 override comment block to drop the WHAT-stating
  enumerations (items 3 and 4 just narrated the lines below) and
  removed a navigation comment about hoisted helpers — the helpers are
  visible at the top of the same method.

84 unit tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 5 — design-doc commitments + correctness

Five critical findings + four suggestions, all closed.

Critical:
1. Wrong base branch for agent isolation. `createUserWorktree(slug)` with
   no `baseBranch` arg fell back to `getCurrentBranch()` on the **main**
   working tree, returning `main` regardless of which branch the user
   was actually on. A subagent invoked from `feature-x` would silently
   start from `main` and produce diffs against the wrong baseline.
   `enter_worktree` had the same bug. Both now resolve the parent's
   current branch first and pass it explicitly. Verified end-to-end:
   `git checkout feature-x` → `enter_worktree` → worktree HEAD includes
   the feature-x commit.

2. `countWorktreeChanges` (used by `exit_worktree`'s dirty-state guard)
   missed `status.conflicted[]`. In simple-git that array is mutually
   exclusive with the staged/modified/etc. arrays, so a worktree
   mid-merge with only conflicts looked `{tracked: 0, untracked: 0}`
   to the guard and `action='remove'` would proceed without
   `discard_changes: true`. Added `+ status.conflicted.length`.

3. `exit_worktree` had no session-ownership check, contradicting the
   design doc's "only operates on worktrees created by THIS session".
   In yolo mode a prompt injection could enumerate `.qwen/worktrees/`
   and pass any name to drop another session's work. Now:
   `enter_worktree` and agent isolation write a `.qwen-session`
   marker into the worktree at provisioning time; `exit_worktree
   action='remove'` reads it and refuses if it does not match the
   current `Config.getSessionId()`. Worktrees from before this guard
   (no marker file) are treated as "owner unknown" — allowed with a
   warn log so the change is observable.

4. `enter_worktree` did not refuse nested invocations from inside an
   existing worktree, contradicting the design doc. Now rejects any
   cwd containing `.qwen/worktrees/` as a path component, with a
   clear "Already inside a git worktree…" message. Verified: enter
   from inside a worktree returns is_error with that text.

6. `hasTrackedChanges` (cleanup sweep) had the same `conflicted[]`
   gap. Rewrote to use raw `git status --porcelain --untracked-files=no`
   which lists every tracked change including `UU` conflict markers
   in a single git call and explicitly skips the untracked walk
   (the prior comment claimed to skip it, but `status()` always
   does the scan).

Suggestion:
7. `buildWorktreeNotice` now receives the parent agent's actual
   `getTargetDir()` again (was switched to `repoRoot` in round 3 on
   a different reviewer's suggestion; round-5 caught that the model's
   inherited paths reference the parent's cwd, not necessarily the
   repo root, so the prior behaviour was correct).

8. Startup sweep now does `fs.access(<targetDir>/.qwen/worktrees)`
   *before* importing GitWorktreeService and spawning `git
   rev-parse --show-toplevel`. The git probe is reserved for users
   who actually have a worktrees directory locally — 99% of users
   pay only one syscall on startup.

9. Tests:
   - New `exit-worktree.test.ts` covers metadata, validation,
     `getDefaultPermission` (ask vs allow), and getDescription.
   - `agent.test.ts` adds three `validateToolParams` cases for the
     `isolation` parameter (accepted with subagent_type, rejected
     without, rejected for non-"worktree" values).
   - `enter-worktree.test.ts` adds round-trip tests for
     `writeWorktreeSessionMarker` / `readWorktreeSessionMarker` plus
     a `worktreeBranchForSlug` sanity check.
   - Total: 101 tests pass (was 86 → +15).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): drop unused @ts-expect-error in exit-worktree.test.ts

Empty string `''` is a valid `string` type, so the @ts-expect-error
directive on `validateToolParams({ name: '', action: 'keep' })` did
nothing — TypeScript correctly accepted the line, and `tsc --build`
in CI reported TS2578 ("Unused '@ts-expect-error' directive"). The
runtime assertion already covers the case; the directive was leftover
from an earlier draft.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): use importActual in ArenaManager mock to preserve new exports

The Arena test mocks `gitWorktreeService.js` with a factory that
returns only `{ GitWorktreeService }`. PR QwenLM#4073 added several other
exports to that module (`AGENT_WORKTREE_SLUG_PATTERN`,
`WORKTREE_BRANCH_PREFIX`, `worktreeBranchForSlug`,
`generateAgentWorktreeSlug`, `writeWorktreeSessionMarker`,
`readWorktreeSessionMarker`, `WORKTREE_SESSION_FILE`).

Other modules in the dep graph reach the mocked surface — most
notably `worktreeCleanup.ts` imports `AGENT_WORKTREE_SLUG_PATTERN`
and `worktreeBranchForSlug`, and now reaches the mock via the static
`config.ts` → `worktreeCleanup.ts` import chain added in the
self-review pass. The Arena test failed at module-load with:

  Caused by: Error: [vitest] No "AGENT_WORKTREE_SLUG_PATTERN" export
  is defined on the "../../services/gitWorktreeService.js" mock. Did
  you forget to return it from "vi.mock"?

Use `importOriginal` to capture every real export, spread it into
the return object, and only replace `GitWorktreeService` (the class
the test actually needs to mock). The class-level mock keeps its
existing static-method shims.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): address review round 6 (5 critical + 6 suggestions)

The biggest item — #1 — is a self-inflicted regression from round 5:
the new agent- prefix reservation in `validateUserWorktreeSlug`
rejected EVERY slug that `generateAgentWorktreeSlug` produces, since
that helper emits exactly `agent-<7hex>`. Net effect: every
`AgentTool isolation: 'worktree'` invocation failed at validation.
The reservation now allows the canonical pattern through (everything
the helper can produce) and only rejects user-chosen `agent-*` names
that don't match it. Added a round-trip regression guard: 50
`generateAgentWorktreeSlug()` outputs are fed back through
`validateUserWorktreeSlug` and must all pass.

Other critical fixes:

2. `hasWorktreeChanges` (used by agent isolation cleanup) was the
   one remaining caller relying solely on `status.isClean()`.
   Defensive `|| status.conflicted.length > 0` so a future simple-git
   bookkeeping change can't let a mid-merge worktree appear clean and
   get auto-deleted.

3. `readWorktreeSessionMarker` swallowed every I/O error as "marker
   missing", which let a disk error / EACCES silently bypass the
   session-ownership guard. ENOENT is still treated as missing
   (legitimate); every other code now logs.

4. `exit_worktree` `fs.stat` catch was the same shape — every error
   collapsed to "Worktree not found". ENOENT → not found; everything
   else logs and returns a distinct "cannot access" error.

5. `cleanupStaleAgentWorktrees` `fs.stat` catch was again the same.
   ENOENT → silently skip (entry vanished between readdir and stat);
   everything else logs.

Suggestions:

6. Startup sweep fast-bail was running BEFORE resolving the repo
   top-level. For monorepo subdir launches, `targetDir/.qwen/worktrees`
   never exists and the sweep early-returned — permanently a no-op.
   Now resolves the root first, then fast-bails against the resolved
   `<root>/.qwen/worktrees`. Also logs the skip case so operators can
   tell "skipped" from "ran, found nothing".

7. `.qwen-session` marker was visible to `git add -A` inside the
   worktree. Now writes a `.git/info/exclude` rule (resolved via
   `git rev-parse --git-dir`, since worktree `.git` is a file
   pointing at the parent repo's `.git/worktrees/<name>/`).
   Best-effort: failure to write the rule does not abort
   provisioning.

8. Agent isolation now refuses to provision when the parent's cwd is
   already inside a worktree — same regex guard as `enter_worktree`.

9. `exit_worktree`'s wrapper around `hasUnmergedWorktreeCommits` now
   logs at the call site so the chain (caller → reason it asked →
   underlying git error) is complete in operator logs.

10. Sweep now logs unconditionally at `info`. Three distinct messages:
    "skipped (no worktrees dir)", "ran, nothing to remove", "removed N".

Tests:

11. New `execute()` coverage:
    • exit-worktree: session-ownership refusal, keep happy path,
      legacy/no-marker fallthrough with warn log, missing-worktree
      error, unmerged-commits guard with `discard_changes: true`,
      `writeWorktreeSessionMarker` round-trip.
    • enter-worktree: nested-guard rejection, non-git-repo error.
    These spin up real temp git repos (no filesystem mocking) and
    drive the actual tool invocation pipeline.

   Total: 135 tests pass (was 101 → +34).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(worktree): demote noise startup-sweep logs to debug

Self-review pass applying the round-6 review-triage framework
(filter #5: "If a log only fires on the happy path, it's noise.")
to my own round-6 changes:

- "Stale worktree sweep skipped: <dir> does not exist" — fires on
  every CLI start for ~99% of users who never use worktrees.
- "Stale worktree sweep ran under <root>: nothing to remove" —
  fires on every CLI start for users who have any worktrees but
  no stale ones at the moment.

Both are happy-path noise at `info`. Demoted to `debug` so an
operator can opt in via `--debug` when they want to confirm the
sweep is wired up, but normal output stays clean.

Only the actually-actionable case ("removed N worktrees") stays at
`info` — that's the signal someone chasing a worktree leak would
grep for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(worktree): close AUTO_EDIT bypass + parent-dirty stale-code hazard

Round-7 review caught two correctness gaps:

1. exit_worktree action='remove' was still auto-approved in AUTO_EDIT
   `getDefaultPermission` returning 'ask' is necessary but not
   sufficient. `permissionFlow.isAutoEditApproved` auto-approves any
   tool whose `confirmationDetails.type` is 'edit' OR 'info', and
   `BaseToolInvocation` returns 'info' by default. So a session in
   AUTO_EDIT could silently destroy a worktree (with branch deletion)
   without a confirmation prompt — the data-loss path the round-1
   `'ask'` switch was meant to close. Now overrides
   `getConfirmationDetails` to return `type: 'exec'` for action=remove,
   which keeps the prompt in AUTO_EDIT. The `keep` action still falls
   through to the base info-type since it is non-destructive.

   Regression-guard test asserts the type is 'exec' (not 'info') for
   remove and that the command field describes both the worktree-remove
   and branch-delete operations.

2. Agent isolation worktrees ran against parent's HEAD, not its
   working tree
   `git worktree add -b <branch> <path> <base>` only checks out the
   base ref's tip — uncommitted edits in the parent's working tree do
   NOT propagate. The "edit code → ask review/test agent before
   committing" workflow silently ran the subagent against the
   pre-edit HEAD and returned results that looked authoritative but
   reflected stale code.

   Reviewer offered two options: overlay parent's dirty state à la
   Arena (~50 LOC, edge cases), or refuse isolation when parent is
   dirty (~10 LOC, clear UX). Chose the latter for Phase B scope —
   simpler, decisive, and matches the design-doc's explicit
   commitment that dirty-state overlay is Arena-specific. Users can
   commit/stash before re-invoking agent isolation; overlay can be a
   follow-up if users complain about the friction.

   Fail-closed on the dirty-check itself (assume dirty rather than
   silently launch on a possibly-stale tree).

   Test exercises both "dirty parent → guard fires" and
   "clean parent → guard passes" against real temp git repos.

139 unit tests pass (was 135, +4 regression guards).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 23, 2026
…enLM#4175 Wave 2.5 PR 10) (QwenLM#4237)

* feat(serve): SSE replay sizing + slow_client_warning backpressure

QwenLM#4175 Wave 2.5 PR 10. Closes the SSE replay / backpressure knobs
called out in QwenLM#3803 §02 so chatty Stage 1 sessions get an honest
reconnect window and operators get a heads-up signal before clients
are summarily evicted.

- **`DEFAULT_RING_SIZE` 4000 → 8000.** Per-session replay ring depth
  now matches the QwenLM#3803 §02 target for chatty sessions.
- **`--event-ring-size <n>`** CLI flag (default 8000) lets operators
  tune the ring per daemon. Threaded `ServeOptions` →
  `BridgeOptions.eventRingSize` → both `new EventBus()` construction
  sites (fresh sessions + restore path). Validation is fail-CLOSED
  (positive finite integer; 0 / NaN / negative throw at boot).
- **`slow_client_warning` SSE frame.** When a subscriber's queue
  crosses 75% full the bus force-pushes a synthetic
  `slow_client_warning` to that subscriber once per overflow
  episode, carrying `{queueSize, maxQueued, lastEventId}`. The flag
  re-arms after the queue drains below 37.5% (hysteresis, no flap
  near threshold). If the queue actually overflows after the
  warning, the existing `client_evicted` terminal frame path still
  fires. Like `client_evicted`, the warning has no `id` (synthetic
  frame; must not burn a sequence slot for other subscribers).
- **`?maxQueued=N`** query param on `GET /session/:id/events`
  (range `[16, 2048]`, default 256). Lets cold reconnect clients
  pre-size their per-subscriber backlog so a large `Last-Event-ID:
  0` replay doesn't trip the warning on the first publish. Range
  rationale: lower bound 16 (smaller is useless for any replay);
  upper bound 2048 (so a single subscriber can't pin ~1 MB just by
  asking). Out-of-range / non-decimal returns `400
  invalid_max_queued` BEFORE opening the SSE stream — clean 4xx
  beats half-opening a stream + emitting a `stream_error` (which
  EventSource would auto-reconnect on).
- **`slow_client_warning` capability tag** — single source of truth
  for the warning frame + `?maxQueued` query param + ring-size
  knob. Old daemons silently lack all of these; pre-flight via
  `caps.features`.
- **SDK extensions** (`@qwen-code/sdk`): typed
  `DaemonSlowClientWarningEvent` (added to known event union and
  `DaemonStreamLifecycleEvent`); schema-validated by a new
  `isSlowClientWarningData` predicate; reducer
  (`reduceDaemonSessionEvent`) increments `slowClientWarningCount`
  + stores `lastSlowClientWarning`. Warning is **non-terminal** —
  `alive` stays true (only `client_evicted` / `stream_error` /
  `session_died` close the stream). Re-exported from the public
  SDK entry.
- **Docs**: `qwen-serve-protocol.md` updates the features list (adds
  `slow_client_warning` and the previously-missing `client_identity`
  to match reality post-QwenLM#4231), documents the `?maxQueued` query
  param, adds the warning frame to the event table, and notes the
  new default ring size. `qwen-serve.md` adds the `--event-ring-size`
  flag row.

Tests: 19 eventBus (4 new: warning at 75%, once per episode,
no `id` on the synthetic frame, hysteresis re-arm), 106 bridge
(2 new: validate eventRingSize accept/reject), 111 server (4 new:
?maxQueued accept/absent/non-decimal/out-of-range +
EXPECTED_STAGE1_FEATURES update), 14 SDK daemonEvents (2 new:
schema validation + non-terminal reducer behavior). 321 focused
tests total, all green.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR QwenLM#4237 review feedback (eventBus polish)

Address the actionable items from the Qwen Code review bot's pass
on PR QwenLM#4237:

- Pre-compute `warnThreshold` / `warnResetThreshold` per
  `InternalSub` at `subscribe()` time so `publish()`'s per-event
  hot path is one integer compare per subscriber instead of a
  multiply + compare. The `!warned` short-circuit still collapses
  the steady state to a single boolean read; this just shaves a
  multiply when the threshold check actually fires.
- Document the back-of-queue ordering choice for the synthetic
  `slow_client_warning` frame in `EventBus.publish()`: front-push
  was considered but mid-stream front-insertion would mis-count
  `forcedInBuf` in `BoundedAsyncQueue.next()`, and `forcePush`
  already short-circuits via `resolvers.shift()` for the
  active-consumer case — the back-of-queue path only matters for
  stalled consumers, who can't drain regardless of warning
  position.
- Reuse the existing `collect()` helper in the "default ring size
  8000" test for consistency with the rest of the file; the new
  test also tightens the assertion by checking that the first
  retained event id is 2 (id=1 dropped by the ring) and the last
  is 8001.
- Soften the "~500 B per session" magic number in
  `BridgeOptions.eventRingSize`'s JSDoc to a qualitative
  description (each retained `BridgeEvent` is a reference plus its
  serialized payload; ceiling scales as
  `ringSize × average-event-size`).

Rejected:
- Bot's claim that the error JSON contains `\`...\`` escape
  sequences — bot misread the JS template-literal source as the
  wire output; `JSON.stringify` does not escape backticks, and
  the existing `cwd` error messages use the same style.
- Bot's "use `Record<string, never>` instead of `[key: string]:
  unknown`" suggestion on `DaemonSlowClientWarningData` — every
  other event-data type in `sdk-typescript/src/daemon/events.ts`
  carries the same index signature for additive-field
  compatibility.
- Bot's "features list breaks alphabetical order" — the
  capability list is grouped by protocol lifecycle (health →
  capabilities → session lifecycle → events → permissions), not
  alphabetical.

Tests: 139 focused tests across eventBus + httpAcpBridge + SDK
daemon events — all passing. Behavior unchanged; this is
hot-path micro-opt + comment polish only.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): correct queue tagging + plumb maxQueued through SDK

Address both P2 findings from the Codex review pass on PR QwenLM#4237.

**Bug 1: `BoundedAsyncQueue.forcedInBuf` position-invariant break**

The previous `forcedInBuf` counter only tracked LIVE-vs-FORCED
correctly when all forced entries lived at the FRONT of the buffer
(subscribe-time `Last-Event-ID` replay). The new mid-stream
`slow_client_warning` path force-pushes to the BACK of the queue
while the queue is still open, which the existing accounting was
not designed for:

  - publish 6 events at maxQueued=8 → 75% threshold trips →
    force-push warning at the back → buf=[1..6, warning],
    forcedInBuf=1.
  - consumer shifts `1` → forcedInBuf decremented to 0 (incorrect:
    `1` was a live frame, not the forced one).
  - consumer drains 2..6 + warning → buf=[], forcedInBuf=0, true
    live count = 0, but `size` getter and `push()` cap check then
    use `buf.length - forcedInBuf` which drifts over subsequent
    refills, causing premature warn / eviction before the cap is
    actually reached.

Replace the position-dependent counter with a per-entry
`{value, forced}` tag. `liveCount` is incremented in `push()` /
decremented in `next()` only when the shifted entry was non-forced
— position becomes irrelevant. `size` getter returns `liveCount`
directly. The class doc comment is rewritten to call out that the
new tag is the position-independent replacement for the old
"forced frames must stay at the front" invariant.

Regression test in `eventBus.test.ts` reproduces the codex trace
(warn at 75%, drain past warning, refill to cap) and asserts no
premature eviction.

**Bug 2: SDK does not expose `?maxQueued`**

`docs/users/qwen-serve.md` and `docs/developers/qwen-serve-protocol.md`
both document `?maxQueued=N` as something SDK clients can request,
but `SubscribeOptions` on `DaemonClient` only declared `lastEventId`
+ `signal`, and `subscribeEvents()` always fetched `/events` without
a query string. Typed-SDK consumers had no way to opt in without
hand-crafting URLs.

  - Add `SubscribeOptions.maxQueued?: number` with JSDoc noting the
    daemon range `[16, 2048]` and the pre-flight requirement on
    `caps.features.slow_client_warning`.
  - `DaemonClient.subscribeEvents` builds the URL with an optional
    `?maxQueued=<n>` segment. No client-side range validation —
    the daemon's `parseMaxQueuedQuery` is the source of truth and
    returns structured `400 invalid_max_queued`; duplicating the
    bounds in two layers would diverge on the next tweak.
  - `DaemonSessionSubscribeOptions extends SubscribeOptions` so the
    new field flows through `DaemonSessionClient` automatically.

Three new SDK tests:
  - subscribeEvents appends `?maxQueued=N` when set
  - omits the query string when absent (existing behavior preserved)
  - propagates a `400 invalid_max_queued` unchanged

Tests: 214 focused tests across eventBus / bridge / SDK
DaemonClient / DaemonSessionClient / daemonEvents, plus 111 in the
server suite. All green; the new eventBus regression case proves
the position-invariant fix.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR QwenLM#4237 copilot review feedback

Address 6 of 8 copilot-reviewer findings on PR QwenLM#4237; the other 2
(#1 forcedInBuf live-size corruption, #5 SDK lacks maxQueued) were
already fixed in bae42c8 — replied on the threads with the
commit hash.

- **[2] server.ts:1068** — `?maxQueued=` (present-but-empty) now
  fails closed with `400 invalid_max_queued` instead of silently
  falling back to the default queue cap. The API documents
  fail-closed for any malformed value before opening SSE, so an
  empty string is unambiguously malformed. New server.test.ts
  case locks this in.
- **[3] commands/serve.ts:93** — CLI help text for
  `--event-ring-size` no longer mis-shapes `Last-Event-ID` as a
  query parameter. It is an HTTP header, and the daemon's SSE
  route does not parse a `?Last-Event-ID=` query.
- **[4] docs/developers/qwen-serve-protocol.md:351** — clarify
  that `?maxQueued=N` controls the LIVE-event backlog cap.
  Replay frames are force-pushed and exempt from the cap; what
  consumes it is live events that arrive while the subscriber is
  still draining a cold-reconnect replay. Bumping for cold
  reconnects is still the right answer, but for the live tail,
  not for the replay frames themselves.
- **[6] eventBus.ts:214** — stale `ringSize=4000` performance
  comment updated to the new `ringSize=8000` default with a note
  about the O(n) `shift()` cost scaling.
- **[7] sdk-typescript events.ts:492** — `isSlowClientWarningData`
  now uses the existing `isFiniteNumber` helper instead of bare
  `typeof === 'number'`. Mirrors the sibling predicates and
  rejects `NaN` / `Infinity` payloads as schema garbage. New
  daemonEvents.test.ts assertions cover both.
- **[8] server.ts:127** — `createServeApp`'s default-bridge
  construction now also forwards `opts.eventRingSize` to
  `createHttpAcpBridge`, symmetric with the `runQwenServe.ts`
  path. Direct embeds / tests that called `createServeApp`
  without supplying their own bridge but did pass
  `ServeOptions.eventRingSize` were silently getting the
  default 8000 ring.

Tests: 326 focused tests across eventBus / bridge / SDK
DaemonClient / DaemonSessionClient / daemonEvents / server. All
green; the new server.test.ts case + the extended
daemonEvents.test.ts assertions cover the tightened guards.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR QwenLM#4237 wenshao round-2 review feedback

Six adopted findings from @wenshao's second review pass on
PR QwenLM#4237. The seventh ([10] forcedInBuf 3rd case invariant) was
already fixed in bae42c8 — replied on that thread.

- **[9] + [14] server.ts** — Sanitize attacker-controlled values
  before stderr interpolation in both `parseMaxQueuedQuery` and
  `parseLastEventId`. New `safeLogValue()` helper uses
  `JSON.stringify` to escape control characters (`\n`/`\r`/…) so a
  URL-encoded newline in `?maxQueued=%0a` can't inject extra log
  lines into journald/Loki/Splunk pipelines. Matches the
  `workspace_mismatch` sanitization style in `sendBridgeError`.
  Fixed in both helpers (the sibling pre-existing
  `parseLastEventId` had the same shape) so the file stays
  consistent.

- **[11] httpAcpBridge.ts** — `!Number.isFinite(eventRingSize)`
  was redundant: `Number.isInteger(NaN)` and
  `Number.isInteger(Infinity)` both return `false`, so the sibling
  `!Number.isInteger` already catches both. Drop the dead guard.

- **[12] httpAcpBridge.ts** — Add soft upper bound
  `MAX_EVENT_RING_SIZE = 1_000_000` on `eventRingSize` to catch
  operator typos (`--event-ring-size 80000000` vs `8000000`). At
  ~500 B per `BridgeEvent` an 1M-frame ring already pins ~500 MB
  per session — well past any realistic workload. Not a security
  boundary (operator-controlled flag), pure typo defense. Existing
  bridge construction test extended with an `80_000_000` case.

- **[13] commands/serve.ts** — CLI `--event-ring-size` flag now
  sources its default from `DEFAULT_RING_SIZE` (imported from
  `serve/eventBus.js`) instead of the hardcoded literal `8000`.
  Without this, a future bump of the bus default would silently
  not take effect for daemons launched through the CLI because
  the flag always overrides — single source of truth fixes that.

- **[15] eventBus.ts** — Drop unreachable `event.id ?? this.lastEventId`
  fallback in the `slow_client_warning` frame. `event` is locally
  constructed at the top of `publish()` with `id: this.nextId++`
  and is guaranteed defined. Use `event.id as number` directly +
  an inline note about the invariant.

Tests: 197 (eventBus 20 / bridge 107 / SDK DaemonClient 57 / SDK
daemonEvents 14) + 112 server. All green; the new upper-bound
bridge case + the existing log assertions pin the changed
behaviors.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 23, 2026
* feat(serve): mutation gating helper and --require-auth

Implements issue QwenLM#4175 Wave 4 PR 15. Adds the centralized
state-changing-route gate that Wave 4 follow-ups (memory CRUD, file
edit, MCP restart, device-flow auth) will reuse, plus the
`--require-auth` deployment knob that hardens the loopback developer
default for shared dev hosts / CI runners.

- `createMutationGate({ tokenConfigured, requireAuth })` factory in
  serve/auth.ts — per-route middleware with a 4-cell behavior matrix:
  pass-through under `requireAuth` or any token configured;
  `401 token_required` for `strict: true` routes on no-token loopback
  defaults; baseline pass-through otherwise.
- Existing Wave 1-2 mutation routes (POST /session, /session/:id/{load,
  resume,prompt,cancel,model}, /permission/:requestId) opt into the
  default non-strict factory call as the centralization marker. Wave 4
  routes will pass `{ strict: true }` to require a token even on
  loopback.
- `--require-auth` CLI flag + `ServeOptions.requireAuth`. Boot refuses
  without a token; closes the `/health` exemption when on so loopback
  `/health` also requires bearer auth; stderr breadcrumb so the
  hardened mode is visible in journald/docker logs.
- Conditional `require_auth` capability tag advertised only when the
  flag is on. New `CONDITIONAL_SERVE_FEATURES` registry primitive so
  future per-deployment toggles follow the same shape.
- 5 new unit tests in auth.test.ts covering the gate matrix; 5 added
  in server.test.ts for capability advertisement, conditional tag,
  /health 401 under --require-auth, and runQwenServe boot
  refusal + happy path. 245/245 serve tests pass; typecheck + eslint
  clean.

Refs: QwenLM#4175

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR QwenLM#4236 review feedback

Three small follow-ups from the automated reviewers on PR QwenLM#4236:

1. **Drop misleading `--require-auth` from `token_required` error
   message** (Copilot inline auth.ts:262). The strict-mode 401
   listed three remediations but `--require-auth` is paired-required
   with a token at boot — naming it standalone would loop the operator
   into a different boot error. Keep the two valid standalone fixes
   (env var, --token); add inline note explaining the omission.
   `auth.test.ts` regex updated to `not.toMatch(/--require-auth/)`
   to anchor the new wording.

2. **Mention `/health` gating in `--require-auth` CLI description**
   (auto-reviewer Medium #2). Operators flipping the flag without
   reading the protocol doc would get paged when k8s/Compose probes
   start 401-ing. One sentence in the yargs description prevents that.

3. **Drift insurance comment between registry and
   `CONDITIONAL_SERVE_FEATURES`** (auto-reviewer Low #3). Document
   the four-step procedure for adding a new conditional tag so a
   future contributor doesn't update only the registry and silently
   advertise the tag unconditionally. Notes the Map<predicate>
   refactor as the right move when a second tag lands.

Deferred (not in this fix-up):
- Module-level PASSTHROUGH singleton (High #1) — micro-optimization,
  unmeasurable.
- Map<feature, predicate> for conditional features (High #2) —
  premature abstraction with one tag.
- Per-route `// non-strict marker` comments (Medium #1) — noise.
- `@see` cross-ref in types.ts (Low #2) — sugar.
- JSDoc bullet-list vs table (Low #1) — current format is fine.

Refs: QwenLM#4175 QwenLM#4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR QwenLM#4236 round-2 review feedback

Five small follow-ups from @wenshao + DeepSeek (via Qwen Code /review)
on PR QwenLM#4236:

1. **Map<predicate> refactor for `CONDITIONAL_SERVE_FEATURES`**
   (review threads #3254467192 + #3254485912). Two reviewers asked
   for the same shape on the grounds that the `Set` + per-feature
   `if`-branch needed FOUR coordinated changes per new conditional
   tag and silently fail-CLOSED when the branch was missed. The Map
   collapses the predicate-decision and the set-membership into one
   entry per feature — adding a new conditional tag is now two
   coordinated changes (registry + Map entry) and a missing predicate
   is a TypeScript error rather than a silent omission. JSDoc
   updated.

2. **Drift-insurance test that iterates `CONDITIONAL_SERVE_FEATURES`**
   (review thread #3254467192 option 1, layered on top of #1).
   `server.test.ts` now walks every Map entry and asserts the
   predicate accepts/rejects as expected; future entries that don't
   add an assertion branch fail the test loudly so a missing
   predicate cannot ship silently. Adoption-of-record for the Map
   shape rather than relying on a hand-maintained invariant.

3. **Cache `strictDenier` for allocation symmetry** (review thread
   #3254467193). Wave 4 PRs will mount strict mode on multiple
   routes; without the cache each `mutate({strict:true})` call would
   allocate a fresh 401 closure. Now both the passthrough and the
   strict denier are pre-built singletons. Identity assertion in
   `auth.test.ts` anchors the cache so a future change that loses it
   surfaces in CI.

4. **Doc cosmetic — extra blank line in qwen-serve.md** (review
   thread #3254467198). Single blank line between the `>` quoted
   example and the following non-quoted bash block now.

5. **Doc correctness — `require_auth` is post-auth confirmation**
   (review thread #3254485910 from DeepSeek). When `--require-auth`
   is on, the global `bearerAuth` middleware gates every route
   including `/capabilities`, so an unauthenticated client cannot
   pre-flight `caps.features` to discover that auth is required —
   the discovery surface is the 401 response body itself. Both
   `qwen-serve.md` and `qwen-serve-protocol.md` rewritten to
   describe the tag as a post-authentication confirmation, matching
   the auth.ts JSDoc which already stated this correctly.

Trade-offs documented (no code change):

- **Body-parser ordering** (review thread #3254485915 from DeepSeek)
  noted as a comment block in `auth.ts`. Strict-mode 401 fires AFTER
  `express.json()` because the gate is per-route middleware. On
  loopback no-token defaults a strict route therefore parses the
  request body before refusing it — bounded by
  `express.json({limit: '10mb'})` × `--max-connections` (256
  default). Strict routes Wave 4 actually adds carry small bodies in
  legitimate use, so this isn't a production hot path. Future routes
  accepting large bodies should lift the gate to app-level (maintain
  a strict-path Set in `createServeApp`); flagged as a Wave 4
  follow-up rather than re-architecting the helper.

- **`bearerAuth` body-shape inconsistency** (review thread
  #3254467197 from @wenshao) flagged as a Wave 4 cross-PR
  follow-up. `bearerAuth` returns `{error: 'Unauthorized'}` while
  the strict gate returns `{code: 'token_required', error: '...'}`;
  SDK clients have to branch on both shapes. Standardizing
  `bearerAuth` to also carry a `code` field is orthogonal to this
  PR's scope.

Validation: 260/260 cli serve tests pass (was 258 — added the drift
insurance test + strict denier identity test); typecheck + eslint
clean.

Refs: QwenLM#4175 QwenLM#4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 23, 2026
…#4247)

* feat(serve): MCP client guardrails (QwenLM#4175 Wave 3 PR 14)

Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`.

Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'|'error'` + `refusedCount` so operator visibility isn't blocked.

* fixup(serve): address PR 14 review (QwenLM#4247) findings 1-7

Addresses Codex + Copilot review feedback on QwenLM#4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision).

1. **Single-server rediscovery bypass** — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode.
2. **Empty `budgets[]` when mode=off** — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell.
3. **runQwenServe validation + env leakage** — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_*` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children.
4. **Disabled-vs-refused precedence + stale refusal log** — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag.
5. **Incremental remove-before-reserve ordering** — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot.
6. **`scope` forward-compat type widening** — `'workspace' | (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract.
7. **Test comment alignment** — fix "With budget=1" comment to match `clientBudget: 2` code.

Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve).

* docs(serve): clarify v1 snapshot-based budget warning detection (QwenLM#4247)

Address github-actions review-summary finding (I) on PR QwenLM#4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts.

* fixup(serve): address PR 14 review round 2 (QwenLM#4247 wenshao)

Addresses @wenshao review on PR QwenLM#4247. Three critical safety fixes + four suggestion-level improvements.

Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime):
- C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot.
- C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched).
- (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup 597f011.)

Suggestion:
- S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement.
- S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`.
- S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected).
- S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs.

(S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.)

+ 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean.

* fixup(serve): address PR 14 review round 3 (QwenLM#4247 wenshao second pass)

Addresses @wenshao's second review pass on PR QwenLM#4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications.

Code:
- R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic.
- R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row.
- R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file.
- R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 ba3e3fe S4 — reply-only on the new thread).

Docs:
- R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap.
- R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit.
- R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window.

+ 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 4 (QwenLM#4247 wenshao second + third pass)

Addresses @wenshao's second + third review passes on PR QwenLM#4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads.

Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs):
- Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect.
- Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside.
- Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' | 'workspace' | (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel + ⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint).

Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929:
- Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity.
- Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode.
- Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect).
- Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation).

+ 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 5 (QwenLM#4247 wenshao fourth pass)

Addresses @wenshao's fourth review pass on PR QwenLM#4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads).

Critical fixes:
- R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test.
- R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test.

Reviewer findings deferred / already done (replied + resolved):
- R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup ba3e3fe added the try/catch with the weReservedSlot cleanup pattern.
- R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup ba3e3fe renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic.
- R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): proposed fix is semantically a no-op for integer liveCount — `liveCount >= 0.75` and `liveCount >= Math.ceil(0.75) === 1` give identical results when liveCount is an integer. The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; design tradeoff, not implementation bug.

46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean.

* fixup(serve): address PR 14 review round 6 (QwenLM#4247 wenshao fifth pass)

Addresses @wenshao's fifth review pass on PR QwenLM#4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak).

Critical fixes:
- R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside.

- R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface:
  - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`.
  - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string | undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state.
  - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub).
  - `runQwenServe` closes over per-handle overrides; never touches `process.env`.

+ 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 7 (QwenLM#4247 wenshao sixth pass)

Addresses @wenshao's sixth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor.

Critical:
- R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log.

Real bugs:
- R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method.
- R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup.
- R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now.
- R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`.

Polish:
- R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log.
- R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals.

+ 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 8 (QwenLM#4247 wenshao seventh pass)

Addresses @wenshao's seventh review pass on PR QwenLM#4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred.

Critical:
- R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports.

Soundness:
- R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case).

- R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth.

- R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught.

Deferred:
- R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup.

+ 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 9 (QwenLM#4247 wenshao eighth pass)

Addresses @wenshao's eighth review pass on PR QwenLM#4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination).

Critical / real bugs:
- R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket.
- R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior.
- R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern.

Operator-visibility:
- R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `_BUDGET=N` would silently boot with enforcement off and `mcp_guardrails` capability advertised — operator only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern.

Doc fixes:
- R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one.
- R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both.

Not adopted:
- R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding.
- R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit 7228813) had already closed. Reviewer was on the pre-R8 commit view.

+ 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean.

* fixup(serve): address PR 14 review round 10 (QwenLM#4247 wenshao ninth pass)

Two non-blocking 🟢 nits — both adopted for symmetry / explicitness.

- R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths.
- R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since QwenLM#3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change.

57/57 core tests pass. Typecheck + lint clean.
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 23, 2026
…M#4255)

* feat(serve): auth device-flow route

Implements issue #4175 Wave 4 PR 21. Brokers OAuth 2.0 Device
Authorization Grant (RFC 8628) through the `qwen serve` daemon so a
remote SDK client can trigger a Qwen-account login whose tokens land
on the **daemon** filesystem, not on the client. The daemon polls the
IdP itself; the client's only job is to display the verification URL +
user code.

Runtime locality (#4175 §11): the daemon NEVER spawns a browser or
calls `open(url)` — even when running locally. Static-source grep
test fails the build on `node:child_process` / `open` / `xdg-open` /
`shell.openExternal` / `execa` / `shelljs` / `process.spawn` and
their dynamic-import / require variants.

- `POST /workspace/auth/device-flow` — strict mutation gate; returns
  201 fresh / 200 idempotent take-over with `attached: true`. Per
  per-`providerId` singleton: a second POST while pending takes over
  rather than allocating a new `device_code`.
- `GET /workspace/auth/device-flow/:id` — public state read. Pending
  entries echo `userCode/verificationUri/expiresAt/intervalMs`;
  terminal entries (5-min grace) drop them and surface
  `status/errorKind/hint`.
- `DELETE /workspace/auth/device-flow/:id` — strict; idempotent
  (terminal → 204 no-op; unknown → 404).
- `GET /workspace/auth/status` — pending flows + supported providers
  snapshot. v1 stub for `providers: []` (populated in fold-in 1).

`DeviceFlowRegistry` (`packages/cli/src/serve/auth/deviceFlow.ts`)
is the in-memory state holder:
- per-`providerId` singleton with idempotent take-over
- workspace-wide cap of 4 active flows (abuse defense)
- 5-min terminal grace so SDK reconnects can still observe results
- TTL sweeper evicts grace-expired entries every 30s
- in-flight `Promise` map coalesces concurrent `start()` calls so two
  parallel POSTs don't double-allocate IdP `device_code`
- `transitionTerminal` returns `boolean` so caller-side emit/audit
  guard prevents sweeper × poll-tick double-fire
- `dispose()` wired into `runQwenServe.close()`'s shutdown drain;
  cancels `provider.poll()` mid-flight via `cancelController`,
  records `lost_success` audit when an IdP-minted token is dropped
  by transition

`DeviceFlowProvider` interface accepts `start({signal})` +
`poll(state, {signal})`. `QwenOAuthDeviceFlowProvider` wraps the
existing `QwenOAuth2Client.requestDeviceAuthorization` /
`pollDeviceToken` primitives directly (NOT
`authWithQwenDeviceFlow`, which calls `open(url)`). PKCE is
provider-required by Qwen but optional in the interface for future
non-PKCE providers. `success.persist()` writes to disk FIRST, then
updates the in-process client — a failed disk write no longer
leaves the daemon with a zombie in-memory token. Maps RFC 8628
errors via an anchored regex (`^Device token poll failed:
(expired_token|access_denied|invalid_grant)`) so an
`error_description` containing one of those literals can't
mis-classify an unrelated upstream error.

`BrandedSecret<T extends string>` holds the `device_code` and PKCE
verifier. Earlier draft used `new String()` wrapper which leaked
through `+` / template literals (`Symbol.toPrimitive` →
`valueOf` returned the primitive). Final shape: frozen plain object
+ `WeakMap` indirection + 4-way redaction
(`toString` / `toJSON` / `Symbol.toPrimitive` / numeric coercion →
`'[redacted]'` or `NaN`) + `unique symbol` brand. 6 leak-path
tests: `JSON.stringify` / `String()` / concat / template / `+x` /
reveal-roundtrip.

5 new daemon events (workspace-scoped, fanned out to every active
session bus via `bridge.broadcastWorkspaceEvent`):

- `auth_device_flow_started` — `{deviceFlowId, providerId, expiresAt}`
  (no userCode/verificationUri — see PR 21 design §3)
- `auth_device_flow_throttled` — `{deviceFlowId, intervalMs}`,
  emitted only on upstream `slow_down` interval bumps
- `auth_device_flow_authorized` — `{deviceFlowId, providerId,
  expiresAt?, accountAlias?}`; `accountAlias` is best-effort
  non-PII (never email/phone)
- `auth_device_flow_failed` — `{deviceFlowId, errorKind, hint?}`
  with `errorKind ∈ {expired_token, access_denied, invalid_grant,
  upstream_error, persist_failed}`
- `auth_device_flow_cancelled` — `{deviceFlowId}` (DELETE on pending)

Workspace-scoped reducer `reduceDaemonAuthEvent` produces
`DaemonAuthState { flows: Partial<Record<ProviderId, ...>> }` —
parallel to `reduceDaemonSessionEvent`. Session reducer no-ops on
auth events (workspace-scoped state belongs in its own reducer).

`bridge.broadcastWorkspaceEvent` is intentionally distinct from PR
16's `publishWorkspaceEvent` to avoid merge conflict; collapses to
the shared helper as a fold-in once #4249 lands (~25 LoC).

`@qwen-code/sdk` (`packages/sdk-typescript/`):

- 4 new `DaemonClient` methods: `startDeviceFlow`, `getDeviceFlow`,
  `cancelDeviceFlow`, `getAuthStatus` — typed against the wire
  shapes, errors mapped through the existing `DaemonHttpError`.
- High-level `client.auth` getter (lazy `DaemonAuthFlow` singleton)
  exposes a `start(...).awaitCompletion()` shape mirroring `gh auth
  login`'s UX: print code first, let the SDK consumer decide where
  to open the browser. `awaitCompletion` polls GET on the
  daemon-supplied `intervalMs`, honors `slow_down` bumps, and
  fall-back-recovers from 404 (entry evicted post-grace).

POST + DELETE flow through PR 15's `mutate({strict: true})` —
401 `token_required` on token-less loopback defaults. GET routes
use only the global `bearerAuth`. Every state transition
(`started/authorized/failed/cancelled/expired/lost_success`)
records a structured stderr breadcrumb (`[serve] auth.device-flow:
provider=... deviceFlowId=abc12... clientId=... status=...`)
since `mutate()` doesn't carry an audit hook — events alone aren't
enough since SDK can silently drop them; stderr → journald/docker
logs is the unfalsifiable record.

`auth_device_flow` advertised unconditionally on
`/capabilities.features`. Supported providers list lives on
`/workspace/auth/status` to keep the registry descriptor uniform.

- `packages/core/src/qwen/qwenOAuth2.ts`:
  - exports `cacheQwenCredentials` (was a private function; needed
    by the daemon's device-flow registry)
  - `cacheQwenCredentials` now calls `SharedTokenManager.clearCache()`
    after writing, folding what was previously a paired call site at
    L820+L829. Idempotent change.
  - file mode `0o600` on `oauth_creds.json` (was default 0o666 +
    umask). Mirrors opencode's `auth/index.ts`.
- `packages/cli/src/serve/runQwenServe.ts`: device-flow registry
  `dispose()` wired into the shutdown drain (BEFORE
  `bridge.shutdown()`).

- `auth/deviceFlow.test.ts` — 21 tests: BrandedSecret leak paths,
  state machine (slow_down / success / error), terminal grace,
  concurrent-start coalescing, dispose, cancel idempotency, static-
  source grep against browser-spawn primitives.
- `server.test.ts` — 10 device-flow integration tests:
  POST 201/200 take-over, strict 401, 400 `unsupported_provider`,
  GET / DELETE / `/workspace/auth/status`, 502 `upstream_error`
  mapping, sweeper-driven auto-expiry with controlled clock,
  capability advertisement.
- `daemonEvents.test.ts` — 5 SDK reducer tests: type guards, per-
  provider state projection, `failed` always → `status: 'error'`
  (errorKind carries the kind, including new `persist_failed`),
  session reducer no-ops on auth events.

369/369 serve + SDK tests pass; typecheck + `eslint
--max-warnings 0` clean across 14 PR 21 files.

- [x] Independently mergeable (depends only on merged PR 4 / PR 7 /
      PR 12 / PR 15)
- [x] Backward compatible (4 new routes + 1 capability tag + 5 typed
      events + 4 SDK helpers; existing routes/events untouched)
- [x] Default off (capability advertised but no client is forced to
      use it; CLI `qwen` OAuth flow unchanged)
- [x] `qwen serve` Stage 1 routes / SDK behavior preserved
- [x] Gradual migration (v1 only `qwen-oauth`; future providers
      register through the `DeviceFlowProvider` interface)
- [x] Reversible (revert removes 4 routes + 1 tag + 5 events with no
      schema migration)
- [x] Tests-first (28 new tests across 3 layers)

- Inline `bridge.broadcastWorkspaceEvent` → fold-in to PR 16 (#4249)
  `publishWorkspaceEvent` once that lands
- `/workspace/auth/status` vs PR 12 `/workspace/providers` boundary
  — separate route in v1; merge alternative discussed
- Wave 4 PRs 17/19/20 should adopt the same mutate-strict +
  workspace event-fan-out pattern

5 items from pre-PR specialist passes parked for a focused
follow-up: `DeviceFlowEntry` discriminated union, single-source SDK
status / ProviderId unions, `awaitCompletion` memoization,
broadcast-100%-fail stderr elevation, SDK 404 →
`not_found_or_evicted` errorKind.

Refs: #4175

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 round-1 review feedback

Eleven items from copilot-pull-request-reviewer's round-1 pass on
#4255 — 4 inline threads + 7 from the PR-level review summary.

## Adopted (11 items, code/doc changes)

- **`lastSeenAt` → `lastSeenEventId`** (`events.ts`,
  `DaemonDeviceFlowReducerState`). The field was set from
  `rawEvent.id` (SSE event id) but documented as "epoch ms" — a real
  semantic mismatch that would mislead consumers into time-based
  logic against a monotonic counter. Rename + tighten the JSDoc to
  describe it as an event-id counter; reducer cases updated.
- **`DEVICE_FLOW_EXPIRY_GRACE_MS = 30_000` extracted** in
  `DaemonAuthFlow.ts` (was a magic number on `start.expiresAt +
  30_000`). `AwaitCompletionOptions.timeoutMs` doc now describes the
  actual grace-past-expiry behavior + the rationale (clock skew +
  daemon sweeper interval + network latency) instead of the wrong
  "defaults to expiresAt - Date.now()" claim.
- **Explicit `chmod 0o600`** in `cacheQwenCredentials` after every
  write. `fs.writeFile`'s `mode` only applies on file creation; a
  pre-existing `oauth_creds.json` written under a broader umask kept
  its old permissions across upgrades. The chmod now tightens it on
  every write; chmod failure (Windows / hardened FS) surfaces via
  `debugLogger.warn` instead of silently dropping the invariant.
- **`SharedTokenManager.clearCache()` failure now logs**
  `debugLogger.warn` (was a silent `try { } catch { }`). In
  production a swallowed clearCache means in-process callers serve
  stale credentials until the SharedTokenManager mtime watcher
  catches up — a recoverable degradation worth a log line.
- **Protocol doc** lists `persist_failed` in the
  `auth_device_flow_failed.errorKind` union (was added to the type
  but missed in the doc).
- **`pollDeviceToken({signal})`** plumbed through
  `IQwenOAuth2Client` interface + `QwenOAuth2Client` impl + the Qwen
  device-flow provider. Cancel / dispose during a slow IdP response
  now aborts the in-flight HTTP socket immediately instead of
  waiting for the upstream timeout. Two new registry tests assert
  `cancel()` / `dispose()` propagate abort to the signal observed by
  `provider.poll`.
- **`revealSecret` error message** clarified: was "secret has been
  GC-evicted" (impossible — WeakMap doesn't evict reachable keys).
  Now points at the actual reachable failure modes (forged shape /
  serialize+reparse losing the WeakMap binding).
- **`transitionTerminal` JSDoc** clarifies that the PRIMARY guard
  against late timer secret leaks is the `entry.status !== 'pending'`
  check at the top of `runPollTick`; secret-clearing here is
  defense-in-depth.
- **`DeviceFlowErrorKind` JSDoc'd per variant** so consumers can tell
  when each fires (RFC 8628 distinctions + `persist_failed` vs
  `upstream_error` boundary).
- **Stale "PR 16 / PR 21 §3" temporal references** in
  `DaemonAuthFlow.ts:124` rephrased to be timeless ("workspace-scoped
  events fan out through whatever session buses happen to be live"
  — no PR number references that rot when those PRs merge).

## Not adopted (4 items, replied to in-thread)

- **`authWithQwenDeviceFlow` browser-launch separation** — correct
  architectural advice but out of #4255 scope (would refactor a CLI
  auth UX module that PR 21 only touched additively). Tracked as a
  Wave 5 follow-up.
- **Copyright header year range** — repo-wide convention "2025"; not
  introduced by this PR.
- **Spread `...(x ? {x} : {})` → `x: x ?? undefined`** — the two are
  not semantically equivalent. The current form omits the key
  entirely on falsy `x`; the suggested form always includes the key.
  Tests assert object shape and would break under the change.
- **Eager `client.auth` getter** — public API boundary. Lazy
  construction matches `DaemonSessionClient` precedent + saves the
  module load for SDK consumers that never touch auth.

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-1 review feedback

15 items from @wenshao's review batches on #4255. Catches a handful
of real bugs that the earlier round (commit 3d9f082f5) didn't
surface.

## Critical fixes

- **C1 — `pollUntilTerminal` providerId pass-through**
  (`DaemonAuthFlow.ts:185`). The synthetic 404 fallback hardcoded
  `providerId: 'qwen-oauth'`; the parent `awaitCompletion` already
  receives the real providerId via `start.providerId` but
  `pollUntilTerminal`'s parameter type stripped it. Add the field to
  the param type, propagate.
- **C2 — open `errorKind` allowlist** (`events.ts`). The closed
  5-value union in the type guard silently dropped any `failed`
  event whose errorKind the daemon added without mirroring SDK-side
  (e.g. a future `rate_limited`). The flow's reducer state would
  never transition to terminal, leaving SDK consumers stuck on
  `pending` forever. Open the union with `(string & {})` and accept
  any non-empty string in the runtime guard. Updated test asserts
  forward-compat behavior + still rejects the truly-malformed
  empty-string case.
- **C3 — `persist()` timeout + signal**
  (`deviceFlow.ts`). A wedged disk I/O (NFS stall, encrypted-volume
  contention) without bounds would pin the entry in `pending` until
  the upstream `expires_in` elapsed (potentially minutes). The
  registry now passes its `cancelController.signal` AND arms a hard
  `DEVICE_FLOW_PERSIST_TIMEOUT_MS = 30_000` timer; persist failure
  surfaces as `persist_failed` immediately. The
  `DeviceFlowPollResult` `success` variant signature changed to
  `persist({signal})`.
- **C4 — cancel × success race rollback**
  (`deviceFlow.ts` + Qwen provider). Today, if `cancel()`
  transitions while `persist()` is in flight, the credentials get
  written but the flow's status is `cancelled`. User sees cancelled,
  daemon disk has a valid token. `DeviceFlowPollResult.success`
  gains an optional `unpersist()` callback the registry calls when
  `transitionTerminal(authorized)` fails — the Qwen provider wires
  it to `clearQwenCredentials()`. Rollback failure is audited but
  not propagated (re-running auth would overwrite anyway).
- **C5 — don't `unref()` the `awaitCompletion` sleep timer**
  (`DaemonAuthFlow.ts`). On a standalone Node CLI/script doing just
  `client.auth.start().awaitCompletion()`, the unref'd between-poll
  timer was the only event-loop handle, so Node could exit before
  the user finished authorization. The poll wait is foreground work
  the caller explicitly awaits — keep it ref'd.

## Information-leak fixes

- **S1 — sanitize `persist_failed` hint**. `err.message` from
  `cacheQwenCredentials` embeds the full `~/.qwen/oauth_creds.json`
  path. Broadcast via SSE, that path leaks the daemon's home layout
  to every connected session subscriber. Replace user-facing hint
  with `"credentials could not be written to the daemon filesystem
  — check disk space and permissions"`; full err goes to stderr
  audit only.
- **S2 — sanitize upstream `pollDeviceToken` hint**. The class
  embedded the entire raw IdP response body (which can be an HTML
  error page from a reverse proxy) into the thrown message. Same
  broadcast leak path. Replace upstream-error hint with
  `"unexpected response from identity provider"`; RFC 8628 errors
  use `"Qwen IdP returned ${kind}"`.

## Cleanup / forward-compat

- **D1 — drop duplicate `clearCache()`** at `qwenOAuth2.ts:840`. The
  paired call became redundant once `cacheQwenCredentials` folded
  the clearCache in (PR #4255 fold-in 1). The fold-in 1 message
  said this would be done; the duplicate slipped through.
- **S3 — drop unused `DeviceFlowNotFoundError`** (`deviceFlow.ts`).
  Exported but never imported; route handlers do inline 404 JSON.
- **S4 — single-source SDK status / errorKind unions**
  (`types.ts`). `DaemonAuthDeviceFlowSdkStatus` /
  `DaemonAuthDeviceFlowSdkErrorKind` were parallel literal copies
  of the canonical events.ts definitions — drift waiting to happen.
  Now imported + aliased as type-only re-exports.
- **S5 — broadcast 100% fail elevates to stderr**
  (`httpAcpBridge.ts`). Per-session bus failures stay debug-only,
  but a broadcast where EVERY session bus refused is operationally
  interesting (clients won't see the event). Track success / fail
  counts; `writeStderrLine` when `successCount === 0`.
- **S6 — `this.disposed` check after `await provider.start()`**
  (`deviceFlow.ts`). `dispose()` mid-start would orphan the freshly-
  inserted entry (`schedulePoll` guards on `disposed` so no poll
  fires; the entry never transitions). Throw post-await if disposed.
- **W1 — thread `signal` into `requestDeviceAuthorization`**
  (`qwenOAuth2.ts` + Qwen provider). `start()` had the same
  cancellation gap that `pollDeviceToken` had — a slow
  device-authorization request couldn't be aborted during shutdown.
  Now plumbed end-to-end.
- **W2 — split `invalid_request` from `unsupported_provider`**
  (`server.ts`). Conflating them surfaced misleading remediation
  hints to SDK consumers branching on `code` ("this provider isn't
  supported here" when the real cause was a serializer dropping the
  field). Bad-shape now returns `code: 'invalid_request'`;
  unknown-but-well-formed stays `unsupported_provider`.
- **W3 — drop never-populated `accountAlias`**
  (Qwen provider). The field was wired through types / events /
  reducer / audit but the Qwen IdP's token response doesn't carry
  one (no `name` / `email` / `sub`). Returning only `{expiresAt}`
  makes the field type-honestly absent rather than always-undefined.
  Future provider with an alias-bearing response can populate it.
- **W4 — `DaemonAuthFlow` JSDoc accuracy**. Doc claimed "first
  attempts to consume an SSE event stream … falls back to GET-based
  polling"; actual is GET-only with SSE as a real-time hint for
  clients already subscribed to a session stream.
- **W5 — clearer unit arithmetic** in interval normalization. The
  `(_INTERVAL_MS / 1000) * 1000` cancelation hid the s↔ms boundary;
  expanded form makes both branches unit-explicit.

## Test changes

- `daemonEvents.test.ts` updated to match the now-OPEN errorKind
  union (forward-compat assertion + empty-string still rejected).
- `deviceFlow.test.ts` `FakeProvider.poll` aligned with the new
  `persist({signal})` signature + optional `unpersist`.

## Validation

- `npm run typecheck --workspace packages/cli --workspace
  packages/sdk-typescript --workspace packages/core` — clean
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 368/368
- `npx eslint --max-warnings 0` over the 11 PR 21 surface files —
  clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-2 review feedback

10 new threads from @wenshao's second deep-review pass on #4255.
Verified status: 5 real issues, 1 improvement, 3 stale (already
fixed; comments lagged), 1 false alarm (typecheck demonstrably
clean).

## Critical fixes

- **fold-in 2 C4 REVERSED**: when `provider.poll()` returns success
  AND `cancel()` / `dispose()` transitioned the entry mid-`persist()`,
  the registry now FORCES the entry to `authorized` and keeps the
  on-disk credentials. The earlier rollback (`unpersist()`) wasted
  the user's IdP approval because the RFC 8628 `device_code` is
  single-use — re-running the flow would force them through the
  whole browser-prompt + paste-code dance again for a click whose
  intent was likely "stop the wait" rather than "undo my already-
  completed approval". Aligns with gh CLI / Auth0 SDK / git-
  credential-manager. Audit captures the race via `hint:
  'lost_success_kept ...'`. `DeviceFlowPollResult.success.unpersist`
  field + Qwen provider's `clearQwenCredentials` rollback removed.
- **#1 GET /workspace/auth/device-flow/:id strict gate**: this GET
  surfaces `userCode` / `verificationUri` for pending entries, which
  on the loopback no-token default were readable by any local
  process. POST + DELETE were already strict; aligning GET closes
  the information-disclosure asymmetry. `/workspace/auth/status`
  stays bearer-only (its `pendingDeviceFlows` entries intentionally
  omit `userCode`).
- **#2 `inFlightStarts` hard timeout**: a hung `provider.start()`
  (network partition, unresponsive IdP) used to leave the per-
  `providerId` slot in `inFlightStarts` occupied forever, blocking
  every subsequent POST until daemon restart. New
  `DEVICE_FLOW_START_TIMEOUT_MS = 30_000` arms a timer that
  `cancelController.abort()`s the start; the rejected promise
  unwinds through the `try/finally` clearing the slot.
- **#10 chain-completing the C3 persist-timeout**: the earlier C3
  fix armed a 30s timer that fired `cancelController.abort()` then
  `await result.persist({signal})`, but the chain ended at the
  registry boundary — `cacheQwenCredentials` didn't take a signal,
  so `fs.writeFile` couldn't be aborted. Now `cacheQwenCredentials`
  accepts an optional `{signal}` and threads it into
  `fs.writeFile(..., {signal})` (Node native). The Qwen provider's
  `persist({signal})` forwards the entry's
  `cancelController.signal` end-to-end.

## Improvement (#4): 404 fallback errorKind

`pollUntilTerminal`'s 404 catch used to synthesize
`{status: 'expired'}` for ALL evicted entries — conflating "your
flow expired during your disconnect", "the daemon was restarted",
and "your deviceFlowId was wrong". Now returns
`status: 'error'` + `errorKind: 'not_found_or_evicted'` + a `hint`
so SDK consumers branching on errorKind can distinguish.

## Information leak (#9): start() path raw IdP message

S2 (fold-in 2) sanitized `poll()`'s upstream-error hint, but
`start()` still embedded the raw `err.message` (full IdP response,
potentially HTML from a reverse proxy / WAF) into the
`UpstreamDeviceFlowError` that flowed to SDK clients via the 502.
Now uses static messages for the SDK-visible errors; raw detail
goes through `writeStderrLine` for operator audit only. Mirrors
S2's approach.

## Stale comments cleaned (#5, #7)

`qwenDeviceFlowProvider.ts:177` claimed
`cacheQwenCredentials` "doesn't currently take a signal — that's
a follow-up". After #10 above, that's no longer true; the comment
is replaced with the actual end-to-end signal-threading note.

## Not adopted (1 false alarm)

- Thread on `types.ts:330` claimed type-only-import-after-
  declarations breaks `tsc` and fails `daemonEvents.test.ts:670`
  with TS2345. Demonstrably false: `npx tsc -p
  packages/sdk-typescript/tsconfig.json --noEmit` exits 0;
  `daemonEvents.test.ts` is the post-fold-in-2 file with the
  open-allowlist assertion (test 28/28 passes). The reviewer may
  have been looking at a transient state during their analysis.

## Validation

- `npm run typecheck --workspace packages/cli --workspace
  packages/sdk-typescript --workspace packages/core` — clean
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398
  pass
- `npx eslint --max-warnings 0` over the PR 21 surface — clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-3 review feedback

5 new threads from the third deep-review pass on #4255. 3 real
issues fixed; 1 stale (already done in fold-in 3); 1 deferred as
non-blocking design suggestion.

- **A — `expiresIn` / `interval` non-finite guard**
  (`deviceFlow.ts`). The provider contract types both as `number`,
  but a misbehaving / future provider could hand `undefined` /
  `NaN` / `Infinity`. `Math.max(0, NaN) * 1000` is `NaN`, then
  `now() + NaN` is `NaN`, then `now >= NaN` is always `false` —
  the sweeper would NEVER evict the entry, pinning an upstream
  `device_code` slot until daemon restart. Same hazard on
  `interval * 1000` (NaN → `setTimeout(NaN)` fires immediately,
  Infinity → scheduler clamps to TIMEOUT_MAX). Now both fields go
  through `Number.isFinite(x) && x > 0`; missing/bad values fall
  back to RFC 8628's recommended ceilings (10 min for expiry, 5s
  for interval).

- **D — typed `app.locals` accessor**
  (`deviceFlow.ts` + writer/reader call sites). The
  `app.locals['deviceFlowRegistry']` string key was shared between
  `createServeApp` (writer) and `runQwenServe` (reader); a typo on
  either side would compile cleanly and the shutdown dispose call
  would silently no-op, leaving polling timers running until the
  `unref()` rescue. New `setDeviceFlowRegistry(app, registry)` /
  `getDeviceFlowRegistry(app)` pair gives both call sites
  type-checked access; the string literal is encapsulated in one
  module.

- **E — `UnsupportedDeviceFlowProviderError` docstring**
  (`deviceFlow.ts`). After fold-in 2's W2 fix split
  `invalid_request` from `unsupported_provider`, the route layer
  screens unknown ids against `DEVICE_FLOW_SUPPORTED_PROVIDERS`
  before reaching the registry — so this error is now reachable
  ONLY on a daemon-internal invariant violation (id is declared
  supported but not registered in the runtime provider map).
  Docstring + thrown message updated to reflect that this branch
  signals a programmer error, not user input.

- **B** claimed `cacheQwenCredentials(credentials)` doesn't forward
  signal to `fs.writeFile`. Verified: fold-in 3 (#10) at
  `qwenDeviceFlowProvider.ts:204` calls
  `cacheQwenCredentials(credentials, { signal: persistOpts.signal })`
  and the core helper threads it into `fs.writeFile(..., {mode,
  signal})`. The reviewer was looking at the comment block above
  (lines 174-181) without scrolling to the actual call site.

- **C — SDK `cancelDeviceFlow` lossy 204/404 collapse**.
  Suggested returning `{existed: boolean; alreadyTerminal: boolean}`
  instead of resolving void on both 204 and 404. Real signal-loss
  but tagged "[非阻塞]" by the reviewer; changing requires a
  daemon route shape change (200 + body instead of 204) which is
  better as a focused follow-up PR. Acknowledged in-thread;
  deferred to a fold-in PR after #4255 lands.

- `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}`
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 398/398
- `npx eslint --max-warnings 0` over the PR 21 surface — clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-4 review feedback

4 threads from the fourth review pass on #4255. 3 adopted + 1
deferred (out-of-scope rename of PR 15's `mutate` helper).

## Adopted

### #1 — `persistInFlight` flag suppresses cancel × persist event-stream UX trap

When `provider.poll()` returns success and we await `persist()`, a
concurrent `cancel()` would synchronously transition the entry to
`cancelled` and emit `auth_device_flow_cancelled` — then `persist()`
resolves and (per fold-in 3 C4) force-overrides to `authorized` +
emits `auth_device_flow_authorized`. The reducer state correctly
last-write-wins on `authorized`, but DIRECT event-stream consumers
(close-dialog handlers, telemetry, UI cleanup) race onto an unmounted
UI when the second event lands.

Now: while persist is in-flight, `cancel()` and the sweeper SKIP the
state transition + event emit. They register intent (set
`cancelRequestedDuringPersist=true` for cancel; sweeper just no-ops)
and let the persist resolution decide:

- persist succeeds → `authorized` (IdP wins per fold-in 3 C4)
- persist fails AND cancel was requested → `cancelled`
- persist fails AND `now >= expiresAt` → `expired` / `expired_token`
- persist fails otherwise → `error` / `persist_failed`

Result: at most one terminal event per flow. Imperative SSE
consumers no longer see oscillating terminal states. Audit captures
the race (`hint: 'lost_success_kept ...'`) for incident-response
correlation.

### #2 — `revealSecret` → `unsafeRevealSecret` rename

The earlier JSDoc claimed "the `unsafeReveal_` naming is intentional:
greppable in code review, easy to allowlist in lint rules, hard to
invoke by accident" — but the actual function was named
`revealSecret`. The promised safety properties didn't exist; a code
reviewer wouldn't single out `revealSecret` as suspicious, and a
`no-restricted-syntax` ESLint rule wouldn't flag it.

Renamed to `unsafeRevealSecret` so the JSDoc-promised "greppable" /
"lintable" property is now actually true. Two call sites in the
Qwen provider + 4 test references updated. Internal symbol; not
exposed through the SDK package.

### #4 — `QwenOAuthPollError` typed class replaces substring regex

The earlier RFC 8628 error mapper used an anchored regex against the
thrown error message text — an implicit cross-file string contract
between `qwenOAuth2.ts` (throws) and `qwenDeviceFlowProvider.ts`
(matches). If `qwenOAuth2.ts` ever changed its message format, ALL
RFC 8628 errors (`expired_token` / `access_denied` / `invalid_grant`)
would silently fall through to `upstream_error` — wrong errorKind
flowing through telemetry with no test or type-system check to catch
the drift.

Now `QwenOAuth2Client.pollDeviceToken` throws a structured
`QwenOAuthPollError extends Error` with `oauthError` / `description`
/ `status` fields. The provider branches on `instanceof
QwenOAuthPollError` and reads `.oauthError` directly via a
dedicated `mapRfc8628OAuthCode(code)` switch. The drift hazard is
gone: a future code change that touches the typed class will
fail tsc until both sides are updated. Message format preserved
for any pre-existing log-parsing / substring matchers.

## Not adopted

### #3 — `mutate({strict:true})` semantic awkwardness on GET

Reviewer correctly noted that `mutate` is named for state-changing
routes, but `GET /workspace/auth/device-flow/:id` uses it for an
information-disclosure defense (only reachable code path is reading
state). Suggested rename: `mutate` → `strictHttpGate`.

Deferred: the rename touches PR 15's helper which has many call
sites in `server.ts` and is shared infrastructure for Wave 4 PRs
17/19/20. PR 21 is the first / only consumer of the strict-on-GET
form so far; widening the rename to a Wave 4 follow-up keeps the
fold-in scope tight. Replied in-thread.

## Validation

- `npm run typecheck` — clean across `packages/{cli,sdk-typescript,core}`
- `npx vitest run packages/cli/src/serve/
  packages/sdk-typescript/test/unit/daemonEvents.test.ts` — 544/544
- `npx eslint --max-warnings 0` over the PR 21 surface — clean

Refs: #4175 #4255

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-5 review feedback

Five small adopt items from the round-5 review pass; one stale thread
already addressed in b5b77ee90 (fold-in 5).

#2 — `as const` + derived type for DEVICE_FLOW_SUPPORTED_PROVIDERS so
adding/removing a provider id requires touching exactly ONE site.
Mirrors `SERVE_ERROR_KINDS` / `ServeErrorKind` in `status.ts`.

#3 — Clarify `DEVICE_FLOW_EXPIRY_GRACE_MS` JSDoc to distinguish the
daemon's 30s SWEEP cadence (what the grace tracks) from the 5-min
TERMINAL_GRACE_MS reconnect window (which awaitCompletion does NOT
need to wait through).

#4 — Add `@remarks` block on `DeviceFlowProvider.poll()` warning
future provider authors that thrown `err.message` flows verbatim
into the SSE-broadcast `auth_device_flow_failed` hint, and must be
sanitized. Two equally-correct paths documented (typed `error`
result vs sanitized thrown message).

#5 — Truncate raw IdP detail in `qwenDeviceFlowProvider.ts` stderr
audit lines to 2 KiB. WAFs / reverse proxies can return MB-sized
HTML error pages, and container log aggregators (Loki, Fluent Bit,
Stackdriver) typically truncate or drop lines past 4-32 KiB —
losing the useful prefix downstream. 2 KiB retains structured JSON
envelopes while staying well below every aggregator's per-line cap.

#6 — Track latest `originatorClientId` on per-provider singleton
take-over via new `entry.lastOriginatorClientId` field +
`recordTakeover()` helper. When a second SDK client posts
`POST /workspace/auth/device-flow` for an already-pending provider
(or one being created in `inFlightStarts`) with a different
`initiatorClientId`, an audit breadcrumb records the take-over so
incident response can correlate "client A started, client B took
over at 12:34". Event-routing intentionally still uses the original
`initiatorClientId` (events are workspace-broadcast and changing
the originator field mid-flow would break SDK reducers that key on
it). Two new tests cover the differing-id audit + same-id no-op.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-6 review feedback

Six "Critical" findings from a gpt-5.5 /review pass — all real
liveness/correctness defects in the daemon's auth device-flow path
and the SDK's awaitCompletion polling loop.

#1 — Make `provider.start()` timeout authoritative via `Promise.race`
in `DeviceFlowRegistry.doStart`. The earlier shape only ABORTED the
signal on timeout; a provider that ignores `signal` (non-abortable
I/O, future implementer who forgets to thread it to `fetch`) would
leave the `await` hanging until daemon restart, pinning the
`inFlightStarts` slot for that providerId. Race against a rejecting
timer makes the timeout authoritative regardless of provider
cooperation; abort still fires for cooperative cleanup.

#2 — Same shape for `result.persist()` in the success branch of
`runPollTick`. A future provider whose persist performs
non-abortable steps (mkdir/chmod/mv outside the abortable
fs.writeFile path) would otherwise hang the poll tick until process
restart. Race against rejecting timer; rejection maps to
`persist_failed`.

#3 — Clamp `expiresIn` and `interval` upper bounds. Previous
`Number.isFinite + > 0` guards stopped NaN/Infinity but a finite
extreme like `1e12` was still accepted — pinning the per-provider
singleton for ~30,000 years (`expires_in`) or scheduling a
TIMEOUT_MAX-clamped poll that never fires within `expiresAt`
(`interval`). Two new constants (`DEVICE_FLOW_MAX_EXPIRES_IN_SEC =
3600`, `DEVICE_FLOW_MAX_INTERVAL_MS = 60_000`) cap the worst case.

#4 — Extract `getDeviceFlowOrSynthetic404(...)` helper in
`DaemonAuthFlow.ts` and route BOTH the loop body and the
timeout-ceiling final read through it. Previously the ceiling read
went directly through `client.getDeviceFlow` and a 404 at the
boundary (entry evicted just as the timeout fired) would reject with
`DaemonHttpError(404)` instead of returning the structured `{ status:
'error', errorKind: 'not_found_or_evicted' }` that the rest of
`awaitCompletion` promises.

#5 — Validate `AwaitCompletionOptions.timeoutMs` and `pollOverrideMs`
with `Number.isFinite + > 0`. NaN slipped past the previous `??
default` form (NaN is truthy-ish in that position) and produced a
`ceiling` of `NaN` (loop runs forever — `now >= NaN` always false)
or a `setTimeout(NaN)` (Node clamps to 1ms — tight polling loop).
Sanitize to `undefined` so the documented defaults take effect.

#6 — Thread `signal` into `DaemonClient.getDeviceFlow` and forward
to `fetchWithTimeout` (which already composes caller + timeout
signals). awaitCompletion now passes `opts.signal` from both GET
sites. Without this, an `awaitCompletion` caller that aborts mid-
poll could not cancel an in-flight stalled GET; it would have to
wait for the daemon-side `fetchTimeoutMs` (30s default) to fire.

Four new tests in `deviceFlow.test.ts` pin the new behaviors:
hanging-start timeout (#1), hanging-persist → persist_failed (#2),
extreme-expiresIn clamp (#3), extreme-interval clamp (#3).
FakeProvider gained a `startHangs` flag for the non-cooperative
provider scenario.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-7 review feedback

Two findings from a DeepSeek /review pass; both small but legitimate
defense-in-depth gaps.

#1 — `runPollTick`'s catch block forwarded `err.message` verbatim
into the SSE-broadcast `hint`. The provider's `@remarks` contract
(fold-in 6 #4) requires throwers to sanitize, but if violated the
unbounded raw payload would reach every SSE subscriber. Added
`DEVICE_FLOW_POLL_HINT_MAX_LEN = 256` + `truncatePollHint()`,
applied to the catch's `result.hint`. Full raw `err.message` is
still routed to the audit trail (`audit?.record({hint: 'provider.poll()
threw (raw): ...'})`) so operator visibility for incident response
is preserved. Belt-and-suspenders: the contract is now structurally
enforced rather than relying on every future provider author to
read the JSDoc.

#2 — `updateMatchingFlow` (and the `started`/`authorized` handlers
in `reduceDaemonAuthEvent`) unconditionally overwrote state without
comparing `rawEvent.id` against the existing flow's
`lastSeenEventId`. The field's JSDoc documented it as a monotonic
counter to prevent stale frames from overwriting newer state, but
the code didn't enforce that contract. SSE reconnect with
`Last-Event-ID < terminal-frame-id` would replay older frames; if
any of them were for the same `deviceFlowId` (e.g. a delayed
`failed` arriving after `authorized`) the stale frame would
overwrite the terminal. Daemon-side `transitionTerminal` makes the
exact reachable scenario thin, but the documented contract should
match the code.

Threaded `rawEventId` into `updateMatchingFlow` and added the gate
there + in the `started` and `authorized` handlers (the two cases
that don't go through `updateMatchingFlow`). Synthetic frames
without an envelope `id` (`rawEventId === undefined`) bypass the
gate — they originate inside SDK reducer machinery and aren't
subject to replay ordering.

Three new tests pin the contracts:
- `runPollTick catch truncates the SSE hint and preserves raw on
  the audit (fold-in 8 #1)` — `pollThrowsWith` flag on FakeProvider
  models a non-conforming provider; SSE hint < 400 chars + contains
  'truncated'; audit hint contains the full 4_000-char raw.
- `reduceDaemonAuthEvent rejects out-of-order frames (fold-in 8 #2
  monotonicity)` — stale `failed`(id=7) does NOT overwrite
  `authorized`(id=10); stale `started`(id=4) for a different flow
  also rejected.
- `reduceDaemonAuthEvent passes synthetic frames (no envelope id)
  through the gate` — SDK-internal frames without `id` are honored.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-8 review feedback

Twelve correctness + structural fixes from a wenshao + DeepSeek + gpt-5.5
review pass. Tests deferred to fold-in 10 (separate, larger commit).

CRITICAL CORRECTNESS

#7 — `provider.persist()` Promise.race could publish `persist_failed`
to SSE while a non-cooperative provider was still committing
credentials to disk. Added an independent tracker on the original
persist promise: if the race timed out (`persistTimedOut === true`)
AND the underlying persist later resolved successfully, audit a
`lost_success_after_timeout` breadcrumb so operators see the
inconsistency. Tightened the persist `@remarks` contract to require
signal honoring end-to-end. Qwen provider already complies (fold-in
3 #10); this is forward-defense for future providers.

#11 — auth surface (`DaemonAuthFlow`, `reduceDaemonAuthEvent`,
`createDaemonAuthState`, `DEVICE_FLOW_EXPIRY_GRACE_MS`, all event /
data / state types) was re-exported from `src/daemon/index.ts` but
NEVER from the published SDK entry `src/index.ts`. SDK consumers got
`undefined` for everything except `client.auth.start()` (which
traveled through the already-exported `DaemonClient`). Added the
missing exports and pinned via `daemon-public-surface.test.ts`.

#12 — `core/src/qwen/qwenOAuth2.ts:373`'s
`debugLogger.debug('Device authorization result:', result)` writes
the raw `device_code` (RFC 8628 bearer-equivalent credential) to
stderr / journald, bypassing the `BrandedSecret` redaction layer.
Pre-existing on main but PR 21 expanded the exposure surface.
Sanitized to log only `{ ok, expires_in }` on success / `{ ok,
error }` on error.

#13 — `runPollTick` success-branch persist-failure × past-`expiresAt`
classified as `expired_token` instead of `persist_failed`, routing
operators toward "tell user to retry" (RFC 8628 expiry) when the
actual root cause was disk I/O. Reclassified to `persist_failed`
with a `persist_also_failed_past_expiry` audit hint to preserve the
timing detail for incident response.

SMALL CORRECTNESS

#1 — `runPollTick` catch hint replaced with a STATIC bounded message
("provider.poll() failed; see daemon audit log for details"). The
fold-in 8 truncated-prefix approach could still leak the first 256
chars of provider-templated raw text including secret material. Full
raw still routed to audit channel for operator visibility.

#5 — `cancellerClientId` field added to `DeviceFlowEntry`; deferred-
cancel branch in `cancel()` now stamps it on the entry, and the
persist-resolution `cancelled` event publish uses
`entry.cancellerClientId ?? entry.initiatorClientId`. SSE consumers
that suppress self-emitted events can now attribute the cancel
correctly.

#6 — `AwaitCompletionOptions.timeoutMs === 0` (the documented
"settle immediately, return current daemon view" contract) was
treated as falsy by the `?` ternary, falling back to the default.
`sanitizePositiveMs` now takes an `allowZero` opt-in; the ceiling
computation uses `!== undefined` instead of truthy check.

#8 — `EventBus.publish()` returns `undefined` for closed buses (it
does NOT throw). `broadcastWorkspaceEvent` previously counted that
path as success, hiding the all-buses-dropped operator alarm.
Folded the closed-bus-as-failure check into the canonical
`publishWorkspaceEvent` (see #X below).

#9 — start-timeout Promise.race rejected with a plain `Error`,
falling through `sendBridgeError` to a generic 500. Switched to
`UpstreamDeviceFlowError` so a hung IdP correctly surfaces as 502
(matching the envelope every other IdP start failure uses).

STRUCTURAL

#3 — Three identical `transitionTerminal + publish + audit`
expired_token blocks in `runPollTick`/`sweep`/(removed by #13)
deduplicated into a private `expireEntry()` helper. Future event-
shape changes are now a one-edit operation.

#X — PR 16 (#4249) merged on 2026-05-18 06:27Z. Per the inline
comment at httpAcpBridge.ts:501, PR 21's `broadcastWorkspaceEvent`
was kept distinct only to avoid the merge conflict; once PR 16
landed, it became a fold-in candidate. Folded the closed-bus +
all-failed-stderr-escalation operator-visibility features (PR 21's
S5 + fold-in 9 #8) INTO `publishWorkspaceEvent`; dropped
`broadcastWorkspaceEvent` from the bridge interface + impl + test
mocks. PR 21's deviceFlowEventSink now calls
`bridge.publishWorkspaceEvent` — single canonical workspace fan-out.

DOC

#16 — Added a "Cross-client take-over" paragraph to
`docs/users/qwen-serve.md` explaining that two clients on the same
daemon for the same provider get the per-provider singleton with
`attached: true`/`false` distinguishing them; no separate event
fires (both eventually observe the same `auth_device_flow_authorized`).

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao round-9 review feedback

Two small non-blocking items from the round-9 pass; defensive shape +
docs only. The 4 deferred test-coverage threads (#1-4 of round-8) are
still tracked for fold-in 10.

#6 — `lastSeenEventId` typed `number` with `?? 0` defaults in the
`auth_device_flow_started` reducer case. The daemon-side `EventBus`
assigns ids ≥ 1 so the `0` sentinel has no real-traffic meaning, but
the monotonic gate (`rawEventId <= flow.lastSeenEventId`) would
reject any future SDK-internal synthetic frame using `id: 0`.
Changed the field type to `number | undefined` and dropped the
`?? 0` from the started case. The `updateMatchingFlow` /
`auth_device_flow_authorized` guards already short-circuit on
`existing.lastSeenEventId !== undefined`, so undefined is safe
end-to-end. Existing 34 reducer tests still pass unchanged.

#7 — Added `@remarks` block to `DeviceFlowErrorKind.persist_failed`'s
JSDoc explaining the lost-success retry UX. When fold-in 9 #7's
`lost_success_after_timeout` audit fires (non-conforming provider
violates signal contract; disk write succeeds after registry
published `persist_failed`), a naive SDK retry hits the IdP a
second time with a fresh `device_code` and prompts the user
twice — but the first credential set is already valid. JSDoc now
documents the mitigation: SDK consumers writing retry logic on
`persist_failed` should call `client.auth.getStatus()` BEFORE
re-prompting; operators can grep stderr/audit for
`lost_success_after_timeout` to detect occurrences.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(serve): fold-in 10 — auth device-flow test bundle (#4255)

Lands the four deferred test-coverage items the round-8 review
flagged (and round-9 re-surfaced) as a hard merge prerequisite.
Net +41 tests across registry / SDK helper / client HTTP /
HTTP route layers.

#1 — `deviceFlow.test.ts` `persist failure paths` describe (3
tests, +3). The success arm's three terminal mappings — pure
`persist_failed`, `cancelled` (cancel during persist), and
`persist_failed` past `expiresAt` (the fold-in 9 #13
reclassification with `persist_also_failed_past_expiry` audit
hint) — were 0-covered. Now pinned. Test #2 also asserts the
fold-in 9 #5 cancellerClientId routing on the deferred
`cancelled` event.

#2 — new `DaemonAuthFlow.test.ts` (+14 tests). Mock DaemonClient
with sequenced `getDeviceFlow` replies. Covers happy-path
polling → `authorized`; `slow_down`-driven `intervalMs` bump
firing `onThrottled`; `signal.abort()` rejection; `signal`
propagation through `client.getDeviceFlow` (fold-in 7 #6);
`timeoutMs` ceiling final-read; `timeoutMs:0` immediate-return
(round-9 #6); NaN/Infinity → `sanitizePositiveMs` fallback to
default ceiling (fold-in 7 #5); 404 → synthetic
`error`/`not_found_or_evicted` (fold-in 3 #4) at BOTH the loop
body AND the timeoutMs ceiling read (fold-in 7 #4); non-404
DaemonHttpError rethrown; `cancel()` and top-level
`status()`/`cancel()` wrappers forward correctly.

#3 — `DaemonClient.test.ts` `device-flow methods` describe
(+11 tests). POSTs `/workspace/auth/device-flow` happy path +
clientId header + body shape; 200/201 acceptance; non-2xx →
`DaemonHttpError`. GETs URL-encode the deviceFlowId; forward
`opts.signal` to `fetchWithTimeout`'s composed signal (fold-in
7 #6 — verified by aborting caller signal and observing the
fetch's signal flip to `aborted`); 404 throws. DELETEs
swallow 204 + 404 (idempotent, mirrors `closeSession`); non-
204/404 throws. `getAuthStatus` plain GET. `client.auth`
lazy-instantiated singleton.

#4 — `server.test.ts` 5 supplementary contract tests (+5).
The existing 8 `it()`s cover happy paths + take-over + 401
POST + DELETE pending/terminal/unknown + 502 upstream + sweeper.
This commit plugs gaps: 400 `invalid_request` for missing /
non-string providerId (fold-in W2 split this from
`unsupported_provider`); 409 `too_many_active_flows` (via
injected fake registry); 401 `token_required` on DELETE
without bearer; the asymmetric GET posture
(`/workspace/auth/device-flow/:id` IS strict-gated to prevent
peer-process userCode shoulder-surf; `/workspace/auth/status`
stays read-only because its `pendingDeviceFlows` entries
intentionally redact `userCode`).

Validation: cli serve 631/631 (+8 from #1, #4); sdk 384/384
(+25 from #2, #3, +/- some pre-existing churn). Typecheck +
lint clean.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(qwen): atomic temp+chmod+rename in cacheQwenCredentials (PR #4255 round-11 #2)

gpt-5.5 /review flagged a real correctness/security gap: the
post-write `chmod` ordering left a window where freshly-written
credentials could land in a broadly-readable existing
`oauth_creds.json` before the chmod tightened it. On POSIX, a
chmod failure additionally degraded to a debug warning while the
broadly-readable tokens stayed on disk.

New shape mirrors the standard atomic-write idiom:

  1. Write `${filePath}.tmp.${pid}.${randomUUID()}` with `mode: 0o600`.
     The temp path doesn't exist beforehand, so the `mode` flag
     actually applies on creation (it doesn't on existing files,
     which was the root of the original race).
  2. Defensive `chmod` on the temp file. POSIX failure is now a
     HARD ERROR (refuses to publish broad-perm credentials to the
     canonical filename). Windows logs a debug breadcrumb and
     proceeds, since chmod is a no-op on most NTFS volumes (perms
     go through ACLs).
  3. Atomic `fs.rename` over `filePath`. The canonical path is
     ALWAYS at `0o600` from the moment it contains the new tokens;
     readers see either the old creds or the new creds, never a
     partially-written or broadly-readable state.
  4. Best-effort `fs.unlink` of the temp file on any failure path
     so failed writes don't leave `.tmp.<pid>.<uuid>` litter on
     disk.

Test mock in `qwenOAuth2.test.ts` extended with `chmod` + `rename`
no-op stubs so the existing 158 core/qwen tests still pass; no test
behavior change beyond the mock surface.

Validation: typecheck clean (cli + core + sdk-typescript); core
qwen 158/158; cli serve 643/643; sdk 384/384.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4255 wenshao + gpt-5.5 round-12 review feedback

Eight findings from a wenshao + gpt-5.5 /review pass: 1 critical
correctness, 2 real defensive defects, 4 edge cases / minor
hardening, 1 test gap. All adopted.

CRITICAL CORRECTNESS

#1 CzSpN — `dispose()` race: after `await provider.poll(...)` the
post-await guard checked only `entry.status !== 'pending'`, NOT
`this.disposed`. `dispose()` clears the registry maps and aborts
the entry's signal but doesn't mutate `entry.status`, so a
provider whose poll already resolved (or doesn't honor abort) could
enter the success branch and call `result.persist({...})` —
committing credentials on a shutting-down daemon. Added the
`if (this.disposed) return;` guard symmetric with the top-of-method
check.

REAL DEFENSIVE DEFECTS

#2 Cy_ZG — sync-throw escape: the `result.persist({signal})` call
happens BEFORE the `new Promise` constructor that captures it
(`persistTracker` is closed-over inside the constructor). A
non-conforming provider whose persist throws synchronously (e.g.
top-of-function validation) would escape past the outer
`try/catch (await new Promise(...))` and become an
`unhandledRejection` since `runPollTick` is fire-and-forget via
`void`. Wrapped the persist invocation in a try/catch that routes
the sync-throw into the same `persistError` branch.

#3 CzSpe — runtime provider map: provider validation hardcoded
`DEVICE_FLOW_SUPPORTED_PROVIDERS` even though `deps.deviceFlowProviders`
is the documented extension hook for tests/future providers.
Switched both POST validation and `/workspace/auth/status`
`supportedDeviceFlowProviders` to derive from
`deviceFlowProviderMap.keys()` — single source of truth matches
the registry's `resolveProvider`.

EDGE CASES / MINOR HARDENING

#4 Cy_Y9 — `slow_down` re-clamp: `intervalMs += SLOW_DOWN_BUMP_MS`
can push past `DEVICE_FLOW_MAX_INTERVAL_MS` (the bound that keeps
`setTimeout` from clamping to TIMEOUT_MAX). Wrapped in
`Math.min(MAX_INTERVAL_MS, ...)` symmetric with the doStart clamp.

#5 Cy_ZF — `expiresInSec` lower bound: `0.5` was finite-positive
and produced `expiresAt = now() + 500 ms` — first poll (clamped at
≥1 s) fires AFTER expiresAt → flow expires before any user could
authorize. Added `DEVICE_FLOW_MIN_EXPIRES_IN_SEC = 30` (RFC 8628
§3.2 calls 5–30 minutes "reasonable"; sub-30s is non-compliant).

#6 CzHOK — take-over response privacy: `initiatorClientId` was
echoed to ANY take-over POST caller, including those with no
`X-Qwen-Client-Id` header. Bearer-gated already, but the
asymmetry "anonymous caller learns who started it" violated the
no-header-as-privacy-signal contract. Now only echoed when the
caller's id matches the entry's initiator.

#7 CzSpd — production audit visibility: production audit sink
dropped `line.hint`, but the registry uses hints for operator-only
breadcrumbs (`provider.poll() threw (raw)...`,
`lost_success_after_timeout`, `persist_also_failed_past_expiry`,
take-over correlation, `deferred (persist in flight; ...)`). The
documented troubleshooting trail was invisible in production
stderr. Now included with a 1 KiB bound + JSON-quoted so multi-
word hints stay parseable.

TEST GAP

#8 Cy_ZH — `lost_success_after_timeout` audit: the
fold-in 9 #7 split-brain detector for non-cooperative providers
had no test pinning it. Added a controllable `latePersist` Promise
+ test that drives poll → success → enters persist race → fires
PERSIST_TIMEOUT (registry publishes persist_failed) → resolves
persist late → asserts the lost_success audit fires.

Validation: typecheck + lint clean; cli serve 644/644 (+1 from
the new test); sdk-typescript 384/384.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): close concurrent multi-provider cap bypass (PR #4255 round-13 #1)

gpt-5.5 /review caught a real workspace-wide cap bypass:
`countActive()` only counted entries already installed in
`byProvider`, but the cap check at the top of `start()` runs
before any provider's `inFlightStarts` slot completes
`provider.start()`. A burst of fresh starts for
`DEVICE_FLOW_MAX_CONCURRENT + 1` distinct providers all run
synchronously to the cap check (each `start()` is async but
runs to its first await — the await happens AFTER the cap
check), all observe `count === 0` (no `byProvider` entries
installed yet), and all pass — eventually installing more
than the documented four pending flows.

Fix: include `inFlightStarts.size` in `countActive()`. The
two maps are disjoint by construction (the existing-pending
fast-path catches any provider with both), so simple
addition cannot double-count. The second concurrent caller
sees count=1, the third count=2, …, and the (MAX+1)th caller
is rejected with `TooManyActiveDeviceFlowsError`.

Test: `caps at DEVICE_FLOW_MAX_CONCURRENT under CONCURRENT
distinct-provider starts`. Fires `MAX+1` concurrent starts
via `Promise.allSettled`, asserts exactly `MAX` fulfilled +
exactly 1 rejected with the typed error. Pre-fix this test
fails (all `MAX+1` succeed); post-fix it passes.

Validation: typecheck clean across all 4 workspaces;
deviceFlow.test.ts 35/35 (was 34); cli serve 645/645.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 23, 2026
…#4340)

* fix(review): harden SKILL.md against weak-model rule skipping

Weak models often skip parts of the long /review prompt and fall back
to familiar defaults — `gh pr checkout` instead of the worktree flow,
or running the autofix prompt even when the user passed `--comment`
(which means "only post inline comments, don't mutate code").

Three reinforcements, all in SKILL.md (no CLI changes):

- Promote the two most commonly violated rules to the top of the
  "Critical rules" list: worktree is mandatory for PR reviews, and
  `--comment` skips Step 8 entirely.
- Add an inline blockquote at the top of the Step 1 PR branch that
  names the specific forbidden commands (`gh pr checkout`,
  `git checkout`, `git switch`, `git pull`, `git reset --hard`).
- Add an explicit skip block at the top of Step 8 listing the three
  conditions that bypass autofix — `--comment`, cross-repo lightweight
  mode, or no fixable findings — so a weak model doesn't have to
  infer them from scattered earlier text.

* fix(review): address /review comments on rule scope + Step 8 dedup

Follow-up to the initial harden pass, addressing the inline review
comments on PR QwenLM#4340.

Rule #1 (worktree mandatory):
- Scope it to **same-repo PR reviews** so cross-repo PRs running in
  lightweight mode (no matching local remote, no worktree) don't read
  as a contradiction.
- Replace "Your very first action" with "After argument parsing and
  remote detection, the first command that touches code state" — the
  literal "very first" was wrong since `--comment` parsing and
  URL/remote disambiguation legitimately run before `fetch-pr`.
- Align the forbidden-command list with the Step 1 blockquote (add
  `git pull` and `git reset --hard`) so a weak model that only reads
  the Critical rules section sees the same five commands as a model
  that reaches the blockquote at the point of use.
- Add an explicit "cross-repo PRs use lightweight mode" parenthetical
  so the same model knows where to look for the alternative path.

Step 8 skip block:
- Drop the redundant third bullet ("no Critical or Suggestion findings
  with concrete, applicable fixes") — it was both logically equivalent
  to the "Otherwise" clause below and used a different qualifier
  ("concrete, applicable" vs "clear, unambiguous"), risking a weak
  model treating them as two distinct thresholds.
- "ANY of the following" → "EITHER" since only two bullets remain.
- Fold the no-findings case into the Otherwise clause as a no-op note.
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request May 25, 2026
…wenLM#4103) (QwenLM#4502)

* feat(cli): headless runaway-protection guardrails (QwenLM#4103)

Adds two opt-in run-level budgets and a startup safety warning for
non-interactive / CI / SDK runs. All defaults preserve existing
behavior; the budgets only fire when the user explicitly sets a limit.

Phase 1 — surface unsafe configs and fix doc drift
- New `--yolo`-without-sandbox stderr warning at startup of every
  non-interactive run, emitted by `getHeadlessYoloSafetyWarning` in
  `packages/cli/src/utils/headlessSafetyWarnings.ts`. Suppressible
  via `QWEN_CODE_SUPPRESS_YOLO_WARNING=1` (strict `1`/`true` match so
  `=0` / `=false` don't silence it). Strict env match also applied to
  the `SANDBOX` check so values like `SANDBOX=0` don't accidentally
  bypass the warning.
- Gated on `!config.isInteractive()` at the gemini.tsx call site so
  TUI users aren't nagged.
- `docs/users/configuration/settings.md`: corrected
  `model.skipLoopDetection` default (`true`, not `false`) and reworded
  the `--yolo`/sandbox section — `--yolo` does NOT auto-enable a
  sandbox; sandboxing must still be opted into explicitly.

Phase 2 — run-level budgets with distinct exit code
- `--max-wall-time` / `model.maxWallTimeSeconds`: wall-clock duration
  for the whole run. Flag accepts `90` (s), `30s`, `5m`, `1h`,
  `500ms`. Settings is plain seconds.
- `--max-tool-calls` / `model.maxToolCalls`: cumulative tool
  executions (success + failure). Ticked BEFORE each `executeToolCall`
  so a budget of N caps the run at exactly N executions.
- New `FatalBudgetExceededError` (exit code 55), distinct from
  `FatalTurnLimitedError` (53) and `FatalCancellationError` (130) so
  CI scripts can branch on the reason. JSON output mirrors the
  `handleMaxTurnsExceededError` / `handleCancellationError` envelope
  convention.
- Enforced via `RunBudgetEnforcer` in
  `packages/cli/src/utils/runBudget.ts`, wired to the same
  `AbortController` as SIGINT so existing cancellation plumbing
  carries the abort. A `routeAbort` helper distinguishes budget vs.
  SIGINT at the abort-check sites and at the outer catch.

Critical correctness fixes (informed by the QwenLM#4105 review pass)
- Drain-loop fall-through: the inner drain-item `for await` previously
  exited via `finalizeAssistantMessage(); return;`, swallowing a
  budget abort that fires during the last drain item and surfacing
  exit code 0. Now routes through `routeAbort` so exit 55 is
  preserved.
- Settings symmetry: `maxWallTimeSeconds: 0` in settings.json is now
  rejected (same as `--max-wall-time 0`); the enforcer treats `<=0`
  as "no timer" so silent disable would be a foot-gun.
  `validateMaxWallTimeSetting` also rejects `Infinity` / `NaN`.
- `setTimeout` overflow: both parser paths reject durations above
  `Math.floor((2^31 - 1) / 1000)s` (~24.8 days). Node clamps
  oversized delays to 1ms and fires the timer almost immediately;
  fail loud at startup instead.
- First-fence-wins + SIGINT race: `markExceeded` no-ops if the
  controller was already aborted by a third party, so a budget tick
  arriving after user SIGINT doesn't misattribute the abort to exit
  code 55.
- Outer catch re-routes mid-stream `AbortError`s through the budget
  handler so users see "Run aborted: …" instead of raw "AbortError".

Tests
- `runBudget.test.ts` (32 tests): parser happy / reject paths,
  setting validator, post-increment off-by-one, `maxToolCalls=0`
  meaning "disallowed", `-1` meaning unlimited, wall-clock under
  fake timers, `stop()` cancels pending timer, idempotent `start()`,
  first-fence-wins, SIGINT-race protection.
- `headlessSafetyWarnings.test.ts` (7 tests): YOLO + sandbox / env
  matrix; strict-truthy `SANDBOX` check; suppression env.
- Pre-existing suites: `nonInteractiveCli.test.ts` (46),
  `gemini.test.tsx` (23), `config/config.test.ts` (220),
  `core/utils/errors.test.ts` (12), `core/config/config.test.ts`
  (172) all green after picking up the new config getters / CliArgs
  fields.

Backward compatibility
- All budgets default to `-1` (unlimited); existing CLI invocations
  behave identically.
- New stderr warning only fires in the narrow YOLO-no-sandbox case,
  with an explicit suppress env.
- New exit code 55 is purely additive; no existing exit codes change
  meaning.

* fix(cli): address audit findings for headless guardrails (QwenLM#4103, QwenLM#4502)

Round-1 audit (3 angles × line-by-line + removed-behavior + cross-file)
plus an open-ended design pass surfaced eight correctness issues. This
commit lands all of them; the larger ACP / serve-mode structural items
are documented for follow-up.

Correctness fixes

- headlessSafetyWarnings: `SANDBOX` env check reverted to plain truthy.
  The sandbox transport sets `SANDBOX` to `sandbox-exec` (macOS
  seatbelt) or the container name (`qwen-code-sandbox`), neither of
  which matches `isTruthyEnv`. The PR's strict-`1`/`true` check was
  emitting the "no sandbox" warning INSIDE real sandboxes. Match the
  rest of the codebase (sandboxConfig.ts, gemini.tsx, Footer.tsx,
  prompts.ts, …) which all treat any non-empty value as "sandboxed".
- nonInteractiveCli main-loop abort: add `finalizeAssistantMessage()`
  before `routeAbort()`. The drain-item loop already had it (PR QwenLM#4502
  Critical bug #1); the main loop was asymmetric — stream-json
  consumers would see an unterminated `message_start` when a budget /
  SIGINT abort landed mid-stream.
- nonInteractiveCli drain-loop `routeAbort`: also flush
  `flushQueuedNotificationsToSdk(localQueue)` and
  `finalizeOneShotMonitors()` before exiting. The old `return`-and-
  fall-through path went through the outer holdback loop, which did
  this flushing; switching to `routeAbort()` skipped it, so
  `task_started` envelopes lost their paired `task_notification`.
- nonInteractiveCli catch handler: emit `adapter.emitResult({...})`
  BEFORE `handleBudgetExceededError`, with the budget message as
  `errorMessage` when budget tripped. Previously the budget handler
  `process.exit(55)`ed before the adapter could emit a terminal
  `result` envelope, so STREAM_JSON consumers never saw a stream
  terminator on budget exits and hung waiting for one.
- runBudget: new `validateMaxToolCalls` mirrors
  `validateMaxWallTimeSetting`. yargs coerces non-numeric flag values
  (`--max-tool-calls abc`) to `NaN`, and the enforcer's `>= 0` gate
  treats `NaN` and negatives as "no limit", silently disabling the
  budget. Reject `NaN`, `Infinity`, fractional, and negative-other-
  than-`-1` values at both flag and settings layers. `0` remains
  legal (`first tick aborts`), unlike wall-time where 0 is fatal.
- runBudget: new `MIN_WALL_TIME_SECONDS = 1` floor. Previously
  `--max-wall-time 500ms` parsed cleanly and aborted on the next
  event-loop tick before any model round-trip — almost certainly a
  typo (`5m`?) and not a useful guardrail at any rate.
- nonInteractiveCli `tickToolCall`: exempt `ToolNames.STRUCTURED_OUTPUT`.
  Under `--json-schema` this is the terminal "I'm done" contract tool,
  not real work. Without the exemption a budget-edge completion is
  aborted as a false positive (model used N tools then emitted
  structured_output as call N+1 → exit 55 instead of success).
- commands/serve.ts: emit the YOLO-no-sandbox warning at daemon
  startup when settings.json statically configures
  `tools.approvalMode: 'yolo'` with no `tools.sandbox` /
  `SANDBOX` env. The daemon can't use `getHeadlessYoloSafetyWarning`
  (no Config yet — sessions get their own) so we re-derive the
  predicate from settings. Per-session ACP override is documented as
  out of scope.

Documentation

- `docs/users/features/headless.md`: new "Scope" subsection under
  Run-level budgets explaining (a) `--max-tool-calls` counts top-level
  dispatches only — subagent / `agent` tool inner calls are not
  counted, (b) `structured_output` is exempt, (c) stream-json input
  mode resets budgets per user message, (d) `qwen serve` / ACP
  sessions do not currently consult budgets from settings.json.

Tests

- `runBudget.test.ts` grows from 32 → 41 tests: `validateMaxToolCalls`
  (NaN / Infinity / negatives / fractional), `parseDurationSeconds`
  sub-second rejection, `validateMaxWallTimeSetting` sub-second
  rejection.
- `headlessSafetyWarnings.test.ts`: replaced the "still warns when
  SANDBOX is 0/false/no" case (which encoded the strict-check bug) with
  positive coverage for the real sandbox-set values
  (`sandbox-exec`, `qwen-code-sandbox`).

All previously-green suites still green: cli/nonInteractiveCli (46),
cli/gemini.test (23), cli/config/config.test (220), core/utils/errors
(12), core/config/config.test (172). 337 tests across the touched suites.

Won't-fix (out of scope, documented or pre-existing)

- Unpaired `tool_use` in stream-json when a tool is aborted mid-execution
  — pre-existing structural gap (SIGINT mid-tool has the same outcome);
  PR amplifies it but doesn't introduce it.
- Narrow SIGINT-vs-budget-timer race — already mitigated by
  `markExceeded`'s `signal.aborted` check.
- `tickToolCall` increments past abort (cosmetic; only affects the
  `observed` value in the error envelope for a pathological caller).

* fix(cli): round-2 audit fixes for headless guardrails (QwenLM#4103, QwenLM#4502)

Round-2 audit (after round-1 commit 40ae6dd) surfaced two NEW
correctness issues introduced by the round-1 catch-handler restructure,
plus a handful of polish items from a parallel design pass.

Correctness fixes (new bugs from R1)

- nonInteractiveCli catch handler: wrap `adapter.emitResult` in
  try/catch. R1 moved the emit BEFORE `handleBudgetExceededError` so
  STREAM_JSON consumers see a terminal envelope first. But emitResult
  eventually hits `stdout.write`, which throws on EPIPE /
  ERR_STREAM_WRITE_AFTER_END when a piped consumer closes early
  (`qwen -p ... | head -n 1` is the common CI case). Letting that
  throw bubble out skipped both `handleBudgetExceededError` and
  `handleError`, dropping the documented exit-code-55 contract
  precisely when stdout was in trouble. Best-effort emit and continue
  to the exit handler.
- nonInteractiveCli `structured_output` exemption: also require
  `config.getJsonSchema?.() !== undefined`. Without that guard, an
  MCP server registering an unrelated tool literally named
  `structured_output` would silently bypass `--max-tool-calls`. Also
  documents (in `headless.md` "Scope") the related caveat that failed
  Ajv-validation retries skip the tick too, so a malformed-output
  retry loop is NOT bounded by `--max-tool-calls` — combine with
  `--max-session-turns` or `--max-wall-time`.

Polish

- runBudget `validateMaxToolCalls` upper bound: cap at 1_000_000.
  `1e10` (typo for `1e1`) would otherwise parse cleanly, pass the
  `>= 0` gate forever, and silently disable the budget — the exact
  foot-gun `MAX_WALL_TIME_SECONDS` was built to prevent. Symmetry.
- runBudget `parseDurationSeconds` sub-second hint: only append the
  "did you mean Ns?" suggestion when the input actually contained
  `ms`. Bare `0.5` would otherwise produce a useless "did you mean
  0.5s?" suggestion.
- nonInteractiveCli `routeAbort`: the `throw 'unreachable'` is only
  hit if `handleBudgetExceededError` / `handleCancellationError` ever
  becomes resumable (e.g. mocked `process.exit` in a test). Carry
  the original exceeded.message into the thrown Error so the outer
  catch's `errorMessage` field stays actionable instead of degrading
  to a literal "unreachable" string.
- commands/serve.ts: compare `approvalMode` against `ApprovalMode.YOLO`
  enum instead of the string literal `'yolo'`. If the enum value is
  ever renamed, the startup warning stays in sync with the helper at
  `headlessSafetyWarnings.ts` instead of silently going dead.

Documentation

- `headless.md` "Scope": clarify the `structured_output` exemption is
  unconditional (including failed validations); add explicit note
  that `--max-session-turns` does NOT exempt `structured_output`, so
  size to `N+1` for `N` real-work turns under `--json-schema`.
- `headless.md` flag table: add `1.5h` to the accepted-forms hint for
  `--max-wall-time` (the parser already accepts fractional units).

Tests

- `runBudget.test.ts`: new coverage for the `validateMaxToolCalls`
  ceiling. Total 42 tests across `runBudget.test.ts` (was 41), all
  green. cli/nonInteractiveCli, gemini.test, config/config all
  unchanged and still green.

Won't-fix (documented above or out of scope)

- ACP per-session approval-mode escalation (mid-session flip to YOLO)
  doesn't print the warning — daemon-level wiring; out of scope for
  this PR.
- 1s wall-time floor vs higher (5–10s) — debatable, keeping 1s with
  loud sub-second rejection; can raise later without semver impact.
- Integration test for the full budget-trip → catch → emitResult →
  exit 55 path — requires a process-exit-mocking harness; tracked as
  follow-up.

* docs: align headless guardrails examples with R1 sub-second floor

Round-3 audit caught two stale doc surfaces that R1's 1-second wall-time
floor (and R2's `1.5h` fractional-unit addition) didn't update:

- `docs/users/features/headless.md` budget table: replace stale `500ms`
  example with `1.5h`, add explicit "minimum 1s — sub-second values are
  rejected as typos" note.
- `docs/users/configuration/settings.md` `model.maxWallTimeSeconds` row:
  same fix. Also extend `model.maxToolCalls` row with the structured_output
  exemption note, the `0` semantic, and the 1,000,000 ceiling that R2
  added.

A user copying the documented `--max-wall-time 500ms` example from either
surface would hit a startup error after R1.

Known follow-up (not addressed in this commit)

- No test exercises the R2 `isStructuredOutputExempt` predicate end-to-end.
  Adding one needs the same process-exit-mocking harness called out in the
  R2 commit as a separate follow-up.

* docs: align JSDoc / schema / CLI help with R1+R2 validation rules

Round-4 final-pass audit caught four schema/help-text/JSDoc surfaces
that drifted from the validators introduced in R1 (1s wall-time floor,
24-day ceiling) and R2 (1M tool-call ceiling, structured_output
exemption, `0` sentinel).

- `runBudget.ts` `parseDurationSeconds` JSDoc: replace stale claim
  that `500ms` is accepted and "sub-second precision is preserved"
  with the actual contract — `[MIN_WALL_TIME_SECONDS, MAX_WALL_TIME_SECONDS]`,
  ms suffix only legal when value resolves to >= 1s. Adds `1.5h` to
  the accepted-forms list.
- `settingsSchema.ts` `model.maxWallTimeSeconds` description: now
  documents the 1s minimum and ~24-day ceiling.
- `settingsSchema.ts` `model.maxToolCalls` description: documents the
  structured_output exemption, the `0` sentinel ("no tool calls
  allowed"), and the 1,000,000 ceiling.
- `vscode-ide-companion/schemas/settings.schema.json`: mirrors both
  schema descriptions above so the VS Code settings UI auto-completion
  matches.
- `config.ts` yargs `--max-wall-time` description: documents the 1s
  floor and the ~24-day max.
- `config.ts` yargs `--max-tool-calls` description: documents the
  structured_output exemption, the `0` sentinel, and the 1M ceiling.
  `qwen --help` is the most-read surface for these flags; matches the
  prose docs in headless.md and settings.md.

No code changes — pure doc/help-text alignment.

---------

Co-authored-by: 克竟 <dingbingzhi.dbz@alibaba-inc.com>
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Jun 12, 2026
…NL (closes QwenLM#3681, QwenLM#4095 Phase 2) (QwenLM#4333)

* feat(core): add atomicWriteFileSync + forceMode option

Sync mirror of atomicWriteFile for code paths that can't await
(settings persistence on exit, sync config writers). Same semantics:
symlink chain resolution, permission preservation, fsync via flush:true,
EPERM/EACCES rename retry, EXDEV fallback to direct write.

Add forceMode option on AtomicWriteFileOptions — when true, ignore the
existing target's permission bits and apply options.mode regardless.
Needed for credential files that must heal historically over-permissive
files (e.g. a 0o644 token restored from backup must be forced to 0o600).
Honored by both async and sync paths. Default false preserves existing
behavior.

Reuses Atomics.wait for true blocking sleep in renameWithRetrySync —
no busy-wait, no extra dep.

Refs: QwenLM#4095 Phase 2

* refactor(core): migrate credential writes to atomicWriteFile (QwenLM#4095 Tier 1)

Route all OAuth credential persistence through atomicWriteFile with
forceMode: true, so a process crash mid-write cannot leave the user
with a half-written token file, and historically over-permissive files
(e.g. 0o644 from a manual restore) are healed to 0o600 on the next
write.

- oauth-token-storage.ts: setCredentials, deleteCredentials
- file-token-storage.ts: saveTokens (encrypted MCP token storage)
- qwenOAuth2.ts: cacheQwenCredentials (also fixes missing mode — was
  inheriting 0o644 from umask, now forced to 0o600)
- sharedTokenManager.ts: saveCredentialsToFile — drops ~15 lines of
  hand-rolled tmp + rename in favor of the shared helper

Lock-file writes using flag: 'wx' (sharedTokenManager.ts:720) are
intentionally left untouched — they rely on exclusive-create semantics
that atomic write does not preserve.

Tests updated to mock atomicWriteFile instead of fs.writeFile.

Refs: QwenLM#4095 Phase 2

* refactor(core): migrate memory state writes to atomicWriteFile (QwenLM#4095 Tier 2)

Route all auto-memory state persistence through atomicWriteFile so a
process crash during a dream/extract/forget cycle cannot corrupt the
metadata sidecar, extraction cursor, or topic body files.

Touched: manager (writeDreamMetadata), extract (writeExtractCursor +
bumpMetadata), indexer (rebuild), dream (bumpDreamMetadata), forget
(bumpMetadata + topic body rewrite).

manager.ts:362 acquireDreamLock uses flag: 'wx' for exclusive create —
left untouched, atomic write does not preserve that semantic.

Uses atomicWriteFile (not atomicWriteJSON) to preserve the trailing
newline these files have always had.

Refs: QwenLM#4095 Phase 2

* refactor: migrate config + logger + state writes to atomic helpers (QwenLM#4095 Tier 3a)

Route the remaining state-file write paths through atomic helpers so a
crash mid-write cannot corrupt config, log, or session-scoped state:

- trustedFolders.ts (sync): atomicWriteFileSync — sole path that flips
  workspace trust, must not half-write
- logger.ts (4 sites): atomicWriteFile — full-file JSON rewrites for
  logs.json and per-checkpoint files
- tipHistory, installationManager, projectSummary, todoWrite,
  trustedHooks: bonus sites with the same shape (state JSON written
  multiple times per session)

todoWrite is on the hot path — writes every time the todo list mutates
— so the added rename + fsync cost is measurable (a few ms per write
on SSD). Trade-off accepted to avoid a half-written todos file
silently breaking the next session's resume.

Export atomicWriteFile / atomicWriteFileSync from the core public
index so CLI-side callers (trustedFolders, tipHistory) can reach them.

Tests updated:
- logger.test.ts uses vi.importActual to re-export the real helper and
  override per-test via vi.mocked(atomicWriteFile).mockRejectedValueOnce
- trustedFolders.test.ts and todoWrite.test.ts mock the helper directly

Refs: QwenLM#4095 Phase 2

* fix(core): flush JSONL appends to disk (QwenLM#4095 Tier 3b, closes QwenLM#3681)

QwenLM#3656 fixed the read side of glued '}{' JSONL records — when a process
was killed mid-appendFile, the trailing '\n' was lost and the next
record was concatenated. The write side was left for a follow-up
(QwenLM#3681).

This adds flush:true (fsync) to every per-line append:
- jsonl-utils.ts writeLine / writeLineSync (session transcripts,
  auto-titles, prompt history)
- debugLogger.ts appendFile (per-session debug log)

jsonl-utils.ts write() (full-file replace) now goes through
atomicWriteFileSync so a crash during overwrite cannot corrupt the
session transcript either.

Trade-off: fsync on every append adds disk-sync latency (single-digit
ms on SSD, more on spinning disk / network FS). Acceptable for a few
writes per turn; the alternative is silently losing the last record
of every interrupted session, which QwenLM#3681 explicitly flagged.

Refs: QwenLM#4095 Phase 2 Tier 3b
Closes: QwenLM#3681

* refactor(core): migrate extension config + LSP edit to atomic write

Catch up two sites where Claude Code's equivalent path is atomic but
qwen-code's isn't (verified against
/Users/jinye.djy/Projects/claude-code on 2026-05-19):

- extension/extensionManager.ts:533, :1073 — enablement config and
  install metadata writes. Claude Code's plugin install-counts and zip
  cache use atomic temp+rename via writeFileSyncAndFlush_DEPRECATED.
- lsp/NativeLspService.ts:1351 — applying an LSP edit to a user file.
  Claude Code's FileWriteTool/FileEditTool both route through atomic
  writeTextContent → writeFileSyncAndFlush_DEPRECATED. A bare
  writeFileSync here could half-write the user's source file if the
  process is killed during an LSP-driven rename or quick-fix.

Also clean up stale fs.rename mock setups in sharedTokenManager.test.ts
that became no-ops after Tier 1 migration (rename is no longer called
by saveCredentialsToFile). The fs.writeFile mocks stay because the
wx-flag lock path still uses them.

Refs: QwenLM#4095 Phase 2

* chore: cosmetic cleanups from PR review

- packages/core/src/index.ts: move atomicFileWrite export to its
  alphabetical position (before browser.js)
- tipHistory.ts: add forceMode: true to atomicWriteFileSync for
  consistency with other 0o600 sites — heals legacy 0o644 files even
  though tips are non-critical

Refs: QwenLM#4095 Phase 2

* fix(core): address Codex review findings on Phase 2 PR

Three issues caught by post-merge Codex review of the QwenLM#4095 Phase 2
branch — none had user-visible symptoms yet but all were latent bugs.

1. atomicFileWrite: forceMode without mode silently downgraded perms
   `if (!options?.forceMode)` skipped the existing-mode stat whenever
   forceMode was true, regardless of whether `mode` was also supplied.
   Calling `atomicWriteFile(p, data, { forceMode: true })` (no mode) on
   an existing 0o600 file produced 0o644 (umask default) instead of
   preserving 0o600. Tightened the guard to also require `options.mode`
   to be defined; mirrored fix in atomicWriteFileSync. Added two
   regression tests (async + sync) that assert mode preservation.

2. logger.test.ts: vi.resetAllMocks() blanked the atomicWriteFile shim
   The vi.fn(actual.atomicWriteFile) factory implementation gets reset
   to a no-op by `vi.resetAllMocks()` in beforeEach, which would make
   `logger.initialize()` silently skip creating logs.json on disk.
   Tests passed by coincidence (file pre-existence from prior runs).
   Captured the real implementation at module load and re-attach it via
   `mockImplementation` after each reset.

3. NativeLspService.applyTextEdits: atomic write bypassed file unwritability
   The read catch swallowed every error and treated it as "new empty
   file". With atomic write (tmp + rename), an unreadable target on a
   writable parent could be replaced with edits applied to an empty
   buffer — the old fs.writeFileSync would have errored on the target
   permission. Now only ENOENT is treated as new-file; other read
   errors (EACCES, EISDIR, etc.) propagate.

Refs: QwenLM#4095 Phase 2

* fix(lsp): refuse LSP edits to chmod 0444 files (Codex round 2)

The previous fix only handled "read failed → propagate the error".
Codex round 2 caught the remaining gap: a file that's readable but
chmod 0444 (read-only) would still be replaced by the atomic rename,
because rename only needs parent-directory write access.

Add an explicit fs.accessSync(W_OK) check before the atomic write.
ENOENT is allowed through so LSP can still create new files via edits.

Refs: QwenLM#4095 Phase 2

* fix(core): drop withTimeout around atomic credential write (Codex round 3)

`saveCredentialsToFile` wrapped `atomicWriteFile` in `withTimeout(5000)`.
If the call hits the 5s budget (e.g. slow NFS home, network-backed
storage, fsync added by Phase 2), withTimeout rejects but the
atomicWriteFile internal write+rename keeps running unobserved:

  1. withTimeout rejects → saveCredentialsToFile throws
  2. performTokenRefresh `finally` releases the refresh lock
  3. Another process acquires the lock and writes newer credentials
  4. The original atomicWriteFile finally completes its rename and
     overwrites the newer credentials — silent token rollback

Pre-migration the code awaited the tmp write and the rename in two
separate withTimeout calls; a timed-out tmp write never reached the
rename so there was no race against the target file. The migration
collapsed both into one inseparable atomicWriteFile, which made the
timeout actively unsafe (the work cannot be cancelled after the
timeout fires — fs.rename is not abortable).

Atomic write is durable by design — accept the I/O latency. The
mkdir and stat timeouts are kept (idempotent and read-only
respectively, no corruption risk on late completion).

Refs: QwenLM#4095 Phase 2

* test(core): add rename-retry + EXDEV-fallback coverage (QwenLM#4333 review)

Address PR review suggestions from wenshao (via qwen-latest /review):
neither renameWithRetry/Sync nor the EXDEV cross-device fallback had
direct test coverage. Both paths are critical (Windows AV contention,
Docker tmpfs /tmp) and a regression would degrade silently.

Vitest can't spy on ESM exports of `node:fs` (`Cannot redefine property:
renameSync`), so add narrow internal test seams instead:
- renameWithRetry / renameWithRetrySync take an optional `_renameImpl`
  parameter, defaulting to fs.rename / fs.renameSync.
- atomicWriteFile / atomicWriteFileSync take an optional `_testFs`
  parameter with `rename` and `writeFile` overrides, forwarded to the
  retry helper and used in the EXDEV fallback branch.

The seams are underscore-prefixed and JSDoc-tagged as "Internal test
seam — production callers never pass this", which keeps the public API
clean while making the behavior testable.

New coverage (+9 tests, 36 → 45):
- renameWithRetry: retry-EPERM-then-succeed, give-up after retries,
  no-retry on non-retryable (ENOSPC)
- renameWithRetrySync: same 3 patterns (EACCES, EPERM exhausted, EINVAL)
- EXDEV fallback: async direct write + tmp cleanup, sync ditto,
  non-EXDEV failure propagates without fallback (rejects EIO + tmp cleanup)

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(test): update telemetry sdk.test.ts appendFile assertions for flush:true

CI failure on all 3 OSes (macos / ubuntu / windows): sdk.test.ts asserted
`fs.appendFile` was called with `'utf8'` as the 3rd positional argument,
but commit b7badc7 (QwenLM#4095 Tier 3b — JSONL fsync) changed the
`debugLogger.ts` appendFile call from string-form to options-form
`{ encoding: 'utf8', flush: true }` to enable per-line fsync. Update the
3 assertions in the telemetry diagnostics test to match the new shape.

No behavior change — debugLogger still flushes per append; only the
assertion in this previously-unrelated suite needed updating.

Refs: QwenLM#4333, QwenLM#4095 Phase 2 Tier 3b

* test: cover LSP error branches + sync EXDEV cleanup + JSONL writes (QwenLM#4333 review)

Address three review suggestions from wenshao (via qwen-latest /review),
each pointing at a real coverage gap introduced by this PR:

1. NativeLspService.applyTextEdits error branches (round-2 LSP fix):
   the ENOENT-only read guard and the fs.accessSync(W_OK) refusal had
   no automated coverage. Added 3 tests accessing applyTextEdits via a
   typed cast (the method is private; making it public for one
   verification inflates API surface). Tests use chmod 0000 / chmod 0444
   reproducers and assert (a) read failure propagates EACCES without
   silently overwriting with empty content, (b) W_OK rejects with
   EACCES/EPERM before the atomic rename touches the target,
   (c) nonexistent files are still accepted so LSP can create via edits.

2. atomicWriteFileSync non-EXDEV rename failure cleanup: the async
   counterpart had an explicit EIO-rename test asserting tmp cleanup;
   the sync variant did not. Added the mirror — injects a sync rename
   throwing EIO via the existing _testFs seam and asserts
   `readdirSync(tmpDir).length === 0`.

3. jsonl-utils writeLine / writeLineSync / write smoke tests: the three
   write paths are the core fix for QwenLM#3681 (the PR's headline goal) but
   downstream callers (chatRecordingService, sessionService) mock them
   entirely. Without direct unit tests, a regression that dropped
   `flush: true` or reverted `write()` to bare writeFileSync would go
   undetected. Added 3 real-fs roundtrip tests.

Test count delta:
- NativeLspService.test.ts: 15 → 18
- atomicFileWrite.test.ts: 45 → 46
- jsonl-utils.test.ts: 22 → 25

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core,test): wrap atomic write errors + guard root user + cover save failure (QwenLM#4333 review)

Three review fold-ins from wenshao (via qwen-latest /review):

1. atomicFileWrite: error messages reference the random `.tmp.<hex>`
   path, not the logical target — many callers (memory subsystem,
   extension manager) don't wrap the error, making debug logs unhelpful.
   Add `annotateWriteError(error, targetPath)` that mutates the error
   message in-place to prefix `atomicWriteFile("<targetPath>"): ` while
   preserving `code` / `errno` / `syscall` / `stack` / the prototype
   chain so downstream `err.code === 'ENOENT'` checks and `instanceof`
   narrowing keep working. Applied to both async and sync variants;
   only the final propagated throw (not the EXDEV fallback path) is
   annotated.

2. NativeLspService.test.ts: the chmod 0444 and chmod 0000 tests rely
   on `accessSync(W_OK)` and `readFileSync` failing — but on POSIX with
   UID 0 (root, including most Docker CI runners), permission bits are
   bypassed and `accessSync` always succeeds. The tests would silently
   pass even with the W_OK guard removed entirely. Add
   `process.getuid?.() === 0` to the skip guard on both tests.

3. sharedTokenManager.test.ts: the catch block in saveCredentialsToFile
   that maps disk-full / permission-denied to
   `TokenManagerError(FILE_ACCESS_ERROR)` was never exercised — every
   prior test mocked atomicWriteFile as always-successful. Added a
   regression test that rejects atomicWriteFile with ENOSPC and asserts
   the wrapped TokenManagerError surfaces with the right type and
   carries the original message.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core,lsp): annotate sync errors correctly + cover EXDEV fallback + async LSP IO (QwenLM#4333 review)

Three review fold-ins from wenshao (via qwen-latest /review), all real
correctness/consistency issues:

1. annotateWriteError hardcoded "atomicWriteFile" prefix regardless of
   caller. Sync write failures produced misleading
   `atomicWriteFile("/path"):` prefixes in incident logs. Add optional
   `fnName` parameter (defaults to async name) and have sync call sites
   pass "atomicWriteFileSync".

2. EXDEV fallback path in both async and sync variants did NOT route
   the inner writeFileImpl/tryChmod failures through annotateWriteError.
   On a cross-device write that subsequently hit ENOSPC, the propagated
   error had a bare syscall message without the target-path prefix —
   breaking the function's documented error-shape contract. Wrap both
   fallback branches in try/catch + annotate.

3. NativeLspService.applyTextEdits is declared `async` and all callers
   `await` it, but the round-2 fix mixed in sync IO: readFileSync,
   accessSync, atomicWriteFileSync. The sync helper's renameWithRetrySync
   blocks the event loop up to ~350ms under Atomics.wait EPERM backoff
   — particularly bad for LSP workspace edits that loop over many files.
   Switch to async throughout: fsp.readFile, fsp.access, atomicWriteFile.
   Behavior preserved (same ENOENT-vs-other distinction, same W_OK gate).
   Existing tests pass unchanged (they already use the async typed-cast
   entry point).

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* test(core): cover EXDEV-fallback-write-failure annotation path (QwenLM#4333 review)

The previous fold-in added try/catch around the EXDEV fallback write in
both async and sync variants, routing fallback failures through
`annotateWriteError(err, target, fnName)`. The sync-specific
`fnName='atomicWriteFileSync'` differentiation is ONLY exercised on
that path, so a regression that dropped or misapplied the annotation
on sync would otherwise go undetected.

Two new tests inject both a failing rename (EXDEV) and a failing
writeFile (ENOSPC) via the `_testFs` seam, then assert (a) the
original `code === 'ENOSPC'` propagates intact, and (b) the message
matches `/atomicWriteFile(Sync)?\(<target>\):.*ENOSPC/` — verifying
target-path prefix AND correct fn-name differentiation.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): annotate guard, drop debug fsync, add noFollow for creds (QwenLM#4333 review)

Four review fold-ins from wenshao (via qwen-latest /review):

1. annotateWriteError guard `!error.message.includes(targetPath)` was
   always false in production. tmpPath is `${targetPath}.<hex>.tmp`, so
   any real fs error like `ENOSPC ... '/path/creds.json.a1b2c3.tmp'`
   *contains* targetPath as substring → guard returned false → the
   annotation prefix was silently skipped on every real failure path.
   Tests passed only because mock errors used bare `new Error('ENOSPC')`
   messages without paths. Change guard to idempotency check on our
   own prefix (`startsWith(`${fnName}(`)`), and update the two existing
   EXDEV-fallback tests to use realistic path-embedding fs messages.

2. debugLogger.appendFile dropped `flush: true`. 1050+ call sites,
   default-enabled, fire-and-forget — per-line fsync creates sustained
   I/O pressure / SSD wear with no user benefit (debug logs are
   best-effort, the module already tracks `hasWriteFailure` for the
   degraded-mode UI). Kept the kernel page-cache flush; revert
   debugLogger.test.ts and telemetry/sdk.test.ts assertions back to
   plain `'utf8'`. The QwenLM#3681 closure target is jsonl-utils writeLine,
   not debug logs.

3. Symlink security regression: the old
   `fs.writeFile(tmp) + fs.rename(tmp, filePath)` pattern atomically
   *replaced* a pre-placed symlink at `filePath`. atomicWriteFile's
   default `resolveSymlinkChain(filePath)` follows the link and writes
   through, redirecting tokens to wherever the link points (real
   concern on shared hosts with weaker-than-expected dir perms). Add
   `noFollow?: boolean` option that skips chain resolution; apply
   `noFollow: true` to all 4 credential write sites
   (oauth-token-storage [2 sites], file-token-storage, sharedTokenManager)
   to match the pre-migration replace-symlink semantics.

4a. Test seam `_testFs` was a 4th positional arg → considered moving
    into options. Punted: positional with underscore + JSDoc is
    materially the same surface as options field with underscore,
    and the only realistic collision (future production option as
    5th arg) is bounded by review.
4b. Sync/async code duplication (~110 lines mirror) → DECLINED.
    Refactoring to a sync/async-polymorphic helper introduces a new
    abstraction layer with worse type ergonomics; the duplication is
    mechanical and lined up for easy diffing. Tracked as Phase 2.5
    candidate if divergence actually accumulates.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): EXDEV fallback honors noFollow + fix test correctness (QwenLM#4333 review)

Three review fold-ins from wenshao (via qwen-latest /review):

1. **[CRITICAL]** EXDEV fallback path silently bypassed `noFollow`
   protection. The fallback `writeFileImpl(targetPath, ...)` is
   `fs.writeFile` / `fs.writeFileSync`, both of which follow symlinks.
   When a credential write site set `noFollow: true` and rename threw
   EXDEV (realistic on Docker OverlayFS / union mounts), the fallback
   would write credentials *through* a pre-placed symlink to an
   attacker-controlled target — the exact attack noFollow was meant
   to prevent. Fix: when `noFollow` is set, the EXDEV fallback now
   `unlink`s any existing entry, then opens with `O_WRONLY|O_CREAT|O_EXCL`
   to refuse to write through a symlink that races back. Applied to
   both async and sync variants.

2. **Test correctness**: the two EXDEV-fallback-write-failure tests
   added in the previous round had a bug — the `failingWrite` mock
   threw on every call, including the first call which is the
   tmp-file write. The tmp write failed → outer catch caught
   ENOSPC → EXDEV check returned false (code is ENOSPC) → fell
   through to the outer annotateWriteError. The inner
   EXDEV-fallback annotation was never exercised. The assertions
   passed only because annotateWriteError produces the same format
   in both catch blocks. Fix: selective-failure mock that succeeds
   on the first call (tmp write) and fails on the second (fallback
   write), genuinely reaching the EXDEV branch.

3. **Behavioral noFollow tests**: previously `noFollow: true` was
   only verified at the "option is passed to mock" level. Added 4
   real-fs tests (async + sync × happy-path + EXDEV-fallback) that
   pre-place a symlink, call atomicWriteFile with noFollow, and
   assert: (a) the symlink is replaced by a regular file, (b) the
   new file holds the new data, (c) the real file behind the
   symlink is untouched. A regression flipping the noFollow ternary
   or skipping the noFollow-aware EXDEV fallback now fails directly.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): close TOCTOU window on noFollow EXDEV fallback + cover ENOENT path

Two PR QwenLM#4333 round-7 review items folded in:

1. TOCTOU race: the path-based `tryChmod(targetPath)` after the EXDEV
   noFollow branch ran AFTER `fd.close()` released the inode reference.
   Between close and chmod, an attacker with parent-directory write access
   could replace the regular file with a symlink, redirecting the
   `chmod 0o600` onto an attacker-chosen target — silently defeating the
   `noFollow` protection that the `unlink + O_EXCL` pattern was added to
   provide. Fix: switch to `fd.chmod`/`fchmodSync` on the open fd before
   close (operates on the inode, immune to symlink swap), and skip the
   path-based chmod for the noFollow branch (path-based chmod remains
   for the non-noFollow direct-write branch, where following symlinks
   was already in scope).

2. Missing test coverage: all 4 existing noFollow EXDEV tests pre-place
   a symlink at the target, so `unlink(targetPath)` always succeeded —
   the ENOENT-swallow branch (first-write scenarios, e.g. initial
   credential provisioning on a cross-device mount) had no coverage.
   Added 2 tests (async + sync) verifying the fallback creates a new
   file with the requested mode when the target never existed.

Test results: 54/54 atomicFileWrite tests pass (was 52). Async + sync
parity preserved.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* test(core): skip new noFollow EXDEV-fallback tests on Windows (NTFS perm bits)

The two new-file noFollow EXDEV tests added in 7c47664 assert the file
ends up at 0o600. NTFS reports 0o666 for any file that isn't read-only
regardless of chmod/fchmod, so the assertion fails on Windows runners
(`AssertionError: expected 438 to be 384`). Match the existing
`it.skipIf(process.platform === 'win32')` pattern already used by all
mode-asserting tests in this file. Linux/macOS coverage of the
ENOENT-swallow + new-file path is unchanged.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): narrow fchmod catch + verify mechanism on noFollow EXDEV

Two PR QwenLM#4333 round-8 review items:

1. The `catch {}` around `fd.chmod(desiredMode)` (and `fchmodSync`) was
   intended to tolerate filesystems without POSIX permissions (FAT/exFAT)
   but silently swallowed every error code: EPERM under a hardened
   sandbox, EIO on flaky NFS/CIFS, EROFS if the filesystem is remounted
   read-only mid-write. Because this path operates on the *final
   credential file* — not a tmp file — a silent fchmod failure leaves
   the target at the umask-masked open() mode with no diagnostic trail.
   Narrowed the catch to ENOSYS/ENOTSUP so the FAT/exFAT case still
   tolerates failure but security-relevant errors propagate.

2. The round-7 noFollow-EXDEV-new-file tests asserted the final mode
   (0o600) but didn't verify the *mechanism*. Under typical umask 0o022,
   `open(O_EXCL, 0o600)` already creates the file at 0o600, so a
   regression that swapped `fd.chmod()` back to a path-based
   `tryChmod(targetPath)` (the pre-fix TOCTOU-vulnerable form) would
   leave the mode assertion passing — defeating the round-7 fix
   undetected. Added a `chmod` test seam to `atomicWriteFile` /
   atomicWriteFileSync` (mirroring the existing `rename` / `writeFile`
   seams) and asserted that path-based chmod is never invoked against
   the credential target on this code path.

54/54 atomicFileWrite tests pass.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): clean up O_EXCL orphan on noFollow EXDEV fchmod failure + cover both branches

Two PR QwenLM#4333 round-9 review items, one critical correctness bug
introduced by the round-8 catch narrowing:

1. **[Critical]** When `fd.chmod(desiredMode)` (or `fchmodSync`) on the
   noFollow EXDEV fallback throws a propagating error (EPERM under
   seccomp, EIO on flaky NFS, EROFS after a remount), the file created
   by `open(targetPath, O_CREAT|O_EXCL)` remained on disk after the
   error rethrew. Every subsequent credential write retry then hit
   EEXIST from O_EXCL and failed permanently — turning a transient
   sandbox EPERM into a permanent OAuth-refresh deadlock that requires
   manual file removal. All three credential write sites
   (`sharedTokenManager`, `oauth-token-storage` save+delete,
   `file-token-storage`) hit this code path. Fix: a `writeOk` flag plus
   nested try/catch — fd is closed in the inner finally, then if write/
   sync/fchmod failed, the orphan is unlinked best-effort before the
   error rethrows.

2. The fchmod catch-narrowing (the headline behavior of round 8) had
   zero test coverage on either branch — `_testFs` exposed `rename`,
   `writeFile`, and path-based `chmod`, but no hook for the open-fd
   `fchmod`. A one-line revert from the narrowed catch back to
   `catch {}` would pass every existing test. Added `fchmod` field to
   `_testFs` (async + sync) and four tests:
   - ENOSYS swallowed → write succeeds (FAT/exFAT happy path)
   - EPERM propagates AND `targetPath` is absent (regression-tests #1)
   for both async and sync.

58/58 atomicFileWrite tests pass (was 54). Async + sync parity preserved.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): annotate symlink errors + cover ENOTSUP/EACCES + log orphan unlink

Four PR QwenLM#4333 round-10 review items, all small bounded fixes:

1. Parameterized the FAT/exFAT fchmod-swallow tests over both ENOSYS
   (Linux) and ENOTSUP (macOS). Round-9 only covered ENOSYS — a
   one-token regression dropping `ENOTSUP` from the catch condition
   would have passed every existing test.

2. The orphan-unlink catch block was empty (`/* best effort */`),
   leaving no diagnostic trail when the cleanup itself fails (EROFS,
   immutable flag, sandboxed container). Added a `createDebugLogger`
   import (`'ATOMIC_WRITE'` category) and a debug log so incident
   response can correlate the original write error with a subsequent
   EEXIST loop. Async + sync.

3. The pre-open `unlink(targetPath)` correctly propagates non-ENOENT
   errors (EACCES on parent dir, EROFS), but no test exercised that
   path. Added an `unlink` field to the `_testFs` seam (async + sync,
   matching the existing `rename` / `writeFile` / `chmod` / `fchmod`
   pattern) and two tests verifying EACCES propagates instead of
   getting hidden behind a downstream EEXIST from O_EXCL.

4. `resolveSymlinkChain(filePath)` ran before the function's main
   try-block, so symlink-resolution errors (EACCES on intermediate
   dir, ELOOP from circular chain) bypassed the
   `atomicWriteFile("path"): ...` annotation that every other failure
   path applies — leaving `err.path` referencing an internal
   intermediate directory the caller never asked about. Wrapped with
   `.catch(err => throw annotateWriteError(err, filePath))` (async)
   and the equivalent try/catch for sync. Added a real-fs ELOOP
   regression test for both variants (skipped on Windows).

64/64 atomicFileWrite tests pass (was 58).

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): widen atomicWriteJSON options + tighten trustedHooks perms + cover backoff/mkdir branches

Five PR QwenLM#4333 round-11 review items, all small and bounded:

1. **atomicWriteJSON option-type widening**: Was typed as the narrower
   `AtomicWriteOptions` (retries / delayMs only), so credential-grade
   options added in this PR (`mode`, `forceMode`, `noFollow`) and
   pre-existing ones (`flush`, `encoding`) were silently dropped at the
   type level even though the body spread them at runtime. A future
   maintainer calling `atomicWriteJSON(credPath, creds, {noFollow:
   true, mode: 0o600, forceMode: true})` would have typechecked but
   silently lost noFollow + forceMode + mode. Widened to
   `AtomicWriteFileOptions`.

2. **trustedHooks 0o600 + forceMode**: This file lists user-approved
   *executable hook commands* — strictly more sensitive than the
   sibling state files (`trustedFolders.json`, `tipHistory.json`) that
   already use `{mode: 0o600, forceMode: true}`. Was dropping to the
   process umask (0o644 by default), and a backup-restored looser mode
   was never healed. Now matches the sibling pattern.

3. **renameWithRetry exponential-backoff coverage**: Existing tests
   covered retry count and error propagation but not the
   `delayMs * 2 ** attempt` curve itself. A regression to linear,
   constant, or — worst — regressive backoff (which intensifies under
   Windows AV-scan stress) would have passed every existing test.
   Added a test using `vi.useFakeTimers()` that records gaps between
   mock-rename invocations and asserts `[delayMs, 2*delayMs,
   4*delayMs]` for `(retries=3, delayMs=50)`.

4. **jsonl-utils write() parent-dir creation**: The other `write()`
   test targets a path inside the pre-created `tmpRoot`, so the
   `!existsSync(dir) → mkdirSync(dir, {recursive: true})` branch was
   never exercised. Added a one-liner test that targets a deeply
   nested non-existent path.

5. **writeLineSync docstring accuracy**: The docstring claimed "uses
   a simple flag-based locking mechanism (less robust than async
   version)" but there is no flag-based locking — and `writeLine`
   serializes via per-file `Mutex` that this function bypasses. Now
   accurately documents the lack of locking, the bypass, and the
   `flush: true` rationale (closes QwenLM#3681).

Test results: 65/65 atomicFileWrite tests pass (was 64), 26/26
jsonl-utils tests pass (was 25). Typecheck clean.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): force JS slow path for appendFile flush + correct debugLogger format

Two PR QwenLM#4333 round-12 review items, both real bugs:

1. **flush:true was silently a no-op for string payloads on appendFile**.
   Node's C++ fast path (binding.writeFileUtf8) for string + utf8 +
   appendFile bypasses the JS-side flush/fsync logic entirely —
   empirically string+flush:true takes ~0.05ms/op (identical to no
   flush) while Buffer+flush:true takes ~4.9ms/op (91× slower,
   proving fsync only runs for Buffer payloads). The data still
   reaches the kernel page cache (the syscall is synchronous), so
   `kill -9` is fine, but power-loss durability — the actual QwenLM#3681
   guarantee — was silently absent.

   Fix: pass `Buffer.from(line, 'utf8')` to both writeLine (async)
   and writeLineSync. This forces the JS slow path that honors
   `flush: true` and actually fsyncs the file. Updated the JSDoc
   on both functions to document the C++ fast-path bypass so a
   future maintainer doesn't revert to the simpler string form.

2. **`debugLogger.debug` doesn't do printf substitution**.
   `debugLogger`'s `formatArgs` (debugLogger.ts:67-77) just joins
   args with spaces — no `util.format()`. The round-10 calls used
   `'orphan unlink failed for %s: %s'` which rendered the literal
   `%s` markers in the log:
       orphan unlink failed for %s: %s /path/to/target Error: EACCES
   instead of:
       orphan unlink failed for /path/to/target: Error: EACCES
   Switched both async and sync sites to template literals, matching
   every other `debugLogger` call site in the codebase.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): narrow tryChmod catch + writeFileSync(fd) + cover credential write failures

Three PR QwenLM#4333 round-13 review items + one comment-wording softening:

1. **`tryChmod`/`tryChmodSync` catch narrowed (wenshao)** — was bare
   `catch {}` swallowing all errors, while the round-8 `fchmod` catch
   was already narrowed to ENOSYS/ENOTSUP. Same security rationale
   applies — and *specifically* on the EXDEV non-noFollow fallback,
   `tryChmod(targetPath)` is the *sole* mode-setting mechanism for an
   existing target (writeFile ignores `mode` when the target exists),
   so a silent EPERM/EROFS would leave credentials at the old mode.
   Non-credential callers don't pass `mode` → `desiredMode === undefined`
   short-circuits, so they're unaffected.

2. **Sync EXDEV `writeSync` → `writeFileSync(fd)` (yiliang114)** —
   `fsSync.writeSync(fd, buf)` returns bytes-actually-written and can
   short-write; the current code ignored the return, so a partial write
   would silently truncate the credential file with the call still
   returning success after fsync+fchmod. Switched to
   `fsSync.writeFileSync(fd, buf)` which loops internally per Node spec.
   The async sibling (`fd.writeFile`) already handles short-writes;
   this brings sync parity.

3. **`file-token-storage` failure-path coverage (wenshao)** — both
   `setCredentials` and `deleteCredentials` propagate `atomicWriteFile`
   rejections (no try/catch around the call), but no test exercised
   that path. Added two tests mirroring the round-1 sharedTokenManager
   precedent: ENOSPC on `setCredentials` and EROFS on `deleteCredentials`
   both rethrow.

4. **Round-12 comment wording softened (wenshao verification report)** —
   strace on Node v22/v24 confirms string + utf8 + flush:true does
   fsync correctly today, counter to my round-12 "silent no-op"
   framing. Buffer is still the safer documented form (forward-compat
   insurance against any future fast-path optimization), but the
   commit's claim that it was *fixing* a confirmed bug overstated what
   reproduces. Reframed both writeLine and writeLineSync comments
   accordingly without changing the code behavior.

Test results: 109/109 affected suites pass (atomicFileWrite 65,
jsonl-utils 26, file-token-storage 18). Broader credential/state
suites also green: 216/216 across sharedTokenManager + oauth-token-storage
+ qwenOAuth2 + logger + trustedFolders. Typecheck clean.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): route orphan unlink through seam + cover tryChmod + trustedHooks tests

Three PR QwenLM#4333 round-14 review items:

1. **Path-level tryChmod / tryChmodSync catch narrowing had zero direct
   coverage**. Round-13 narrowed the bare `catch {}` to ENOSYS/ENOTSUP
   only (matching the round-8 fd-level fchmod narrowing), but every
   existing EXDEV test passed `options: undefined`, so `desiredMode ===
   undefined` short-circuited before any chmod was attempted. A
   regression that inverted the catch condition (missing `!` prefix)
   would silently swallow EPERM on every sync credential write
   (trustedHooks, tipHistory, trustedFolders, etc.) with zero
   diagnostic signal. Added 6 tests via the existing `_testFs.chmod`
   seam: parameterized ENOSYS/ENOTSUP swallow + EPERM propagation, for
   both async (`atomicWriteFile`) and sync (`atomicWriteFileSync`).

2. **Orphan-cleanup unlink on the noFollow EXDEV failure path was
   using raw `fs.unlink` / `fsSync.unlinkSync` instead of the
   injected `unlinkImpl` seam**. The pre-open unlink correctly used
   the seam, but the round-9 orphan cleanup added later bypassed it,
   making it the only fs operation in `atomicWriteFile` not flowing
   through the test seam. Routed both async and sync orphan cleanup
   through `unlinkImpl`, and added 2 tests that inject a spy and
   assert orphan cleanup is invoked against targetPath after a
   simulated fchmod EPERM.

3. **`trustedHooks.ts` had no test coverage**. Round-11 migrated it to
   `atomicWriteFileSync` with `{ mode: 0o600, forceMode: true }` —
   strictly the most security-sensitive write in the PR since the
   file stores user-approved executable hook commands — but unlike
   the sibling files (trustedFolders, tipHistory) it had no test
   file. A regression that dropped `forceMode: true` or weakened
   the mode would have passed all existing tests. Created
   `trustedHooks.test.ts` covering: write goes through
   atomicWriteFileSync with `{ mode: 0o600, forceMode: true }`,
   write targets the global qwen dir path, the persisted content
   matches the hook key derived from the hook config, and round-trip
   trust/untrust behavior.

Test results: 73/73 atomicFileWrite tests (was 65) + 5/5
trustedHooks (new). Typecheck clean.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): add noFollow to trustedHooks/trustedFolders + route tmp cleanup through seam

Two PR QwenLM#4333 round-15 review items:

1. **`trustedHooks.ts` and `trustedFolders.ts` were missing `noFollow:
   true`**. The credential write sites
   (`sharedTokenManager`/`oauth-token-storage`/`file-token-storage`)
   all pass `{ mode: 0o600, forceMode: true, noFollow: true }` to
   prevent pre-placed symlink attacks. The trustedHooks comment
   already called it "strictly more sensitive than trustedFolders /
   tipHistory" — yet credential paths got symlink protection it
   didn't. A pre-placed symlink at `~/.qwen/trusted_hooks.json` (or
   `trustedFolders.json`) could redirect the atomic write to an
   attacker-controlled path, either leaking the executable-trust
   list / trusted-folder list, or leaving the user's real config
   silently stale. Added `noFollow: true` to both write sites and
   updated the assertions in `trustedHooks.test.ts` and
   `trustedFolders.test.ts`.

2. **Tmp-file cleanup at L240 (async) and L568 (sync) used raw
   `fs.unlink` / `fsSync.unlinkSync` instead of the injected
   `unlinkImpl` seam**. Pre-open unlink and orphan cleanup correctly
   routed through `unlinkImpl`, making the tmp-cleanup branch the
   only outlier — `_testFs.unlink`-injecting tests couldn't intercept
   this path, weakening the seam abstraction. Behavioral impact is
   nil (cleanup is best-effort, errors swallowed), but consistency
   matters for future test authors. Routed both async and sync
   variants through `unlinkImpl`.

Test results: 99/99 affected suites pass (atomicFileWrite 73,
trustedHooks 5, trustedFolders 21). Typecheck clean on core.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core): use path.join in trustedHooks test to fix Windows CI

PR QwenLM#4333 round-15 review item (Critical, from claude-opus-4-8 /qreview):

trustedHooks.test.ts:68 hardcoded a POSIX path
('/mock/home/.qwen/trusted_hooks.json'), but production builds the path
with path.join(Storage.getGlobalQwenDir(), 'trusted_hooks.json')
(trustedHooks.ts:29-31). On Windows path.join emits '\' separators, so
the mocked atomicWriteFileSync receives
'\mock\home\.qwen\trusted_hooks.json' and the toBe assertion fails — the
cause of the red Test (windows-latest, Node 22.x) check. macOS/Linux
runs are green because forward slashes match.

Build the expected path with path.join so it matches the platform
separator, and add the node:path import.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* fix(core,cli): round-16 review — tipHistory noFollow + atomicWrite test coverage + LSP concurrency note

Four PR QwenLM#4333 follow-up review items from wenshao:

1. tipHistory.ts was missing noFollow: true while the sibling
   0o600+forceMode config sites (trustedFolders.ts:192,
   trustedHooks.ts:63) both set it. The cleanup commit added forceMode
   "for consistency with other 0o600 sites" but stopped short of
   noFollow. All three store user-trusted paths and should share the
   same pre-placed-symlink protection. Added noFollow: true.

2. annotateWriteError idempotency guard had no direct coverage. The
   guard switched from includes(targetPath) to startsWith(fnName+"(")
   because real syscall errors embed the *tmp* path (which contains the
   target as a substring), so the old guard silently skipped annotation
   on every real failure. Added a test where the rename error message
   embeds a tmp-style path containing the target; the correct startsWith
   guard still annotates it, and reverting to includes would fail it.

3. non-EXDEV rename failure tests only asserted /EIO/, not the
   atomicWriteFile(...): / atomicWriteFileSync(...): annotation prefix
   that the production re-throw applies on that path. Tightened both
   async and sync assertions to match the prefix.

4. applyTextEdits concurrency constraint was undocumented. The async
   read-modify-write has await points between read and write;
   atomicWriteFile prevents torn files but does not serialize writers,
   so concurrent same-path edits can lose updates. Latent today (no
   production caller / no workspace/applyEdit handler), so documented
   the per-file-serialization requirement instead of adding a lock.

Not taking: extending the _testFs seam with open/openSync to fault-inject
O_EXCL EEXIST — the noFollow/O_EXCL security behavior is already covered
by the real-fs behavioral tests (noFollow EXDEV fallback refuses to
follow symlinks); a seam-injected EEXIST would only exercise error
propagation, not the guarantee.

Test results: atomicFileWrite 74/74 pass, eslint clean.

Refs: QwenLM#4333, QwenLM#4095 Phase 2

* test(core): route atomicWriteFile open through _testFs seam + assert O_EXCL (PR QwenLM#4333 review)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Jun 12, 2026
QwenLM#3731) (QwenLM#4432)

* feat(telemetry): Phase 4b — retry visibility for qwen-code.llm_request (QwenLM#3731)

Adds per-attempt retry telemetry for HTTP-status retries (429/5xx) emitted by
retryWithBackoff at the 4 LLM call sites. Second slice of Phase 4 (sub-issue

Architectural discovery (mid-planning)
--------------------------------------

The Phase 4 design doc assumed claude-code's "one LLM span owns the retry
loop" pattern. Reading the 4 retryWithBackoff call sites revealed qwen-code
inverts that: retryWithBackoff sits ABOVE LoggingContentGenerator. Each
attempt creates a fresh LLM span. The original "in-LCG accumulator" plan
wouldn't work.

Resolution: propagate retry state via AsyncLocalStorage (`retryContext`).
retryWithBackoff wraps each `await fn()` in `retryContext.run(...)`, and
LoggingContentGenerator reads the ALS in its synchronous prelude (before
the first await) and threads the snapshot into all endLLMRequestSpan
callsites — success / error / idle-timeout / abort. Matches existing
patterns (promptIdContext, subagentNameContext, agent-context).

Plan went through 3 review rounds (Plan-agent reviews) finding 22 issues
total — all addressed before implementation.

Changes
-------

- New retryContext.ts (AsyncLocalStorage<RetryAttemptContext>) with
  attempt + requestSetupMs + retryTotalDelayMs fields. Computed in
  retry.ts immediately before `await fn()` so values are anchored to the
  attempt's actual start, not derived downstream.

- retry.ts:
  - New `onRetry?: (info: RetryAttemptInfo) => void` option on RetryOptions.
    Opt-in per caller: non-LLM callers stay silent.
  - Monotonic `iterationCount` decoupled from `attempt` (which is clamped at
    `maxAttempts - 1` in persistent mode). Always reflects "this is the Nth
    fn() call" — no flip-flopping for mixed-error sequences.
  - retryContext.run wrap around fn() so LCG can read the ALS.
  - onRetry invocations wrapped in try/catch: telemetry exceptions never
    break the retry loop (logged via debugLogger).
  - logRetryAttempt debug log line KEPT — useful when OTel SDK isn't wired
    up (local CLI debugging, integration tests, early-startup errors).

- ApiRetryEvent telemetry event class (types.ts) with model + promptId +
  attempt_number + error fields + subagent_name. JSDoc cross-references
  ContentRetryEvent (they cover different retry budgets — HTTP-status vs
  invalid-stream — and can both fire for one prompt).

- logApiRetry function in loggers.ts — three-sink fan-out matching
  logContentRetry: QwenLogger RUM, OTel log signal (bridged via
  LogToSpanProcessor), recordApiRetry metric counter.

- recordApiRetry metric (metrics.ts) — `qwen-code.api.retry.count` Counter
  tagged with {model}. Full COUNTER_DEFINITIONS entry + initialization +
  recording function + index.ts export.

- qwen-logger.ts adds logApiRetryEvent for RUM consistency.

- 4 LLM caller wiring sites (client.ts, baseLlmClient.ts x2,
  geminiChat.ts) opt in with onRetry callback that emits ApiRetryEvent
  with subagentName from subagentNameContext.getStore().

- LoggingContentGenerator: snapshotRetryMetadata() helper called in the
  SYNCHRONOUS prelude of generateContent / generateContentStream — only
  point where retryContext is guaranteed active for the streaming path
  (the returned AsyncGenerator is iterated AFTER retryWithBackoff
  resolves). Snapshot threaded as parameter to loggingStreamWrapper so
  every endLLMRequestSpan callsite (success / error / idle-timeout /
  abort) sees the same values. `attempt` defaults to 1 when no retry
  context is present (warmup, side-queries, direct calls) so dashboards
  filtering WHERE attempt=1 include those.

Bundled Phase 4a bug fix (sampling_ms formula)
-----------------------------------------------

Phase 4a's `sampling_ms = duration_ms - ttft_ms - (requestSetupMs ?? 0)`
was silently wrong. `duration_ms` only covers `ttft + sampling` for the
span (startTime is captured when startLLMRequestSpan runs, AFTER any
setup phase). Subtracting setup again is double-counting. Phase 4a
masked the bug because requestSetupMs was always undefined → 0. Phase
4b populates requestSetupMs with cumulative retry overhead — without
this fix, sampling_ms would clamp to 0 for every retried request,
wiping output-throughput data exactly when operators need it most.

Fix: `sampling_ms = duration_ms - ttft_ms` (drop the setup subtraction).
Phase 4a tests updated accordingly: 1 test rewritten to use inputs that
actually exercise the clamp under the new formula (ttft > duration =
clock skew); 1 test renamed to assert the FIX (setup is NOT subtracted).

Out of scope (deferred, noted in PR description)
------------------------------------------------

- Persistent retry mode emission cap (50+ events under
  QWEN_CODE_UNATTENDED_RETRY). Aggregated attempt/retry_total_delay_ms
  remain accurate regardless.
- SDK-internal retries (openai/google-genai maxRetries=3) remain
  invisible — operator awareness only.
- Stream-iteration errors (mid-stream network drop during for-await)
  bypass retryWithBackoff entirely. Pre-existing behavior, not a Phase 4b
  regression.
- shouldRetryOnContent content-retry path (retry.ts:184-193) skips
  onRetry. No caller uses this path today — code path is dead.

Tests
-----

- retry.test.ts: 9 new cases (monotonic counter, requestSetupMs growth,
  first-try success, onRetry callback contract, absent-callback silence,
  callback-throws resilience, shouldRetryOnError mid-loop giveup,
  parallel-call ALS isolation, nested-retry inner-frame read).
- loggers.test.ts: 3 new cases (3-sink fan-out, subagent_name
  propagation, SDK-not-initialized path).
- loggingContentGenerator.test.ts: 4 new cases (non-stream ALS
  propagation, non-stream default attempt=1, stream ALS propagation
  through wrapper closure, stream default attempt=1).
- session-tracing.test.ts: 1 test rewritten + 1 renamed for the
  sampling_ms fix.

All 580 telemetry + retry + LCG tests pass. tsc --noEmit clean.
eslint clean.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(telemetry): address Phase 4b review comments (QwenLM#4432)

Fixes 6 of 9 inline review comments from wenshao + Copilot. The remaining
3 are pushback (duration_ms semantic = design intent per D5; persistent
retry cap = explicitly deferred in PR description).

1. Fix JSDoc inaccuracy on `onRetry` contract (#1+#2): the comment
   incorrectly said "synchronous throws inside fn execute OUTSIDE the ALS
   frame." In fact fn() runs inside retryContext.run() so throws ARE inside
   the frame. What's outside the frame is the onRetry callback itself (it
   fires from the catch block). Rewritten per wenshao's suggestion: tells
   callers not to read retryContext.getStore() inside onRetry — all data
   comes via the RetryAttemptInfo parameter.

2. Add doc comment on content-retry delay inflation (#3): retryTotalDelayMs
   accumulator includes content-retry delays (shouldRetryOnContent path)
   which don't fire onRetry. This is intentional — the LLM span attribute
   reports total user-perceived backoff time — but was undocumented.

3. Add signal?.aborted guard before onRetry invocations (#6): if the abort
   signal fires between the catch and onRetry execution point, we now skip
   the callback to avoid phantom retry events that inflate the counter for
   retries that never actually proceeded. Applied to both persistent and
   normal retry paths.

4. Add persistent retry path test (status=429 + persistentMode) (#4): the
   highest-volume production retry path had zero Phase 4b test coverage.
   Now verifies onRetry fires with monotonic attempt counter and that
   persistent-mode exponential backoff produces increasing delayMs.

5. Add Retry-After header path test (status=429 + retry-after: 2) (#7):
   verifies that when the error carries a Retry-After header,
   onRetry.delayMs reflects the parsed header value (2000ms) instead of
   the exponential backoff calculation.

6. Add stream idle-timeout retry-attr propagation test (#8): verifies that
   the closure-captured retrySnapshot reaches the setTimeout-fired
   endLLMRequestSpan call with correct retry context values (attempt=4,
   requestSetupMs=3000, retryTotalDelayMs=2500).

All 186 affected tests pass (retry 68 + LCG 48 + session-tracing 70).
tsc --noEmit clean. eslint clean.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(telemetry): R3 review fixes — idle-timeout test guard + prompt_id in RUM (QwenLM#4432)

Addresses 2 of 5 R3 review comments from wenshao (2026-05-26):

1. loggingContentGenerator.test.ts:2290 — replace `if (timeoutRecord)` guard
   with `expect(timeoutRecord).toBeDefined()` so the idle-timeout retry-attr
   test fails loudly instead of passing with 0 assertions when setTimeout
   doesn't fire. Also rewrote the test to use fake timers from the START
   (so the 5-min idle timeout is created under fake clock and can be advanced
   via vi.advanceTimersByTimeAsync), fixing the underlying reason it wasn't
   firing.

2. qwen-logger.ts:963 — add `prompt_id: event.prompt_id` to
   logApiRetryEvent RUM properties. Without this, RUM dashboards cannot
   correlate api_retry events with specific prompts, unlike the analogous
   logApiErrorEvent which already includes prompt_id.

165 affected tests pass. Remaining 3 R3 items (#9 onRetry helper, #10
error-path test coverage, #11 caller integration assertions) deferred to
follow-up PR — non-blocking refactor/test-hardening.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Jun 12, 2026
…wenLM#4647)

* fix(clipboard): use platform-native tools for image paste on Linux

Replace @teddyzhu/clipboard native module with wl-paste/xclip on Linux
to fix image paste in WSL2+Wayland environments.

The native module uses X11 protocol and cannot read clipboard images
when the session uses Wayland (common in WSL2 with WSLg). This causes
clipboardHasImage() to return false even when the clipboard contains
an image.

Changes:
- Use wl-paste --list-types to detect images (Wayland)
- Use xclip -selection clipboard -t TARGETS -o to detect images (X11)
- Handle image/bmp format from Windows clipboard (WSL2 exposes BMP)
- Convert BMP to PNG using Python PIL when available
- Detect clipboard tool via WAYLAND_DISPLAY when XDG_SESSION_TYPE is unset
- Keep @teddyzhu/clipboard as fallback for macOS/Windows

Fixes QwenLM#3517
Fixes QwenLM#2885

* test: update clipboard tests for platform-native tools

The tests were mocking @teddyzhu/clipboard but the implementation now
uses platform-native tools (wl-paste/xclip) on Linux. Update mocks
to test the spawn-based implementation.

* fix: address critical review comments

1. Fix command injection in Python BMP-to-PNG conversion
   - Use sys.argv instead of string interpolation
   - Prevents path traversal via single-quote injection

2. Fix BMP fallback dead code
   - When PIL is not available, return BMP file path instead of
     deleting the only copy and returning false
   - Update saveClipboardImage to handle non-PNG return paths

* fix: address review suggestions for resource leaks and robustness

- #3: Add proper cleanup in saveFromCommand error paths (kill child, destroy stream)
- #4: Add 5s timeout for all spawned processes to prevent TUI hangs
- #7: Check exit code in checkClipboardForImage (code === 0)
- #8: Move fs.mkdir inside try/catch in saveClipboardImage
- #10: Merge checkWlPasteForImage/checkXclipForImage into checkClipboardForImage

* fix: address all remaining review comments

Source code fixes:
- #25: Add timeout to getWlPasteImageTypes (PROCESS_TIMEOUT_MS)
- #26: Add timeout to python3 spawn in BMP-to-PNG conversion
- #27: Wrap child.kill() in try-catch in timeout handlers
- #28: Replace dynamic import('node:fs/promises') with static statSync
- #30: Export resetLinuxClipboardTool() for testability
- Add try-catch around spawn in checkClipboardForImage
- Use stdio: ['ignore', 'ignore', 'ignore'] for python3 spawn

Test fixes:
- #24: Use vi.hoisted() for mock functions (avoids hoisting issue)
- #31: Stub process.platform = 'linux' in beforeEach
- Add default export to node:child_process mock
- Use EventEmitter-based mock child for async behavior
- All 7 tests passing

* perf: cache wl-paste --list-types result to avoid redundant calls

Avoid spawning wl-paste twice on the paste hot path:
1. clipboardHasImage calls wl-paste --list-types (check)
2. saveClipboardImage calls getWlPasteImageTypes (get types)

Now the result is cached after the first call and reused.
Cache is reset via resetLinuxClipboardTool() for testing.

* fix: address remaining review suggestions

- #1: Add child.stdout error handler in saveFromCommand
- #2: Add macOS/Windows test coverage for @teddyzhu/clipboard fallback
- #3: Fix .replace('.png', '.bmp') to use regex /\.png$/ to prevent path corruption

* fix: address critical cache invalidation and other review feedback

- #1 Critical: Reset cachedWlPasteImageTypes at start of clipboardHasImage
  to prevent stale data between paste operations
- #1 Critical: Check exit code in getWlPasteImageTypes close handler,
  do not cache failed results
- #2: Replace statSync with async fs.stat to avoid blocking event loop
- #3: Remove async from close handler, use promise chain instead
- #4: Return false instead of bmpPath when PIL conversion fails,
  as downstream expects .png files
- #5: Capture stderr from spawned processes for diagnostics

* fix: address remaining code review issues

- #1: Narrow detection to only report supported formats (png/bmp)
- #2: Do not cache results on timeout or error
- #3: Use line-level matching instead of includes('image/')
- #4: Replace execSync with execFileSync to avoid shell injection
- #5: Upgrade BMP→PNG failure log to warn level with install hint

* fix: restore getClipboardModule import caching (regression fix)

The original Qwen Code cached the @teddyzhu/clipboard module import via
getClipboardModule() with cachedClipboardModule and clipboardLoadAttempted.
Our refactoring removed this caching, causing the module to be re-imported
on every clipboardHasImage/saveClipboardImage call.

Restored the original caching mechanism for macOS/Windows fallback path.

* test: add saveClipboardImage success path and cache behavior tests

- Add test for successful PNG save path
- Add test for cache invalidation between clipboardHasImage calls
- All 11 tests passing

* fix: revert execSync to fix WSL2 clipboard detection

execFileSync('command', ['-v', 'wl-paste']) fails because 'command'
is a shell built-in, not an executable. execSync runs through a shell
so it can find 'command'. Reverted to execSync to restore clipboard
tool detection on WSL2.

Also fixed TypeScript errors in tests by using (child as any) for
mock event emitter properties.

* fix: address critical file leak and filter issues from review

- #1: Clean up bmpPath in catch block when PIL conversion fails
- #2: Narrow getWlPasteImageTypes filter to only image/png and image/bmp
- #3: Clean up empty PNG file when size guard fails
- #3b: Fix typo python3-pyl → python3-pil

* test: add xclip, BMP, error path test coverage; fix weak assertion

- Add xclip/X11 path tests (detection, no image, not found)
- Add BMP-to-PNG conversion tests (PIL failure, prefer PNG over BMP)
- Add saveFromCommand error path tests (timeout, spawn error, stdout error)
- Replace tautological 'successful PNG save' assertion with proper null-on-error tests
- Fix ESLint: add no-explicit-any suppressions, prefix unused setupWaylandEnv

Note: xclip save success path requires createWriteStream mock that vitest
cannot fully support with ...actual spread. Detection and error paths verified.

19 tests passing.

* fix: remove unused _setupWaylandEnv function that breaks TS build

Fixes TS6133 error caused by noUnusedLocals: true in tsconfig.json.
The function was generated by test agent but never called.

* fix: clean up tempFilePath on PIL conversion failure

When python3 PIL conversion fails mid-write, tempFilePath (the target
.png) may have been partially written. Add fs.unlink(tempFilePath) in
the catch block to prevent partial file leakage.

Suggested by wenshao in PR review.

* fix: address review feedback on file leaks and test coverage

- Add tempFilePath cleanup when python3 PIL conversion fails mid-write
- Restore image/bmp detection with clarifying comment (WSL2 Wayland)
- Fix stat mock syntax (remove debug console.log, simplify)
- Fix originalPlatform scope (was undefined in afterEach)

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>

19 tests passing, tsc + eslint clean.

* ci: retrigger tests

* fix: address review feedback on test coverage and defensive guard

- Replace tautological saveClipboardImage assertion with meaningful
  spawn-argument verification
- Wrap clipboardHasImage Linux branch in try/catch guard (preserve
  'never throw, return false' contract)
- Fix node:fs/promises mock to use importOriginal for indirect deps
- Add readFile/writeFile/appendFile/access/copyFile/rename/rm/rmdir
  to mock (required by indirect deps like chatCompressionService)
- Remove node:fs root mock to avoid cross-test pollution

19 tests passing, tsc + eslint clean.

* fix: address review feedback on test coverage and defensive guard

- Replace tautological saveClipboardImage assertion with spawn-arg
  verification (prefer PNG over BMP test)
- Wrap clipboardHasImage Linux branch in try/catch guard
- Fix node:fs/promises mock to use importOriginal for indirect deps
- Add missing fs/promises methods (readFile etc.) required by deps
- Remove node:fs root mock entirely to avoid cross-test pollution
- Document xclip/BMP save success path: blocked by vitest built-in
  module mock limitation

19 tests passing, tsc + eslint clean.

* fix: secure clipboard temp filename with random UUID suffix

Add random UUID to temp filename to prevent predictable path
symlink attacks (Critical review feedback). The UUID makes the
path unguessable, eliminating the symlink attack vector.

19 tests passing, tsc + eslint clean.

* fix: add O_EXCL protection against symlink attacks in saveFromCommand

Use fs.open with O_EXCL flag (O_WRONLY|O_CREAT|O_EXCL) to atomically
create the file, refusing to follow symlinks. Combined with the random
UUID filename from the previous commit, this fully addresses the
symlink attack vector identified in review.

Also update 'prefer PNG over BMP' test: with O_EXCL, the save path
fails when mkdir is mocked (directory doesn't exist), so the test
now verifies format detection only rather than the full save pipeline.

19 tests passing, tsc + eslint clean.

* fix: capture python3 stderr for BMP conversion errors

Use stdio 'pipe' for stderr instead of 'ignore' so users see useful
diagnostic messages (e.g. ModuleNotFoundError: No module named PIL)
when python3 BMP-to-PNG conversion fails.

19 tests passing, tsc + eslint clean.
TaimoorSiddiquiOfficial pushed a commit that referenced this pull request Jun 12, 2026
* perf(core): F2 cleanup PR A — R9/W11/W12/R10 (post-merge follow-ups) (#4411)

* refactor(core): F2 PR A R9 — McpClientManager options-object ctor

R9 (filed as F2 follow-up from #4336 review): 7 positional ctor args
collapse to (config, toolRegistry, options?: McpClientManagerOptions).
The trailing 5 (eventEmitter, sendSdkMcpMessage, healthConfig,
budgetConfig, pool) become named fields on `McpClientManagerOptions`.
Test factory `mkManager(overrides?)` introduced at the top of
`mcp-client-manager.test.ts` so each of the prior 80 inline
constructions becomes a single line naming only the field(s) the test
overrides; the 4 `undefined` sentinels each test threaded through to
reach the trailing `pool` arg are gone.

Net: 113 LOC removed (test) + 35 LOC added (src exposes interface +
mkManager factory + tool-registry call site update). Behavior
unchanged — same field assignments, same downgrade-enforce-without-
budget breadcrumb, same budget event wiring.

Filed bucket: F2 perf / cleanup PR A (R9 + W11 + W12 + R10/R23 T7),
see issue #4175 item 7 "F2 post-merge cleanup PRs". This is the first
of the 4 fixes in PR A; W11/W12/R10 follow as separate commits.

Test sweep: 84/84 mcp-client-manager.test.ts pass; typecheck clean.

* refactor(core): F2 PR A W11 — extract attachPooledSession + rollbackReservationOnSpawnFailure

W11 (filed as F2 follow-up from #4336 review): two private helpers
on `McpTransportPool` to eliminate inline duplication in `acquire()`:

  - `attachPooledSession(entry, id, serverName, cfg, sessionId,
    toolReg, promptReg)`: builds `SessionMcpView` + `entry.attach`
    with the standard pool release callback. Used by both the
    fast-path attach (existing entry) and the post-spawn attach
    (after `await inFlight`). NOT used by `createUnpooledConnection`
    — its release callback runs `entry.forceShutdown('manual')` +
    `indexDetach` directly (no pool refcount accounting since
    unpooled entries are per-session).

  - `rollbackReservationOnSpawnFailure(reservationResult, serverName)`:
    R24 T17 contract — only release the budget slot if THIS acquire
    actually reserved a new slot (`'reserved'`); `'already_held'`
    skips because the sibling owns it. Used by both the unpooled
    catch and the pooled spawn-in-flight catch.

Race-window invariants (W10 / W77 / W90 / W111 / W125 / R24 T17)
stay at the call sites because they describe the SURROUNDING
ordering, not the helpers themselves. Helpers are documented to
defer those decisions back to callers.

Behavior unchanged. Filed bucket: F2 perf cleanup PR A (R9 done /
W11 this commit / W12 + R10 to follow).

Test sweep: 28/28 mcp-transport-pool.test.ts pass; typecheck clean.

* refactor(core): F2 PR A W12 — SessionMcpView precompute filter Sets

W12 (filed as F2 follow-up from #4336 review): `applyTools` /
`applyPrompts` precompute `excludeSet` + `includeSet` once per pass
instead of scanning `cfg.includeTools` / `cfg.excludeTools` arrays
inside every per-tool iteration.

Pre-fix the per-tool predicate (`passesSessionFilter`) walked both
arrays for every snapshot entry → O(M × N) per `applyTools` call.
With M tools × N filter entries, typical M=5-20 / N=2-5 case
finishes in microseconds either way; the win is data-structure
correctness and code clarity, not perceived perf.

`passesSessionFilter` / `passesSessionPromptFilter` (the array-
based predicates) stay exported and unchanged for unit tests + any
caller wanting to test a single name without paying Set construction.
The bulk path uses two new private helpers `compileNameFilter` +
`compiledFilterAccepts` whose Sets live on the `applyTools` /
`applyPrompts` stack frame.

Same semantics: `excludeTools` is direct-equality match (no parens
strip — pre-F2 behavior preserved); `includeTools` strips the first
`(...)` suffix so `toolName(args)` matches `toolName`.

Filed bucket: F2 perf cleanup PR A (R9 + W11 done / W12 this commit
/ R10 to follow).

Test sweep: 13/13 session-mcp-view.test.ts pass; typecheck clean.

* perf(core): F2 PR A R10 / R23 T7 — pid-descendants ps snapshot + pgrep fallback

R10 / R23 T7 (filed as F2 follow-up from #4336 review): the Linux
/ macOS pid-descendant enumeration moves from per-pid `pgrep -P
<pid>` BFS (one subprocess fork per node visited) to a single
`ps -A -o pid=,ppid=` snapshot followed by an in-memory tree walk
over `Map<ppid, pid[]>`. Windows analog: single `Get-CimInstance
Win32_Process | ConvertTo-Csv` snapshot of all `(ProcessId,
ParentProcessId)` rows replaces per-pid
`Get-CimInstance -Filter "ParentProcessId=$p"` BFS.

Two motivations:
  1. **Fork count**: typical `npx → tool` / `uvx → tool` wrapper
     trees are 2-3 levels deep with B=1-3 children per node →
     pre-fix BFS forked ~5-10 subprocesses per pool-shutdown call.
     Post-fix: exactly 1 fork regardless of tree depth.
  2. **Snapshot consistency**: pre-fix BFS walked the table level
     by level; a child that forked between two adjacent BFS levels
     could be missed (we'd see the child but query its
     descendants AFTER the new fork). The snapshot path captures
     the table at one instant; new descendants forked after the
     snapshot are tolerated by the existing ESRCH-tolerant
     SIGTERM loop.

Caveats:
  - `ps -A -o pid=,ppid=` is POSIX standard (macOS / Linux /
    *BSD), but BusyBox `ps` <v1.28 (2018) doesn't support `-o`.
    Distroless containers may not have `ps` at all. To preserve
    behavior on those edge platforms, the legacy per-pid `pgrep`
    BFS is retained as a fallback (`listDescendantPidsUnixPgrepFallback`).
    Same retention on Windows for the per-pid filter path.
  - Snapshot path uses `maxBuffer: 8MB` to cover ~250k-process
    pathological hosts. Default 1MB would clip at ~30k processes.
  - `MAX_DESCENDANTS = 256` / `MAX_DEPTH = 8` caps preserved on
    both snapshot + fallback paths.
  - Snapshot scans the entire host process table (not just the
    target subtree). On the typical 200-500 process developer
    machine this parses in <10ms; the win over BFS is real but
    not order-of-magnitude — ~2x improvement, not 100x. PR A's
    motivation framing is "fork hygiene + consistency", not raw
    perf.

Empty-result detection: snapshot path tracks `parsedRows`. If the
ps/CIM tool runs successfully but produces 0 parseable rows
(BusyBox without `-o` echoing usage, AppLocker truncating CIM
output, etc.), we throw — the outer catch falls back to the
per-pid path. A genuine "root has no children" case parses many
rows and just returns empty from the walk. So the
"no-children-found" semantics are preserved across both paths.

Test gate update: pre-fix `integration: spawn-and-enumerate` test
skipped on `CI === '1'` because pgrep wasn't available on
minimal CI runners. Post-fix `ps -A` is universally available on
non-distroless Linux/macOS — only the Windows skip remains.
6/6 pid-descendants tests pass including the now-active
integration spawn test.

Design doc (`docs/design/f2-mcp-transport-pool.md` §6.4 + the F2
follow-up table at lines 82-85) updated to reflect the snapshot
+ fallback shape, and to mark W11 / W12 / R9 / R10 as ✅ Done in
PR A with the per-fix commit refs.

This commit completes F2 cleanup PR A. Filed bucket order:
R9 (commit 0cb1eaa27) → W11 (commit 2d546efca) → W12 (commit
a4a855ab3) → R10 (this commit). Issue #4175 item 7 "F2 post-
merge cleanup PRs": PR A done; PR B (W93 + W133-a + W134) and
PR C (W133-c SDK breaking) to follow as separate clusters.

Test sweep: 287/287 F2 + cli pass; ESLint clean; typecheck clean
(core + cli). Integration test on macOS local runs the new
snapshot path successfully.

* refactor(core): F2 PR A R2 — wenshao followup (visited set + dedup predicate)

Two Suggestions from wenshao's first PR #4411 review pass (07:15Z),
both small and worth folding before merge:

PR-A-R2 #1 (pid-descendants.ts:309 — walkDescendants visited set):
  `walkDescendants`'s BFS lacked a `visited` set. If the snapshot
  captures a PID-reuse cycle — rare but possible on busy hosts with
  rapid pid churn between `ps -A`'s start and parse, where Linux
  wraparound can show a freed pid in a different parent's children
  list creating an A→B / B→A cycle — pre-fix BFS would revisit nodes
  and fill the MAX_DESCENDANTS=256 quota with duplicate entries,
  starving legitimate descendants. Pre-PR-A the per-pid `pgrep` BFS
  had the same theoretical issue but was less exposed (each
  `pgrep -P pid` call returns only DIRECT children; snapshot captures
  the whole tree at once, making cycles instantly visible).

  Fix: 3-LOC `Set<number>` add. `root` seeded into `visited` so a
  malformed snapshot listing root as a descendant of its own child
  doesn't re-enqueue root either.

PR-A-R2 #2 (session-mcp-view.ts:117 — predicate dedup):
  After W12, the exported `passesSessionFilter` /
  `passesSessionPromptFilter` still called `passesNameFilter` (the
  pre-W12 array-based implementation), while `applyTools` /
  `applyPrompts` used `compiledFilterAccepts(compileNameFilter(...))`.
  Two parallel implementations of the same predicate — future change
  to one without the other would silently diverge:
    - the exported function's tests (passesSessionFilter unit tests)
      would still pass
    - the production filter path in applyTools/applyPrompts would
      behave differently

  Reviewer also noted `passesSessionPromptFilter` had zero callers
  in production code or tests after W12 — `applyPrompts` no longer
  references it. Kept the export rather than deleting it (matches
  the `passesSessionFilter` shape for symmetry + the F3 audit-path
  comment block earmarks both as the replay predicates), but routed
  both through `compiledFilterAccepts(compileNameFilter(...))` so
  there is a single source of truth. Set construction is per-call
  for these exports (negligible for unit-test / one-off probes);
  the bulk paths in `applyTools` / `applyPrompts` still construct
  ONE filter per pass via the original W12 code path.

`passesNameFilter` (the standalone array-based helper) deleted —
its only callers were the two exports, which now use the compiled
path. Public-API surface unchanged: the two exported functions
keep their signatures and semantics.

Test sweep: 19/19 pid-descendants + session-mcp-view tests pass;
typecheck + ESLint clean.

Continues commit chain: f05917071 (R9) → 20d2f1b90 (W11) →
6cf18f641 (W12) → 2a41c6fae (R10) → this (R2 followups).

* fix(core): F2 PR A R3 T3 — Windows CSV delimiter locale fix

`ConvertTo-Csv -NoTypeInformation` honors the system locale's list
separator on PowerShell 5.1. On German / French / Dutch / Italian /
... locales the separator is `;` not `,`, so the regex
`^"(\d+)","(\d+)"$` in `snapshotProcessTreeWin` never matched →
`parsedRows === 0` → snapshot threw → fell back to the per-pid CIM
filter path with ~0.5-1s extra PowerShell startup latency per
descendant on every pool shutdown.

Fix: 1-LOC `-Delimiter ","` on `ConvertTo-Csv`. Forces comma
regardless of locale or PowerShell version. PowerShell 7+ defaults
to comma already; 5.1 (the Windows-bundled version most users have
without explicit upgrade) honored locale. The explicit delimiter
makes both consistent.

Skipped wenshao's companion Suggestion T4 (test coverage for
walkDescendants MAX_DESCENDANTS / MAX_DEPTH caps) as F2 hardening
follow-up — the caps are simple 2-line guards exercisable by
inspection; ~50 LOC of mock infrastructure isn't commensurate
with the regression risk on currently-stable defensive code,
and (per the issue #4175 follow-up bucket) we keep dedicated
test-coverage work out of perf-cleanup PRs.

Continues commit chain: f05917071 (R9) → 20d2f1b90 (W11) →
6cf18f641 (W12) → 2a41c6fae (R10) → ced5d62b0 (R2) → this (R3 T3).

Test sweep: 6/6 pid-descendants tests pass; typecheck + ESLint clean.

* refactor(acp-bridge): F1 test split — lift bridge.test.ts (6861 LOC) to acp-bridge (#4445)

* refactor(acp-bridge): rename httpAcpBridge.test.ts -> bridge.test.ts (git mv)

Pure file rename; zero content change. Follow-up commits will:
- extract FakeAgent + makeChannel + makeBridge into testUtils.ts
- split 4 daemon-host integration tests back to cli/daemonStatusProvider.test.ts

Part of #4175 F1 test split (deferred from #4334).

* refactor(acp-bridge): extract testUtils + split daemon-host tests to cli (#4175 F1)

Net mechanical extraction following commit 2aff1a4d1 (pure git mv of
httpAcpBridge.test.ts -> bridge.test.ts). After this commit
`@qwen-code/acp-bridge` owns the bulk of the lifted bridge test
suite, and cli keeps only the 4 daemon-host integration tests that
need to wire `createDaemonStatusProvider()`.

Changes:

1. New `packages/acp-bridge/src/internal/testUtils.ts` (~280 LOC):
   FakeAgent, FakeAgentOpts, ChannelHandle, makeChannel, makeBridge
   (no statusProvider default — acp-bridge tests exercise the
   no-provider fallback path), WS_A/WS_B/SESS_A constants. Marked
   @internal; lives under `internal/` matching the existing
   `stderrLine.ts` package-private convention. Exposed via new
   `./internal/testUtils` subpath in package.json exports.

2. `packages/acp-bridge/src/bridge.test.ts` shrinks from 6861 ->
   ~6400 LOC: fixtures replaced with named imports from
   `./internal/testUtils.js`; cross-package import
   `from './daemonStatusProvider.js'` removed (4 daemon-host tests
   moved out); ACP SDK + bridgeErrors / workspacePaths / bridge /
   channel / bridgeTypes imports split into multiple statements
   reflecting actual post-F1 provenance.

3. New `packages/cli/src/serve/daemonStatusProvider.test.ts`
   (~240 LOC, 4 tests): wires real `createDaemonStatusProvider()`
   through a cli-side `makeBridge` wrapper to assert end-to-end
   daemon env / preflight cells. Imports
   `createHttpAcpBridge` via the `./httpAcpBridge.js` re-export
   shim — doubles as a shim surface smoke check.

Verification:
- acp-bridge: 291/291 tests pass (177 in bridge.test.ts).
- cli: daemonStatusProvider.test.ts 4/4 pass; full cli suite 6742/6767
  green (16 pre-existing failures in AuthDialog / memoryDiagnostics /
  useAtCompletion — all on `daemon_mode_b_main` baseline, last
  modified by commits predating this branch).
- Tests counts pre-split: 181 in httpAcpBridge.test.ts;
  post-split: 177 in bridge.test.ts + 4 in daemonStatusProvider.test.ts
  = 181 (parity preserved).

Part of #4175 F1 test split (deferred from #4334).

* refactor(acp-bridge): self-review round 1 — vitest alias + doc/comment polish

Five code-reviewer findings folded in on top of e97282f30:

S1 [Suggestion] — Test-utils ships to npm + cli reads stale dist.
  Added `packages/cli/vitest.config.ts:resolve.alias` mapping
  `@qwen-code/acp-bridge/internal/testUtils` → the .ts source. The
  package subpath export is RETAINED (required for TypeScript
  `nodenext` to resolve types — it won't fall back to tsconfig
  paths once exports rejects a subpath). Dual-channel approach
  documented in the testUtils JSDoc, including the alpha-stage 0.0.1
  tradeoff that the file still ships in dist (stripInternal /
  .npmignore deferred).

S2 [Suggestion] — Stale wording "two tests" in narrative comment.
  bridge.test.ts split-marker now correctly says "4 fallback tests"
  (no-provider × 2 surfaces + throwing-provider × 2 surfaces).

S3 [Suggestion] — "Shim smoke check" only half-applied.
  daemonStatusProvider.test.ts now routes `BridgeOptions` and
  `HttpAcpBridge` types through `./httpAcpBridge.js` shim too
  (alongside `createHttpAcpBridge`), so the entire factory surface
  the cli tests rely on flows through the F1 re-export shim.

N1 [Nit] — Asymmetric split-marker phrasing.
  Both markers now describe the 4 moved tests by surface
  (env real / preflight idle / preflight merged-live /
  preflight extMethod-throws) rather than "1 of" + "3 more".

N2 [Nit] — testUtils "the suite" ambiguity.
  makeChannel JSDoc now references `bridge.test.ts` explicitly
  instead of "the suite" (which was unambiguous pre-split when
  helpers + 10 createInMemoryChannel sites lived in the same file).

Verification: 291/291 acp-bridge tests pass; 4/4 cli daemon
integration tests pass; tsc clean on both packages (pre-existing
server.ts errors on baseline unchanged); eslint --max-warnings 0
clean on all 4 touched files.

* docs(cli): self-review round 2 — fix stale vitest.config.ts alias comment

Round 2 reviewer caught a 3-way contradiction in the round 1 docs:
- vitest.config.ts said: alias replaces the export, internal/* stays
  unpublished (matches stderrLine convention).
- package.json: subpath export IS declared.
- testUtils.ts JSDoc: both channels intentionally retained,
  testUtils ships in dist.

Round 1 explicitly chose to retain the export because TS `nodenext`
won't fall back to tsconfig `paths` once `exports` rejects a
subpath; the alias only serves to short-circuit *runtime* resolution
so cli reads src/ not dist/. Rewriting the vitest.config.ts comment
to reflect that dual-channel reality (and pointing readers at
testUtils.ts for the full rationale).

* fix(acp-bridge): #4445 round 3 fold-in — 4 of 7 reviewer threads adopted

PR #4445 review pass — 4 adopt + 3 decline (declines replied
inline; not folded here):

ADOPTED:

T1 [copilot daemonStatusProvider.test.ts:136 — bridge.shutdown
   missing]: added `await bridge.shutdown()` to test 2 (preflight
   idle). Three of four tests already shut down; symmetry +
   future-proof if `createHttpAcpBridge` gains background work
   even when no channel was spawned.

T5 [wenshao testUtils.ts:92 — makeBridge naming collision]: cli-
   side helper renamed `makeBridge` -> `makeBridgeWithDaemonStatusProvider`
   (4 call sites in daemonStatusProvider.test.ts), JSDoc updated to
   reference the wenshao thread. testUtils.makeBridge stays as the
   canonical name used by ~100 tests in bridge.test.ts. A future
   contributor can no longer pick the wrong helper by accident.

T6 [wenshao testUtils.ts:32 — JSDoc mis-claims @internal tag matches
   stderrLine.ts convention]: fixed wording. stderrLine.ts uses prose
   only; @internal is an additional package-private signal, not a
   convention match. Also restructured the npm-leak paragraph to
   describe the new .npmignore-via-files-negation enforcement (T7).

T7 [wenshao package.json:70 — testUtils ships to npm]: switched
   `files: ["dist"]` -> `files: ["dist", "!dist/internal/testUtils.*",
   "!dist/**/*.test.*"]`. Wenshao's suggested `"test"` exports
   condition wasn't viable: vitest sets `vitest` not `test`, and
   gating on `vitest` would hide types from the cli's tsc compile.
   The negation-pattern files-field excludes the built testUtils
   from the publish surface while keeping the subpath export entry
   that TypeScript `nodenext` needs to resolve types. Verified via
   `npm pack --dry-run`: dist/internal/stderrLine.* still ships
   (production internal helper); dist/internal/testUtils.* +
   dist/**/*.test.* are excluded.

DECLINED (replied on PR threads, not folded here):

T2/T3 [copilot — `handles` array unused in tests 3/4]: bookkeeping
   matches the pre-split bridge.test.ts verbatim; cleanup is scope
   creep on this rename PR.

T4 [copilot — testUtils eager-imports createHttpAcpBridge,
   cross-copy identity risk]: cli daemonStatusProvider.test.ts uses
   its OWN local `makeBridgeWithDaemonStatusProvider` and never
   imports testUtils.makeBridge — the cross-copy concern isn't
   triggered. Premature abstraction on a test-only fixture.

Verification: 291/291 acp-bridge tests pass; 4/4 cli daemon tests
pass; tsc clean both packages; eslint --max-warnings 0 clean on
2 touched .ts files; `npm pack --dry-run` confirms publish-surface
exclusions.

* fix(core): F2 cleanup PR B — self-heal observability (W133-a + W134) (#4460)

* fix(core): F2 cleanup PR B — self-heal observability (W133-a + W134)

W93 declined as already satisfied by W1 fix in #4336 commit 6
(spawnEntry's catch already calls forceShutdown which runs the full
cleanup table — listener removal, timer clear, subscriber detach,
sweep+disconnect, onClosed eviction). Source-verified non-repro.

W133-a: McpClient.onerror now captures the error in a private
`lastTransportError` field (reset at each connect()); the W120
silent-drop block at mcp-pool-entry.ts:346 reads it via the new
`getLastTransportError()` getter and appends `: <error.message>` to
the lastError string on the emitted 'failed' event. Preserves the
literal "silent transport drop" prefix invariant for log-grep
backward compat — pre-fix marker stays a substring.

W134: sweepAndDisconnect now returns SweepResult instead of void —
{ pidSweepError?, disconnectError?, descendantsFound?,
descendantsSignaled? }. The silent-drop fire-and-forget caller chains
to inspect the result and emits a structured warn log when either
pid-sweep threw OR sigtermPids partially signaled (signaled < found)
— surfaces orphan-process pressure without inflating PR scope (no
new SSE event or SDK reducer state; deferred to W134-followup if
maintainers want metrics).

forceShutdown / doRestart sweep callers ignore the return value (JS
implicit-void at await sites preserves behavior).

4 new tests in mcp-transport-pool.test.ts covering W133-a happy path
+ fallback (no prior onerror) + W134 pidSweepError + W134
partial-signal failure modes. Module-mocks pid-descendants.js for
controllable sweep behavior, and debugLogger.js to observe warn
calls (production logger is session-gated and a no-op in tests).
Singleton-stub debugLogger mock so production module-load
`createDebugLogger('McpPool:Entry')` and the test's retrieval get
the same vi.fn instances.

Verification:
- tsc clean: packages/core, packages/cli (server.ts pre-existing
  errors unchanged)
- F2 transport-pool: 32/32 pass (28 pre-existing + 4 new)
- mcp-client: 46/46 pass
- eslint --max-warnings 0 clean on 3 touched files

Part of #4175 #4336 follow-up bucket.

* fix(core): #4460 round 1 fold-in — 4 copilot doc/comment threads adopted

T1 [copilot mcp-pool-entry.ts:116 — stale line ref in SweepResult JSDoc]:
  replaced `mcp-pool-entry.ts:383` with stable method-anchor reference
  to the W120 silent-drop block inside `statusChangeListener`. Line
  numbers drift on every edit; method names don't.

T2 [copilot mcp-pool-entry.ts:453 — `?? 0` ambiguous in warn payload]:
  silent-drop warn log now prints `descendantsFound=unknown` and
  `descendantsSignaled=unknown` when the values are undefined (only
  reachable in the pidSweepError branch — sweep threw before
  assignment). Operators triaging the warn can now distinguish
  "sweep succeeded but found 0 descendants" from "sweep itself
  threw, count is genuinely unmeasured". Locked in via a new
  assertion in the W134 pidSweepError test.

T3 [copilot mcp-client.ts:116 — brittle line refs in lastTransportError
  JSDoc]: replaced `mcp-pool-entry.ts:346` and `mcp-client.ts:130`
  with stable method/block names (the `statusChangeListener` silent-
  drop block; the `client.onerror` arrow inside connect()). Same
  fix applied to the parallel comment in mcp-transport-pool.test.ts:730
  for consistency.

T4 [copilot mcp-transport-pool.test.ts:797 — singleton-stub mock comment
  contradictory]: rewrote the comment to unambiguously describe what
  the mock DOES (factory body runs once; inner arrow returns the same
  object on every call) instead of the prior hypothetical phrasing
  ("Returning a fresh object would have...") which read as a
  description of current behavior at first glance.

All 4 are doc/comment fixes — zero behavior change apart from the
T2 string format ('unknown' instead of '0'). Verified:
- 32/32 mcp-transport-pool.test.ts pass
- tsc clean on packages/core
- eslint --max-warnings 0 clean on 3 touched files

* fix(core): #4460 round 2 fold-in — remove dead SweepResult.disconnectError field

T5 [wenshao mcp-pool-entry.ts:134 — `disconnectError` is dead data]:
  glm-5.1 review caught that the field was populated when
  `client.disconnect()` threw (line 844) but no consumer ever read
  it — the silent-drop `.then()` handler gated only on
  `pidSweepError` and partial-signal; `forceShutdown` and `doRestart`
  ignore the return; no test asserted on it.

Removed the field from `SweepResult` and the assignment in the
disconnect catch. The pre-existing `debugLogger.error(`client.disconnect
failed for ...`)` inside `sweepAndDisconnect` already gives operators
the signal — adding it to the outer silent-drop warn would have been
duplicate noise. If a future consumer needs to gate logic on disconnect
failures, re-add the field + reader at that point.

Verification: 32/32 mcp-transport-pool.test.ts pass; tsc + eslint
clean on the touched file.

* feat(sdk/daemon-ui): unified completeness follow-up to #4328 (#4353)

* feat(sdk/daemon-ui): expand event coverage to 28+ daemon event types (PR-A)

Closes the "12+ daemon events fall through to debug" gap surfaced in the PR
the daemon currently emits (Stage 1 + Wave 3-4), so renderers stop having
to peek at `rawEvent.data` for known event categories.

Session-meta:
- session.metadata.changed (from session_metadata_updated)
- session.approval_mode.changed (from approval_mode_changed)
- session.available_commands (from available_commands_update; upgraded
  from a status-text fallback to a typed event carrying the command list)

Workspace state (Wave 3-4):
- workspace.memory.changed
- workspace.agent.changed
- workspace.tool.toggled
- workspace.initialized
- workspace.mcp.budget_warning
- workspace.mcp.child_refused
- workspace.mcp.server_restarted
- workspace.mcp.server_restart_refused

Auth device-flow (Wave 4 OAuth, RFC 8628):
- auth.device_flow.started
- auth.device_flow.throttled
- auth.device_flow.authorized
- auth.device_flow.failed (carries DaemonAuthDeviceFlowSdkErrorKind)
- auth.device_flow.cancelled

- `DaemonUiErrorEvent.errorKind?: DaemonErrorKind` — closed-enum error
  category propagated from daemon's typed-error taxonomy. Renderers can
  branch on errorKind for "retry auth" vs "check file path" affordances
  instead of regex-matching `text`.
- `DaemonUiToolUpdateEvent.provenance?: DaemonUiToolProvenance` +
  `.serverId?` — closed enum ('builtin' | 'mcp' | 'subagent' | 'unknown').
  Falls back to the `mcp__<server>__<tool>` naming heuristic when the
  daemon doesn't stamp provenance explicitly. Unblocks UI namespace
  dispatch without string-matching toolName.

Session-meta / workspace / auth events do NOT push transcript blocks.
They are intentional sidechannel observations: `lastEventId` advances
(monotonic invariant preserved), but the chat-stream transcript stays
focused on user/assistant/tool/shell/permission content. Renderers
consume them via selectors (introduced in follow-up PRs).

All new event types produce short structured lines in
`daemonUiEventToTerminalText` for tail-style debug consumers. Web/IDE
renderers should consume the typed events directly via subscription.

40/40 tests pass. New tests verify:
- All 16 new event types normalize correctly
- Malformed payloads fall back to debug without leaking raw data
  (`secret` field never appears in fallback text)
- MCP tool provenance heuristic (`mcp__github__create_issue` →
  provenance='mcp', serverId='github')
- errorKind propagation on session_died / stream_error
- Reducer is no-op on new event types; lastEventId still advances

This is PR-A of the unified-renderer-layer follow-up series:
- PR-A (this commit) — event coverage + closed-enum schema
- PR-B — server-side timestamps + ordering refactor
- PR-C — multimodal content + tool preview taxonomy
- PR-D — render contract (toMarkdown / toHtml / toPlainText) + adapter
  conformance test framework
- PR-E — reducer state machine (subagent / progress / current tool /
  cancellation propagation)

See https://github.com/QwenLM/qwen-code/pull/4328#issuecomment-4494179724
for the full proposal.

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(sdk/daemon-ui): server timestamps + event-id-based ordering (PR-B)

Closes the "时间定义不标准" gap surfaced in the PR #4328 review:
- Client-side `Date.now()` drifts across clients
- No daemon-authoritative timestamp propagated to UI
- Out-of-order replay events get fresher `state.now` than originals,
  breaking `createdAt` ordering

- `DaemonUiEventBase.serverTimestamp?: number` — daemon-authoritative
  wall-clock timestamp extracted from envelope.
- `DaemonTranscriptBlockBase.serverTimestamp?: number` + `clientReceivedAt: number`.
- `createdAt` preserved as `@deprecated` alias for `clientReceivedAt`
  (backward compat for code written before this PR).

`extractServerTimestamp` looks at three candidate envelope locations:

1. `event.serverTimestamp` (preferred when daemon adds it)
2. `event._meta.serverTimestamp` (Anthropic-style metadata convention)
3. `event.data._meta.serverTimestamp` (sessionUpdate nested location)

The SDK is ready to consume serverTimestamp WHEN daemon emits it, without
requiring a coordinated SDK release. Undefined when daemon doesn't emit
(current state) — graceful degradation to client-clock ordering.

`selectTranscriptBlocksOrderedByEventId(state)` — returns blocks sorted by:

1. `eventId` (daemon-monotonic SSE cursor) — primary key
2. `serverTimestamp` (daemon wall clock) — fallback for synthetic frames
3. `clientReceivedAt` (local clock) — last resort

Use this when displaying long sessions where event id 5 may arrive AFTER
event id 7 (typical in SSE replay-after-reconnect).

`formatBlockTimestamp(block, opts)` — formats the most authoritative
timestamp on a block using `Intl.DateTimeFormat`. Prefers
`serverTimestamp` over `clientReceivedAt` for cross-client consistency.
Accepts locale / timeZone / dateStyle / timeStyle.

Daemon needs to stamp `_meta.serverTimestamp` on every SSE envelope. This
SDK PR is ready to consume it the moment the daemon ships the field; no
coordination needed.

- serverTimestamp extraction from all three envelope locations
- Defaults undefined when envelope has none
- `selectTranscriptBlocksOrderedByEventId` sorts mixed-arrival events by
  eventId (replay scenario)
- `formatBlockTimestamp` prefers serverTimestamp; returns localized string

PR-B of the unified follow-up to PR #4328 (PR-A + PR-B + PR-C + PR-D +
PR-E in one branch).

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(sdk/daemon-ui): reducer state machine — currentTool / approvalMode / cancellation propagation (PR-E)

Closes the "reducer state machine 设计缺漏" gap surfaced in the PR #4328 review:
- No `currentTool` — UI scans `blocks[]` to find the running tool
- No mirrored approval mode — UI walks events to badge "plan"/"yolo"
- Cancellation does not propagate — in-flight tool blocks stuck at
  'in_progress' forever when the parent prompt is cancelled

## State additions (sidechannel, no transcript blocks)

`DaemonTranscriptSidechannelState`:
- `currentToolCallId?: string` — toolCallId of the in-flight tool
- `approvalMode?: string` — mirrored from session.approval_mode.changed
- `toolProgress: Record<string, { ratio?, step? }>` — per-tool progress
  shape (daemon-side emission of `tool.progress` events pending)

## Reducer behavior

### `tool.update` events

`IN_FLIGHT_TOOL_STATUSES` = { pending, confirming, running, in_progress }
`TERMINAL_TOOL_STATUSES` = { completed, success, failed, error, canceled, cancelled }

- Tool enters in-flight: set `currentToolCallId = event.toolCallId`
- Tool enters terminal: clear `currentToolCallId` if it matches
- Unknown status (forward-compat): leave pointer untouched

This avoids the failure mode where a future daemon-emitted status like
`'paused'` would silently mark unknown states as either in-flight or
terminal incorrectly.

### `session.approval_mode.changed`

Mirror `event.next` onto `state.approvalMode`. Renderers can render a
mode badge ("plan" / "default" / "auto-edit" / "yolo") with a single
selector call, no event-stream walking.

### `assistant.done` with `reason === 'cancelled'`

`propagateCancellationToInFlightTools` walks every tool block whose
status is still in-flight and force-sets it to 'cancelled'. The daemon
does not guarantee terminal `tool_call_update` for every in-flight tool
when the parent prompt is cancelled, so this propagation prevents UI
spinners from spinning forever.

`currentToolCallId` is also cleared in the same call.

Non-cancellation `assistant.done` (e.g., `reason: 'end_turn'`) does NOT
propagate — in-flight tools remain in-flight until the daemon emits
their terminal update naturally.

## Selectors

- `selectCurrentTool(state)` — returns the running tool block, or undefined
- `selectApprovalMode(state)` — returns the mirrored approval mode
- `selectToolProgress(state, toolCallId)` — per-tool progress query

All exported from `@qwen-code/sdk/daemon`.

## Scope deliberately deferred

Subagent nesting (`parentBlockId` / `delegationId` / `DaemonSubagentTranscriptBlock`)
is NOT in this PR. The shape needs design discussion (how to project nested
events; whether to bake delegation tracking into transcript or sidechannel).
PR-D / PR-F follow-up.

## Test coverage (51/51 pass)

- currentToolCallId set on enter, cleared on terminal
- approvalMode mirrors changes
- Cancellation marks in-flight tools 'cancelled', leaves completed alone
- Unknown status does NOT clear currentToolCallId (forward-compat)
- Non-cancellation `assistant.done` does NOT propagate

## Roadmap

PR-E of the unified follow-up to PR #4328 (PR-A + PR-B + PR-E in this
branch; PR-C / PR-D pending).

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(sdk/daemon-ui): tool preview taxonomy + multimodal content extraction (PR-C)

Closes two related gaps surfaced in the PR #4328 review:
- `DaemonToolPreview` had only 4 kinds — UI fell back to `key_value` /
  `generic` for tools that deserved structured display
- `getTextContent` silently dropped non-text content (image / audio /
  resource), so multimodal conversations vanished from the UI

`DaemonToolPreview` extends from 4 to 8 variants:

- `file_diff` — `{ path, oldText?, newText?, patch? }` — file edit tools
  (Anthropic-style `oldText/newText`, aider-style `patch`, write-style
  `newText` alone)
- `file_read` — `{ path, range?: [start, end] }` — file read tools, with
  range extracted from `lineRange` tuple OR `offset/limit` pair
- `web_fetch` — `{ url, method? }` — HTTP fetch tools (requires URL
  with scheme to avoid false positives on relative paths)
- `mcp_invocation` — `{ serverId, toolName, argsSummary? }` — MCP server
  tool calls, identified via `mcp__<server>__<tool>` naming convention
  (same heuristic as PR-A `DaemonUiToolUpdateEvent.provenance`)

Detector order matters — MCP wins first (most specific), then file_diff,
file_read, web_fetch, then the existing command / key_value fallbacks.

New helper `extractContentPart(value): DaemonUiContentPart | undefined`
returns a discriminated union:

```ts
type DaemonUiContentPart =
  | { kind: 'text'; text: string }
  | { kind: 'image'; mediaType: string; source: { url?, data? } }
  | { kind: 'audio'; mediaType: string; source: { url?, data? } }
  | { kind: 'resource'; uri: string; mediaType?, description? };
```

The existing `getTextContent` is preserved for backward compat. Renderers
that need to surface non-text content (web UI thumbnails, IDE attachment
chips) now have a typed shape to consume.

- Wiring `extractContentPart` into the normalizer / reducer so text
  blocks accumulate `parts: DaemonUiContentPart[]` alongside `text`
  (additive shape change requires render contract coordination — PR-D).
- 5 additional tool preview kinds (image_generation / code_block /
  tabular / subagent_delegation / search) — useful but not urgent;
  current 8 kinds cover the typical agent flows.

- file_diff detection from Anthropic / aider / write shapes
- file_read with lineRange tuple AND offset+limit pair
- web_fetch with method, REJECTS relative paths (no scheme)
- mcp_invocation with serverId + toolName extraction
- Detector priority: MCP wins over file_diff on conflicting shapes
- extractContentPart for text / image (url) / audio (data) / resource
- Unknown content type returns undefined (skip rather than synthesize)
- Image without source returns undefined (defensive)

PR-C of the unified follow-up to PR #4328 (PR-A + PR-B + PR-E + PR-C in
this branch; PR-D render contract pending).

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(sdk/daemon-ui): render contract — markdown / HTML / plain text helpers (PR-D)

Closes the "render 契约只覆盖 terminal" gap surfaced in the PR #4328 review:

> PR ships `daemonUiEventToTerminalText` for terminal. Web/IDE/channel
> adapters each roll their own projection. No shared contract → adapter
> divergence is inevitable.

## New helpers

```ts
daemonBlockToMarkdown(block, opts?): string  // GFM-compatible
daemonBlockToHtml(block, opts?): string      // conservatively escaped HTML
daemonBlockToPlainText(block, opts?): string // for copy-paste / logs
daemonToolPreviewToMarkdown(preview, opts?): string
```

All three respect the same `kind` discrimination so adapters can switch
between them without touching call sites.

## Per-kind projection

For each `DaemonTranscriptBlock['kind']`:

- `user` / `assistant` / `thought` — plain text with role labels
- `tool` — header with toolName + structured preview + status badge
- `shell` — fenced code block, stream-discriminated (stdout vs stderr)
- `permission` — title + options list + resolved/pending indicator
- `status` / `debug` / `error` — semantic class / role (error → role=alert)

For each `DaemonToolPreview['kind']`:

- `ask_user_question` — question + options as bullet list
- `command` — fenced bash with optional cwd comment
- `file_diff` — unified diff in fenced code block (oldText/newText OR patch)
- `file_read` — `path (lines N-M)` line
- `web_fetch` — `METHOD url` line
- `mcp_invocation` — `serverId::toolName` with args summary
- `key_value` — bullet list
- `generic` — emphasized summary

## Security

- Default HTML sanitizer escapes `<`, `>`, `&`, `"`, `'` and FIRST strips
  ANSI/control sequences via `sanitizeTerminalText` (defense against
  agent-emitted escape codes in HTML output).
- Custom sanitizer hook for consumers wanting markdown→HTML pipelines
  (markdown-it + DOMPurify, etc.).
- `sanitizeUrls` option strips token-like query params (`token=`, `key=`,
  `x-amz-`, etc.) from URLs in `web_fetch` previews.
- `maxFieldLength` truncation defaults 8192, prevents pathological
  rendering on huge content.

## Adapter conformance (out of scope for this commit)

The conformance test framework (fixture corpus + `runAdapterConformanceSuite`)
mentioned in PR-D scope is deferred to a follow-up. The render helpers
here are the precondition — once stable, the conformance framework can
use them as the reference projection.

## Test coverage (77/77 pass)

- All 9 block kinds render in markdown (verified for user/assistant/tool/
  shell/permission/error specifically)
- file_diff renders as unified diff with old/new lines
- mcp_invocation renders as `server::tool` format
- HTML escapes XSS (`<script>` → `&lt;script&gt;`)
- HTML strips terminal escape sequences before escaping
- Error blocks emit `role="alert"` for screen readers
- plain text drops markdown delimiters
- maxFieldLength truncates with ellipsis
- sanitizeUrls strips token query params
- Custom sanitizer hook works

## Roadmap

PR-D of the unified follow-up to PR #4328 — completes the 5-PR series
(A: event coverage, B: time schema, E: state machine, C: tool preview +
content extraction, D: render contract).

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(sdk/daemon-ui): 5 additional tool preview kinds — taxonomy complete (PR-F)

Closes the "5 additional preview kinds" item in PR #4353's TODO §A
(SDK-only work).

## New preview kinds (8 → 13)

- `code_block` — `{ language?, code, origin? }` — REPL / formatter /
  generator output, fenced as `\`\`\`<language>` in markdown
- `search` — `{ query, resultCount?, top? }` — grep / ripgrep / find /
  glob results with up to 5 top hits
- `tabular` — `{ columns, rows, totalRows? }` — structured table output
  (50-row cap with `totalRows` truncation indicator); supports both
  `columns: string[] + rows: unknown[][]` explicit shape and legacy
  `data: Array<Record<>>` shape (auto-infers columns from first row)
- `image_generation` — `{ prompt, thumbnailUrl?, model? }` — dall-e /
  diffusion / imagen / flux / sora style tools
- `subagent_delegation` — `{ agentName, task, parentDelegationId? }` —
  Anthropic-style Task tool and similar sub-agent dispatchers

## Detector priority

Order matters — most specific wins. New detectors slot in between
`mcp_invocation` and `file_diff`:

```
mcp_invocation > subagent_delegation > search > image_generation
  > file_diff > file_read > web_fetch > code_block > tabular
  > command > key_value > generic
```

Rationale: subagent / search / image generation are most discriminable
(distinct toolName patterns); file ops next; code_block / tabular last
because their shapes (`code:`, `columns:`) can appear in other tools.

## Render projections

Both `daemonToolPreviewToMarkdown` and the plain-text rendering paths
extended with cases for all 5 new kinds:

- code_block: fenced markdown code block with language tag
- search: bold header + GFM bullet list of top results
- tabular: GFM pipe table with header / separator / body / truncation hint
- image_generation: bold header + blockquoted prompt + embedded markdown
  image (URL sanitization respected via `sanitizeUrls` opt)
- subagent_delegation: bold delegate-arrow header + blockquoted task +
  optional parent delegation reference

## Test coverage (91/91 pass, +14 new)

- Each detector with positive case
- Detector priority verified: subagent_delegation wins over file_diff
  when toolName='Task' has both subagent + file-edit fields
- Tabular row cap (50) + totalRows stamping for truncated data
- Legacy data: Array<Record<>> auto-column inference
- Each render projection with structural assertions (markdown table
  format, image embed, bullet lists)

## Roadmap

PR-F of the unified follow-up to PR #4328. Brings the preview taxonomy
to 13 kinds covering: file ops (3), web (1), code/data (2), media (1),
agent control (2 — ask_user_question + subagent_delegation), MCP (1),
search (1), generic fallbacks (2).

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(sdk/daemon-ui): adapter conformance framework + fixture corpus (PR-G)

Closes the "Adapter conformance test framework" item in PR #4353's TODO §A.
Lets any daemon-ui adapter (TUI / web / IDE / channel / mobile) validate
that it projects a fixed corpus of daemon SSE event streams to the same
semantic shape — catches projection drift before it reaches users.

## API surface

```ts
interface DaemonUiAdapterUnderTest {
  reduce(events: readonly DaemonUiEvent[]): unknown;
  renderToText(state: unknown): string;
}

interface DaemonUiConformanceFixture {
  name: string;
  description: string;
  envelopes: DaemonEvent[];           // raw daemon envelopes
  expectedContains: string[];          // phrases the rendered text MUST contain
  expectedAbsent?: string[];           // phrases that MUST NOT appear
  normalizeOptions?: { ... };          // forward-compat normalize opts
}

runAdapterConformanceSuite(adapter, opts?): ConformanceSuiteResult
DAEMON_UI_CONFORMANCE_FIXTURES: ReadonlyArray<DaemonUiConformanceFixture>
```

## Design

**Format-agnostic assertion**: adapters can render to ANSI / HTML /
markdown / JSX — the framework only inspects plain text via
`renderToText`. Catches semantic divergence (missing user message,
wrong tool status, leaked secret) without forcing identical formatting.

**Embedded fixture corpus** (no fs reads — works in browser bundle):
- `simple-chat` — user/assistant streaming flow
- `tool-call-lifecycle` — running → completed transition
- `file-edit-diff` — file_diff preview surfacing
- `mcp-invocation` — MCP serverId/toolName extraction via heuristic
- `permission-lifecycle` — request + resolved with outcome
- `mcp-budget-warning` — Wave 3 event (adapter must observe but rendering
  is its choice)
- `cancellation-propagates` — tool block status flows
- `malformed-payload-redaction` — uses `includeRawEvent: true` to verify
  even a debug-mode adapter doesn't leak `token: secret-do-not-leak`
- `auth-device-flow-success` — Wave 4 OAuth events
- `available-commands-typed-event` — PR-A upgrade from status text

Per-fixture `expectedContains` and `expectedAbsent` describe the
content contract independently of format.

## Suite result

```ts
{
  passed: number,
  failed: ConformanceFailure[],   // each carries missing + leaked + excerpt
  total: number,
}
```

**Does not throw** — caller asserts on `result.failed` so adapter test
suites can produce per-fixture diagnostics rather than a single opaque
exception.

## Filter options

`only` / `skip` allow targeted runs during adapter development:

```ts
runAdapterConformanceSuite(myAdapter, { only: ['simple-chat'] });
runAdapterConformanceSuite(myAdapter, { skip: ['cancellation-propagates'] });
```

## Test coverage (97/97 pass, +6 new)

- SDK reference adapter (reducer + markdown render) passes all fixtures
- SDK reference adapter (reducer + plainText render) also passes
- Buggy adapter (empty string output) fails every fixture with non-empty
  `expectedContains`
- Buggy adapter (raw event dump via JSON.stringify) caught by redaction
  fixture's `expectedAbsent`
- `only` filter narrows to a single fixture
- `skip` filter excludes named fixtures from the corpus

## Usage from adapter authors

```ts
// In your adapter's test file
import { runAdapterConformanceSuite } from '@qwen-code/sdk/daemon';
import { reduceForTui, renderTuiState } from './my-tui-adapter';

it('TUI adapter conforms to daemon UI corpus', () => {
  const result = runAdapterConformanceSuite({
    reduce: reduceForTui,
    renderToText: renderTuiState,
  });
  expect(result.failed).toEqual([]);
});
```

## Roadmap

PR-G of the unified follow-up to PR #4328. The corpus is intentionally
small (10 fixtures) but extensible — adapter authors can submit new
fixtures via additions to `DAEMON_UI_CONFORMANCE_FIXTURES` to lock in
regression coverage for edge cases their adapter encountered.

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* feat(webui+sdk/daemon-ui): wire transcriptAdapter to SDK render contract (PR-H)

Closes the "WebUI transcriptAdapter migration" item in PR #4353's TODO §A.
Validates the PR-D render contract end-to-end on the real WebUI consumer.

`daemonTranscriptToUnifiedMessages(blocks, options?)` gains a new options
parameter:

```ts
interface DaemonTranscriptAdapterOptions {
  useMarkdown?: boolean;                  // default: false
  enrichToolDetailsWithPreview?: boolean; // default: false
}
```

Defaults preserve legacy behavior — existing callers see no change.

For `user` / `assistant` / `thought` blocks, content is projected via
SDK's `daemonBlockToMarkdown` instead of raw sanitized text. The WebUI's
markdown renderer (markdown-it) then gets:

- `**You**\n\n<content>` for user blocks (bold "You" label)
- Raw text for assistant blocks (markdown formatting in agent output
  passes through cleanly)
- `> *thought:* <text>` blockquote for thought blocks

For `tool` blocks, `rawOutput` is replaced with `daemonToolPreviewToMarkdown(block.preview)`.
This lets WebUI surfaces without per-preview-kind React components still
display:

- `file_diff` as a fenced unified diff
- `mcp_invocation` as `server::tool` with args summary
- `tabular` as GFM pipe table
- `search` as bullet list with match count
- `image_generation` as embedded markdown image
- `subagent_delegation` as delegate arrow + task quote

Renderers with per-kind components should leave this opt-out.

`packages/sdk-typescript/src/daemon/index.ts` was missing exports for
PR-D / PR-F / PR-G / PR-B / PR-E surface — WebUI's `@qwen-code/sdk/daemon`
import path uses the daemon root, not the ui/ sub-index. Added 15+
re-exports so consumers don't need to use the longer
`@qwen-code/sdk/daemon/ui/index.js` path.

Now exported from `@qwen-code/sdk/daemon` root:
- `daemonBlockToMarkdown` / `daemonBlockToHtml` / `daemonBlockToPlainText`
- `daemonToolPreviewToMarkdown`
- `extractContentPart` + `DaemonUiContentPart` type
- `formatBlockTimestamp` + `selectTranscriptBlocksOrderedByEventId`
- `selectCurrentTool` / `selectApprovalMode` / `selectToolProgress`
- `runAdapterConformanceSuite` + `DAEMON_UI_CONFORMANCE_FIXTURES`
- All associated types

`webui/src/daemon/transcriptAdapter.test.ts` mock blocks updated to include
`clientReceivedAt` (required field added in PR-B). Mechanical change —
every `createdAt: N` test fixture gets a matching `clientReceivedAt: N`.

- WebUI `npm run typecheck` — clean
- SDK `npm run typecheck` — clean
- SDK `vitest run test/unit/daemonUi.test.ts` — 97/97 pass
- WebUI transcriptAdapter test fixtures typecheck against updated
  DaemonTranscriptBlockBase schema

PR-H of the unified follow-up to PR #4328. Closes the WebUI migration
gap in TODO §A.

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* docs(daemon-ui): add developer guide + migration cookbook (PR-I)

Closes the final "Documentation" item in PR #4353's TODO §A. Brings the
unified daemon UI surface to ~95% SDK-side completion.

## Files added

- `docs/developers/daemon-ui/README.md` — full API reference
  - Three-layer model (normalizer → reducer → render helpers)
  - Quick start with idiomatic event-loop pattern
  - Event taxonomy (28+ types categorized: chat-stream / session-meta /
    workspace / auth device-flow)
  - Render contract cookbook (markdown / HTML / plainText)
  - Tool preview taxonomy (13 kinds with use cases)
  - State selectors (currentTool / approvalMode / toolProgress / ordering)
  - Cancellation propagation explanation
  - Time semantics (eventId > serverTimestamp > clientReceivedAt
    precedence)
  - Adapter conformance usage
  - ErrorKind dispatch pattern
  - Tool provenance dispatch pattern
  - Forward-compat principles

- `docs/developers/daemon-ui/MIGRATION.md` — adapter author migration
  cookbook
  - Step-by-step recommended adoption order (9 steps, value-ranked)
  - Before/after code examples for each step
  - Backward-compat checklist (everything is additive — no breaking
    changes)
  - Cross-references to PR-A through PR-H commits

## Roadmap

PR-I of the unified follow-up to PR #4328. Documentation-only — no
code changes; no tests affected.

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* fix(daemon-ui): address review feedback

* fix(daemon-ui): address review hardening feedback

* fix(daemon-ui): handle resync-required events

* feat(sdk/daemon-ui): consume daemon-side subagent nesting context (PR-K)

Closes the SDK-side gap for §B1 in PR #4353's TODO list. PR-E originally
deferred subagent nesting because daemon-side parent-context wasn't yet
stamped on tool_call events. After the rebase onto current
daemon_mode_b_main, source verification confirms the daemon now emits
`tool_call._meta.parentToolCallId` + `tool_call._meta.subagentType` via
`SubAgentTracker.getSubagentMeta()` (core), so the SDK side is unblocked.

## Schema additions (additive, forward-compat-safe)

`DaemonUiToolUpdateEvent`:
  - parentToolCallId?: string  — toolCallId of the parent Task / delegation
  - subagentType?: string      — sub-agent type label (e.g. 'code-reviewer')

`DaemonToolTranscriptBlock`:
  - parentToolCallId?: string  — mirror of event field
  - subagentType?: string      — mirror of event field
  - parentBlockId?: string     — pre-resolved by reducer when parent already
                                 in state, so renderers don't re-correlate

## Normalizer wiring

`normalizeToolUpdate` checks both top-level and `_meta` for parentToolCallId
+ subagentType (fallback chain mirrors how provenance/serverId are read).
Top-level tool calls without sub-agent context omit the fields cleanly.

## Reducer behavior

- New tool block: resolves `parentBlockId` from `toolBlockByCallId` at
  create time. Out-of-order arrival (child before parent) leaves
  `parentBlockId` undefined — selectors fall back to `parentToolCallId`
  lookup.
- Existing tool block update: adopts parent context if not yet
  correlated, never overwrites established correlation (handles the
  flow where SubAgentTracker activates after the initial tool_call).

## New public selectors

- selectSubagentChildBlocks(state, parentToolCallId): returns the
  array of tool blocks invoked inside a given parent delegation
- isSubagentChildBlock(block): type guard for "this tool block came
  from a sub-agent"

Both exported from @qwen-code/sdk/daemon root + ui/index.

## Forward-compat properties

- Top-level tool calls (no sub-agent) work identically as before
- Trimmed parent blocks: child fallback to undefined parentBlockId
- Daemon emits both fields together; SDK reads independently to tolerate
  partial future stamping

## Test coverage (129/129 pass, +5 new tests)

- Extract parentToolCallId + subagentType from `_meta`
- Top-level tool calls have undefined parent fields (forward-compat)
- Reducer correlates parentBlockId at create time
- Reducer adopts parent context on later update (out-of-order arrival)
- isSubagentChildBlock discriminator

## Roadmap

PR-K of the unified follow-up to PR #4353. Closes §B1 (subagent nesting)
in the TODO declaration; daemon-side already shipped on
`daemon_mode_b_main` via SubAgentTracker (core).

Remaining TODO §B / §D items still depend on further daemon/Core work:
- §B2 `tool.progress` event type (daemon emit pending)
- §D MessageEmitter multimodal echo + HistoryReplayer inlineData/fileData
  (core change pending)

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* fix(daemon-ui): PR-K self-review hardening — back-fill / trim / self-ref / docs

Multi-round self-review of PR-K (d8375fe46) surfaced two real bugs, a
few defensive gaps, and missing docs/fixture coverage. All addressed
in one commit.

## Bugs fixed

### Bug 1 — `parentBlockId` never back-filled for out-of-order arrival

Original PR-K resolved `parentBlockId` only at child create time, which
broke this flow:

  1. Child arrives WITH parent stamp → block created with
     `parentToolCallId` set, `parentBlockId` undefined (parent not in
     state yet)
  2. Parent arrives later → block created, `toolBlockByCallId` indexed
  3. Subsequent child updates: existing-block branch only ran the
     back-fill inside `!existing.parentToolCallId`, which is false (we
     already adopted the stamp in step 1). `parentBlockId` stayed
     undefined forever.

Fix: separate the two correlations.
  - existing-block update: independently back-fill `parentBlockId`
    whenever `parentToolCallId` is set and `parentBlockId` is missing
  - new-block create: scan existing children whose `parentToolCallId`
    matches the new block's `toolCallId` and back-fill their
    `parentBlockId`. Cheap O(n) over current blocks.

### Bug 2 — dangling `parentBlockId` after trim

`trimTranscriptState` reset `toolBlockByCallId[id]` to the trimmed
sentinel for evicted blocks but did NOT walk surviving children to
null their `parentBlockId` references. Renderers walking
`blockIndexById.get(parentBlockId)` would get undefined, with no
"why" signal.

Fix: post-trim, walk remaining tool blocks; if `parentBlockId`
references an id not in `keptIds`, null it. `parentToolCallId` stays
(survives trimming so selector-keyed queries still work).

## Defensive hardening

- **Self-reference guard** (normalizer): drop
  `parentToolCallId === toolCallId` before it reaches the reducer.
  Daemon should never emit this, but defending costs nothing.
- **Selector docstring**: clarify `selectSubagentChildBlocks` returns
  **direct** children only; document cycle / depth-cap responsibility
  for renderers walking up the chain.
- **Cosmetic**: remove redundant `as DaemonToolTranscriptBlock` cast
  in `isSubagentChildBlock` (TypeScript already narrows after
  `block.kind === 'tool'` on the discriminated union).
- **Alphabetical**: move `isSubagentChildBlock` re-export to correct
  position in both `daemon/index.ts` and `daemon/ui/index.ts`.

## Docs + conformance gaps closed

- `README.md` — new "Sub-agent nesting (PR-K)" section with full
  reducer behavior, out-of-order handling note, recursive walk example,
  cycle-defense note.
- `MIGRATION.md` — new step 8a with before/after for nested rendering.
- `conformance.ts` — new `subagent-nesting` fixture covering parent +
  nested child via `tool_call._meta`. Markdown-safe phrases chosen
  (markdown escapes `-` so titles cannot be substring-matched as-is).

## Test coverage (+5 tests, 134/134 pass)

- Self-reference dropped in normalizer
- Back-fill on out-of-order parent arrival (child first, parent after)
- Back-fill on later child update when parent now exists
- Dangling `parentBlockId` nulled after parent trimmed
- New `subagent-nesting` conformance fixture passes SDK reference adapter

## Side-effect verification

Verified no regressions:
- Cancellation propagation still cancels parent + children together
  (iterates `toolBlockByCallId`, which includes both)
- Render contract unchanged (`daemonBlockToMarkdown` etc. project per
  block, no nested awareness required)
- No serializer to update
- `selectTranscriptBlocksOrderedByEventId` unaffected (parent-agnostic)

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* fix(daemon-ui): permission block trim contract — wenshao review

Addresses both items from wenshao's review on PR #4353:

## Critical — resolvePermissionBlock missing TRIMMED guard

The sibling `upsertPermissionBlock` (transcript.ts:544) correctly returns
early when `existingId === TRIMMED_PERMISSION_BLOCK_ID`, but
`resolvePermissionBlock` (transcript.ts:581) had no such guard. When
`maxBlocks` trimming evicted a pending permission request, a subsequent
`permission.resolved` event would:

1. Fail the `getWritableBlockById` lookup (sentinel is not a real block id)
2. Fall through and create a brand-new orphan resolution block

This wasted a block slot, accelerated further trimming, and silently
broke the trimmed-block contract that the request-side guard establishes.

Fix: mirror the request-side guard. Read the index entry up front,
return early on the sentinel.

## Suggestion — permissionBlockByRequestId grows unboundedly

`trimTranscriptState` writes `TRIMMED_PERMISSION_BLOCK_ID` for evicted
permission requests but never deletes those entries. Unlike the tool
side (which calls `pruneTrimmedToolIndexes` post-trim), the permission
index grew without bound in long sessions.

Fix: add `pruneTrimmedPermissionIndexes` analogous to the tool-side
helper. Caps the sentinel set at `maxBlocks` entries; older entries are
deleted (any later resolution event still drops cleanly via the new
Critical guard).

## Tests

- Updated existing `keeps orphan permission resolutions visible after
  request trimming` test to encode the corrected contract (drops silently
  instead of creating an orphan). Test rename: "drops resolution for
  trimmed permission requests (wenshao Critical)".
- New `Suggestion: pruneTrimmedPermissionIndexes caps the trimmed
  sentinel set` test verifies the cap.

Total: 136/136 tests pass, SDK + WebUI typecheck green.

## Side-effect verification

- `upsertPermissionBlock` already had the equivalent guard — no
  asymmetry remains.
- `pruneTrimmedPermissionIndexes` only touches entries holding the
  sentinel; live permission blocks are unaffected.
- Selectors over `state.blocks` (e.g. `selectPendingPermissionBlocks`)
  iterate the block array, not the index — unaffected by cap.

Generated with AI

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* fix(daemon-ui): address wenshao + doudouOUC inline reviews (2026-05-23)

Addresses the 13 inline review comments from wenshao (6) and doudouOUC
(7, one overlap) on the 2026-05-23 review round.

## Critical / Important

### sanitizeUrls not threaded through HTML preview path (doudouOUC)

`daemonBlockToHtml` for tool blocks called `daemonToolPreviewToPlainText`
which didn't accept `opts` — when callers set `sanitizeUrls: true`, the
markdown path stripped auth tokens but the HTML path leaked them into
the DOM. Now: helper accepts opts, threads through `web_fetch.url` and
`image_generation.thumbnailUrl`.

### enrichToolDetailsWithPreview overwrote rawOutput (doudouOUC)

The webui adapter replaced structured `rawOutput` with a markdown
summary string when `enrichDetails: true`. Downstream `ToolCallData`
consumers may branch on the shape (object vs string) and break. Plus
the actual tool output was silently dropped.

Fix: keep `rawOutput` verbatim, surface markdown via a new optional
`previewMarkdown` field added to `ToolCallData`.

### transcriptBlockToTerminalText zero test coverage (wenshao)

Added 12 tests covering each `switch` branch (user / assistant / thought
/ tool / shell stdout+stderr / permission unresolved+resolved / status /
debug / error) plus the unknown-kind degradation path. Verified
`assertNever` returns a graceful error line (does NOT throw) — wenshao's
reviewer was slightly wrong on the throw claim but coverage gap was
real.

### selectTranscriptBlocksOrderedByEventId no memoization (wenshao)

Selector was called from React `useSyncExternalStore` and re-sorted on
every dispatch — including sidechannel-only events that don't touch
blocks. Added WeakMap cache keyed on `state.blocks` reference; the
reducer preserves the same array reference for non-block-mutating
events, so the cache hits across renders.

### selectSubagentChildBlocks O(n) per call (wenshao)

Naive `state.blocks.filter()` was O(n) per call; rendering a tree with
m parents made it O(n*m). Built a memoized reverse index keyed on
`state.blocks` reference (WeakMap of parentToolCallId →
DaemonToolTranscriptBlock[]). Each lookup now O(1) after first call.

### Test file TS errors at root tsc (wenshao)

Fixed multiple TS errors in `daemonUi.test.ts` flagged by root
`tsc --noEmit`:
- Added `DaemonTranscriptState` + `DaemonUiEvent` imports
- `block.content` access via `as Array<Record<string, unknown>>` cast
- `delete` on globalThis property via narrower interface cast
- `debug?.text` via `DaemonUiEvent & { text: string }` narrowing (Extract on
  union with `'status' | 'debug'` literal would resolve to never)
- 6 occurrences of index-signature access via bracket notation
- `raw: null` added to 3 `DaemonUiPermissionOption` literals (required field)
- Explicit type annotations on conformance-suite `renderToText` params

Note: `webui/src/daemon/transcriptAdapter.test.ts` shows residual
"clientReceivedAt does not exist" errors at root tsc, but this is
environmental — the resolution trace shows `@qwen-code/sdk/daemon`
crossing into a sibling worktree's stale dist via shared workspace
node_modules. In a single-worktree CI checkout this resolves cleanly.

## Suggestions (cleanups)

### Hoist asDaemonErrorKind double-eval (doudouOUC)

`session_died` + `stream_error` cases each computed `asDaemonErrorKind`
twice in the conditional spread (predicate + value). Hoisted to const,
no functional change.

### renderToolHeader bypassed opts (doudouOUC)

Forwarded `opts` so `maxFieldLength` is honored for tool title /
toolName / toolKind.

### isSensitiveKey duplicates (doudouOUC)

Removed duplicate `endsWith('accesskey')` / `endsWith('secretkey')`
checks and the redundant exact-match `privatekey` (already covered by
`endsWith`).

### propagateCancellationToInFlightTools iterated trimmed (wenshao)

Filter `TRIMMED_TOOL_BLOCK_ID` sentinels up front. Avoids redundant
index dereferences in long sessions with many historical tools.

### toolProgress shallow clone (doudouOUC + wenshao)

`cloneTranscriptState` outer `...state` spread shared inner
`{ ratio?, step? }` references between snapshots. Once `tool.progress`
event handlers start mutating in place, the prior snapshot would leak.
Deep-clone the inner records now (cost bounded by in-flight tools,
small).

### isDeviceFlowErrorKind closed set (wenshao + doudouOUC)

Both reviewers suggested strict validation. We INTENTIONALLY kept
lenient pass-through — the public type
`DaemonAuthDeviceFlowSdkErrorKind` explicitly includes `(string & {})`
as a forward-compat escape hatch (existing test `keeps future
auth_device_flow_failed errorKind values observable` enforces this).
Now expose `KNOWN_DEVICE_FLOW_ERROR_KINDS` as documentation and
explain the design in the JSDoc.

## Validation

| | |
|---|---|
| SDK tests | 148/148 pass (+12 terminal coverage + assorted hardening) |
| SDK typecheck | clean |
| WebUI typecheck | clean |

## Side-effect verification

- WeakMap memos invalidate correctly: reducer creates a fresh
  `state.blocks` reference only on block-mutating events. Sidechannel
  events reuse t…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants