test: PR label feature verification#3
Closed
BingqingLyu wants to merge 2 commits into
Closed
Conversation
BingqingLyu
pushed a commit
that referenced
this pull request
May 7, 2026
…QwenLM#3774) * feat(core): enforce prior read before Edit / WriteFile mutates a file Introduces a session-scoped invariant: the model cannot mutate an existing file without having actually Read it (or its post-write state) earlier in this conversation. Builds on the FileReadCache landed in QwenLM#3717. Two new ToolErrorType codes: - EDIT_REQUIRES_PRIOR_READ — file has no entry in the session cache. The model is told to use read_file first. - FILE_CHANGED_SINCE_READ — file has an entry but its mtime or size drifted since the recorded fingerprint. The model is told to re-read before retrying. EditTool blocks the existing-file path on cache.check; new-file creation (old_string === '' on a non-existent target) is exempt. WriteFileTool blocks the overwrite path; new-file creation (fileExists === false) is exempt. Both tools route through the existing fileReadCacheDisabled escape hatch on Config — flipping it bypasses enforcement byte-for-byte, matching pre-cache behaviour. Operators can use this as a kill switch if a session falls into a state where the cache cannot be trusted. ReadFile fix on the auto-memory path: PR QwenLM#3717 had auto-memory reads skip the cache entirely (both lookup and record), but with the new enforcement that means a model that just Read AGENTS.md cannot then Edit it. Decoupled the two: auto-memory reads still skip the file_unchanged fast-path (the per-read freshness <system-reminder> must always reach the model) but DO record into the cache so the follow-up Edit sees `fresh`. New regression test asserts this. Test plan - vitest run (all of @qwen-code/qwen-code-core): 6308 passed, 2 skipped - 9 new enforcement tests across edit.test.ts and write-file.test.ts: unknown rejects, stale rejects, new-file exempt, edit chain stays authorised, escape hatch bypasses, plus the auto-memory record regression in read-file.test.ts. - tsc --noEmit clean. eslint clean. core build succeeds. * test(core): clear shared fileReadCache between write-file.test.ts cases CI surfaced one Linux-only failure: the prior-read enforcement test 'rejects a write that would overwrite an unread existing file' returned FILE_CHANGED_SINCE_READ instead of EDIT_REQUIRES_PRIOR_READ. Root cause: the FileReadCache instance is declared at module scope (line 41) and shared across every test in write-file.test.ts. State from earlier tests — most recently the 'records a write' integration test that records the same path — leaks forward. On Linux the test ordering puts a record-bearing test before the enforcement test, so the cache reports `stale` (mtime drifted) instead of `unknown`. macOS / Windows happen to order them differently and never hit it. Adding a fileReadCache.clear() to beforeEach gives every test a known-empty cache, matching how edit.test.ts already isolates its per-test cache by re-instantiating it. * fix(core): close prior-read enforcement gaps flagged in 3rd review Three concrete loopholes / regressions that the original PR-B introduction left open. All three are addressed in the same commit because the underlying refactor (move enforcement earlier and tighten the fresh predicate) is shared across them. 1. fresh != "model has seen the bytes". Pre-fix, requirePriorRead() accepted any cache.check === 'fresh'. ReadFileTool records every successful read into the cache, including ranged reads (offset/limit), truncated full reads, and non-cacheable binary/image/audio/video/PDF/notebook reads (lastReadCacheable = false). This let the model peek at a slice or a structured payload of a file and then mutate the whole thing. Tightened the accept predicate to fresh && lastReadAt && lastReadWasFull && lastReadCacheable. 2. Read-less content oracle through calculateEdit error codes. Pre-fix, execute() ran calculateEdit (which reads file bytes and counts matches) before the enforcement check. A model could probe an unread file by attempting Edits with candidate old_strings and observing NO_OCCURRENCE_FOUND vs EXPECTED_OCCURRENCE_MISMATCH vs EDIT_NO_CHANGE — reverse-engineering content without ever calling read_file. Moved enforcement to the top of calculateEdit, before any content read; only a stat is performed up to the rejection point. 3. Confirmation flow regression. Pre-fix, getConfirmationDetails() read the existing file to render a diff for the user, then approval flowed to execute() which would freshly check the cache and reject. The user could approve a diff computed from current bytes the model never saw, and the call would still fail. Moved enforcement before the confirmation read in both EditTool (via the shared calculateEdit path) and WriteFileTool (explicit check at the top of getConfirmationDetails). The user now never sees a confirmation diff for an unread file — the call rejects up front. Public API surface change: requirePriorRead() -> checkPriorRead() that returns a structured decision, so the same predicate can route into a CalculatedEdit.error (calculateEdit), a thrown error (getConfirmationDetails), or a ToolResult (execute) without duplicating the boolean / message / type plumbing in three shapes. Reported by pomelo-nwu (3 inline comments on PR QwenLM#3774). * refactor(core): close 4 prior-read enforcement gaps from 4th review 1. recordWrite now seeds read metadata on brand-new entries (lastReadAt / lastReadWasFull / lastReadCacheable). The strict accept predicate added in the previous round (#3 review) requires all three, but recordWrite only set lastWriteAt — so a model creating a file with Edit (old_string="") or WriteFile and then editing it again was rejected on the second edit. The model authored the bytes it just wrote; for the purposes of prior-read enforcement that counts as having seen them. New regression test in edit.test.ts: "allows a create-then-edit-then-edit chain without an intervening read". 2. Extracted checkPriorRead into src/tools/priorReadEnforcement.ts. The two copies in edit.ts and write-file.ts had already drifted (one used ${ReadFileTool.Name}, the other hardcoded 'read_file'); the boolean guard is security-sensitive and a one-sided fix would silently weaken the boundary. The shared utility takes a verb ('editing' | 'overwriting') so the user-facing prose can differ between callers without duplicating the decision logic. 3. WriteFileTool.execute now runs prior-read enforcement BEFORE readTextFile. Pre-fix, an unread overwrite still slurped the entire file into memory (encoding / BOM / line-ending detection) and only then rejected it: wasted I/O, and momentary in-memory custody of bytes the model never legitimately read. Now matches the order in getConfirmationDetails(). 4. The "rejects a write that would overwrite an unread existing file" test now spies on FileSystemService.readTextFile and asserts not.toHaveBeenCalled() — without that, the test gave false confidence: it passed both pre-fix (read happened, then reject) and post-fix (reject before read), so the ordering regression in (3) was invisible to the assertion. Reported by glm-5.1 via /review on PR QwenLM#3774. * refactor(core): close 4 prior-read enforcement gaps from 4th review (Copilot) Five concrete gaps that the previous round of enforcement work left open. Reported by Copilot via /review on PR QwenLM#3774. 1. Confirmation-time rejections lost their ToolErrorType code. getConfirmationDetails() in both EditTool and WriteFileTool threw a plain Error on prior-read failure, which coreToolScheduler collapsed into UNHANDLED_EXCEPTION — silently breaking the EDIT_REQUIRES_PRIOR_READ / FILE_CHANGED_SINCE_READ contract for any approval-required flow. Fix: introduce PriorReadEnforcementError that carries `errorType: ToolErrorType`. Both confirmation paths now throw it, and coreToolScheduler reads `error.errorType` (falling back to UNHANDLED_EXCEPTION when absent). New regression tests assert the thrown error's `errorType` field for both tools. 2. checkPriorRead's "re-read with read_file" advice was wrong for binary / image / audio / video / PDF / notebook files. Their ReadFile result always sets lastReadCacheable=false, so the message would loop the agent forever on the same rejection. Fix: detect the fresh-but-non-cacheable case explicitly and return a dedicated message that explains the dead end ("Edit / WriteFile cannot mutate that payload safely") instead of asking for another read. Updated the existing non-cacheable regression test to assert the new message and the absence of "use the read_file tool first". 3. checkPriorRead swallowed every stat() failure and returned ok:true. EACCES, EBUSY, NFS hiccups, etc. would silently re-open the blind-write path the helper exists to block. Fix: only ENOENT continues to return ok:true (disappearance race). Any other code is fail-closed: returns EDIT_REQUIRES_PRIOR_READ with a message that names the errno. New regression test in write-file.test.ts spies on fs.promises .stat to inject EACCES and asserts the rejection. 4. The auto-memory record regression test only asserted `state === 'fresh'`. A future change that recorded auto-memory reads as partial / non-cacheable would still satisfy that assertion but would actually fail enforcement on every follow-up Edit. Fix: also assert lastReadAt is defined, lastReadWasFull is true, and lastReadCacheable is true. The full "what enforcement requires" predicate is now explicit in the test. (The 5th item, the WriteFile mirror of (1), is covered by the same PriorReadEnforcementError change.) * refactor(core): tighten StructuredToolError naming + add scheduler test Four follow-ups raised by deepseek-v4-pro on PR QwenLM#3774. None of them change the enforcement boundary; they are all about making the contract clearer and harder to break in future changes. 1. PriorReadEnforcementError -> StructuredToolError. The class now wraps any content-derived ToolErrorType from calculateEdit (EDIT_NO_OCCURRENCE_FOUND, EDIT_EXPECTED_OCCURRENCE_MISMATCH, EDIT_NO_CHANGE, ATTEMPT_TO_CREATE_EXISTING_FILE) on top of the prior-read codes. The old name suggested the class was prior- read-specific, which would mislead any oncall engineer seeing it paired with one of the calculateEdit error codes. 2. EDIT_REQUIRES_PRIOR_READ kept its name (the prefix mentions "edit" but the enum is shared with WriteFileTool) — chose documentation over rename to avoid the churn of a value rename across logs/dashboards already keyed on it. JSDoc now spells out the cross-tool usage explicitly. 3. Stat failures other than ENOENT now map to a new PRIOR_READ_VERIFICATION_FAILED code instead of being conflated with EDIT_REQUIRES_PRIOR_READ. The failure mode is "we cannot verify" rather than "definitely not read" — operators routing on error codes can distinguish the two populations. 4. Added a coreToolScheduler test (`surfaces error.errorType from a confirmation throw instead of UNHANDLED_EXCEPTION`) that constructs a stub tool whose getConfirmationDetails throws StructuredToolError and asserts the surfaced ToolCall response carries the correct ToolErrorType. Without this test the scheduler's explicitErrorType branch would have no coverage at all. Tool tests updated for the new StructuredToolError class name and the PRIOR_READ_VERIFICATION_FAILED code on the EACCES path. * fix(core): close TOCTOU + grammar + directory regressions in PR-B Six concrete issues that the previous round of enforcement work left open. Reported by Copilot via /review on PR QwenLM#3774. 1. TOCTOU window between pre-read checkPriorRead and readTextFile. The pre-read stat could pass enforcement, then an external writer could land between that stat and the actual read, leaving currentContent reflecting bytes the model never saw — exactly the stale-write path the PR is supposed to block. Closed by re-running checkPriorRead immediately after every readTextFile that fed currentContent / originalContent: EditTool.calculateEdit and the two WriteFileTool paths (execute + getConfirmationDetails). A `stale` outcome now fails the operation with FILE_CHANGED_SINCE_READ at the correct moment. 2. Directory targets sent the model into an enforcement loop. `fileExists` is a plain access check, so directories also entered the enforcement branch — the model would be told to call `read_file`, but `read_file` rejects directories with TARGET_IS_DIRECTORY, so the loop never terminated. Fixed in checkPriorRead: if `fs.stat` reports the path is not a regular file, return `ok: true` so the downstream readTextFile / write path can surface its own EISDIR / similar error. 3. Confirmation-time error messages used the short `display` form instead of the full `raw` form. Approval-required Edit calls therefore lost the remediation detail (file path, stale-vs-unread distinction, "without offset / limit / pages" hint) that the execute path already surfaced and that the WriteFile confirmation path already preserved. EditTool.getConfirmationDetails now throws StructuredToolError with `editData.error.raw`. 4. Non-text payload displayMessage was grammatically broken: built from the gerund `editing` / `overwriting`, it rendered as "cannot editing via this tool" / "cannot overwriting via this tool". Fixed by deriving a bare-verb form (`edit` / `overwrite`) alongside the gerund and using it in displayMessage. (Items 1, 5 and 6 from Copilot's batch are the same TOCTOU class — EditTool calculateEdit + WriteFile execute + WriteFile confirmation — addressed together in (1) above.) The "bypasses enforcement entirely" test now uses mockReturnValue instead of mockReturnValueOnce because calculateEdit calls getFileReadCacheDisabled twice — once for the pre-read check and once for the post-read TOCTOU re-check. Both must see disabled=true to actually bypass. * fix(core): close fileExists TOCTOU on WriteFile prior-read enforcement WriteFile gated prior-read enforcement on `fileExists` from `isFilefileExists()`, but a file that sprang into existence between that check and the write would still be overwritten without enforcement — `fileExists === false` skipped the check entirely. Made the gate unconditional on `fileExists`. checkPriorRead's own `fs.stat` decides what to do: - ENOENT → ok:true, fall through to the new-file path as before - file exists right now (whether or not isFilefileExists saw it) → unknown / stale check runs, the race-created file is rejected. Applied to both getConfirmationDetails and execute. The path that actually creates new files is unchanged because checkPriorRead's ENOENT branch is the disappearance-race exit, which is the correct exit for "the file truly does not exist yet". Reported by glm-5.1 via /review on PR QwenLM#3774. * fix(core): close 4 enforcement gaps + 1 critical bug from 5th Copilot review Six issues raised by deepseek-v4-pro / glm-5.1 / qwen3.6-plus on PR QwenLM#3774. Listed by reviewer-assigned severity. [Critical] (qwen3.6-plus) recordWrite previously only seeded the read metadata for brand-new entries. The reproduction was real: ReadFile(limit=10) → WriteFile(full content) → Edit. The partial read's lastReadWasFull=false would persist through the write, and the Edit would be rejected with EDIT_REQUIRES_PRIOR_READ even though the model just authored every byte. recordWrite now unconditionally refreshes lastReadAt, lastReadWasFull=true, and lastReadCacheable=true. The fileReadCache.test.ts case that previously asserted "preserves lastReadAt" is rewritten to assert the new "refreshes lastReadAt to match the write" contract, and a new "upgrades lastReadWasFull / lastReadCacheable after a full write" regression test pins the reproduction reviewer described. [Suggestion] (deepseek-v4-pro) Narrowed the non-regular-file bypass in priorReadEnforcement from `!stats.isFile()` to `stats.isDirectory()`. The earlier broad form covered FIFOs, sockets, and devices that the model has no legitimate "read first" recourse for and that can block readTextFile (FIFO) or over-allocate (/dev/urandom). Those now flow through to cache.check() and reject with the unread-file path before any I/O. [Suggestion] (glm-5.1) Removed the `fileExists && ...` gate from EditTool.calculateEdit, mirroring the f4ef756 fix on WriteFile. A file that springs into existence between isFilefileExists() and the enforcement check is now caught here as well; ENOENT inside checkPriorRead remains the disappearance-race exit and new-file creation flow is unchanged. [Suggestion] (deepseek-v4-pro) Added debugLogger.warn() at every post-read TOCTOU rejection site (Edit calculateEdit, WriteFile getConfirmationDetails, WriteFile execute). These rejections are rare and self-healing — without a debug record, an operator investigating "why did this Edit fail once?" had nothing to grep. debugLogger uses dedicated 'EDIT_PRIOR_READ' / 'WRITE_FILE' tags. [Suggestion] (qwen3.6-plus) Added a final pre-write checkPriorRead in EditTool.execute() and WriteFileTool.execute(). The earlier post-read check ran inside calculateEdit (Edit) or before mkdirSync (WriteFile), but the actual writeTextFile call could be arbitrarily later — user approval, modify-and-confirm, etc. The window from "post-read check → writeTextFile" is now bounded to "pre-write stat → writeTextFile" (two adjacent syscalls). * fix(core): close new-file race + special-file enforcement loop Three issues from the latest Copilot review on PR QwenLM#3774. 1. New-file race in pre-write enforcement (write-file.ts:348, edit.ts:487). The earlier pre-write checkPriorRead was gated on `fileExists` (WriteFile) and `!editData.isNewFile` (Edit). If the path was absent at planning time and another process created it while approval was pending, the gated form would skip enforcement and silently overwrite a pre-existing file the model never read. Run unconditionally in both tools — checkPriorRead's own ENOENT branch is the disappearance-race exit, so genuine new-file creation is unaffected, but a race-created file now hits the `unknown` branch and is rejected as unread. 2. FIFO / socket / device sent the model into an enforcement loop (priorReadEnforcement.ts:220). After narrowing the non-regular-file bypass to directories only, FIFOs etc. fell through to cache.check, returned `unknown`, and produced a "use read_file first" message — but read_file rejects those same targets as "not a regular file", so the model would loop on read_file forever. Added a dedicated `!stats.isFile()` branch (after the directory exemption) that returns a "special file; cannot edit/overwrite via this tool — use shell instead" message, matching the shape of the existing non-text-payload guidance. (Tool-error.ts and the non-cacheable policy notes are addressed in the PR description update — not in code.) * fix(core): close 4 enforcement gaps from 6th Copilot review (Plus a doc-only update for the 5th — the mtime+size limitation warning in the Risk section now mentions the silent-overwrite escalation that this PR's mutation paths bring along.) 1. ENOENT after the model has already read the file is no longer silently treated as `ok: true`. Added an `expectExisting` option to `checkPriorRead`; post-read and pre-write callers pass `true`. ENOENT under that flag now rejects with `FILE_CHANGED_SINCE_READ` ("file disappeared after the model read it") rather than falling through to the new-file path with stale bytes. Pre-read callers keep the old default (ENOENT → ok:true → fall through to genuine new-file creation). EditTool's pre-write check derives the flag from `editData.isNewFile`; WriteFile's pre-write check derives it from the post-read `fileExists` value. 2. Directory targets now reject with `TARGET_IS_DIRECTORY` and a structured message instead of returning `ok: true`. The previous form fell through to readTextFile(), which on the WriteFile confirmation path threw a plain Error and was surfaced by the scheduler as `UNHANDLED_EXCEPTION`. Both Edit and WriteFile now emit a structured rejection at enforcement time. (WriteFile's build-time validateToolParamValues already rejects directories, so the change matters most for EditTool.) 3. Non-cacheable rejection's `rawMessage` no longer hard-codes "overwrite" — it now uses the same `verbBare` derivation as the `displayMessage`, so EditTool's path correctly says "if you need to edit it" and WriteFile's path stays "if you need to overwrite it". The previous form was confusing for in-place edits. 4. WriteFile.getConfirmationDetails now mirrors execute()'s ENOENT-to-new-file fallback: a file that disappears between isFilefileExists() and the readTextFile-for-diff call no longer throws a plain Error (which would surface as UNHANDLED_EXCEPTION) — it falls back to the brand-new-file diff so the user sees a clean confirmation rather than an unstructured crash. Tests: - New: `rejects an edit on a directory with TARGET_IS_DIRECTORY` - New: `confirmation falls back to a new-file diff when the file disappears mid-flight` (WriteFile) - Updated: non-cacheable rejection asserts `verbBare` is "edit" on the EditTool path and "overwrite" on the WriteFile path. Reported by Copilot via /review on PR QwenLM#3774. * docs(core): clarify stat→write race + EDIT_REQUIRES_PRIOR_READ scope Three doc-only follow-ups from Copilot's latest review pass on PR QwenLM#3774. None change behaviour; the pre-fix code state was already the actual contract — the docs just lagged it. 1. EDIT_REQUIRES_PRIOR_READ enum comment now lists the three cases the code actually returns it for (never-read, partial / ranged / non-cacheable read, structural dead end — non-text payload or special file). The previous one-liner described only the first case and would mislead future maintainers. 2. The Final pre-write freshness check blocks in EditTool.execute and WriteFileTool.execute now spell out that they DO NOT eliminate the stat → writeTextFile race. The window narrows from the previously-unbounded post-read-to-write gap down to two adjacent syscalls, but a concurrent writer landing in that pair can still be clobbered. Closing the residual would require an atomic write (write-to-temp + rename) or a content-hash post-write recheck — both deferred. Operators who need strict protection set `fileReadCacheDisabled: true` and rely on application-level locking. 3. PR description Risk section gains a "Known unmitigated: stat → write race window" subsection (English + Chinese mirror) matching the code comments. * chore(core): minor follow-ups from review #4229917446 Three of the five MINOR items raised in the independent code review on 2026-05-05 — the cheap, isolated ones. The other two (race- simulating integration test, moving StructuredToolError out of priorReadEnforcement.ts) are deferred as the reviewer suggested. 1. EditTool now has a symmetric `PRIOR_READ_VERIFICATION_FAILED` regression test (mocks fs.promises.stat to reject with EACCES, asserts the EditTool path produces the same fail-closed result that the existing WriteFile EACCES test pins). Five-line fix to close the asymmetry that, while harmless today (the helper is shared), would let a future Edit-side change to checkPriorRead slip through without test coverage. 2. ensureParentDirectoriesExist / mkdirSync now run AFTER the pre-write checkPriorRead in both EditTool.execute() and WriteFileTool.execute(). Doing it before would leak intermediate directories on the rejection path — a real (if minor) FS litter the previous order created on every rejected new-file write. 3. EDIT_REQUIRES_PRIOR_READ enum docstring gains a one-line note for operators routing alerts on this code: a single `edit_requires_prior_read` signal can mean any of the three cases (no read / partial read / structural dead-end), and if per-cause monitoring becomes important the enum can be split in a follow-up. The originating tool name and the message text already disambiguate at runtime. * fix(core): close 2 correctness gaps from maintainer review #4232751470 Both tracked back to the cache's "track most recent read shape" model diverging from prior-read enforcement's "model has seen these bytes" model. 1. SVG (and similar string-content fallbacks) recorded as non-cacheable, blocking subsequent Edit / WriteFile. `read-file.ts` derives `cacheable` from `originalLineCount !== undefined && !isTruncated`. The SVG branch in `fileUtils.ts` returned content without `originalLineCount`, so `cacheable` collapsed to false and a follow-up Edit hit the dead-end "non-text payload — use shell" rejection — telling the model to use shell to mutate a file it had just successfully read as text. This was a real regression vs pre-PR behaviour where SVG-as-text editing worked. Fix: SVG-as-text branch now sets `originalLineCount` (split on '\n') and `isTruncated: false`, so ReadFile records it as a full cacheable read. The binary-fallback string and over-1MB SVG branches are deliberately left non-cacheable — they return placeholder strings ("Cannot display content of ...") rather than file content, so blocking edits there is correct. New regression test in `read-file.test.ts`: `records SVG-as-text reads with cacheable=true so a follow-up Edit passes enforcement`. 2. recordRead unconditionally overwriting lastReadWasFull / lastReadCacheable, revoking prior write-author or full-read rights. The `WriteFile(create) → ReadFile(offset/limit) → Edit` sequence rejected the Edit because the partial read clobbered the `lastReadWasFull = true` that `recordWrite` had stamped at create time. Same shape applies to a full text read followed by a partial one of the same inode. Fix: `recordRead` is now sticky-on-true for the read flags — `if (opts.full) entry.lastReadWasFull = true;` and the matching guard for `cacheable`. Prior `true` survives a later partial / non-cacheable read. The fast-path `file_unchanged` check still gates on the incoming request's own `isFullRead` in `read-file.ts`, so a partial read still does not get a placeholder it shouldn't. Updated the existing "overwrites earlier lastReadWasFull" test to assert the new sticky semantics, and added a `lastReadCacheable` symmetric test plus a `Write → partial-Read → Edit` end-to-end test in `edit.test.ts`. Reported by tanzhenxin via independent maintainer review on 2026-05-06. * fix(core): close 3 correctness gaps from re-review #4233904930 All three are tightenings of the prior `de8ddf530` round. 1. **Sticky-on-true narrowed to "no fingerprint drift"**. `fileReadCache.recordRead` previously kept `lastReadWasFull` / `lastReadCacheable` true across drifted recordings, which re-opened a `Read full @x → external write @y → Read partial @y → Edit` hole: the partial recordRead silently advanced the entry's mtime+size to Y while preserving the sticky `full=true` from X, so a follow-up Edit ran against bytes the model only saw the first 10 lines of. Now the sticky branch only fires when `(mtimeMs, sizeBytes)` matches the existing entry; on drift, both flags reset to exactly what this read produced. New regression test in `fileReadCache.test.ts` reproduces the reviewer's reported sequence. 2. **Subagent FileReadCache isolation now covers the inherits-model + same-approval-mode common case**. The own-property machinery from QwenLM#3717 only triggers when an `Object.create(parent)` actually fires; both `agent.ts:990-993` (same-approval-mode) and `subagent-manager.ts:699-701` (inherits-model) had paths that returned the parent Config directly, so the subagent's `getFileReadCache()` resolved to the parent's instance — a parent Read could satisfy the subagent's Edit on a path the subagent's transcript never contained. Both sites now build a thin `Object.create(base)` override unconditionally; no method changes for the inherits / same-mode cases, but a distinct instance triggers the lazy-init in `Config.getFileReadCache()` so the subagent gets an isolated cache. 3. **Cache records the read pipeline's internal stat instead of a post-read re-stat**. `processSingleFileContent` now surfaces its internal stat via `result.stats`, and read-file uses that for `recordRead` instead of running its own stat after the read returns. Pre-fix, an external write between the pipeline call and the post-read stat let the cache record fingerprint Y for content the model received at X — a subsequent Edit would pass enforcement against bytes the model never legitimately saw. The internal-stat-to-read window is still a few microseconds wide; that residue is the same content-hash territory acknowledged in the Risk section. Reported by tanzhenxin via re-review on PR QwenLM#3774. * docs(core): clarify partial subagent isolation per review #4234090906 tanzhenxin's third review correctly observed that the `Object.create(parent)` wrappers in `agent.ts:createApprovalModeOverride` and `subagent-manager.ts:maybeOverrideContentGenerator` only isolate the FileReadCache for code that consults `Config.getFileReadCache()` directly. Bound `EditTool` / `WriteFileTool` instances were registered against the parent's tool registry at initialise time, so tool invocations still resolve `this.config` to the parent and reach the parent's cache. `InProcessBackend.createPerAgentConfig` already does the right thing (`override.createToolRegistry()` + `copyDiscoveredToolsFrom(base.getToolRegistry())`); bringing that to these two spawn sites is the real fix. Reviewer's verdict was COMMENT, not REQUEST_CHANGES — the gap pre-dates this PR (it's a property of QwenLM#3717's per-Config own-property machinery) and pre-PR there was no enforcement on subagent mutations at all, so the PR is strictly an improvement on every spawn path. Documented the partial guarantee explicitly: - Inline comments on both spawn sites note the bound-tool caveat and point at `InProcessBackend.createPerAgentConfig` as the model for the follow-up. - PR description's subagent paragraph (English + Chinese mirror) now splits into "fully isolated" (`InProcessBackend.createPerAgentConfig`) and "partial isolation" (the two sites in this PR) so readers don't walk away with the wrong contract. Filing the registry-rebuild work as a follow-up; not in this PR.
BingqingLyu
pushed a commit
that referenced
this pull request
May 7, 2026
…wenLM#3831 PR-1 of 3) (QwenLM#3842) * feat(core): add signal.reason convention for ShellExecutionService.execute() Foundation for QwenLM#3831 Phase D (b) — Ctrl+B promote of a running foreground shell to background. Defines a discriminated `ShellAbortReason` union that the AbortSignal carries; default behavior (no reason / `{ kind: 'cancel' }`) keeps the existing tree-kill on abort. `{ kind: 'background' }` is a takeover signal — execute() skips the kill, drops the child from its active set (so cleanup() won't kill it later), flushes a snapshot of captured output, and resolves the result Promise immediately with `promoted: true` so the awaiting caller unblocks. Pure plumbing: no caller sets the reason yet, so this is a zero-behavior change for existing call sites. The `promoted?: boolean` field is optional on ShellExecutionResult so existing consumers compile against the new shape without source changes. Tests pin both branches in both childProcessFallback and executeWithPty: default abort still SIGTERM-tree-kills; `{ kind: 'cancel' }` is identical to default (pin against accidental routing through the background branch); `{ kind: 'background' }` skips the kill, snapshot output is preserved, mockProcessKill / mockPtyProcess.kill are NOT called. Part of QwenLM#3831 (Phase D part b — Ctrl+B promote running shell to background). PR-1 of 3. * fix(core): detach service listeners on background-promote (resolve review) Addresses 4 Critical + 2 Suggestion findings on PR-1 of QwenLM#3831: - **childProcess listener detach** (review line 555 + 573): Anonymous arrow listeners on stdout/stderr/error/exit could not be off()'d. After background-promote, post-promote bytes would re-enter handleOutput, which then calls decoder.decode() on a now-finalized text decoder (cleanup() already called .decode() without stream:true) → TypeError crash. Even without the crash, old onOutputEvent would fire for new data → ownership contract violation + duplication. Fix: extract named handler refs (stdoutHandler / stderrHandler / errorHandler / exitHandler) and call off() on all four in the background-promote branch via a detachServiceListeners() helper. - **PTY listener detach** (review line 967 + 990): node-pty's onData / onExit return IDisposable handles; the abort handler now captures dataDisposable / exitDisposable and calls .dispose() in the background-promote branch. ptyProcess.on('error') is EventEmitter-style (not IDisposable) — extract a named ptyErrorHandler ref and off() it. Without these, post-promote PTY error throws → Node.js crash; post-promote data continues writing to headlessTerminal and calling old onOutputEvent → ownership violation. - **PTY in-flight chain item ownership** (related to review line 990): processingChain may have already-enqueued callbacks past the early listenersDetached check. Refactored from "early-return short-circuit" to "guard each onOutputEvent emit individually" so in-flight writes still LAND in headlessTerminal (snapshot reflects them) but no events leak to the foreground onOutputEvent. Also clear renderTimeout in the abort handler so a pending throttled render doesn't fire post-promote. - **PTY snapshot freshness** (review line 972, suggestion): The original abort handler called serializeTerminalToText immediately. Now we await Promise.race([processingChain drain, SIGKILL_TIMEOUT_MS]) first (mirrors the onExit finalize pattern at ~line 970) so in-flight headlessTerminal.write callbacks land before serialization. Skipped render(true) intentionally because it would emit final onOutputEvent data (renderFn calls onOutputEvent), violating the "no emit post-promote" invariant — added a comment explaining why direct serialize is correct. - **Handoff-boundary tests** (review line 1257, suggestion): Added 4 new tests pinning the ownership contract — 2 for child_process (post-promote stdout/stderr does NOT route to onOutputEvent; child exit does NOT re-resolve result), 2 for PTY (data/exit disposables ARE called; result shape stays promoted: true even if post-promote events fire). Also: test setup now stubs mockPtyProcess.onData / .onExit to return { dispose: vi.fn() } so the background-promote path's dispose() calls don't crash on undefined (the stub's mock.results[0].value is then inspected by the new handoff tests). 58 / 58 tests pass (50 baseline + 4 first-pass + 4 handoff). Total +235 / -35 on top of the prior commit. * fix(core): defensive hardening for ShellExecutionService background-promote (resolve 2nd review pass) Addresses 6 follow-up [Suggestion] threads on PR-1 of QwenLM#3831 — all substantive code-quality issues raised by the second-pass review of the dispose-based detach commit (8e8e18c): - **Exhaustive switch on `ShellAbortReason.kind`** (both abort handlers). Earlier `if (reason?.kind === 'background')` form silently fell through to kill for any unrecognized variant — a future `{ kind: 'suspend' }` would have killed the process with zero compile-time signal. Switched to `switch (kind)` with a `never`-typed default that runs `debugLogger.warn` and falls back to the safest behavior (cancel/kill). Each branch is now extracted into a named helper (`performBackgroundPromote` / `performCancelKill`) so the switch body stays a single screenful. - **Each `dispose()` wrapped in its own try/catch** (PTY). node-pty's `IDisposable` contract doesn't guarantee no-throw. Without per-dispose try/catch a single throwing dispose() would skip subsequent cleanup (the other dispose, off('error'), activePtys.delete, drain, resolve) and the caller would hang forever on `await result`. Each call now logs via debugLogger.warn on failure but continues. - **`.catch(() => undefined)` on the processingChain side of the drain race** (PTY). `Promise.race([processingChain.then(drain).then(drain), timeout])` would propagate a chain rejection out of the race; since `addEventListener` doesn't await our handler, the rejection became unhandled and `resolve()` was never called → caller hung. Now the rejection is swallowed; the timeout side still terminates the race on time. - **Drain-timeout truncation now emits a diagnostic warning** (PTY). Previously the 200ms drain timeout could fire, the snapshot would be taken with the buffer in mid-write state, and the result.output would be silently truncated. Race result is now observed via a symbol sentinel; when the timeout side wins, debugLogger.warn fires pointing the user at rawOutput as the un-truncated fallback. - **Snapshot serialize failure logs instead of swallowing silently** (PTY). Empty `catch {}` made result.output indistinguishable from "command produced no output" if serializeTerminalToText threw. Now `debugLogger.warn` with the error message leaves a trail for support bundles. - **Dedicated `PROMOTE_DRAIN_TIMEOUT_MS` constant** separated from `SIGKILL_TIMEOUT_MS`. Both are 200ms today, but they have unrelated reasons-to-change (kill escalation timing vs. promote drain ceiling) — sharing the constant means tuning one would silently change the other. Also adds a module-level `debugLogger = createDebugLogger('SHELL_EXECUTION')` since the service had no logging surface before this commit. 58 / 58 tests pass; tsc clean; ESLint clean. No new tests added: the new behaviors (timeout sentinel firing, dispose throw, exhaustive switch default) are defensive log-only paths; existing handoff tests already cover the happy path. Adding mock-throw tests is reasonable follow-up but not blocking. * fix(core): real bug — ptyProcess.off → removeListener; defensive abort-reason read Resolves the third review pass on PR-1 of QwenLM#3831 — 1 real bug + 2 defensive hardenings: - **Real bug: `ptyProcess.off('error', ...)` throws TypeError at runtime** (line ~1074). `@lydell/node-pty`'s `IPty` interface exposes the legacy Node EventEmitter `removeListener`, not the modern `off` alias. Previous form threw, the surrounding try/catch swallowed it (post-prior-pass dispose hardening), but the old `ptyErrorHandler` stayed registered — so a post-promote PTY error would still hit our foreground handler and `throw err`, breaking the handoff contract that PR-1's whole listener-detach work is supposed to enforce. Switched to `removeListener`. The catch + warn stays as defense-in-depth; the message wording is updated. - **Prototype-pollution-safe `kind` read** (extracted to module-level helper `getShellAbortReasonKind`). The previous `reason?.kind` walked the prototype chain — a polluted `Object.prototype.kind = 'background'` would silently route `abortController.abort({})` (any plain object reason) into the promote branch and skip the kill. Lifecycle/safety branch deserves the extra check. Helper now: rejects non-object reasons; reads `kind` only as an OWN property (`hasOwnProperty`); whitelists against `'background' | 'cancel'`; defaults to `'cancel'` (the safe historical behavior) for everything else. Both abort handlers (childProcess + PTY) now share this helper. - **`streamStdout: true` + background-promote = silent empty snapshot** (childProcess `performBackgroundPromote`). The promote snapshot reads from the `stdout` / `stderr` string accumulators; but in `streamStdout` mode `handleOutput` forwards bytes through `onOutputEvent` and skips the accumulators entirely. Today PR-1's only call site (foreground shell.ts) uses `streamStdout: false`, so the combination is unreachable — but if a future caller pairs the two, `result.output` would be empty with no diagnostic. Added a `debugLogger.warn` when the combination occurs, pointing the caller at `rawOutput` as the fallback. Cheaper than building a parallel accumulator just for this latent case. 58 / 58 tests pass; tsc clean; ESLint clean. * fix(core): liveness check + throw-safe abort-reason read + encoding-aware PTY snapshot (resolve 4th review pass) Resolves 6 threads on PR-1 of QwenLM#3831 — 1 Critical + 1 real bug + 2 quality + 2 test-coverage: - **[Critical] `getShellAbortReasonKind` throw-safe property read.** Previous form read `reason.kind` after only checking that `kind` is an own property. An own accessor that throws (or a Proxy with a trapping getter) would throw before the helper reached either the cancel kill path or the background promote path. Abort handlers are dispatched async and not awaited by AbortSignal, so a leaked throw here would have left the shell process alive instead of being killed on cancel — quietly. Wrapped the property read in try/catch with a fall-back to the safe 'cancel' kill behavior. - **Real bug: child_process post-exit race in background-promote** (`performBackgroundPromote`). The child may have already exited but the 'exit' event hasn't reached our handler yet (Node delivers events on the next microtask). Promoting in that window would detach our exit listener and report `promoted: true` for a process that's already dead — the caller would hold an inert pid expecting to take over. Now we read `child.exitCode` / `child.signalCode` before detaching: if either is non-null, fall through and let the pending exit handler resolve normally with the real exit info. Mirrored mock setup so `exitCode` / `signalCode` default to `null` (matching real ChildProcess) instead of `undefined`. - **PTY snapshot: re-decode + replay (mirror exit-path encoding).** The promoted snapshot was serializing `headlessTerminal` directly, which was fed by a streaming decoder initialized from the first-chunk encoding heuristic. When early output is ASCII-only but later output is in a different encoding (GBK / Shift-JIS / etc.), this produces mojibake — and the normal exit path doesn't, because it re-decodes `finalBuffer` with `getCachedEncodingForBuffer` and replays through a fresh terminal. Now mirrors that logic so `result.output` shape matches across the two paths. Direct-serialize remains as a last-ditch fallback if replay throws. - **Switch `default` no longer emits a runtime warn.** Reviewer noted the helper's whitelist made the `default: { _exhaustive: never }` branch unreachable at runtime — the `debugLogger.warn` in it could never fire. Kept the `_: never = kind` type assertion (so a future ShellAbortReason variant forces a TS error here, directing the developer to extend BOTH the helper's whitelist AND add a `case`), removed the unreachable warn. Added a comment that the assertion is the static-only safety net the union expansion would trigger. - **Direct unit tests for `getShellAbortReasonKind`** (8 cases). The helper's prototype-pollution defense is the main reason it exists; if `hasOwnProperty` is accidentally removed the regression would silently send `abortController.abort({})` (any plain reason) into the promote path. Exported the helper and added direct tests for: null / undefined, non-object, empty object (no own kind), prototype- only kind (pollution), unknown kind value, throwing accessor, Proxy trap, and the two happy paths. - **`removeListener` regression guard.** The fix to call `ptyProcess.removeListener('error', ...)` instead of `.off(...)` matters because `@lydell/node-pty`'s IPty interface only exposes `removeListener` — `.off()` throws TypeError on a real PTY but the EventEmitter mock tolerates both. Added a test that spies on both methods and asserts the production code uses `removeListener` for the 'error' event, so a future swap back to `.off()` regresses loudly under the mock instead of silently. 68 / 68 tests pass (58 baseline + 9 helper boundary + 1 removeListener guard + 1 post-exit race); tsc clean; ESLint clean. * fix(core): PTY background-promote post-exit race guard (resolve 5th review pass) Mirrors the child_process post-exit race fix from 4cc558b into the PTY path — addresses 1 [Critical] thread on PR-1 of QwenLM#3831: The PTY may have already exited but our `exitDisposable` (onExit callback) hasn't run yet — node-pty delivers the exit event asynchronously after the PTY's native SIGCHLD, so there's a window between "PTY actually dead" and "service onExit fires". Promoting in that window detaches our exit listener and reports `promoted: true` for a dead PTY, losing the real exit status; the caller would hold an inert pid expecting to take over. The IPty interface doesn't expose an `exitCode` field we can read directly (unlike `child.exitCode` / `child.signalCode` for child_process), so use `process.kill(pid, 0)` as a best-effort liveness check via the existing `ShellExecutionService.isPtyActive` helper. If kill(pid, 0) throws ESRCH, the pid is gone — log at debug level and fall through, letting the pending onExit callback resolve normally with the real exit info. Also adds a unit test mirroring the child_process race test: mocks `process.kill(pid, 0)` to throw ESRCH on the liveness probe, asserts the result has no `promoted: true` and reports the real exitCode. 69 / 69 tests pass; tsc clean; ESLint clean. * docs(core): correct getShellAbortReasonKind boundary-test count in JSDoc Doc said 'all six edge cases' but the test suite has 8 cases (added Proxy-trap and undefined later). Off-by-2 cosmetic only — no behavior change. Caught during a multi-round self-audit of PR-1 of QwenLM#3831. Audit summary: 7 rounds (correctness / reverse / consistency / coverage / build / exception paths / style) found one false-positive (a sync- abort registration-order race I initially thought existed). Verified that Node's WHATWG AbortSignal does NOT auto-fire 'abort' listeners on already-aborted signals, so the race window cannot open. No code change needed for that scenario; this commit is just the JSDoc fix. 69 / 69 tests still pass; tsc + ESLint clean. * docs(core): document the helper / union / switch sync invariant explicitly Multi-round self-audit found that `getShellAbortReasonKind`'s value whitelist has no compile-time tie to the `ShellAbortReason` union: when the union grows, TypeScript's `_exhaustive: never` in each switch forces #3 (the case arm) to be added, but the helper's whitelist (#2) silently keeps degrading the new variant to 'cancel', and the new case arm is never reached at runtime. Reviewer #4 raised this on the second pass; the original commit chose to accept it (option B in that thread) but didn't leave a strong in-code signal for future contributors. Added an INVARIANT block inside the helper enumerating the three sites that must be kept in sync, so the next person extending `ShellAbortReason` sees the coupling at the place where they're most likely to forget it. No behavior change — comment-only. 69 / 69 tests still pass; tsc + ESLint clean. Audit summary (this round + prior round): 18 angles total over two sweeps and one reverse-attack pass. Found: - 0 real bugs - 1 false-positive race (sync-abort registration order — Node WHATWG AbortSignal does NOT auto-fire on already-aborted signals; investigated, reverted) - 1 cosmetic doc fix (boundary-test count off-by-2) - 1 cosmetic INVARIANT block (this commit) Areas reviewed without finding new issues: caller-side ShellExecutionResult shape compatibility (optional `promoted?` field, existing callers spread-untouched); `exited` flag lifecycle (monotonic, cleanup() idempotent); processingChain in-flight ownership (listenersDetached guards every onOutputEvent emit including the renderFn-rendered case via the same flag); race between exit event and abort handler (both microtasks, FIFO ordering gives correct outcome either way); Node version dependence (`AbortSignal.reason` is Node 17.2+, engines: >=20 covers it); test isolation (mockImplementationOnce + module-level mockProcessKill clears each beforeEach); `process.kill(pid, 0)` Windows liveness reliability (best-effort, acceptable for PR-1 plumbing); PID reuse race on the PTY liveness check (theoretically possible, microsecond window, unavoidable at the OS level — rejected in spec discussion); PR-2/PR-3 contract surface (caller MUST attach listeners before abort — documented; any future caller violating this is its own bug). * test(core): align mockChildProcess.exitCode/signalCode in second beforeEach The 'execution method selection' describe block has its own beforeEach (separate from 'child_process fallback') that builds mockChildProcess but does not set `exitCode` / `signalCode = null`. Real Node `ChildProcess.exitCode` / `signalCode` are `null` while the process is alive — and production now reads these in the background-promote race guard. The current tests in this block don't exercise the promote path, so they pass regardless, but any future promote-related test landing here would silently trip the guard (`undefined !== null` is true) and fall through to the normal-exit branch instead of promoting. Mirror the `child_process fallback` block's mock setup so the two beforeEach hooks produce equivalent ChildProcess shapes, eliminating a quiet foot-gun for future contributors. Comment-only / test-fixture change. 69 / 69 tests still pass; tsc clean. Found during a deeper third-round self-audit of PR-1 of QwenLM#3831.
BingqingLyu
pushed a commit
that referenced
this pull request
May 11, 2026
…Change emit (QwenLM#3919) * fix(cli,core): isPending gate on subagent scrollback summary + post-delete statusChange emit Two follow-ups from PR QwenLM#3909 review. 1. **Re-introduce `isPending` gate on `SubagentExecutionRenderer`'s scrollback summary** (Copilot finding on PRRT_kwDOPB-92c6AUQHn). The verbose inline frame retirement collapsed `SubagentExecutionRenderer` to "render the summary whenever a subagent reaches a terminal status" — but with `isPending` removed in QwenLM#3909, that fired in BOTH live (pendingHistoryItems) AND committed (Static) phases. Live-phase rendering duplicated the row LiveAgentPanel already paints below the composer until the parent turn committed. Add `isPending` back to `ToolMessageProps` purely as a gate for this one render path: the summary fires only when `!isPending` (committed). `ToolGroupMessage` forwards the flag (it kept the prop on its own interface for upstream compat the whole time). Test gap closed by the new `live (isPending) terminal subagent → no scrollback summary (panel owns the row)` case. 2. **Emit `statusChange` AFTER delete in `unregisterForeground`** (Copilot finding on PRRT_kwDOPB-92c6AUQGc + the panel-only reconciliation it spawned). The shared snapshot in `useBackgroundTaskView` only refreshes on `statusChange`, and `unregisterForeground` previously fired exactly once — BEFORE delete — so the snapshot froze with the agent as "running" while `registry.get()` returned undefined. Result: `BackgroundTasksDialog` list mode showed a ghost "running" row with cancel hints whose `x` was a no-op, contradicting what the panel already showed (synthesized neutral terminal). Fire `statusChange` a second time AFTER `agents.delete()` so snapshot consumers see the registry-less state and stop surfacing the agent. The first emit still mirrors complete/fail/cancel/finalize ordering (callbacks that re-read `registry.get` see the entry); the second emit is the new contract for snapshot-based views. React batches the two resulting setState calls into one re-render so consumers re-render exactly once. Updated the existing "emits status change before removing the entry" test to capture both emits and explicitly assert that the second observes the registry-less state. Added a sibling test covering the post-delete `getAll()` count. Coverage: 190 passing tests across core + cli (background-view + ToolMessage + ToolGroupMessage + useBackgroundTaskView). * fix(cli,core): compact-mode terminal subagent expansion + statusChange context flag Five review findings on PR QwenLM#3919: 1. **Compact mode bypassed the scrollback summary** (gpt-5.5 via /qreview, ToolGroupMessage:324). `ToolGroupMessage` returns `CompactToolGroupDisplay` before the ToolMessage path when `compactMode === true`, so the new `isPending` gate on `SubagentExecutionRenderer` only protected the expanded path — committed terminal subagents in compact mode never reached `SubagentScrollbackSummary` and the LiveAgentPanel → committed- summary handoff broke for users who turned compact mode on. Force-expand the group when `!isPending` AND any tool call has a terminal `task_execution` resultDisplay. Stay compact while the parent turn is still live (`isPending`) — the panel below the composer owns that surface and an inline summary would duplicate it. Coverage: 4 new ToolGroupMessage cases (compact + completed-committed expands; compact + running-live stays compact; compact + completed-live stays compact; compact + failed-committed expands). 2. **Snapshot-coupled comment in `packages/core`** (Copilot, background-tasks.ts:292). The comment named CLI/UI consumers (`useBackgroundTaskView`, `BackgroundTasksDialog`) and asserted React batching guarantees from a core file. Reword to "snapshot-style consumers that re-pull `getAll()` from inside the callback" and drop the framework-specific batching claim. 3. **Two-phase emit needed an explicit signal** (Copilot, background-tasks.ts:283). Emitting `statusChange` twice without distinguishing the phases forced consumers to either do duplicate work or risk persisting a stale `entry` from the second callback. Add an optional second arg `context?: { removed?: boolean }` to `BackgroundStatusChangeCallback`; the post-delete emit passes `{ removed: true }` so consumers can disambiguate without re-querying the registry. Backwards compatible — existing callbacks ignore the new arg. Tests updated to assert both `mock.calls[0][1] === undefined` and `mock.calls[1][1] === { removed: true }`. 4. **`isPending` doc clarified** (Copilot, ToolMessage.tsx:507). Made the default semantics explicit: omitted/undefined is treated as committed (not pending); live-area renderers MUST pass `true` explicitly to suppress the scrollback summary. 5. (4 of the threads were duplicate Copilot fires of #2 + #3.) Coverage: 219 test files / 3369 passing across cli/ui + core/agents. * docs(cli): update ToolGroupMessageProps.isPending JSDoc The previous prop comment claimed `isPending` was "not consumed by the group body" — true at the time, but the body now reads it for two real purposes (compact-mode gating + forwarding to ToolMessage). Update the doc so future callers / tests don't treat it as legacy. Addresses Copilot finding on PRRT_kwDOPB-92c6AYE0V. * fix(cli): hide live-phase subagent tool entries — LiveAgentPanel owns the row User report: with compact mode OFF, a running subagent shows up twice — once as the parent tool group's `task` row (status icon + name + description), once as the LiveAgentPanel row beneath the composer. Same agent, two surfaces, redundant. Filter `task_execution` tool entries out of the expanded `ToolGroupMessage` while `isPending=true` so the panel is the single source of truth for in-flight subagents. The entry returns once the parent turn commits (`isPending=false`), letting `SubagentScrollbackSummary` land inside the parent's tool group as a persistent audit trail. Exception: subagents with a pending approval still render, because the focus-routed banner / queued marker is the only inline surface that lets users answer the prompt without opening the dialog. If a group is purely panel-owned (e.g. a single Task call with no sibling tools), the entire `ToolGroupMessage` returns `null` so an empty bordered container doesn't float above the panel. Coverage: +4 ToolGroupMessage cases — running entry hidden in live phase / mixed group keeps siblings / pending-approval entry still renders / committed entry comes back for the audit trail. * refactor(cli): tighten subagent-tool helper naming + ANSI-safe scrollback summary Self-audit + independent review found 5 cleanup items on the live-phase hide path; all addressed in one commit since none are behavioral changes: 1. **Move `allEntriesPanelOwned` short-circuit BEFORE `showCompact`** so a pure-subagent group in compact mode is also hidden during the live phase (previously CompactToolGroupDisplay rendered a single summary line above the panel — a mild duplicate on top of what the non-compact path already fixed). 2. **Rename `isLiveSubagentTool` → `isSubagentToolEntry`.** The helper identifies a tool's resultDisplay shape; it doesn't check live-state. The previous name conflated "predicate" with "use case" and read as if it returned true only during the live phase. 3. **DRY up `hasCommittedTerminalSubagent`** to use `isSubagentToolEntry` instead of inlining its own type-narrowing. 4. **ANSI-escape `subagentName` / `taskDescription` / `terminateReason`** in `SubagentScrollbackSummary`. Same threat model as the panel rows and HistoryItemDisplay — these strings come from subagent config (user-authored) and LLM output and could carry terminal control sequences. The stats fields (tool count / duration / tokens) flow through trusted formatters and don't need escaping. 5. **Doc comments updated** to reflect the four real responsibilities of `isPending` on `ToolGroupMessageProps` (hide pure groups, force-expand committed compact, per-tool filter, forward to ToolMessage), to clarify that the keyboard-focused subagent id can point at a hidden tool harmlessly (the iterator returns `null` before the focus prop is computed), and to drop the redundant "EXCEPT" clause on the per-tool filter in favor of a single sentence. Coverage unchanged: 251 passing tests across messages / background-view / core/agents; broader 3374-test sweep clean; TS clean on both cli and core packages. * fix(cli,core): address 3 critical review findings + ANSI/doc cleanups Three real bugs flagged by gpt-5.5 via /qreview, plus 4 doc / sanitization nits from Copilot. All 7 threads close together since they share the same surfaces. ## Critical fixes 1. **Foreground subagents disappeared mid-parent-turn** (PRRT_kwDOPB-92c6AYvL9). Post-QwenLM#3921 swap-order, `unregisterForeground` drops the entry from the panel snapshot the moment the subagent finishes. The previous round's `!isPending` gate on `SubagentScrollbackSummary` then suppressed the inline summary too, leaving the user with nothing on screen for the run until the parent committed. - Drop the `!isPending` gate — `unregisterForeground` already removes the row from the panel, so the inline summary can fire in BOTH live and committed phases without duplicating it. - Tighten the `ToolGroupMessage` live-phase hide so it only filters `running` / `paused` / `background` task entries (`isPanelOwnedSubagentTool`), not terminal ones. Terminal entries pass through immediately so the summary lands. - The "panel-owned" predicate is now distinct from the broader "subagent tool entry" predicate (`isSubagentToolEntry`) and the "terminal subagent" predicate (`isTerminalSubagentTool`); each usage site picks the one it actually means. 2. **Compact mode dropped the scrollback summary** (PRRT_kwDOPB-92c6AYvLw). Force-expanding the group made the container go through the expanded path, but `ToolMessage`'s own compact-mode gate (`!compactMode || forceShowResult ? renderer : 'none'`) still suppressed the result block, so `SubagentScrollbackSummary` never rendered for compact-mode users. Pass `forceShowResult={true}` for terminal subagent tool entries so the result block is always rendered. 3. **`mergeCompactToolGroups.isForceExpandGroup` didn't know about terminal subagents** (PRRT_kwDOPB-92c6AYvMC). The committed- history preprocessor merged adjacent tool_groups before render, so a terminal `task_execution` group could be absorbed into a compact batch (its `tool_use_summary` label dropped), and the render-time force-expand check never got a chance to override. Mirror the `hasCommittedTerminalSubagent` predicate inside `isForceExpandGroup` so preprocessing and rendering agree. ## Doc / sanitization nits - `BackgroundStatusChangeCallback` doc now lists every emitter (register / complete / fail / cancel / finalizeCancelled / finalizeCancellationIfPending / abandon / unregisterForeground / reset) and groups them by ordering camp (keeps-the-entry vs removes-the-entry — `reset` joins `unregisterForeground` in the delete-then-emit camp). - ANSI-escape `data.subagentName` in the focus-holder banner and the queued marker (`SubagentExecutionRenderer`) — same threat model as the panel rows and `SubagentScrollbackSummary`. ## Coverage delta - New ToolMessage case: live-phase terminal subagent now renders inline (replaces the prior "no scrollback summary" assertion that was the symptom of the AYvL9 bug). - New ToolGroupMessage cases: terminal subagent in live phase renders inline; `forceShowResult=true` propagates for terminal subagent tools (mock now exposes the prop). - New mergeCompactToolGroups parametrized cases: terminal subagent in any of completed / failed / cancelled stays its own batch. 280 tests pass across cli messages + utils + background-view + core/agents. TS clean. * fix(cli): drop `'paused'` arm from isPanelOwnedSubagentTool — not in AgentResultDisplay union CI Lint failed with TS2367: the previous round's `isPanelOwnedSubagentTool` checked for `status === 'paused'` but `AgentResultDisplay.status` (the tool-result-side type) only carries `'running' | 'completed' | 'failed' | 'cancelled' | 'background'`. The `'paused'` status lives on the registry-side `BackgroundTaskStatus` union and is only ever surfaced through `LiveAgentPanel` directly, never through a `task_execution` payload. Drop the dead arm and add a comment so a future "let's also check paused here" doesn't get re-introduced. * fix(cli): apply panel-ownership filter once before compact-mode decision Mixed live groups (running subagent + sibling tool) leaked the panel-owned subagent into `CompactToolGroupDisplay`'s count and `getActiveTool` selection, because `showCompact` returned BEFORE the inline `.map()` filter ran. Compact-mode users would see e.g. `task × 2 Delegate task to subagent` even though LiveAgentPanel already owned the subagent row below the composer. Derive `inlineToolCalls` once via `useMemo` immediately after the existing hook block and use it consistently for the compact summary, sizing math, and the render map. The early-return for "all-entries-panel-owned" collapses into `inlineToolCalls.length === 0` (gated on `isPending` so the legacy empty-input committed-phase snapshot is preserved). Remove the inner `.map()` filter — the upstream derivation already excluded the same entries. JSDoc updates: - `ToolGroupMessageProps.isPending` now describes the real flow (build inlineToolCalls / force-expand / forward to ToolMessage for parity). - `ToolMessageProps.isPending` is documented as forwarded-but-inert (`SubagentExecutionRenderer` doesn't gate on it; the live-phase filter and the unconditional terminal summary do the actual work). Regression test: live mixed group in compact mode → sibling wins active-tool, count collapses to 1, no `× 2` suffix, no subagent description in the header. Addresses Copilot review comments 3205262972 / 3205263020 (doc/code mismatch) and gpt-5.5 critical 3205288299 (compact-mode leak). * fix(cli): force-expand compact groups on terminal subagent in live phase too Resolved comment 3203286936 codified the design intent that `SubagentScrollbackSummary` "fires in BOTH live and committed phases" to bridge `unregisterForeground`'s post-delete panel-snapshot drop and the parent turn committing. Non-compact mode honored that contract (terminal subagents render the summary inline whenever they appear in `inlineToolCalls`), but compact mode still gated `hasCommittedTerminalSubagent` on `!isPending`, so a foreground subagent finishing mid-turn under compact mode produced NOTHING inline until the parent committed — exactly the gap the bridge was meant to close. Drop the `!isPending` arm and rename `hasCommittedTerminalSubagent` → `hasTerminalSubagent`. The force-expand now applies to terminal subagents in either phase; compact-mode users see the same outcome line non-compact users already get. Mirrors `SubagentExecutionRenderer`'s ungated terminal-summary path and `mergeCompactToolGroups.isForceExpandGroup`'s no-isPending-gate preprocessing rule. Tests: - Flip "compact mode: live group with completed subagent stays compact" → "force-expands so the summary bridges the panel-snapshot drop". Update rationale to reflect post-QwenLM#3921 reality (panel evicts terminal foreground rows immediately). - Add "compact mode: live mixed group with terminal subagent + sibling force-expands and renders both" — covers the bridge in mixed groups. - Update two stale `hasCommittedTerminalSubagent` cross-references in `mergeCompactToolGroups.{ts,test.ts}` comments.
BingqingLyu
pushed a commit
that referenced
this pull request
May 11, 2026
…nLM#3892) * fix(core): close bound-tool gap on runForkedAgent's YOLO wrapper Follow-up to QwenLM#3873 review (#3 of the three flagged adjacent Config-wrapper sites). `runForkedAgent`'s AgentHeadless path used to build its YOLO override via a local `Object.create(parent) + getApprovalMode = YOLO` helper that did NOT rebuild the tool registry, so: 1. The YOLO approval mode was silently ignored on the bound-tool path — parent's already-bound `EditTool` / `WriteFileTool` / `ReadFileTool` resolved `this.config.getApprovalMode()` back to the parent. 2. The fork's reads / mutations went through the parent's `FileReadCache` instead of a per-fork cache. 3. Memory-extraction and dream-agent paths stack the YOLO wrapper over a `getPermissionManager`-overriding scoped wrapper. Since the bound tools resolved to the parent, BOTH overrides — the YOLO approval mode AND the scoped permission manager — were bypassed. The fix routes through the existing `createApprovalModeOverride` helper, which: - rebuilds the tool registry on the wrapper (so bound tools resolve `this.config` to the wrapper), - copies discovered tools from the upstream registry, - sets the `TOOL_REGISTRY_REBUILT` Symbol marker so any further downstream wrapper layer recognises the rebuild and skips redundant work. The memory-extraction / dream-agent composition now resolves correctly via prototype walk — the YOLO wrapper sits above the scoped wrapper, so bound tools observe `getApprovalMode() = YOLO` on the wrapper itself and `getPermissionManager() = scopedPm` one prototype level up. Adds a try/finally around the AgentHeadless run so the per-fork ToolRegistry is stopped after execution — same shape as the spawn finallys in `agent.ts` and `background-agent-resume.ts`. Without this, every AgentTool / SkillTool the fork's model later instantiates leaks its change-listener on shared SubagentManager / SkillManager. Adds `forkedAgent.agent.test.ts` covering: marker + YOLO + distinct registry on the wrapper passed to AgentHeadless.create; bound EditTool resolves to the wrapper; memory-scoped composition preserves both YOLO and scopedPm; `stop()` fires after the AgentHeadless body finishes. Uses `vi.spyOn(AgentHeadless, 'create')` rather than module mocking so the real `ContextState` / `AgentEventEmitter` keep working. `npx vitest run packages/core/src` — 269 files / 6992 passed. * test(core): cover stop() lifecycle on AgentHeadless.create + execute failure paths Self-review feedback on QwenLM#3892: the stop lifecycle test only covered the success path. A future refactor could move the stop() out of the `finally` block and onto the success branch, reintroducing listener leaks when create or execute rejects, while every existing test still passes. Two new tests pin the cleanup to the `finally`: 1. `stops the per-fork ToolRegistry even when AgentHeadless.create rejects` — make `AgentHeadless.create` return a rejected promise; assert the rejection propagates and the stop spy still fires once. 2. `stops the per-fork ToolRegistry even when headless.execute rejects` — return a headless object whose `execute` rejects; same shape. Together with the success-path test these three cases cover every exit edge of the AgentHeadless body. `npx vitest run packages/core/src` — 269 files / 6994 passed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Test PR for verifying the pr-label feature. This PR is used to test the conflicting-group label.