Skip to content

test: PR label feature verification#3

Closed
BingqingLyu wants to merge 2 commits into
mainfrom
test-pr-label
Closed

test: PR label feature verification#3
BingqingLyu wants to merge 2 commits into
mainfrom
test-pr-label

Conversation

@BingqingLyu

Copy link
Copy Markdown
Owner

Test PR for verifying the pr-label feature. This PR is used to test the conflicting-group label.

BingqingLyu pushed a commit that referenced this pull request May 7, 2026
…QwenLM#3774)

* feat(core): enforce prior read before Edit / WriteFile mutates a file

Introduces a session-scoped invariant: the model cannot mutate an
existing file without having actually Read it (or its post-write
state) earlier in this conversation. Builds on the FileReadCache
landed in QwenLM#3717.

Two new ToolErrorType codes:
- EDIT_REQUIRES_PRIOR_READ — file has no entry in the session cache.
  The model is told to use read_file first.
- FILE_CHANGED_SINCE_READ — file has an entry but its mtime or size
  drifted since the recorded fingerprint. The model is told to
  re-read before retrying.

EditTool blocks the existing-file path on cache.check; new-file
creation (old_string === '' on a non-existent target) is exempt.
WriteFileTool blocks the overwrite path; new-file creation
(fileExists === false) is exempt.

Both tools route through the existing fileReadCacheDisabled escape
hatch on Config — flipping it bypasses enforcement byte-for-byte,
matching pre-cache behaviour. Operators can use this as a kill switch
if a session falls into a state where the cache cannot be trusted.

ReadFile fix on the auto-memory path: PR QwenLM#3717 had auto-memory reads
skip the cache entirely (both lookup and record), but with the new
enforcement that means a model that just Read AGENTS.md cannot then
Edit it. Decoupled the two: auto-memory reads still skip the
file_unchanged fast-path (the per-read freshness <system-reminder>
must always reach the model) but DO record into the cache so the
follow-up Edit sees `fresh`. New regression test asserts this.

Test plan
- vitest run (all of @qwen-code/qwen-code-core): 6308 passed, 2 skipped
- 9 new enforcement tests across edit.test.ts and write-file.test.ts:
  unknown rejects, stale rejects, new-file exempt, edit chain stays
  authorised, escape hatch bypasses, plus the auto-memory record
  regression in read-file.test.ts.
- tsc --noEmit clean. eslint clean. core build succeeds.

* test(core): clear shared fileReadCache between write-file.test.ts cases

CI surfaced one Linux-only failure: the prior-read enforcement test
'rejects a write that would overwrite an unread existing file'
returned FILE_CHANGED_SINCE_READ instead of EDIT_REQUIRES_PRIOR_READ.

Root cause: the FileReadCache instance is declared at module scope
(line 41) and shared across every test in write-file.test.ts. State
from earlier tests — most recently the 'records a write' integration
test that records the same path — leaks forward. On Linux the test
ordering puts a record-bearing test before the enforcement test, so
the cache reports `stale` (mtime drifted) instead of `unknown`.
macOS / Windows happen to order them differently and never hit it.

Adding a fileReadCache.clear() to beforeEach gives every test a
known-empty cache, matching how edit.test.ts already isolates its
per-test cache by re-instantiating it.

* fix(core): close prior-read enforcement gaps flagged in 3rd review

Three concrete loopholes / regressions that the original PR-B
introduction left open. All three are addressed in the same commit
because the underlying refactor (move enforcement earlier and tighten
the fresh predicate) is shared across them.

1. fresh != "model has seen the bytes". Pre-fix, requirePriorRead()
   accepted any cache.check === 'fresh'. ReadFileTool records every
   successful read into the cache, including ranged reads
   (offset/limit), truncated full reads, and non-cacheable
   binary/image/audio/video/PDF/notebook reads (lastReadCacheable
   = false). This let the model peek at a slice or a structured
   payload of a file and then mutate the whole thing. Tightened the
   accept predicate to fresh && lastReadAt && lastReadWasFull &&
   lastReadCacheable.

2. Read-less content oracle through calculateEdit error codes. Pre-fix,
   execute() ran calculateEdit (which reads file bytes and counts
   matches) before the enforcement check. A model could probe an
   unread file by attempting Edits with candidate old_strings and
   observing NO_OCCURRENCE_FOUND vs EXPECTED_OCCURRENCE_MISMATCH vs
   EDIT_NO_CHANGE — reverse-engineering content without ever calling
   read_file. Moved enforcement to the top of calculateEdit, before
   any content read; only a stat is performed up to the rejection
   point.

3. Confirmation flow regression. Pre-fix, getConfirmationDetails()
   read the existing file to render a diff for the user, then
   approval flowed to execute() which would freshly check the cache
   and reject. The user could approve a diff computed from current
   bytes the model never saw, and the call would still fail. Moved
   enforcement before the confirmation read in both EditTool (via the
   shared calculateEdit path) and WriteFileTool (explicit check at
   the top of getConfirmationDetails). The user now never sees a
   confirmation diff for an unread file — the call rejects up front.

Public API surface change: requirePriorRead() -> checkPriorRead() that
returns a structured decision, so the same predicate can route into
a CalculatedEdit.error (calculateEdit), a thrown error
(getConfirmationDetails), or a ToolResult (execute) without
duplicating the boolean / message / type plumbing in three shapes.

Reported by pomelo-nwu (3 inline comments on PR QwenLM#3774).

* refactor(core): close 4 prior-read enforcement gaps from 4th review

1. recordWrite now seeds read metadata on brand-new entries
   (lastReadAt / lastReadWasFull / lastReadCacheable). The strict
   accept predicate added in the previous round (#3 review) requires
   all three, but recordWrite only set lastWriteAt — so a model
   creating a file with Edit (old_string="") or WriteFile and then
   editing it again was rejected on the second edit. The model
   authored the bytes it just wrote; for the purposes of prior-read
   enforcement that counts as having seen them. New regression test
   in edit.test.ts: "allows a create-then-edit-then-edit chain
   without an intervening read".

2. Extracted checkPriorRead into src/tools/priorReadEnforcement.ts.
   The two copies in edit.ts and write-file.ts had already drifted
   (one used ${ReadFileTool.Name}, the other hardcoded 'read_file');
   the boolean guard is security-sensitive and a one-sided fix
   would silently weaken the boundary. The shared utility takes a
   verb ('editing' | 'overwriting') so the user-facing prose can
   differ between callers without duplicating the decision logic.

3. WriteFileTool.execute now runs prior-read enforcement BEFORE
   readTextFile. Pre-fix, an unread overwrite still slurped the
   entire file into memory (encoding / BOM / line-ending detection)
   and only then rejected it: wasted I/O, and momentary in-memory
   custody of bytes the model never legitimately read. Now matches
   the order in getConfirmationDetails().

4. The "rejects a write that would overwrite an unread existing
   file" test now spies on FileSystemService.readTextFile and
   asserts not.toHaveBeenCalled() — without that, the test gave
   false confidence: it passed both pre-fix (read happened, then
   reject) and post-fix (reject before read), so the ordering
   regression in (3) was invisible to the assertion.

Reported by glm-5.1 via /review on PR QwenLM#3774.

* refactor(core): close 4 prior-read enforcement gaps from 4th review (Copilot)

Five concrete gaps that the previous round of enforcement work left
open. Reported by Copilot via /review on PR QwenLM#3774.

1. Confirmation-time rejections lost their ToolErrorType code.
   getConfirmationDetails() in both EditTool and WriteFileTool threw
   a plain Error on prior-read failure, which coreToolScheduler
   collapsed into UNHANDLED_EXCEPTION — silently breaking the
   EDIT_REQUIRES_PRIOR_READ / FILE_CHANGED_SINCE_READ contract for
   any approval-required flow.

   Fix: introduce PriorReadEnforcementError that carries
   `errorType: ToolErrorType`. Both confirmation paths now throw it,
   and coreToolScheduler reads `error.errorType` (falling back to
   UNHANDLED_EXCEPTION when absent). New regression tests assert
   the thrown error's `errorType` field for both tools.

2. checkPriorRead's "re-read with read_file" advice was wrong for
   binary / image / audio / video / PDF / notebook files. Their
   ReadFile result always sets lastReadCacheable=false, so the
   message would loop the agent forever on the same rejection.

   Fix: detect the fresh-but-non-cacheable case explicitly and
   return a dedicated message that explains the dead end ("Edit /
   WriteFile cannot mutate that payload safely") instead of asking
   for another read. Updated the existing non-cacheable regression
   test to assert the new message and the absence of "use the
   read_file tool first".

3. checkPriorRead swallowed every stat() failure and returned
   ok:true. EACCES, EBUSY, NFS hiccups, etc. would silently
   re-open the blind-write path the helper exists to block.

   Fix: only ENOENT continues to return ok:true (disappearance
   race). Any other code is fail-closed: returns
   EDIT_REQUIRES_PRIOR_READ with a message that names the errno.
   New regression test in write-file.test.ts spies on fs.promises
   .stat to inject EACCES and asserts the rejection.

4. The auto-memory record regression test only asserted `state ===
   'fresh'`. A future change that recorded auto-memory reads as
   partial / non-cacheable would still satisfy that assertion but
   would actually fail enforcement on every follow-up Edit.

   Fix: also assert lastReadAt is defined, lastReadWasFull is true,
   and lastReadCacheable is true. The full "what enforcement
   requires" predicate is now explicit in the test.

(The 5th item, the WriteFile mirror of (1), is covered by the same
PriorReadEnforcementError change.)

* refactor(core): tighten StructuredToolError naming + add scheduler test

Four follow-ups raised by deepseek-v4-pro on PR QwenLM#3774. None of them
change the enforcement boundary; they are all about making the
contract clearer and harder to break in future changes.

1. PriorReadEnforcementError -> StructuredToolError. The class now
   wraps any content-derived ToolErrorType from calculateEdit
   (EDIT_NO_OCCURRENCE_FOUND, EDIT_EXPECTED_OCCURRENCE_MISMATCH,
   EDIT_NO_CHANGE, ATTEMPT_TO_CREATE_EXISTING_FILE) on top of the
   prior-read codes. The old name suggested the class was prior-
   read-specific, which would mislead any oncall engineer seeing
   it paired with one of the calculateEdit error codes.

2. EDIT_REQUIRES_PRIOR_READ kept its name (the prefix mentions
   "edit" but the enum is shared with WriteFileTool) — chose
   documentation over rename to avoid the churn of a value rename
   across logs/dashboards already keyed on it. JSDoc now spells
   out the cross-tool usage explicitly.

3. Stat failures other than ENOENT now map to a new
   PRIOR_READ_VERIFICATION_FAILED code instead of being conflated
   with EDIT_REQUIRES_PRIOR_READ. The failure mode is "we cannot
   verify" rather than "definitely not read" — operators routing
   on error codes can distinguish the two populations.

4. Added a coreToolScheduler test (`surfaces error.errorType from
   a confirmation throw instead of UNHANDLED_EXCEPTION`) that
   constructs a stub tool whose getConfirmationDetails throws
   StructuredToolError and asserts the surfaced ToolCall response
   carries the correct ToolErrorType. Without this test the
   scheduler's explicitErrorType branch would have no coverage at
   all.

Tool tests updated for the new StructuredToolError class name and
the PRIOR_READ_VERIFICATION_FAILED code on the EACCES path.

* fix(core): close TOCTOU + grammar + directory regressions in PR-B

Six concrete issues that the previous round of enforcement work
left open. Reported by Copilot via /review on PR QwenLM#3774.

1. TOCTOU window between pre-read checkPriorRead and readTextFile.
   The pre-read stat could pass enforcement, then an external writer
   could land between that stat and the actual read, leaving
   currentContent reflecting bytes the model never saw — exactly the
   stale-write path the PR is supposed to block. Closed by re-running
   checkPriorRead immediately after every readTextFile that fed
   currentContent / originalContent: EditTool.calculateEdit and the
   two WriteFileTool paths (execute + getConfirmationDetails). A
   `stale` outcome now fails the operation with
   FILE_CHANGED_SINCE_READ at the correct moment.

2. Directory targets sent the model into an enforcement loop.
   `fileExists` is a plain access check, so directories also entered
   the enforcement branch — the model would be told to call
   `read_file`, but `read_file` rejects directories with
   TARGET_IS_DIRECTORY, so the loop never terminated. Fixed in
   checkPriorRead: if `fs.stat` reports the path is not a regular
   file, return `ok: true` so the downstream readTextFile / write
   path can surface its own EISDIR / similar error.

3. Confirmation-time error messages used the short `display` form
   instead of the full `raw` form. Approval-required Edit calls
   therefore lost the remediation detail (file path, stale-vs-unread
   distinction, "without offset / limit / pages" hint) that the
   execute path already surfaced and that the WriteFile confirmation
   path already preserved. EditTool.getConfirmationDetails now
   throws StructuredToolError with `editData.error.raw`.

4. Non-text payload displayMessage was grammatically broken: built
   from the gerund `editing` / `overwriting`, it rendered as
   "cannot editing via this tool" / "cannot overwriting via this
   tool". Fixed by deriving a bare-verb form (`edit` / `overwrite`)
   alongside the gerund and using it in displayMessage.

(Items 1, 5 and 6 from Copilot's batch are the same TOCTOU class —
EditTool calculateEdit + WriteFile execute + WriteFile confirmation —
addressed together in (1) above.)

The "bypasses enforcement entirely" test now uses mockReturnValue
instead of mockReturnValueOnce because calculateEdit calls
getFileReadCacheDisabled twice — once for the pre-read check and
once for the post-read TOCTOU re-check. Both must see disabled=true
to actually bypass.

* fix(core): close fileExists TOCTOU on WriteFile prior-read enforcement

WriteFile gated prior-read enforcement on `fileExists` from
`isFilefileExists()`, but a file that sprang into existence between
that check and the write would still be overwritten without
enforcement — `fileExists === false` skipped the check entirely.

Made the gate unconditional on `fileExists`. checkPriorRead's own
`fs.stat` decides what to do:
- ENOENT → ok:true, fall through to the new-file path as before
- file exists right now (whether or not isFilefileExists saw it) →
  unknown / stale check runs, the race-created file is rejected.

Applied to both getConfirmationDetails and execute. The path that
actually creates new files is unchanged because checkPriorRead's
ENOENT branch is the disappearance-race exit, which is the correct
exit for "the file truly does not exist yet".

Reported by glm-5.1 via /review on PR QwenLM#3774.

* fix(core): close 4 enforcement gaps + 1 critical bug from 5th Copilot review

Six issues raised by deepseek-v4-pro / glm-5.1 / qwen3.6-plus on
PR QwenLM#3774. Listed by reviewer-assigned severity.

[Critical] (qwen3.6-plus) recordWrite previously only seeded the
read metadata for brand-new entries. The reproduction was real:
ReadFile(limit=10) → WriteFile(full content) → Edit. The partial
read's lastReadWasFull=false would persist through the write, and
the Edit would be rejected with EDIT_REQUIRES_PRIOR_READ even
though the model just authored every byte. recordWrite now
unconditionally refreshes lastReadAt, lastReadWasFull=true, and
lastReadCacheable=true. The fileReadCache.test.ts case that
previously asserted "preserves lastReadAt" is rewritten to assert
the new "refreshes lastReadAt to match the write" contract, and a
new "upgrades lastReadWasFull / lastReadCacheable after a full
write" regression test pins the reproduction reviewer described.

[Suggestion] (deepseek-v4-pro) Narrowed the non-regular-file
bypass in priorReadEnforcement from `!stats.isFile()` to
`stats.isDirectory()`. The earlier broad form covered FIFOs,
sockets, and devices that the model has no legitimate "read first"
recourse for and that can block readTextFile (FIFO) or
over-allocate (/dev/urandom). Those now flow through to
cache.check() and reject with the unread-file path before any I/O.

[Suggestion] (glm-5.1) Removed the `fileExists && ...` gate from
EditTool.calculateEdit, mirroring the f4ef756 fix on WriteFile.
A file that springs into existence between isFilefileExists() and
the enforcement check is now caught here as well; ENOENT inside
checkPriorRead remains the disappearance-race exit and new-file
creation flow is unchanged.

[Suggestion] (deepseek-v4-pro) Added debugLogger.warn() at every
post-read TOCTOU rejection site (Edit calculateEdit, WriteFile
getConfirmationDetails, WriteFile execute). These rejections are
rare and self-healing — without a debug record, an operator
investigating "why did this Edit fail once?" had nothing to grep.
debugLogger uses dedicated 'EDIT_PRIOR_READ' / 'WRITE_FILE' tags.

[Suggestion] (qwen3.6-plus) Added a final pre-write checkPriorRead
in EditTool.execute() and WriteFileTool.execute(). The earlier
post-read check ran inside calculateEdit (Edit) or before mkdirSync
(WriteFile), but the actual writeTextFile call could be arbitrarily
later — user approval, modify-and-confirm, etc. The window from
"post-read check → writeTextFile" is now bounded to "pre-write
stat → writeTextFile" (two adjacent syscalls).

* fix(core): close new-file race + special-file enforcement loop

Three issues from the latest Copilot review on PR QwenLM#3774.

1. New-file race in pre-write enforcement (write-file.ts:348,
   edit.ts:487). The earlier pre-write checkPriorRead was gated on
   `fileExists` (WriteFile) and `!editData.isNewFile` (Edit). If the
   path was absent at planning time and another process created it
   while approval was pending, the gated form would skip enforcement
   and silently overwrite a pre-existing file the model never read.
   Run unconditionally in both tools — checkPriorRead's own ENOENT
   branch is the disappearance-race exit, so genuine new-file
   creation is unaffected, but a race-created file now hits the
   `unknown` branch and is rejected as unread.

2. FIFO / socket / device sent the model into an enforcement loop
   (priorReadEnforcement.ts:220). After narrowing the
   non-regular-file bypass to directories only, FIFOs etc. fell
   through to cache.check, returned `unknown`, and produced a
   "use read_file first" message — but read_file rejects those same
   targets as "not a regular file", so the model would loop on
   read_file forever. Added a dedicated `!stats.isFile()` branch
   (after the directory exemption) that returns a "special file;
   cannot edit/overwrite via this tool — use shell instead" message,
   matching the shape of the existing non-text-payload guidance.

(Tool-error.ts and the non-cacheable policy notes are addressed in
the PR description update — not in code.)

* fix(core): close 4 enforcement gaps from 6th Copilot review

(Plus a doc-only update for the 5th — the mtime+size limitation
warning in the Risk section now mentions the silent-overwrite
escalation that this PR's mutation paths bring along.)

1. ENOENT after the model has already read the file is no longer
   silently treated as `ok: true`. Added an `expectExisting` option
   to `checkPriorRead`; post-read and pre-write callers pass `true`.
   ENOENT under that flag now rejects with `FILE_CHANGED_SINCE_READ`
   ("file disappeared after the model read it") rather than falling
   through to the new-file path with stale bytes. Pre-read callers
   keep the old default (ENOENT → ok:true → fall through to genuine
   new-file creation). EditTool's pre-write check derives the flag
   from `editData.isNewFile`; WriteFile's pre-write check derives it
   from the post-read `fileExists` value.

2. Directory targets now reject with `TARGET_IS_DIRECTORY` and a
   structured message instead of returning `ok: true`. The previous
   form fell through to readTextFile(), which on the WriteFile
   confirmation path threw a plain Error and was surfaced by the
   scheduler as `UNHANDLED_EXCEPTION`. Both Edit and WriteFile now
   emit a structured rejection at enforcement time. (WriteFile's
   build-time validateToolParamValues already rejects directories,
   so the change matters most for EditTool.)

3. Non-cacheable rejection's `rawMessage` no longer hard-codes
   "overwrite" — it now uses the same `verbBare` derivation as the
   `displayMessage`, so EditTool's path correctly says "if you need
   to edit it" and WriteFile's path stays "if you need to overwrite
   it". The previous form was confusing for in-place edits.

4. WriteFile.getConfirmationDetails now mirrors execute()'s
   ENOENT-to-new-file fallback: a file that disappears between
   isFilefileExists() and the readTextFile-for-diff call no longer
   throws a plain Error (which would surface as
   UNHANDLED_EXCEPTION) — it falls back to the brand-new-file diff
   so the user sees a clean confirmation rather than an unstructured
   crash.

Tests:
- New: `rejects an edit on a directory with TARGET_IS_DIRECTORY`
- New: `confirmation falls back to a new-file diff when the file
  disappears mid-flight` (WriteFile)
- Updated: non-cacheable rejection asserts `verbBare` is "edit" on
  the EditTool path and "overwrite" on the WriteFile path.

Reported by Copilot via /review on PR QwenLM#3774.

* docs(core): clarify stat→write race + EDIT_REQUIRES_PRIOR_READ scope

Three doc-only follow-ups from Copilot's latest review pass on PR
QwenLM#3774. None change behaviour; the pre-fix code state was already
the actual contract — the docs just lagged it.

1. EDIT_REQUIRES_PRIOR_READ enum comment now lists the three cases
   the code actually returns it for (never-read, partial / ranged /
   non-cacheable read, structural dead end — non-text payload or
   special file). The previous one-liner described only the first
   case and would mislead future maintainers.

2. The Final pre-write freshness check blocks in EditTool.execute
   and WriteFileTool.execute now spell out that they DO NOT
   eliminate the stat → writeTextFile race. The window narrows
   from the previously-unbounded post-read-to-write gap down to
   two adjacent syscalls, but a concurrent writer landing in
   that pair can still be clobbered. Closing the residual would
   require an atomic write (write-to-temp + rename) or a
   content-hash post-write recheck — both deferred. Operators who
   need strict protection set `fileReadCacheDisabled: true` and
   rely on application-level locking.

3. PR description Risk section gains a "Known unmitigated: stat →
   write race window" subsection (English + Chinese mirror)
   matching the code comments.

* chore(core): minor follow-ups from review #4229917446

Three of the five MINOR items raised in the independent code review
on 2026-05-05 — the cheap, isolated ones. The other two (race-
simulating integration test, moving StructuredToolError out of
priorReadEnforcement.ts) are deferred as the reviewer suggested.

1. EditTool now has a symmetric `PRIOR_READ_VERIFICATION_FAILED`
   regression test (mocks fs.promises.stat to reject with EACCES,
   asserts the EditTool path produces the same fail-closed result
   that the existing WriteFile EACCES test pins). Five-line fix to
   close the asymmetry that, while harmless today (the helper is
   shared), would let a future Edit-side change to checkPriorRead
   slip through without test coverage.

2. ensureParentDirectoriesExist / mkdirSync now run AFTER the
   pre-write checkPriorRead in both EditTool.execute() and
   WriteFileTool.execute(). Doing it before would leak intermediate
   directories on the rejection path — a real (if minor) FS litter
   the previous order created on every rejected new-file write.

3. EDIT_REQUIRES_PRIOR_READ enum docstring gains a one-line note
   for operators routing alerts on this code: a single
   `edit_requires_prior_read` signal can mean any of the three
   cases (no read / partial read / structural dead-end), and if
   per-cause monitoring becomes important the enum can be split
   in a follow-up. The originating tool name and the message text
   already disambiguate at runtime.

* fix(core): close 2 correctness gaps from maintainer review #4232751470

Both tracked back to the cache's "track most recent read shape"
model diverging from prior-read enforcement's "model has seen
these bytes" model.

1. SVG (and similar string-content fallbacks) recorded as
   non-cacheable, blocking subsequent Edit / WriteFile.

   `read-file.ts` derives `cacheable` from
   `originalLineCount !== undefined && !isTruncated`. The SVG
   branch in `fileUtils.ts` returned content without
   `originalLineCount`, so `cacheable` collapsed to false and a
   follow-up Edit hit the dead-end "non-text payload — use shell"
   rejection — telling the model to use shell to mutate a file it
   had just successfully read as text. This was a real regression
   vs pre-PR behaviour where SVG-as-text editing worked.

   Fix: SVG-as-text branch now sets `originalLineCount` (split
   on '\n') and `isTruncated: false`, so ReadFile records it as
   a full cacheable read. The binary-fallback string and
   over-1MB SVG branches are deliberately left non-cacheable —
   they return placeholder strings ("Cannot display content of
   ...") rather than file content, so blocking edits there is
   correct. New regression test in `read-file.test.ts`:
   `records SVG-as-text reads with cacheable=true so a follow-up
   Edit passes enforcement`.

2. recordRead unconditionally overwriting lastReadWasFull /
   lastReadCacheable, revoking prior write-author or full-read
   rights.

   The `WriteFile(create) → ReadFile(offset/limit) → Edit`
   sequence rejected the Edit because the partial read clobbered
   the `lastReadWasFull = true` that `recordWrite` had stamped
   at create time. Same shape applies to a full text read
   followed by a partial one of the same inode.

   Fix: `recordRead` is now sticky-on-true for the read flags —
   `if (opts.full) entry.lastReadWasFull = true;` and the
   matching guard for `cacheable`. Prior `true` survives a later
   partial / non-cacheable read. The fast-path `file_unchanged`
   check still gates on the incoming request's own `isFullRead`
   in `read-file.ts`, so a partial read still does not get a
   placeholder it shouldn't. Updated the existing
   "overwrites earlier lastReadWasFull" test to assert the new
   sticky semantics, and added a `lastReadCacheable` symmetric
   test plus a `Write → partial-Read → Edit` end-to-end test in
   `edit.test.ts`.

Reported by tanzhenxin via independent maintainer review on
2026-05-06.

* fix(core): close 3 correctness gaps from re-review #4233904930

All three are tightenings of the prior `de8ddf530` round.

1. **Sticky-on-true narrowed to "no fingerprint drift"**.
   `fileReadCache.recordRead` previously kept `lastReadWasFull` /
   `lastReadCacheable` true across drifted recordings, which
   re-opened a `Read full @x → external write @y → Read partial
   @y → Edit` hole: the partial recordRead silently advanced the
   entry's mtime+size to Y while preserving the sticky `full=true`
   from X, so a follow-up Edit ran against bytes the model only
   saw the first 10 lines of. Now the sticky branch only fires
   when `(mtimeMs, sizeBytes)` matches the existing entry; on
   drift, both flags reset to exactly what this read produced.
   New regression test in `fileReadCache.test.ts` reproduces the
   reviewer's reported sequence.

2. **Subagent FileReadCache isolation now covers the
   inherits-model + same-approval-mode common case**. The
   own-property machinery from QwenLM#3717 only triggers when an
   `Object.create(parent)` actually fires; both
   `agent.ts:990-993` (same-approval-mode) and
   `subagent-manager.ts:699-701` (inherits-model) had paths that
   returned the parent Config directly, so the subagent's
   `getFileReadCache()` resolved to the parent's instance — a
   parent Read could satisfy the subagent's Edit on a path the
   subagent's transcript never contained. Both sites now build
   a thin `Object.create(base)` override unconditionally; no
   method changes for the inherits / same-mode cases, but a
   distinct instance triggers the lazy-init in
   `Config.getFileReadCache()` so the subagent gets an isolated
   cache.

3. **Cache records the read pipeline's internal stat instead of
   a post-read re-stat**. `processSingleFileContent` now
   surfaces its internal stat via `result.stats`, and read-file
   uses that for `recordRead` instead of running its own stat
   after the read returns. Pre-fix, an external write between
   the pipeline call and the post-read stat let the cache record
   fingerprint Y for content the model received at X — a
   subsequent Edit would pass enforcement against bytes the
   model never legitimately saw. The internal-stat-to-read
   window is still a few microseconds wide; that residue is the
   same content-hash territory acknowledged in the Risk section.

Reported by tanzhenxin via re-review on PR QwenLM#3774.

* docs(core): clarify partial subagent isolation per review #4234090906

tanzhenxin's third review correctly observed that the
`Object.create(parent)` wrappers in `agent.ts:createApprovalModeOverride`
and `subagent-manager.ts:maybeOverrideContentGenerator` only isolate
the FileReadCache for code that consults `Config.getFileReadCache()`
directly. Bound `EditTool` / `WriteFileTool` instances were registered
against the parent's tool registry at initialise time, so tool
invocations still resolve `this.config` to the parent and reach the
parent's cache. `InProcessBackend.createPerAgentConfig` already does
the right thing (`override.createToolRegistry()` +
`copyDiscoveredToolsFrom(base.getToolRegistry())`); bringing that to
these two spawn sites is the real fix.

Reviewer's verdict was COMMENT, not REQUEST_CHANGES — the gap
pre-dates this PR (it's a property of QwenLM#3717's per-Config own-property
machinery) and pre-PR there was no enforcement on subagent mutations
at all, so the PR is strictly an improvement on every spawn path.
Documented the partial guarantee explicitly:

- Inline comments on both spawn sites note the bound-tool caveat
  and point at `InProcessBackend.createPerAgentConfig` as the model
  for the follow-up.
- PR description's subagent paragraph (English + Chinese mirror) now
  splits into "fully isolated" (`InProcessBackend.createPerAgentConfig`)
  and "partial isolation" (the two sites in this PR) so readers don't
  walk away with the wrong contract.

Filing the registry-rebuild work as a follow-up; not in this PR.
BingqingLyu pushed a commit that referenced this pull request May 7, 2026
…wenLM#3831 PR-1 of 3) (QwenLM#3842)

* feat(core): add signal.reason convention for ShellExecutionService.execute()

Foundation for QwenLM#3831 Phase D (b) — Ctrl+B promote of a running foreground
shell to background. Defines a discriminated `ShellAbortReason` union that
the AbortSignal carries; default behavior (no reason / `{ kind: 'cancel' }`)
keeps the existing tree-kill on abort. `{ kind: 'background' }` is a takeover
signal — execute() skips the kill, drops the child from its active set (so
cleanup() won't kill it later), flushes a snapshot of captured output, and
resolves the result Promise immediately with `promoted: true` so the
awaiting caller unblocks.

Pure plumbing: no caller sets the reason yet, so this is a zero-behavior
change for existing call sites. The `promoted?: boolean` field is optional
on ShellExecutionResult so existing consumers compile against the new shape
without source changes.

Tests pin both branches in both childProcessFallback and executeWithPty:
default abort still SIGTERM-tree-kills; `{ kind: 'cancel' }` is identical to
default (pin against accidental routing through the background branch);
`{ kind: 'background' }` skips the kill, snapshot output is preserved,
mockProcessKill / mockPtyProcess.kill are NOT called.

Part of QwenLM#3831 (Phase D part b — Ctrl+B promote running shell to background).
PR-1 of 3.

* fix(core): detach service listeners on background-promote (resolve review)

Addresses 4 Critical + 2 Suggestion findings on PR-1 of QwenLM#3831:

- **childProcess listener detach** (review line 555 + 573): Anonymous arrow
  listeners on stdout/stderr/error/exit could not be off()'d. After
  background-promote, post-promote bytes would re-enter handleOutput, which
  then calls decoder.decode() on a now-finalized text decoder (cleanup()
  already called .decode() without stream:true) → TypeError crash. Even
  without the crash, old onOutputEvent would fire for new data → ownership
  contract violation + duplication. Fix: extract named handler refs
  (stdoutHandler / stderrHandler / errorHandler / exitHandler) and call
  off() on all four in the background-promote branch via a
  detachServiceListeners() helper.

- **PTY listener detach** (review line 967 + 990): node-pty's onData / onExit
  return IDisposable handles; the abort handler now captures
  dataDisposable / exitDisposable and calls .dispose() in the
  background-promote branch. ptyProcess.on('error') is EventEmitter-style
  (not IDisposable) — extract a named ptyErrorHandler ref and off() it.
  Without these, post-promote PTY error throws → Node.js crash; post-promote
  data continues writing to headlessTerminal and calling old onOutputEvent
  → ownership violation.

- **PTY in-flight chain item ownership** (related to review line 990):
  processingChain may have already-enqueued callbacks past the early
  listenersDetached check. Refactored from "early-return short-circuit" to
  "guard each onOutputEvent emit individually" so in-flight writes still
  LAND in headlessTerminal (snapshot reflects them) but no events leak to
  the foreground onOutputEvent. Also clear renderTimeout in the abort
  handler so a pending throttled render doesn't fire post-promote.

- **PTY snapshot freshness** (review line 972, suggestion): The original
  abort handler called serializeTerminalToText immediately. Now we
  await Promise.race([processingChain drain, SIGKILL_TIMEOUT_MS]) first
  (mirrors the onExit finalize pattern at ~line 970) so in-flight
  headlessTerminal.write callbacks land before serialization. Skipped
  render(true) intentionally because it would emit final onOutputEvent
  data (renderFn calls onOutputEvent), violating the "no emit post-promote"
  invariant — added a comment explaining why direct serialize is correct.

- **Handoff-boundary tests** (review line 1257, suggestion): Added 4 new
  tests pinning the ownership contract — 2 for child_process (post-promote
  stdout/stderr does NOT route to onOutputEvent; child exit does NOT
  re-resolve result), 2 for PTY (data/exit disposables ARE called; result
  shape stays promoted: true even if post-promote events fire).

Also: test setup now stubs mockPtyProcess.onData / .onExit to return
{ dispose: vi.fn() } so the background-promote path's dispose() calls
don't crash on undefined (the stub's mock.results[0].value is then
inspected by the new handoff tests).

58 / 58 tests pass (50 baseline + 4 first-pass + 4 handoff). Total +235 / -35
on top of the prior commit.

* fix(core): defensive hardening for ShellExecutionService background-promote (resolve 2nd review pass)

Addresses 6 follow-up [Suggestion] threads on PR-1 of QwenLM#3831 — all
substantive code-quality issues raised by the second-pass review of
the dispose-based detach commit (8e8e18c):

- **Exhaustive switch on `ShellAbortReason.kind`** (both abort handlers).
  Earlier `if (reason?.kind === 'background')` form silently fell
  through to kill for any unrecognized variant — a future
  `{ kind: 'suspend' }` would have killed the process with zero
  compile-time signal. Switched to `switch (kind)` with a `never`-typed
  default that runs `debugLogger.warn` and falls back to the safest
  behavior (cancel/kill). Each branch is now extracted into a named
  helper (`performBackgroundPromote` / `performCancelKill`) so the
  switch body stays a single screenful.

- **Each `dispose()` wrapped in its own try/catch** (PTY). node-pty's
  `IDisposable` contract doesn't guarantee no-throw. Without per-dispose
  try/catch a single throwing dispose() would skip subsequent cleanup
  (the other dispose, off('error'), activePtys.delete, drain, resolve)
  and the caller would hang forever on `await result`. Each call now
  logs via debugLogger.warn on failure but continues.

- **`.catch(() => undefined)` on the processingChain side of the drain
  race** (PTY). `Promise.race([processingChain.then(drain).then(drain),
  timeout])` would propagate a chain rejection out of the race; since
  `addEventListener` doesn't await our handler, the rejection became
  unhandled and `resolve()` was never called → caller hung. Now the
  rejection is swallowed; the timeout side still terminates the race
  on time.

- **Drain-timeout truncation now emits a diagnostic warning** (PTY).
  Previously the 200ms drain timeout could fire, the snapshot would be
  taken with the buffer in mid-write state, and the result.output
  would be silently truncated. Race result is now observed via a
  symbol sentinel; when the timeout side wins, debugLogger.warn fires
  pointing the user at rawOutput as the un-truncated fallback.

- **Snapshot serialize failure logs instead of swallowing silently**
  (PTY). Empty `catch {}` made result.output indistinguishable from
  "command produced no output" if serializeTerminalToText threw. Now
  `debugLogger.warn` with the error message leaves a trail for support
  bundles.

- **Dedicated `PROMOTE_DRAIN_TIMEOUT_MS` constant** separated from
  `SIGKILL_TIMEOUT_MS`. Both are 200ms today, but they have unrelated
  reasons-to-change (kill escalation timing vs. promote drain
  ceiling) — sharing the constant means tuning one would silently
  change the other.

Also adds a module-level `debugLogger = createDebugLogger('SHELL_EXECUTION')`
since the service had no logging surface before this commit.

58 / 58 tests pass; tsc clean; ESLint clean. No new tests added: the new
behaviors (timeout sentinel firing, dispose throw, exhaustive switch
default) are defensive log-only paths; existing handoff tests already
cover the happy path. Adding mock-throw tests is reasonable
follow-up but not blocking.

* fix(core): real bug — ptyProcess.off → removeListener; defensive abort-reason read

Resolves the third review pass on PR-1 of QwenLM#3831 — 1 real bug + 2
defensive hardenings:

- **Real bug: `ptyProcess.off('error', ...)` throws TypeError at
  runtime** (line ~1074). `@lydell/node-pty`'s `IPty` interface
  exposes the legacy Node EventEmitter `removeListener`, not the
  modern `off` alias. Previous form threw, the surrounding try/catch
  swallowed it (post-prior-pass dispose hardening), but the old
  `ptyErrorHandler` stayed registered — so a post-promote PTY error
  would still hit our foreground handler and `throw err`, breaking
  the handoff contract that PR-1's whole listener-detach work is
  supposed to enforce. Switched to `removeListener`. The catch +
  warn stays as defense-in-depth; the message wording is updated.

- **Prototype-pollution-safe `kind` read** (extracted to module-level
  helper `getShellAbortReasonKind`). The previous `reason?.kind`
  walked the prototype chain — a polluted
  `Object.prototype.kind = 'background'` would silently route
  `abortController.abort({})` (any plain object reason) into the
  promote branch and skip the kill. Lifecycle/safety branch deserves
  the extra check. Helper now: rejects non-object reasons; reads
  `kind` only as an OWN property (`hasOwnProperty`); whitelists
  against `'background' | 'cancel'`; defaults to `'cancel'` (the
  safe historical behavior) for everything else. Both abort handlers
  (childProcess + PTY) now share this helper.

- **`streamStdout: true` + background-promote = silent empty
  snapshot** (childProcess `performBackgroundPromote`). The promote
  snapshot reads from the `stdout` / `stderr` string accumulators;
  but in `streamStdout` mode `handleOutput` forwards bytes through
  `onOutputEvent` and skips the accumulators entirely. Today PR-1's
  only call site (foreground shell.ts) uses `streamStdout: false`,
  so the combination is unreachable — but if a future caller pairs
  the two, `result.output` would be empty with no diagnostic. Added a
  `debugLogger.warn` when the combination occurs, pointing the caller
  at `rawOutput` as the fallback. Cheaper than building a parallel
  accumulator just for this latent case.

58 / 58 tests pass; tsc clean; ESLint clean.

* fix(core): liveness check + throw-safe abort-reason read + encoding-aware PTY snapshot (resolve 4th review pass)

Resolves 6 threads on PR-1 of QwenLM#3831 — 1 Critical + 1 real bug + 2
quality + 2 test-coverage:

- **[Critical] `getShellAbortReasonKind` throw-safe property read.**
  Previous form read `reason.kind` after only checking that `kind` is
  an own property. An own accessor that throws (or a Proxy with a
  trapping getter) would throw before the helper reached either the
  cancel kill path or the background promote path. Abort handlers are
  dispatched async and not awaited by AbortSignal, so a leaked throw
  here would have left the shell process alive instead of being killed
  on cancel — quietly. Wrapped the property read in try/catch with a
  fall-back to the safe 'cancel' kill behavior.

- **Real bug: child_process post-exit race in background-promote**
  (`performBackgroundPromote`). The child may have already exited but
  the 'exit' event hasn't reached our handler yet (Node delivers
  events on the next microtask). Promoting in that window would
  detach our exit listener and report `promoted: true` for a process
  that's already dead — the caller would hold an inert pid expecting
  to take over. Now we read `child.exitCode` / `child.signalCode`
  before detaching: if either is non-null, fall through and let the
  pending exit handler resolve normally with the real exit info.
  Mirrored mock setup so `exitCode` / `signalCode` default to `null`
  (matching real ChildProcess) instead of `undefined`.

- **PTY snapshot: re-decode + replay (mirror exit-path encoding).**
  The promoted snapshot was serializing `headlessTerminal` directly,
  which was fed by a streaming decoder initialized from the
  first-chunk encoding heuristic. When early output is ASCII-only but
  later output is in a different encoding (GBK / Shift-JIS / etc.),
  this produces mojibake — and the normal exit path doesn't, because
  it re-decodes `finalBuffer` with `getCachedEncodingForBuffer` and
  replays through a fresh terminal. Now mirrors that logic so
  `result.output` shape matches across the two paths. Direct-serialize
  remains as a last-ditch fallback if replay throws.

- **Switch `default` no longer emits a runtime warn.** Reviewer noted
  the helper's whitelist made the `default: { _exhaustive: never }`
  branch unreachable at runtime — the `debugLogger.warn` in it could
  never fire. Kept the `_: never = kind` type assertion (so a future
  ShellAbortReason variant forces a TS error here, directing the
  developer to extend BOTH the helper's whitelist AND add a `case`),
  removed the unreachable warn. Added a comment that the assertion is
  the static-only safety net the union expansion would trigger.

- **Direct unit tests for `getShellAbortReasonKind`** (8 cases). The
  helper's prototype-pollution defense is the main reason it exists;
  if `hasOwnProperty` is accidentally removed the regression would
  silently send `abortController.abort({})` (any plain reason) into
  the promote path. Exported the helper and added direct tests for:
  null / undefined, non-object, empty object (no own kind), prototype-
  only kind (pollution), unknown kind value, throwing accessor, Proxy
  trap, and the two happy paths.

- **`removeListener` regression guard.** The fix to call
  `ptyProcess.removeListener('error', ...)` instead of `.off(...)`
  matters because `@lydell/node-pty`'s IPty interface only exposes
  `removeListener` — `.off()` throws TypeError on a real PTY but the
  EventEmitter mock tolerates both. Added a test that spies on both
  methods and asserts the production code uses `removeListener` for
  the 'error' event, so a future swap back to `.off()` regresses
  loudly under the mock instead of silently.

68 / 68 tests pass (58 baseline + 9 helper boundary + 1 removeListener
guard + 1 post-exit race); tsc clean; ESLint clean.

* fix(core): PTY background-promote post-exit race guard (resolve 5th review pass)

Mirrors the child_process post-exit race fix from 4cc558b into the
PTY path — addresses 1 [Critical] thread on PR-1 of QwenLM#3831:

The PTY may have already exited but our `exitDisposable` (onExit
callback) hasn't run yet — node-pty delivers the exit event
asynchronously after the PTY's native SIGCHLD, so there's a window
between "PTY actually dead" and "service onExit fires". Promoting in
that window detaches our exit listener and reports `promoted: true`
for a dead PTY, losing the real exit status; the caller would hold an
inert pid expecting to take over.

The IPty interface doesn't expose an `exitCode` field we can read
directly (unlike `child.exitCode` / `child.signalCode` for
child_process), so use `process.kill(pid, 0)` as a best-effort
liveness check via the existing `ShellExecutionService.isPtyActive`
helper. If kill(pid, 0) throws ESRCH, the pid is gone — log at debug
level and fall through, letting the pending onExit callback resolve
normally with the real exit info.

Also adds a unit test mirroring the child_process race test: mocks
`process.kill(pid, 0)` to throw ESRCH on the liveness probe, asserts
the result has no `promoted: true` and reports the real exitCode.

69 / 69 tests pass; tsc clean; ESLint clean.

* docs(core): correct getShellAbortReasonKind boundary-test count in JSDoc

Doc said 'all six edge cases' but the test suite has 8 cases (added
Proxy-trap and undefined later). Off-by-2 cosmetic only — no behavior
change. Caught during a multi-round self-audit of PR-1 of QwenLM#3831.

Audit summary: 7 rounds (correctness / reverse / consistency / coverage
/ build / exception paths / style) found one false-positive (a sync-
abort registration-order race I initially thought existed). Verified
that Node's WHATWG AbortSignal does NOT auto-fire 'abort' listeners
on already-aborted signals, so the race window cannot open. No code
change needed for that scenario; this commit is just the JSDoc fix.

69 / 69 tests still pass; tsc + ESLint clean.

* docs(core): document the helper / union / switch sync invariant explicitly

Multi-round self-audit found that `getShellAbortReasonKind`'s value
whitelist has no compile-time tie to the `ShellAbortReason` union: when
the union grows, TypeScript's `_exhaustive: never` in each switch
forces #3 (the case arm) to be added, but the helper's whitelist
(#2) silently keeps degrading the new variant to 'cancel', and the
new case arm is never reached at runtime.

Reviewer #4 raised this on the second pass; the original commit chose
to accept it (option B in that thread) but didn't leave a strong
in-code signal for future contributors. Added an INVARIANT block
inside the helper enumerating the three sites that must be kept in
sync, so the next person extending `ShellAbortReason` sees the
coupling at the place where they're most likely to forget it.

No behavior change — comment-only. 69 / 69 tests still pass; tsc +
ESLint clean.

Audit summary (this round + prior round): 18 angles total over two
sweeps and one reverse-attack pass. Found:
  - 0 real bugs
  - 1 false-positive race (sync-abort registration order — Node WHATWG
    AbortSignal does NOT auto-fire on already-aborted signals;
    investigated, reverted)
  - 1 cosmetic doc fix (boundary-test count off-by-2)
  - 1 cosmetic INVARIANT block (this commit)

Areas reviewed without finding new issues: caller-side
ShellExecutionResult shape compatibility (optional `promoted?` field,
existing callers spread-untouched); `exited` flag lifecycle
(monotonic, cleanup() idempotent); processingChain in-flight
ownership (listenersDetached guards every onOutputEvent emit
including the renderFn-rendered case via the same flag); race
between exit event and abort handler (both microtasks, FIFO ordering
gives correct outcome either way); Node version dependence
(`AbortSignal.reason` is Node 17.2+, engines: >=20 covers it);
test isolation (mockImplementationOnce + module-level mockProcessKill
clears each beforeEach); `process.kill(pid, 0)` Windows liveness
reliability (best-effort, acceptable for PR-1 plumbing); PID reuse
race on the PTY liveness check (theoretically possible, microsecond
window, unavoidable at the OS level — rejected in spec discussion);
PR-2/PR-3 contract surface (caller MUST attach listeners before
abort — documented; any future caller violating this is its own bug).

* test(core): align mockChildProcess.exitCode/signalCode in second beforeEach

The 'execution method selection' describe block has its own
beforeEach (separate from 'child_process fallback') that builds
mockChildProcess but does not set `exitCode` / `signalCode = null`.
Real Node `ChildProcess.exitCode` / `signalCode` are `null` while the
process is alive — and production now reads these in the
background-promote race guard. The current tests in this block don't
exercise the promote path, so they pass regardless, but any future
promote-related test landing here would silently trip the guard
(`undefined !== null` is true) and fall through to the normal-exit
branch instead of promoting.

Mirror the `child_process fallback` block's mock setup so the two
beforeEach hooks produce equivalent ChildProcess shapes, eliminating
a quiet foot-gun for future contributors.

Comment-only / test-fixture change. 69 / 69 tests still pass; tsc clean.
Found during a deeper third-round self-audit of PR-1 of QwenLM#3831.
BingqingLyu pushed a commit that referenced this pull request May 11, 2026
…Change emit (QwenLM#3919)

* fix(cli,core): isPending gate on subagent scrollback summary + post-delete statusChange emit

Two follow-ups from PR QwenLM#3909 review.

1. **Re-introduce `isPending` gate on `SubagentExecutionRenderer`'s
   scrollback summary** (Copilot finding on PRRT_kwDOPB-92c6AUQHn).
   The verbose inline frame retirement collapsed
   `SubagentExecutionRenderer` to "render the summary whenever a
   subagent reaches a terminal status" — but with `isPending`
   removed in QwenLM#3909, that fired in BOTH live (pendingHistoryItems)
   AND committed (Static) phases. Live-phase rendering duplicated
   the row LiveAgentPanel already paints below the composer until
   the parent turn committed.

   Add `isPending` back to `ToolMessageProps` purely as a gate for
   this one render path: the summary fires only when `!isPending`
   (committed). `ToolGroupMessage` forwards the flag (it kept the
   prop on its own interface for upstream compat the whole time).
   Test gap closed by the new `live (isPending) terminal subagent
   → no scrollback summary (panel owns the row)` case.

2. **Emit `statusChange` AFTER delete in `unregisterForeground`**
   (Copilot finding on PRRT_kwDOPB-92c6AUQGc + the panel-only
   reconciliation it spawned). The shared snapshot in
   `useBackgroundTaskView` only refreshes on `statusChange`, and
   `unregisterForeground` previously fired exactly once — BEFORE
   delete — so the snapshot froze with the agent as "running"
   while `registry.get()` returned undefined. Result:
   `BackgroundTasksDialog` list mode showed a ghost "running" row
   with cancel hints whose `x` was a no-op, contradicting what the
   panel already showed (synthesized neutral terminal).

   Fire `statusChange` a second time AFTER `agents.delete()` so
   snapshot consumers see the registry-less state and stop
   surfacing the agent. The first emit still mirrors
   complete/fail/cancel/finalize ordering (callbacks that re-read
   `registry.get` see the entry); the second emit is the new
   contract for snapshot-based views. React batches the two
   resulting setState calls into one re-render so consumers
   re-render exactly once.

   Updated the existing "emits status change before removing the
   entry" test to capture both emits and explicitly assert that
   the second observes the registry-less state. Added a sibling
   test covering the post-delete `getAll()` count.

Coverage: 190 passing tests across core + cli (background-view +
ToolMessage + ToolGroupMessage + useBackgroundTaskView).

* fix(cli,core): compact-mode terminal subagent expansion + statusChange context flag

Five review findings on PR QwenLM#3919:

1. **Compact mode bypassed the scrollback summary** (gpt-5.5 via
   /qreview, ToolGroupMessage:324). `ToolGroupMessage` returns
   `CompactToolGroupDisplay` before the ToolMessage path when
   `compactMode === true`, so the new `isPending` gate on
   `SubagentExecutionRenderer` only protected the expanded path —
   committed terminal subagents in compact mode never reached
   `SubagentScrollbackSummary` and the LiveAgentPanel → committed-
   summary handoff broke for users who turned compact mode on.

   Force-expand the group when `!isPending` AND any tool call has a
   terminal `task_execution` resultDisplay. Stay compact while the
   parent turn is still live (`isPending`) — the panel below the
   composer owns that surface and an inline summary would
   duplicate it. Coverage: 4 new ToolGroupMessage cases (compact +
   completed-committed expands; compact + running-live stays compact;
   compact + completed-live stays compact; compact + failed-committed
   expands).

2. **Snapshot-coupled comment in `packages/core`** (Copilot,
   background-tasks.ts:292). The comment named CLI/UI consumers
   (`useBackgroundTaskView`, `BackgroundTasksDialog`) and asserted
   React batching guarantees from a core file. Reword to
   "snapshot-style consumers that re-pull `getAll()` from inside
   the callback" and drop the framework-specific batching claim.

3. **Two-phase emit needed an explicit signal** (Copilot,
   background-tasks.ts:283). Emitting `statusChange` twice without
   distinguishing the phases forced consumers to either do
   duplicate work or risk persisting a stale `entry` from the
   second callback. Add an optional second arg
   `context?: { removed?: boolean }` to
   `BackgroundStatusChangeCallback`; the post-delete emit passes
   `{ removed: true }` so consumers can disambiguate without
   re-querying the registry. Backwards compatible — existing
   callbacks ignore the new arg. Tests updated to assert both
   `mock.calls[0][1] === undefined` and
   `mock.calls[1][1] === { removed: true }`.

4. **`isPending` doc clarified** (Copilot, ToolMessage.tsx:507).
   Made the default semantics explicit: omitted/undefined is
   treated as committed (not pending); live-area renderers MUST
   pass `true` explicitly to suppress the scrollback summary.

5. (4 of the threads were duplicate Copilot fires of #2 + #3.)

Coverage: 219 test files / 3369 passing across cli/ui + core/agents.

* docs(cli): update ToolGroupMessageProps.isPending JSDoc

The previous prop comment claimed `isPending` was "not consumed by the
group body" — true at the time, but the body now reads it for two real
purposes (compact-mode gating + forwarding to ToolMessage). Update
the doc so future callers / tests don't treat it as legacy.

Addresses Copilot finding on PRRT_kwDOPB-92c6AYE0V.

* fix(cli): hide live-phase subagent tool entries — LiveAgentPanel owns the row

User report: with compact mode OFF, a running subagent shows up
twice — once as the parent tool group's `task` row (status icon +
name + description), once as the LiveAgentPanel row beneath the
composer. Same agent, two surfaces, redundant.

Filter `task_execution` tool entries out of the expanded
`ToolGroupMessage` while `isPending=true` so the panel is the
single source of truth for in-flight subagents. The entry returns
once the parent turn commits (`isPending=false`), letting
`SubagentScrollbackSummary` land inside the parent's tool group
as a persistent audit trail.

Exception: subagents with a pending approval still render, because
the focus-routed banner / queued marker is the only inline surface
that lets users answer the prompt without opening the dialog.

If a group is purely panel-owned (e.g. a single Task call with no
sibling tools), the entire `ToolGroupMessage` returns `null` so
an empty bordered container doesn't float above the panel.

Coverage: +4 ToolGroupMessage cases — running entry hidden in
live phase / mixed group keeps siblings / pending-approval entry
still renders / committed entry comes back for the audit trail.

* refactor(cli): tighten subagent-tool helper naming + ANSI-safe scrollback summary

Self-audit + independent review found 5 cleanup items on the live-phase
hide path; all addressed in one commit since none are behavioral
changes:

1. **Move `allEntriesPanelOwned` short-circuit BEFORE `showCompact`**
   so a pure-subagent group in compact mode is also hidden during the
   live phase (previously CompactToolGroupDisplay rendered a single
   summary line above the panel — a mild duplicate on top of what the
   non-compact path already fixed).
2. **Rename `isLiveSubagentTool` → `isSubagentToolEntry`.** The helper
   identifies a tool's resultDisplay shape; it doesn't check live-state.
   The previous name conflated "predicate" with "use case" and read as
   if it returned true only during the live phase.
3. **DRY up `hasCommittedTerminalSubagent`** to use `isSubagentToolEntry`
   instead of inlining its own type-narrowing.
4. **ANSI-escape `subagentName` / `taskDescription` / `terminateReason`**
   in `SubagentScrollbackSummary`. Same threat model as the panel rows
   and HistoryItemDisplay — these strings come from subagent config
   (user-authored) and LLM output and could carry terminal control
   sequences. The stats fields (tool count / duration / tokens) flow
   through trusted formatters and don't need escaping.
5. **Doc comments updated** to reflect the four real responsibilities
   of `isPending` on `ToolGroupMessageProps` (hide pure groups,
   force-expand committed compact, per-tool filter, forward to
   ToolMessage), to clarify that the keyboard-focused subagent id can
   point at a hidden tool harmlessly (the iterator returns `null`
   before the focus prop is computed), and to drop the redundant
   "EXCEPT" clause on the per-tool filter in favor of a single
   sentence.

Coverage unchanged: 251 passing tests across messages /
background-view / core/agents; broader 3374-test sweep clean; TS
clean on both cli and core packages.

* fix(cli,core): address 3 critical review findings + ANSI/doc cleanups

Three real bugs flagged by gpt-5.5 via /qreview, plus 4 doc /
sanitization nits from Copilot. All 7 threads close together since
they share the same surfaces.

## Critical fixes

1. **Foreground subagents disappeared mid-parent-turn**
   (PRRT_kwDOPB-92c6AYvL9). Post-QwenLM#3921 swap-order, `unregisterForeground`
   drops the entry from the panel snapshot the moment the subagent
   finishes. The previous round's `!isPending` gate on
   `SubagentScrollbackSummary` then suppressed the inline summary
   too, leaving the user with nothing on screen for the run until
   the parent committed.

   - Drop the `!isPending` gate — `unregisterForeground` already
     removes the row from the panel, so the inline summary can fire
     in BOTH live and committed phases without duplicating it.
   - Tighten the `ToolGroupMessage` live-phase hide so it only
     filters `running` / `paused` / `background` task entries
     (`isPanelOwnedSubagentTool`), not terminal ones. Terminal
     entries pass through immediately so the summary lands.
   - The "panel-owned" predicate is now distinct from the broader
     "subagent tool entry" predicate (`isSubagentToolEntry`) and the
     "terminal subagent" predicate (`isTerminalSubagentTool`); each
     usage site picks the one it actually means.

2. **Compact mode dropped the scrollback summary**
   (PRRT_kwDOPB-92c6AYvLw). Force-expanding the group made the
   container go through the expanded path, but `ToolMessage`'s own
   compact-mode gate (`!compactMode || forceShowResult ? renderer
   : 'none'`) still suppressed the result block, so
   `SubagentScrollbackSummary` never rendered for compact-mode
   users. Pass `forceShowResult={true}` for terminal subagent tool
   entries so the result block is always rendered.

3. **`mergeCompactToolGroups.isForceExpandGroup` didn't know about
   terminal subagents** (PRRT_kwDOPB-92c6AYvMC). The committed-
   history preprocessor merged adjacent tool_groups before render,
   so a terminal `task_execution` group could be absorbed into a
   compact batch (its `tool_use_summary` label dropped), and the
   render-time force-expand check never got a chance to override.
   Mirror the `hasCommittedTerminalSubagent` predicate inside
   `isForceExpandGroup` so preprocessing and rendering agree.

## Doc / sanitization nits

- `BackgroundStatusChangeCallback` doc now lists every emitter
  (register / complete / fail / cancel / finalizeCancelled /
  finalizeCancellationIfPending / abandon / unregisterForeground /
  reset) and groups them by ordering camp (keeps-the-entry vs
  removes-the-entry — `reset` joins `unregisterForeground` in the
  delete-then-emit camp).
- ANSI-escape `data.subagentName` in the focus-holder banner and
  the queued marker (`SubagentExecutionRenderer`) — same threat
  model as the panel rows and `SubagentScrollbackSummary`.

## Coverage delta

- New ToolMessage case: live-phase terminal subagent now renders
  inline (replaces the prior "no scrollback summary" assertion that
  was the symptom of the AYvL9 bug).
- New ToolGroupMessage cases: terminal subagent in live phase
  renders inline; `forceShowResult=true` propagates for terminal
  subagent tools (mock now exposes the prop).
- New mergeCompactToolGroups parametrized cases: terminal subagent
  in any of completed / failed / cancelled stays its own batch.

280 tests pass across cli messages + utils + background-view +
core/agents. TS clean.

* fix(cli): drop `'paused'` arm from isPanelOwnedSubagentTool — not in AgentResultDisplay union

CI Lint failed with TS2367: the previous round's
`isPanelOwnedSubagentTool` checked for `status === 'paused'` but
`AgentResultDisplay.status` (the tool-result-side type) only carries
`'running' | 'completed' | 'failed' | 'cancelled' | 'background'`.
The `'paused'` status lives on the registry-side
`BackgroundTaskStatus` union and is only ever surfaced through
`LiveAgentPanel` directly, never through a `task_execution` payload.

Drop the dead arm and add a comment so a future "let's also check
paused here" doesn't get re-introduced.

* fix(cli): apply panel-ownership filter once before compact-mode decision

Mixed live groups (running subagent + sibling tool) leaked the
panel-owned subagent into `CompactToolGroupDisplay`'s count and
`getActiveTool` selection, because `showCompact` returned BEFORE the
inline `.map()` filter ran. Compact-mode users would see e.g.
`task × 2 Delegate task to subagent` even though LiveAgentPanel
already owned the subagent row below the composer.

Derive `inlineToolCalls` once via `useMemo` immediately after the
existing hook block and use it consistently for the compact summary,
sizing math, and the render map. The early-return for
"all-entries-panel-owned" collapses into `inlineToolCalls.length === 0`
(gated on `isPending` so the legacy empty-input committed-phase
snapshot is preserved). Remove the inner `.map()` filter — the
upstream derivation already excluded the same entries.

JSDoc updates:
- `ToolGroupMessageProps.isPending` now describes the real flow
  (build inlineToolCalls / force-expand / forward to ToolMessage for
  parity).
- `ToolMessageProps.isPending` is documented as forwarded-but-inert
  (`SubagentExecutionRenderer` doesn't gate on it; the live-phase
  filter and the unconditional terminal summary do the actual work).

Regression test: live mixed group in compact mode → sibling wins
active-tool, count collapses to 1, no `× 2` suffix, no subagent
description in the header.

Addresses Copilot review comments 3205262972 / 3205263020 (doc/code
mismatch) and gpt-5.5 critical 3205288299 (compact-mode leak).

* fix(cli): force-expand compact groups on terminal subagent in live phase too

Resolved comment 3203286936 codified the design intent that
`SubagentScrollbackSummary` "fires in BOTH live and committed phases"
to bridge `unregisterForeground`'s post-delete panel-snapshot drop
and the parent turn committing. Non-compact mode honored that
contract (terminal subagents render the summary inline whenever they
appear in `inlineToolCalls`), but compact mode still gated
`hasCommittedTerminalSubagent` on `!isPending`, so a foreground
subagent finishing mid-turn under compact mode produced NOTHING
inline until the parent committed — exactly the gap the bridge was
meant to close.

Drop the `!isPending` arm and rename `hasCommittedTerminalSubagent`
→ `hasTerminalSubagent`. The force-expand now applies to terminal
subagents in either phase; compact-mode users see the same outcome
line non-compact users already get. Mirrors
`SubagentExecutionRenderer`'s ungated terminal-summary path and
`mergeCompactToolGroups.isForceExpandGroup`'s no-isPending-gate
preprocessing rule.

Tests:
- Flip "compact mode: live group with completed subagent stays
  compact" → "force-expands so the summary bridges the panel-snapshot
  drop". Update rationale to reflect post-QwenLM#3921 reality (panel evicts
  terminal foreground rows immediately).
- Add "compact mode: live mixed group with terminal subagent +
  sibling force-expands and renders both" — covers the bridge in
  mixed groups.
- Update two stale `hasCommittedTerminalSubagent` cross-references
  in `mergeCompactToolGroups.{ts,test.ts}` comments.
BingqingLyu pushed a commit that referenced this pull request May 11, 2026
…nLM#3892)

* fix(core): close bound-tool gap on runForkedAgent's YOLO wrapper

Follow-up to QwenLM#3873 review (#3 of the three flagged adjacent
Config-wrapper sites). `runForkedAgent`'s AgentHeadless path used to
build its YOLO override via a local `Object.create(parent) +
getApprovalMode = YOLO` helper that did NOT rebuild the tool
registry, so:

1. The YOLO approval mode was silently ignored on the bound-tool
   path — parent's already-bound `EditTool` / `WriteFileTool` /
   `ReadFileTool` resolved `this.config.getApprovalMode()` back to
   the parent.
2. The fork's reads / mutations went through the parent's
   `FileReadCache` instead of a per-fork cache.
3. Memory-extraction and dream-agent paths stack the YOLO wrapper
   over a `getPermissionManager`-overriding scoped wrapper. Since
   the bound tools resolved to the parent, BOTH overrides — the
   YOLO approval mode AND the scoped permission manager — were
   bypassed.

The fix routes through the existing `createApprovalModeOverride`
helper, which:
- rebuilds the tool registry on the wrapper (so bound tools resolve
  `this.config` to the wrapper),
- copies discovered tools from the upstream registry,
- sets the `TOOL_REGISTRY_REBUILT` Symbol marker so any further
  downstream wrapper layer recognises the rebuild and skips
  redundant work.

The memory-extraction / dream-agent composition now resolves
correctly via prototype walk — the YOLO wrapper sits above the
scoped wrapper, so bound tools observe `getApprovalMode() = YOLO`
on the wrapper itself and `getPermissionManager() = scopedPm` one
prototype level up.

Adds a try/finally around the AgentHeadless run so the per-fork
ToolRegistry is stopped after execution — same shape as the spawn
finallys in `agent.ts` and `background-agent-resume.ts`. Without
this, every AgentTool / SkillTool the fork's model later
instantiates leaks its change-listener on shared SubagentManager /
SkillManager.

Adds `forkedAgent.agent.test.ts` covering: marker + YOLO + distinct
registry on the wrapper passed to AgentHeadless.create; bound
EditTool resolves to the wrapper; memory-scoped composition
preserves both YOLO and scopedPm; `stop()` fires after the
AgentHeadless body finishes. Uses `vi.spyOn(AgentHeadless, 'create')`
rather than module mocking so the real `ContextState` /
`AgentEventEmitter` keep working.

`npx vitest run packages/core/src` — 269 files / 6992 passed.

* test(core): cover stop() lifecycle on AgentHeadless.create + execute failure paths

Self-review feedback on QwenLM#3892: the stop lifecycle test only covered
the success path. A future refactor could move the stop() out of
the `finally` block and onto the success branch, reintroducing
listener leaks when create or execute rejects, while every existing
test still passes.

Two new tests pin the cleanup to the `finally`:

1. `stops the per-fork ToolRegistry even when AgentHeadless.create rejects`
   — make `AgentHeadless.create` return a rejected promise; assert
   the rejection propagates and the stop spy still fires once.
2. `stops the per-fork ToolRegistry even when headless.execute rejects`
   — return a headless object whose `execute` rejects; same shape.

Together with the success-path test these three cases cover every
exit edge of the AgentHeadless body.

`npx vitest run packages/core/src` — 269 files / 6994 passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant