fix(core): address @tanzhenxin's PR-1 review notes (post-merge follow-up to #3842)#3886
Conversation
…ata assertion Two non-blocking review notes from @tanzhenxin's PR-1 approval (#3842, post-merge follow-up): - **Note 2 (real bug)**: `getShellAbortReasonKind` had `Object.prototype.hasOwnProperty.call(reason, 'kind')` outside the try/catch. `hasOwnProperty.call` triggers the `[[GetOwnProperty]]` Proxy trap (`getOwnPropertyDescriptor` handler). A Proxy whose `getOwnPropertyDescriptor` throws — separate from a throwing `get` trap, which the prior commit already covered — would propagate past the helper, leaving the abort handler's switch on `kind` to throw through `addEventListener` (which doesn't await async listener return values), so the shell process would stay alive instead of being killed on cancel. Moved the descriptor probe inside the same try block as the value read. My own multi-round audit covered six attack vectors but missed this one — only `get` trap throws were considered. The reviewer caught it on first pass. - **Note 3 (test parity)**: the PTY post-promotion handoff test asserted `dispose` was called but never re-invoked the data callback to verify the foreground `onOutputEvent` actually stops firing — the child_process equivalent has that assertion. Mirrored it: emit data AFTER abort by re-invoking the captured `dataCallback` reference, and assert `onOutputEventMock.mock.calls.length` does NOT increase past the moment of promote. Exercises the production `listenersDetached` guard inside the chain callback, which the bare dispose-was-called check didn't. Note 1 from the same review (the `aborted: true + promoted: true` shape forcing PR-2 callers to check `promoted` before `aborted`) is deliberately NOT addressed here — it's a contract simplification that affects PR-2's branching, so it belongs in PR-2 along with the caller-side decision on whether to flip `aborted` for promoted results. Added a TODO upstream in #3831 (PR-2 design) to track. 70 / 70 tests pass (69 baseline + 1 new helper boundary for the throwing-getOwnPropertyDescriptor case). tsc + ESLint clean.
…-up) The PTY post-promotion handoff test added in the previous commit copied the child_process equivalent's pattern verbatim — sync \`expect(count).toBe(countAtPromote)\` immediately after dataCallback returns. That works for child_process because its \`handleOutput\` is fully synchronous (sniff → decoder → emit, all on the same call stack), so the count change happens BEFORE the assertion. PTY's \`handleOutput\` is async — \`processingChain.then(...)\` queues a microtask that does the sniff + write + render-then-emit work. The sync assertion captures both \`countAtPromote\` and the post-emit count BEFORE the chain microtask ever runs, so both reads return whatever happened before the assert (typically 0). The test would tautologically pass even if the production \`listenersDetached\` guard were removed — i.e., it didn't actually verify the guard. Restructured to: 1. Drive the PTY through \`simulateExecution\` so \`await handle.result\` forces all queued microtasks (including pre-promote chain items AND the abort handler's drain) to settle. 2. Capture \`eventCountAfterSettle\` once everything has stabilized. 3. Re-invoke the captured \`dataCallback\` with post-promote data, await two more macrotask boundaries to let the new chain item fully run. 4. Assert the count hasn't moved. If the production \`listenersDetached\` guard is removed, the post-promote chain item emits, count increases past \`eventCountAfterSettle\`, and this assertion fails. So the test actually exercises the guard now. Found in self-audit while reviewing my own follow-up commit. Caught because audit was paranoid about *whether the test verifies what it claims to verify*, not just whether it passes. 70 / 70 tests pass; tsc + ESLint clean.
The previous fix asserted post-promote count equals the \`eventCountAfterSettle\` baseline, but didn't pin the baseline itself. With the production \`listenersDetached\` guard intact, both halves (pre-promote chain and post-promote chain) suppress emit, so \`eventCountAfterSettle === 0 === post-count\` and the relative comparison is vacuously true. If a future refactor changed the production guard semantics so the pre-promote chain item DID emit (count becomes 1+ after settle), the relative-comparison test would still pass as long as post-promote also emitted the same number — that's a regression the test should catch. Adding \`expect(eventCountAfterSettle).toBe(0)\` makes the contract explicit: once \`listenersDetached\` is set during the abort handler's sync part, BOTH the in-flight chain item (pre-promote) and the future chain item (post-promote) skip emit. Found in another self-audit pass — even after fixing the tautological assertion, the test could still mask certain future regressions. 70 / 70 tests pass; tsc + ESLint clean.
The dataCallbackHolder pattern (capturing the onData callback inside simulateExecution then invoking it after via a closure-shared object) was unnecessary indirection — \`mockPtyProcess.onData.mock.calls[0][0]\` reads the same callback reference whether you read it inside or outside the simulation closure. Vi's mock.calls array is per-mock-instance and beforeEach re-creates mockPtyProcess + .onData freshly, so there's no stale-reference risk in the simpler form. No behavior change in what's tested. 70 / 70 pass.
| // (`getOwnPropertyDescriptor` handler), so a Proxy whose | ||
| // `getOwnPropertyDescriptor` throws — separate from a throwing | ||
| // `get` trap — would otherwise propagate past the helper. (Caught | ||
| // by @tanzhenxin in the PR-1 review; my own audit only covered |
There was a problem hiding this comment.
[Suggestion] This explanation is useful, but the parenthetical ties long-lived production code to ephemeral review-process context (@tanzhenxin, PR-1 review, my own audit). Future readers only need the durable technical reason: hasOwnProperty.call can trigger the Proxy getOwnPropertyDescriptor trap, so both the descriptor probe and value read must stay inside the try.
Consider keeping the Proxy-trap rationale and dropping the attribution/review-history sentence.
— gpt-5.5 via Qwen Code /review
| expect(getShellAbortReasonKind(proxyReason)).toBe('cancel'); | ||
| }); | ||
|
|
||
| it("returns 'cancel' when the `getOwnPropertyDescriptor` Proxy trap throws (regression for @tanzhenxin's PR-1 review note)", () => { |
There was a problem hiding this comment.
[Suggestion] The test covers the right regression, but the name points to reviewer/review-thread history instead of the behavior under test. Test names are long-lived documentation, so this will be easier to maintain if it describes the boundary directly.
For example, this could be shortened to something behavior-focused like:
it("returns 'cancel' when the getOwnPropertyDescriptor Proxy trap throws", () => {— gpt-5.5 via Qwen Code /review
…ndler
Two cosmetic cleanups found in another self-audit pass:
- **Helper comment**: removed the parenthetical attribution ("Caught
by @tanzhenxin in the PR-1 review; my own audit only covered \`get\`
trap throws."). The technical content of the comment — explaining
why both the descriptor probe and the value read live inside the
try — stands on its own. The reviewer credit lives in commit
history / PR description, where it belongs; an in-source @-mention
ages poorly (handles change, the relevant person may move on) and
doesn't help future readers reason about the code.
- **Test Proxy**: \`throwingDescriptorProxy\` declared a \`get()\` handler
that always returned \`undefined\`. The descriptor probe throws
before the helper ever reaches the value read, so the \`get\`
handler is unreachable — dropped it and added a one-line comment
explaining why no \`get\` handler is needed for this test. Mirror
test (\`throwingReason\` with throwing accessor + \`proxyReason\` with
throwing \`get\`) keeps the symmetric "throwing-`get`" coverage.
70 / 70 tests pass; tsc + ESLint clean.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Two narrow fixes addressing the PR-1 review notes. The Proxy-trap close is correct: wrapping hasOwnProperty.call and the kind read in the same try covers the getOwnPropertyDescriptor-throw path, and the new test deliberately omits a get handler so hasOwnProperty.call throws before kind is ever read — pinning the bug as the descriptor-probe leak rather than the already-tested get trap leak.
The PTY post-promotion test now actually exercises the listenersDetached guard rather than tautologically passing. I traced the abort-dispatch microtask ordering (sync abort → guard set inside performBackgroundPromote's pre-await prelude → pre-promote chain .then runs after) and confirmed the strict === 0 baseline meaningfully fails if the guard is removed. Two setImmediate cycles is the right drain primitive — render() runs sync when no render timer is pending.
Note 1 deferral to PR-2 is the right call; re-shaping aborted/promoted twice would just churn callers.
Verdict
APPROVE — both fixes do what they claim, no new bugs.
…-up to #3842) (#3886) * fix(core): wrap hasOwnProperty.call inside try + add post-abort PTY data assertion Two non-blocking review notes from @tanzhenxin's PR-1 approval (#3842, post-merge follow-up): - **Note 2 (real bug)**: `getShellAbortReasonKind` had `Object.prototype.hasOwnProperty.call(reason, 'kind')` outside the try/catch. `hasOwnProperty.call` triggers the `[[GetOwnProperty]]` Proxy trap (`getOwnPropertyDescriptor` handler). A Proxy whose `getOwnPropertyDescriptor` throws — separate from a throwing `get` trap, which the prior commit already covered — would propagate past the helper, leaving the abort handler's switch on `kind` to throw through `addEventListener` (which doesn't await async listener return values), so the shell process would stay alive instead of being killed on cancel. Moved the descriptor probe inside the same try block as the value read. My own multi-round audit covered six attack vectors but missed this one — only `get` trap throws were considered. The reviewer caught it on first pass. - **Note 3 (test parity)**: the PTY post-promotion handoff test asserted `dispose` was called but never re-invoked the data callback to verify the foreground `onOutputEvent` actually stops firing — the child_process equivalent has that assertion. Mirrored it: emit data AFTER abort by re-invoking the captured `dataCallback` reference, and assert `onOutputEventMock.mock.calls.length` does NOT increase past the moment of promote. Exercises the production `listenersDetached` guard inside the chain callback, which the bare dispose-was-called check didn't. Note 1 from the same review (the `aborted: true + promoted: true` shape forcing PR-2 callers to check `promoted` before `aborted`) is deliberately NOT addressed here — it's a contract simplification that affects PR-2's branching, so it belongs in PR-2 along with the caller-side decision on whether to flip `aborted` for promoted results. Added a TODO upstream in #3831 (PR-2 design) to track. 70 / 70 tests pass (69 baseline + 1 new helper boundary for the throwing-getOwnPropertyDescriptor case). tsc + ESLint clean. * test(core): fix tautological PTY post-promote assertion (audit follow-up) The PTY post-promotion handoff test added in the previous commit copied the child_process equivalent's pattern verbatim — sync \`expect(count).toBe(countAtPromote)\` immediately after dataCallback returns. That works for child_process because its \`handleOutput\` is fully synchronous (sniff → decoder → emit, all on the same call stack), so the count change happens BEFORE the assertion. PTY's \`handleOutput\` is async — \`processingChain.then(...)\` queues a microtask that does the sniff + write + render-then-emit work. The sync assertion captures both \`countAtPromote\` and the post-emit count BEFORE the chain microtask ever runs, so both reads return whatever happened before the assert (typically 0). The test would tautologically pass even if the production \`listenersDetached\` guard were removed — i.e., it didn't actually verify the guard. Restructured to: 1. Drive the PTY through \`simulateExecution\` so \`await handle.result\` forces all queued microtasks (including pre-promote chain items AND the abort handler's drain) to settle. 2. Capture \`eventCountAfterSettle\` once everything has stabilized. 3. Re-invoke the captured \`dataCallback\` with post-promote data, await two more macrotask boundaries to let the new chain item fully run. 4. Assert the count hasn't moved. If the production \`listenersDetached\` guard is removed, the post-promote chain item emits, count increases past \`eventCountAfterSettle\`, and this assertion fails. So the test actually exercises the guard now. Found in self-audit while reviewing my own follow-up commit. Caught because audit was paranoid about *whether the test verifies what it claims to verify*, not just whether it passes. 70 / 70 tests pass; tsc + ESLint clean. * test(core): pin eventCountAfterSettle === 0 in PTY post-promote test The previous fix asserted post-promote count equals the \`eventCountAfterSettle\` baseline, but didn't pin the baseline itself. With the production \`listenersDetached\` guard intact, both halves (pre-promote chain and post-promote chain) suppress emit, so \`eventCountAfterSettle === 0 === post-count\` and the relative comparison is vacuously true. If a future refactor changed the production guard semantics so the pre-promote chain item DID emit (count becomes 1+ after settle), the relative-comparison test would still pass as long as post-promote also emitted the same number — that's a regression the test should catch. Adding \`expect(eventCountAfterSettle).toBe(0)\` makes the contract explicit: once \`listenersDetached\` is set during the abort handler's sync part, BOTH the in-flight chain item (pre-promote) and the future chain item (post-promote) skip emit. Found in another self-audit pass — even after fixing the tautological assertion, the test could still mask certain future regressions. 70 / 70 tests pass; tsc + ESLint clean. * test(core): drop dataCallbackHolder pattern, read mock.calls directly The dataCallbackHolder pattern (capturing the onData callback inside simulateExecution then invoking it after via a closure-shared object) was unnecessary indirection — \`mockPtyProcess.onData.mock.calls[0][0]\` reads the same callback reference whether you read it inside or outside the simulation closure. Vi's mock.calls array is per-mock-instance and beforeEach re-creates mockPtyProcess + .onData freshly, so there's no stale-reference risk in the simpler form. No behavior change in what's tested. 70 / 70 pass. * chore(core): drop in-source @-mention attribution + dead Proxy.get handler Two cosmetic cleanups found in another self-audit pass: - **Helper comment**: removed the parenthetical attribution ("Caught by @tanzhenxin in the PR-1 review; my own audit only covered \`get\` trap throws."). The technical content of the comment — explaining why both the descriptor probe and the value read live inside the try — stands on its own. The reviewer credit lives in commit history / PR description, where it belongs; an in-source @-mention ages poorly (handles change, the relevant person may move on) and doesn't help future readers reason about the code. - **Test Proxy**: \`throwingDescriptorProxy\` declared a \`get()\` handler that always returned \`undefined\`. The descriptor probe throws before the helper ever reaches the value read, so the \`get\` handler is unreachable — dropped it and added a one-line comment explaining why no \`get\` handler is needed for this test. Mirror test (\`throwingReason\` with throwing accessor + \`proxyReason\` with throwing \`get\`) keeps the symmetric "throwing-`get`" coverage. 70 / 70 tests pass; tsc + ESLint clean.
…f 3) (#3894) * feat(core): foreground → background promote integration (#3831 PR-2 of 3) Builds on the \`signal.reason\` foundation merged in #3842 / #3886. Wires the foreground \`shell\` tool to detect a background-promote abort, snapshot the captured output to a \`bg_xxx.output\` file, register a \`BackgroundShellEntry\` in the existing \`BackgroundShellRegistry\`, and return a model-facing \`ToolResult\` pointing at \`/tasks\` / the dialog / \`task_stop\`. Also resolves design question 7 from #3831 (raised by @tanzhenxin in the PR-1 review): set \`result.aborted: false\` when \`result.promoted: true\` so existing \`if (result.aborted)\` consumer branches fall through naturally. ## Changes **\`shellExecutionService.ts\`** — both \`executeWithPty\` and \`childProcessFallback\` background-promote branches now resolve with \`aborted: false, promoted: true\` (was \`aborted: true\`). The flag now answers "should the caller emit a cancel/timeout message?" rather than "did the abort signal fire?" — and a promoted shell is neither cancelled nor timed out (the child is still running, ownership simply transferred). \`ShellExecutionResult.promoted?\` JSDoc updated to document this contract. **\`shell.ts\`** — \`ShellToolInvocation.execute()\` gains a 5th optional parameter \`setPromoteAbortControllerCallback?: (ac: AbortController) => void\`. The foreground path now creates an internal \`promoteAbortController\` and combines its signal into the existing \`signal + timeoutSignal\` AbortSignal.any() chain. Right after \`setPidCallback\` fires, \`setPromoteAbortControllerCallback\` exposes the controller to the scheduler so a UI surface (PR-3 Ctrl+B keybind) can find it by callId and trigger \`abort({ kind: 'background', shellId })\`. When \`result.promoted\` is observed after \`await resultPromise\`, a new \`handlePromotedForeground\` private method: 1. Generates \`bg_xxx\` shellId + on-disk \`outputPath\` under the same project temp dir \`executeBackground\` uses. 2. Writes \`result.output\` (the snapshot the service flushed at promote time) as the file's initial content (best-effort — ENOSPC / EACCES logged + swallowed; the registry entry is valuable on its own). 3. Constructs a \`BackgroundShellEntry\` with the running pid + the same \`promoteAbortController\` already wired into the live child — \`task_stop bg_xxx\` and the dialog's \`x\` key both abort via \`entry.abortController\` and will land on the still-running process. 4. Returns a model-facing \`ToolResult\` pointing at \`/tasks\` / the Background tasks dialog / \`task_stop\` for follow-up. **\`coreToolScheduler.ts\`** — \`TrackedExecutingToolCall\` gains an optional \`promoteAbortController?: AbortController\` field, populated when the shell tool's \`setPromoteAbortControllerCallback\` fires. The scheduler routes only the shell-tool branch to pass this callback, matching the existing \`setPidCallback\` pattern. ## Limitations (deferred to PR-2.5) Two follow-up items intentionally NOT in scope here. Scope discipline keeps PR-2 reviewable while still delivering the user-facing promote flow end-to-end (PR-3's Ctrl+B keybind can wire to this PR's \`promoteAbortController\` to ship a working feature). - **Post-promote stream redirect**: today the \`outputPath\` content is FROZEN at the promote moment. The service detached its data listener as part of PR-1's ownership-transfer contract, so post-promote bytes from the still-running child don't reach the file. \`Read\`-ing the output via \`/tasks\` shows what was captured before promote, not live updates. PR-2.5 will add caller-side \`onPostPromoteData\` callback (or equivalent) so post-promote bytes stream to the file like a normal background shell. - **Natural-exit registry settle**: the registry entry stays \`'running'\` until \`task_stop bg_xxx\` or session-end \`abortAll\` clears it. The service's exit listener was disposed at promote, so there's no observation point for natural child exit. PR-2.5 will keep the exit listener attached post-promote (with a separate \`onPostPromoteSettle\` callback) so the entry transitions to \`completed\` / \`failed\` like a normal background shell. These limitations are visible to users (output frozen, entry stays running until task_stop/session end) but don't break the core promote contract: the agent unblocks, the registry entry is observable, the process stays alive, cancel via \`task_stop\` works. ## Tests **\`shellExecutionService.test.ts\`** — two existing promote tests now assert \`aborted: false\` (per design question 7) instead of \`true\`. \`70 / 70 pass\`. **\`shell.test.ts\`** — three new tests in a \`foreground → background promote (#3831 PR-2)\` describe block: 1. \`setPromoteAbortControllerCallback\` exposes a real \`AbortController\` after spawn. 2. On \`result.promoted: true\`, the registry receives a \`bg_xxx\` entry with pid + abortController + outputPath, the snapshot is written via \`fs.writeFileSync\`, and the model-facing copy references \`/tasks\` + \`task_stop\` + the dialog. 3. A snapshot-write failure (mocked ENOSPC) doesn't break promote — the registry entry still gets registered with the running pid. \`96 / 96 pass\`. **\`coreToolScheduler.test.ts\`** — \`98 / 98 pass\` (no new tests; the new \`promoteAbortController\` field is exercised end-to-end via shell.test.ts). Total: \`264 / 264 affected tests pass\`; tsc + ESLint clean. ## Related - #3831 (Phase D part b — design + 3-PR sequencing; question 7 resolved here) - #3842 (PR-1 — \`signal.reason\` foundation) - #3886 (PR-1 follow-up — Proxy-trap fix + handoff test parity) - #3634 (Background task management roadmap) cc @tanzhenxin * fix(core): give promoted shell entry a FRESH AbortController so task_stop kills the child Real bug found in self-audit of #3894 PR-2: \`entry.abortController\` was being set to the same \`promoteAbortController\` that triggered the promote — which is **already aborted** by the time we reach \`handlePromotedForeground\`. Two consequences: 1. \`task_stop bg_xxx\` calls \`entry.abortController.abort()\`. On an already-aborted controller this is a no-op (the abort event was dispatched once when the controller fired; the second \`abort()\` doesn't re-fire listeners per WHATWG spec). 2. \`ShellExecutionService\` has already detached its own abort listener as part of the PR-1 ownership-transfer contract, so even if the abort COULD re-fire, there's nobody left listening to translate the signal into an actual SIGTERM/SIGKILL on the still- running child. Net effect: a promoted shell would survive \`task_stop\` forever — the agent would think it cancelled, the registry entry would stay \`'running'\`, and the OS process would keep running until the user killed the CLI session. Fix: \`handlePromotedForeground\` now creates a fresh \`AbortController\` for the registry entry and wires its abort listener to: 1. Send SIGTERM → SIGKILL to the still-running child via \`process.kill(-pid, …)\` (Linux/Mac process group, mirroring the \`detached: !isWindows\` spawn the foreground path uses) or \`taskkill /pid /f /t\` (Windows). Reuses the same SIGTERM-then- timeout-then-SIGKILL pattern \`ShellExecutionService.execute()\` uses on the non-promote cancel path; new constant \`PROMOTE_CANCEL_SIGKILL_TIMEOUT_MS = 200ms\` (intentionally separate from the service's \`SIGKILL_TIMEOUT_MS\` so tuning one doesn't silently change the other). 2. Sync-mark the registry entry \`cancelled\` via \`registry.cancel()\` so \`/tasks\` and the dialog reflect the user intent immediately. Added a regression test pinning \`entry.abortController.signal.aborted === false\` at registration time. Without the fix, this asserts \`true\` and the test fails — which is the visible canary for the silent-task_stop-failure mode. 97 / 97 shell.test.ts pass; tsc + ESLint clean. * fix(core): add 'error' listener on Windows taskkill spawn (audit follow-up) Reverse-audit found a Windows-specific crash mode: \`cpSpawn('taskkill', …)\` returns a \`ChildProcess\` whose 'error' event (emitted when the spawn fails — taskkill binary missing, EACCES, etc.) crashes Node by default if no 'error' listener is attached. Same pattern as PR-1's \`@lydell/node-pty\` IPty incident — Web/Node spec quirk easy to miss without specifically thinking about Windows + spawn-failure. Also wrapped the \`cpSpawn\` call itself in try/catch for the rarer sync-throw mode (EMFILE / ENOMEM at spawn-time). Recovery in both cases: log via debugLogger.warn + continue; \`registry.cancel\` below still transitions the entry, and the still-running child becomes an orphan that Windows reaps when the CLI session ends. 97 / 97 shell.test.ts pass; tsc + ESLint clean. * test(core): close 3 test gaps from #3894 review Three [Suggestion] threads from the @tanzhenxin-style review on PR-2, all real test gaps that would have let silent regressions through: 1. **\`setPromoteAbortControllerCallback\` test was too weak.** The old test only asserted that the callback received an \`AbortController\` instance, not that the controller's signal was actually wired into the \`AbortSignal.any(...)\` chain handed to ShellExecutionService. If \`shell.ts\` exposed the controller but forgot to combine its signal, Ctrl+B promotion would never reach the service while the bare-instance test still passed. Strengthened: capture the AbortSignal handed to ShellExecutionService.execute (4th arg), abort the promote controller, and assert the captured signal goes from \`aborted: false\` → \`true\`. 2. **The post-promote cancellation kill path was unverified.** The prior commit added a real-bug fix (fresh \`entryAc\` + abort listener that sends SIGTERM/SIGKILL + sync-marks the registry entry cancelled) but the only test it had was "the controller is fresh, signal not aborted". Reviewer rightly noted that this is the **core operational guarantee** for promoted shells — \`task_stop bg_xxx\` must actually stop the child. Added a test that uses fake timers + a \`process.kill\` spy: register a promoted entry, abort \`entry.abortController\`, flush microtasks (SIGTERM dispatch), advance fake time past \`PROMOTE_CANCEL_SIGKILL_TIMEOUT_MS\` (SIGKILL dispatch + \`registry.cancel\` mark). Pins the entire kill chain. 3. **Scheduler-side wiring of \`promoteAbortController\` was untested.** PR-3's Ctrl+B keybind looks up the executing tool call by callId and aborts \`tc.promoteAbortController\` — if \`CoreToolScheduler\` stops populating that field, the keybind silently breaks. Added a test in \`coreToolScheduler.test.ts\` that uses a \`TestShellInvocation extends ShellToolInvocation\` (so the scheduler's \`instanceof ShellToolInvocation\` check still routes the call through the shell-specific branch that wires the callback) and asserts that an \`onToolCallsUpdate\` batch emitted during the executing window contains a tool call where \`tc.promoteAbortController\` matches the controller the test exposed. 98 / 98 shell.test.ts pass; 99 / 99 coreToolScheduler.test.ts pass; tsc + ESLint clean. * fix(core): use commandToExecute in promoted entry + try/catch register Resolves 3 #3894 review threads: - **Critical**: `entry.command` and `llmContent` for the promoted foreground shell now use `commandToExecute` (post-co-author-rewrite form) instead of raw `this.params.command`. For `git commit -m` invocations that `addCoAuthorToGitCommit()` rewrote, the registry entry now mirrors what actually ran — matching `executeBackground`'s long-standing convention (line 1234). - Defensive try/catch around `registry.register(entry)`: today the call is internally safe (Map.set + emit), but a future implementation that throws would leave a zombie child detached from service listeners with no kill path. Catch path logs, fires `entryAc.abort()` for best-effort kill via the wired listener, and re-throws so the scheduler surfaces the failure. - Updates the misleading comment (line 748) that claimed the registry entry uses "the same `promoteAbortController`" — actual impl uses a fresh `entryAc` (the audit-fix from the previous push). Tests: - `entry.command` git-commit case pinning post-rewrite form - register-throw rejection + SIGTERM/SIGKILL kill via fake timers - 100/100 shell.test.ts pass; tsc + ESLint clean * fix(core): close 2 #3894 review findings — promote refused-race + mkdir orphan Resolves @tanzhenxin's CHANGES_REQUESTED review on #3894. 1. **Refused-promote race no longer reported as "Command timed out"** The combined-abort signal folds in `signal | timeoutSignal | promoteAbortController.signal`, but the timeout discriminator only excluded the user-cancel signal — not the promote signal. When the user fires Ctrl+B (PR-3's keybind) but the service's race guard refuses promotion (the child terminated a beat earlier), the result lands `aborted: true, promoted: false` and the foreground path falsely reported `Command timed out after 120000ms`. Both the agent and the user would see a timeout that didn't happen. Fix: extend the discriminator to ALSO exclude `promoteAbortController.signal.aborted`. Add a `wasPromoteRefused` branch that surfaces the actual cause: "Command finished before the background-promote request could be honoured (the child had already exited)." Same fix applied to both the llmContent path and the returnDisplay path so the model and the visible UI agree. Latent in PR-2 itself (no in-tree caller fires the promote yet), but PR-3's keybind would expose it on first ship. 2. **Unguarded mkdirSync orphans the promoted child** After `result.promoted: true`, ownership of the still-running child has transferred and the service's kill path is detached. The promote handler creates the snapshot output directory next, but the original `fs.mkdirSync(outputDir, { recursive: true })` had no guard — read- only temp mounts, sandboxed perms, full disk on inode/metadata exhaustion would reject the handler BEFORE the registry's kill listener was wired. The still-running child became an orphan zombie with no kill path until the OS reaped it on session end. Fix: wrap mkdirSync in try/catch (matches the safety pattern around `registry.register`). On failure, log + best-effort kill the child (SIGTERM via process.kill(-pid) on POSIX, taskkill /f /t on Windows with an `error` listener so a spawn failure doesn't crash Node) + re-throw so the scheduler surfaces the failure to the agent. Tests: 2 new regressions in `shell.test.ts`: - `mkdirSync(outputDir) throws → child gets SIGTERM, error re-raised` - `promote-refused race (aborted: true, promoted: false after promote signal) is NOT reported as "Command timed out"` 171/171 shell.test.ts pass; tsc + ESLint clean.
…PR-2 of 3) (QwenLM#3894) * feat(core): foreground → background promote integration (QwenLM#3831 PR-2 of 3) Builds on the \`signal.reason\` foundation merged in QwenLM#3842 / QwenLM#3886. Wires the foreground \`shell\` tool to detect a background-promote abort, snapshot the captured output to a \`bg_xxx.output\` file, register a \`BackgroundShellEntry\` in the existing \`BackgroundShellRegistry\`, and return a model-facing \`ToolResult\` pointing at \`/tasks\` / the dialog / \`task_stop\`. Also resolves design question 7 from QwenLM#3831 (raised by @tanzhenxin in the PR-1 review): set \`result.aborted: false\` when \`result.promoted: true\` so existing \`if (result.aborted)\` consumer branches fall through naturally. **\`shellExecutionService.ts\`** — both \`executeWithPty\` and \`childProcessFallback\` background-promote branches now resolve with \`aborted: false, promoted: true\` (was \`aborted: true\`). The flag now answers "should the caller emit a cancel/timeout message?" rather than "did the abort signal fire?" — and a promoted shell is neither cancelled nor timed out (the child is still running, ownership simply transferred). \`ShellExecutionResult.promoted?\` JSDoc updated to document this contract. **\`shell.ts\`** — \`ShellToolInvocation.execute()\` gains a 5th optional parameter \`setPromoteAbortControllerCallback?: (ac: AbortController) => void\`. The foreground path now creates an internal \`promoteAbortController\` and combines its signal into the existing \`signal + timeoutSignal\` AbortSignal.any() chain. Right after \`setPidCallback\` fires, \`setPromoteAbortControllerCallback\` exposes the controller to the scheduler so a UI surface (PR-3 Ctrl+B keybind) can find it by callId and trigger \`abort({ kind: 'background', shellId })\`. When \`result.promoted\` is observed after \`await resultPromise\`, a new \`handlePromotedForeground\` private method: 1. Generates \`bg_xxx\` shellId + on-disk \`outputPath\` under the same project temp dir \`executeBackground\` uses. 2. Writes \`result.output\` (the snapshot the service flushed at promote time) as the file's initial content (best-effort — ENOSPC / EACCES logged + swallowed; the registry entry is valuable on its own). 3. Constructs a \`BackgroundShellEntry\` with the running pid + the same \`promoteAbortController\` already wired into the live child — \`task_stop bg_xxx\` and the dialog's \`x\` key both abort via \`entry.abortController\` and will land on the still-running process. 4. Returns a model-facing \`ToolResult\` pointing at \`/tasks\` / the Background tasks dialog / \`task_stop\` for follow-up. **\`coreToolScheduler.ts\`** — \`TrackedExecutingToolCall\` gains an optional \`promoteAbortController?: AbortController\` field, populated when the shell tool's \`setPromoteAbortControllerCallback\` fires. The scheduler routes only the shell-tool branch to pass this callback, matching the existing \`setPidCallback\` pattern. Two follow-up items intentionally NOT in scope here. Scope discipline keeps PR-2 reviewable while still delivering the user-facing promote flow end-to-end (PR-3's Ctrl+B keybind can wire to this PR's \`promoteAbortController\` to ship a working feature). - **Post-promote stream redirect**: today the \`outputPath\` content is FROZEN at the promote moment. The service detached its data listener as part of PR-1's ownership-transfer contract, so post-promote bytes from the still-running child don't reach the file. \`Read\`-ing the output via \`/tasks\` shows what was captured before promote, not live updates. PR-2.5 will add caller-side \`onPostPromoteData\` callback (or equivalent) so post-promote bytes stream to the file like a normal background shell. - **Natural-exit registry settle**: the registry entry stays \`'running'\` until \`task_stop bg_xxx\` or session-end \`abortAll\` clears it. The service's exit listener was disposed at promote, so there's no observation point for natural child exit. PR-2.5 will keep the exit listener attached post-promote (with a separate \`onPostPromoteSettle\` callback) so the entry transitions to \`completed\` / \`failed\` like a normal background shell. These limitations are visible to users (output frozen, entry stays running until task_stop/session end) but don't break the core promote contract: the agent unblocks, the registry entry is observable, the process stays alive, cancel via \`task_stop\` works. **\`shellExecutionService.test.ts\`** — two existing promote tests now assert \`aborted: false\` (per design question 7) instead of \`true\`. \`70 / 70 pass\`. **\`shell.test.ts\`** — three new tests in a \`foreground → background promote (QwenLM#3831 PR-2)\` describe block: 1. \`setPromoteAbortControllerCallback\` exposes a real \`AbortController\` after spawn. 2. On \`result.promoted: true\`, the registry receives a \`bg_xxx\` entry with pid + abortController + outputPath, the snapshot is written via \`fs.writeFileSync\`, and the model-facing copy references \`/tasks\` + \`task_stop\` + the dialog. 3. A snapshot-write failure (mocked ENOSPC) doesn't break promote — the registry entry still gets registered with the running pid. \`96 / 96 pass\`. **\`coreToolScheduler.test.ts\`** — \`98 / 98 pass\` (no new tests; the new \`promoteAbortController\` field is exercised end-to-end via shell.test.ts). Total: \`264 / 264 affected tests pass\`; tsc + ESLint clean. - QwenLM#3831 (Phase D part b — design + 3-PR sequencing; question 7 resolved here) - QwenLM#3842 (PR-1 — \`signal.reason\` foundation) - QwenLM#3886 (PR-1 follow-up — Proxy-trap fix + handoff test parity) - QwenLM#3634 (Background task management roadmap) cc @tanzhenxin * fix(core): give promoted shell entry a FRESH AbortController so task_stop kills the child Real bug found in self-audit of QwenLM#3894 PR-2: \`entry.abortController\` was being set to the same \`promoteAbortController\` that triggered the promote — which is **already aborted** by the time we reach \`handlePromotedForeground\`. Two consequences: 1. \`task_stop bg_xxx\` calls \`entry.abortController.abort()\`. On an already-aborted controller this is a no-op (the abort event was dispatched once when the controller fired; the second \`abort()\` doesn't re-fire listeners per WHATWG spec). 2. \`ShellExecutionService\` has already detached its own abort listener as part of the PR-1 ownership-transfer contract, so even if the abort COULD re-fire, there's nobody left listening to translate the signal into an actual SIGTERM/SIGKILL on the still- running child. Net effect: a promoted shell would survive \`task_stop\` forever — the agent would think it cancelled, the registry entry would stay \`'running'\`, and the OS process would keep running until the user killed the CLI session. Fix: \`handlePromotedForeground\` now creates a fresh \`AbortController\` for the registry entry and wires its abort listener to: 1. Send SIGTERM → SIGKILL to the still-running child via \`process.kill(-pid, …)\` (Linux/Mac process group, mirroring the \`detached: !isWindows\` spawn the foreground path uses) or \`taskkill /pid /f /t\` (Windows). Reuses the same SIGTERM-then- timeout-then-SIGKILL pattern \`ShellExecutionService.execute()\` uses on the non-promote cancel path; new constant \`PROMOTE_CANCEL_SIGKILL_TIMEOUT_MS = 200ms\` (intentionally separate from the service's \`SIGKILL_TIMEOUT_MS\` so tuning one doesn't silently change the other). 2. Sync-mark the registry entry \`cancelled\` via \`registry.cancel()\` so \`/tasks\` and the dialog reflect the user intent immediately. Added a regression test pinning \`entry.abortController.signal.aborted === false\` at registration time. Without the fix, this asserts \`true\` and the test fails — which is the visible canary for the silent-task_stop-failure mode. 97 / 97 shell.test.ts pass; tsc + ESLint clean. * fix(core): add 'error' listener on Windows taskkill spawn (audit follow-up) Reverse-audit found a Windows-specific crash mode: \`cpSpawn('taskkill', …)\` returns a \`ChildProcess\` whose 'error' event (emitted when the spawn fails — taskkill binary missing, EACCES, etc.) crashes Node by default if no 'error' listener is attached. Same pattern as PR-1's \`@lydell/node-pty\` IPty incident — Web/Node spec quirk easy to miss without specifically thinking about Windows + spawn-failure. Also wrapped the \`cpSpawn\` call itself in try/catch for the rarer sync-throw mode (EMFILE / ENOMEM at spawn-time). Recovery in both cases: log via debugLogger.warn + continue; \`registry.cancel\` below still transitions the entry, and the still-running child becomes an orphan that Windows reaps when the CLI session ends. 97 / 97 shell.test.ts pass; tsc + ESLint clean. * test(core): close 3 test gaps from QwenLM#3894 review Three [Suggestion] threads from the @tanzhenxin-style review on PR-2, all real test gaps that would have let silent regressions through: 1. **\`setPromoteAbortControllerCallback\` test was too weak.** The old test only asserted that the callback received an \`AbortController\` instance, not that the controller's signal was actually wired into the \`AbortSignal.any(...)\` chain handed to ShellExecutionService. If \`shell.ts\` exposed the controller but forgot to combine its signal, Ctrl+B promotion would never reach the service while the bare-instance test still passed. Strengthened: capture the AbortSignal handed to ShellExecutionService.execute (4th arg), abort the promote controller, and assert the captured signal goes from \`aborted: false\` → \`true\`. 2. **The post-promote cancellation kill path was unverified.** The prior commit added a real-bug fix (fresh \`entryAc\` + abort listener that sends SIGTERM/SIGKILL + sync-marks the registry entry cancelled) but the only test it had was "the controller is fresh, signal not aborted". Reviewer rightly noted that this is the **core operational guarantee** for promoted shells — \`task_stop bg_xxx\` must actually stop the child. Added a test that uses fake timers + a \`process.kill\` spy: register a promoted entry, abort \`entry.abortController\`, flush microtasks (SIGTERM dispatch), advance fake time past \`PROMOTE_CANCEL_SIGKILL_TIMEOUT_MS\` (SIGKILL dispatch + \`registry.cancel\` mark). Pins the entire kill chain. 3. **Scheduler-side wiring of \`promoteAbortController\` was untested.** PR-3's Ctrl+B keybind looks up the executing tool call by callId and aborts \`tc.promoteAbortController\` — if \`CoreToolScheduler\` stops populating that field, the keybind silently breaks. Added a test in \`coreToolScheduler.test.ts\` that uses a \`TestShellInvocation extends ShellToolInvocation\` (so the scheduler's \`instanceof ShellToolInvocation\` check still routes the call through the shell-specific branch that wires the callback) and asserts that an \`onToolCallsUpdate\` batch emitted during the executing window contains a tool call where \`tc.promoteAbortController\` matches the controller the test exposed. 98 / 98 shell.test.ts pass; 99 / 99 coreToolScheduler.test.ts pass; tsc + ESLint clean. * fix(core): use commandToExecute in promoted entry + try/catch register Resolves 3 QwenLM#3894 review threads: - **Critical**: `entry.command` and `llmContent` for the promoted foreground shell now use `commandToExecute` (post-co-author-rewrite form) instead of raw `this.params.command`. For `git commit -m` invocations that `addCoAuthorToGitCommit()` rewrote, the registry entry now mirrors what actually ran — matching `executeBackground`'s long-standing convention (line 1234). - Defensive try/catch around `registry.register(entry)`: today the call is internally safe (Map.set + emit), but a future implementation that throws would leave a zombie child detached from service listeners with no kill path. Catch path logs, fires `entryAc.abort()` for best-effort kill via the wired listener, and re-throws so the scheduler surfaces the failure. - Updates the misleading comment (line 748) that claimed the registry entry uses "the same `promoteAbortController`" — actual impl uses a fresh `entryAc` (the audit-fix from the previous push). Tests: - `entry.command` git-commit case pinning post-rewrite form - register-throw rejection + SIGTERM/SIGKILL kill via fake timers - 100/100 shell.test.ts pass; tsc + ESLint clean * fix(core): close 2 QwenLM#3894 review findings — promote refused-race + mkdir orphan Resolves @tanzhenxin's CHANGES_REQUESTED review on QwenLM#3894. 1. **Refused-promote race no longer reported as "Command timed out"** The combined-abort signal folds in `signal | timeoutSignal | promoteAbortController.signal`, but the timeout discriminator only excluded the user-cancel signal — not the promote signal. When the user fires Ctrl+B (PR-3's keybind) but the service's race guard refuses promotion (the child terminated a beat earlier), the result lands `aborted: true, promoted: false` and the foreground path falsely reported `Command timed out after 120000ms`. Both the agent and the user would see a timeout that didn't happen. Fix: extend the discriminator to ALSO exclude `promoteAbortController.signal.aborted`. Add a `wasPromoteRefused` branch that surfaces the actual cause: "Command finished before the background-promote request could be honoured (the child had already exited)." Same fix applied to both the llmContent path and the returnDisplay path so the model and the visible UI agree. Latent in PR-2 itself (no in-tree caller fires the promote yet), but PR-3's keybind would expose it on first ship. 2. **Unguarded mkdirSync orphans the promoted child** After `result.promoted: true`, ownership of the still-running child has transferred and the service's kill path is detached. The promote handler creates the snapshot output directory next, but the original `fs.mkdirSync(outputDir, { recursive: true })` had no guard — read- only temp mounts, sandboxed perms, full disk on inode/metadata exhaustion would reject the handler BEFORE the registry's kill listener was wired. The still-running child became an orphan zombie with no kill path until the OS reaped it on session end. Fix: wrap mkdirSync in try/catch (matches the safety pattern around `registry.register`). On failure, log + best-effort kill the child (SIGTERM via process.kill(-pid) on POSIX, taskkill /f /t on Windows with an `error` listener so a spawn failure doesn't crash Node) + re-throw so the scheduler surfaces the failure to the agent. Tests: 2 new regressions in `shell.test.ts`: - `mkdirSync(outputDir) throws → child gets SIGTERM, error re-raised` - `promote-refused race (aborted: true, promoted: false after promote signal) is NOT reported as "Command timed out"` 171/171 shell.test.ts pass; tsc + ESLint clean.
…PR-2 of 3) (QwenLM#3894) * feat(core): foreground → background promote integration (QwenLM#3831 PR-2 of 3) Builds on the \`signal.reason\` foundation merged in QwenLM#3842 / QwenLM#3886. Wires the foreground \`shell\` tool to detect a background-promote abort, snapshot the captured output to a \`bg_xxx.output\` file, register a \`BackgroundShellEntry\` in the existing \`BackgroundShellRegistry\`, and return a model-facing \`ToolResult\` pointing at \`/tasks\` / the dialog / \`task_stop\`. Also resolves design question 7 from QwenLM#3831 (raised by @tanzhenxin in the PR-1 review): set \`result.aborted: false\` when \`result.promoted: true\` so existing \`if (result.aborted)\` consumer branches fall through naturally. ## Changes **\`shellExecutionService.ts\`** — both \`executeWithPty\` and \`childProcessFallback\` background-promote branches now resolve with \`aborted: false, promoted: true\` (was \`aborted: true\`). The flag now answers "should the caller emit a cancel/timeout message?" rather than "did the abort signal fire?" — and a promoted shell is neither cancelled nor timed out (the child is still running, ownership simply transferred). \`ShellExecutionResult.promoted?\` JSDoc updated to document this contract. **\`shell.ts\`** — \`ShellToolInvocation.execute()\` gains a 5th optional parameter \`setPromoteAbortControllerCallback?: (ac: AbortController) => void\`. The foreground path now creates an internal \`promoteAbortController\` and combines its signal into the existing \`signal + timeoutSignal\` AbortSignal.any() chain. Right after \`setPidCallback\` fires, \`setPromoteAbortControllerCallback\` exposes the controller to the scheduler so a UI surface (PR-3 Ctrl+B keybind) can find it by callId and trigger \`abort({ kind: 'background', shellId })\`. When \`result.promoted\` is observed after \`await resultPromise\`, a new \`handlePromotedForeground\` private method: 1. Generates \`bg_xxx\` shellId + on-disk \`outputPath\` under the same project temp dir \`executeBackground\` uses. 2. Writes \`result.output\` (the snapshot the service flushed at promote time) as the file's initial content (best-effort — ENOSPC / EACCES logged + swallowed; the registry entry is valuable on its own). 3. Constructs a \`BackgroundShellEntry\` with the running pid + the same \`promoteAbortController\` already wired into the live child — \`task_stop bg_xxx\` and the dialog's \`x\` key both abort via \`entry.abortController\` and will land on the still-running process. 4. Returns a model-facing \`ToolResult\` pointing at \`/tasks\` / the Background tasks dialog / \`task_stop\` for follow-up. **\`coreToolScheduler.ts\`** — \`TrackedExecutingToolCall\` gains an optional \`promoteAbortController?: AbortController\` field, populated when the shell tool's \`setPromoteAbortControllerCallback\` fires. The scheduler routes only the shell-tool branch to pass this callback, matching the existing \`setPidCallback\` pattern. ## Limitations (deferred to PR-2.5) Two follow-up items intentionally NOT in scope here. Scope discipline keeps PR-2 reviewable while still delivering the user-facing promote flow end-to-end (PR-3's Ctrl+B keybind can wire to this PR's \`promoteAbortController\` to ship a working feature). - **Post-promote stream redirect**: today the \`outputPath\` content is FROZEN at the promote moment. The service detached its data listener as part of PR-1's ownership-transfer contract, so post-promote bytes from the still-running child don't reach the file. \`Read\`-ing the output via \`/tasks\` shows what was captured before promote, not live updates. PR-2.5 will add caller-side \`onPostPromoteData\` callback (or equivalent) so post-promote bytes stream to the file like a normal background shell. - **Natural-exit registry settle**: the registry entry stays \`'running'\` until \`task_stop bg_xxx\` or session-end \`abortAll\` clears it. The service's exit listener was disposed at promote, so there's no observation point for natural child exit. PR-2.5 will keep the exit listener attached post-promote (with a separate \`onPostPromoteSettle\` callback) so the entry transitions to \`completed\` / \`failed\` like a normal background shell. These limitations are visible to users (output frozen, entry stays running until task_stop/session end) but don't break the core promote contract: the agent unblocks, the registry entry is observable, the process stays alive, cancel via \`task_stop\` works. ## Tests **\`shellExecutionService.test.ts\`** — two existing promote tests now assert \`aborted: false\` (per design question 7) instead of \`true\`. \`70 / 70 pass\`. **\`shell.test.ts\`** — three new tests in a \`foreground → background promote (QwenLM#3831 PR-2)\` describe block: 1. \`setPromoteAbortControllerCallback\` exposes a real \`AbortController\` after spawn. 2. On \`result.promoted: true\`, the registry receives a \`bg_xxx\` entry with pid + abortController + outputPath, the snapshot is written via \`fs.writeFileSync\`, and the model-facing copy references \`/tasks\` + \`task_stop\` + the dialog. 3. A snapshot-write failure (mocked ENOSPC) doesn't break promote — the registry entry still gets registered with the running pid. \`96 / 96 pass\`. **\`coreToolScheduler.test.ts\`** — \`98 / 98 pass\` (no new tests; the new \`promoteAbortController\` field is exercised end-to-end via shell.test.ts). Total: \`264 / 264 affected tests pass\`; tsc + ESLint clean. ## Related - QwenLM#3831 (Phase D part b — design + 3-PR sequencing; question 7 resolved here) - QwenLM#3842 (PR-1 — \`signal.reason\` foundation) - QwenLM#3886 (PR-1 follow-up — Proxy-trap fix + handoff test parity) - QwenLM#3634 (Background task management roadmap) cc @tanzhenxin * fix(core): give promoted shell entry a FRESH AbortController so task_stop kills the child Real bug found in self-audit of QwenLM#3894 PR-2: \`entry.abortController\` was being set to the same \`promoteAbortController\` that triggered the promote — which is **already aborted** by the time we reach \`handlePromotedForeground\`. Two consequences: 1. \`task_stop bg_xxx\` calls \`entry.abortController.abort()\`. On an already-aborted controller this is a no-op (the abort event was dispatched once when the controller fired; the second \`abort()\` doesn't re-fire listeners per WHATWG spec). 2. \`ShellExecutionService\` has already detached its own abort listener as part of the PR-1 ownership-transfer contract, so even if the abort COULD re-fire, there's nobody left listening to translate the signal into an actual SIGTERM/SIGKILL on the still- running child. Net effect: a promoted shell would survive \`task_stop\` forever — the agent would think it cancelled, the registry entry would stay \`'running'\`, and the OS process would keep running until the user killed the CLI session. Fix: \`handlePromotedForeground\` now creates a fresh \`AbortController\` for the registry entry and wires its abort listener to: 1. Send SIGTERM → SIGKILL to the still-running child via \`process.kill(-pid, …)\` (Linux/Mac process group, mirroring the \`detached: !isWindows\` spawn the foreground path uses) or \`taskkill /pid /f /t\` (Windows). Reuses the same SIGTERM-then- timeout-then-SIGKILL pattern \`ShellExecutionService.execute()\` uses on the non-promote cancel path; new constant \`PROMOTE_CANCEL_SIGKILL_TIMEOUT_MS = 200ms\` (intentionally separate from the service's \`SIGKILL_TIMEOUT_MS\` so tuning one doesn't silently change the other). 2. Sync-mark the registry entry \`cancelled\` via \`registry.cancel()\` so \`/tasks\` and the dialog reflect the user intent immediately. Added a regression test pinning \`entry.abortController.signal.aborted === false\` at registration time. Without the fix, this asserts \`true\` and the test fails — which is the visible canary for the silent-task_stop-failure mode. 97 / 97 shell.test.ts pass; tsc + ESLint clean. * fix(core): add 'error' listener on Windows taskkill spawn (audit follow-up) Reverse-audit found a Windows-specific crash mode: \`cpSpawn('taskkill', …)\` returns a \`ChildProcess\` whose 'error' event (emitted when the spawn fails — taskkill binary missing, EACCES, etc.) crashes Node by default if no 'error' listener is attached. Same pattern as PR-1's \`@lydell/node-pty\` IPty incident — Web/Node spec quirk easy to miss without specifically thinking about Windows + spawn-failure. Also wrapped the \`cpSpawn\` call itself in try/catch for the rarer sync-throw mode (EMFILE / ENOMEM at spawn-time). Recovery in both cases: log via debugLogger.warn + continue; \`registry.cancel\` below still transitions the entry, and the still-running child becomes an orphan that Windows reaps when the CLI session ends. 97 / 97 shell.test.ts pass; tsc + ESLint clean. * test(core): close 3 test gaps from QwenLM#3894 review Three [Suggestion] threads from the @tanzhenxin-style review on PR-2, all real test gaps that would have let silent regressions through: 1. **\`setPromoteAbortControllerCallback\` test was too weak.** The old test only asserted that the callback received an \`AbortController\` instance, not that the controller's signal was actually wired into the \`AbortSignal.any(...)\` chain handed to ShellExecutionService. If \`shell.ts\` exposed the controller but forgot to combine its signal, Ctrl+B promotion would never reach the service while the bare-instance test still passed. Strengthened: capture the AbortSignal handed to ShellExecutionService.execute (4th arg), abort the promote controller, and assert the captured signal goes from \`aborted: false\` → \`true\`. 2. **The post-promote cancellation kill path was unverified.** The prior commit added a real-bug fix (fresh \`entryAc\` + abort listener that sends SIGTERM/SIGKILL + sync-marks the registry entry cancelled) but the only test it had was "the controller is fresh, signal not aborted". Reviewer rightly noted that this is the **core operational guarantee** for promoted shells — \`task_stop bg_xxx\` must actually stop the child. Added a test that uses fake timers + a \`process.kill\` spy: register a promoted entry, abort \`entry.abortController\`, flush microtasks (SIGTERM dispatch), advance fake time past \`PROMOTE_CANCEL_SIGKILL_TIMEOUT_MS\` (SIGKILL dispatch + \`registry.cancel\` mark). Pins the entire kill chain. 3. **Scheduler-side wiring of \`promoteAbortController\` was untested.** PR-3's Ctrl+B keybind looks up the executing tool call by callId and aborts \`tc.promoteAbortController\` — if \`CoreToolScheduler\` stops populating that field, the keybind silently breaks. Added a test in \`coreToolScheduler.test.ts\` that uses a \`TestShellInvocation extends ShellToolInvocation\` (so the scheduler's \`instanceof ShellToolInvocation\` check still routes the call through the shell-specific branch that wires the callback) and asserts that an \`onToolCallsUpdate\` batch emitted during the executing window contains a tool call where \`tc.promoteAbortController\` matches the controller the test exposed. 98 / 98 shell.test.ts pass; 99 / 99 coreToolScheduler.test.ts pass; tsc + ESLint clean. * fix(core): use commandToExecute in promoted entry + try/catch register Resolves 3 QwenLM#3894 review threads: - **Critical**: `entry.command` and `llmContent` for the promoted foreground shell now use `commandToExecute` (post-co-author-rewrite form) instead of raw `this.params.command`. For `git commit -m` invocations that `addCoAuthorToGitCommit()` rewrote, the registry entry now mirrors what actually ran — matching `executeBackground`'s long-standing convention (line 1234). - Defensive try/catch around `registry.register(entry)`: today the call is internally safe (Map.set + emit), but a future implementation that throws would leave a zombie child detached from service listeners with no kill path. Catch path logs, fires `entryAc.abort()` for best-effort kill via the wired listener, and re-throws so the scheduler surfaces the failure. - Updates the misleading comment (line 748) that claimed the registry entry uses "the same `promoteAbortController`" — actual impl uses a fresh `entryAc` (the audit-fix from the previous push). Tests: - `entry.command` git-commit case pinning post-rewrite form - register-throw rejection + SIGTERM/SIGKILL kill via fake timers - 100/100 shell.test.ts pass; tsc + ESLint clean * fix(core): close 2 QwenLM#3894 review findings — promote refused-race + mkdir orphan Resolves @tanzhenxin's CHANGES_REQUESTED review on QwenLM#3894. 1. **Refused-promote race no longer reported as "Command timed out"** The combined-abort signal folds in `signal | timeoutSignal | promoteAbortController.signal`, but the timeout discriminator only excluded the user-cancel signal — not the promote signal. When the user fires Ctrl+B (PR-3's keybind) but the service's race guard refuses promotion (the child terminated a beat earlier), the result lands `aborted: true, promoted: false` and the foreground path falsely reported `Command timed out after 120000ms`. Both the agent and the user would see a timeout that didn't happen. Fix: extend the discriminator to ALSO exclude `promoteAbortController.signal.aborted`. Add a `wasPromoteRefused` branch that surfaces the actual cause: "Command finished before the background-promote request could be honoured (the child had already exited)." Same fix applied to both the llmContent path and the returnDisplay path so the model and the visible UI agree. Latent in PR-2 itself (no in-tree caller fires the promote yet), but PR-3's keybind would expose it on first ship. 2. **Unguarded mkdirSync orphans the promoted child** After `result.promoted: true`, ownership of the still-running child has transferred and the service's kill path is detached. The promote handler creates the snapshot output directory next, but the original `fs.mkdirSync(outputDir, { recursive: true })` had no guard — read- only temp mounts, sandboxed perms, full disk on inode/metadata exhaustion would reject the handler BEFORE the registry's kill listener was wired. The still-running child became an orphan zombie with no kill path until the OS reaped it on session end. Fix: wrap mkdirSync in try/catch (matches the safety pattern around `registry.register`). On failure, log + best-effort kill the child (SIGTERM via process.kill(-pid) on POSIX, taskkill /f /t on Windows with an `error` listener so a spawn failure doesn't crash Node) + re-throw so the scheduler surfaces the failure to the agent. Tests: 2 new regressions in `shell.test.ts`: - `mkdirSync(outputDir) throws → child gets SIGTERM, error re-raised` - `promote-refused race (aborted: true, promoted: false after promote signal) is NOT reported as "Command timed out"` 171/171 shell.test.ts pass; tsc + ESLint clean.
…-up to QwenLM#3842) (QwenLM#3886) * fix(core): wrap hasOwnProperty.call inside try + add post-abort PTY data assertion Two non-blocking review notes from @tanzhenxin's PR-1 approval (QwenLM#3842, post-merge follow-up): - **Note 2 (real bug)**: `getShellAbortReasonKind` had `Object.prototype.hasOwnProperty.call(reason, 'kind')` outside the try/catch. `hasOwnProperty.call` triggers the `[[GetOwnProperty]]` Proxy trap (`getOwnPropertyDescriptor` handler). A Proxy whose `getOwnPropertyDescriptor` throws — separate from a throwing `get` trap, which the prior commit already covered — would propagate past the helper, leaving the abort handler's switch on `kind` to throw through `addEventListener` (which doesn't await async listener return values), so the shell process would stay alive instead of being killed on cancel. Moved the descriptor probe inside the same try block as the value read. My own multi-round audit covered six attack vectors but missed this one — only `get` trap throws were considered. The reviewer caught it on first pass. - **Note 3 (test parity)**: the PTY post-promotion handoff test asserted `dispose` was called but never re-invoked the data callback to verify the foreground `onOutputEvent` actually stops firing — the child_process equivalent has that assertion. Mirrored it: emit data AFTER abort by re-invoking the captured `dataCallback` reference, and assert `onOutputEventMock.mock.calls.length` does NOT increase past the moment of promote. Exercises the production `listenersDetached` guard inside the chain callback, which the bare dispose-was-called check didn't. Note 1 from the same review (the `aborted: true + promoted: true` shape forcing PR-2 callers to check `promoted` before `aborted`) is deliberately NOT addressed here — it's a contract simplification that affects PR-2's branching, so it belongs in PR-2 along with the caller-side decision on whether to flip `aborted` for promoted results. Added a TODO upstream in QwenLM#3831 (PR-2 design) to track. 70 / 70 tests pass (69 baseline + 1 new helper boundary for the throwing-getOwnPropertyDescriptor case). tsc + ESLint clean. * test(core): fix tautological PTY post-promote assertion (audit follow-up) The PTY post-promotion handoff test added in the previous commit copied the child_process equivalent's pattern verbatim — sync \`expect(count).toBe(countAtPromote)\` immediately after dataCallback returns. That works for child_process because its \`handleOutput\` is fully synchronous (sniff → decoder → emit, all on the same call stack), so the count change happens BEFORE the assertion. PTY's \`handleOutput\` is async — \`processingChain.then(...)\` queues a microtask that does the sniff + write + render-then-emit work. The sync assertion captures both \`countAtPromote\` and the post-emit count BEFORE the chain microtask ever runs, so both reads return whatever happened before the assert (typically 0). The test would tautologically pass even if the production \`listenersDetached\` guard were removed — i.e., it didn't actually verify the guard. Restructured to: 1. Drive the PTY through \`simulateExecution\` so \`await handle.result\` forces all queued microtasks (including pre-promote chain items AND the abort handler's drain) to settle. 2. Capture \`eventCountAfterSettle\` once everything has stabilized. 3. Re-invoke the captured \`dataCallback\` with post-promote data, await two more macrotask boundaries to let the new chain item fully run. 4. Assert the count hasn't moved. If the production \`listenersDetached\` guard is removed, the post-promote chain item emits, count increases past \`eventCountAfterSettle\`, and this assertion fails. So the test actually exercises the guard now. Found in self-audit while reviewing my own follow-up commit. Caught because audit was paranoid about *whether the test verifies what it claims to verify*, not just whether it passes. 70 / 70 tests pass; tsc + ESLint clean. * test(core): pin eventCountAfterSettle === 0 in PTY post-promote test The previous fix asserted post-promote count equals the \`eventCountAfterSettle\` baseline, but didn't pin the baseline itself. With the production \`listenersDetached\` guard intact, both halves (pre-promote chain and post-promote chain) suppress emit, so \`eventCountAfterSettle === 0 === post-count\` and the relative comparison is vacuously true. If a future refactor changed the production guard semantics so the pre-promote chain item DID emit (count becomes 1+ after settle), the relative-comparison test would still pass as long as post-promote also emitted the same number — that's a regression the test should catch. Adding \`expect(eventCountAfterSettle).toBe(0)\` makes the contract explicit: once \`listenersDetached\` is set during the abort handler's sync part, BOTH the in-flight chain item (pre-promote) and the future chain item (post-promote) skip emit. Found in another self-audit pass — even after fixing the tautological assertion, the test could still mask certain future regressions. 70 / 70 tests pass; tsc + ESLint clean. * test(core): drop dataCallbackHolder pattern, read mock.calls directly The dataCallbackHolder pattern (capturing the onData callback inside simulateExecution then invoking it after via a closure-shared object) was unnecessary indirection — \`mockPtyProcess.onData.mock.calls[0][0]\` reads the same callback reference whether you read it inside or outside the simulation closure. Vi's mock.calls array is per-mock-instance and beforeEach re-creates mockPtyProcess + .onData freshly, so there's no stale-reference risk in the simpler form. No behavior change in what's tested. 70 / 70 pass. * chore(core): drop in-source @-mention attribution + dead Proxy.get handler Two cosmetic cleanups found in another self-audit pass: - **Helper comment**: removed the parenthetical attribution ("Caught by @tanzhenxin in the PR-1 review; my own audit only covered \`get\` trap throws."). The technical content of the comment — explaining why both the descriptor probe and the value read live inside the try — stands on its own. The reviewer credit lives in commit history / PR description, where it belongs; an in-source @-mention ages poorly (handles change, the relevant person may move on) and doesn't help future readers reason about the code. - **Test Proxy**: \`throwingDescriptorProxy\` declared a \`get()\` handler that always returned \`undefined\`. The descriptor probe throws before the helper ever reaches the value read, so the \`get\` handler is unreachable — dropped it and added a one-line comment explaining why no \`get\` handler is needed for this test. Mirror test (\`throwingReason\` with throwing accessor + \`proxyReason\` with throwing \`get\`) keeps the symmetric "throwing-`get`" coverage. 70 / 70 tests pass; tsc + ESLint clean.
Fast-follow on the PR-1 of #3831 (merged at #3842). @tanzhenxin's approving review left three non-blocking notes; this PR addresses two of them. Note 1 stays for PR-2.
Fixed
Note 2 (real bug) — `getShellAbortReasonKind` Proxy-trap leak
Confirmed: `Object.prototype.hasOwnProperty.call(reason, 'kind')` triggers the `[[GetOwnProperty]]` Proxy trap. A Proxy whose `getOwnPropertyDescriptor` handler throws would propagate up past the helper, then through the abort handler's `switch` on `kind`, then out of the (async) abort listener. `addEventListener` doesn't await the listener's returned Promise, so a leaked throw would silently leave the shell process alive instead of being killed on cancel.
Wrapped both the descriptor probe and the value read inside the same try block. Added a regression test that uses a Proxy whose `getOwnPropertyDescriptor` throws — mirror of the existing throwing-`get` Proxy test.
My multi-round self-audit before merge covered 6 attack vectors but missed this one — only `get` trap throws were considered. Worth noting publicly in case anyone else iterates here.
Note 3 (test parity) — PTY handoff-boundary assertion
Mirrored. The PTY post-promotion test now re-invokes the production `onData` callback (via `mockPtyProcess.onData.mock.calls[0][0]`) AFTER the background abort, then awaits two `setImmediate` cycles to let any chain microtask fully settle, and asserts `onOutputEventMock.mock.calls.length === 0` both pre and post — the strict `=== 0` baseline pin guarantees the assertion fails if a future refactor lets the pre-promote chain item emit before `listenersDetached` is set, not just if pre and post counts diverge. Exercises the production `listenersDetached` guard inside the chain callback, which the bare `dispose-was-called` check didn't.
(PTY's `handleOutput` is async via `processingChain` — child_process's equivalent is sync, so the mirror needed extra microtask flush + an explicit `=== 0` baseline rather than just asserting the count didn't change between two synchronous reads.)
Deferred to PR-2
Note 1 — `aborted` + `promoted` shape
This is a contract simplification that affects PR-2's branching logic. Doing it here would change the result shape AGAIN before PR-2 lands, and PR-2 may want to weigh `aborted: false` vs an explicit `if (result.promoted)` check upstream of the cancel/timeout copy. Tracked as design question 7 in #3831 — PR-2 will resolve this along with the caller-side branching logic.
Test plan
Related
cc @tanzhenxin