Skip to content

fix(cli): drop tool calls after cancellation#5020

Merged
wenshao merged 1 commit into
QwenLM:mainfrom
he-yufeng:fix/drop-cancelled-tool-calls
Jun 13, 2026
Merged

fix(cli): drop tool calls after cancellation#5020
wenshao merged 1 commit into
QwenLM:mainfrom
he-yufeng:fix/drop-cancelled-tool-calls

Conversation

@he-yufeng

Copy link
Copy Markdown
Contributor

What this PR does

When an interactive turn is cancelled after the stream has already yielded a tool-call request, the pending tool calls are now discarded before the turn returns to idle. The final scheduling step also skips dispatch if the stream signal has been aborted.

Why it's needed

This fixes #5016. Previously, a cancellation could stop the visible response while still leaving a collected tool call to be scheduled after the stream finished. That made a cancelled turn capable of running tool work from the interrupted response.

Reviewer Test Plan

How to verify

Start an interactive turn that receives a tool call and cancel the turn before the tool dispatches. The cancellation message should be shown, the UI should return to idle, and the tool call from that cancelled response should not run. The added regression test exercises the same ordering by yielding a tool-call request followed by a cancellation event and asserting that no tool scheduling occurs.

Evidence (Before & After)

Before: a tool-call request collected before cancellation could still be passed to the tool scheduler when stream processing finished. After: cancellation clears queued tool calls, and aborted stream processing does not schedule remaining tool calls.

Tested on

OS Status
🍏 macOS ⚠️ not tested
🪟 Windows ✅ tested
🐧 Linux ⚠️ not tested

Environment (optional)

Windows PowerShell, Node/npm from the repository development environment.

Risk & Scope

  • Main risk or tradeoff: this intentionally drops tool calls that were produced by a response the user cancelled before dispatch.
  • Not validated / out of scope: macOS/Linux manual TUI verification; CI should cover the cross-platform test matrix.
  • Breaking changes / migration notes: none.

Linked Issues

Fixes #5016

中文说明

What this PR does

交互式回合在 stream 已经产出 tool call request 之后被用户取消时,现在会先丢弃尚未调度的 tool call,再回到 idle。最终调度阶段也会在 stream signal 已经 abort 时跳过 tool dispatch。

Why it's needed

修复 #5016。之前取消操作可能停止可见回复,但已经收集到的 tool call 仍然会在 stream 结束后被调度,导致一个已经取消的回合继续执行被中断响应里的工具操作。

Reviewer Test Plan

How to verify

启动一个会收到 tool call 的交互式回合,并在工具调度前取消。界面应该显示取消信息、回到 idle,并且不执行这个已取消响应里的 tool call。新增回归测试用同样的事件顺序模拟:先 yield tool-call request,再 yield cancellation event,并断言没有发生 tool scheduling。

Evidence (Before & After)

Before:取消前已经收集到的 tool-call request 仍可能在 stream 处理结束时传给 tool scheduler。After:取消会清空待调度 tool call,并且 abort 后的 stream 处理不会继续调度剩余 tool call。

Tested on

Windows 本地已验证;macOS / Linux 留给 CI 矩阵覆盖。

Environment (optional)

Windows PowerShell,使用仓库开发环境中的 Node/npm。

Risk & Scope

  • 主要风险或取舍:这里会主动丢弃用户取消前由同一响应产出的、尚未调度的 tool call。
  • 未验证 / 非本次范围:macOS/Linux 手动 TUI 验证;跨平台测试由 CI 覆盖。
  • 破坏性变更 / 迁移说明:无。

Linked Issues

Fixes #5016

expect(result.current.streamingState).toBe(StreamingState.Idle);
});

it('should drop queued tool calls when user cancels the turn', async () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Test only validates guard 1 (toolCallRequests.length = 0 on UserCancelled), not guard 2 (!signal.aborted).

When the stream yields ToolCallRequestUserCancelled, the first guard empties the array. By the time execution reaches if (toolCallRequests.length > 0 && !signal.aborted), the length check is already false!signal.aborted is never evaluated. This test would pass even if guard 2 were removed.

Guard 2 covers the scenario where the client-side abort fires but the stream drains without yielding UserCancelled (e.g., async iterator buffer race). Consider adding a complementary test:

it('should not schedule tool calls when signal is aborted without UserCancelled', async () => {
  mockSendMessageStream.mockReturnValue(
    (async function* () {
      yield {
        type: ServerGeminiEventType.ToolCallRequest,
        value: { callId: 'call_1', name: 'write_file', args: { path: 'a.txt' } },
      };
      // Stream ends normally — no UserCancelled event
    })(),
  );

  const { result } = renderTestHook();

  // Simulate client-side abort (e.g., user presses Escape in UI)
  // The abort signal should prevent scheduleToolCalls even though
  // no UserCancelled event was yielded by the stream.

  await act(async () => {
    await result.current.submitQuery('abort without cancel event');
  });

  expect(mockScheduleToolCalls).not.toHaveBeenCalled();
});

— qwen3.7-max via Qwen Code /review

@he-yufeng he-yufeng force-pushed the fix/drop-cancelled-tool-calls branch from f953729 to 8fcfb00 Compare June 12, 2026 10:59
@he-yufeng

Copy link
Copy Markdown
Contributor Author

Thanks, added a second regression for the !signal.aborted guard as well. The new case queues a tool call, cancels the request before the stream finishes, and verifies that queued tool calls are still not scheduled after the aborted signal path.

Validated locally:

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
npx prettier --check packages/cli/src/ui/hooks/useGeminiStream.test.tsx
npx eslint packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0
npm run typecheck --workspace=packages/cli
git diff --check

@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Runtime verification — real TUI, tmux-driven (maintainer merge reference)

Verdict: PASS — built PR head f9537294 locally and drove the actual TUI through cancel-during-tool-call. The fix deterministically eliminates the scheduling of a cancelled turn's tool calls, with a clean before/after against current main, and does not regress normal tool execution.

Claim (my read of the 2-line src diff)

A tool call is only materialized by the OpenAI converter when the finish_reason chunk arrives → Turn yields ToolCallRequest, which useGeminiStream collects into toolCallRequests. If cancellation lands after that collection but before stream processing finishes, Turn throws on the aborted signal and yields UserCancelled. On main, the UserCancelled branch does not clear toolCallRequests, and the post-loop if (toolCallRequests.length > 0) still calls scheduleToolCalls — so the cancelled response's tool runs (issue #5016). The PR adds two guards: clear toolCallRequests on UserCancelled, and skip the final dispatch when signal.aborted. The diff matches this description.

Method

  • PR worktree built locally (npm install && npm run build); real TUI (packages/cli/dist/index.js --approval-mode yolo) in tmux with an isolated $HOME.
  • A local mock OpenAI server made the otherwise-microsecond race window deterministic. The stream is shaped:
    content → (150ms) → tool_calls delta → (150ms) → finish_reason + usage → [6s stall] → [DONE].
    The pipeline merges usage into the held finish chunk and yields it immediately, so ToolCallRequest is collected at the start of the 6s stall while stream processing is still awaiting [DONE]. Pressing ESC anywhere in that stall reproduces exactly the "collected, not yet scheduled" state Qwen Code executes a tool after cancellation #5016 needs — turning a race into a controllable 6-second window.
  • The mock's tool is run_shell_command (mirroring the issue's reproducer) writing a proof file; the mock logs every stream phase with timestamps.

Steps

  1. Normal path, PR build (no cancel) — tool schedules and runs end-to-end:

    > please write the demo file
    ✦ Writing the demo file now...
    ╭─ ✓  Shell printf 'TOOL EXECUTED for request #13' > .../cancelled-proof.txt (write demo proof file) ─╮
    ✦ Tool result received. Done. (#14)
    

    Proof file written. The !signal.aborted guard does not regress normal dispatch.

  2. ❌→ Cancel during stall, main baseline (5854e28), 6/6 rounds — the cancelled turn still schedules the tool:

    > please write the demo file
    ✦ Writing the demo file now...
    ● Request cancelled.
    ╭─ -  Shell printf 'TOOL EXECUTED for request #1' > .../cancelled-proof.txt (write demo proof file) ─╮
    

    scheduled-after-cancel = 6/6. The Shell card appearing after "Request cancelled." is scheduleToolCalls firing on a cancelled turn — Qwen Code executes a tool after cancellation #5016.

  3. Cancel during stall, PR build, 6/6 rounds — identical timing, no tool card:

    > please write the demo file
    ✦ Writing the demo file now...
    ● Request cancelled.
    [back to idle — no Shell card]
    

    scheduled-after-cancel = 0/6. The collected tool call is dropped; the turn ends cleanly.

  4. 🔍 Cancel UI unaffected — "Request cancelled." renders and the prompt returns to idle in every round on both builds; the PR changes only whether a collected tool is dispatched, not the cancellation UX itself.

main (5854e28) PR (f953729)
Normal tool execution (no cancel) runs runs (✓ no regression)
Tool scheduled after cancel 6/6 rounds 0/6 rounds
Cancel UI ("Request cancelled." → idle) ok ok

Findings

  • ⚠️ The race is real but extremely narrow without help. My first attempt — rapid-fire ESC during a normal-speed tool stream — hit 0/8, because an abort severs the SSE connection before the provider's finish chunk lands, so the tool never materializes (a cancel that early is correctly a no-op on both builds). The bug only surfaces when the tool call is already collected and dispatch hasn't happened yet. I made that state observable by stalling the mock stream after usage but before [DONE]. This is provider-timing control, not a change to qwen — the same ordering the issue's Docker+fuzzing reproducer hits by brute force (signal_after_ms, 595 hits).
  • The shell side-effect file did not land in my macOS runs even on main (file = miss 6/6) — the downstream executor re-checks the signal before spawning, so the proof file isn't written. But the scheduling is what the PR fixes and what's observable: the Shell card appears on main and is gone on the PR. The issue's reproducer (Docker, CPU-limited, sleep-based command) does land the side-effect; the scheduling-vs-execution distinction is environmental, and the PR correctly cuts it off at the scheduling step.
  • I drove cancellation with ESC (qwen's TUI cancel key → "Request cancelled."); the issue used SIGINT. Both abort the active turn's signal and funnel through the same useGeminiStream cancellation path, so the fix is exercised identically.

Build: npm install && npm run build on the PR worktree, clean. Tests not re-run locally (CI covers them); this is runtime evidence only.

中文版(验证报告)

运行时验证 — tmux 驱动真实 TUI(合并参考)

结论:PASS — 本地构建 PR head f9537294,在真实 TUI 里走完"流式 tool call 期间取消"。修复确定性地消除了"已取消回合仍调度其 tool call"的问题,与当前 main 形成干净的前后对照,且不破坏正常的工具执行。

对这 2 行源码改动的理解

tool call 只有在 finish_reason chunk 到达时才被 OpenAI converter 物化 → Turn yield ToolCallRequestuseGeminiStream 收集进 toolCallRequests。如果取消发生在"已收集"之后、"流处理结束"之前,Turn 因 abort 抛错并 yield UserCancelled。在 main 上,UserCancelled 分支不清空 toolCallRequests,而循环结束后的 if (toolCallRequests.length > 0) 仍会调用 scheduleToolCalls —— 于是已取消响应里的工具被执行(issue #5016)。PR 加了两道防线:UserCancelled 时清空数组 + signal.aborted 时跳过最终调度。diff 与此一致。

方法

  • PR worktree 本地构建,真实 TUI 在 tmux 中运行,$HOME 隔离。
  • 本地 mock OpenAI 服务器把本来微秒级的竞态窗口变确定。流的形状:
    content → (150ms) → tool_calls delta → (150ms) → finish_reason + usage → [6 秒 stall] → [DONE]
    pipeline 把 usage 合并进 hold 住的 finish chunk 并立即 yield,所以 ToolCallRequest 在 6 秒 stall 一开始就被收集,而流处理还在等 [DONE]。在这个 stall 里任意时刻按 ESC,就精确重现 Qwen Code executes a tool after cancellation #5016 所需的"已收集、未调度"状态 —— 把竞态变成可控的 6 秒窗口。
  • mock 的工具是 run_shell_command(对齐 issue 复现器),写一个证据文件;mock 给每个流阶段打了时间戳。

步骤

  1. 正常路径,PR 构建(不取消) —— 工具端到端调度并执行:✓ Shell 卡片 + 文件写出 + 第二轮 "Tool result received. Done."。!signal.aborted 防线不破坏正常调度。
  2. ❌→ stall 期间取消,main 基线(5854e28),6/6 轮 —— 已取消回合仍调度工具:"Request cancelled." 之后出现 - Shell 卡片 = scheduleToolCalls 在已取消回合上触发(Qwen Code executes a tool after cancellation #5016)。scheduled-after-cancel = 6/6
  3. stall 期间取消,PR 构建,6/6 轮 —— 完全相同的时序,无工具卡片,直接回 idle。scheduled-after-cancel = 0/6。已收集的 tool call 被丢弃。
  4. 🔍 取消 UI 不受影响 —— 两个构建每轮都正常显示 "Request cancelled." 并回到 idle;PR 只改变"已收集的工具是否被调度",不改取消本身的交互。
main (5854e28) PR (f953729)
正常工具执行(不取消) 执行 执行(✓ 无回归)
取消后工具被调度 6/6 轮 0/6 轮
取消 UI("Request cancelled." → idle) 正常 正常

观察

  • ⚠️ 竞态真实存在但极窄。 我最初的尝试 —— 在正常速度的工具流期间狂按 ESC —— 0/8 命中,因为 abort 会在 provider 的 finish chunk 到达之前就切断 SSE 连接,工具根本没物化(这么早的取消在两个构建上都正确地无操作)。bug 只在"tool call 已收集、调度尚未发生"时出现。我通过让 mock 在 usage 之后、[DONE] 之前 stall,把这个状态变得可观测。这是对 provider 时序的控制,不是改 qwen —— 与 issue 的 Docker+fuzzing 复现器靠暴力命中(signal_after_ms,595 次)的是同一个时序。
  • shell side-effect 文件在我的 macOS 环境下即使在 main 上也没落地(file = miss 6/6) —— 下游 executor 在 spawn 前又检查了 signal,所以证据文件没写。但 PR 修的是调度,可观测的也是调度:main 上出现 Shell 卡片、PR 上消失。issue 的复现器(Docker、限 CPU、基于 sleep 的命令)能让 side-effect 落地;"调度 vs 执行"的差别是环境性的,PR 正确地在调度这一步就切断了它。
  • 我用 ESC(qwen TUI 的取消键 → "Request cancelled.")驱动取消;issue 用的是 SIGINT。两者都 abort 当前回合的 signal 并走同一条 useGeminiStream 取消路径,所以修复被等价地触发。

构建:PR worktree 上 npm install && npm run build 干净通过。本地未重跑测试(CI 已覆盖);本报告只提供运行时证据。

@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

While reviewing this fix I checked whether the same race exists on the other entry points. Finding: the ACP session path (and therefore the daemon, which multiplexes sessions onto a qwen --acp child) has the same class of race, with a narrower but real window. Out of scope for this PR — the TUI fix here is correct and self-contained — but worth tracking as a follow-up.

What ACP already does right: inside the stream loop, every event checks the prompt signal and bails before anything is dispatched (Session.ts#L1156) — so a cancel that lands mid-stream correctly drops the collected functionCalls. That's the equivalent of this PR's first seatbelt.

What's missing is the second seatbelt: after the stream ends naturally, there are await points before dispatch — e.g. await this.messageEmitter.emitUsageMetadata(...) (L1238) yields the event loop, which is exactly where a session/cancel JSON-RPC notification can be processed. The dispatch chain that follows never consults the signal:

  • if (functionCalls.length > 0) → runToolCalls(...) has no abort gate, at any of its four call sites (L1245, L1510, L2059, L2368)
  • runToolCalls() (L2707) has no entry check
  • runTool() (L2805) has no entry check either — the only abortSignal.aborted reads in the whole chain are post-execution status reporting (cancelled vs error)

So a cancel that lands in the post-stream window still dispatches every collected tool call. The signal is passed into tool execution, so well-behaved tools self-abort, but the dispatch itself is not gated — a tool that doesn't poll the signal runs to completion with side effects. Within a sequential batch, cancelling mid-batch also doesn't stop the remaining not-yet-started calls from being built and dispatched.

The daemon inherits all of this unchanged (the bridge multiplexes sessions onto one qwen --acp child — see the architecture note around bridge.ts#L130), and arguably amplifies it: a REST cancel travels HTTP → bridge → ACP child, so the user-perceived "I cancelled before it ran" gap is wider than a local ESC, and unattended daemon sessions won't notice the extra tool run.

Suggested follow-up fix (symmetric to this PR): an aborted gate at runToolCalls() entry (covers all four dispatch sites at once), plus one at runTool() entry so a mid-batch cancel stops the not-yet-started remainder. Both are one-liners returning the same cancelled-shape responses the post-execution path already produces.

中文说明

复查这个修复时顺带验证了其他入口是否有同款竞态。结论:ACP session 路径存在同类竞态(窗口更窄但真实),daemon 完全继承——bridge 将多个 session 复用到一个 qwen --acp 子进程上,跑的就是同一份 Session 代码。

  • ACP 做对的部分:流循环内逐事件检查 signal(L1156),流中途取消会正确丢弃已收集的 functionCalls——等价于本 PR 的第一道保险。
  • 缺的是第二道:流自然结束后、派发前有 await 点(如 L1238 的 emitUsageMetadata 会让出事件循环,session/cancel 恰好可在此被处理),而后续整条派发链不再查 signal:四处派发点(L1245/L1510/L2059/L2368)、runToolCalls() 入口(L2707)、runTool() 入口(L2805)均无检查,全链路仅有的 aborted 读取是执行后的状态上报。
  • 后果:取消落在这个窗口时工具照常全部派发;signal 虽传入执行层,但不主动轮询 signal 的工具会带副作用跑完。批内串行执行时,中途取消也不会阻止尚未启动的后续工具被 build + 派发。
  • daemon 放大了该窗口:REST 取消需经 HTTP → bridge → ACP 子进程三跳,且无人值守场景多跑的工具更难被察觉。

建议 follow-up(与本 PR 对称):runToolCalls() 入口加一道 aborted 检查(一次覆盖四处派发点),runTool() 入口再加一道(覆盖批内中途取消)。两处都是一行短路,复用既有的 cancelled 响应形态。

@he-yufeng he-yufeng force-pushed the fix/drop-cancelled-tool-calls branch from 8fcfb00 to 8f27f5e Compare June 13, 2026 19:00
@he-yufeng

Copy link
Copy Markdown
Contributor Author

Addressed the review suggestion and rebased the branch onto the latest upstream/main.

The PR already had the UserCancelled coverage; the branch now also includes the complementary aborted-signal case: a ToolCallRequest is queued, the client aborts while the stream drains without yielding UserCancelled, and the test asserts that scheduleToolCalls is not called.

Validated on Windows:

  • npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
  • npm run typecheck --workspace=packages/cli
  • npx prettier --check packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx
  • npx eslint packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0
  • git diff --check

Typecheck note: after rebasing, local workspace declaration files were stale. I rebuilt packages/core and packages/acp-bridge, then packages/cli typecheck passed.

@wenshao

wenshao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

🔁 Re-verification — head 8f27f5e0c: my review suggestion is addressed; both guards now pinned ✅

Follow-up to my first runtime verification (PASS) and my inline [Suggestion]: the original test only exercised guard 1 (UserCancelled empties toolCallRequests), so by the time if (toolCallRequests.length > 0 && !signal.aborted) ran, the length check was already false and guard 2 (!signal.aborted) was never evaluated. This force-push adds a dedicated guard-2 regression and rebases onto current main. Re-verified on Linux 6.12 / Node v22.22.2: full hook suite, two test-effectiveness counter-proofs, and a real-TUI run in tmux.

Verdict: the suggestion is correctly addressed — guard 2 now has its own test, decisively pinned — the 2-line fix is intact, it rebases cleanly onto the merged #5071, and the normal tool path is not regressed. Recommend merge.

Rebase / relationship to #5071 (worth noting for merge)

origin/main is now 0db327317 = the squash-merge of #5071 ("submit fast tool results after stream end"). #5020 is rebased on top, so its true diff is just the 2-line guard + 2 tests (GitHub shows useGeminiStream.ts 2+/1-). The two PRs fix different cancellation races and coexist cleanly:

The two guards and their tests

Guard (src) Test
case UserCancelled: … toolCallRequests.length = 0; should drop queued tool calls when user cancels the turn
if (toolCallRequests.length > 0 && !signal.aborted) should not dispatch queued tool calls after the request is aborted (new — my suggestion)

Full suite: 107 passed (107) — 105 inherited from main-with-#5071 + the 2 new #5020 tests.

Counter-proofs — the two tests independently pin the two guards (decisive)

Revert (src only) Result
remove && !signal.aborted (guard 2) should not dispatch queued tool calls after the request is abortedRED; guard-1 test stays green
remove toolCallRequests.length = 0 (guard 1) should drop queued tool calls when user cancels the turnRED; guard-2 test stays green

Each guard reddens exactly one test, with zero collateral on the other 105 — so the guards are not redundant: one covers UserCancelled-without-abort, the other aborted-signal-without-UserCancelled. This is precisely the gap my [Suggestion] flagged, now closed. Restoring → 107/107.

Real-TUI run (tmux + mock OpenAI, --approval-mode yolo)

Mock stream: content → tool_call(run_shell_command) → finish_reason+usage → [6 s stall] → [DONE], so the ToolCallRequest is collected at finish_reason while processing still awaits [DONE] (the #5016 "collected-not-dispatched" window).

  • AFTER, normal path (no cancel): ✦ Writing the demo file now…✓ Shell printf … > cancelled-proof.txtproof file written✦ Tool result received. Done.no regression from the guards.
  • AFTER, cancel mid-stall (ESC): ✦ Writing…● Request cancelled.no Shell card (the collected tool is dropped).

Honest scope note: on current main (post-#5071 + intervening commits) my ESC-during-SSE-stall abort drops the tool on the baseline build too — i.e. I could not reproduce the BEFORE/AFTER runtime divergence here that I got 6/6 on the older base (5854e28) in my first review. #5016 is an extremely narrow timing race (originally surfaced by Docker fuzzing); on today's main the abort interleaving routes around the schedule via the exception path rather than the UserCancelled-after-collection path. The deterministic before/after divergence is the unit counter-proofs above — they exercise the exact event sequences and red/green precisely on each guard. The runtime run here confirms the fix behaves correctly and, importantly, does not regress normal tool execution.

Static & CI

  • prettier --check ✅, eslint … --max-warnings 0 exit 0, tsc --noEmit (cli) ✅ no errors. Working tree byte-identical to PR head after all A/B swaps.
  • CI 8f27f5e0c: Lint ✅, CodeQL ✅, Classify ✅, Test macOS ✅, Test ubuntu ✅; Windows still running at post time.
🇨🇳 中文版(点击展开)

🔁 复核 — head 8f27f5e0c:我的评审建议已落实,两道守卫均被锁定 ✅

承接我的首次运行时验证(PASS)与行内 [Suggestion]:原测试只覆盖了守卫 1UserCancelled 清空 toolCallRequests),等执行到 if (toolCallRequests.length > 0 && !signal.aborted) 时长度已为 0,守卫 2(!signal.aborted)从未被求值。这次强推新增了专门针对守卫 2 的回归,并变基到当前 main。在 Linux 6.12 / Node v22.22.2 复核:完整 hook 测试、两次“测试有效性反证”、tmux 真实 TUI 运行。

结论:建议已正确落实——守卫 2 现在有了自己的测试并被决定性锁定——两行修复完好,干净地变基在已合并的 #5071 之上,且不回归正常工具路径。建议合并。

变基 / 与 #5071 的关系(合并时值得注意)
origin/main 现在是 0db327317 = #5071 的 squash 合并。#5020 变基其上,真实改动只有两行守卫 + 2 个测试(GitHub 显示 useGeminiStream.ts2+/1-)。两个 PR 修的是不同的取消竞态,干净共存:

两道守卫与对应测试

守卫(源码) 测试
case UserCancelled: … toolCallRequests.length = 0; should drop queued tool calls when user cancels the turn
if (toolCallRequests.length > 0 && !signal.aborted) should not dispatch queued tool calls after the request is aborted(新增——我的建议)

完整测试:107 通过 / 107(含 main+#5071 的 105 + 新增 2)。

反证——两个测试各自独立锁定两道守卫(决定性)

仅回退(源码) 结果
去掉 && !signal.aborted(守卫 2) should not dispatch queued tool calls after the request is aborted变红;守卫 1 测试仍绿
去掉 toolCallRequests.length = 0(守卫 1) should drop queued tool calls when user cancels the turn变红;守卫 2 测试仍绿

每道守卫只让对应的一个测试变红,对其余 105 个零误伤——说明两道守卫并不冗余:一个覆盖“UserCancelled 但 signal 未 abort”,另一个覆盖“signal 已 abort 但无 UserCancelled”。这正是我 [Suggestion] 指出的缺口,现已补上。还原 → 107/107。

真实 TUI 运行(tmux + mock OpenAI,--approval-mode yolo
mock 流:content → tool_call(run_shell_command) → finish_reason+usage → [6 秒 stall] → [DONE],因此 ToolCallRequestfinish_reason 时被收集,而处理仍在等 [DONE](即 #5016 的“已收集未派发”窗口)。

  • AFTER 正常路径(不取消): ✦ Writing the demo file now…✓ Shell printf … > cancelled-proof.txt证据文件写出✦ Tool result received. Done.无回归
  • AFTER stall 中取消(ESC): ✦ Writing…● Request cancelled.无 Shell 卡片(已收集的工具被丢弃)。

诚实的范围说明:在当前 main(#5071 合并 + 期间其他提交)上,我的“stall 中 ESC”取消在基线构建上也会丢弃该工具——也就是说,我这次没能复现首评里在更旧基(5854e28)上 6/6 的运行时 BEFORE/AFTER 分化。#5016 是一个极窄的时序竞态(最初由 Docker 模糊测试发现);在今天的 main 上,abort 走的是异常路径而非“收集后再 UserCancelled”路径,从而绕开了派发。确定性的 before/after 分化来自上面的单测反证——它们精确复现事件序列,并对每道守卫精确红/绿。本次运行时运行确认修复行为正确,且关键在于不回归正常工具执行。

静态 & CI

  • prettier --check ✅、eslint … --max-warnings 0 exit 0tsc --noEmit(cli) ✅ 无错误。所有 A/B 替换后工作区与 PR head 字节一致。
  • CI 8f27f5e0c:Lint ✅、CodeQL ✅、Classify ✅、Test macOS ✅、Test ubuntu ✅;撰写时 Windows 仍在运行。

@qqqys qqqys left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical-only re-review: the cancellation scheduling fix remains scoped and the new head 8f27f5e adds independent coverage for the aborted-signal guard as requested. CI is green, and I did not find a new critical blocker in this pass.

@wenshao wenshao merged commit b6b15e4 into QwenLM:main Jun 13, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Qwen Code executes a tool after cancellation

3 participants