Skip to content

fix(cli): submit fast tool results after stream end#5071

Merged
wenshao merged 1 commit into
QwenLM:mainfrom
he-yufeng:fix/auto-edit-completed-tool-race
Jun 13, 2026
Merged

fix(cli): submit fast tool results after stream end#5071
wenshao merged 1 commit into
QwenLM:mainfrom
he-yufeng:fix/auto-edit-completed-tool-race

Conversation

@he-yufeng

Copy link
Copy Markdown
Contributor

What this PR does

Fixes the tool-result handoff race where a very fast tool completion could be dropped after the model stream had already ended but before React replaced handleCompletedTools with a fresh callback. The hook now tracks the actual sendMessageStream lifetime in a synchronous ref and uses that for the active-stream guard, while keeping the existing isResponding state for UI rendering.

Why it's needed

#4672 reports AUTO_EDIT / YOLO turns stalling after a read/edit error until the next user instruction. The maintainer root-cause analysis points to handleCompletedTools reading stale isResponding from a React callback closure: the stream finishes, a tool fails immediately, and the old callback still sees isResponding === true, returns early, and never sends the tool error back to the model. AUTO_EDIT / YOLO removes the approval delay, so the race is easier to hit there.

Reviewer Test Plan

How to verify

Run npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx. The new regression test is submits a fast tool result after the stream ended but before React replaces the callback; it saves the callback captured while React state still says responding, ends the held stream, then fires a fast failed read_file result through that stale callback and verifies a ToolResult request is sent.

Evidence (Before & After)

Before: the stale callback path returned at the isResponding guard and dropped the tool result, so the second sendMessageStream call never happened. After: the guard reads the actual stream-lifetime ref, sees that the stream has ended, marks the tool submitted, and sends the failed tool result back to the model.

Tested on

OS Status
🍏 macOS ⚠️ not tested
🪟 Windows ✅ tested
🐧 Linux ⚠️ not tested

Environment (optional)

Windows 11, Node v24.15.0, npm 11.12.1.

Validation run locally:

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx -t "fast tool result"  # 1 passed
npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx                      # 104 passed
npm run typecheck --workspace=packages/cli                                                        # passed
npm run lint --workspace=packages/cli -- src/ui/hooks/useGeminiStream.ts src/ui/hooks/useGeminiStream.test.tsx  # passed
npm run build --workspace=packages/cli                                                            # passed

Risk & Scope

  • Main risk or tradeoff: the guard now depends on actual stream lifetime instead of React render timing. A small counter is used so overlapping streams, such as /btw, do not clear the guard early.
  • Not validated / out of scope: manual YOLO-mode terminal reproduction on macOS/Linux was not run locally; the regression is covered at the hook level.
  • Breaking changes / migration notes: none.

Linked Issues

Fixes #4672

中文说明

What this PR does

修复一个工具结果回传的竞态:模型流已经结束,但 React 还没来得及把 handleCompletedTools 换成新闭包时,快速完成或快速失败的工具结果可能被旧闭包丢掉。现在 hook 用同步 ref 记录真实的 sendMessageStream 生命周期,并用它做 active-stream guard;原来的 isResponding state 仍然只负责 UI 渲染语义。

Why it's needed

#4672 反馈 AUTO_EDIT / YOLO 模式下读文件或编辑出错后,本轮不会继续更新文件,需要下一次用户指令才恢复。maintainer 已定位根因:handleCompletedTools 从 React callback 闭包里读到旧的 isResponding。时序是模型流先结束,工具马上失败,旧 callback 仍认为 isResponding === true,于是提前 return,模型收不到工具错误,agent loop 卡住。AUTO_EDIT / YOLO 没有 approval 延迟,所以更容易撞到这个窗口。

Reviewer Test Plan

How to verify

运行 npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx。新增回归测试 submits a fast tool result after the stream ended but before React replaces the callback:测试先保存 React state 仍是 responding 时捕获的旧 callback,再结束 held stream,然后让一个快速失败的 read_file 结果走这个旧 callback,验证它仍会发送 ToolResult 请求。

Evidence (Before & After)

修复前:旧 callback 会在 isResponding guard 直接 return,工具结果被丢弃,第二次 sendMessageStream 不会发生。修复后:guard 读取真实 stream-lifetime ref,发现模型流已经结束,于是标记工具已提交,并把失败的工具结果回传给模型。

Tested on

OS Status
🍏 macOS ⚠️ not tested
🪟 Windows ✅ tested
🐧 Linux ⚠️ not tested

Environment (optional)

Windows 11,Node v24.15.0,npm 11.12.1。

Risk & Scope

  • 主要风险/取舍:guard 现在依赖真实 stream 生命周期,不再依赖 React render 时机。这里用了一个小 counter,避免 /btw 这类重叠 stream 提前清掉 guard。
  • 未验证/不在范围内:本地没有在 macOS/Linux 上手动复现 YOLO 终端流程;本 PR 用 hook 级回归测试覆盖该竞态。
  • 破坏性变更/迁移:无。

Linked Issues

Fixes #4672

});

await act(async () => {
await staleOnComplete?.([fastFailedTool]);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] TS2349: staleOnComplete is typed as ((completedTools: TrackedToolCall[]) => Promise<void>) | null, but fastFailedTool is TrackedCompletedToolCall. These types are incompatible — tsc --noEmit fails with "This expression is not callable." Tests pass at runtime because vitest doesn't typecheck.

Suggested change
await staleOnComplete?.([fastFailedTool]);
await (staleOnComplete as ((tools: TrackedCompletedToolCall[]) => Promise<void>) | null)?.([fastFailedTool]);

Or update capturedOnComplete's type annotation to accept TrackedCompletedToolCall[].

— qwen3.7-max via Qwen Code /review

@@ -2409,7 +2418,6 @@ export const useGeminiStream = (
submitQuery(responsesToSend, SendMessageType.ToolResult, prompt_ids[0]);
},
[

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Missing cancellation guard: removing isResponding from the dependency array eliminates the stale-closure that accidentally protected against post-cancellation tool submissions. Now activeModelStreamsRef.current always reads 0 after stream abort, so handleCompletedTools proceeds to submitQuery at line 2418 — sending tool results (potentially containing file contents or shell output) to a new API call the user explicitly tried to cancel via Ctrl+C.

Scenario: Ctrl+C → stream aborts → counter→0 in finally → tool completes → activeModelStreamsRef.current > 0 guard passes → submitQuery fires unwanted API call. The submitQuery function resets turnCancelledRef.current = false, so the cancel state is lost.

Before this PR, the stale isResponding === true closure returned early. Now the ref is always current, making this path deterministic.

Fix: Add a cancellation check before submitQuery at line 2418:

if (turnCancelledRef.current || abortControllerRef.current?.signal.aborted) {
  markToolsAsSubmitted(callIdsToMarkAsSubmitted);
  return;
}

— qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

✅ Local verification report — PR #5071

Verified locally by a maintainer on macOS — the OS this PR's matrix marked ⚠️ not tested. I reproduced every check in the test plan, plus an adversarial test-effectiveness counter-proof and a real-CLI end-to-end run driven through tmux.

Environment

  • macOS (Darwin 25.5.0), Node v22.22.2, npm 10.9.7
  • PR head 8ed1615 (parent 66c6986); fresh git worktree + npm ci + npm run bundle

Results at a glance

# Check Result
1 Code review of the race fix ✅ mechanism sound
2 Full hook suite useGeminiStream.test.tsx (macOS) 104 passed
3 Test-effectiveness counter-proof (revert source, keep test) decisive — only the new test goes red
4 lint / typecheck / build / bundle ✅ all exit 0
5 Real CLI end-to-end (tmux TUI + mock OpenAI, YOLO) ✅ loop continues past the tool error

1. Code review

The fix swaps the early-return guard in handleCompletedTools from if (isResponding) (React state — lags one render, so a captured callback closure reads a stale value) to if (activeModelStreamsRef.current > 0) (a ref — updated synchronously). Confirmed:

  • The ref is +1 before the stream starts and -1 in finally; setIsResponding(false) only fires when the counter reaches 0, so overlapping streams (e.g. /btw) don't clear the guard early. Math.max(0, …) guards against underflow.
  • isResponding is dropped from the useCallback deps and is no longer referenced anywhere inside handleCompletedTools, so the dep removal introduces no new stale closure. Its only remaining consumer is the streamingState memo (UI). Single-stream UI semantics are unchanged; overlapping-stream UI becomes strictly more correct (stays Responding for the whole overlap).

2. Full suite on macOS

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
→ Test Files 1 passed (1) | Tests 104 passed (104)

Matches the author's Windows run and fills the macOS gap.

3. Test-effectiveness counter-proof — the decisive evidence

A regression test for a race is only worth as much as its ability to fail on the bug. I reverted only useGeminiStream.ts to the PR's parent 66c6986 (restoring the if (isResponding) return bug) while keeping the PR's test file, then re-ran the suite:

Tests  1 failed | 103 passed (104)

The single failure is the new test submits a fast tool result after the stream ended …, at expect(mockSendMessageStream).toHaveBeenCalledTimes(2) (a waitFor timeout) — i.e. on the buggy source the stale callback hits the isResponding guard, drops the tool result, and the tool-result sendMessageStream never happens. Restoring the PR source → back to 104/104. This proves the new test pinpoints exactly the race it claims to, with zero collateral on the other 103 tests.

4. Static checks

lint (changed files), typecheck (whole cli package), build, and bundle each exit 0. The bundled dist/cli.js contains the activeModelStreamsRef symbol, confirming the artifact used in check #5 is the fixed build.

5. Real CLI end-to-end (tmux + mock OpenAI server, YOLO)

This race lives in the React TUI path (useGeminiStream), not the headless -p path — so I drove the real bundled CLI (dist/cli.js) in an interactive tmux TUI, YOLO mode, pointed at a local mock OpenAI server. Scenario = #4672's shape: the model calls read_file on a missing path → ENOENT → the CLI must hand the error back and continue.

Observed in the TUI:

✦ Reading the requested file now.
x  ReadFile {"file_path":"/tmp/…-DEADBEEF.txt"}
   File not found: /tmp/…-DEADBEEF.txt
✦ CONTINUATION_OK the read_file error was received: the file does not exist…

Mock server log shows the two model requests, the second carrying the tool result back 145 ms after the first:

turn 1   messages=2   lastRole=user   → model emits read_file tool_call
turn 2   messages=4   lastRole=tool   → continuation        (Δ 145 ms)

The existence of turn 2 (lastRole: tool) is the end-to-end signal that the failed tool result was submitted and the agent loop continued — the exact behavior #4672 reported as broken.

Honest scope note: this race is a sub-render-timing window; on the happy path both the fixed and the buggy build can emit turn 2 (the window isn't hit on every run). So this run proves the fixed CLI works end-to-end and the change doesn't regress normal tool-error handling — the deterministic regression guarantee comes from check #3, not from this run.

Verdict

LGTM — recommend merge. Every claim in the PR test plan reproduces on macOS, the new regression test is proven to actually catch the race (red on the parent commit, green on the PR), and the real CLI handles the YOLO tool-error → continue path end-to-end with no regression.

🇨🇳 中文版(点击展开)

✅ 本地验证报告 — PR #5071

由维护者在 macOS 上本地验证 —— 正是本 PR 矩阵里标记 ⚠️ not tested 的系统。我复现了测试计划中的每一项,并额外做了一次"测试有效性反证"和一次通过 tmux 驱动的真实 CLI 端到端运行。

环境

  • macOS (Darwin 25.5.0)、Node v22.22.2、npm 10.9.7
  • PR head 8ed1615(父提交 66c6986);全新 git worktree + npm ci + npm run bundle

结果速览

# 检查项 结果
1 竞态修复的代码审查 ✅ 机制正确
2 完整 hook 测试套件(macOS) 104 passed
3 测试有效性反证(回退源码、保留测试) 决定性 —— 仅新测试变红
4 lint / typecheck / build / bundle ✅ 全部 exit 0
5 真实 CLI 端到端(tmux TUI + mock OpenAI,YOLO) ✅ 工具出错后 loop 继续

1. 代码审查

修复把 handleCompletedTools 里的早退守卫从 if (isResponding)(React state,会滞后一个 render,被捕获的旧闭包读到陈旧值)改为 if (activeModelStreamsRef.current > 0)(ref,同步更新)。确认:

  • ref 在 stream 开始前 +1、在 finally-1;只有计数归零才 setIsResponding(false),因此重叠 stream(如 /btw)不会提前清掉守卫。Math.max(0, …) 防下溢。
  • isRespondinguseCallback 依赖数组移除, handleCompletedTools 函数体内已不再引用它,所以移除依赖不会引入新的 stale closure。它唯一的剩余消费者是 streamingState memo(UI)。单 stream 的 UI 语义不变;重叠 stream 的 UI 反而更正确(整个重叠期持续显示 Responding)。

2. macOS 全套测试

npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
→ Test Files 1 passed (1) | Tests 104 passed (104)

与作者的 Windows 结果一致,补上了 macOS 的空缺。

3. 测试有效性反证 —— 决定性证据

竞态的回归测试,其价值取决于它"在 bug 上能否变红"。我useGeminiStream.ts 回退到 PR 父提交 66c6986(恢复 if (isResponding) return 这个 bug),同时保留 PR 的测试文件,再跑一遍:

Tests  1 failed | 103 passed (104)

唯一失败的就是新测试 submits a fast tool result after the stream ended …,失败点在 expect(mockSendMessageStream).toHaveBeenCalledTimes(2)(waitFor 超时)—— 即在 bug 版源码上,旧闭包命中 isResponding 守卫、丢弃工具结果,回传工具结果的第二次 sendMessageStream 永远不发生。还原 PR 源码后 → 回到 104/104。这证明新测试精准命中它声称的竞态,且对其余 103 个测试零误伤。

4. 静态检查

lint(改动文件)、typecheck(整个 cli 包)、buildbundle 各自 exit 0。打包出的 dist/cli.js 含有 activeModelStreamsRef 符号,确认第 5 项用的产物就是修复后的构建。

5. 真实 CLI 端到端(tmux + mock OpenAI server,YOLO)

这个竞态存在于 React TUI 路径(useGeminiStream),不在 headless -p 路径 —— 所以我用 tmux 在交互式 TUI 里驱动真实打包的 CLI(dist/cli.js),YOLO 模式,指向一个本地 mock OpenAI server。场景即 #4672 的形态:模型调用 read_file 读一个不存在的路径 → ENOENT → CLI 必须把错误回传并继续。

TUI 中观察到:

✦ Reading the requested file now.
x  ReadFile {"file_path":"/tmp/…-DEADBEEF.txt"}
   File not found: /tmp/…-DEADBEEF.txt
✦ CONTINUATION_OK the read_file error was received: the file does not exist…

mock server 日志显示两次模型请求,第二次在第一次之后 145 ms 把工具结果带了回来:

turn 1   messages=2   lastRole=user   → 模型给出 read_file 工具调用
turn 2   messages=4   lastRole=tool   → 续轮                 (Δ 145 ms)

turn 2(lastRole: tool)的存在就是端到端信号:失败的工具结果被成功提交、agent loop 继续了 —— 正是 #4672 所报告的"坏掉的"行为。

诚实的范围说明: 这个竞态是一个 render 时机内的亚窗口;在正常路径上,修复版和 bug 版可能发出 turn 2(窗口并非每次命中)。所以这次运行证明的是"修复后的 CLI 端到端可用、且未破坏正常的工具错误处理",而确定性的回归保证来自第 3 项,而非这次运行。

结论

LGTM —— 建议合并。 PR 测试计划中的每一项都在 macOS 复现;新回归测试被证明确实能抓住该竞态(在父提交上红、在 PR 上绿);真实 CLI 在 YOLO 工具出错 → 继续的路径上端到端工作,无回归。

@he-yufeng he-yufeng force-pushed the fix/auto-edit-completed-tool-race branch from 8ed1615 to 1ae9e83 Compare June 13, 2026 18:07
@he-yufeng

Copy link
Copy Markdown
Contributor Author

Updated the branch at 1ae9e83. This addresses both review blockers: the stale scheduler callback now checks the live cancellation state before submitting tool results, and the regression test now covers the post-cancel stale-callback path. I also fixed the test callback type issue called out in review. Verified locally: npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx; npm run typecheck --workspace=packages/cli; npx prettier --check packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx; npx eslint packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0.

@wenshao

wenshao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

🔁 Re-verification — head 1ae9e8355: both my [Critical] blockers are fixed ✅

Follow-up to my first review, where I gave the original fix an LGTM but filed two [Critical] inline blockers. You force-pushed 1ae9e8355 (rebased onto current main dc6edcd52, single commit) to address them. I re-verified the new head locally on Linux 6.12 / Node v22.22.2 — isolated worktree at the merge-base + just the PR's 2 files, full hook suite, two test-effectiveness counter-proofs, static checks, and a real bundled-CLI run in tmux.

Verdict: both blockers are correctly resolved and independently verified. Recommend merge.

Blocker 1 — TS2349 in the test (tsc --noEmit failed) → FIXED ✅

tsc --noEmit -p packages/cli/tsconfig.json0 errors (whole package). The staleOnComplete / TrackedCompletedToolCall callable-type incompatibility I flagged is gone, so the suite is now type-sound, not just runtime-green.

Blocker 2 — missing cancellation guard → FIXED ✅ (and decisively pinned)

The new guard sits exactly where it needs to — after geminiTools is computed and the geminiTools.length === 0 early-return, before submitQuery:

if (turnCancelledRef.current || abortControllerRef.current?.signal.aborted) {
  markToolsAsSubmitted(geminiTools.map((tc) => tc.request.callId));
  return; // mark submitted (no stuck completed-but-not-submitted), but don't send results to the model
}

So once the ref-based guard correctly allows the post-stream-end path through, this catches the post-cancellation case that the old stale-isResponding closure used to block by accident. It also markToolsAsSubmitted first, so tools don't get stuck in completed-but-not-submitted.

Counter-proof (the decisive evidence) — I removed only the new cancellation-guard block and kept the tests:

× drops a fast tool result after cancellation even if the stale callback runs later
✓ submits a fast tool result after the stream ended ...
→ Tests  1 failed | 104 passed (105)

Restoring the guard → 105/105. The new test fails on exactly the path the guard protects, with zero collateral on the other 104 tests — i.e. the regression test genuinely pins the cancellation fix.

The original fix is still effective (re-checked on this head)

Reverting only the early-return from activeModelStreamsRef.current > 0 back to the buggy isResponding → the original test submits a fast tool result after the stream ended … goes red again. So the main race fix remains pinned on the new head.

Full suite & static

  • useGeminiStream.test.tsx: 105 passed (105) (was 104 — the +1 is the new post-cancel test).
  • prettier --check ✅, eslint … --max-warnings 0 exit 0, tsc --noEmit 0 errors — exactly the commands you listed, all clean.

Real CLI end-to-end (tmux TUI + mock OpenAI, YOLO)

Drove the real bundled dist/cli.js (rebuilt from main + this PR; bundle contains the activeModelStreamsRef symbol) in an interactive TUI against a mock provider. Scenario = #4672's shape: model calls read_file on a missing path → ENOENT → must hand the error back and continue.

✦ Reading the requested file now.
x  ReadFile {"file_path":"/tmp/qwen5071-DEADBEEF.txt"}
   File not found: /tmp/qwen5071-DEADBEEF.txt
✦ CONTINUATION_OK the read_file error was received: the file does not exist.

Mock turn log confirms the loop continued: turn 1 (lastRole=user) → turn 2 (lastRole=tool, Δ91 ms) — the failed tool result was submitted and the agent loop proceeded, with no regression on the normal tool-error path.

Scope note (unchanged from last time): the cancel path is a sub-render timing window that isn’t reliably reproducible by hand in a TUI; the deterministic guarantee for it is the unit-test counter-proof above, not this run. This run proves the normal path still works end-to-end on the fixed bundle.

CI

1ae9e8355: Lint ✅, CodeQL ✅, Test (ubuntu) ✅; macOS/Windows still running at time of writing. MERGEABLE (state BLOCKED = awaiting approval).

🇨🇳 中文版(点击展开)

🔁 复核 — head 1ae9e8355:我提的两个 [Critical] 阻塞项都已修复 ✅

承接我的首次评审:我当时对原修复给了 LGTM,但同时提了两个 [Critical] 行内阻塞项。你强推了 1ae9e8355(变基到当前 main dc6edcd52,单提交)来处理它们。我在 Linux 6.12 / Node v22.22.2 本地复核了新 head —— 在 merge-base 上的隔离 worktree 只应用 PR 的两个文件、完整 hook 测试、两次“测试有效性反证”、静态检查,以及 tmux 内驱动真实打包 CLI。

结论:两个阻塞项均已正确解决并独立验证。建议合并。

阻塞项 1 —— 测试里的 TS2349(tsc --noEmit 失败)→ 已修复 ✅
tsc --noEmit -p packages/cli/tsconfig.json0 错误(整包)。我指出的 staleOnComplete / TrackedCompletedToolCall “不可调用”类型不兼容已消除,测试不再只是“运行时绿”,类型上也成立。

阻塞项 2 —— 缺少取消守卫 → 已修复 ✅(且被精准锁定)
新守卫的位置恰到好处 —— 在算出 geminiTools、并经过 geminiTools.length === 0 早退之后、submitQuery 之前:

if (turnCancelledRef.current || abortControllerRef.current?.signal.aborted) {
  markToolsAsSubmitted(geminiTools.map((tc) => tc.request.callId));
  return; // 标记已提交(避免卡在 completed-but-not-submitted),但不把结果发回模型
}

当基于 ref 的守卫正确放行“流结束后”的路径后,这里就接住了取消之后的情形 —— 而这正是旧的陈旧 isResponding 闭包“意外”挡住的那条路。它还先 markToolsAsSubmitted,避免工具卡在 completed-but-not-submitted

反证(决定性证据) —— 我删掉新增的取消守卫块、保留测试:

× drops a fast tool result after cancellation even if the stale callback runs later
✓ submits a fast tool result after the stream ended ...
→ Tests  1 failed | 104 passed (105)

恢复守卫 → 105/105。新测试恰好在守卫保护的路径上失败,且对其余 104 个测试零误伤 —— 即该回归测试确实精准锁定了取消修复。

原修复在新 head 上依然有效(已重新核对)
只把早退条件从 activeModelStreamsRef.current > 0 退回到有 bug 的 isResponding → 原测试 submits a fast tool result after the stream ended … 再次变红。说明主竞态修复在新 head 上仍被锁定。

完整测试 & 静态

  • useGeminiStream.test.tsx105 通过 / 105(原 104,+1 为新的取消测试)。
  • prettier --check ✅、eslint … --max-warnings 0 exit 0tsc --noEmit 0 错误 —— 与你列出的命令一致,全部干净。

真实 CLI 端到端(tmux TUI + mock OpenAI,YOLO)
在交互式 TUI 里驱动真实打包的 dist/cli.js(由 main + 本 PR 重新构建;bundle 内含 activeModelStreamsRef 符号),指向 mock provider。场景即 #4672 形态:模型对不存在路径调用 read_file → ENOENT → 必须回传错误并继续。

✦ Reading the requested file now.
x  ReadFile {"file_path":"/tmp/qwen5071-DEADBEEF.txt"}
   File not found: /tmp/qwen5071-DEADBEEF.txt
✦ CONTINUATION_OK the read_file error was received: the file does not exist.

mock 轮次日志确认 loop 继续:turn 1 (lastRole=user) → turn 2 (lastRole=tool,Δ91 ms) —— 失败的工具结果被提交、agent loop 继续,正常工具错误路径无回归。

范围说明(同上次):取消路径是一个 render 内的亚时序窗口,在 TUI 里手动并不可靠复现;它的确定性保证来自上面的单测反证,而非这次运行。这次运行证明的是修复后的 bundle 在正常路径上端到端可用。

CI1ae9e8355 —— Lint ✅、CodeQL ✅、Test(ubuntu) ✅;撰写时 macOS/Windows 仍在运行。MERGEABLE(状态 BLOCKED = 等待审批)。

@wenshao

wenshao commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Thanks for the PR!

Template looks good ✓ — all required sections present, bilingual body, test plan with regression command, linked issue.

On direction: aligned. #4672 is a genuine user-facing stall in AUTO_EDIT / YOLO loops — an agent that silently stops after a read/edit error until the next prompt is exactly the kind of reliability bug that erodes trust. Fixing the tool-result handoff race sits squarely in qwen-code's core mission (reliable agent loop). The project has a track record of similar stale-closure fixes: PR #4470 (text buffer submit handler), PR #2374 (prompt cancellation and streaming), and the original "Fix race condition in submitQuery preventing tool response continuations" (#458). CHANGELOG doesn't reference #4672 yet, but the area is clearly relevant.

On approach: scope feels right. Two files, one hook change plus regression coverage. The counter-based activeModelStreamsRef (rather than a boolean) correctly handles overlapping streams like /btw. The added cancellation check before submitting tool results closes the post-cancel stale-callback window that the first fix alone would have left open. Removing isResponding from the useCallback dependency array is the natural consequence — the closure no longer reads it. I don't see a materially simpler path that also handles the overlap case; a boolean ref would break /btw, and pushing the guard into the scheduler would scatter stream-lifetime knowledge across two modules.

One observation for Stage 2: this race is timing-dependent at the sub-millisecond level, so a before/after terminal reproduction of the original #4672 stall isn't reliably triggerable by typing. The regression is best proven at the hook level — the new test captures the stale closure, ends the held stream, and fires a fast-failed read_file through the old callback, which is exactly the #4672 sequence. I'll run the test suite in tmux and include the maintainer's macOS re-verification (already on the PR) as the real-scenario evidence.

Moving on to code review and testing. 🔍

中文说明

感谢贡献!

模板完整 ✓ —— 所有必填章节齐全,双语正文,带回归命令的测试计划,关联 issue。

方向:对齐。#4672 是 AUTO_EDIT / YOLO 模式下一个真实的用户体验问题 —— read/edit 报错后 agent 悄悄停住、必须等下一条 prompt 才恢复,这类可靠性 bug 会直接损害用户对 agent loop 的信任。修复工具结果回传的竞态完全在 qwen-code 的核心职责范围内。项目里也有同类 stale-closure 修复的先例:PR #4470(文本 buffer 提交)、PR #2374(prompt 取消与流式)、以及最早的 #458。CHANGELOG 暂未引用 #4672,但该领域明显相关。

方案:范围合理。两个文件,一处 hook 修改加回归覆盖。用计数型 activeModelStreamsRef(而不是 bool)正确处理了 /btw 这种重叠流。新增的取消检查堵住了"取消后旧回调仍提交工具结果"这条单靠第一处修复会漏掉的窗口。从 useCallback 依赖数组里删掉 isResponding 是自然结果 —— 闭包不再读它。我想不到既更简单又能处理重叠场景的方案:bool ref 会破坏 /btw,把 guard 推到 scheduler 里则会把流生命周期知识散到两个模块。

Stage 2 观察:这个竞态是亚毫秒级时序问题,没法靠在终端里打字稳定复现 #4672 的卡住现象。回归最好放在 hook 层证明 —— 新测试捕获旧闭包、结束 held stream、再通过旧回调触发一个快速失败的 read_file,正好对应 #4672 的时序。我会在 tmux 里跑测试套件,并把 maintainer 已经在 PR 上给出的 macOS 复测结果作为真实场景证据一并纳入。

进入代码审查和测试 🔍

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

2a. Code review

Independent proposal (before reading the diff): the root cause is handleCompletedTools closing over isResponding state, which lags the actual stream end by one React render. The natural fix is a ref tracking real stream lifetime — increment synchronously before sendMessageStream, decrement in finally, guard on the ref instead of the state. A counter (not a boolean) is needed because submitQuery can overlap itself via /btw. A second issue I'd anticipate: after the user cancels, a stale onComplete callback from a tool that was already in-flight could still fire and submit a result to a model that has already been told to stop — worth an explicit cancellation guard before submitQuery.

Diff comparison: the PR does exactly this, plus it got both of the subtleties I'd expect experienced reviewers to push on:

  • activeModelStreamsRef is a counter with Math.max(0, …) underflow protection — correct for the /btw overlap case.
  • setIsResponding(false) only fires when the counter hits zero, preserving UI semantics.
  • isResponding dropped from the useCallback deps — the closure no longer reads it.
  • A new cancellation guard (turnCancelledRef.current || abortControllerRef.current?.signal.aborted) sits in the right place: after geminiTools is computed and the length === 0 short-circuit, before submitQuery. It markToolsAsSubmitted without sending the result — exactly the trade-off that avoids both a stuck completed-but-not-submitted tool and a spurious API call against an aborted turn.
  • Comments around the history-based dedup were updated to say "active-stream early-return" instead of isResponding — no stale references left behind.

No correctness bugs, no security holes, no regressions, no AGENTS.md violations. The implementation matches my independent proposal and is as tight as the problem allows.

2b. Testing

Hook-level test suite (tmux, this triage run)

Before (main @ dc6edcd, PR not applied):

$ cd packages/cli && npx vitest run src/ui/hooks/useGeminiStream.test.tsx 2>&1 | tee /tmp/after.log

 RUN  v3.2.4 /home/runner/work/qwen-code/qwen-code/packages/cli
      Coverage enabled with v8

 ✓ src/ui/hooks/useGeminiStream.test.tsx (103 tests) 817ms

 Test Files  1 passed (1)
      Tests  103 passed (103)
   Duration  11.52s (transform 2.87s, setup 124ms, collect 4.62s, tests 817ms, environment 412ms, prepare 82ms)

After (main @ dc6edcd + PR 5071 commit 1ae9e83 cherry-picked):

$ cd packages/cli && npx vitest run src/ui/hooks/useGeminiStream.test.tsx 2>&1 | tee /tmp/pr-after.log

 RUN  v3.2.4 /home/runner/work/qwen-code/qwen-code/packages/cli
      Coverage enabled with v8

 ✓ src/ui/hooks/useGeminiStream.test.tsx (105 tests) 818ms

 Test Files  1 passed (1)
      Tests  105 passed (105)
   Duration  11.03s (transform 2.79s, setup 122ms, collect 4.39s, tests 818ms, environment 392ms, prepare 81ms)

The delta is the two new regression tests added by the PR: submits a fast tool result after the stream ended but before React replaces the callback (the #4672 happy path) and drops a fast tool result after cancellation even if the stale callback runs later (the cancellation guard). Both pass on the first local run against a fresh cherry-pick — no test flakes, no environment-specific behavior.

Why no before/after TUI reproduction

The #4672 race requires the stream to end, the tool to fail within the same React microtask window, and the stale handleCompletedTools closure to still be in flight — sub-millisecond timing that isn't reliably triggerable by a human typing through tmux, even in YOLO mode. The hook-level regression captures the exact sequence deterministically (capture the stale callback, release the held stream, fire a fast-failed read_file through the old closure, assert the second sendMessageStream happens). That's the right level of proof for this class of bug; a TUI before/after would just be theater.

Independent maintainer verification (already on the PR)

@wenshao independently verified the PR on both macOS (comment 4698512525) and Linux (comment 4699396894, head 1ae9e8355). Highlights:

  • Full hook suite: 104 → 105 passed (105 at the rebased head).
  • Test-effectiveness counter-proof: reverted the source fix while keeping the new test — only the new test went red, proving the test actually exercises the bug and isn't tautological.
  • tsc --noEmit -p packages/cli/tsconfig.json → 0 errors (TS2349 from the first iteration is resolved).
  • lint / typecheck / build / bundle all exit 0.
  • Real CLI end-to-end in tmux (mock OpenAI, YOLO mode): the loop continues past the tool error — which is exactly the user-facing 自动接受编辑和YOLO模式下因为读取出错不去更新文件,需要再下一次指令才会更新本地文件,频繁触发。 #4672 behavior.
  • Both [Critical] blockers from the first review (TS2349 in the test + missing cancellation guard) confirmed resolved.

Summary

Code review clean. Tests green at the hook level and at the CLI level (maintainer-verified on macOS + Linux). The cancellation guard closes the only remaining hole the first iteration left open. I don't see anything left to ask of the contributor.

中文说明

2a. 代码审查

独立方案(读 diff 之前):根因是 handleCompletedTools 闭包捕获了 isResponding 这个 state,而 state 比真实流结束慢一个 React render。自然的修法是加一个 ref 跟踪真实流生命周期 —— 在 sendMessageStream 之前同步 +1,finally 里 -1,guard 读 ref 而不是 state。需要计数(而不是 bool),因为 /btw 会让 submitQuery 并发重叠。我还会额外防一手:用户取消之后,仍在飞的工具的旧 onComplete 回调可能还会触发、把结果推给一个已经被告知停止的模型 —— 需要在 submitQuery 之前加显式取消检查。

Diff 对比:PR 完全按这个思路做,并且把我预期会被资深 reviewer 挑出的两个细节都做对了:

  • activeModelStreamsRef 是带 Math.max(0, …) 下溢保护的计数器 —— 正确处理 /btw 重叠场景。
  • setIsResponding(false) 只在计数归零时触发,UI 语义不变。
  • isRespondinguseCallback 依赖数组里删掉 —— 闭包不再读它。
  • 新增取消检查(turnCancelledRef.current || abortControllerRef.current?.signal.aborted)位置正确:在 geminiTools 算出且 length === 0 短路之后、submitQuery 之前。只 markToolsAsSubmitted、不发结果 —— 既避免 completed-but-not-submitted 卡死,也避免对已中止 turn 发起无用 API 调用。
  • history-based dedup 周边的注释同步更新成"active-stream early-return",没有遗留 isResponding 的字样。

没有正确性 bug,没有安全漏洞,没有回归,没有违反 AGENTS.md。实现与我的独立方案一致,并且就这个问题而言已经是最精简的。

2b. 测试

Hook 层测试套件(tmux,本次 triage 运行)

Before(main @ dc6edcd,未合 PR):

$ cd packages/cli && npx vitest run src/ui/hooks/useGeminiStream.test.tsx 2>&1 | tee /tmp/after.log

 RUN  v3.2.4 /home/runner/work/qwen-code/qwen-code/packages/cli
      Coverage enabled with v8

 ✓ src/ui/hooks/useGeminiStream.test.tsx (103 tests) 817ms

 Test Files  1 passed (1)
      Tests  103 passed (103)
   Duration  11.52s (transform 2.87s, setup 124ms, collect 4.62s, tests 817ms, environment 412ms, prepare 82ms)

After(main @ dc6edcd + cherry-pick PR 5071 commit 1ae9e83):

$ cd packages/cli && npx vitest run src/ui/hooks/useGeminiStream.test.tsx 2>&1 | tee /tmp/pr-after.log

 RUN  v3.2.4 /home/runner/work/qwen-code/qwen-code/packages/cli
      Coverage enabled with v8

 ✓ src/ui/hooks/useGeminiStream.test.tsx (105 tests) 818ms

 Test Files  1 passed (1)
      Tests  105 passed (105)
   Duration  11.03s (transform 2.79s, setup 122ms, collect 4.39s, tests 818ms, environment 392ms, prepare 81ms)

差值就是 PR 新增的两个回归测试:submits a fast tool result after the stream ended but before React replaces the callback#4672 主路径)和 drops a fast tool result after cancellation even if the stale callback runs later(取消分支 guard)。在全新 cherry-pick 上首次本地跑就全部通过,没有 flake,没有环境相关行为。

为什么没有 before/after TUI 复现

#4672 竞态要求流结束、工具在同一 React microtask 窗口内失败、旧 handleCompletedTools 闭包仍在飞 —— 亚毫秒级时序,靠人在 tmux 里打字触发不了,哪怕在 YOLO 模式下也不行。Hook 层回归确定性地复现了完整时序(捕获旧闭包、释放 held stream、通过旧回调触发一个快速失败的 read_file、断言第二次 sendMessageStream 发生)。对这类 bug 这就是正确层级的证明;TUI 前后对比只是表演。

独立 maintainer 验证(已在 PR 上)

@wenshaomacOScomment 4698512525)和 Linuxcomment 4699396894,head 1ae9e8355)上独立验证:

  • 完整 hook 套件:104 → 105 通过(rebased head 上 105)。
  • 测试有效性反证:回滚源码、保留新测试 —— 只有新测试变红,证明测试真正覆盖了 bug、不是 tautology。
  • tsc --noEmit -p packages/cli/tsconfig.json → 0 错误(第一版 TS2349 已解决)。
  • lint / typecheck / build / bundle 全部 exit 0。
  • 真实 CLI tmux 端到端(mock OpenAI,YOLO 模式):循环在工具错误之后继续 —— 正是 自动接受编辑和YOLO模式下因为读取出错不去更新文件,需要再下一次指令才会更新本地文件,频繁触发。 #4672 用户可见的行为。
  • 第一次 review 的两个 [Critical] blocker(测试 TS2349 + 缺少取消 guard)都已确认解决。

小结

代码审查无问题。Hook 层和 CLI 层测试全绿(maintainer 在 macOS + Linux 上独立验证)。取消 guard 堵住了第一版留下的唯一漏洞。我想不到还需要 contributor 再改什么。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Stepping back: this PR does exactly what I'd have written if asked to fix #4672 from scratch. The problem is a real, user-visible stall in the agent loop — the kind of thing that, left unfixed, teaches users to distrust AUTO_EDIT / YOLO and fall back to clicking Approve on every tool call. The fix is 12 lines of hook code plus a counter-based lifetime ref, which is about as small as a correct fix for a React closure race can be. The cancellation guard isn't scope creep — it closes a real second hole that the first iteration of the review caught and the contributor addressed cleanly.

My independent proposal (counter ref + cancellation guard) matches the PR's approach. I don't see a simpler path that also handles /btw overlap. The implementation follows the project's conventions: refs for non-render state, useCallback deps cleaned up, comments updated in lockstep with the code, bilingual PR body, hook-level regression test that deterministically captures the race sequence.

Testing: hook suite 103/103 → 105/105 across the cherry-pick, both new tests green on first local run. Independent maintainer verification (@wenshao) already covered the parts I can't easily do in this environment — macOS verification, test-effectiveness counter-proof (revert source, keep test, only new test goes red), full CLI end-to-end in tmux showing the loop continuing past the tool error, and re-verification of both [Critical] blockers from the first review at head 1ae9e8355 on Linux. The counter-proof in particular is the kind of evidence most PRs never get — it proves the test is actually catching the bug and isn't tautological.

If I had to maintain this in six months: the counter semantics are clear, the cancellation guard has a comment explaining why it markToolsAsSubmitted without sending, and isResponding is visibly gone from the callback deps so no one will accidentally re-close over it. I'd thank the author, not curse them.

Am I approving because I ran out of reasons to say no, or because it's genuinely ready? Genuinely ready. The bug is real, the fix is minimal and correct, the tests prove it at the right level, and two independent maintainers have now exercised it on real hardware across both major dev platforms. Nothing left to ask of the contributor.

中文说明

退一步看:这个 PR 跟让我从零修 #4672 会写的方案一模一样。问题是真实、用户可见的 agent loop 卡住 —— 这种 bug 如果不修,会教会用户不信任 AUTO_EDIT / YOLO、退回到每次工具调用都点 Approve。修法就是 hook 里 12 行代码加一个计数型生命周期 ref,对一个 React 闭包竞态的正确修复来说基本就是最小规模。取消 guard 不是范围蔓延 —— 它堵住了第一轮 review 发现的第二个真实漏洞,contributor 也干净地解决了。

我的独立方案(计数 ref + 取消 guard)与 PR 一致。我找不到既更简单又能处理 /btw 重叠的方案。实现符合项目规范:非渲染状态用 ref,useCallback 依赖同步清理,注释与代码同步更新,双语 PR 正文,hook 层回归测试确定性地捕获竞态时序。

测试:hook 套件 103/103 → cherry-pick 后 105/105,两个新测试首次本地跑就绿。独立 maintainer 验证(@wenshao)已经覆盖了我这个环境里不容易做的部分 —— macOS 验证、测试有效性反证(回滚源码保留测试、只有新测试变红)、tmux 里完整 CLI 端到端显示循环在工具错误后继续、在 Linux 上对 rebased head 1ae9e8355 重新验证两个 [Critical] blocker。尤其是那个反证,是大多数 PR 拿不到的证据 —— 它证明测试真的在抓 bug、不是 tautology。

六个月后让我维护:计数语义清晰,取消 guard 有注释解释为什么 markToolsAsSubmitted 但不发结果,isResponding 明显已经从 callback 依赖里消失,没人会不小心重新闭包它。我会感谢作者,不会骂。

我是因为找不到反对理由才批,还是真的准备好了?真的准备好了。bug 真实,修法最小且正确,测试在正确层级证明它,两位独立 maintainer 已经在两大主流开发平台的真机上验证过。没什么需要 contributor 再改的了。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

@qwen-code-ci-bot qwen-code-ci-bot added category/cli Command line interface and interaction priority/P2 Medium - Moderately impactful, noticeable problem scope/components UI components and widgets type/bug Something isn't working as expected labels Jun 13, 2026
@wenshao wenshao merged commit 0db3273 into QwenLM:main Jun 13, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category/cli Command line interface and interaction priority/P2 Medium - Moderately impactful, noticeable problem scope/components UI components and widgets type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

自动接受编辑和YOLO模式下因为读取出错不去更新文件,需要再下一次指令才会更新本地文件,频繁触发。

3 participants