fix(core): throttle shell tool live text updates#3902
Conversation
There was a problem hiding this comment.
Pull request overview
This PR reduces high-frequency React re-renders caused by unthrottled plain-text streaming output from the shell tool, while preserving responsive ANSI/PTY updates and ensuring the final ToolResult still contains complete command output.
Changes:
- Initialize
lastUpdateTimetoNumber.NEGATIVE_INFINITYso the first live chunk always emits. - Throttle subsequent plain-text
datachunks toOUTPUT_UPDATE_INTERVAL_MS, while letting ANSI (AnsiOutputarray) chunks flow through at full (already service-throttled) rate. - Add a vitest case verifying text-stream throttling emits only periodic updates while still showing the latest chunk.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/core/src/tools/shell.ts | Throttles live text data updates using OUTPUT_UPDATE_INTERVAL_MS while preserving immediate ANSI updates. |
| packages/core/src/tools/shell.test.ts | Adds a fake-timers test to validate throttled text streaming behavior and “latest output wins” semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
| // carries the complete output after command completion. | ||
| shouldUpdate = | ||
| Array.isArray(event.chunk) || | ||
| Date.now() - lastUpdateTime > OUTPUT_UPDATE_INTERVAL_MS; |
There was a problem hiding this comment.
[Critical] This throttle is leading-edge only, so the latest suppressed plain-text output is never flushed if no later chunk arrives after the interval. For example, a long-running command can print an important status or prompt within the 1s window and then go quiet; cumulativeOutput is updated, but the live UI remains stuck on the previous text until another chunk arrives or the command exits.
Consider scheduling a trailing flush for suppressed string chunks and clearing/rescheduling it on subsequent updates and completion/abort/binary transition. A regression test should cover line 2 being suppressed, no further chunks arriving, and advancing timers causing updateOutput('line 2').
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Fixed in commit 8fcfc88. Added a trailingFlushTimer (setTimeout) for suppressed plain-text chunks that fires after the remaining throttle window. It is cancelled and rescheduled on each suppressed chunk, cancelled on a leading-edge update that fires immediately, and cancelled on command completion/abort. A new test 'should flush the last suppressed text chunk when the command goes quiet' covers the exact scenario in the report: line 2 is suppressed, no further chunks arrive, timers advance, and updateOutput('line 2') is called.
| @@ -783,6 +783,39 @@ describe('ShellTool', () => { | |||
| }); | |||
| await promise; | |||
| }); | |||
There was a problem hiding this comment.
[Suggestion] The new test covers plain string chunks, but it does not cover the new Array.isArray(event.chunk) branch that intentionally keeps ANSI AnsiOutput chunks responsive inside the throttle window. A future regression could accidentally throttle PTY/ANSI updates while this test still passes.
Consider adding a test that sends two ANSI-shaped data events back-to-back without advancing fake time and asserts both trigger updateOutput, with the second call containing { ansiOutput, totalLines, totalBytes }.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Added in commit 8fcfc88: 'should pass ANSI chunks through immediately without throttling'. The test sends two ANSI-shaped data events back-to-back without advancing fake time and asserts both trigger updateOutput, with the second call containing the { ansiOutput, totalLines, totalBytes } shape.
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Targeted fix that throttles plain-text streaming chunks once per second while leaving ANSI (PTY) chunks at full rate. Initializing lastUpdateTime to negative infinity correctly ensures the first chunk emits immediately, and the pattern matches the existing binary_progress branch a few lines below as well as the consumer-side throttle in the shell command processor. The trade-off where the last text chunk can be up to a second stale on mid-execution cancellation is acceptable — on normal completion the rendered output comes from the final accumulated ToolResult, not the throttled live preview.
The numbers in the PR (200 renders → 10 renders for a 200-tick loop) make the React-render cost concrete. Clean, minimal diff with the right scope.
Verdict
APPROVE — Correct, minimal fix on a real hot path.
yiliang114
left a comment
There was a problem hiding this comment.
Reviewed the latest revision again. The trailing flush now covers the quiet-output case from the earlier review, and the ANSI pass-through path is covered as well. The remaining behavior is scoped to live preview throttling only; the final ToolResult still carries the complete output.
I also ran the targeted shell test locally: cd packages/core && npx vitest run src/tools/shell.test.ts — 96 tests passed. Looks good to me.
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] binary_progress 没有 trailing flush,与 text data 行为不一致
binary_progress(shell.ts ~line 685)只使用简单时间门控,没有 trailing flush。如果 binary progress events 在 throttle 窗口内 burst 到达,中间状态被静默丢弃。与 text data events 的 trailing-flush 行为不一致。
虽然 binary 进度只是 advisory UI 状态,但建议至少添加注释说明这是有意为之(或如果 binary 进度显示需要准确,添加 trailing flush 保持一致)。
— glm-5.1 via Qwen Code /review
| // ANSI output is already throttled and semantically deduped by | ||
| // ShellExecutionService, so preserve its live responsiveness. | ||
| // Plain text data can arrive in bursts and does not need every | ||
| // chunk to force a React render; the final ToolResult still |
There was a problem hiding this comment.
[Critical] ANSI / binary 分支未取消 pending trailingFlushTimer
binary_detected、binary_progress 和 ANSI chunk 分支在设置 shouldUpdate = true 时,不会取消 pending 的 trailingFlushTimer。只有 leading-edge text 分支(line ~663)会取消。当 text chunk 被 throttle 后,如果后续到达 binary 或 ANSI 事件,stale timer 会触发冗余的 doUpdate() 调用。
更严重的是,binary_progress 的 throttle 窗口可能被 stale timer 的 lastUpdateTime 更新所扭曲:stale timer 在 T=1000 触发后设置 lastUpdateTime=1000,导致 T=1001 的 binary_progress 事件被错误 suppress(1001-1000=1 < 1000),实际上它应该通过(1001-0 > 1000)。
Reproduction: text@T=0 → leading edge → text@T=100 → throttled, timer@T=1000 → binary_progress@T=500 → shouldUpdate=false → timer@T=1000 fires stale doUpdate() → lastUpdateTime=1000 → binary_progress@T=1001 incorrectly suppressed.
建议在 doUpdate() 中统一取消 timer:
| // chunk to force a React render; the final ToolResult still | |
| const doUpdate = () => { | |
| if (trailingFlushTimer !== null) { | |
| clearTimeout(trailingFlushTimer); | |
| trailingFlushTimer = null; | |
| } | |
| if (!updateOutput) return; | |
| if (typeof cumulativeOutput === 'string') { | |
| updateOutput(cumulativeOutput); | |
| } else { | |
| updateOutput({ | |
| ansiOutput: cumulativeOutput, | |
| totalLines, | |
| totalBytes, | |
| ...(this.params.timeout != null && { | |
| timeoutMs: this.params.timeout, | |
| }), | |
| }); | |
| } | |
| lastUpdateTime = Date.now(); | |
| }; |
— glm-5.1 via Qwen Code /review
| }); | ||
| await promise; | ||
| }); | ||
|
|
There was a problem hiding this comment.
[Critical] 缺少 trailingFlushTimer 在命令完成时取消的测试
三个新测试分别覆盖 text throttle cycle、text trailing flush fire 和 ANSI passthrough,但没有测试验证当命令完成时有 pending trailing flush timer 的场景。
这是最关键的 timer 生命周期不变量:如果 clearTimeout 在 line ~731 被意外移除,stale timer 会在命令完成后触发 doUpdate(),在已经返回的 ToolResult 之后产生 phantom updateOutput 调用。
建议添加测试:
- 发送 leading-edge text chunk
- 发送 suppressed text chunk(timer pending)
- 不 advance time,直接 resolve execution promise
- 断言
updateOutputMock只被调用 1 次(leading edge only) - advance time past interval,断言仍然是 1 次(确认 timer 被取消)
— glm-5.1 via Qwen Code /review
| clearTimeout(trailingFlushTimer); | ||
| trailingFlushTimer = null; | ||
| } | ||
| } else { |
There was a problem hiding this comment.
[Suggestion] Timer rescheduling 开销:每个 suppressed chunk 都 clear + recreate timer
else 分支在每次 suppressed chunk 时都 clearTimeout + setTimeout。由于 callback 通过闭包读取 cumulativeOutput(按引用),所有 timer 在同一 throttle 窗口内的 fire time 相同(lastUpdateTime + OUTPUT_UPDATE_INTERVAL_MS),只需调度一次即可。
| } else { | |
| } else { | |
| // Only schedule once per window; the callback reads | |
| // cumulativeOutput by reference, so it always emits the latest. | |
| if (trailingFlushTimer === null) { | |
| const remaining = | |
| OUTPUT_UPDATE_INTERVAL_MS - (Date.now() - lastUpdateTime); | |
| trailingFlushTimer = setTimeout(() => { | |
| trailingFlushTimer = null; | |
| doUpdate(); | |
| }, remaining); | |
| } | |
| } |
— glm-5.1 via Qwen Code /review
| let totalBytes = 0; | ||
| let trailingFlushTimer: ReturnType<typeof setTimeout> | null = null; | ||
|
|
||
| const doUpdate = () => { |
There was a problem hiding this comment.
[Suggestion] doUpdate() 中 lastUpdateTime 更新在 updateOutput guard 之后
当 updateOutput 为 falsy 时,lastUpdateTime 永远不更新(保持 Number.NEGATIVE_INFINITY),导致 throttle 实质失效——每个 chunk 都通过 throttle 检查并调用 doUpdate()(虽然最终是 no-op)。
当前 ShellTool.canUpdateOutput 始终为 true,所以实际无影响,但将 lastUpdateTime 移到 guard 之前更健壮:
| const doUpdate = () => { | |
| const doUpdate = () => { | |
| lastUpdateTime = Date.now(); | |
| if (!updateOutput) return; | |
| if (typeof cumulativeOutput === 'string') { | |
| updateOutput(cumulativeOutput); | |
| } else { | |
| updateOutput({ | |
| ansiOutput: cumulativeOutput, | |
| totalLines, | |
| totalBytes, | |
| ...(this.params.timeout != null && { | |
| timeoutMs: this.params.timeout, | |
| }), | |
| }); | |
| } | |
| }; |
— glm-5.1 via Qwen Code /review
|
|
||
| const result = await resultPromise; | ||
|
|
||
| // Cancel any pending trailing flush — the command has settled and the |
There was a problem hiding this comment.
[Suggestion] 无 try/finally 保护 trailingFlushTimer 清理
Timer 清理在 await resultPromise 之后,没有 try/finally。如果 resultPromise reject(理论上不太可能,因为 ShellExecutionService.execute() 始终 resolve),timer 会泄漏并触发 stale doUpdate()。
建议用 try/finally 包裹:
let result;
try {
result = await resultPromise;
} finally {
if (trailingFlushTimer !== null) {
clearTimeout(trailingFlushTimer);
trailingFlushTimer = null;
}
}— glm-5.1 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Suggestion (测试覆盖): 新增测试覆盖了 happy path,但以下关键场景缺失:
- 同一节流窗口内 3+ 快速文本块的 timer 取消/重调度
- 命令在有挂起 timer 时完成(
resolveExecutionPromise在 timer 触发前调用) - leading-edge 更新取消挂起 trailing flush
- abort/exception 路径的 timer 清理
— deepseek-v4-pro via Qwen Code /review
| lastUpdateTime = Date.now(); | ||
| }; | ||
|
|
||
| const { result: resultPromise, pid } = await ShellExecutionService.execute( |
There was a problem hiding this comment.
[Critical] trailingFlushTimer 在异常路径中泄漏
ShellExecutionService.execute() 可能抛出异常(例如 PTY 动态 import 失败),此时不会执行到 await resultPromise 之后的 timer 清理代码(line 731)。被泄漏的 timer 后续会以过时的 cumulativeOutput 调用 doUpdate(),在命令已失败后产生僵尸 UI 更新。
| const { result: resultPromise, pid } = await ShellExecutionService.execute( | |
| const { result: resultPromise, pid } = await ShellExecutionService.execute( | |
| commandToExecute, | |
| cwd, | |
| // ...callback... | |
| ).catch((err) => { | |
| if (trailingFlushTimer !== null) { | |
| clearTimeout(trailingFlushTimer); | |
| trailingFlushTimer = null; | |
| } | |
| throw err; | |
| }); |
— deepseek-v4-pro via Qwen Code /review
| // chunk to force a React render; the final ToolResult still | ||
| // carries the complete output after command completion. | ||
| if (Array.isArray(event.chunk)) { | ||
| shouldUpdate = true; |
There was a problem hiding this comment.
[Suggestion] ANSI 路径未取消挂起的 trailingFlushTimer
当纯文本块先被节流(调度了 trailing flush),随后 ANSI 块到达并立即触发 doUpdate(),挂起的 timer 之后会再次调用 doUpdate() 产生重复渲染。leading-edge 文本分支已正确处理(line 664-667),但 ANSI 分支缺失同样的清理。
| shouldUpdate = true; | |
| if (Array.isArray(event.chunk)) { | |
| shouldUpdate = true; | |
| if (trailingFlushTimer !== null) { | |
| clearTimeout(trailingFlushTimer); | |
| trailingFlushTimer = null; | |
| } | |
| } |
— deepseek-v4-pro via Qwen Code /review
| }, remaining); | ||
| } | ||
| break; | ||
| case 'binary_detected': |
There was a problem hiding this comment.
[Suggestion] binary_detected 路径未取消挂起的 trailingFlushTimer
与 ANSI 路径相同的问题:binary_detected 设置 shouldUpdate = true 但未取消之前在节流窗口内调度的 trailing flush timer,导致重复渲染。
| case 'binary_detected': | |
| case 'binary_detected': | |
| isBinaryStream = true; | |
| cumulativeOutput = '[Binary output detected. Halting stream...]'; | |
| shouldUpdate = true; | |
| if (trailingFlushTimer !== null) { | |
| clearTimeout(trailingFlushTimer); | |
| trailingFlushTimer = null; | |
| } | |
| break; |
— deepseek-v4-pro via Qwen Code /review
| clearTimeout(trailingFlushTimer); | ||
| trailingFlushTimer = null; | ||
| } | ||
| } else { |
There was a problem hiding this comment.
[Suggestion] trailingFlushTimer 在 abort 信号触发时未清除
当 combinedSignal 触发(超时或用户取消)时,没有代码清除 timer。在信号触发与 resultPromise 最终 resolve 的时间窗口内,timer 可能触发并调用 updateOutput(),产生过时的输出闪现。
| } else { | |
| combinedSignal.addEventListener('abort', () => { | |
| if (trailingFlushTimer !== null) { | |
| clearTimeout(trailingFlushTimer); | |
| trailingFlushTimer = null; | |
| } | |
| }, { once: true }); |
— deepseek-v4-pro via Qwen Code /review
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Re-reviewed against HEAD. The original approve stands — this re-review picks up the trailing-flush mechanism and the timer-lifecycle concerns raised after the first pass.
None of the four "[Critical]" timer-leak claims survive a careful trace. The trailing flush timer is only ever scheduled inside the output callback, which can't fire until ShellExecutionService.execute() has handed back its handle, and the result promise is resolve-only on every code path (including aborts and spawn failures), so the synchronous clearTimeout after await resultPromise always wins. The lastUpdateTime placement concern is moot when updateOutput is undefined — there is nothing to throttle, so "throttle disabled" is a non-event.
The one valid follow-up is small: ANSI and binary_* branches fire doUpdate() without cancelling a pending trailing timer, so a text → ANSI → quiet sequence can produce one redundant render. Worth a one-line cleanup in a follow-up. The pre-existing binary_progress path also lacks a trailing flush — same rationale you applied to text would apply there. And the new tests don't yet cover "trailing timer is cancelled when the command settles or aborts" — useful invariants to lock in.
Verdict
APPROVE — narrow, real win on render churn for chatty commands; the lifecycle concerns don't hold up under tracing.
|
@wenshao Addressed in 9da27c9. Both the missing test scenarios from your latest review and the inline timer-lifecycle issues (from this review and the earlier glm-5.1 review) are now covered: Code-side hardening (
New tests (
All 101 tests in |
The previous lastUpdateTime = Date.now() at function entry meant the first 'data' chunk's check (Date.now() - lastUpdateTime > INTERVAL) was always false on first invocation, but shouldUpdate=true was set unconditionally — so every text chunk forced a React render. Initialize lastUpdateTime to NEGATIVE_INFINITY so the first chunk always emits, then throttle subsequent text chunks to OUTPUT_UPDATE_INTERVAL_MS. ANSI (Array<>) chunks are already throttled and deduped by ShellExecutionService and continue to update at full rate. Final ToolResult still carries the complete output after command completion — only the live preview is throttled. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao CHANGES_REQUESTED (2026-05-07T22:42): the leading-edge-only throttle
left the last suppressed plain-text chunk unshown if the command went quiet
within the 1s window (e.g. a status line printed once and then no more output).
Fix: when a plain-text chunk is suppressed, schedule a setTimeout for the
remaining window duration that calls the existing doUpdate() helper. The timer
is cancelled if a subsequent leading-edge update arrives first (preventing a
redundant render), or when the command settles via await resultPromise.
The do-update logic is extracted into a local doUpdate() helper to avoid
duplicating the string/ANSI branching between the immediate path and the timer.
Test changes:
- Updated existing throttle test to reflect new 3-call sequence: leading-edge
('line 1'), trailing flush ('line 2'), leading-edge ('line 3').
- Added 'trailing flush' test: leading update fires, next chunk suppressed,
time advances → trailing flush emits the last suppressed chunk.
- Added 'ANSI passthrough' test: two back-to-back ANSI chunks both trigger
updateOutput immediately (ANSI branch bypasses throttle, regression guard).
Generated with AI
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Use fg/bg string fields and remove non-existent properties (strikethrough, hidden, blink, foreground, background) so the object literals satisfy the AnsiToken interface and tsc passes. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Centralizes trailing-flush timer cancellation in `doUpdate()` so every emit path (leading-edge text, ANSI passthrough, binary_detected, binary_progress) supersedes a pending timer instead of leaving a stale one that could double-fire. Adds an abort listener so user-cancel / timeout cancels the timer before the result settles, and wraps both `ShellExecutionService.execute()` and `await resultPromise` in try/finally so a thrown PTY import or rejected result still tears down the timer + listener. Adds five regression tests covering the lifecycle invariants flagged in review: - 3+ rapid suppressed chunks coalesce into one trailing flush - command settling cancels a pending trailing-flush timer - leading-edge update path produces no duplicate trailing flush - abort signal cancels a pending trailing-flush timer - execute() rejection cleans up listeners (no late updateOutput) Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
9da27c9 to
226bb17
Compare
wenshao
left a comment
There was a problem hiding this comment.
No issues found. Re-reviewed against HEAD (226bb17) — all previously raised concerns (timer lifecycle, trailing flush, abort/exception cleanup, ANSI passthrough) have been addressed. Typecheck and lint clean, 179/179 tests pass. LGTM ✅
— glm-5.1 via Qwen Code /review
Summary
ShellToolInvocationinitializeslastUpdateTime = Date.now()at the top ofexecute()and then setsshouldUpdate = trueunconditionally on every text data chunk. The throttle checkif (Date.now() - lastUpdateTime > OUTPUT_UPDATE_INTERVAL_MS)only gatesbinary_progressevents — not text-dataevents — so every plain-text shell chunk forces a React render.Concretely, for a command like `for i in $(seq 1 200); do echo "tick $i"; sleep 0.05; done`:
main(no throttle)The fix:
lastUpdateTime = Number.NEGATIVE_INFINITYso the first chunk always emits.datachunks toOUTPUT_UPDATE_INTERVAL_MS(1000 ms).Array<>) chunks at full rate — those are already throttled and semantically deduped byShellExecutionService, so live PTY apps like `htop` continue to update responsively.ToolResultstill carries the complete output after command completion — only the live preview is throttled.Why split out
Extracted from #3663 (umbrella TUI flicker / streaming stability PR). This piece is purely
packages/core— zeroink/reactcoupling, unaffected by the in-flight ink 7 upgrade. Independently testable (vitest fake timers) and independently revertible.Test evidence
The new test in
packages/core/src/tools/shell.test.tsis the canonical evidence — deterministic, no fake-environment, no network:```ts
it('should throttle live text updates while preserving the latest output', async () => {
const invocation = shellTool.build({ command: 'npm test', is_background: false });
const promise = invocation.execute(mockAbortSignal, updateOutputMock);
// 1st chunk: emits immediately (NEGATIVE_INFINITY guard)
mockShellOutputCallback({ type: 'data', chunk: 'line 1' });
expect(updateOutputMock).toHaveBeenCalledOnce();
expect(updateOutputMock).toHaveBeenLastCalledWith('line 1');
// 2nd chunk back-to-back: SWALLOWED by throttle
mockShellOutputCallback({ type: 'data', chunk: 'line 2' });
expect(updateOutputMock).toHaveBeenCalledOnce(); // still 1, not 2
// Advance past the throttle interval
await vi.advanceTimersByTimeAsync(OUTPUT_UPDATE_INTERVAL_MS + 1);
// 3rd chunk: emits, and carries the LATEST content
mockShellOutputCallback({ type: 'data', chunk: 'line 3' });
expect(updateOutputMock).toHaveBeenCalledTimes(2);
expect(updateOutputMock).toHaveBeenLastCalledWith('line 3');
...
});
```
Three deterministic assertions per case:
NEGATIVE_INFINITYinitialization fixes the original off-by-one.Verification
cd packages/core && npx vitest run src/tools/shell.test.ts— 94 / 94 pass (1 new + existing).npm run typecheck --workspace=@qwen-code/qwen-code-core— clean.npm run lint --workspace=@qwen-code/qwen-code-core— clean.Follow-ups
This is one of a series of small PRs splitting out independent fixes from #3663. Sibling splits already up: