feat(cli): display real-time token consumption during streaming (#2742) #3329
…LM#2742)

Show ↓/↑ token count in the spinner during model execution:
- ↓ when receiving content, ↑ when waiting for API response
- Accumulates across the whole turn (tool calls don't reset)
- Includes agent/subagent token consumption
- Uses useAnimationFrame hook (50ms polling) to avoid flickering

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
📋 Review Summary
This PR implements real-time token consumption display in the loading indicator, showing
🔍 General Feedback
🎯 Specific Feedback
🟢 Medium
🔵 Low
✅ Highlights
- Replace unsafe type assertion with proper type guard in Composer
- Fix license header in useAnimationFrame.ts to match project standard
- Clarify tokenCount is replaced (not accumulated) per USAGE_METADATA event
- Use multi-line JSDoc format for isReceivingContent prop
- Improve re-sync comment in useAnimationFrame hook
- Revert unrelated streamingState dep change in AppContainer

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
tanzhenxin
left a comment
Review
Real-time token display in the spinner is a good feature, and the approach is sound (chars/4 estimation, 50ms ref polling to avoid re-renders). The ↑/↓ phase arrows are a nice touch. Two bugs to fix.
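The approach summarized above (chars/4 estimation, a counter that is mutated without re-rendering, and ↑/↓ phase arrows) can be illustrated with a small standalone sketch. This is not the PR's code: the class name `StreamingTokenEstimator`, its methods, the `SubmitType` string values, and the choice of `Math.floor` for rounding are all assumptions made for illustration. Only the chars/4 rule, the reset on non-ToolResult submits, and the arrow semantics come from the discussion.

```typescript
// Hypothetical names throughout; a sketch of the idea, not the PR's implementation.
type SubmitType = "UserQuery" | "ToolResult";

class StreamingTokenEstimator {
  // Plays the role of a ref: mutated on every chunk without causing a re-render.
  private chars = 0;
  // false while waiting for the API, true once content has started arriving.
  private receiving = false;

  // Called when a request is submitted.
  beginRequest(submitType: SubmitType): void {
    this.receiving = false;
    // Only a new user query starts a fresh turn; tool-result continuations keep the count.
    if (submitType !== "ToolResult") {
      this.chars = 0;
    }
  }

  // Called for every streamed content chunk.
  onContent(chunk: string): void {
    this.receiving = true;
    this.chars += chunk.length;
  }

  // Rough estimate of about four characters per token (rounding down is an assumption).
  estimatedTokens(): number {
    return Math.floor(this.chars / 4);
  }

  // Text for the spinner: ↓ while receiving, ↑ while waiting.
  label(): string {
    const arrow = this.receiving ? "↓" : "↑";
    return `${arrow} ${this.estimatedTokens()} tokens`;
  }
}
```

In this sketch a UI layer would read `label()` on its own polling cadence, so the per-chunk updates never trigger rendering by themselves.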
Issues
- Subagent token aggregation mixes incompatible units. The main stream estimates output-only tokens (chars/4), but subagent `tokenCount` comes from `totalTokenCount`, which is input + output. These get summed together, so a subagent with large context dominates the display. Suggestion: use output-only token counts for agents (`candidatesTokenCount`), or display agent tokens separately.
- Subagent multi-round tokens overwritten instead of accumulated. `USAGE_METADATA` is emitted per subagent round, but the code overwrites `tokenCount` instead of accumulating. Multi-round subagents under-report. Suggestion: `display.tokenCount = (display.tokenCount ?? 0) + event.usage.totalTokenCount`.
Verdict
REQUEST_CHANGES — The token aggregation bugs need to be fixed.
Subagent token display had two bugs:
- Used totalTokenCount (input+output) instead of candidatesTokenCount (output-only), causing mixed units when aggregated with main stream
- Overwrote tokenCount per round instead of accumulating, so multi-round subagents only showed the last round's count

Co-Authored-By: Qwen-Coder <noreply@qwen.ai>
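A minimal sketch of the two fixes described in this commit, under stated assumptions: the `UsageMetadata` interface below models only the two fields named in the review, and `createOutputTokenAccumulator` is a hypothetical helper, not a function from the repository. It shows output-only counting (`candidatesTokenCount`) combined with a per-invocation running total held in a closure.

```typescript
// Assumed shape: only the two fields discussed in the review are modeled.
interface UsageMetadata {
  totalTokenCount?: number; // input + output
  candidatesTokenCount?: number; // output only
}

// Hypothetical helper: each call creates a fresh running total for one subagent invocation.
function createOutputTokenAccumulator(): (usage: UsageMetadata) => number {
  let accumulatedOutputTokens = 0;
  return (usage: UsageMetadata): number => {
    // Add this round's output tokens instead of overwriting the previous value.
    // totalTokenCount is deliberately ignored so units match the chars/4 output estimate.
    accumulatedOutputTokens += usage.candidatesTokenCount ?? 0;
    return accumulatedOutputTokens;
  };
}
```

With this shape, a two-round subagent reporting 50 and then 30 output tokens yields a running total of 80, regardless of how large its input context is.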
Both issues fixed in d393f23:
tanzhenxin
left a comment
Thanks for the quick turnaround — both High-severity issues are correctly fixed end-to-end: the unit-mismatch via candidatesTokenCount in agent.ts:497, and multi-round accumulation via the per-invocation accumulatedOutputTokens closure. The commit message on d393f23 is clear about the rationale.
A couple of things I'd still tidy up before merging:
- The jsdoc on `AgentResultDisplay.tokenCount` in `packages/core/src/tools/tools.ts:502` still says "(input + output)"; that's now stale since the value is output-only. Worth fixing in this PR since the commit that landed the accurate semantics is the one that stranded the doc.
- `useAnimationFrame` can briefly render the previous turn's count at the start of a new turn. The ref is reset to `0` in `useGeminiStream.ts:1367` when a non-`ToolResult` submit begins, but the hook's `useState` still holds the previous value until the 50ms tick fires. Simplest fix is to read `watchRef.current` synchronously on render, or key/reset the hook on new top-level turns so the first render starts at `0`.
One design thought for later (not a blocker): the spinner now always shows chars/4 during streaming, whereas the old code used server-reported candidatesTokenCount from sessionStats.metrics. chars/4 is the right call during streaming since usage metadata isn't available yet, but once the first response's usage lands it'd be strictly better to swap to the real count. Happy to track this as a follow-up.
Verdict: comment — fine to merge once (1) and (2) are in, (3) can ship separately.
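One possible shape for the follow-up in (3), sketched under assumptions: the `TurnUsage` interface and `displayedTokens` function are hypothetical, and the split into "confirmed" and "pending" parts is only one way to read "swap to the real count once usage lands". The idea is to use server-reported output tokens for rounds whose usage metadata has arrived and fall back to chars/4 only for the round still streaming.

```typescript
// Hypothetical model of one turn; not taken from the PR.
interface TurnUsage {
  // Server-reported output tokens for rounds whose usage metadata has arrived.
  confirmedOutputTokens: number;
  // Characters streamed so far in the round that is still in flight.
  pendingChars: number;
}

// Real counts where available, chars/4 estimate for the remainder.
function displayedTokens(usage: TurnUsage): number {
  return usage.confirmedOutputTokens + Math.floor(usage.pendingChars / 4);
}
```

A caller following this sketch would move a round's contribution from `pendingChars` to `confirmedOutputTokens` when its usage metadata arrives, so the estimate is only ever shown for in-flight output.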
Interpolate displayed token count toward the real value (3/frame for small gaps, ~20% for medium, 50 for large) so chunked arrivals like tool-call args no longer cause visible jumps.

Also accumulate tool call args JSON length into the streaming estimate, matching Claude Code's input_json_delta handling.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>
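The interpolation rule in this commit message can be sketched as a pure step function. The commit gives the step sizes (3 per frame, about 20%, 50) but not the gap thresholds, so `SMALL_GAP = 15` and `MEDIUM_GAP = 250` below are assumptions, picked so that the three step sizes meet at the boundaries; `nextDisplayed` is a hypothetical name.

```typescript
// Assumed thresholds; the commit message does not state where small/medium/large begin.
const SMALL_GAP = 15;
const MEDIUM_GAP = 250;

// Returns the value to show on the next frame, moving `displayed` toward `target`.
function nextDisplayed(displayed: number, target: number): number {
  // If the real value dropped (for example on reset), follow it immediately.
  if (target <= displayed) return target;

  const gap = target - displayed;
  let step: number;
  if (gap <= SMALL_GAP) {
    step = 3; // small gap: fixed 3 per frame
  } else if (gap <= MEDIUM_GAP) {
    step = Math.ceil(gap / 5); // medium gap: roughly 20% of the remaining distance
  } else {
    step = 50; // large gap: capped at 50 per frame
  }
  // Never overshoot the real value.
  return Math.min(target, displayed + step);
}
```

Calling `nextDisplayed` once per animation frame makes a sudden jump in the real value appear as a short ramp instead of a single large change.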
The 50ms useAnimationFrame poll lived in Composer, causing its entire subtree (InputPrompt, Footer, KeyboardShortcuts) to reconcile 20×/sec during streaming. Combined with the spinner and streamed text deltas, ink redrew enough lines to produce visible terminal flicker.

Move the animation hook into LoadingIndicator so only that component re-renders per frame, and slow polling to 100ms to match the spinner cadence.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>
1. AgentResultDisplay.tokenCount jsdoc said "(input + output)" but the value has been output-only since d393f23; update the comment so it matches the implementation.
2. useAnimationFrame held the previous turn's count in state until the next interval tick, briefly flashing stale numbers when a new turn reset the ref to 0. Snap displayRef down synchronously on render and return Math.min(displayValue, ref.current) so the reset is reflected immediately; the interval tick still catches state up afterward.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>
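The snap-down in item 2 can be modeled without React. In this sketch, `DisplayModel` is a hypothetical stand-in for the hook: its `displayValue` field plays the role of the hook's state (updated only on interval ticks), and `render()` applies the `Math.min(displayValue, ref.current)` clamp from the commit message. The smoothing step is omitted, which is a simplification.

```typescript
// Simplified stand-in for the hook; names and structure are assumptions.
class DisplayModel {
  // Stands in for React state: only changes when the interval tick fires.
  private displayValue = 0;
  private ref: { current: number };

  constructor(ref: { current: number }) {
    this.ref = ref;
  }

  // Interval tick: state catches up with the live ref.
  tick(): void {
    this.displayValue = this.ref.current;
  }

  // Render: clamp the possibly stale state to the live ref,
  // so a reset to 0 is visible before the next tick.
  render(): number {
    return Math.min(this.displayValue, this.ref.current);
  }
}
```

Because the clamp only ever lowers the rendered value, increases still arrive through the tick path, while a reset is reflected on the very next render.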
Thanks for the careful review! Both blockers addressed in e8eaa7c: (1) Stale jsdoc on (2) Previous turn's count flashing on new turn (
This way new turns start at 0 immediately, with no dependency on tick cadence (which became 100ms in e0147e3 to scope token-animation re-renders to (3) Swap to server-reported
wenshao
left a comment
packages/core/src/index.ts

**[Critical]** The root @qwen-code/qwen-code-core barrel no longer re-exports the memory filename helpers from ./memory/const.js, but downstream code in this PR still imports getAllGeminiMdFilenames / setGeminiMdFilename from the package root. This is a public API regression for consumers that resolve through the built package entrypoint.

Suggested fix: restore a root re-export for the memory helpers, for example export * from './memory/const.js';.

_gpt-5.4 via Qwen Code /review_
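@wenshao This Critical looks like a false positive from the AI reviewer. I checked:
So there is no case of "downstream code in this PR still imports ... from the package root", and there is no public API regression. If you can point to the specific import location or a reproduction command, I will check again; otherwise, could you please dismiss this review? (As an aside: on the unit-test side, all files touched by this PR are green (packages/cli 120 tests and the 445 agent-related tests in packages/core all pass).)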
…ealtime-token-display
…s/qwen-code into feat/realtime-token-display
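@qqqys Sorry, that earlier CHANGES_REQUESTED was one I submitted via
The AI reviewer fabricated two premises, "removed re-export" and "added import", neither of which exists. I should have verified this myself before submitting; that is on me, and I have dismissed that review. I also pulled the branch into a worktree and tested it with tmux; the behavior matches the PR description:
I also read through the code: what @tanzhenxin pointed out
AI reviewer false positive; see comment #issuecomment-4275498133 for details. Tested with tmux: the feature works as expected, no blocking comments.
Verification report
Environment: Linux / Node 20+ / tmux 3.5a, PR HEAD
Checked against the PR Test Plan
Code review notes
Conclusion: ready to merge.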
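@tanzhenxin Your earlier follow-up comment already confirmed that both High issues are fixed, but that one was submitted as COMMENTED, so the original CHANGES_REQUESTED is still pending and the branch policy is blocking the merge. Could you click Approve when you have a moment? I will take care of the merge. 🙏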
tanzhenxin
left a comment
Blockers from the prior review are addressed — jsdoc now matches the output-only semantics, and the synchronous displayRef snap-down in useAnimationFrame cleanly resets on new turns. LGTM.
… (#3329)

* feat(cli): display real-time token consumption during streaming (#2742)

Show ↓/↑ token count in the spinner during model execution:
- ↓ when receiving content, ↑ when waiting for API response
- Accumulates across the whole turn (tool calls don't reset)
- Includes agent/subagent token consumption
- Uses useAnimationFrame hook (50ms polling) to avoid flickering

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: address review feedback for real-time token display

- Replace unsafe type assertion with proper type guard in Composer
- Fix license header in useAnimationFrame.ts to match project standard
- Clarify tokenCount is replaced (not accumulated) per USAGE_METADATA event
- Use multi-line JSDoc format for isReceivingContent prop
- Improve re-sync comment in useAnimationFrame hook
- Revert unrelated streamingState dep change in AppContainer

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(core): use output-only tokens and accumulate across subagent rounds

Subagent token display had two bugs:
- Used totalTokenCount (input+output) instead of candidatesTokenCount (output-only), causing mixed units when aggregated with main stream
- Overwrote tokenCount per round instead of accumulating, so multi-round subagents only showed the last round's count

Co-Authored-By: Qwen-Coder <noreply@qwen.ai>

* fix(cli): smooth token counter animation and include tool args

Interpolate displayed token count toward the real value (3/frame for small gaps, ~20% for medium, 50 for large) so chunked arrivals like tool-call args no longer cause visible jumps.

Also accumulate tool call args JSON length into the streaming estimate, matching Claude Code's input_json_delta handling.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>

* fix(cli): scope token animation re-renders to LoadingIndicator

The 50ms useAnimationFrame poll lived in Composer, causing its entire subtree (InputPrompt, Footer, KeyboardShortcuts) to reconcile 20×/sec during streaming. Combined with the spinner and streamed text deltas, ink redrew enough lines to produce visible terminal flicker.

Move the animation hook into LoadingIndicator so only that component re-renders per frame, and slow polling to 100ms to match the spinner cadence.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>

* fix: address review nits on token display

1. AgentResultDisplay.tokenCount jsdoc said "(input + output)" but the value has been output-only since d393f23; update the comment so it matches the implementation.
2. useAnimationFrame held the previous turn's count in state until the next interval tick, briefly flashing stale numbers when a new turn reset the ref to 0. Snap displayRef down synchronously on render and return Math.min(displayValue, ref.current) so the reset is reflected immediately; the interval tick still catches state up afterward.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <noreply@qwen.ai>
Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Cherry-picks upstream qwen-code PR QwenLM#3093, which adds session renaming/deletion + custom-title support. Skips the auto-title-via-LLM piece (depends on un-ported gateway shape) and the vscode-ide-companion files (deleted in our fork).

What's in:
- /rename: prompt for a custom session title; persisted via ChatRecordingService.recordCustomTitle and surfaced in the picker.
- /delete: opens a SessionPicker that calls SessionService.removeSession on selection.
- SessionListItem.customTitle field + readSessionTitleFromFile tail scanner on session file load.
- SessionService.renameSession / getSessionTitle / findSessionsByTitle.
- ACP extMethod handlers for renameSession + deleteSession.
- SessionStart restores session-name tag from the persisted custom title via useInitializationEffects.
- --resume now accepts UUID or title (validation moved to runtime).

Conflict resolution notes:
- Kept HEAD's bg-agent useEffect block; the upstream init useEffect was already extracted into useInitializationEffects, so the customTitle restore goes there with an optional setSessionName arg.
- Kept HEAD's rewind dialog; added the delete dialog as a sibling.
- Kept HEAD's voice/recap state; added sessionName/setSessionName to UIState. Dropped upstream's streamingResponseLengthRef + isReceivingContent (token-display PR QwenLM#3329, un-ported).
- Dropped upstream MemoryDialog import (auto-memory un-ported); kept the i18n t import for the Delete dialog title.

Tests: 29 new tests pass (rename/delete commands, customTitle recording, sessionService rename/find). Resume tests still pass.

Follow-up: auto-title generation (QwenLM#3540) deferred: it depends on a generateSessionTitle path through ContentGenerator that needs adaptation to our gateway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TLDR
Display real-time token consumption in the spinner/loading indicator during model execution. Shows `↓ N tokens` when receiving output and `↑ N tokens` when waiting for API response, with agent/subagent tokens included in the total. Token count accumulates across the whole turn and resets on new user queries.
Screenshots / Video Demo
Dive Deeper
Architecture
The implementation follows claude-code's approach with adaptations for Qwen Code's Ink-based UI:
Data flow:
Key design decisions:
- Ref-based character counting: `streamingResponseLengthRef` accumulates output characters in `handleContentEvent` without triggering React re-renders. Tokens are estimated as `chars / 4`.
- `useAnimationFrame` hook: Polls the ref at 50ms intervals but only triggers `setState` when the value actually changes. This avoids both the flickering from per-delta state updates and the waste of unconditional 50ms re-renders.
- Turn-level accumulation: The character counter resets only on new user queries (`submitType !== ToolResult`), not on tool-result continuations. This matches claude-code's behavior where token count only increases within a turn.
- Phase detection: `isReceivingContent` is set to `false` when entering `submitQuery` (requesting) and `true` on the first content event (responding). This drives the `↑`/`↓` arrow direction.
- Agent token aggregation: The agent tool now forwards `USAGE_METADATA` events to `AgentResultDisplay.tokenCount`. Composer aggregates these from `pendingGeminiHistoryItems` and adds them to the streaming estimate.

Files changed
| File | Change |
|---|---|
| `useAnimationFrame.ts` | |
| `useGeminiStream.ts` | `streamingResponseLengthRef` (char counter) and `isReceivingContent` (phase flag) |
| `UIStateContext.tsx` | `UIState` interface |
| `AppContainer.tsx` | |
| `Composer.tsx` | `useAnimationFrame`, agent token aggregation |
| `LoadingIndicator.tsx` | `isReceivingContent` prop for dynamic `↑`/`↓` arrow |
| `tools.ts` | `tokenCount` to `AgentResultDisplay` |
| `agent.ts` | `USAGE_METADATA` events to display |

Reviewer Test Plan
- Basic streaming: Send a simple prompt and verify `↓ N tokens` appears in the spinner, increasing as output streams.
- Tool calls: Send a prompt that triggers tool use (e.g. "read file X"). Verify:
  - `↑` while waiting for API after tool result
  - `↓` when model resumes output
- New turn reset: After a response completes, send a new prompt. Verify token count resets to 0 (no stale flash from previous turn).
- Agent/subagent: Launch a task that uses the Agent tool. Verify the main spinner includes the subagent's token consumption.
- Narrow terminal: Resize terminal to < 80 columns. Verify tokens are hidden gracefully.
- Cancel: Press Esc during streaming. Verify no errors or stale display.
Testing Matrix
Linked issues / bugs
Closes #2742