feat(cli,core): LLM-generated summary labels for tool-call batches #3538
After each tool batch completes, fire a parallel fast-model call to generate a short git-commit-subject-style label summarizing what the batch accomplished (e.g. "Read txt files", "Searched in auth/"). In compact mode the label replaces the generic "Tool × N" header so N parallel tool calls collapse to a single semantic row. The fast-model call (~1s) runs fire-and-forget, overlapped with the next turn's API stream, so there is no perceived latency. Missing fast model, aborted turns, and model failures all degrade silently to the existing rendering. The summary is also emitted as a `tool_use_summary` history entry with `precedingToolUseIds`, keeping the shape compatible with SDK clients that want to render collapsed tool views on their own. Gated by `experimental.emitToolUseSummaries` (default on). Can be overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`. The system prompt and truncation rules (300 chars per tool field, 200 chars of trailing assistant text as intent prefix) match the existing behavior seen in other tools that emit the same message type, so SDK consumers see a consistent shape across clients.
CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in
cleanSummary because its input comes from an LLM (treated as
uncontrolled). The original regex is anchored and linear in practice,
but tightening the quantifier to {1,10} both satisfies the static
check and caps engine work on pathological model output with a long
run of quotes. Ten opening/closing quotes is well past anything a real
label would produce.
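A minimal sketch of the bounded quote-strip described above. Only the regex shape (`{1,10}` on both anchored alternatives) comes from the commit message; the helper name `stripWrappingQuotes` is hypothetical, and the real `cleanSummary` does more than this.

```typescript
// Hypothetical helper; only the bounded regex shape is taken from the commit message.
// Strips up to ten leading and up to ten trailing quote characters (", ', `).
const WRAPPING_QUOTES = /^["'`]{1,10}|["'`]{1,10}$/g;

function stripWrappingQuotes(label: string): string {
  return label.replace(WRAPPING_QUOTES, '');
}

// Typical model output: a label wrapped in one pair of quotes.
const cleaned = stripWrappingQuotes('"Read txt files"');

// Pathological output: the bound means at most ten quotes are removed per side,
// so a run of twelve leading quotes leaves two behind.
const capped = stripWrappingQuotes('"'.repeat(12) + 'x');
```

Quotes inside the label (for example the apostrophe in `it's`) are untouched because both alternatives are anchored to the start or end of the string.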
…label
The summary was only visible in compact mode because the full-mode ToolGroupMessage ignored the compactLabel prop. Compact mode got away with this because mergeCompactToolGroups triggers refreshStatic(), which re-renders the merged tool_group with its newly-looked-up label. Full mode has no such refresh path, so when the fast-model call resolves *after* the tool_group has been committed to the append-only <Static>, there is no way to retroactively decorate it.
Switch to rendering `tool_use_summary` as its own inline history item (a single dim `● <label>` line). New items append cleanly to <Static>, so the summary flows in naturally once the fast-model call resolves. Compact mode still replaces the merged tool_group header with the label and hides the standalone summary line via the `compactMode` guard.
With this, the feature works under the default `ui.compactMode: false`, not just the opt-in compact view.
wenshao
left a comment
No issues found on the current PR head after re-checking the latest commits. LGTM! ✅ — gpt-5.4 via Qwen Code /review
Three new docs matching the existing fast-model feature docs layout:
- docs/users/features/tool-use-summaries.md: user-facing guide covering full + compact rendering, configuration (settings + env), failure modes, cost, and cross-links to followup-suggestions.
- docs/users/configuration/settings.md: register the new experimental.emitToolUseSummaries setting next to the other fast-model-driven UI settings.
- docs/design/tool-use-summary/tool-use-summary-design.md: deep dive matching the compact-mode-design.md competitive-analysis style. Documents the Claude Code port (prompt, truncation, timing, gate), the deviations (settings layer, default on, cleanSummary, dual render paths), and the Ink <Static> append-only rationale that drove the inline full-mode render vs header-replacement split.
Full-mode rendering of the summary works, but for small same-type batches (Read × 3 and similar) the label visibly restates what the tool lines already show. Pairing with ui.compactMode: true folds the whole batch into a single labeled row, which is the cleanest transcript shape once the label is available. Adds a dedicated section showing the paired settings.json snippet and explicitly calling out when each mode wins (and when to turn the feature off instead).
chiga0
left a comment
Thanks for the thorough PR — the design doc is great and the test coverage is strong. Compact mode really does need something better than the generic Tool × N header. I spent a few rounds looking at edge cases (subagents, single-batch vs multi-batch turns, Ctrl+C mid-flight, partial failures, merge behavior) and left specific findings inline.
Blocking (should fix before merge)
1. In compact mode, single-batch turns (the most common shape) never render a label. mergeCompactToolGroups does not drop trailing tool_use_summary items, so the length-delta check in MainContent doesn't fire refreshStatic, and Ink <Static> never repaints the already-committed tool_group. The design doc's claim that the existing merge refresh path covers compact mode only holds when the turn has ≥2 batches. See inline on mergeCompactToolGroups.ts / MainContent.tsx for a trace.
2. summarySignal becomes an orphan across turn boundaries. It captures the current turn's signal, but submitQuery() right after swaps abortControllerRef.current to a brand-new AbortController. If the user Ctrl+C's the next turn, the captured signal never aborts, the if (!summarySignal.aborted) guard passes, and the summary appears after cancellation.
Worth fixing together
3. getCompactLabel resolves to whichever batch's summary loaded first, not "first contributing batch" as the PR description claims. With fast-model jitter the merged header can visibly flip from batch B's label to batch A's once A resolves.
4. Partially failed/cancelled batches still feed cancelled / error output into the summarizer. cleanSummary only filters labels the model returns with Error: / Unable to prefixes; it can't prevent the model from generating a misleading label from poisoned input (e.g. "Attempted to read …" / "Failed to fix X").
5. Force-expand groups in compact mode (error / confirmation / user-initiated) bypass CompactToolGroupDisplay, so compactLabel is ignored; the standalone ● <label> line is also gated on !compactMode, so these groups get no label at all — arguably the highest-signal case.
Minor
6. Writing historyRef.current = history; directly in the component body is not concurrent-safe; prefer useLayoutEffect.
7. cleanSummary quote-stripping misses Unicode curly quotes (U+2018/19/1C/1D, CJK brackets) and doesn't strip markdown emphasis (**bold**, _italic_). CJK models occasionally wrap outputs in these.
8. Defaulting the feature on diverges from upstream Claude Code's env-only, default-off model. Worth double-checking that getFastModel() doesn't silently fall back to the main model — if it does, the claimed "$0.001/batch" cost profile isn't accurate.
Overall direction is good — please sanity-check #1 with a real compactMode: true + single-batch run (e.g. a single Read). Single-batch compact turns are the primary case this PR is optimizing, and as far as I can tell from a trace-walk they are not actually refreshing today.
Addresses multiple issues from @chiga0's review:
Blocking: compact-mode label invisible for single-batch turns. mergeCompactToolGroups's adjacency-only gating left a trailing tool_use_summary in the merged result whenever there was no second batch to merge across. That kept mergedHistory.length in lock-step with history.length, so MainContent's refreshStatic heuristic (currMLen <= prevMLen) never fired and Ink's append-only <Static> never repainted the tool_group with its newly-looked-up label. Drop tool_use_summary items unconditionally now; gemini_thought still survives to avoid unnecessary repaints. New tests cover the single-batch case and the summary-before-user-message case.
Blocking: stale summary appears after Ctrl+C on the next turn. summarySignal captured the CURRENT turn's AbortController, but the summary resolves during the NEXT turn's streaming window. The next turn's submitQuery allocates a fresh controller, so the captured signal was never aborted, and Ctrl+C during the new turn used to let the previous turn's summary land in the transcript seconds later. Fix: dedicated per-batch AbortController tracked in a ref set, aborted eagerly from cancelOngoingRequest; resolve-time check reads the live abort state and turnCancelledRef.
High: summarizer input pollution. geminiTools contained error/cancelled tools; retry-loop warnings and "Cancelled by user" strings were feeding the fast model. cleanSummary can only reject error-shaped output, not prevent the model from hallucinating a plausible label from bad input (the PR's own tmux screenshot showed "Read txt files · 5 tools" where 4 of the 5 were prior-retry failures). Filter to status === 'success' before building the prompt; skip the call entirely if nothing's left.
High: unstable label on merged groups. getCompactLabel iterated all callIds and returned the first hit, so asynchronous resolution order made the header visibly flip from SB to SA when batch A resolved after batch B. Lock onto item.tools[0].callId to keep stable "leading batch governs" semantics.
High: force-expanded groups in compact mode had no label at all. Compact mode routes non-force-expand groups through CompactToolGroupDisplay (consumes compactLabel) and force-expand groups through the full ToolGroupMessage (ignores compactLabel); the standalone ● line was gated on !compactMode, creating a dead zone in exactly the diagnostically valuable case. MainContent now computes absorbedCallIds (which groups actually consume the header replacement) and passes summaryAbsorbed to HistoryItemDisplay; force-expand groups in compact mode get the standalone line as the label's only path to the screen.
Medium: cleanSummary robustness. Extend quote-strip to Unicode curly + CJK corner brackets; strip markdown emphasis (**bold**, _italic_); broaden refusal-prefix rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 / 抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new cleanSummary tests cover the added cases.
Low: concurrent-rendering safety. Move historyRef.current = history from render phase into useLayoutEffect so bailed renders can't leave a dropped value.
Low: CompactToolGroupDisplay readability. Extract renderSummaryHeader / renderDefaultHeader helpers and document the toolCalls.length > 1 count-suffix guard so a future "fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".
Docs: add Scope & Lifecycle section to tool-use-summaries.md covering (1) one generation per batch shared by both modes, (2) no backfill on toggle / session resume, (3) main-agent batches only with the Task-tool clarification.
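Two of the fixes above are small enough to sketch: filtering the summarizer input to successful calls, and resolving the merged header from the leading call only. The types and the `selectSummarizerInput` name are simplified assumptions for illustration, and the lookup is assumed to be keyed by callId; the real shapes in the PR carry more fields.

```typescript
// Hypothetical, simplified shapes; not the PR's actual types.
type ToolStatus = 'success' | 'error' | 'cancelled';
interface ToolCall { callId: string; name: string; status: ToolStatus; }
interface ToolGroup { tools: ToolCall[]; }

// Only successful calls feed the fast model. Returning null stands for
// "skip the summary call entirely" when nothing is left.
function selectSummarizerInput(tools: ToolCall[]): ToolCall[] | null {
  const ok = tools.filter((t) => t.status === 'success');
  return ok.length > 0 ? ok : null;
}

// "Leading batch governs": look up the label by the first tool's callId only,
// so a later batch resolving first cannot take over the merged header.
function getCompactLabel(
  group: ToolGroup,
  summaries: Map<string, string>,
): string | undefined {
  const leading = group.tools[0];
  return leading ? summaries.get(leading.callId) : undefined;
}

// Merged group: batch A (a1) followed by batch B (b1).
const merged: ToolGroup = {
  tools: [
    { callId: 'a1', name: 'Read', status: 'success' },
    { callId: 'b1', name: 'Grep', status: 'success' },
  ],
};
const summaries = new Map<string, string>([['b1', 'SB']]);
const beforeA = getCompactLabel(merged, summaries); // B resolved first: no label yet
summaries.set('a1', 'SA');
const afterA = getCompactLabel(merged, summaries); // A resolved: header shows SA
```

With this lookup the header goes from the default rendering straight to batch A's label and never shows batch B's label in between.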
Critical: force-expand groups lost their summary entirely. Previous round's "drop tool_use_summary unconditionally" merge fix also stripped summaries for force-expanded groups, defeating the exact case (errors, confirmations, focused shell) where the standalone ● label is the label's only path to the screen. The merge function now takes an absorbedCallIds set: summaries whose preceding callIds are all absorbed by a compact tool_group header are dropped (so refreshStatic still fires), but force-expanded summaries pass through to be rendered standalone by HistoryItemDisplay. MainContent computes absorbedCallIds from raw history and passes it in. New tests cover both the absorbed-drop and the force-expand-preserve cases plus the empty-set default for callers that don't compute absorption.
Suggestion: late-arriving summaries could land out of order. A slow fast-model call could resolve after the next turn's content was committed, planting the ● label between later items in full mode. The resolve callback now captures the first batch callId, locates the corresponding tool_group at resolve time, and drops the summary if a newer tool_group has already appeared in history. New test exercises this with a manually-resolved fast-model promise.
Suggestion: truncateJson allocated full JSON for large strings. A 10MB ReadFile result was being JSON.stringify'd in full only to be sliced down to 300 chars. Added preTruncate that walks the value (depth-bounded to 4) and slices string leaves to maxLength before serialization. Tests verify the input never reaches its full pre-cap form.
Suggestion: settings description over-claimed SDK emission. The description said summaries are emitted to SDK clients as a tool_use_summary message; the SDK plumbing isn't actually wired in this PR (the factory is exported for follow-up). Updated settings.json description and regenerated the vscode schema to state CLI-only scope explicitly.
Suggestion: fastModel data-boundary not documented. When fastModel uses a different provider than the main session model, tool inputs/outputs cross a new auth boundary that users may not expect. Added "Data flow & privacy" section to the user feature doc spelling out: same-provider fast model = no scope change; different-provider = strictly larger sharing scope; two escape hatches (same-provider fast model OR feature off). Code-level mitigation (metadata-only mode) deferred.
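The pre-truncation idea can be sketched as follows. The function names match the commit message, but the bodies are assumptions: in particular, what happens to values nested deeper than the bound (here replaced by a `'[truncated]'` placeholder) and whether the final slice adds an ellipsis are not stated in the source.

```typescript
// Sketch under stated assumptions; not the PR's actual implementation.
const MAX_DEPTH = 4; // depth bound from the commit message

// Walk the value and slice string leaves BEFORE serialization, so a huge
// string is never expanded into an equally huge JSON document.
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === 'string') {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== 'object') return value;
  // Assumption: containers beyond the depth bound collapse to a placeholder.
  if (depth >= MAX_DEPTH) return '[truncated]';
  if (Array.isArray(value)) {
    return value.map((v) => preTruncate(v, maxLength, depth + 1));
  }
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
    out[k] = preTruncate(v, maxLength, depth + 1);
  }
  return out;
}

function truncateJson(value: unknown, maxLength: number): string {
  // JSON.stringify can return undefined at runtime (e.g. for undefined input).
  const json: string | undefined = JSON.stringify(preTruncate(value, maxLength));
  const text = json ?? '';
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}

// A large tool result is cut to 300 chars per string leaf before stringify.
const big = { content: 'x'.repeat(1_000_000) };
const shrunk = preTruncate(big, 300) as { content: string };
const field = truncateJson(big, 300);
```

The serialized intermediate here is a few hundred characters rather than roughly a megabyte, which is the allocation the commit message set out to avoid.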
chiga0
left a comment
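LGTM ✅: all blocking/critical issues from both review rounds have been properly resolved in 93f627e and b86eee5. I checked the key code on the latest HEAD:
Verified fixes
- mergeCompactToolGroups uses the absorbedCallIds parameter to distinguish the two cases: summaries absorbed by a compact header are dropped to trigger refreshStatic, while summaries of force-expand groups pass through so HistoryItemDisplay renders the standalone ● <label> line. The critical issue raised in round 2 is correctly fixed without reintroducing the round 1 bug.
- useGeminiStream.ts: per-batch AbortController (tracked in the summaryAbortRefsRef Set, eagerly aborted by cancelOngoingRequest) + a triple cancel check at resolve time (turnCancelledRef / abortControllerRef.current?.signal.aborted / summaryAbort.signal.aborted) + a stale-summary check via anchorCallId. Both the cross-turn Ctrl+C race and the out-of-order landing problem are resolved.
- toolUseSummary.ts: preTruncate pre-truncates string leaves (depth 4) before JSON serialization, so a 10MB ReadFile result is no longer fully stringified only to be discarded.
- cleanSummary now covers Unicode curly / CJK quotes, markdown emphasis stripping, and Chinese refusal prefixes (我无法/我不能/抱歉/无法).
CI status
All green (CodeQL + Lint + all 9 Test matrix jobs pass).
Remaining trade-offs (non-blocking)
- The default: true differs from upstream Claude Code's env-only, default-off model, but I confirmed that getFastModel() returns undefined when unconfigured, so the feature is skipped entirely at zero cost. Acceptable.
- The data boundary (fastModel across providers) is recorded as documentation rather than code-level redaction, and "metadata-only mode" is explicitly deferred to a follow-up PR. Also reasonable, since it is a setting users can turn off themselves.
The design doc, user docs, Scope & Lifecycle section, and tmux end-to-end verification are all in good shape. Overall engineering quality is high; ready to merge.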
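The absorbed-versus-passthrough rule for summaries can be sketched as a filter over history. This is an assumption-laden simplification: the `HistoryItem` shapes and the `dropAbsorbedSummaries` name are hypothetical, and the actual merging of adjacent tool groups done by `mergeCompactToolGroups` is left out.

```typescript
// Hypothetical, simplified history shapes; not the PR's actual types.
type HistoryItem =
  | { type: 'tool_group'; callIds: string[] }
  | { type: 'tool_use_summary'; summary: string; precedingToolUseIds: string[] }
  | { type: 'user'; text: string };

// Drop a summary only when ALL of its preceding callIds are absorbed by a
// compact header. Summaries for force-expanded groups pass through so they can
// render as a standalone line. The empty-set default keeps every summary.
function dropAbsorbedSummaries(
  history: HistoryItem[],
  absorbedCallIds: ReadonlySet<string> = new Set<string>(),
): HistoryItem[] {
  return history.filter((item) => {
    if (item.type !== 'tool_use_summary') return true;
    const ids = item.precedingToolUseIds;
    const absorbed = ids.length > 0 && ids.every((id) => absorbedCallIds.has(id));
    return !absorbed;
  });
}

const history: HistoryItem[] = [
  { type: 'tool_group', callIds: ['a', 'b'] }, // rendered compact
  { type: 'tool_use_summary', summary: 'Read txt files', precedingToolUseIds: ['a', 'b'] },
  { type: 'tool_group', callIds: ['c'] }, // force-expanded (e.g. contains an error)
  { type: 'tool_use_summary', summary: 'Searched in auth/', precedingToolUseIds: ['c'] },
];

const compactView = dropAbsorbedSummaries(history, new Set(['a', 'b']));
const untouched = dropAbsorbedSummaries(history);
```

Dropping the absorbed summary shortens the merged list relative to raw history, which is what lets the length-based refresh heuristic fire, while the force-expanded group keeps its label.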
…wenLM#3538)
* feat(cli,core): generate tool-use summaries for compact mode
* fix(core): bound cleanSummary quote-strip regex to avoid ReDoS
* fix(cli): render tool_use_summary inline so full mode also shows the label
* docs: tool-use-summaries feature guide, settings entry, and design doc
* docs: add Recommended pairing section to tool-use-summaries
* fix: address review feedback on tool-use summary generation
* fix: address second-round review feedback on tool-use summaries
Why
When the model fans out into parallel tool calls, today's UI shows only mechanical information — which tool ran, with what argument, whether it succeeded. There is no synthesis of why the batch was run.
- Compact mode (Ctrl+O): the group collapses to a single `Tool × N` header plus the last tool's description. For `Grep × 3 + Read × 2` you only see the tail tool's name; the intent is lost entirely. This is the primary target of the PR.
- Full mode (`ui.compactMode: false`, the default): individual tool lines are already visible, so the improvement is scenario-dependent: significant for batches that are large or heterogeneous, marginal for small same-type batches where the tool names already tell the whole story. See When the label actually helps for the honest breakdown. Full-mode rendering exists so the feature does not silently disappear for users who leave compact mode off.
A short semantic label after each batch ("Read 4 text files", "Searched in auth/", "Fixed NPE in UserService") closes the gap where it actually exists without hiding the tool details.
What
After each tool batch finalizes, fire a fast-model call that returns a short git-commit-subject-style label. Both full and compact modes now surface the label so the feature works under the default `ui.compactMode: false`:
- Full mode (default): label appears inline below the tool group.
- Compact mode: label replaces the generic `Tool × N` header.
How
- Generation fires from `handleCompletedTools` in `useGeminiStream`, keyed on the turn's abort signal. It overlaps with the next turn's API stream so the ~1s fast-model roundtrip adds no perceived latency.
- A `tool_use_summary` history item is added with the label and the batch's `precedingToolUseIds`. The history item appends cleanly to Ink's append-only `<Static>`, so the label shows up even though the tool_group itself is already frozen.
- Full mode: `HistoryItemDisplay` renders the `tool_use_summary` as its own dim `● <label>` line, flowing naturally below the tool group.
- Compact mode: `MainContent` builds a `callId → summary` lookup and passes the label to `CompactToolGroupDisplay` as a `compactLabel` prop. The component renders `Summary · N tools` in place of the default header. The standalone summary line is hidden in compact mode to avoid duplication.
- `mergeCompactToolGroups` treats `tool_use_summary` as hidden-in-compact so two consecutive batches still merge across it; the merged group takes the first contributing batch's label.
When the label actually helps
The benefit is not uniform across all batches. Being honest about where this earns its keep:
[Table flattened in extraction; surviving fragments: "… (`Read × 4`)", "`Tool × N` header with semantic intent", "… (`Grep + Read + Edit + Bash`)".]
The `Read × 4` example used in this PR's screenshots and the tmux transcript is deliberately in the low-benefit cell: the label ("Read 4 text files") is almost redundant with the individual tool lines. The point of showing it there was to prove the plumbing works end-to-end under a reproducible setup, not to argue that it is especially valuable for that specific shape. Users whose workflows are dominated by small same-type batches and who are running in full mode may want to set `experimental.emitToolUseSummaries: false` to avoid the per-batch cost; the escape hatch is part of the design.
Compact mode (and anything downstream of the SDK) is where this PR pays off most consistently; the full-mode inline render exists so the feature does not silently disappear when a user toggles `ui.compactMode: false` (the default), not because it's revelatory for every batch.
Configuration
- `experimental.emitToolUseSummaries` (setting): default `true`.
- `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES` (env var): `=0` forces off, `=1` forces on, overrides settings.
- `fastModel` (setting).
Compared to the upstream inspiration, which only exposes a same-named env gate (no settings layer, default off), this PR adds a persistent settings toggle and defaults on. Rationale and trade-offs are covered in the design doc (see Deviations).
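The precedence described in this section (env var over setting, setting defaulting to on) could be resolved roughly as below. The function name and signature are hypothetical, not the actual `getEmitToolUseSummaries()` in core, and ignoring env values other than `0` and `1` is an assumption.

```typescript
// Hypothetical resolver illustrating the documented precedence.
function resolveEmitToolUseSummaries(
  setting: boolean | undefined,
  env: Record<string, string | undefined>,
): boolean {
  const raw = env['QWEN_CODE_EMIT_TOOL_USE_SUMMARIES'];
  if (raw === '0') return false; // env forces off, regardless of settings
  if (raw === '1') return true; // env forces on, regardless of settings
  // Assumption: any other env value is ignored and the setting decides.
  return setting ?? true; // default on when the setting is absent
}

const byDefault = resolveEmitToolUseSummaries(undefined, {});
const forcedOff = resolveEmitToolUseSummaries(true, { QWEN_CODE_EMIT_TOOL_USE_SUMMARIES: '0' });
const forcedOn = resolveEmitToolUseSummaries(false, { QWEN_CODE_EMIT_TOOL_USE_SUMMARIES: '1' });
```

Even when this resolves to on, the PR discussion notes that the feature is skipped entirely when no fast model is configured.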
Failure modes (all silent)
- No `fastModel` configured → skipped, no history item added.
- Aborted turn → the `summarySignal.aborted` check prevents a stale `addItem`.
- Model failure or error-shaped output (`Error:`, `I cannot …`, etc., filtered by `cleanSummary`) → no history item, UI shows the default view.
Cost / compat
- `tool_use_summary` items are not persisted via `ChatRecordingService`; resuming a session loses labels but tool groups render normally (no user-visible difference beyond label absence).
- Non-interactive runs (`runNonInteractive`) are unchanged. The trigger lives in the interactive stream hook only.
- Only users with `fastModel` set are affected; others see the current behavior unchanged.
Not in scope
The message factory `createToolUseSummaryMessage` and the `ToolUseSummaryMessage` type are exported from core so a future PR can wire them into the SDK stream / non-interactive output. This PR only consumes the generated summary in the interactive CLI.
Live tmux verification
Reproducible end-to-end proof against a real `fastModel`. Both runs use the same 4 scratch files; only `ui.compactMode` differs.
Setup (both runs)
Full mode (default: `ui.compactMode: false`)
Raw `tmux capture-pane -p` output, turn end state:
Key signal: the dim `● Read 4 text files` line below the tool group, emitted after tools complete and flowed in naturally once the fast-model call resolved. This is the label reviewers should look for.
Compact mode (`ui.compactMode: true`, or toggle with Ctrl+O)
Same prompt, `compactMode: true`. Raw capture (the earlier retry batch stays force-expanded because it contains errors; the successful retry collapses to a single compact row):
Key signals:
- The header reads `Read txt files`, not the generic `ReadFile × 5`.
- The count is `5 tools` rather than `4` because `mergeCompactToolGroups` folded the earlier first attempt in; that's expected merge behavior and unaffected by this PR.
Files
Docs
- `docs/users/features/tool-use-summaries.md`: user-facing guide (rendering, triggers, fast-model requirement, cost, failure modes).
- `docs/users/configuration/settings.md`: register `experimental.emitToolUseSummaries` alongside other fast-model-driven settings.
- `docs/users/features/_meta.ts`: add the new feature page to the sidebar.
- `docs/design/tool-use-summary/tool-use-summary-design.md`: design doc (competitive analysis, flow, key files, Ink `<Static>` append-only rationale, deviations, limitations, future work).
Core
- `packages/core/src/services/toolUseSummary.ts`: system prompt, `truncateJson`, `cleanSummary`, `generateToolUseSummary`, `createToolUseSummaryMessage` factory + `ToolUseSummaryMessage` type.
- `packages/core/src/config/config.ts`: `getEmitToolUseSummaries()` with env override.
CLI
- `useGeminiStream.ts`: fires generation in `handleCompletedTools`, lifts `history` to a ref so the callback stays stable.
- `HistoryItemDisplay.tsx`: renders `tool_use_summary` as a standalone inline `● label` line (gated on `!compactMode`).
- `MainContent.tsx`: `callId → summary` lookup, passes `compactLabel` down to `HistoryItemDisplay` for the compact-mode header replacement.
- `ToolGroupMessage` / `CompactToolGroupDisplay`: propagate and render the compact-mode label.
- `mergeCompactToolGroups.ts`: treat `tool_use_summary` as hidden-in-compact.
- `settingsSchema.ts`: `experimental.emitToolUseSummaries`, default true, visible in settings dialog.
Test plan
- `toolUseSummary.test.ts`: 27 tests covering truncation, cleaning (including CJK, error-message rejection, length cap), UUID + timestamp, model invocation + prompt contents, abort handling, JSON serialization edge cases.
- `useGeminiStream.test.tsx`: 4 new integration tests (disabled gate / missing fast model / success path / empty model result), each asserting on whether `addItem` saw a `tool_use_summary` payload.
- `CompactToolGroupDisplay.test.tsx`: 5 tests for the label-vs-default rendering branches (plus 3 pre-existing timeout tests from feat(cli): combine elapsed + timeout in shell time indicator #3512).
- `HistoryItemDisplay.test.tsx`: new test verifying `tool_use_summary` renders as a dim `● <label>` line in full mode.
- `mergeCompactToolGroups.test.ts`: 1 new case, two batches separated by a `tool_use_summary` still merge into one group.
- `tsc --noEmit` on both packages: clean.
- Live tmux run against a real `fastModel` (dashscope `gpt-5.4`): transcripts above.
CodeQL
`cleanSummary`'s quote-strip regex was flagged as polynomial ReDoS because the input is LLM-produced. The fix in the 2nd commit bounds the quantifier to `{1,10}` (ten opening/closing quotes is well past anything a real label produces); the alert auto-closed as `fixed`.