feat(cli,core): LLM-generated summary labels for tool-call batches #106
Open
BingqingLyu wants to merge 7 commits into
After each tool batch completes, fire a parallel fast-model call to generate a short git-commit-subject-style label summarizing what the batch accomplished (e.g. "Read txt files", "Searched in auth/"). In compact mode the label replaces the generic "Tool × N" header so N parallel tool calls collapse to a single semantic row.

The fast-model call (~1s) runs fire-and-forget, overlapped with the next turn's API stream, so there is no perceived latency. Missing fast model, aborted turns, and model failures all degrade silently to the existing rendering.

The summary is also emitted as a `tool_use_summary` history entry with `precedingToolUseIds`, keeping the shape compatible with SDK clients that want to render collapsed tool views on their own.

Gated by `experimental.emitToolUseSummaries` (default on). Can be overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`.

The system prompt and truncation rules (300 chars per tool field, 200 chars of trailing assistant text as intent prefix) match the existing behavior seen in other tools that emit the same message type, so SDK consumers see a consistent shape across clients.
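As a rough illustration of the history entry described above, the sketch below builds a `tool_use_summary` item that carries `precedingToolUseIds`. Only the `tool_use_summary` type tag and the `precedingToolUseIds` field are named in this PR; the `summary`, `uuid`, and `timestamp` field names and the `buildSummaryEntry` helper are assumptions made for illustration, not the actual `createToolUseSummaryMessage` signature.

```typescript
// Hypothetical entry shape. Only `type` and `precedingToolUseIds` are taken
// from the PR text; the remaining field names are assumptions.
interface ToolUseSummaryEntry {
  type: 'tool_use_summary';
  summary: string;
  precedingToolUseIds: string[];
  uuid: string;
  timestamp: string;
}

// Hypothetical helper: the id and clock are injected so the result is deterministic.
function buildSummaryEntry(
  summary: string,
  toolUseIds: readonly string[],
  uuid: string,
  now: Date = new Date(),
): ToolUseSummaryEntry {
  return {
    type: 'tool_use_summary',
    summary,
    // Copy the ids so later mutation of the batch does not alter the entry.
    precedingToolUseIds: [...toolUseIds],
    uuid,
    timestamp: now.toISOString(),
  };
}

const entry = buildSummaryEntry(
  'Read txt files',
  ['call-1', 'call-2'],
  'id-1',
  new Date(0),
);
```

A client that wants a collapsed view could group rendered tool calls by matching their ids against `precedingToolUseIds`.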
CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in
cleanSummary because its input comes from an LLM (treated as
uncontrolled). The original regex is anchored and linear in practice,
but tightening the quantifier to {1,10} both satisfies the static
check and caps engine work on pathological model output with a long
run of quotes. Ten opening/closing quotes is well past anything a real
label would produce.
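
A minimal sketch of the bounded quote strip described above. The regex is the one quoted in this commit message with the `{1,10}` quantifier applied; the `stripEdgeQuotes` wrapper name is hypothetical and the real `cleanSummary` does more than this.

```typescript
// At most 10 quote characters are consumed at each edge of the string.
const EDGE_QUOTES = /^["'`]{1,10}|["'`]{1,10}$/g;

// Hypothetical wrapper: trims whitespace, then strips leading/trailing quotes.
function stripEdgeQuotes(raw: string): string {
  return raw.trim().replace(EDGE_QUOTES, '');
}

const plain = stripEdgeQuotes('"Read txt files"');
const ticked = stripEdgeQuotes('`Searched in auth/`');
const inner = stripEdgeQuotes('Fixed "NPE" in UserService');
```

Quotes in the middle of a label are left alone because both alternatives are anchored to an edge of the string.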
…label

The summary was only visible in compact mode because the full-mode ToolGroupMessage ignored the compactLabel prop. Compact mode got away with this because mergeCompactToolGroups triggers refreshStatic(), which re-renders the merged tool_group with its newly-looked-up label. Full mode has no such refresh path, so when the fast-model call resolves *after* the tool_group has been committed to the append-only <Static>, there is no way to retroactively decorate it.

Switch to rendering `tool_use_summary` as its own inline history item (a single dim `● <label>` line). New items append cleanly to <Static>, so the summary flows in naturally once the fast-model call resolves. Compact mode still replaces the merged tool_group header with the label and hides the standalone summary line via the `compactMode` guard.

With this, the feature works under the default `ui.compactMode: false`, not just the opt-in compact view.
Three new docs matching the existing fast-model feature docs layout:

- docs/users/features/tool-use-summaries.md: user-facing guide covering full + compact rendering, configuration (settings + env), failure modes, cost, and cross-links to followup-suggestions.
- docs/users/configuration/settings.md: register the new experimental.emitToolUseSummaries setting next to the other fast-model-driven UI settings.
- docs/design/tool-use-summary/tool-use-summary-design.md: deep dive matching the compact-mode-design.md competitive-analysis style. Documents the Claude Code port (prompt, truncation, timing, gate), the deviations (settings layer, default on, cleanSummary, dual render paths), and the Ink <Static> append-only rationale that drove the inline full-mode render vs header-replacement split.
Full-mode rendering of the summary works, but for small same-type batches (Read × 3 and similar) the label visibly restates what the tool lines already show. Pairing with ui.compactMode: true folds the whole batch into a single labeled row, which is the cleanest transcript shape once the label is available. Adds a dedicated section showing the paired settings.json snippet and explicitly calling out when each mode wins (and when to turn the feature off instead).
Addresses multiple issues from @chiga0's review:

Blocking: compact-mode label invisible for single-batch turns. mergeCompactToolGroups's adjacency-only gating left a trailing tool_use_summary in the merged result whenever there was no second batch to merge across. That kept mergedHistory.length in lock-step with history.length, so MainContent's refreshStatic heuristic (currMLen <= prevMLen) never fired and Ink's append-only <Static> never repainted the tool_group with its newly-looked-up label. Drop tool_use_summary items unconditionally now; gemini_thought still survives to avoid unnecessary repaints. New tests cover the single-batch case and the summary-before-user-message case.

Blocking: stale summary appears after Ctrl+C on the next turn. summarySignal captured the CURRENT turn's AbortController, but the summary resolves during the NEXT turn's streaming window. The next turn's submitQuery allocates a fresh controller, so the captured signal was never aborted, and Ctrl+C during the new turn used to let the previous turn's summary land in the transcript seconds later. Fix: a dedicated per-batch AbortController tracked in a ref set, aborted eagerly from cancelOngoingRequest; the resolve-time check reads the live abort state and turnCancelledRef.

High: summarizer input pollution. geminiTools contained error/cancelled tools; retry-loop warnings and "Cancelled by user" strings were feeding the fast model. cleanSummary can only reject error-shaped output, not prevent the model from hallucinating a plausible label from bad input (the PR's own tmux screenshot showed "Read txt files · 5 tools" where 4 of the 5 were prior-retry failures). Filter to status === 'success' before building the prompt; skip the call entirely if nothing's left.

High: unstable label on merged groups. getCompactLabel iterated all callIds and returned the first hit, so asynchronous resolution order made the header visibly flip from SB to SA when batch A resolved after batch B. Lock onto item.tools[0].callId to keep stable "leading batch governs" semantics.

High: force-expanded groups in compact mode had no label at all. Compact mode routes non-force-expand groups through CompactToolGroupDisplay (consumes compactLabel) and force-expand groups through the full ToolGroupMessage (ignores compactLabel); the standalone ● line was gated on !compactMode, creating a dead zone in exactly the diagnostically valuable case. MainContent now computes absorbedCallIds (which groups actually consume the header replacement) and passes summaryAbsorbed to HistoryItemDisplay; force-expand groups in compact mode get the standalone line as the label's only path to the screen.

Medium: cleanSummary robustness. Extend quote-strip to Unicode curly + CJK corner brackets; strip markdown emphasis (**bold**, _italic_); broaden refusal-prefix rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 / 抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new cleanSummary tests cover the added cases.

Low: concurrent-rendering safety. Move historyRef.current = history from render phase into useLayoutEffect so bailed renders can't leave a dropped value.

Low: CompactToolGroupDisplay readability. Extract renderSummaryHeader / renderDefaultHeader helpers and document the toolCalls.length > 1 count-suffix guard so a future "fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".

Docs: add Scope & Lifecycle section to tool-use-summaries.md covering (1) one generation per batch shared by both modes, (2) no backfill on toggle / session resume, (3) main-agent batches only with the Task-tool clarification.
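
The "leading batch governs" rule from the unstable-label fix above can be sketched roughly as follows. The `MergedGroup` type and the `summaryByCallId` map are hypothetical stand-ins, and the sketch assumes every callId of a batch maps to that batch's label; only the `tools[0].callId` lookup comes from the commit message.

```typescript
interface ToolCall {
  callId: string;
}

// Hypothetical shape of a merged compact tool group.
interface MergedGroup {
  tools: ToolCall[];
}

// Look up the label through the first tool only, so the header never
// depends on which batch's summary happened to resolve first.
function getCompactLabel(
  group: MergedGroup,
  summaryByCallId: ReadonlyMap<string, string>,
): string | undefined {
  const leading = group.tools[0];
  return leading === undefined
    ? undefined
    : summaryByCallId.get(leading.callId);
}

// Batch A (a1, a2) merged with batch B (b1); B's summary resolves first.
const merged: MergedGroup = {
  tools: [{ callId: 'a1' }, { callId: 'a2' }, { callId: 'b1' }],
};
const labels = new Map<string, string>();
labels.set('b1', 'SB');
const beforeA = getCompactLabel(merged, labels); // default header still shown

labels.set('a1', 'SA');
labels.set('a2', 'SA');
const afterA = getCompactLabel(merged, labels); // leading batch's label
```

With a first-hit scan over all callIds, the header would have shown SB and then flipped to SA; keyed on the leading tool it goes from the default header straight to SA.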
Critical: force-expand groups lost their summary entirely. The previous round's "drop tool_use_summary unconditionally" merge fix also stripped summaries for force-expanded groups, defeating the exact case (errors, confirmations, focused shell) where the standalone ● label is the label's only path to the screen. The merge function now takes an absorbedCallIds set: summaries whose preceding callIds are all absorbed by a compact tool_group header are dropped (so refreshStatic still fires), but force-expanded summaries pass through to be rendered standalone by HistoryItemDisplay. MainContent computes absorbedCallIds from raw history and passes it in. New tests cover both the absorbed-drop and the force-expand-preserve cases plus the empty-set default for callers that don't compute absorption.

Suggestion: late-arriving summaries could land out of order. A slow fast-model call could resolve after the next turn's content was committed, planting the ● label between later items in full mode. The resolve callback now captures the first batch callId, locates the corresponding tool_group at resolve time, and drops the summary if a newer tool_group has already appeared in history. A new test exercises this with a manually-resolved fast-model promise.

Suggestion: truncateJson allocated full JSON for large strings. A 10MB ReadFile result was being JSON.stringify'd in full only to be sliced down to 300 chars. Added preTruncate, which walks the value (depth-bounded to 4) and slices string leaves to maxLength before serialization. Tests verify the input never reaches its full pre-cap form.

Suggestion: settings description over-claimed SDK emission. The description said summaries are emitted to SDK clients as a tool_use_summary message; the SDK plumbing isn't actually wired in this PR (the factory is exported for follow-up). Updated the settings.json description and regenerated the vscode schema to state CLI-only scope explicitly.

Suggestion: fastModel data-boundary not documented. When fastModel uses a different provider than the main session model, tool inputs/outputs cross a new auth boundary that users may not expect. Added a "Data flow & privacy" section to the user feature doc spelling out: same-provider fast model = no scope change; different-provider = strictly larger sharing scope; two escape hatches (same-provider fast model OR feature off). Code-level mitigation (metadata-only mode) is deferred.
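
The pre-truncation step described above might look roughly like the sketch below. The function names `preTruncate` and `truncateJson`, the depth bound of 4, and the 300-character cap come from the commit message; the signatures, the `'[truncated]'` placeholder used past the depth bound, and the final slice are assumptions, not the PR's actual code.

```typescript
const MAX_DEPTH = 4;

// Walk the value and cap every string leaf *before* serialization, so a huge
// tool result is never stringified in full.
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === 'string') {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== 'object') {
    return value; // numbers, booleans, undefined pass through unchanged
  }
  if (depth >= MAX_DEPTH) {
    return '[truncated]'; // assumption: placeholder for overly deep subtrees
  }
  if (Array.isArray(value)) {
    return value.map((item) => preTruncate(item, maxLength, depth + 1));
  }
  const out: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value)) {
    out[key] = preTruncate(child, maxLength, depth + 1);
  }
  return out;
}

function truncateJson(value: unknown, maxLength = 300): string {
  // JSON.stringify returns undefined at runtime for an undefined input.
  const json = JSON.stringify(preTruncate(value, maxLength)) as
    | string
    | undefined;
  const text = json ?? '';
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}

const big = 'x'.repeat(1_000_000);
const capped = preTruncate({ content: big }, 300) as { content: string };
const serialized = truncateJson({ content: big });
```

The serialized form is still sliced at the end because keys and punctuation can push the JSON past the cap even when every leaf is within it.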
This was referenced Apr 28, 2026
This was referenced May 7, 2026
Conflict Group 1

This PR shares modified functions with 14 other PR(s): #112, #113, #114, #117, #18, #36, #46, #55, #6, #75, #86, #88, #9, #96. These PRs should be reviewed as a batch, since merging one may affect the others.
graph LR
PR106["PR #106"]
FisSdkMcpServerConfig_803["isSdkMcpServerConfig<br>config.ts"]
PR106 -->|modifies| FisSdkMcpServerConfig_803
PR112["PR #112"]
PR112 -->|modifies| FisSdkMcpServerConfig_803
PR113["PR #113"]
PR113 -->|modifies| FisSdkMcpServerConfig_803
PR114["PR #114"]
PR114 -->|modifies| FisSdkMcpServerConfig_803
PR117["PR #117"]
PR117 -->|modifies| FisSdkMcpServerConfig_803
PR18["PR #18"]
PR18 -->|modifies| FisSdkMcpServerConfig_803
PR46["PR #46"]
PR46 -->|modifies| FisSdkMcpServerConfig_803
PR75["PR #75"]
PR75 -->|modifies| FisSdkMcpServerConfig_803
PR86["PR #86"]
PR86 -->|modifies| FisSdkMcpServerConfig_803
PR88["PR #88"]
PR88 -->|modifies| FisSdkMcpServerConfig_803
FloadCliConfig_6977["loadCliConfig<br>config.ts"]
PR106 -->|modifies| FloadCliConfig_6977
PR112 -->|modifies| FloadCliConfig_6977
PR113 -->|modifies| FloadCliConfig_6977
PR114 -->|modifies| FloadCliConfig_6977
PR117 -->|modifies| FloadCliConfig_6977
PR36["PR #36"]
PR36 -->|modifies| FloadCliConfig_6977
PR46 -->|modifies| FloadCliConfig_6977
PR75 -->|modifies| FloadCliConfig_6977
PR86 -->|modifies| FloadCliConfig_6977
PR88 -->|modifies| FloadCliConfig_6977
FnormalizeConfigOutputFormat_803["normalizeConfigOutputFormat<br>config.ts"]
PR106 -->|modifies| FnormalizeConfigOutputFormat_803
PR112 -->|modifies| FnormalizeConfigOutputFormat_803
PR113 -->|modifies| FnormalizeConfigOutputFormat_803
PR114 -->|modifies| FnormalizeConfigOutputFormat_803
PR117 -->|modifies| FnormalizeConfigOutputFormat_803
PR18 -->|modifies| FnormalizeConfigOutputFormat_803
PR75 -->|modifies| FnormalizeConfigOutputFormat_803
PR86 -->|modifies| FnormalizeConfigOutputFormat_803
PR88 -->|modifies| FnormalizeConfigOutputFormat_803
FshowCitations_6790["showCitations<br>useGeminiStream.ts"]
PR106 -->|modifies| FshowCitations_6790
PR55["PR #55"]
PR55 -->|modifies| FshowCitations_6790
PR6["PR #6"]
PR6 -->|modifies| FshowCitations_6790
PR9["PR #9"]
PR9 -->|modifies| FshowCitations_6790
PR96["PR #96"]
PR96 -->|modifies| FshowCitations_6790
Posted by codegraph-ai conflict detection.
Why
When the model fans out into parallel tool calls, today's UI shows only mechanical information — which tool ran, with what argument, whether it succeeded. There is no synthesis of why the batch was run.
- Compact mode (`ui.compactMode: true`, or toggle with Ctrl+O): the group collapses to a single `Tool × N` header plus the last tool's description. For `Grep × 3 + Read × 2` you only see the tail tool's name; the intent is lost entirely. This is the primary target of the PR.
- Full mode (`ui.compactMode: false`, the default): individual tool lines are already visible, so the improvement is scenario-dependent: significant for batches that are large or heterogeneous, marginal for small same-type batches where the tool names already tell the whole story. See "When the label actually helps" for the honest breakdown. Full-mode rendering exists so the feature does not silently disappear for users who leave compact mode off.

A short semantic label after each batch ("Read 4 text files", "Searched in auth/", "Fixed NPE in UserService") closes the gap where it actually exists without hiding the tool details.
What
After each tool batch finalizes, fire a fast-model call that returns a short git-commit-subject-style label. Both full and compact modes now surface the label so the feature works under the default `ui.compactMode: false`:

Full mode (default): label appears inline below the tool group.

Compact mode: label replaces the generic `Tool × N` header.

How
- Generation is fired from `handleCompletedTools` in `useGeminiStream`, keyed on the turn's abort signal. It overlaps with the next turn's API stream so the ~1s fast-model roundtrip adds no perceived latency.
- A `tool_use_summary` history item is added with the label and the batch's `precedingToolUseIds`. The history item appends cleanly to Ink's append-only `<Static>`, so the label shows up even though the tool_group itself is already frozen.
- Full mode: `HistoryItemDisplay` renders the `tool_use_summary` as its own dim `● <label>` line, flowing naturally below the tool group.
- Compact mode: `MainContent` builds a `callId → summary` lookup and passes the label to `CompactToolGroupDisplay` as a `compactLabel` prop. The component renders `Summary · N tools` in place of the default header. The standalone summary line is hidden in compact mode to avoid duplication.
- `mergeCompactToolGroups` treats `tool_use_summary` as hidden-in-compact so two consecutive batches still merge across it; the merged group takes the first contributing batch's label.

When the label actually helps
The benefit is not uniform across all batches — being honest about where this earns its keep:
(Benefit-by-batch-shape table; surviving cells: a small same-type batch (`Read × 4`), "`Tool × N` header with semantic intent", and a heterogeneous batch (`Grep + Read + Edit + Bash`).)

The `Read × 4` example used in this PR's screenshots and the tmux transcript is deliberately in the low-benefit cell: the label ("Read 4 text files") is almost redundant with the individual tool lines. The point of showing it there was to prove the plumbing works end-to-end under a reproducible setup, not to argue that it is especially valuable for that specific shape. Users whose workflows are dominated by small same-type batches and who are running in full mode may want to set `experimental.emitToolUseSummaries: false` to avoid the per-batch cost; the escape hatch is part of the design.

Compact mode (and anything downstream of the SDK) is where this PR pays off most consistently; the full-mode inline render exists so the feature does not silently disappear when a user toggles `ui.compactMode: false` (the default), not because it's revelatory for every batch.

Configuration
| Name | Kind | Default / effect |
| --- | --- | --- |
| `experimental.emitToolUseSummaries` | setting | `true` |
| `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES` | env var | `=0` forces off, `=1` forces on, overrides settings |
| `fastModel` | setting | required; generation is skipped when it is not configured |

Compared to the upstream inspiration, which only exposes a same-named env gate (no settings layer, default off), this PR adds a persistent settings toggle and defaults on. Rationale and trade-offs are covered in the design doc (see Deviations).
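
The precedence between the env var and the setting could be resolved along these lines. This is a sketch under assumptions rather than the PR's `getEmitToolUseSummaries()`: the function name and signature are hypothetical, and treating any env value other than `0` or `1` as "no override" is a guess.

```typescript
// Hypothetical resolver. `envValue` would come from
// process.env['QWEN_CODE_EMIT_TOOL_USE_SUMMARIES'] in a real CLI.
function resolveEmitToolUseSummaries(
  setting: boolean | undefined,
  envValue: string | undefined,
): boolean {
  if (envValue === '0') return false; // env forces off
  if (envValue === '1') return true; // env forces on
  // Assumption: unset or unrecognized env values defer to the setting,
  // which defaults to on.
  return setting ?? true;
}

const forcedOn = resolveEmitToolUseSummaries(false, '1');
const forcedOff = resolveEmitToolUseSummaries(true, '0');
const defaulted = resolveEmitToolUseSummaries(undefined, undefined);
const settingOnly = resolveEmitToolUseSummaries(false, undefined);
```

Keeping the env check first is what makes the per-session override win over a persisted setting.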
Failure modes (all silent)
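- No `fastModel` configured → skipped, no history item added.
- Turn aborted before the call resolves → the `summarySignal.aborted` check prevents a stale `addItem`.
- Model returns empty or error-shaped output (`Error:`, `I cannot …`, etc., filtered by `cleanSummary`) → no history item, UI shows the default view.

Cost / compat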
- `tool_use_summary` items are not persisted via `ChatRecordingService`; resuming a session loses labels but tool groups render normally (no user-visible difference beyond label absence).
- Non-interactive runs (`runNonInteractive`) are unchanged. The trigger lives in the interactive stream hook only.
- Only sessions with `fastModel` set are affected; others see the current behavior unchanged.

Not in scope
The message factory `createToolUseSummaryMessage` and the `ToolUseSummaryMessage` type are exported from core so a future PR can wire them into the SDK stream / non-interactive output. This PR only consumes the generated summary in the interactive CLI.

Live tmux verification

Reproducible end-to-end proof against a real `fastModel`. Both runs use the same 4 scratch files; only `ui.compactMode` differs.

Setup (both runs)

Full mode (default, `ui.compactMode: false`)

Raw `tmux capture-pane -p` output, turn end state:

Key signal: the dim `● Read 4 text files` line below the tool group, emitted after tools complete, flowed in naturally once the fast-model call resolved. This is the label reviewers should look for.

Compact mode (`ui.compactMode: true`, or toggle with Ctrl+O)

Same prompt, `compactMode: true`. Raw capture (the earlier retry batch stays force-expanded because it contains errors; the successful retry collapses to a single compact row):

Key signals:

- The header reads `Read txt files`, not the generic `ReadFile × 5`.
- The count reads `5 tools` rather than `4` because `mergeCompactToolGroups` folded the earlier first attempt in; that's expected merge behavior and unaffected by this PR.

Files
Docs
- `docs/users/features/tool-use-summaries.md`: user-facing guide (rendering, triggers, fast-model requirement, cost, failure modes).
- `docs/users/configuration/settings.md`: register `experimental.emitToolUseSummaries` alongside other fast-model-driven settings.
- `docs/users/features/_meta.ts`: add the new feature page to the sidebar.
- `docs/design/tool-use-summary/tool-use-summary-design.md`: design doc (competitive analysis, flow, key files, Ink `<Static>` append-only rationale, deviations, limitations, future work).

Core

- `packages/core/src/services/toolUseSummary.ts`: system prompt, `truncateJson`, `cleanSummary`, `generateToolUseSummary`, `createToolUseSummaryMessage` factory + `ToolUseSummaryMessage` type.
- `packages/core/src/config/config.ts`: `getEmitToolUseSummaries()` with env override.

CLI

- `useGeminiStream.ts`: fires generation in `handleCompletedTools`, lifts `history` to a ref so the callback stays stable.
- `HistoryItemDisplay.tsx`: renders `tool_use_summary` as a standalone inline `● label` line (gated on `!compactMode`).
- `MainContent.tsx`: `callId → summary` lookup, passes `compactLabel` down to `HistoryItemDisplay` for the compact-mode header replacement.
- `ToolGroupMessage` / `CompactToolGroupDisplay`: propagate and render the compact-mode label.
- `mergeCompactToolGroups.ts`: treat `tool_use_summary` as hidden-in-compact.
- `settingsSchema.ts`: `experimental.emitToolUseSummaries`, default true, visible in settings dialog.

Test plan

- `toolUseSummary.test.ts`: 27 tests covering truncation, cleaning (including CJK, error-message rejection, length cap), UUID + timestamp, model invocation + prompt contents, abort handling, JSON serialization edge cases.
- `useGeminiStream.test.tsx`: 4 new integration tests (disabled gate / missing fast model / success path / empty model result), each asserting on whether `addItem` saw a `tool_use_summary` payload.
- `CompactToolGroupDisplay.test.tsx`: 5 tests for the label-vs-default rendering branches (plus 3 pre-existing timeout tests from "feat(cli): combine elapsed + timeout in shell time indicator", QwenLM/qwen-code#3512).
- `HistoryItemDisplay.test.tsx`: new test verifying `tool_use_summary` renders as a dim `● <label>` line in full mode.
- `mergeCompactToolGroups.test.ts`: 1 new case: two batches separated by a `tool_use_summary` still merge into one group.
- `tsc --noEmit` on both packages: clean.
- Live run against a real `fastModel` (dashscope `gpt-5.4`): transcripts above.

CodeQL

`cleanSummary`'s quote-strip regex was flagged as polynomial ReDoS because the input is LLM-produced. The fix in the 2nd commit bounds the quantifier to `{1,10}` (ten opening/closing quotes is well past anything a real label produces); the alert auto-closed as `fixed`.