feat(cli,core): LLM-generated summary labels for tool-call batches by wenshao · Pull Request #3538 · QwenLM/qwen-code

wenshao · 2026-04-23T01:48:55Z

Why

When the model fans out into parallel tool calls, today's UI shows only mechanical information — which tool ran, with what argument, whether it succeeded. There is no synthesis of why the batch was run.

Compact mode (Ctrl+O) — the group collapses to a single Tool × N header plus the last tool's description. For Grep × 3 + Read × 2 you only see the tail tool's name; the intent is lost entirely. This is the primary target of the PR.
Narrow / SDK UIs (mobile, sidebars, pending SDK consumers) — the one-line cell is the whole signal, and the generic header says almost nothing. Same motivation as above.
Full mode (ui.compactMode: false, the default) — individual tool lines are already visible, so the improvement is scenario-dependent: significant for batches that are large or heterogeneous, marginal for small same-type batches where the tool names already tell the whole story. See When the label actually helps for the honest breakdown. Full-mode rendering exists so the feature does not silently disappear for users who leave compact mode off.

A short semantic label after each batch — "Read 4 text files", "Searched in auth/", "Fixed NPE in UserService" — closes the gap where it actually exists without hiding the tool details.

What

After each tool batch finalizes, fire a fast-model call that returns a short git-commit-subject-style label. Both full and compact modes now surface the label so the feature works under the default ui.compactMode: false:

Full mode (default) — label appears inline below the tool group:

╭──────────────────────────────────────────────╮
│ ✓  ReadFile a.txt (pages 1)                  │
│ ✓  ReadFile b.txt (pages 1)                  │
│ ✓  ReadFile c.txt (pages 1)                  │
│ ✓  ReadFile d.txt (pages 1)                  │
╰──────────────────────────────────────────────╯

 ● Read 4 text files

Compact mode — label replaces the generic Tool × N header:

╭──────────────────────────────────────────────╮
│✓  Read txt files  · 4 tools                  │
│Press Ctrl+O to show full tool output         │
╰──────────────────────────────────────────────╯

How

Generation runs fire-and-forget from handleCompletedTools in useGeminiStream, keyed on the turn's abort signal. It overlaps with the next turn's API stream so the ~1s fast-model roundtrip adds no perceived latency.
On resolve, a tool_use_summary history item is added with the label and the batch's precedingToolUseIds. The history item appends cleanly to Ink's append-only <Static>, so the label shows up even though the tool_group itself is already frozen.
In full mode, HistoryItemDisplay renders the tool_use_summary as its own dim ● <label> line — natural flow below the tool group.
In compact mode, MainContent builds a callId → summary lookup and passes the label to CompactToolGroupDisplay as a compactLabel prop. The component renders Summary · N tools in place of the default header. The standalone summary line is hidden in compact mode to avoid duplication.
mergeCompactToolGroups treats tool_use_summary as hidden-in-compact so two consecutive batches still merge across it; the merged group takes the first contributing batch's label.
Force-expand groups (errors, confirmations, focused shell, user-initiated) bypass the compact path entirely — same as before — and their inline summary line still renders in full mode.
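
The fire-and-forget step above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `fireSummary`, `AbortLike`, `Summarize`, and the item shape are hypothetical names chosen for the example, and the real hook in useGeminiStream carries more state.

```typescript
// Hypothetical shapes for illustration only; the real types live in the PR.
interface ToolUseSummaryItem {
  type: "tool_use_summary";
  summary: string;
  precedingToolUseIds: string[];
}

// Minimal stand-in for an AbortSignal: only the flag is needed here.
interface AbortLike {
  readonly aborted: boolean;
}

type Summarize = (callIds: string[]) => Promise<string | null>;

/**
 * Starts label generation without blocking the caller. The history item is
 * appended only if the call resolved with a non-empty label and the signal
 * was not aborted in the meantime; every other outcome is silent.
 */
function fireSummary(
  callIds: string[],
  summarize: Summarize,
  signal: AbortLike,
  addItem: (item: ToolUseSummaryItem) => void,
): Promise<void> {
  return summarize(callIds)
    .then((label) => {
      if (!label || signal.aborted) return; // fall back to the default view
      addItem({
        type: "tool_use_summary",
        summary: label,
        precedingToolUseIds: callIds,
      });
    })
    .catch(() => {
      // API errors degrade silently; no history item is added.
    });
}
```

A caller would typically write `void fireSummary(...)` and move straight on to the next turn; the promise is returned here only so the sketch can be exercised in a test.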

When the label actually helps

The benefit is not uniform across all batches — being honest about where this earns its keep:

Batch shape	Full-mode benefit	Compact-mode benefit
2–4 same-type calls (e.g. `Read × 4`)	Minimal — label restates what the visible tool lines already convey	Meaningful — replaces the generic `Tool × N` header with semantic intent
5–9 same-type calls	Mild — saves some scanning	Meaningful
10+ calls	Meaningful — intent synthesis avoids reading every line	High
Heterogeneous (`Grep + Read + Edit + Bash`)	High — no single tool name implies the collective intent	High
Historical scrollback (transcript navigation)	High — dim labels act as section headers when reviewing a session	High

The Read × 4 example used in this PR's screenshots and the tmux transcript is deliberately in the low-benefit cell — the label ("Read 4 text files") is almost redundant with the individual tool lines. The point of showing it there was to prove the plumbing works end-to-end under a reproducible setup, not to argue that it is especially valuable for that specific shape. Users whose workflows are dominated by small same-type batches and who are running in full mode may want to set experimental.emitToolUseSummaries: false to avoid the per-batch cost — the escape hatch is part of the design.

Compact mode (and anything downstream of the SDK) is where this PR pays off most consistently; the full-mode inline render exists so the feature does not silently disappear when a user toggles ui.compactMode: false (the default), not because it's revelatory for every batch.

Configuration

Lever	Default	Effect
`experimental.emitToolUseSummaries` setting	`true`	Turn off if the extra fast-model call is unwanted
`QWEN_CODE_EMIT_TOOL_USE_SUMMARIES` env var	unset	`=0` forces off, `=1` forces on, overrides settings
`fastModel` setting	unset	Required; without a fast model, generation is skipped and the UI falls back to no label with no cost impact

Compared to the upstream inspiration, which only exposes a same-named env gate (no settings layer, default off), this PR adds a persistent settings toggle and defaults on — rationale and trade-offs covered in the design doc (see Deviations).
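
The precedence in the table above (env var over setting, setting defaulting to on, fast model required) can be sketched as below. The function names and the `Env` type are hypothetical, and treating any env value other than `0` or `1` as unset is an assumption of this sketch; the PR's actual logic lives in getEmitToolUseSummaries().

```typescript
// Hypothetical helper; env is passed in explicitly so the sketch is testable.
type Env = Record<string, string | undefined>;

function resolveEmitToolUseSummaries(
  env: Env,
  setting: boolean | undefined,
): boolean {
  const raw = env["QWEN_CODE_EMIT_TOOL_USE_SUMMARIES"];
  if (raw === "0") return false; // env forces off
  if (raw === "1") return true; // env forces on
  // Assumption: any other value is treated like "unset".
  return setting ?? true; // settings layer, default on
}

// Generation is skipped entirely when no fast model is configured.
function shouldGenerateSummary(
  env: Env,
  setting: boolean | undefined,
  fastModel: string | undefined,
): boolean {
  return resolveEmitToolUseSummaries(env, setting) && Boolean(fastModel);
}
```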

Failure modes (all silent)

No fastModel configured → skipped, no history item added.
Aborted turn → the generation promise drops on the floor; the summarySignal.aborted check prevents stale addItem.
API error / empty model output / rejected label (prefixed Error:, I cannot …, etc. — filtered by cleanSummary) → no history item, UI shows the default view.

Cost / compat

One fast-model call per qualifying tool batch. The prompt is roughly 300 input tokens per tool, with each field capped at 300 chars, and the output is ~20 tokens. At typical fast-model pricing, that is roughly $0.001 per batch.
tool_use_summary items are not persisted via ChatRecordingService; resuming a session loses labels but tool groups render normally (no user-visible difference beyond label absence).
Non-interactive CLI paths (runNonInteractive) are unchanged. The trigger lives in the interactive stream hook only.
No breaking changes. Existing users see labels appear automatically if they have a fastModel set; others see the current behavior unchanged.

Not in scope

The message factory createToolUseSummaryMessage and ToolUseSummaryMessage type are exported from core so a future PR can wire them into the SDK stream / non-interactive output. This PR only consumes the generated summary in the interactive CLI.

Live tmux verification

Reproducible end-to-end proof against a real fastModel. Both runs use the same 4 scratch files; only ui.compactMode differs.

Setup (both runs)

mkdir -p /tmp/qwen-summary-test
cd /tmp/qwen-summary-test
printf 'line one\nline two\n'                 > a.txt
printf 'hello from b\n'                       > b.txt
printf 'contents of c\n'                      > c.txt
printf 'd has four lines\nfoo\nbar\nbaz\n'    > d.txt

# project-scoped settings (global settings.json provides fastModel: gpt-5.4)
mkdir -p .qwen
cat > .qwen/settings.json <<JSON
{ "ui": { "compactMode": false },
  "experimental": { "emitToolUseSummaries": true } }
JSON

# launch inside a tmux pane
tmux new-session -d -s qwentest -x 220 -y 50 -c /tmp/qwen-summary-test
tmux send-keys -t qwentest "node /path/to/qwen-code/packages/cli" Enter
# (wait for banner, then send the prompt and an Enter):
tmux send-keys -t qwentest "Read a.txt b.txt c.txt d.txt using pages=1 then summarize in one sentence." Enter
tmux send-keys -t qwentest Enter

Full mode (default — `ui.compactMode: false`)

Raw tmux capture-pane -p output, turn end state:

  > Read a.txt b.txt c.txt d.txt using pages=1 then summarize in one sentence.

  ✦  think I need to figure out how to handle file reading. For a trivial single task, it might not be
    necessary to use todo. Now, I want to read absolute files, and I'm wondering if I can use the pages
    argument for PDFs, although I might face errors or it could be ignored. The user mentioned using
    pages = 1, so I should comply, just in case. Plus, I need to read four files, maybe in parallel.


  ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │ ✓  ReadFile a.txt (pages 1)                                                                          │
  │                                                                                                      │
  │ ✓  ReadFile b.txt (pages 1)                                                                          │
  │                                                                                                      │
  │ ✓  ReadFile c.txt (pages 1)                                                                          │
  │                                                                                                      │
  │ ✓  ReadFile d.txt (pages 1)                                                                          │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

   ● Read 4 text files

  ✦ 四个文件分别包含一些简短文本：a.txt 有两行普通内容，b.txt 和 c.txt 各有一行说明文字，d.txt 则有
    四行以 "d has four lines" 开头的内容。

Key signal: the dim ● Read 4 text files line below the tool group — emitted after tools complete, flowed in naturally once the fast-model call resolved. This is the label reviewers should look for.

Compact mode (`ui.compactMode: true`, or toggle with Ctrl+O)

Same prompt, compactMode: true. Raw capture (the earlier retry batch stays force-expanded because it contains errors; the successful retry collapses to a single compact row):

  ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │ x  ReadFile {"file_path":".../a.txt", ... , "pages":""}                                              │
  │    Invalid pages parameter: ''. Use formats like '5' or '1-10'.                                      │
  │ x  ReadFile {... b/c/d ...}                                                                          │
  │    ⚠️ RETRY LOOP DETECTED: ...                                                                        │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

  ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │✓  Read txt files  · 5 tools                                                                          │
  │Press Ctrl+O to show full tool output                                                                 │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

  ✦ 四个文件分别包含：a.txt 的两行示例文本、b.txt 的一行问候语、c.txt 的一行内容说明，以及 d.txt 的
    四行简单占位文本。

Key signals:

The second box's header is the generated label Read txt files, not the generic ReadFile × 5.
Error group is untouched (force-expand bypass — same as before).
5 tools rather than 4 because mergeCompactToolGroups folded the earlier first attempt in; that's expected merge behavior and unaffected by this PR.
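
For readers unfamiliar with the merge step, here is a simplified sketch of "merge across a hidden summary" as the PR description states it. The item shapes and the function name `mergeAcrossSummaries` are hypothetical, force-expand handling is left out, and the later review rounds changed which summaries are dropped, so treat this as an illustration of the idea rather than the shipped behavior.

```typescript
// Hypothetical, simplified history shapes.
type Item =
  | { type: "tool_group"; tools: { callId: string; name: string }[] }
  | { type: "tool_use_summary"; summary: string; precedingToolUseIds: string[] }
  | { type: "text"; text: string };

type ToolGroup = Extract<Item, { type: "tool_group" }>;

/**
 * Merges consecutive tool_group items, treating tool_use_summary items as
 * invisible when deciding whether two groups are adjacent. A summary that
 * sits between two merged groups is dropped from the merged list.
 */
function mergeAcrossSummaries(history: Item[]): Item[] {
  const out: Item[] = [];
  let open: ToolGroup | null = null; // group currently accepting merges
  let held: Item[] = []; // summaries seen since the open group
  for (const item of history) {
    if (item.type === "tool_group") {
      if (open) {
        open.tools.push(...item.tools); // merge across any held summaries
        held = [];
      } else {
        open = { type: "tool_group", tools: [...item.tools] };
        out.push(open);
      }
    } else if (item.type === "tool_use_summary" && open) {
      held.push(item); // may turn out to sit between two batches
    } else {
      out.push(...held, item); // a visible item ends the run
      held = [];
      open = null;
    }
  }
  out.push(...held);
  return out;
}
```

Dropping the in-between summary does not lose the label in this model, because the compact header looks labels up by callId from the raw history rather than from the merged list.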

Files

Docs

docs/users/features/tool-use-summaries.md — user-facing guide (rendering, triggers, fast-model requirement, cost, failure modes).
docs/users/configuration/settings.md — register experimental.emitToolUseSummaries alongside other fast-model-driven settings.
docs/users/features/_meta.ts — add the new feature page to the sidebar.
docs/design/tool-use-summary/tool-use-summary-design.md — design doc (competitive analysis, flow, key files, Ink <Static> append-only rationale, deviations, limitations, future work).

Core

packages/core/src/services/toolUseSummary.ts — system prompt, truncateJson, cleanSummary, generateToolUseSummary, createToolUseSummaryMessage factory + ToolUseSummaryMessage type.
packages/core/src/config/config.ts — getEmitToolUseSummaries() with env override.

CLI

useGeminiStream.ts — fires generation in handleCompletedTools, lifts history to a ref so the callback stays stable.
HistoryItemDisplay.tsx — renders tool_use_summary as a standalone inline ● label line (gated on !compactMode).
MainContent.tsx — callId → summary lookup, passes compactLabel down to HistoryItemDisplay for the compact-mode header replacement.
ToolGroupMessage / CompactToolGroupDisplay — propagate and render the compact-mode label.
mergeCompactToolGroups.ts — treat tool_use_summary as hidden-in-compact.
settingsSchema.ts — experimental.emitToolUseSummaries, default true, visible in settings dialog.

Test plan

toolUseSummary.test.ts — 27 tests: truncation, cleaning (including CJK, error-message rejection, length cap), UUID + timestamp, model invocation + prompt contents, abort handling, JSON serialization edge cases.
useGeminiStream.test.tsx — 4 new integration tests: disabled gate / missing fast model / success path / empty model result, each asserting on whether addItem saw a tool_use_summary payload.
CompactToolGroupDisplay.test.tsx — 5 tests for the label-vs-default rendering branches (plus 3 pre-existing timeout tests from feat(cli): combine elapsed + timeout in shell time indicator #3512).
HistoryItemDisplay.test.tsx — new test verifying tool_use_summary renders as a dim ● <label> line in full mode.
mergeCompactToolGroups.test.ts — 1 new case: two batches separated by a tool_use_summary still merge into one group.
Full core suite: 6036 tests pass.
Full CLI suite: 4384 tests pass.
tsc --noEmit on both packages: clean.
Live tmux verification in full + compact mode against a real fastModel (dashscope gpt-5.4) — transcripts above.

CodeQL

cleanSummary's quote-strip regex was flagged as polynomial ReDoS because the input is LLM-produced. Fix in 2nd commit bounds the quantifier to {1,10} (ten opening/closing quotes is well past anything a real label produces); alert auto-closed as fixed.
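
A minimal sketch of the bounded quote-strip plus prefix rejection follows. Only the `{1,10}` bound and the quote character class come from the PR text; the function name `cleanLabelSketch`, the first-line rule, and the short reject list are assumptions made for illustration and are narrower than the real cleanSummary.

```typescript
// Bounded quantifier: at most ten quote characters are stripped per edge.
const QUOTE_EDGES = /^["'`]{1,10}|["'`]{1,10}$/g;

// Illustrative subset of rejected prefixes (the real list is longer).
const REJECT_PREFIXES: RegExp[] = [/^error:/i, /^i cannot\b/i];

/** Returns a cleaned label, or null when the output should be discarded. */
function cleanLabelSketch(raw: string): string | null {
  // Assumption: only the first line of the model output is considered.
  const firstLine = raw.trim().split("\n")[0] ?? "";
  const label = firstLine.replace(QUOTE_EDGES, "").trim();
  if (label.length === 0) return null; // empty output: no label
  if (REJECT_PREFIXES.some((re) => re.test(label))) return null;
  return label;
}
```

With this bound, a label wrapped in twelve quotes on each side keeps two on each side; that is the accepted trade-off for a pattern the static check can prove cheap.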

After each tool batch completes, fire a parallel fast-model call to generate a short git-commit-subject-style label summarizing what the batch accomplished (e.g. "Read txt files", "Searched in auth/"). In compact mode the label replaces the generic "Tool × N" header so N parallel tool calls collapse to a single semantic row. The fast-model call (~1s) runs fire-and-forget, overlapped with the next turn's API stream, so there is no perceived latency. Missing fast model, aborted turns, and model failures all degrade silently to the existing rendering. The summary is also emitted as a `tool_use_summary` history entry with `precedingToolUseIds`, keeping the shape compatible with SDK clients that want to render collapsed tool views on their own. Gated by `experimental.emitToolUseSummaries` (default on). Can be overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`. The system prompt and truncation rules (300 chars per tool field, 200 chars of trailing assistant text as intent prefix) match the existing behavior seen in other tools that emit the same message type, so SDK consumers see a consistent shape across clients.

CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in cleanSummary because its input comes from an LLM (treated as uncontrolled). The original regex is anchored and linear in practice, but tightening the quantifier to {1,10} both satisfies the static check and caps engine work on pathological model output with a long run of quotes. Ten opening/closing quotes is well past anything a real label would produce.

…label

The summary was only visible in compact mode because the full-mode ToolGroupMessage ignored the compactLabel prop. Compact mode got away with this because mergeCompactToolGroups triggers refreshStatic(), which re-renders the merged tool_group with its newly-looked-up label. Full mode has no such refresh path, so when the fast-model call resolves *after* the tool_group has been committed to the append-only <Static>, there is no way to retroactively decorate it.

Switch to rendering `tool_use_summary` as its own inline history item (a single dim `● <label>` line). New items append cleanly to <Static>, so the summary flows in naturally once the fast-model call resolves. Compact mode still replaces the merged tool_group header with the label and hides the standalone summary line via the `compactMode` guard.

With this, the feature works under the default `ui.compactMode: false`, not just the opt-in compact view.

wenshao

No issues found on the current PR head after re-checking the latest commits. LGTM! ✅ — gpt-5.4 via Qwen Code /review

Three new docs matching the existing fast-model feature docs layout:

- docs/users/features/tool-use-summaries.md: user-facing guide covering full + compact rendering, configuration (settings + env), failure modes, cost, and cross-links to followup-suggestions.
- docs/users/configuration/settings.md: register the new experimental.emitToolUseSummaries setting next to the other fast-model-driven UI settings.
- docs/design/tool-use-summary/tool-use-summary-design.md: deep dive matching the compact-mode-design.md competitive-analysis style. Documents the Claude Code port (prompt, truncation, timing, gate), the deviations (settings layer, default on, cleanSummary, dual render paths), and the Ink <Static> append-only rationale that drove the inline full-mode render vs header-replacement split.

Full-mode rendering of the summary works, but for small same-type batches (Read × 3 and similar) the label visibly restates what the tool lines already show. Pairing with ui.compactMode: true folds the whole batch into a single labeled row, which is the cleanest transcript shape once the label is available. Adds a dedicated section showing the paired settings.json snippet and explicitly calling out when each mode wins (and when to turn the feature off instead).

chiga0

Thanks for the thorough PR — the design doc is great and the test coverage is strong. Compact mode really does need something better than the generic Tool × N header. I spent a few rounds looking at edge cases (subagents, single-batch vs multi-batch turns, Ctrl+C mid-flight, partial failures, merge behavior) and left specific findings inline.

Blocking (should fix before merge)

1. In compact mode, single-batch turns (the most common shape) never render a label. mergeCompactToolGroups does not drop trailing tool_use_summary items, so the length-delta check in MainContent doesn't fire refreshStatic, and Ink <Static> never repaints the already-committed tool_group. The design doc's claim that the existing merge refresh path covers compact mode only holds when the turn has ≥2 batches. See inline on mergeCompactToolGroups.ts / MainContent.tsx for a trace.
2. summarySignal becomes an orphan across turn boundaries. It captures the current turn's signal, but submitQuery() right after swaps abortControllerRef.current to a brand-new AbortController. If the user Ctrl+C's the next turn, the captured signal never aborts, the if (!summarySignal.aborted) guard passes, and the summary appears after cancellation.

Worth fixing together
3. getCompactLabel resolves to whichever batch's summary loaded first, not "first contributing batch" as the PR description claims. With fast-model jitter the merged header can visibly flip from batch B's label to batch A's once A resolves.
4. Partially failed/cancelled batches still feed cancelled / error output into the summarizer. cleanSummary only filters labels the model returns with Error: / Unable to prefixes; it can't prevent the model from generating a misleading label from poisoned input (e.g. "Attempted to read …" / "Failed to fix X").
5. Force-expand groups in compact mode (error / confirmation / user-initiated) bypass CompactToolGroupDisplay, so compactLabel is ignored; the standalone ● <label> line is also gated on !compactMode, so these groups get no label at all — arguably the highest-signal case.

Minor
6. Writing historyRef.current = history; directly in the component body is not concurrent-safe; prefer useLayoutEffect.
7. cleanSummary quote-stripping misses Unicode curly quotes (U+2018, U+2019, U+201C, U+201D) and CJK brackets, and doesn't strip markdown emphasis (**bold**, _italic_). CJK models occasionally wrap outputs in these.
8. Defaulting the feature on diverges from upstream Claude Code's env-only, default-off model. Worth double-checking that getFastModel() doesn't silently fall back to the main model — if it does, the claimed "$0.001/batch" cost profile isn't accurate.

Overall direction is good — please sanity-check #1 with a real compactMode: true + single-batch run (e.g. a single Read). Single-batch compact turns are the primary case this PR is optimizing, and as far as I can tell from a trace-walk they are not actually refreshing today.

@chiga0

Addresses multiple issues from @chiga0's review:

Blocking: compact-mode label invisible for single-batch turns. mergeCompactToolGroups's adjacency-only gating left a trailing tool_use_summary in the merged result whenever there was no second batch to merge across. That pushed mergedHistory.length lock-step with history.length and MainContent's refreshStatic heuristic (currMLen <= prevMLen) never fired, so Ink's append-only <Static> never repainted the tool_group with its newly-looked-up label. Drop tool_use_summary items unconditionally now; gemini_thought still survives to avoid unnecessary repaints. New tests cover the single-batch case and the summary-before-user-message case.

Blocking: stale summary appears after Ctrl+C on the next turn. summarySignal captured the CURRENT turn's AbortController, but the summary resolves during the NEXT turn's streaming window. The next turn's submitQuery allocates a fresh controller, so the captured signal was never aborted, and Ctrl+C during the new turn used to let the previous turn's summary land in the transcript seconds later. Fix: dedicated per-batch AbortController tracked in a ref set, aborted eagerly from cancelOngoingRequest; resolve-time check reads the live abort state and turnCancelledRef.

High: summarizer input pollution. geminiTools contained error/cancelled tools; retry-loop warnings and "Cancelled by user" strings were feeding the fast model. cleanSummary can only reject error-shaped output, not prevent the model from hallucinating a plausible label from bad input (the PR's own tmux screenshot showed "Read txt files · 5 tools" where 4 of the 5 were prior-retry failures). Filter to status === 'success' before building the prompt; skip the call entirely if nothing's left.

High: unstable label on merged groups. getCompactLabel iterated all callIds and returned the first hit, so asynchronous resolution order made the header visibly flip from SB to SA when batch A resolved after batch B. Lock onto item.tools[0].callId to keep stable "leading batch governs" semantics.

High: force-expanded groups in compact mode had no label at all. Compact mode routes non-force-expand groups through CompactToolGroupDisplay (consumes compactLabel) and force-expand groups through the full ToolGroupMessage (ignores compactLabel); the standalone ● line was gated on !compactMode, creating a dead zone in exactly the diagnostically valuable case. MainContent now computes absorbedCallIds (which groups actually consume the header replacement) and passes summaryAbsorbed to HistoryItemDisplay; force-expand groups in compact mode get the standalone line as the label's only path to the screen.

Medium: cleanSummary robustness. Extend quote-strip to Unicode curly + CJK corner brackets; strip markdown emphasis (**bold**, _italic_); broaden refusal-prefix rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 / 抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new cleanSummary tests cover the added cases.

Low: concurrent-rendering safety. Move historyRef.current = history from render phase into useLayoutEffect so bailed renders can't leave a dropped value.

Low: CompactToolGroupDisplay readability. Extract renderSummaryHeader / renderDefaultHeader helpers and document the toolCalls.length > 1 count-suffix guard so a future "fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".

Docs: add Scope & Lifecycle section to tool-use-summaries.md covering (1) one generation per batch shared by both modes, (2) no backfill on toggle / session resume, (3) main-agent batches only with the Task-tool clarification.
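
The per-batch controller fix for the stale-summary race can be sketched as a small registry. This is an assumption-laden illustration: `SummaryAbortRegistry` and its method names are hypothetical, and the PR tracks the controllers in a React ref rather than a class.

```typescript
/**
 * Tracks one AbortController per in-flight summary call so a cancel in a
 * later turn can still reach summaries started in an earlier turn.
 */
class SummaryAbortRegistry {
  private readonly live = new Set<AbortController>();

  /** Creates a controller owned by a single batch's summary call. */
  begin(): AbortController {
    const controller = new AbortController();
    this.live.add(controller);
    return controller;
  }

  /** Call when the summary promise settles, whatever the outcome. */
  finish(controller: AbortController): void {
    this.live.delete(controller);
  }

  /** Call from the cancel path (e.g. Ctrl+C): aborts every pending summary. */
  abortAll(): void {
    for (const controller of this.live) controller.abort();
    this.live.clear();
  }

  get size(): number {
    return this.live.size;
  }
}
```

Because each summary owns its controller, swapping the turn-level controller when the next query starts no longer orphans the signal that the resolve-time guard checks.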

Critical: force-expand groups lost their summary entirely. Previous round's "drop tool_use_summary unconditionally" merge fix also stripped summaries for force-expanded groups, defeating the exact case (errors, confirmations, focused shell) where the standalone ● label is the label's only path to the screen. The merge function now takes an absorbedCallIds set: summaries whose preceding callIds are all absorbed by a compact tool_group header are dropped (so refreshStatic still fires), but force-expanded summaries pass through to be rendered standalone by HistoryItemDisplay. MainContent computes absorbedCallIds from raw history and passes it in. New tests cover both the absorbed-drop and the force-expand-preserve cases plus the empty-set default for callers that don't compute absorption.

Suggestion: late-arriving summaries could land out of order. A slow fast-model call could resolve after the next turn's content was committed, planting the ● label between later items in full mode. The resolve callback now captures the first batch callId, locates the corresponding tool_group at resolve time, and drops the summary if a newer tool_group has already appeared in history. New test exercises this with a manually-resolved fast-model promise.

Suggestion: truncateJson allocated full JSON for large strings. A 10MB ReadFile result was being JSON.stringify'd in full only to be sliced down to 300 chars. Added preTruncate that walks the value (depth-bounded to 4) and slices string leaves to maxLength before serialization. Tests verify the input never reaches its full pre-cap form.

Suggestion: settings description over-claimed SDK emission. The description said summaries are emitted to SDK clients as a tool_use_summary message; the SDK plumbing isn't actually wired in this PR (the factory is exported for follow-up). Updated settings.json description and regenerated the vscode schema to state CLI-only scope explicitly.

Suggestion: fastModel data-boundary not documented. When fastModel uses a different provider than the main session model, tool inputs/outputs cross a new auth boundary that users may not expect. Added "Data flow & privacy" section to the user feature doc spelling out: same-provider fast model = no scope change; different-provider = strictly larger sharing scope; two escape hatches (same-provider fast model OR feature off). Code-level mitigation (metadata-only mode) deferred.
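
The pre-truncation idea can be sketched as below. Function names mirror the commit message, but the bodies are a guess at the shape, not the PR's code; in particular, replacing anything deeper than the bound with a `"[truncated]"` placeholder is an assumption of this sketch.

```typescript
/**
 * Walks a value and slices every string leaf to maxLength before it is
 * serialized, so a huge tool result is never stringified in full.
 */
function preTruncate(value: unknown, maxLength: number, depth = 4): unknown {
  if (typeof value === "string") {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== "object") return value;
  // Assumption: containers beyond the depth bound become a placeholder.
  if (depth <= 0) return "[truncated]";
  if (Array.isArray(value)) {
    return value.map((v) => preTruncate(v, maxLength, depth - 1));
  }
  const out: Record<string, unknown> = {};
  for (const [key, v] of Object.entries(value)) {
    out[key] = preTruncate(v, maxLength, depth - 1);
  }
  return out;
}

/** Serializes the pre-truncated value, then applies the final length cap. */
function truncateJson(value: unknown, maxLength: number): string {
  const json = JSON.stringify(preTruncate(value, maxLength)) ?? "";
  return json.length > maxLength ? json.slice(0, maxLength) : json;
}
```

The final slice is still needed because keys, punctuation, and multiple fields can push the serialized form past the cap even when every string leaf is within it.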

chiga0

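LGTM ✅. All blocking/critical issues from both review rounds have been properly resolved in 93f627e and b86eee5; I verified the key code on the latest HEAD:

Verified fixes

mergeCompactToolGroups uses the absorbedCallIds parameter to distinguish the two cases: summaries absorbed by a compact header are dropped so that refreshStatic fires, while summaries for force-expand groups pass through so HistoryItemDisplay renders the standalone ● <label> line. The critical issue raised in round 2 is correctly fixed without reintroducing the round 1 bug.
useGeminiStream.ts combines a per-batch AbortController (tracked in the summaryAbortRefsRef Set and eagerly aborted by cancelOngoingRequest), a triple cancel check at resolve time (turnCancelledRef / abortControllerRef.current?.signal.aborted / summaryAbort.signal.aborted), and a stale-summary check via anchorCallId. Both the cross-turn Ctrl+C race and the out-of-order landing problem are resolved.
preTruncate in toolUseSummary.ts pre-truncates string leaves, depth-bounded to 4, before JSON serialization, so a 10MB ReadFile result is no longer fully stringified only to be discarded.
cleanSummary now covers Unicode curly/CJK quotes, strips markdown emphasis, and rejects Chinese refusal prefixes (我无法 / 我不能 / 抱歉 / 无法).

CI status
All green (CodeQL + Lint + all 9 Test matrix jobs pass).

Remaining trade-offs (non-blocking)

The default: true differs from upstream Claude Code's env-only, default-off model, but I confirmed that getFastModel() returns undefined when unconfigured, so the feature is skipped entirely at zero cost. Acceptable.
The data boundary (fastModel across providers) is recorded in documentation rather than enforced through code-level redaction, and "metadata-only mode" is explicitly deferred to a follow-up PR. That is also reasonable, since users can turn the setting off themselves.

The design doc, user docs, Scope & Lifecycle section, and tmux end-to-end verification are all thorough. Overall engineering quality is high; ready to merge.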

@chiga0

…wenLM#3538) * feat(cli,core): generate tool-use summaries for compact mode After each tool batch completes, fire a parallel fast-model call to generate a short git-commit-subject-style label summarizing what the batch accomplished (e.g. "Read txt files", "Searched in auth/"). In compact mode the label replaces the generic "Tool × N" header so N parallel tool calls collapse to a single semantic row. The fast-model call (~1s) runs fire-and-forget, overlapped with the next turn's API stream, so there is no perceived latency. Missing fast model, aborted turns, and model failures all degrade silently to the existing rendering. The summary is also emitted as a `tool_use_summary` history entry with `precedingToolUseIds`, keeping the shape compatible with SDK clients that want to render collapsed tool views on their own. Gated by `experimental.emitToolUseSummaries` (default on). Can be overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`. The system prompt and truncation rules (300 chars per tool field, 200 chars of trailing assistant text as intent prefix) match the existing behavior seen in other tools that emit the same message type, so SDK consumers see a consistent shape across clients. * fix(core): bound cleanSummary quote-strip regex to avoid ReDoS CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in cleanSummary because its input comes from an LLM (treated as uncontrolled). The original regex is anchored and linear in practice, but tightening the quantifier to {1,10} both satisfies the static check and caps engine work on pathological model output with a long run of quotes. Ten opening/closing quotes is well past anything a real label would produce. * fix(cli): render tool_use_summary inline so full mode also shows the label The summary was only visible in compact mode because the full-mode ToolGroupMessage ignored the compactLabel prop. 
Compact mode got away with this because mergeCompactToolGroups triggers refreshStatic(), which re-renders the merged tool_group with its newly-looked-up label. Full mode has no such refresh path, so when the fast-model call resolves *after* the tool_group has been committed to the append-only <Static>, there is no way to retroactively decorate it. Switch to rendering `tool_use_summary` as its own inline history item (a single dim `● <label>` line). New items append cleanly to <Static>, so the summary flows in naturally once the fast-model call resolves. Compact mode still replaces the merged tool_group header with the label and hides the standalone summary line via the `compactMode` guard. With this, the feature works under the default `ui.compactMode: false` — not just the opt-in compact view. * docs: tool-use-summaries feature guide, settings entry, and design doc Three new docs matching the existing fast-model feature docs layout: - docs/users/features/tool-use-summaries.md — user-facing guide covering full + compact rendering, configuration (settings + env), failure modes, cost, and cross-links to followup-suggestions. - docs/users/configuration/settings.md — register the new experimental.emitToolUseSummaries setting next to the other fast-model-driven UI settings. - docs/design/tool-use-summary/tool-use-summary-design.md — deep dive matching the compact-mode-design.md competitive-analysis style. Documents the Claude Code port (prompt, truncation, timing, gate), the deviations (settings layer, default on, cleanSummary, dual render paths), and the Ink <Static> append-only rationale that drove the inline full-mode render vs header-replacement split. * docs: add Recommended pairing section to tool-use-summaries Full-mode rendering of the summary works, but for small same-type batches (Read × 3 and similar) the label visibly restates what the tool lines already show. 
Pairing with ui.compactMode: true folds the whole batch into a single labeled row, which is the cleanest transcript shape once the label is available. Adds a dedicated section showing the paired settings.json snippet and explicitly calling out when each mode wins (and when to turn the feature off instead).

* fix: address review feedback on tool-use summary generation

Addresses multiple issues from @chiga0's review:

Blocking: compact-mode label invisible for single-batch turns. mergeCompactToolGroups's adjacency-only gating left a trailing tool_use_summary in the merged result whenever there was no second batch to merge across. That kept mergedHistory.length in lock-step with history.length, so MainContent's refreshStatic heuristic (currMLen <= prevMLen) never fired and Ink's append-only <Static> never repainted the tool_group with its newly-looked-up label. Drop tool_use_summary items unconditionally now; gemini_thought still survives to avoid unnecessary repaints. New tests cover the single-batch case and the summary-before-user-message case.

Blocking: stale summary appears after Ctrl+C on the next turn. summarySignal captured the CURRENT turn's AbortController, but the summary resolves during the NEXT turn's streaming window. The next turn's submitQuery allocates a fresh controller, so the captured signal was never aborted: Ctrl+C during the new turn used to let the previous turn's summary land in the transcript seconds later. Fix: dedicated per-batch AbortController tracked in a ref set, aborted eagerly from cancelOngoingRequest; resolve-time check reads the live abort state and turnCancelledRef.

High: summarizer input pollution. geminiTools contained error/cancelled tools; retry-loop warnings and "Cancelled by user" strings were feeding the fast model.
cleanSummary can only reject error-shaped output, not prevent the model from hallucinating a plausible label from bad input (the PR's own tmux screenshot showed "Read txt files · 5 tools" where 4 of the 5 were prior-retry failures). Filter to status === 'success' before building the prompt; skip the call entirely if nothing's left.

High: unstable label on merged groups. getCompactLabel iterated all callIds and returned the first hit, so asynchronous resolution order made the header visibly flip from SB to SA when batch A resolved after batch B. Lock onto item.tools[0].callId to keep stable "leading batch governs" semantics.

High: force-expanded groups in compact mode had no label at all. Compact mode routes non-force-expand groups through CompactToolGroupDisplay (consumes compactLabel) and force-expand groups through the full ToolGroupMessage (ignores compactLabel); the standalone ● line was gated on !compactMode, creating a dead zone in exactly the diagnostically valuable case. MainContent now computes absorbedCallIds (which groups actually consume the header replacement) and passes summaryAbsorbed to HistoryItemDisplay; force-expand groups in compact mode get the standalone line as the label's only path to the screen.

Medium: cleanSummary robustness. Extend quote-strip to Unicode curly + CJK corner brackets; strip markdown emphasis (**bold**, _italic_); broaden refusal-prefix rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 / 抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new cleanSummary tests cover the added cases.

Low: concurrent-rendering safety. Move historyRef.current = history from render phase into useLayoutEffect so bailed renders can't leave a dropped value.

Low: CompactToolGroupDisplay readability. Extract renderSummaryHeader / renderDefaultHeader helpers and document the toolCalls.length > 1 count-suffix guard so a future "fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".
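
Three of the fixes above (success-only summarizer input, "leading batch governs" label lookup, and the count-suffix guard) are small enough to sketch together. The following is a minimal illustration under assumed shapes, not the PR's actual code: `ToolCall`, `ToolGroup`, `selectSummarizableTools`, and the `labelsByCallId` map are hypothetical names introduced here; only `getCompactLabel`, `renderSummaryHeader`, and the `status === 'success'` / `tools[0].callId` / `length > 1` rules come from the commit message.

```typescript
// Hypothetical shapes for illustration; the real qwen-code types may differ.
type ToolStatus = 'success' | 'error' | 'cancelled';

interface ToolCall {
  callId: string;
  name: string;
  status: ToolStatus;
}

interface ToolGroup {
  tools: ToolCall[];
}

// Only successful tools are passed to the summarizer, so retry warnings and
// "Cancelled by user" strings never reach the fast model. An empty result
// means the caller should skip the fast-model call entirely.
function selectSummarizableTools(tools: ToolCall[]): ToolCall[] {
  return tools.filter((tool) => tool.status === 'success');
}

// "Leading batch governs": look the label up by the first tool's callId only,
// so the header cannot flip when a later batch's summary resolves first.
function getCompactLabel(
  group: ToolGroup,
  labelsByCallId: Map<string, string>,
): string | undefined {
  const lead = group.tools[0];
  return lead ? labelsByCallId.get(lead.callId) : undefined;
}

// The count suffix is shown only for more than one tool, which avoids
// rendering "Read config.json · 1 tools".
function renderSummaryHeader(label: string, toolCount: number): string {
  return toolCount > 1 ? `${label} · ${toolCount} tools` : label;
}
```

With this lookup, a merged group whose second batch resolves first simply shows no label until the leading batch's summary arrives, rather than showing one label and then switching to another.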
Docs: add Scope & Lifecycle section to tool-use-summaries.md covering (1) one generation per batch shared by both modes, (2) no backfill on toggle / session resume, (3) main-agent batches only with the Task-tool clarification.

* fix: address second-round review feedback on tool-use summaries

Critical: force-expand groups lost their summary entirely. Previous round's "drop tool_use_summary unconditionally" merge fix also stripped summaries for force-expanded groups, defeating the exact case (errors, confirmations, focused shell) where the standalone ● label is the label's only path to the screen. The merge function now takes an absorbedCallIds set: summaries whose preceding callIds are all absorbed by a compact tool_group header are dropped (so refreshStatic still fires), but force-expanded summaries pass through to be rendered standalone by HistoryItemDisplay. MainContent computes absorbedCallIds from raw history and passes it in. New tests cover both the absorbed-drop and the force-expand-preserve cases plus the empty-set default for callers that don't compute absorption.

Suggestion: late-arriving summaries could land out of order. A slow fast-model call could resolve after the next turn's content was committed, planting the ● label between later items in full mode. The resolve callback now captures the first batch callId, locates the corresponding tool_group at resolve time, and drops the summary if a newer tool_group has already appeared in history. New test exercises this with a manually-resolved fast-model promise.

Suggestion: truncateJson allocated full JSON for large strings. A 10MB ReadFile result was being JSON.stringify'd in full only to be sliced down to 300 chars. Added preTruncate that walks the value (depth-bounded to 4) and slices string leaves to maxLength before serialization. Tests verify the input never reaches its full pre-cap form.

Suggestion: settings description over-claimed SDK emission.
The description said summaries are emitted to SDK clients as a tool_use_summary message; the SDK plumbing isn't actually wired in this PR (the factory is exported for follow-up). Updated settings.json description and regenerated the vscode schema to state CLI-only scope explicitly.

Suggestion: fastModel data-boundary not documented. When fastModel uses a different provider than the main session model, tool inputs/outputs cross a new auth boundary that users may not expect. Added "Data flow & privacy" section to the user feature doc spelling out: same-provider fast model = no scope change; different-provider = strictly larger sharing scope; two escape hatches (same-provider fast model OR feature off). Code-level mitigation (metadata-only mode) deferred.
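
The preTruncate change described in the second-round feedback can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code merged in the PR: the function signatures, the `MAX_DEPTH` constant name, and the `'[truncated]'` placeholder for over-deep values are all hypothetical. Only the idea (walk the value, bound the depth to 4, slice string leaves to maxLength before calling JSON.stringify, default cap of 300 chars) is taken from the commit message.

```typescript
// Assumed constant name; the commit message only states "depth-bounded to 4".
const MAX_DEPTH = 4;

// Walks a value and slices every string leaf to maxLength, so a huge string
// is never serialized in full. Objects and arrays nested at MAX_DEPTH or
// deeper are replaced by a placeholder (the placeholder is an assumption).
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === 'string') {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== 'object') {
    return value; // numbers, booleans, undefined pass through unchanged
  }
  if (depth >= MAX_DEPTH) {
    return '[truncated]';
  }
  if (Array.isArray(value)) {
    return value.map((item) => preTruncate(item, maxLength, depth + 1));
  }
  const out: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    out[key] = preTruncate(child, maxLength, depth + 1);
  }
  return out;
}

// Serializes the pre-truncated value, then applies the final cap.
function truncateJson(value: unknown, maxLength = 300): string {
  const raw: string | undefined = JSON.stringify(preTruncate(value, maxLength));
  const json = raw ?? ''; // JSON.stringify(undefined) yields undefined
  return json.length > maxLength ? json.slice(0, maxLength) : json;
}
```

The final slice is still needed because many short fields can add up to more than maxLength; the pre-pass only guarantees that no single string leaf dominates the allocation.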

wenshao force-pushed the feat/tool-use-summary branch from 2c98a6a to 6ffeb20 Compare April 23, 2026 01:51

github-advanced-security AI found potential problems Apr 23, 2026

Comment thread packages/core/src/services/toolUseSummary.ts Fixed
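
This code-scanning thread presumably corresponds to the CodeQL js/polynomial-redos finding that the squash message addresses with a bounded quantifier. A minimal sketch of that bounded quote-strip, assuming a helper named `stripEdgeQuotes` (hypothetical; the PR's function is `cleanSummary`, whose full body is not shown here):

```typescript
// Bounded version of /^["'`]+|["'`]+$/g: at most ten quote characters are
// consumed at each end, which caps regex work on pathological model output.
const EDGE_QUOTES = /^["'`]{1,10}|["'`]{1,10}$/g;

// Hypothetical helper name; only the pattern itself comes from the commit
// message. Trims whitespace, then strips leading and trailing quotes.
function stripEdgeQuotes(label: string): string {
  return label.trim().replace(EDGE_QUOTES, '');
}
```

Quotes inside the label are left untouched, and a run longer than ten quotes is only partially stripped, which is the accepted trade-off for the bound.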

wenshao added 2 commits April 23, 2026 10:02

wenshao changed the title ~~feat(cli,core): generate tool-use summaries for compact mode~~ feat(cli,core): LLM-generated summary labels for tool-call batches Apr 23, 2026

wenshao commented Apr 23, 2026

wenshao added 2 commits April 23, 2026 10:52

wenshao requested review from chiga0 and tanzhenxin April 23, 2026 03:18

tanzhenxin assigned chiga0 Apr 23, 2026

chiga0 reviewed Apr 24, 2026

wenshao requested a review from chiga0 April 24, 2026 02:57

wenshao commented Apr 24, 2026

github-actions Bot mentioned this pull request Apr 25, 2026

📊 AI CLI Tools Community Daily Report 2026-04-25 gsscsd/big_model_radar#240

Open

BingqingLyu mentioned this pull request Apr 27, 2026

feat(cli,core): LLM-generated summary labels for tool-call batches BingqingLyu/qwen-code#106

Open

9 tasks

chiga0 approved these changes Apr 27, 2026

wenshao merged commit f420742 into QwenLM:main Apr 27, 2026
13 checks passed

feat(cli,core): LLM-generated summary labels for tool-call batches #3538

wenshao merged 7 commits into QwenLM:main from wenshao:feat/tool-use-summary

3 participants

wenshao commented Apr 23, 2026 • edited

Why

What

How

When the label actually helps

Configuration

Failure modes (all silent)

Cost / compat

Not in scope

Live tmux verification

Setup (both runs)

Full mode (default — ui.compactMode: false)

Compact mode (ui.compactMode: true, or toggle with Ctrl+O)

Files

Test plan

CodeQL

wenshao commented Apr 23, 2026 • edited

Full mode (default — `ui.compactMode: false`)

Compact mode (`ui.compactMode: true`, or toggle with Ctrl+O)