feat(cli,core): LLM-generated summary labels for tool-call batches #3538

Merged
wenshao merged 7 commits into QwenLM:main from wenshao:feat/tool-use-summary on Apr 27, 2026

@wenshao wenshao commented Apr 23, 2026

Collaborator

Why

When the model fans out into parallel tool calls, today's UI shows only mechanical information — which tool ran, with what argument, whether it succeeded. There is no synthesis of why the batch was run.

  • Compact mode (Ctrl+O): the group collapses to a single Tool × N header plus the last tool's description. For Grep × 3 + Read × 2 you see only the tail tool's name, and the intent is lost entirely. This is the primary target of the PR.
  • Narrow / SDK UIs (mobile, sidebars, pending SDK consumers): the one-line cell is the whole signal, and the generic header says almost nothing. Same motivation as above.
  • Full mode (ui.compactMode: false, the default): individual tool lines are already visible, so the improvement is scenario-dependent. It is significant for batches that are large or heterogeneous, and marginal for small same-type batches where the tool names already tell the whole story. See the "When the label actually helps" section for the honest breakdown. Full-mode rendering exists so the feature does not silently disappear for users who leave compact mode off.

A short semantic label after each batch — "Read 4 text files", "Searched in auth/", "Fixed NPE in UserService" — closes the gap where it actually exists without hiding the tool details.

What

After each tool batch finalizes, fire a fast-model call that returns a short git-commit-subject-style label. Both full and compact modes now surface the label so the feature works under the default ui.compactMode: false:

Full mode (default) — label appears inline below the tool group:

╭──────────────────────────────────────────────╮
│ ✓  ReadFile a.txt (pages 1)                  │
│ ✓  ReadFile b.txt (pages 1)                  │
│ ✓  ReadFile c.txt (pages 1)                  │
│ ✓  ReadFile d.txt (pages 1)                  │
╰──────────────────────────────────────────────╯

 ● Read 4 text files

Compact mode — label replaces the generic Tool × N header:

╭──────────────────────────────────────────────╮
│✓  Read txt files  · 4 tools                  │
│Press Ctrl+O to show full tool output         │
╰──────────────────────────────────────────────╯

How

  • Generation runs fire-and-forget from handleCompletedTools in useGeminiStream, keyed on the turn's abort signal. It overlaps with the next turn's API stream so the ~1s fast-model roundtrip adds no perceived latency.
  • On resolve, a tool_use_summary history item is added with the label and the batch's precedingToolUseIds. The history item appends cleanly to Ink's append-only <Static>, so the label shows up even though the tool_group itself is already frozen.
  • In full mode, HistoryItemDisplay renders the tool_use_summary as its own dim ● <label> line — natural flow below the tool group.
  • In compact mode, MainContent builds a callId → summary lookup and passes the label to CompactToolGroupDisplay as a compactLabel prop. The component renders Summary · N tools in place of the default header. The standalone summary line is hidden in compact mode to avoid duplication.
  • mergeCompactToolGroups treats tool_use_summary as hidden-in-compact so two consecutive batches still merge across it; the merged group takes the first contributing batch's label.
  • Force-expand groups (errors, confirmations, focused shell, user-initiated) bypass the compact path entirely — same as before — and their inline summary line still renders in full mode.
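
A minimal sketch of the compact-mode callId → summary lookup described in the list above. The item shapes, the `summary` field name, and the helper names `buildSummaryLookup` and `resolveCompactLabel` are assumptions made for illustration; only `tool_use_summary`, `tool_group`, and `precedingToolUseIds` come from this PR. Resolving through the group's first callId is one way to keep the merged header on the leading batch's label.

```typescript
// Hypothetical, simplified history item shapes; the real CLI types differ.
type ToolGroupItem = { type: 'tool_group'; tools: { callId: string }[] };
type ToolUseSummaryItem = {
  type: 'tool_use_summary';
  summary: string; // assumed field name for the label text
  precedingToolUseIds: string[];
};
type HistoryItem = ToolGroupItem | ToolUseSummaryItem | { type: 'other' };

// Map every callId covered by a summary item to that summary's label.
function buildSummaryLookup(history: HistoryItem[]): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const item of history) {
    if (item.type !== 'tool_use_summary') continue;
    for (const callId of item.precedingToolUseIds) {
      // Keep the first label recorded for a callId.
      if (!lookup.has(callId)) lookup.set(callId, item.summary);
    }
  }
  return lookup;
}

// Resolve the compact header label from the group's leading tool call.
// Returns undefined when no summary has arrived, so the caller can fall
// back to the default "Tool × N" header.
function resolveCompactLabel(
  group: ToolGroupItem,
  lookup: Map<string, string>,
): string | undefined {
  const first = group.tools[0];
  return first ? lookup.get(first.callId) : undefined;
}
```

Returning `undefined` rather than an empty string keeps "no label yet" distinct from "label present", which matches the silent-fallback behavior described for missing summaries.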

When the label actually helps

The benefit is not uniform across all batches — being honest about where this earns its keep:

| Batch shape | Full-mode benefit | Compact-mode benefit |
| --- | --- | --- |
| 2–4 same-type calls (e.g. Read × 4) | Minimal: label restates what the visible tool lines already convey | Meaningful: replaces the generic Tool × N header with semantic intent |
| 5–9 same-type calls | Mild: saves some scanning | Meaningful |
| 10+ calls | Meaningful: intent synthesis avoids reading every line | High |
| Heterogeneous (Grep + Read + Edit + Bash) | High: no single tool name implies the collective intent | High |
| Historical scrollback (transcript navigation) | High: dim labels act as section headers when reviewing a session | High |

The Read × 4 example used in this PR's screenshots and the tmux transcript is deliberately in the low-benefit cell — the label ("Read 4 text files") is almost redundant with the individual tool lines. The point of showing it there was to prove the plumbing works end-to-end under a reproducible setup, not to argue that it is especially valuable for that specific shape. Users whose workflows are dominated by small same-type batches and who are running in full mode may want to set experimental.emitToolUseSummaries: false to avoid the per-batch cost — the escape hatch is part of the design.

Compact mode (and anything downstream of the SDK) is where this PR pays off most consistently; the full-mode inline render exists so the feature does not silently disappear when a user leaves ui.compactMode at false (the default), not because it's revelatory for every batch.

Configuration

| Lever | Default | Effect |
| --- | --- | --- |
| experimental.emitToolUseSummaries setting | true | Turn off if the extra fast-model call is unwanted |
| QWEN_CODE_EMIT_TOOL_USE_SUMMARIES env var | unset | =0 forces off, =1 forces on; overrides settings |
| fastModel setting | unset | Required; without a fast model, generation is skipped and the UI falls back to no label with no cost impact |

Compared to the upstream inspiration, which only exposes a same-named env gate (no settings layer, default off), this PR adds a persistent settings toggle and defaults on — rationale and trade-offs covered in the design doc (see Deviations).
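
A minimal sketch of the precedence described in the configuration table, assuming the env var is read as a plain string. The function name `resolveEmitToolUseSummaries` is hypothetical, and ignoring values other than `0` and `1` is an assumption; the real `getEmitToolUseSummaries()` may behave differently.

```typescript
// Hypothetical helper illustrating "env overrides settings, default on".
function resolveEmitToolUseSummaries(
  settingValue: boolean | undefined,
  env: Record<string, string | undefined>,
): boolean {
  const override = env['QWEN_CODE_EMIT_TOOL_USE_SUMMARIES'];
  if (override === '0') return false; // env forces the feature off
  if (override === '1') return true; // env forces the feature on
  // Assumption: any other env value is ignored and the setting decides.
  return settingValue ?? true; // unset setting falls back to the default (on)
}
```

Even when this returns `true`, generation is still skipped if no fastModel is configured, so the gate alone does not imply a model call.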

Failure modes (all silent)

  • No fastModel configured → skipped, no history item added.
  • Aborted turn → the generation promise drops on the floor; the summarySignal.aborted check prevents stale addItem.
  • API error / empty model output / rejected label (prefixed Error:, I cannot …, etc. — filtered by cleanSummary) → no history item, UI shows the default view.
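
As a rough illustration of the last failure mode, the sketch below rejects empty output and labels that start with an error or refusal prefix. The function name `rejectUnusableLabel` and the two-entry prefix list are illustrative assumptions; the actual `cleanSummary` filter covers more cases than shown here.

```typescript
// Illustrative prefixes only; the real cleanSummary list is longer.
const REJECTED_PREFIXES = ['error:', 'i cannot'];

// Returns the trimmed label, or null when the caller should add no
// history item and leave the default view in place.
function rejectUnusableLabel(raw: string): string | null {
  const label = raw.trim();
  if (label.length === 0) return null; // empty model output
  const lower = label.toLowerCase();
  for (const prefix of REJECTED_PREFIXES) {
    if (lower.indexOf(prefix) === 0) return null; // error or refusal text
  }
  return label;
}
```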

Cost / compat

  • One fast-model call per qualifying tool batch. The prompt is roughly 300 input tokens per tool in the batch (each tool field is capped at 300 chars), and the output is ~20 tokens. At typical fast-model pricing, that is roughly $0.001 per batch.
  • tool_use_summary items are not persisted via ChatRecordingService; resuming a session loses labels but tool groups render normally (no user-visible difference beyond label absence).
  • Non-interactive CLI paths (runNonInteractive) are unchanged. The trigger lives in the interactive stream hook only.
  • No breaking changes. Existing users see labels appear automatically if they have a fastModel set; others see the current behavior unchanged.

Not in scope

The message factory createToolUseSummaryMessage and ToolUseSummaryMessage type are exported from core so a future PR can wire them into the SDK stream / non-interactive output. This PR only consumes the generated summary in the interactive CLI.


Live tmux verification

Reproducible end-to-end proof against a real fastModel. Both runs use the same 4 scratch files; only ui.compactMode differs.

Setup (both runs)

mkdir -p /tmp/qwen-summary-test
cd /tmp/qwen-summary-test
printf 'line one\nline two\n'                 > a.txt
printf 'hello from b\n'                       > b.txt
printf 'contents of c\n'                      > c.txt
printf 'd has four lines\nfoo\nbar\nbaz\n'    > d.txt

# project-scoped settings (global settings.json provides fastModel: gpt-5.4)
mkdir -p .qwen
cat > .qwen/settings.json <<JSON
{ "ui": { "compactMode": false },
  "experimental": { "emitToolUseSummaries": true } }
JSON

# launch inside a tmux pane
tmux new-session -d -s qwentest -x 220 -y 50 -c /tmp/qwen-summary-test
tmux send-keys -t qwentest "node /path/to/qwen-code/packages/cli" Enter
# (wait for banner, then send the prompt and an Enter):
tmux send-keys -t qwentest "Read a.txt b.txt c.txt d.txt using pages=1 then summarize in one sentence." Enter
tmux send-keys -t qwentest Enter

Full mode (default — ui.compactMode: false)

Raw tmux capture-pane -p output, turn end state:

  > Read a.txt b.txt c.txt d.txt using pages=1 then summarize in one sentence.

  ✦  think I need to figure out how to handle file reading. For a trivial single task, it might not be
    necessary to use todo. Now, I want to read absolute files, and I'm wondering if I can use the pages
    argument for PDFs, although I might face errors or it could be ignored. The user mentioned using
    pages = 1, so I should comply, just in case. Plus, I need to read four files, maybe in parallel.


  ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │ ✓  ReadFile a.txt (pages 1)                                                                          │
  │                                                                                                      │
  │ ✓  ReadFile b.txt (pages 1)                                                                          │
  │                                                                                                      │
  │ ✓  ReadFile c.txt (pages 1)                                                                          │
  │                                                                                                      │
  │ ✓  ReadFile d.txt (pages 1)                                                                          │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

   ● Read 4 text files

  ✦ 四个文件分别包含一些简短文本:a.txt 有两行普通内容,b.txt 和 c.txt 各有一行说明文字,d.txt 则有
    四行以 "d has four lines" 开头的内容。

Key signal: the dim ● Read 4 text files line below the tool group — emitted after tools complete, flowed in naturally once the fast-model call resolved. This is the label reviewers should look for.

Compact mode (ui.compactMode: true, or toggle with Ctrl+O)

Same prompt, compactMode: true. Raw capture (the earlier retry batch stays force-expanded because it contains errors; the successful retry collapses to a single compact row):

  ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │ x  ReadFile {"file_path":".../a.txt", ... , "pages":""}                                              │
  │    Invalid pages parameter: ''. Use formats like '5' or '1-10'.                                      │
  │ x  ReadFile {... b/c/d ...}                                                                          │
  │    ⚠️ RETRY LOOP DETECTED: ...                                                                        │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

  ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
  │✓  Read txt files  · 5 tools                                                                          │
  │Press Ctrl+O to show full tool output                                                                 │
  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

  ✦ 四个文件分别包含:a.txt 的两行示例文本、b.txt 的一行问候语、c.txt 的一行内容说明,以及 d.txt 的
    四行简单占位文本。

Key signals:

  • The second box's header is the generated label Read txt files, not the generic ReadFile × 5.
  • Error group is untouched (force-expand bypass — same as before).
  • 5 tools rather than 4 because mergeCompactToolGroups folded the earlier first attempt in; that's expected merge behavior and unaffected by this PR.

Files

Docs

  • docs/users/features/tool-use-summaries.md — user-facing guide (rendering, triggers, fast-model requirement, cost, failure modes).
  • docs/users/configuration/settings.md — register experimental.emitToolUseSummaries alongside other fast-model-driven settings.
  • docs/users/features/_meta.ts — add the new feature page to the sidebar.
  • docs/design/tool-use-summary/tool-use-summary-design.md — design doc (competitive analysis, flow, key files, Ink <Static> append-only rationale, deviations, limitations, future work).

Core

  • packages/core/src/services/toolUseSummary.ts — system prompt, truncateJson, cleanSummary, generateToolUseSummary, createToolUseSummaryMessage factory + ToolUseSummaryMessage type.
  • packages/core/src/config/config.ts: getEmitToolUseSummaries() with env override.

CLI

  • useGeminiStream.ts — fires generation in handleCompletedTools, lifts history to a ref so the callback stays stable.
  • HistoryItemDisplay.tsx — renders tool_use_summary as a standalone inline ● label line (gated on !compactMode).
  • MainContent.tsx: callId → summary lookup, passes compactLabel down to HistoryItemDisplay for the compact-mode header replacement.
  • ToolGroupMessage / CompactToolGroupDisplay — propagate and render the compact-mode label.
  • mergeCompactToolGroups.ts — treat tool_use_summary as hidden-in-compact.
  • settingsSchema.ts: experimental.emitToolUseSummaries, default true, visible in settings dialog.

Test plan

  • toolUseSummary.test.ts — 27 tests: truncation, cleaning (including CJK, error-message rejection, length cap), UUID + timestamp, model invocation + prompt contents, abort handling, JSON serialization edge cases.
  • useGeminiStream.test.tsx — 4 new integration tests: disabled gate / missing fast model / success path / empty model result, each asserting on whether addItem saw a tool_use_summary payload.
  • CompactToolGroupDisplay.test.tsx — 5 tests for the label-vs-default rendering branches (plus 3 pre-existing timeout tests from feat(cli): combine elapsed + timeout in shell time indicator #3512).
  • HistoryItemDisplay.test.tsx — new test verifying tool_use_summary renders as a dim ● <label> line in full mode.
  • mergeCompactToolGroups.test.ts — 1 new case: two batches separated by a tool_use_summary still merge into one group.
  • Full core suite: 6036 tests pass.
  • Full CLI suite: 4384 tests pass.
  • tsc --noEmit on both packages: clean.
  • Live tmux verification in full + compact mode against a real fastModel (dashscope gpt-5.4) — transcripts above.

CodeQL

cleanSummary's quote-strip regex was flagged as polynomial ReDoS because the input is LLM-produced. Fix in 2nd commit bounds the quantifier to {1,10} (ten opening/closing quotes is well past anything a real label produces); alert auto-closed as fixed.
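
For reference, a sketch of what the bounded quote-strip could look like. The pattern below is reconstructed from the commit message (the original `+` quantifiers replaced with `{1,10}`) and is an assumption about the final form; the helper name `stripWrappingQuotes` is hypothetical.

```typescript
// Strip at most 10 leading and at most 10 trailing quote characters.
// The {1,10} bound caps the work done on a long run of quotes.
const QUOTE_STRIP = /^["'`]{1,10}|["'`]{1,10}$/g;

function stripWrappingQuotes(label: string): string {
  return label.replace(QUOTE_STRIP, '');
}
```

A side effect of the bound is that a label wrapped in more than ten quotes keeps the excess characters, which seems an acceptable trade-off given how unlikely such output is.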

After each tool batch completes, fire a parallel fast-model call to
generate a short git-commit-subject-style label summarizing what the
batch accomplished (e.g. "Read txt files", "Searched in auth/"). In
compact mode the label replaces the generic "Tool × N" header so N
parallel tool calls collapse to a single semantic row.

The fast-model call (~1s) runs fire-and-forget, overlapped with the
next turn's API stream, so there is no perceived latency. Missing
fast model, aborted turns, and model failures all degrade silently to
the existing rendering.

The summary is also emitted as a `tool_use_summary` history entry
with `precedingToolUseIds`, keeping the shape compatible with SDK
clients that want to render collapsed tool views on their own.

Gated by `experimental.emitToolUseSummaries` (default on). Can be
overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`.

The system prompt and truncation rules (300 chars per tool field,
200 chars of trailing assistant text as intent prefix) match the
existing behavior seen in other tools that emit the same message
type, so SDK consumers see a consistent shape across clients.
@wenshao wenshao force-pushed the feat/tool-use-summary branch from 2c98a6a to 6ffeb20 Compare April 23, 2026 01:51
Comment thread packages/core/src/services/toolUseSummary.ts Fixed
wenshao added 2 commits April 23, 2026 10:02
CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in
cleanSummary because its input comes from an LLM (treated as
uncontrolled). The original regex is anchored and linear in practice,
but tightening the quantifier to {1,10} both satisfies the static
check and caps engine work on pathological model output with a long
run of quotes. Ten opening/closing quotes is well past anything a real
label would produce.
…label

The summary was only visible in compact mode because the full-mode
ToolGroupMessage ignored the compactLabel prop. Compact mode got away
with this because mergeCompactToolGroups triggers refreshStatic(),
which re-renders the merged tool_group with its newly-looked-up
label. Full mode has no such refresh path, so when the fast-model
call resolves *after* the tool_group has been committed to the
append-only <Static>, there is no way to retroactively decorate it.

Switch to rendering `tool_use_summary` as its own inline history item
(a single dim `● <label>` line). New items append cleanly to <Static>,
so the summary flows in naturally once the fast-model call resolves.
Compact mode still replaces the merged tool_group header with the
label and hides the standalone summary line via the `compactMode`
guard.

With this, the feature works under the default `ui.compactMode: false`
— not just the opt-in compact view.
@wenshao wenshao changed the title feat(cli,core): generate tool-use summaries for compact mode feat(cli,core): LLM-generated summary labels for tool-call batches Apr 23, 2026

@wenshao wenshao left a comment

Collaborator Author

No issues found on the current PR head after re-checking the latest commits. LGTM! ✅ — gpt-5.4 via Qwen Code /review

wenshao added 2 commits April 23, 2026 10:52
Three new docs matching the existing fast-model feature docs layout:

- docs/users/features/tool-use-summaries.md — user-facing guide
  covering full + compact rendering, configuration (settings + env),
  failure modes, cost, and cross-links to followup-suggestions.

- docs/users/configuration/settings.md — register the new
  experimental.emitToolUseSummaries setting next to the other
  fast-model-driven UI settings.

- docs/design/tool-use-summary/tool-use-summary-design.md — deep dive
  matching the compact-mode-design.md competitive-analysis style.
  Documents the Claude Code port (prompt, truncation, timing, gate),
  the deviations (settings layer, default on, cleanSummary, dual
  render paths), and the Ink <Static> append-only rationale that
  drove the inline full-mode render vs header-replacement split.
Full-mode rendering of the summary works, but for small same-type
batches (Read × 3 and similar) the label visibly restates what the
tool lines already show. Pairing with ui.compactMode: true folds
the whole batch into a single labeled row, which is the cleanest
transcript shape once the label is available.

Adds a dedicated section showing the paired settings.json snippet
and explicitly calling out when each mode wins (and when to turn
the feature off instead).

@chiga0 chiga0 left a comment

Collaborator

Thanks for the thorough PR — the design doc is great and the test coverage is strong. Compact mode really does need something better than the generic Tool × N header. I spent a few rounds looking at edge cases (subagents, single-batch vs multi-batch turns, Ctrl+C mid-flight, partial failures, merge behavior) and left specific findings inline.

Blocking (should fix before merge)

  1. In compact mode, single-batch turns (the most common shape) never render a label. mergeCompactToolGroups does not drop trailing tool_use_summary items, so the length-delta check in MainContent doesn't fire refreshStatic, and Ink <Static> never repaints the already-committed tool_group. The design doc's claim that the existing merge refresh path covers compact mode only holds when the turn has ≥2 batches. See inline on mergeCompactToolGroups.ts / MainContent.tsx for a trace.
  2. summarySignal becomes an orphan across turn boundaries. It captures the current turn's signal, but submitQuery() right after swaps abortControllerRef.current to a brand-new AbortController. If the user Ctrl+C's the next turn, the captured signal never aborts, the if (!summarySignal.aborted) guard passes, and the summary appears after cancellation.

Worth fixing together
3. getCompactLabel resolves to whichever batch's summary loaded first, not "first contributing batch" as the PR description claims. With fast-model jitter the merged header can visibly flip from batch B's label to batch A's once A resolves.
4. Partially failed/cancelled batches still feed cancelled / error output into the summarizer. cleanSummary only filters labels the model returns with Error: / Unable to prefixes; it can't prevent the model from generating a misleading label from poisoned input (e.g. "Attempted to read …" / "Failed to fix X").
5. Force-expand groups in compact mode (error / confirmation / user-initiated) bypass CompactToolGroupDisplay, so compactLabel is ignored; the standalone ● <label> line is also gated on !compactMode, so these groups get no label at all — arguably the highest-signal case.

Minor
6. Writing historyRef.current = history; directly in the component body is not concurrent-safe; prefer useLayoutEffect.
7. cleanSummary quote-stripping misses Unicode curly quotes (U+2018/19/1C/1D, CJK brackets) and doesn't strip markdown emphasis (**bold**, _italic_). CJK models occasionally wrap outputs in these.
8. Defaulting the feature on diverges from upstream Claude Code's env-only, default-off model. Worth double-checking that getFastModel() doesn't silently fall back to the main model — if it does, the claimed "$0.001/batch" cost profile isn't accurate.

Overall direction is good — please sanity-check #1 with a real compactMode: true + single-batch run (e.g. a single Read). Single-batch compact turns are the primary case this PR is optimizing, and as far as I can tell from a trace-walk they are not actually refreshing today.

Comment thread packages/cli/src/ui/utils/mergeCompactToolGroups.ts
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts Outdated
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts Outdated
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts Outdated
Comment thread packages/cli/src/ui/components/MainContent.tsx Outdated
Comment thread packages/core/src/services/toolUseSummary.ts Outdated
Comment thread packages/core/src/services/toolUseSummary.ts Outdated
Comment thread packages/cli/src/config/settingsSchema.ts
Comment thread packages/cli/src/ui/components/messages/CompactToolGroupDisplay.tsx Outdated
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts
Addresses multiple issues from @chiga0's review:

Blocking — compact-mode label invisible for single-batch turns.
mergeCompactToolGroups's adjacency-only gating left a trailing
tool_use_summary in the merged result whenever there was no second
batch to merge across. That pushed mergedHistory.length lock-step
with history.length and MainContent's refreshStatic heuristic
(currMLen <= prevMLen) never fired, so Ink's append-only <Static>
never repainted the tool_group with its newly-looked-up label.
Drop tool_use_summary items unconditionally now; gemini_thought
still survives to avoid unnecessary repaints. New tests cover
the single-batch case and the summary-before-user-message case.

Blocking — stale summary appears after Ctrl+C on the next turn.
summarySignal captured the CURRENT turn's AbortController, but the
summary resolves during the NEXT turn's streaming window. The next
turn's submitQuery allocates a fresh controller, so the captured
signal was never aborted — Ctrl+C during the new turn used to let
the previous turn's summary land in the transcript seconds later.
Fix: dedicated per-batch AbortController tracked in a ref set,
aborted eagerly from cancelOngoingRequest; resolve-time check reads
the live abort state and turnCancelledRef.

High — summarizer input pollution.
geminiTools contained error/cancelled tools; retry-loop warnings
and "Cancelled by user" strings were feeding the fast model.
cleanSummary can only reject error-shaped output, not prevent the
model from hallucinating a plausible label from bad input (the PR's
own tmux screenshot showed "Read txt files · 5 tools" where 4 of
the 5 were prior-retry failures). Filter to status === 'success'
before building the prompt; skip the call entirely if nothing's
left.

High — unstable label on merged groups.
getCompactLabel iterated all callIds and returned the first hit,
so asynchronous resolution order made the header visibly flip
from SB to SA when batch A resolved after batch B. Lock onto
item.tools[0].callId to keep stable "leading batch governs"
semantics.

High — force-expanded groups in compact mode had no label at all.
Compact mode routes non-force-expand groups through
CompactToolGroupDisplay (consumes compactLabel) and force-expand
groups through the full ToolGroupMessage (ignores compactLabel);
the standalone ● line was gated on !compactMode, creating a dead
zone — exactly the diagnostically valuable case. MainContent now
computes absorbedCallIds (which groups actually consume the
header replacement) and passes summaryAbsorbed to
HistoryItemDisplay; force-expand groups in compact mode get the
standalone line as the label's only path to the screen.

Medium — cleanSummary robustness.
Extend quote-strip to Unicode curly + CJK corner brackets; strip
markdown emphasis (**bold**, _italic_); broaden refusal-prefix
rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 /
抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new
cleanSummary tests cover the added cases.

Low — concurrent-rendering safety.
Move historyRef.current = history from render phase into
useLayoutEffect so bailed renders can't leave a dropped value.

Low — CompactToolGroupDisplay readability.
Extract renderSummaryHeader / renderDefaultHeader helpers and
document the toolCalls.length > 1 count-suffix guard so a future
"fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".

Docs — add Scope & Lifecycle section to tool-use-summaries.md
covering (1) one generation per batch shared by both modes,
(2) no backfill on toggle / session resume, (3) main-agent batches
only with the Task-tool clarification.
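
The per-batch abort tracking described in this commit can be sketched roughly as follows. The class `SummaryAbortTracker` and its method names are hypothetical; the real hook keeps the controllers in a React ref and also consults turnCancelledRef, which this sketch omits.

```typescript
// Hypothetical tracker: one AbortController per in-flight summary.
class SummaryAbortTracker {
  private readonly controllers = new Set<AbortController>();

  // Start a summary for one batch; apply the label only if it was not
  // cancelled by the time the fast-model call resolves.
  async run(
    generate: (signal: AbortSignal) => Promise<string | null>,
    addLabel: (label: string) => void,
  ): Promise<void> {
    const controller = new AbortController();
    this.controllers.add(controller);
    try {
      const label = await generate(controller.signal);
      // Read the live abort state at resolve time, not a captured turn signal.
      if (label !== null && !controller.signal.aborted) addLabel(label);
    } catch {
      // Failures are silent: the UI keeps its default rendering.
    } finally {
      this.controllers.delete(controller);
    }
  }

  // Called from the cancel path (for example Ctrl+C) to abort every
  // summary that is still in flight, regardless of which turn started it.
  cancelAll(): void {
    this.controllers.forEach((controller) => controller.abort());
    this.controllers.clear();
  }
}
```

Because each summary owns its controller, cancelling during a later turn still reaches summaries started by an earlier one, which is the gap the captured turn signal left open.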
@wenshao wenshao requested a review from chiga0 April 24, 2026 02:57
Comment thread packages/cli/src/ui/utils/mergeCompactToolGroups.ts
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts
Comment thread packages/core/src/services/toolUseSummary.ts
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts
Comment thread packages/cli/src/config/settingsSchema.ts Outdated
Critical — force-expand groups lost their summary entirely.
Previous round's "drop tool_use_summary unconditionally" merge fix
also stripped summaries for force-expanded groups, defeating the
exact case (errors, confirmations, focused shell) where the
standalone ● label is the label's only path to the screen. The
merge function now takes an absorbedCallIds set: summaries whose
preceding callIds are all absorbed by a compact tool_group header
are dropped (so refreshStatic still fires), but force-expanded
summaries pass through to be rendered standalone by
HistoryItemDisplay. MainContent computes absorbedCallIds from raw
history and passes it in. New tests cover both the absorbed-drop
and the force-expand-preserve cases plus the empty-set default
for callers that don't compute absorption.

Suggestion — late-arriving summaries could land out of order.
A slow fast-model call could resolve after the next turn's
content was committed, planting the ● label between later items
in full mode. The resolve callback now captures the first batch
callId, locates the corresponding tool_group at resolve time,
and drops the summary if a newer tool_group has already appeared
in history. New test exercises this with a manually-resolved
fast-model promise.

Suggestion — truncateJson allocated full JSON for large strings.
A 10MB ReadFile result was being JSON.stringify'd in full only to
be sliced down to 300 chars. Added preTruncate that walks the
value (depth-bounded to 4) and slices string leaves to maxLength
before serialization. Tests verify the input never reaches its
full pre-cap form.

Suggestion — settings description over-claimed SDK emission.
The description said summaries are emitted to SDK clients as a
tool_use_summary message; the SDK plumbing isn't actually wired
in this PR (the factory is exported for follow-up). Updated
settings.json description and regenerated the vscode schema to
state CLI-only scope explicitly.

Suggestion — fastModel data-boundary not documented.
When fastModel uses a different provider than the main session
model, tool inputs/outputs cross a new auth boundary that users
may not expect. Added "Data flow & privacy" section to the user
feature doc spelling out: same-provider fast model = no scope
change; different-provider = strictly larger sharing scope; two
escape hatches (same-provider fast model OR feature off).
Code-level mitigation (metadata-only mode) deferred.
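
A minimal sketch of the pre-truncation idea from the truncateJson item above: slice string leaves before serialization so a very large value is never stringified in full. The signatures, the `[truncated]` placeholder used past the depth bound, and the final slice are assumptions for illustration, not the PR's actual code.

```typescript
// Walk a value (depth-bounded to 4) and cap every string leaf.
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === 'string') return value.slice(0, maxLength);
  if (value === null || typeof value !== 'object') return value;
  // Assumption: anything nested deeper than the bound becomes a placeholder.
  if (depth >= 4) return '[truncated]';
  if (Array.isArray(value)) {
    return value.map((entry) => preTruncate(entry, maxLength, depth + 1));
  }
  const source = value as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(source)) {
    out[key] = preTruncate(source[key], maxLength, depth + 1);
  }
  return out;
}

// Serialize the already-capped value, then cap the JSON text itself.
function truncateJson(value: unknown, maxLength: number): string {
  // JSON.stringify returns undefined for inputs such as undefined.
  const json: string | undefined = JSON.stringify(preTruncate(value, maxLength));
  return (json ?? '').slice(0, maxLength);
}
```

With this shape, the cost of serialization scales with the capped size of each leaf rather than with the raw tool output.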

@chiga0 chiga0 left a comment

Collaborator

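LGTM ✅. All blocking/critical issues from both review rounds have been properly resolved in 93f627eb86eee5; I checked the key code on the latest HEAD:

Verified fixes

  • mergeCompactToolGroups distinguishes the two cases via the absorbedCallIds parameter: summaries absorbed by a compact header are dropped to trigger refreshStatic, while summaries of force-expand groups pass through so HistoryItemDisplay renders the standalone ● <label> line. The critical issue raised in round 2 is correctly fixed without reintroducing the round 1 bug.
  • useGeminiStream.ts: the per-batch AbortController (tracked in the summaryAbortRefsRef Set and eagerly aborted by cancelOngoingRequest), the triple cancel check at resolve time (turnCancelledRef / abortControllerRef.current?.signal.aborted / summaryAbort.signal.aborted), and the stale-summary check via anchorCallId together resolve both the cross-turn Ctrl+C race and the out-of-order landing problem.
  • toolUseSummary.ts: preTruncate pre-truncates string leaf nodes (depth-bounded to 4) before JSON serialization, so a 10MB ReadFile result is no longer stringified in full only to be discarded.
  • cleanSummary now handles Unicode curly/CJK quotes, strips markdown emphasis, and rejects Chinese refusal prefixes (我无法/我不能/抱歉/无法).

CI status
All green (CodeQL + Lint + all 9 Test matrix jobs pass).

Remaining trade-offs (non-blocking)

  • The default: true setting differs from upstream Claude Code's env-only / default-off model, but I confirmed that getFastModel() returns undefined when unconfigured, so the feature is skipped entirely at zero cost. Acceptable.
  • The data boundary (fastModel across providers) is handled through documentation rather than code-level redaction, and "metadata-only mode" is explicitly deferred to a follow-up PR. This is also reasonable, since users can turn the feature off through the setting.

The design doc, user docs, Scope & Lifecycle section, and tmux end-to-end verification are all thorough. Overall engineering quality is high; ready to merge.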

@wenshao wenshao merged commit f420742 into QwenLM:main Apr 27, 2026
13 checks passed
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
…wenLM#3538)

* feat(cli,core): generate tool-use summaries for compact mode

After each tool batch completes, fire a parallel fast-model call to
generate a short git-commit-subject-style label summarizing what the
batch accomplished (e.g. "Read txt files", "Searched in auth/"). In
compact mode the label replaces the generic "Tool × N" header so N
parallel tool calls collapse to a single semantic row.

The fast-model call (~1s) runs fire-and-forget, overlapped with the
next turn's API stream, so there is no perceived latency. Missing
fast model, aborted turns, and model failures all degrade silently to
the existing rendering.

The summary is also emitted as a `tool_use_summary` history entry
with `precedingToolUseIds`, keeping the shape compatible with SDK
clients that want to render collapsed tool views on their own.

Gated by `experimental.emitToolUseSummaries` (default on). Can be
overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`.

The system prompt and truncation rules (300 chars per tool field,
200 chars of trailing assistant text as intent prefix) match the
existing behavior seen in other tools that emit the same message
type, so SDK consumers see a consistent shape across clients.

* fix(core): bound cleanSummary quote-strip regex to avoid ReDoS

CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in
cleanSummary because its input comes from an LLM (treated as
uncontrolled). The original regex is anchored and linear in practice,
but tightening the quantifier to {1,10} both satisfies the static
check and caps engine work on pathological model output with a long
run of quotes. Ten opening/closing quotes is well past anything a real
label would produce.
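
A minimal sketch of the bounded pattern in isolation. `stripWrappingQuotes` is a hypothetical name; the real cleanSummary does more than this, and only the regex shape is taken from the commit message.

```typescript
// Hypothetical helper showing the bounded quote-strip on its own.
// {1,10} caps how many quote characters can be consumed at each end,
// so a pathological run of quotes cannot drive unbounded matching work.
function stripWrappingQuotes(label: string): string {
  return label.replace(/^["'`]{1,10}|["'`]{1,10}$/g, '');
}
```

With this cap, a label wrapped in more than ten quotes keeps the excess ones, which the commit message treats as acceptable because no realistic label looks like that.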

* fix(cli): render tool_use_summary inline so full mode also shows the label

The summary was only visible in compact mode because the full-mode
ToolGroupMessage ignored the compactLabel prop. Compact mode got away
with this because mergeCompactToolGroups triggers refreshStatic(),
which re-renders the merged tool_group with its newly-looked-up
label. Full mode has no such refresh path, so when the fast-model
call resolves *after* the tool_group has been committed to the
append-only <Static>, there is no way to retroactively decorate it.

Switch to rendering `tool_use_summary` as its own inline history item
(a single dim `● <label>` line). New items append cleanly to <Static>,
so the summary flows in naturally once the fast-model call resolves.
Compact mode still replaces the merged tool_group header with the
label and hides the standalone summary line via the `compactMode`
guard.

With this, the feature works under the default `ui.compactMode: false`
— not just the opt-in compact view.

* docs: tool-use-summaries feature guide, settings entry, and design doc

Three new docs matching the existing fast-model feature docs layout:

- docs/users/features/tool-use-summaries.md — user-facing guide
  covering full + compact rendering, configuration (settings + env),
  failure modes, cost, and cross-links to followup-suggestions.

- docs/users/configuration/settings.md — register the new
  experimental.emitToolUseSummaries setting next to the other
  fast-model-driven UI settings.

- docs/design/tool-use-summary/tool-use-summary-design.md — deep dive
  matching the compact-mode-design.md competitive-analysis style.
  Documents the Claude Code port (prompt, truncation, timing, gate),
  the deviations (settings layer, default on, cleanSummary, dual
  render paths), and the Ink <Static> append-only rationale that
  drove the inline full-mode render vs header-replacement split.

* docs: add Recommended pairing section to tool-use-summaries

Full-mode rendering of the summary works, but for small same-type
batches (Read × 3 and similar) the label visibly restates what the
tool lines already show. Pairing with ui.compactMode: true folds
the whole batch into a single labeled row, which is the cleanest
transcript shape once the label is available.

Adds a dedicated section showing the paired settings.json snippet
and explicitly calling out when each mode wins (and when to turn
the feature off instead).

* fix: address review feedback on tool-use summary generation

Addresses multiple issues from @chiga0's review:

Blocking — compact-mode label invisible for single-batch turns.
mergeCompactToolGroups's adjacency-only gating left a trailing
tool_use_summary in the merged result whenever there was no second
batch to merge across. That pushed mergedHistory.length lock-step
with history.length and MainContent's refreshStatic heuristic
(currMLen <= prevMLen) never fired, so Ink's append-only <Static>
never repainted the tool_group with its newly-looked-up label.
Drop tool_use_summary items unconditionally now; gemini_thought
still survives to avoid unnecessary repaints. New tests cover
the single-batch case and the summary-before-user-message case.

Blocking — stale summary appears after Ctrl+C on the next turn.
summarySignal captured the CURRENT turn's AbortController, but the
summary resolves during the NEXT turn's streaming window. The next
turn's submitQuery allocates a fresh controller, so the captured
signal was never aborted — Ctrl+C during the new turn used to let
the previous turn's summary land in the transcript seconds later.
Fix: dedicated per-batch AbortController tracked in a ref set,
aborted eagerly from cancelOngoingRequest; resolve-time check reads
the live abort state and turnCancelledRef.
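
The per-batch controller idea can be sketched outside React as follows. `SummaryAbortRegistry` and `shouldCommitSummary` are hypothetical names standing in for the ref set and the resolve-time check; the real hook wiring in useGeminiStream.ts is not reproduced here.

```typescript
// Hypothetical stand-in for the "ref set" of per-batch controllers.
class SummaryAbortRegistry {
  private readonly controllers = new Set<AbortController>();

  // Called when a batch starts its summary request.
  begin(): AbortController {
    const controller = new AbortController();
    this.controllers.add(controller);
    return controller;
  }

  // Called when the summary request settles, successfully or not.
  finish(controller: AbortController): void {
    this.controllers.delete(controller);
  }

  // Called from the cancel path (Ctrl+C): abort every in-flight summary,
  // including ones started by earlier turns.
  abortAll(): void {
    this.controllers.forEach((controller) => controller.abort());
    this.controllers.clear();
  }
}

// Resolve-time check: read the live state instead of a signal captured
// from a turn that may already be over.
function shouldCommitSummary(
  turnCancelled: boolean,
  summaryAbort: AbortController,
): boolean {
  return !turnCancelled && !summaryAbort.signal.aborted;
}
```

The point of the design is that the summary's lifetime is tied to its own controller, not to whichever turn happened to be active when the request was issued.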

High — summarizer input pollution.
geminiTools contained error/cancelled tools; retry-loop warnings
and "Cancelled by user" strings were feeding the fast model.
cleanSummary can only reject error-shaped output, not prevent the
model from hallucinating a plausible label from bad input (the PR's
own tmux screenshot showed "Read txt files · 5 tools" where 4 of
the 5 were prior-retry failures). Filter to status === 'success'
before building the prompt; skip the call entirely if nothing's
left.
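
A minimal sketch of the input filter, assuming a simplified tool record. `CompletedTool` and `selectSummarizableTools` are hypothetical names, and the three status values are inferred from the "error/cancelled" and `status === 'success'` wording above.

```typescript
// Simplified, assumed shape of a finished tool call.
type ToolStatus = 'success' | 'error' | 'cancelled';

interface CompletedTool {
  callId: string;
  name: string;
  status: ToolStatus;
}

// Keep only successful tools; return null to signal "skip the fast-model call".
function selectSummarizableTools(
  tools: CompletedTool[],
): CompletedTool[] | null {
  const succeeded = tools.filter((tool) => tool.status === 'success');
  return succeeded.length > 0 ? succeeded : null;
}
```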

High — unstable label on merged groups.
getCompactLabel iterated all callIds and returned the first hit,
so asynchronous resolution order made the header visibly flip
from SB to SA when batch A resolved after batch B. Lock onto
item.tools[0].callId to keep stable "leading batch governs"
semantics.
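
The "leading batch governs" rule can be illustrated like this. `ToolGroup` and `getStableLabel` are simplified, hypothetical stand-ins for the real types and for getCompactLabel.

```typescript
// Simplified, assumed shape of a merged tool group.
interface ToolGroup {
  tools: { callId: string }[];
}

// Look up the label by the first tool's callId only, so the header cannot
// flip when a later batch's summary happens to resolve first.
function getStableLabel(
  group: ToolGroup,
  labelsByCallId: Map<string, string>,
): string | undefined {
  const leadingCallId = group.tools[0]?.callId;
  return leadingCallId === undefined
    ? undefined
    : labelsByCallId.get(leadingCallId);
}
```

Until the leading batch's summary arrives, the lookup returns undefined and the default header stays in place, even if a later batch already has a label.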

High — force-expanded groups in compact mode had no label at all.
Compact mode routes non-force-expand groups through
CompactToolGroupDisplay (consumes compactLabel) and force-expand
groups through the full ToolGroupMessage (ignores compactLabel);
the standalone ● line was gated on !compactMode, creating a dead
zone — exactly the diagnostically valuable case. MainContent now
computes absorbedCallIds (which groups actually consume the
header replacement) and passes summaryAbsorbed to
HistoryItemDisplay; force-expand groups in compact mode get the
standalone line as the label's only path to the screen.

Medium — cleanSummary robustness.
Extend quote-strip to Unicode curly + CJK corner brackets; strip
markdown emphasis (**bold**, _italic_); broaden refusal-prefix
rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 /
抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new
cleanSummary tests cover the added cases.
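
A rough sketch of the refusal-prefix rejection. The prefix list is copied from the commit message above; `REFUSAL_PREFIXES`, `rejectRefusal`, and the case-insensitive comparison are assumptions, not the actual cleanSummary code.

```typescript
// Prefixes taken from the commit message; the real list may be longer.
const REFUSAL_PREFIXES: string[] = [
  "I can't",
  'I can’t', // curly-apostrophe variant
  '我无法',
  '我不能',
  '抱歉',
  '无法',
  'Failed to',
  'Sorry,',
  'Request failed',
];

// Returns null when the model output looks like a refusal or an error,
// otherwise the trimmed label. Case-insensitive matching is an assumption.
function rejectRefusal(label: string): string | null {
  const trimmed = label.trim();
  const lowered = trimmed.toLowerCase();
  const isRefusal = REFUSAL_PREFIXES.some((prefix) =>
    lowered.startsWith(prefix.toLowerCase()),
  );
  return isRefusal ? null : trimmed;
}
```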

Low — concurrent-rendering safety.
Move historyRef.current = history from render phase into
useLayoutEffect so bailed renders can't leave a dropped value.

Low — CompactToolGroupDisplay readability.
Extract renderSummaryHeader / renderDefaultHeader helpers and
document the toolCalls.length > 1 count-suffix guard so a future
"fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".
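
The count-suffix guard, reduced to a string-returning sketch. The real renderSummaryHeader returns UI elements; this simplified signature is an assumption made for illustration.

```typescript
// Simplified sketch: append "· N tools" only when the group has more than
// one tool, so a single-tool group never renders "· 1 tools".
function renderSummaryHeader(label: string, toolCount: number): string {
  return toolCount > 1 ? `${label} · ${toolCount} tools` : label;
}
```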

Docs — add Scope & Lifecycle section to tool-use-summaries.md
covering (1) one generation per batch shared by both modes,
(2) no backfill on toggle / session resume, (3) main-agent batches
only with the Task-tool clarification.

* fix: address second-round review feedback on tool-use summaries

Critical — force-expand groups lost their summary entirely.
Previous round's "drop tool_use_summary unconditionally" merge fix
also stripped summaries for force-expanded groups, defeating the
exact case (errors, confirmations, focused shell) where the
standalone ● label is the label's only path to the screen. The
merge function now takes an absorbedCallIds set: summaries whose
preceding callIds are all absorbed by a compact tool_group header
are dropped (so refreshStatic still fires), but force-expanded
summaries pass through to be rendered standalone by
HistoryItemDisplay. MainContent computes absorbedCallIds from raw
history and passes it in. New tests cover both the absorbed-drop
and the force-expand-preserve cases plus the empty-set default
for callers that don't compute absorption.
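
A minimal sketch of the absorbed-versus-passthrough rule. The item shapes and `dropAbsorbedSummaries` are hypothetical simplifications; the real mergeCompactToolGroups also merges adjacent groups, which is omitted here.

```typescript
// Simplified, assumed history item shapes.
interface SummaryItem {
  type: 'tool_use_summary';
  summary: string;
  precedingToolUseIds: string[];
}

interface OtherItem {
  type: 'tool_group' | 'user' | 'gemini_thought';
}

type Item = SummaryItem | OtherItem;

// Drop a summary only when every callId it covers was absorbed by a compact
// header. With the default empty set, every summary passes through.
function dropAbsorbedSummaries(
  items: Item[],
  absorbedCallIds: ReadonlySet<string> = new Set<string>(),
): Item[] {
  return items.filter((item) => {
    if (item.type !== 'tool_use_summary') return true;
    const allAbsorbed =
      item.precedingToolUseIds.length > 0 &&
      item.precedingToolUseIds.every((id) => absorbedCallIds.has(id));
    return !allAbsorbed;
  });
}
```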

Suggestion — late-arriving summaries could land out of order.
A slow fast-model call could resolve after the next turn's
content was committed, planting the ● label between later items
in full mode. The resolve callback now captures the first batch
callId, locates the corresponding tool_group at resolve time,
and drops the summary if a newer tool_group has already appeared
in history. New test exercises this with a manually-resolved
fast-model promise.
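
The resolve-time staleness check might look roughly like this. `HistoryEntry` and `isSummaryStale` are hypothetical, and treating a missing anchor as stale is an assumption that the commit message does not state.

```typescript
// Simplified, assumed history entry shape.
interface HistoryEntry {
  type: string;
  callIds?: string[];
}

// A summary is stale if a newer tool_group already follows the group that
// contains the anchor callId (the first callId of the summarized batch).
function isSummaryStale(
  history: HistoryEntry[],
  anchorCallId: string,
): boolean {
  const anchorIndex = history.findIndex(
    (entry) =>
      entry.type === 'tool_group' &&
      (entry.callIds ?? []).indexOf(anchorCallId) !== -1,
  );
  // Assumption: if the anchor group cannot be found, drop the summary.
  if (anchorIndex === -1) return true;
  return history
    .slice(anchorIndex + 1)
    .some((entry) => entry.type === 'tool_group');
}
```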

Suggestion — truncateJson allocated full JSON for large strings.
A 10MB ReadFile result was being JSON.stringify'd in full only to
be sliced down to 300 chars. Added preTruncate that walks the
value (depth-bounded to 4) and slices string leaves to maxLength
before serialization. Tests verify the input never reaches its
full pre-cap form.
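
A sketch of the pre-truncation walk under stated assumptions. The function names mirror the commit message, but the bodies are illustrative: in particular, replacing values below the depth bound with a placeholder string is an assumption, as the message does not say what happens there.

```typescript
const MAX_DEPTH = 4;
// Assumption: values nested below the depth bound become this placeholder.
const DEPTH_PLACEHOLDER = '[depth limit]';

// Slice string leaves before serialization so a huge string is never
// stringified in full.
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === 'string') {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== 'object') return value;
  if (depth >= MAX_DEPTH) return DEPTH_PLACEHOLDER;
  if (Array.isArray(value)) {
    return value.map((v) => preTruncate(v, maxLength, depth + 1));
  }
  const source = value as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(source)) {
    out[key] = preTruncate(source[key], maxLength, depth + 1);
  }
  return out;
}

// Serialize the already-shrunk value, then apply the final cap.
function truncateJson(value: unknown, maxLength = 300): string {
  const json = JSON.stringify(preTruncate(value, maxLength)) ?? '';
  return json.slice(0, maxLength);
}
```

The final slice is still needed because many short fields can add up to more than the cap even after each leaf has been shortened.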

Suggestion — settings description over-claimed SDK emission.
The description said summaries are emitted to SDK clients as a
tool_use_summary message; the SDK plumbing isn't actually wired
in this PR (the factory is exported for follow-up). Updated
settings.json description and regenerated the vscode schema to
state CLI-only scope explicitly.

Suggestion — fastModel data-boundary not documented.
When fastModel uses a different provider than the main session
model, tool inputs/outputs cross a new auth boundary that users
may not expect. Added "Data flow & privacy" section to the user
feature doc spelling out: same-provider fast model = no scope
change; different-provider = strictly larger sharing scope; two
escape hatches (same-provider fast model OR feature off).
Code-level mitigation (metadata-only mode) deferred.