feat: add /diff command and git diff statistics utility#3491
Conversation
Port numstat + unified-diff parsing into `packages/core/src/utils/gitDiff.ts` to surface structured working-tree change summaries (files changed, lines added/removed, per-file hunks) against HEAD. Caps mirror issue #2997: 50 files, 1MB per file, 400 lines per file, with a 500-file short-circuit via `git diff --shortstat` to avoid expensive work on massive diffs. - `fetchGitDiff(cwd)` returns stats + per-file summaries (tracked + untracked). - `fetchGitDiffHunks(cwd)` returns structured hunks on demand. - `resolveGitDir(cwd)` follows `.git` file indirection so linked worktrees and submodules report the correct gitdir. - Transient-state short-circuit covers merge, cherry-pick, revert, and both `rebase-merge` / `rebase-apply` layouts. - `core.quotepath=false` is forced so non-ASCII filenames stay as UTF-8. Refs #2997
Surface the `fetchGitDiff` utility through an interactive `/diff` command. Prints a header (`N files changed, +A / -R`) followed by per-file rows with padded add/remove counts. Untracked files are marked `?`, binary files are marked `~`. When the change set exceeds the per-file cap, a trailing `…and N more` note tells the user how many entries are hidden. Returns a `MessageActionReturn` so it renders the same way in interactive and non-interactive modes.
- Wrap `fetchGitDiff` in try/catch so permission errors on `.git` surface as a friendly error message instead of crashing the action. - Declare `supportedModes: ['interactive', 'non_interactive', 'acp']` so the command is reachable outside the interactive Ink UI — the default for `commandType: 'local'` is interactive-only. - Align `?` (untracked) and `~` (binary) markers with the `+X -Y` stat column via a padded prefix, so filenames line up regardless of row kind. - Drop the "…and N more" hint when no rows are shown (shortstat fast-path with >500 files) — the count alone is sufficient and "showing first 0" is noise. - Switch header to full-phrase i18n templates (separate singular/plural variants) instead of word-by-word `t()` calls that don't survive non-English locales. - Extend tests to 12 scenarios: empty cwd, fetch rejection, singular "file" form, mixed untracked/binary/tracked alignment, 4-digit padding, shortstat fast-path, and supportedModes declaration. Mocks carry a `satisfies GitDiffResult` annotation so shape drift in core breaks the test at compile time.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
| // Fetch untracked filenames up front so both the shortstat fast path and | ||
| // the numstat slow path report the same `filesCount` surface. | ||
| const untrackedPaths = (await fetchUntrackedPaths(cwd)) ?? []; | ||
|
|
There was a problem hiding this comment.
[Suggestion] fetchGitDiff() loads all untracked paths before deciding whether the --shortstat fast path should skip per-file details. In repos with many untracked files, /diff still pays for a full git ls-files --others --exclude-standard scan and allocates the whole path list even when it ends up returning summary-only output. Consider delaying untracked-path collection until after the fast-path decision, or counting untracked files without materializing every path first.
— gpt-5.4 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 005f88e. fetchGitDiff no longer materializes the untracked-path list before the fast-path decision — it holds only the raw ls-files -z stdout and counts NUL bytes to get the file count, then defers split('\0') until the slow path is actually taken. The per-file array is only allocated when we're going to use it.
| return { | ||
| stats: { | ||
| ...quickStats, | ||
| filesCount: quickStats.filesCount + untrackedPaths.length, |
There was a problem hiding this comment.
[Suggestion] The MAX_FILES_FOR_DETAILS short-circuit only uses the tracked-file count from git diff --shortstat, so a tree with relatively few tracked edits but hundreds of untracked files still falls through to the slow path. That defeats the new 500-file guardrail for one of the most common large-workspace cases this command is meant to protect against. Consider basing the threshold on tracked + untracked counts before deciding to run --numstat.
— gpt-5.4 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 005f88e. The short-circuit now compares quickStats.filesCount + untrackedCount against MAX_FILES_FOR_DETAILS, so a tree with few tracked edits but hundreds of untracked files still takes the summary-only path instead of falling through to --numstat.
- Remove invalid `commandType` field from diffCommand (SlashCommand has no such property; caused a TS build failure). - Drop duplicate `NumstatResult` interface in gitDiff.ts — it is structurally identical to `GitDiffResult`. - Register the 9 missing `/diff` i18n strings in en.js / zh.js so the command is translatable (previously only `Configuration not available.` had entries).
- fetchUntrackedPaths now uses `ls-files -z` so filenames containing newlines, tabs, or non-ASCII bytes round-trip cleanly instead of being C-style quoted and split into phantom entries. - fetchGitDiff runs the `--shortstat` probe and the untracked-paths lookup in parallel, since both are needed regardless of which path the function takes. - parseGitDiff measures per-file diff size via Buffer.byteLength so MAX_DIFF_SIZE_BYTES matches its documented meaning on non-ASCII diffs. - Adds a regression test for an untracked file whose name contains a literal newline.
|
Addresses the five open review threads on #3491: - parseShortstat: anchored and bounded the regex (`^...$` with `\d{1,10}`) so adversarial inputs can no longer drive polynomial backtracking. Closes CodeQL alert #137. - fetchGitDiff: only parse the untracked-path list when we actually need it; the fast path now counts NUL bytes in the raw `ls-files -z` stdout (wenshao P1). - fetchGitDiff: base the `MAX_FILES_FOR_DETAILS` short-circuit on `tracked + untracked`, so repos with few edits but many untracked files still take the summary-only path (wenshao P2). - fetchGitDiff: count newlines in each untracked text file (binary sniff + 1 MB read cap) and fold that into both the header `+N` and the per-file row, so a brand-new file no longer renders as `+0 / -0` (BZ-D P2). - parseGitNumstat: switch to `git diff --numstat -z`. The parser now uses index-based slicing and a rename-pair state machine, so tracked filenames containing tabs/newlines/non-ASCII keep their real bytes (BZ-D P3). Renames collapse into a single `old => new` entry. UI: untracked rows render as `+N filename (new)` (or `~ filename (binary, new)`) instead of the placeholder `?` marker; `/diff` now shows real additions for fresh files.
|
Addressed both items in 005f88e. P2 — untracked totals underreporting
P3 — C-quoted tracked filenames via Switched to |
…m display Two issues surfaced during a directionless multi-round audit of the /diff feature: 1. `countUntrackedLines` reads at most `UNTRACKED_READ_CAP_BYTES` (1 MB) per file, so a 10 MB new log was silently reported as `+~20k` when the real count is ~10×. The helper now `fstat`s the file and returns a `truncated: true` flag when size exceeds the read window; `/diff` surfaces it as `(new, partial)` so the `+N` isn't read as exact. 2. Line-count aggregation was coupled to the per-file display cap: when tracked changes filled the `MAX_FILES` slot, untracked line counts beyond the remaining slots were dropped from `stats.linesAdded` entirely (header under-reported additions). Decoupled: we now read up to `MAX_FILES` untracked files for their line counts regardless of display slots, and only restrict the visible rows to `remainingSlots`. Added regression tests for both: a 1.5 MB new file asserts `truncated: true` and a lower-bound line count, and a `MAX_FILES`-saturated tracked set + 5 untracked files asserts that untracked additions still appear in the header totals even though none of them get displayed.
| stats.filesCount += untrackedCount; | ||
| const untrackedPaths = splitNulDelimited(untrackedOut); | ||
| const remainingSlots = Math.max(0, MAX_FILES - perFileStats.size); | ||
| const visiblePaths = untrackedPaths.slice(0, remainingSlots); |
There was a problem hiding this comment.
[Critical] fetchGitDiff() adds every untracked file to stats.filesCount, but only adds line counts for the subset of untracked files that fit into the 50-entry perFileStats display cap (visiblePaths). If tracked files already fill the cap, or there are more untracked files than remaining slots, /diff can report many changed files while omitting most untracked additions from the header total. Count untracked line stats for all untracked paths when computing header totals, while only inserting the first remainingSlots into perFileStats; or explicitly mark totals as capped if full counting is intentional.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
This was already addressed in 5075ad5, landed before this review ran against 005f88e. fetchGitDiff now reads up to MAX_FILES untracked files for their line counts regardless of display slots and folds all of them into stats.linesAdded; only the visible rows are restricted to remainingSlots. Regression test aggregates untracked line counts into linesAdded even when the per-file map is full of tracked entries covers the case where tracked files saturate the map.
| if (Buffer.byteLength(fileDiff, 'utf8') > MAX_DIFF_SIZE_BYTES) continue; | ||
|
|
||
| const lines = fileDiff.split('\n'); | ||
| const headerMatch = lines[0]?.match(/^a\/(.+?) b\/(.+)$/); |
There was a problem hiding this comment.
[Suggestion] parseGitDiff() parses diff --git a/... b/... with ^a\/(.+?) b\/(.+)$. Valid paths can contain b/, for example a b/c.txt, which causes the parser to key hunks under the wrong filename. Since fetchGitDiffHunks() is exported from core, consumers can receive hunks keyed by invalid paths and fail to correlate hunk data with stats, editor paths, or filesystem paths. Avoid deriving the path from the ambiguous diff --git header; parse the subsequent +++ b/<path> line, or use custom --src-prefix / --dst-prefix values. Add a regression test for a file named like a b/c.txt.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Good catch — confirmed locally that a real a b/c.txt produces diff --git a/a b/c.txt b/a b/c.txt with no escaping, so the previous regex keyed hunks under a wrong path.
Fixed in 54a0b2d. Replaced the ambiguous diff --git header parse with an extractFilePath helper that walks the block for unambiguous markers (rename to / copy to / +++ b/<path> with /dev/null fallback to --- a/<path>) and strips the TAB git appends on paths containing whitespace. Added 4 unit tests (space-in-path, rename, delete, new-file) plus an end-to-end test that creates a real a b/c.txt and verifies fetchGitDiffHunks keys the hunks under a b/c.txt.
`diff --git a/X b/Y` is ambiguous when X contains ` b/` — a file literally named `a b/c.txt` produces `diff --git a/a b/c.txt b/a b/c.txt` with no escape or quoting, and the previous regex `^a\/(.+?) b\/(.+)$` keyed the hunks under the wrong path. Consumers of the exported `fetchGitDiffHunks` API would then fail to correlate hunks with stats or editor paths. Introduces `extractFilePath(lines)` which walks the block for the unambiguous markers (`rename to` / `copy to` / `+++ b/<path>` with a `/dev/null` fallback to `--- a/<path>`) and strips the trailing TAB git appends to paths containing whitespace. Adds unit tests for the `a b/c.txt`, rename, delete, and new-file cases plus an end-to-end test that creates a real `a b/c.txt` file and asserts `fetchGitDiffHunks` keys the hunks correctly. Addresses wenshao review comment #3136657141 on #3491.
The /diff stats used to come back as a plain-text MessageActionReturn. Pipes and ACP still get that, but in interactive terminals we now dispatch a structured history item so the numbers can carry theme colors. - packages/cli/src/ui/types.ts — new DiffRenderRow / DiffRenderModel / HistoryItemDiffStats, MessageType.DIFF_STATS. - packages/cli/src/ui/components/messages/DiffStatsDisplay.tsx — renders +N in theme.status.success (green), -M in theme.status.error (red), and the (new) / (binary) / (new, partial) markers in theme.text.secondary (dim). Column alignment matches the plain-text fallback. - packages/cli/src/ui/components/HistoryItemDisplay.tsx — routes the new item type. - packages/cli/src/ui/commands/diffCommand.ts — builds a DiffRenderModel once and fans out: interactive calls context.ui.addItem; other modes fall through to renderDiffModelText() for the plain-text path. Error and "clean tree" branches keep the existing info/error MessageActionReturn in every mode. - Tests: existing diffCommand suite moved to an explicit non_interactive context (it was asserting text content); new interactive suite covers addItem dispatch and model shape; DiffStatsDisplay component tests cover the four row variants and the "…and N more" note.
| * the working tree is in a transient state (merge, rebase, cherry-pick, | ||
| * revert) — those states carry incoming changes that weren't intentionally | ||
| * made by the user. | ||
| */ |
There was a problem hiding this comment.
[Critical] fetchGitDiff(cwd) runs tracked diff commands and untracked discovery from cwd, then resolves untracked paths with path.join(cwd, relPath). When Qwen Code is launched from a subdirectory, tracked git diff paths are repo-root-relative, but git ls-files --others is scoped/relative to that subdirectory. That can make /diff include repo-wide tracked changes while omitting untracked files outside the current subdirectory, and it can emit inconsistent path keys for untracked rows.
Resolve the repository root once and use it consistently for all Git operations and line counting; keep emitted paths repo-root-relative.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Confirmed locally — running git diff --numstat -z from <repo>/sub emits sub/tracked.txt (repo-root-relative) while git ls-files --others -z emits untracked.txt (cwd-relative) and skips files in sibling directories entirely.
Fixed in bb0164d. fetchGitDiff and fetchGitDiffHunks now resolve the worktree root with findGitRoot(cwd) once and run every git invocation (and the path.join(...) for countUntrackedLines) from that root. All perFileStats keys are repo-root-relative and the listing is repo-wide, so /diff produces identical output regardless of which subdirectory the user is in. Regression test in fetchGitDiff invocation from a subdirectory creates a tracked subdir change plus an untracked file at the repo root and asserts both come back keyed correctly when invoked from the subdir.
| ): Promise<UntrackedLineStats> { | ||
| let fh; | ||
| try { | ||
| fh = await open(absPath, 'r'); |
There was a problem hiding this comment.
[Critical] countUntrackedLines opens every untracked path with open(absPath, 'r') without first verifying that the path is a regular file. Git can list untracked FIFOs and symlinks; opening a FIFO for reading can block indefinitely waiting for a writer, and opening a symlink follows its target outside the worktree.
Use lstat before opening untracked paths and only count regular files. Treat symlinks and special files as non-countable/binary rows, or handle symlinks explicitly without following them.
| fh = await open(absPath, 'r'); | |
| let st; | |
| try { | |
| st = await lstat(absPath); | |
| } catch { | |
| return { added: 0, isBinary: false, truncated: false }; | |
| } | |
| if (!st.isFile()) { | |
| return { added: 0, isBinary: true, truncated: false }; | |
| } | |
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in bb0164d — countUntrackedLines now lstats the path before open and bails out on anything that isn't a regular file: symlinks and FIFOs render as binary ~ rows, so the open never blocks on a FIFO and never dereferences a symlink target outside the worktree. Regression test creates an untracked symlink pointing at content under a separate tmpdir (outside the worktree) and asserts the entry lands as isBinary: true with linesAdded unchanged — the target's lines never leak into the totals.
# Conflicts: # scripts/unused-keys-only-in-locales.json
Audit of the colorize commit found one real DRY hazard: DiffStatsDisplay and renderDiffModelText each independently re-derived addWidth / remWidth / statColumnWidth from the same row list. If anyone later changed one formula, the interactive Ink output and the non-interactive plain text would silently fall out of column alignment. Extract the computation into computeDiffColumnWidths() exported from diffCommand.ts; both renderers now call it. Adds a focused unit test of the contract (empty rows, widest non-binary row wins, binary rows are ignored, untracked text rows count). Drop a redundant `Omit<HistoryItemDiffStats, 'id'>` annotation since the type already has no id field.
Two Critical findings on PR #3491: 1. (line 63) When /diff is invoked from a subdirectory of the worktree, `git diff` emits repo-root-relative paths but `git ls-files --others` is scoped to cwd and emits cwd-relative paths. Result: mixed path bases in `perFileStats` and silent omission of untracked files in sibling directories. Resolve `findGitRoot(cwd)` once and run every git invocation (and `path.join(...)` for line counting) from there, so all keys are repo-root-relative and the listing is repo-wide. 2. (line 455) `countUntrackedLines` opened every untracked path with `open(absPath, 'r')`. Git's `ls-files --others` can list FIFOs (whose `open()` blocks indefinitely waiting on a writer) and symlinks (which `open()` dereferences, potentially reading outside the worktree). Add an `lstat` gate: only regular files are counted; symlinks and other special files render as binary `~` rows. Two new integration tests cover both regressions: one creates a sibling untracked file at the repo root and invokes fetchGitDiff from a subdir asserting all three changes (root + sub) come back keyed by repo-root-relative paths; the other creates a symlink pointing at content outside the worktree and asserts it lands as a binary row with no contribution to linesAdded.
The previous fix(core) commit accidentally bundled two unrelated working-tree edits (a test comment in .npmrc and a TODO in README.md) that I had used while sanity-testing /diff. They have nothing to do with the fix; restore them to their pre-bb0164d99 state.
E2E verification of untracked-file behaviorSet up a fixture that exercises every branch of Fixture (under
Result invoking
Subdirectory invocation (wenshao Critical line 63 fix) — invoking from Identical totals and identical repo-root-relative keys as the root invocation — the entire repo is reported, not just files under Non-git directory (graceful degradation): The CLI surfaces this as All branches behave as designed. |
| // header silently under-reporting additions when tracked changes have | ||
| // already filled the per-file map. | ||
| const countable = untrackedPaths.slice(0, MAX_FILES); | ||
| const countableStats = await Promise.all( |
There was a problem hiding this comment.
[Suggestion] fetchGitDiff() counts up to 50 untracked files with one Promise.all, and each countUntrackedLines() call allocates a 1 MB buffer. In a workspace with many large untracked files, a single /diff invocation can therefore allocate roughly 50 MB of buffers at once, plus the matching file descriptors and I/O pressure. Because /diff runs on the interactive path, this can cause avoidable memory spikes or latency in constrained environments.
Consider bounding this work with a small concurrency limit, or reading files with a reusable/chunked buffer, so totals stay accurate without making the worst-case allocation fully concurrent.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 002d17c.
countUntrackedLines now reads in fixed 64 KB chunks instead of allocating one full UNTRACKED_READ_CAP_BYTES (1 MB) buffer per file. With the existing MAX_FILES = 50 concurrency, peak heap footprint of a single /diff invocation drops from ~50 MB to ~3.2 MB.
Implementation notes:
- Loop reads
min(64 KB, capRemaining)per iteration viafh.read(buf, 0, toRead, totalRead), breaking onbytesRead === 0(EOF) or whentotalReadhitsUNTRACKED_READ_CAP_BYTES. - Binary sniff (
BINARY_SNIFF_BYTES = 8 KB) trackssniffedBytesacross iterations so it works even if a read returns a short chunk; in practice it always completes inside the first chunk because chunk size > sniff window. lastByteis tracked across reads for the trailing-partial-line rule.truncatedis decided once at the end viafh.stat()— only when we hit the cap with more bytes still on disk, identical semantics to the single-shot path.Buffer.allocUnsafefor the chunk buffer (we only iterate withinbytesRead, so uninit bytes are never observed).
All 44 gitDiff tests pass unchanged, including the 1.5 MB truncation test (marks oversized untracked text files as truncated) which now exercises the chunk-boundary code path. Re-ran the e2e fixture from the verification report — new-huge.log still reports +9900, truncated:true byte-for-byte.
I considered also adding a concurrency limit on top, but with chunked reads the per-file buffer is already small enough that the marginal benefit (3.2 MB → 512 KB at concurrency=8) didn't justify the extra worker-pool plumbing. Happy to revisit if you'd prefer that ceiling enforced explicitly.
`countUntrackedLines` allocated a fresh `UNTRACKED_READ_CAP_BYTES` (1 MB) buffer per file. With up to MAX_FILES (=50) line-counts running concurrently via `Promise.all`, the worst-case heap footprint of a single `/diff` invocation was ~50 MB of transient buffers — avoidable spike on small containers / low-memory hosts flagged by wenshao on PR #3491. Switch to a fixed 64 KB chunk buffer and read in a loop, accumulating line counts and tracking the last byte across iterations. Peak footprint is now ~3.2 MB (50 × 64 KB). Behavior is identical: same binary sniff over the first 8 KB, same truncation flag when the read hits the cap with bytes still on disk, same trailing-partial-line rule. All 44 gitDiff tests pass unchanged, including the 1.5 MB truncation test which now crosses chunk boundaries.
Two non-blocking suggestions from qqqys's CR on PR #3491: - `buildDiffRenderModel`: expand the JSDoc to call out the implicit row-ordering contract that both renderers depend on (tracked entries first in numstat order, then untracked appended in ls-files order). Future replacements of the underlying Map need to preserve this sequence. - `DiffStatsDisplay`: drop the `${i}-${filename}` React key in favor of bare `filename`. Filenames are unique within a single `DiffRenderModel` (perFileStats is a Map keyed by filename), so the index prefix added no information.
|
@qqqys 多谢细致的 CR,逐条回复: 1. 两次 2. 3. 4. 5. 行序约定加注释 — 已采纳,24dfb8726。 6. 7/8. rewindCommand / shellExecutionService / terminalSerializer.test.ts 的格式化改动 — 这些不是本 PR 的提交,是 9. — 总结:1/5/6 已落地(早于这条 review),2/3/4 解释保留现状,7/8 是上游 merge 噪声不在本 PR 真正改动里,9 是 schema 里已限定的范围。 |
Symmetrical to the (new) marker for untracked files: tracked files that were removed from the worktree relative to HEAD now render with a (deleted) suffix (or (binary, deleted) for binary deletes), so users can tell a delete apart from a heavy edit. Implementation: - core: `fetchGitDiff` now runs `git diff HEAD --name-status -z` in parallel with the existing numstat call. `parseDeletedFromNameStatus` extracts the set of D-status paths (skipping R/C rename and copy pairs, both halves of which still exist on disk under one name or the other). Each `perFileStats` entry whose key is in that set gets `isDeleted: true`. Numstat alone could not distinguish a delete (`0\t10\tpath`) from a heavy edit; the name-status pass disambiguates. - cli: `DiffRenderRow` carries `isDeleted: boolean`; both the plain-text renderer and the Ink component append the new suffix in `theme.text.secondary` (dim). - i18n: new `(deleted)` and `(binary, deleted)` keys in en/zh/zh-TW. Tests: - Unit: `parseDeletedFromNameStatus` covers D-only extraction, R/C pair skipping, NUL-safe paths (tabs / non-ASCII), and empty input. - Integration: real repo deletes a tracked text + a tracked binary plus edits another file; asserts the deleted entries get `isDeleted: true` but the heavy edit does not. Second test verifies neither half of a `git mv` rename gets flagged as deleted. - CLI / component: `(deleted)` and `(binary, deleted)` rendering variants with column alignment intact.
E2E verification of the
|
| branch | check | ✓ |
|---|---|---|
| binary delete | gone.bin → isBinary:true, isDeleted:true → renders ~ gone.bin (binary, deleted) |
✓ |
| text delete | gone.txt → removed:2, isDeleted:true → renders +0 -2 gone.txt (deleted) |
✓ |
| heavy edit ≠ delete | heavy-edit.txt → removed:5 but no isDeleted → renders +0 -5 heavy-edit.txt (no marker). This is the case numstat alone reports as 0\t5\theavy-edit.txt — same wire shape as a delete; only the parallel --name-status call lets us tell them apart. |
✓ |
| normal modify | kept.txt → added:1 plain row |
✓ |
| rename — neither side flagged | old-name.txt => new-name.txt is the collapsed rename row from parseGitNumstat's rename state machine; parseDeletedFromNameStatus skips both halves of the R<score> pair, so neither old-name.txt nor new-name.txt ends up in the deleted set, and the composite key never matches either. isDeleted stays unset. |
✓ |
| untracked still works | untracked.txt → isUntracked:true, isDeleted: undefined → still renders +1 -0 untracked.txt (new) |
✓ |
Header totals: linesAdded = 2 = 1 (kept) + 1 (untracked); linesRemoved = 7 = 5 (heavy-edit) + 2 (gone) — binary deletes don't contribute to line totals (they have added: 0, removed: 0), consistent with how git's numstat reports binary files. ✓
TUI rendering (after rebuild, with QWEN_WORKING_DIR=/tmp/diff-deleted-demo npm start then /diff):
6 files changed, +2 / -7
~ gone.bin (binary, deleted)
+0 -2 gone.txt (deleted)
+0 -5 heavy-edit.txt
+1 -0 kept.txt
+0 -0 old-name.txt => new-name.txt
+1 -0 untracked.txt (new)
Plus (deleted) / (binary, deleted) / (new) rendered in theme.text.secondary (dim), +N in theme.status.success (green), -M in theme.status.error (red).
All branches behave as designed; the disambiguation between "delete" and "heavy edit that drops N lines" works correctly via the parallel git diff HEAD --name-status -z call.
| ); | ||
| if (diffOut == null) return new Map(); | ||
| return parseGitDiff(diffOut); | ||
| } |
There was a problem hiding this comment.
[Critical] fetchGitDiffHunks() invokes plain git diff HEAD without --no-ext-diff. Unlike the stats paths, plain git diff can honor GIT_EXTERNAL_DIFF and configured external diff drivers, so this exported read-only utility can unexpectedly execute repository/user-configured commands when a caller only wants to inspect hunks.
| } | |
| ['--no-optional-locks', 'diff', 'HEAD', '--no-ext-diff'], |
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 667f01a, with two extensions to your suggestion:
-
All four
git diffcalls now pin--no-ext-diff, not just the plainfetchGitDiffHunksone. I empirically verified that the stats variants (--shortstat/--numstat/--name-status) don't currently invoke external diff drivers — running each withGIT_EXTERNAL_DIFF=evil.shin a sandbox confirmed only plaingit difffires the driver. But pinning the flag uniformly is a zero-cost hardening, and git's behavior around external drivers has shifted between versions, so I'd rather not rely on the implicit assumption. -
Regression test plants the actual attack vector:
GIT_EXTERNAL_DIFF=evil.sh(a script whose only side effect is writing a sentinel file) is set inprocess.env, thenfetchGitDiffHunksruns against a real worktree with a modification. The test asserts the sentinel never appears on disk — proving--no-ext-diffreaches the spawnedgitand actually blocks the driver. Without the flag this test would pre-commit fail with a sentinel write.
Manual verification trace (sandbox):
=== plain `git diff HEAD` with GIT_EXTERNAL_DIFF set ===
EVIL EXTERNAL DIFF EXECUTED with args: file.txt /var/.../git-blob-... ...
=== `git diff HEAD --no-ext-diff` ===
diff --git a/file.txt b/file.txt
...
=== `git diff HEAD --numstat` (no flag) ===
1\t0\tfile.txt ← driver did NOT fire
=== `git diff HEAD --shortstat` (no flag) ===
1 file changed, 1 insertion(+) ← driver did NOT fire
=== `git diff HEAD --name-status` (no flag) ===
M\tfile.txt ← driver did NOT fire
So plain diff is the only currently-exploitable call site, but the fix lands on every diff invocation.
Plain `git diff HEAD` honors `GIT_EXTERNAL_DIFF` and configured `diff.<name>.command` drivers, so the exported `fetchGitDiffHunks` utility could execute arbitrary commands when invoked inside a worktree whose user-global or repo-local config registers an external driver. Add `--no-ext-diff` to every `git diff` call: - `fetchGitDiffHunks`'s plain `git diff HEAD` — the actual vulnerability surface. - `fetchGitDiff`'s `--shortstat`, `--numstat`, `--name-status` variants — defense-in-depth. Empirically these stats modes already bypass external drivers in current git, but git's behavior here has shifted between versions before, and pinning the flag everywhere is a zero-cost hardening that keeps the policy uniform across every `git diff` we run. Regression test plants `GIT_EXTERNAL_DIFF=evil.sh` (a driver that writes a sentinel file as its side effect) before calling `fetchGitDiffHunks`, then asserts the sentinel never appears — confirming `--no-ext-diff` actually stops git from spawning the driver. Closes wenshao critical comment on PR #3491.
PR #3491 CI was failing across all 9 platform/Node combos with: Error: [vitest] No "constants" export is defined on the "node:fs" mock. at gitDiff.ts:70 const UNTRACKED_OPEN_FLAGS = fsConstants.O_RDONLY | (fsConstants.O_NOFOLLOW ?? 0) at index.ts:279 export * from './utils/gitDiff.js' Six unrelated test files (`client.test.ts`, `geminiChat.test.ts`, `marketplace.test.ts`, `npm.test.ts`, `mcp-client.test.ts`, `nextSpeakerChecker.test.ts`) `vi.mock('node:fs', ...)` without returning `constants`, and their transitive import of `@qwen-code/qwen-code-core` pulls in `gitDiff.ts`, whose module-load-time `import { constants as fsConstants }` plus the top-level `UNTRACKED_OPEN_FLAGS` constant tripped vitest's strict mock proxy. Two changes: 1. Switch `import { constants }` to `import * as nodeFs from 'node:fs'`. Strict-mock no longer rejects the import statement itself. 2. Move the flag computation out of a module-load constant into a memoized `getUntrackedOpenFlags()` called from inside `countUntrackedLines`. Tests that don't actually invoke `fetchGitDiff` / `fetchGitDiffHunks` (i.e. all six broken ones) never reach the property access, so vitest's proxy never trips. `?? 0` fallback on each constant lookup is preserved so Windows (no `O_NOFOLLOW`) and the genuine "constants is undefined" mock path both degrade to plain `O_RDONLY` without throwing. Locally re-ran all six previously-failing files (199 tests) — all green. Existing 51 gitDiff tests unchanged.
Windows CI was failing only on:
resolveGitDir > follows the gitdir pointer for linked worktrees
AssertionError: expected 'C:/Users/runneradmin/.../main/.git/worktrees/wt'
to contain '.git\worktrees'
Git writes the linked-worktree pointer in the `.git` *file* using
forward slashes — `gitdir: C:/Users/.../main/.git/worktrees/wt` —
even on Windows. `resolveGitDir` surfaces that string verbatim
(intentional, since fs APIs on Windows accept both separators). But
the assertion used `path.join('.git', 'worktrees')`, which is
`'.git\\worktrees'` on Windows, so the substring-contains check
failed despite the value being correct.
Switch to a regex that matches either separator: `/[/\\]\.git[/\\]worktrees[/\\]/`.
Now the assertion holds on POSIX (where path.join uses `/` anyway)
and Windows (where git's value uses `/` but the host uses `\`).
6285/6289 Windows tests already passed before this; only this one
assertion was platform-dependent.
| return { added: 0, isBinary: true, truncated: false }; | ||
| } | ||
| try { | ||
| // Stream the file in fixed-size chunks instead of allocating one full |
There was a problem hiding this comment.
[Critical] extractFilePath() only accepts unquoted +++ b/<path> / --- a/<path> metadata. Git still C-quotes paths containing characters such as literal tabs, producing headers like +++ "b/tab\tname.txt", so the parser fails p.startsWith('b/') and drops the file’s hunks entirely. fetchGitDiffHunks() can therefore silently miss valid modified tracked files, while stats parsing handles these paths via --numstat -z.
Suggested fix: decode Git C-style quoted metadata paths before checking a/ / b/ prefixes, and add a regression test with a tracked filename containing a literal tab.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 632b259.
Confirmed locally: with core.quotepath=false set, a tracked file named tab\there.txt still shows up as +++ "b/tab\there.txt" in plain git diff output — the C-quoting (for tabs/newlines/quotes) is independent of core.quotepath (which only suppresses the octal form for non-ASCII bytes). My old extractFilePath failed p.startsWith('b/') and dropped the file's hunks entirely.
Added unquoteCStylePath() that:
- Detects the surrounding
"..."wrapper. - Decodes
\t,\n,\r,\",\\plus octal\NNNescapes to raw bytes, so multi-byte UTF-8 sequences like\346\226\207(=文) reassemble correctly even though we never set quotepath=true ourselves (covers callers feeding us output produced by some other diff command). - Returns the UTF-8-decoded string.
extractFilePath now pipes every candidate through stripTab → unquoteCStylePath before checking the a/ / b/ prefix.
Tests: 2 new unit tests (tab-escape, octal-escape), 1 new integration test that creates a real tab\there.txt tracked file and asserts fetchGitDiffHunks keys hunks under the real name (no-ops on filesystems that reject tab-in-name).
| const untrackedCount = countNulDelimited(untrackedOut); | ||
|
|
||
| if (shortstatOut != null) { | ||
| const quickStats = parseShortstat(shortstatOut); |
There was a problem hiding this comment.
[Critical] The >500-file fast path only runs when git diff --shortstat parses to a truthy value. With no tracked changes and more than 500 untracked files, shortstat is empty, so the slow path runs and only counts lines for untrackedPaths.slice(0, MAX_FILES). For 501 one-line untracked files this reports filesCount: 501 but linesAdded: 50, materially under-reporting totals and bypassing the intended guardrail.
Suggested fix: treat missing shortstat as zero tracked stats when applying the tracked-plus-untracked fast path, or count all untracked files intended for header totals before returning detailed stats.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 632b259.
You're right — if (quickStats && ...) short-circuited when shortstat was empty (no tracked changes). 0 tracked + 501 untracked therefore slipped past the guardrail and hit the slow path, which only line-counted the first MAX_FILES untracked entries — header would say +50 / -0 for what was actually 501 newly-added files.
Fix: treat empty/null/unparseable shortstat as EMPTY_STATS = { filesCount: 0, linesAdded: 0, linesRemoved: 0 } and apply the threshold on tracked + untracked uniformly:
const quickStats =
(shortstatOut != null && parseShortstat(shortstatOut)) || EMPTY_STATS;
if (quickStats.filesCount + untrackedCount > MAX_FILES_FOR_DETAILS) {
return { stats: { ...quickStats, filesCount: ... }, perFileStats: new Map() };
}Regression test plants 501 untracked files + 0 tracked changes and asserts the result has filesCount: 501 with an empty perFileStats Map (i.e. the summary-only fast path actually fired).
| // global config that sets `GIT_EXTERNAL_DIFF` or `diff.<name>.command` | ||
| // would let `git diff` execute arbitrary commands when this read-only | ||
| // utility is invoked. The stats variants in `fetchGitDiff` already | ||
| // bypass external drivers, but plain `git diff` honors them. |
There was a problem hiding this comment.
[Suggestion] fetchGitDiffHunks() reads the full git diff HEAD output into memory via execFile before parser-level caps (MAX_FILES, per-file size, per-file lines) apply. Large diffs can still allocate up to the command buffer as a string before most content is discarded, causing avoidable latency and memory spikes.
Suggested fix: stream git diff output and stop once caps are reached, or document the API as a full-diff reader if streaming is intentionally out of scope.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Acknowledged — documented the limitation in the JSDoc rather than implementing streaming in this PR (632b259):
NOTE on memory: this reads the full
git diff HEADstdout viaexecFilebefore applying parser caps (MAX_FILES,MAX_DIFF_SIZE_BYTES,MAX_LINES_PER_FILE). For very large diffs we can buffer up to therunGitmaxBuffer(64 MB) before dropping content. Streaming the parser would let us terminategitearly atMAX_FILES; that's a reasonable follow-up but out of scope for this utility's first cut.
Streaming is a non-trivial refactor here — child_process.spawn with a Node ReadableStream, incremental file-block detection (the ^diff --git boundary), UTF-8 boundary handling on Buffer chunks, and a graceful child.kill() once MAX_FILES is reached. I'd rather land that as a separate PR than bundle it with this one. Tracking the 64 MB maxBuffer cap as the failsafe in the meantime — pathological diffs error out cleanly via runGit's catch-and-return-null path.
Happy to file a follow-up issue if you want to track this explicitly.
Two Critical findings + one suggestion from wenshao on PR #3491: 1. (line 615) `extractFilePath` only accepted unquoted `+++ b/...` / `--- a/...` headers. Git wraps a path in `"..."` and applies C-style escaping (`\t`, `\n`, `\r`, `\"`, `\\`, plus octal `\NNN` for non-ASCII bytes) whenever the raw path contains a character that breaks space-delimited parsing. `core.quotepath=false` only disables the octal form for non-ASCII bytes — control chars and quotes are still escaped — so `fetchGitDiffHunks` silently dropped hunks for any tracked file whose name contained a tab, newline, or quote. Add `unquoteCStylePath()`: detects the surrounding quotes, decodes `\t`/`\n`/`\r`/`\"`/`\\` plus octal `\NNN` to raw bytes, then UTF-8-decodes the byte sequence so multi-byte octal sequences like `\346\226\207` (= `文`) round-trip correctly. `extractFilePath` pipes every candidate through `stripTab` -> `unquoteCStylePath` before checking the `a/` / `b/` prefix. Two unit tests cover the tab and octal cases; one integration test creates a real `tab\there.txt` tracked file, modifies it, and asserts `fetchGitDiffHunks` keys hunks under the real name. The integration test no-ops on filesystems that reject tab-in-name (NTFS). 2. (line 146) The >MAX_FILES_FOR_DETAILS fast path was guarded by `quickStats &&`, which short-circuited to false when shortstat returned an empty string. A workspace with 0 tracked changes plus 501 untracked files therefore slipped past the guardrail and ran the slow path, line-counting only the first MAX_FILES untracked files — header reported `filesCount: 501` but `linesAdded` missed the other 451. Treat empty/null/unparseable shortstat as `EMPTY_STATS` and apply the threshold on `tracked + untracked` uniformly. Integration test plants 501 untracked files + 0 tracked and asserts the result has `filesCount: 501` with an empty perFileStats Map (summary-only). 3. (line 263) `fetchGitDiffHunks` reads the full `git diff HEAD` stdout before parser caps apply. Documented in the JSDoc as a known limitation: streaming the parser to terminate git early at MAX_FILES is a reasonable follow-up but a non-trivial refactor (spawn + incremental parse + UTF-8 boundary handling) and out of scope for this PR. The existing `runGit` 64 MB maxBuffer keeps pathological cases from runaway-allocating. 55 gitDiff tests pass (51 + 4 new).
| // utility is invoked. The stats variants in `fetchGitDiff` already | ||
| // bypass external drivers, but plain `git diff` honors them. | ||
| const diffOut = await runGit( | ||
| ['--no-optional-locks', 'diff', '--no-ext-diff', 'HEAD'], |
There was a problem hiding this comment.
[Critical] The git diff calls disable external diff drivers with --no-ext-diff, but they still allow Git textconv filters. Textconv is configured separately through .gitattributes / diff.<driver>.textconv, and git diff --no-ext-diff can still execute those commands while /diff is only trying to inspect local changes. Add --no-textconv to each git diff invocation that already passes --no-ext-diff, and add a regression test with a textconv driver that writes a sentinel file and must not fire.
| ['--no-optional-locks', 'diff', '--no-ext-diff', 'HEAD'], | |
| ['--no-optional-locks', 'diff', '--no-ext-diff', '--no-textconv', 'HEAD'], |
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 4b371a4.
Verified locally that textconv is a separate code path from the external diff drivers — --no-ext-diff alone is insufficient:
git config diff.evil.textconv /tmp/evil.sh
echo '*.pdf diff=evil' > .gitattributes
# ... commit + modify doc.pdf ...
git diff HEAD --no-ext-diff -> /tmp/evil.sh DOES fire (writes sentinel)
git diff HEAD --no-ext-diff --no-textconv -> driver does NOT fire
Pinned --no-textconv on all four git diff invocations alongside --no-ext-diff. Empirically only the plain-diff call (fetchGitDiffHunks) currently fires textconv (the stats variants don't), but pinning both flags uniformly is defense-in-depth and keeps the policy declarative.
Regression test plants the actual attack vector — registers diff.evil.textconv = /path/to/evil.sh in the worktree's .git/config, sets *.pdf diff=evil in .gitattributes, commits + modifies a .pdf file, and asserts the driver's sentinel file is NOT written when fetchGitDiffHunks runs. Without the new flag this test would fail with the sentinel present.
…rivers
Builds on the earlier `--no-ext-diff` hardening. wenshao pointed out
that `--no-ext-diff` covers `GIT_EXTERNAL_DIFF` and
`diff.<name>.command`, but it does NOT block textconv filters
registered via `.gitattributes` + `diff.<name>.textconv` — those run
on a separate code path inside `git diff`.
Verified locally:
git config diff.evil.textconv /tmp/evil.sh
echo '*.pdf diff=evil' > .gitattributes
# ... commit + modify doc.pdf ...
git diff HEAD --no-ext-diff -> /tmp/evil.sh fires
git diff HEAD --no-ext-diff --no-textconv -> driver does NOT fire
Add `--no-textconv` to all four `git diff` invocations
(shortstat / numstat / name-status / plain hunks). As with
`--no-ext-diff`, only the plain-diff call (`fetchGitDiffHunks`) is
known to invoke textconv in current git, but pinning both flags
uniformly is defense-in-depth and keeps the policy declarative.
Regression test plants a real textconv driver in a worktree's
`.git/config` + `.gitattributes` and asserts the driver's sentinel
file is NOT written when `fetchGitDiffHunks` runs. Without the new
flag the test fails with the sentinel present.
多轮无方向评审报告(设计方案 + 实现代码)针对当前分支 Round 1 — 代码层逐函数审查
Round 2 — 数据流与边界
Round 3 — 测试 + 工具链
仅有的次要观察(非缺陷)
结论方案与实现没有发现需要改动的问题,可以维持现状。 |
| countUntrackedLines(path.join(gitRoot, relPath)), | ||
| ), | ||
| ); | ||
| for (const s of countableStats) stats.linesAdded += s.added; |
There was a problem hiding this comment.
[Critical] fetchGitDiff() adds every untracked file to stats.filesCount, but only line-counts untrackedPaths.slice(0, MAX_FILES). For 51–500 untracked files with no tracked changes, /diff reports all changed files while linesAdded includes only the first 50 files, so the header can substantially under-report additions.
Consider counting all untracked paths in the slow path, or explicitly marking the totals as capped/partial instead of presenting them as complete.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in ecb7743.
Confirmed locally: with 0 tracked changes and 60 untracked one-line files in a clean repo, the pre-fix /diff reported 60 files changed, +50 lines — untrackedPaths.slice(0, MAX_FILES) clipped lines 51-60 out of stats.linesAdded while keeping them in filesCount.
Fix. fetchGitDiff now reads line counts for every untracked path that survives the >MAX_FILES_FOR_DETAILS fast-path filter (so up to ~500 files at the outer cap, not just MAX_FILES). The extension is bounded by a new mapWithConcurrency helper that runs at most MAX_FILES (50) countUntrackedLines calls concurrently, so peak heap stays around MAX_FILES * UNTRACKED_READ_CHUNK_BYTES (~3.2 MB) regardless of how many files survive into the slow path. Per-file rendering still caps at MAX_FILES — the rest are folded into linesAdded and surface as hiddenCount on the renderer side.
Why bound concurrency rather than just Promise.all over all of them. A workspace with 500 untracked files would otherwise open 500 file handles + 500 × 64 KB chunk buffers in flight; the explicit cap keeps /diff predictable on constrained hosts (containers, CI runners) without changing behavior on the common case (≤50 untracked).
Test. line-counts every untracked file in the slow path, not just the first MAX_FILES in gitDiff.test.ts creates MAX_FILES + 10 = 60 untracked one-line files in a clean repo and asserts:
stats.filesCount === 60stats.linesAdded === 60(would be 50 pre-fix)perFileStats.size === MAX_FILES— visible-row cap unchanged
Filenames are zero-padded so ls-files --others returns them in stable order, otherwise the pre-fix bug would randomly cover the first 50 files instead of cleanly missing 10.
| : r.isDeleted | ||
| ? ` ${t('(binary, deleted)')}` | ||
| : ` ${t('(binary)')}`; | ||
| out.push(` ${padMarker('~', statColumnWidth)} ${r.filename}${suffix}`); |
There was a problem hiding this comment.
[Critical] renderDiffModelText() interpolates raw r.filename values into non-interactive output. Git permits filenames containing ANSI/control bytes; the interactive history path is sanitized via escapeAnsiCtrlCodes(item), but non-interactive/ACP output bypasses that protection.
This can let a malicious repository filename inject terminal control sequences into logs, CI output, or non-interactive terminal displays. Sanitize filenames at the plain-text rendering boundary, for example by applying the existing escapeAnsiCtrlCodes helper to r.filename before interpolation.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in ecb7743.
Confirmed the gap locally:
- Interactive:
HistoryItemDisplay.tsx:88runs the entire item throughescapeAnsiCtrlCodes(item), which recursively walksmodel.rows[i].filenameand JSON-escapes any byte matchingansiRegex(). Safe. - Non-interactive / ACP:
renderDiffModelTextwas building strings with template-literal concatenation and the result went straight intoMessageActionReturn.content→ stdout / logs / transports, no sanitizer.
A repo containing a tracked file named e.g. evil\x1b[31mEVIL\x1b[0m.txt could therefore inject SGR codes into CI logs, terminal scrollback, even pagers. Worse vectors include \x1b[2J (clear screen), \x1b[H (cursor home), and DECSC / DECRC pairs that hide later output.
Fix. Pipe r.filename through escapeAnsiCtrlCodes inside formatRowsText on every row variant — modified, binary, untracked, deleted. The same helper used by the interactive path, applied once per filename at the rendering boundary so the suffix markers ((binary), (new), (deleted), (new, partial), (binary, new), (binary, deleted)) and the i18n header strings are preserved unchanged.
Tests (diffCommand.test.ts, new renderDiffModelText filename sanitization describe block):
escapes raw ANSI escape sequences embedded in tracked filenames— feedssafe\x1b[31mEVIL\x1b[0m.txtinto a modified-row mock, asserts\x1b[never appears in the output and the literal�[31mform does.escapes ANSI sequences in untracked / binary / deleted suffix rows— covers all three suffix branches with\x1b[2J,\x1b[H,\x1b[0Kfilenames; asserts no raw ESC byte survives, while(binary)/(new)/(deleted)still render.
The interactive path is unchanged (already covered by escapeAnsiCtrlCodes(item) at the HistoryItem boundary), so this fix is purely additive on the text-output side.
…f output Two Critical issues from PR #3491 review: 1. fetchGitDiff slow path only line-counted the first MAX_FILES (50) untracked paths via `untrackedPaths.slice(0, MAX_FILES)`. With 51-500 untracked files in a clean tree the header reported the full file count but only ~50 files' worth of additions, materially under-reporting the total. Now read every untracked path that survived the >MAX_FILES_FOR_DETAILS fast-path filter, with concurrency bounded to MAX_FILES so peak heap stays around MAX_FILES * UNTRACKED_READ_CHUNK_BYTES (~3.2 MB) regardless of input size. 2. renderDiffModelText interpolated raw filenames into the non-interactive / ACP text path. The interactive history is sanitized via escapeAnsiCtrlCodes(item) inside HistoryItemDisplay, but the text path streams to stdout / logs / transports with no equivalent hop, so a tracked or untracked filename containing \x1b[2J etc. could inject color resets, cursor moves, or full screen clears into CI logs and downstream terminals. Pipe r.filename through escapeAnsiCtrlCodes at the rendering boundary on every row variant (binary, untracked, deleted, modified). Tests: - gitDiff.test.ts: regression that asserts every one of MAX_FILES + 10 untracked one-line files contributes to stats.linesAdded (would be 50 pre-fix vs 60 actual). - diffCommand.test.ts: two new specs covering ANSI escapes in modified-file rows and in binary / untracked / deleted suffix rows. Verifies raw \x1b never reaches stdout while suffix markers ((binary), (new), (deleted)) still render.
| if (!s.startsWith('"') || !s.endsWith('"') || s.length < 2) return s; | ||
| const inner = s.slice(1, -1); | ||
| // Build raw bytes first so octal `\NNN` sequences (each one byte of a | ||
| // potentially multi-byte UTF-8 character) reassemble correctly. |
There was a problem hiding this comment.
[Critical] This branch encodes inner[i], which is only one UTF-16 code unit. If Git emits a quoted path that also contains raw non-BMP Unicode under core.quotepath=false (for example an emoji plus a tab or quote that forces C-style quoting), the surrogate pair is split and decoded as replacement characters, so the hunk map is keyed by a corrupted filename.
A safer fix is to encode one Unicode code point at a time and advance by the produced string length, while still keeping octal escapes as raw bytes.
| // potentially multi-byte UTF-8 character) reassemble correctly. | |
| const cp = inner.codePointAt(i); | |
| if (cp === undefined) { | |
| i++; | |
| continue; | |
| } | |
| const ch = String.fromCodePoint(cp); | |
| bytes.push(...Buffer.from(ch, 'utf8')); | |
| i += ch.length; |
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 4f7e8dd.
Confirmed locally: a tracked file named \t🚀.txt produces +++ "b/\t🚀.txt" in plain git diff output. Under the previous walker, the rocket emoji (U+1F680, two UTF-16 code units \uD83D\uDE80) was iterated one code unit at a time, and Buffer.from('\uD83D', 'utf8') emits the 3-byte replacement sequence EF BF BD for a lone surrogate — so the filename round-tripped as two U+FFFD characters and the hunk key never matched the on-disk path.
Fix: the non-backslash branch now reads via inner.codePointAt(i), builds String.fromCodePoint(cp), and advances by ch.length. Non-BMP code points are encoded as their full 4-byte UTF-8 sequence, octal \NNN escapes still pass through the byte-stream branch unchanged, and the bytes array is still UTF-8-decoded en masse so multi-byte octal sequences continue to reassemble correctly.
Regression test (preserves non-BMP code points in quoted paths instead of splitting surrogates in gitDiff.test.ts) feeds a quoted path with \t🚀.txt (forced-quoting via the literal tab plus a non-BMP code point in the same path) through parseGitDiff and asserts the hunk key is \t🚀.txt. Without the fix this would key under \t\uFFFD\uFFFD.txt.
| (inner[i + 1 + octal.length] ?? '') >= '0' && | ||
| (inner[i + 1 + octal.length] ?? '') <= '7' | ||
| ) { | ||
| octal += inner[i + 1 + octal.length]; |
There was a problem hiding this comment.
[Critical] Git's C-style quoting can emit \a, \b, \f, and \v for control characters in paths, but this decoder falls through to the default branch and drops the backslash, turning those filenames into ordinary a/b/f/v characters. That means fetchGitDiffHunks can return hunks under a path that does not match the real tracked filename.
Please add explicit mappings for the remaining C escapes and cover at least one of them with a regression test.
| octal += inner[i + 1 + octal.length]; | |
| case 'a': | |
| bytes.push(0x07); | |
| i += 2; | |
| break; | |
| case 'b': | |
| bytes.push(0x08); | |
| i += 2; | |
| break; | |
| case 'f': | |
| bytes.push(0x0c); | |
| i += 2; | |
| break; | |
| case 'v': | |
| bytes.push(0x0b); | |
| i += 2; | |
| break; | |
| default: |
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 4f7e8dd.
You're right — \a, \b, \f, \v were falling into the default branch, which bytes.push(...Buffer.from(next, 'utf8'))-ed only the trailing letter and silently dropped the leading backslash. So a tracked path like bell\afile.txt (BEL embedded) was decoding to bellafile.txt, and fetchGitDiffHunks keyed hunks under a name that didn't exist on disk.
Fix: explicit case arms for 'a' → 0x07, 'b' → 0x08, 'f' → 0x0c, 'v' → 0x0b next to the existing t / n / r / " / \\ arms. The octal-escape branch is unchanged.
Regression test (decodes the remaining C-style escapes (\a, \b, \f, \v) in gitDiff.test.ts) constructs a multi-file diff covering all four escapes (bell\afile.txt, back\bspace.txt, form\ffeed.txt, vert\vtab.txt) and asserts the hunk keys decode to the literal control-byte forms bell\x07file.txt, back\x08space.txt, form\x0cfeed.txt, vert\x0btab.txt. Without the fix the keys collapse to bellafile.txt / backbspace.txt / formffeed.txt / vertvtab.txt.
| // `escapeAnsiCtrlCodes(item)` sanitizer in `HistoryItemDisplay`. Without | ||
| // this hop, a hostile filename could inject color resets, cursor moves, | ||
| // or full screen clears into CI logs and any consumer's terminal. | ||
| const safeName = escapeAnsiCtrlCodes(r.filename); |
There was a problem hiding this comment.
[Critical] escapeAnsiCtrlCodes only escapes ANSI escape sequences matched by ansi-regex; it does not escape other terminal-control bytes that Git filenames can contain, such as newline, carriage return, backspace, or BEL. As a result, a filename like bad\nINJECTED.txt or bad\roverwrite.txt can still alter non-interactive/ACP output layout or terminal behavior.
Please sanitize all non-printable control characters at this text boundary, not just ANSI sequences, and add regression coverage for newline/carriage-return/control-byte filenames.
— gpt-5.5 via Qwen Code /review
There was a problem hiding this comment.
Adopted in 4f7e8dd.
Confirmed the gap: escapeAnsiCtrlCodes runs the input through ansi-regex, which only matches the multi-byte CSI / OSC / DEC families. Standalone control bytes that aren't part of a recognized escape sequence — \n (LF), \r (CR), \x08 (BS), \x07 (BEL), \x09 (TAB), \x7f (DEL), and the C1 range \x80-\x9f — went through unchanged. A filename like bad\nINJECTED.txt would still tear our two-column diff layout apart in CI logs, bad\roverwrite.txt would still let the next row stomp on the previous one, and BEL would still beep terminals on every render.
Fix: introduced sanitizeFilenameForDisplay in diffCommand.ts that pipes the filename through escapeAnsiCtrlCodes first (preserves the existing CSI-injection coverage) and then replaces every byte in [\x00-\x1f\x7f-\x9f] via escapeControlChar. The escaper emits the familiar \\b / \\t / \\n / \\f / \\r short forms and falls back to \\u00XX for everything else (DEL and the C1 range — JSON.stringify would leave those two ranges as raw bytes, which is exactly what we're trying to keep out of the rendered output, hence the hand-rolled escape rather than reusing JSON.stringify(ch).slice(1,-1)). All four call sites — modified, untracked, binary, deleted — now go through this sanitizer.
Regression test (escapes standalone control bytes that ansi-regex does not match in diffCommand.test.ts) renders a five-row diff with filenames containing raw \n, \r, BS, BEL, and DEL respectively, then asserts none of those bytes survive into the output and that the JSON-style escaped forms (bad\nINJECTED.txt, bad\roverwrite.txt) appear instead. The existing escapes ANSI sequences in untracked / binary / deleted suffix rows test still passes against the new sanitizer, so the previous CSI coverage is intact.
- unquoteCStylePath now walks Unicode code points so non-BMP characters (e.g. emoji) inside a forced-quoted path no longer get split into lone surrogates and decoded as replacement characters. - Add explicit C-escape mappings for \a, \b, \f, \v so paths using those control bytes decode to BEL/BS/FF/VT instead of dropping the backslash. - Replace escapeAnsiCtrlCodes(filename) at the /diff text-rendering boundary with a sanitizer that also escapes standalone C0/C1 control bytes plus DEL, closing newline / CR / BS / BEL injection vectors that ansi-regex does not match.
wenshao
left a comment
There was a problem hiding this comment.
No review issues found — the diff is clean, tests are thorough, and security mitigations are in place. LGTM! ✅
Test (windows-latest, 24.x). All other platforms (macOS 20/22/24, Ubuntu 20/22/24, Windows 20/22) and lint/CodeQL checks pass. The single Windows 24.x failure is likely pre-existing / flaky, not caused by this PR.
— deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review



TLDR
Implements the structured git-diff statistics feature requested in #2997 and exposes it through a new
/diffslash command.packages/core/src/utils/gitDiff.tsparsesgit diff --numstat,--shortstat, and unifiedgit diff HEADoutput into a structured result (files changed, lines added/removed, per-file summaries, hunks) with the caps called out in the issue: 50 files, 1 MB per file, 400 lines per file, plus a 500-file short-circuit via--shortstat./diffbuilt-in slash command renders the result as a single info message — headerN files changed, +A / -Rfollowed by per-file rows, with?for untracked files and~for binary files.Screenshots / Video Demo
Typical output inside a dirty working tree:
On a clean tree or outside a git repo,
/diffprints a single info line and exits without error.Dive Deeper
Borrowed the parsing strategy from the reference implementation in claude-code-cli's
utils/gitDiff.ts, then adapted to qwen-code conventions and fixed a handful of issues surfaced during adversarial review:findGitRootreturns the directory that contains.git, but in a linked worktree.gitis a file pointing at<main>/.git/worktrees/<name>/. A newresolveGitDirhelper follows that indirection soMERGE_HEAD,CHERRY_PICK_HEAD, etc. are looked up in the right gitdir.REBASE_HEADalone misses the common case. The transient-state check now looks atrebase-merge/andrebase-apply/directories too.---/+++/indexas file content. The unified-diff parser previously skipped any line starting with those prefixes as file-header metadata, which corrupted hunks that removed or added source lines beginning with---,+++, orindex. Metadata is now only skipped before the first@@hunk header.-c core.quotepath=falseso Japanese / Chinese / other non-ASCII filenames stay as UTF-8 instead of being octal-escaped into keys like\346\227\245\346\234\254\350\252\236.txt.stats.filesCount, so callers see a consistent surface.continuetobreakso we stop scanning once the 400-line cap is reached rather than walking the remainder of a huge file looking for hunk headers./diffrobustness. Wrapped in try/catch (FS permission errors surface as a friendly message rather than crashing the action), declaredsupportedModes: ['interactive', 'non_interactive', 'acp']so the command works outside the Ink UI, and aligned?/~marker rows with the+X -Ystat column so mixed output stays tabular.Reviewer Test Plan
/diff. Expected: a header with files-changed / +added / -removed plus per-file rows./diffin a clean tree, a fresh repo with no HEAD, and a non-git directory — each should print a single info message without throwing.git worktree add ../wt -b feature, drop a fakeMERGE_HEADinside<main>/.git/worktrees/wt/, and confirm/diffshort-circuits with the "merge/rebase/cherry-pick/revert" message — this exercises theresolveGitDirfix.Testing Matrix
Validated on macOS via `npm run typecheck` (0 errors) and the project's vitest runner (1602 core tests pass, 271 CLI command tests pass).
Linked issues / bugs
Resolves #2997