fix(core): unify Edit/WriteFile prior-read with Claude Code; close #3964 + #3945 by wenshao · Pull Request #4002 · QwenLM/qwen-code

wenshao · 2026-05-09T18:42:24Z

Motivation

Issue #3964 reports Edit / write_file rejecting regular .kt / .cpp / .py / .ts source files with a misleading binary-payload error across 0.15.7 – 0.15.9, on both Linux and Windows:

File X is a binary / image / audio / video / PDF / notebook payload that the read_file tool returns as a structured value rather than as plain text. The Edit / WriteFile tools cannot mutate that payload safely — re-reading it would not change this. Use a different mechanism (e.g. shell tool with a binary-aware writer) if you need to edit it.

The reproductions in the issue's comments fall into two distinct populations:

Cache-side conflation (Kieaer's .kt, analysis.py, the multiple "frequent edit failures after auto-update" reports). Triggers on partial reads (offset / limit) or full reads that hit the truncate-tool-output limit, on any file system.
Content-detection-side false positive (Frank-Shaw-FS's first report). .cpp / .c / .h source files on a Windows encrypted / DRM-protected file system, where the OS surfaces encrypted bytes to fs.open() random-access reads and isBinaryFile's 4 KB sample concludes binary.

Issue #3945 (edit tool unusable for large files) is the truncated-full-read arm of population 1 from the Edit side. The same truncated-full-read on the WriteFile side hit a separate deadlock that #3945's report didn't surface but became reachable once population 1 was fixed: WriteFile's requireFullRead rejection ("only been partially read … re-read without offset / limit / pages") sends the model back to a read that produces the same truncated state, with no escape.

This PR closes all of it.

Part 1 — cache-side: decouple `cacheable` from truncation

PR #3774 introduced priorReadEnforcement.ts, which consults FileReadCache.lastReadCacheable to discriminate text from binary / image / PDF / notebook payloads. ReadFileToolInvocation derived cacheable as:

const cacheable =
  typeof result.llmContent === 'string' &&
  result.originalLineCount !== undefined &&
  !result.isTruncated;   // ← the bug

That conflates two unrelated concerns: "is the content text" and "did the model see all the bytes". A partial read or a truncated full read of a regular text file therefore set cacheable: false, and priorReadEnforcement.ts mistook that for a non-text payload.

PR #3932 had already split prior-read enforcement so Edit accepts partial reads (lastReadWasFull-relaxed for Edit, kept for WriteFile). But the lastReadCacheable conflation persisted, so partial / truncated text reads still hit the binary-payload rejection on the very Edit path that #3932 was supposed to relax.

Change

   if (cacheEnabled && (result.stats ?? stats)) {
     const cacheable =
       typeof result.llmContent === 'string' &&
-      result.originalLineCount !== undefined &&
-      !result.isTruncated;
+      result.originalLineCount !== undefined;
     const recordStats: Stats = result.stats ?? stats!;
     cache.recordRead(absPath, recordStats, {
-      full: isFullRead,
+      full: isFullRead && !result.isTruncated,
       cacheable,
     });
   }

lastReadCacheable is now purely about content type.
Truncation gating moves to lastReadWasFull (used by the Read fast-path only — see Part 3 for why it is not consulted by enforcement).

Part 2 — content-detection-side: trust mime / extension over `isBinaryFile`

isBinaryFile reads the first 4 KB via fs.open() random-access and counts nulls / non-printables. That heuristic produces false positives on:

UTF-16 / UTF-32 without BOM.
Encrypted / DRM-protected file systems where random-access reads surface raw encrypted bytes (the Windows scenario in File type detection misidentifies encrypted .c/.cpp/.h files as binary payloads #3964).
Source files whose first 4 KB happen to start with a long binary-ish header (e.g. base64 blobs).

When the extension is unambiguously a text format, the content sample's verdict should not override the extension. Update detectFileType to consult the registry / a curated extension list before the content sample:

1. Hardcoded `.ts` / `.mts` / `.cts` / `.tsx` / `.svg` / `.ipynb`
2. mime registry: image / audio / video / pdf → matching type
                  text/* / application/* text-likes / +xml / +json → text
3. BINARY_EXTENSIONS pre-empt
4. Curated source-code / config extensions (`.py` / `.kt` / `.go` / ...) → text
5. isBinaryFile content sample
6. Default text

The curated extension list (KNOWN_TEXT_EXTENSIONS) covers the languages / configs that mime/lite either omits or registers ambiguously: Python, Kotlin, Go, Ruby, Swift, Scala, Rust, Elixir, Erlang, Clojure, Haskell, OCaml, F#, Vue / Svelte / Astro, shells (.sh / .bash / .zsh / .fish / .ps1), Dart, Zig, Nim, Solidity, C#, VB, Proto / GraphQL / SQL / Thrift, AsciiDoc / TeX / RST / org, TOML / HCL / Terraform / Dockerfile / CMake / Make. Restricted to extensions we have observed to be misclassified in the field; obscure extensions still go through the content sampler.

The mime allowlist is restricted to the 6 values mime/lite actually emits (application/javascript, application/ecmascript, application/node, application/json, application/xml, application/toml) plus +xml / +json suffix matching for structured-data formats.

BINARY_EXTENSIONS is checked before the new text-extension override, so a .png whose first bytes happen to be ASCII still gets classified as binary — verified by a regression test.

Part 3 — drop `requireFullRead` on WriteFile, align with Claude Code

PR #3932 deliberately diverged from Claude Code's readFileState by keeping requireFullRead: true on WriteFile's prior-read enforcement, citing issue #2499 (LLM hallucinates unread bytes on overwrite) as evidence that the asymmetric stance was justified. In practice that stance leaves a hard deadlock: when a file is larger than truncate-tool-output-lines, read_file without offset/limit still records lastReadWasFull: false (the model only saw the head), and the "only been partially read … re-read without offset / limit / pages" rejection sends the model back to the same truncated read — no escape.

Pre-Part-1, this case was masked by the cacheable conflation reporting "binary payload" instead, but with Part 1 in place the deadlock would surface as the accurate-but-still-wedged "partial read" message.

Drop the requireFullRead option from checkPriorRead and remove all 5 requireFullRead: true call sites in WriteFileTool. After this change the contract is identical to Claude Code's: any prior read of an existing file clears enforcement; the mtime / size drift check is the only gate that distinguishes "the model has seen current bytes" from "the model has seen older bytes", and it fires identically for Edit and WriteFile. The lastReadWasFull flag survives in the cache for the Read fast-path only.

The residual #2499 risk is acknowledged in the docstring: a model that reads only a slice and then overwrites would necessarily hallucinate the rest of the bytes. Mitigations:

The mtime / size drift check still rejects Writes against bytes the model saw at fingerprint X if disk has moved to Y.
Users who want stricter behaviour can set fileReadCacheDisabled: true (existing escape hatch, unchanged).

Behaviour	Pre-PR (0.15.9)	Post-PR	Claude Code
Edit + partial read	reject (binary payload)	accept	accept
Edit + truncated full read	reject (binary payload)	accept	accept
WriteFile + partial read	reject (binary payload)	accept	accept
WriteFile + truncated full read	reject (binary payload, then would be partial read after Part 1 alone) → deadlock	accept	accept
Edit / WriteFile + binary read	reject (binary payload)	reject (binary payload)	reject
Edit / WriteFile + mtime drift	reject (file changed since read)	reject (file changed since read)	reject
Edit / WriteFile + new file	accept	accept	accept

Effect on reported scenarios

Reporter	File / Scenario	Pre-fix	Post-fix
Kieaer	`.kt` partial read (`offset/limit`) → Edit	`binary payload` ❌	accepted ✅ (Part 1)
mingdedi	`analysis.py` Edit fails frequently	`binary payload` ❌	accepted ✅ (Part 1)
Xxianna / encounter888	`.ts` / `.cxx` Edit fails	`binary payload` ❌	accepted ✅ (Part 1)
#3945 reporter	`ai_views.py` 1416 lines, full read truncated → Edit	rejected (`fully read`) ❌	accepted ✅ (Part 1)
Frank-Shaw-FS	`.cpp` on Windows encrypted FS → Read → Edit	`binary payload` ❌	accepted ✅ (Part 2)
Anyone	Truncated full read on > truncate-tool-output-lines file → WriteFile	rejected (deadlock) ❌	accepted ✅ (Part 3)
Anyone	Real binary file Edit	`binary payload`	unchanged ✅
Anyone	Untruncated full text read	placeholder fast-path	unchanged ✅

Risk

recordRead's full parameter contract changes from "the request did not pass offset/limit/pages" to "the model saw every current byte". Only call site is read-file.ts; verified by grep across all packages.
KNOWN_TEXT_EXTENSIONS is a maintenance surface; kept tight to languages / configs we have evidence for. New extensions can be added incrementally.
Trusting the mime registry / extension means a deliberately misnamed file (e.g. binary payload renamed to evil.cpp) is now treated as text. Same outcome as the existing text/* conventions in any IDE / editor — extension is the dominant signal, content sample is a tiebreaker for unrecognised extensions.
WriteFile now accepts partial reads (Part 3). Issue Agent overwrites files using write_file without reading existing content, leading to data loss. #2499's hallucinated-overwrite scenario is the residual risk, mitigated by the mtime/size drift check and the fileReadCacheDisabled escape hatch. This matches Claude Code's contract.

Test plan

read-file.test.ts — 3 new tests (partial .kt read, truncated full .cpp read, encrypted-FS-style .cpp end-to-end through ReadFile)
fileUtils.test.ts — 5 new tests on detectFileType (text/* mime override, application/* allowlist, +xml / +json suffix match, curated extension override across .py / .kt / .go / .rb / .swift, BINARY_EXTENSIONS pre-empt regression)
write-file.test.ts — partial-read test inverted from "rejects" to "allows" (matches the EditTool counterpart); new test for the issue edit tool unusable for large files — "fully read" precondition impossible when read_file truncates #3945 truncated-full-read deadlock
npx vitest run on the full packages/core — 7393 / 7398 pass (5 skipped, all pre-existing)
tsc --noEmit -p packages/core/tsconfig.json — clean
Existing edit.test.ts / fileReadCache.test.ts pass without modification

Closes #3964.
Closes #3945.

PR QwenLM#3774 introduced prior-read enforcement that consults `lastReadCacheable` to discriminate text from binary / image / PDF / notebook payloads. ReadFileToolInvocation derived `cacheable` as `string && originalLineCount && !isTruncated`, conflating two unrelated concerns: "is the content text" and "did we see all the bytes". A partial read (offset/limit) or a truncated full read of a regular `.kt` / `.cpp` / `.py` source file therefore set `cacheable: false`, and priorReadEnforcement.ts mistook that for a non-text payload and rejected the next Edit with the misleading "binary / image / audio / video / PDF / notebook payload" error. PR QwenLM#3932 split prior-read enforcement so Edit accepts partial reads (`lastReadWasFull`-relaxed for Edit, kept for WriteFile), but the `lastReadCacheable` conflation persisted, so partial / truncated text reads still hit the binary-payload rejection on Edit. Issue QwenLM#3964 is the resulting field reports: .kt / .cpp / .py / .ts files on both Linux and Windows misclassified as binary across 0.15.7-0.15.9. Decouple the two concerns: - `cacheable` is now purely about content type. A partial or truncated text read records `cacheable: true` because the bytes the model saw were text. - Truncation gating moves to `full`. A request-level full read (no offset/limit/pages) only counts as full at the cache level when the produced content was not truncated; otherwise the model only saw the head of the file. The fast-path `file_unchanged` placeholder still requires both `lastReadWasFull && lastReadCacheable`, so its semantics are unchanged — a truncated full read now fails the AND on the moved flag instead of the original. WriteFile's `requireFullRead` still rejects partial or truncated text reads; it now reports the accurate "partial read" error instead of the wrong "binary payload" message. Also fixes issue QwenLM#3945 (edit tool unusable for large files) as a side effect: the truncated-full case there hit the same misclassified path before the rejection wording could even surface the truncation question. Tests: 2 regression tests added in read-file.test.ts (partial .kt read and truncated full .cpp read both record `lastReadCacheable: true`). Existing 7386 / 7391 (5 skipped) core tests pass; tsc --noEmit clean. Issue QwenLM#3964 also reports a separate scenario on Windows encrypted/DRM-protected file systems where .cpp source files are misclassified by `isBinaryFile`'s 4KB content sampling. That path is content-detection-side, not cache-side, and is left to a follow-up (extension- or mime-based override of the content sample for known text types).

…text Issue QwenLM#3964's first report (Frank-Shaw-FS) describes `.cpp` / `.c` / `.h` source files on a Windows encrypted / DRM-protected file system being misclassified as binary. The OS surfaces encrypted bytes to `fs.open()` random-access reads, so `isBinaryFile`'s 4 KB sample sees nulls / non-printable characters and concludes binary — even though the higher-level `readFile` returns the decrypted text and the extension declares the file as text. Layer-2 fix on top of the cache-side decoupling: change `detectFileType` to trust the registry / curated extension list *before* running the content sample, so a known text extension is not subject to false positives from raw-bytes sampling. - Trust mime types declared as text: `text/*`, `application/*` text-likes (`application/javascript`, `application/json`, `application/toml`, ...), and any mime ending in `+xml` / `+json`. - Trust a curated set of source-code / config / markup extensions whose `mime/lite` registry coverage is patchy (`.py`, `.kt`, `.go`, `.rb`, `.swift`, `.scala`, `.rs`, `.proto`, `.graphql`, `.toml`, `.hcl`, `.tf`, ...). The list is restricted to extensions we have observed to be misclassified by `isBinaryFile` in the field; obscure extensions still go through the content sampler. Order in `detectFileType`: 1. Hardcoded `.ts` / `.svg` / `.ipynb` 2. Mime check (image / audio / video / pdf / declared-text) 3. `BINARY_EXTENSIONS` pre-empt (so `.png` with text-looking content stays binary) 4. Curated text extension override (for mime-less source code) 5. `isBinaryFile` content sampler (final fallback for unrecognised extensions) 6. Default text Tests: 5 new cases in `fileUtils.test.ts` and 1 end-to-end in `read-file.test.ts` covering: text mime override on binary-looking content, application/* text mimes, `+xml` / `+json` suffix match, curated extension override on `.py` / `.kt` / `.go` / `.rb` / `.swift`, and the `BINARY_EXTENSIONS` pre-empt still winning over the new override (a `.png` whose first bytes happen to be ASCII text stays binary). Full core suite passes (7392 / 7397, 5 pre- existing skips); `tsc --noEmit` clean. Together with the earlier commit, this PR closes both arms of issue QwenLM#3964: the cache-side `cacheable` conflation that affected partial / truncated reads, and the content-detection-side false positive on encrypted file systems.

wenshao

No issues found. Downgraded from Approve to Comment: self-PR.

Reviewed changes:

Decoupled lastReadWasFull (completeness) and lastReadCacheable (text content type) flags — correct fix for the issue #3964 regression where partial/truncated text reads were misclassified as binary payload.
Added KNOWN_TEXT_APPLICATION_MIMES, KNOWN_TEXT_EXTENSIONS, and isTextMime() for extension/mime-based text classification — handles encrypted filesystem scenario without isBinaryFile false positives.
Tests pass (88 + 55), ESLint clean, no security or correctness concerns.

— glm-5.1 via Qwen Code /review

wenshao

⚠️ Downgraded from Approve to Comment: self-PR. No Critical issues found. 3 Suggestions below.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Three follow-ups flagged by `/review` on QwenLM#4002: 1. `KNOWN_TEXT_APPLICATION_MIMES` had 10 dead entries — names like `application/x-sh`, `application/x-perl`, `application/x-yaml`, `application/x-tex`, `application/x-sql`, `application/graphql` are real mimes seen in HTTP `Content-Type` contexts but are not in `mime/lite`'s registry, so `mime.getType()` never returns them and the entries are unreachable. Strip the set to the 6 values the registry actually emits (`javascript`, `ecmascript`, `node`, `json`, `xml`, `toml`); the shells / tex / sql / graphql extensions reach the text fallback through `KNOWN_TEXT_EXTENSIONS` instead. Add a scope rule in the docstring so future additions stay aligned with what mime/lite actually emits. 2. The early-return at the top of `detectFileType` listed `.ts / .mts / .cts / .tsx` in its comment but the array only contained `.ts / .mts / .cts`. `.tsx` was reaching the text verdict via `KNOWN_TEXT_EXTENSIONS`, which works today but would break if a future `mime/lite` update mapped `.tsx` to `video/mp2t` (mirroring `.ts`): the `startsWith('video/')` guard would fire before the text fallback. Move `.tsx` up to the early-return array so the comment matches the code and the defence is consistent across the TypeScript family. Drop the duplicate listing in `KNOWN_TEXT_EXTENSIONS`. 3. `isTextMime()` short-circuits `isBinaryFile` for any `text/*` mime, which is the necessary tradeoff for the encrypted-FS fix but removes the safety net for *corrupted* text files (a binary blob saved with a `.txt` / `.md` extension via redirection). Document the tradeoff explicitly with a concrete counter-example and call out that Edit's `0 occurrences` failure mode is the fallback for the corrupted-text population. Tests: 261 / 262 (1 skipped, pre-existing) on `fileUtils.test.ts` + `read-file.test.ts` + `edit.test.ts` + `write-file.test.ts`. `tsc --noEmit` clean.

wenshao · 2026-05-09T22:37:45Z

All 3 self-review suggestions are addressed in c9aa1f73c:

Dead application/* mime entries — turned out to be 10/16 dead, not 3 (the others — application/x-sh, application/x-perl, application/x-yaml, application/x-tex, application/x-sql, application/sql, application/x-shellscript, application/x-csh, application/x-latex, application/graphql — are real HTTP Content-Type names but not in the mime/lite registry, so mime.getType() never returns them). Stripped the set to the 6 values mime/lite actually emits (javascript, ecmascript, node, json, xml, toml). Shells / tex / sql / graphql still reach the text fallback through KNOWN_TEXT_EXTENSIONS. Added a scope rule in the docstring.
.tsx comment / array mismatch — moved .tsx from KNOWN_TEXT_EXTENSIONS up to the hardcoded TS-family early-return at the top of detectFileType. Comment now matches the code, and a future mime/lite update that maps .tsx → video/mp2t (mirroring .ts) is defended against.
text/* short-circuit tradeoff — added an explicit tradeoff note with a concrete counter-example (a binary blob saved with .txt / .md via cat blob.dat > notes.md) and called out Edit's 0 occurrences failure as the fallback for that corrupted-text population.

Tests: 261 / 262 (1 pre-existing skip) across fileUtils / read-file / edit / write-file. tsc --noEmit clean.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

… Code PR QwenLM#3932 deliberately diverged from Claude Code's `readFileState` by keeping `requireFullRead: true` on WriteFile's prior-read enforcement, citing issue QwenLM#2499 (LLM hallucinates content of an unread file and clobbers user changes) as evidence that the asymmetric stance was justified. In practice that stance leaves a hard deadlock: when a file is larger than `truncate-tool-output- lines`, `read_file` without offset/limit still records `lastReadWasFull: false` (the model only saw the head), and the "only been partially read … re-read without offset / limit / pages" rejection sends the model back to the same truncated read with no escape — the exact deadlock issue QwenLM#3945 reported. Drop the `requireFullRead` option from `checkPriorRead` and remove all 5 `requireFullRead: true` call sites in WriteFileTool. After this change the contract is identical to Claude Code's: any prior read of an existing file clears enforcement; the mtime/size drift check is the only gate that distinguishes "the model has seen current bytes" from "the model has seen older bytes", and it fires identically for Edit and WriteFile. The residual QwenLM#2499 risk is acknowledged in the docstring: a model that reads only a slice and then overwrites would necessarily hallucinate the rest of the bytes. Mitigations: - `fileReadCacheDisabled: true` for users who want stricter behaviour (existing escape hatch, unchanged). - The mtime/size drift check still rejects Writes against bytes the model saw at fingerprint X if disk has moved to Y. Cleanup: drops the dedicated "fresh + cacheable + partial + requireFullRead" rejection branch and the `requireFullRead`-aware wording variant in the `unknown` branch — both unreachable now. Tests: - `write-file.test.ts:932` inverted from "rejects a write when the previous read was ranged" to "allows a write after a ranged read", matching the equivalent `edit.test.ts:1077`. - New `write-file.test.ts:961` regression for the issue QwenLM#3945 deadlock: a `recordRead({ full: false, cacheable: true })` entry (what a truncated full read produces) clears WriteFile enforcement. - 7393 / 7398 (5 skipped, all pre-existing) on the full core suite. `tsc --noEmit` clean.

wenshao · 2026-05-09T23:24:56Z

Added Part 3 in b591a51af: drop requireFullRead on WriteFile and align with Claude Code's readFileState.

Pre-Part-1 the cacheable-flag bug masked a second deadlock on the WriteFile side: a read_file without offset/limit on a file larger than truncate-tool-output-lines records lastReadWasFull: false (truncated head only), and requireFullRead: true rejects with "only been partially read … re-read without offset / limit / pages" — but a re-read produces the exact same truncated state, deadlocking. With Part 1 in place that deadlock would surface as the accurate-but-still-wedged "partial read" message instead of "binary payload".

Per #3932's own comparison, Claude Code treats partial and full reads identically for both Edit and WriteFile. PR #3932 deliberately diverged citing #2499 as evidence that the asymmetric stance was justified; Part 3 reverts that decision. The mtime / size drift check is now the only gate distinguishing "model saw current bytes" from "model saw older bytes" and fires identically for both tools. fileReadCacheDisabled: true remains as the escape hatch for users who want stricter behaviour. The lastReadWasFull flag survives in the cache for the Read fast-path only.

Tests: partial-read rejection in write-file.test.ts:932 inverted to "allows a write after a ranged read" (matches the EditTool counterpart at edit.test.ts:1077); new regression for the truncated-full-read deadlock at write-file.test.ts:961. 7393 / 7398 (5 pre-existing skips) on the full core suite. PR title and body updated to reflect the broader scope.

Three sites a future contributor might naturally try to "tighten up" back into the deadlock-prone shape, now carrying explicit guard comments that name the prior PR (QwenLM#3932), the issue it broke (QwenLM#3945), and the residual risk this stance accepts (QwenLM#2499): - `priorReadEnforcement.ts:CheckPriorReadOptions` — interface-level note: do not re-introduce `requireFullRead` (or any "stricter for WriteFile than Edit") option here. References the function docstring for the full rationale. - `fileReadCache.ts:lastReadWasFull` — field-level note: sole consumer is the Read fast-path; `priorReadEnforcement` does not consult this and must not start. - `write-file.ts` first checkPriorRead call site — anchor comment that explains why no extra option is passed and applies to all 5 call sites in the file. No code changes; test suite unchanged at 7393 / 7398 (5 pre-existing skips); `tsc --noEmit` clean.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

wenshao

⚠️ Downgraded from Approve to Comment: self-PR; CI failing: Agent.

No Critical issues found. 3 Suggestions below.

…le comments 3 QwenLM#4002 review threads addressed: - fileUtils.ts: added KNOWN_TEXT_BASENAMES allowlist for extensionless build / config / lockfiles (Dockerfile, Containerfile, Makefile, GNUmakefile, Jenkinsfile, Vagrantfile, Rakefile, Gemfile, Procfile, BUILD, WORKSPACE, CMakeLists.txt, go.mod, go.sum, go.work, Cargo.lock, Pipfile, Pipfile.lock, poetry.lock, package-lock.json, yarn.lock, pnpm-lock.yaml, requirements.txt, .gitignore, .gitattributes, .dockerignore, .npmignore, .editorconfig, .env, .bashrc, .zshrc, .profile, LICENSE, COPYING, AUTHORS, CHANGELOG, README, NOTICE). `path.extname('Dockerfile')` returns `''`, so the KNOWN_TEXT_EXTENSIONS check above misses these — an encrypted-volume read whose 4 KB sample looks binary would misclassify them as binary. Regression test pinned with fake-encrypted bytes for Dockerfile / Makefile / Jenkinsfile / go.mod / package-lock.json / .gitignore / LICENSE. - priorReadEnforcement.ts: rewrote two misleading comments that pointed users to `fileReadCacheDisabled: true` for "stricter behaviour". That setting actually DISABLES enforcement entirely (skips checkPriorRead). Updated to make the opt-out semantics explicit and clarify that there is no built-in stricter mode — users who want stricter built-in enforcement than the residual QwenLM#2499 risk accepts have no flag here today and should file a feature request. - read-file.ts: updated the `lastReadWasFull` comment to reflect that PR QwenLM#4002 removed WriteFile's `requireFullRead`. The flag now gates ONLY the `file_unchanged` fast-path; the stale "and WriteFile's full-read requirement" wording would have confused future readers into thinking WriteFile still consults `lastReadWasFull`. Tests: 89/89 fileUtils.test.ts pass; tsc + ESLint clean.

… full for overwrite QwenLM#4002 review: shared "never read" error said `(a partial read with offset / limit is fine — you only need to have seen the bytes you intend to edit/overwrite)` for BOTH Edit and WriteFile. For Edit this is correct — the model only needs to have seen the `old_string`-bearing bytes; the rest passes through untouched. For WriteFile this is misleading: overwriting replaces EVERY byte, so a partial read leaves any unseen bytes as collateral damage. The mtime/size drift check still catches the worst-case QwenLM#2499 hallucinated-bytes risk, but recommending a partial read in the WriteFile guidance would actively encourage the footgun. Fix: branch the partial-read guidance on `verb`. Edit keeps the current "partial OK" text. WriteFile gets `(read the full file — overwriting replaces every byte, so any unseen bytes would be discarded)`. 120/120 edit + write-file tests pass; tsc + ESLint clean.

…CacheDisabled is escape hatch" mentions cc30278 + c6e2bde addressed 4 of the 6 QwenLM#4002 review threads but left two prior occurrences of the misleading "fileReadCacheDisabled: true is the escape hatch for users who want stricter behaviour" wording untouched. The flag actually goes the OPPOSITE way (skips checkPriorRead entirely so application-level locking can take over), so describing it as a "stricter" escape hatch is exactly the guidance the c6e2bde review thread asked us to stop giving. Files updated: - fileReadCache.ts:lastReadWasFull docstring — replaces the "stricter behaviour" sentence with the same opt-out / opt-in distinction c6e2bde used in priorReadEnforcement.ts. - write-file.ts anchor comment for all 5 checkPriorRead call sites — replaces the "fileReadCacheDisabled: true is the escape hatch" sentence with an explicit note on the opt-out direction matching the docstring. Plus a coverage-split comment on the issue QwenLM#3945 deadlock-free regression test in write-file.test.ts (review thread QwenLM#6 from glm-5.1: pointed out the test seeds the cache directly rather than driving a full ReadFile→WriteFile pipeline). A real integration test would need ReadFile-side mockConfig plumbing (`getFileService`, `getTruncateToolOutputLines`, etc.) ported into write-file.test.ts; the comment captures the link to read-file.test.ts's matching cache-population assertion so a future cache-entry schema change has to update both halves to keep the end-to-end guarantee. Tests: 295 / 296 (1 pre-existing skip) on the affected files; tsc --noEmit clean.

wenshao · 2026-05-10T01:33:25Z

Picked up the second review wave — all 6 suggestions on b591a51af / 853910bca are now addressed:

fileReadCacheDisabled: true mis-described as "stricter behaviour" — fixed in c6e2bde10 (priorReadEnforcement.ts) and 662979be6 (the two remaining mentions in fileReadCache.ts and write-file.ts that the first sweep missed).
read-file.ts lastReadWasFull stale comment — fixed in c6e2bde10.
Bare-name build files (Dockerfile, Makefile, Jenkinsfile, go.mod, lockfiles, common dotfiles, LICENSE / CHANGELOG / README) — KNOWN_TEXT_BASENAMES allowlist added in c6e2bde10 with regression test.
"never read" error wording for WriteFile no longer steers toward partial-read-then-overwrite — verb-specific guidance in cc3027800.
Same as 3 — covered by the basename allowlist.
End-to-end ReadFile→WriteFile integration — left as a follow-up; coverage-split comment added in 662979be6 linking the cache-population assertion in read-file.test.ts to the cache-consumption assertion in write-file.test.ts so any future cache schema change has to update both halves.

All 9 review threads on the PR now resolved. Tests: 7394 / 7399 (5 pre-existing skips) on the full core suite. tsc clean.

wenshao

⚠️ 审核降级：self-PR; CI 失败: Agent。

审查范围：8 个文件（priorReadEnforcement.ts、read-file.ts、write-file.ts、fileUtils.ts 及其测试）。核心修复正确：cacheable/截断解耦（#3964）、mime/扩展名文本检测（#3964 加密 FS）、移除 requireFullRead（#3945 死锁）。测试 188/188 通过，tsc/eslint 无错误。

发现的建议：
1 个新建议；其余与现有的开放内联评论重叠或已在该 PR 的早期自我审查中处理。

— DeepSeek/deepseek-v4-pro 通过 Qwen Code /review

wenshao

No review findings. Downgraded from Approve to Comment: self-PR. — DeepSeek/deepseek-v4-pro via Qwen Code /review

QwenLM#4002 review (DeepSeek): the new text-classification branches returned `'text'` without logging which path fired, leaving future QwenLM#3964-class troubleshooting unable to tell mime-trust from extension-override from basename-override from the content-sample fallback without re-deriving by code reading. Add `debugLogger.debug` calls on the three new fast-path branches: mime trust (`isTextMime` match), extension override (`KNOWN_TEXT_EXTENSIONS`), and basename override (`KNOWN_TEXT_BASENAMES`). Each log includes the path, the chosen classification, and the looked-up mime when relevant — enough to disambiguate the four classification paths from a single line. Off by default (`debug` level on the `FILE_UTILS` logger). Older branches (image / audio / video / pdf / hardcoded TS / SVG / ipynb / BINARY_EXTENSIONS / isBinaryFile / default text) keep their existing silent behaviour: they predate the issue this is paving for and adding logs there would be scope creep. Tests: 89 / 89 fileUtils.test.ts pass; tsc --noEmit clean.

wenshao · 2026-05-10T04:09:13Z

Addressed in 9ea39ea6d — added debugLogger.debug to all three new text fast-paths (mime-trust, extension-override, basename-override). Older branches kept silent to avoid scope creep.

tanzhenxin

Review

This PR addresses both populations of false-positive Edit/WriteFile rejections from #3964 plus the truncated-full-read deadlock from #3945. Part 1 decouples cacheable from truncation so partial / truncated text reads stop tripping the binary-payload rejection. Part 2 reorders detectFileType to trust the mime registry and a curated extension/basename allowlist before isBinaryFile's 4 KB content sample, which is the encrypted-FS scenario from the issue. Part 3 drops requireFullRead from WriteFile's prior-read enforcement, eliminating the deadlock that arises when a file larger than the truncate-tool-output limit can never produce a lastReadWasFull: true read.

The implementation is internally consistent across cache, fast-path, and enforcement; the contract change to recordRead's full semantics is captured in updated docstrings and anti-regression comments cite #3932 / #3945 / #2499 by number to lock the new stance for future contributors. The classification ordering preserves binary detection for .png/etc. and the mime allowlist is restricted to the 6 application/* values mime/lite actually emits.

Verdict

APPROVE — well-scoped, addresses both arms of #3964 and the #3945 deadlock, no critical findings.

wenshao

No review findings. Downgraded from Approve to Comment: self-PR. — DeepSeek/deepseek-v4-pro via Qwen Code /review

wenshao added 2 commits May 10, 2026 02:41

wenshao changed the title ~~fix(core): decouple cacheable flag from truncation in FileReadCache~~ fix(core): both arms of issue #3964 — cacheable flag + encrypted-FS misclassification May 9, 2026

wenshao commented May 9, 2026

View reviewed changes

wenshao requested a review from Copilot May 9, 2026 22:11

Copilot started reviewing on behalf of wenshao May 9, 2026 22:12 View session

wenshao mentioned this pull request May 9, 2026

File type detection misidentifies encrypted .c/.cpp/.h files as binary payloads #3964

Closed

wenshao commented May 9, 2026

View reviewed changes

Comment thread packages/core/src/utils/fileUtils.ts

Comment thread packages/core/src/utils/fileUtils.ts

Comment thread packages/core/src/utils/fileUtils.ts

wenshao mentioned this pull request May 9, 2026

edit tool unusable for large files — "fully read" precondition impossible when read_file truncates #3945

Closed

Copilot AI reviewed May 9, 2026

wenshao requested a review from Copilot May 9, 2026 22:48

Copilot started reviewing on behalf of wenshao May 9, 2026 22:49 View session

Copilot AI reviewed May 9, 2026

wenshao changed the title ~~fix(core): both arms of issue #3964 — cacheable flag + encrypted-FS misclassification~~ fix(core): unify Edit/WriteFile prior-read with Claude Code; close #3964 + #3945 May 9, 2026

wenshao requested a review from Copilot May 10, 2026 00:16

Copilot started reviewing on behalf of wenshao May 10, 2026 00:17 View session

Copilot AI reviewed May 10, 2026