Fix tool response role serialization (issue #62) by BingqingLyu · Pull Request #6 · BingqingLyu/qwen-code

BingqingLyu · 2026-04-27T06:43:38Z

Summary

Fixes the "tool result labeled as user" bug reported in issue #62 where tool responses were incorrectly serialized with role: "user" instead of role: "tool", causing OpenAI API 400 errors.

Root Cause

The bug was in the fallback role assignment logic in convertToOpenAIFormat() that defaulted all non-model content to role: "user" instead of properly handling different content types.

Changes

Fixed fallback role assignment logic in openaiContentGenerator.ts (lines 883-898)
Fixed similar logic in logging conversion (lines 1531-1533)
Added comprehensive regression tests in toolRoleSerialization.test.ts
Only genuine user content gets role: "user", other content gets appropriate roles
Tool responses with functionResponse parts correctly get role: "tool"

Test Coverage

✅ Function responses emit role: "tool" with correct tool_call_id
✅ No tool content gets assigned role: "user"
✅ Genuine user messages preserve role: "user"
✅ Mixed conversations with tools handled correctly
✅ Fallback roles don't default to user for non-user content

Validation

All existing tests pass
New regression tests pass
Manual testing shows tool responses now correctly formatted

Closes #62

- Add debug instrumentation to openaiContentGenerator.ts to dump message payloads - Add CLI debug output to understand message flow and conversion - Add reproduce-tool-bug.js script to systematically reproduce the issue - Add test.txt file for testing tool calls - Include debug payload dumps showing the bug where tool responses get role: 'user' instead of role: 'tool' in Gemini->OpenAI conversion Related to issue #62: API Error when calling tools - tool_calls not followed by proper tool response messages due to incorrect role assignment.

Resolves issue #62 where tool responses were incorrectly serialized with role: "user" instead of role: "tool", causing OpenAI API 400 errors. Changes: - Fixed fallback role assignment logic in convertToOpenAIFormat (lines 883-898) - Fixed similar logic in logging conversion (lines 1531-1533) - Only genuine user content gets role: "user", other content gets appropriate roles - Added comprehensive regression tests to prevent recurrence The bug was in the else-fallback that defaulted non-model content to role: "user" instead of properly handling different content types. Tool responses with functionResponse parts were already correctly handled, but the fallback logic was incorrectly overriding this for edge cases. Fixes #62

…e 'user' The real root cause was in useGeminiStream.ts where cancelled tool responses were being added to conversation history with role: 'user' instead of role: 'function'. This caused the LLM to interpret tool results as user input, leading to confusion about the conversation flow. Fixes the remaining part of issue #62 where models would respond as if the user had sent the tool output directly.

Found another instance where tool response parts were being assigned role: 'user' instead of role: 'function' in the non-interactive CLI mode. This ensures tool responses have consistent, correct role assignment across all code paths. Related to issue #62

BingqingLyu · 2026-05-07T07:15:38Z

Conflict Group 1

This PR shares modified functions with 13 other PR(s): #106, #114, #117, #14, #17, #31, #55, #71, #75, #88, #9, #94, #96.

These PRs should be reviewed as a batch — merging one may affect the others.

Function	File	Also modified by
`runNonInteractive`	`nonInteractiveCli.ts`	#114, #14, #17, #71, #75
`showCitations`	`useGeminiStream.ts`	#106, #55, #9, #96
`startInteractiveUI`	`gemini.tsx`	#114, #117, #14, #17, #31, #88, #94

graph LR
    PR6["PR #6"]
    FrunNonInteractive_8467["runNonInteractive<br>nonInteractiveCli.ts"]
    PR6 -->|modifies| FrunNonInteractive_8467
    PR114["PR #114"]
    PR114 -->|modifies| FrunNonInteractive_8467
    PR14["PR #14"]
    PR14 -->|modifies| FrunNonInteractive_8467
    PR17["PR #17"]
    PR17 -->|modifies| FrunNonInteractive_8467
    PR71["PR #71"]
    PR71 -->|modifies| FrunNonInteractive_8467
    PR75["PR #75"]
    PR75 -->|modifies| FrunNonInteractive_8467
    FshowCitations_6790["showCitations<br>useGeminiStream.ts"]
    PR6 -->|modifies| FshowCitations_6790
    PR106["PR #106"]
    PR106 -->|modifies| FshowCitations_6790
    PR55["PR #55"]
    PR55 -->|modifies| FshowCitations_6790
    PR9["PR #9"]
    PR9 -->|modifies| FshowCitations_6790
    PR96["PR #96"]
    PR96 -->|modifies| FshowCitations_6790
    FstartInteractiveUI_4875["startInteractiveUI<br>gemini.tsx"]
    PR6 -->|modifies| FstartInteractiveUI_4875
    PR114 -->|modifies| FstartInteractiveUI_4875
    PR117["PR #117"]
    PR117 -->|modifies| FstartInteractiveUI_4875
    PR14 -->|modifies| FstartInteractiveUI_4875
    PR17 -->|modifies| FstartInteractiveUI_4875
    PR31["PR #31"]
    PR31 -->|modifies| FstartInteractiveUI_4875
    PR88["PR #88"]
    PR88 -->|modifies| FstartInteractiveUI_4875
    PR94["PR #94"]
    PR94 -->|modifies| FstartInteractiveUI_4875

Posted by codegraph-ai conflict detection.

…enLM#3964 + QwenLM#3945 (QwenLM#4002) * fix(core): decouple cacheable flag from truncation in FileReadCache PR QwenLM#3774 introduced prior-read enforcement that consults `lastReadCacheable` to discriminate text from binary / image / PDF / notebook payloads. ReadFileToolInvocation derived `cacheable` as `string && originalLineCount && !isTruncated`, conflating two unrelated concerns: "is the content text" and "did we see all the bytes". A partial read (offset/limit) or a truncated full read of a regular `.kt` / `.cpp` / `.py` source file therefore set `cacheable: false`, and priorReadEnforcement.ts mistook that for a non-text payload and rejected the next Edit with the misleading "binary / image / audio / video / PDF / notebook payload" error. PR QwenLM#3932 split prior-read enforcement so Edit accepts partial reads (`lastReadWasFull`-relaxed for Edit, kept for WriteFile), but the `lastReadCacheable` conflation persisted, so partial / truncated text reads still hit the binary-payload rejection on Edit. Issue QwenLM#3964 is the resulting field reports: .kt / .cpp / .py / .ts files on both Linux and Windows misclassified as binary across 0.15.7-0.15.9. Decouple the two concerns: - `cacheable` is now purely about content type. A partial or truncated text read records `cacheable: true` because the bytes the model saw were text. - Truncation gating moves to `full`. A request-level full read (no offset/limit/pages) only counts as full at the cache level when the produced content was not truncated; otherwise the model only saw the head of the file. The fast-path `file_unchanged` placeholder still requires both `lastReadWasFull && lastReadCacheable`, so its semantics are unchanged — a truncated full read now fails the AND on the moved flag instead of the original. WriteFile's `requireFullRead` still rejects partial or truncated text reads; it now reports the accurate "partial read" error instead of the wrong "binary payload" message. Also fixes issue QwenLM#3945 (edit tool unusable for large files) as a side effect: the truncated-full case there hit the same misclassified path before the rejection wording could even surface the truncation question. Tests: 2 regression tests added in read-file.test.ts (partial .kt read and truncated full .cpp read both record `lastReadCacheable: true`). Existing 7386 / 7391 (5 skipped) core tests pass; tsc --noEmit clean. Issue QwenLM#3964 also reports a separate scenario on Windows encrypted/DRM-protected file systems where .cpp source files are misclassified by `isBinaryFile`'s 4KB content sampling. That path is content-detection-side, not cache-side, and is left to a follow-up (extension- or mime-based override of the content sample for known text types). * fix(core): trust extension/mime over isBinaryFile sampling for known text Issue QwenLM#3964's first report (Frank-Shaw-FS) describes `.cpp` / `.c` / `.h` source files on a Windows encrypted / DRM-protected file system being misclassified as binary. The OS surfaces encrypted bytes to `fs.open()` random-access reads, so `isBinaryFile`'s 4 KB sample sees nulls / non-printable characters and concludes binary — even though the higher-level `readFile` returns the decrypted text and the extension declares the file as text. Layer-2 fix on top of the cache-side decoupling: change `detectFileType` to trust the registry / curated extension list *before* running the content sample, so a known text extension is not subject to false positives from raw-bytes sampling. - Trust mime types declared as text: `text/*`, `application/*` text-likes (`application/javascript`, `application/json`, `application/toml`, ...), and any mime ending in `+xml` / `+json`. - Trust a curated set of source-code / config / markup extensions whose `mime/lite` registry coverage is patchy (`.py`, `.kt`, `.go`, `.rb`, `.swift`, `.scala`, `.rs`, `.proto`, `.graphql`, `.toml`, `.hcl`, `.tf`, ...). The list is restricted to extensions we have observed to be misclassified by `isBinaryFile` in the field; obscure extensions still go through the content sampler. Order in `detectFileType`: 1. Hardcoded `.ts` / `.svg` / `.ipynb` 2. Mime check (image / audio / video / pdf / declared-text) 3. `BINARY_EXTENSIONS` pre-empt (so `.png` with text-looking content stays binary) 4. Curated text extension override (for mime-less source code) 5. `isBinaryFile` content sampler (final fallback for unrecognised extensions) 6. Default text Tests: 5 new cases in `fileUtils.test.ts` and 1 end-to-end in `read-file.test.ts` covering: text mime override on binary-looking content, application/* text mimes, `+xml` / `+json` suffix match, curated extension override on `.py` / `.kt` / `.go` / `.rb` / `.swift`, and the `BINARY_EXTENSIONS` pre-empt still winning over the new override (a `.png` whose first bytes happen to be ASCII text stays binary). Full core suite passes (7392 / 7397, 5 pre- existing skips); `tsc --noEmit` clean. Together with the earlier commit, this PR closes both arms of issue QwenLM#3964: the cache-side `cacheable` conflation that affected partial / truncated reads, and the content-detection-side false positive on encrypted file systems. * fix(core): tighten detectFileType after self-review on QwenLM#4002 Three follow-ups flagged by `/review` on QwenLM#4002: 1. `KNOWN_TEXT_APPLICATION_MIMES` had 10 dead entries — names like `application/x-sh`, `application/x-perl`, `application/x-yaml`, `application/x-tex`, `application/x-sql`, `application/graphql` are real mimes seen in HTTP `Content-Type` contexts but are not in `mime/lite`'s registry, so `mime.getType()` never returns them and the entries are unreachable. Strip the set to the 6 values the registry actually emits (`javascript`, `ecmascript`, `node`, `json`, `xml`, `toml`); the shells / tex / sql / graphql extensions reach the text fallback through `KNOWN_TEXT_EXTENSIONS` instead. Add a scope rule in the docstring so future additions stay aligned with what mime/lite actually emits. 2. The early-return at the top of `detectFileType` listed `.ts / .mts / .cts / .tsx` in its comment but the array only contained `.ts / .mts / .cts`. `.tsx` was reaching the text verdict via `KNOWN_TEXT_EXTENSIONS`, which works today but would break if a future `mime/lite` update mapped `.tsx` to `video/mp2t` (mirroring `.ts`): the `startsWith('video/')` guard would fire before the text fallback. Move `.tsx` up to the early-return array so the comment matches the code and the defence is consistent across the TypeScript family. Drop the duplicate listing in `KNOWN_TEXT_EXTENSIONS`. 3. `isTextMime()` short-circuits `isBinaryFile` for any `text/*` mime, which is the necessary tradeoff for the encrypted-FS fix but removes the safety net for *corrupted* text files (a binary blob saved with a `.txt` / `.md` extension via redirection). Document the tradeoff explicitly with a concrete counter-example and call out that Edit's `0 occurrences` failure mode is the fallback for the corrupted-text population. Tests: 261 / 262 (1 skipped, pre-existing) on `fileUtils.test.ts` + `read-file.test.ts` + `edit.test.ts` + `write-file.test.ts`. `tsc --noEmit` clean. * fix(core): drop full-read requirement on WriteFile, align with Claude Code PR QwenLM#3932 deliberately diverged from Claude Code's `readFileState` by keeping `requireFullRead: true` on WriteFile's prior-read enforcement, citing issue QwenLM#2499 (LLM hallucinates content of an unread file and clobbers user changes) as evidence that the asymmetric stance was justified. In practice that stance leaves a hard deadlock: when a file is larger than `truncate-tool-output- lines`, `read_file` without offset/limit still records `lastReadWasFull: false` (the model only saw the head), and the "only been partially read … re-read without offset / limit / pages" rejection sends the model back to the same truncated read with no escape — the exact deadlock issue QwenLM#3945 reported. Drop the `requireFullRead` option from `checkPriorRead` and remove all 5 `requireFullRead: true` call sites in WriteFileTool. After this change the contract is identical to Claude Code's: any prior read of an existing file clears enforcement; the mtime/size drift check is the only gate that distinguishes "the model has seen current bytes" from "the model has seen older bytes", and it fires identically for Edit and WriteFile. The residual QwenLM#2499 risk is acknowledged in the docstring: a model that reads only a slice and then overwrites would necessarily hallucinate the rest of the bytes. Mitigations: - `fileReadCacheDisabled: true` for users who want stricter behaviour (existing escape hatch, unchanged). - The mtime/size drift check still rejects Writes against bytes the model saw at fingerprint X if disk has moved to Y. Cleanup: drops the dedicated "fresh + cacheable + partial + requireFullRead" rejection branch and the `requireFullRead`-aware wording variant in the `unknown` branch — both unreachable now. Tests: - `write-file.test.ts:932` inverted from "rejects a write when the previous read was ranged" to "allows a write after a ranged read", matching the equivalent `edit.test.ts:1077`. - New `write-file.test.ts:961` regression for the issue QwenLM#3945 deadlock: a `recordRead({ full: false, cacheable: true })` entry (what a truncated full read produces) clears WriteFile enforcement. - 7393 / 7398 (5 skipped, all pre-existing) on the full core suite. `tsc --noEmit` clean. * docs(core): add anti-regression notes locking in the WriteFile relax Three sites a future contributor might naturally try to "tighten up" back into the deadlock-prone shape, now carrying explicit guard comments that name the prior PR (QwenLM#3932), the issue it broke (QwenLM#3945), and the residual risk this stance accepts (QwenLM#2499): - `priorReadEnforcement.ts:CheckPriorReadOptions` — interface-level note: do not re-introduce `requireFullRead` (or any "stricter for WriteFile than Edit") option here. References the function docstring for the full rationale. - `fileReadCache.ts:lastReadWasFull` — field-level note: sole consumer is the Read fast-path; `priorReadEnforcement` does not consult this and must not start. - `write-file.ts` first checkPriorRead call site — anchor comment that explains why no extra option is passed and applies to all 5 call sites in the file. No code changes; test suite unchanged at 7393 / 7398 (5 pre-existing skips); `tsc --noEmit` clean. * fix(core): QwenLM#4002 review wave — basename allowlist + correct stale comments 3 QwenLM#4002 review threads addressed: - fileUtils.ts: added KNOWN_TEXT_BASENAMES allowlist for extensionless build / config / lockfiles (Dockerfile, Containerfile, Makefile, GNUmakefile, Jenkinsfile, Vagrantfile, Rakefile, Gemfile, Procfile, BUILD, WORKSPACE, CMakeLists.txt, go.mod, go.sum, go.work, Cargo.lock, Pipfile, Pipfile.lock, poetry.lock, package-lock.json, yarn.lock, pnpm-lock.yaml, requirements.txt, .gitignore, .gitattributes, .dockerignore, .npmignore, .editorconfig, .env, .bashrc, .zshrc, .profile, LICENSE, COPYING, AUTHORS, CHANGELOG, README, NOTICE). `path.extname('Dockerfile')` returns `''`, so the KNOWN_TEXT_EXTENSIONS check above misses these — an encrypted-volume read whose 4 KB sample looks binary would misclassify them as binary. Regression test pinned with fake-encrypted bytes for Dockerfile / Makefile / Jenkinsfile / go.mod / package-lock.json / .gitignore / LICENSE. - priorReadEnforcement.ts: rewrote two misleading comments that pointed users to `fileReadCacheDisabled: true` for "stricter behaviour". That setting actually DISABLES enforcement entirely (skips checkPriorRead). Updated to make the opt-out semantics explicit and clarify that there is no built-in stricter mode — users who want stricter built-in enforcement than the residual QwenLM#2499 risk accepts have no flag here today and should file a feature request. - read-file.ts: updated the `lastReadWasFull` comment to reflect that PR QwenLM#4002 removed WriteFile's `requireFullRead`. The flag now gates ONLY the `file_unchanged` fast-path; the stale "and WriteFile's full-read requirement" wording would have confused future readers into thinking WriteFile still consults `lastReadWasFull`. Tests: 89/89 fileUtils.test.ts pass; tsc + ESLint clean. * fix(core): split priorReadEnforcement guidance — partial OK for edit, full for overwrite QwenLM#4002 review: shared "never read" error said `(a partial read with offset / limit is fine — you only need to have seen the bytes you intend to edit/overwrite)` for BOTH Edit and WriteFile. For Edit this is correct — the model only needs to have seen the `old_string`-bearing bytes; the rest passes through untouched. For WriteFile this is misleading: overwriting replaces EVERY byte, so a partial read leaves any unseen bytes as collateral damage. The mtime/size drift check still catches the worst-case QwenLM#2499 hallucinated-bytes risk, but recommending a partial read in the WriteFile guidance would actively encourage the footgun. Fix: branch the partial-read guidance on `verb`. Edit keeps the current "partial OK" text. WriteFile gets `(read the full file — overwriting replaces every byte, so any unseen bytes would be discarded)`. 120/120 edit + write-file tests pass; tsc + ESLint clean. * docs(core): finish QwenLM#4002 review wave — drop two stale "fileReadCacheDisabled is escape hatch" mentions cc30278 + c6e2bde addressed 4 of the 6 QwenLM#4002 review threads but left two prior occurrences of the misleading "fileReadCacheDisabled: true is the escape hatch for users who want stricter behaviour" wording untouched. The flag actually goes the OPPOSITE way (skips checkPriorRead entirely so application-level locking can take over), so describing it as a "stricter" escape hatch is exactly the guidance the c6e2bde review thread asked us to stop giving. Files updated: - fileReadCache.ts:lastReadWasFull docstring — replaces the "stricter behaviour" sentence with the same opt-out / opt-in distinction c6e2bde used in priorReadEnforcement.ts. - write-file.ts anchor comment for all 5 checkPriorRead call sites — replaces the "fileReadCacheDisabled: true is the escape hatch" sentence with an explicit note on the opt-out direction matching the docstring. Plus a coverage-split comment on the issue QwenLM#3945 deadlock-free regression test in write-file.test.ts (review thread #6 from glm-5.1: pointed out the test seeds the cache directly rather than driving a full ReadFile→WriteFile pipeline). A real integration test would need ReadFile-side mockConfig plumbing (`getFileService`, `getTruncateToolOutputLines`, etc.) ported into write-file.test.ts; the comment captures the link to read-file.test.ts's matching cache-population assertion so a future cache-entry schema change has to update both halves to keep the end-to-end guarantee. Tests: 295 / 296 (1 pre-existing skip) on the affected files; tsc --noEmit clean. * chore(core): add debug logs to detectFileType text fast-paths QwenLM#4002 review (DeepSeek): the new text-classification branches returned `'text'` without logging which path fired, leaving future QwenLM#3964-class troubleshooting unable to tell mime-trust from extension-override from basename-override from the content-sample fallback without re-deriving by code reading. Add `debugLogger.debug` calls on the three new fast-path branches: mime trust (`isTextMime` match), extension override (`KNOWN_TEXT_EXTENSIONS`), and basename override (`KNOWN_TEXT_BASENAMES`). Each log includes the path, the chosen classification, and the looked-up mime when relevant — enough to disambiguate the four classification paths from a single line. Off by default (`debug` level on the `FILE_UTILS` logger). Older branches (image / audio / video / pdf / hardcoded TS / SVG / ipynb / BINARY_EXTENSIONS / isBinaryFile / default text) keep their existing silent behaviour: they predate the issue this is paving for and adding logs there would be scope creep. Tests: 89 / 89 fileUtils.test.ts pass; tsc --noEmit clean.

chazmaniandinkle added 5 commits July 24, 2025 13:13

Fix import path in test file for build compatibility

bacf6c0

BingqingLyu added conflicting-group-1 labels Apr 28, 2026

BingqingLyu added conflicting-group-1 and removed conflicting-group-1 labels Apr 29, 2026

BingqingLyu added conflicting-group-1 conflicting-group-1 Conflicting PR group 1 — review as a batch conflicting-pr Shares at least one cross-PR dependency with other PRs and removed conflicting-group-1 labels May 7, 2026

This was referenced May 7, 2026

fix(core): preserve reasoning_content when merging consecutive assistant messages (#3619) #126

Open

fix(core): handle shell line continuations in command splitting #115

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix tool response role serialization (issue #62)#6

Fix tool response role serialization (issue #62)#6
BingqingLyu wants to merge 5 commits into
mainfrom
fork-pr-112-fix-tool-role-serialization

BingqingLyu commented Apr 27, 2026 •

edited

Loading

Uh oh!

BingqingLyu commented May 7, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

BingqingLyu commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Changes

Test Coverage

Validation

Uh oh!

BingqingLyu commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflict Group 1

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

BingqingLyu commented Apr 27, 2026 •

edited

Loading

BingqingLyu commented May 7, 2026 •

edited

Loading