Skip to content

Fix tool response role serialization (issue #62)#6

Open
BingqingLyu wants to merge 5 commits into
mainfrom
fork-pr-112-fix-tool-role-serialization
Open

Fix tool response role serialization (issue #62)#6
BingqingLyu wants to merge 5 commits into
mainfrom
fork-pr-112-fix-tool-role-serialization

Conversation

@BingqingLyu

@BingqingLyu BingqingLyu commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes the "tool result labeled as user" bug reported in issue #62 where tool responses were incorrectly serialized with role: "user" instead of role: "tool", causing OpenAI API 400 errors.

Root Cause

The bug was in the fallback role assignment logic in convertToOpenAIFormat() that defaulted all non-model content to role: "user" instead of properly handling different content types.

Changes

  • Fixed fallback role assignment logic in openaiContentGenerator.ts (lines 883-898)
  • Fixed similar logic in logging conversion (lines 1531-1533)
  • Added comprehensive regression tests in toolRoleSerialization.test.ts
  • Only genuine user content gets role: "user", other content gets appropriate roles
  • Tool responses with functionResponse parts correctly get role: "tool"

Test Coverage

  • ✅ Function responses emit role: "tool" with correct tool_call_id
  • ✅ No tool content gets assigned role: "user"
  • ✅ Genuine user messages preserve role: "user"
  • ✅ Mixed conversations with tools handled correctly
  • ✅ Fallback roles don't default to user for non-user content

Validation

  • All existing tests pass
  • New regression tests pass
  • Manual testing shows tool responses now correctly formatted

Closes #62

- Add debug instrumentation to openaiContentGenerator.ts to dump message payloads
- Add CLI debug output to understand message flow and conversion
- Add reproduce-tool-bug.js script to systematically reproduce the issue
- Add test.txt file for testing tool calls
- Include debug payload dumps showing the bug where tool responses
  get role: 'user' instead of role: 'tool' in Gemini->OpenAI conversion

Related to issue #62: API Error when calling tools - tool_calls not
followed by proper tool response messages due to incorrect role assignment.
Resolves issue #62 where tool responses were incorrectly serialized with
role: "user" instead of role: "tool", causing OpenAI API 400 errors.

Changes:
- Fixed fallback role assignment logic in convertToOpenAIFormat (lines 883-898)
- Fixed similar logic in logging conversion (lines 1531-1533)
- Only genuine user content gets role: "user", other content gets appropriate roles
- Added comprehensive regression tests to prevent recurrence

The bug was in the else-fallback that defaulted non-model content to role: "user"
instead of properly handling different content types. Tool responses with
functionResponse parts were already correctly handled, but the fallback logic
was incorrectly overriding this for edge cases.

Fixes #62
…e 'user'

The real root cause was in useGeminiStream.ts where cancelled tool responses
were being added to conversation history with role: 'user' instead of
role: 'function'. This caused the LLM to interpret tool results as user
input, leading to confusion about the conversation flow.

Fixes the remaining part of issue #62 where models would respond as if
the user had sent the tool output directly.
Found another instance where tool response parts were being assigned
role: 'user' instead of role: 'function' in the non-interactive CLI mode.
This ensures tool responses have consistent, correct role assignment
across all code paths.

Related to issue #62
@BingqingLyu

BingqingLyu commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Conflict Group 1

This PR shares modified functions with 13 other PR(s): #106, #114, #117, #14, #17, #31, #55, #71, #75, #88, #9, #94, #96.

These PRs should be reviewed as a batch — merging one may affect the others.

Function File Also modified by
runNonInteractive nonInteractiveCli.ts #114, #14, #17, #71, #75
showCitations useGeminiStream.ts #106, #55, #9, #96
startInteractiveUI gemini.tsx #114, #117, #14, #17, #31, #88, #94
graph LR
    PR6["PR #6"]
    FrunNonInteractive_8467["runNonInteractive<br>nonInteractiveCli.ts"]
    PR6 -->|modifies| FrunNonInteractive_8467
    PR114["PR #114"]
    PR114 -->|modifies| FrunNonInteractive_8467
    PR14["PR #14"]
    PR14 -->|modifies| FrunNonInteractive_8467
    PR17["PR #17"]
    PR17 -->|modifies| FrunNonInteractive_8467
    PR71["PR #71"]
    PR71 -->|modifies| FrunNonInteractive_8467
    PR75["PR #75"]
    PR75 -->|modifies| FrunNonInteractive_8467
    FshowCitations_6790["showCitations<br>useGeminiStream.ts"]
    PR6 -->|modifies| FshowCitations_6790
    PR106["PR #106"]
    PR106 -->|modifies| FshowCitations_6790
    PR55["PR #55"]
    PR55 -->|modifies| FshowCitations_6790
    PR9["PR #9"]
    PR9 -->|modifies| FshowCitations_6790
    PR96["PR #96"]
    PR96 -->|modifies| FshowCitations_6790
    FstartInteractiveUI_4875["startInteractiveUI<br>gemini.tsx"]
    PR6 -->|modifies| FstartInteractiveUI_4875
    PR114 -->|modifies| FstartInteractiveUI_4875
    PR117["PR #117"]
    PR117 -->|modifies| FstartInteractiveUI_4875
    PR14 -->|modifies| FstartInteractiveUI_4875
    PR17 -->|modifies| FstartInteractiveUI_4875
    PR31["PR #31"]
    PR31 -->|modifies| FstartInteractiveUI_4875
    PR88["PR #88"]
    PR88 -->|modifies| FstartInteractiveUI_4875
    PR94["PR #94"]
    PR94 -->|modifies| FstartInteractiveUI_4875
Loading

Posted by codegraph-ai conflict detection.

BingqingLyu pushed a commit that referenced this pull request May 11, 2026
…enLM#3964 + QwenLM#3945 (QwenLM#4002)

* fix(core): decouple cacheable flag from truncation in FileReadCache

PR QwenLM#3774 introduced prior-read enforcement that consults
`lastReadCacheable` to discriminate text from binary / image / PDF /
notebook payloads. ReadFileToolInvocation derived `cacheable` as
`string && originalLineCount && !isTruncated`, conflating two
unrelated concerns: "is the content text" and "did we see all the
bytes". A partial read (offset/limit) or a truncated full read of a
regular `.kt` / `.cpp` / `.py` source file therefore set
`cacheable: false`, and priorReadEnforcement.ts mistook that for a
non-text payload and rejected the next Edit with the misleading
"binary / image / audio / video / PDF / notebook payload" error.

PR QwenLM#3932 split prior-read enforcement so Edit accepts partial reads
(`lastReadWasFull`-relaxed for Edit, kept for WriteFile), but the
`lastReadCacheable` conflation persisted, so partial / truncated text
reads still hit the binary-payload rejection on Edit. Issue QwenLM#3964 is
the resulting field reports: .kt / .cpp / .py / .ts files on both
Linux and Windows misclassified as binary across 0.15.7-0.15.9.

Decouple the two concerns:
  - `cacheable` is now purely about content type. A partial or
    truncated text read records `cacheable: true` because the bytes
    the model saw were text.
  - Truncation gating moves to `full`. A request-level full read
    (no offset/limit/pages) only counts as full at the cache level
    when the produced content was not truncated; otherwise the model
    only saw the head of the file.

The fast-path `file_unchanged` placeholder still requires both
`lastReadWasFull && lastReadCacheable`, so its semantics are unchanged
— a truncated full read now fails the AND on the moved flag instead
of the original. WriteFile's `requireFullRead` still rejects partial
or truncated text reads; it now reports the accurate "partial read"
error instead of the wrong "binary payload" message.

Also fixes issue QwenLM#3945 (edit tool unusable for large files) as a
side effect: the truncated-full case there hit the same misclassified
path before the rejection wording could even surface the truncation
question.

Tests: 2 regression tests added in read-file.test.ts (partial .kt
read and truncated full .cpp read both record `lastReadCacheable:
true`). Existing 7386 / 7391 (5 skipped) core tests pass; tsc
--noEmit clean.

Issue QwenLM#3964 also reports a separate scenario on Windows
encrypted/DRM-protected file systems where .cpp source files are
misclassified by `isBinaryFile`'s 4KB content sampling. That path is
content-detection-side, not cache-side, and is left to a follow-up
(extension- or mime-based override of the content sample for known
text types).

* fix(core): trust extension/mime over isBinaryFile sampling for known text

Issue QwenLM#3964's first report (Frank-Shaw-FS) describes `.cpp` / `.c` /
`.h` source files on a Windows encrypted / DRM-protected file system
being misclassified as binary. The OS surfaces encrypted bytes to
`fs.open()` random-access reads, so `isBinaryFile`'s 4 KB sample
sees nulls / non-printable characters and concludes binary — even
though the higher-level `readFile` returns the decrypted text and
the extension declares the file as text.

Layer-2 fix on top of the cache-side decoupling: change
`detectFileType` to trust the registry / curated extension list
*before* running the content sample, so a known text extension is
not subject to false positives from raw-bytes sampling.

  - Trust mime types declared as text: `text/*`, `application/*`
    text-likes (`application/javascript`, `application/json`,
    `application/toml`, ...), and any mime ending in `+xml` / `+json`.
  - Trust a curated set of source-code / config / markup
    extensions whose `mime/lite` registry coverage is patchy (`.py`,
    `.kt`, `.go`, `.rb`, `.swift`, `.scala`, `.rs`, `.proto`,
    `.graphql`, `.toml`, `.hcl`, `.tf`, ...). The list is restricted
    to extensions we have observed to be misclassified by
    `isBinaryFile` in the field; obscure extensions still go through
    the content sampler.

Order in `detectFileType`:
  1. Hardcoded `.ts` / `.svg` / `.ipynb`
  2. Mime check (image / audio / video / pdf / declared-text)
  3. `BINARY_EXTENSIONS` pre-empt (so `.png` with text-looking
     content stays binary)
  4. Curated text extension override (for mime-less source code)
  5. `isBinaryFile` content sampler (final fallback for
     unrecognised extensions)
  6. Default text

Tests: 5 new cases in `fileUtils.test.ts` and 1 end-to-end in
`read-file.test.ts` covering: text mime override on binary-looking
content, application/* text mimes, `+xml` / `+json` suffix match,
curated extension override on `.py` / `.kt` / `.go` / `.rb` /
`.swift`, and the `BINARY_EXTENSIONS` pre-empt still winning over
the new override (a `.png` whose first bytes happen to be ASCII
text stays binary). Full core suite passes (7392 / 7397, 5 pre-
existing skips); `tsc --noEmit` clean.

Together with the earlier commit, this PR closes both arms of issue
QwenLM#3964: the cache-side `cacheable` conflation that affected partial /
truncated reads, and the content-detection-side false positive on
encrypted file systems.

* fix(core): tighten detectFileType after self-review on QwenLM#4002

Three follow-ups flagged by `/review` on QwenLM#4002:

1. `KNOWN_TEXT_APPLICATION_MIMES` had 10 dead entries — names like
   `application/x-sh`, `application/x-perl`, `application/x-yaml`,
   `application/x-tex`, `application/x-sql`, `application/graphql`
   are real mimes seen in HTTP `Content-Type` contexts but are not
   in `mime/lite`'s registry, so `mime.getType()` never returns
   them and the entries are unreachable. Strip the set to the 6
   values the registry actually emits (`javascript`, `ecmascript`,
   `node`, `json`, `xml`, `toml`); the shells / tex / sql / graphql
   extensions reach the text fallback through `KNOWN_TEXT_EXTENSIONS`
   instead. Add a scope rule in the docstring so future additions
   stay aligned with what mime/lite actually emits.

2. The early-return at the top of `detectFileType` listed
   `.ts / .mts / .cts / .tsx` in its comment but the array only
   contained `.ts / .mts / .cts`. `.tsx` was reaching the text
   verdict via `KNOWN_TEXT_EXTENSIONS`, which works today but
   would break if a future `mime/lite` update mapped `.tsx` to
   `video/mp2t` (mirroring `.ts`): the `startsWith('video/')`
   guard would fire before the text fallback. Move `.tsx` up to
   the early-return array so the comment matches the code and the
   defence is consistent across the TypeScript family. Drop the
   duplicate listing in `KNOWN_TEXT_EXTENSIONS`.

3. `isTextMime()` short-circuits `isBinaryFile` for any `text/*`
   mime, which is the necessary tradeoff for the encrypted-FS fix
   but removes the safety net for *corrupted* text files (a binary
   blob saved with a `.txt` / `.md` extension via redirection).
   Document the tradeoff explicitly with a concrete counter-example
   and call out that Edit's `0 occurrences` failure mode is the
   fallback for the corrupted-text population.

Tests: 261 / 262 (1 skipped, pre-existing) on
`fileUtils.test.ts` + `read-file.test.ts` + `edit.test.ts` +
`write-file.test.ts`. `tsc --noEmit` clean.

* fix(core): drop full-read requirement on WriteFile, align with Claude Code

PR QwenLM#3932 deliberately diverged from Claude Code's `readFileState` by
keeping `requireFullRead: true` on WriteFile's prior-read
enforcement, citing issue QwenLM#2499 (LLM hallucinates content of an
unread file and clobbers user changes) as evidence that the
asymmetric stance was justified. In practice that stance leaves a
hard deadlock: when a file is larger than `truncate-tool-output-
lines`, `read_file` without offset/limit still records
`lastReadWasFull: false` (the model only saw the head), and the
"only been partially read … re-read without offset / limit /
pages" rejection sends the model back to the same truncated read
with no escape — the exact deadlock issue QwenLM#3945 reported.

Drop the `requireFullRead` option from `checkPriorRead` and remove
all 5 `requireFullRead: true` call sites in WriteFileTool. After
this change the contract is identical to Claude Code's: any prior
read of an existing file clears enforcement; the mtime/size drift
check is the only gate that distinguishes "the model has seen
current bytes" from "the model has seen older bytes", and it fires
identically for Edit and WriteFile.

The residual QwenLM#2499 risk is acknowledged in the docstring: a model
that reads only a slice and then overwrites would necessarily
hallucinate the rest of the bytes. Mitigations:
  - `fileReadCacheDisabled: true` for users who want stricter
    behaviour (existing escape hatch, unchanged).
  - The mtime/size drift check still rejects Writes against bytes
    the model saw at fingerprint X if disk has moved to Y.

Cleanup: drops the dedicated "fresh + cacheable + partial +
requireFullRead" rejection branch and the `requireFullRead`-aware
wording variant in the `unknown` branch — both unreachable now.

Tests:
  - `write-file.test.ts:932` inverted from "rejects a write when
    the previous read was ranged" to "allows a write after a
    ranged read", matching the equivalent `edit.test.ts:1077`.
  - New `write-file.test.ts:961` regression for the issue QwenLM#3945
    deadlock: a `recordRead({ full: false, cacheable: true })`
    entry (what a truncated full read produces) clears WriteFile
    enforcement.
  - 7393 / 7398 (5 skipped, all pre-existing) on the full core
    suite. `tsc --noEmit` clean.

* docs(core): add anti-regression notes locking in the WriteFile relax

Three sites a future contributor might naturally try to "tighten up"
back into the deadlock-prone shape, now carrying explicit guard
comments that name the prior PR (QwenLM#3932), the issue it broke (QwenLM#3945),
and the residual risk this stance accepts (QwenLM#2499):

  - `priorReadEnforcement.ts:CheckPriorReadOptions` — interface-level
    note: do not re-introduce `requireFullRead` (or any "stricter for
    WriteFile than Edit") option here. References the function
    docstring for the full rationale.

  - `fileReadCache.ts:lastReadWasFull` — field-level note: sole
    consumer is the Read fast-path; `priorReadEnforcement` does not
    consult this and must not start.

  - `write-file.ts` first checkPriorRead call site — anchor comment
    that explains why no extra option is passed and applies to all
    5 call sites in the file.

No code changes; test suite unchanged at 7393 / 7398 (5 pre-existing
skips); `tsc --noEmit` clean.

* fix(core): QwenLM#4002 review wave — basename allowlist + correct stale comments

3 QwenLM#4002 review threads addressed:

- fileUtils.ts: added KNOWN_TEXT_BASENAMES allowlist for extensionless
  build / config / lockfiles (Dockerfile, Containerfile, Makefile,
  GNUmakefile, Jenkinsfile, Vagrantfile, Rakefile, Gemfile, Procfile,
  BUILD, WORKSPACE, CMakeLists.txt, go.mod, go.sum, go.work,
  Cargo.lock, Pipfile, Pipfile.lock, poetry.lock, package-lock.json,
  yarn.lock, pnpm-lock.yaml, requirements.txt, .gitignore,
  .gitattributes, .dockerignore, .npmignore, .editorconfig, .env,
  .bashrc, .zshrc, .profile, LICENSE, COPYING, AUTHORS, CHANGELOG,
  README, NOTICE). `path.extname('Dockerfile')` returns `''`, so the
  KNOWN_TEXT_EXTENSIONS check above misses these — an
  encrypted-volume read whose 4 KB sample looks binary would
  misclassify them as binary. Regression test pinned with
  fake-encrypted bytes for Dockerfile / Makefile / Jenkinsfile /
  go.mod / package-lock.json / .gitignore / LICENSE.

- priorReadEnforcement.ts: rewrote two misleading comments that
  pointed users to `fileReadCacheDisabled: true` for "stricter
  behaviour". That setting actually DISABLES enforcement entirely
  (skips checkPriorRead). Updated to make the opt-out semantics
  explicit and clarify that there is no built-in stricter mode —
  users who want stricter built-in enforcement than the residual
  QwenLM#2499 risk accepts have no flag here today and should file a
  feature request.

- read-file.ts: updated the `lastReadWasFull` comment to reflect that
  PR QwenLM#4002 removed WriteFile's `requireFullRead`. The flag now gates
  ONLY the `file_unchanged` fast-path; the stale "and WriteFile's
  full-read requirement" wording would have confused future readers
  into thinking WriteFile still consults `lastReadWasFull`.

Tests: 89/89 fileUtils.test.ts pass; tsc + ESLint clean.

* fix(core): split priorReadEnforcement guidance — partial OK for edit, full for overwrite

QwenLM#4002 review: shared "never read" error said `(a partial read with
offset / limit is fine — you only need to have seen the bytes you
intend to edit/overwrite)` for BOTH Edit and WriteFile. For Edit
this is correct — the model only needs to have seen the
`old_string`-bearing bytes; the rest passes through untouched.
For WriteFile this is misleading: overwriting replaces EVERY byte,
so a partial read leaves any unseen bytes as collateral damage.
The mtime/size drift check still catches the worst-case QwenLM#2499
hallucinated-bytes risk, but recommending a partial read in the
WriteFile guidance would actively encourage the footgun.

Fix: branch the partial-read guidance on `verb`. Edit keeps the
current "partial OK" text. WriteFile gets `(read the full file —
overwriting replaces every byte, so any unseen bytes would be
discarded)`.

120/120 edit + write-file tests pass; tsc + ESLint clean.

* docs(core): finish QwenLM#4002 review wave — drop two stale "fileReadCacheDisabled is escape hatch" mentions

cc30278 + c6e2bde addressed 4 of the 6 QwenLM#4002 review threads but
left two prior occurrences of the misleading "fileReadCacheDisabled:
true is the escape hatch for users who want stricter behaviour"
wording untouched. The flag actually goes the OPPOSITE way (skips
checkPriorRead entirely so application-level locking can take over),
so describing it as a "stricter" escape hatch is exactly the
guidance the c6e2bde review thread asked us to stop giving.

Files updated:

  - fileReadCache.ts:lastReadWasFull docstring — replaces the
    "stricter behaviour" sentence with the same opt-out / opt-in
    distinction c6e2bde used in priorReadEnforcement.ts.

  - write-file.ts anchor comment for all 5 checkPriorRead call
    sites — replaces the "fileReadCacheDisabled: true is the
    escape hatch" sentence with an explicit note on the opt-out
    direction matching the docstring.

Plus a coverage-split comment on the issue QwenLM#3945 deadlock-free
regression test in write-file.test.ts (review thread #6 from
glm-5.1: pointed out the test seeds the cache directly rather
than driving a full ReadFile→WriteFile pipeline). A real
integration test would need ReadFile-side mockConfig plumbing
(`getFileService`, `getTruncateToolOutputLines`, etc.) ported
into write-file.test.ts; the comment captures the link to
read-file.test.ts's matching cache-population assertion so a
future cache-entry schema change has to update both halves to
keep the end-to-end guarantee.

Tests: 295 / 296 (1 pre-existing skip) on the affected files;
tsc --noEmit clean.

* chore(core): add debug logs to detectFileType text fast-paths

QwenLM#4002 review (DeepSeek): the new text-classification branches
returned `'text'` without logging which path fired, leaving future
QwenLM#3964-class troubleshooting unable to tell mime-trust from
extension-override from basename-override from the content-sample
fallback without re-deriving by code reading.

Add `debugLogger.debug` calls on the three new fast-path branches:
mime trust (`isTextMime` match), extension override
(`KNOWN_TEXT_EXTENSIONS`), and basename override
(`KNOWN_TEXT_BASENAMES`). Each log includes the path, the chosen
classification, and the looked-up mime when relevant — enough to
disambiguate the four classification paths from a single line.

Off by default (`debug` level on the `FILE_UTILS` logger). Older
branches (image / audio / video / pdf / hardcoded TS / SVG / ipynb /
BINARY_EXTENSIONS / isBinaryFile / default text) keep their existing
silent behaviour: they predate the issue this is paving for and
adding logs there would be scope creep.

Tests: 89 / 89 fileUtils.test.ts pass; tsc --noEmit clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conflicting-group-1 Conflicting PR group 1 — review as a batch conflicting-pr Shares at least one cross-PR dependency with other PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants