fix(openai): default splitToolMedia so tool-returned images reach strict OpenAI-compatible backends #4917
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
🧪 E2E verification on a real Doubao endpoint (Volcengine Ark official endpoint, `doubao-seed-2-0-lite-260428`)

| # | Path | Binary | image_url role | doubao-lite output | Markers hit |
|---|---|---|---|---|---|
| A | @-image baseline | fix (this PR) | user | correct | OK: MANGO-58 / #592A91 / teal square / orange caption |
| B | read_file (Before) | global install, 0.17.1 | tool | correct | OK |
| C | read_file (After) | fix (this PR) | user | correct | OK |
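The "image_url role" column can be read straight off a logged request. Below is a minimal sketch, assuming the payload captured with `--openai-logging` is a standard Chat Completions body with a `messages` array. The type names and the `imageUrlRoles` helper are hypothetical and not part of Qwen Code, and the sample payload is abbreviated (the assistant `tool_calls` turn is omitted).

```typescript
// Hypothetical, minimal types for a Chat Completions request body.
// Only the fields needed to locate image parts are modeled.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[] | null;
  tool_call_id?: string;
}

// Returns, in payload order, the role of every message that carries
// at least one image_url part (what the "image_url role" column records).
function imageUrlRoles(messages: ChatMessage[]): string[] {
  return messages
    .filter(
      (m) =>
        Array.isArray(m.content) &&
        m.content.some((part) => part.type === "image_url"),
    )
    .map((m) => m.role);
}

// Case B shape (Before, abbreviated): the image rides on the tool message.
const caseB: ChatMessage[] = [
  { role: "user", content: "Describe ./test.png" },
  {
    role: "tool",
    tool_call_id: "call_1",
    content: [{ type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } }],
  },
];

console.log(imageUrlRoles(caseB)); // logs an array holding only "tool"
```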
What it proves — and an honest caveat
Structural fix verified end-to-end on a real Doubao model: the read_file image moves from role:"tool" (Before) to a follow-up role:"user" message (After), captured in the raw payloads above. No regression — doubao-lite describes the image correctly in all three cases.
Honest caveat — the Volcengine Ark official endpoint is permissive: it parses an image_url even inside a role:"tool" message, so Case B still works here. That means this endpoint does not reproduce #4876's "image ignored" symptom. The reporter's symptom comes from a strict OpenAI-compatible gateway (a new-api proxy in #4876; LM Studio behaves the same), which only honors images in role:"user".
So this run confirms the two things a permissive endpoint can confirm — (1) the fix produces the spec-compliant message shape, and (2) it doesn't break a model that already worked — and it reconfirms the core finding: tolerating tool-message images is an endpoint property, not a model property (Ark permissive; the reporter's new-api strict). The fix makes the image role:"user"-visible on strict endpoints too, which is exactly where #4876 bites.
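Because tolerating tool-message images is an endpoint property, the gap between Case B on Ark and the strict gateway in #4876 can be modeled as two views of the same payload. This is a minimal sketch under stated assumptions: `strictView` and `permissiveView` are hypothetical stand-ins for gateway behavior, the strict side is modeled as silently dropping the image part (the PR notes strict backends may also reject the request outright), and the payloads are abbreviated.

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[] | null;
  tool_call_id?: string;
}

// Hypothetical strict gateway: image parts are honored only on role:"user";
// on any other role they are dropped before the model sees the request.
function strictView(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) => {
    const content = m.content;
    if (m.role === "user" || !Array.isArray(content)) return m;
    return { ...m, content: content.filter((p) => p.type !== "image_url") };
  });
}

// Hypothetical permissive endpoint: every part is passed through unchanged.
function permissiveView(messages: ChatMessage[]): ChatMessage[] {
  return messages;
}

// Counts the image parts the model would actually receive.
function visibleImages(messages: ChatMessage[]): number {
  let n = 0;
  for (const m of messages) {
    if (Array.isArray(m.content)) {
      n += m.content.filter((p) => p.type === "image_url").length;
    }
  }
  return n;
}

const img: ImagePart = { type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } };

// Before the fix: image embedded in the tool message.
const before: ChatMessage[] = [{ role: "tool", tool_call_id: "call_1", content: [img] }];

// After the fix: text-only tool message plus a follow-up user message.
const after: ChatMessage[] = [
  { role: "tool", tool_call_id: "call_1", content: "Read image ./test.png" },
  { role: "user", content: [{ type: "text", text: "(attached media from previous tool call)" }, img] },
];

console.log(visibleImages(permissiveView(before)), visibleImages(strictView(before))); // 1 0
console.log(visibleImages(permissiveView(after)), visibleImages(strictView(after))); // 1 1
```

Only the After shape keeps the image visible on both kinds of endpoint, which is the argument for making the split the default rather than a per-provider setting.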
Chinese version (translated)
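End-to-end verification against the real Volcengine Ark official endpoint (https://ark.cn-beijing.volces.com/api/v3) with doubao-seed-2-0-lite-260428 (provider configured with `modalities:{image:true}` and a real `ARK_API_KEY`). The test image carries unique markers (MANGO-58, a teal square, a dark purple #592A91 background, orange "qwen image test" text); a description that does not match them means the model did not see the image. Because the provider already sets modalities.image, it is not a variable: the only difference between Before and After is the splitToolMedia default.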
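The marker approach reduces to a simple check: the image only counts as seen if the reply mentions every distinctive marker. A minimal sketch; the exact marker strings and the case-insensitive substring matching are assumptions for illustration. A real reply might say "cyan" or "dark purple" instead, so borderline answers still need a human judgment.

```typescript
// Distinctive markers in the test image, taken from the E2E description.
// Matching on these exact strings is an illustrative assumption.
const MARKERS = ["MANGO-58", "#592A91", "teal", "orange"] as const;

// Returns the markers that appear (case-insensitively) in the model's reply.
function markersHit(reply: string, markers: readonly string[] = MARKERS): string[] {
  const lower = reply.toLowerCase();
  return markers.filter((m) => lower.includes(m.toLowerCase()));
}

// The image counts as "seen" only if every marker is mentioned.
function sawImage(reply: string, markers: readonly string[] = MARKERS): boolean {
  return markersHit(reply, markers).length === markers.length;
}

const reply =
  'A teal square on a dark purple (#592A91) background, the text "MANGO-58", and an orange caption.';

console.log(markersHit(reply)); // all four markers
console.log(sawImage("A bar chart showing quarterly revenue.")); // false
```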
Captured payloads (same model, same endpoint):
- Before (0.17.1, splitToolMedia=false): the read_file turn -> [5] tool [image_url]; the image is embedded in role:"tool".
- After (fix, splitToolMedia=true): [3] tool (text only) -> [4] user [text, image_url]; the image is lifted into role:"user".
Result: in all three cases (@-image baseline / read_file Before / read_file After), doubao-lite described the image correctly and hit every marker.
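Conclusion and honest caveat:
- The structural fix is verified on a real Doubao model: the read_file image moves from role:"tool" to role:"user" (hard evidence in the captured payloads), with no regression.
- Honest caveat: the Volcengine Ark official endpoint is permissive; it parses images even inside role:"tool", so Case B also works here. In other words, this endpoint cannot reproduce the "image ignored" symptom of #4876. The reporter's symptom comes from a strict OpenAI-compatible gateway (a new-api proxy in #4876; LM Studio behaves the same), which only honors images in role:"user".
- So this verification proves the two things a permissive endpoint can prove (the fix produces a spec-compliant message shape, and it does not break a model that already worked) and reconfirms the core mechanism: whether images in tool messages are tolerated is a property of the endpoint, not the model (Ark permissive; the reporter's new-api strict). The fix makes the image visible as role:"user" on strict endpoints too, which is exactly where #4876 actually bites.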
6a7747e to 23fa89b
23fa89b to 37008f3
…ict backends

OpenAI Chat Completions only permits text on `role:"tool"` messages, so an image read via read_file (the only image path available to a subagent) was embedded there and silently dropped by strict OpenAI-compatible backends (doubao / new-api / LM Studio). The model never saw the image and returned content unrelated to it (#4876). Permissive backends (e.g. DashScope) happen to parse it, which is why the same model worked for the main agent via @-image (`role:"user"`) but not for the subagent via read_file (`role:"tool"`).

Flip the runtime default of splitToolMedia to true so tool-returned media is lifted into a follow-up `role:"user"` message, which is spec-compliant and visible to all backends. Opt out via `generationConfig.splitToolMedia = false`.

Also:
- modalityDefaults: recognize ByteDance Doubao (Seed chat + *vision/*vl => image; seedance/seedream generation models => text-only).
- settingsSchema + docs: default true, description corrected to cover the built-in read_file (not only MCP tools).

Tests: pipeline default-true regression, modalityDefaults doubao cases, converter opt-out wording.
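The commit's core idea can be sketched as a converter that keeps text on the `role:"tool"` message and moves media into a follow-up `role:"user"` message. This is a minimal sketch, not the actual pipeline or converter code: `convertToolResult`, the types, joining text parts with newlines, and passing the flag as a parameter are all hypothetical. Only `splitToolMedia`, its new `true` default, and the `(attached media from previous tool call)` marker come from this PR.

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[] | null;
  tool_call_id?: string;
}

// Marker text that accompanies the lifted media, per the PR description.
const ATTACHED_MEDIA_MARKER = "(attached media from previous tool call)";

// Hypothetical converter for one tool result. With splitToolMedia = true
// (the new default) images leave the tool message; false keeps the old shape.
function convertToolResult(
  toolCallId: string,
  parts: ContentPart[],
  splitToolMedia: boolean = true,
): ChatMessage[] {
  if (!splitToolMedia) {
    // Old behavior: everything, images included, stays on the tool message.
    return [{ role: "tool", tool_call_id: toolCallId, content: parts }];
  }
  const text = parts.filter((p): p is TextPart => p.type === "text");
  const media = parts.filter((p): p is ImagePart => p.type === "image_url");

  // Spec-compliant tool message: text only.
  const toolMessage: ChatMessage = {
    role: "tool",
    tool_call_id: toolCallId,
    content: text.map((p) => p.text).join("\n"),
  };
  if (media.length === 0) return [toolMessage];

  // Follow-up user message carrying the media, prefixed with the marker.
  const userMessage: ChatMessage = {
    role: "user",
    content: [{ type: "text", text: ATTACHED_MEDIA_MARKER }, ...media],
  };
  return [toolMessage, userMessage];
}

const parts: ContentPart[] = [
  { type: "text", text: "Read image ./assets/chart.png" },
  { type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } },
];

// Default: a text-only tool message, then a user message with the image.
console.log(convertToolResult("call_1", parts).map((m) => m.role));
// Opt-out: a single tool message with the image embedded.
console.log(convertToolResult("call_1", parts, false).map((m) => m.role));
```

In a config-driven setup the flag would be resolved with something like `generationConfig.splitToolMedia ?? true`, so an unset value picks up the new default while an explicit `false` keeps the old wire shape.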
37008f3 to a9d1fd0
Thanks for the PR! Template looks good ✓: all required sections present with solid detail.

On direction: this is a clear bug fix. The OpenAI Chat Completions spec only permits text on …

On approach: the scope feels right, since it's essentially a one-liner default flip (…

Chinese version (translated): Thanks for the contribution! Template complete ✓: all required sections are present, with ample detail. Direction: this is a clear bug fix. The OpenAI Chat Completions spec only permits … Approach: the scope is reasonable; it essentially changes the default in pipeline.ts from …

Qwen Code · qwen3.7-max
Code Review
Clean implementation: no blockers or AGENTS.md violations. The change is essentially two focused fixes: …

Schema and docs are consistent. No over-abstraction, no unnecessary scope.

Unit Tests
All 214 tests pass across the 3 changed test files (modalityDefaults, converter, pipeline). Build clean; only pre-existing warnings in vscode-ide-companion (unrelated).

Real-Scenario Testing (tmux)
This is a wire-format change (image placement in HTTP request payloads), not a TUI change, so there is no user-visible difference to screenshot. The meaningful verification is inspecting …

What I could verify: the CLI starts and responds correctly with the PR code.
Before (installed qwen 0.17.1)
After (this PR via …
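This is a well-executed bug fix. The core change is a single default flip (…

What makes this PR solid: …

The main risk (changing the wire format for all OpenAI-compatible providers) is mitigated by the fact that the new format is spec-compliant. Permissive backends that accepted images in …

Approving. ✅

Chinese version (translated): This is a well-executed bug fix. The core change is a single default flip (… Strengths of the PR: good scope discipline (11 files, +158/-50, proportional to the change); real test coverage (the opt-out test explicitly sets … The main risk, changing the wire format for all OpenAI-compatible providers, is mitigated because the new format is spec-compliant. Permissive backends accept both roles. Approving. ✅

Qwen Code · qwen3.7-max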
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
qwen-code-ci-bot
left a comment
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
DragonnZhang
left a comment
No issues found. LGTM. — Claude via Qwen Code /review
What this PR does
Makes images returned by tool calls (e.g. an image read via the built-in `read_file`) reach the model on strict OpenAI-compatible backends. The runtime default of `splitToolMedia` is flipped to `true`, so tool-returned media is placed in a follow-up `role:"user"` message instead of being embedded in the `role:"tool"` message. It also teaches modality auto-detection about ByteDance Doubao models.

Why it's needed
The OpenAI Chat Completions spec only permits text content on `role:"tool"` messages. Until now an image read via `read_file` was embedded there, so strict OpenAI-compatible backends (doubao / new-api / LM Studio) silently dropped or rejected it: the model never saw the image and returned content unrelated to it (#4876). A subagent is hit hardest because `read_file` is its only image entry point (no `@`-image / paste path), whereas the main agent's `@`-image goes into a `role:"user"` message that these backends accept. Permissive backends (e.g. DashScope) happen to parse the tool-message image, which is exactly why the reporter saw "main agent works, subagent doesn't" with the same model.

Splitting the media into a follow-up user message is spec-compliant and safe for permissive backends too (they still see the image, just in a different role). Users who relied on the old embed-in-tool behavior can opt out with `generationConfig.splitToolMedia = false`.

Separately, `doubao-*` was not in `modalityDefaults`, so `modalities.image` defaulted to false and images could be replaced with a text placeholder before they even became an `image_url`. This PR recognizes Doubao Seed chat and `*vision` / `*vl` models as image-capable, and keeps the seedance/seedream generation models text-only.

Reviewer Test Plan
How to verify
- … `doubao-seed-2.0-pro`; verified locally with `qwen3.6-plus` on DashScope).
- … `@` (so it goes through `read_file`), e.g. `分析 ./assets/chart.png` ("analyze ./assets/chart.png"), or delegate it to an image-analysis subagent.
- … `--openai-logging --openai-logging-dir <dir>` and inspect the payload.

Expected (after): the image `image_url` appears in a `role:"user"` message (with an `(attached media from previous tool call)` marker), not in the `role:"tool"` message. On strict backends the model now describes the image correctly instead of returning unrelated content. Before: `image_url` sits inside the `role:"tool"` message and strict backends ignore it.

Evidence (Before & After)
Captured payloads (qwen3.6-plus, default config, no manual `splitToolMedia`):
- Before: `read_file` image → `image_url @ role:"tool"`.
- After: `read_file` image → `image_url @ role:"user"` + `(attached media from previous tool call)`, verified for both the main-agent and subagent paths.

N/A for screenshots: this is a request-shape / wire-format change, not a TUI change.

Tested on
Environment: Local
`node dist/cli.js` (built from this branch) against a real DashScope `qwen3.6-plus` endpoint; unit tests via vitest (modalityDefaults / pipeline / converter / settingsSchema).

Risk & Scope
- … `doubao-seed-2-0-lite`; see the E2E comment below), and corroborated by the reporter's own before/after experiments in #4876 ("Using a subagent to read an image file, the model returns unexpected content").
- … `generationConfig.splitToolMedia = false`.

Linked Issues
Closes #4876
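The Doubao rule described above (Seed chat and `*vision` / `*vl` models image-capable, seedance/seedream text-only) comes down to ordered name matching. A minimal sketch; the function name, the return shape, the exact substring checks, and the example model IDs are assumptions for illustration, not the actual `modalityDefaults` table. The one non-obvious detail is ordering: `seedance` and `seedream` both contain `seed`, so they have to be excluded before the Seed chat check.

```typescript
// Hypothetical shape mirroring the modalities.image setting named in this PR.
interface Modalities {
  image: boolean;
}

// Returns undefined for non-Doubao models so other rules can decide.
function doubaoModalities(model: string): Modalities | undefined {
  const id = model.toLowerCase();
  if (!id.startsWith("doubao")) return undefined;

  // Generation models (seedance: video, seedream: image) stay text-only.
  // Checked first because both names also contain "seed".
  if (id.includes("seedance") || id.includes("seedream")) {
    return { image: false };
  }

  // Seed chat models and *vision / *vl variants accept images.
  const isSeedChat = id.includes("seed");
  const isVision = id.includes("vision") || /[-_.]vl([-_.]|$)/.test(id);
  return { image: isSeedChat || isVision };
}

for (const m of [
  "doubao-seed-2-0-lite-260428",
  "doubao-seedream-3-0-t2i",
  "doubao-1-5-vision-pro",
  "doubao-1-5-pro-32k",
]) {
  console.log(m, doubaoModalities(m));
}
```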
Chinese version (translated)
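What this PR does
Makes images returned by tool calls (e.g. an image read via the built-in `read_file`) visible to the model on strict OpenAI-compatible backends too. Changes the runtime default of `splitToolMedia` to `true`, so tool-returned media goes into a follow-up `role:"user"` message rather than being embedded in the `role:"tool"` message. Also makes modality auto-detection recognize ByteDance's Doubao model family.

Why it's needed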
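The OpenAI Chat Completions spec only allows `role:"tool"` messages to carry text. Previously, an image read by `read_file` was embedded in the tool message, and strict OpenAI-compatible backends (doubao / new-api / LM Studio) silently dropped or rejected it; the model never saw the image and returned content unrelated to it (#4876). Subagents are hit hardest, because `read_file` is their only image entry point (no `@`-image or paste), whereas the main agent's `@`-image goes into a `role:"user"` message, which these backends accept. Permissive backends (e.g. DashScope) happen to parse images in tool messages, which is exactly why the reporter, using the same model, saw "main agent works, subagent doesn't".

Splitting the media into a follow-up user message is spec-compliant and also safe for permissive backends (the image is still visible, just under a different role). Users who depend on the old "embedded in the tool message" behavior can revert with `generationConfig.splitToolMedia = false`.

In addition, `doubao-*` was not in `modalityDefaults`, so `modalities.image` defaulted to false and images could be replaced with placeholder text before ever becoming an `image_url`. This PR recognizes Doubao Seed chat and `*vision` / `*vl` models as image-capable (the seedance/seedream generation models remain text-only).

Reproduction / verification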
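- … `doubao-seed-2.0-pro`; verified locally with DashScope `qwen3.6-plus`).
- … `@` to read a local image (going through `read_file`), e.g. `分析 ./assets/chart.png` ("analyze ./assets/chart.png"), or delegate it to an image-analysis subagent.
- … `--openai-logging --openai-logging-dir <dir>` to capture the request body for inspection.

Expected (after the fix): the image `image_url` appears in a `role:"user"` message (with the `(attached media from previous tool call)` marker), not in `role:"tool"`. On strict backends the model describes the image correctly. Before the fix: `image_url` is in the `role:"tool"` message and strict backends ignore it.

Evidence (Before & After)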
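Captured payloads (qwen3.6-plus, default config, `splitToolMedia` not set manually): before the fix, `read_file` image → `image_url @ role:"tool"`; after the fix, `read_file` image → `image_url @ role:"user"` + `(attached media from previous tool call)`, verified on both the main-agent and subagent paths. No screenshots: this is a request-shape / wire-format change, not a TUI change.

Risk and scope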
- … `doubao-seed-2-0-lite`; see the E2E comment below) verified end to end, and corroborated by the reporter's before/after comparison experiments in #4876.
- … revert with `generationConfig.splitToolMedia = false`.

Linked Issue
Closes #4876