fix(agents): coerce non-text/image MCP tool-result blocks to text (fixes #90710)#90728
Conversation
openclaw#90710) MCP CallToolResult.content can include resource_link, resource, and audio blocks, but toAgentToolResult cast them straight into AgentToolResult, which only carries text/image. Downstream the Anthropic provider/transport image branch then built an image block with undefined data/media_type, Anthropic returned 400, and because the bad block stayed in history every later turn 400d too -> permanently poisoned session. Normalize MCP content at the materialize boundary so the text/image contract stays honest: pass through valid images, coerce resource_link/resource/audio and malformed images to text. Fixes all providers, not just Anthropic.
|
Codex review: passed. Reviewed June 5, 2026, 10:15 PM ET / 02:15 UTC. Summary PR surface: Source +36, Tests +66. Total +102 across 2 files. Reproducibility: yes. Source inspection shows current main lets MCP resource/audio blocks cross into a text/image-only result contract, and the PR body includes before/after real-runtime output from a spawned stdio MCP server; I did not run a live hosted Anthropic API round trip in this read-only review. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Next step before merge
Security Review detailsBest possible solution: Land the materialization-boundary normalization after exact-head checks pass, keep provider converters typed to text/image, and let the linked issue close on merge. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows current main lets MCP resource/audio blocks cross into a text/image-only result contract, and the PR body includes before/after real-runtime output from a spawned stdio MCP server; I did not run a live hosted Anthropic API round trip in this read-only review. Is this the best way to solve the issue? Yes. Normalizing at the bundle-MCP materialization boundary is the narrowest maintainable fix because it restores the core tool-result contract for all downstream providers, while patching only Anthropic would leave invalid content in agent history. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 3a2f54e6a866. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +36, Tests +66. Total +102 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
@clawsweeper re-review — real behavior proof added to the PR body (real spawned stdio MCP server → real bundle-MCP runtime, before/after on origin/main vs this branch). The Real behavior proof check is now green. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
f70dccf to
05b682d
Compare
05b682d to
f70dccf
Compare
|
@clawsweeper re-review — the PR head was briefly overwritten by an unintended commit and has now been restored to the boundary-normalization fix (f70dccf). The previous verdict reviewed the wrong (now-reverted) code against a stale base. Current head: materialize-boundary normalization only, CI fully green (131 pass / 0 fail), real-behavior proof in the body. Please re-review f70dccf. |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper automerge |
|
🦞✅ Source: What merged:
Automerge notes:
The automerge loop is complete. Automerge progress:
|
openclaw#90710) (openclaw#90728) Summary: - The PR converts wider MCP CallToolResult content blocks into text/image AgentToolResult blocks at the bundle-MCP materialization boundary and adds regression tests. - PR surface: Source +36, Tests +66. Total +102 across 2 files. - Reproducibility: yes. Source inspection shows current main lets MCP resource/audio blocks cross into a text/ ... a spawned stdio MCP server; I did not run a live hosted Anthropic API round trip in this read-only review. Automerge notes: - No ClawSweeper repair was needed after automerge opt-in. Validation: - ClawSweeper review passed for head f70dccf. - Required merge gates passed before the squash merge. Prepared head SHA: f70dccf Review: openclaw#90728 (comment) Co-authored-by: 宇宙熊Yzx <53250620+849261680@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
openclaw#90710) (openclaw#90728) Summary: - The PR converts wider MCP CallToolResult content blocks into text/image AgentToolResult blocks at the bundle-MCP materialization boundary and adds regression tests. - PR surface: Source +36, Tests +66. Total +102 across 2 files. - Reproducibility: yes. Source inspection shows current main lets MCP resource/audio blocks cross into a text/ ... a spawned stdio MCP server; I did not run a live hosted Anthropic API round trip in this read-only review. Automerge notes: - No ClawSweeper repair was needed after automerge opt-in. Validation: - ClawSweeper review passed for head f70dccf. - Required merge gates passed before the squash merge. Prepared head SHA: f70dccf Review: openclaw#90728 (comment) Co-authored-by: 宇宙熊Yzx <53250620+849261680@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
openclaw#90710) (openclaw#90728) Summary: - The PR converts wider MCP CallToolResult content blocks into text/image AgentToolResult blocks at the bundle-MCP materialization boundary and adds regression tests. - PR surface: Source +36, Tests +66. Total +102 across 2 files. - Reproducibility: yes. Source inspection shows current main lets MCP resource/audio blocks cross into a text/ ... a spawned stdio MCP server; I did not run a live hosted Anthropic API round trip in this read-only review. Automerge notes: - No ClawSweeper repair was needed after automerge opt-in. Validation: - ClawSweeper review passed for head f70dccf. - Required merge gates passed before the squash merge. Prepared head SHA: f70dccf Review: openclaw#90728 (comment) Co-authored-by: 宇宙熊Yzx <53250620+849261680@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…26.6.5) (#963) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.1` → `2026.6.5` | --- ### Release Notes <details> <summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary> ### [`v2026.6.5`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202665) [Compare Source](openclaw/openclaw@v2026.6.1...v2026.6.5) ##### Highlights - QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw `<thinking>` content from leaking into channel replies. ([#​89913](openclaw/openclaw#89913), [#​90132](openclaw/openclaw#90132)) Thanks [@​openperf](https://github.com/openperf). - MCP tool results now coerce `resource_link`, `resource`, `audio`, malformed image, and future non-text/image blocks at the materialize boundary, preventing Anthropic 400s and poisoned session history after a tool returns richer MCP content. ([#​90710](openclaw/openclaw#90710), [#​90728](openclaw/openclaw#90728)) Thanks [@​RanSHammer](https://github.com/RanSHammer) and [@​849261680](https://github.com/849261680). - Anthropic extended-thinking sessions recover after prompt-cache expiry or Gateway restart because stream start events wait for `message_start`, letting pre-generation signature errors trigger the existing recovery retry. ([#​90667](openclaw/openclaw#90667), [#​90697](openclaw/openclaw#90697)) Thanks [@​openperf](https://github.com/openperf). - Parallel is now a bundled `web_search` provider with `PARALLEL_API_KEY` discovery, guarded endpoint handling, cache-safe session ids, onboarding picker support, and docs. ([#​85158](openclaw/openclaw#85158)) Thanks [@​NormallyGaussian](https://github.com/NormallyGaussian). - Google Vertex ADC users get static catalog rows and runtime model resolution again, while single-provider cooldown recovery and memory adapter status checks are more reliable. ([#​90506](openclaw/openclaw#90506), [#​90609](openclaw/openclaw#90609), [#​90717](openclaw/openclaw#90717), [#​90816](openclaw/openclaw#90816)) Thanks [@​849261680](https://github.com/849261680). - Matrix can preflight voice notes before mention gating, preserve thread reads/replies through Matrix relations pagination, and carry QA coverage for voice and thread flows. ([#​78016](openclaw/openclaw#78016), [#​90415](openclaw/openclaw#90415)) - Auth and plugin install state is more durable: auth profiles now live in SQLite, official npm plugin install records keep their trusted pins, and prerelease fallback integrity checks avoid carrying stale integrity forward. ([#​89102](openclaw/openclaw#89102), [#​88585](openclaw/openclaw#88585)) - macOS node mode no longer silently self-reconnects away from a healthy direct Gateway session, reducing unexpected companion app session churn. ([#​90668](openclaw/openclaw#90668), [#​90815](openclaw/openclaw#90815)) Thanks [@​vrurg](https://github.com/vrurg). - Upgrade and service paths are safer: cron legacy JSON stores migrate during doctor preflight, service env placeholders no longer mask state-dir secrets, WhatsApp startup waits are bounded, and disabled WhatsApp accounts tear down on config reload. ([#​90072](openclaw/openclaw#90072), [#​90208](openclaw/openclaw#90208), [#​90277](openclaw/openclaw#90277), [#​90488](openclaw/openclaw#90488), [#​90486](openclaw/openclaw#90486), [#​87951](openclaw/openclaw#87951), [#​87965](openclaw/openclaw#87965)) Thanks [@​MonkeyLeeT](https://github.com/MonkeyLeeT), [@​sallyom](https://github.com/sallyom), [@​mcaxtr](https://github.com/mcaxtr), and [@​MukundaKatta](https://github.com/MukundaKatta). ##### Changes - Search/providers: add the Parallel bundled web-search plugin, live provider tests, registration contracts, onboarding/docs wiring, and guarded `api.parallel.ai/v1/search` support. ([#​85158](openclaw/openclaw#85158)) Thanks [@​NormallyGaussian](https://github.com/NormallyGaussian). - Matrix/channels: add voice-message preflight and thread-aware read/reply behavior, including Matrix QA scenario wiring and docs for voice-message behavior. ([#​78016](openclaw/openclaw#78016), [#​90415](openclaw/openclaw#90415)) - Skills/ClawHub: install ClawHub skills backed by GitHub repositories through the resolved install API, download the pinned GitHub commit, keep install-policy checks, and report install telemetry after success. ([#​90478](openclaw/openclaw#90478)) Thanks [@​Patrick-Erichsen](https://github.com/Patrick-Erichsen). - Google Chat/channels: add native approval card actions and click handling so Google Chat approvals use platform-native cards instead of generic message flow. - Mobile: Android provider/model screens now surface expiring, unavailable, unresolved, and attention states more clearly, while iOS settings and Talk tabs keep diagnostics, gateway rows, attachment labels, and unavailable Talk controls reachable. - Memory: QMD search can use the new rerank toggle, and memory adapter status uses the resolved default model identity when checking plain status. ([#​61834](openclaw/openclaw#61834)) - Docs/tooling: add Parallel search docs, refresh weather-skill guidance toward `web_fetch`, clarify legacy `openai-codex` auth, document release/test helper scripts, and tighten changed-test routing docs for CI/debugging work. ([#​90028](openclaw/openclaw#90028), [#​90250](openclaw/openclaw#90250)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Release/process: switch release trains to `YYYY.M.PATCH` monthly patch numbering, keep pre-transition tags compatible, and pin the June 2026 floor at `2026.6.5` after the published beta. - Platform maintenance: refresh Android, Swift/macOS, Docker, CodeQL, Buildx, Docker build/push, and Codex Action dependencies for this release train. ([#​74980](openclaw/openclaw#74980), [#​81757](openclaw/openclaw#81757), [#​86481](openclaw/openclaw#86481), [#​86483](openclaw/openclaw#86483), [#​90601](openclaw/openclaw#90601)) - QQBot: add `/bot-group-allways on|off` slash command (with named-account and default-account support) to toggle whether group messages require an `@mention` before the bot replies, and clear the runtime config snapshot after the write so the new account-level `defaultRequireMention` takes effect immediately without restart. ([#​91423](openclaw/openclaw#91423)) Thanks [@​cxyhhhhh](https://github.com/cxyhhhhh). ##### Fixes - Channel content boundaries: QQBot now strips reasoning/thinking tags before sending, preserving final answers while hiding internal model narration from users. ([#​89913](openclaw/openclaw#89913), [#​90132](openclaw/openclaw#90132)) Thanks [@​openperf](https://github.com/openperf). - Agents/MCP/providers: coerce non-text/image MCP tool-result blocks before they reach provider converters, preserving valid images and turning richer MCP content into text instead of malformed image blocks. ([#​90710](openclaw/openclaw#90710), [#​90728](openclaw/openclaw#90728)) Thanks [@​RanSHammer](https://github.com/RanSHammer) and [@​849261680](https://github.com/849261680). - Anthropic/Codex/ACP/agent recovery: defer Anthropic stream start events until `message_start`, strip stale compaction thinking signatures before Anthropic replay, detect unsigned thinking-only stalls, refresh prompt fences after compaction writes, reject empty completion handoffs, preserve parent streaming-off overrides/shared progress commentary, forward heartbeat metadata to context-engine hooks, and cover Codex session/thread migration edge cases. ([#​90667](openclaw/openclaw#90667), [#​90697](openclaw/openclaw#90697), [#​90163](openclaw/openclaw#90163), [#​90108](openclaw/openclaw#90108), [#​89874](openclaw/openclaw#89874), [#​89505](openclaw/openclaw#89505), [#​90632](openclaw/openclaw#90632), [#​89302](openclaw/openclaw#89302), [#​90729](openclaw/openclaw#90729), [#​90317](openclaw/openclaw#90317), [#​90319](openclaw/openclaw#90319)) Thanks [@​openperf](https://github.com/openperf), [@​100yenadmin](https://github.com/100yenadmin), and [@​ooiuuii](https://github.com/ooiuuii). - Provider/model resolution: preserve Google Vertex ADC auth markers in generated catalogs, re-probe a single-provider primary after cooldown, share Codex model visibility, fail closed for unknown model auth, preserve Codex alias availability, keep unresolved profile refs unknown, and avoid resolving auth while listing models. ([#​90506](openclaw/openclaw#90506), [#​90609](openclaw/openclaw#90609), [#​90717](openclaw/openclaw#90717), [#​90702](openclaw/openclaw#90702)) Thanks [@​849261680](https://github.com/849261680). - Gateway/macOS/mobile: avoid duplicate Gateway probe warnings by identity, rate-limit node pairing requests while preserving paired-node reconnects, keep macOS node mode on a healthy direct Gateway session, keep iOS diagnostics and gateway rows reachable, and avoid Linux ARM Gradle resource tasks during Android builds. ([#​85791](openclaw/openclaw#85791), [#​90147](openclaw/openclaw#90147), [#​90668](openclaw/openclaw#90668), [#​90815](openclaw/openclaw#90815)) Thanks [@​giodl73-repo](https://github.com/giodl73-repo) and [@​vrurg](https://github.com/vrurg). - TUI/chat/Workboard/auto-reply: optimistic user messages stay stable across stale history reloads, runId reassignment, and abort windows instead of disappearing, jumping, or lingering as ghost rows; Workboard stale lifecycle bulk updates no longer overwrite newer status/provenance; message-tool sends now count as delivery. ([#​86205](openclaw/openclaw#86205), [#​89600](openclaw/openclaw#89600), [#​88592](openclaw/openclaw#88592), [#​90123](openclaw/openclaw#90123)) Thanks [@​RomneyDa](https://github.com/RomneyDa). - Cron/update/service env: doctor config preflight now migrates legacy cron JSON stores into SQLite before runtime reads, service env planning skips unresolved placeholders that would mask state-dir `.env` values, and session transcript rewrites keep registry markers/discriminants consistent. ([#​90072](openclaw/openclaw#90072), [#​90208](openclaw/openclaw#90208), [#​90277](openclaw/openclaw#90277), [#​90488](openclaw/openclaw#90488)) Thanks [@​MonkeyLeeT](https://github.com/MonkeyLeeT) and [@​sallyom](https://github.com/sallyom). - Security/config/tooling: guard MCP HTTP redirects, protect global agent config defaults, and keep release/test/tooling proof failures bounded and explicit. ([#​89732](openclaw/openclaw#89732), [#​90145](openclaw/openclaw#90145)) - Channels: WhatsApp restarts when per-account config changes, bounds background startup waits, closes failed sockets, and preserves reconnect behavior; Mattermost slash commands keep their state on `globalThis`; Feishu streaming cards preserve full merged content; voice-call tracks Twilio streams after connect; ClickClack reply tools respect `toolsAllow`. ([#​87951](openclaw/openclaw#87951), [#​87965](openclaw/openclaw#87965), [#​90486](openclaw/openclaw#90486), [#​68113](openclaw/openclaw#68113), [#​90534](openclaw/openclaw#90534), [#​90181](openclaw/openclaw#90181), [#​90607](openclaw/openclaw#90607), [#​89500](openclaw/openclaw#89500)) Thanks [@​MukundaKatta](https://github.com/MukundaKatta), [@​mcaxtr](https://github.com/mcaxtr), [@​infoanton](https://github.com/infoanton), [@​mushuiyu886](https://github.com/mushuiyu886), and [@​sahibzada-allahyar](https://github.com/sahibzada-allahyar). - Feishu: retry transient send rate-limit errors (HTTP 429, per-chat code 230020, tenant-level code 11232) with linear backoff, including SDK responses that fulfill with rate-limit bodies instead of throwing, and route streaming-card sends through the retry wrapper. ([#​89659](openclaw/openclaw#89659)) Thanks [@​ladygege](https://github.com/ladygege). - Release/CI/E2E: main CI guard drift, PR merge diff scoping, live Docker credential staging, base-image qualification, installer Docker classification, Playwright dependency install recovery, API-key auth for Codex live Docker lanes, Parallels option terminators, and JSON-mode progress handling are tighter so release proof fails cleaner. ([#​90532](openclaw/openclaw#90532), [#​90287](openclaw/openclaw#90287), [#​90058](openclaw/openclaw#90058)) Thanks [@​RomneyDa](https://github.com/RomneyDa), [@​hxy91819](https://github.com/hxy91819), and [@​mrunalp](https://github.com/mrunalp). - Release/CI/E2E: Docker E2E and live Docker harness runs now apply default memory, CPU, and process ceilings while preserving explicit per-lane overrides. - Release/CI/E2E: plugin lifecycle matrix resource sampling now fails phases that exceed RSS, wall-clock, or CPU ceilings instead of only logging the measurements. - Release/CI/E2E: Codex npm plugin live assertions now cap transcript discovery and diagnostic log reads so failure proof stays bounded. - Tests/state isolation: QA Lab valid-tool-call metrics now require runtime tool-call evidence when runtime parity data is available instead of counting tool-backed scenario pass status alone. - Tests/state isolation: QA Lab runtime parity now fails planned-only tool-call rows without matching tool results instead of treating matching mock plans as real tool evidence. - Tests/state isolation: provider, media, auth, cron, task, session, sandbox, Gateway, and Codex timeout fixtures now scope more home/state/env data per test, reducing cross-test leakage and making release validation failures less noisy. ([#​90027](openclaw/openclaw#90027), [#​89974](openclaw/openclaw#89974)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL3BhdGNoIl19--> Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/963
Summary
MCP
CallToolResult.content[]can includeresource_link,resource, andaudioblocks (MCP SDKContentBlockunion), buttoAgentToolResultinagent-bundle-mcp-materialize.tscast that array straight intoAgentToolResult.content, whose contract is only(TextContent | ImageContent)[]. That cast lied to the type system, so the bad blocks flowed downstream where the Anthropic provider/transport "non-text ⇒ image" branch built animageblock withundefineddata/media_type. Anthropic rejects that with HTTP 400, and because the malformed block stays in conversation history, every subsequent turn replays it and 400s too — the session is permanently poisoned until history is reset.text/imagecontract stays honest, instead of patching each downstream provider converter. This fixes all providers, not just Anthropic.resource_link→[title|name] uri,resource→resource.text ?? uri,audio→[audio <mime>], and malformed/unknown blocks → stringified text.convertContentBlockskeep their honesttext|imagesignatures; they no longer receive wider MCP blocks, so they are intentionally untouched.Linked context
Closes #90710
Reporter: @RanSHammer (clear repro + root-cause diagnosis).
Real behavior proof (required for external PRs)
resource_linkfor a.docx, plusresource/audio) no longer leaks those blocks into the agent tool result. They are coerced to text at the materialize boundary, so the Anthropic image branch never builds an emptyimageblock (no 400, no poisoned history).createBundleMcpToolRuntime→createSessionMcpRuntime) against a real spawned stdio MCP server subprocess (MCP SDK 1.29.0McpServer) over real JSON-RPC. No vitest, no mocks — production code paths, executed on bothorigin/main(before) and this branch (after). Node 22.22.2, macOS.get_attachmenttool returns[text, resource_link(.docx), resource(memo), audio], then drove the real runtime and printed the materializedAgentToolResult.content:node --import tsx drive.mts <repoRoot>(driver dynamically importssrc/agents/agent-bundle-mcp-materialize.tsfrom the given checkout and calls the materialized tool).f70dccf33e):Before (same driver + same real MCP server against unfixed
origin/mainmaterialize), the array carried raw non-text/image blocks — exactly what the Anthropic image branch then turns into an emptyimagesource and 400s on:image. Theresource_link/resource/audioblocks that previously sat illegally inside the(text|image)[]array are gone, so the provider can no longer build adata:undefined / media_type:undefinedimage block.imagebranch consumes (src/llm/providers/anthropic.ts:140,src/agents/anthropic-transport-stream.ts:287).tsx; the spawned MCP server is a minimal real SDK server rather than the reporter's Superhuman Mail remote MCP, but it emits the sameContentBlockshapes.Tests and validation
Which commands did you run?
node scripts/run-vitest.mjs src/agents/agent-bundle-mcp-tools.materialize.test.ts→ 13 passed (supplemental)tsgocore + test:src lanes → exit 0;oxfmt --check+ oxlint on changed files → clean (supplemental)What regression coverage was added or updated?
Two new cases in
agent-bundle-mcp-tools.materialize.test.ts:resource_link/resource(text + blob) /audioblocks coerce to text while a validimagepasses through.imageblock (missing base64 source) coerces to text instead of an empty image block.