fix(outbound): sanitize message.send arguments to prevent runtime scaffolding leaks #89118
LiLan0125 wants to merge 1 commit.
…ffolding leaks

Weak tool-calling models (MiniMax, Kimi, small Ollama models) can verbatim-echo the runtime Delivery: hint and Conversation info / Sender (untrusted metadata) JSON envelopes into message.send tool arguments. The runtime forwarded them unfiltered to channel adapters, leaking internal metadata into real human conversations.

Apply the existing stripInboundMetadata sanitizer to outbound message.send arguments so the same sentinels stripped from inbound prompts are also stripped from outbound tool-call text before delivery.

Closes openclaw#89100

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
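For illustration only, the kind of stripping described in the commit message could look roughly like the sketch below. It is not the project's `stripInboundMetadata`: the function name `stripScaffoldingSketch`, the exact sentinel header strings, and the assumption that each envelope is a header line followed by a fenced JSON block are all hypothetical stand-ins chosen to make the idea concrete.

```typescript
// Hypothetical stand-in for the project's stripInboundMetadata.
// The real sentinel formats and implementation may differ.
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence here

// Assumed header lines that introduce a fenced JSON envelope.
const ENVELOPE_HEADERS = [
  "Conversation info (untrusted metadata):",
  "Sender (untrusted metadata):",
];

function stripScaffoldingSketch(text: string): string {
  const lines = text.split("\n");
  const kept: string[] = [];
  let i = 0;
  while (i < lines.length) {
    const line = lines[i].trim();
    // Assumed form of the delivery hint: one line starting with "Delivery:".
    if (line.startsWith("Delivery:")) {
      i += 1;
      continue;
    }
    const next = (lines[i + 1] ?? "").trim();
    if (ENVELOPE_HEADERS.includes(line) && next.startsWith(FENCE)) {
      // Skip the header and opening fence, then the JSON body,
      // then the closing fence.
      i += 2;
      while (i < lines.length && lines[i].trim() !== FENCE) {
        i += 1;
      }
      i += 1;
      continue;
    }
    kept.push(lines[i]);
    i += 1;
  }
  return kept.join("\n").trim();
}
```

Under these assumptions, a message that begins with a `Delivery:` line and a `Conversation info (untrusted metadata):` envelope and ends with the real reply is reduced to the reply alone, while text without any sentinel passes through unchanged apart from trimming.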
Codex review: needs real behavior proof before merge.
Reviewed June 1, 2026, 11:04 AM ET / 15:04 UTC.

Summary
PR surface: Source +3, Docs +1. Total +4 across 2 files.
Reproducibility: yes. From source: current main forwards parsed message.send text after citation/tool-call stripping but before any inbound-metadata stripping, so a model-provided metadata envelope would reach the send payload. I did not run a live MiniMax/channel reproduction.
Review metrics: none identified.

Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details

Best possible solution: Land a narrow runtime sanitizer fix with outbound-path proof, remove the release-owned changelog edit, and keep or split the linked routing bug so FM-2 remains tracked.

Do we have a high-confidence way to reproduce the issue? Yes, from source: current main forwards parsed message.send text after citation/tool-call stripping but before any inbound-metadata stripping, so a model-provided metadata envelope would reach the send payload. I did not run a live MiniMax/channel reproduction.

Is this the best way to solve the issue? Mostly yes for FM-3: reusing …

Full review comments:
Overall correctness: patch is correct.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against dfeb5b81ca67.
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Summary
- Weak tool-calling models (MiniMax, Kimi, small Ollama models) can verbatim-echo the runtime `Delivery:` hint and `Conversation info` / `Sender (untrusted metadata)` JSON envelopes into `message.send` tool arguments.
- The runtime forwarded them unfiltered to channel adapters, leaking internal metadata (`chat_id`, `sender_id`, `inbound_event_kind`, sender display name/phone) into real human conversations.
- Apply the existing `stripInboundMetadata` sanitizer to outbound message text. This is the same function already used to strip these sentinel blocks from inbound prompts before they reach the model; now it also strips them from outbound tool-call arguments before they reach the channel adapter.
- … `hasReplyPayloadContent` guard throws "send requires text or media", preventing empty-message delivery.

Verification
- `stripInboundMetadata` correctly strips the Delivery hint, Conversation info block, and Sender metadata block from a simulated leaked-scaffolding message while preserving the actual reply.

Real behavior proof
Behavior addressed: Weak models echo runtime scaffolding (Delivery hint + Conversation info/Sender metadata JSON blocks) into `message.send` arguments, which are forwarded unfiltered to real chat channels.
Environment tested: Node 22.22.0 on Linux, pnpm test.
Steps run after the patch:
- `npx vitest run --config test/vitest/vitest.infra.config.ts src/infra/outbound/` (36 test files, 626 tests; all pass).
- … `stripInboundMetadata` function with a reconstructed leaked-scaffolding message.
Evidence after fix:
Observed result: All scaffolding blocks (Delivery hint, Conversation info, Sender metadata) are stripped; only the actual reply text remains for delivery.
Not tested: Live gateway session with MiniMax/Kimi models on WhatsApp/Signal group chats.
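As a rough sketch of how sanitizing and the empty-payload guard might interact on the outbound path, consider the following. Every name here (`SendArgsSketch`, `hasContentSketch`, `prepareSendSketch`, `dropDeliveryLines`) and the `text` / `mediaUrls` argument shape are hypothetical and only loosely modeled on the `hasReplyPayloadContent` behavior described in the summary; the real guard and payload type may look different.

```typescript
// Hypothetical argument shape; the real message.send payload may differ.
type SendArgsSketch = { text?: string; mediaUrls?: string[] };

// Hypothetical analogue of the hasReplyPayloadContent guard:
// a payload counts as non-empty if it has visible text or any media.
function hasContentSketch(args: SendArgsSketch): boolean {
  const hasText = (args.text ?? "").trim().length > 0;
  const hasMedia = (args.mediaUrls?.length ?? 0) > 0;
  return hasText || hasMedia;
}

// Sanitize the text first, then refuse to deliver an empty payload.
function prepareSendSketch(
  args: SendArgsSketch,
  sanitize: (text: string) => string,
): SendArgsSketch {
  const cleaned: SendArgsSketch = { ...args };
  if (cleaned.text !== undefined) {
    cleaned.text = sanitize(cleaned.text);
  }
  if (!hasContentSketch(cleaned)) {
    throw new Error("send requires text or media");
  }
  return cleaned;
}

// Toy sanitizer used only for this example: drops lines that start
// with "Delivery:". It stands in for the real sanitizer.
const dropDeliveryLines = (text: string): string =>
  text
    .split("\n")
    .filter((line) => !line.trim().startsWith("Delivery:"))
    .join("\n")
    .trim();
```

With this ordering, a tool call whose text consists only of echoed scaffolding fails loudly instead of delivering an empty message, while a call that also carries media or a real reply still goes through with the scaffolding removed.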
Closes #89100.
🤖 Generated with Claude Code