Problem
OpenClaw wraps external/untrusted content in XML tags with a SECURITY NOTICE header before presenting it to the LLM. This is a good first layer, but THREAT-MODEL-ATLAS.md explicitly acknowledges its limitation:
T-EXEC-002: Indirect Prompt Injection
- Current Mitigations: Content wrapping with XML tags and security notice
- Residual Risk: High — LLM may ignore wrapper instructions
- Recommendations: Implement content sanitization, separate execution contexts
The wrapping approach works only if the LLM respects the boundary markers. Injected content can instruct the model to ignore previous context, override the system prompt, or manipulate tool behavior, and the LLM may comply despite the wrapper.
Evidence
- OpenClaw's own threat model rates this as High residual risk
- The recommendation in the threat model ("implement content sanitization") is not yet implemented natively
- MCP tool output, web_fetch results, and email ingestion are all attack surfaces
Workaround
I built cc-taint-check.py, a PostToolUse hook (~150 lines Python) that:
- Intercepts MCP tool output before the LLM processes it
- Scans content against a catalogue of injection patterns:
  - Direct instruction injection ("ignore previous instructions", "you are now", "system prompt override")
  - Role manipulation ("as an AI without restrictions", "pretend you are")
  - Encoding-based evasion (base64-encoded instructions, Unicode homoglyphs)
  - Tool/function injection ("call function", "execute command")
- Blocks on high-severity matches and only logs low-severity ones
- Normalizes Unicode before scanning to catch homoglyph-based evasion
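The steps above can be sketched roughly as follows. This is not the code of cc-taint-check.py; it is a minimal illustration under assumptions: the pattern names, regexes, severity assignments, and the functions `normalize`, `scan`, and `decide` are all hypothetical. It also assumes NFKC normalization, which folds compatibility forms such as fullwidth letters but does not map cross-script lookalikes (for example Cyrillic letters) to Latin; those would need a separate confusables table.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical catalogue: (name, severity, regex). Entries are illustrative only.
PATTERNS = [
    ("ignore_previous", "high",
     re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    ("system_override", "high",
     re.compile(r"system\s+prompt\s+override", re.I)),
    ("role_switch", "low",
     re.compile(r"\b(you\s+are\s+now|pretend\s+you\s+are)\b", re.I)),
    ("tool_call", "low",
     re.compile(r"\b(call\s+function|execute\s+command)\b", re.I)),
]

# Long runs of base64 alphabet characters are treated as decode candidates.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")


def normalize(text):
    """Fold compatibility characters (NFKC) and drop zero-width characters."""
    return ZERO_WIDTH.sub("", unicodedata.normalize("NFKC", text))


def decoded_base64_segments(text):
    """Return UTF-8 text for every base64-looking run that decodes cleanly."""
    segments = []
    for match in BASE64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            segments.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 or not text; ignore this run
    return segments


def scan(text):
    """Return a list of (pattern_name, severity) found in text or decoded runs."""
    normalized = normalize(text)
    candidates = [normalized]
    candidates += [normalize(s) for s in decoded_base64_segments(normalized)]
    findings = []
    for candidate in candidates:
        for name, severity, regex in PATTERNS:
            if regex.search(candidate):
                findings.append((name, severity))
    return findings


def decide(findings):
    """Block on any high-severity finding, log anything else, allow clean text."""
    if any(severity == "high" for _, severity in findings):
        return "block"
    return "log" if findings else "allow"
```

A hook would call `decide(scan(tool_output))` on each tool result. Regex matching of this kind catches only known phrasings, so it complements the wrapper rather than replacing it.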
Proposed Solution
Add a configurable content scanning layer to the tool output pipeline:
- Scan tool output from external sources (web_fetch, MCP tools, email, webhooks) before presenting to the LLM
- Pattern catalogue of known injection techniques (maintainable, updatable)
- Configurable response: log-only, warn user, or block the content
- Unicode normalization before scanning
{
  "security": {
    "contentScanning": {
      "enabled": true,
      "sources": ["web_fetch", "mcp", "email", "webhooks"],
      "action": "warn",
      "customPatterns": []
    }
  }
}
This directly implements the recommendation from the existing threat model.
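To make the proposed settings concrete, here is a minimal sketch of how a pipeline might interpret them. It is written in Python for illustration only and is not OpenClaw code. The function names, the defaults, and the action strings "log" and "block" are assumptions; only "warn" appears in the example config above.

```python
import json

# Assumed defaults: scanning stays off unless the config enables it.
DEFAULTS = {"enabled": False, "sources": [], "action": "log", "customPatterns": []}
# Assumed spellings for the three responses: log-only, warn user, block.
VALID_ACTIONS = {"log", "warn", "block"}


def load_scanning_config(raw_json):
    """Parse the config text and return the contentScanning section with defaults."""
    parsed = json.loads(raw_json)
    section = parsed.get("security", {}).get("contentScanning", {})
    config = {**DEFAULTS, **section}
    if config["action"] not in VALID_ACTIONS:
        raise ValueError(f"unknown contentScanning action: {config['action']!r}")
    return config


def should_scan(config, source):
    """Scan only when enabled and the tool output comes from a listed source."""
    return bool(config["enabled"]) and source in config["sources"]


def apply_action(config, content, findings):
    """Return (content_for_llm, user_message) for the configured action.

    findings is whatever the scanner reported; an empty list means clean.
    """
    if not findings or config["action"] == "log":
        return content, None  # pass through; logging would happen elsewhere
    if config["action"] == "warn":
        return content, f"Possible prompt injection: {len(findings)} pattern(s) matched"
    # action == "block": withhold the content from the LLM entirely
    return None, "External content blocked by content scanning"
```

Keeping the source check separate from the action lets trusted local tool output skip scanning entirely, while all external sources share one policy.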
Impact
High. Addresses a self-identified High-risk gap in the threat model. As LLMs increasingly process untrusted external content, content-based scanning becomes essential alongside wrapper-based containment.
Environment
- OpenClaw 2026.4.10 (npm, macOS)