Problem
OpenClaw processes untrusted input from multiple surfaces (email, webhooks, chat, web scraping) and sends output back to those surfaces. There is currently no extensible middleware pipeline into which users can plug security layers such as:
- Inbound sanitization — stripping Unicode steganography, detecting encoded injection payloads, LLM-based classification
- Outbound content gating — catching leaked secrets, API keys, file paths, data exfiltration patterns before they leave the system
- Call governance — rate limiting, spend tracking, and dedup for LLM calls
- Access control — path jailing and URL safety checks (SSRF prevention)
The system already does a good job of wrapping external content in <<<EXTERNAL_UNTRUSTED_CONTENT>>> tags with security notices. This proposal builds on that with a full defense-in-depth pipeline.
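To make the call-governance item above concrete, here is a minimal sketch of a governor that enforces a lifetime call cap and a lifetime spend cap and deduplicates repeated prompts by SHA-256 hash. `CallGovernor`, `GovernorLimits`, and the `check`/`record` split are hypothetical names chosen for illustration, not an existing OpenClaw or prompt-shield API; per-window rate limiting is not shown.

```typescript
import { createHash } from "node:crypto";

// Hypothetical limits; in practice these would come from configuration.
interface GovernorLimits {
  maxCalls: number;    // lifetime volume limit
  maxSpendUsd: number; // lifetime spend limit
}

type Decision =
  | { kind: "proceed" }
  | { kind: "cached"; response: string }
  | { kind: "refused"; reason: string };

class CallGovernor {
  private readonly limits: GovernorLimits;
  private calls = 0;
  private spendUsd = 0;
  // SHA-256(model, prompt) -> previously returned response
  private readonly cache = new Map<string, string>();

  constructor(limits: GovernorLimits) {
    this.limits = limits;
  }

  private key(model: string, prompt: string): string {
    // A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
    return createHash("sha256").update(`${model}\u0000${prompt}`).digest("hex");
  }

  // Call before an LLM request: serve duplicates from cache, refuse over-limit calls.
  check(model: string, prompt: string, estimatedCostUsd: number): Decision {
    const cached = this.cache.get(this.key(model, prompt));
    if (cached !== undefined) return { kind: "cached", response: cached };
    if (this.calls >= this.limits.maxCalls) {
      return { kind: "refused", reason: "volume limit reached" };
    }
    if (this.spendUsd + estimatedCostUsd > this.limits.maxSpendUsd) {
      return { kind: "refused", reason: "spend limit reached" };
    }
    return { kind: "proceed" };
  }

  // Call after a successful request to update the counters and the dedup cache.
  record(model: string, prompt: string, response: string, actualCostUsd: number): void {
    this.calls += 1;
    this.spendUsd += actualCostUsd;
    this.cache.set(this.key(model, prompt), response);
  }
}
```

Splitting `check` from `record` lets the caller charge the actual cost reported after the call rather than the estimate, and a cache hit never reaches the model at all.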
Proposal
Add inbound and outbound middleware hooks to the gateway message processing pipeline:
Inbound Middleware Chain
Runs before the message reaches the agent:
Raw message → [Middleware 1] → [Middleware 2] → ... → Agent
Each middleware receives the message + metadata (source, sender, channel) and can:
- Modify the message (sanitize, strip dangerous content)
- Annotate it (add risk scores, detection metadata)
- Block it (return early with a rejection)
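A minimal sketch of how the gateway might run the inbound chain, assuming the `MiddlewareResult` shape from the Middleware Interface section: middlewares run in order, a `modify` result replaces the message the next middleware sees, `metadata` is collected per middleware, and the first `block` short-circuits the chain. `runInboundChain`, `ChainOutcome`, and the example `stripZeroWidth` middleware are hypothetical names, not existing OpenClaw APIs.

```typescript
// Types mirror the Middleware Interface section; MessageContext fields are assumed.
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// Hypothetical result handed back to the gateway.
type ChainOutcome =
  | { blocked: false; message: string; annotations: Record<string, unknown> }
  | { blocked: true; by: string; reason: string; annotations: Record<string, unknown> };

async function runInboundChain(
  chain: InboundMiddleware[],
  message: string,
  context: MessageContext,
): Promise<ChainOutcome> {
  let current = message;
  const annotations: Record<string, unknown> = {};
  for (const mw of chain) {
    const result = await mw.process(current, context);
    // Keep each layer's annotations under its own name so layers cannot clobber each other.
    if (result.metadata) annotations[mw.name] = result.metadata;
    if (result.action === "block") {
      // First block wins; later middlewares never see the message.
      return { blocked: true, by: mw.name, reason: result.reason ?? "blocked", annotations };
    }
    if (result.action === "modify" && result.message !== undefined) {
      current = result.message; // the next middleware sees the modified text
    }
  }
  return { blocked: false, message: current, annotations };
}

// Example middleware: strips zero-width characters and reports how many were removed.
const stripZeroWidth: InboundMiddleware = {
  name: "strip-zero-width",
  async process(message: string): Promise<MiddlewareResult> {
    const cleaned = message.replace(/[\u200B-\u200D\uFEFF]/g, "");
    if (cleaned === message) return { action: "pass" };
    return { action: "modify", message: cleaned, metadata: { removed: message.length - cleaned.length } };
  },
};
```

Running the chain sequentially (rather than in parallel) is what lets a sanitizer's `modify` feed a later scanner, at the cost of summed latency.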
Outbound Middleware Chain
Runs before the response leaves the system:
Agent reply → [Middleware 1] → [Middleware 2] → ... → Channel
Each middleware can:
- Redact sensitive content (API keys, personal emails, phone numbers)
- Block the response if it contains leaked secrets or exfil patterns
- Log findings for audit
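A sketch of one outbound middleware covering the three behaviours above, assuming the same `MiddlewareResult` shape. It blocks on a few illustrative secret patterns (the AWS access key ID format and PEM private-key headers), on markdown images whose URL carries a query string, and on Luhn-valid card numbers, recording findings in `metadata` for audit; otherwise it redacts email addresses. `outboundGate`, the pattern list, and the `[REDACTED_EMAIL]` placeholder are hypothetical, and unlike the redaction pipeline described under Context, this sketch redacts every email address rather than only personal-provider ones.

```typescript
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

// Illustrative patterns only; a real gate would carry a larger set.
const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["aws-access-key-id", /\bAKIA[0-9A-Z]{16}\b/],
  ["private-key-block", /-----BEGIN [A-Z ]*PRIVATE KEY-----/],
  // Markdown image whose URL carries a query string: a common exfiltration channel.
  ["markdown-image-exfil", /!\[[^\]]*\]\(https?:\/\/[^)\s]*\?[^)\s]*\)/],
];
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// 13 to 19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Luhn checksum: from the right, double every second digit (the rightmost is not doubled).
function luhnValid(digits: string): boolean {
  let sum = 0;
  let shouldDouble = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (shouldDouble) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    shouldDouble = !shouldDouble;
  }
  return sum % 10 === 0;
}

const outboundGate = {
  name: "outbound-gate",
  async process(reply: string, _context: MessageContext): Promise<MiddlewareResult> {
    const findings: string[] = [];
    for (const [label, pattern] of SECRET_PATTERNS) {
      if (pattern.test(reply)) findings.push(label);
    }
    for (const candidate of reply.match(CARD_CANDIDATE) ?? []) {
      if (luhnValid(candidate.replace(/[ -]/g, ""))) findings.push("card-number");
    }
    // Secrets, exfil URLs, and card numbers block the reply; findings go to metadata for audit.
    if (findings.length > 0) {
      return { action: "block", reason: `outbound findings: ${findings.join(", ")}`, metadata: { findings } };
    }
    // Lower-risk content is redacted instead of blocked.
    const emailCount = (reply.match(EMAIL) ?? []).length;
    if (emailCount === 0) return { action: "pass" };
    return {
      action: "modify",
      message: reply.replace(EMAIL, "[REDACTED_EMAIL]"),
      metadata: { redactedEmails: emailCount },
    };
  },
};
```

The Luhn check is what keeps arbitrary long digit runs (order numbers, timestamps) from being treated as card numbers; only about one in ten random digit strings passes it.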
Configuration
```yaml
middleware:
  inbound:
    - name: prompt-shield-sanitizer
      module: prompt-shield
      function: sanitize
      config:
        blockThreshold: 80
    - name: prompt-shield-scanner
      module: prompt-shield
      function: scan
      config:
        model: strongest-available
  outbound:
    - name: prompt-shield-redactor
      module: prompt-shield
      function: redact
    - name: prompt-shield-gate
      module: prompt-shield
      function: checkOutbound
```
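The configuration above has to be resolved into live middleware at startup. A minimal sketch, assuming the YAML has already been parsed into plain objects and that each module exposes factories through a registry; `Registry`, `Factory`, and `buildChain` are hypothetical names, and how OpenClaw would actually load `module`/`function` is left open here.

```typescript
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface Middleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// One entry of middleware.inbound / middleware.outbound, after YAML parsing.
interface MiddlewareEntry {
  name: string;
  module: string;
  function: string;
  config?: Record<string, unknown>;
}

// A factory turns an entry's `config` block into a ready-to-run process function.
type Factory = (config: Record<string, unknown>) => Middleware["process"];

// Hypothetical registry: module name -> exported function name -> factory.
type Registry = Record<string, Record<string, Factory>>;

function buildChain(registry: Registry, entries: MiddlewareEntry[] = []): Middleware[] {
  return entries.map((entry) => {
    const factory = registry[entry.module]?.[entry.function];
    if (!factory) {
      // Fail at startup instead of silently skipping a security layer.
      throw new Error(`middleware "${entry.name}": ${entry.module}.${entry.function} is not registered`);
    }
    return { name: entry.name, process: factory(entry.config ?? {}) };
  });
}
```

Passing each entry's `config` block to a factory once, at startup, means per-layer settings such as `blockThreshold` are validated before any message flows.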
Middleware Interface
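```typescript
// Metadata that accompanies each message through the chain.
interface MessageContext {
  source: string; // e.g. "email", "webhook", "chat", "web"
  sender: string;
  channel: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;                   // modified message (if action=modify)
  metadata?: Record<string, unknown>; // annotations (risk scores, detection metadata)
  reason?: string;                    // block reason (if action=block)
}
```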
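The interface leaves open what happens when a middleware throws or hangs, for example when an LLM-based scanner's API call fails. One option consistent with the zero-trust default listed under Benefits is to fail closed. A minimal sketch, with a hypothetical `failClosed` wrapper and an assumed 5-second default timeout:

```typescript
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// Hypothetical wrapper: an exception or timeout in `inner` becomes a block, never a pass.
function failClosed(inner: InboundMiddleware, timeoutMs = 5000): InboundMiddleware {
  return {
    name: inner.name,
    async process(message: string, context: MessageContext): Promise<MiddlewareResult> {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<MiddlewareResult>((resolve) => {
        timer = setTimeout(
          () => resolve({ action: "block", reason: `${inner.name} timed out after ${timeoutMs} ms` }),
          timeoutMs,
        );
      });
      try {
        return await Promise.race([inner.process(message, context), timeout]);
      } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        return { action: "block", reason: `${inner.name} failed: ${detail}` };
      } finally {
        clearTimeout(timer); // stop the timer once the race settles
      }
    },
  };
}
```

Failing closed trades availability for safety. Since the prompt-shield scanner is described as having source-aware error handling, a per-source policy (block for email and webhooks, annotate and pass for trusted chat, say) could replace the blanket block here.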
Context
I built prompt-shield — a 6-layer prompt injection defense system informed by attack techniques from Pliny's L1B3RT4S (jailbreak catalog) and P4RS3LT0NGV3 (79+ encoding/steganography techniques). It includes:
- Deterministic sanitizer — strips Unicode tags, variation selectors, Zalgo, normalizes Cyrillic/fullwidth/mathematical confusables, detects Base64/hex/ROT13/leetspeak encoded injections
- LLM-based scanner — dedicated classification prompt with structured output, score overrides, source-aware error handling
- Outbound content gate — secret detection (15 patterns), file path leakage, injection artifacts, markdown image exfiltration, Luhn-validated credit card detection
- Redaction pipeline — API keys, personal emails (filtered against 30 providers), phone numbers, dollar amounts
- Call governor — spend limits, volume limits, lifetime counters, SHA-256 dedup cache
- Access control — path jailing with deny lists, URL safety with private IP/SSRF blocking
All deterministic layers are synchronous with zero external dependencies. 154 tests passing.
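As an illustration of the synchronous, dependency-free style described above, here is a sketch of the Unicode-stripping part of a sanitizer. It is not prompt-shield's code: `sanitizeUnicode` is a hypothetical name, the three-mark Zalgo threshold is an assumption, and Cyrillic confusables and encoded-payload detection (Base64, hex, ROT13, leetspeak) are not covered.

```typescript
// Invisible code points used for smuggling: Unicode tag block, variation selectors
// (both ranges), zero-width space/non-joiner/joiner, word joiner, and BOM.
const INVISIBLE =
  /[\u{E0000}-\u{E007F}\u{FE00}-\u{FE0F}\u{E0100}-\u{E01EF}\u{200B}-\u{200D}\u{2060}\u{FEFF}]/gu;
// Runs of three or more combining marks (Zalgo-style stacking) are dropped entirely.
const STACKED_MARKS = /\p{M}{3,}/gu;

interface SanitizeReport {
  text: string;
  invisibleRemoved: number;
  markRunsRemoved: number;
  changed: boolean;
}

function sanitizeUnicode(input: string): SanitizeReport {
  const invisibleRemoved = (input.match(INVISIBLE) ?? []).length;
  let text = input.replace(INVISIBLE, "");
  // NFKC folds fullwidth and mathematical alphanumeric forms to plain letters.
  // It does NOT map Cyrillic look-alikes; those need a confusables table.
  text = text.normalize("NFKC");
  const markRunsRemoved = (text.match(STACKED_MARKS) ?? []).length;
  text = text.replace(STACKED_MARKS, "");
  return { text, invisibleRemoved, markRunsRemoved, changed: text !== input };
}
```

Normalizing before counting combining marks means an ordinary accented letter typed in decomposed form (`e` plus U+0301) is composed first and never counted as stacking. A middleware wrapper would return `modify` when `changed` is true and put the counts in `metadata`.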
The library is ready to plug in — OpenClaw just needs the hooks.
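For the access-control layer's URL safety check, a minimal sketch of SSRF blocking built on the WHATWG `URL` parser, which also normalizes obfuscated IPv4 forms such as `http://2130706433/` to `127.0.0.1`. `checkUrl` and the exact blocklist are assumptions. The sketch only inspects literal hosts; a production check would also resolve DNS names and re-check the resulting addresses at connect time to resist DNS rebinding.

```typescript
// Private, loopback, link-local, and carrier-grade NAT IPv4 ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                            // 0.0.0.0/8
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 carrier-grade NAT
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local (cloud metadata)
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

// Expects the WHATWG-serialized form (lowercase, compressed, brackets removed).
function isPrivateIPv6(host: string): boolean {
  return (
    host === "::" ||
    host === "::1" ||
    /^f[cd][0-9a-f]{2}:/.test(host) || // fc00::/7 unique local
    /^fe[89ab][0-9a-f]:/.test(host) || // fe80::/10 link-local
    host.startsWith("::ffff:")         // IPv4-mapped: blocked conservatively
  );
}

function checkUrl(raw: string): { allowed: boolean; reason?: string } {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { allowed: false, reason: "unparseable URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { allowed: false, reason: `scheme ${url.protocol} not allowed` };
  }
  // Drop a trailing dot so "localhost." is treated like "localhost".
  const host = url.hostname.replace(/\.$/, "");
  if (host === "localhost" || host.endsWith(".localhost")) {
    return { allowed: false, reason: "localhost" };
  }
  const isIPv6 = host.startsWith("[") && host.endsWith("]");
  if (isIPv6 ? isPrivateIPv6(host.slice(1, -1)) : isPrivateIPv4(host)) {
    return { allowed: false, reason: `private address ${host}` };
  }
  return { allowed: true };
}
```

Letting the standard parser canonicalize the host before any comparison is the key design choice: string matching on the raw URL would miss decimal, hex, and shorthand IPv4 spellings.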
Benefits
- Defense in depth — complements the existing <<<EXTERNAL_UNTRUSTED_CONTENT>>> wrapping
- Extensible — users can write custom middleware (compliance, logging, domain-specific filters)
- Configurable — enable/disable per layer, tune thresholds
- Zero-trust by default — security layers run regardless of what the agent "decides" to do