Feature: Inbound/outbound middleware hooks for security layers #39582

@9to5ai

Problem

OpenClaw processes untrusted input from multiple surfaces (email, webhooks, chat, web scraping) and sends output back to those surfaces. Currently, there is no extensible middleware pipeline where users can plug in security layers like:

  • Inbound sanitization — stripping Unicode steganography, detecting encoded injection payloads, LLM-based classification
  • Outbound content gating — catching leaked secrets, API keys, file paths, data exfiltration patterns before they leave the system
  • Call governance — rate limiting, spend tracking, and dedup for LLM calls
  • Access control — path jailing and URL safety checks (SSRF prevention)

The system already does good work wrapping external content in <<<EXTERNAL_UNTRUSTED_CONTENT>>> tags with security notices. This proposal extends that into a full defense-in-depth pipeline.

Proposal

Add inbound and outbound middleware hooks to the gateway message processing pipeline:

Inbound Middleware Chain

Runs before the message reaches the agent:

Raw message → [Middleware 1] → [Middleware 2] → ... → Agent

Each middleware receives the message + metadata (source, sender, channel) and can:

  • Modify the message (sanitize, strip dangerous content)
  • Annotate it (add risk scores, detection metadata)
  • Block it (return early with a rejection)
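The three outcomes above could be driven by a small runner like the following. This is a minimal sketch, not OpenClaw code: `runInbound` and `ChainOutcome` are hypothetical names, and the `MessageContext` fields are assumed from the metadata listed above (source, sender, channel).

```typescript
// Types mirror the proposal's Middleware Interface section.
// MessageContext fields are an assumption based on "source, sender, channel".
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// Hypothetical result of running the whole chain.
interface ChainOutcome {
  blocked: boolean;
  message: string;                      // final (possibly sanitized) message
  annotations: Record<string, unknown>; // metadata keyed by middleware name
  reason?: string;                      // set only when blocked
}

async function runInbound(
  chain: InboundMiddleware[],
  raw: string,
  context: MessageContext,
): Promise<ChainOutcome> {
  let message = raw;
  const annotations: Record<string, unknown> = {};
  for (const mw of chain) {
    const result = await mw.process(message, context);
    if (result.metadata !== undefined) annotations[mw.name] = result.metadata;
    if (result.action === "block") {
      // Stop immediately: later middleware and the agent never see the message.
      return { blocked: true, message, annotations, reason: result.reason ?? mw.name };
    }
    if (result.action === "modify" && result.message !== undefined) {
      message = result.message; // the next middleware sees the modified text
    }
  }
  return { blocked: false, message, annotations };
}
```

Running middleware strictly in order means a sanitizer placed first can normalize text before a scanner scores it, and a block short-circuits the rest of the chain.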

Outbound Middleware Chain

Runs before the response leaves the system:

Agent reply → [Middleware 1] → [Middleware 2] → ... → Channel

Each middleware can:

  • Redact sensitive content (API keys, personal emails, phone numbers)
  • Block the response if it contains leaked secrets or exfil patterns
  • Log findings for audit
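The outbound behaviors above might look like this in practice. The issue only specifies an inbound interface, so `OutboundMiddleware`, `OutboundResult`, and `runOutbound` are hypothetical shapes, and the `sk-` pattern is an illustrative stand-in rather than a real secret catalog.

```typescript
// Hypothetical outbound shapes; the proposal does not define these.
interface OutboundResult {
  action: "pass" | "modify" | "block";
  message?: string;    // redacted reply (if action=modify)
  findings?: string[]; // audit log entries
  reason?: string;     // block reason
}
type OutboundMiddleware = (reply: string) => OutboundResult;

// Illustrative pattern only: a real redactor would cover many key formats.
const API_KEY_LIKE = /\bsk-[A-Za-z0-9]{20,}\b/g;

const redactKeys: OutboundMiddleware = (reply) => {
  const hits = reply.match(API_KEY_LIKE) ?? [];
  if (hits.length === 0) return { action: "pass" };
  return {
    action: "modify",
    message: reply.replace(API_KEY_LIKE, "[REDACTED]"),
    findings: [`redacted ${hits.length} key-like token(s)`],
  };
};

// Gate: refuse to send anything containing a PEM private key header.
const blockPrivateKeys: OutboundMiddleware = (reply) =>
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/.test(reply)
    ? { action: "block", reason: "private key material in reply" }
    : { action: "pass" };

function runOutbound(chain: OutboundMiddleware[], reply: string) {
  let message = reply;
  const audit: string[] = [];
  for (const mw of chain) {
    const r = mw(message);
    audit.push(...(r.findings ?? []));
    if (r.action === "block") {
      return { sent: false as const, audit, reason: r.reason };
    }
    if (r.action === "modify" && r.message !== undefined) message = r.message;
  }
  return { sent: true as const, message, audit };
}
```

Placing the redactor before the gate lets recoverable leaks be scrubbed while anything the gate still flags is dropped entirely.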

Configuration

middleware:
  inbound:
    - name: prompt-shield-sanitizer
      module: prompt-shield
      function: sanitize
      config:
        blockThreshold: 80
    - name: prompt-shield-scanner
      module: prompt-shield
      function: scan
      config:
        model: strongest-available
  outbound:
    - name: prompt-shield-redactor
      module: prompt-shield
      function: redact
    - name: prompt-shield-gate
      module: prompt-shield
      function: checkOutbound
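One way the gateway could turn these entries into callable middleware is sketched below. The `MiddlewareEntry` keys mirror the YAML above; the `Registry` lookup, the `Handler` signature, and `resolve` are assumptions made for illustration, since the issue does not say how modules would be loaded.

```typescript
// Shape of one entry under middleware.inbound / middleware.outbound,
// mirroring the keys in the configuration above.
interface MiddlewareEntry {
  name: string;
  module: string;
  function: string;
  config?: Record<string, unknown>;
}

interface MiddlewareConfig {
  inbound: MiddlewareEntry[];
  outbound: MiddlewareEntry[];
}

// Hypothetical handler signature and registry:
// module name -> exported function name -> handler.
type Handler = (message: string, config: Record<string, unknown>) => string;
type Registry = Record<string, Record<string, Handler>>;

function resolve(
  entries: MiddlewareEntry[],
  registry: Registry,
): Array<(message: string) => string> {
  return entries.map((entry) => {
    const handler = registry[entry.module]?.[entry.function];
    if (handler === undefined) {
      // Fail closed at startup rather than silently skipping a security layer.
      throw new Error(
        `middleware "${entry.name}": ${entry.module}.${entry.function} not found`,
      );
    }
    const config = entry.config ?? {};
    return (message: string) => handler(message, config);
  });
}
```

Resolving every entry at startup, and throwing on a missing one, keeps a typo in the config from quietly disabling a layer.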

Middleware Interface

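// Context passed to every middleware. Fields follow the metadata listed under
// "Inbound Middleware Chain" (source, sender, channel); the string types are an assumption.
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;       // modified message (if action=modify)
  metadata?: Record<string, unknown>; // annotations (may accompany any action)
  reason?: string;        // block reason (if action=block)
}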
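As an illustration of the interface, a custom middleware that strips invisible Unicode carriers might look like this. It is a sketch under assumptions, not prompt-shield's implementation: `stripInvisible` and `isInvisible` are hypothetical names, and the `MessageContext` fields are assumed from the metadata named earlier in the issue.

```typescript
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// Code point ranges treated as invisible carriers in this sketch.
const INVISIBLE_RANGES: Array<[number, number]> = [
  [0xe0000, 0xe007f], // Tags block
  [0xfe00, 0xfe0f],   // Variation Selectors
  [0xe0100, 0xe01ef], // Variation Selectors Supplement
];

function isInvisible(codePoint: number): boolean {
  return INVISIBLE_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

const stripInvisible: InboundMiddleware = {
  name: "strip-invisible",
  async process(message: string): Promise<MiddlewareResult> {
    // Array.from splits by code point, so astral characters stay intact.
    const all = Array.from(message);
    const kept = all.filter((ch) => !isInvisible(ch.codePointAt(0) ?? 0));
    const removed = all.length - kept.length;
    if (removed === 0) return { action: "pass" };
    // Report how many code points were dropped as an annotation.
    return { action: "modify", message: kept.join(""), metadata: { removed } };
  },
};
```

Note that removing U+FE0F also changes how some emoji render, so a production filter may want a narrower rule for that range.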

Context

I built prompt-shield — a 6-layer prompt injection defense system informed by attack techniques from Pliny's L1B3RT4S (jailbreak catalog) and P4RS3LT0NGV3 (79+ encoding/steganography techniques). It includes:

  1. Deterministic sanitizer — strips Unicode tags, variation selectors, Zalgo, normalizes Cyrillic/fullwidth/mathematical confusables, detects Base64/hex/ROT13/leetspeak encoded injections
  2. LLM-based scanner — dedicated classification prompt with structured output, score overrides, source-aware error handling
  3. Outbound content gate — secret detection (15 patterns), file path leakage, injection artifacts, markdown image exfiltration, Luhn-validated credit card detection
  4. Redaction pipeline — API keys, personal emails (filtered against 30 providers), phone numbers, dollar amounts
  5. Call governor — spend limits, volume limits, lifetime counters, SHA-256 dedup cache
  6. Access control — path jailing with deny lists, URL safety with private IP/SSRF blocking
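Two of the deterministic checks named in the list, Luhn validation and private-IP blocking, are small enough to sketch. These are generic versions written for illustration, not prompt-shield's code; `luhnValid` and `isPrivateIPv4` are hypothetical names, and the IP check covers only dotted-decimal IPv4 literals.

```typescript
// Luhn checksum: double every second digit from the right, subtract 9 from
// any result above 9, and require the total to be divisible by 10.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false; // typical card number lengths
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// True for IPv4 literals in ranges an SSRF filter usually refuses:
// "this network", loopback, RFC 1918 private ranges, and link-local.
function isPrivateIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const [a, b] = parts.map(Number);
  if (parts.map(Number).some((n) => n > 255)) return false;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}
```

A literal check like this is only a first step: hostnames that resolve to private addresses, IPv6, and alternate numeric encodings need to be handled after DNS resolution.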

All deterministic layers are synchronous with zero external dependencies. 154 tests passing.

The library is ready to plug in — OpenClaw just needs the hooks.

Benefits

  • Defense in depth — complements the existing EXTERNAL_UNTRUSTED_CONTENT wrapping
  • Extensible — users can write custom middleware (compliance, logging, domain-specific filters)
  • Configurable — enable/disable per layer, tune thresholds
  • Zero-trust by default — security layers run regardless of what the agent "decides" to do

Labels: duplicate, security
