Feature: Inbound/outbound middleware hooks for security layers #39582

## Problem

OpenClaw processes untrusted input from multiple surfaces (email, webhooks, chat, web scraping) and sends output back to those surfaces. Currently, there is no extensible middleware pipeline where users can plug in security layers like:

- **Inbound sanitization** — stripping Unicode steganography, detecting encoded injection payloads, LLM-based classification
- **Outbound content gating** — catching leaked secrets, API keys, file paths, data exfiltration patterns before they leave the system
- **Call governance** — rate limiting, spend tracking, and dedup for LLM calls
- **Access control** — path jailing and URL safety checks (SSRF prevention)

The system already does good work wrapping external content in `<<<EXTERNAL_UNTRUSTED_CONTENT>>>` tags with security notices. This proposal extends that into a full defense-in-depth pipeline.

## Proposal

Add **inbound and outbound middleware hooks** to the gateway message processing pipeline:

### Inbound Middleware Chain
Runs before the message reaches the agent:

```
Raw message → [Middleware 1] → [Middleware 2] → ... → Agent
```

Each middleware receives the message plus its metadata (source, sender, channel) and can:
- **Modify** the message (sanitize, strip dangerous content)
- **Annotate** it (add risk scores, detection metadata)
- **Block** it (return early with a rejection)
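
As a rough illustration of the three outcomes above, here is a minimal sketch of how a gateway might run the inbound chain. The `runInboundChain` function and the `ChainOutcome` type are hypothetical names made up for this example, and the `MessageContext` fields are assumed from the metadata list above; none of this is existing OpenClaw code.

```typescript
// Hypothetical types, modeled on the interface proposed in this issue.
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// Hypothetical summary of what the chain decided.
interface ChainOutcome {
  blocked: boolean;
  message: string;
  annotations: Record<string, unknown>;
  reason?: string;
}

// Runs each middleware in order and stops at the first "block".
async function runInboundChain(
  chain: InboundMiddleware[],
  message: string,
  context: MessageContext,
): Promise<ChainOutcome> {
  let current = message;
  const annotations: Record<string, unknown> = {};
  for (const mw of chain) {
    const result = await mw.process(current, context);
    // Annotate: keep metadata namespaced by middleware name.
    if (result.metadata) annotations[mw.name] = result.metadata;
    // Block: return early so later middleware and the agent never see it.
    if (result.action === "block") {
      const reason = result.reason ?? "blocked by " + mw.name;
      return { blocked: true, message: current, annotations, reason };
    }
    // Modify: the next middleware receives the sanitized text.
    if (result.action === "modify" && result.message !== undefined) {
      current = result.message;
    }
  }
  return { blocked: false, message: current, annotations };
}
```

Running the chain sequentially means ordering matters: a sanitizer placed first ensures that later scanners classify the cleaned text rather than the raw input.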

### Outbound Middleware Chain
Runs before the response leaves the system:

```
Agent reply → [Middleware 1] → [Middleware 2] → ... → Channel
```

Each middleware can:
- **Redact** sensitive content (API keys, personal emails, phone numbers)
- **Block** the response if it contains leaked secrets or exfil patterns
- **Log** findings for audit
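
To make the redact/block/log split concrete, the following is a small sketch of an outbound gate. The function name `gateOutbound`, the `OutboundResult` type, and both regex patterns are illustrative assumptions; they are far simpler than a real secret detector and are not taken from `prompt-shield`.

```typescript
// Hypothetical result shape for an outbound check.
interface OutboundResult {
  action: "pass" | "modify" | "block";
  message?: string;   // redacted reply (if action=modify)
  findings: string[]; // labels to hand to an audit log
  reason?: string;    // block reason (if action=block)
}

// Illustrative patterns only; a real gate would need many more.
const REDACTIONS: Array<{ label: string; pattern: RegExp }> = [
  { label: "api-key", pattern: /\bsk-[A-Za-z0-9]{20,}\b/g },
  { label: "email", pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g },
];

// PEM-style private key header: too sensitive to redact in place, so block.
const BLOCK_PATTERN = /-----BEGIN [A-Z ]*PRIVATE KEY-----/;

function gateOutbound(reply: string): OutboundResult {
  if (BLOCK_PATTERN.test(reply)) {
    return {
      action: "block",
      findings: ["private-key"],
      reason: "reply contains a private key block",
    };
  }
  let redacted = reply;
  const findings: string[] = [];
  for (const { label, pattern } of REDACTIONS) {
    const next = redacted.replace(pattern, `[REDACTED:${label}]`);
    if (next !== redacted) findings.push(label);
    redacted = next;
  }
  return findings.length > 0
    ? { action: "modify", message: redacted, findings }
    : { action: "pass", findings };
}
```

Returning the findings alongside the action lets the gateway log what was caught without logging the secret itself.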

### Configuration

```yaml
middleware:
  inbound:
    - name: prompt-shield-sanitizer
      module: prompt-shield
      function: sanitize
      config:
        blockThreshold: 80
    - name: prompt-shield-scanner
      module: prompt-shield
      function: scan
      config:
        model: strongest-available
  outbound:
    - name: prompt-shield-redactor
      module: prompt-shield
      function: redact
    - name: prompt-shield-gate
      module: prompt-shield
      function: checkOutbound
```
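
One way the gateway could turn these entries into callable middleware is a lookup against a registry of loaded modules. The sketch below assumes the YAML has already been parsed into plain objects; `MiddlewareEntry`, `Registry`, and `resolveChain` are hypothetical names, and handlers are simplified to synchronous string transforms to keep the example short.

```typescript
// Shape of one entry under middleware.inbound or middleware.outbound.
interface MiddlewareEntry {
  name: string;
  module: string;
  function: string;
  config?: Record<string, unknown>;
}

// Simplified handler: takes the message and its per-entry config.
type Handler = (message: string, config: Record<string, unknown>) => string;

// Hypothetical registry: module name -> exported function name -> handler.
type Registry = Record<string, Record<string, Handler>>;

interface ResolvedMiddleware {
  name: string;
  run: (message: string) => string;
}

// Resolves every config entry up front so a typo fails at startup,
// not when the first message arrives.
function resolveChain(entries: MiddlewareEntry[], registry: Registry): ResolvedMiddleware[] {
  return entries.map((entry) => {
    const handler = registry[entry.module]?.[entry.function];
    if (typeof handler !== "function") {
      throw new Error(
        `middleware "${entry.name}": ${entry.module}.${entry.function} not found`,
      );
    }
    const config = entry.config ?? {};
    return { name: entry.name, run: (message: string) => handler(message, config) };
  });
}
```

Failing fast on an unresolved `module`/`function` pair seems preferable for a security layer, since silently skipping a misconfigured middleware would leave a gap in the chain.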

### Middleware Interface

```typescript
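// Metadata passed alongside each message. The fields mirror the
// "source, sender, channel" list above; the exact shape is an assumption.
interface MessageContext {
  source: string;   // e.g. "email", "webhook", "chat"
  sender: string;
  channel: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;                   // modified message (if action=modify)
  metadata?: Record<string, unknown>; // annotations (risk scores, detections)
  reason?: string;                    // block reason (if action=block)
}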
```
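
For a sense of what implementing this interface could look like, here is a minimal sketch of a middleware that strips invisible Unicode tag characters and variation selectors. The `stripInvisible` object and the `MessageContext` fields are assumptions for illustration; this is not how `prompt-shield` implements its sanitizer.

```typescript
// Assumed context shape, based on the metadata named in this issue.
interface MessageContext {
  source: string;
  sender: string;
  channel: string;
}

interface MiddlewareResult {
  action: "pass" | "modify" | "block";
  message?: string;
  metadata?: Record<string, unknown>;
  reason?: string;
}

interface InboundMiddleware {
  name: string;
  process(message: string, context: MessageContext): Promise<MiddlewareResult>;
}

// Unicode Tags block (U+E0000 to U+E007F) plus both variation selector
// ranges (U+FE00 to U+FE0F and U+E0100 to U+E01EF).
const INVISIBLE = /[\u{E0000}-\u{E007F}\u{FE00}-\u{FE0F}\u{E0100}-\u{E01EF}]/gu;

const stripInvisible: InboundMiddleware = {
  name: "strip-invisible",
  async process(message: string, _context: MessageContext): Promise<MiddlewareResult> {
    const cleaned = message.replace(INVISIBLE, "");
    if (cleaned === message) return { action: "pass" };
    // Count by code point, not UTF-16 unit, so astral characters count once.
    const removed = Array.from(message).length - Array.from(cleaned).length;
    return {
      action: "modify",
      message: cleaned,
      metadata: { removedCodePoints: removed },
    };
  },
};
```

Note that removing U+FE0F also changes how some emoji render, so a production sanitizer would likely want a more selective rule than this one.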

## Context

I built [`prompt-shield`](https://github.com/9to5ai/prompt-shield) — a 6-layer prompt injection defense system informed by attack techniques from [Pliny's L1B3RT4S](https://github.com/elder-plinius/L1B3RT4S) (jailbreak catalog) and [P4RS3LT0NGV3](https://github.com/elder-plinius/P4RS3LT0NGV3) (79+ encoding/steganography techniques). It includes:

1. **Deterministic sanitizer** — strips Unicode tags, variation selectors, Zalgo, normalizes Cyrillic/fullwidth/mathematical confusables, detects Base64/hex/ROT13/leetspeak encoded injections
2. **LLM-based scanner** — dedicated classification prompt with structured output, score overrides, source-aware error handling
3. **Outbound content gate** — secret detection (15 patterns), file path leakage, injection artifacts, markdown image exfiltration, Luhn-validated credit card detection
4. **Redaction pipeline** — API keys, personal emails (filtered against 30 providers), phone numbers, dollar amounts
5. **Call governor** — spend limits, volume limits, lifetime counters, SHA-256 dedup cache
6. **Access control** — path jailing with deny lists, URL safety with private IP/SSRF blocking

All deterministic layers are synchronous with zero external dependencies. 154 tests passing.
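
As an example of what a synchronous, dependency-free layer can look like, here is a sketch of the Luhn checksum used to cut false positives when scanning for card numbers. The function names `passesLuhn` and `findCardNumbers` and the candidate regex are assumptions written for this issue, not code copied from `prompt-shield`.

```typescript
// Luhn checksum: starting from the rightmost digit, double every second
// digit, subtract 9 from any result above 9, and require sum % 10 === 0.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk right to left; charCode 48 is "0".
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Finds runs of 13 to 19 digits (optionally separated by spaces or
// hyphens) and keeps only those that pass the checksum.
function findCardNumbers(text: string): string[] {
  const candidates = text.match(/\b\d(?:[ -]?\d){12,18}\b/g) ?? [];
  return candidates.filter(passesLuhn);
}
```

The checksum step matters because long digit runs (order IDs, timestamps) are common in agent output, and blocking on every one of them would make the gate unusable.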

The library is ready to plug in — OpenClaw just needs the hooks.

## Benefits

- **Defense in depth** — complements the existing `EXTERNAL_UNTRUSTED_CONTENT` wrapping
- **Extensible** — users can write custom middleware (compliance, logging, domain-specific filters)
- **Configurable** — enable/disable per layer, tune thresholds
- **Zero-trust by default** — security layers run regardless of what the agent "decides" to do
