[FEATURE]  Tool result transform hook for content sanitization

### Preflight Checklist

- [x] I have searched [existing requests](https://github.com/anthropics/claude-code/issues?q=is%3Aissue%20label%3Aenhancement) and this feature hasn't been requested yet
- [x] This is a single feature request (not multiple features)

### Problem Statement

### The Architectural Gap

Claude Code's hook system provides `PreToolUse` (before tool execution) and `PostToolUse` (after successful completion) hooks. However, there is a critical gap: **no hook can intercept and transform tool results before they enter Claude's context window**.

This matters because prompt injection attacks via external content are a documented, exploited vulnerability class affecting all major AI assistants:

| Product | Vulnerability | Disclosure Date | Source |
|---------|--------------|-----------------|--------|
| Claude Cowork | Data exfiltration via Anthropic API | Jan 2026 | PromptArmor / Embrace The Red |
| Microsoft Copilot | Reprompt attack (P2P injection) | Jan 2026 | Varonis Threat Labs |
| Slack AI | Private channel exfiltration | Aug 2024 | PromptArmor |
| Notion AI | Pre-approval data exfiltration | Jan 2026 | PromptArmor |
| Google Antigravity | Credential theft via browser agent | Dec 2025 | PromptArmor |

The common pattern: **external content (web pages, documents, API responses) contains hidden instructions that the model processes as commands**.

### Current Limitations

**`PreToolUse` hooks** can block tool execution but cannot see the result—they run *before* the tool executes.

**`PostToolUse` hooks** run *after* the tool completes but:
1. Cannot modify the tool result (confirmed in #4544, closed as duplicate)
2. By the time they execute, the content has already entered Claude's context
3. Any injection payload has already had opportunity to influence the model

**The fundamental problem:** There is no interception point where external content can be scanned, sanitized, or blocked *before* Claude processes it.

### Attack Scenario

```
1. User asks Claude to fetch a URL or read a document
2. Tool (WebFetch, Read, MCP tool) retrieves content
3. Content contains hidden prompt injection:
   - White-on-white text in documents
   - Microscopic font sizes (1-2pt)
   - HTML comments or invisible Unicode
   - Contextually-plausible "instructions to AI assistants"
4. Content enters Claude's context window
5. Injection influences Claude's subsequent behavior
6. Potential outcomes: data exfiltration, unauthorized actions, session hijacking
```

No current hook can intervene between steps 2 and 4.


### Proposed Solution

### New Hook Type: `ToolResultTransform`

Add a hook that executes **after a tool returns its result** but **before that result enters Claude's context window**, with the ability to:

1. **Inspect** the raw tool result
2. **Transform** the content (sanitize, redact, annotate)
3. **Block** the result entirely with a reason
4. **Pass through** unmodified

### Hook Specification

#### Trigger Point

```
Tool Invoked → Tool Executes → Tool Returns Result
                                        ↓
                              [ToolResultTransform Hook] ← NEW
                                        ↓
                              Result Enters Context Window
                                        ↓
                              Claude Processes Result
```

#### Configuration

```json
{
  "hooks": {
    "ToolResultTransform": [
      {
        "matcher": "WebFetch|WebSearch|Read|mcp__*",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/content-scanner.py",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```

#### Input (stdin JSON)

```json
{
  "tool_name": "WebFetch",
  "tool_input": {
    "url": "https://example.com/document"
  },
  "tool_result": {
    "content": "... raw content returned by tool ...",
    "content_type": "text/html",
    "status_code": 200
  },
  "session_id": "abc123",
  "timestamp": "2026-01-16T12:00:00Z"
}
```

#### Output (stdout JSON)

**Pass through (no modification):**
```json
{
  "action": "pass"
}
```

**Transform content:**
```json
{
  "action": "transform",
  "transformed_content": "... sanitized content ...",
  "annotations": [
    {
      "type": "warning",
      "message": "Removed 3 potential injection patterns"
    }
  ]
}
```

**Block entirely:**
```json
{
  "action": "block",
  "reason": "Content contains high-confidence prompt injection patterns",
  "details": {
    "patterns_detected": ["SYSTEM_OVERRIDE", "PRIORITY_INSTRUCTION"],
    "risk_score": 0.92
  }
}
```

#### Exit Codes

| Code | Behavior |
|------|----------|
| 0 | Process JSON output normally |
| 1 | Non-blocking error (log warning, pass content through) |
| 2 | Blocking error (block content, show stderr to user) |

### Alternative Solutions

### 1. External Proxy (e.g., claudemon)

Third-party tools like [claudemon](https://github.com/aupeachmo/claudemon) use mitmproxy to intercept network traffic. This works but:
- Requires complex setup (proxy configuration, CA certificates)
- Only works for network-based tools, not file reads or MCP tools
- Doesn't integrate with Claude Code's permission/logging systems

### 2. Disable Tools Entirely

Users can disable `WebFetch`, `WebSearch`, etc., but this eliminates legitimate functionality rather than adding defense-in-depth.

### 3. Rely on Model Guardrails

Current approach. Demonstrably insufficient—every major AI assistant has been exploited via prompt injection despite guardrails.

### 4. User-Side Pre-Processing

Users can manually fetch content, scan it externally, then paste it into Claude. This:
- Defeats the purpose of integrated tools
- Introduces friction that reduces adoption
- Doesn't scale for agentic workflows


### Priority

High - Significant impact on productivity

### Feature Category

Configuration and settings

### Use Case Example

#### 1. Prompt Injection Detection

```python
#!/usr/bin/env python3
import json
import sys
import re

INJECTION_PATTERNS = [
    r'\[SYSTEM\s*(INSTRUCTION|OVERRIDE|PROMPT)\]',
    r'<\s*/?SYSTEM\s*>',
    r'IGNORE\s+(ALL\s+)?PREVIOUS\s+INSTRUCTIONS',
    r'YOU\s+ARE\s+NOW\s+IN\s+.+\s+MODE',
    r'BEGIN\s+NEW\s+CONVERSATION',
]

def scan_content(content):
    findings = []
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, content, re.IGNORECASE):
            findings.append(pattern)
    return findings

hook_input = json.load(sys.stdin)
content = hook_input.get("tool_result", {}).get("content", "")

findings = scan_content(content)

if findings:
    print(json.dumps({
        "action": "block",
        "reason": f"Detected {len(findings)} potential injection pattern(s)",
        "details": {"patterns": findings}
    }))
else:
    print(json.dumps({"action": "pass"}))
```

#### 2. Document Sanitization (Hidden Text Detection)

```python
#!/usr/bin/env python3
# Detect hidden text in Office documents (DOCX, XLSX, PPTX)
import json
import sys
import zipfile
import re
from io import BytesIO
import base64

def scan_docx(content_bytes):
    findings = []
    with zipfile.ZipFile(BytesIO(content_bytes)) as z:
        if 'word/document.xml' in z.namelist():
            doc_xml = z.read('word/document.xml').decode('utf-8')
            
            # Microscopic font (sz < 4 = 2pt)
            if re.search(r'<w:sz\s+w:val="[0-3]"', doc_xml):
                findings.append("microscopic_font")
            
            # White text
            if re.search(r'<w:color\s+w:val="FFFFFF"', doc_xml, re.I):
                findings.append("white_text")
            
            # Hidden text property
            if '<w:vanish/>' in doc_xml or '<w:vanish ' in doc_xml:
                findings.append("hidden_text_property")
    
    return findings

# ... process and return action
```

#### 3. Content Redaction for Sensitive Environments

```python
# Redact PII, credentials, or sensitive patterns before Claude sees them
REDACT_PATTERNS = {
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b': '[EMAIL_REDACTED]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE_REDACTED]',
    r'sk-[a-zA-Z0-9]{48}': '[API_KEY_REDACTED]',
}

def redact_content(content):
    for pattern, replacement in REDACT_PATTERNS.items():
        content = re.sub(pattern, replacement, content)
    return content
```

#### 4. Token Budget Management

```python
# Truncate oversized responses to prevent context exhaustion
MAX_TOKENS = 50000  # Approximate

def estimate_tokens(text):
    return len(text) // 4  # Rough estimate

hook_input = json.load(sys.stdin)
content = hook_input["tool_result"]["content"]

if estimate_tokens(content) > MAX_TOKENS:
    truncated = content[:MAX_TOKENS * 4]
    print(json.dumps({
        "action": "transform",
        "transformed_content": truncated + "\n\n[Content truncated due to size]",
        "annotations": [{"type": "info", "message": "Content truncated to fit context"}]
    }))
else:
    print(json.dumps({"action": "pass"}))
```


### Additional Context

## Security Considerations

### Hook Trust Model

The hook runs with the user's permissions and is configured by the user. This aligns with Claude Code's existing security model where users are responsible for hook scripts they configure.

### Performance

- Hooks should have configurable timeouts (default: 30s)
- For high-frequency tools, users can exclude them from scanning via matcher patterns
- Async/parallel hook execution could be considered for multiple hooks

### Failure Modes

| Scenario | Recommended Behavior |
|----------|---------------------|
| Hook times out | Pass content through with warning |
| Hook crashes | Pass content through with warning |
| Invalid JSON output | Pass content through with warning |
| Hook returns malformed action | Pass content through with warning |

Fail-open by default (with logging) to avoid breaking workflows, but allow users to configure fail-closed behavior for high-security environments:

```json
{
  "hooks": {
    "ToolResultTransform": [{
      "matcher": "WebFetch",
      "failMode": "block",  // "pass" (default) or "block"
      "hooks": [...]
    }]
  }
}
```

## Related Issues

- #4544 - PostToolUse hooks that can modify tool output (closed as duplicate)
- #4320 - Integrated Runtime Sandboxing for Tool Execution (sandboxing, not content scanning)
- #6699 - Critical Security Bug: `deny` permissions not enforced (workaround via PreToolUse)
- #4831 - Hook for Tool Execution Failures (`OnToolError`)

## References

- [Varonis Threat Labs: Reprompt Attack](https://www.varonis.com/blog/reprompt) - Microsoft Copilot vulnerability
- [Embrace The Red: Claude Data Exfiltration](https://embracethered.com/blog/posts/2025/claude-abusing-network-access-and-anthropic-api-for-data-exfiltration/) - Johann Rehberger's research
- [PromptArmor Threat Intelligence](https://promptarmor.com/) - Multiple AI assistant vulnerabilities
- [Simon Willison's "Lethal Trifecta"](https://simonwillison.net/) - Private data + untrusted content + external communication

## Summary

Prompt injection via external content is not a theoretical risk—it's an actively exploited vulnerability class. The current hook architecture cannot address it because no hook can transform tool results before context ingestion.

Adding `ToolResultTransform` would:
1. Enable defense-in-depth against prompt injection
2. Allow content sanitization/redaction for compliance requirements
3. Support token budget management
4. Maintain Claude Code's user-controlled, scriptable security model

This is a relatively small architectural change (one new hook point) with significant security benefits.

Product	Vulnerability	Disclosure Date	Source
Claude Cowork	Data exfiltration via Anthropic API	Jan 2026	PromptArmor / Embrace The Red
Microsoft Copilot	Reprompt attack (P2P injection)	Jan 2026	Varonis Threat Labs
Slack AI	Private channel exfiltration	Aug 2024	PromptArmor
Notion AI	Pre-approval data exfiltration	Jan 2026	PromptArmor
Google Antigravity	Credential theft via browser agent	Dec 2025	PromptArmor

Code	Behavior
0	Process JSON output normally
1	Non-blocking error (log warning, pass content through)
2	Blocking error (block content, show stderr to user)

Scenario	Recommended Behavior
Hook times out	Pass content through with warning
Hook crashes	Pass content through with warning
Invalid JSON output	Pass content through with warning
Hook returns malformed action	Pass content through with warning

[FEATURE] Tool result transform hook for content sanitization #18653

Description

Preflight Checklist

Problem Statement

The Architectural Gap

Current Limitations

Attack Scenario

Proposed Solution

New Hook Type: ToolResultTransform

Hook Specification

Trigger Point

Configuration

Input (stdin JSON)

Output (stdout JSON)

Exit Codes

Alternative Solutions

1. External Proxy (e.g., claudemon)

2. Disable Tools Entirely

3. Rely on Model Guardrails

4. User-Side Pre-Processing

Priority

Feature Category

Use Case Example

1. Prompt Injection Detection

2. Document Sanitization (Hidden Text Detection)

3. Content Redaction for Sensitive Environments

4. Token Budget Management

Additional Context

Security Considerations

Hook Trust Model

Performance

Failure Modes

Related Issues

References

Summary

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

New Hook Type: `ToolResultTransform`