Skip to content

[FEATURE] Tool result transform hook for content sanitization #18653

@evilfurryone

Description

@evilfurryone

Preflight Checklist

  • I have searched existing requests and this feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

The Architectural Gap

Claude Code's hook system provides PreToolUse (before tool execution) and PostToolUse (after successful completion) hooks. However, there is a critical gap: no hook can intercept and transform tool results before they enter Claude's context window.

This matters because prompt injection attacks via external content are a documented, exploited vulnerability class affecting all major AI assistants:

Product Vulnerability Disclosure Date Source
Claude Cowork Data exfiltration via Anthropic API Jan 2026 PromptArmor / Embrace The Red
Microsoft Copilot Reprompt attack (P2P injection) Jan 2026 Varonis Threat Labs
Slack AI Private channel exfiltration Aug 2024 PromptArmor
Notion AI Pre-approval data exfiltration Jan 2026 PromptArmor
Google Antigravity Credential theft via browser agent Dec 2025 PromptArmor

The common pattern: external content (web pages, documents, API responses) contains hidden instructions that the model processes as commands.

Current Limitations

PreToolUse hooks can block tool execution but cannot see the result—they run before the tool executes.

PostToolUse hooks run after the tool completes but:

  1. Cannot modify the tool result (confirmed in Feature request: PostToolUse hooks that can modify tool output #4544, closed as duplicate)
  2. By the time they execute, the content has already entered Claude's context
  3. Any injection payload has already had opportunity to influence the model

The fundamental problem: There is no interception point where external content can be scanned, sanitized, or blocked before Claude processes it.

Attack Scenario

1. User asks Claude to fetch a URL or read a document
2. Tool (WebFetch, Read, MCP tool) retrieves content
3. Content contains hidden prompt injection:
   - White-on-white text in documents
   - Microscopic font sizes (1-2pt)
   - HTML comments or invisible Unicode
   - Contextually-plausible "instructions to AI assistants"
4. Content enters Claude's context window
5. Injection influences Claude's subsequent behavior
6. Potential outcomes: data exfiltration, unauthorized actions, session hijacking

No current hook can intervene between steps 2 and 4.

Proposed Solution

New Hook Type: ToolResultTransform

Add a hook that executes after a tool returns its result but before that result enters Claude's context window, with the ability to:

  1. Inspect the raw tool result
  2. Transform the content (sanitize, redact, annotate)
  3. Block the result entirely with a reason
  4. Pass through unmodified

Hook Specification

Trigger Point

Tool Invoked → Tool Executes → Tool Returns Result
                                        ↓
                              [ToolResultTransform Hook] ← NEW
                                        ↓
                              Result Enters Context Window
                                        ↓
                              Claude Processes Result

Configuration

{
  "hooks": {
    "ToolResultTransform": [
      {
        "matcher": "WebFetch|WebSearch|Read|mcp__*",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/content-scanner.py",
            "timeout": 30
          }
        ]
      }
    ]
  }
}

Input (stdin JSON)

{
  "tool_name": "WebFetch",
  "tool_input": {
    "url": "https://example.com/document"
  },
  "tool_result": {
    "content": "... raw content returned by tool ...",
    "content_type": "text/html",
    "status_code": 200
  },
  "session_id": "abc123",
  "timestamp": "2026-01-16T12:00:00Z"
}

Output (stdout JSON)

Pass through (no modification):

{
  "action": "pass"
}

Transform content:

{
  "action": "transform",
  "transformed_content": "... sanitized content ...",
  "annotations": [
    {
      "type": "warning",
      "message": "Removed 3 potential injection patterns"
    }
  ]
}

Block entirely:

{
  "action": "block",
  "reason": "Content contains high-confidence prompt injection patterns",
  "details": {
    "patterns_detected": ["SYSTEM_OVERRIDE", "PRIORITY_INSTRUCTION"],
    "risk_score": 0.92
  }
}

Exit Codes

Code Behavior
0 Process JSON output normally
1 Non-blocking error (log warning, pass content through)
2 Blocking error (block content, show stderr to user)

Alternative Solutions

1. External Proxy (e.g., claudemon)

Third-party tools like claudemon use mitmproxy to intercept network traffic. This works but:

  • Requires complex setup (proxy configuration, CA certificates)
  • Only works for network-based tools, not file reads or MCP tools
  • Doesn't integrate with Claude Code's permission/logging systems

2. Disable Tools Entirely

Users can disable WebFetch, WebSearch, etc., but this eliminates legitimate functionality rather than adding defense-in-depth.

3. Rely on Model Guardrails

Current approach. Demonstrably insufficient—every major AI assistant has been exploited via prompt injection despite guardrails.

4. User-Side Pre-Processing

Users can manually fetch content, scan it externally, then paste it into Claude. This:

  • Defeats the purpose of integrated tools
  • Introduces friction that reduces adoption
  • Doesn't scale for agentic workflows

Priority

High - Significant impact on productivity

Feature Category

Configuration and settings

Use Case Example

1. Prompt Injection Detection

#!/usr/bin/env python3
import json
import sys
import re

INJECTION_PATTERNS = [
    r'\[SYSTEM\s*(INSTRUCTION|OVERRIDE|PROMPT)\]',
    r'<\s*/?SYSTEM\s*>',
    r'IGNORE\s+(ALL\s+)?PREVIOUS\s+INSTRUCTIONS',
    r'YOU\s+ARE\s+NOW\s+IN\s+.+\s+MODE',
    r'BEGIN\s+NEW\s+CONVERSATION',
]

def scan_content(content):
    findings = []
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, content, re.IGNORECASE):
            findings.append(pattern)
    return findings

hook_input = json.load(sys.stdin)
content = hook_input.get("tool_result", {}).get("content", "")

findings = scan_content(content)

if findings:
    print(json.dumps({
        "action": "block",
        "reason": f"Detected {len(findings)} potential injection pattern(s)",
        "details": {"patterns": findings}
    }))
else:
    print(json.dumps({"action": "pass"}))

2. Document Sanitization (Hidden Text Detection)

#!/usr/bin/env python3
# Detect hidden text in Office documents (DOCX, XLSX, PPTX)
import json
import sys
import zipfile
import re
from io import BytesIO
import base64

def scan_docx(content_bytes):
    findings = []
    with zipfile.ZipFile(BytesIO(content_bytes)) as z:
        if 'word/document.xml' in z.namelist():
            doc_xml = z.read('word/document.xml').decode('utf-8')
            
            # Microscopic font (sz < 4 = 2pt)
            if re.search(r'<w:sz\s+w:val="[0-3]"', doc_xml):
                findings.append("microscopic_font")
            
            # White text
            if re.search(r'<w:color\s+w:val="FFFFFF"', doc_xml, re.I):
                findings.append("white_text")
            
            # Hidden text property
            if '<w:vanish/>' in doc_xml or '<w:vanish ' in doc_xml:
                findings.append("hidden_text_property")
    
    return findings

# ... process and return action

3. Content Redaction for Sensitive Environments

# Redact PII, credentials, or sensitive patterns before Claude sees them
REDACT_PATTERNS = {
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b': '[EMAIL_REDACTED]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE_REDACTED]',
    r'sk-[a-zA-Z0-9]{48}': '[API_KEY_REDACTED]',
}

def redact_content(content):
    for pattern, replacement in REDACT_PATTERNS.items():
        content = re.sub(pattern, replacement, content)
    return content

4. Token Budget Management

# Truncate oversized responses to prevent context exhaustion
MAX_TOKENS = 50000  # Approximate

def estimate_tokens(text):
    return len(text) // 4  # Rough estimate

hook_input = json.load(sys.stdin)
content = hook_input["tool_result"]["content"]

if estimate_tokens(content) > MAX_TOKENS:
    truncated = content[:MAX_TOKENS * 4]
    print(json.dumps({
        "action": "transform",
        "transformed_content": truncated + "\n\n[Content truncated due to size]",
        "annotations": [{"type": "info", "message": "Content truncated to fit context"}]
    }))
else:
    print(json.dumps({"action": "pass"}))

Additional Context

Security Considerations

Hook Trust Model

The hook runs with the user's permissions and is configured by the user. This aligns with Claude Code's existing security model where users are responsible for hook scripts they configure.

Performance

  • Hooks should have configurable timeouts (default: 30s)
  • For high-frequency tools, users can exclude them from scanning via matcher patterns
  • Async/parallel hook execution could be considered for multiple hooks

Failure Modes

Scenario Recommended Behavior
Hook times out Pass content through with warning
Hook crashes Pass content through with warning
Invalid JSON output Pass content through with warning
Hook returns malformed action Pass content through with warning

Fail-open by default (with logging) to avoid breaking workflows, but allow users to configure fail-closed behavior for high-security environments:

{
  "hooks": {
    "ToolResultTransform": [{
      "matcher": "WebFetch",
      "failMode": "block",  // "pass" (default) or "block"
      "hooks": [...]
    }]
  }
}

Related Issues

References

Summary

Prompt injection via external content is not a theoretical risk—it's an actively exploited vulnerability class. The current hook architecture cannot address it because no hook can transform tool results before context ingestion.

Adding ToolResultTransform would:

  1. Enable defense-in-depth against prompt injection
  2. Allow content sanitization/redaction for compliance requirements
  3. Support token budget management
  4. Maintain Claude Code's user-controlled, scriptable security model

This is a relatively small architectural change (one new hook point) with significant security benefits.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions