RFC: AWS Bedrock Guardrails Integration (ApplyGuardrail API) #9748

@koushikkethamakka

Summary

Add support for AWS Bedrock Guardrails via the ApplyGuardrail API to provide content filtering, PII detection, and prompt attack prevention across the OpenClaw data flow.

Motivation

Bedrock Guardrails provide enterprise-grade content safety features:

  • Content filtering (hate, insults, sexual, violence, misconduct, prompt attacks)
  • PII detection and masking (names, emails, SSN, etc.)
  • Denied topics (custom topic blocking)
  • Word filters (custom blocklists)

Critically, the ApplyGuardrail API works independently of the model provider: Bedrock guardrails can be applied even when inference goes through the Anthropic API directly, OpenAI, or any other provider. This makes it a provider-agnostic safety layer.

Proposed Architecture

Hook Points

Instead of just wrapping model inference, guardrails should be applied at multiple points in the data flow:

Hook           When                            Purpose
-------------  ------------------------------  --------------------------------------------------
input          User message received           Block prompt attacks, denied topics
output         Model response ready            Filter harmful content before delivery
memory.write   Before saving to memory files   Prevent PII/secrets from persisting
memory.read    After memory_search retrieval   Check retrieved context before use
tool.result    After tool execution            Catch sensitive data in file contents, exec output
web.search     After search results return     Filter injected prompts in results
web.fetch      After URL content fetched       Check external content before context
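
The table lists seven hooks, while ApplyGuardrail only distinguishes two sources (INPUT and OUTPUT), so each hook needs to be mapped onto one of them. The following is a minimal sketch of one possible mapping. The mapping itself and the helper name `sourceForHook` are assumptions for illustration, not part of this proposal: text produced by the model is treated as OUTPUT and anything entering the model's context as INPUT.

```typescript
// The seven hook names come from the table above.
type GuardrailHook =
  | "input"
  | "output"
  | "memory.write"
  | "memory.read"
  | "tool.result"
  | "web.search"
  | "web.fetch";

// ApplyGuardrail only distinguishes these two sources.
type GuardrailSource = "INPUT" | "OUTPUT";

// Assumed mapping (not specified in this RFC): model-generated text is
// checked as OUTPUT, everything entering the model's context as INPUT.
const HOOK_SOURCE: Record<GuardrailHook, GuardrailSource> = {
  input: "INPUT",
  "memory.read": "INPUT",
  "tool.result": "INPUT",
  "web.search": "INPUT",
  "web.fetch": "INPUT",
  output: "OUTPUT",
  "memory.write": "OUTPUT",
};

// Hypothetical helper: look up the source to send for a given hook.
function sourceForHook(hook: GuardrailHook): GuardrailSource {
  return HOOK_SOURCE[hook];
}
```

Whether memory.write belongs under OUTPUT depends on who authors memory entries, so that row in particular is a judgment call.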

Data Flow

User Input
    ↓
[Guardrail: INPUT] ← prompt attack detection
    ↓
Memory Search
    ↓
[Guardrail: MEMORY_READ] ← PII check on retrieved context
    ↓
Web Search/Fetch
    ↓
[Guardrail: WEB_*] ← external content validation
    ↓
Tool Calls
    ↓
[Guardrail: TOOL_RESULT] ← sensitive data in outputs
    ↓
Model Inference
    ↓
[Guardrail: OUTPUT] ← content filtering
    ↓
Memory Write
    ↓
[Guardrail: MEMORY_WRITE] ← prevent PII persistence
    ↓
Response to User
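
As a rough illustration of the flow above, the sketch below walks the checkpoints in diagram order and stops at the first one that blocks. The names `runFlow`, `Check`, and `FlowResult` are hypothetical, the two web hooks are collapsed into a single "WEB" checkpoint for brevity, and the check is an injected function rather than a real ApplyGuardrail call.

```typescript
type Checkpoint =
  | "INPUT"
  | "MEMORY_READ"
  | "WEB"
  | "TOOL_RESULT"
  | "OUTPUT"
  | "MEMORY_WRITE";

// Order taken from the data-flow diagram above.
const FLOW: readonly Checkpoint[] = [
  "INPUT",
  "MEMORY_READ",
  "WEB",
  "TOOL_RESULT",
  "OUTPUT",
  "MEMORY_WRITE",
];

// Hypothetical check signature: resolves to true when the text may proceed.
type Check = (checkpoint: Checkpoint, text: string) => Promise<boolean>;

interface FlowResult {
  passed: Checkpoint[]; // checkpoints that allowed their text through
  blockedAt?: Checkpoint; // first checkpoint that blocked, if any
}

// Runs the check at every checkpoint that has text, in flow order,
// and short-circuits on the first block.
async function runFlow(
  textAt: Partial<Record<Checkpoint, string>>,
  check: Check,
): Promise<FlowResult> {
  const passed: Checkpoint[] = [];
  for (const checkpoint of FLOW) {
    const text = textAt[checkpoint];
    if (text === undefined) continue; // stage produced nothing to check
    if (!(await check(checkpoint, text))) {
      return { passed, blockedAt: checkpoint };
    }
    passed.push(checkpoint);
  }
  return { passed };
}
```

In a real agent loop each stage would produce its text only after the previous checkpoint passed; the precomputed `textAt` map is a simplification to keep the ordering and short-circuit behavior visible.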

Configuration

{
  guardrails: {
    bedrock: {
      enabled: true,
      guardrailId: "abc123def",
      guardrailVersion: "DRAFT", // or version number
      region: "us-east-1",
      
      // Enable/disable specific hooks
      hooks: {
        input: true,
        output: true,
        memoryWrite: true,
        memoryRead: false,  // might be noisy
        toolResult: true,
        webSearch: true,
        webFetch: true,
      },
      
      // Behavior when guardrail triggers
      onBlock: "reject",     // reject | warn | log
      onPiiDetected: "mask", // mask | reject | log
    }
  }
}
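
To make the `onBlock` and `onPiiDetected` settings concrete, here is a small sketch of how a guardrail verdict might be turned into an action. The `Verdict` and `Outcome` shapes and the `resolveOutcome` function are hypothetical, and the fallback of rejecting when masking is requested but no masked text is available is an assumption (a fail-closed choice), not something the proposal specifies.

```typescript
type OnBlock = "reject" | "warn" | "log";
type OnPiiDetected = "mask" | "reject" | "log";

// Hypothetical, simplified view of a guardrail verdict.
interface Verdict {
  blocked: boolean; // a content, topic, or word policy fired
  piiFound: boolean; // a sensitive-information policy fired
  maskedText?: string; // text with PII replaced, when available
}

interface Outcome {
  deliver: string | null; // text to pass on, or null when rejected
  note?: string; // message for the caller to surface or log
}

function resolveOutcome(
  original: string,
  verdict: Verdict,
  policy: { onBlock: OnBlock; onPiiDetected: OnPiiDetected },
): Outcome {
  if (verdict.blocked) {
    if (policy.onBlock === "reject") {
      return { deliver: null, note: "blocked by guardrail" };
    }
    // "warn" and "log" both let the text through; the caller decides
    // whether the note is shown to the user or only written to logs.
    return { deliver: original, note: `${policy.onBlock}: block policy fired` };
  }
  if (verdict.piiFound) {
    if (policy.onPiiDetected === "log") {
      return { deliver: original, note: "log: PII detected" };
    }
    if (policy.onPiiDetected === "mask" && verdict.maskedText !== undefined) {
      return { deliver: verdict.maskedText, note: "PII masked" };
    }
    // "reject", or "mask" without masked text (assumed fail-closed).
    return { deliver: null, note: "PII detected" };
  }
  return { deliver: original };
}
```

Block handling is checked before PII handling here, so a message that trips both is governed by `onBlock`; the opposite precedence would be equally defensible.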

Implementation

New files:

  • src/agents/bedrock-guardrails.ts - Core ApplyGuardrail wrapper
  • src/agents/guardrails-hooks.ts - Hook registration and execution
  • src/config/types.guardrails.ts - Config schema

Key functions:

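// Core wrapper around the ApplyGuardrail API; resolves with the guardrail's verdict.
declare function applyGuardrail(params: {
  content: string;             // text to evaluate
  source: "INPUT" | "OUTPUT";  // the two sources ApplyGuardrail accepts
  guardrailId: string;
  guardrailVersion: string;    // "DRAFT" or a version number
}): Promise<GuardrailResult>;

// Registers a handler to run at one of the hook points listed above.
declare function registerGuardrailHook(
  hook: GuardrailHook,
  handler: GuardrailHandler
): void;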
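
One way these two functions could be fleshed out is sketched below. The request and response interfaces declare only the ApplyGuardrail fields used here. `GuardrailResult`, `GuardrailHook`, and `GuardrailHandler` are named in this RFC but not defined, so their shapes are assumptions, as are the helpers `buildRequest` and `toResult`. The extra `send` parameter is also an addition to the proposed signature: it stands in for the SDK call so the sketch stays self-contained.

```typescript
// Only the ApplyGuardrail request/response fields used here are declared.
interface ApplyGuardrailRequest {
  guardrailIdentifier: string;
  guardrailVersion: string;
  source: "INPUT" | "OUTPUT";
  content: { text: { text: string } }[];
}
interface ApplyGuardrailResponse {
  action?: "NONE" | "GUARDRAIL_INTERVENED";
  outputs?: { text?: string }[];
}

// Hypothetical shapes: the RFC names these types but does not define them.
interface GuardrailResult {
  intervened: boolean;
  text: string; // original text, or the guardrail's masked/replacement text
}
type GuardrailHook =
  | "input" | "output" | "memory.write" | "memory.read"
  | "tool.result" | "web.search" | "web.fetch";
type GuardrailHandler = (text: string) => Promise<GuardrailResult>;

interface ApplyParams {
  content: string;
  source: "INPUT" | "OUTPUT";
  guardrailId: string;
  guardrailVersion: string;
}

// Stand-in for the SDK call, e.g. with @aws-sdk/client-bedrock-runtime:
//   (req) => client.send(new ApplyGuardrailCommand(req))
type Send = (req: ApplyGuardrailRequest) => Promise<ApplyGuardrailResponse>;

// Translates the proposed parameter names into the API's request shape.
function buildRequest(p: ApplyParams): ApplyGuardrailRequest {
  return {
    guardrailIdentifier: p.guardrailId,
    guardrailVersion: p.guardrailVersion,
    source: p.source,
    content: [{ text: { text: p.content } }],
  };
}

// Keeps the original text unless the guardrail intervened.
function toResult(original: string, res: ApplyGuardrailResponse): GuardrailResult {
  if (res.action !== "GUARDRAIL_INTERVENED") {
    return { intervened: false, text: original };
  }
  const text = (res.outputs ?? []).map((o) => o.text ?? "").join("");
  return { intervened: true, text };
}

async function applyGuardrail(p: ApplyParams, send: Send): Promise<GuardrailResult> {
  return toResult(p.content, await send(buildRequest(p)));
}

// In-memory registry: several handlers may be attached to one hook.
const handlers = new Map<GuardrailHook, GuardrailHandler[]>();

function registerGuardrailHook(hook: GuardrailHook, handler: GuardrailHandler): void {
  handlers.set(hook, [...(handlers.get(hook) ?? []), handler]);
}
```

Injecting `send` keeps the wrapper testable without AWS credentials; a production version would more likely construct the client once from the configured region and close over it.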

AWS Permissions Required

  • bedrock:ApplyGuardrail

Authentication uses the existing AWS SDK credential provider chain (same as Bedrock inference).
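
For reference, a least-privilege IAM policy granting this permission might look like the document built below. This is only a sketch: the `guardrailPolicy` helper is hypothetical, the account ID is a placeholder, and scoping the statement to a single guardrail ARN is an assumption about how tightly a deployment would want to restrict access.

```typescript
// Builds an IAM policy document allowing ApplyGuardrail on one guardrail.
// Hypothetical helper; account ID and scoping are illustrative only.
function guardrailPolicy(region: string, accountId: string, guardrailId: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["bedrock:ApplyGuardrail"],
        Resource: [
          `arn:aws:bedrock:${region}:${accountId}:guardrail/${guardrailId}`,
        ],
      },
    ],
  };
}

// Example: print the policy for the guardrail used in the config above.
console.log(
  JSON.stringify(guardrailPolicy("us-east-1", "123456789012", "abc123def"), null, 2),
);
```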

Use Cases

  1. Enterprise compliance - Prevent PII from leaking into logs/memory
  2. Content safety - Block harmful outputs before delivery
  3. Prompt injection defense - Check external content (web, RAG) for attacks
  4. Audit logging - Track what guardrails caught

Open Questions

  1. Should guardrail checks be async/non-blocking for low-priority hooks?
  2. How should we handle the latency that guardrail checks add to response time?
  3. Should we support multiple guardrail configurations for different hooks?
  4. How should this integrate with the existing tool policy system?
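
As input to questions 1 and 2, one possible direction is to bound each check with a timeout and let each hook choose whether to fail open or closed. The sketch below is an illustration of that idea only, not a decision; `checkWithTimeout` and `FailMode` are hypothetical names.

```typescript
// "open": allow the text if the check times out; "closed": block it.
type FailMode = "open" | "closed";

// Resolves to the check's verdict (true = allowed), or to the fail-mode
// fallback if the check has not settled within timeoutMs.
// Errors thrown by the check itself propagate to the caller.
async function checkWithTimeout(
  check: () => Promise<boolean>,
  timeoutMs: number,
  failMode: FailMode,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(failMode === "open"), timeoutMs);
  });
  try {
    return await Promise.race([check(), timeout]);
  } finally {
    // Always clear the timer so a fast check does not leave it pending.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Under this scheme a low-priority hook such as memory.read could fail open with a short timeout, while input and output could fail closed.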


Happy to implement this if the design direction looks good. Would love feedback on the hook architecture.
