## Summary
Add support for AWS Bedrock Guardrails via the ApplyGuardrail API to provide content filtering, PII detection, and prompt attack prevention across the OpenClaw data flow.
## Motivation
Bedrock Guardrails provide enterprise-grade content safety features:
- Content filtering (hate, violence, sexual, misconduct, prompt attacks)
- PII detection and masking (names, emails, SSN, etc.)
- Denied topics (custom topic blocking)
- Word filters (custom blocklists)
Critically, the ApplyGuardrail API works independently of the model provider - you can use Bedrock guardrails even when using Anthropic direct API, OpenAI, or any other provider. This makes it a universal safety layer.
## Proposed Architecture

### Hook Points
Instead of just wrapping model inference, guardrails should be applied at multiple points in the data flow:
| Hook | When | Purpose |
|---|---|---|
| `input` | User message received | Block prompt attacks, denied topics |
| `output` | Model response ready | Filter harmful content before delivery |
| `memory.write` | Before saving to memory files | Prevent PII/secrets from persisting |
| `memory.read` | After `memory_search` retrieval | Check retrieved context before use |
| `tool.result` | After tool execution | Catch sensitive data in file contents, exec output |
| `web.search` | After search results return | Filter injected prompts in results |
| `web.fetch` | After URL content fetched | Check external content before context |
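The hook names above could be modeled as a string union so that dispatch sites and the camelCase config keys stay in sync. A minimal sketch (`GuardrailHook`, `hookConfigKey`, and `isHookEnabled` are proposed names, not existing code):

```typescript
// Proposed hook identifiers, mirroring the table above.
type GuardrailHook =
  | "input"
  | "output"
  | "memory.write"
  | "memory.read"
  | "tool.result"
  | "web.search"
  | "web.fetch";

// Map dotted hook names to the camelCase keys used in config.hooks.
const hookConfigKey: Record<GuardrailHook, string> = {
  "input": "input",
  "output": "output",
  "memory.write": "memoryWrite",
  "memory.read": "memoryRead",
  "tool.result": "toolResult",
  "web.search": "webSearch",
  "web.fetch": "webFetch",
};

// A hook is active only when explicitly enabled in config.
function isHookEnabled(
  hook: GuardrailHook,
  hooks: Record<string, boolean>
): boolean {
  return hooks[hookConfigKey[hook]] === true;
}
```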
### Data Flow

```
User Input
  ↓
[Guardrail: INPUT] ← prompt attack detection
  ↓
Memory Search
  ↓
[Guardrail: MEMORY_READ] ← PII check on retrieved context
  ↓
Web Search/Fetch
  ↓
[Guardrail: WEB_*] ← external content validation
  ↓
Tool Calls
  ↓
[Guardrail: TOOL_RESULT] ← sensitive data in outputs
  ↓
Model Inference
  ↓
[Guardrail: OUTPUT] ← content filtering
  ↓
Memory Write
  ↓
[Guardrail: MEMORY_WRITE] ← prevent PII persistence
  ↓
Response to User
```
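The flow above amounts to running the handlers registered for each hook in order, threading (possibly masked) content through every check and short-circuiting on a block. A minimal sketch of that pipeline, with handlers standing in for the real ApplyGuardrail call (all names and shapes here are proposals):

```typescript
// Result of one guardrail check (shape is a proposal).
interface GuardrailResult {
  action: "NONE" | "BLOCKED" | "MASKED";
  content: string; // possibly rewritten (e.g. PII masked)
}

type GuardrailHandler = (content: string) => Promise<GuardrailResult>;

const handlers = new Map<string, GuardrailHandler[]>();

function registerGuardrailHook(hook: string, handler: GuardrailHandler): void {
  const list = handlers.get(hook) ?? [];
  list.push(handler);
  handlers.set(hook, list);
}

// Run every handler registered for a hook; stop on the first block,
// and carry masked content forward into later checks.
async function runGuardrailHook(
  hook: string,
  content: string
): Promise<GuardrailResult> {
  let action: "NONE" | "MASKED" = "NONE";
  let current = content;
  for (const handler of handlers.get(hook) ?? []) {
    const result = await handler(current);
    if (result.action === "BLOCKED") return result;
    if (result.action === "MASKED") action = "MASKED";
    current = result.content;
  }
  return { action, content: current };
}
```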
### Configuration

```json5
{
  guardrails: {
    bedrock: {
      enabled: true,
      guardrailId: "abc123def",
      guardrailVersion: "DRAFT", // or version number
      region: "us-east-1",
      // Enable/disable specific hooks
      hooks: {
        input: true,
        output: true,
        memoryWrite: true,
        memoryRead: false, // might be noisy
        toolResult: true,
        webSearch: true,
        webFetch: true,
      },
      // Behavior when guardrail triggers
      onBlock: "reject", // reject | warn | log
      onPiiDetected: "mask", // mask | reject | log
    }
  }
}
```

## Implementation
New files:

- `src/agents/bedrock-guardrails.ts` - Core ApplyGuardrail wrapper
- `src/agents/guardrails-hooks.ts` - Hook registration and execution
- `src/config/types.guardrails.ts` - Config schema
Key functions:

```ts
async function applyGuardrail(params: {
  content: string;
  source: "INPUT" | "OUTPUT";
  guardrailId: string;
  guardrailVersion: string;
}): Promise<GuardrailResult>;

function registerGuardrailHook(
  hook: GuardrailHook,
  handler: GuardrailHandler
): void;
```

## AWS Permissions Required
- `bedrock:ApplyGuardrail`
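Scoped to the configured guardrail, the IAM statement could look like this sketch (account ID and guardrail ID are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:ApplyGuardrail",
      "Resource": "arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123def"
    }
  ]
}
```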
Uses existing AWS SDK auth chain (same as Bedrock inference).
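Mapping an ApplyGuardrail response onto the configured `onBlock` behavior could be a small pure function. The response slice below follows the documented API shape (`action` is `"GUARDRAIL_INTERVENED"` or `"NONE"`, with any rewritten text in `outputs`); the decision logic and names are proposals:

```typescript
// Minimal slice of the ApplyGuardrail response we need (field names
// follow the documented API shape; the full response has more detail).
interface ApplyGuardrailResponse {
  action: "GUARDRAIL_INTERVENED" | "NONE";
  outputs: { text?: string }[];
}

type BlockPolicy = "reject" | "warn" | "log";

interface Decision {
  allow: boolean;
  content: string; // original or guardrail-rewritten text
  warning?: string;
}

// Apply the configured onBlock policy to a guardrail response.
function decide(
  original: string,
  res: ApplyGuardrailResponse,
  onBlock: BlockPolicy
): Decision {
  if (res.action === "NONE") return { allow: true, content: original };
  // Guardrail intervened: prefer its rewritten/masked output if present.
  const rewritten = res.outputs[0]?.text ?? original;
  switch (onBlock) {
    case "reject":
      return { allow: false, content: rewritten };
    case "warn":
      return { allow: true, content: rewritten, warning: "guardrail intervened" };
    case "log":
      return { allow: true, content: rewritten };
  }
}
```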
## Use Cases

- **Enterprise compliance** - Prevent PII from leaking into logs/memory
- **Content safety** - Block harmful outputs before delivery
- **Prompt injection defense** - Check external content (web, RAG) for attacks
- **Audit logging** - Track what guardrails caught
## Open Questions
- Should guardrail checks be async/non-blocking for low-priority hooks?
- How to handle guardrail latency impact on response time?
- Should we support multiple guardrail configurations for different hooks?
- Integration with existing tool policy system?
## References

- ApplyGuardrail API docs
- Bedrock Guardrails overview
- AWS SDK: `@aws-sdk/client-bedrock-runtime` (already installed)
Happy to implement this if the design direction looks good. Would love feedback on the hook architecture.