## Summary
Add support for AWS Bedrock Guardrails via the ApplyGuardrail API to provide content filtering, PII detection, and prompt attack prevention across the OpenClaw data flow.
## Motivation
Bedrock Guardrails provide enterprise-grade content safety features:
- Content filtering (hate, violence, sexual, misconduct, prompt attacks)
- PII detection and masking (names, emails, SSN, etc.)
- Denied topics (custom topic blocking)
- Word filters (custom blocklists)
Critically, the ApplyGuardrail API works independently of the model provider - you can use Bedrock guardrails even when using Anthropic direct API, OpenAI, or any other provider. This makes it a universal safety layer.
## Proposed Architecture

### Hook Points
Instead of just wrapping model inference, guardrails should be applied at multiple points in the data flow:
| Hook | When | Purpose |
|---|---|---|
| `input` | User message received | Block prompt attacks, denied topics |
| `output` | Model response ready | Filter harmful content before delivery |
| `memory.write` | Before saving to memory files | Prevent PII/secrets from persisting |
| `memory.read` | After `memory_search` retrieval | Check retrieved context before use |
| `tool.result` | After tool execution | Catch sensitive data in file contents, exec output |
| `web.search` | After search results return | Filter injected prompts in results |
| `web.fetch` | After URL content fetched | Check external content before context |
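The hook names above could be modeled as a string union so that dispatch sites and the camelCase config keys stay in sync. A minimal sketch (`GuardrailHook`, `hookConfigKey`, and `isHookEnabled` are proposed names, not existing code):

```typescript
// Proposed hook identifiers, mirroring the table above.
type GuardrailHook =
  | "input"
  | "output"
  | "memory.write"
  | "memory.read"
  | "tool.result"
  | "web.search"
  | "web.fetch";

// Map dotted hook names to the camelCase keys used in config.hooks.
const hookConfigKey: Record<GuardrailHook, string> = {
  "input": "input",
  "output": "output",
  "memory.write": "memoryWrite",
  "memory.read": "memoryRead",
  "tool.result": "toolResult",
  "web.search": "webSearch",
  "web.fetch": "webFetch",
};

// A hook is active only when explicitly enabled in config.
function isHookEnabled(
  hook: GuardrailHook,
  hooks: Record<string, boolean>
): boolean {
  return hooks[hookConfigKey[hook]] === true;
}
```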
### Data Flow

```
User Input
  ↓
[Guardrail: INPUT] ← prompt attack detection
  ↓
Memory Search
  ↓
[Guardrail: MEMORY_READ] ← PII check on retrieved context
  ↓
Web Search/Fetch
  ↓
[Guardrail: WEB_*] ← external content validation
  ↓
Tool Calls
  ↓
[Guardrail: TOOL_RESULT] ← sensitive data in outputs
  ↓
Model Inference
  ↓
[Guardrail: OUTPUT] ← content filtering
  ↓
Memory Write
  ↓
[Guardrail: MEMORY_WRITE] ← prevent PII persistence
  ↓
Response to User
```
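The flow above amounts to running the handlers registered for each hook in order, threading (possibly masked) content through every check and short-circuiting on a block. A minimal sketch of that pipeline, with handlers standing in for the real ApplyGuardrail call (all names and shapes here are proposals):

```typescript
// Result of one guardrail check (shape is a proposal).
interface GuardrailResult {
  action: "NONE" | "BLOCKED" | "MASKED";
  content: string; // possibly rewritten (e.g. PII masked)
}

type GuardrailHandler = (content: string) => Promise<GuardrailResult>;

const handlers = new Map<string, GuardrailHandler[]>();

function registerGuardrailHook(hook: string, handler: GuardrailHandler): void {
  const list = handlers.get(hook) ?? [];
  list.push(handler);
  handlers.set(hook, list);
}

// Run every handler registered for a hook; stop on the first block,
// and carry masked content forward into later checks.
async function runGuardrailHook(
  hook: string,
  content: string
): Promise<GuardrailResult> {
  let action: "NONE" | "MASKED" = "NONE";
  let current = content;
  for (const handler of handlers.get(hook) ?? []) {
    const result = await handler(current);
    if (result.action === "BLOCKED") return result;
    if (result.action === "MASKED") action = "MASKED";
    current = result.content;
  }
  return { action, content: current };
}
```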
### Configuration

```json5
{
  guardrails: {
    bedrock: {
      enabled: true,
      guardrailId: "abc123def",
      guardrailVersion: "DRAFT", // or version number
      region: "us-east-1",
      // Enable/disable specific hooks
      hooks: {
        input: true,
        output: true,
        memoryWrite: true,
        memoryRead: false, // might be noisy
        toolResult: true,
        webSearch: true,
        webFetch: true,
      },
      // Behavior when guardrail triggers
      onBlock: "reject", // reject | warn | log
      onPiiDetected: "mask", // mask | reject | log
    }
  }
}
```

## Implementation
New files:

- `src/agents/bedrock-guardrails.ts` - Core ApplyGuardrail wrapper
- `src/agents/guardrails-hooks.ts` - Hook registration and execution
- `src/config/types.guardrails.ts` - Config schema
Key functions:

```ts
async function applyGuardrail(params: {
  content: string;
  source: "INPUT" | "OUTPUT";
  guardrailId: string;
  guardrailVersion: string;
}): Promise<GuardrailResult>;

function registerGuardrailHook(
  hook: GuardrailHook,
  handler: GuardrailHandler
): void;
```

## AWS Permissions Required
- `bedrock:ApplyGuardrail`
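Scoped to the configured guardrail, the IAM statement could look like this sketch (account ID and guardrail ID are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:ApplyGuardrail",
      "Resource": "arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123def"
    }
  ]
}
```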
Uses existing AWS SDK auth chain (same as Bedrock inference).
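Mapping an ApplyGuardrail response onto the configured `onBlock` behavior could be a small pure function. The response slice below follows the documented API shape (`action` is `"GUARDRAIL_INTERVENED"` or `"NONE"`, with any rewritten text in `outputs`); the decision logic and names are proposals:

```typescript
// Minimal slice of the ApplyGuardrail response we need (field names
// follow the documented API shape; the full response has more detail).
interface ApplyGuardrailResponse {
  action: "GUARDRAIL_INTERVENED" | "NONE";
  outputs: { text?: string }[];
}

type BlockPolicy = "reject" | "warn" | "log";

interface Decision {
  allow: boolean;
  content: string; // original or guardrail-rewritten text
  warning?: string;
}

// Apply the configured onBlock policy to a guardrail response.
function decide(
  original: string,
  res: ApplyGuardrailResponse,
  onBlock: BlockPolicy
): Decision {
  if (res.action === "NONE") return { allow: true, content: original };
  // Guardrail intervened: prefer its rewritten/masked output if present.
  const rewritten = res.outputs[0]?.text ?? original;
  switch (onBlock) {
    case "reject":
      return { allow: false, content: rewritten };
    case "warn":
      return { allow: true, content: rewritten, warning: "guardrail intervened" };
    case "log":
      return { allow: true, content: rewritten };
  }
}
```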
## Use Cases

- **Enterprise compliance** - Prevent PII from leaking into logs/memory
- **Content safety** - Block harmful outputs before delivery
- **Prompt injection defense** - Check external content (web, RAG) for attacks
- **Audit logging** - Track what guardrails caught
## Open Questions
- Should guardrail checks be async/non-blocking for low-priority hooks?
- How to handle guardrail latency impact on response time?
- Should we support multiple guardrail configurations for different hooks?
- Integration with existing tool policy system?
## References

- ApplyGuardrail API docs
- Bedrock Guardrails overview
- AWS SDK: `@aws-sdk/client-bedrock-runtime` (already installed)
Happy to implement this if the design direction looks good. Would love feedback on the hook architecture.