[Feature]: Pluggable Guardrail Provider Interface for tool authorization #46441

@uchibeke

Summary

A standard GuardrailProvider interface that lets users plug any authorization provider into tool:before hooks, so that every tool call (exec, write, browser, MCP, messaging) can be evaluated before execution, not only exec commands through the current approvals system.

Problem to solve

OpenClaw has exec approvals for shell commands, but no general-purpose authorization for any other tool — file writes, browser actions, messaging, MCP tools, git operations, etc. An agent can write to ~/.ssh/authorized_keys, send messages to arbitrary recipients, or execute MCP tools with no policy check.

The community has been asking for this across 10+ issues spanning 2+ years:

The infrastructure is partially there — PR #22068 merged tool:before/tool:after internal hook events, and the plugin system defines before_tool_call. But there's no standard contract for guardrail providers to implement, so every solution is ad-hoc and incompatible.

Proposed solution

A minimal TypeScript interface that any guardrail provider can implement:

interface GuardrailProvider {
  name: string;
  version: string;
  evaluate(request: GuardrailRequest): Promise<GuardrailDecision>;
  healthCheck?(): Promise<{ ok: boolean; message?: string }>;
}

interface GuardrailRequest {
  toolName: string;           // "exec", "write", "browser", "mcp.tool_name"
  params: Record<string, unknown>;
  agentId?: string;
  sessionId?: string;
  timestamp: string;
}

interface GuardrailDecision {
  allow: boolean;
  reasons?: Array<{ code: string; message: string }>;
  metadata?: Record<string, unknown>;  // provider-specific (audit ID, signature, etc.)
}
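
As an illustration of how small a provider can be, here is a minimal sketch of one that blocks writes under protected path prefixes. The class name PathDenyProvider, the reason codes, and the assumption that the write tool passes its target as params.path are all hypothetical and not part of the proposal. The prefix check is a plain string comparison with no path normalization, so treat it as a shape example rather than a real policy.

```typescript
// Types repeated from the proposal so the sketch is self-contained.
interface GuardrailRequest {
  toolName: string;
  params: Record<string, unknown>;
  agentId?: string;
  sessionId?: string;
  timestamp: string;
}

interface GuardrailDecision {
  allow: boolean;
  reasons?: Array<{ code: string; message: string }>;
  metadata?: Record<string, unknown>;
}

interface GuardrailProvider {
  name: string;
  version: string;
  evaluate(request: GuardrailRequest): Promise<GuardrailDecision>;
  healthCheck?(): Promise<{ ok: boolean; message?: string }>;
}

// Hypothetical provider: denies "write" calls whose path starts with a protected prefix.
// Assumption: the write tool exposes its target as params.path.
class PathDenyProvider implements GuardrailProvider {
  name = "path-deny-example";
  version = "0.0.1";
  private readonly protectedPrefixes: string[];

  constructor(protectedPrefixes: string[]) {
    this.protectedPrefixes = protectedPrefixes;
  }

  async evaluate(request: GuardrailRequest): Promise<GuardrailDecision> {
    // This sketch only has an opinion about writes; everything else passes.
    if (request.toolName !== "write") {
      return { allow: true };
    }
    const path = request.params["path"];
    if (typeof path !== "string") {
      return {
        allow: false,
        reasons: [{ code: "missing_path", message: "write call has no string path" }],
      };
    }
    // Naive prefix match: no "~" expansion, no symlink or ".." resolution.
    const hit = this.protectedPrefixes.find((prefix) => path.startsWith(prefix));
    if (hit !== undefined) {
      return {
        allow: false,
        reasons: [{ code: "protected_path", message: `writes under ${hit} are blocked` }],
      };
    }
    return { allow: true };
  }

  async healthCheck(): Promise<{ ok: boolean; message?: string }> {
    return { ok: true };
  }
}
```

Constructed as new PathDenyProvider(["~/.ssh/"]), this would deny the ~/.ssh/authorized_keys write from the problem statement and let a write to any other path through.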

Config:

guardrails:
  enabled: true
  failClosed: true
  provider: "my-guardrail-plugin"   # or "./local-guardrail.ts"
  config:
    # provider-specific settings
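
One way the parsed guardrails section could be typed and normalized is sketched below. The names GuardrailsConfig and normalizeGuardrailsConfig are hypothetical, and the defaults are assumptions of this sketch rather than decisions in the proposal: enabled defaults to false to stay opt-in, and failClosed defaults to true unless explicitly set to false.

```typescript
// Hypothetical shape of the parsed "guardrails" config section.
interface GuardrailsConfig {
  enabled: boolean;
  failClosed: boolean;
  provider: string | undefined; // plugin name or local module path
  config: Record<string, unknown>; // provider-specific settings, passed through untouched
}

// Normalizes an untyped, already-parsed config value (for example from YAML).
// Assumed defaults: enabled=false (opt-in), failClosed=true.
function normalizeGuardrailsConfig(raw: unknown): GuardrailsConfig {
  const section: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};

  const enabled = section["enabled"] === true;
  const providerValue = section["provider"];
  const provider = typeof providerValue === "string" ? providerValue : undefined;

  // An enabled guardrail with no provider is treated as a config error in this sketch.
  if (enabled && provider === undefined) {
    throw new Error("guardrails.enabled is true but guardrails.provider is not set");
  }

  const providerConfig = section["config"];
  return {
    enabled,
    failClosed: section["failClosed"] !== false,
    provider,
    config:
      typeof providerConfig === "object" && providerConfig !== null
        ? (providerConfig as Record<string, unknown>)
        : {},
  };
}
```

With no guardrails section at all, this returns enabled: false, which matches the opt-in property: nothing is loaded and nothing is evaluated.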

How it works:

  1. tool:before fires
  2. If a guardrail is configured, call provider.evaluate({ toolName, params, timestamp }), adding agentId and sessionId when available
  3. allow: false → block tool, return reasons to agent
  4. allow: true or no provider → proceed normally
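
The four steps above can be sketched as a single function. The function name onToolBefore, the ToolGateResult type, and the way the hook hands over toolName and params are assumptions for illustration; the actual tool:before hook signature is not shown in this issue. The timestamp is filled with an ISO 8601 string, which is also an assumption, since the proposal only types it as string.

```typescript
// Trimmed copies of the proposal's types, limited to the fields this sketch uses.
interface GuardrailRequest {
  toolName: string;
  params: Record<string, unknown>;
  timestamp: string;
}

interface GuardrailDecision {
  allow: boolean;
  reasons?: Array<{ code: string; message: string }>;
}

interface GuardrailProvider {
  evaluate(request: GuardrailRequest): Promise<GuardrailDecision>;
}

// Hypothetical result handed back to the tool pipeline.
type ToolGateResult =
  | { proceed: true }
  | { proceed: false; reasons: Array<{ code: string; message: string }> };

// Called when tool:before fires (step 1).
async function onToolBefore(
  provider: GuardrailProvider | undefined,
  toolName: string,
  params: Record<string, unknown>,
): Promise<ToolGateResult> {
  // Step 4, first half: no provider configured, proceed normally.
  if (provider === undefined) {
    return { proceed: true };
  }
  // Step 2: ask the provider for a decision.
  const decision = await provider.evaluate({
    toolName,
    params,
    timestamp: new Date().toISOString(), // assumed format
  });
  // Step 3: block the tool and surface the reasons to the agent.
  if (!decision.allow) {
    return { proceed: false, reasons: decision.reasons ?? [] };
  }
  // Step 4, second half: allowed.
  return { proceed: true };
}
```

Provider errors are deliberately not handled here; a rejected evaluate() promise propagates to the caller.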

Key properties:

  • Opt-in — zero impact when not configured
  • Provider-agnostic — users pick their own implementation (simple allowlist, policy engine, enterprise service)
  • Builds on existing infra - uses tool:before from PR #22068 (Add tool:before/tool:after internal hook events), no execution pipeline changes
  • Fail-closed option — provider errors can deny by default (configurable)
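
The fail-closed property could look roughly like the wrapper below: if the provider throws or its promise rejects, the failClosed flag decides whether the call is denied or allowed. The function name evaluateWithFailMode, the provider_error reason code, and the providerError metadata key are hypothetical.

```typescript
interface GuardrailDecision {
  allow: boolean;
  reasons?: Array<{ code: string; message: string }>;
  metadata?: Record<string, unknown>;
}

// Wraps a provider call so that provider failures become a decision instead of an exception.
async function evaluateWithFailMode(
  evaluate: () => Promise<GuardrailDecision>,
  failClosed: boolean,
): Promise<GuardrailDecision> {
  try {
    return await evaluate();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (failClosed) {
      // Fail closed: a broken provider denies the tool call.
      return { allow: false, reasons: [{ code: "provider_error", message }] };
    }
    // Fail open: allow the call, but record the failure for auditing.
    return { allow: true, metadata: { providerError: message } };
  }
}
```

A successful evaluate() result is returned unchanged in both modes; only the error path differs.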

Alternatives considered

1. Extend exec approvals to all tools
Tightly coupled to OpenClaw internals, requires core changes per tool category, doesn't support external providers or custom policies. The three-layer model (policy/allowlist/approval) is good for exec but doesn't generalize to tools with different parameter shapes.

2. Full interceptor pipeline (PR #6569 approach)
Too much scope - interceptors for tool calls, messages, and params in one PR. Was closed. A focused interface for just tool authorization is more likely to land and can be extended later.

3. Per-plugin ad-hoc hooks
What exists today - each plugin implements its own before_tool_call handler with no shared contract. Providers can't be swapped, config isn't standard, and there's no failClosed behavior. Works but doesn't compose.

Impact

Affected: Every OpenClaw user running agents with tool access — especially multi-channel setups (Slack, Discord, WhatsApp) where agents act on behalf of users, and enterprise/team deployments where agents touch production systems.

Severity: Blocks workflow for security-conscious deployments. Currently the only option is exec approvals (shell only) or trusting the agent entirely for everything else.

Frequency: Every tool call. Agents execute tools continuously — file operations, web fetches, messaging, MCP tools. Each one is an unguarded action.

Consequence:

Evidence/examples

Community demand: 10+ issues listed above, plus two substantial PRs (#6095 modular guardrails, #6569 interceptor pipeline) that were closed - indicating demand exists but prior approaches were too broad.

Working reference: APort Agent Guardrails implements this pattern today as an OpenClaw plugin via before_tool_call. It maps tools to policies (exec → command policy, write → file policy, etc.), evaluates locally or via API, and blocks denied calls. Runs without any OpenClaw core changes - proving the interface is viable.
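
The tool-to-policy mapping described above can be sketched generically as follows. This is not APort's code: the policy functions, the reason codes, the parameter names (params.command, params.path), and the choice to deny tools that have no mapped policy are all assumptions made for illustration.

```typescript
type Params = Record<string, unknown>;

interface GuardrailDecision {
  allow: boolean;
  reasons?: Array<{ code: string; message: string }>;
}

type Policy = (params: Params) => GuardrailDecision;

// Hypothetical command policy: rejects non-string commands and an example destructive pattern.
const commandPolicy: Policy = (params) => {
  const command = params["command"];
  if (typeof command === "string" && !command.includes("rm -rf")) {
    return { allow: true };
  }
  return { allow: false, reasons: [{ code: "command_denied", message: "command rejected by command policy" }] };
};

// Hypothetical file policy: rejects non-string paths and anything touching .ssh.
const filePolicy: Policy = (params) => {
  const path = params["path"];
  if (typeof path === "string" && !path.includes(".ssh")) {
    return { allow: true };
  }
  return { allow: false, reasons: [{ code: "file_denied", message: "path rejected by file policy" }] };
};

// Tool name to policy routing (exec -> command policy, write -> file policy).
const policies = new Map<string, Policy>([
  ["exec", commandPolicy],
  ["write", filePolicy],
]);

function evaluateTool(toolName: string, params: Params): GuardrailDecision {
  const policy = policies.get(toolName);
  if (policy === undefined) {
    // Assumption of this sketch: unmapped tools are denied rather than allowed.
    return { allow: false, reasons: [{ code: "no_policy", message: `no policy mapped for ${toolName}` }] };
  }
  return policy(params);
}
```

A provider's evaluate() could delegate to evaluateTool(request.toolName, request.params) and return the result as its decision.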

External research: Noma Security found 53% of enterprise users granted AI agents privileged access without policy controls. Cisco documented data exfiltration via third-party skills. A standard guardrail interface addresses both.

Prior art in other ecosystems:

  • OCI Runtime Spec — container interface, any runtime
  • OpenTelemetry Collector — observability interface, any backend
  • CSI (Kubernetes) — storage interface, any provider
  • Android/iOS permission models — capability declarations before install

Additional information

Happy to submit a focused PR if there's interest. Scope would be:

  • GuardrailProvider interface in packages/types/
  • guardrails config section in config schema
  • Wire into tool:before hook handling
  • Docs at docs/extensions/guardrails.md

No bundled providers, no changes to the steerable agent loop, no opinions on policy format. Just the interface - providers bring the opinions.

Exec approvals could optionally be refactored as a built-in guardrail provider in a follow-up, unifying the model. But that's separate scope.
