Feature Request: message:before-send hook event for output guardrails #20246

@jpirstin

Summary

Add a new hook event, message:before-send (or expose the existing message_sending plugin hook to internal hooks), that fires before an outgoing message is delivered to any channel. The hook should be able to:

  1. Modify the message content (rewrite)
  2. Cancel the message entirely (return { cancel: true })

Motivation

I'm building an autonomy guardrail system for an AI agent with a recurring problem: it asks permission for tactical decisions instead of acting autonomously. The agent has rules against this behavior, but under conversational pressure (e.g., an upset user), the base model's training overrides the rules.

Current state:

  • message:sent internal hook fires AFTER the message is already delivered — too late to prevent violations
  • message_sending plugin hook exists and CAN modify/cancel messages, but requires a full plugin (manifest + extensions dir + config)
  • Guardrail research (Anthropic's <system-reminder> pattern, the LLM guardrails literature) suggests that filtering output before delivery is the most effective layer

What I need:

  • A simple internal hook (like message-filter for inbound) that can intercept outbound messages
  • Regex pattern matching + optional LLM classification on the outgoing text
  • Ability to rewrite problematic phrases or cancel the message
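To make the request concrete, here is a minimal sketch of the regex stage only, assuming a hypothetical `Rule` shape and a `filterOutbound` helper (neither exists in OpenClaw today). The optional LLM classification step is left out; it could run after the rules on any text that survives them.

```typescript
// Hypothetical rule shape for illustration; not an existing OpenClaw type.
type Rule =
  | { pattern: RegExp; action: "rewrite"; replacement: string }
  | { pattern: RegExp; action: "cancel" };

interface FilterResult {
  content: string;
  cancel: boolean;
}

// Apply rules in order. A matching "cancel" rule stops processing;
// "rewrite" rules replace every match (their patterns should carry the g flag).
function filterOutbound(text: string, rules: Rule[]): FilterResult {
  let content = text;
  for (const rule of rules) {
    if (rule.action === "cancel") {
      // search() ignores lastIndex, so it is safe with global regexes.
      if (content.search(rule.pattern) !== -1) {
        return { content, cancel: true };
      }
    } else {
      content = content.replace(rule.pattern, rule.replacement);
    }
  }
  // Tidy up spacing left behind by removed phrases.
  content = content.replace(/[ \t]{2,}/g, " ").trim();
  // Nothing left to say after rewriting: treat it as a cancel.
  return { content, cancel: content.length === 0 };
}

// Example rule set (assumed phrases, tune for your own agent).
const rules: Rule[] = [
  { pattern: /\b(?:do you )?want me to [^?]*\?/gi, action: "rewrite", replacement: "" },
  { pattern: /\bshould I [^?]*\?/gi, action: "rewrite", replacement: "" },
  { pattern: /\bawaiting your approval\b/i, action: "cancel" },
];
```

With these rules, `filterOutbound("Deploy finished. Want me to run the tests?", rules)` keeps the statement and drops the trailing question, while a message consisting only of a permission question comes back with `cancel: true`.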

Current Workaround

I built a plugin that combines before_prompt_build (injects an autonomy reminder every turn) with message_sending (logs violations). This works, but:

  • Requires a full plugin with manifest, not a simple hook directory
  • The message_sending plugin hook API isn't documented in the hooks docs
  • Internal hooks (message:sent) fire too late

Proposed API

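// In HOOK.md metadata:
// events: ["message:before-send"]

const handler: HookHandler = async (event) => {
  const content: string = event.context.content;

  // Permission-seeking questions such as "Want me to ...?" or "Should I ...?"
  // [^?]* stops at the first question mark instead of swallowing the rest of the line
  const permissionQuestion = /\b(?:want me to|should I)\b[^?]*\?/gi;

  // search() ignores lastIndex, so the global regex can be reused for replace()
  if (content.search(permissionQuestion) !== -1) {
    // Option 1: Rewrite (strip every matching question)
    event.context.content = content.replace(permissionQuestion, '').trim();

    // Option 2: Cancel
    // event.context.cancel = true;
  }
};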
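For discussion, here is one way the runtime side could honor this contract. It is a sketch under assumptions, not a description of how OpenClaw dispatches hooks: `BeforeSendEvent`, `BeforeSendHandler`, and `runBeforeSend` are hypothetical names, and failing open on handler errors is a design choice that maintainers may prefer to invert.

```typescript
// All names here are illustrative; the real OpenClaw hook types may differ.
interface BeforeSendContext {
  content: string;
  cancel?: boolean;
}

interface BeforeSendEvent {
  type: "message:before-send";
  context: BeforeSendContext;
}

type BeforeSendHandler = (event: BeforeSendEvent) => Promise<void> | void;

// Run handlers in registration order. Returns the final text to deliver,
// or null if any handler cancelled the message.
async function runBeforeSend(
  content: string,
  handlers: BeforeSendHandler[],
): Promise<string | null> {
  const event: BeforeSendEvent = {
    type: "message:before-send",
    context: { content },
  };
  for (const handler of handlers) {
    try {
      await handler(event);
    } catch (err) {
      // Fail open: a broken guardrail should not block all outbound traffic.
      console.error("before-send handler failed:", err);
    }
    // Stop at the first cancel so later handlers never see a dead message.
    if (event.context.cancel) return null;
  }
  return event.context.content;
}
```

The channel adapter would then deliver the returned string, or skip delivery when the result is `null`. Because handlers share one mutable event, each handler sees the rewrites made by the handlers before it.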

Impact

This would enable a whole class of output guardrails as simple hook directories — no plugin infrastructure needed. Use cases beyond autonomy:

  • Content policy enforcement
  • Tone/style consistency
  • PII redaction before sending
  • Rate limiting outbound messages

Environment

  • OpenClaw 2026.2.17
  • Using internal hooks system (~/.openclaw/hooks/)

Labels: stale (marked as stale due to inactivity)
