Summary
Before-send hook for outbound message filtering
Problem to solve
When an agent's LLM response includes chain-of-thought reasoning alongside the intended reply, the entire output is delivered to the end user. There is no hook to inspect or filter outbound messages before delivery.
We've had 3 incidents in 2 weeks where internal processing text leaked to external WhatsApp group chats and DMs — things like "Let me check who this contact is" and full system status dumps.
Use cases:
• Internal processing leak prevention (our case — 3 incidents)
• PII redaction before delivery
• Content safety scanning
• Audit logging of all outbound messages
• Rate limiting per-recipient
• Language/tone enforcement per-group
Related: #20246, #12960
Proposed solution
Add a message:before-send hook event that fires before any outgoing message is delivered to a channel. The hook should support:
1. Inspecting the message text, target channel, and session context
2. Modifying the message content (rewrite)
3. Cancelling the message entirely (return { cancel: true })
This would slot into normalizeReplyPayload() or deliverOutboundPayloads() — after NO_REPLY/HEARTBEAT_OK stripping but before channel-specific delivery.
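A minimal sketch of the hook contract described above, in TypeScript. Only the message:before-send event name and the { cancel: true } return shape come from this proposal; every type and function name below (BeforeSendEvent, BeforeSendResult, runBeforeSendHooks, sessionKey) is hypothetical and not taken from the codebase. Handlers are synchronous here for brevity; a real implementation would probably need to allow async handlers.

```typescript
// Hypothetical payload shape; the real fields would come from the delivery layer.
interface BeforeSendEvent {
  text: string;
  channel: string;    // e.g. "whatsapp", "telegram"
  sessionKey: string; // assumed identifier for the session context
}

// A handler may cancel, rewrite the text, or return undefined to pass through.
type BeforeSendResult = { cancel: true } | { text: string } | undefined;
type BeforeSendHandler = (event: BeforeSendEvent) => BeforeSendResult;

// Runs handlers in registration order. Returns the (possibly rewritten)
// event to deliver, or null as soon as any handler cancels.
function runBeforeSendHooks(
  handlers: BeforeSendHandler[],
  event: BeforeSendEvent,
): BeforeSendEvent | null {
  let current: BeforeSendEvent = { ...event };
  for (const handler of handlers) {
    const result = handler(current);
    if (result === undefined) continue;
    if ("cancel" in result) return null;
    current = { ...current, text: result.text };
  }
  return current;
}

// Example handler: block a known leak phrase on one channel only.
const blockLeak: BeforeSendHandler = (event) => {
  if (event.channel === "whatsapp" && /let me check/i.test(event.text)) {
    return { cancel: true };
  }
  return undefined;
};
```

The caller would deliver the returned event when it is non-null and drop the message otherwise. Passing the rewritten event to later handlers lets a redaction hook run before an audit-logging hook, though the ordering policy is an open design question.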
Alternatives considered
• SOUL.md / system prompt rules → agent ignores under cognitive load
• agent:bootstrap hook to inject rules → still prompt-level, can't block delivery
• heartbeat.target: "none" → only fixes heartbeat-specific leaks
• Custom outbound-guard plugin with 26 regex patterns → detects leaks but can't intercept delivery
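For illustration, the detection logic of a regex guard like the one in the last bullet could become a blocking check once a before-send hook exists. This is a hypothetical sketch: the two patterns are illustrative stand-ins, not the 26 patterns from our plugin, and guardOutbound and its result type are assumed names that follow the { cancel: true } convention proposed above.

```typescript
// Illustrative patterns only; a real guard would carry a tuned list.
// No global flag, so RegExp.test() keeps no state between calls.
const LEAK_PATTERNS: RegExp[] = [
  /\blet me check\b/i,
  /\bsystem status\b/i,
];

type GuardResult = { cancel: true; matched: string } | { cancel: false };

// Returns a cancel decision plus the pattern that fired, so the block
// can be logged for later review.
function guardOutbound(text: string): GuardResult {
  for (const pattern of LEAK_PATTERNS) {
    if (pattern.test(text)) {
      return { cancel: true, matched: pattern.source };
    }
  }
  return { cancel: false };
}
```

Today the same check can only report a leak after the fact; wired into a delivery-layer hook, the cancel decision would stop the message before it reaches the recipient.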
Impact
We run a multi-channel agent (WhatsApp, Telegram, Slack, Discord) that serves both the owner and external contacts. Internal reasoning text has leaked to external users 3 times in 2 weeks, damaging trust and professionalism. There is currently no architectural way to prevent this: prompt-level rules fail under cognitive load. The only real fix we see is a delivery-layer hook that can intercept and block a message before it reaches the recipient. Without this, any agent serving external users on messaging platforms is one bad inference away from exposing internal operations to the wrong person.
Evidence/examples
No response
Additional information
No response