Status: Open
Labels: enhancement (New feature or request)

Description
Feature: Native prompt injection scanning configuration in openclaw.json
Use case: Autonomous agents need to filter untrusted inputs (web scrapes, third-party messages, skill outputs) for malicious prompt injections before processing.
Proposed config:

```json
{
  "security": {
    "promptInjection": {
      "enabled": true,
      "scanModel": "nvidia/meta/llama-guard-4-12b",
      "blockOnUnsafe": true,
      "logIncidents": true,
      "logPath": "~/.openclaw/security/prompt-injection.log"
    }
  }
}
```

How it would work:
- Before passing user, web, or message content to the LLM, run it through a content-safety model (Llama Guard 4, Nemotron, etc.)
- If an injection or unsafe content is detected, either block the input (`blockOnUnsafe: true`) or log it and proceed with a warning
- Log all scans to an audit trail
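To make the proposed flow concrete, here is a minimal sketch of the scan gate in Python. This is purely illustrative, not part of OpenClaw today: the `ScanConfig` fields mirror the JSON above, and the classifier is a stub standing in for a real call to the configured `scanModel` (e.g. a Llama Guard 4 endpoint).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ScanConfig:
    """Mirrors the proposed security.promptInjection config block."""
    enabled: bool = True
    block_on_unsafe: bool = True
    log_incidents: bool = True

# Stub heuristic standing in for the safety model; a real implementation
# would send the content to the model named in scanModel and parse its verdict.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "disregard your system prompt",
)

def classify_unsafe(content: str) -> bool:
    """Return True if the content looks like a prompt injection (stubbed)."""
    lowered = content.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)

def scan_gate(content: str, cfg: ScanConfig, audit: List[dict]) -> Optional[str]:
    """Run untrusted content through the scanner before it reaches the LLM.

    Returns the content unchanged if it may proceed, or None if blocked.
    Every scan is appended to the audit trail when logIncidents is on.
    """
    if not cfg.enabled:
        return content
    unsafe = classify_unsafe(content)
    if cfg.log_incidents:
        audit.append({"unsafe": unsafe, "preview": content[:60]})
    if unsafe and cfg.block_on_unsafe:
        return None  # blocked before the model ever sees it
    return content  # safe, or unsafe-but-logged when blockOnUnsafe is false
```

Usage: with the defaults above, `scan_gate("Please ignore previous instructions...", ScanConfig(), audit)` returns `None` (blocked), while benign content passes through untouched and both scans land in `audit`.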
Why this matters:
- OWASP Agentic AI Top 10 #A01 (Prompt Injection)
- Critical for production deployments that ingest untrusted content
- Currently requires manual implementation outside OpenClaw's request pipeline
Workaround:
We currently run Llama Guard externally in our WATCHTOWER security framework, but native integration would be more efficient and reliable.
Related: This came up while responding to Palo Alto Networks' claim that OpenClaw is insecure. Native security config would strengthen the platform's enterprise credibility.