
Feature Request: Prompt Injection Scanning Config #7705

@LumenLantern

Description


Feature: Native prompt injection scanning configuration in openclaw.json

Use case: Autonomous agents need to filter untrusted inputs (web scrapes, third-party messages, skill outputs) for malicious prompt injections before processing.

Proposed config:

{
  "security": {
    "promptInjection": {
      "enabled": true,
      "scanModel": "nvidia/meta/llama-guard-4-12b",
      "blockOnUnsafe": true,
      "logIncidents": true,
      "logPath": "~/.openclaw/security/prompt-injection.log"
    }
  }
}
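To make the proposed shape concrete, here is a minimal sketch of how OpenClaw might parse and validate this block from `openclaw.json`. All key names follow the config above; the function name and the defaults chosen here are illustrative, not existing OpenClaw API.

```python
import json

# Hypothetical defaults for the proposed "security.promptInjection" block.
# Scanning is off by default so existing configs keep working unchanged.
DEFAULTS = {
    "enabled": False,
    "scanModel": "nvidia/meta/llama-guard-4-12b",
    "blockOnUnsafe": True,
    "logIncidents": True,
    "logPath": "~/.openclaw/security/prompt-injection.log",
}

def load_injection_config(raw: str) -> dict:
    """Merge the promptInjection section of openclaw.json over defaults,
    rejecting unknown keys so typos fail loudly at startup."""
    cfg = json.loads(raw).get("security", {}).get("promptInjection", {})
    unknown = set(cfg) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown promptInjection keys: {sorted(unknown)}")
    return {**DEFAULTS, **cfg}
```

Rejecting unknown keys is a deliberate choice here: a silently ignored `blockOnUnsafe` typo would leave a deployment unprotected while the operator believes scanning is enforced.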

How it would work:

  1. Before passing user, web, or message content to the LLM, run it through a content-safety model (Llama Guard 4, Nemotron, etc.).
  2. If an injection or unsafe content is detected, either block the request (blockOnUnsafe: true) or log a warning and proceed.
  3. Record every scan in the audit trail.
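The three steps above could be wired up as a pre-LLM hook roughly like this. The classifier is stubbed with a trivial keyword check purely for illustration; a real implementation would call the configured scan model (e.g. Llama Guard 4). The names `scan_before_llm` and `PromptBlocked` are hypothetical, not OpenClaw API.

```python
from dataclasses import dataclass

class PromptBlocked(Exception):
    """Raised when blockOnUnsafe is set and the scanner flags the input."""

@dataclass
class ScanResult:
    unsafe: bool
    reason: str = ""

def classify(text: str) -> ScanResult:
    # Stand-in for the content-safety model call; a real deployment would
    # send `text` to the model named by cfg["scanModel"] instead.
    if "ignore previous instructions" in text.lower():
        return ScanResult(unsafe=True, reason="injection pattern")
    return ScanResult(unsafe=False)

def scan_before_llm(text: str, cfg: dict, audit: list) -> str:
    """Gate untrusted content before it reaches the LLM."""
    if not cfg.get("enabled"):
        return text
    result = classify(text)                           # step 1: scan
    if cfg.get("logIncidents"):
        audit.append((result.unsafe, result.reason))  # step 3: audit trail
    if result.unsafe and cfg.get("blockOnUnsafe"):
        raise PromptBlocked(result.reason)            # step 2: block
    return text                                       # step 2: or proceed
```

With `blockOnUnsafe: false`, the same hook degrades to log-and-proceed, which matches the proposed config semantics.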

Why this matters:

  • OWASP Agentic AI Top 10 #A01 (Prompt Injection)
  • Critical for production deployments that ingest untrusted content
  • Currently requires manual implementation outside OpenClaw's request pipeline

Workaround:
We currently run Llama Guard externally in our WATCHTOWER security framework, but native integration would be more efficient and reliable.

Related: This came up while responding to Palo Alto Networks' claim that OpenClaw is insecure. Native security config would strengthen the platform's enterprise credibility.

Metadata

Labels: enhancement (New feature or request)
