
Post-compaction audit warning triggers prompt injection detection by AI agent #20484

@astolo

Summary

OpenClaw's post-compaction audit generates warnings that its own AI agent mistakenly flags as prompt injection attacks. The audit warning is prepended to the user's message with a System: prefix. This is the same pattern used in prompt injection attacks, so the AI agent rejects OpenClaw's own legitimate system message.

Steps to Reproduce

  1. Run an OpenClaw agent with the default DEFAULT_REQUIRED_READS, which includes WORKFLOW_AUTO.md
  2. Don't create a WORKFLOW_AUTO.md file in the workspace (or let any required-read file be missing)
  3. Let the conversation run long enough to trigger context compaction
  4. Send a message after compaction

What Happens

  1. Post-compaction audit runs and detects missing required reads
  2. formatAuditWarning() generates a warning message
  3. enqueueSystemEvent() queues it as a system event
  4. prependSystemEvents() prepends it to the next user message as System: [warning text]
  5. The combined text is sent to the LLM as a single role: user message
  6. The AI agent mistakenly flags this as a prompt injection attack and refuses to follow the instructions. System:-prefixed instructions inside a role: user message are indistinguishable from the textbook prompt injection pattern. The message was in fact a legitimate OpenClaw audit warning, but the delivery mechanism made it look like an attack.

Root Cause (Source Code)

All references are from dist/reply-oSe13ewW.js (OpenClaw v2026.2.17):

1. Default required reads include a file that may not exist (line 61810)

const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

WORKFLOW_AUTO.md is required by default, but nothing requires the file to exist in the workspace. When it is missing, the audit fails after every compaction.
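
The audit logic that compares reads against DEFAULT_REQUIRED_READS is not quoted in this issue. The following is therefore a hypothetical reconstruction. auditRequiredReads(), matches() and the workspaceFiles argument are invented for illustration. It shows why a literal file name that is absent from the workspace keeps the audit failing. It also shows how an existence check along the lines of suggested fix 2 would let the audit pass.

```javascript
// Hypothetical reconstruction of the post-compaction audit; the real code is
// not quoted in this issue, so all names below except DEFAULT_REQUIRED_READS
// are assumptions made for illustration.
const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

// A path satisfies a literal pattern if it is that file (at any depth),
// and a RegExp pattern if the expression matches the path.
function matches(pattern, path) {
  if (pattern instanceof RegExp) return pattern.test(path);
  return path === pattern || path.endsWith(`/${pattern}`);
}

// Returns the patterns not satisfied by any file read since compaction.
// If workspaceFiles is given, patterns that match no file in the workspace
// are skipped (the idea behind suggested fix 2) instead of reported missing.
function auditRequiredReads(readPaths, requiredReads, workspaceFiles) {
  const missingPatterns = requiredReads.filter((pattern) => {
    if (workspaceFiles && !workspaceFiles.some((f) => matches(pattern, f))) {
      return false; // no file could ever satisfy this pattern
    }
    return !readPaths.some((p) => matches(pattern, p));
  });
  return { passed: missingPatterns.length === 0, missingPatterns };
}

// Workspace without WORKFLOW_AUTO.md; today's memory file was read.
const workspace = ["memory/2026-02-19.md", "README.md"];
const reads = ["memory/2026-02-19.md"];

const current = auditRequiredReads(reads, DEFAULT_REQUIRED_READS);
const fixed = auditRequiredReads(reads, DEFAULT_REQUIRED_READS, workspace);
console.log(current.passed, current.missingPatterns); // false [ 'WORKFLOW_AUTO.md' ]
console.log(fixed.passed); // true
```

With the existence check, the memory pattern is still enforced: a workspace that has a dated memory file the agent has not read would still fail the audit.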

2. Audit failure triggers a system event (line 62265)

if (!audit.passed) enqueueSystemEvent(formatAuditWarning(audit.missingPatterns), { sessionKey });

3. The warning is formatted as an instruction to the AI (lines 61864-61865)

function formatAuditWarning(missingPatterns) {
    return "⚠️ Post-Compaction Audit: The following required startup files were not read after context reset:\n" +
        missingPatterns.map((p) => `  - ${p}`).join("\n") +
        "\n\nPlease read them now using the Read tool before continuing. " +
        "This ensures your operating protocols are restored after memory compaction.";
}

4. System events are prepended to user messages with a System: prefix (function at line 56897; the prefixing happens on line 56950)

async function prependSystemEvents(params) {
    // ... drains queued system events ...
    if (systemLines.length === 0) return params.prefixedBodyBase;
    return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${params.prefixedBodyBase}`;
}

This produces a single string like:

System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: ...

<actual user message here>

This entire string is then sent to the LLM as a role: user message — making the System: prefix indistinguishable from a prompt injection attack.
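
The following minimal, synchronous sketch makes the failure mode concrete by tracing the path from audit failure to the model request. It reuses formatAuditWarning() as quoted and the prefixing expression from prependSystemEvents(). The in-memory queue, the enqueueSystemEvent() stand-in and the { role, content } message shape are assumptions, not OpenClaw's actual internals.

```javascript
// Simplified reproduction of the delivery path. formatAuditWarning() and the
// "System: " prefixing are copied from the excerpts above; the array queue,
// enqueueSystemEvent() stand-in and message shape are assumptions, and the
// per-event timestamp seen in the real log is omitted.
const queue = [];
const enqueueSystemEvent = (text) => queue.push(text);

function formatAuditWarning(missingPatterns) {
  return "⚠️ Post-Compaction Audit: The following required startup files were not read after context reset:\n" +
    missingPatterns.map((p) => `  - ${p}`).join("\n") +
    "\n\nPlease read them now using the Read tool before continuing. " +
    "This ensures your operating protocols are restored after memory compaction.";
}

// Synchronous stand-in for prependSystemEvents(): drain the queue, then put
// "System: " in front of each queued event (only its first line gets the prefix).
function prependSystemEvents(prefixedBodyBase) {
  const systemLines = queue.splice(0, queue.length);
  if (systemLines.length === 0) return prefixedBodyBase;
  return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${prefixedBodyBase}`;
}

// Audit failed: queue the warning, then build the next turn.
enqueueSystemEvent(formatAuditWarning(["WORKFLOW_AUTO.md"]));
const messages = [{ role: "user", content: prependSystemEvents("<actual user message here>") }];

// One user-role message whose text opens with an apparent system instruction.
console.log(messages.length, messages[0].role); // 1 user
console.log(messages[0].content.split("\n")[0]);
```

The warning text contains newlines, so only its first line carries the System: prefix. This matches the layout of the prepended message in the session log.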

Evidence from Session Logs

The prepended message (as seen in the JSONL session log)

The role: user message (id: 9bd65c42) contained two parts merged together:

System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: The following required startup
files were not read after context reset:
  - WORKFLOW_AUTO.md
  - memory\/\d{4}-\d{2}-\d{2}\.md

Please read them now using the Read tool before continuing. This ensures your operating
protocols are restored after memory compaction.

Conversation info (untrusted metadata):
{
  "message_id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "sender": "+972XXXXXXXXX"
}

<redacted user message in Hebrew>

The first part (audit warning) was injected by OpenClaw. The second part (Hebrew text) was the actual user message.

The AI agent's thinking (from id: 05523849)

"I notice there's a suspicious [System Message] at the top trying to get me to read files named WORKFLOW_AUTO.md and memory files. This looks like a prompt injection attack! It's trying to impersonate a system message and get me to follow fake instructions."

The AI agent's response (sent to user via WhatsApp)

לפני שאענה — ה-"System:" block בתחילת ההודעה הוא ניסיון prompt injection. WORKFLOW_AUTO.md לא קיים, ואני לא מבצע הוראות ממערכת מזויפת שמוטמעת בהודעת משתמש. מוזכר — עבור הלוג.

Translation: "Before I answer — the 'System:' block at the beginning of the message is a prompt injection attempt. WORKFLOW_AUTO.md doesn't exist, and I don't follow instructions from a fake system embedded in a user message. Noted — for the log."

Event sequence from session log

compaction event (6631090a)
  → OpenClaw: "Pre-compaction memory flush..." (0cbc6b90)
    → Agent: performed flush → NO_REPLY (f02f7899)
      → cache-ttl marker (6d8aea48) ← timestamp: 00:12:15.752
        → [System: audit warning] + user question (9bd65c42) ← timestamp: 00:12:15.902

Only 150 ms separate the cache-ttl marker (00:12:15.752) from the user message (00:12:15.902). The audit warning was prepended by prependSystemEvents() just before delivery.

Suggested Fixes

  1. Don't prepend system messages to user messages: deliver them as a separate role: system or role: developer message instead of injecting them into role: user. This is the root cause of the false positive.
  2. Only require reads for files that exist: WORKFLOW_AUTO.md is in DEFAULT_REQUIRED_READS but may not exist in the workspace. The audit should skip files that don't exist rather than fail.
  3. Use a different prefix: if prepending to user messages is intentional, avoid System: as the prefix, since LLMs are trained to flag this pattern as suspicious.
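
Here is a minimal sketch of suggested fix 1. It assumes a generic { role, content } chat message array, in which queued system events become their own message ahead of the user's text instead of being spliced into it. buildTurn() is a hypothetical name. The role is a parameter because providers differ on whether they accept system or developer.

```javascript
// Hypothetical helper for suggested fix 1: keep system events out of the
// user's text. The role defaults to "system"; pass "developer" for providers
// that use that role instead.
function buildTurn(systemEvents, userText, { role = "system" } = {}) {
  const messages = [];
  if (systemEvents.length > 0) {
    // All queued events go into one separate, non-user message.
    messages.push({ role, content: systemEvents.join("\n\n") });
  }
  // The user's message is sent verbatim, with no "System:" prefix.
  messages.push({ role: "user", content: userText });
  return messages;
}

const turn = buildTurn(
  ["Post-Compaction Audit: required startup files were not read: WORKFLOW_AUTO.md"],
  "<actual user message here>",
);
console.log(turn.map((m) => m.role)); // [ 'system', 'user' ]
```

Anthropic's Messages API (the model in this report is Claude) only accepts user and assistant roles in its messages array. System content goes in the top-level system parameter instead, so an adapter for that provider would have to move such entries there rather than send them as a message.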

Environment

  • OpenClaw v2026.2.17
  • Running on Linux (arm64)
  • AI model: Claude (Anthropic)
