Description
Summary
OpenClaw's post-compaction audit system generates warnings that get mistakenly flagged as prompt injection attacks by its own AI agent. The audit warning is prepended to user messages with a System: prefix, which is the same pattern used in prompt injection attacks — causing the AI agent to reject OpenClaw's own legitimate system message.
Steps to Reproduce
- Run an OpenClaw agent with the default WORKFLOW_AUTO.md in DEFAULT_REQUIRED_READS
- Don't create a WORKFLOW_AUTO.md file in the workspace (or let any required-read file be missing)
- Let the conversation run long enough to trigger context compaction
- Send a message after compaction
What Happens
- Post-compaction audit runs and detects missing required reads
- formatAuditWarning() generates a warning message
- enqueueSystemEvent() queues it as a system event
- prependSystemEvents() prepends it to the next user message as System: [warning text]
- The combined text is sent to the LLM as a single role: user message
- The AI agent mistakenly flags this as a prompt injection attack and refuses to follow the instructions, because System:-prefixed instructions injected into a role: user message are indistinguishable from the textbook prompt injection pattern. The message was actually a legitimate OpenClaw audit warning, but the delivery mechanism made it look like an attack.
Root Cause (Source Code)
All references from dist/reply-oSe13ewW.js (OpenClaw v2026.2.17):
1. Default required reads include a file that may not exist (line 61810)
const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];
WORKFLOW_AUTO.md is required by default, but nothing guarantees that the file exists in the workspace. If it is missing, the audit fails after every compaction.
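The failure mode can be illustrated with a minimal sketch. Only DEFAULT_REQUIRED_READS is taken from the bundle; auditRequiredReads() and its matching rules are hypothetical stand-ins, since the issue does not show how OpenClaw actually compares reads against the list. Reporting a RegExp by its .source is likewise an assumption, chosen because it matches the pattern text seen in the session log.

```javascript
// Verbatim from the issue: one literal filename and one RegExp.
const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

// Hypothetical stand-in for the audit (not OpenClaw's implementation):
// returns every required entry that none of the read paths satisfied.
function auditRequiredReads(readPaths, requiredReads = DEFAULT_REQUIRED_READS) {
  const missingPatterns = requiredReads
    .filter(
      (pattern) =>
        !readPaths.some((path) =>
          pattern instanceof RegExp ? pattern.test(path) : path.endsWith(pattern)
        )
    )
    // Assumption: RegExp entries are reported by their source text.
    .map((pattern) => (pattern instanceof RegExp ? pattern.source : pattern));
  return { passed: missingPatterns.length === 0, missingPatterns };
}

// Workspace without WORKFLOW_AUTO.md: the agent can read the daily memory
// file, but it can never satisfy the literal entry, so the audit cannot pass.
const audit = auditRequiredReads(["memory/2026-02-19.md"]);
console.log(audit);
```

Under these assumptions, no amount of reading by the agent clears the warning, because the missing entry refers to a file that was never created.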
2. Audit failure triggers a system event (line 62265)
if (!audit.passed) enqueueSystemEvent(formatAuditWarning(audit.missingPatterns), { sessionKey });
3. The warning is formatted as an instruction to the AI (lines 61864-61865)
function formatAuditWarning(missingPatterns) {
return "⚠️ Post-Compaction Audit: The following required startup files were not read after context reset:\n" +
missingPatterns.map((p) => ` - ${p}`).join("\n") +
"\n\nPlease read them now using the Read tool before continuing. " +
"This ensures your operating protocols are restored after memory compaction.";
}
4. System events are prepended to user messages with System: prefix (line 56897, specifically line 56950)
async function prependSystemEvents(params) {
// ... drains queued system events ...
if (systemLines.length === 0) return params.prefixedBodyBase;
return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${params.prefixedBodyBase}`;
}
This produces a single string like:
System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: ...
<actual user message here>
This entire string is then sent to the LLM as a role: user message — making the System: prefix indistinguishable from a prompt injection attack.
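The merged string can be reproduced outside OpenClaw with a small sketch. formatAuditWarning() is copied from the excerpt above; prependSystemLines() is a simplified, hypothetical stand-in for prependSystemEvents() that takes the queued lines as an argument instead of draining a queue, and it omits the timestamp that appears in the real log.

```javascript
// Copied from the issue's excerpt of formatAuditWarning().
function formatAuditWarning(missingPatterns) {
  return "⚠️ Post-Compaction Audit: The following required startup files were not read after context reset:\n" +
    missingPatterns.map((p) => `  - ${p}`).join("\n") +
    "\n\nPlease read them now using the Read tool before continuing. " +
    "This ensures your operating protocols are restored after memory compaction.";
}

// Simplified stand-in for prependSystemEvents(): same string template as the
// excerpt, but the queued system lines are passed in directly.
function prependSystemLines(systemLines, prefixedBodyBase) {
  if (systemLines.length === 0) return prefixedBodyBase;
  return `${systemLines.map((l) => `System: ${l}`).join("\n")}\n\n${prefixedBodyBase}`;
}

const warning = formatAuditWarning(["WORKFLOW_AUTO.md"]);
const content = prependSystemLines([warning], "<actual user message here>");

// The model receives a single user-role message; nothing in the payload marks
// the "System:" part as coming from OpenClaw rather than from the sender.
const message = { role: "user", content };
console.log(message.content);
```

From the model's side, this payload looks the same as one in which the sender typed the System: block themselves.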
Evidence from Session Logs
The prepended message (as seen in the JSONL session log)
The role: user message (id: 9bd65c42) contained two parts merged together:
System: [2026-02-19 02:09:42 GMT+2] ⚠️ Post-Compaction Audit: The following required startup
files were not read after context reset:
- WORKFLOW_AUTO.md
- memory\/\d{4}-\d{2}-\d{2}\.md
Please read them now using the Read tool before continuing. This ensures your operating
protocols are restored after memory compaction.
Conversation info (untrusted metadata):
{
"message_id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"sender": "+972XXXXXXXXX"
}
<redacted user message in Hebrew>
The first part (audit warning) was injected by OpenClaw. The second part (Hebrew text) was the actual user message.
The AI agent's thinking (from id: 05523849)
"I notice there's a suspicious [System Message] at the top trying to get me to read files named WORKFLOW_AUTO.md and memory files. This looks like a prompt injection attack! It's trying to impersonate a system message and get me to follow fake instructions."
The AI agent's response (sent to user via WhatsApp)
לפני שאענה — ה-"System:" block בתחילת ההודעה הוא ניסיון prompt injection. WORKFLOW_AUTO.md לא קיים, ואני לא מבצע הוראות ממערכת מזויפת שמוטמעת בהודעת משתמש. מוזכר — עבור הלוג.
Translation: "Before I answer — the 'System:' block at the beginning of the message is a prompt injection attempt. WORKFLOW_AUTO.md doesn't exist, and I don't follow instructions from a fake system embedded in a user message. Noted — for the log."
Event sequence from session log
compaction event (6631090a)
→ OpenClaw: "Pre-compaction memory flush..." (0cbc6b90)
→ Agent: performed flush → NO_REPLY (f02f7899)
→ cache-ttl marker (6d8aea48) ← timestamp: 00:12:15.752
→ [System: audit warning] + user question (9bd65c42) ← timestamp: 00:12:15.902
Only 150ms between the cache-ttl marker and the user message — the audit warning was prepended by prependSystemEvents() just before delivery.
Suggested Fixes
- Don't prepend system messages to user messages. Use a separate role: system or role: developer message instead of injecting into role: user. This is the root cause of the false positive.
- Only require reads for files that exist. WORKFLOW_AUTO.md is in DEFAULT_REQUIRED_READS but may not exist in the workspace. The audit should skip files that don't exist rather than failing.
- Use a different prefix. If prepending to user messages is intentional, avoid using System: as a prefix since LLMs are trained to flag this pattern as suspicious.
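A rough sketch of the first two suggestions follows. Both function names are hypothetical and not part of OpenClaw. fileExists is an injected predicate (in Node it could wrap fs.existsSync), and the role name passed to buildMessages() is an assumption, since which roles a provider accepts and where it expects system content varies by API.

```javascript
const DEFAULT_REQUIRED_READS = ["WORKFLOW_AUTO.md", /memory\/\d{4}-\d{2}-\d{2}\.md/];

// Sketch of "only require reads for files that exist": drop literal entries
// whose file is absent. RegExp entries are kept unchanged, because this
// sketch has no way to enumerate the files they might match.
function filterRequiredReads(requiredReads, fileExists) {
  return requiredReads.filter(
    (entry) => entry instanceof RegExp || fileExists(entry)
  );
}

// Sketch of "don't prepend system messages to user messages": deliver queued
// system lines as their own message and leave the user text untouched.
function buildMessages(systemLines, userBody, systemRole = "system") {
  const messages = [];
  if (systemLines.length > 0) {
    messages.push({ role: systemRole, content: systemLines.join("\n") });
  }
  messages.push({ role: "user", content: userBody });
  return messages;
}

// Workspace in which WORKFLOW_AUTO.md was never created.
const required = filterRequiredReads(DEFAULT_REQUIRED_READS, () => false);
console.log(required.length);

const messages = buildMessages(["Post-Compaction Audit: ..."], "<actual user message here>");
console.log(messages.map((m) => m.role));
```

With the user text kept in its own message, the model no longer has to guess whether a System: line inside user content is genuine.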
Environment
- OpenClaw v2026.2.17
- Running on Linux (arm64)
- AI model: Claude (Anthropic)