[Bug]: Inbound metadata injection into system prompt breaks Anthropic prompt caching (~100x cost increase)
Summary
buildInboundMetaSystemPrompt() injects per-message metadata (including message_id) into the system prompt on every API call. Since Anthropic's prompt caching is prefix-based and the system prompt is the prefix, any change to the system prompt invalidates the entire cache. This causes the full prompt to be re-written to cache on every turn instead of incrementally, resulting in an ~80-170x increase in cache_write costs.
Environment
- OpenClaw version: 2026.2.15+ (regression introduced in this version)
- Provider: Anthropic (claude-opus-4-6, claude-sonnet-4-6)
- Channel: Telegram (DM)
- Context window: 200k tokens
Reproduction
- Run any Anthropic model on v2026.2.15+
- Send multiple messages in a DM session
- Observe cache_write in usage — each turn re-writes 60-190k tokens instead of the expected 100-2,000 token delta
No special config required. Affects all Anthropic users on v2026.2.15+.
Root Cause
In src/auto-reply/reply/inbound-meta.ts, buildInboundMetaSystemPrompt() builds a JSON block containing volatile per-message fields:
```jsonc
{
  "schema": "openclaw.inbound_meta.v1",
  "message_id": "4821",            // ← changes every message
  "sender_id": "123456789",
  "chat_id": "telegram:123456789",
  ...
}
```
This block is injected into extraSystemPrompt (via the auto-reply handler around line 37907 in pi-embedded-BxoxxVJz.js):
```js
const extraSystemPrompt = [
  buildInboundMetaSystemPrompt(isNewSession ? sessionCtx : {
    ...sessionCtx,
    ThreadStarterBody: void 0
  }),
  groupChatContext,
  groupIntro,
  groupSystemPrompt
].filter(Boolean).join("\n\n");
```
Which then gets pushed into the system prompt lines:
```js
lines.push(contextHeader, extraSystemPrompt, "");
```
Since Anthropic caches by longest matching prefix, changing message_id in the system prompt invalidates everything after that point. Only the static instructions before the inbound context block (~8,921 tokens) survive as a cache hit.
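The prefix-matching behavior can be illustrated with a small sketch (illustrative only, not OpenClaw code — the prompt-builder shape is a stand-in for the injection described above). Once a per-message field like message_id appears early in the prompt, the reusable prefix between consecutive turns stops inside that field:

```typescript
// Illustrative sketch: Anthropic's prompt cache reuses the longest matching
// prefix between requests, so a volatile field early in the system prompt
// caps the reusable portion at everything before it.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Stand-in for the ~8,921 tokens of static instructions.
const staticInstructions = "...static OpenClaw instructions...\n";

// Hypothetical prompt builder mirroring the injection described above.
function buildPrompt(messageId: number): string {
  return (
    staticInstructions +
    `## Inbound Context\n{"message_id": "${messageId}"}\n` +
    "...workspace files, conversation history..."
  );
}

// Consecutive turns diverge inside the message_id digits, so the cacheable
// prefix never extends past the inbound-context header — everything after
// it (workspace files, history) must be re-written to cache.
const reusable = sharedPrefixLength(buildPrompt(4821), buildPrompt(4822));
```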
Measured Impact
Data from a production deployment (single user, Opus 4.6):
| Period | Version | CW:CR Ratio | CW/Call (avg) | Daily CW Cost |
|---|---|---|---|---|
| Feb 14 | 2026.2.9 | 1:26 | 3,337 tokens | $0.44 |
| Feb 15 | 2026.2.9 | 1:40 | 3,866 tokens | $0.19 |
| Feb 16 | 2026.2.15 | 1:4 | 20,014 tokens | $21.66 |
| Feb 17 | 2026.2.15 | 1:1 | 46,092 tokens | $40.04 |
| Feb 18 | 2026.2.17 | 1:2 | 37,755 tokens | $84.32 |
The Feb 16 inflection point correlates exactly with the mid-session update to v2026.2.15 at 05:32 UTC. The last healthy cache hit was at 05:32:55; the first broken hit (cacheRead=8,921) was at 05:34:17.
Cache behavior detail (Feb 17, typical post-regression):
```
00:57:28 cw=  99,142 cr=       0 ← TTL miss (normal after gap)
00:57:32 cw=     989 cr=  99,142 ← healthy (reads previous write)
01:10:14 cw=  96,944 cr=   8,921 ← BROKEN (only static prefix matches)
01:11:20 cw=  97,089 cr=   8,921 ← BROKEN
09:28:54 cw= 115,091 cr=       0 ← TTL miss
09:30:10 cw= 106,568 cr=   8,921 ← BROKEN (even 2 min later!)
```
The 8,921 token cache hit is consistent across all broken calls — it represents the static OpenClaw instructions before the ## Inbound Context block. Everything after (workspace files, conversation history) is re-written every turn.
Healthy behavior (Feb 15, pre-regression):
```
00:00:39 cw=    182 cr= 161,872 ← 99.9% cache hit
00:03:51 cw=  1,955 cr= 162,054 ← small delta only
00:04:39 cw=    397 cr= 164,009 ← small delta only
```
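The signature of a broken turn — cacheRead pinned at the static-prefix size while cacheWrite stays huge — can be flagged mechanically. A minimal sketch (helper name hypothetical; the numbers are taken from the logs above):

```typescript
// Hypothetical helper, not OpenClaw code: classify a call's cache usage.
interface Usage { cacheWrite: number; cacheRead: number; }

// Measured size of the static instructions before "## Inbound Context".
const STATIC_PREFIX_TOKENS = 8_921;

function isBrokenTurn(u: Usage): boolean {
  // Healthy turns read back nearly the whole previous prompt; broken turns
  // read exactly the static prefix and re-write everything after it.
  return u.cacheRead === STATIC_PREFIX_TOKENS && u.cacheWrite > STATIC_PREFIX_TOKENS;
}

// Samples from the Feb 17 log above:
const ttlMiss: Usage = { cacheWrite: 99_142, cacheRead: 0 };
const healthy: Usage = { cacheWrite: 989, cacheRead: 99_142 };
const broken: Usage  = { cacheWrite: 96_944, cacheRead: 8_921 };
```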
Changelog entries that introduced this
From the 2026.2.15/2026.2.17 changelog:
- Auto-reply/Prompts: include trusted inbound message_id in conversation metadata payloads for downstream targeting workflows.
- Group chats: always inject group chat context (name, participants, reply guidance) into the system prompt on every turn, not just the first.
The first entry added message_id to the inbound metadata. The second ensures the block is injected on every turn (not just the first). Together they guarantee the system prompt changes on every message.
Suggested Fix
The volatile inbound metadata should not be part of the system prompt prefix. Options:
Option A (minimal fix): Strip message_id (and any other per-message volatile fields) from the system prompt copy of inbound_meta. Keep sender_id, chat_id, channel, chat_type, etc. — those are stable within a session and won't break caching.
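A minimal sketch of Option A (field names taken from the JSON above; the helper itself is hypothetical, not existing OpenClaw code):

```typescript
// Drop per-message volatile fields before the metadata reaches the
// system prompt, so the prompt prefix stays byte-identical across turns.
type InboundMeta = Record<string, unknown>;

// Extend with any other fields that change per message.
const VOLATILE_FIELDS = ["message_id"];

function stableInboundMeta(meta: InboundMeta): InboundMeta {
  // sender_id, chat_id, channel, chat_type, etc. are session-stable
  // and pass through unchanged.
  return Object.fromEntries(
    Object.entries(meta).filter(([key]) => !VOLATILE_FIELDS.includes(key))
  );
}
```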
Option B (better fix): Move the entire ## Inbound Context (trusted metadata) block from the system prompt to a user-message prefix on the triggering message. The untrusted metadata blocks already use this pattern. This preserves the metadata for the model while keeping the system prompt cache-stable.
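A sketch of Option B (names and message shape hypothetical): keep the system prompt static and prepend the volatile metadata to the triggering user message instead, mirroring how the untrusted-metadata blocks are already delivered:

```typescript
// Hypothetical shape for a chat message; adapt to the real message type.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

// Prefix the triggering user message with the trusted-metadata block so the
// model still sees it, while the system prompt stays cache-stable.
function withInboundMetaPrefix(userText: string, meta: object): ChatMessage {
  const block =
    "## Inbound Context (trusted metadata)\n" + JSON.stringify(meta, null, 2);
  return { role: "user", content: `${block}\n\n${userText}` };
}
```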
Option C (config toggle): Add a config flag like agents.defaults.inboundMeta.inSystemPrompt: false so users can opt out. Not ideal, since the cache-friendly behavior should be the default.
Workaround
None currently available without source patching. Switching to a cheaper model (Sonnet) reduces per-token cost but doesn't fix the cache invalidation.
Labels
bug, performance, anthropic, prompt-caching