fix(auto-reply): restore prompt cache stability by moving per-turn ids to user context #20597
Conversation
Addresses issue openclaw#20894 where volatile metadata in the system prompt breaks Anthropic caching, causing 80-170x cost increases. Documents:
- How to detect broken caching (token usage patterns)
- Cost impact analysis ($0.44/day → $4.32/day measured)
- Root cause (message_id in system prompt changes per turn)
- Workarounds (switch to Sonnet, disable metadata)
- Proper fix approach (move volatile data to user messages)
- Best practices for cache optimization
- Cost monitoring strategies

Includes detailed token breakdown examples and cache hit rate calculations.

Refs openclaw#20894, PR openclaw#20597

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
+1 this fix — it matches what we're seeing. The root cause seems to be that "Inbound Context (trusted metadata)" is injected into the SYSTEM prompt and contains per-message fields like message_id and reply_to_id. In our logs this shows up as a small cacheRead (~8–10k) plus a huge cacheWrite (often 120k–170k+) on every message, so costs explode. Moving the volatile IDs out of the system prompt and into user-role context (as this PR does) feels like the right approach: it keeps caching stable while still preserving the metadata for reactions/routing. |
|
Reporting from issue #19989: this PR directly fixes the root cause we've been tracking. We confirmed the PR looks clean and minimal. Greptile 5/5, backward compatible, no config changes needed. Would be great to see this merged — users on v2026.2.15+ are silently burning 10-100x normal costs right now with no way to detect it (cf. also #19997). Ready to test any follow-up if needed. |
|
We have production data confirming this exact regression. Our measurements show the cache break point precisely:
The 8,921-token cache-read floor is consistent across all broken calls — it's the static instruction block before the injected inbound meta. Everything after it gets rewritten every turn. This PR's approach (moving volatile fields to user-role context, keeping session-stable fields in the system prompt) is the right fix. Related issues: #20894, #19989. Would love to see this merged — it's silently costing every Anthropic user on v2026.2.15+ significantly more than they realize. |
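For anyone wanting to check their own logs for the same signature, here is a minimal detection sketch. The `cacheRead`/`cacheWrite` field names and the 5x threshold are illustrative assumptions, not OpenClaw or Anthropic API:

```typescript
// Heuristic: a healthy cached session shows a large, growing cacheRead and
// a small cacheWrite per turn; the broken pattern is a constant small read
// (the static prefix) plus a huge write on every single turn.
interface TurnUsage {
  cacheRead: number;  // tokens read from the prompt cache this turn
  cacheWrite: number; // tokens written to the prompt cache this turn
}

function looksLikeBrokenCache(turns: TurnUsage[]): boolean {
  if (turns.length < 3) return false; // need a few turns to see a pattern
  return turns.every((t) => t.cacheWrite > 5 * Math.max(t.cacheRead, 1));
}

// Matches the numbers reported above: ~8-10k reads, 120k-170k+ writes.
const suspect = looksLikeBrokenCache([
  { cacheRead: 8921, cacheWrite: 121_000 },
  { cacheRead: 8921, cacheWrite: 145_500 },
  { cacheRead: 8921, cacheWrite: 168_300 },
]);
console.log(suspect); // true
```

A healthy session (reads growing into six figures, writes in the low thousands) fails the `every` check and returns `false`.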
|
ACK; I have noticed this massive explosion in usage. I've reviewed these changes and they make sense to me. This, or a similar fix, should be high priority. |
Force-pushed 607c922 to 366394e
…s to user context

Commit bed8e7a added message_id, message_id_full, reply_to_id, and sender_id to buildInboundMetaSystemPrompt(), injecting them into the system prompt on every turn. Since message_id is unique per message, this caused the system prompt to differ on every turn, busting prefix-based prompt caches on local model providers (llama-server, LM Studio/MLX) and causing full cache rebuilds from ~token 9212.

Move these per-turn volatile fields out of the system prompt and into the user-role conversationInfo block in buildInboundUserContextPrefix(), where message_id was already partially present. The system prompt now contains only session-stable fields (chat_id, channel, provider, surface, chat_type, flags), restoring cache stability for the duration of a session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
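In sketch form, the split this commit describes looks roughly like the following. The signatures are simplified stand-ins — the real builders take a richer context object — but the function names come from the commit:

```typescript
// Session-stable fields -> system prompt (identical bytes every turn, so
// the prefix cache holds); per-message fields -> user-role context prefix.
interface InboundMeta {
  chat_id: string;
  channel: string;
  message_id: string;
  sender_id: string;
  reply_to_id?: string;
}

function buildInboundMetaSystemPrompt(m: InboundMeta): string {
  // After the fix: deliberately excludes message_id/reply_to_id/sender_id.
  return `Inbound Context: chat_id=${m.chat_id} channel=${m.channel}`;
}

function buildInboundUserContextPrefix(m: InboundMeta): string {
  // Volatile identifiers live here, prepended to each user message instead.
  const reply = m.reply_to_id ? ` reply_to_id=${m.reply_to_id}` : "";
  return `[message_id=${m.message_id} sender_id=${m.sender_id}${reply}]`;
}

const turn1: InboundMeta = { chat_id: "c1", channel: "telegram", message_id: "m1", sender_id: "u1" };
const turn2: InboundMeta = { ...turn1, message_id: "m2" };

// Same session, different messages: system prompt identical, prefix varies.
console.log(buildInboundMetaSystemPrompt(turn1) === buildInboundMetaSystemPrompt(turn2)); // true
console.log(buildInboundUserContextPrefix(turn1) === buildInboundUserContextPrefix(turn2)); // false
```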
Force-pushed 366394e to 175919a
|
Merged via squash. Thanks @anisoptera! |
…s to user context (openclaw#20597)

Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: 175919a

Co-authored-by: anisoptera <768771+anisoptera@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
Add 'The bill' section covering OpenClaw's token economics, the prompt cache invalidation bug (openclaw/openclaw#20597), and the experience of hitting API limits on a MAX subscription due to a platform bug rather than user behaviour.
|
To keep future changes from reintroducing this (the prompt changing on every request), we need an automated unit or functional test. This should never happen again. How can we avoid this problem in the future? And why did the doctor/health check not detect it? |
Summary
• Problem: Commit bed8e7a added message_id, message_id_full, reply_to_id, and sender_id to buildInboundMetaSystemPrompt(), injecting them into the system prompt on every turn. Since message_id is unique per message, this caused the system prompt to differ on every turn, busting prefix-based prompt caches on local model providers (llama-server, LM Studio/MLX) and causing full cache rebuilds on every conversation turn.
• Why it matters: Cache invalidation on every turn increases latency, costs, and reduces efficiency for local models.
• What changed: Moved per-message identifiers from buildInboundMetaSystemPrompt() (system prompt) to buildInboundUserContextPrefix() (user context prefix). System prompt now contains only session-stable routing fields.
• What did NOT change (scope boundary): Other metadata fields (sender info, thread starter, forwarded message, chat history) remain unchanged. The buildInboundMetaSystemPrompt() trusted metadata schema remains the same.
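To make the cost point concrete, an illustrative back-of-envelope. The base rate is a placeholder and the multipliers are assumptions in the ballpark of Anthropic's published cache pricing (writes bill at roughly 1.25x the base input rate, reads at roughly 0.1x — check current pricing):

```typescript
// Daily cost of re-processing a ~130k-token prefix over 100 turns,
// healthy cache vs broken cache.
const BASE_PER_MTOK = 3.0;  // assumed base input $/MTok (placeholder)
const READ_MULT = 0.1;      // assumed cache-read multiplier
const WRITE_MULT = 1.25;    // assumed cache-write multiplier

function dailyPrefixCost(turns: number, prefixTokens: number, cacheBroken: boolean): number {
  // Broken cache: the whole prefix is re-written every turn.
  // Healthy cache: the whole prefix is a cheap read every turn.
  const mult = cacheBroken ? WRITE_MULT : READ_MULT;
  return turns * (prefixTokens / 1e6) * BASE_PER_MTOK * mult;
}

console.log(dailyPrefixCost(100, 130_000, false).toFixed(2)); // "3.90"  (healthy)
console.log(dailyPrefixCost(100, 130_000, true).toFixed(2));  // "48.75" (broken, 12.5x)
```

The 12.5x ratio is just WRITE_MULT / READ_MULT; with larger prefixes or heavier traffic the absolute gap grows accordingly, which is how the reported 10-100x bills arise.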
Change Type
• [x] Bug fix
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write "None".
None.

Security Impact (required)
• Secrets/tokens handling changed? No
• New/changed network calls? No
• Command/tool execution surface changed? No
• Data access scope changed? No
Repro + Verification
Environment
• OS: Linux (Debian)
• Runtime/container: Node.js v22.22.0
• Model/provider: glm-4.7-flash, llama.cpp
• Integration/channel: General
• Relevant config: Default OpenClaw config
Steps
### Expected
• Prompt cache should remain stable across turns when workspace files don't change
• Cache should only rebuild when actual workspace changes occur
### Actual
• Before fix: Cache invalidation on every turn
• After fix: Cache remains stable across turns
Evidence
Attach at least one:
Here it is actually using the prefix cache:
Human Verification (required)
What you personally verified (not just CI), and how:
I did consider splitting the conversation info block a few different ways but didn't see the point in the end.
Lots I'm sure. But I've been running it for a while with no issues!
Compatibility / Migration
Failure Recovery (if this breaks)
Just revert it and restart.
none
You're probably already experiencing them.
Risks and Mitigations
None
Greptile Summary
Relocated per-turn message identifiers (`message_id`, `message_id_full`, `reply_to_id`, `sender_id`) from the system prompt to the user context prefix to prevent prompt cache invalidation on every conversation turn. The system prompt now contains only session-stable routing fields (`chat_id`, `channel`, `provider`, `surface`, `chat_type`, `flags`), while per-turn identifiers are included in the conversation info block within user context. This optimization enables efficient prefix-based caching for local model providers (llama-server, LM Studio, MLX).
- Moved per-turn fields from `buildInboundMetaSystemPrompt()` to `buildInboundUserContextPrefix()`
- Trusted metadata schema (`openclaw.inbound_meta.v1`) unchanged

Confidence Score: 5/5
Last reviewed commit: 607c922