fix(auto-reply): restore prompt cache stability by moving per-turn ids to user context #20597
Conversation
Addresses issue openclaw#20894 where volatile metadata in the system prompt breaks Anthropic caching, causing 80-170x cost increases. Documents:
- How to detect broken caching (token usage patterns)
- Cost impact analysis ($0.44/day → $4.32/day measured)
- Root cause (message_id in system prompt changes per turn)
- Workarounds (switch to Sonnet, disable metadata)
- Proper fix approach (move volatile data to user messages)
- Best practices for cache optimization
- Cost monitoring strategies

Includes detailed token breakdown examples and cache hit rate calculations.

Refs openclaw#20894, PR openclaw#20597

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
+1 this fix — it matches what we're seeing. The root cause seems to be that "Inbound Context (trusted metadata)" is injected into the SYSTEM prompt and contains per-message fields like message_id and reply_to_id. In our logs this shows up as a small cacheRead (~8–10k) plus a huge cacheWrite (often 120k–170k+) on every message, so costs explode. Moving the volatile IDs out of the system prompt and into user-role context (as this PR does) feels like the right approach: it keeps caching stable while still preserving the metadata for reactions/routing. |
|
Reporting from issue #19989: this PR directly fixes the root cause we've been tracking. We confirmed the PR looks clean and minimal. Greptile 5/5, backward compatible, no config changes needed. Would be great to see this merged — users on v2026.2.15+ are silently burning 10-100x normal costs right now with no way to detect it (cf. also #19997). Ready to test any follow-up if needed. |
|
We have production data confirming this exact regression. Our measurements show the cache break point precisely:
The 8,921-token cache-read floor is consistent across all broken calls — it's the static instruction block before the injected inbound meta. Everything after it gets rewritten every turn. This PR's approach (moving volatile fields to user-role context, keeping session-stable fields in the system prompt) is the right fix. Related issues: #20894, #19989. Would love to see this merged — it's silently costing every Anthropic user on v2026.2.15+ significantly more than they realize. |
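For anyone wanting to check their own logs for the same signature, here is a minimal detection sketch. The `cacheRead`/`cacheWrite` field names and the 5x threshold are illustrative assumptions, not OpenClaw or Anthropic API:

```typescript
// Heuristic: a healthy cached session shows a large, growing cacheRead and
// a small cacheWrite per turn; the broken pattern is a constant small read
// (the static prefix) plus a huge write on every single turn.
interface TurnUsage {
  cacheRead: number;  // tokens read from the prompt cache this turn
  cacheWrite: number; // tokens written to the prompt cache this turn
}

function looksLikeBrokenCache(turns: TurnUsage[]): boolean {
  if (turns.length < 3) return false; // need a few turns to see a pattern
  return turns.every((t) => t.cacheWrite > 5 * Math.max(t.cacheRead, 1));
}

// Matches the numbers reported above: ~8-10k reads, 120k-170k+ writes.
const suspect = looksLikeBrokenCache([
  { cacheRead: 8921, cacheWrite: 121_000 },
  { cacheRead: 8921, cacheWrite: 145_500 },
  { cacheRead: 8921, cacheWrite: 168_300 },
]);
console.log(suspect); // true
```

A healthy session (reads growing into six figures, writes in the low thousands) fails the `every` check and returns `false`.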
|
ACK; I have noticed this massive explosion in usage. I've reviewed these changes and they make sense to me. This, or a similar fix, should be high priority. |
Force-pushed 607c922 to 366394e
…s to user context

Commit bed8e7a added message_id, message_id_full, reply_to_id, and sender_id to buildInboundMetaSystemPrompt(), injecting them into the system prompt on every turn. Since message_id is unique per message, this caused the system prompt to differ on every turn, busting prefix-based prompt caches on local model providers (llama-server, LM Studio/MLX) and causing full cache rebuilds from ~token 9212.

Move these per-turn volatile fields out of the system prompt and into the user-role conversationInfo block in buildInboundUserContextPrefix(), where message_id was already partially present. The system prompt now contains only session-stable fields (chat_id, channel, provider, surface, chat_type, flags), restoring cache stability for the duration of a session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
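In sketch form, the split this commit describes looks roughly like the following. The signatures are simplified stand-ins — the real builders take a richer context object — but the function names come from the commit:

```typescript
// Session-stable fields -> system prompt (identical bytes every turn, so
// the prefix cache holds); per-message fields -> user-role context prefix.
interface InboundMeta {
  chat_id: string;
  channel: string;
  message_id: string;
  sender_id: string;
  reply_to_id?: string;
}

function buildInboundMetaSystemPrompt(m: InboundMeta): string {
  // After the fix: deliberately excludes message_id/reply_to_id/sender_id.
  return `Inbound Context: chat_id=${m.chat_id} channel=${m.channel}`;
}

function buildInboundUserContextPrefix(m: InboundMeta): string {
  // Volatile identifiers live here, prepended to each user message instead.
  const reply = m.reply_to_id ? ` reply_to_id=${m.reply_to_id}` : "";
  return `[message_id=${m.message_id} sender_id=${m.sender_id}${reply}]`;
}

const turn1: InboundMeta = { chat_id: "c1", channel: "telegram", message_id: "m1", sender_id: "u1" };
const turn2: InboundMeta = { ...turn1, message_id: "m2" };

// Same session, different messages: system prompt identical, prefix varies.
console.log(buildInboundMetaSystemPrompt(turn1) === buildInboundMetaSystemPrompt(turn2)); // true
console.log(buildInboundUserContextPrefix(turn1) === buildInboundUserContextPrefix(turn2)); // false
```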
Force-pushed 366394e to 175919a
|
Merged via squash. Thanks @anisoptera! |
…s to user context (openclaw#20597)

Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: 175919a

Co-authored-by: anisoptera <768771+anisoptera@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
Add 'The bill' section covering OpenClaw's token economics, the prompt cache invalidation bug (openclaw/openclaw#20597), and the experience of hitting API limits on a MAX subscription due to a platform bug rather than user behaviour.
|
To keep future changes from reintroducing this (the prompt changing on every request), we need an automated unit or functional test. This should never happen again. How can we avoid this problem in the future? And why did the doctor/health check not detect it? |
Summary
• Problem: Commit bed8e7a added message_id, message_id_full, reply_to_id, and sender_id to buildInboundMetaSystemPrompt(), injecting them into the system prompt on every turn. Since message_id is unique per message, this caused the system prompt to differ on every turn, busting prefix-based prompt caches on local model providers (llama-server, LM Studio/MLX) and causing full cache rebuilds on every conversation turn.
• Why it matters: Cache invalidation on every turn increases latency, costs, and reduces efficiency for local models.
• What changed: Moved per-message identifiers from buildInboundMetaSystemPrompt() (system prompt) to buildInboundUserContextPrefix() (user context prefix). System prompt now contains only session-stable routing fields.
• What did NOT change (scope boundary): Other metadata fields (sender info, thread starter, forwarded message, chat history) remain unchanged. The buildInboundMetaSystemPrompt() trusted metadata schema remains the same.
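To make the cost point concrete, an illustrative back-of-envelope. The base rate is a placeholder and the multipliers are assumptions in the ballpark of Anthropic's published cache pricing (writes bill at roughly 1.25x the base input rate, reads at roughly 0.1x — check current pricing):

```typescript
// Daily cost of re-processing a ~130k-token prefix over 100 turns,
// healthy cache vs broken cache.
const BASE_PER_MTOK = 3.0;  // assumed base input $/MTok (placeholder)
const READ_MULT = 0.1;      // assumed cache-read multiplier
const WRITE_MULT = 1.25;    // assumed cache-write multiplier

function dailyPrefixCost(turns: number, prefixTokens: number, cacheBroken: boolean): number {
  // Broken cache: the whole prefix is re-written every turn.
  // Healthy cache: the whole prefix is a cheap read every turn.
  const mult = cacheBroken ? WRITE_MULT : READ_MULT;
  return turns * (prefixTokens / 1e6) * BASE_PER_MTOK * mult;
}

console.log(dailyPrefixCost(100, 130_000, false).toFixed(2)); // "3.90"  (healthy)
console.log(dailyPrefixCost(100, 130_000, true).toFixed(2));  // "48.75" (broken, 12.5x)
```

The 12.5x ratio is just WRITE_MULT / READ_MULT; with larger prefixes or heavier traffic the absolute gap grows accordingly, which is how the reported 10-100x bills arise.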
Change Type
• [x] Bug fix
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write "None".
None.

Security Impact (required)
• Secrets/tokens handling changed? No
• New/changed network calls? No
• Command/tool execution surface changed? No
• Data access scope changed? No
Repro + Verification
Environment
• OS: Linux (Debian)
• Runtime/container: Node.js v22.22.0
• Model/provider: glm-4.7-flash, llama.cpp
• Integration/channel: General
• Relevant config: Default OpenClaw config
Steps
### Expected
• Prompt cache should remain stable across turns when workspace files don't change
• Cache should only rebuild when actual workspace changes occur
### Actual
• Before fix: Cache invalidation on every turn
• After fix: Cache remains stable across turns
Evidence
Attach at least one:
Here it is actually using the prefix cache:
Human Verification (required)
What you personally verified (not just CI), and how:
I did consider splitting the conversation info block a few different ways but didn't see the point in the end.
Lots I'm sure. But I've been running it for a while with no issues!
Compatibility / Migration
Failure Recovery (if this breaks)
Just revert it and restart.
none
You're probably already experiencing them.
Risks and Mitigations
None
Greptile Summary
Relocated per-turn message identifiers (`message_id`, `message_id_full`, `reply_to_id`, `sender_id`) from the system prompt to the user context prefix to prevent prompt cache invalidation on every conversation turn. The system prompt now contains only session-stable routing fields (`chat_id`, `channel`, `provider`, `surface`, `chat_type`, `flags`), while per-turn identifiers are included in the conversation info block within user context. This optimization enables efficient prefix-based caching for local model providers (llama-server, LM Studio, MLX).
- Moved per-turn fields from `buildInboundMetaSystemPrompt()` to `buildInboundUserContextPrefix()`
- Trusted metadata schema (`openclaw.inbound_meta.v1`) unchanged

Confidence Score: 5/5
Last reviewed commit: 607c922