
fix(auto-reply): restore prompt cache stability by moving per-turn ids to user context#20597

Merged
mbelinky merged 2 commits into openclaw:main from anisoptera:message-id-cache-buster
Feb 19, 2026

Conversation

@anisoptera
Contributor

@anisoptera anisoptera commented Feb 19, 2026

Summary

• Problem: Commit bed8e7a added message_id, message_id_full, reply_to_id, and sender_id to buildInboundMetaSystemPrompt(), injecting them into the system prompt on every turn. Since message_id is unique per message, this caused the system prompt to differ on every turn, busting prefix-based prompt caches on local model providers (llama-server, LM Studio/MLX) and causing full cache rebuilds on every conversation turn.
• Why it matters: Invalidating the cache on every turn increases latency and cost, and makes local models markedly less efficient.
• What changed: Moved per-message identifiers from buildInboundMetaSystemPrompt() (system prompt) to buildInboundUserContextPrefix() (user context prefix). System prompt now contains only session-stable routing fields.
• What did NOT change (scope boundary): Other metadata fields (sender info, thread starter, forwarded message, chat history) remain unchanged. The buildInboundMetaSystemPrompt() trusted metadata schema remains the same.
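For illustration, the split described above can be sketched as follows. These are hypothetical shapes, not OpenClaw's actual implementation; only the function names and field names come from this PR.

```typescript
// Illustrative sketch only: the real builders live in OpenClaw's auto-reply
// module and differ in detail. Field names are taken from the PR description.
interface InboundMeta {
  // Session-stable routing fields (safe to keep in the system prompt):
  chat_id: string;
  channel: string;
  provider: string;
  surface: string;
  chat_type: string;
  flags: string[];
  // Per-turn identifiers (must NOT go in the system prompt):
  message_id: string;
  message_id_full: string;
  reply_to_id?: string;
  sender_id: string;
}

// System prompt: identical across turns, so a prefix-based prompt cache
// built over it stays valid for the whole session.
function buildInboundMetaSystemPrompt(meta: InboundMeta): string {
  const stable = {
    chat_id: meta.chat_id,
    channel: meta.channel,
    provider: meta.provider,
    surface: meta.surface,
    chat_type: meta.chat_type,
    flags: meta.flags,
  };
  return `Inbound Context (trusted metadata):\n${JSON.stringify(stable)}`;
}

// User context prefix: changes every turn, which is fine, because user
// messages are appended after the cached prefix rather than rewriting it.
function buildInboundUserContextPrefix(meta: InboundMeta): string {
  const perTurn = {
    message_id: meta.message_id,
    message_id_full: meta.message_id_full,
    reply_to_id: meta.reply_to_id,
    sender_id: meta.sender_id,
  };
  return `Conversation info:\n${JSON.stringify(perTurn)}`;
}
```

With this split, two turns that differ only in per-message ids produce byte-identical system prompts, which is exactly the property the prefix cache needs.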

Change Type

• [x] Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

None.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

• OS: Linux (Debian)
• Runtime/container: Node.js v22.22.0
• Model/provider: glm-4.7-flash, llama.cpp
• Integration/channel: General
• Relevant config: Default OpenClaw config

Steps

  1. Configure OpenClaw to use a local model provider (e.g., llama-server, LM Studio)
  2. Send multiple messages in a conversation
  3. Observe prompt cache behavior via logs or diagnostics

Expected

• Prompt cache should remain stable across turns when workspace files don't change
• Cache should only rebuild when actual workspace changes occur

Actual

• Before fix: Cache invalidation on every turn
• After fix: Cache remains stable across turns

Evidence

Attach at least one:

  • Trace/log snippets
    Here's the server actually reusing the prefix cache:
Feb 18 19:54:43 megami launch_models.sh[497507]: srv  get_availabl: updating prompt cache
Feb 18 19:54:43 megami launch_models.sh[497507]: srv   prompt_save:  - saving prompt with length 18201, total state size = 940.031 MiB
Feb 18 19:54:43 megami launch_models.sh[497507]: srv          load:  - looking for better prompt, base f_keep = 0.070, sim = 0.206
Feb 18 19:54:43 megami launch_models.sh[497507]: srv          load:  - found better prompt with f_keep = 0.578, sim = 0.813

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: ran this for some time, my agent is so much faster now it's wild. Even with more sessions going than I have slots to hold.
  • Edge cases checked:
    I did consider splitting the conversation info block a few different ways but didn't see the point in the end.
  • What you did not verify:
    Lots I'm sure. But I've been running it for a while with no issues!

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
    Just revert it and restart.
  • Files/config to restore:
    none
  • Known bad symptoms reviewers should watch for:
    Cache invalidation on every turn (latency spikes, large cache-write token counts). You're probably already experiencing them.

Risks and Mitigations

None

Greptile Summary

Relocated per-turn message identifiers (message_id, message_id_full, reply_to_id, sender_id) from system prompt to user context prefix to prevent prompt cache invalidation on every conversation turn. The system prompt now contains only session-stable routing fields (chat_id, channel, provider, surface, chat_type, flags), while per-turn identifiers are included in the conversation info block within user context. This optimization enables efficient prefix-based caching for local model providers (llama-server, LM Studio, MLX).

  • Moved per-turn identifiers from buildInboundMetaSystemPrompt() to buildInboundUserContextPrefix()
  • Added clear inline documentation explaining cache stability rationale
  • Updated tests to verify system prompt excludes per-turn identifiers
  • Added comprehensive test coverage for user context prefix including all relocated fields
  • Preserved metadata schema (openclaw.inbound_meta.v1) unchanged

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • The change is a straightforward refactoring that moves metadata fields between two string-building functions without altering logic or behavior. Comprehensive test coverage verifies both the exclusion from system prompt and inclusion in user context. The implementation includes clear documentation explaining the cache stability rationale. No security implications, backwards compatibility issues, or edge cases identified.
  • No files require special attention

Last reviewed commit: 607c922


@anisoptera anisoptera changed the title fix(caching): restore prompt cache stability by moving per-turn ids to user context fix(auto-reply): restore prompt cache stability by moving per-turn ids to user context Feb 19, 2026
chilu18 added a commit to chilu18/openclaw that referenced this pull request Feb 19, 2026
Addresses issue openclaw#20894 where volatile metadata in system prompt
breaks Anthropic caching, causing 80-170x cost increases.

Documents:
- How to detect broken caching (token usage patterns)
- Cost impact analysis ($0.44/day → $84.32/day measured)
- Root cause (message_id in system prompt changes per turn)
- Workarounds (switch to Sonnet, disable metadata)
- Proper fix approach (move volatile data to user messages)
- Best practices for cache optimization
- Cost monitoring strategies

Includes detailed token breakdown examples and cache hit rate
calculations.

Refs openclaw#20894, PR openclaw#20597

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dynamicfire

+1 this fix — it matches what we’re seeing.

Root cause seems to be that “Inbound Context (trusted metadata)” is injected into the SYSTEM prompt and contains per-message fields like message_id / reply_to_id. Those change every turn, so Anthropic’s prefix cache gets busted constantly.

In our logs this shows up as: small cacheRead (~8–10k) + huge cacheWrite (often 120k–170k+) every message → costs explode.

Moving the volatile IDs out of the system prompt and into user-role context (as this PR does) feels like the right approach: keeps caching stable while still preserving the metadata for reactions/routing.

@tilleulenspiegel

Reporting from issue #19989: this PR directly fixes the root cause we've been tracking. We confirmed the message_id invalidation on our self-hosted setup (Ubuntu 24.04, claude-sonnet-4-6, ~30KB workspace) — the cache was busting on every single message, causing ~1% of daily token budget per call at peak.

PR looks clean and minimal. Greptile 5/5, backward compatible, no config changes needed. Would be great to see this merged — users on v2026.2.15+ are silently burning 10-100x normal costs right now with no way to detect it (cf. also #19997).

Ready to test any follow-up if needed.

@Rubedo-AI

We have production data confirming this exact regression. Our measurements show the cache break point precisely:

  • Last healthy call (v2026.2.9): cw=182, cr=161,872 — 99.9% cache hit
  • First broken call (v2026.2.15): cw=96,944, cr=8,921 — only static prefix cached
  • Daily cost impact: $0.44/day → $84.32/day (Opus 4.6, 200k context)

The 8,921 token cache-read floor is consistent across all broken calls — it's the static instruction block before the injected inbound meta. Everything after gets rewritten every turn.

This PR's approach (moving volatile fields to user-role context, keeping session-stable fields in system prompt) is the right fix. Related issues: #20894, #19989.

Would love to see this merged — it's silently costing every Anthropic user on v2026.2.15+ significantly more than they realize.
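As a sanity check, the hit-rate figures quoted above follow directly from the usage counters. A simple sketch, assuming `cr` is cache-read tokens (served from cache) and `cw` is cache-write tokens (recomputed), as in the measurements in this comment:

```typescript
// Estimate prompt-cache hit rate from provider usage counters.
// cr = cacheRead tokens (served from cache), cw = cacheWrite tokens (recomputed).
function cacheHitRate(cr: number, cw: number): number {
  const total = cr + cw;
  return total === 0 ? 0 : cr / total;
}

// Numbers reported above:
const healthy = cacheHitRate(161_872, 182); // v2026.2.9: cw=182, cr=161,872
const broken = cacheHitRate(8_921, 96_944); // v2026.2.15: cw=96,944, cr=8,921

console.log(`healthy: ${(healthy * 100).toFixed(1)}%`); // ≈ 99.9%
console.log(`broken:  ${(broken * 100).toFixed(1)}%`);  // ≈ 8.4%
```

The drop from ~99.9% to ~8.4% matches the "only the static prefix cached" observation: everything after the injected inbound meta is rewritten every turn.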

@PastaPastaPasta

ACK; I have noticed this massive explosion in usage. I've reviewed these changes and they make sense to me.

This, or a similar fix, should be high priority.

@mbelinky mbelinky self-assigned this Feb 19, 2026
@mbelinky mbelinky force-pushed the message-id-cache-buster branch from 607c922 to 366394e on February 19, 2026 at 19:05
anisoptera and others added 2 commits February 19, 2026 20:05
…s to user context

Commit bed8e7a added message_id, message_id_full, reply_to_id, and sender_id
to buildInboundMetaSystemPrompt(), injecting them into the system prompt on every
turn. Since message_id is unique per message, this caused the system prompt to
differ on every turn, busting prefix-based prompt caches on local model providers
(llama-server, LM Studio/MLX) and causing full cache rebuilds from ~token 9212.

Move these per-turn volatile fields out of the system prompt and into the user-role
conversationInfo block in buildInboundUserContextPrefix(), where message_id was
already partially present. The system prompt now contains only session-stable fields
(chat_id, channel, provider, surface, chat_type, flags), restoring cache stability
for the duration of a session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mbelinky mbelinky force-pushed the message-id-cache-buster branch from 366394e to 175919a on February 19, 2026 at 19:10
@mbelinky mbelinky merged commit 4b7d891 into openclaw:main Feb 19, 2026
19 checks passed
@mbelinky
Contributor

Merged via squash.

Thanks @anisoptera!

bandarupalli pushed a commit to tildabio/openclaw that referenced this pull request Feb 19, 2026
…s to user context (openclaw#20597)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 175919a
Co-authored-by: anisoptera <768771+anisoptera@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
yneth-ray-openclaw pushed a commit to yneth-ray-openclaw/openclaw that referenced this pull request Feb 19, 2026
vignesh07 pushed a commit to pahdo/openclaw that referenced this pull request Feb 20, 2026
rodrigogs pushed a commit to rodrigogs/openclaw that referenced this pull request Feb 20, 2026
Hansen1018 added a commit to Hansen1018/openclaw that referenced this pull request Feb 21, 2026
SaintPepsi added a commit to SaintPepsi/ianhogers.dev that referenced this pull request Feb 21, 2026
Add 'The bill' section covering OpenClaw's token economics,
the prompt cache invalidation bug (openclaw/openclaw#20597),
and the experience of hitting API limits on a MAX subscription
due to a platform bug rather than user behaviour.
mmyyfirstb pushed a commit to mmyyfirstb/openclaw that referenced this pull request Feb 21, 2026
obviyus pushed a commit to guirguispierre/openclaw that referenced this pull request Feb 22, 2026
@sebastienbo

To avoid future changes causing this problem (the prompt changing on every new request) again, we need an automated unit test or functional test.

This should never happen again. In my opinion it could easily recur if we don't test new merges for prompt changes. What do you all think?

How can we avoid this problem in the future?

And why did the doctor / health check not detect it?

zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: 2026.2.15 breaks prompt cache for local model providers (llama-server, LM Studio/MLX)

7 participants