Note from the human: Hi, I investigated this in depth using Codex, iterating over all possible configs to validate that this is not an isolated issue that I introduced by making adjustments to settings.
Summary
Anthropic prompt caching appears to miss on most turns in auto-reply flows.
Observed pattern across many adjacent turns:
- cacheWrite very high (~137k)
- cacheRead = 0
- small fresh input tokens
This causes repeated high-cost re-caching.
Environment
- OpenClaw: 2026.2.16 (db3480f)
- Provider: direct Anthropic (not OpenRouter)
- Model: anthropic/claude-sonnet-4-5-20250929
- Install: local CLI install
- Channel observed: Telegram auto-reply (code path appears shared across other channel adapters too)
Relevant config
- active Anthropic model has params.cacheRetention: "long"
- agents.defaults.contextPruning.mode: "cache-ttl"
- agents.defaults.contextPruning.ttl: "55m"
- agents.defaults.contextPruning.minPrunableToolChars: 50000
- agents.defaults.contextPruning.softTrim.maxChars: 4000
Expected
Within TTL, adjacent turns should show meaningful cacheRead for unchanged prompt-prefix segments.
Actual
Most turns show near-full cacheWrite and cacheRead=0.
Additional runtime signal
There are occasional cache-hit continuations immediately after tool/result boundaries, but normal subsequent user turns return to cacheRead=0 + large cacheWrite. That suggests caching is not globally disabled, but prefix reuse is unstable across regular turns.
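To make the hit/miss pattern above easy to tabulate from logs, here is a minimal sketch of a per-turn classifier. The `TurnUsage` shape and both function names are hypothetical; the field names only mirror the counters quoted in this issue, not a confirmed OpenClaw type, and the sample numbers are illustrative rather than copied from real logs.

```typescript
// Hypothetical per-turn usage record; field names mirror the counters quoted
// in this issue, not a confirmed OpenClaw type.
interface TurnUsage {
  input: number;      // fresh (uncached) input tokens
  cacheRead: number;  // tokens served from the prompt cache
  cacheWrite: number; // tokens newly written to the prompt cache
}

type CacheOutcome = "hit" | "miss" | "uncached";

// Classify one turn: any cacheRead counts as a hit; a write with no read is
// the full re-cache pattern described above.
function classifyTurn(u: TurnUsage): CacheOutcome {
  if (u.cacheRead > 0) return "hit";
  if (u.cacheWrite > 0) return "miss";
  return "uncached";
}

// Fraction of all prompt tokens that were served from cache on this turn.
function cacheReadRatio(u: TurnUsage): number {
  const total = u.input + u.cacheRead + u.cacheWrite;
  return total === 0 ? 0 : u.cacheRead / total;
}

// Illustrative numbers only (not taken from real logs).
const turns: TurnUsage[] = [
  { input: 40, cacheRead: 0, cacheWrite: 137_000 },
  { input: 25, cacheRead: 136_000, cacheWrite: 1_200 },
  { input: 35, cacheRead: 0, cacheWrite: 137_400 },
];

const outcomes = turns.map(classifyTurn);
console.log(outcomes.join(", "));
```

Running this over a window of adjacent turns before and after a candidate fix would give a simple comparison of how often the prefix is actually reused.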
Suspected cause
Auto-reply builds extra system prompt content from inbound metadata:
- src/auto-reply/reply/get-reply-run.ts
- src/auto-reply/reply/inbound-meta.ts
The inbound metadata includes volatile per-message fields:
- message_id
- reply_to_id
- history_count
If these are part of the cached prefix, they can change every inbound turn and defeat cache reuse.
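The suspected mechanism can be sketched in isolation. Prompt caching only reuses a prefix that matches exactly, so the example below compares the system segment of two adjacent turns under two layouts. Everything here is hypothetical and for illustration only: the `InboundMeta` shape, the `channel` field, the placeholder base prompt, and both builder functions are assumptions, not the actual code in inbound-meta.ts or get-reply-run.ts.

```typescript
// Hypothetical metadata shape; the real type in inbound-meta.ts may differ.
interface InboundMeta {
  message_id: string;
  reply_to_id?: string;
  history_count: number;
  channel: string; // assumed stable across turns in one chat
}

interface BuiltPrompt {
  system: string; // the part that would sit in the cached prefix
  user: string;   // the per-turn tail
}

// Placeholder standing in for a large, unchanging system prompt.
const BASE_SYSTEM_PROMPT = "You are a helpful assistant. ".repeat(50);

// Layout A (suspected current behavior): all metadata goes into the system prompt.
function buildVolatileInPrefix(meta: InboundMeta, userText: string): BuiltPrompt {
  return { system: BASE_SYSTEM_PROMPT + JSON.stringify(meta), user: userText };
}

// Layout B (one possible fix): only stable fields stay in the system prompt;
// per-message fields travel with the user turn instead.
function buildVolatileInTail(meta: InboundMeta, userText: string): BuiltPrompt {
  const { channel, ...perMessage } = meta;
  return {
    system: BASE_SYSTEM_PROMPT + JSON.stringify({ channel }),
    user: JSON.stringify(perMessage) + "\n" + userText,
  };
}

// Number of leading characters two strings share.
function commonPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a[i] === b[i]) i++;
  return i;
}

// Two adjacent inbound turns; only the per-message fields differ.
const turn1: InboundMeta = { message_id: "101", reply_to_id: "100", history_count: 4, channel: "telegram" };
const turn2: InboundMeta = { message_id: "102", reply_to_id: "101", history_count: 5, channel: "telegram" };

const a1 = buildVolatileInPrefix(turn1, "hello");
const a2 = buildVolatileInPrefix(turn2, "how are you?");
const b1 = buildVolatileInTail(turn1, "hello");
const b2 = buildVolatileInTail(turn2, "how are you?");

console.log("layout A system prompt identical:", a1.system === a2.system);
console.log("layout B system prompt identical:", b1.system === b2.system);
console.log("layout A shared prefix chars:", commonPrefixLength(a1.system, a2.system), "of", a1.system.length);
```

In layout A the system segment changes on every inbound message, so any cache breakpoint placed at or after it cannot match the previous turn. In layout B the system segment is byte-identical across turns, which is the property a fix would need to preserve.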
Related issues / PRs
Request
Please confirm whether inbound trusted metadata intended for routing/reactions should remain in the cached system-prompt prefix, or be segmented/moved so that prompt caching remains stable across normal adjacent turns.
Happy to test candidate fixes and report before/after cacheRead/cacheWrite.