Description
Anthropic prompt caching has an extremely low hit rate (~10%) in OpenClaw when using third-party API relay/proxy providers. The root cause is that buildAgentSystemPrompt produces a single monolithic system prompt string containing both static and dynamic content. Since Anthropic's prompt caching requires an exact prefix match, any change in the system prompt invalidates the entire cache.
Observed behavior
Measured over 143 consecutive API calls on a single session:
| Metric | Value |
| --- | --- |
| cacheWrite > 0 | 138/143 (96.5%) |
| cacheRead > 0 | 15/143 (10.5%) |
| Avg cacheWrite per request | ~118k tokens |
| Avg cacheRead per request | ~12k tokens |
Even consecutive requests within the same session (30-170 second gaps) almost always result in a full cache miss followed by a full cache write. The cache is being written on nearly every request but almost never read back.
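Figures like these can be derived from per-request usage logs. The following is a minimal sketch, assuming each logged request exposes numeric cacheWrite and cacheRead token counts. The UsageRecord shape and the summarizeCacheUsage helper are hypothetical names used for illustration, not part of OpenClaw or pi-ai.

```typescript
// Hypothetical record shape: one entry per API call, token counts as numbers.
interface UsageRecord {
  cacheWrite: number;
  cacheRead: number;
}

interface CacheSummary {
  total: number;
  writeRate: number; // fraction of requests with cacheWrite > 0
  readRate: number; // fraction of requests with cacheRead > 0
  avgCacheWrite: number; // mean cacheWrite tokens per request
  avgCacheRead: number; // mean cacheRead tokens per request
}

function summarizeCacheUsage(records: UsageRecord[]): CacheSummary {
  const total = records.length;
  if (total === 0) {
    return { total: 0, writeRate: 0, readRate: 0, avgCacheWrite: 0, avgCacheRead: 0 };
  }
  let withWrite = 0;
  let withRead = 0;
  let sumWrite = 0;
  let sumRead = 0;
  for (const record of records) {
    if (record.cacheWrite > 0) withWrite++;
    if (record.cacheRead > 0) withRead++;
    sumWrite += record.cacheWrite;
    sumRead += record.cacheRead;
  }
  return {
    total,
    writeRate: withWrite / total,
    readRate: withRead / total,
    avgCacheWrite: sumWrite / total,
    avgCacheRead: sumRead / total,
  };
}
```

A readRate far below writeRate, as in the table, suggests the cache is being rebuilt rather than reused.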
Root cause
buildAgentSystemPrompt() in src/agents/system-prompt.ts concatenates everything into a single string, including sections that change on every request:
Dynamic sections (change every request):
- ## Current Date & Time: includes current time
- ## Runtime: includes model, channel, capabilities, host info
- ## Workspace Files (injected) / # Project Context: workspace file contents that may change between requests
- ## Heartbeats: heartbeat prompt line
- Inbound context / extra system prompt: varies per message
Static sections (stable across requests):
- Tool summaries and descriptions
- Skills section
- Memory section
- Messaging / Voice / Reply Tags guidance
- Safety rules
- CLI reference
- Silent replies guidance
Since pi-ai's Anthropic provider wraps the entire system prompt in a single cache_control: {type: "ephemeral"} block, any change in any dynamic section invalidates the cache for the entire ~60-120KB system prompt.
Why this matters more for relay providers
When using Anthropic's direct API, prompt caching is server-managed and somewhat tolerant. But many relay/proxy providers have stricter cache key matching or shorter TTLs. The result is that OpenClaw sessions using relay providers experience:
- Every request processes 100-300k tokens from scratch (no cache benefit)
- Progressively slower responses as conversation history grows
- Effective timeout failures on long sessions (300+ messages)
Proposed fix
Split the system prompt into two (or more) content blocks:
// In pi-ai's Anthropic provider, instead of:
params.system = [
  { type: "text", text: entireSystemPrompt, cache_control: { type: "ephemeral" } }
];
// Split into:
params.system = [
  { type: "text", text: staticSystemPrompt, cache_control: { type: "ephemeral" } },
  { type: "text", text: dynamicSystemPrompt } // no cache_control
];
This way the static prefix (tool descriptions, skills, safety rules, etc.) remains cacheable across requests, while dynamic content (time, runtime info, workspace file changes) is appended without breaking the cache prefix.
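To make the effect concrete, here is a toy model of prefix caching. It is a deliberate simplification and an assumption, not Anthropic's actual implementation: a request can reuse cached content only up to the last block that carries a cache breakpoint and whose entire preceding text is identical to the previous request. The SystemBlock type and reusablePrefixChars function are hypothetical, and characters stand in for tokens.

```typescript
// Toy model (assumption): content is reusable only up to the last cache
// breakpoint for which every block so far matches the previous request.
interface SystemBlock {
  text: string;
  cacheBreakpoint: boolean; // true where cache_control would be set
}

function reusablePrefixChars(previous: SystemBlock[], current: SystemBlock[]): number {
  let reusable = 0;
  let matched = 0;
  const limit = Math.min(previous.length, current.length);
  for (let i = 0; i < limit; i++) {
    // The first differing block ends the shared prefix.
    if (previous[i].text !== current[i].text) break;
    matched += current[i].text.length;
    if (previous[i].cacheBreakpoint && current[i].cacheBreakpoint) {
      reusable = matched;
    }
  }
  return reusable;
}

// Placeholder content standing in for the real prompt sections.
const staticPart = "Tool summaries, skills, safety rules\n";
const at = (time: string): string => `## Current Date & Time\n${time}\n`;

// Current layout: one block holding static and dynamic text together.
const monolithic = (time: string): SystemBlock[] => [
  { text: staticPart + at(time), cacheBreakpoint: true },
];

// Proposed layout: static block with a breakpoint, dynamic block without.
const split = (time: string): SystemBlock[] => [
  { text: staticPart, cacheBreakpoint: true },
  { text: at(time), cacheBreakpoint: false },
];

const monolithicReuse = reusablePrefixChars(monolithic("10:00:00"), monolithic("10:00:45"));
const splitReuse = reusablePrefixChars(split("10:00:00"), split("10:00:45"));
console.log({ monolithicReuse, splitReuse, staticLength: staticPart.length });
```

In this model the monolithic layout reuses nothing once the timestamp changes, while the split layout reuses the whole static block.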
Implementation approach:
- buildAgentSystemPrompt should return a structured result (e.g., { static: string, dynamic: string }) instead of a single string
- The Anthropic provider in pi-ai should emit multiple system blocks with cache_control only on the static prefix
- For non-Anthropic providers that don't support multi-block system prompts, fall back to concatenation
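The three steps above could look roughly like the sketch below. It assumes the prompt is assembled from sections that can be tagged as volatile. PromptSection, splitSystemPrompt, toAnthropicSystemBlocks, and toSingleSystemString are hypothetical names for illustration; the real buildAgentSystemPrompt and pi-ai provider code may be structured differently.

```typescript
// Hypothetical section model: each prompt section declares whether it is volatile.
interface PromptSection {
  heading: string;
  body: string;
  volatile: boolean; // true for time, runtime, workspace files, etc.
}

// Structured result suggested in the issue, replacing the single string.
interface SplitSystemPrompt {
  static: string;
  dynamic: string;
}

interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function renderSections(sections: PromptSection[]): string {
  return sections.map((s) => `${s.heading}\n${s.body}`).join("\n\n");
}

// Step 1: partition sections so that all stable text comes first.
function splitSystemPrompt(sections: PromptSection[]): SplitSystemPrompt {
  return {
    static: renderSections(sections.filter((s) => !s.volatile)),
    dynamic: renderSections(sections.filter((s) => s.volatile)),
  };
}

// Step 2: Anthropic-style blocks, with cache_control only on the static prefix.
function toAnthropicSystemBlocks(prompt: SplitSystemPrompt): SystemTextBlock[] {
  const blocks: SystemTextBlock[] = [];
  if (prompt.static.length > 0) {
    blocks.push({ type: "text", text: prompt.static, cache_control: { type: "ephemeral" } });
  }
  if (prompt.dynamic.length > 0) {
    blocks.push({ type: "text", text: prompt.dynamic });
  }
  return blocks;
}

// Step 3: fallback for providers that accept only a single system string.
function toSingleSystemString(prompt: SplitSystemPrompt): string {
  return [prompt.static, prompt.dynamic].filter((part) => part.length > 0).join("\n\n");
}
```

Note that partitioning moves every volatile section after the static ones, so the order in which the model sees sections changes. Whether that affects behavior would need to be checked against the existing prompt.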
Expected impact
- Cache hit rate should increase from ~10% to ~90%+ for consecutive requests in the same session
- For a typical session with 100k+ tokens of system+tools, this means ~100k fewer tokens processed per request
- Significant latency reduction, especially on relay providers
- Lower token costs (cached tokens are cheaper on Anthropic)
Environment
- OpenClaw version: 2026.3.13
- pi-ai version: 0.57.1
- Model: claude-sonnet-4-6 via third-party relay provider
- System prompt size: 60-120KB depending on workspace files
- Session: 300+ messages, ~296k total tokens
Related
- docs/reference/prompt-caching.md mentions "check for volatile system-prompt inputs" in troubleshooting but doesn't address the architectural issue
- The createOpenRouterSystemCacheWrapper in proxy-stream-wrappers.ts adds cache_control to system blocks but doesn't split static/dynamic content