System prompt dynamic content invalidates Anthropic prompt caching (~10% hit rate) #49700

@Nimo1987

Description

Anthropic prompt caching has an extremely low hit rate (~10%) in OpenClaw when using third-party API relay/proxy providers. The root cause is that buildAgentSystemPrompt produces a single monolithic system prompt string containing both static and dynamic content. Since Anthropic's prompt caching requires an exact prefix match, any change in the system prompt invalidates the entire cache.

Observed behavior

Measured over 143 consecutive API calls on a single session:

| Metric | Value |
| --- | --- |
| cacheWrite > 0 | 138/143 (96%) |
| cacheRead > 0 | 15/143 (10.5%) |
| Avg cacheWrite per request | ~118k tokens |
| Avg cacheRead per request | ~12k tokens |

Even consecutive requests within the same session (30-170 seconds apart, well within Anthropic's default 5-minute cache TTL) almost always result in a full cache miss followed by a full cache write. The cache is written on nearly every request but almost never read back.
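For anyone who wants to reproduce these numbers, the figures in the table can be derived from per-request usage counters roughly as follows. This is a minimal sketch: the `UsageRecord` shape and the `summarizeCacheUsage` name are hypothetical, chosen to mirror the cacheWrite/cacheRead counters quoted above, and are not pi-ai's actual types.

```typescript
// Hypothetical usage record; field names mirror the counters quoted in this
// issue and are NOT taken from pi-ai's real type definitions.
interface UsageRecord {
  cacheWrite: number; // tokens written to the cache by this request
  cacheRead: number;  // tokens served from the cache for this request
}

interface CacheSummary {
  requests: number;
  writeRate: number;     // share of requests with cacheWrite > 0
  readRate: number;      // share of requests with cacheRead > 0
  avgCacheWrite: number; // mean cacheWrite tokens per request
  avgCacheRead: number;  // mean cacheRead tokens per request
}

function summarizeCacheUsage(records: UsageRecord[]): CacheSummary {
  const n = records.length;
  if (n === 0) {
    return { requests: 0, writeRate: 0, readRate: 0, avgCacheWrite: 0, avgCacheRead: 0 };
  }
  let writes = 0;
  let reads = 0;
  let writeTokens = 0;
  let readTokens = 0;
  for (const r of records) {
    if (r.cacheWrite > 0) writes++;
    if (r.cacheRead > 0) reads++;
    writeTokens += r.cacheWrite;
    readTokens += r.cacheRead;
  }
  return {
    requests: n,
    writeRate: writes / n,
    readRate: reads / n,
    avgCacheWrite: writeTokens / n,
    avgCacheRead: readTokens / n,
  };
}
```

A healthy session should show a high `readRate` and a low `writeRate` after the first request; the measurements above show the opposite.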

Root cause

buildAgentSystemPrompt() in src/agents/system-prompt.ts concatenates everything into a single string, including sections that change on every request:

Dynamic sections (change every request):

  • ## Current Date & Time — includes current time
  • ## Runtime — includes model, channel, capabilities, host info
  • ## Workspace Files (injected) / # Project Context — workspace file contents that may change between requests
  • ## Heartbeats — heartbeat prompt line
  • Inbound context / extra system prompt — varies per message

Static sections (stable across requests):

  • Tool summaries and descriptions
  • Skills section
  • Memory section
  • Messaging / Voice / Reply Tags guidance
  • Safety rules
  • CLI reference
  • Silent replies guidance

Since pi-ai's Anthropic provider wraps the entire system prompt in a single cache_control: {type: "ephemeral"} block, any change in any dynamic section invalidates the cache for the entire ~60-120KB system prompt.
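To illustrate the failure mode, here is a toy model of exact-prefix matching. It is a sketch under stated assumptions, not pi-ai or Anthropic code: `ToySystemBlock`, `cacheKey`, and `monolithic` are hypothetical names, the prompt text and timestamps are placeholders, and a real provider would hash the prefix rather than compare strings.

```typescript
interface ToySystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Toy model: the cache key is all text up to and including the last block
// that carries cache_control. Returns null when nothing is marked cacheable.
function cacheKey(blocks: ToySystemBlock[]): string | null {
  let last = -1;
  for (let i = 0; i < blocks.length; i++) {
    if (blocks[i].cache_control) last = i;
  }
  if (last < 0) return null;
  return blocks
    .slice(0, last + 1)
    .map((b) => b.text)
    .join("\u0000");
}

// Mimics the current behavior: static and dynamic text in one cached block.
function monolithic(staticText: string, now: string): ToySystemBlock[] {
  return [
    {
      type: "text",
      text: `${staticText}\n\n## Current Date & Time\n${now}`,
      cache_control: { type: "ephemeral" },
    },
  ];
}

// Two requests 45 seconds apart with identical static content.
const first = cacheKey(monolithic("tool summaries, safety rules", "2026-03-13T10:00:00Z"));
const second = cacheKey(monolithic("tool summaries, safety rules", "2026-03-13T10:00:45Z"));

console.log(first === second); // false: the timestamp alone changes the key
```

Because the timestamp lives inside the only cached block, the key differs on every request even though the static text is byte-for-byte identical.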

Why this matters more for relay providers

With Anthropic's direct API, prompt caching is managed server-side and the effect may be less visible. Many relay/proxy providers, however, appear to apply stricter cache key matching or shorter TTLs. As a result, OpenClaw sessions using relay providers experience:

  • Every request processes 100-300k tokens from scratch (no cache benefit)
  • Progressively slower responses as conversation history grows
  • Effective timeout failures on long sessions (300+ messages)

Proposed fix

Split the system prompt into two (or more) content blocks:

// In pi-ai's Anthropic provider, instead of:
params.system = [
  { type: "text", text: entireSystemPrompt, cache_control: { type: "ephemeral" } }
];

// Split into:
params.system = [
  { type: "text", text: staticSystemPrompt, cache_control: { type: "ephemeral" } },
  { type: "text", text: dynamicSystemPrompt }  // no cache_control
];

This way the static prefix (tool descriptions, skills, safety rules, etc.) remains cacheable across requests, while dynamic content (time, runtime info, workspace file changes) is appended without breaking the cache prefix.

Implementation approach:

  1. buildAgentSystemPrompt should return a structured result (e.g., { static: string, dynamic: string }) instead of a single string
  2. The Anthropic provider in pi-ai should emit multiple system blocks with cache_control only on the static prefix
  3. For non-Anthropic providers that don't support multi-block system prompts, fall back to concatenation
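The three steps above could be sketched as follows. This is an illustration under assumptions, not the actual OpenClaw or pi-ai code: `PromptSections`, `buildSplitSystemPrompt`, `toAnthropicSystem`, and `toSingleString` are hypothetical names, and only a few of the sections listed earlier are included to keep the example short.

```typescript
// Structured result proposed in step 1.
interface SplitSystemPrompt {
  static: string;
  dynamic: string;
}

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical inputs; the real builder takes many more parameters.
interface PromptSections {
  toolSummaries: string;
  safetyRules: string;
  currentTime: string;
  runtime: string;
}

// Step 1: keep stable sections and volatile sections in separate strings.
function buildSplitSystemPrompt(s: PromptSections): SplitSystemPrompt {
  return {
    static: [s.toolSummaries, s.safetyRules].join("\n\n"),
    dynamic: [
      `## Current Date & Time\n${s.currentTime}`,
      `## Runtime\n${s.runtime}`,
    ].join("\n\n"),
  };
}

// Step 2: Anthropic gets two blocks, with cache_control only on the static prefix.
function toAnthropicSystem(p: SplitSystemPrompt): SystemBlock[] {
  const blocks: SystemBlock[] = [
    { type: "text", text: p.static, cache_control: { type: "ephemeral" } },
  ];
  if (p.dynamic.length > 0) {
    blocks.push({ type: "text", text: p.dynamic }); // no cache_control
  }
  return blocks;
}

// Step 3: providers without multi-block system prompts get one string.
function toSingleString(p: SplitSystemPrompt): string {
  return p.dynamic.length > 0 ? `${p.static}\n\n${p.dynamic}` : p.static;
}
```

Keeping the static text first matters even in the fallback path: providers that do automatic prefix caching can still reuse the leading portion when only the tail changes.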

Expected impact

  • Cache hit rate should increase from ~10% to ~90%+ for consecutive requests in the same session
  • For a typical session with 100k+ tokens of system+tools, this means ~100k fewer tokens processed per request
  • Significant latency reduction, especially on relay providers
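  • Lower token costs (on Anthropic, cache reads are billed at a fraction of the base input price, while cache writes carry a surcharge, so writing the cache on nearly every request without reading it back costs more than not caching at all)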
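A rough way to estimate the cost effect is shown below. This is a back-of-the-envelope sketch, not a billing calculator: `expectedPrefixCost` is a hypothetical helper, the multipliers (1.25x for a cache write, 0.1x for a cache read, relative to the base input price) are assumptions that should be checked against the provider's current pricing, and the model treats every miss as a full write of the prefix.

```typescript
interface CacheMultipliers {
  write: number; // cost of a cache write relative to base input price
  read: number;  // cost of a cache read relative to base input price
}

// ASSUMED values; verify against the provider's published pricing.
const ASSUMED_MULTIPLIERS: CacheMultipliers = { write: 1.25, read: 0.1 };

// Expected cost of the cached prefix per request, expressed in
// "base-price input token" equivalents. Every miss is modeled as a write.
function expectedPrefixCost(
  prefixTokens: number,
  hitRate: number,
  m: CacheMultipliers = ASSUMED_MULTIPLIERS,
): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return prefixTokens * (hitRate * m.read + (1 - hitRate) * m.write);
}

// 100k-token prefix at the observed (~10%) and expected (~90%) hit rates.
const before = expectedPrefixCost(100000, 0.1);
const after = expectedPrefixCost(100000, 0.9);
console.log(before.toFixed(0), after.toFixed(0));
```

Under these assumptions a 100k-token prefix costs roughly 113,500 token-equivalents per request at a 10% hit rate and roughly 21,500 at 90%, about a fivefold reduction for that part of the request.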

Environment

  • OpenClaw version: 2026.3.13
  • pi-ai version: 0.57.1
  • Model: claude-sonnet-4-6 via third-party relay provider
  • System prompt size: 60-120KB depending on workspace files
  • Session: 300+ messages, ~296k total tokens

Related

  • docs/reference/prompt-caching.md mentions "check for volatile system-prompt inputs" in troubleshooting but doesn't address the architectural issue
  • The createOpenRouterSystemCacheWrapper in proxy-stream-wrappers.ts adds cache_control to system blocks but doesn't split static/dynamic content
