System prompt dynamic content invalidates Anthropic prompt caching (~10% hit rate)

### Description

Anthropic prompt caching has an extremely low hit rate (~10%) in OpenClaw when using third-party API relay/proxy providers. The root cause is that `buildAgentSystemPrompt` produces a single monolithic system prompt string containing both **static** and **dynamic** content. Because Anthropic's prompt caching requires an exact match of the prefix up to the cache breakpoint, any change anywhere in the system prompt invalidates the cache for the whole prompt.

### Observed behavior

Measured over 143 consecutive API calls on a single session:

| Metric | Value |
|--------|-------|
| `cacheWrite > 0` | 138/143 (96%) |
| `cacheRead > 0` | **15/143 (10.5%)** |
| Avg `cacheWrite` per request | ~118k tokens |
| Avg `cacheRead` per request | ~12k tokens |

Even consecutive requests within the same session (30-170 second gaps) almost always result in a full cache miss followed by a full cache write. The cache is being written on nearly every request but almost never read back.
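
For anyone who wants to reproduce these numbers, the table can be derived from per-request usage records along the following lines. This is a minimal sketch: the `UsageRecord` shape and the `summarizeCacheUsage` helper are hypothetical and only mirror the `cacheWrite` / `cacheRead` metric names used above, not an actual OpenClaw or `pi-ai` type.

```typescript
// Hypothetical per-request usage record; field names mirror the table above.
interface UsageRecord {
  cacheWrite: number; // tokens written to the cache on this request
  cacheRead: number; // tokens served from the cache on this request
}

interface CacheStats {
  total: number;
  writeCount: number; // requests with cacheWrite > 0
  readCount: number; // requests with cacheRead > 0
  readRate: number; // readCount / total
  avgCacheWrite: number;
  avgCacheRead: number;
}

function summarizeCacheUsage(records: UsageRecord[]): CacheStats {
  const total = records.length;
  if (total === 0) {
    return { total: 0, writeCount: 0, readCount: 0, readRate: 0, avgCacheWrite: 0, avgCacheRead: 0 };
  }
  let writeCount = 0;
  let readCount = 0;
  let writeSum = 0;
  let readSum = 0;
  for (const record of records) {
    if (record.cacheWrite > 0) writeCount++;
    if (record.cacheRead > 0) readCount++;
    writeSum += record.cacheWrite;
    readSum += record.cacheRead;
  }
  return {
    total,
    writeCount,
    readCount,
    readRate: readCount / total,
    avgCacheWrite: writeSum / total,
    avgCacheRead: readSum / total,
  };
}

// Example with made-up records: three misses (full write) and one hit (full read).
const exampleStats = summarizeCacheUsage([
  { cacheWrite: 118000, cacheRead: 0 },
  { cacheWrite: 118000, cacheRead: 0 },
  { cacheWrite: 0, cacheRead: 118000 },
  { cacheWrite: 118000, cacheRead: 0 },
]);
console.log(exampleStats.readRate);
```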

### Root cause

`buildAgentSystemPrompt()` in `src/agents/system-prompt.ts` concatenates everything into a single string, including sections that can change from one request to the next:

**Dynamic sections (can change between requests):**
- `## Current Date & Time` — includes current time
- `## Runtime` — includes model, channel, capabilities, host info
- `## Workspace Files (injected)` / `# Project Context` — workspace file contents that may change between requests
- `## Heartbeats` — heartbeat prompt line
- Inbound context / extra system prompt — varies per message

**Static sections (stable across requests):**
- Tool summaries and descriptions
- Skills section
- Memory section
- Messaging / Voice / Reply Tags guidance
- Safety rules
- CLI reference
- Silent replies guidance

Since `pi-ai`'s Anthropic provider wraps the entire system prompt in a single `cache_control: {type: "ephemeral"}` block, any change in any dynamic section invalidates the cache for the entire ~60-120KB system prompt.
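
The effect can be illustrated with a toy model of block-granular caching. This is a sketch under stated assumptions, not the provider's actual cache key logic: `SystemBlock`, `cachedPrefix`, and the sample section texts are hypothetical, and the model only assumes that everything up to and including the last block carrying `cache_control` must match exactly.

```typescript
// Toy model only: all names here are hypothetical.
interface SystemBlock {
  text: string;
  cacheable: boolean; // stands in for cache_control: { type: "ephemeral" }
}

// Returns the text that must match exactly for a cache hit: every block up to
// and including the last cacheable one. Returns null if nothing is cacheable.
function cachedPrefix(blocks: SystemBlock[]): string | null {
  let last = -1;
  blocks.forEach((block, index) => {
    if (block.cacheable) last = index;
  });
  if (last < 0) return null;
  return blocks
    .slice(0, last + 1)
    .map((block) => block.text)
    .join("\n");
}

const staticText = "## Safety\nFollow the safety rules.";
const dynamicAt = (time: string): string => `## Current Date & Time\n${time}`;

// Monolithic layout: one cacheable block holding static and dynamic content.
const monolithic = (time: string): SystemBlock[] => [
  { text: `${staticText}\n${dynamicAt(time)}`, cacheable: true },
];

// Split layout: the static block is cacheable, the dynamic block follows it.
const split = (time: string): SystemBlock[] => [
  { text: staticText, cacheable: true },
  { text: dynamicAt(time), cacheable: false },
];

// Two requests 30 seconds apart.
const monolithicHit =
  cachedPrefix(monolithic("10:00:00")) === cachedPrefix(monolithic("10:00:30"));
const splitHit = cachedPrefix(split("10:00:00")) === cachedPrefix(split("10:00:30"));

// monolithicHit is false, splitHit is true
console.log({ monolithicHit, splitHit });
```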

### Why this matters more for relay providers

The same prefix invalidation applies when calling Anthropic's API directly, where caching is managed server-side. Relay/proxy providers can make it worse, since they may apply stricter cache key matching or shorter TTLs on top. The result is that OpenClaw sessions using relay providers experience:

- **Every request processes 100-300k tokens from scratch** (no cache benefit)
- **Progressively slower responses** as conversation history grows
- **Effective timeout failures** on long sessions (300+ messages)

### Proposed fix

Split the system prompt into two (or more) content blocks:

```typescript
// In pi-ai's Anthropic provider, instead of:
params.system = [
  { type: "text", text: entireSystemPrompt, cache_control: { type: "ephemeral" } }
];

// Split into:
params.system = [
  { type: "text", text: staticSystemPrompt, cache_control: { type: "ephemeral" } },
  { type: "text", text: dynamicSystemPrompt }  // no cache_control
];
```

This way the static prefix (tool descriptions, skills, safety rules, etc.) remains cacheable across requests, while dynamic content (time, runtime info, workspace file changes) is appended without breaking the cache prefix.

**Implementation approach:**

1. `buildAgentSystemPrompt` should return a structured result (e.g., `{ static: string, dynamic: string }`) instead of a single string
2. The Anthropic provider in `pi-ai` should emit multiple system blocks with `cache_control` only on the static prefix
3. For non-Anthropic providers that don't support multi-block system prompts, fall back to concatenation
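
The three steps above could look roughly like this. It is a minimal sketch, not the actual `pi-ai` or OpenClaw code: `SystemPromptParts`, `toAnthropicSystemBlocks`, and `toSingleSystemString` are hypothetical names, and the block shape simply follows the `params.system` example in this issue.

```typescript
// Hypothetical structured result of buildAgentSystemPrompt (step 1).
interface SystemPromptParts {
  static: string; // tool summaries, skills, safety rules, CLI reference, ...
  dynamic: string; // date/time, runtime info, workspace files, inbound context, ...
}

// System text block in the shape used by the example in this issue.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Step 2: emit multiple blocks, with cache_control only on the static prefix.
function toAnthropicSystemBlocks(parts: SystemPromptParts): SystemTextBlock[] {
  const blocks: SystemTextBlock[] = [];
  // Empty sections are skipped so that no empty text block is sent.
  if (parts.static.length > 0) {
    blocks.push({ type: "text", text: parts.static, cache_control: { type: "ephemeral" } });
  }
  if (parts.dynamic.length > 0) {
    blocks.push({ type: "text", text: parts.dynamic });
  }
  return blocks;
}

// Step 3: fallback for providers that only accept a single system string.
function toSingleSystemString(parts: SystemPromptParts): string {
  return [parts.static, parts.dynamic].filter((part) => part.length > 0).join("\n\n");
}

// Usage with placeholder content:
const parts: SystemPromptParts = {
  static: "## Safety\nFollow the safety rules.",
  dynamic: "## Current Date & Time\n2026-03-13 10:00:00",
};
const systemBlocks = toAnthropicSystemBlocks(parts);
const systemString = toSingleSystemString(parts);
console.log(systemBlocks.length, systemString.length);
```

Keeping the static part first matters here: the dynamic block can only be appended without breaking the cache if nothing that changes precedes the block carrying `cache_control`.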

### Expected impact

- Cache hit rate should increase from ~10% to ~90%+ for consecutive requests in the same session
- For a typical session with 100k+ tokens of system+tools, this means ~100k fewer tokens processed per request
- Significant latency reduction, especially on relay providers
- Lower token costs (on Anthropic, cache reads are billed below the base input rate, while cache writes are billed above it)
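
To get a rough feel for the cost side, the prefix cost can be expressed in base-input-token equivalents as a function of hit rate. This is a back-of-the-envelope sketch: `relativePrefixCost` is a hypothetical helper, the model assumes every miss rewrites the whole prefix and every hit reads it, and the multipliers (1.25 for cache writes, 0.1 for cache reads) are assumptions based on Anthropic's published 5-minute cache pricing. A relay provider may bill differently.

```typescript
// Assumed pricing multipliers relative to the base input token price.
interface CachePricing {
  writeMultiplier: number; // cost of writing one token to the cache
  readMultiplier: number; // cost of reading one token from the cache
}

// Cost of the cached prefix over a run of requests, in base-input-token
// equivalents. Simplification: a miss writes the full prefix, a hit reads it.
function relativePrefixCost(
  prefixTokens: number,
  requests: number,
  hitRate: number,
  pricing: CachePricing,
): number {
  const hits = requests * hitRate;
  const misses = requests - hits;
  return prefixTokens * (misses * pricing.writeMultiplier + hits * pricing.readMultiplier);
}

// Assumption: Anthropic's 5-minute cache multipliers; adjust for your provider.
const assumedPricing: CachePricing = { writeMultiplier: 1.25, readMultiplier: 0.1 };

// Compare the observed ~10% hit rate with the hoped-for ~90% for a 100k prefix.
const costAtLowHitRate = relativePrefixCost(100_000, 143, 0.1, assumedPricing);
const costAtHighHitRate = relativePrefixCost(100_000, 143, 0.9, assumedPricing);
console.log((costAtHighHitRate / costAtLowHitRate).toFixed(2));
```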

### Environment

- OpenClaw version: 2026.3.13
- pi-ai version: 0.57.1
- Model: claude-sonnet-4-6 via third-party relay provider
- System prompt size: 60-120KB depending on workspace files
- Session: 300+ messages, ~296k total tokens

### Related

- `docs/reference/prompt-caching.md` mentions "check for volatile system-prompt inputs" in troubleshooting but doesn't address the architectural issue
- The `createOpenRouterSystemCacheWrapper` in `proxy-stream-wrappers.ts` adds `cache_control` to system blocks but doesn't split static/dynamic content

