fix(anthropic): split system prompt into static/dynamic blocks for stable cache prefix #53203
coletebou wants to merge 1 commit into
…able cache prefix

Move per-turn dynamic content (extraSystemPrompt, ## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context) stays cached across turns.

Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates the cache for all content after it. The current monolithic system prompt includes sections that change every turn (group context, runtime info, model capabilities), causing full cache re-writes of ~60-150k tokens on every API call instead of incremental ~200-500 token appends.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts between static and dynamic sections
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts

Measured impact on a real deployment (33 tenant multi-agent):
- Before: 44% cache miss rate, $0.36/message in cache writes alone
- After: static prefix stays cached, cache writes drop to incremental

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
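The split described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual wrapper: the delimiter value is taken from the PR description, while `splitSystemPrompt` and the `SystemTextBlock` type are hypothetical names introduced here.

```typescript
// Assumption: delimiter value as quoted in the PR description.
const CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

// Hypothetical shape of an Anthropic system text block.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: returns the prompt unchanged when no delimiter is
// present, otherwise a cached static block followed by an uncached dynamic block.
function splitSystemPrompt(system: string): string | SystemTextBlock[] {
  const index = system.indexOf(CACHE_BOUNDARY);
  if (index === -1) {
    return system; // no delimiter: pass through untouched
  }

  const staticPart = system.slice(0, index).trimEnd();
  const dynamicPart = system.slice(index + CACHE_BOUNDARY.length).trimStart();

  const blocks: SystemTextBlock[] = [];
  if (staticPart) {
    // Only the static prefix carries cache_control.
    blocks.push({ type: "text", text: staticPart, cache_control: { type: "ephemeral" } });
  }
  if (dynamicPart) {
    // The dynamic suffix is sent without a cache marker.
    blocks.push({ type: "text", text: dynamicPart });
  }
  return blocks;
}
```

Because the dynamic block comes after the cached block, changes to it leave the bytes of the static prefix untouched.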
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f8b57136b
payloadObj.system = [
  ...(staticPart
    ? [{ type: "text", text: staticPart, cache_control: { type: "ephemeral" } }]
    : []),
Respect disabled Anthropic caching in the string split path
When applyExtraParamsToAgent() installs this wrapper for Anthropic, requests that did not opt into cacheRetention still reach this branch with payload.system as a plain string. Re-emitting the static prefix with cache_control: { type: "ephemeral" } silently turns prompt caching on anyway, so sessions that previously had no cache writes — or Bedrock calls explicitly configured with cacheRetention: "none" when pi-ai serializes back to a string — now start paying cache-write cost instead of preserving the old behavior.
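One way to address this, sketched under assumptions: gate the block split on whether caching was requested, and otherwise only strip the marker so the payload stays a plain string. The `cachingEnabled` flag and the `prepareSystem` name are hypothetical stand-ins for however the real code reads `cacheRetention`.

```typescript
// Assumption: delimiter value as quoted in the PR description.
const BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: `cachingEnabled` stands in for "the request opted into caching".
function prepareSystem(system: string, cachingEnabled: boolean): string | SystemBlock[] {
  const index = system.indexOf(BOUNDARY);
  if (index === -1) {
    return system; // nothing to split or strip
  }

  const staticPart = system.slice(0, index).trimEnd();
  const dynamicPart = system.slice(index + BOUNDARY.length).trimStart();

  if (!cachingEnabled) {
    // Caching is off: remove the marker but keep a plain string,
    // so no cache_control is ever introduced.
    return [staticPart, dynamicPart].filter((part) => part.length > 0).join("\n\n");
  }

  const blocks: SystemBlock[] = [];
  if (staticPart) {
    blocks.push({ type: "text", text: staticPart, cache_control: { type: "ephemeral" } });
  }
  if (dynamicPart) {
    blocks.push({ type: "text", text: dynamicPart });
  }
  return blocks;
}
```

With this shape, sessions that never opted into caching see the same prompt text as before, minus the delimiter.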
// --- Cache boundary: everything above is static per-session; everything below
// may change per-turn (group context, runtime info). Providers with prefix-based
// caching (Anthropic) split here so the static prefix stays cached.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
Move the cache boundary below the injected project context
Placing the split marker here makes everything after it uncached, including # Project Context and the injected file bodies added later in buildAgentSystemPrompt(). In sessions that load large repo context files, those tokens will still be rewritten on every turn even when only extraSystemPrompt or runtime metadata changes, so the expensive part of the prompt never benefits from the new stable cache prefix.
Greptile Summary
This PR splits the monolithic Anthropic system prompt into a cached static prefix and an uncached dynamic suffix using an HTML comment delimiter, following the same …
Two issues were found:
Confidence Score: 3/5
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/extra-params.ts
Line: 253
Comment:
**Wrong argument passed to `isAnthropicBedrockModel`**
`isAnthropicBedrockModel` takes a single `modelId: string` parameter, but the new code passes `provider` as the first argument. The function checks whether the string contains `"anthropic.claude"` or `"anthropic/claude"` — `"amazon-bedrock"` never matches either pattern, so **the wrapper is silently never applied to Bedrock Anthropic models**, exactly contradicting the PR description's stated Bedrock support goal.
Compare the existing correct usage at line 299:
```typescript
if (provider === "amazon-bedrock" && !isAnthropicBedrockModel(modelId)) {
```
Fix:
```suggestion
if (provider === "anthropic" || (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))) {
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/system-prompt.ts
Line: 585-588
Comment:
**`# Project Context` falls in the uncached dynamic block**
The cache boundary is inserted at line 585, but the actual `contextFiles` content (`# Project Context`) is built and pushed significantly later — after `extraSystemPrompt`, `reactionGuidance`, and `reasoningHint`. This contradicts the PR description, which explicitly lists "project context" as one of the **static** sections that should stay cached.
For agents with large context file sets (the prompt sizes mentioned are 60–150 k tokens), this means those tokens are paid at full input-token price every turn rather than being served from cache. The sections placed after the boundary that are not truly per-turn dynamic are:
- `# Project Context` / `contextFiles` — session-level, rarely changes
- `## Reactions` — session-level guidance
- `## Reasoning Format` — session-level hint
- `## Silent Replies` — fully static text
- `## Heartbeats` — changes only when heartbeat config changes
Only `## Group Chat Context` (`extraSystemPrompt`) and `## Runtime` (contains model name / capabilities) genuinely change per-turn. Consider moving the boundary to just before `extraSystemPrompt` is pushed (i.e. after the last truly static section), or create a second boundary so that session-level content can also be cached in its own block.
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(anthropic): split system prompt into..."
// Split the system prompt into static (cached) and dynamic (uncached) blocks
// for Anthropic providers. This preserves cache hits across turns by keeping
// per-turn dynamic content (group context, runtime info) out of the cached prefix.
if (provider === "anthropic" || isAnthropicBedrockModel(provider, modelId)) {
Wrong argument passed to `isAnthropicBedrockModel`
isAnthropicBedrockModel takes a single modelId: string parameter, but the new code passes provider as the first argument. The function checks whether the string contains "anthropic.claude" or "anthropic/claude" — "amazon-bedrock" never matches either pattern, so the wrapper is silently never applied to Bedrock Anthropic models, exactly contradicting the PR description's stated Bedrock support goal.
Compare the existing correct usage at line 299:
`if (provider === "amazon-bedrock" && !isAnthropicBedrockModel(modelId)) {`
Fix:
`- if (provider === "anthropic" || isAnthropicBedrockModel(provider, modelId)) {`
`+ if (provider === "anthropic" || (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))) {`
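To make the failure mode concrete, here is a small sketch of the gate. The body of `isAnthropicBedrockModel` is an assumption reconstructed from the review comment's description (substring checks for `"anthropic.claude"` or `"anthropic/claude"`), `shouldSplitSystemPrompt` is a hypothetical name, and the model ID used in the usage note is made up for illustration.

```typescript
// Assumed behaviour, based only on the review comment's description of the helper.
function isAnthropicBedrockModel(modelId: string): boolean {
  return modelId.includes("anthropic.claude") || modelId.includes("anthropic/claude");
}

// Hypothetical gate mirroring the suggested fix: direct Anthropic, or Bedrock
// serving an Anthropic model.
function shouldSplitSystemPrompt(provider: string, modelId: string): boolean {
  return (
    provider === "anthropic" ||
    (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))
  );
}
```

Under these assumptions, `isAnthropicBedrockModel("amazon-bedrock")` is `false`, which is why passing the provider string in the model-ID position never enables the wrapper, while `shouldSplitSystemPrompt("amazon-bedrock", "anthropic.claude-example")` is `true`.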
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);

if (extraSystemPrompt) {
  // Use "Subagent Context" header for minimal mode (subagents), otherwise "Group Chat Context"
`# Project Context` falls in the uncached dynamic block
The cache boundary is inserted at line 585, but the actual contextFiles content (# Project Context) is built and pushed significantly later — after extraSystemPrompt, reactionGuidance, and reasoningHint. This contradicts the PR description, which explicitly lists "project context" as one of the static sections that should stay cached.
For agents with large context file sets (the prompt sizes mentioned are 60–150 k tokens), this means those tokens are paid at full input-token price every turn rather than being served from cache. The sections placed after the boundary that are not truly per-turn dynamic are:
- `# Project Context` / `contextFiles`: session-level, rarely changes
- `## Reactions`: session-level guidance
- `## Reasoning Format`: session-level hint
- `## Silent Replies`: fully static text
- `## Heartbeats`: changes only when heartbeat config changes
Only ## Group Chat Context (extraSystemPrompt) and ## Runtime (contains model name / capabilities) genuinely change per-turn. Consider moving the boundary to just before extraSystemPrompt is pushed (i.e. after the last truly static section), or create a second boundary so that session-level content can also be cached in its own block.
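The suggested ordering can be illustrated with a simplified prompt builder. This is only a sketch: `buildPromptWithBoundary`, `cachedPrefixOf`, and the three-bucket `PromptInputs` type are hypothetical, and the real `buildAgentSystemPrompt()` is considerably more involved.

```typescript
// Assumption: delimiter value as quoted in the PR description.
const PROMPT_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

// Hypothetical grouping of prompt sections by how often they change.
interface PromptInputs {
  staticSections: string[]; // e.g. tools, skills, safety rules
  sessionSections: string[]; // e.g. project context, reactions, heartbeats
  perTurnSections: string[]; // e.g. group chat context, runtime info
}

// Push everything that is stable within a session first, then the boundary,
// then the sections that genuinely change per turn.
function buildPromptWithBoundary(inputs: PromptInputs): string {
  const lines: string[] = [];
  lines.push(...inputs.staticSections);
  lines.push(...inputs.sessionSections);
  lines.push(PROMPT_BOUNDARY);
  lines.push(...inputs.perTurnSections);
  return lines.join("\n");
}

// The part of the prompt that would end up in the cached block.
function cachedPrefixOf(prompt: string): string {
  const index = prompt.indexOf(PROMPT_BOUNDARY);
  return index === -1 ? prompt : prompt.slice(0, index);
}
```

With this ordering, two turns that differ only in their per-turn sections produce an identical cached prefix that still includes the project context.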
Closing in favor of a v2 with review feedback addressed (Bedrock arg fix, cache boundary placement, cacheRetention guard).
…able cache prefix

Move per-turn dynamic content (## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context, heartbeats) stays cached across turns.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts right before ## Runtime (the only truly dynamic section)
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts, gated on cacheRetention being enabled
- Strip delimiter harmlessly when caching is not enabled (string path)

v2, addresses review feedback from openclaw#53203:
- Fix isAnthropicBedrockModel arg (was passing provider, now modelId)
- Move boundary after project context/heartbeats (before ## Runtime)
- Guard wrapper on cacheRetention !== "none" to avoid silent cache enables
- Fix oxfmt formatting

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
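When the system content has already been serialized into an array of text blocks, "preserving cache_control only on the static prefix" could look roughly like the sketch below. The `TextBlock` type and `splitSystemBlocks` are hypothetical names, and the real wrapper may handle more cases.

```typescript
// Assumption: delimiter value as quoted in the PR description.
const BLOCK_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

// Hypothetical shape of a system text block in the outgoing payload.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: string };
}

// Split any block containing the delimiter into a static block, which keeps
// whatever cache_control the original block had, and a dynamic block without one.
function splitSystemBlocks(blocks: TextBlock[]): TextBlock[] {
  const result: TextBlock[] = [];
  for (const block of blocks) {
    const index = block.text.indexOf(BLOCK_BOUNDARY);
    if (index === -1) {
      result.push(block); // untouched
      continue;
    }
    const staticText = block.text.slice(0, index).trimEnd();
    const dynamicText = block.text.slice(index + BLOCK_BOUNDARY.length).trimStart();
    if (staticText) {
      // Copy the original block so an existing cache_control is carried over as-is.
      result.push({ ...block, text: staticText });
    }
    if (dynamicText) {
      result.push({ type: "text", text: dynamicText });
    }
  }
  return result;
}
```

Since the static block only inherits a `cache_control` that was already present, a payload built with caching disabled stays uncached after the split.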
Summary
Split the monolithic system prompt into two Anthropic API content blocks — a static prefix (cached) and a dynamic suffix (uncached) — so the static prefix stays cached across turns instead of being re-written on every API call.
Problem
`buildAgentSystemPrompt()` produces a single string containing both static sections (tools, skills, memory, safety rules, project context) and dynamic sections (group context via `extraSystemPrompt`, `## Runtime` info). Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates everything after it.

Since `extraSystemPrompt` contains per-message metadata and `## Runtime` contains model/capabilities that change on `/model` switches, the cache prefix breaks on most turns. Measured on a real multi-tenant deployment:

Over 930 API calls on a single agent today, this caused $74.30 in unnecessary cache write costs, 73% of the agent's total spend.
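The prefix-based behaviour can be illustrated with a toy model that compares two consecutive prompts character by character. This is a simplification for intuition only: the real cache works on tokens and cache breakpoints rather than characters, and `sharedPrefixLength` is a hypothetical helper.

```typescript
// Toy model: the reusable part of a prompt is the prefix it shares with the
// previous turn's prompt. Everything after the first difference is "new".
function sharedPrefixLength(previous: string, current: string): number {
  const limit = Math.min(previous.length, current.length);
  let index = 0;
  while (index < limit && previous.charCodeAt(index) === current.charCodeAt(index)) {
    index++;
  }
  return index;
}

// Stand-in for a large static section (tools, skills, project context, ...).
const staticBody = "S".repeat(1000);

// Dynamic content placed before the static body: the shared prefix ends
// almost immediately, so the static body cannot be reused.
const earlyShared = sharedPrefixLength(
  "runtime: model-a\n" + staticBody,
  "runtime: model-b\n" + staticBody,
);

// Dynamic content placed after the static body: the whole static body is
// part of the shared prefix.
const lateShared = sharedPrefixLength(
  staticBody + "\nruntime: model-a",
  staticBody + "\nruntime: model-b",
);
```

In this toy setup `earlyShared` stays far below the size of the static body, while `lateShared` covers all of it, which is the effect the fix aims for by moving per-turn content behind the boundary.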
Root cause
The Anthropic Messages API wraps the system prompt in a single `cache_control: { type: "ephemeral" }` content block. When `extraSystemPrompt` (group context, sender metadata) or `## Runtime` (model name, capabilities) changes between turns, the entire ~60-150k token system prompt is re-cached from scratch instead of incrementally appending ~200-500 new tokens.
Fix
Three small changes across 3 files (106 lines added, 0 removed):
- `src/agents/system-prompt.ts`: Export a `SYSTEM_PROMPT_CACHE_BOUNDARY` delimiter constant. Insert it between the last static section and the first dynamic section (`extraSystemPrompt`, `## Runtime`).
- `src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts`: Add `createAnthropicSystemPromptCacheSplitWrapper()`, an `onPayload` stream wrapper that splits the system content at the delimiter into two blocks. The static prefix keeps its `cache_control: { type: "ephemeral" }` from pi-ai. The dynamic suffix gets no `cache_control`, so changes to it don't invalidate the prefix.
- `src/agents/pi-embedded-runner/extra-params.ts`: Wire the new wrapper for `anthropic` and `amazon-bedrock` providers in `applyExtraParamsToAgent()`.
Design decisions
- … `<!-- OPENCLAW_CACHE_BOUNDARY -->` that's invisible to models. This avoids changing `buildAgentSystemPrompt`'s return type (which would be a breaking interface change for context engines and plugins). The delimiter is stripped at the transport layer.
- `onPayload` wrapper (vs. pi-ai changes): Follows the same pattern as `createOpenRouterSystemCacheWrapper` in `proxy-stream-wrappers.ts`. No changes needed to the pi-ai library or `AgentSession` interface.
- … `promptMode: "none"`, subagents, or non-Anthropic providers), the wrapper is a no-op: the system prompt passes through unchanged.
- … `amazon-bedrock` provider as well, since Bedrock Anthropic models use the same prefix-based caching.
Scope
Security impact
Test plan
- … `cacheRead` dominates after the first turn
- … `/model sonnet`): verify only the dynamic block re-caches, not the full prefix
- … `extraSystemPrompt`: verify static prefix stays cached across different group messages
- … `createOpenRouterSystemCacheWrapper` still works (this PR doesn't touch it)
Related issues
- … `message_id` out of system prompt, merged)
AI-assisted
This PR was developed with AI assistance (Claude Code) based on analysis of real production cache miss data from a 33-tenant OpenClaw deployment.