fix(anthropic): split system prompt into static/dynamic blocks for stable cache prefix #53225
coletebou wants to merge 2 commits
…able cache prefix

Move per-turn dynamic content (extraSystemPrompt, ## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context) stays cached across turns.

Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates the cache for all content after it. The current monolithic system prompt includes sections that change every turn (group context, runtime info, model capabilities), causing full cache re-writes of ~60-150k tokens on every API call instead of incremental ~200-500 token appends.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts between static and dynamic sections
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts

Measured impact on a real deployment (33 tenant multi-agent):
- Before: 44% cache miss rate, $0.36/message in cache writes alone
- After: static prefix stays cached, cache writes drop to incremental

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
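The split described in this commit message can be sketched as a pure function over system content blocks. This is a minimal illustration, not the PR's actual wrapper: the `PromptTextBlock` type, the `splitSystemAtBoundary` name, and the trimming of whitespace around the delimiter are assumptions made for the example. Only the delimiter string itself is taken from the review comments on this PR.

```typescript
// Hypothetical block shape for this sketch; the real pi-ai / SDK types may differ.
interface PromptTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Delimiter value as quoted in the review comments on this PR.
const CACHE_BOUNDARY_MARKER = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

/**
 * Split every text block that contains the delimiter into:
 *  - a static prefix block that keeps the original cache_control (if any)
 *  - a dynamic suffix block that never carries cache_control
 * Blocks without the delimiter pass through unchanged.
 */
function splitSystemAtBoundary(
  blocks: PromptTextBlock[],
  delimiter: string = CACHE_BOUNDARY_MARKER,
): PromptTextBlock[] {
  const out: PromptTextBlock[] = [];
  for (const block of blocks) {
    const idx = block.text.indexOf(delimiter);
    if (idx === -1) {
      out.push(block);
      continue;
    }
    // Assumption: whitespace next to the delimiter is not significant.
    const staticText = block.text.slice(0, idx).trimEnd();
    const dynamicText = block.text.slice(idx + delimiter.length).trimStart();
    // Skip empty halves rather than emitting empty text blocks.
    if (staticText.length > 0) {
      out.push({ ...block, text: staticText });
    }
    if (dynamicText.length > 0) {
      out.push({ type: "text", text: dynamicText });
    }
  }
  return out;
}
```

Given one cached block whose text contains the delimiter, the function returns two blocks: the first keeps `cache_control`, the second has none, so per-turn edits to the second block leave the bytes of the first untouched.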
…able cache prefix

Move per-turn dynamic content (## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context, heartbeats) stays cached across turns.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts right before ## Runtime (the only truly dynamic section)
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts, gated on cacheRetention being enabled
- Strip delimiter harmlessly when caching is not enabled (string path)

v2, addresses review feedback from openclaw#53203:
- Fix isAnthropicBedrockModel arg (was passing provider, now modelId)
- Move boundary after project context/heartbeats (before ## Runtime)
- Guard wrapper on cacheRetention !== "none" to avoid silent cache enables
- Fix oxfmt formatting

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
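The gating described in the v2 notes can be sketched as a small predicate. The option shape and the `looksLikeBedrockAnthropicModel` stand-in are hypothetical; the repository's `isAnthropicBedrockModel(modelId)` and its cache-retention resolution may behave differently.

```typescript
// Hypothetical inputs for this sketch; not the repository's actual types.
interface SplitGateOptions {
  provider: string;
  modelId: string;
  cacheRetention: string | undefined;
}

// Stand-in for isAnthropicBedrockModel(modelId). Assumption: Bedrock Anthropic
// model IDs contain "anthropic." (for example "anthropic.claude-...").
function looksLikeBedrockAnthropicModel(modelId: string): boolean {
  return modelId.toLowerCase().includes("anthropic.");
}

/**
 * Decide whether the wrapper should split the system prompt into
 * cached/uncached blocks, or only strip the delimiter.
 */
function shouldSplitAndCache(opts: SplitGateOptions): boolean {
  // Sessions with no retention setting, or with "none", get strip-only behavior.
  const cachingEnabled =
    Boolean(opts.cacheRetention) && opts.cacheRetention !== "none";
  // Note the Bedrock check receives the model ID, not the provider name.
  const isAnthropicTransport =
    opts.provider === "anthropic" ||
    looksLikeBedrockAnthropicModel(opts.modelId);
  return cachingEnabled && isAnthropicTransport;
}
```

Keeping the decision in one predicate makes the "no silent cache enables" guarantee easy to test in isolation: any input with `cacheRetention` unset or `"none"` must return `false`, whatever the provider.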
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 72094e2740
// per-turn (group context, runtime info). The delimiter is stripped or used to
// split system content blocks at the transport layer — see
// createAnthropicSystemPromptCacheSplitWrapper.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
Avoid emitting the cache boundary in shared system prompts
buildAgentSystemPrompt() is used outside the embedded Anthropic path, but the new marker is inserted unconditionally here. The only stripper lives in src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts, so other transports now send the literal <!-- OPENCLAW_CACHE_BOUNDARY --> to models — for example src/agents/openai-ws-stream.ts:872-878 forwards context.systemPrompt as instructions, and src/agents/cli-runner.ts:155-168 plus src/agents/cli-runner/helpers.ts:356-362 pass the prompt straight to external CLIs. That changes prompts for every CLI/OpenAI-WS session even though the boundary was supposed to be transport-only.
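One way to address this concern would be a transport-agnostic strip step, applied wherever the prompt leaves the Anthropic path. The helper below is a hypothetical sketch, not code from this PR; the `stripCacheBoundary` name and signature are assumptions.

```typescript
/**
 * Remove every occurrence of the cache-boundary delimiter from a prompt.
 * Hypothetical helper: a non-Anthropic transport (CLI runner, OpenAI WS, ...)
 * could call it before forwarding the system prompt.
 */
function stripCacheBoundary(prompt: string, delimiter: string): string {
  // Fast path: prompts without the marker are returned unchanged.
  if (delimiter.length === 0 || !prompt.includes(delimiter)) {
    return prompt;
  }
  // split/join removes all occurrences without any regex-escaping concerns.
  return prompt.split(delimiter).join("");
}
```

Because the marker is pushed as its own line, removing it leaves a blank line between the two halves and otherwise preserves the prompt byte for byte.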
// per-turn (group context, runtime info). The delimiter is stripped or used to
// split system content blocks at the transport layer — see
// createAnthropicSystemPromptCacheSplitWrapper.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
Move the cache boundary below the injected Project Context
Placing the delimiter here leaves # Project Context, ## Silent Replies, and ## Heartbeats on the uncached side (src/agents/system-prompt.ts:622-682). In sessions that load injected files such as AGENTS.md/SOUL.md, that project-context block is often the largest stable portion of the prompt, so any per-turn change in extraSystemPrompt still forces Anthropic to rewrite most of the expensive tokens. That materially undercuts the cache-write savings this change is trying to achieve.
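The cost of a misplaced boundary can be illustrated with a character-level comparison of two consecutive turns. This is only an approximation, since the real cache operates on tokens and content blocks, and every section body below is made up for the example.

```typescript
/** Length of the longest common prefix of two strings, in UTF-16 code units. */
function commonPrefixLength(a: string, b: string): number {
  const max = Math.min(a.length, b.length);
  let i = 0;
  while (i < max && a.charCodeAt(i) === b.charCodeAt(i)) {
    i++;
  }
  return i;
}

// Hypothetical section contents, sized only to make the effect visible.
const toolingSection = "## Tooling\n" + "t".repeat(200);
const projectContextSection = "# Project Context\n" + "p".repeat(1000);
const extraPromptTurn1 = "## Group Context\nsender=alice";
const extraPromptTurn2 = "## Group Context\nsender=bob";

// Ordering A: per-turn content sits before the large stable block.
const earlyTurn1 = [toolingSection, extraPromptTurn1, projectContextSection].join("\n");
const earlyTurn2 = [toolingSection, extraPromptTurn2, projectContextSection].join("\n");

// Ordering B: every stable section precedes the per-turn content.
const lateTurn1 = [toolingSection, projectContextSection, extraPromptTurn1].join("\n");
const lateTurn2 = [toolingSection, projectContextSection, extraPromptTurn2].join("\n");

// Characters that stay byte-identical from one turn to the next.
const reusableWithEarlyDynamic = commonPrefixLength(earlyTurn1, earlyTurn2);
const reusableWithLateDynamic = commonPrefixLength(lateTurn1, lateTurn2);
```

In ordering A the shared prefix ends inside the per-turn section, so the whole project-context block falls outside it. In ordering B the shared prefix covers the project context as well, which is the situation this review comment asks for.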
Greptile Summary
This PR addresses a real production cost problem: a 44% cache miss rate on Anthropic calls caused by dynamic content (…
Key observations:
Confidence Score: 4/5
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/system-prompt.ts
Line: 582-587
Comment:
**Inaccurate comment — project context and heartbeats are below the boundary**
The comment says "everything above is stable per-session (tools, skills, memory, safety, project context, heartbeats)" — but those last two are incorrect. Inspecting the imperative `lines.push` calls that follow the boundary:
- `# Project Context` (file contents) is pushed at lines ~627–643, **after** the boundary
- `## Silent Replies` is pushed at line ~649, **after** the boundary
- `## Heartbeats` is pushed at line ~667, **after** the boundary
Only `## Runtime` is genuinely dynamic; the others (`contextFiles`, Silent Replies, Heartbeats) are per-session static content that ends up in the uncached dynamic block. This is consistent with the pre-PR ordering (all three were already after `extraSystemPrompt`), so it's not a regression, but the comment overstates what is cached and will mislead future developers.
```suggestion
// --- Cache boundary: everything above is stable per-session (tooling, skills,
// memory, safety, workspace/messaging sections). Everything below changes
// per-turn (group context, runtime info) or per-session but was already
// ordered after extraSystemPrompt (project context, silent replies, heartbeats).
// The delimiter is stripped or used to split system content blocks at the
// transport layer — see createAnthropicSystemPromptCacheSplitWrapper.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts
Line: 362-363
Comment:
**Silent no-op when `splitAndCache=true` but system is a plain string**
When `splitAndCache` is `true` (Anthropic + caching enabled) but `system` arrives as a plain `string` rather than an array, the code falls through to this branch and only strips the delimiter with `replace`. No content-block split occurs and the caching optimisation silently doesn't fire.
In practice pi-ai almost certainly sends an array when prompt caching is active, so this is unlikely to matter today. But a future change or a different code path could send a string, and the silent degradation would be hard to diagnose. A small guard comment (or a dev-mode `log.warn`) would make this explicit:
```suggestion
} else if (typeof system === "string" && system.includes(delimiter)) {
// splitAndCache=true but system is a plain string (no cache_control blocks),
// so we can't split into discrete content blocks — just strip the delimiter.
// This normally doesn't happen when pi-ai has prompt caching enabled.
payloadObj.system = system.replace(delimiter, "\n");
```
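A dev-mode warning along the lines suggested above might look like the following sketch. The payload shape, the `warn` callback, and the function name are assumptions for illustration; the actual wrapper and logger in the repository may differ.

```typescript
// Hypothetical shapes for this sketch only.
interface SketchPayload {
  system?: unknown;
}
type WarnFn = (message: string) => void;

/**
 * Handle the plain-string system prompt path: strip the delimiter, and make
 * the missed optimisation visible when splitting was requested.
 */
function stripDelimiterFromStringSystem(
  payload: SketchPayload,
  delimiter: string,
  splitAndCache: boolean,
  warn: WarnFn,
): void {
  const system = payload.system;
  if (typeof system !== "string" || !system.includes(delimiter)) {
    return; // Array payloads and unmarked prompts are handled elsewhere.
  }
  if (splitAndCache) {
    // A string carries no cache_control blocks, so there is nothing to split.
    warn("system prompt is a plain string; stripping cache boundary without splitting");
  }
  payload.system = system.replace(delimiter, "\n");
}
```

Injecting `warn` as a parameter keeps the sketch independent of any particular logger and lets a test assert that the degraded path was reported.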
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "fix(anthropic): split system prompt into..."
Superseded by #59054. I refreshed this fix on top of current main, kept the Anthropic/Bedrock-only scope, added the changelog entry, and revalidated the touched surface (pnpm test -- src/agents/pi-embedded-runner/system-prompt-cache-boundary.test.ts, pnpm check). Closing this stale branch to keep the merge path single-threaded.
Closed in favor of #59054.
Summary
Split the monolithic system prompt into two Anthropic API content blocks — a static prefix (cached) and a dynamic suffix (uncached) — so the static prefix stays cached across turns instead of being re-written on every API call.
Problem
`buildAgentSystemPrompt()` produces a single string containing both static sections (tools, skills, memory, safety rules, project context, heartbeats) and dynamic sections (`extraSystemPrompt` with per-message group context, `## Runtime` with model/capabilities). Anthropic's prompt cache is prefix-based: any byte change invalidates everything after it.
Measured on a real multi-tenant deployment (33 agents, 930 calls today):
Fix
Three files, 110 lines added, 0 removed:
- `src/agents/system-prompt.ts`: Export `SYSTEM_PROMPT_CACHE_BOUNDARY` delimiter constant. Insert it right before `extraSystemPrompt`, after all static per-session sections (tools, skills, memory, safety, project context, heartbeats, silent replies).
- `src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts`: Add `createAnthropicSystemPromptCacheSplitWrapper(baseStreamFn, delimiter, splitAndCache)`:
  - `splitAndCache=true`: splits system content at the delimiter into two blocks; the static prefix keeps `cache_control` from pi-ai, the dynamic suffix gets none.
  - `splitAndCache=false`: strips the delimiter from the prompt so it never leaks to the model.
- `src/agents/pi-embedded-runner/extra-params.ts`: Always apply the wrapper (to strip the delimiter), with `splitAndCache=true` only when `resolveCacheRetention()` returns a truthy non-`"none"` value AND the provider is `anthropic` or Bedrock Anthropic.
Design decisions
- HTML comment delimiter (`<!-- OPENCLAW_CACHE_BOUNDARY -->`): Invisible to models, no interface change to `buildAgentSystemPrompt`'s return type. Follows the same `onPayload` wrapper pattern as `createOpenRouterSystemCacheWrapper`.
- Boundary placed before `extraSystemPrompt` (not before `## Runtime`). In group chats, `extraSystemPrompt` contains per-message sender metadata and group context that changes every turn. Placing the boundary before it keeps tools, skills, memory, project context, and heartbeats in the cached prefix.
- The wrapper is always applied; the `splitAndCache` flag only controls whether it also splits into separate content blocks with differential `cache_control`. This prevents the delimiter from leaking to the model on any provider or cache configuration.
- Splitting is gated on `resolveCacheRetention()` returning a truthy non-`"none"` value. Sessions that opted out of caching via `cacheRetention: "none"` get strip-only behavior, so no silent cache writes are introduced.
- Uses `isAnthropicBedrockModel(modelId)` (not `provider`) to correctly detect Bedrock Anthropic models, matching the existing usage pattern at line 299.
Scope
Security impact
None — no new permissions, secrets, network calls, or execution surface changes.
Test plan
- `cacheRead` dominates after first turn
- `extraSystemPrompt`: verify static prefix stays cached
- `/model` switch mid-session: verify only dynamic block re-caches
- `cacheRetention: "none"`: verify delimiter stripped, no cache writes
- `createOpenRouterSystemCacheWrapper` unaffected
- (`promptMode: "minimal"`): verify wrapper is no-op when delimiter absent
Related
- `message_id` - merged)
AI-assisted
Developed with Claude Code. Cache analysis performed on real production data from a 33-tenant OpenClaw deployment.