fix(anthropic): split system prompt into static/dynamic blocks for stable cache prefix by coletebou · Pull Request #53203 · openclaw/openclaw

coletebou · 2026-03-23T22:36:17Z

Summary

Split the monolithic system prompt into two Anthropic API content blocks — a static prefix (cached) and a dynamic suffix (uncached) — so the static prefix stays cached across turns instead of being re-written on every API call.

Problem

buildAgentSystemPrompt() produces a single string containing both static sections (tools, skills, memory, safety rules, project context) and dynamic sections (group context via extraSystemPrompt, ## Runtime info). Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates everything after it.

Since extraSystemPrompt contains per-message metadata and ## Runtime contains model/capabilities that change on /model switches, the cache prefix breaks on most turns. Measured on a real multi-tenant deployment:

| Metric | Before | After (expected) |
| --- | --- | --- |
| Cache miss rate | 44% (16/36 calls) | <5% |
| Avg cache write on miss | 63,333 tokens | <1,000 tokens |
| Cost per message (cache writes) | $0.36 | ~$0.003 |

Over 930 API calls on a single agent today, this caused $74.30 in unnecessary cache write costs — 73% of the agent's total spend.

Root cause

The system prompt reaches the Anthropic Messages API as a single content block carrying cache_control: { type: "ephemeral" }. When extraSystemPrompt (group context, sender metadata) or ## Runtime (model name, capabilities) changes between turns, the entire ~60-150k token system prompt is re-cached from scratch, even though only ~200-500 tokens actually changed.

Turn 1: [static 50k | dynamic A 10k] → cache write 60k ✓
Turn 2: [static 50k | dynamic B 10k] → cache MISS, re-write 60k ✗ (dynamic changed)
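
The miss in Turn 2 can be reproduced with a toy model of prefix caching. This is a sketch under a simplifying assumption (a cache entry is keyed by the exact text up to each cache breakpoint); it is not Anthropic's actual implementation, and the names `Block`, `cacheKeys`, and `hitsCache` are made up for illustration.

```typescript
// Toy model only: real provider caches are more involved than this.
type Block = { text: string; cacheBreakpoint?: boolean };

// A cache entry is keyed by the exact text up to and including each breakpoint block.
function cacheKeys(blocks: Block[]): string[] {
  const keys: string[] = [];
  let prefix = "";
  for (const block of blocks) {
    prefix += block.text;
    if (block.cacheBreakpoint) keys.push(prefix);
  }
  return keys;
}

// A turn hits the cache if any of its breakpoint prefixes was written by the previous turn.
function hitsCache(previous: Block[], next: Block[]): boolean {
  const written = new Set(cacheKeys(previous));
  return cacheKeys(next).some((key) => written.has(key));
}

const STATIC = "tools, skills, memory, safety rules";

// Monolithic prompt: the dynamic tail sits inside the only cached block.
const monolithicHit = hitsCache(
  [{ text: STATIC + " | runtime A", cacheBreakpoint: true }],
  [{ text: STATIC + " | runtime B", cacheBreakpoint: true }],
);

// Split prompt: the breakpoint is on the static block and the dynamic block follows it.
const splitHit = hitsCache(
  [{ text: STATIC, cacheBreakpoint: true }, { text: " | runtime A" }],
  [{ text: STATIC, cacheBreakpoint: true }, { text: " | runtime B" }],
);

console.log({ monolithicHit, splitHit }); // { monolithicHit: false, splitHit: true }
```

In this model the monolithic layout misses because the only cached prefix includes the changed tail, while the split layout hits because the static block is byte-identical across turns.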

Fix

Three small changes across 3 files (106 lines added, 0 removed):

src/agents/system-prompt.ts: Export a SYSTEM_PROMPT_CACHE_BOUNDARY delimiter constant. Insert it between the last static section and the first dynamic section (extraSystemPrompt, ## Runtime).
src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts: Add createAnthropicSystemPromptCacheSplitWrapper() — an onPayload stream wrapper that splits the system content at the delimiter into two blocks. The static prefix keeps its cache_control: { type: "ephemeral" } from pi-ai. The dynamic suffix gets no cache_control, so changes to it don't invalidate the prefix.
src/agents/pi-embedded-runner/extra-params.ts: Wire the new wrapper for anthropic and amazon-bedrock providers in applyExtraParamsToAgent().

Turn 1: [static 50k ← cached | dynamic A 10k] → cache write 60k ✓
Turn 2: [static 50k ← cache HIT | dynamic B 10k] → cache read 50k, write 0.5k ✓
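
A minimal sketch of the split step described above, assuming the system prompt arrives as a plain string. The function name `splitSystemAtBoundary` is hypothetical, the marker is passed in as a parameter because its literal value is not shown here, and trimming whitespace around the marker is an assumption rather than something the PR states.

```typescript
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: split a system prompt string at the boundary marker.
function splitSystemAtBoundary(system: string, boundary: string): string | TextBlock[] {
  const index = system.indexOf(boundary);
  // No marker present: pass the prompt through unchanged.
  if (index === -1) return system;

  // Assumption: whitespace around the marker is not meaningful and can be trimmed.
  const staticPart = system.slice(0, index).trimEnd();
  const dynamicPart = system.slice(index + boundary.length).trimStart();

  const blocks: TextBlock[] = [];
  // Only the static prefix carries cache_control.
  if (staticPart) {
    blocks.push({ type: "text", text: staticPart, cache_control: { type: "ephemeral" } });
  }
  // The dynamic suffix is sent without cache_control, so it can change freely.
  if (dynamicPart) {
    blocks.push({ type: "text", text: dynamicPart });
  }
  return blocks;
}
```

Called with a prompt that contains the marker, this returns two text blocks; called without the marker, it returns the original string, which matches the no-op behavior described under graceful degradation.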

Design decisions

Delimiter approach (vs. structured return type): Uses an HTML comment marker that's invisible to models. This avoids changing buildAgentSystemPrompt's return type (which would be a breaking interface change for context engines and plugins). The delimiter is stripped at the transport layer.
onPayload wrapper (vs. pi-ai changes): Follows the same pattern as createOpenRouterSystemCacheWrapper in proxy-stream-wrappers.ts. No changes needed to the pi-ai library or AgentSession interface.
Graceful degradation: If the delimiter isn't present (e.g. promptMode: "none", subagents, or non-Anthropic providers), the wrapper is a no-op — the system prompt passes through unchanged.
Bedrock support: Applied to amazon-bedrock provider as well, since Bedrock Anthropic models use the same prefix-based caching.
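
The onPayload wrapper pattern mentioned above can be sketched roughly as follows. The `StreamOptions` and `StreamFn` types here are local stand-ins, and `withPayloadTransform` is a hypothetical name; the real pi-ai and OpenClaw signatures may differ.

```typescript
// Local stand-in types; the real pi-ai / OpenClaw signatures may differ.
type StreamOptions = { onPayload?: (payload: unknown) => void };
type StreamFn = (model: unknown, context: unknown, options?: StreamOptions) => unknown;

// Hypothetical helper: wrap a stream function so the outgoing payload can be
// rewritten just before it is sent, without touching the underlying library.
function withPayloadTransform(
  base: StreamFn,
  transform: (payload: Record<string, unknown>) => void,
): StreamFn {
  return (model, context, options) =>
    base(model, context, {
      ...options,
      onPayload: (payload) => {
        if (payload !== null && typeof payload === "object") {
          // Mutate in place, e.g. replace payload.system with split blocks.
          transform(payload as Record<string, unknown>);
        }
        // Preserve any hook the caller already supplied.
        options?.onPayload?.(payload);
      },
    });
}
```

Because the rewrite happens inside the hook, callers of the wrapped function keep the same signature, which is the property the design decision relies on.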

Scope

Security impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Test plan

Send consecutive messages in a DM session on Anthropic — verify cacheRead dominates after the first turn
Switch models mid-session (/model sonnet) — verify only the dynamic block re-caches, not the full prefix
Test group chat with extraSystemPrompt — verify static prefix stays cached across different group messages
Test subagent (minimal prompt mode) — verify wrapper is a no-op when delimiter is absent
Test non-Anthropic provider (OpenAI, Google) — verify no behavioral change
Test OpenRouter Anthropic models — verify existing createOpenRouterSystemCacheWrapper still works (this PR doesn't touch it)
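
For the first check in the plan, one way to quantify "cacheRead dominates" is to compute the cached share of prompt tokens from a response's usage data. This is a sketch: the field names follow the usage object of the Anthropic Messages API, while `cacheReadShare` and the idea of using a single ratio are assumptions, not part of the PR.

```typescript
// Field names follow the usage object returned by the Anthropic Messages API.
type Usage = {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
};

// Hypothetical helper: fraction of prompt tokens that were served from cache.
function cacheReadShare(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}

// Illustrative numbers only, not measurements from the deployment.
const secondTurn: Usage = {
  input_tokens: 500,
  cache_creation_input_tokens: 500,
  cache_read_input_tokens: 50_000,
};
console.log(cacheReadShare(secondTurn).toFixed(2)); // "0.98"
```

A share close to 1 after the first turn would indicate that the static prefix is being read from cache rather than rewritten.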

Related issues

AI-assisted

This PR was developed with AI assistance (Claude Code) based on analysis of real production cache miss data from a 33-tenant OpenClaw deployment.

…able cache prefix

Move per-turn dynamic content (extraSystemPrompt, ## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context) stays cached across turns.

Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates the cache for all content after it. The current monolithic system prompt includes sections that change every turn (group context, runtime info, model capabilities), causing full cache re-writes of ~60-150k tokens on every API call instead of incremental ~200-500 token appends.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts between static and dynamic sections
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts

Measured impact on a real deployment (33 tenant multi-agent):
- Before: 44% cache miss rate, $0.36/message in cache writes alone
- After: static prefix stays cached, cache writes drop to incremental

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f8b57136b


chatgpt-codex-connector · 2026-03-23T22:40:20Z

+            payloadObj.system = [
+              ...(staticPart
+                ? [{ type: "text", text: staticPart, cache_control: { type: "ephemeral" } }]
+                : []),


Respect disabled Anthropic caching in the string split path

When applyExtraParamsToAgent() installs this wrapper for Anthropic, requests that did not opt into cacheRetention still reach this branch with payload.system as a plain string. Re-emitting the static prefix with cache_control: { type: "ephemeral" } silently turns prompt caching on anyway, so sessions that previously had no cache writes — or Bedrock calls explicitly configured with cacheRetention: "none" when pi-ai serializes back to a string — now start paying cache-write cost instead of preserving the old behavior.
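
One way to address this, sketched under assumptions, is to make the string path depend on whether caching was requested: strip the marker in both cases, but only emit a cache_control block when caching is on. `rewriteSystemString` and the `cachingEnabled` flag are hypothetical; how the flag is derived from cacheRetention is left to the caller.

```typescript
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// `cachingEnabled` is a hypothetical flag standing in for whatever the caller
// derives from cacheRetention.
function rewriteSystemString(
  system: string,
  boundary: string,
  cachingEnabled: boolean,
): string | TextBlock[] {
  const index = system.indexOf(boundary);
  if (index === -1) return system;

  const staticPart = system.slice(0, index);
  const dynamicPart = system.slice(index + boundary.length);

  if (!cachingEnabled) {
    // Caching was not requested: drop the marker but keep a plain string,
    // so no cache_control is introduced behind the caller's back.
    return staticPart + dynamicPart;
  }

  return [
    { type: "text", text: staticPart, cache_control: { type: "ephemeral" } },
    { type: "text", text: dynamicPart },
  ];
}
```

With the flag off, the request keeps its previous shape (a single string without cache_control), so sessions that never opted into caching do not start paying for cache writes.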


chatgpt-codex-connector · 2026-03-23T22:40:20Z

+  // --- Cache boundary: everything above is static per-session; everything below
+  // may change per-turn (group context, runtime info). Providers with prefix-based
+  // caching (Anthropic) split here so the static prefix stays cached.
+  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);


Move the cache boundary below the injected project context

Placing the split marker here makes everything after it uncached, including # Project Context and the injected file bodies added later in buildAgentSystemPrompt(). In sessions that load large repo context files, those tokens will still be rewritten on every turn even when only extraSystemPrompt or runtime metadata changes, so the expensive part of the prompt never benefits from the new stable cache prefix.


greptile-apps · 2026-03-23T22:40:50Z

Greptile Summary

This PR splits the monolithic Anthropic system prompt into a cached static prefix and an uncached dynamic suffix using an HTML comment delimiter, following the same onPayload wrapper pattern already used for OpenRouter. The approach is sound and the core wrapper implementation in anthropic-stream-wrappers.ts is clean and consistent with existing conventions.

Two issues were found:

Bedrock support is broken (P1): isAnthropicBedrockModel(provider, modelId) at extra-params.ts:253 passes provider as the argument instead of modelId. Since "amazon-bedrock" does not contain "anthropic.claude" or "anthropic/claude", the wrapper is silently never applied for Bedrock Anthropic models. The fix mirrors the existing correct call pattern at line 299: provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId).
Cache boundary placement puts session-level content in the uncached block (P2): The boundary is inserted just before extraSystemPrompt, but several sections that don't change per-turn — # Project Context (potentially 60–150 k tokens), ## Reactions, ## Reasoning Format, ## Silent Replies, and ## Heartbeats — also end up in the dynamic (uncached) block. The PR description explicitly lists "project context" as a static section, so this appears to be an unintentional placement that could significantly increase input token costs for agents with large context files.

Confidence Score: 3/5

Not safe to merge as-is — the Bedrock support has a silent argument-order bug that must be fixed before this reaches production.
The wrapper logic itself is correct and the caching improvement for direct Anthropic is real. However, the isAnthropicBedrockModel(provider, modelId) argument-order mistake means the feature silently does nothing for Bedrock Anthropic models, contradicting an explicitly stated goal. This needs a one-line fix before the PR is merged. The boundary placement concern is lower priority but worth addressing to match the stated design intent.
src/agents/pi-embedded-runner/extra-params.ts (wrong argument on line 253) and src/agents/system-prompt.ts (boundary placement that leaves project context uncached).

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/extra-params.ts
Line: 253

Comment:
**Wrong argument passed to `isAnthropicBedrockModel`**

`isAnthropicBedrockModel` takes a single `modelId: string` parameter, but the new code passes `provider` as the first argument. The function checks whether the string contains `"anthropic.claude"` or `"anthropic/claude"` — `"amazon-bedrock"` never matches either pattern, so **the wrapper is silently never applied to Bedrock Anthropic models**, exactly contradicting the PR description's stated Bedrock support goal.

Compare the existing correct usage at line 299:
```typescript
if (provider === "amazon-bedrock" && !isAnthropicBedrockModel(modelId)) {
```

Fix:
```suggestion
  if (provider === "anthropic" || (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))) {
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/system-prompt.ts
Line: 585-588

Comment:
**`# Project Context` falls in the uncached dynamic block**

The cache boundary is inserted at line 585, but the actual `contextFiles` content (`# Project Context`) is built and pushed significantly later — after `extraSystemPrompt`, `reactionGuidance`, and `reasoningHint`. This contradicts the PR description, which explicitly lists "project context" as one of the **static** sections that should stay cached.

For agents with large context file sets (the prompt sizes mentioned are 60–150 k tokens), this means those tokens are paid at full input-token price every turn rather than being served from cache. The sections placed after the boundary that are not truly per-turn dynamic are:

- `# Project Context` / `contextFiles` — session-level, rarely changes
- `## Reactions` — session-level guidance
- `## Reasoning Format` — session-level hint
- `## Silent Replies` — fully static text
- `## Heartbeats` — changes only when heartbeat config changes

Only `## Group Chat Context` (`extraSystemPrompt`) and `## Runtime` (contains model name / capabilities) genuinely change per-turn. Consider moving the boundary to just before `extraSystemPrompt` is pushed (i.e. after the last truly static section), or create a second boundary so that session-level content can also be cached in its own block.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(anthropic): split system prompt into..."

greptile-apps · 2026-03-23T22:40:54Z

+  // Split the system prompt into static (cached) and dynamic (uncached) blocks
+  // for Anthropic providers. This preserves cache hits across turns by keeping
+  // per-turn dynamic content (group context, runtime info) out of the cached prefix.
+  if (provider === "anthropic" || isAnthropicBedrockModel(provider, modelId)) {


Wrong argument passed to isAnthropicBedrockModel

isAnthropicBedrockModel takes a single modelId: string parameter, but the new code passes provider as the first argument. The function checks whether the string contains "anthropic.claude" or "anthropic/claude" — "amazon-bedrock" never matches either pattern, so the wrapper is silently never applied to Bedrock Anthropic models, exactly contradicting the PR description's stated Bedrock support goal.

Compare the existing correct usage at line 299:

if (provider === "amazon-bedrock" && !isAnthropicBedrockModel(modelId)) {

Fix:

Suggested change

-  if (provider === "anthropic" || isAnthropicBedrockModel(provider, modelId)) {
+  if (provider === "anthropic" || (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))) {
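
The effect of the argument mix-up can be seen in a small standalone sketch. The body of `isAnthropicBedrockModel` below is reconstructed from the description in this comment (a substring check on the model ID) and may not match the real helper exactly; `shouldSplitSystemPrompt` is a hypothetical name, and the model IDs are examples chosen for illustration.

```typescript
// Reconstructed from the review's description: a substring check on the model ID.
function isAnthropicBedrockModel(modelId: string): boolean {
  return modelId.includes("anthropic.claude") || modelId.includes("anthropic/claude");
}

// Hypothetical helper mirroring the suggested condition.
function shouldSplitSystemPrompt(provider: string, modelId: string): boolean {
  return (
    provider === "anthropic" ||
    (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))
  );
}

// Example model IDs, chosen for illustration.
const bedrockClaude = "anthropic.claude-3-5-sonnet-20240620-v1:0";
const bedrockLlama = "meta.llama3-70b-instruct-v1:0";

// Passing the provider string where a model ID is expected never matches.
console.log(isAnthropicBedrockModel("amazon-bedrock")); // false
console.log(shouldSplitSystemPrompt("amazon-bedrock", bedrockClaude)); // true
console.log(shouldSplitSystemPrompt("amazon-bedrock", bedrockLlama)); // false
console.log(shouldSplitSystemPrompt("anthropic", "claude-sonnet")); // true
```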


greptile-apps · 2026-03-23T22:40:55Z

+  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
+
  if (extraSystemPrompt) {
    // Use "Subagent Context" header for minimal mode (subagents), otherwise "Group Chat Context"


# Project Context falls in the uncached dynamic block

The cache boundary is inserted at line 585, but the actual contextFiles content (# Project Context) is built and pushed significantly later — after extraSystemPrompt, reactionGuidance, and reasoningHint. This contradicts the PR description, which explicitly lists "project context" as one of the static sections that should stay cached.

For agents with large context file sets (the prompt sizes mentioned are 60–150 k tokens), this means those tokens are paid at full input-token price every turn rather than being served from cache. The sections placed after the boundary that are not truly per-turn dynamic are:

# Project Context / contextFiles — session-level, rarely changes

## Reactions — session-level guidance

## Reasoning Format — session-level hint

## Silent Replies — fully static text

## Heartbeats — changes only when heartbeat config changes

Only ## Group Chat Context (extraSystemPrompt) and ## Runtime (contains model name / capabilities) genuinely change per-turn. Consider moving the boundary to just before extraSystemPrompt is pushed (i.e. after the last truly static section), or create a second boundary so that session-level content can also be cached in its own block.
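
The placement concern can be illustrated with a sketch that tags each section by volatility and emits the boundary only after every session-level section. This is a different mechanism from the PR, which pushes lines in a fixed order; `Section`, `perTurn`, and `assemblePrompt` are hypothetical names, and the marker is a placeholder.

```typescript
type Section = { heading: string; body: string; perTurn: boolean };

// Hypothetical assembler: session-level sections first, then the boundary,
// then anything that may change between turns.
function assemblePrompt(sections: Section[], boundary: string): string {
  const render = (s: Section): string => `${s.heading}\n${s.body}`;
  const stable = sections.filter((s) => !s.perTurn).map(render);
  const volatile = sections.filter((s) => s.perTurn).map(render);
  return [...stable, boundary, ...volatile].join("\n\n");
}

const prompt = assemblePrompt(
  [
    { heading: "## Group Chat Context", body: "sender metadata", perTurn: true },
    { heading: "## Silent Replies", body: "static guidance", perTurn: false },
    { heading: "# Project Context", body: "large file bodies", perTurn: false },
    { heading: "## Runtime", body: "model and capabilities", perTurn: true },
  ],
  "[[BOUNDARY]]", // placeholder marker for illustration
);
console.log(prompt.indexOf("# Project Context") < prompt.indexOf("[[BOUNDARY]]")); // true
```

Whatever order the sections are declared in, the large project context ends up ahead of the marker and therefore inside the cached prefix.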


coletebou · 2026-03-23T22:55:55Z

Closing in favor of a v2 with review feedback addressed (Bedrock arg fix, cache boundary placement, cacheRetention guard).

…able cache prefix

Move per-turn dynamic content (## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context, heartbeats) stays cached across turns.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts right before ## Runtime (the only truly dynamic section)
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts, gated on cacheRetention being enabled
- Strip delimiter harmlessly when caching is not enabled (string path)

v2, addressing review feedback from openclaw#53203:
- Fix isAnthropicBedrockModel arg (was passing provider, now modelId)
- Move boundary after project context/heartbeats (before ## Runtime)
- Guard wrapper on cacheRetention !== "none" to avoid silent cache enables
- Fix oxfmt formatting

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232

openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 23, 2026

chatgpt-codex-connector Bot reviewed Mar 23, 2026


greptile-apps Bot reviewed Mar 23, 2026


coletebou closed this Mar 23, 2026

coletebou wants to merge 1 commit into openclaw:main from coletebou:fix/stable-system-prompt-cache
