fix(anthropic): split system prompt into static/dynamic blocks for stable cache prefix #53203

Closed
coletebou wants to merge 1 commit into openclaw:main from coletebou:fix/stable-system-prompt-cache

Conversation

@coletebou

Contributor

Summary

Split the monolithic system prompt into two Anthropic API content blocks — a static prefix (cached) and a dynamic suffix (uncached) — so the static prefix stays cached across turns instead of being re-written on every API call.

Problem

buildAgentSystemPrompt() produces a single string containing both static sections (tools, skills, memory, safety rules, project context) and dynamic sections (group context via extraSystemPrompt, ## Runtime info). Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates everything after it.

Since extraSystemPrompt contains per-message metadata and ## Runtime contains model/capabilities that change on /model switches, the cache prefix breaks on most turns. Measured on a real multi-tenant deployment:

| Metric | Before | After (expected) |
| --- | --- | --- |
| Cache miss rate | 44% (16/36 calls) | <5% |
| Avg cache write on miss | 63,333 tokens | <1,000 tokens |
| Cost per message (cache writes) | $0.36 | ~$0.003 |

Over 930 API calls on a single agent today, this caused $74.30 in unnecessary cache write costs — 73% of the agent's total spend.
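
For illustration, figures like the miss rate and average write size above could be derived from per-call usage records. The sketch below is not how the PR's numbers were produced; it only assumes the `usage` fields the Anthropic Messages API reports (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). The function name `summarizeCacheUsage` and the 1,000-token miss threshold are hypothetical.

```typescript
// Usage fields as reported on Anthropic Messages API responses.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

interface CacheStats {
  calls: number;
  misses: number;
  missRate: number;       // fraction of calls counted as misses
  avgWriteOnMiss: number; // mean cache-write tokens over miss calls
}

// ASSUMPTION: a call counts as a "miss" when it writes more than
// `missThreshold` tokens to the cache. The threshold is illustrative only.
function summarizeCacheUsage(records: Usage[], missThreshold = 1000): CacheStats {
  const misses = records.filter((u) => u.cache_creation_input_tokens > missThreshold);
  const written = misses.reduce((sum, u) => sum + u.cache_creation_input_tokens, 0);
  return {
    calls: records.length,
    misses: misses.length,
    missRate: records.length === 0 ? 0 : misses.length / records.length,
    avgWriteOnMiss: misses.length === 0 ? 0 : written / misses.length,
  };
}
```

With 36 records of which 16 exceed the threshold, `missRate` comes out at 16/36, which rounds to the 44% in the table.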

Root cause

pi-ai sends the system prompt to the Anthropic Messages API as a single content block marked cache_control: { type: "ephemeral" }. When extraSystemPrompt (group context, sender metadata) or ## Runtime (model name, capabilities) changes between turns, the entire ~60-150k token system prompt is re-cached from scratch, even though only ~200-500 tokens actually changed.

Turn 1: [static 50k | dynamic A 10k] → cache write 60k ✓
Turn 2: [static 50k | dynamic B 10k] → cache MISS, re-write 60k ✗ (dynamic changed)
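
The miss in turn 2 follows from the exact-prefix rule. As a rough illustration, here is a toy model of a prefix cache. It is a deliberate simplification (characters instead of tokens, no TTL, no minimum cacheable size), and `ToyPrefixCache`, `Block`, and `cacheBreakpoint` are hypothetical names, with `cacheBreakpoint` standing in for `cache_control: { type: "ephemeral" }`.

```typescript
interface Block {
  text: string;
  cacheBreakpoint?: boolean; // stands in for cache_control: { type: "ephemeral" }
}

// Toy model: a cache entry is keyed by the exact text of every block up to
// and including a breakpoint. Any change inside that span yields a new key.
class ToyPrefixCache {
  private entries = new Set<string>();

  // Returns how many characters of this request were served from cache,
  // then records the request's breakpoint prefixes for later calls.
  request(blocks: Block[]): number {
    let key = "";
    let length = 0;
    let cachedChars = 0;
    const breakpoints: string[] = [];
    for (const block of blocks) {
      key += block.text + "\u0000"; // separator keeps block boundaries distinct
      length += block.text.length;
      if (block.cacheBreakpoint) {
        if (this.entries.has(key)) cachedChars = length;
        breakpoints.push(key);
      }
    }
    for (const k of breakpoints) this.entries.add(k);
    return cachedChars;
  }
}
```

In this model, a single block holding static plus dynamic text produces a different key whenever the dynamic part changes, so a second request with new dynamic text is served nothing from cache.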

Fix

Three small changes across 3 files (106 lines added, 0 removed):

  1. src/agents/system-prompt.ts: Export a SYSTEM_PROMPT_CACHE_BOUNDARY delimiter constant. Insert it between the last static section and the first dynamic section (extraSystemPrompt, ## Runtime).

  2. src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts: Add createAnthropicSystemPromptCacheSplitWrapper() — an onPayload stream wrapper that splits the system content at the delimiter into two blocks. The static prefix keeps its cache_control: { type: "ephemeral" } from pi-ai. The dynamic suffix gets no cache_control, so changes to it don't invalidate the prefix.

  3. src/agents/pi-embedded-runner/extra-params.ts: Wire the new wrapper for anthropic and amazon-bedrock providers in applyExtraParamsToAgent().
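
The split in step 2 can be sketched roughly as follows. This is not the PR's code: only the constant name and the marker string come from the description, while `SystemTextBlock` and `splitSystemBlocks` are hypothetical, and trimming whitespace around the marker is a choice made for this sketch.

```typescript
// Constant name and marker string are taken from the PR description.
const SYSTEM_PROMPT_CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

// Hypothetical shape of a system content block in the request payload.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Splits each block that contains the marker into a static block (which keeps
// the block's existing cache_control, if any) and a dynamic block (which gets
// none). Blocks without the marker pass through unchanged.
function splitSystemBlocks(blocks: SystemTextBlock[]): SystemTextBlock[] {
  const out: SystemTextBlock[] = [];
  for (const block of blocks) {
    const at = block.text.indexOf(SYSTEM_PROMPT_CACHE_BOUNDARY);
    if (at === -1) {
      out.push(block); // no marker: no-op
      continue;
    }
    // Sketch choice: trim whitespace next to the marker. Trimming is
    // deterministic, so the static text stays byte-stable across turns.
    const staticPart = block.text.slice(0, at).replace(/\s+$/, "");
    const dynamicPart = block.text
      .slice(at + SYSTEM_PROMPT_CACHE_BOUNDARY.length)
      .replace(/^\s+/, "");
    if (staticPart) out.push({ ...block, text: staticPart });
    if (dynamicPart) out.push({ type: "text", text: dynamicPart });
  }
  return out;
}
```

Because the static block inherits whatever `cache_control` the incoming block carried, this variant never turns caching on by itself; it only narrows the cached span.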

Turn 1: [static 50k ← cached | dynamic A 10k] → cache write 60k ✓
Turn 2: [static 50k ← cache HIT | dynamic B 10k] → cache read 50k, write 0.5k ✓

Design decisions

  • Delimiter approach (vs. structured return type): Uses an HTML comment marker <!-- OPENCLAW_CACHE_BOUNDARY --> that's invisible to models. This avoids changing buildAgentSystemPrompt's return type (which would be a breaking interface change for context engines and plugins). The delimiter is stripped at the transport layer.
  • onPayload wrapper (vs. pi-ai changes): Follows the same pattern as createOpenRouterSystemCacheWrapper in proxy-stream-wrappers.ts. No changes needed to the pi-ai library or AgentSession interface.
  • Graceful degradation: If the delimiter isn't present (e.g. promptMode: "none", subagents, or non-Anthropic providers), the wrapper is a no-op — the system prompt passes through unchanged.
  • Bedrock support: Applied to amazon-bedrock provider as well, since Bedrock Anthropic models use the same prefix-based caching.

Scope

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Security impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Test plan

  • Send consecutive messages in a DM session on Anthropic — verify cacheRead dominates after the first turn
  • Switch models mid-session (/model sonnet) — verify only the dynamic block re-caches, not the full prefix
  • Test group chat with extraSystemPrompt — verify static prefix stays cached across different group messages
  • Test subagent (minimal prompt mode) — verify wrapper is a no-op when delimiter is absent
  • Test non-Anthropic provider (OpenAI, Google) — verify no behavioral change
  • Test OpenRouter Anthropic models — verify existing createOpenRouterSystemCacheWrapper still works (this PR doesn't touch it)

Related issues

AI-assisted

This PR was developed with AI assistance (Claude Code) based on analysis of real production cache miss data from a 33-tenant OpenClaw deployment.

…able cache prefix

Move per-turn dynamic content (extraSystemPrompt, ## Runtime) into a
separate system content block without cache_control, so the static
prefix (tools, skills, memory, safety rules, project context) stays
cached across turns.

Anthropic's prompt cache is prefix-based — any byte change in the
system content invalidates the cache for all content after it. The
current monolithic system prompt includes sections that change every
turn (group context, runtime info, model capabilities), causing full
cache re-writes of ~60-150k tokens on every API call instead of
incremental ~200-500 token appends.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts
  between static and dynamic sections
- Add createAnthropicSystemPromptCacheSplitWrapper in
  anthropic-stream-wrappers.ts that splits on the delimiter in
  onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in
  extra-params.ts

Measured impact on a real deployment (33-tenant, multi-agent):
- Before: 44% cache miss rate, $0.36/message in cache writes alone
- After: static prefix stays cached, cache writes drop to incremental

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
openclaw-barnacle (bot) added the agents (Agent runtime and tooling) and size: S labels on Mar 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f8b57136b

Comment on lines +364 to +367
payloadObj.system = [
  ...(staticPart
    ? [{ type: "text", text: staticPart, cache_control: { type: "ephemeral" } }]
    : []),

P1: Respect disabled Anthropic caching in the string split path

When applyExtraParamsToAgent() installs this wrapper for Anthropic, requests that did not opt into cacheRetention still reach this branch with payload.system as a plain string. Re-emitting the static prefix with cache_control: { type: "ephemeral" } silently turns prompt caching on anyway, so sessions that previously had no cache writes — or Bedrock calls explicitly configured with cacheRetention: "none" when pi-ai serializes back to a string — now start paying cache-write cost instead of preserving the old behavior.
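
One way to address this, sketched under assumptions: when the payload's system prompt is still a plain string, only introduce `cache_control` if caching was requested, and otherwise just drop the marker. `rewriteStringSystem` and its `cachingEnabled` parameter are hypothetical; `cachingEnabled` stands in for whatever the runner derives from `cacheRetention`.

```typescript
const SYSTEM_PROMPT_CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper for the string path of the wrapper.
function rewriteStringSystem(
  system: string,
  cachingEnabled: boolean,
): string | SystemBlock[] {
  const at = system.indexOf(SYSTEM_PROMPT_CACHE_BOUNDARY);
  if (at === -1) return system; // no marker: nothing to do

  const staticPart = system.slice(0, at);
  const dynamicPart = system.slice(at + SYSTEM_PROMPT_CACHE_BOUNDARY.length);

  if (!cachingEnabled) {
    // Caching is off: remove the marker but keep a plain string, so the
    // request never gains a cache_control field it did not ask for.
    return staticPart + dynamicPart;
  }

  const blocks: SystemBlock[] = [];
  if (staticPart.trim()) {
    blocks.push({ type: "text", text: staticPart, cache_control: { type: "ephemeral" } });
  }
  if (dynamicPart.trim()) {
    blocks.push({ type: "text", text: dynamicPart });
  }
  return blocks;
}
```

The design point is that the marker is always stripped, but the decision to emit a cached block is made by the caller's caching setting rather than by the presence of the marker.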

// --- Cache boundary: everything above is static per-session; everything below
// may change per-turn (group context, runtime info). Providers with prefix-based
// caching (Anthropic) split here so the static prefix stays cached.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);

P2: Move the cache boundary below the injected project context

Placing the split marker here makes everything after it uncached, including # Project Context and the injected file bodies added later in buildAgentSystemPrompt(). In sessions that load large repo context files, those tokens will still be rewritten on every turn even when only extraSystemPrompt or runtime metadata changes, so the expensive part of the prompt never benefits from the new stable cache prefix.
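
A simplified picture of the ordering this comment asks for: every session-stable section, including project context, is pushed before the marker, and only per-turn text follows it. The builder below is hypothetical and much smaller than buildAgentSystemPrompt(); `PromptSections` and `buildPromptWithBoundary` are names invented for this sketch, and placing extraSystemPrompt after project context is an assumption rather than the existing section order.

```typescript
const SYSTEM_PROMPT_CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

// Hypothetical, reduced view of the prompt inputs.
interface PromptSections {
  staticSections: string[];   // tools, skills, safety rules, ...
  projectContext?: string;    // injected file bodies (session-stable)
  extraSystemPrompt?: string; // per-message group context
  runtime?: string;           // model name / capabilities
}

// Session-stable text first, then the marker, then per-turn text.
function buildPromptWithBoundary(s: PromptSections): string {
  const lines: string[] = [...s.staticSections];
  if (s.projectContext) lines.push("# Project Context", s.projectContext);

  const dynamic: string[] = [];
  if (s.extraSystemPrompt) dynamic.push(s.extraSystemPrompt);
  if (s.runtime) dynamic.push("## Runtime", s.runtime);

  // Emit the marker only when something actually follows it.
  if (dynamic.length > 0) lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY, ...dynamic);
  return lines.join("\n");
}
```

With this ordering, two prompts that differ only in `runtime` or `extraSystemPrompt` share an identical prefix up to the marker, which is the property the cached block depends on.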

@greptile-apps

greptile-apps Bot commented Mar 23, 2026

Contributor

Greptile Summary

This PR splits the monolithic Anthropic system prompt into a cached static prefix and an uncached dynamic suffix using an HTML comment delimiter, following the same onPayload wrapper pattern already used for OpenRouter. The approach is sound and the core wrapper implementation in anthropic-stream-wrappers.ts is clean and consistent with existing conventions.

Two issues were found:

  • Bedrock support is broken (P1): isAnthropicBedrockModel(provider, modelId) at extra-params.ts:253 passes provider as the argument instead of modelId. Since "amazon-bedrock" does not contain "anthropic.claude" or "anthropic/claude", the wrapper is silently never applied for Bedrock Anthropic models. The fix mirrors the existing correct call pattern at line 299: provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId).
  • Cache boundary placement puts session-level content in the uncached block (P2): The boundary is inserted just before extraSystemPrompt, but several sections that don't change per-turn — # Project Context (potentially 60–150 k tokens), ## Reactions, ## Reasoning Format, ## Silent Replies, and ## Heartbeats — also end up in the dynamic (uncached) block. The PR description explicitly lists "project context" as a static section, so this appears to be an unintentional placement that could significantly increase input token costs for agents with large context files.

Confidence Score: 3/5

  • Not safe to merge as-is — the Bedrock support has a silent argument-order bug that must be fixed before this reaches production.
  • The wrapper logic itself is correct and the caching improvement for direct Anthropic is real. However, the isAnthropicBedrockModel(provider, modelId) argument-order mistake means the feature silently does nothing for Bedrock Anthropic models, contradicting an explicitly stated goal. This needs a one-line fix before the PR is merged. The boundary placement concern is lower priority but worth addressing to match the stated design intent.
  • src/agents/pi-embedded-runner/extra-params.ts (wrong argument on line 253) and src/agents/system-prompt.ts (boundary placement that leaves project context uncached).

Reviews (1): Last reviewed commit: "fix(anthropic): split system prompt into..."

// Split the system prompt into static (cached) and dynamic (uncached) blocks
// for Anthropic providers. This preserves cache hits across turns by keeping
// per-turn dynamic content (group context, runtime info) out of the cached prefix.
if (provider === "anthropic" || isAnthropicBedrockModel(provider, modelId)) {

P1 Wrong argument passed to isAnthropicBedrockModel

isAnthropicBedrockModel takes a single modelId: string parameter, but the new code passes provider as the first argument. The function checks whether the string contains "anthropic.claude" or "anthropic/claude"; "amazon-bedrock" never matches either pattern, so the wrapper is silently never applied to Bedrock Anthropic models, exactly contradicting the PR description's stated Bedrock support goal.

Compare the existing correct usage at line 299:

if (provider === "amazon-bedrock" && !isAnthropicBedrockModel(modelId)) {

Fix:

Suggested change
- if (provider === "anthropic" || isAnthropicBedrockModel(provider, modelId)) {
+ if (provider === "anthropic" || (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))) {
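
To make the difference concrete, here is a small self-contained sketch of the gating condition. The body of `isAnthropicBedrockModel` is a stand-in written from the description in this comment (a substring check on the model id), not the repository's implementation, and `shouldSplitSystemPrompt` is a hypothetical name for the condition.

```typescript
// Stand-in for the existing helper: per the review, it takes only a model id
// and checks for "anthropic.claude" or "anthropic/claude".
function isAnthropicBedrockModel(modelId: string): boolean {
  return modelId.includes("anthropic.claude") || modelId.includes("anthropic/claude");
}

// Hypothetical name for the condition in the suggested change.
function shouldSplitSystemPrompt(provider: string, modelId: string): boolean {
  return (
    provider === "anthropic" ||
    (provider === "amazon-bedrock" && isAnthropicBedrockModel(modelId))
  );
}
```

Passing the provider string to the helper, as the original line effectively did, evaluates `isAnthropicBedrockModel("amazon-bedrock")`, which is false for every Bedrock model, so the wrapper would never be installed on that path.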

Comment on lines +585 to 588
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);

if (extraSystemPrompt) {
// Use "Subagent Context" header for minimal mode (subagents), otherwise "Group Chat Context"

P2 # Project Context falls in the uncached dynamic block

The cache boundary is inserted at line 585, but the actual contextFiles content (# Project Context) is built and pushed significantly later — after extraSystemPrompt, reactionGuidance, and reasoningHint. This contradicts the PR description, which explicitly lists "project context" as one of the static sections that should stay cached.

For agents with large context file sets (the prompt sizes mentioned are 60–150 k tokens), this means those tokens are paid at full input-token price every turn rather than being served from cache. The sections placed after the boundary that are not truly per-turn dynamic are:

  • # Project Context / contextFiles — session-level, rarely changes
  • ## Reactions — session-level guidance
  • ## Reasoning Format — session-level hint
  • ## Silent Replies — fully static text
  • ## Heartbeats — changes only when heartbeat config changes

Only ## Group Chat Context (extraSystemPrompt) and ## Runtime (contains model name / capabilities) genuinely change per-turn. Since the boundary already sits just before extraSystemPrompt, consider moving it below the last session-level section (with the per-turn sections pushed after it), or create a second boundary so that session-level content can also be cached in its own block.

@coletebou

Contributor Author

Closing in favor of a v2 with review feedback addressed (Bedrock arg fix, cache boundary placement, cacheRetention guard).

@coletebou coletebou closed this Mar 23, 2026
coletebou added a commit to coletebou/openclaw that referenced this pull request Mar 23, 2026
…able cache prefix

Move per-turn dynamic content (## Runtime) into a separate system
content block without cache_control, so the static prefix (tools,
skills, memory, safety rules, project context, heartbeats) stays
cached across turns.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts
  right before ## Runtime (the only truly dynamic section)
- Add createAnthropicSystemPromptCacheSplitWrapper in
  anthropic-stream-wrappers.ts that splits on the delimiter in
  onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in
  extra-params.ts, gated on cacheRetention being enabled
- Strip delimiter harmlessly when caching is not enabled (string path)

v2 — addresses review feedback from openclaw#53203:
- Fix isAnthropicBedrockModel arg (was passing provider, now modelId)
- Move boundary after project context/heartbeats (before ## Runtime)
- Guard wrapper on cacheRetention !== "none" to avoid silent cache enables
- Fix oxfmt formatting

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232

Labels

agents Agent runtime and tooling size: S

Development

Successfully merging this pull request may close these issues.

System prompt dynamic content invalidates Anthropic prompt caching (~10% hit rate)
