
fix(anthropic): split system prompt into static/dynamic blocks for stable cache prefix#53225

Closed
coletebou wants to merge 2 commits into openclaw:main from coletebou:fix/stable-system-prompt-cache-v3

@coletebou

Contributor

Summary

Split the monolithic system prompt into two Anthropic API content blocks — a static prefix (cached) and a dynamic suffix (uncached) — so the static prefix stays cached across turns instead of being re-written on every API call.

Problem

buildAgentSystemPrompt() produces a single string containing both static sections (tools, skills, memory, safety rules, project context, heartbeats) and dynamic sections (extraSystemPrompt with per-message group context, ## Runtime with model/capabilities). Anthropic's prompt cache is prefix-based: any byte change invalidates everything after it.
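The prefix-matching argument can be made concrete with a deliberately simplified model. It assumes a cache entry is written at each block marked with `cache_control`, and that a later request reuses that entry only if every block up to and including the marked one is byte-identical. The model counts characters instead of tokens and ignores tools, messages, TTLs, and minimum cacheable lengths. `ModelBlock` and `cachedPrefixChars` are hypothetical names, not OpenClaw, pi-ai, or Anthropic SDK types.

```typescript
// Hypothetical, simplified model of prefix caching (not Anthropic's implementation).
interface ModelBlock {
  text: string;
  cacheControl: boolean; // true if the block carries a cache_control breakpoint
}

// Characters of `next` that could be served from a cache entry written by `prev`:
// the longest run of identical leading blocks that ends at a breakpoint.
function cachedPrefixChars(prev: ModelBlock[], next: ModelBlock[]): number {
  let cached = 0;
  let running = 0;
  for (let i = 0; i < next.length && i < prev.length; i++) {
    const same =
      prev[i].text === next[i].text && prev[i].cacheControl === next[i].cacheControl;
    if (!same) break; // the first differing block ends the reusable prefix
    running += next[i].text.length;
    if (next[i].cacheControl) cached = running; // an entry exists only at breakpoints
  }
  return cached;
}

const STATIC = "tools, skills, memory, safety rules...";

// Monolithic prompt: one cached block that also contains per-turn text.
const monoTurn1 = [{ text: `${STATIC}\nRuntime: turn 1`, cacheControl: true }];
const monoTurn2 = [{ text: `${STATIC}\nRuntime: turn 2`, cacheControl: true }];

// Split prompt: a cached static block followed by an uncached dynamic block.
const splitTurn1 = [
  { text: STATIC, cacheControl: true },
  { text: "Runtime: turn 1", cacheControl: false },
];
const splitTurn2 = [
  { text: STATIC, cacheControl: true },
  { text: "Runtime: turn 2", cacheControl: false },
];

console.log(cachedPrefixChars(monoTurn1, monoTurn2)); // 0: the whole block is rewritten
console.log(cachedPrefixChars(splitTurn1, splitTurn2)); // equals STATIC.length
```

In this model, the monolithic prompt loses its entire cache entry whenever the per-turn text changes. The split prompt keeps reusing the static block.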

Measured on a real multi-tenant deployment (33 agents, 930 calls on the day of measurement):

| Metric | Value |
| --- | --- |
| Cache miss rate | 44% |
| Average cache write on miss | 63,333 tokens |
| Cost per message (cache writes) | $0.36 |
| Daily cache write cost (single agent) | $74.30 (73% of total spend) |

Fix

Three files, 110 lines added, 0 removed:

  1. src/agents/system-prompt.ts: Export SYSTEM_PROMPT_CACHE_BOUNDARY delimiter constant. Insert it right before extraSystemPrompt — after all static per-session sections (tools, skills, memory, safety, project context, heartbeats, silent replies).

  2. src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts: Add createAnthropicSystemPromptCacheSplitWrapper(baseStreamFn, delimiter, splitAndCache):

    • When splitAndCache=true: splits system content at the delimiter into two blocks — static prefix keeps cache_control from pi-ai, dynamic suffix gets none.
    • When splitAndCache=false: strips the delimiter from the prompt so it never leaks to the model.
    • Handles both array (pi-ai with caching) and string (no caching) system content formats.
  3. src/agents/pi-embedded-runner/extra-params.ts: Always apply the wrapper (to strip the delimiter), with splitAndCache=true only when resolveCacheRetention() returns a truthy non-"none" value AND the provider is anthropic or Bedrock Anthropic.
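Below is a minimal sketch of the payload transformation described in items 2 and 3. It assumes the Anthropic Messages API shape for `system`: either a string or an array of `{ type: "text", text, cache_control? }` blocks. `applyCacheBoundary` and its types are hypothetical. This is not the PR's code, which performs the equivalent work inside an `onPayload` wrapper. Like the PR's string path, it replaces the first delimiter occurrence with a newline. Empty halves are skipped on the assumption that the API rejects empty text blocks.

```typescript
// Hypothetical types mirroring the Anthropic Messages API `system` field.
interface CacheControl {
  type: "ephemeral";
}
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}
type SystemContent = string | SystemTextBlock[];

// Split (splitAndCache=true) or only strip (splitAndCache=false) at `delimiter`.
function applyCacheBoundary(
  system: SystemContent,
  delimiter: string,
  splitAndCache: boolean,
): SystemContent {
  if (typeof system === "string") {
    // No content blocks to split: strip the marker so the model never sees it.
    return system.replace(delimiter, "\n");
  }
  const out: SystemTextBlock[] = [];
  for (const block of system) {
    const idx = block.text.indexOf(delimiter);
    if (idx === -1) {
      out.push(block); // blocks without the marker pass through unchanged
      continue;
    }
    const before = block.text.slice(0, idx);
    const after = block.text.slice(idx + delimiter.length);
    if (!splitAndCache) {
      // Strip-only: keep one block (and its cache_control, if any).
      out.push({ ...block, text: `${before}\n${after}` });
      continue;
    }
    // Static prefix keeps the block's cache_control; dynamic suffix gets none.
    // Empty halves are skipped (assumed to be rejected by the API).
    if (before.trim() !== "") out.push({ ...block, text: before });
    if (after.trim() !== "") out.push({ type: "text", text: after });
  }
  return out;
}
```

With `splitAndCache=true`, a single cached block of the form `static <marker> dynamic` becomes two blocks, and only the first carries `cache_control`.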

Design decisions

  • Delimiter approach (<!-- OPENCLAW_CACHE_BOUNDARY -->): Invisible to models, no interface change to buildAgentSystemPrompt's return type. Follows the same onPayload wrapper pattern as createOpenRouterSystemCacheWrapper.
  • Boundary placement: Before extraSystemPrompt (not before ## Runtime). In group chats, extraSystemPrompt contains per-message sender metadata and group context that changes every turn. Placing the boundary before it keeps tools, skills, memory, project context, and heartbeats in the cached prefix.
  • Always-strip guarantee: The wrapper always runs to remove the delimiter. The splitAndCache flag only controls whether it also splits into separate content blocks with differential cache_control. This prevents the delimiter from leaking to the model on any provider or cache configuration.
  • cacheRetention guard: Cache-splitting is gated on resolveCacheRetention() returning a truthy non-"none" value. Sessions that opted out of caching via cacheRetention: "none" get strip-only behavior — no silent cache writes introduced.
  • Bedrock: Uses isAnthropicBedrockModel(modelId) (not provider) to correctly detect Bedrock Anthropic models, matching the existing usage pattern at line 299.
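The gating rule in item 3 and the two design decisions above reduce to a small predicate. This sketch is hypothetical. `shouldSplitSystemPrompt` and `looksLikeBedrockAnthropicModel` are illustrative stand-ins for the PR's use of `resolveCacheRetention()` and `isAnthropicBedrockModel(modelId)`. The substring check on `anthropic.claude` is an assumption about Bedrock model IDs, not OpenClaw's detection logic.

```typescript
// Hypothetical stand-in for isAnthropicBedrockModel(modelId). Assumes Bedrock Claude IDs
// contain "anthropic.claude", e.g. "anthropic.claude-3-5-sonnet-20240620-v1:0" or a
// cross-region profile such as "us.anthropic.claude-3-5-sonnet-20240620-v1:0".
function looksLikeBedrockAnthropicModel(modelId: string): boolean {
  return modelId.includes("anthropic.claude");
}

// Split only when caching is on AND the request goes to Anthropic directly or via Bedrock.
// In every other case the wrapper still runs, but only strips the delimiter.
function shouldSplitSystemPrompt(
  provider: string,
  modelId: string,
  cacheRetention: string | undefined, // assumed resolved value, e.g. "none" | "short" | "long"
): boolean {
  const cachingEnabled = Boolean(cacheRetention) && cacheRetention !== "none";
  const anthropicTransport =
    provider === "anthropic" || looksLikeBedrockAnthropicModel(modelId);
  return cachingEnabled && anthropicTransport;
}
```

In this sketch, keying Bedrock detection on the model ID means an OpenRouter ID such as `anthropic/claude-3.5-sonnet` does not match. That path is left to `createOpenRouterSystemCacheWrapper`.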

Scope

  • Gateway / orchestration

Security impact

None — no new permissions, secrets, network calls, or execution surface changes.

Test plan

  • Consecutive DM messages on Anthropic — verify cacheRead dominates after first turn
  • Group chat with extraSystemPrompt — verify static prefix stays cached
  • /model switch mid-session — verify only dynamic block re-caches
  • cacheRetention: "none" — verify delimiter stripped, no cache writes
  • Non-Anthropic provider (OpenAI, Google) — verify delimiter stripped, no behavioral change
  • Bedrock Anthropic model — verify wrapper splits correctly
  • OpenRouter Anthropic — verify existing createOpenRouterSystemCacheWrapper unaffected
  • Subagent (promptMode: "minimal") — verify wrapper is no-op when delimiter absent
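For the first test-plan item, "cacheRead dominates" can be checked against the `usage` object returned by the Anthropic Messages API. There, `input_tokens` counts only uncached input, while `cache_read_input_tokens` and `cache_creation_input_tokens` report cache reads and writes. The helper names and the 50% threshold are assumptions made for illustration. They are not part of the PR's test suite.

```typescript
// Subset of the Anthropic Messages API `usage` object.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Fraction of all input tokens in one response that were served from the cache.
function cacheReadShare(usage: AnthropicUsage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}

// The first turn is expected to write the cache; every later turn should mostly read it.
function cacheReadDominates(turns: AnthropicUsage[], threshold = 0.5): boolean {
  return turns.slice(1).every((usage) => cacheReadShare(usage) > threshold);
}

// Illustrative numbers, not measurements.
const healthy: AnthropicUsage[] = [
  { input_tokens: 120, output_tokens: 80, cache_creation_input_tokens: 63000, cache_read_input_tokens: 0 },
  { input_tokens: 300, output_tokens: 60, cache_creation_input_tokens: 400, cache_read_input_tokens: 63000 },
];
console.log(cacheReadDominates(healthy)); // true
```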

Related

AI-assisted

Developed with Claude Code. Cache analysis performed on real production data from a 33-tenant OpenClaw deployment.

…able cache prefix

Move per-turn dynamic content (extraSystemPrompt, ## Runtime) into a
separate system content block without cache_control, so the static
prefix (tools, skills, memory, safety rules, project context) stays
cached across turns.

Anthropic's prompt cache is prefix-based — any byte change in the
system content invalidates the cache for all content after it. The
current monolithic system prompt includes sections that change every
turn (group context, runtime info, model capabilities), causing full
cache re-writes of ~60-150k tokens on every API call instead of
incremental ~200-500 token appends.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts
  between static and dynamic sections
- Add createAnthropicSystemPromptCacheSplitWrapper in
  anthropic-stream-wrappers.ts that splits on the delimiter in
  onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in
  extra-params.ts

Measured impact on a real deployment (33-tenant multi-agent):
- Before: 44% cache miss rate, $0.36/message in cache writes alone
- After: static prefix stays cached, so cache writes drop to incremental appends

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
…able cache prefix

Move per-turn dynamic content (## Runtime) into a separate system
content block without cache_control, so the static prefix (tools,
skills, memory, safety rules, project context, heartbeats) stays
cached across turns.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts
  right before ## Runtime (the only truly dynamic section)
- Add createAnthropicSystemPromptCacheSplitWrapper in
  anthropic-stream-wrappers.ts that splits on the delimiter in
  onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in
  extra-params.ts, gated on cacheRetention being enabled
- Strip delimiter harmlessly when caching is not enabled (string path)

v2 — addresses review feedback from openclaw#53203:
- Fix isAnthropicBedrockModel arg (was passing provider, now modelId)
- Move boundary after project context/heartbeats (before ## Runtime)
- Guard wrapper on cacheRetention !== "none" to avoid silent cache enables
- Fix oxfmt formatting

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232
openclaw-barnacle (bot) added the agents (Agent runtime and tooling) and size: S labels on Mar 23, 2026

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72094e2740


```ts
// per-turn (group context, runtime info). The delimiter is stripped or used to
// split system content blocks at the transport layer — see
// createAnthropicSystemPromptCacheSplitWrapper.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
```


P1: Avoid emitting the cache boundary in shared system prompts

buildAgentSystemPrompt() is used outside the embedded Anthropic path, but the new marker is inserted unconditionally here. The only stripper lives in src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts, so other transports now send the literal <!-- OPENCLAW_CACHE_BOUNDARY --> to models — for example src/agents/openai-ws-stream.ts:872-878 forwards context.systemPrompt as instructions, and src/agents/cli-runner.ts:155-168 plus src/agents/cli-runner/helpers.ts:356-362 pass the prompt straight to external CLIs. That changes prompts for every CLI/OpenAI-WS session even though the boundary was supposed to be transport-only.
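One way to address this finding is defense in depth: strip the marker wherever a prompt built by `buildAgentSystemPrompt()` leaves the process through a transport that does not understand it, such as the OpenAI WebSocket `instructions` field or a CLI argument. The helper below is a hypothetical sketch, not code from this PR. It assumes the builder pushes the marker as its own line and joins lines with `"\n"`, so it removes the marker together with the line break that follows it.

```typescript
const CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->"; // marker text quoted in this review

// Remove every occurrence of the marker line, then any stray marker without a trailing newline.
// Assumes the marker was pushed as its own line and the prompt was built with lines.join("\n").
function stripCacheBoundary(prompt: string, marker = CACHE_BOUNDARY): string {
  return prompt.split(`${marker}\n`).join("").split(marker).join("");
}
```

Calling a helper like this at each non-Anthropic call site keeps the marker transport-only, even if the builder continues to emit it unconditionally.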


```ts
// per-turn (group context, runtime info). The delimiter is stripped or used to
// split system content blocks at the transport layer — see
// createAnthropicSystemPromptCacheSplitWrapper.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
```


P2: Move the cache boundary below the injected Project Context

Placing the delimiter here leaves # Project Context, ## Silent Replies, and ## Heartbeats on the uncached side (src/agents/system-prompt.ts:622-682). In sessions that load injected files such as AGENTS.md/SOUL.md, that project-context block is often the largest stable portion of the prompt, so any per-turn change in extraSystemPrompt still forces Anthropic to rewrite most of the expensive tokens. That materially undercuts the cache-write savings this change is trying to achieve.
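Here is a sketch of the placement this comment asks for. Each section is tagged with how often it changes, and every per-session section is emitted before the boundary so injected project files land in the cached prefix. The `PromptSection` type, the `stability` tags, `assembleSystemPrompt`, and the sample section titles are hypothetical; the real builder uses imperative `lines.push` calls. Note that moving sections also changes the order the model sees them in, which may affect prompt behavior.

```typescript
// Hypothetical section model; the real builder pushes lines imperatively.
interface PromptSection {
  title: string;
  body: string;
  stability: "session" | "turn"; // "session": fixed per session; "turn": may change per message
}

// Emit all per-session sections, then the boundary, then the per-turn sections.
// filter() preserves the original relative order within each group.
function assembleSystemPrompt(sections: PromptSection[], boundary: string): string {
  const render = (s: PromptSection) => `${s.title}\n${s.body}`;
  const stable = sections.filter((s) => s.stability === "session").map(render);
  const dynamic = sections.filter((s) => s.stability === "turn").map(render);
  return [...stable, boundary, ...dynamic].join("\n");
}

const prompt = assembleSystemPrompt(
  [
    { title: "## Tooling", body: "tool list", stability: "session" },
    { title: "## Group Chat Context", body: "sender: alice", stability: "turn" },
    { title: "# Project Context", body: "AGENTS.md contents", stability: "session" },
    { title: "## Heartbeats", body: "heartbeat rules", stability: "session" },
    { title: "## Runtime", body: "model=claude", stability: "turn" },
  ],
  "<!-- OPENCLAW_CACHE_BOUNDARY -->",
);
```

Here `# Project Context` and `## Heartbeats` end up above the marker, while the group context and `## Runtime` stay below it.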


greptile-apps (bot) commented on Mar 23, 2026

Greptile Summary

This PR addresses a real production cost problem — a 44% cache miss rate on Anthropic calls caused by dynamic content (extraSystemPrompt, runtime info) invalidating the entire monolithic system prompt on every turn. The fix is well-scoped: a delimiter constant is inserted into buildAgentSystemPrompt and a new onPayload wrapper splits (or strips) the prompt at the transport layer, keeping the large static prefix cached without changing any public interfaces.

Key observations:

  • The core mechanism is correct. The wrapper correctly handles the three cases (array+split, string+strip, array+strip-only) and chains cleanly onto the existing onPayload pattern used by other wrappers.
  • The shouldSplit guard in extra-params.ts correctly uses isAnthropicBedrockModel for Bedrock detection and respects cacheRetention: "none" opt-outs — no silent cache writes introduced.
  • The comment at the boundary insertion point (system-prompt.ts lines 582–586) overstates what is cached: # Project Context (file contents), ## Silent Replies, and ## Heartbeats are all pushed after the boundary via subsequent lines.push calls. They were already ordered after extraSystemPrompt in the original code, so this is not a regression, but the comment will mislead future contributors.
  • When splitAndCache=true and system arrives as a plain string (not an array), the code silently falls through to a simple delimiter replace with no content-block split. Practically harmless today (pi-ai sends arrays when caching is active), but worth documenting inline.
  • The PR test plan is marked entirely unchecked — production validation of the cache-read dominance claim would be good to see before merge.

Confidence Score: 4/5

  • Safe to merge; the fix is a net improvement with no regressions and a solid always-strip guarantee that prevents delimiter leakage.
  • The primary goal (caching the large static prefix and preventing per-turn extraSystemPrompt changes from invalidating it) is achieved correctly. The always-strip path ensures no provider or cache-config sees the raw delimiter. The two P2 findings (inaccurate comment, undocumented silent no-op for string system in split mode) are non-blocking. Score stays at 4 rather than 5 because the test plan is entirely unvalidated and the misleading comment should be corrected before this pattern is extended.
  • src/agents/system-prompt.ts — the boundary comment incorrectly lists project context and heartbeats as cached content when they are in the dynamic block.

Reviews (1): Last reviewed commit: "fix(anthropic): split system prompt into..."

Comment on lines +582 to +587

```ts
// --- Cache boundary: everything above is stable per-session (tools, skills,
// memory, safety, project context, heartbeats). Everything below may change
// per-turn (group context, runtime info). The delimiter is stripped or used to
// split system content blocks at the transport layer — see
// createAnthropicSystemPromptCacheSplitWrapper.
lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
```


P2: Inaccurate comment (project context and heartbeats are below the boundary)

The comment says "everything above is stable per-session (tools, skills, memory, safety, project context, heartbeats)" — but those last two are incorrect. Inspecting the imperative lines.push calls that follow the boundary:

  • # Project Context (file contents) is pushed at lines ~627–643, after the boundary
  • ## Silent Replies is pushed at line ~649, after the boundary
  • ## Heartbeats is pushed at line ~667, after the boundary

Only ## Runtime is genuinely dynamic; the others (contextFiles, Silent Replies, Heartbeats) are per-session static content that ends up in the uncached dynamic block. This is consistent with the pre-PR ordering (all three were already after extraSystemPrompt), so it's not a regression, but the comment overstates what is cached and will mislead future developers.

Suggested change

```diff
-// --- Cache boundary: everything above is stable per-session (tools, skills,
-// memory, safety, project context, heartbeats). Everything below may change
-// per-turn (group context, runtime info). The delimiter is stripped or used to
-// split system content blocks at the transport layer — see
-// createAnthropicSystemPromptCacheSplitWrapper.
+// --- Cache boundary: everything above is stable per-session (tooling, skills,
+// memory, safety, workspace/messaging sections). Everything below changes
+// per-turn (group context, runtime info) or per-session but was already
+// ordered after extraSystemPrompt (project context, silent replies, heartbeats).
+// The delimiter is stripped or used to split system content blocks at the
+// transport layer — see createAnthropicSystemPromptCacheSplitWrapper.
 lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
```

Comment on lines +362 to +363

```ts
} else if (typeof system === "string" && system.includes(delimiter)) {
  payloadObj.system = system.replace(delimiter, "\n");
```


P2: Silent no-op when splitAndCache=true but system is a plain string

When splitAndCache is true (Anthropic + caching enabled) but system arrives as a plain string rather than an array, the code falls through to this branch and only strips the delimiter with replace. No content-block split occurs and the caching optimisation silently doesn't fire.

In practice pi-ai almost certainly sends an array when prompt caching is active, so this is unlikely to matter today. But a future change or a different code path could send a string, and the silent degradation would be hard to diagnose. A small guard comment (or a dev-mode log.warn) would make this explicit:

Suggested change

```diff
 } else if (typeof system === "string" && system.includes(delimiter)) {
+  // splitAndCache=true but system is a plain string (no cache_control blocks),
+  // so we can't split into discrete content blocks — just strip the delimiter.
+  // This normally doesn't happen when pi-ai has prompt caching enabled.
   payloadObj.system = system.replace(delimiter, "\n");
```
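Here is a sketch of the "dev-mode `log.warn`" option mentioned above, covering only the string branch. The `warn` callback stands in for whatever logger the runner uses, and both it and the function name are hypothetical. The stripping itself matches the suggested change: the first occurrence is replaced with a newline.

```typescript
type WarnFn = (message: string) => void;

// String branch of the wrapper: strip the delimiter, and report the skipped split
// instead of degrading silently when caching was requested.
function stripPlainStringSystem(
  system: string,
  delimiter: string,
  splitAndCache: boolean,
  warn: WarnFn,
): string {
  if (!system.includes(delimiter)) return system; // nothing to strip
  if (splitAndCache) {
    warn("system prompt is a plain string; cache split skipped, delimiter stripped only");
  }
  return system.replace(delimiter, "\n");
}
```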

@vincentkoc

Member

Superseded by #59054.

I refreshed this fix on top of current main, kept the Anthropic/Bedrock-only scope, added the changelog entry, and revalidated the touched surface (pnpm test -- src/agents/pi-embedded-runner/system-prompt-cache-boundary.test.ts, pnpm check). Closing this stale branch to keep the merge path single-threaded.

@vincentkoc

Member

Closed in favor of #59054.


Labels

agents Agent runtime and tooling size: S


Development

Successfully merging this pull request may close these issues.

System prompt dynamic content invalidates Anthropic prompt caching (~10% hit rate)

2 participants