fix(anthropic): split system prompt into static/dynamic blocks for stable cache prefix by coletebou · Pull Request #53225 · openclaw/openclaw

coletebou · 2026-03-23T23:24:23Z

Summary

Split the monolithic system prompt into two Anthropic API content blocks — a static prefix (cached) and a dynamic suffix (uncached) — so the static prefix stays cached across turns instead of being re-written on every API call.

Problem

buildAgentSystemPrompt() produces a single string containing both static sections (tools, skills, memory, safety rules, project context, heartbeats) and dynamic sections (extraSystemPrompt with per-message group context, ## Runtime with model/capabilities). Anthropic's prompt cache is prefix-based: any byte change invalidates everything after it.
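To illustrate why a monolithic block defeats the cache, here is a deliberately simplified model of prefix caching. It is a sketch, not Anthropic's implementation: `modelCacheKey`, `monolithic`, `split`, and the placeholder section text are all hypothetical names introduced for this example. Only the `cache_control: { type: "ephemeral" }` block shape follows the Anthropic Messages API.

```typescript
// Simplified model of prefix caching; not Anthropic's actual implementation.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Model the cache key as everything up to and including the last block
// that carries cache_control. Returns undefined when nothing is marked.
function modelCacheKey(system: TextBlock[]): string | undefined {
  let lastMarked = -1;
  system.forEach((block, i) => {
    if (block.cache_control) lastMarked = i;
  });
  if (lastMarked < 0) return undefined;
  return JSON.stringify(system.slice(0, lastMarked + 1).map((b) => b.text));
}

const STATIC_SECTIONS = "## Tools\n...\n## Safety\n..."; // placeholder text

// One block: per-turn text is part of the cached content.
function monolithic(perTurn: string): TextBlock[] {
  return [
    { type: "text", text: `${STATIC_SECTIONS}\n${perTurn}`, cache_control: { type: "ephemeral" } },
  ];
}

// Two blocks: only the static block is marked for caching.
function split(perTurn: string): TextBlock[] {
  return [
    { type: "text", text: STATIC_SECTIONS, cache_control: { type: "ephemeral" } },
    { type: "text", text: perTurn },
  ];
}

const monolithicStable =
  modelCacheKey(monolithic("sender: alice")) === modelCacheKey(monolithic("sender: bob"));
const splitStable =
  modelCacheKey(split("sender: alice")) === modelCacheKey(split("sender: bob"));
console.log({ monolithicStable, splitStable });
```

In this model the monolithic key changes whenever the per-turn text changes, while the split key depends only on the static block.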

Measured on a real multi-tenant deployment (33 agents, 930 calls today):

| Metric | Value |
| --- | --- |
| Cache miss rate | 44% |
| Avg cache write on miss | 63,333 tokens |
| Cost per message (cache writes) | $0.36 |
| Daily cache write cost (single agent) | $74.30 (73% of total spend) |

Fix

Three files, 110 lines added, 0 removed:

- src/agents/system-prompt.ts: Export the SYSTEM_PROMPT_CACHE_BOUNDARY delimiter constant. Insert it right before extraSystemPrompt, after all static per-session sections (tools, skills, memory, safety, project context, heartbeats, silent replies).
- src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts: Add createAnthropicSystemPromptCacheSplitWrapper(baseStreamFn, delimiter, splitAndCache):
  - When splitAndCache=true: splits system content at the delimiter into two blocks. The static prefix keeps cache_control from pi-ai; the dynamic suffix gets none.
  - When splitAndCache=false: strips the delimiter from the prompt so it never leaks to the model.
  - Handles both array (pi-ai with caching) and string (no caching) system content formats.
- src/agents/pi-embedded-runner/extra-params.ts: Always apply the wrapper (to strip the delimiter), with splitAndCache=true only when resolveCacheRetention() returns a truthy non-"none" value AND the provider is anthropic or Bedrock Anthropic.
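The split-or-strip behaviour described above can be sketched as a pure function over the system content. This is a minimal sketch under stated assumptions, not the code from the PR: `splitOrStripSystem`, the `SystemBlock` type, and the `DELIMITER` value are hypothetical (the real SYSTEM_PROMPT_CACHE_BOUNDARY value is not shown on this page), and the real wrapper operates inside an onPayload hook rather than as a standalone function.

```typescript
type CacheControl = { type: "ephemeral" };
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}
type SystemContent = string | SystemBlock[];

// Placeholder value for illustration only; not the real delimiter.
const DELIMITER = "\n[[HYPOTHETICAL_CACHE_BOUNDARY]]\n";

function splitOrStripSystem(
  system: SystemContent,
  delimiter: string,
  splitAndCache: boolean,
): SystemContent {
  // String form: there are no content blocks to split, so only strip.
  if (typeof system === "string") {
    return system.split(delimiter).join("\n");
  }
  const out: SystemBlock[] = [];
  for (const block of system) {
    const at = block.text.indexOf(delimiter);
    if (at < 0) {
      out.push(block); // delimiter absent: pass the block through unchanged
      continue;
    }
    const prefix = block.text.slice(0, at);
    const suffix = block.text.slice(at + delimiter.length);
    if (!splitAndCache) {
      out.push({ ...block, text: `${prefix}\n${suffix}` }); // strip only
      continue;
    }
    out.push({ ...block, text: prefix }); // static prefix keeps cache_control
    if (suffix.length > 0) {
      out.push({ type: "text", text: suffix }); // dynamic suffix, no cache_control
    }
  }
  return out;
}

const example = splitOrStripSystem(
  [{ type: "text", text: `STATIC${DELIMITER}DYNAMIC`, cache_control: { type: "ephemeral" } }],
  DELIMITER,
  true,
);
console.log(JSON.stringify(example, null, 2));
```

Keeping the logic in a pure function like this makes the three cases (array and split, array and strip, string and strip) easy to unit-test independently of the stream wrapper.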

Design decisions

- Delimiter approach (): Invisible to models, no interface change to buildAgentSystemPrompt's return type. Follows the same onPayload wrapper pattern as createOpenRouterSystemCacheWrapper.
- Boundary placement: Before extraSystemPrompt (not before ## Runtime). In group chats, extraSystemPrompt contains per-message sender metadata and group context that changes every turn. Placing the boundary before it keeps tools, skills, memory, project context, and heartbeats in the cached prefix.
- Always-strip guarantee: The wrapper always runs to remove the delimiter. The splitAndCache flag only controls whether it also splits into separate content blocks with differential cache_control. This prevents the delimiter from leaking to the model on any provider or cache configuration.
- cacheRetention guard: Cache-splitting is gated on resolveCacheRetention() returning a truthy non-"none" value. Sessions that opted out of caching via cacheRetention: "none" get strip-only behavior, so no silent cache writes are introduced.
- Bedrock: Uses isAnthropicBedrockModel(modelId) (not provider) to correctly detect Bedrock Anthropic models, matching the existing usage pattern at line 299.
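The gating rule from the cacheRetention and Bedrock decisions can be written as a small predicate. This is a sketch assuming the inputs have already been resolved elsewhere: `shouldSplitAndCache` and `SplitDecisionInput` are hypothetical names, and the `"enabled"` retention value in the usage lines is a placeholder, since the page only names `"none"` as a concrete value.

```typescript
interface SplitDecisionInput {
  provider: string;
  // Result of a check such as isAnthropicBedrockModel(modelId); computed by the caller.
  isBedrockAnthropicModel: boolean;
  // Result of something like resolveCacheRetention(); undefined means not configured.
  cacheRetention: string | undefined;
}

function shouldSplitAndCache(input: SplitDecisionInput): boolean {
  // Truthy and not the explicit opt-out value.
  const cachingEnabled = Boolean(input.cacheRetention) && input.cacheRetention !== "none";
  const anthropicFamily = input.provider === "anthropic" || input.isBedrockAnthropicModel;
  return cachingEnabled && anthropicFamily;
}

// "enabled" is a placeholder retention value used only for this example.
console.log(shouldSplitAndCache({ provider: "anthropic", isBedrockAnthropicModel: false, cacheRetention: "enabled" }));
console.log(shouldSplitAndCache({ provider: "anthropic", isBedrockAnthropicModel: false, cacheRetention: "none" }));
```

When the predicate is false the wrapper would still run in strip-only mode, which is what keeps the delimiter away from the model on every provider.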

Scope

Gateway / orchestration

Security impact

None — no new permissions, secrets, network calls, or execution surface changes.

Test plan

- [ ] Consecutive DM messages on Anthropic: verify cacheRead dominates after first turn
- [ ] Group chat with extraSystemPrompt: verify static prefix stays cached
- [ ] /model switch mid-session: verify only dynamic block re-caches
- [ ] cacheRetention: "none": verify delimiter stripped, no cache writes
- [ ] Non-Anthropic provider (OpenAI, Google): verify delimiter stripped, no behavioral change
- [ ] Bedrock Anthropic model: verify wrapper splits correctly
- [ ] OpenRouter Anthropic: verify existing createOpenRouterSystemCacheWrapper unaffected
- [ ] Subagent (promptMode: "minimal"): verify wrapper is no-op when delimiter absent
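One way to check the "cacheRead dominates" items is to compute the share of input tokens served from cache on each response. The sketch below assumes the raw Anthropic Messages API usage fields (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); `cacheReadShare` is a hypothetical helper, the numbers in the usage line are invented, and how pi-ai or OpenClaw surface these counters (for example as cacheRead) may differ.

```typescript
// Subset of the usage object returned by the Anthropic Messages API.
interface AnthropicUsage {
  input_tokens: number; // uncached input tokens
  cache_creation_input_tokens?: number | null; // tokens written to cache
  cache_read_input_tokens?: number | null; // tokens read from cache
}

// Fraction of all input tokens that were served from cache (0 when there is no input).
function cacheReadShare(usage: AnthropicUsage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}

// Invented numbers for illustration: a second turn where most input is a cache read.
const secondTurn: AnthropicUsage = {
  input_tokens: 100,
  cache_creation_input_tokens: 100,
  cache_read_input_tokens: 800,
};
console.log(cacheReadShare(secondTurn));
```

A share near 1 on the second and later turns would suggest the static prefix is being reused; a share near 0 with large cache writes would point to the prefix still being invalidated.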

AI-assisted

Developed with Claude Code. Cache analysis performed on real production data from a 33-tenant OpenClaw deployment.

…able cache prefix

Move per-turn dynamic content (extraSystemPrompt, ## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context) stays cached across turns.

Anthropic's prompt cache is prefix-based: any byte change in the system content invalidates the cache for all content after it. The current monolithic system prompt includes sections that change every turn (group context, runtime info, model capabilities), causing full cache re-writes of ~60-150k tokens on every API call instead of incremental ~200-500 token appends.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts between static and dynamic sections
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts

Measured impact on a real deployment (33-tenant multi-agent):
- Before: 44% cache miss rate, $0.36/message in cache writes alone
- After: static prefix stays cached, cache writes drop to incremental

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232

…able cache prefix

Move per-turn dynamic content (## Runtime) into a separate system content block without cache_control, so the static prefix (tools, skills, memory, safety rules, project context, heartbeats) stays cached across turns.

Implementation:
- Add SYSTEM_PROMPT_CACHE_BOUNDARY delimiter in system-prompt.ts right before ## Runtime (the only truly dynamic section)
- Add createAnthropicSystemPromptCacheSplitWrapper in anthropic-stream-wrappers.ts that splits on the delimiter in onPayload, preserving cache_control only on the static prefix
- Wire the wrapper for direct Anthropic and Bedrock providers in extra-params.ts, gated on cacheRetention being enabled
- Strip delimiter harmlessly when caching is not enabled (string path)

v2, addresses review feedback from openclaw#53203:
- Fix isAnthropicBedrockModel arg (was passing provider, now modelId)
- Move boundary after project context/heartbeats (before ## Runtime)
- Guard wrapper on cacheRetention !== "none" to avoid silent cache enables
- Fix oxfmt formatting

Closes openclaw#49700
Related: openclaw#18963, openclaw#19989, openclaw#20894, openclaw#43232

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72094e2740


chatgpt-codex-connector · 2026-03-23T23:28:49Z

+  // per-turn (group context, runtime info). The delimiter is stripped or used to
+  // split system content blocks at the transport layer — see
+  // createAnthropicSystemPromptCacheSplitWrapper.
+  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);


Avoid emitting the cache boundary in shared system prompts

buildAgentSystemPrompt() is used outside the embedded Anthropic path, but the new marker is inserted unconditionally here. The only stripper lives in src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts, so other transports now send the literal  to models — for example src/agents/openai-ws-stream.ts:872-878 forwards context.systemPrompt as instructions, and src/agents/cli-runner.ts:155-168 plus src/agents/cli-runner/helpers.ts:356-362 pass the prompt straight to external CLIs. That changes prompts for every CLI/OpenAI-WS session even though the boundary was supposed to be transport-only.
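One possible way to address this concern is to make the boundary opt-in at the builder, so that only transports with a stripper ever see it. The following is a hypothetical sketch, not what this PR or its successor necessarily does: `buildPromptLines`, `BuildOptions`, and the marker value are invented for illustration and do not reflect the real buildAgentSystemPrompt signature.

```typescript
interface BuildOptions {
  staticSections: string[];
  dynamicSections: string[];
  // When omitted, no boundary marker is emitted at all.
  cacheBoundary?: string;
}

// Joins sections into one prompt, inserting the marker only when requested.
function buildPromptLines(opts: BuildOptions): string {
  const lines: string[] = [...opts.staticSections];
  if (opts.cacheBoundary !== undefined) {
    lines.push(opts.cacheBoundary);
  }
  lines.push(...opts.dynamicSections);
  return lines.join("\n");
}

const MARKER = "[[HYPOTHETICAL_CACHE_BOUNDARY]]"; // placeholder, not the real delimiter

// A transport that forwards the prompt verbatim (for example a CLI) omits the marker.
const cliPrompt = buildPromptLines({
  staticSections: ["## Tools", "## Safety"],
  dynamicSections: ["## Runtime"],
});

// The embedded Anthropic path asks for the marker and strips or splits on it later.
const anthropicPrompt = buildPromptLines({
  staticSections: ["## Tools", "## Safety"],
  dynamicSections: ["## Runtime"],
  cacheBoundary: MARKER,
});
console.log(cliPrompt.includes(MARKER), anthropicPrompt.includes(MARKER));
```

The trade-off is that callers must pass the option explicitly, whereas a strip step in every transport would keep the builder signature unchanged.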


chatgpt-codex-connector · 2026-03-23T23:28:49Z

+  // per-turn (group context, runtime info). The delimiter is stripped or used to
+  // split system content blocks at the transport layer — see
+  // createAnthropicSystemPromptCacheSplitWrapper.
+  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);


Move the cache boundary below the injected Project Context

Placing the delimiter here leaves # Project Context, ## Silent Replies, and ## Heartbeats on the uncached side (src/agents/system-prompt.ts:622-682). In sessions that load injected files such as AGENTS.md/SOUL.md, that project-context block is often the largest stable portion of the prompt, so any per-turn change in extraSystemPrompt still forces Anthropic to rewrite most of the expensive tokens. That materially undercuts the cache-write savings this change is trying to achieve.


greptile-apps · 2026-03-23T23:29:44Z

Greptile Summary

This PR addresses a real production cost problem — a 44% cache miss rate on Anthropic calls caused by dynamic content (extraSystemPrompt, runtime info) invalidating the entire monolithic system prompt on every turn. The fix is well-scoped: a delimiter constant is inserted into buildAgentSystemPrompt and a new onPayload wrapper splits (or strips) the prompt at the transport layer, keeping the large static prefix cached without changing any public interfaces.

Key observations:

- The core mechanism is correct. The wrapper correctly handles the three cases (array+split, string+strip, array+strip-only) and chains cleanly onto the existing onPayload pattern used by other wrappers.
- The shouldSplit guard in extra-params.ts correctly uses isAnthropicBedrockModel for Bedrock detection and respects cacheRetention: "none" opt-outs, so no silent cache writes are introduced.
- The comment at the boundary insertion point (system-prompt.ts lines 582–586) overstates what is cached: # Project Context (file contents), ## Silent Replies, and ## Heartbeats are all pushed after the boundary via subsequent lines.push calls. They were already ordered after extraSystemPrompt in the original code, so this is not a regression, but the comment will mislead future contributors.
- When splitAndCache=true and system arrives as a plain string (not an array), the code silently falls through to a simple delimiter replace with no content-block split. Practically harmless today (pi-ai sends arrays when caching is active), but worth documenting inline.
- The PR test plan is marked entirely unchecked; production validation of the cache-read dominance claim would be good to see before merge.

Confidence Score: 4/5

Safe to merge; the fix is a net improvement with no regressions and a solid always-strip guarantee that prevents delimiter leakage.
The primary goal (caching the large static prefix and preventing per-turn extraSystemPrompt changes from invalidating it) is achieved correctly. The always-strip path ensures no provider or cache-config sees the raw delimiter. The two P2 findings (inaccurate comment, undocumented silent no-op for string system in split mode) are non-blocking. Score stays at 4 rather than 5 because the test plan is entirely unvalidated and the misleading comment should be corrected before this pattern is extended.
src/agents/system-prompt.ts — the boundary comment incorrectly lists project context and heartbeats as cached content when they are in the dynamic block.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/agents/system-prompt.ts
Line: 582-587

Comment:
**Inaccurate comment — project context and heartbeats are below the boundary**

The comment says "everything above is stable per-session (tools, skills, memory, safety, project context, heartbeats)" — but those last two are incorrect. Inspecting the imperative `lines.push` calls that follow the boundary:

- `# Project Context` (file contents) is pushed at lines ~627–643, **after** the boundary
- `## Silent Replies` is pushed at line ~649, **after** the boundary
- `## Heartbeats` is pushed at line ~667, **after** the boundary

Only `## Runtime` is genuinely dynamic; the others (`contextFiles`, Silent Replies, Heartbeats) are per-session static content that ends up in the uncached dynamic block. This is consistent with the pre-PR ordering (all three were already after `extraSystemPrompt`), so it's not a regression, but the comment overstates what is cached and will mislead future developers.

```suggestion
  // --- Cache boundary: everything above is stable per-session (tooling, skills,
  // memory, safety, workspace/messaging sections). Everything below changes
  // per-turn (group context, runtime info) or per-session but was already
  // ordered after extraSystemPrompt (project context, silent replies, heartbeats).
  // The delimiter is stripped or used to split system content blocks at the
  // transport layer — see createAnthropicSystemPromptCacheSplitWrapper.
  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/anthropic-stream-wrappers.ts
Line: 362-363

Comment:
**Silent no-op when `splitAndCache=true` but system is a plain string**

When `splitAndCache` is `true` (Anthropic + caching enabled) but `system` arrives as a plain `string` rather than an array, the code falls through to this branch and only strips the delimiter with `replace`. No content-block split occurs and the caching optimisation silently doesn't fire.

In practice pi-ai almost certainly sends an array when prompt caching is active, so this is unlikely to matter today. But a future change or a different code path could send a string, and the silent degradation would be hard to diagnose. A small guard comment (or a dev-mode `log.warn`) would make this explicit:

```suggestion
          } else if (typeof system === "string" && system.includes(delimiter)) {
            // splitAndCache=true but system is a plain string (no cache_control blocks),
            // so we can't split into discrete content blocks — just strip the delimiter.
            // This normally doesn't happen when pi-ai has prompt caching enabled.
            payloadObj.system = system.replace(delimiter, "\n");
```

How can I resolve this? If you propose a fix, please make it concise.


greptile-apps · 2026-03-23T23:29:48Z

+  // --- Cache boundary: everything above is stable per-session (tools, skills,
+  // memory, safety, project context, heartbeats). Everything below may change
+  // per-turn (group context, runtime info). The delimiter is stripped or used to
+  // split system content blocks at the transport layer — see
+  // createAnthropicSystemPromptCacheSplitWrapper.
+  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);


Inaccurate comment — project context and heartbeats are below the boundary

The comment says "everything above is stable per-session (tools, skills, memory, safety, project context, heartbeats)" — but those last two are incorrect. Inspecting the imperative lines.push calls that follow the boundary:

- # Project Context (file contents) is pushed at lines ~627–643, after the boundary
- ## Silent Replies is pushed at line ~649, after the boundary
- ## Heartbeats is pushed at line ~667, after the boundary

Only ## Runtime is genuinely dynamic; the others (contextFiles, Silent Replies, Heartbeats) are per-session static content that ends up in the uncached dynamic block. This is consistent with the pre-PR ordering (all three were already after extraSystemPrompt), so it's not a regression, but the comment overstates what is cached and will mislead future developers.

Suggested change

-  // --- Cache boundary: everything above is stable per-session (tools, skills,
-  // memory, safety, project context, heartbeats). Everything below may change
-  // per-turn (group context, runtime info). The delimiter is stripped or used to
-  // split system content blocks at the transport layer — see
-  // createAnthropicSystemPromptCacheSplitWrapper.
-  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);
+  // --- Cache boundary: everything above is stable per-session (tooling, skills,
+  // memory, safety, workspace/messaging sections). Everything below changes
+  // per-turn (group context, runtime info) or per-session but was already
+  // ordered after extraSystemPrompt (project context, silent replies, heartbeats).
+  // The delimiter is stripped or used to split system content blocks at the
+  // transport layer — see createAnthropicSystemPromptCacheSplitWrapper.
+  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);


greptile-apps · 2026-03-23T23:29:49Z

+          } else if (typeof system === "string" && system.includes(delimiter)) {
+            payloadObj.system = system.replace(delimiter, "\n");


Silent no-op when splitAndCache=true but system is a plain string

When splitAndCache is true (Anthropic + caching enabled) but system arrives as a plain string rather than an array, the code falls through to this branch and only strips the delimiter with replace. No content-block split occurs and the caching optimisation silently doesn't fire.

In practice pi-ai almost certainly sends an array when prompt caching is active, so this is unlikely to matter today. But a future change or a different code path could send a string, and the silent degradation would be hard to diagnose. A small guard comment (or a dev-mode log.warn) would make this explicit:

Suggested change

-          } else if (typeof system === "string" && system.includes(delimiter)) {
-            payloadObj.system = system.replace(delimiter, "\n");
+          } else if (typeof system === "string" && system.includes(delimiter)) {
+            // splitAndCache=true but system is a plain string (no cache_control blocks),
+            // so we can't split into discrete content blocks — just strip the delimiter.
+            // This normally doesn't happen when pi-ai has prompt caching enabled.
+            payloadObj.system = system.replace(delimiter, "\n");
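The alternative mentioned above, a warning instead of only a comment, could look roughly like this. It is a sketch with hypothetical names: `stripDelimiterFromString` and the injected `warn` callback stand in for whatever logger the codebase uses, and the delimiter value in the usage lines is a placeholder.

```typescript
type Warn = (message: string) => void;

// Strips the delimiter from a plain-string system prompt. When split mode was
// requested, it also reports that no content-block split could happen.
function stripDelimiterFromString(
  system: string,
  delimiter: string,
  splitAndCache: boolean,
  warn: Warn,
): string {
  if (!system.includes(delimiter)) {
    return system; // nothing to do
  }
  if (splitAndCache) {
    warn("system prompt is a plain string; cache split skipped, delimiter stripped only");
  }
  // split/join removes every occurrence, unlike String.replace with a string pattern.
  return system.split(delimiter).join("\n");
}

const warnings: string[] = [];
const stripped = stripDelimiterFromString(
  "STATIC<<BOUNDARY>>DYNAMIC", // "<<BOUNDARY>>" is a placeholder delimiter
  "<<BOUNDARY>>",
  true,
  (message) => warnings.push(message),
);
console.log(stripped, warnings.length);
```

Injecting the warn callback keeps the helper testable and leaves the choice of dev-only versus always-on logging to the caller.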



vincentkoc · 2026-04-01T14:03:28Z

Superseded by #59054.

I refreshed this fix on top of current main, kept the Anthropic/Bedrock-only scope, added the changelog entry, and revalidated the touched surface (pnpm test -- src/agents/pi-embedded-runner/system-prompt-cache-boundary.test.ts, pnpm check). Closing this stale branch to keep the merge path single-threaded.

vincentkoc · 2026-04-01T14:03:33Z

Closed in favor of #59054.

coletebou added 2 commits March 23, 2026 18:35

openclaw-barnacle Bot added the agents (Agent runtime and tooling) and size: S labels Mar 23, 2026

chatgpt-codex-connector Bot reviewed Mar 23, 2026


greptile-apps Bot reviewed Mar 23, 2026


vincentkoc mentioned this pull request Apr 1, 2026

fix(agents): split system prompt cache prefix by transport #59054

Merged

25 tasks

vincentkoc closed this Apr 1, 2026

xinbenlv mentioned this pull request Apr 3, 2026

Suggestion: strengthen contributor-first PR workflow and credit practices #60108

Closed

coletebou wants to merge 2 commits into openclaw:main from coletebou:fix/stable-system-prompt-cache-v3
