Bug type
Behavior bug (incorrect output/state without crash)
Summary
The system prompt volatile suffix is assembled in a different order depending on which code path triggers a turn — normal chat, heartbeat, and ACP announce each produce different byte sequences for the same session. Since Anthropic prompt caching requires byte-identical prefixes, every path transition causes a full cache re-write of the entire context.
This is silently hemorrhaging money for anyone using heartbeats + chat on Anthropic models. Heartbeats warm one cache key, then the first user message writes a completely different one. Every. Single. Time.
Three affected code paths (verified via diagnostics.cacheTrace)
All three paths target the same session but produce different systemDigest values:
| Code path | When it fires | systemDigest (first 16 chars) | Volatile suffix order |
| --- | --- | --- | --- |
| Normal chat | User sends a message | cb8a82a10654fa98 | HEARTBEAT.md → Group Chat Context → Inbound Context → Runtime |
| Heartbeat | Every N minutes | 2d44ab1ce72b8ae0 | Different ordering of volatile sections |
| ACP announce | Background task completes | 3132b2a94c36e91d | HEARTBEAT.md → Runtime → (missing sections) |
The static prefix (tools, skills, workspace files) is byte-identical across all three — divergence starts in the volatile suffix below OPENCLAW_CACHE_BOUNDARY.
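To check where the divergence begins, one option is to hash the two halves of each path's system prompt separately. The sketch below is illustrative only: it assumes the boundary marker appears literally in the prompt text and that a digest is a truncated SHA-256, neither of which is confirmed here. `split_prompt`, `short_digest`, and `compare_paths` are hypothetical helper names, not OpenClaw APIs.

```python
import hashlib

# Marker name taken from this issue; how it is embedded in the prompt is an assumption.
BOUNDARY = "OPENCLAW_CACHE_BOUNDARY"


def split_prompt(system_prompt: str) -> tuple[str, str]:
    """Split a prompt into (static prefix, volatile suffix) at the first marker."""
    prefix, _, suffix = system_prompt.partition(BOUNDARY)
    return prefix, suffix


def short_digest(text: str) -> str:
    # Assumption: SHA-256 truncated to 16 hex chars. The real systemDigest
    # algorithm may differ; only equality between paths matters here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def compare_paths(prompts: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each code path to (prefix digest, suffix digest)."""
    result = {}
    for path, prompt in prompts.items():
        prefix, suffix = split_prompt(prompt)
        result[path] = (short_digest(prefix), short_digest(suffix))
    return result
```

If the prefix digests match across paths and only the suffix digests differ, the problem is confined to the volatile sections, which is what the trace above suggests.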
Real-world cost impact
Heartbeat cache war (the expensive one)
With heartbeats pointed at the Discord channel session (as recommended for cache warming):
Overnight (no user messages, just heartbeats):
10:27 heartbeat → sysDigest=2d44ab1c → cache WRITE ~110k tokens
11:22 heartbeat → sysDigest=2d44ab1c → cache READ (warm from last heartbeat)
12:17 heartbeat → sysDigest=2d44ab1c → cache READ
...pattern continues, heartbeats stay warm with each other...
Morning (user sends first message):
09:29 user chat → sysDigest=cb8a82a1 → cache WRITE ~110k tokens (BUST — different prefix than heartbeat)
09:31 user chat → sysDigest=cb8a82a1 → cache READ (warm now)
Then the cycle repeats: next heartbeat busts the chat cache, next chat message busts the heartbeat cache. Every transition between heartbeat and chat is a full context re-write.
On Claude Opus 4.6 with ~110k context, each bust costs $0.69 in cache writes (110k × $6.25/MTok). With heartbeats every 55 minutes and intermittent chat, this compounds to $5-15/day per agent in pure waste.
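The per-bust figure is simple arithmetic, sketched below so it can be re-run with other context sizes or rates. The 110k token count and the $6.25/MTok rate are the figures quoted above; the busts-per-day values are hypothetical inputs chosen only to show what the $5-15/day range implies (roughly 8 to 22 busts per day).

```python
def cache_write_cost(tokens: int, usd_per_mtok: float) -> float:
    """Cost in USD of writing `tokens` tokens to the cache."""
    return tokens * usd_per_mtok / 1_000_000


def daily_waste(busts_per_day: int, tokens: int = 110_000,
                usd_per_mtok: float = 6.25) -> float:
    """Wasted USD per day if every bust re-writes the full context."""
    return busts_per_day * cache_write_cost(tokens, usd_per_mtok)


if __name__ == "__main__":
    print(f"per bust: ${cache_write_cost(110_000, 6.25):.2f}")
    # Hypothetical bust counts, not measured values.
    for busts in (8, 22):
        print(f"{busts} busts/day: ${daily_waste(busts):.2f}")
```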
ACP announce cache bust
Each ACP task completion notification produces yet another system prompt variant, triggering roughly 10k tokens of cache writes. In coding workflows with frequent Codex spawns, this adds $0.50-2.00/day per agent.
Multiplied across agents
With 4 Anthropic leadership agents (Opus), the overnight heartbeat cache war alone was burning $20-60/day in unnecessary cache writes. We had to disable heartbeats entirely as a workaround.
Steps to reproduce
Configure an agent with cacheRetention: "long" on any Anthropic model
Set heartbeat.session to point at the agent's Discord channel session (the session where chat happens)
Enable diagnostics.cacheTrace
Let a heartbeat fire, then send a chat message
Compare systemDigest between the heartbeat turn and the chat turn
Observe: different digests, full cache re-write on every path transition
Cache trace evidence
From /logs/cache-trace.jsonl on a real production deployment:
# Heartbeats (consistent with each other, but different from chat):
2026-04-08T10:27:37 | run=8461b3f1 | sysDigest=2d44ab1ce72b8ae0 | msgs=63-64
2026-04-08T11:22:37 | run=bad90290 | sysDigest=2d44ab1ce72b8ae0 | msgs=63-66
2026-04-08T12:17:37 | run=041ec4cd | sysDigest=2d44ab1ce72b8ae0 | msgs=63-68
2026-04-08T13:12:37 | run=cd72beab | sysDigest=2d44ab1ce72b8ae0 | msgs=63-70
2026-04-08T14:07:37 | run=97cdc066 | sysDigest=2d44ab1ce72b8ae0 | msgs=63-72
# User chat (different digest):
2026-04-08T17:24:35 | run=52cdb6bc | sysDigest=cb8a82a10654fa98 | msgs=65-110
2026-04-08T17:38:06 | run=5f76d3d0 | sysDigest=cb8a82a10654fa98 | msgs=111-128
# ACP announce (yet another different digest from earlier testing):
2026-04-08T07:49:xx | run=announce | sysDigest=3132b2a94c36e91d | msgs=21
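For anyone checking their own deployment, a small script can count how often the digest changes between consecutive turns. This is a rough sketch that parses the pipe-delimited summary lines shown above, not the raw JSONL file, whose schema is not reproduced in this issue. `parse_line` and `count_digest_changes` are hypothetical names.

```python
def parse_line(line: str) -> dict[str, str]:
    """Parse 'timestamp | key=value | key=value ...' into a dict."""
    ts, *fields = [part.strip() for part in line.split("|")]
    record = {"ts": ts}
    for field in fields:
        key, _, value = field.partition("=")
        record[key] = value
    return record


def count_digest_changes(lines: list[str]) -> int:
    """Count consecutive turns whose sysDigest differs (each one is a cache bust)."""
    digests = [
        parse_line(line)["sysDigest"]
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)
```

Lines are compared in the order given, so they should be passed in chronological order. Any non-zero count on a single session points to the path-dependent prompt assembly described here.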
Expected behavior
All code paths for the same session should produce a byte-identical system prompt. The volatile suffix sections below OPENCLAW_CACHE_BOUNDARY must be assembled in the same deterministic order regardless of whether the turn was triggered by chat, heartbeat, or ACP announce.
Proposed fixes (any would work)
Normalize volatile section ordering across all code paths — sort sections deterministically before assembly
Add notifyPolicy parameter to sessions_spawn — as a workaround for ACP, let callers suppress announce notifications at spawn time (the openclaw tasks notify silent CLI exists but has a race condition since the task completes before the policy can be set)
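The first fix could look something like the sketch below: every code path hands its volatile sections to one shared assembler that imposes a fixed order. This is a hypothetical illustration, not the existing OpenClaw code. The section names come from the chat-path row of the table above, while `CANONICAL_ORDER`, `assemble_volatile_suffix`, and the `## name` heading format are assumptions.

```python
# Canonical order taken from the normal-chat path reported in this issue.
CANONICAL_ORDER = ["HEARTBEAT.md", "Group Chat Context", "Inbound Context", "Runtime"]


def assemble_volatile_suffix(sections: dict[str, str]) -> str:
    """Join volatile sections in one fixed order, whatever order the caller built them in."""
    rank = {name: i for i, name in enumerate(CANONICAL_ORDER)}
    # Unknown sections sort after the known ones, alphabetically, so the
    # result stays deterministic even if new sections are added later.
    ordered = sorted(sections, key=lambda name: (rank.get(name, len(rank)), name))
    # Assumed rendering: a heading line followed by the section body.
    return "\n\n".join(f"## {name}\n{sections[name]}" for name in ordered)
```

Ordering alone is not enough if a path omits sections entirely, as the ACP announce path appears to: all paths would also need to supply the same set of sections with the same content for the bytes to match.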
Current workarounds
Heartbeats: Disabled for all Anthropic agents (loses cache warming and liveness monitoring)
ACP announces: Using PTY background exec instead of sessions_spawn (loses task tracking and completion notifications)
Both workarounds degrade the agent experience to avoid the cost penalty
Environment
Config: cacheRetention: "long", heartbeat every 55m targeting Discord channel session
Auth: Claude MAX (OAuth token) hitting api.anthropic.com
Severity
Critical cost impact — silently wastes significant money for any Anthropic user with heartbeats enabled (the default). The longer the context and the more agents you run, the worse it gets. Most users won't notice until they check their bill.