
System prompt assembled differently across code paths (chat/heartbeat/announce), causing continuous Anthropic cache invalidation #63030

@anthonyconnelly

Description


Bug type

Behavior bug (incorrect output/state without crash)

Summary

The system prompt volatile suffix is assembled in a different order depending on which code path triggers a turn — normal chat, heartbeat, and ACP announce each produce different byte sequences for the same session. Since Anthropic prompt caching requires byte-identical prefixes, every path transition causes a full cache re-write of the entire context.

This is silently hemorrhaging money for anyone using heartbeats + chat on Anthropic models. Heartbeats warm one cache key, then the first user message writes a completely different one. Every. Single. Time.

Three affected code paths (verified via diagnostics.cacheTrace)

All three paths target the same session but produce different systemDigest values:

Code path    | When it fires             | systemDigest (first 16 chars) | Volatile suffix order
-------------|---------------------------|-------------------------------|---------------------------------------------------------------
Normal chat  | User sends a message      | cb8a82a10654fa98              | HEARTBEAT.md → Group Chat Context → Inbound Context → Runtime
Heartbeat    | Every N minutes           | 2d44ab1ce72b8ae0              | Different ordering of volatile sections
ACP announce | Background task completes | 3132b2a94c36e91d              | HEARTBEAT.md → Runtime → (missing sections)

The static prefix (tools, skills, workspace files) is byte-identical across all three — divergence starts in the volatile suffix below OPENCLAW_CACHE_BOUNDARY.
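A minimal sketch of why ordering alone is enough to change the digest. It assumes, hypothetically, that systemDigest is the first 16 hex characters of a SHA-256 over the full system prompt and that the boundary marker is rendered as a literal line. The section bodies and the assemble/digest16 helpers are placeholders, not OpenClaw code.

```typescript
import { createHash } from "node:crypto";

type Section = { name: string; body: string };

// Placeholder for the static prefix (tools, skills, workspace files); identical on every path.
const STATIC_PREFIX = "## Tools\n...\n## Skills\n...\n## Workspace files\n...";
// Marker name from the issue; how it is actually rendered in the prompt is assumed.
const BOUNDARY = "\nOPENCLAW_CACHE_BOUNDARY\n";

// Hypothetical assembler: static prefix, boundary, then volatile sections in caller order.
function assemble(sections: Section[]): string {
  const suffix = sections.map((s) => `## ${s.name}\n${s.body}`).join("\n\n");
  return STATIC_PREFIX + BOUNDARY + suffix;
}

// Assumed digest scheme: first 16 hex chars of SHA-256 over the whole prompt.
function digest16(prompt: string): string {
  return createHash("sha256").update(prompt, "utf8").digest("hex").slice(0, 16);
}

const heartbeatMd: Section = { name: "HEARTBEAT.md", body: "..." };
const inbound: Section = { name: "Inbound Context", body: "..." };
const runtime: Section = { name: "Runtime", body: "..." };

// Same static prefix and same sections, different order: the digests differ.
const orderA = digest16(assemble([heartbeatMd, inbound, runtime]));
const orderB = digest16(assemble([heartbeatMd, runtime, inbound]));
// Dropping a section (as the announce path does) also changes the digest.
const missing = digest16(assemble([heartbeatMd, runtime]));
```

Because Anthropic prompt caching matches on exact prefixes, everything from the first differing byte onward, including the conversation messages that follow the system prompt, has to be written to the cache again.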

Real-world cost impact

Heartbeat cache war (the expensive one)

With heartbeats pointed at the Discord channel session (as recommended for cache warming):

Overnight (no user messages, just heartbeats):
  10:27 heartbeat  → sysDigest=2d44ab1c → cache WRITE ~110k tokens
  11:22 heartbeat  → sysDigest=2d44ab1c → cache READ (warm from last heartbeat)  
  12:17 heartbeat  → sysDigest=2d44ab1c → cache READ
  ...pattern continues, heartbeats stay warm with each other...

Morning (user sends first message):
  09:29 user chat  → sysDigest=cb8a82a1 → cache WRITE ~110k tokens (BUST — different prefix than heartbeat)
  09:31 user chat  → sysDigest=cb8a82a1 → cache READ (warm now)

Then the cycle repeats: next heartbeat busts the chat cache, next chat message busts the heartbeat cache. Every transition between heartbeat and chat is a full context re-write.

On Claude Opus 4.6 with ~110k context, each bust costs $0.69 in cache writes (110k × $6.25/MTok). With heartbeats every 55 minutes and intermittent chat, this compounds to $5-15/day per agent in pure waste.

ACP announce cache bust

Each ACP task completion notification produces yet another distinct system prompt, triggering a cache write of ~10k tokens. In coding workflows with frequent Codex spawns, this adds $0.50-2.00/day per agent.

Multiplied across agents

With 4 Anthropic leadership agents (Opus), the overnight heartbeat cache war alone was burning $20-60/day in unnecessary cache writes. We had to disable heartbeats entirely as a workaround.
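The cost figures above follow from simple arithmetic. A minimal sketch using the issue's own numbers ($6.25/MTok cache write, ~110k tokens per heartbeat/chat bust, ~10k tokens per ACP announce); it assumes the announce write is billed at the same Opus rate, and the busts-per-day values are illustrative inputs, not measurements.

```typescript
// Cost in USD of writing `tokens` to the prompt cache at `usdPerMTok` per million tokens.
function cacheWriteCostUsd(tokens: number, usdPerMTok: number): number {
  return (tokens / 1_000_000) * usdPerMTok;
}

// Rate and token counts as quoted in this issue for Claude Opus.
const CACHE_WRITE_USD_PER_MTOK = 6.25;
const heartbeatBust = cacheWriteCostUsd(110_000, CACHE_WRITE_USD_PER_MTOK); // ≈ $0.69 per heartbeat/chat transition
// Assumption: ACP announce writes are billed at the same rate.
const acpAnnounceBust = cacheWriteCostUsd(10_000, CACHE_WRITE_USD_PER_MTOK); // ≈ $0.06 per announce

// Daily waste = cost per bust × busts per day × number of agents.
function dailyWasteUsd(costPerBust: number, bustsPerDay: number, agents = 1): number {
  return costPerBust * bustsPerDay * agents;
}

// Illustrative input, not a measurement: 10 heartbeat/chat transitions per day.
const onePerAgent = dailyWasteUsd(heartbeatBust, 10); // ≈ $6.88
const fourAgents = dailyWasteUsd(heartbeatBust, 10, 4); // $27.50
```

At these rates, the quoted $0.50-2.00/day for ACP announces corresponds to roughly 8 to 32 announces per day, and ten heartbeat/chat transitions per day across four agents lands at $27.50, inside the $20-60/day range above.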

Steps to reproduce

  1. Configure an agent with cacheRetention: "long" on any Anthropic model
  2. Set heartbeat.session to point at the agent's Discord channel session (the session where chat happens)
  3. Enable diagnostics.cacheTrace
  4. Let a heartbeat fire, then send a chat message
  5. Compare systemDigest between the heartbeat turn and the chat turn
  6. Observe: different digests, full cache re-write on every path transition

Cache trace evidence

From /logs/cache-trace.jsonl on a real production deployment:

# Heartbeats (consistent with each other, but different from chat):
2026-04-08T10:27:37 | run=8461b3f1 | sysDigest=2d44ab1ce72b8ae0 | msgs=63-64
2026-04-08T11:22:37 | run=bad90290 | sysDigest=2d44ab1ce72b8ae0 | msgs=63-66
2026-04-08T12:17:37 | run=041ec4cd | sysDigest=2d44ab1ce72b8ae0 | msgs=63-68
2026-04-08T13:12:37 | run=cd72beab | sysDigest=2d44ab1ce72b8ae0 | msgs=63-70
2026-04-08T14:07:37 | run=97cdc066 | sysDigest=2d44ab1ce72b8ae0 | msgs=63-72

# User chat (different digest):
2026-04-08T17:24:35 | run=52cdb6bc | sysDigest=cb8a82a10654fa98 | msgs=65-110
2026-04-08T17:38:06 | run=5f76d3d0 | sysDigest=cb8a82a10654fa98 | msgs=111-128

# ACP announce (yet another different digest from earlier testing):
2026-04-08T07:49:xx | run=announce  | sysDigest=3132b2a94c36e91d | msgs=21
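The trace excerpt above can be checked mechanically. The sketch below parses lines in the pipe-delimited summary form shown here (not the raw JSONL records, whose field names this issue does not show), sorts them by timestamp, and reports each point where sysDigest changes between consecutive turns, i.e. where the cached prefix cannot be reused. parseTraceLine and digestTransitions are hypothetical helpers.

```typescript
type TraceTurn = { time: string; run: string; sysDigest: string; msgs: string };

// Parse one summary line: "<timestamp> | run=<id> | sysDigest=<hex> | msgs=<range>".
// Returns null for comments, blank lines, or anything else that does not match.
function parseTraceLine(line: string): TraceTurn | null {
  const parts = line.split("|").map((p) => p.trim());
  if (parts.length !== 4) return null;
  const field = (p: string, key: string) =>
    p.startsWith(key + "=") ? p.slice(key.length + 1) : null;
  const run = field(parts[1], "run");
  const sysDigest = field(parts[2], "sysDigest");
  const msgs = field(parts[3], "msgs");
  if (run === null || sysDigest === null || msgs === null) return null;
  return { time: parts[0], run, sysDigest, msgs };
}

// Pairs of consecutive turns (in timestamp order) whose system digests differ.
function digestTransitions(lines: string[]): Array<[TraceTurn, TraceTurn]> {
  const turns = lines
    .map(parseTraceLine)
    .filter((t): t is TraceTurn => t !== null)
    // ISO-8601 timestamps of equal length sort correctly as plain strings.
    .sort((x, y) => (x.time < y.time ? -1 : x.time > y.time ? 1 : 0));
  const out: Array<[TraceTurn, TraceTurn]> = [];
  for (let i = 1; i < turns.length; i++) {
    if (turns[i].sysDigest !== turns[i - 1].sysDigest) out.push([turns[i - 1], turns[i]]);
  }
  return out;
}
```

On the heartbeat and chat lines above this reports exactly one transition (heartbeat 2d44ab1ce72b8ae0 to chat cb8a82a10654fa98); including the earlier announce line adds a second transition at the start of the day.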

Expected behavior

All code paths for the same session should produce a byte-identical system prompt. The volatile suffix sections below OPENCLAW_CACHE_BOUNDARY must be assembled in the same deterministic order regardless of whether the turn was triggered by chat, heartbeat, or ACP announce.

Proposed fixes (any would work)

  1. Normalize volatile section ordering across all code paths — sort sections deterministically before assembly
  2. Separate volatile context into its own message block — keep the system prompt stable, put per-turn metadata in a separate developer message (as suggested in [Bug]: Changing system prompt causes cache invalidations #43148)
  3. Add notifyPolicy parameter to sessions_spawn — as a workaround for ACP, let callers suppress announce notifications at spawn time (the openclaw tasks notify silent CLI exists but has a race condition since the task completes before the policy can be set)
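Proposal 1 could look roughly like the sketch below: volatile sections are sorted into one fixed order before rendering, so the rendered suffix no longer depends on which code path collected them. The canonical order (borrowed from the normal-chat row of the table above), the Section type and assembleVolatileSuffix are hypothetical, and section names are assumed to be unique. Ordering alone does not cover the announce path's missing sections; for a byte-identical prompt every path would also have to supply the same set of sections.

```typescript
type Section = { name: string; body: string };

// Hypothetical canonical order, taken from the normal-chat ordering.
const CANONICAL_ORDER = ["HEARTBEAT.md", "Group Chat Context", "Inbound Context", "Runtime"];

// Position in the canonical order; sections not listed sort after all known ones.
function rank(name: string): number {
  const i = CANONICAL_ORDER.indexOf(name);
  return i === -1 ? CANONICAL_ORDER.length : i;
}

// Sort volatile sections into one fixed order before rendering, so chat, heartbeat
// and announce emit the same bytes for the same set of sections.
function assembleVolatileSuffix(sections: Section[]): string {
  return [...sections]
    .sort(
      (x, y) =>
        rank(x.name) - rank(y.name) ||
        // Tie-break unknown sections by name so their order is deterministic too.
        (x.name < y.name ? -1 : x.name > y.name ? 1 : 0),
    )
    .map((s) => `## ${s.name}\n${s.body}`)
    .join("\n\n");
}
```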

Current workarounds

  • Heartbeats: Disabled for all Anthropic agents (loses cache warming and liveness monitoring)
  • ACP announces: Using PTY background exec instead of sessions_spawn (loses task tracking and completion notifications)
  • Both workarounds degrade the agent experience to avoid the cost penalty

Related

Environment

  • OpenClaw: latest (2026.4.8, 9ece252)
  • OS: macOS Darwin 25.2.0 (arm64)
  • Models: anthropic/claude-opus-4-6, anthropic/claude-sonnet-4-6
  • Config: cacheRetention: "long", heartbeat every 55m targeting Discord channel session
  • Auth: Claude MAX (OAuth token) hitting api.anthropic.com

Severity

Critical cost impact — silently wastes significant money for any Anthropic user with heartbeats enabled (the default). The longer the context and the more agents you run, the worse it gets. Most users won't notice until they check their bill.

Metadata


Assignees

No one assigned

Labels

  • P2: Normal backlog priority with limited blast radius.
  • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
  • stale: Marked as stale due to inactivity.
