Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads
Severity: High (~10x higher cache miss share, ~4x more cache_creation per turn for councils/teams)
Affected Versions: v2.1.128 and v2.1.126
Works in: v2.1.121
User: @oksanantonova
Date: 2026-05-05
Summary
The v2.1.128 changelog states "Fixed sub-agent progress summaries missing the prompt cache (~3x cache_creation reduction)". However, empirical analysis of session transcripts shows that this fix did not land for parallel-team workloads. For those workloads, caching is roughly 10x worse than in v2.1.121 (cache miss share 4% vs 40% in a like-for-like comparison).
Evidence
Like-For-Like Comparison (Same Workload Pattern)
v2.1.121 - SendMessage-heavy sub-agent (agent-a63a0e37df5eaa2da)
- Turns: 121
- SendMessage calls: 4
- Cache miss share: 4%
- Avg cache_creation/turn: 5,648 tokens
- Pattern: Cache grows monotonically, with only 2 collapses across the entire session
- Status: ✅ Healthy
v2.1.128 - SendMessage-heavy sub-agent (agent-ab587fd4e60ffc856)
- Turns: 52
- SendMessage calls: 12
- Cache miss share: 40%
- Avg cache_creation/turn: 26,000 tokens
- Pattern: Cache collapses every 2-4 turns, oscillating between cold (cache_creation ~25K) and warm (cache_creation ~3K)
- Status: ❌ Severe regression
Regression magnitude: 10x increase in cache miss share (4% → 40%) for the same workload type
Per-Version Historical Data
Analysis of 4,367 sub-agent transcript files across versions v2.1.74 through v2.1.128:
| Version | Avg cache_creation/turn | Workload | Status |
| --- | --- | --- | --- |
| v2.1.121 | 5,534 tokens | councils/teams | baseline |
| v2.1.126 | 8,433 tokens | councils/teams | +52% regression |
| v2.1.128 | 22,713 tokens | councils/teams | +310% regression (~4.1x baseline) |
Low-parallelism v2.1.128 workloads show normal cache behavior (3-4% miss share), indicating that the regression is specific to parallel-team fan-out patterns.
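The relative changes in the table can be rechecked with a few lines of arithmetic. This sketch only reuses the numbers reported above; the dictionary and function names are illustrative. It shows v2.1.126 at about 1.52x the v2.1.121 baseline (+52%) and v2.1.128 at about 4.10x the baseline, which is a +310% increase. It also prints the two like-for-like ratios from the per-agent comparison.

```python
# Avg cache_creation tokens per turn for councils/teams, as reported in the table.
AVG_CACHE_CREATION = {"v2.1.121": 5534, "v2.1.126": 8433, "v2.1.128": 22713}
BASELINE = AVG_CACHE_CREATION["v2.1.121"]


def pct_change(value: float, base: float) -> float:
    """Relative change of value versus base, in percent."""
    return (value / base - 1) * 100


for version, tokens in AVG_CACHE_CREATION.items():
    ratio = tokens / BASELINE
    print(f"{version}: {ratio:.2f}x baseline ({pct_change(tokens, BASELINE):+.0f}%)")

# Like-for-like sub-agent comparison: miss share 4% -> 40%,
# avg cache_creation/turn 5,648 -> 26,000 tokens.
print(f"miss share ratio: {0.40 / 0.04:.0f}x")
print(f"cache_creation/turn ratio: {26000 / 5648:.1f}x")
```

The miss-share ratio (10x) and the cache_creation ratio (~4.6x) measure different things, which is why they differ.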
Root Cause Analysis
The regression correlates with inbound SendMessage deliveries (teammate-to-teammate messages). Each inbound message appears to:
- Invalidate the cached prompt prefix at the cache_control breakpoint
- Force everything after the point of change to be written to the cache again (cache_creation spike)
- Leave the cache cold until the prefix has been rebuilt
Likely cause: v2.1.126/v2.1.128 added cache_control to sub-agent progress summaries but not to inbound teammate message injections. Teammate messages may also include non-stable fields (timestamp, message-id, routing metadata) that change the cached prefix, and with it the prefix hash, on every delivery.
Supporting evidence:
- v2.1.121 agents with 4 SendMessage calls: 2 cache collapses total
- v2.1.128 agents with 12 SendMessage calls: 25 cache collapses out of 52 turns
- Collapse timing correlates with inbound message delivery timestamps (within 1-2 seconds)
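The hypothesized mechanism can be illustrated with a toy model of prefix caching. This is a simplification for illustration only, not the API's or Claude Code's actual implementation. In the model, the longest run of leading blocks that is identical to the previous request is read from cache, and everything after the first difference is written again. The prompt layout, the whitespace "tokenizer", and the placement of the teammate message ahead of the history are all assumptions.

```python
def count_tokens(block: str) -> int:
    # Stand-in tokenizer for the toy model: whitespace-separated words.
    return len(block.split())


def cache_usage(prev: list[str], curr: list[str]) -> tuple[int, int]:
    """Return (cache_read, cache_creation) for curr, given the previous request.

    Toy rule: leading blocks identical to prev are read from cache; every
    block after the first difference must be written to the cache again.
    """
    shared = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        shared += 1
    read = sum(count_tokens(b) for b in curr[:shared])
    creation = sum(count_tokens(b) for b in curr[shared:])
    return read, creation


def build_request(history: list[str], teammate_msg: str, now: float, volatile: bool) -> list[str]:
    system = "You are a council member. " * 40  # 200 toy tokens
    # Hypothetical layout: the teammate message is injected ahead of the
    # conversation history; volatile=True stamps it with the request time.
    injected = f"teammate says: {teammate_msg}" + (f" (ts={now})" if volatile else "")
    return [system, injected, *history]


history = [f"turn {i} " * 20 for i in range(3)]  # 40 toy tokens per turn
for volatile in (False, True):
    prev = build_request(history, "findings ready", 1.0, volatile)
    curr = build_request(history + ["turn 3 " * 20], "findings ready", 2.0, volatile)
    read, creation = cache_usage(prev, curr)
    print(f"volatile={volatile}: cache_read={read}, cache_creation={creation}")
```

In this toy run, the stable variant writes only the 40 tokens of the new turn. The timestamped variant rewrites 165 of 365 tokens, because the changed field sits before the entire history.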
User Impact
A typical observation-council spawning 5 parallel agents:
- v2.1.121: ~280K tokens cold-start (5 agents × 56K initial), scales linearly with reuse
- v2.1.128: ~900K+ tokens cold-start due to repeated cache invalidations, no scaling benefit
This explains why users running councils on v2.1.128 rapidly exhaust token budgets on subscription plans.
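Cache miss share does not map one-to-one onto cost. This rough sketch assumes Anthropic API pricing multipliers: 5-minute cache writes at 1.25x the base input price and cache reads at 0.1x. Subscription plans meter usage differently, so the result is only a proxy. Uncached input_tokens are ignored, which matches the miss-share definition used in this report.

```python
# Assumed multipliers relative to the base input-token price (API pricing
# for 5-minute cache writes and for cache reads); a rough proxy only.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def relative_input_cost(miss_share: float) -> float:
    """Average cost per cached-prefix token, in units of the base input price."""
    return miss_share * CACHE_WRITE_MULT + (1 - miss_share) * CACHE_READ_MULT


healthy = relative_input_cost(0.04)    # v2.1.121-like miss share
regressed = relative_input_cost(0.40)  # v2.1.128-like miss share
print(f"{healthy:.3f} vs {regressed:.3f} -> {regressed / healthy:.1f}x")
```

Under these assumptions, going from 4% to 40% miss share raises the per-token cost of the cached portion by about 3.8x. This does not count any extra turns or tokens the regression may also cause.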
Reproduction Steps
- Spawn an observation-council with 5 parallel agents
- Inspect session transcripts at ~/.claude/projects/[SESSION_ID]/subagents/*.jsonl
- Extract cache_creation_input_tokens and cache_read_input_tokens from each API call
- Calculate the per-turn cache miss share: 1 - (cache_read / (cache_read + cache_creation))
- Compare against the same workload on v2.1.121
Expected result: v2.1.128 shows 35-40% miss share; v2.1.121 shows 3-5% miss share for identical workload.
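The extraction and miss-share steps can be scripted. This is a minimal sketch that assumes a particular transcript layout. Each JSONL line is assumed to be a JSON object, with API usage (when present) under message.usage using the Anthropic API field names cache_read_input_tokens and cache_creation_input_tokens. Lines that share a message.id are counted once, on the assumption that one response may be logged as several lines. The aggregate miss share is token-weighted. The "collapse" count is a hypothetical heuristic (a turn whose own miss share exceeds 50%), not the exact definition used for the numbers above.

```python
import glob
import json
import os
import sys


def load_turns(path):
    """Return a list of (cache_read, cache_creation) pairs, one per API call."""
    turns, seen_ids = [], set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or partially written line
            msg = entry.get("message") if isinstance(entry, dict) else None
            usage = msg.get("usage") if isinstance(msg, dict) else None
            if not isinstance(usage, dict):
                continue  # not an API response with usage data
            # Assumed: one response may span several lines; count each id once.
            msg_id = msg.get("id")
            if msg_id is not None:
                if msg_id in seen_ids:
                    continue
                seen_ids.add(msg_id)
            turns.append((usage.get("cache_read_input_tokens") or 0,
                          usage.get("cache_creation_input_tokens") or 0))
    return turns


def miss_share(read, creation):
    """1 - cache_read / (cache_read + cache_creation); 0.0 if both are zero."""
    total = read + creation
    return 0.0 if total == 0 else 1 - read / total


def summarize(turns, collapse_threshold=0.5):
    """Token-weighted miss share, mean cache_creation, and a heuristic collapse count."""
    read = sum(r for r, _ in turns)
    creation = sum(c for _, c in turns)
    return {
        "turns": len(turns),
        "miss_share": miss_share(read, creation),
        "avg_cache_creation": creation / len(turns) if turns else 0.0,
        # Hypothetical heuristic: later turns whose own miss share is high.
        "collapses": sum(1 for r, c in turns[1:] if miss_share(r, c) > collapse_threshold),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sorted(glob.glob(os.path.expanduser(sys.argv[1]))):
        s = summarize(load_turns(path))
        print(f"{os.path.basename(path)}: {s['turns']} turns, "
              f"miss share {s['miss_share']:.0%}, "
              f"avg cache_creation {s['avg_cache_creation']:,.0f}, "
              f"collapses {s['collapses']}")
```

Pass the transcript glob as the first argument, quoted so the shell does not expand it. Replace [SESSION_ID] with the real directory name first, because a literal [...] is read as a glob character class.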
Workaround
Until this is fixed, avoid parallel-team workloads (councils, observation-council, planning-council) that trigger high SendMessage volume. Use sequential task-based sub-agents instead, which maintain healthy cache behavior at 4% miss rate.
Requested Fix
Ensure cache_control breakpoints are applied to inbound teammate messages in addition to progress summaries. Specifically:
- Tag teammate message injections with cache_control before insertion
- Keep injected content byte-stable (strip timestamp/message-id or move them out of the cached prefix), since any change to the prefix invalidates the cache
- Validate that cache miss share drops below 5% for parallel-team workloads in testing
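One way to keep injected teammate content byte-stable is to serialize it deterministically with volatile fields removed before it enters the prompt. This sketch assumes hypothetical field names (timestamp, message_id, routing), because the real injection format is not documented in this report. The content-block shape with cache_control {"type": "ephemeral"} follows the Anthropic Messages API.

```python
import json

# Hypothetical field names for a teammate message; the real format may differ.
VOLATILE_KEYS = {"timestamp", "message_id", "routing"}


def render_teammate_message(msg: dict) -> str:
    """Serialize a teammate message deterministically, dropping volatile fields."""
    stable = {k: v for k, v in msg.items() if k not in VOLATILE_KEYS}
    # sort_keys and fixed separators give identical bytes for identical content.
    return json.dumps(stable, sort_keys=True, separators=(",", ":"))


def teammate_block(msg: dict, breakpoint: bool = False) -> dict:
    """Build a Messages API text block, optionally marked as a cache breakpoint."""
    block = {"type": "text", "text": render_teammate_message(msg)}
    if breakpoint:
        block["cache_control"] = {"type": "ephemeral"}
    return block


a = {"from": "critic", "text": "review done",
     "timestamp": "2026-05-05T10:00:00Z", "message_id": "m-1"}
b = {**a, "timestamp": "2026-05-05T10:00:02Z", "message_id": "m-2"}
print(teammate_block(a) == teammate_block(b))
print(teammate_block(a, breakpoint=True))
```

If an agent actually needs ordering or ids, an alternative is to place them after the last breakpoint instead of dropping them. The Messages API allows at most four cache_control breakpoints per request, so tagging every injected message individually is not possible in long sessions.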
Files for Investigation
- Changelog source: github.com/anthropics/claude-code (main branch), versions 2.1.121, 2.1.126, 2.1.128
Investigation conducted: 2026-05-05 via SRE observation-council with 5 agents (investigator, critic, validator, reviewer, historian) analyzing 4,367 historical transcripts and performing like-for-like version comparison.