v2.1.128 caching regression in parallel-team workloads (10x token cost increase)

# Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

**Severity**: High (10x token cost increase for councils/teams)  
**Affected Version**: v2.1.128 (and v2.1.126)  
**Works in**: v2.1.121  
**User**: @oksanantonova  
**Date**: 2026-05-05

## Summary

The v2.1.128 changelog claims "Fixed sub-agent progress summaries missing the prompt cache (~3x cache_creation reduction)". However, empirical analysis of session transcripts shows this fix **did not land for parallel-team workloads** and actually made caching **10x worse** compared to v2.1.121.

## Evidence

### Like-For-Like Comparison (Same Workload Pattern)

**v2.1.121 - SendMessage-heavy sub-agent (agent-a63a0e37df5eaa2da)**
- Turns: 121
- SendMessage calls: 4
- Cache miss share: **4%**
- Avg cache_creation/turn: **5,648 tokens**
- Pattern: Cache builds monotonically, only 2 collapses across entire session
- Status: ✅ Healthy

**v2.1.128 - SendMessage-heavy sub-agent (agent-ab587fd4e60ffc856)**
- Turns: 52
- SendMessage calls: 12
- Cache miss share: **40%**
- Avg cache_creation/turn: **26,000 tokens**
- Pattern: Cache collapses every 2-4 turns, oscillates between cold (cc~25K) and warm (cc~3K)
- Status: ❌ Severe regression

**Regression magnitude: 10x worsening in miss share for identical workload type**

### Per-Version Historical Data

Analysis of 4,367 sub-agent transcript files across versions v2.1.74 through v2.1.128:

| Version | Avg cache_creation/turn | Workload | Status |
|---------|-------------------------|----------|--------|
| v2.1.121 | 5,534 tokens | councils/teams | baseline |
| v2.1.126 | 8,433 tokens | councils/teams | +52% regression |
| v2.1.128 | 22,713 tokens | councils/teams | +410% regression |

Low-parallelism v2.1.128 workloads show normal cache behavior (3-4% miss), confirming regression is **specific to parallel-team fan-out patterns**.

## Root Cause Analysis

The regression correlates with inbound `SendMessage` deliveries (teammate-to-teammate messages). Each inbound message appears to:

1. Break the cache_control prefix (checkpoint)
2. Force regeneration of all downstream content (cache_creation spike)
3. Continue until next cache rebuild

**Likely cause**: v2.1.126/v2.1.128 added `cache_control` to sub-agent progress summaries but **not** to inbound teammate message injections. Teammate messages likely include non-stable fields (timestamp, message-id, routing metadata) that shift the cache prefix hash.

**Supporting evidence**:
- v2.1.121 agents with 4 SendMessage calls: 2 cache collapses total
- v2.1.128 agents with 12 SendMessage calls: 25 cache collapses out of 52 turns
- Collapse timing correlates with inbound message delivery timestamps (within 1-2 seconds)

## User Impact

A typical observation-council spawning 5 parallel agents:
- **v2.1.121**: ~280K tokens cold-start (5 agents × 56K initial), scales linearly with reuse
- **v2.1.128**: ~900K+ tokens cold-start due to repeated cache invalidations, no scaling benefit

This explains why users running councils on v2.1.128 rapidly exhaust token budgets on subscription plans.

## Reproduction Steps

1. Spawn an observation-council with 5 parallel agents
2. Inspect session transcripts at `~/.claude/projects/[SESSION_ID]/subagents/*.jsonl`
3. Extract `cache_creation_input_tokens` and `cache_read_input_tokens` from API calls
4. Calculate per-turn cache miss share: `1 - (cache_read / (cache_read + cache_creation))`
5. Compare to same workload in v2.1.121

**Expected result**: v2.1.128 shows 35-40% miss share; v2.1.121 shows 3-5% miss share for identical workload.

## Workaround

Until this is fixed, avoid parallel-team workloads (councils, observation-council, planning-council) that trigger high SendMessage volume. Use sequential task-based sub-agents instead, which maintain healthy cache behavior at 4% miss rate.

## Requested Fix

Ensure `cache_control` headers are applied to **inbound teammate messages** in addition to progress summaries. Specifically:
- Tag teammate message injections with cache_control before insertion
- Use stable content hashes (exclude timestamp/message-id from cache key)
- Validate cache miss share drops below 5% for parallel-team workloads in testing

## Files for Investigation

- Changelog source: `github.com/anthropics/claude-code` main branch, versions 2.1.121, 2.1.126, 2.1.128

---

**Investigation conducted**: 2026-05-05 via SRE observation-council with 5 agents (investigator, critic, validator, reviewer, historian) analyzing 4,367 historical transcripts and performing like-for-like version comparison.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2.1.128 caching regression in parallel-team workloads (10x token cost increase) #56293

Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Summary

Evidence

Like-For-Like Comparison (Same Workload Pattern)

Per-Version Historical Data

Root Cause Analysis

User Impact

Reproduction Steps

Workaround

Requested Fix

Files for Investigation

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Version	Avg cache_creation/turn	Workload	Status
v2.1.121	5,534 tokens	councils/teams	baseline
v2.1.126	8,433 tokens	councils/teams	+52% regression
v2.1.128	22,713 tokens	councils/teams	+410% regression

v2.1.128 caching regression in parallel-team workloads (10x token cost increase) #56293

Description

Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Summary

Evidence

Like-For-Like Comparison (Same Workload Pattern)

Per-Version Historical Data

Root Cause Analysis

User Impact

Reproduction Steps

Workaround

Requested Fix

Files for Investigation

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions