v2.1.128 caching regression in parallel-team workloads (10x token cost increase) #56293

@oksanantonova

Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Severity: High (about 10x higher cache miss share and roughly 4x more cache_creation tokens per turn for councils/teams)
Affected Version: v2.1.128 (and v2.1.126)
Works in: v2.1.121
User: @oksanantonova
Date: 2026-05-05

Summary

The v2.1.128 changelog claims "Fixed sub-agent progress summaries missing the prompt cache (~3x cache_creation reduction)". However, analysis of session transcripts shows that this fix did not land for parallel-team workloads: compared to v2.1.121, the cache miss share for a SendMessage-heavy sub-agent rose roughly 10x (from 4% to 40%).

Evidence

Like-For-Like Comparison (Same Workload Pattern)

v2.1.121 - SendMessage-heavy sub-agent (agent-a63a0e37df5eaa2da)

  • Turns: 121
  • SendMessage calls: 4
  • Cache miss share: 4%
  • Avg cache_creation/turn: 5,648 tokens
  • Pattern: Cache builds monotonically, only 2 collapses across entire session
  • Status: ✅ Healthy

v2.1.128 - SendMessage-heavy sub-agent (agent-ab587fd4e60ffc856)

  • Turns: 52
  • SendMessage calls: 12
  • Cache miss share: 40%
  • Avg cache_creation/turn: 26,000 tokens
  • Pattern: Cache collapses every 2-4 turns, oscillating between cold (cache_creation ≈ 25K tokens) and warm (cache_creation ≈ 3K tokens)
  • Status: ❌ Severe regression

Regression magnitude: 10x worsening in miss share for identical workload type

Per-Version Historical Data

Analysis of 4,367 sub-agent transcript files across versions v2.1.74 through v2.1.128:

Version     Avg cache_creation/turn    Workload          Status
v2.1.121    5,534 tokens               councils/teams    baseline
v2.1.126    8,433 tokens               councils/teams    +52% regression
v2.1.128    22,713 tokens              councils/teams    +310% regression
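
The Status column can be re-derived from the averages in the table. A minimal sketch of that arithmetic, using only the numbers listed above (the function name is illustrative):

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Average cache_creation tokens per turn, copied from the table above.
avg_cache_creation = {"v2.1.121": 5534, "v2.1.126": 8433, "v2.1.128": 22713}
baseline = avg_cache_creation["v2.1.121"]

for version, tokens in avg_cache_creation.items():
    change = pct_change(baseline, tokens)
    ratio = tokens / baseline
    print(f"{version}: {change:+.0f}% vs baseline ({ratio:.2f}x)")
```

For v2.1.128, 22,713 tokens is about 4.1x the 5,534-token baseline.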

Low-parallelism v2.1.128 workloads show normal cache behavior (3-4% miss share), which indicates that the regression is specific to parallel-team fan-out patterns.

Root Cause Analysis

The regression correlates with inbound SendMessage deliveries (teammate-to-teammate messages). Each inbound message appears to:

  1. Break the cache_control prefix (checkpoint)
  2. Force regeneration of all downstream content (cache_creation spike)
  3. Leave the agent on a cold cache until the prefix is rebuilt, after which the next inbound message repeats the cycle

Likely cause: v2.1.126/v2.1.128 added cache_control to sub-agent progress summaries but not to inbound teammate message injections. Teammate messages likely include non-stable fields (timestamp, message-id, routing metadata) that shift the cache prefix hash.

Supporting evidence:

  • v2.1.121 agents with 4 SendMessage calls: 2 cache collapses total
  • v2.1.128 agents with 12 SendMessage calls: 25 cache collapses out of 52 turns
  • Collapse timing correlates with inbound message delivery timestamps (within 1-2 seconds)
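
The collapse counts and the timing correlation above could be checked along the following lines. This is a sketch under stated assumptions, not the exact method used in the investigation: the `Turn` record, the "cache_read fell below half of the previous turn" definition of a collapse, and the 2-second window are all illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Turn:
    timestamp: float      # seconds since epoch for the API call
    cache_read: int       # cache_read_input_tokens for the turn
    cache_creation: int   # cache_creation_input_tokens for the turn


def find_collapses(turns: Sequence[Turn], drop_ratio: float = 0.5) -> List[Turn]:
    """Return turns whose cache_read fell below drop_ratio times the previous turn's.

    ASSUMPTION: this threshold is an illustrative definition of a "collapse".
    """
    collapses = []
    for prev, cur in zip(turns, turns[1:]):
        if prev.cache_read > 0 and cur.cache_read < drop_ratio * prev.cache_read:
            collapses.append(cur)
    return collapses


def share_near_inbound(
    collapses: Sequence[Turn],
    inbound_times: Sequence[float],
    window_s: float = 2.0,
) -> float:
    """Fraction of collapses within window_s seconds of an inbound message."""
    if not collapses:
        return 0.0
    hits = sum(
        1
        for turn in collapses
        if any(abs(turn.timestamp - t) <= window_s for t in inbound_times)
    )
    return hits / len(collapses)
```

A share close to 1.0 would support the inbound-message hypothesis. A share near the value expected by chance would point elsewhere.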

User Impact

A typical observation-council spawning 5 parallel agents:

  • v2.1.121: ~280K tokens cold-start (5 agents × 56K initial), scales linearly with reuse
  • v2.1.128: ~900K+ tokens cold-start due to repeated cache invalidations, no scaling benefit

This explains why users running councils on v2.1.128 rapidly exhaust token budgets on subscription plans.

Reproduction Steps

  1. Spawn an observation-council with 5 parallel agents
  2. Inspect session transcripts at ~/.claude/projects/[SESSION_ID]/subagents/*.jsonl
  3. Extract cache_creation_input_tokens and cache_read_input_tokens from API calls
  4. Calculate per-turn cache miss share: 1 - (cache_read / (cache_read + cache_creation))
  5. Compare to same workload in v2.1.121

Expected result: v2.1.128 shows 35-40% miss share; v2.1.121 shows 3-5% miss share for identical workload.
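
Steps 2-4 can be scripted. The sketch below assumes each JSONL record that corresponds to an API call carries its usage at `record["message"]["usage"]`. That layout is an assumption about the transcript schema and may need adjusting. It computes a token-weighted miss share over the whole transcript, which is one reading of the formula in step 4.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_usage(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the usage dict of every transcript record that has one.

    ASSUMPTION: usage is stored at record["message"]["usage"].
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if isinstance(usage, dict):
            yield usage


def miss_share(usages: Iterable[dict]) -> float:
    """1 - cache_read / (cache_read + cache_creation), summed over all turns."""
    created = read = 0
    for usage in usages:
        created += usage.get("cache_creation_input_tokens") or 0
        read += usage.get("cache_read_input_tokens") or 0
    total = created + read
    return created / total if total else 0.0


def report(subagent_dir: Path) -> None:
    """Print the miss share of every *.jsonl transcript in subagent_dir."""
    for path in sorted(subagent_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            share = miss_share(iter_usage(handle))
        print(f"{path.name}: miss share {share:.1%}")
```

Calling `report()` on the subagents directory from step 2 prints one line per sub-agent transcript, which makes the v2.1.121 and v2.1.128 runs directly comparable.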

Workaround

Until this is fixed, avoid parallel-team workloads (councils, observation-council, planning-council) that trigger high SendMessage volume. Use sequential task-based sub-agents instead, which maintain healthy cache behavior at 4% miss rate.

Requested Fix

Ensure cache_control markers are applied to inbound teammate messages in addition to progress summaries. Specifically:

  • Tag teammate message injections with cache_control before insertion
  • Use stable content hashes (exclude timestamp/message-id from cache key)
  • Validate cache miss share drops below 5% for parallel-team workloads in testing
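
To illustrate the second bullet, here is a minimal sketch of rendering a teammate message so that volatile fields cannot change the cached prefix. The field names in `VOLATILE_FIELDS` and the message shape are hypothetical, and the SHA-256 digest is only a stand-in for "is the prefix byte-identical"; it is not how Claude Code or the API actually keys the cache.

```python
import hashlib
import json
from typing import Iterable

# ASSUMPTION: hypothetical names for the non-stable fields mentioned above.
VOLATILE_FIELDS = frozenset({"timestamp", "message_id", "routing"})


def stable_render(message: dict) -> str:
    """Serialize a message deterministically, dropping volatile fields."""
    stable = {k: v for k, v in message.items() if k not in VOLATILE_FIELDS}
    return json.dumps(stable, sort_keys=True, separators=(",", ":"))


def prefix_key(rendered_blocks: Iterable[str]) -> str:
    """Digest over an ordered list of rendered blocks (stand-in for a cache key)."""
    digest = hashlib.sha256()
    for block in rendered_blocks:
        digest.update(block.encode("utf-8"))
        digest.update(b"\x00")  # separator so block boundaries affect the key
    return digest.hexdigest()


first = {"from": "critic", "body": "LGTM", "timestamp": 1, "message_id": "m1"}
again = {"from": "critic", "body": "LGTM", "timestamp": 2, "message_id": "m2"}

# Same content, different delivery metadata: the stable rendering is identical.
print(stable_render(first) == stable_render(again))
```

If volatile metadata is still needed by the receiving agent, it could be placed after the last cache breakpoint instead of being dropped, so it never becomes part of the cached prefix.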

Files for Investigation

  • Changelog source: github.com/anthropics/claude-code main branch, versions 2.1.121, 2.1.126, 2.1.128

Investigation conducted: 2026-05-05 via SRE observation-council with 5 agents (investigator, critic, validator, reviewer, historian) analyzing 4,367 historical transcripts and performing like-for-like version comparison.
