Bug Description
Token overhead in system prompt cache has nearly doubled since v2.1.98
Testing with an HTTP proxy (claude-code-logger approach) on a clean directory with no CLAUDE.md, I measured cache_creation_input_tokens on cold cache across versions:
- v2.1.98: ~49.7K (reported in community)
- v2.1.100: ~69.9K (reported in community)
- v2.1.101: ~72K (reported in community)
- v2.1.104 (mine): ~96.5K
Test conditions: empty directory, no CLAUDE.md, single --print "responda apenas: ok" call, cold cache, ANTHROPIC_BASE_URL proxied locally to capture raw SSE usage events (message_start).
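The extraction step above can be sketched as follows. This is a minimal illustration, not the claude-code-logger implementation: it assumes the standard Anthropic streaming shape, where a `message_start` SSE event carries `message.usage.cache_creation_input_tokens`. The `sample` payload below is fabricated to mirror that shape.

```python
import json
from typing import Optional

def extract_cache_creation_tokens(sse_body: str) -> Optional[int]:
    """Return cache_creation_input_tokens from the first message_start
    event found in a raw SSE response body, or None if absent."""
    for line in sse_body.splitlines():
        if not line.startswith("data: "):
            continue  # skip event:/blank lines
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue
        if event.get("type") == "message_start":
            usage = event.get("message", {}).get("usage", {})
            return usage.get("cache_creation_input_tokens")
    return None

# Hypothetical SSE fragment shaped like the streaming API response:
sample = (
    'event: message_start\n'
    'data: {"type":"message_start","message":{"usage":'
    '{"input_tokens":3,"cache_creation_input_tokens":96500}}}\n'
)
print(extract_cache_creation_tokens(sample))  # 96500
```

Running this against the proxied response of the cold-cache `--print` call yields the per-version numbers listed above.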
The cache_creation_input_tokens value represents real tokens processed server-side — not just billing metadata. For workflows with parallel subagents, each new session pays this cost independently, multiplying the overhead by the number of concurrent agents.
This trend suggests the base system prompt is growing significantly with each release. Would appreciate visibility into what's being added and whether there's a way to opt out of unused tool definitions or context blocks.
Version: v2.1.104
Plan: Max
OS: macOS
Environment Info
- Platform: darwin
- Terminal: iTerm.app
- Version: 2.1.104
- Feedback ID: b9e9f7bf-922c-4ce7-b4c1-e2a98bd17499
Errors
[{"error":"Error: NON-FATAL: Lock acquisition failed for /Users/saulo.oliveira/.local/share/claude/versions/2.1.104 (expected in multi-process scenarios)\n at Tc_ (/$bunfs/root/src/entrypoints/cli.js:2836:2153)\n at Se6 (/$bunfs/root/src/entrypoints/cli.js:2836:1233)\n at processTicksAndRejections (native:7:39)","timestamp":"2026-04-13T17:41:56.752Z"}]