Preflight Checklist
What's Wrong?
Today I burned through my entire 5h quota in half an hour of work, which has never happened in months of daily usage.
After digging into the transcripts I found the cause: my Claude Code session was silently switched from claude-opus-4-6 to claude-opus-4-7 mid-session, and the only Opus 4.7 variant exposed in /model is the 1M-context one.
Looking at a single session transcript, the model field changed mid-conversation:
- 11:58–16:17 UTC: claude-opus-4-6
- 16:17 UTC onward: claude-opus-4-7 ← auto-switched, no prompt, no notice
Over the previous 15 days, every transcript shows claude-opus-4-6 exclusively. April 16 is the first appearance of claude-opus-4-7 in my logs.
Impact on token usage
Cache-read tokens per 10-minute bucket in the affected session:
| Window | cache_read tokens | Notes |
|---|---|---|
| 13:00–13:10 | ~4.9 M | Opus 4.6, 200K context |
| 16:10–16:20 | ~13 M | right before switch |
| 16:40–17:00 | ~60 M | right after switch to 4.7 [1M] |
| 20:10–20:30 | ~38 M | still 4.7 [1M] |
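For anyone wanting to check their own sessions, here is a minimal sketch of how the buckets above can be computed from a session transcript. It assumes a JSONL transcript where each line carries an ISO 8601 `timestamp` and a `message.usage.cache_read_input_tokens` field; those field names are assumptions about the transcript schema, so adjust to whatever your logs actually contain.

```python
import json
from collections import defaultdict
from datetime import datetime

def cache_read_per_bucket(path, bucket_minutes=10):
    """Sum cache-read tokens per time bucket from a JSONL transcript.

    Assumed schema (verify against your own transcripts): one JSON
    object per line, with `timestamp` (ISO 8601) and
    `message.usage.cache_read_input_tokens`.
    """
    buckets = defaultdict(int)
    with open(path) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines
            usage = entry.get("message", {}).get("usage", {})
            tokens = usage.get("cache_read_input_tokens", 0)
            ts = entry.get("timestamp")
            if not ts or not tokens:
                continue
            dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            # floor the timestamp to the start of its 10-minute bucket
            key = dt.replace(minute=dt.minute - dt.minute % bucket_minutes,
                             second=0, microsecond=0)
            buckets[key] += tokens
    return dict(sorted(buckets.items()))
```

Running this over the affected session is what produced the numbers in the table; the jump between adjacent buckets makes the switch point obvious.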
Max context per request grew from ~250K (pre-switch) to 650K+ (post-switch), because the 1M variant does not auto-compact at the 200K boundary the way the 200K model does. Each subsequent turn re-reads that full context from cache, so the burn rate scales with context size.
Session totals for the day: 391 M cache-read tokens, 10 M cache-creation, 948 K output across ~14h of combined session time. Five hours of quota gone in half an hour.
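A back-of-the-envelope sketch of that scaling, using the per-request context sizes measured above and the reported ~1.35× tokenizer inflation (the turn count is an illustrative assumption, not a measurement):

```python
# Each turn re-reads the full cached context, so cache-read cost per turn
# is roughly proportional to context size.
turns = 40                  # illustrative turn count (assumption)
pre_switch_ctx = 250_000    # ~max context per request before the switch
post_switch_ctx = 650_000   # ~max context per request after the switch
tokenizer_factor = 1.35     # reported 4.7-vs-4.6 token inflation

pre = turns * pre_switch_ctx
post = turns * post_switch_ctx * tokenizer_factor
print(f"burn-rate multiplier: {post / pre:.1f}x")  # ~3.5x
```

Even with rough inputs, the multiplier lands in the 3–4× range, consistent with the quota exhaustion observed.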
Happy to share more details privately if useful for reproduction.
What Should Happen?
In /model, the only Opus 4.7 option is Opus 4.7 (1M context). There is no 200K variant of 4.7 exposed. Users who want to stay on a 200K-context Opus have to downgrade to 4.6, assuming it remains available.
Ask
- Don't silently switch model variants mid-session. If a user started on a 200K model, keep them on a 200K model unless they opt in.
- Expose a 200K variant of Opus 4.7 in /model, like the 4.6 behavior. Not everyone wants to pay the 3–4× cache-read cost of the 1M variant by default.
- Warn users when context grows past a threshold (e.g., 300K) on 1M-context models, since the quota impact is invisible in the current UI.
- Document the cache-read cost implications of the 1M variant — "1M context" sounds like a pure upgrade, but the cost profile is very different.
Error Messages/Logs
Steps to Reproduce
The switch itself is server-side and not user-triggerable.
Claude Model
Opus
Is this a regression?
I don't know
Last Working Version
No response
Claude Code Version
2.1.112
Platform
Other
Operating System
Ubuntu/Debian Linux
Terminal/Shell
Other
Additional Information
Edit, April 16th:
Additional context from community reports
Multiple users on r/ClaudeAI report the same burn-rate issue since 2026-04-16 (see thread: "Opus 4.7 is 50% more expensive with context regression?!"). Independent measurements indicate:
- Tokenizer change: Opus 4.7 consumes ~1.35× more tokens than 4.6 for identical input, acknowledged by Anthropic's Boris Cherny on X as "by design for better quality."
- Context recall regression (MRCR v2 benchmark):
- 256K: 4.6 = 91.9% → 4.7 = 59.2%
- 1M: 4.6 = 78.3% → 4.7 = 32.2%
- Anthropic states limits were raised to compensate, but has not disclosed by how much.
The compound effect with the silent mid-session model switch (my case) is a ~4× burn rate increase, which matches my transcript data. For users who never intended to opt into the 1M variant, this is a regression presented as an upgrade.