Preflight Checklist
What's Wrong?
Today I burned through my entire 5h quota in half an hour of work, which has never happened in months of daily usage.
After digging into the transcripts I found the cause: my Claude Code session was silently switched from claude-opus-4-6 to claude-opus-4-7 mid-session, and the only Opus 4.7 variant exposed in /model is the 1M-context one.
Looking at a single session transcript, the model field changed mid-conversation:
- 11:58–16:17 UTC: claude-opus-4-6
- 16:17 UTC onward: claude-opus-4-7 ← auto-switched, no prompt, no notice
Over the previous 15 days, every transcript shows claude-opus-4-6 exclusively. April 16 is the first appearance of claude-opus-4-7 in my logs.
Impact on token usage
Cache-read tokens per 10-minute bucket in the affected session:
| Window | cache_read tokens | Notes |
|---|---|---|
| 13:00–13:10 | ~4.9 M | Opus 4.6, 200K context |
| 16:10–16:20 | ~13 M | right before switch |
| 16:40–17:00 | ~60 M | right after switch to 4.7 [1M] |
| 20:10–20:30 | ~38 M | still 4.7 [1M] |
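For anyone wanting to check their own sessions, here is a minimal sketch of how the buckets above can be computed from a session transcript. It assumes a JSONL transcript where each line carries an ISO 8601 `timestamp` and a `message.usage.cache_read_input_tokens` field; those field names are assumptions about the transcript schema, so adjust to whatever your logs actually contain.

```python
import json
from collections import defaultdict
from datetime import datetime

def cache_read_per_bucket(path, bucket_minutes=10):
    """Sum cache-read tokens per time bucket from a JSONL transcript.

    Assumed schema (verify against your own transcripts): one JSON
    object per line, with `timestamp` (ISO 8601) and
    `message.usage.cache_read_input_tokens`.
    """
    buckets = defaultdict(int)
    with open(path) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines
            usage = entry.get("message", {}).get("usage", {})
            tokens = usage.get("cache_read_input_tokens", 0)
            ts = entry.get("timestamp")
            if not ts or not tokens:
                continue
            dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            # floor the timestamp to the start of its 10-minute bucket
            key = dt.replace(minute=dt.minute - dt.minute % bucket_minutes,
                             second=0, microsecond=0)
            buckets[key] += tokens
    return dict(sorted(buckets.items()))
```

Running this over the affected session is what produced the numbers in the table; the jump between adjacent buckets makes the switch point obvious.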
Max context per request grew from ~250K (pre-switch) to 650K+ (post-switch), because the 1M variant does not auto-compact at the 200K boundary the way the 200K model does. Each subsequent turn re-reads that full context from cache, so the burn rate scales with context size.
Session totals for the day: 391 M cache-read tokens, 10 M cache-creation, 948 K output across ~14h of combined session time. Five hours of quota gone in half an hour.
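A back-of-the-envelope sketch of that scaling, using the per-request context sizes measured above and the reported ~1.35× tokenizer inflation (the turn count is an illustrative assumption, not a measurement):

```python
# Each turn re-reads the full cached context, so cache-read cost per turn
# is roughly proportional to context size.
turns = 40                  # illustrative turn count (assumption)
pre_switch_ctx = 250_000    # ~max context per request before the switch
post_switch_ctx = 650_000   # ~max context per request after the switch
tokenizer_factor = 1.35     # reported 4.7-vs-4.6 token inflation

pre = turns * pre_switch_ctx
post = turns * post_switch_ctx * tokenizer_factor
print(f"burn-rate multiplier: {post / pre:.1f}x")  # ~3.5x
```

Even with rough inputs, the multiplier lands in the 3–4× range, consistent with the quota exhaustion observed.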
Happy to share more details privately if useful for reproduction.
What Should Happen?
In /model, the only Opus 4.7 option is Opus 4.7 (1M context). There is no 200K variant of 4.7 exposed. Users who want to stay on a 200K-context Opus have to downgrade to 4.6, assuming it remains available.
Ask
- Don't silently switch model variants mid-session. If a user started on a 200K model, keep them on a 200K model unless they opt in.
- Expose a 200K variant of Opus 4.7 in /model, like the 4.6 behavior. Not everyone wants to pay the 3–4× cache-read cost of the 1M variant by default.
- Warn users when context grows past a threshold (e.g., 300K) on 1M-context models, since the quota impact is invisible in the current UI.
- Document the cache-read cost implications of the 1M variant — "1M context" sounds like a pure upgrade, but the cost profile is very different.
Error Messages/Logs
Steps to Reproduce
The switch itself is server-side and not user-triggerable.
Claude Model
Opus
Is this a regression?
I don't know
Last Working Version
No response
Claude Code Version
2.1.112
Platform
Other
Operating System
Ubuntu/Debian Linux
Terminal/Shell
Other
Additional Information
Edit, April 16th:
Additional context from community reports
Multiple users on r/ClaudeAI report the same burn-rate issue since 2026-04-16 (see thread: "Opus 4.7 is 50% more expensive with context regression?!"). Independent measurements indicate:
- Tokenizer change: Opus 4.7 consumes ~1.35× more tokens than 4.6 for identical input, acknowledged by Anthropic's Boris Cherny on X as "by design for better quality."
- Context recall regression (MRCR v2 benchmark):
- 256K: 4.6 = 91.9% → 4.7 = 59.2%
- 1M: 4.6 = 78.3% → 4.7 = 32.2%
- Anthropic states limits were raised to compensate, but has not disclosed by how much.
The compound effect with the silent mid-session model switch (my case) is a ~4× burn rate increase, which matches my transcript data. For users who never intended to opt into the 1M variant, this is a regression presented as an upgrade.