
feat(agent): set claude-sonnet-4.6 and claude-opus-4.6 context to 1M tokens #1820

Closed
deboste wants to merge 1 commit into NousResearch:main from deboste:fix/claude-4-6-context-1m-tokens

@deboste commented Mar 17, 2026

Claude Sonnet 4.6 and Claude Opus 4.6 natively support a 1M token context window via the Anthropic API without any feature flag.

Other Claude 4.x models (4, 4.1, 4.5) remain at 200K — 1M context is either behind a flag or not available for those versions.

Updated entries in agent/model_metadata.py:

  • anthropic/claude-opus-4.6: 200_000 -> 1_000_000
  • anthropic/claude-sonnet-4.6: 200_000 -> 1_000_000
  • claude-opus-4-6 (bare model ID): 200_000 -> 1_000_000
  • claude-sonnet-4-6 (bare model ID): 200_000 -> 1_000_000

Ref: https://docs.anthropic.com/en/docs/about-claude/models
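
As a rough illustration of the change listed above, the four updated entries might look like the sketch below. The names `DEFAULT_CONTEXT_LENGTHS`, `FALLBACK_CONTEXT_LENGTH`, and `get_context_length` are hypothetical; the actual layout of agent/model_metadata.py is not shown in this description.

```python
# Hypothetical sketch; the real names and structure in
# agent/model_metadata.py may differ.
DEFAULT_CONTEXT_LENGTHS = {
    "anthropic/claude-opus-4.6": 1_000_000,    # was 200_000
    "anthropic/claude-sonnet-4.6": 1_000_000,  # was 200_000
    "claude-opus-4-6": 1_000_000,              # bare model ID, was 200_000
    "claude-sonnet-4-6": 1_000_000,            # bare model ID, was 200_000
}

# Assumed default for models without an explicit entry.
FALLBACK_CONTEXT_LENGTH = 200_000


def get_context_length(model: str) -> int:
    """Return the configured context length, or the fallback if unlisted."""
    return DEFAULT_CONTEXT_LENGTHS.get(model, FALLBACK_CONTEXT_LENGTH)
```

With a table like this, both the provider-prefixed and bare IDs for the 4.6 models resolve to 1M, while any unlisted model falls back to 200K.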

@Tranquil-Flow

Contributor

FYI — #1849 takes a different approach to this same problem: rather than hardcoding 1M, it auto-detects the user's actual context limit via speculative probing.

This matters because some API tiers may not support 1M (e.g., standard Anthropic API plans are 200K, while the Max plan gets 1M). Hardcoding 1M would cause non-Max users to hit context errors at ~200K, so compression would be triggered by error handling rather than proactively at the threshold.
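
To make the failure mode concrete, here is a small arithmetic sketch. The 85% compression threshold and both function names are assumptions for illustration; the agent's real threshold is not stated in this thread.

```python
def proactive_threshold(assumed_limit: int, threshold_percent: int = 85) -> int:
    """Token count at which proactive compression would start (assumed 85%)."""
    return assumed_limit * threshold_percent // 100


def compresses_before_error(assumed_limit: int, actual_limit: int,
                            threshold_percent: int = 85) -> bool:
    """True if the proactive threshold is reached before the API's real limit."""
    return proactive_threshold(assumed_limit, threshold_percent) <= actual_limit


# Hardcoded 1M on a 200K tier: the threshold sits far above the real limit,
# so the API rejects the request before proactive compression can trigger.
hardcoded_ok = compresses_before_error(1_000_000, 200_000)

# Assumed 200K on a 200K tier: compression starts before the limit is hit.
safe_ok = compresses_before_error(200_000, 200_000)
```

Under these assumptions the hardcoded case yields a threshold of 850,000 tokens, which a 200K-tier user can never reach without first getting a context error.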

The auto-detect approach in #1849:

  • Defaults to 200K (safe for all users)
  • Speculatively upgrades to 1M when the compression threshold is first reached
  • If the API call succeeds → 1M confirmed and cached for future sessions
  • If it fails → reverts to 200K, compresses, caches 200K
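
The steps above can be sketched roughly as follows. This is not the code from #1849; the class `ContextLimitProbe`, the `cache` dictionary standing in for persisted storage, the callbacks `try_send` and `compress`, and the 85% threshold are all hypothetical.

```python
from typing import Callable

SAFE_LIMIT = 200_000        # default that is safe for all users
UPGRADED_LIMIT = 1_000_000  # limit to try speculatively


class ContextLimitProbe:
    """Hypothetical helper illustrating speculative probing."""

    def __init__(self, cache: dict, threshold_percent: int = 85):
        self.cache = cache  # stands in for storage persisted across sessions
        self.threshold_percent = threshold_percent  # assumed value

    @property
    def limit(self) -> int:
        return self.cache.get("limit", SAFE_LIMIT)

    @property
    def confirmed(self) -> bool:
        return "limit" in self.cache

    def step(self, prompt_tokens: int,
             try_send: Callable[[], bool],
             compress: Callable[[], None]) -> int:
        """Handle one turn; try_send() returns False on a context-length error."""
        threshold = self.limit * self.threshold_percent // 100
        if prompt_tokens < threshold:
            return self.limit  # below the threshold: nothing to do
        if self.confirmed:
            compress()  # limit already known: compress proactively
            return self.limit
        if try_send():  # first time at the threshold: probe the larger limit
            self.cache["limit"] = UPGRADED_LIMIT
        else:
            self.cache["limit"] = SAFE_LIMIT  # revert, then compress
            compress()
        return self.limit
```

The probe only costs one potentially failed request per user, at the point where compression would have happened anyway, and the cached result avoids repeating it in later sessions.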

If all Claude 4.6 users truly get 1M natively (as the Anthropic docs suggest), then the probe confirms it on the first long session and both approaches converge. But if some users are still on 200K, hardcoding 1M would break their experience.

@Tranquil-Flow

Contributor

Update: After further research, hardcoding 1M may break users on certain tiers.

The 1M context window for Claude 4.6 is not universal — it depends on the API plan:

  • Max / Team / Enterprise: 1M automatic
  • Pro: 1M only after opting in via /extra-usage in Claude Code
  • Free / standard API: 200K

Hardcoding 1M would cause Pro users (who haven't opted in) and free-tier users to hit context errors at ~200K, triggering error handling + compression rather than proactive compression at the threshold.

#1849 handles this by auto-detecting the user's actual limit via speculative probing — defaults to 200K, upgrades to 1M only when confirmed by a successful API call, and caches the result for future sessions. All tiers work correctly without configuration.

@teknium1

Contributor

Superseded by PR #2158, which resolves Claude 4.6 context lengths dynamically via models.dev and the Anthropic API rather than hardcoding them. Thanks for flagging this @deboste!
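
For readers unfamiliar with the idea, dynamic resolution with a static fallback can be sketched as below. This is only an illustration and not the code from PR #2158: the function `resolve_context_length`, the `Lookup` type, the order of sources, and the 200K default are assumptions, and the lookups are stubs rather than real models.dev or Anthropic API calls.

```python
from typing import Callable, Iterable, Optional

# A lookup takes a model ID and returns its context length, or None if unknown.
Lookup = Callable[[str], Optional[int]]


def resolve_context_length(model: str, sources: Iterable[Lookup],
                           default: int = 200_000) -> int:
    """Try each source in order; fall back to a static default if all fail."""
    for lookup in sources:
        try:
            value = lookup(model)
        except Exception:
            continue  # a failing source should not block the others
        if value:
            return value
    return default


# Stub sources standing in for remote metadata lookups (hypothetical data).
def unavailable_source(model: str) -> Optional[int]:
    raise ConnectionError("source unreachable")


def stub_catalog(model: str) -> Optional[int]:
    return {"claude-opus-4-6": 1_000_000}.get(model)


resolved = resolve_context_length("claude-opus-4-6",
                                  [unavailable_source, stub_catalog])
unknown = resolve_context_length("unknown-model",
                                 [unavailable_source, stub_catalog])
```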

3 participants