feat: auto-detect extended context for premium API tiers #1849

Closed
Tranquil-Flow wants to merge 1 commit into NousResearch:main from Tranquil-Flow:feat/auto-detect-max-context

@Tranquil-Flow Tranquil-Flow commented Mar 18, 2026

Contributor

Summary

  • Adds speculative upgrade probing for models that support higher context on premium plans (e.g., Claude Opus 4.6: 200K default → 1M on Anthropic Max)
  • Default context remains 200K — Max/Team/Enterprise users are automatically detected and upgraded to 1M without any configuration
  • Standard-plan users experience no disruption — the probe is transparent and the result is cached for future sessions

Why not just hardcode 1M?

The 1M context window for Claude 4.6 is not universal — it depends on the API plan:

| Tier | Context | Hardcode 1M | This PR (auto-detect) |
|---|---|---|---|
| Max / Team / Enterprise | 1M auto | ✅ Works | ✅ Works |
| Pro (opted in via /extra-usage) | 1M | ✅ Works | ✅ Works |
| Pro (not opted in) | 200K | ❌ Breaks (hits wall at 200K) | ✅ Works |
| Free / standard API | 200K | ❌ Breaks | ✅ Works |

See also #1820 which proposes hardcoding 1M.

How it works

  1. When the compression threshold (50% of 200K = 100K) is first reached, the compressor speculatively raises the context limit to the upgrade tier (1M) instead of compressing
  2. If the next API call succeeds past the old threshold → premium tier confirmed, 1M cached
  3. If the API call fails with a context error → standard tier confirmed, reverts to 200K, compresses normally, 200K cached
  4. Future sessions load from cache — no re-probing
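
The four steps above can be sketched as a small state machine. This is an illustrative sketch only, not the code in this PR: `ProbeState`, its fields, and the in-memory `cache` dict are hypothetical stand-ins for the compressor and its on-disk cache. It also assumes that "past the old threshold" means a successful call larger than the old 200K limit, since a success below 200K would not distinguish the two tiers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProbeState:
    """Hypothetical stand-in for the compressor's probe bookkeeping."""
    model: str
    cache: Dict[str, int] = field(default_factory=dict)  # stands in for the on-disk cache
    default_length: int = 200_000
    upgrade_tier: Optional[int] = 1_000_000  # None when no upgrade tier applies
    threshold_ratio: float = 0.5             # compress at 50% of the context length
    probing: bool = False
    context_length: int = field(init=False)

    def __post_init__(self) -> None:
        # Step 4: a result cached by an earlier session suppresses the probe.
        cached = self.cache.get(self.model)
        self.context_length = cached if cached is not None else self.default_length
        if cached is not None:
            self.upgrade_tier = None

    def should_compress(self, prompt_tokens: int) -> bool:
        if prompt_tokens < self.context_length * self.threshold_ratio:
            return False
        if self.upgrade_tier is not None and not self.probing:
            # Step 1: raise the limit speculatively instead of compressing.
            self.context_length = self.upgrade_tier
            self.probing = True
            return prompt_tokens >= self.context_length * self.threshold_ratio
        return True

    def on_success(self, prompt_tokens: int) -> None:
        # Step 2 (assumption): only a success beyond the old limit confirms the tier.
        if self.probing and prompt_tokens > self.default_length:
            self.cache[self.model] = self.context_length
            self.probing = False
            self.upgrade_tier = None

    def on_context_error(self) -> None:
        # Step 3: revert to the default, cache it, and let compression proceed.
        if self.probing:
            self.context_length = self.default_length
            self.cache[self.model] = self.default_length
            self.probing = False
            self.upgrade_tier = None
```

In this sketch a standard-plan session defers its first compression once, then behaves as before, and any later `ProbeState` built from the same cache skips the probe entirely.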

Files changed

| File | Change |
|---|---|
| agent/model_metadata.py | UPGRADE_CONTEXT_TIERS dict + get_upgrade_context_tier() |
| agent/context_compressor.py | Upgrade probe activation in should_compress() / should_compress_preflight(), cache-aware init |
| run_agent.py | Probe confirmation on success path, probe revert on context error (separate branch from normal step-down) |
| tests/agent/test_model_metadata.py | 7 new tests for upgrade tiers |
| tests/agent/test_context_compressor.py | 7 new tests for probe lifecycle |

Design decisions

  • Probe at compression threshold, not session start: avoids wasted API calls for short sessions
  • Cache suppresses re-probing: get_cached_context_length check prevents standard-plan users from re-probing every session
  • Fuzzy match only checks key in model (not reverse): prevents claude-opus-4 from falsely matching claude-opus-4-6's upgrade tier
  • Error handler uses separate branch for probe revert: avoids incorrect step-down (200K → 128K) that would occur if the parsed limit equals the pre-upgrade default

Adding new models

One line in UPGRADE_CONTEXT_TIERS:

UPGRADE_CONTEXT_TIERS = {
    "anthropic/claude-opus-4.6": 1_000_000,
    "claude-opus-4-6": 1_000_000,
    # Add new models here
}
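
A lookup over this dict with the one-directional fuzzy match described under "Design decisions" might look like the sketch below. The function name comes from the files-changed list, but the body is an assumption about how it could work, and the date-suffixed model name in the usage lines is made up for illustration.

```python
from typing import Optional

UPGRADE_CONTEXT_TIERS = {
    "anthropic/claude-opus-4.6": 1_000_000,
    "claude-opus-4-6": 1_000_000,
}


def get_upgrade_context_tier(model: str) -> Optional[int]:
    """Return the upgrade tier for `model`, or None if it has none."""
    # Exact match first.
    if model in UPGRADE_CONTEXT_TIERS:
        return UPGRADE_CONTEXT_TIERS[model]
    # One-directional fuzzy match: the key must appear inside the model
    # name. The reverse check (model in key) is deliberately omitted, so
    # a shorter name such as "claude-opus-4" does not match.
    for key, tier in UPGRADE_CONTEXT_TIERS.items():
        if key in model:
            return tier
    return None


# Hypothetical model names, for illustration only.
print(get_upgrade_context_tier("claude-opus-4-6-20260101"))  # matches via substring
print(get_upgrade_context_tier("claude-opus-4"))             # no upgrade tier
```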

Related

Test plan

  • 118 tests passing (model_metadata + context_compressor + context overflow)
  • Manual test: verify Max-plan user sees 1M context cached after first long session
  • Manual test: verify standard-plan user gets transparent compression at ~200K

…opic Max)

Models like Claude Opus 4.6 support 1M context on premium plans but
default to 200K.  Rather than hardcoding 1M (which causes non-Max users
to hit context errors) or staying at 200K (which penalises Max users
with unnecessary early compression), this adds speculative upgrade
probing:

- Default remains 200K for all Claude models
- When the compression threshold is first reached, the compressor
  speculatively upgrades to the premium tier (1M) instead of compressing
- If the next API call succeeds past the old threshold → Max plan
  confirmed, tier cached for future sessions
- If the API call fails with a context error → standard plan confirmed,
  reverts to 200K, compresses, and caches 200K

The probe is transparent to users on both tiers — Max users seamlessly
get 1M context, standard users experience one deferred compression
instead of an early one, then behave identically to before.

New models/providers can be added with a single entry in
UPGRADE_CONTEXT_TIERS.
@teknium1

Contributor

Hey @Tranquil-Flow, thanks for this — the probe design was genuinely clever and well-implemented. The approach of speculatively upgrading at the compression threshold and confirming via API success/failure was a smart way to handle the tier ambiguity.

However, Anthropic has since made the 1M context window generally available for Claude Opus 4.6 and Sonnet 4.6 at standard pricing — no Max/Team/Enterprise plan required anymore. This means the probe mechanism is no longer needed; we can simply update the default context lengths to 1M directly.

We'll make that simpler change (bumping DEFAULT_CONTEXT_LENGTHS) instead. Thanks for the contribution!

@teknium1 teknium1 closed this Mar 29, 2026