feat: auto-detect extended context for premium API tiers #1849
Closed
Tranquil-Flow wants to merge 1 commit into
…opic Max)

Models like Claude Opus 4.6 support 1M context on premium plans but default to 200K. Rather than hardcoding 1M (which causes non-Max users to hit context errors) or staying at 200K (which penalises Max users with unnecessary early compression), this adds speculative upgrade probing:

- Default remains 200K for all Claude models
- When the compression threshold is first reached, the compressor speculatively upgrades to the premium tier (1M) instead of compressing
- If the next API call succeeds past the old threshold → Max plan confirmed, tier cached for future sessions
- If the API call fails with a context error → standard plan confirmed, reverts to 200K, compresses, and caches 200K

The probe is transparent to users on both tiers: Max users seamlessly get 1M context, while standard users experience one deferred compression instead of an early one, then behave identically to before.

New models/providers can be added with a single entry in UPGRADE_CONTEXT_TIERS.
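The probing steps above can be sketched as a small state machine. This is only an illustration of the described behaviour, not the PR's actual code: the class name `ProbingCompressor`, the callbacks `on_api_success` / `on_context_error`, the in-memory `cached_tier` field (standing in for the persistent cache), and the 0.85 threshold ratio are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_CONTEXT = 200_000    # default for all Claude models, per the description
PREMIUM_CONTEXT = 1_000_000   # premium-tier window the probe tries to confirm
COMPRESSION_RATIO = 0.85      # hypothetical: the real threshold is not stated in the PR


@dataclass
class ProbingCompressor:
    """Hypothetical sketch of speculative upgrade probing."""
    context_length: int = STANDARD_CONTEXT
    probing: bool = False               # True while an upgrade is unconfirmed
    probe_floor: int = 0                # old threshold; success at or past it confirms the tier
    cached_tier: Optional[int] = None   # stands in for the cross-session cache

    def threshold(self) -> int:
        return int(self.context_length * COMPRESSION_RATIO)

    def should_compress(self, prompt_tokens: int) -> bool:
        if prompt_tokens < self.threshold():
            return False
        if self.cached_tier is None and not self.probing:
            # First time the threshold is reached: upgrade instead of compressing.
            self.probe_floor = self.threshold()
            self.context_length = PREMIUM_CONTEXT
            self.probing = True
            return False
        return True

    def on_api_success(self, prompt_tokens: int) -> None:
        # A successful call past the old threshold confirms the premium tier.
        if self.probing and prompt_tokens >= self.probe_floor:
            self.probing = False
            self.cached_tier = PREMIUM_CONTEXT

    def on_context_error(self) -> bool:
        """Revert a failed probe; returns True if the caller should compress and retry."""
        if not self.probing:
            return False
        self.probing = False
        self.context_length = STANDARD_CONTEXT
        self.cached_tier = STANDARD_CONTEXT
        return True
```

In this sketch a standard-plan user sees exactly one failed call: `on_context_error()` reverts to 200K and records the result, after which `should_compress()` behaves as it did before the probe existed.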
This was referenced Mar 18, 2026
Contributor
Hey @Tranquil-Flow, thanks for this. The probe design was genuinely clever and well-implemented. The approach of speculatively upgrading at the compression threshold and confirming via API success/failure was a smart way to handle the tier ambiguity.

However, Anthropic has since made the 1M context window generally available for Claude Opus 4.6 and Sonnet 4.6 at standard pricing, with no Max/Team/Enterprise plan required anymore. This means the probe mechanism is no longer needed; we can simply update the default context lengths to 1M directly. We'll make that simpler change (bumping …
Summary
Why not just hardcode 1M?
The 1M context window for Claude 4.6 is not universal — it depends on the API plan:
/extra-usage)

See also #1820, which proposes hardcoding 1M.
How it works
Files changed
- agent/model_metadata.py: UPGRADE_CONTEXT_TIERS dict + get_upgrade_context_tier()
- agent/context_compressor.py: should_compress() / should_compress_preflight(), cache-aware init
- run_agent.py
- tests/agent/test_model_metadata.py
- tests/agent/test_context_compressor.py

Design decisions
- get_cached_context_length check prevents standard-plan users from re-probing every session
- key in model (not reverse): prevents claude-opus-4 from falsely matching claude-opus-4-6's upgrade tier

Adding new models
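To make the matching rule and the shape of the tier table concrete, here is a minimal sketch. The dict entries, the function signature, and the `cached` parameter are assumptions for illustration; the PR names `UPGRADE_CONTEXT_TIERS` and `get_upgrade_context_tier()` but their actual contents are not shown here.

```python
from typing import Dict, Optional

# Hypothetical entries: model-name fragment -> premium context length.
UPGRADE_CONTEXT_TIERS: Dict[str, int] = {
    "claude-opus-4-6": 1_000_000,
    "claude-sonnet-4-6": 1_000_000,
}


def get_upgrade_context_tier(model: str, cached: Optional[int] = None) -> Optional[int]:
    """Return the premium tier to probe for `model`, or None if no probe applies.

    `cached` stands in for a previously stored context length (assumed here);
    once a tier has been resolved, no further probe is offered.
    """
    if cached is not None:
        return None
    name = model.lower()
    for key, tier in UPGRADE_CONTEXT_TIERS.items():
        # `key in model`, not `model in key`: the table key must appear inside
        # the full model name, so a shorter model name cannot match a longer key.
        if key in name:
            return tier
    return None
```

With the check reversed (`name in key`), the model `claude-opus-4` would match the key `claude-opus-4-6`, because the shorter string is a substring of the longer one. Checking `key in name` avoids that while still matching provider-prefixed names such as `anthropic/claude-opus-4-6`.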
Adding a new model or provider takes one entry in UPGRADE_CONTEXT_TIERS.

Related
Test plan