feat: auto-detect extended context for premium API tiers by Tranquil-Flow · Pull Request #1849 · NousResearch/hermes-agent

Tranquil-Flow · 2026-03-18T00:16:54Z

Summary

Adds speculative upgrade probing for models that support higher context on premium plans (e.g., Claude Opus 4.6: 200K default → 1M on Anthropic Max)
Default context remains 200K — Max/Team/Enterprise users are automatically detected and upgraded to 1M without any configuration
Standard-plan users experience no disruption — the probe is transparent and the result is cached for future sessions

Why not just hardcode 1M?

The 1M context window for Claude 4.6 is not universal — it depends on the API plan:

| Tier | Context | Hardcode 1M | This PR (auto-detect) |
| --- | --- | --- | --- |
| Max / Team / Enterprise | 1M (automatic) | ✅ Works | ✅ Works |
| Pro (opted in via `/extra-usage`) | 1M | ✅ Works | ✅ Works |
| Pro (not opted in) | 200K | ❌ Breaks (hits wall at 200K) | ✅ Works |
| Free / standard API | 200K | ❌ Breaks | ✅ Works |

See also #1820 which proposes hardcoding 1M.

How it works

When the compression threshold (50% of 200K = 100K) is first reached, the compressor speculatively raises the context limit to the upgrade tier (1M) instead of compressing
If the next API call succeeds past the old threshold → premium tier confirmed, 1M cached
If the API call fails with a context error → standard tier confirmed, reverts to 200K, compresses normally, 200K cached
Future sessions load from cache — no re-probing
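The four steps above can be sketched as a small state machine. This is a minimal illustration, not the PR's code: `ProbeSketch`, `on_success`, `on_context_error`, and the plain dict used as a cache are hypothetical names, and the real logic is split across `agent/context_compressor.py` and `run_agent.py`.

```python
from dataclasses import dataclass, field
from typing import Dict

DEFAULT_CONTEXT = 200_000      # default tier from the PR description
UPGRADE_CONTEXT = 1_000_000    # premium tier from the PR description
COMPRESSION_RATIO = 0.5        # threshold is 50% of the context limit


@dataclass
class ProbeSketch:
    """Hypothetical stand-in for the compressor's probe state."""
    cache: Dict[str, int] = field(default_factory=dict)
    model: str = "claude-opus-4-6"
    context_length: int = DEFAULT_CONTEXT
    probing: bool = False

    def __post_init__(self) -> None:
        # Step 4: a result cached by an earlier session suppresses re-probing.
        cached = self.cache.get(self.model)
        if cached is not None:
            self.context_length = cached

    @property
    def threshold(self) -> int:
        return int(self.context_length * COMPRESSION_RATIO)

    def should_compress(self, prompt_tokens: int) -> bool:
        if prompt_tokens < self.threshold:
            return False
        can_probe = (
            not self.probing
            and self.model not in self.cache
            and self.context_length < UPGRADE_CONTEXT
        )
        if can_probe:
            # Step 1: raise the limit speculatively instead of compressing.
            self.probing = True
            self.context_length = UPGRADE_CONTEXT
            return False
        return True

    def on_success(self, prompt_tokens: int) -> None:
        # Step 2: success past the old threshold confirms the premium tier.
        old_threshold = int(DEFAULT_CONTEXT * COMPRESSION_RATIO)
        if self.probing and prompt_tokens >= old_threshold:
            self.probing = False
            self.cache[self.model] = UPGRADE_CONTEXT

    def on_context_error(self) -> None:
        # Step 3: a context error during the probe confirms the standard tier.
        if self.probing:
            self.probing = False
            self.context_length = DEFAULT_CONTEXT
            self.cache[self.model] = DEFAULT_CONTEXT
```

In this sketch a standard-plan session pays for exactly one failed call: after `on_context_error()` the 200K result is cached, so `should_compress()` goes straight to compression in that session and in every later one.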

Files changed

| File | Change |
| --- | --- |
| `agent/model_metadata.py` | `UPGRADE_CONTEXT_TIERS` dict + `get_upgrade_context_tier()` |
| `agent/context_compressor.py` | Upgrade probe activation in `should_compress()` / `should_compress_preflight()`, cache-aware init |
| `run_agent.py` | Probe confirmation on success path, probe revert on context error (separate branch from normal step-down) |
| `tests/agent/test_model_metadata.py` | 7 new tests for upgrade tiers |
| `tests/agent/test_context_compressor.py` | 7 new tests for probe lifecycle |

Design decisions

Probe at compression threshold, not session start: avoids wasted API calls for short sessions
Cache suppresses re-probing: the `get_cached_context_length` check prevents standard-plan users from re-probing every session
Fuzzy match only checks `key in model` (not the reverse): prevents `claude-opus-4` from falsely matching the upgrade tier for `claude-opus-4-6`
Error handler uses separate branch for probe revert: avoids incorrect step-down (200K → 128K) that would occur if the parsed limit equals the pre-upgrade default
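The one-directional fuzzy match can be illustrated as follows. This is a sketch under assumptions: `lookup_upgrade_tier` is a hypothetical name (the PR's function is `get_upgrade_context_tier()`, whose body is not shown here), and the provider-prefixed model ID in the usage lines is made up for the example.

```python
from typing import Optional

# Same entries as the PR's UPGRADE_CONTEXT_TIERS dict.
UPGRADE_CONTEXT_TIERS = {
    "anthropic/claude-opus-4.6": 1_000_000,
    "claude-opus-4-6": 1_000_000,
}


def lookup_upgrade_tier(model: str) -> Optional[int]:
    """Hypothetical lookup: exact match first, then `key in model` only."""
    if model in UPGRADE_CONTEXT_TIERS:
        return UPGRADE_CONTEXT_TIERS[model]
    for key, tier in UPGRADE_CONTEXT_TIERS.items():
        # Only test whether the table key appears inside the model ID.
        # The reverse test (`model in key`) is deliberately omitted.
        if key in model:
            return tier
    return None


# A longer ID that contains a known key still resolves (made-up prefix).
prefixed = lookup_upgrade_tier("some-provider/claude-opus-4-6")

# A shorter ID is a substring of the key, but the key is not a substring
# of it, so no upgrade tier is returned.
older = lookup_upgrade_tier("claude-opus-4")

# What the reverse check would have done: this evaluates to True.
reverse_would_match = "claude-opus-4" in "claude-opus-4-6"
```

Here `prefixed` is `1_000_000` and `older` is `None`, while `reverse_would_match` shows why a bidirectional substring test would give `claude-opus-4` an upgrade tier it does not have.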

Adding new models

One line in `UPGRADE_CONTEXT_TIERS`:

UPGRADE_CONTEXT_TIERS = {
    "anthropic/claude-opus-4.6": 1_000_000,
    "claude-opus-4-6": 1_000_000,
    # Add new models here
}

Test plan

118 tests passing (model_metadata + context_compressor + context overflow)
Manual test: verify Max-plan user sees 1M context cached after first long session
Manual test: verify standard-plan user gets transparent compression at ~200K

…opic Max)

Models like Claude Opus 4.6 support 1M context on premium plans but default to 200K. Rather than hardcoding 1M (which causes non-Max users to hit context errors) or staying at 200K (which penalises Max users with unnecessary early compression), this adds speculative upgrade probing:

- Default remains 200K for all Claude models
- When the compression threshold is first reached, the compressor speculatively upgrades to the premium tier (1M) instead of compressing
- If the next API call succeeds past the old threshold → Max plan confirmed, tier cached for future sessions
- If the API call fails with a context error → standard plan confirmed, reverts to 200K, compresses, and caches 200K

The probe is transparent to users on both tiers: Max users seamlessly get 1M context, standard users experience one deferred compression instead of an early one, then behave identically to before.

New models/providers can be added with a single entry in UPGRADE_CONTEXT_TIERS.

teknium1 · 2026-03-29T22:48:10Z

Hey @Tranquil-Flow, thanks for this — the probe design was genuinely clever and well-implemented. The approach of speculatively upgrading at the compression threshold and confirming via API success/failure was a smart way to handle the tier ambiguity.

However, Anthropic has since made the 1M context window generally available for Claude Opus 4.6 and Sonnet 4.6 at standard pricing — no Max/Team/Enterprise plan required anymore. This means the probe mechanism is no longer needed; we can simply update the default context lengths to 1M directly.

We'll make that simpler change (bumping DEFAULT_CONTEXT_LENGTHS) instead. Thanks for the contribution!

This was referenced Mar 18, 2026

feat(agent): set claude-sonnet-4.6 and claude-opus-4.6 context to 1M tokens #1820

Closed

fix: auto-invalidate stale context length cache when defaults change #1852

Closed

teknium1 closed this Mar 29, 2026

Tranquil-Flow wants to merge 1 commit into NousResearch:main from Tranquil-Flow:feat/auto-detect-max-context
