fix: handle Anthropic long-context tier 429 by reducing to 200k #4747

Merged
teknium1 merged 1 commit into main from fix/sonnet-long-context-429
Apr 3, 2026

teknium1 commented Apr 3, 2026

Problem

Claude Max users without "extra usage" enabled get an immediate 429 on Sonnet 4.6 requests, because hermes sets the context length to 1M tokens (the model's maximum capability). Anthropic returns:

HTTP 429: Extra usage is required for long context requests.

This is NOT a transient rate limit — it's a subscription tier gate. Retrying, credential rotation, and fallback switching are all pointless.

Fix

Detect this specific 429 before the generic rate-limit handler:

  1. Check that the error message contains both "extra usage" and "long context"
  2. Reduce context_compressor.context_length from 1M to 200k (standard tier)
  3. Trigger context compression to fit within the reduced window
  4. Retry the API call
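Steps 1 and 2 can be sketched as two small helpers. This is a minimal illustration, not the code in run_agent.py: `ContextCompressor`, `is_long_context_tier_429`, and `reduce_context_for_tier` are hypothetical names, and matching the phrases case-insensitively is an assumption.

```python
from dataclasses import dataclass

STANDARD_TIER_TOKENS = 200_000  # the standard-tier window this PR falls back to


@dataclass
class ContextCompressor:
    """Hypothetical stand-in for hermes' context_compressor."""
    context_length: int


def is_long_context_tier_429(status_code: int, message: str) -> bool:
    """Step 1: match only the subscription-tier 429, not ordinary rate limits."""
    # Assumption: the two phrases are matched case-insensitively.
    text = message.lower()
    return status_code == 429 and "extra usage" in text and "long context" in text


def reduce_context_for_tier(compressor: ContextCompressor) -> bool:
    """Step 2: clamp the window to 200k; return True only if it actually shrank."""
    if compressor.context_length <= STANDARD_TIER_TOKENS:
        return False  # already at or below the standard tier, nothing to reduce
    compressor.context_length = STANDARD_TIER_TOKENS
    return True
```

A `False` from `reduce_context_for_tier` tells the caller that shrinking cannot help, so the error should fall through to normal handling rather than loop. On `True`, the caller compresses the history (step 3) and retries (step 4).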

The reduction is session-scoped only (not persisted to the context length cache). If the user later enables extra usage on their Anthropic subscription, the 1M context comes back automatically next session.
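One way to keep the downgrade session-scoped is an in-memory override that sits in front of the persisted cache. This is a hypothetical sketch: `SessionContextLimits` and the model id used with it are illustrative, and a plain dict stands in for the on-disk context length cache.

```python
class SessionContextLimits:
    """Hypothetical: persisted per-model limits plus overrides that die with the session."""

    def __init__(self, persisted_cache: dict[str, int]):
        self._persisted = persisted_cache  # shared across sessions (stand-in for the disk cache)
        self._overrides: dict[str, int] = {}  # lives only as long as this session object

    def context_length(self, model: str) -> int:
        # A session override wins; otherwise fall back to the cached model capability.
        return self._overrides.get(model, self._persisted[model])

    def downgrade(self, model: str, tokens: int) -> None:
        # Deliberately not written to the persisted cache, so the next
        # session starts from the model's full capability again.
        self._overrides[model] = tokens
```

Because the override never reaches the shared cache, enabling extra usage on the subscription requires no cleanup: a fresh session simply reads the 1M value again.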

Changes

run_agent.py

  • New handler block in the retry loop's except path, inserted before the generic is_rate_limited check
  • Matches on status_code == 429 + error message containing both "extra usage" and "long context"
  • Reduces the compressor's context length to 200k, triggers compression, and breaks out to restart the request
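The ordering matters: the tier-gate check has to run before the generic rate-limit branch, otherwise the 429 would be retried until the budget ran out. A simplified sketch of that ordering, assuming a hypothetical `APIError` type, a `send` callable, and a dict of session state; backoff, credential rotation, and real compression are omitted, and `continue` stands in for the PR's break-and-restart.

```python
class APIError(Exception):
    """Hypothetical error carrying an HTTP status code and message."""

    def __init__(self, status_code: int, message: str):
        super().__init__(message)
        self.status_code = status_code
        self.message = message


STANDARD_TIER_TOKENS = 200_000


def call_with_retries(send, state: dict, max_retries: int = 3):
    """Call send(context_length); handle the tier-gate 429 before generic 429s."""
    attempts = 0
    while True:
        try:
            return send(state["context_length"])
        except APIError as err:
            text = err.message.lower()
            tier_gate = (
                err.status_code == 429
                and "extra usage" in text
                and "long context" in text
            )
            # 1) Tier gate: shrink the window and restart; retrying at 1M cannot succeed.
            if tier_gate and state["context_length"] > STANDARD_TIER_TOKENS:
                state["context_length"] = STANDARD_TIER_TOKENS
                state["compressions"] += 1  # placeholder for triggering compression
                continue
            # 2) Generic rate limit: spend the normal retry budget.
            if err.status_code == 429 and attempts < max_retries:
                attempts += 1
                continue
            raise
```

The `> STANDARD_TIER_TOKENS` guard makes the tier handler a no-op once the window is already at 200k, so a repeat of the same error falls through to the generic path instead of looping forever.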

tests/test_long_context_tier_429.py (new)

  • 10 tests covering: detection logic (positive/negative), context reduction math, no-op when already ≤200k, interaction with generic rate-limit path

User Experience

Before: API call failed after 3 retries: HTTP 429: Extra usage is required for long context requests.

After:

⚠️  Anthropic long-context tier requires extra usage — reducing context: 1,000,000 → 200,000 tokens
🗜️ Context reduced to 200,000 tokens (was 1,000,000), retrying...

Then the request succeeds.

teknium1 force-pushed the fix/sonnet-long-context-429 branch from 87bb80c to 2d0ff00 April 3, 2026 08:58
Anthropic returns HTTP 429 'Extra usage is required for long context
requests' when a Claude Max subscription doesn't include the 1M context
tier. This is NOT a transient rate limit — retrying won't help.

Only applies to Sonnet models (Opus 1M context is generally available). Detects
this specific error before the generic rate-limit handler and:
1. Reduces context_length from 1M to 200k (the standard tier)
2. Triggers context compression to fit
3. Retries with the reduced context

The reduction is session-scoped (not persisted) so it auto-recovers
if the user later enables extra usage on their subscription.

Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage
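The Sonnet-only restriction could be expressed as a guard evaluated before the downgrade. A minimal sketch, assuming the model family can be read from the model id; the function name and the ids used with it are hypothetical.

```python
STANDARD_TIER_TOKENS = 200_000


def should_downgrade_on_tier_429(model: str, context_length: int) -> bool:
    """Hypothetical guard: only Sonnet models above the standard window are reduced."""
    # Assumption: the model family is identified by a substring of the model id.
    is_sonnet = "sonnet" in model.lower()
    return is_sonnet and context_length > STANDARD_TIER_TOKENS
```

Opus requests skip the downgrade because, per the commit message, 1M context on Opus is generally available.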
teknium1 force-pushed the fix/sonnet-long-context-429 branch from 2d0ff00 to 3dde84b April 3, 2026 09:04
teknium1 merged commit 8fd9faf into main Apr 3, 2026
5 of 6 checks passed
teknium1 deleted the fix/sonnet-long-context-429 branch April 3, 2026 09:05
jooray added a commit to jooray/hermes-agent that referenced this pull request Apr 3, 2026
* upstream/main: (38 commits)
  fix(memory): Fix ByteRover plugin - run brv query synchronously before LLM call
  chore: release v0.7.0 (2026.4.3) (NousResearch#4812)
  fix: route memory provider tools in sequential execution path (NousResearch#4803)
  fix: persist API server sessions to shared SessionDB (state.db) (NousResearch#4802)
  fix(discord): register /approve and /deny slash commands, wire up button-based approval UI (NousResearch#4800)
  fix: respect per-platform disabled skills in Telegram menu and gateway dispatch (NousResearch#4799)
  fix(gateway): route /approve and /deny through running-agent guard (NousResearch#4798)
  docs: add community FAQ entries — multi-model workflows, WhatsApp binding, verbose control, skills config, thread sessions, migration, install troubleshooting (NousResearch#4797)
  fix: handle None mcp_servers in _get_platform_tools()
  fix(mcp): stability fix pack — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking (NousResearch#4757)
  fix: prevent compression death spiral from API disconnects (NousResearch#2153) (NousResearch#4750)
  fix: handle Anthropic Sonnet long-context tier 429 by reducing to 200k (NousResearch#4747)
  fix: correct qwen3.6-plus model slug
  fix: handle Anthropic long-context tier 429 by reducing to 200k
  docs(acp): fix zed config
  fix: use get_hermes_home(), consolidate git_cmd, update tests
  Add fork detection and upstream sync to hermes update
  fix(update): handle conflicted git index during hermes update (NousResearch#4735)
  fix: remove redundant restart message from update launchd path
  fix(update): avoid launchd restart race on macOS
  ...
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026