fix: handle Anthropic long-context tier 429 by reducing to 200k #4747
Merged
Anthropic returns HTTP 429 'Extra usage is required for long context requests' when a Claude Max subscription doesn't include the 1M context tier. This is NOT a transient rate limit, so retrying won't help. Only applies to Sonnet models (Opus 1M is general access).

Detects this specific error before the generic rate-limit handler and:
1. Reduces context_length from 1M to 200k (the standard tier)
2. Triggers context compression to fit
3. Retries with the reduced context

The reduction is session-scoped (not persisted) so it auto-recovers if the user later enables extra usage on their subscription.

Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage
Problem
Claude Max users without "extra usage" enabled hit instant 429 errors when using Sonnet 4.6 because hermes sets the context to 1M (the model's capability). Anthropic returns:
HTTP 429: Extra usage is required for long context requests.
This is NOT a transient rate limit — it's a subscription tier gate. Retrying, credential rotation, and fallback switching are all pointless.
Fix
Detect this specific 429 before the generic rate-limit handler:
1. Match both "extra usage" and "long context" in the error message
2. Reduce context_compressor.context_length from 1M to 200k (standard tier)
3. Trigger context compression to fit
4. Retry with the reduced context

The reduction is session-scoped only (not persisted to the context length cache). If the user later enables extra usage on their Anthropic subscription, the 1M context comes back automatically next session.
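The detection step described above can be sketched as a small predicate. This is a minimal illustration, not the code from run_agent.py: the function name and signature are hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Optional


def is_long_context_tier_error(status_code: Optional[int], message: str) -> bool:
    """Return True for the subscription tier gate, False for other errors.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if status_code != 429:
        return False
    # Assumption: compare case-insensitively so capitalization does not matter.
    text = message.lower()
    # Both phrases must be present, so an ordinary 429 does not match.
    return "extra usage" in text and "long context" in text


tier_gate = is_long_context_tier_error(
    429, "Extra usage is required for long context requests."
)
plain_rate_limit = is_long_context_tier_error(429, "Rate limit exceeded")
wrong_status = is_long_context_tier_error(
    400, "Extra usage is required for long context requests."
)
print(tier_gate, plain_rate_limit, wrong_status)
```

Requiring both phrases keeps ordinary rate limits on the generic handler, where backing off and retrying still make sense.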
Changes
- run_agent.py: new check in the except path, inserted before the generic is_rate_limited check. It matches status_code == 429 plus an error message containing both "extra usage" and "long context".
- tests/test_long_context_tier_429.py (new)

User Experience
Before:
API call failed after 3 retries: HTTP 429: Extra usage is required for long context requests.
After:
The context is reduced from 1M to 200k and compressed to fit, and the request is retried. Then the request succeeds.
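The before/after behavior can be illustrated with a toy retry flow. Everything here is a stand-in under stated assumptions: ToySession, TierGateError, fake_api_call, and call_with_tier_fallback are hypothetical names, and the fake provider simply rejects any request made with a context setting above 200k, which simplifies how the real API decides.

```python
from dataclasses import dataclass
from typing import List

STANDARD_TIER_TOKENS = 200_000   # standard tier named in this PR
LONG_CONTEXT_TOKENS = 1_000_000  # 1M long-context tier


class TierGateError(Exception):
    """Stand-in for the HTTP 429 tier-gate response (hypothetical class)."""

    status_code = 429


@dataclass
class ToySession:
    """Hypothetical in-memory session state; not the real hermes objects."""

    context_length: int = LONG_CONTEXT_TOKENS
    prompt_tokens: int = 350_000

    def compress(self) -> None:
        # Placeholder for context compression: shrink the prompt to fit.
        self.prompt_tokens = min(self.prompt_tokens, self.context_length)


def fake_api_call(session: ToySession) -> str:
    """Pretend provider (assumption): gates any context above the standard tier."""
    if session.context_length > STANDARD_TIER_TOKENS:
        raise TierGateError("Extra usage is required for long context requests.")
    return "ok"


def call_with_tier_fallback(session: ToySession, log: List[str]) -> str:
    try:
        return fake_api_call(session)
    except TierGateError as exc:
        msg = str(exc).lower()
        if exc.status_code == 429 and "extra usage" in msg and "long context" in msg:
            # Session-scoped reduction: only this in-memory object changes.
            session.context_length = STANDARD_TIER_TOKENS
            session.compress()
            log.append("reduced context to 200k and retrying")
            return fake_api_call(session)
        raise


log: List[str] = []
session = ToySession()
result = call_with_tier_fallback(session, log)
print(result, session.context_length, log)
```

Because nothing is written to disk in this sketch, a fresh ToySession starts at 1M again, which mirrors the session-scoped recovery described in the Fix section.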