fix(error_classifier): classify 'overloaded' message as FailoverReason.overloaded #14055
Open
ms-alan wants to merge 1 commit into
…aded before rate_limit

When a provider (e.g. Z.AI) returns 'The service may be temporarily overloaded, please try again later' as HTTP 200 or HTTP 400, the error was matched against _RATE_LIMIT_PATTERNS (which includes 'servicequotaexceededexception') and classified as rate_limit with should_rotate_credential=True. After 2 failures the single API key was marked exhausted and all further retries failed.

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check BEFORE the rate_limit check in both _classify_400 and _classify_by_message. Overloaded errors now get FailoverReason.overloaded (retryable, should_fallback) instead of rate_limit, preventing unnecessary credential rotation.

Closes NousResearch#14038
This was referenced Apr 23, 2026
pazyork pushed a commit to pazyork/hermes-agent that referenced this pull request Apr 25, 2026
When a provider returns 503 (Service Unavailable) or 529 (Overloaded), the agent should fall back to an alternate provider immediately. Credential-pool rotation cannot fix provider-side overload: rotating keys against the same overloaded servers is useless.

Two minimal changes:
1. error_classifier: set should_fallback=True for 503/529 (consistent with rate_limit and billing classifications)
2. run_agent: add independent eager-fallback block for overloaded, placed after the rate-limit pool-rotation deferral block. Overloaded bypasses the _pool_may_recover_from_rate_limit check because credential rotation cannot resolve provider-side capacity issues.

More focused than adding overloaded to the is_rate_limited tuple and complementary to NousResearch#14055 (message-pattern classification path).
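The two changes in that commit can be illustrated with a small standalone sketch. This is not the hermes-agent code: the Classification dataclass, classify_status, next_action, and the pool_may_recover argument are hypothetical names standing in for the classifier output and the _pool_may_recover_from_rate_limit check, and treating 429 as the rate-limit status is an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Hypothetical stand-in for the classifier's result."""
    reason: str
    retryable: bool
    should_rotate_credential: bool
    should_fallback: bool


def classify_status(status_code: int) -> Classification:
    # 503/529: provider-side overload. Retryable and eligible for fallback,
    # but rotating credentials would not help.
    if status_code in (503, 529):
        return Classification("overloaded", True, False, True)
    # Assumption for this sketch: 429 is the rate-limit status.
    if status_code == 429:
        return Classification("rate_limit", True, True, True)
    return Classification("unknown", False, False, False)


def next_action(c: Classification, pool_may_recover: bool) -> str:
    # Rate-limit deferral block: give the credential pool a chance first.
    if c.reason == "rate_limit" and pool_may_recover:
        return "rotate_credential"
    # Eager fallback: overloaded never enters the block above, so it
    # bypasses the pool check entirely.
    if c.should_fallback:
        return "fallback_provider"
    return "retry" if c.retryable else "fail"
```

Under these assumptions, a 529 goes straight to the fallback provider even when the pool still has usable keys, while a 429 only falls back once the pool cannot recover.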
Closes #14038
Summary
When a provider (e.g. Z.AI) returns a 'temporarily overloaded' error (HTTP 200 with code 1305, or HTTP 400), it was being classified as rate_limit with should_rotate_credential=True. After 2 failures, the single API key was marked exhausted, causing all further retries to fail immediately.
The fix adds an 'overloaded' / 'temporarily overloaded' pattern check before the rate_limit check in both _classify_400 and _classify_by_message. Overloaded errors now get FailoverReason.overloaded (retryable, should_fallback) instead of rate_limit, preventing unnecessary credential rotation.
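The ordering described above can be sketched as follows. This is a minimal illustration, not the PR's diff: the enum members other than overloaded and rate_limit, the _OVERLOADED_PATTERNS name, the classify_by_message function, and the contents of the pattern tuples are assumptions. In particular, the 'try again later' entry is invented so that the sample message would match the rate-limit list if it were checked first.

```python
from enum import Enum


class FailoverReason(Enum):
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"  # hypothetical catch-all for this sketch


# Hypothetical pattern lists; the real ones live in error_classifier.
_OVERLOADED_PATTERNS = ("temporarily overloaded", "overloaded")
_RATE_LIMIT_PATTERNS = (
    "rate limit",
    "try again later",  # invented entry, only to make the ordering matter
    "servicequotaexceededexception",
)


def classify_by_message(message: str) -> FailoverReason:
    text = message.lower()
    # Check overloaded BEFORE rate_limit so that a capacity error is not
    # mistaken for a per-key quota problem.
    if any(p in text for p in _OVERLOADED_PATTERNS):
        return FailoverReason.overloaded
    if any(p in text for p in _RATE_LIMIT_PATTERNS):
        return FailoverReason.rate_limit
    return FailoverReason.unknown
```

With the overloaded check first, the Z.AI message quoted in the summary resolves to FailoverReason.overloaded even though it also contains text that a broad rate-limit pattern could match.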
Changes
Root cause
_RATE_LIMIT_PATTERNS contains 'servicequotaexceededexception', but 'overloaded' error messages from providers like Z.AI were also being matched as generic rate limits. The should_rotate_credential=True flag then caused the credential pool to mark the API key as exhausted after just 2 transient errors.
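To make the failure mode concrete, here is a toy model of a credential pool that retires a key after two rotation-worthy failures. The CredentialPool class, its method names, and the max_failures parameter are hypothetical; only the threshold of 2 and the role of should_rotate_credential come from the description above.

```python
class CredentialPool:
    """Toy pool: a key is exhausted once it has max_failures strikes."""

    def __init__(self, keys, max_failures=2):
        self._failures = {key: 0 for key in keys}
        self._max_failures = max_failures

    def available(self):
        # Keys that have not yet reached the failure threshold.
        return [k for k, n in self._failures.items() if n < self._max_failures]

    def report_failure(self, key, should_rotate_credential):
        # Only failures flagged for rotation count against the key.
        if should_rotate_credential:
            self._failures[key] += 1


# Before the fix: overloaded errors arrive flagged for rotation.
before = CredentialPool(["only-key"])
before.report_failure("only-key", should_rotate_credential=True)
before.report_failure("only-key", should_rotate_credential=True)

# After the fix: overloaded errors no longer request rotation.
after = CredentialPool(["only-key"])
after.report_failure("only-key", should_rotate_credential=False)
after.report_failure("only-key", should_rotate_credential=False)
```

In this model, the first pool has no keys left after two transient errors, so every later retry fails immediately, while the second pool keeps its single key available.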