
fix(error_classifier): classify 'overloaded' message as FailoverReason.overloaded #14055

Open
ms-alan wants to merge 1 commit into NousResearch:main from ms-alan:fix/ISSUE-14038-overloaded-error-classification

Conversation

ms-alan commented Apr 22, 2026
Contributor

Closes #14038

Summary

When a provider (e.g. Z.AI) returned a 'temporarily overloaded' error (HTTP 200 with code 1305, or HTTP 400), it was classified as rate_limit with should_rotate_credential=True. After 2 failures, the single API key was marked exhausted, causing all further retries to fail immediately.

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check before the rate_limit check in both _classify_400 and _classify_by_message. Overloaded errors now get FailoverReason.overloaded (retryable, should_fallback) instead of rate_limit, preventing unnecessary credential rotation.
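The ordering described above can be illustrated with a minimal sketch. It is not the code from this PR: the `Classification` dataclass, the pattern tuples, and the simplified `classify_by_message` function are hypothetical stand-ins, and the rate-limit patterns in particular are assumptions chosen only so that the overload message would otherwise fall through to the rate-limit branch.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"


@dataclass(frozen=True)
class Classification:
    reason: FailoverReason
    retryable: bool
    should_fallback: bool
    should_rotate_credential: bool


# Hypothetical pattern lists; the real ones are not shown on this page.
OVERLOADED_PATTERNS = ("temporarily overloaded", "overloaded")
RATE_LIMIT_PATTERNS = ("rate limit", "too many requests", "try again later")


def classify_by_message(message: str) -> Classification:
    text = message.lower()
    # Checked first, so overload messages never reach the rate-limit branch.
    if any(p in text for p in OVERLOADED_PATTERNS):
        return Classification(FailoverReason.overloaded, True, True, False)
    # Rate limits are the only branch here that asks for credential rotation.
    if any(p in text for p in RATE_LIMIT_PATTERNS):
        return Classification(FailoverReason.rate_limit, True, True, True)
    # Assumption: unmatched messages are treated as non-retryable.
    return Classification(FailoverReason.unknown, False, False, False)
```

With this ordering, a message such as 'The service may be temporarily overloaded, please try again later' is reported as overloaded even though it also contains a phrase from the assumed rate-limit list.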

Changes

  • error_classifier: Added an overloaded pattern check before the rate_limit check in _classify_400 (~line 594) and _classify_by_message (~line 736)

Root cause

_RATE_LIMIT_PATTERNS contains 'servicequotaexceededexception', but 'overloaded' error messages from providers like Z.AI were also matching as generic rate limits. The should_rotate_credential=True flag caused the credential pool to mark the API key as exhausted after just 2 transient errors.
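To see why two transient errors are enough to block every retry, consider a toy credential pool. This is a sketch under stated assumptions, not the project's pool: the `CredentialPool` class, its `report_failure` and `available` methods, and the `max_failures` field are all hypothetical; only the threshold of 2 comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialPool:
    keys: list[str]
    max_failures: int = 2  # threshold taken from the PR description
    failures: dict[str, int] = field(default_factory=dict)

    def report_failure(self, key: str, should_rotate_credential: bool) -> None:
        # Only rotation-flagged failures count against a key.
        if should_rotate_credential:
            self.failures[key] = self.failures.get(key, 0) + 1

    def available(self) -> list[str]:
        # A key is exhausted once it reaches the failure threshold.
        return [k for k in self.keys if self.failures.get(k, 0) < self.max_failures]


def simulate(should_rotate_credential: bool) -> list[str]:
    """Report two failures against a single-key pool and return usable keys."""
    pool = CredentialPool(keys=["only-key"])
    for _ in range(2):
        pool.report_failure("only-key", should_rotate_credential)
    return pool.available()
```

In this model, `simulate(True)` leaves no usable key, which mirrors the reported failure mode, while `simulate(False)` keeps the single key available for later retries.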

…aded before rate_limit

When a provider (e.g. Z.AI) returns 'The service may be temporarily
overloaded, please try again later' as HTTP 200 or HTTP 400, the error
was matched against _RATE_LIMIT_PATTERNS (which includes
'servicequotaexceededexception') and classified as rate_limit with
should_rotate_credential=True. After 2 failures the single API key was
marked exhausted and all further retries failed.

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check
BEFORE the rate_limit check in both _classify_400 and
_classify_by_message. Overloaded errors now get FailoverReason.overloaded
(retryable, should_fallback) instead of rate_limit, preventing
unnecessary credential rotation.

Closes NousResearch#14038
alt-glitch added the type/bug (Something isn't working), P1 (High: major feature broken, no workaround), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels Apr 22, 2026
pazyork pushed a commit to pazyork/hermes-agent that referenced this pull request Apr 25, 2026
When a provider returns 503 (Service Unavailable) or 529 (Overloaded),
the agent should fall back to an alternate provider immediately.
Credential-pool rotation cannot fix provider-side overload — rotating
keys against the same overloaded servers is useless.

Two minimal changes:
1. error_classifier: set should_fallback=True for 503/529 (consistent
   with rate_limit and billing classifications)
2. run_agent: add independent eager-fallback block for overloaded,
   placed after the rate-limit pool-rotation deferral block. Overloaded
   bypasses the _pool_may_recover_from_rate_limit check because
   credential rotation cannot resolve provider-side capacity issues.

More focused than adding overloaded to the is_rate_limited tuple
and complementary to NousResearch#14055 (message-pattern classification path).
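
The decision order in the referenced commit can be sketched roughly as follows. This is an illustration only: `choose_action`, its parameters, and the returned action strings are hypothetical, and the `pool_may_recover` argument merely stands in for the result of the _pool_may_recover_from_rate_limit check named above.

```python
def choose_action(
    reason: str,
    should_fallback: bool,
    pool_may_recover: bool,
    fallback_available: bool,
) -> str:
    """Pick the next step after a classified provider error (hypothetical)."""
    # Rate-limit deferral: let the credential pool try another key first.
    if reason == "rate_limit" and pool_may_recover:
        return "rotate_credential"
    # Eager fallback for overload: deliberately skips the pool check, because
    # another key against the same overloaded servers would not help.
    if reason == "overloaded" and should_fallback and fallback_available:
        return "fallback"
    # Rate limit with no recoverable key: fall back if a provider exists.
    if reason == "rate_limit" and should_fallback and fallback_available:
        return "fallback"
    # Otherwise retry against the same provider.
    return "retry"
```

Under these assumptions an overloaded error moves to the alternate provider even when the pool still has healthy keys, whereas a rate limit first gives the pool a chance to rotate.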

Development

Successfully merging this pull request may close these issues.

"overloaded" server errors classified as rate_limit, exhausting credential pool

2 participants