Bug Description
When a provider (e.g., Z.AI) returns a "temporarily overloaded" error (HTTP 200 with code 1305), Hermes classifies it as rate_limit with should_rotate_credential=True. This causes the credential pool to mark the API key as "exhausted" after just 2 errors, making all further retries useless.
Steps to Reproduce
- Configure a provider that occasionally returns overloaded errors (e.g., Z.AI with a single API key)
- Trigger multiple requests during peak load
- Provider returns:
HTTP 200: The service may be temporarily overloaded, please try again later
- After 2 errors, the single API key is marked exhausted
- All subsequent retries fail immediately with no valid credential
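The failure mode in the steps above can be illustrated with a toy model. This is only a sketch: `ToyCredentialPool`, `report_error`, and `surviving_keys` are hypothetical names, not Hermes code. The only behavior taken from this report is that a key is marked exhausted after 2 errors flagged for rotation.

```python
class ToyCredentialPool:
    """Hypothetical pool that retires a key after `threshold` rotation-worthy errors."""

    def __init__(self, keys, threshold=2):
        self.errors = {key: 0 for key in keys}
        self.threshold = threshold

    def available(self):
        # Keys that have not yet reached the exhaustion threshold.
        return [k for k, n in self.errors.items() if n < self.threshold]

    def report_error(self, key, should_rotate_credential):
        # Only errors flagged for rotation count against the key.
        if should_rotate_credential:
            self.errors[key] += 1


def surviving_keys(should_rotate_credential, overload_errors=2):
    # Single-key setup, as in the reproduction steps.
    pool = ToyCredentialPool(["only-key"])
    for _ in range(overload_errors):
        pool.report_error("only-key", should_rotate_credential)
    return pool.available()
```

With `should_rotate_credential=True` the single key is gone after two overload responses. With `False` it stays available for later retries.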
Expected Behavior
"Overloaded" errors should be classified as server-side issues (FailoverReason.overloaded), NOT as rate limits. The credential is valid — the server is just busy. Rotating credentials is counterproductive and exhausts the pool unnecessarily.
Suggested Fix
In agent/error_classifier.py, add an overloaded check before the rate_limit check in both _classify_by_message functions:
# Overloaded patterns: server-side overload, NOT a credential/billing issue.
# Must come before rate_limit check to avoid rotating credentials unnecessarily.
if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
    return result_fn(
        FailoverReason.overloaded,
        retryable=True,
    )
# Rate limit patterns
if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
    ...
Also add overloaded patterns to _RATE_LIMIT_PATTERNS:
    "servicequotaexceededexception",
    "overloaded",
    "temporarily overloaded",
]
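A self-contained sketch of why the order of the checks matters. It is not the actual contents of agent/error_classifier.py: the `Classification` dataclass, the `unknown` member, the two generic rate-limit patterns, and the assumption that an overloaded result leaves `should_rotate_credential` at False are all illustrative stand-ins.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Hypothetical stand-in for the enum in agent/error_classifier.py.
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"


@dataclass
class Classification:
    # Hypothetical result type; field names follow the flags named in this report.
    reason: FailoverReason
    retryable: bool = False
    should_rotate_credential: bool = False


# Illustrative subset only; "overloaded" is included to show that order decides.
_RATE_LIMIT_PATTERNS = ["rate limit", "too many requests", "overloaded"]


def classify_by_message(message: str) -> Classification:
    error_msg = message.lower()
    # Checked first: the key is valid and the server is busy, so keep the credential.
    if "overloaded" in error_msg:
        return Classification(FailoverReason.overloaded, retryable=True)
    # Only genuine rate limits rotate the credential.
    if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
        return Classification(
            FailoverReason.rate_limit,
            retryable=True,
            should_rotate_credential=True,
        )
    return Classification(FailoverReason.unknown)
```

Because the overloaded check runs first, the Z.AI message is never matched against `_RATE_LIMIT_PATTERNS`, even though "overloaded" appears in that list.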
Related
Retry parameters (max_retries, base_delay, max_delay) are hardcoded in run_agent.py. Making them configurable via config.yaml would help users tune retry behavior for providers with volatile availability without editing source code (changes are lost on hermes update).
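One possible shape for configurable retry parameters, as a rough sketch. The `retry` section name, the default values, and the exponential backoff formula are assumptions for illustration. Only the parameter names max_retries, base_delay, and max_delay come from run_agent.py as described above. The `config` argument stands for config.yaml after it has been parsed into a dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    # Placeholder defaults, not the values hardcoded in run_agent.py.
    max_retries: int = 3
    base_delay: float = 1.0
    max_delay: float = 30.0


def load_retry_config(config: dict) -> RetryConfig:
    # Missing keys fall back to the defaults; the "retry" key is hypothetical.
    section = config.get("retry") or {}
    defaults = RetryConfig()
    return RetryConfig(
        max_retries=int(section.get("max_retries", defaults.max_retries)),
        base_delay=float(section.get("base_delay", defaults.base_delay)),
        max_delay=float(section.get("max_delay", defaults.max_delay)),
    )


def backoff_delays(cfg: RetryConfig) -> list[float]:
    # Assumed schedule: base_delay * 2**attempt, capped at max_delay.
    return [
        min(cfg.base_delay * 2 ** attempt, cfg.max_delay)
        for attempt in range(cfg.max_retries)
    ]
```

For example, `backoff_delays(load_retry_config({"retry": {"max_retries": 5, "base_delay": 2, "max_delay": 10}}))` gives waits of 2, 4, 8, 10, and 10 seconds.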
Environment
- Provider: Z.AI (api.z.ai) GLM Coding Max Plan
- Error:
HTTP 200 with code 1305 "The service may be temporarily overloaded, please try again later"