Summary
The error classifier treats all HTTP 429 responses as FailoverReason.rate_limit, regardless of whether the 429 indicates a per-key rate limit or a server-side overload. This causes the wrong recovery strategy to be used.
Some providers (e.g. Z.AI/Zhipu) return HTTP 429 with messages like:
    HTTP 429: The service may be temporarily overloaded, please try again later
This is a server-side overload — the entire provider endpoint is struggling, not just this API key hitting a per-key quota. The recovery strategy should be different:
| Reason | Correct Behavior |
| --- | --- |
| rate_limit | Retry same credential once, then rotate to next key |
| overloaded | Skip retry, rotate immediately (the whole provider is down) |
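The table above can be read as a small lookup from failover reason to recovery plan. The following is only a sketch of that mapping: `RecoveryPlan` and `plan_recovery` are hypothetical names that do not exist in the codebase, and the `FailoverReason` enum here is a stand-in containing only the two members discussed.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Stand-in for the project's enum; only the two members discussed here.
    rate_limit = "rate_limit"
    overloaded = "overloaded"


@dataclass(frozen=True)
class RecoveryPlan:
    # Hypothetical container, used only to illustrate the table.
    retries_on_same_credential: int  # attempts on the current key before rotating
    rotate_credential: bool


def plan_recovery(reason: FailoverReason) -> RecoveryPlan:
    """Return the recovery plan the table describes for each reason."""
    if reason is FailoverReason.rate_limit:
        # Per-key quota: one more attempt on the same key, then rotate.
        return RecoveryPlan(retries_on_same_credential=1, rotate_credential=True)
    if reason is FailoverReason.overloaded:
        # Server-side overload: retrying the same key is pointless, rotate now.
        return RecoveryPlan(retries_on_same_credential=0, rotate_credential=True)
    raise ValueError(f"no recovery plan for {reason!r}")
```

The only difference between the two plans is the retry count on the current credential, which is exactly what the classifier's output needs to control.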
Current Behavior
agent/error_classifier.py (line ~551):

    if status_code == 429:
        return result_fn(
            FailoverReason.rate_limit,  # ← always rate_limit
            retryable=True,
            should_rotate_credential=True,
            should_fallback=True,
        )

The message body is not inspected to distinguish overload from rate limiting.
Additionally, FailoverReason.overloaded exists as an enum value but is never produced by the 429 classification path, and _handle_credential_failover() in run_agent.py has no handler for it — it falls through to the default no-op return.
Proposed Fix
- In error_classifier.py: inspect the error message for overload patterns ("temporarily overloaded", "server is overloaded", "capacity", etc.) and classify as FailoverReason.overloaded instead of rate_limit
- In run_agent.py: add an overloaded handler in _handle_credential_failover() that skips the retry-on-same-credential step and rotates immediately (same behavior as billing)
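A minimal sketch of the message inspection proposed in the first bullet, assuming the error message is available as a string at the point where the 429 is classified. `classify_429` and `OVERLOAD_PATTERNS` are hypothetical names, and the enum is a stand-in. The real classifier returns through `result_fn(...)` with extra flags, which this sketch omits.

```python
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    # Stand-in for the project's enum; only the two members discussed here.
    rate_limit = "rate_limit"
    overloaded = "overloaded"


# Patterns quoted from the proposal. "capacity" is broad and could also match
# per-key quota messages, so it may need narrowing in practice.
OVERLOAD_PATTERNS = ("temporarily overloaded", "server is overloaded", "capacity")


def classify_429(message: Optional[str]) -> FailoverReason:
    """Classify an HTTP 429 by its message body (hypothetical helper)."""
    text = (message or "").lower()  # case-insensitive substring match
    if any(pattern in text for pattern in OVERLOAD_PATTERNS):
        return FailoverReason.overloaded
    # Default stays rate_limit, so unknown 429 messages keep today's behavior.
    return FailoverReason.rate_limit
```

For example, `classify_429("HTTP 429: The service may be temporarily overloaded, please try again later")` yields `FailoverReason.overloaded`, while a missing or unrelated message falls back to `FailoverReason.rate_limit`.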
Environment
- Hermes-agent latest
- Observed with Z.AI provider returning HTTP 429: "The service may be temporarily overloaded, please try again later"
- No retry_after header or resets_at field in the error response