fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client by zeejaytan · Pull Request #13579 · NousResearch/hermes-agent

zeejaytan · 2026-04-21T15:15:15Z

What does this PR do?

Adds fallback triggering for 429 rate-limit errors in the auxiliary client (call_llm / async_call_llm). Previously, only payment/billing errors (402) and connection errors triggered fallback — rate-limit 429s were silently retried against the same rate-limited provider until all attempts were exhausted.

Root cause

_is_payment_error() only matched 429s containing billing keywords ("credits", "insufficient funds", etc.). Provider-specific rate-limit messages like Nous's "Hold up for a bit, you've exceeded the rate limit on your API key" didn't match. So should_fallback = _is_payment_error(...) or _is_connection_error(...) was always False for rate limits, and all 3 retries hit the same rate-limited endpoint.

Real-world impact: Session search summarization lost session metadata when Nous hit RPH limits, because all retries failed against the same rate-limited provider instead of falling back to OpenRouter.

Changes

New: `_is_rate_limit_error()` (after `_is_payment_error`)

Detects 429 + rate-limit keywords ("rate limit", "too many requests", "retry after", "resets in")
Generic 429 without billing keywords = classified as rate limit
Catches OpenAI SDK RateLimitError by class name (may omit .status_code) — covers the same bug as PR fix(agent): force 429 classification for RateLimitError + rate-limit cooldown on primary restore #8023
Does NOT trigger on billing-related 429s (those are _is_payment_error's job)

Updated: `should_fallback` in `call_llm` and `async_call_llm`

should_fallback = (_is_payment_error(first_err)
                  or _is_connection_error(first_err)
                  or _is_rate_limit_error(first_err))

Updated: `reason` string

Three-way dispatch: "payment error" / "rate limit" / "connection error"

Updated: max_tokens retry path

Added _is_rate_limit_error to the fallthrough check alongside payment/connection.

Tests

TestIsRateLimitError — 10 tests covering: rate-limit keywords, billing exclusions, generic 429, RateLimitError class detection, non-429 errors
TestCallLlmPaymentFallback::test_429_rate_limit_triggers_fallback — end-to-end test showing 429 on primary → fallback to next provider

All 87 tests in test_auxiliary_client.py pass.

Relationship to existing PRs

PR	Scope	This PR
#8023	`run_agent.py` — force `RateLimitError` classification + cooldown	Complementary — this PR handles the auxiliary client path
#12554	`run_agent.py` — single-credential pool rotation	Complementary
#11034	`run_agent.py` — fail over on provider overload (503/529)	Complementary

This PR fills the gap in auxiliary_client.py which none of the above touch. It should land cleanly alongside any of them.

How to test

Configure Nous as primary (or auto-detected)
Exhaust the rate limit (or mock a 429 with rate-limit message)
Before fix: auxiliary task retries 3x against same endpoint, fails
After fix: auxiliary task falls back to OpenRouter/Codex/next provider on first 429

Unit test (quick):

pytest tests/agent/test_auxiliary_client.py::TestIsRateLimitError -v
pytest tests/agent/test_auxiliary_client.py::TestCallLlmPaymentFallback::test_429_rate_limit_triggers_fallback -v

When a provider returns a 429 rate-limit error (not billing-related), the auxiliary client's call_llm/async_call_llm previously did NOT trigger the fallback chain. This caused auxiliary tasks like session_search to exhaust all 3 retries against the same rate-limited endpoint, losing session metadata that depended on the summarization completing. Root cause: `_is_payment_error()` only matched 429s containing billing keywords ("credits", "insufficient funds", etc.). Provider-specific rate-limit messages like Nous's "Hold up for a bit, you've exceeded the rate limit on your API key" didn't match, so `_is_payment_error` returned False, `_is_connection_error` returned False, and `should_fallback` was False — all retries hit the same rate-limited provider. Fix: - New `_is_rate_limit_error()` function that detects 429 + rate-limit keywords, generic 429 without billing keywords, and OpenAI SDK `RateLimitError` class instances (which may omit .status_code). - Updated `should_fallback` in both `call_llm` and `async_call_llm` to include `_is_rate_limit_error`. - Updated the max_tokens retry path to also check for rate-limit errors. - Updated the reason string to include "rate limit". This complements the Nous rate guard (PR NousResearch#10568) which prevents new calls to Nous when already rate-limited — this fix handles the case where a request is already in flight when the 429 arrives. Related: NousResearch#8023, NousResearch#12554, NousResearch#11034 Co-authored-by: Zeejay <zjtan1@gmail.com>

zeejaytan · 2026-04-21T15:45:57Z

Real-world impact

This bug caused session search summarization to silently fail when Nous hit its RPH (requests per hour) limit. The symptom:

Session search triggers async_call_llm to summarize past sessions
Nous returns 429: "Hold up for a bit, you've exceeded the rate limit on your API key"
_is_payment_error returns False (no billing keywords in message)
_is_connection_error returns False (connection was successful, it's a 429 response)
should_fallback = False → no fallback triggered
All 3 retries hit the same rate-limited Nous endpoint → all fail
Session metadata is lost — session_search can't find past sessions

The user's wiki work (165+ papers, 123 wiki pages) persisted on disk, but the session index was gone because summarization never completed.

How this interacts with existing PRs

fix: Nous Portal rate limit guard — prevent retry amplification #10568 (merged): The Nous rate guard prevents new calls to Nous when another session already recorded a 429. But it can't help when a request is already in flight when the 429 arrives — that's what this PR fixes.
fix(agent): force 429 classification for RateLimitError + rate-limit cooldown on primary restore #8023 (open): Fixes RateLimitError classification in run_agent.py. This PR covers the same pattern (RateLimitError class name detection) but in auxiliary_client.py, which fix(agent): force 429 classification for RateLimitError + rate-limit cooldown on primary restore #8023 doesn't touch.
fix(agent): fall back on rate limit when pool has no rotation room #12554 (open): Single-credential pool rotation in run_agent.py. Complementary — different surface.
fix(run-agent): fail over on provider overload #11034 (open): Fail over on 503/529 overload in run_agent.py. Complementary.

Scope

Only agent/auxiliary_client.py is changed (production code). No changes to run_agent.py. This should land cleanly alongside any of the above PRs.

alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 21, 2026

alt-glitch mentioned this pull request Apr 22, 2026

feat(plugin): Add proactive rate limiting to prevent 429 errors #14029

Open

13 tasks

alt-glitch mentioned this pull request May 4, 2026

feat(auxiliary): fallback to alternative provider on 429 rate-limit errors #19762

Closed

2 tasks

teknium1 mentioned this pull request May 5, 2026

fix(aux): trigger fallback on 429 rate-limit errors (salvage #13579) #20294

Merged

teknium1 closed this in #20294 May 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client#13579

fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client#13579
zeejaytan wants to merge 1 commit into
NousResearch:mainfrom
zeejaytan:fix/auxiliary-429-rate-limit-fallback

zeejaytan commented Apr 21, 2026

Uh oh!

zeejaytan commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zeejaytan commented Apr 21, 2026

What does this PR do?

Root cause

Changes

New: _is_rate_limit_error() (after _is_payment_error)

Updated: should_fallback in call_llm and async_call_llm

Updated: reason string

Updated: max_tokens retry path

Tests

Relationship to existing PRs

How to test

Uh oh!

zeejaytan commented Apr 21, 2026

Real-world impact

How this interacts with existing PRs

Scope

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

New: `_is_rate_limit_error()` (after `_is_payment_error`)

Updated: `should_fallback` in `call_llm` and `async_call_llm`

Updated: `reason` string