Skip to content

fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client#13579

Closed
zeejaytan wants to merge 1 commit into
NousResearch:mainfrom
zeejaytan:fix/auxiliary-429-rate-limit-fallback
Closed

fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client#13579
zeejaytan wants to merge 1 commit into
NousResearch:mainfrom
zeejaytan:fix/auxiliary-429-rate-limit-fallback

Conversation

@zeejaytan

Copy link
Copy Markdown

What does this PR do?

Adds fallback triggering for 429 rate-limit errors in the auxiliary client (call_llm / async_call_llm). Previously, only payment/billing errors (402) and connection errors triggered fallback — rate-limit 429s were silently retried against the same rate-limited provider until all attempts were exhausted.

Root cause

_is_payment_error() only matched 429s containing billing keywords ("credits", "insufficient funds", etc.). Provider-specific rate-limit messages like Nous's "Hold up for a bit, you've exceeded the rate limit on your API key" didn't match. So should_fallback = _is_payment_error(...) or _is_connection_error(...) was always False for rate limits, and all 3 retries hit the same rate-limited endpoint.

Real-world impact: Session search summarization lost session metadata when Nous hit RPH limits, because all retries failed against the same rate-limited provider instead of falling back to OpenRouter.

Changes

New: _is_rate_limit_error() (after _is_payment_error)

Updated: should_fallback in call_llm and async_call_llm

should_fallback = (_is_payment_error(first_err)
                  or _is_connection_error(first_err)
                  or _is_rate_limit_error(first_err))

Updated: reason string

Three-way dispatch: "payment error" / "rate limit" / "connection error"

Updated: max_tokens retry path

Added _is_rate_limit_error to the fallthrough check alongside payment/connection.

Tests

  • TestIsRateLimitError — 10 tests covering: rate-limit keywords, billing exclusions, generic 429, RateLimitError class detection, non-429 errors
  • TestCallLlmPaymentFallback::test_429_rate_limit_triggers_fallback — end-to-end test showing 429 on primary → fallback to next provider

All 87 tests in test_auxiliary_client.py pass.

Relationship to existing PRs

PR Scope This PR
#8023 run_agent.py — force RateLimitError classification + cooldown Complementary — this PR handles the auxiliary client path
#12554 run_agent.py — single-credential pool rotation Complementary
#11034 run_agent.py — fail over on provider overload (503/529) Complementary

This PR fills the gap in auxiliary_client.py which none of the above touch. It should land cleanly alongside any of them.

How to test

  1. Configure Nous as primary (or auto-detected)
  2. Exhaust the rate limit (or mock a 429 with rate-limit message)
  3. Before fix: auxiliary task retries 3x against same endpoint, fails
  4. After fix: auxiliary task falls back to OpenRouter/Codex/next provider on first 429

Unit test (quick):

pytest tests/agent/test_auxiliary_client.py::TestIsRateLimitError -v
pytest tests/agent/test_auxiliary_client.py::TestCallLlmPaymentFallback::test_429_rate_limit_triggers_fallback -v

When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.

Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.

Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
  keywords, generic 429 without billing keywords, and OpenAI SDK
  `RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
  include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".

This complements the Nous rate guard (PR NousResearch#10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.

Related: NousResearch#8023, NousResearch#12554, NousResearch#11034
Co-authored-by: Zeejay <zjtan1@gmail.com>
@zeejaytan

Copy link
Copy Markdown
Author

Real-world impact

This bug caused session search summarization to silently fail when Nous hit its RPH (requests per hour) limit. The symptom:

  1. Session search triggers async_call_llm to summarize past sessions
  2. Nous returns 429: "Hold up for a bit, you've exceeded the rate limit on your API key"
  3. _is_payment_error returns False (no billing keywords in message)
  4. _is_connection_error returns False (connection was successful, it's a 429 response)
  5. should_fallback = False → no fallback triggered
  6. All 3 retries hit the same rate-limited Nous endpoint → all fail
  7. Session metadata is lost — session_search can't find past sessions

The user's wiki work (165+ papers, 123 wiki pages) persisted on disk, but the session index was gone because summarization never completed.

How this interacts with existing PRs

Scope

Only agent/auxiliary_client.py is changed (production code). No changes to run_agent.py. This should land cleanly alongside any of the above PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants