
fix(aux): trigger fallback on 429 rate-limit errors (salvage #13579) #20294

Merged
teknium1 merged 2 commits into main from salvage/pr-13579 on May 5, 2026


@teknium1 teknium1 commented May 5, 2026

Contributor

Salvages @zeejaytan's PR #13579 onto current main. Conflicts with main's newer Nous-auth-refresh and credential-refresh retry blocks are resolved, and both blocks are preserved.

What it does

Auxiliary calls that 429 with non-billing rate-limit text previously exhausted all retries against the same endpoint instead of falling back. _is_payment_error only matched billing-keyword 429s, so Nous's 'Hold up for a bit' and similar generic rate-limit messages fell through. Adds _is_rate_limit_error and includes it in should_fallback on both the sync and async paths.
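To make the retry-versus-fallback behavior concrete, here is a minimal sketch of the control flow described above. It is not the code in agent/auxiliary_client.py: `FakeAPIError`, `call_with_fallback`, and the two provider callables are hypothetical stand-ins, and a single fallback provider is assumed in place of the real fallback chain.

```python
class FakeAPIError(Exception):
    """Hypothetical stand-in for a provider SDK error with an HTTP status."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


def call_with_fallback(primary, fallback, should_fallback, max_retries=3):
    """Retry `primary`; switch to `fallback` as soon as should_fallback(exc) is true.

    Returns (result, attempts) where attempts records which provider was tried.
    """
    attempts = []
    last_exc = None
    for _ in range(max_retries):
        attempts.append("primary")
        try:
            return primary(), attempts
        except FakeAPIError as exc:
            last_exc = exc
            if should_fallback(exc):
                attempts.append("fallback")
                return fallback(), attempts
    # No predicate matched: every retry hit the same endpoint.
    raise last_exc


def rate_limited():
    # Simulates a provider answering with a generic (non-billing) 429.
    raise FakeAPIError("Hold up for a bit", status_code=429)


def backup():
    return "summary"


result, attempts = call_with_fallback(
    rate_limited, backup, lambda exc: exc.status_code == 429
)
```

With a predicate that recognizes the 429, the call moves to the backup after one failed attempt. With a predicate that never matches, all three retries go to the rate-limited provider and the error propagates, which is the pre-fix behavior this PR describes.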

Changes

  • agent/auxiliary_client.py: new _is_rate_limit_error helper; should_fallback in both call_llm and async_call_llm now also checks it, as does the max_tokens retry path.
  • tests/agent/test_auxiliary_client.py: new coverage for 429 detection.
  • scripts/release.py: AUTHOR_MAP entry for zeejaytan.

Validation

tests/agent/test_auxiliary_client.py — 134 passed locally.

Closes #13579 via salvage.

@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on May 5, 2026
zeejaytan and others added 2 commits May 5, 2026 10:15
When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.

Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.
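The gap can be illustrated with a small sketch. It assumes the billing check reduces to "status is 429 and the message contains a billing keyword"; the real `_is_payment_error()` is not shown here and may differ. `is_payment_error_sketch` is a hypothetical name, and the keyword tuple holds only the two keywords quoted above.

```python
# Only the two keywords quoted in this message; the real list is longer ("etc.").
BILLING_KEYWORDS = ("credits", "insufficient funds")


def is_payment_error_sketch(status_code, message):
    """Hypothetical billing check: a 429 whose text mentions a billing keyword."""
    if status_code != 429:
        return False
    text = message.lower()
    return any(keyword in text for keyword in BILLING_KEYWORDS)


nous_message = (
    "Hold up for a bit, you've exceeded the rate limit on your API key"
)

billing_hit = is_payment_error_sketch(429, "You are out of credits")
nous_hit = is_payment_error_sketch(429, nous_message)
```

Under these assumptions `billing_hit` is True and `nous_hit` is False, so a fallback decision built only on this check never fires for the Nous message.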

Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
  keywords, generic 429 without billing keywords, and OpenAI SDK
  `RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
  include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".
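A rough sketch of the three detection cases listed above, under stated assumptions: the keyword tuples are illustrative, `RateLimitError` here is a local stand-in rather than the OpenAI SDK class, and the class is matched by name only to keep the example self-contained. The actual `_is_rate_limit_error()` may be structured differently.

```python
RATE_LIMIT_KEYWORDS = ("rate limit", "too many requests")  # illustrative
BILLING_KEYWORDS = ("credits", "insufficient funds")       # illustrative


class RateLimitError(Exception):
    """Local stand-in for an SDK rate-limit error with no .status_code."""


class StatusError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self.status_code = status_code


def is_rate_limit_error_sketch(exc):
    # Case 3: a RateLimitError instance counts even without .status_code.
    if type(exc).__name__ == "RateLimitError":
        return True
    if getattr(exc, "status_code", None) != 429:
        return False
    text = str(exc).lower()
    # Case 1: 429 with explicit rate-limit wording.
    if any(keyword in text for keyword in RATE_LIMIT_KEYWORDS):
        return True
    # Case 2: generic 429 counts unless it looks like a billing error.
    return not any(keyword in text for keyword in BILLING_KEYWORDS)


def should_fallback_sketch(exc, is_payment=lambda e: False,
                           is_connection=lambda e: False):
    # The existing checks are passed in as stubs; the rate-limit check is or'ed in.
    return is_payment(exc) or is_connection(exc) or is_rate_limit_error_sketch(exc)
```

For example, `StatusError("Hold up for a bit, you've exceeded the rate limit on your API key", 429)` and a bare `RateLimitError("slow down")` both count as rate-limit errors here, while a 429 that mentions credits is left to the billing check.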

This complements the Nous rate guard (PR #10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.

Related: #8023, #12554, #11034
Co-authored-by: Zeejay <zjtan1@gmail.com>
