
fix(auxiliary): detect quota keywords in _is_payment_error and allow fallback for explicit providers #26809

Closed
kagura-agent wants to merge 1 commit into NousResearch:main from kagura-agent:fix/aux-call-llm-quota-fallback

@kagura-agent

Contributor

Problem

Auxiliary call_llm fallback doesn't trigger on provider rate limits (429 daily quota).

Two root causes:

  1. _is_payment_error() doesn't recognize quota-related keywords in 429 responses. Providers like OpenRouter return 429 with messages such as "Too many tokens per day" or "quota exceeded", but these phrases weren't matched, so the errors were never classified as payment/exhaustion errors.

  2. The fallback chain is gated on is_auto: explicitly configured providers are excluded from fallback even on payment, connection, or rate-limit errors where the provider clearly cannot serve the request.

Fix

  1. Add quota keywords to _is_payment_error(): "quota", "too many tokens", "daily limit", "tokens per day".

  2. Remove the is_auto gate on the should_fallback condition in both call_llm() and async_call_llm(). Since should_fallback already only fires for payment/connection/rate-limit errors (all indicating "this provider can't serve right now"), the auto-only restriction was overly conservative.
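The two changes above can be illustrated with a minimal, hypothetical sketch. It assumes the quota check only applies to 429 responses and reduces the fallback decision to three booleans. The names looks_like_quota_error, should_fallback_before and should_fallback_after are illustrative only, not the project's actual code, and the real _is_payment_error() also checks other payment phrases that are not reproduced here.

```python
from typing import Optional

# The four keywords this PR adds to _is_payment_error(). The real function
# checks further payment phrases that are not shown in this sketch.
QUOTA_KEYWORDS = ("quota", "too many tokens", "daily limit", "tokens per day")


def looks_like_quota_error(status_code: Optional[int], message: str) -> bool:
    """Hypothetical helper: True for a 429 whose body mentions a quota."""
    if status_code != 429:
        return False
    lowered = message.lower()  # providers vary in capitalisation
    return any(keyword in lowered for keyword in QUOTA_KEYWORDS)


def should_fallback_before(is_auto: bool, is_payment: bool,
                           is_connection: bool, is_rate_limit: bool) -> bool:
    # Old behaviour (assumed shape): capacity errors only trigger
    # fallback when the provider was chosen automatically.
    return is_auto and (is_payment or is_connection or is_rate_limit)


def should_fallback_after(is_payment: bool, is_connection: bool,
                          is_rate_limit: bool) -> bool:
    # This PR: the is_auto gate is dropped, so explicitly configured
    # providers fall back on the same errors as auto-selected ones.
    return is_payment or is_connection or is_rate_limit


quota = looks_like_quota_error(429, "Too many tokens per day")
print(quota)  # True
print(should_fallback_before(False, quota, False, False))  # False
print(should_fallback_after(quota, False, False))  # True
```

With an explicit provider (is_auto=False), the old condition refuses to fall back even though the provider has exhausted its daily quota; the new one falls back.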

Tests

  • 4 new tests in TestIsPaymentError for quota keyword detection
  • 1 new test in TestCallLlmPaymentFallback verifying explicit providers get fallback on quota errors

All 164 tests pass.

Fixes #26803

…fallback for explicit providers

- Add quota-related keywords ('quota', 'too many tokens', 'daily limit',
  'tokens per day') to _is_payment_error() so 429 responses from providers
  with daily token quotas are recognized as payment/exhaustion errors.

- Remove the is_auto gate on fallback in both call_llm() and
  async_call_llm(). Previously, explicitly configured providers were
  excluded from the fallback chain even on payment/connection/rate-limit
  errors where the provider clearly cannot serve. Since should_fallback
  already only fires for these capacity errors, the gate was overly
  restrictive.

- Add tests for new quota keywords and for explicit-provider fallback.

Fixes NousResearch#26803

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
@kagura-agent

Contributor Author

Closing in favor of #27625, a better approach that relaxes the explicit-provider gate only for capacity errors (payment/quota/connection) instead of removing it entirely. Thanks for the salvage @teknium1!

@teknium1

Contributor

Superseded by #27625 (merged). @Bartok9's #26811 had a slightly more thorough quota-keyword set and kept transient rate-limit fallback gated on is_auto (correctly, since a 429 with retry-after is a request constraint, not a capacity problem), so we salvaged that version. Same underlying fix, though. Thanks for working on this!
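The refinement described in the two closing comments can be sketched the same way. This is a hypothetical illustration assembled from those comments, not the code merged in #27625: should_fallback and its boolean parameters are assumed names, and quota exhaustion is assumed to arrive as is_payment because _is_payment_error() matches the quota keywords.

```python
def should_fallback(is_auto: bool, is_payment: bool, is_connection: bool,
                    is_transient_rate_limit: bool) -> bool:
    """Hypothetical gate: capacity errors bypass is_auto, transient 429s do not."""
    # Capacity errors (payment or quota exhaustion, connection failures):
    # the provider cannot serve at all, so fall back even for an
    # explicitly configured provider.
    if is_payment or is_connection:
        return True
    # Transient 429 with retry-after: a request constraint, not a capacity
    # problem, so only auto-selected providers fall back.
    if is_transient_rate_limit:
        return is_auto
    return False


print(should_fallback(False, True, False, False))  # True: explicit provider, quota exhausted
print(should_fallback(False, False, False, True))  # False: explicit provider, transient 429
print(should_fallback(True, False, False, True))   # True: auto provider, transient 429
```

The design choice is that only the transient-rate-limit branch consults is_auto; errors meaning the provider cannot serve at all skip the gate, whereas this PR's version dropped the gate for every case.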


Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • P2: Medium, degraded but workaround exists
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

Auxiliary call_llm fallback doesn't trigger on provider rate limits (429 daily quota)