fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers#26811

Closed
Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/26803-quota-rate-limit-fallback

@Bartok9 Bartok9 commented May 16, 2026

Contributor

Closes #26803

Root Causes

1. Daily quota exhaustion not classified as fallback-worthy

_is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies:

  • "Too many tokens per day" (Bedrock / LiteLLM)
  • "quota exceeded" / "quota_exceeded" (Vertex AI, GCP)
  • "resource exhausted" (Vertex AI; the text form of the gRPC status code RESOURCE_EXHAUSTED)
  • "daily limit" / "daily quota" / "tokens per day"

These are functionally identical to credit exhaustion — the provider cannot serve the request until the quota resets — but didn't trigger provider fallback.
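The keyword check described above can be sketched roughly as follows. This is a minimal illustration, not the actual `_is_payment_error()` from `auxiliary_client.py`: the function name `looks_like_payment_error`, the two tuples, and the underscore normalization are all assumptions made for the example.

```python
# Illustrative sketch only: names and matching strategy are assumptions,
# not the actual _is_payment_error() from auxiliary_client.py.

# Billing phrases the PR description says were already detected.
BILLING_KEYWORDS = (
    "credits",
    "insufficient funds",
    "billing",
    "payment required",
)

# Quota-exhaustion phrases the PR description says were missing.
QUOTA_KEYWORDS = (
    "quota exceeded",
    "too many tokens per day",
    "daily limit",
    "tokens per day",
    "daily quota",
    "resource exhausted",
)


def looks_like_payment_error(exc: Exception) -> bool:
    """Return True if the error text suggests billing or quota exhaustion."""
    # Assumption: lowercase the message and treat underscores as spaces so
    # that "quota_exceeded" and "RESOURCE_EXHAUSTED" match the phrases above.
    text = str(exc).lower().replace("_", " ")
    return any(keyword in text for keyword in BILLING_KEYWORDS + QUOTA_KEYWORDS)
```

With this sketch, a Bedrock-style "Too many tokens per day" message and a Vertex-style "RESOURCE_EXHAUSTED" message both classify as payment errors, while an ordinary "rate limit reached" message does not.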

2. Fallback chain gated on resolved_provider == 'auto' only

When a task resolves to a specific provider (e.g. "custom" for a LiteLLM proxy or "openrouter"), capacity failures (payment/quota/connection) would raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider cannot serve the request regardless of user intent.

Fixes

  1. _is_payment_error(): add quota-related keywords — quota exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted.

  2. Fallback gate: capacity errors (payment/quota + connection) bypass the explicit-provider constraint in both call_llm() and acall_llm(). Transient rate-limit fallback still respects explicit provider choice.

  3. Tests: 6 new targeted tests for quota-error detection variants (Bedrock daily limit, Vertex AI RESOURCE_EXHAUSTED, generic daily quota phrases, etc.).
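The relaxed fallback gate from fix 2 might look something like the sketch below. The function `should_try_fallback` and its boolean parameters are hypothetical names chosen for illustration; the real logic lives inside `call_llm()` and `acall_llm()` and may be structured differently.

```python
# Illustrative sketch only: the function and parameter names are hypothetical
# and do not come from the PR's actual diff.

def should_try_fallback(
    resolved_provider: str,
    *,
    is_payment_or_quota_error: bool,
    is_connection_error: bool,
    is_rate_limit_error: bool,
) -> bool:
    """Decide whether a failed auxiliary call should try another provider."""
    is_auto = resolved_provider == "auto"

    # Capacity errors: the provider cannot serve the request at all, so
    # fallback is allowed even when the provider was chosen explicitly.
    if is_payment_or_quota_error or is_connection_error:
        return True

    # Transient rate limits: still honour an explicit provider choice and
    # only fall back when the provider was resolved automatically.
    if is_rate_limit_error:
        return is_auto

    # Assumption of this sketch: any other error is re-raised by the caller.
    return False
```

Under these assumptions, a quota error on an explicit "custom" provider now triggers fallback, whereas a transient rate limit on an explicit "openrouter" provider still does not.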

Impact

  • Context compaction no longer silently drops conversation history when the primary provider hits daily limits
  • Deployments using LiteLLM → Bedrock (daily token limit) with Anthropic fallback now automatically switch providers
  • No behaviour change for transient rate limits with explicit providers

…ity-error fallback for explicit providers

Closes NousResearch#26803

Root causes:
1. _is_payment_error() checked for billing keywords (credits, insufficient
   funds, billing, payment required) but missed daily token quota exhaustion
   phrases used by Bedrock, Vertex AI, and LiteLLM proxies — e.g.
   'Too many tokens per day', 'quota exceeded', 'resource exhausted',
   'daily limit'. These are functionally identical to credit exhaustion
   (provider cannot serve the request) but don't trigger fallback.

2. The call_llm() fallback chain was gated on resolved_provider == 'auto'.
   When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM
   proxy, or 'openrouter'), capacity failures (payment/quota/connection)
   raise instead of trying alternatives. This is overly conservative:
   capacity errors mean the provider *cannot* serve the request regardless of
   user intent, so alternatives should always be tried.

Fixes:
- Add quota-related keywords to _is_payment_error(): quota_exceeded,
  too many tokens per day, daily limit, tokens per day, daily quota,
  resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when
  resolved_provider is not 'auto'. Rate-limit fallback stays gated on
  is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on May 16, 2026
@alt-glitch

Collaborator

Duplicate of #26809 — both PRs fix the same two root causes in _is_payment_error() (missing quota keywords) and the is_auto fallback gate in auxiliary_client.py. Both close #26803.

@teknium1

Contributor

Superseded by #27625 (merged). Your quota-keyword detection in _is_payment_error and the capacity-error gate relaxation were salvaged onto current main — your commit is preserved with your authorship (24c209f), plus your 6 quota-detection tests. On top of your fix we layered @zccyman's #26998 fallback_chain schema and added a main-agent safety net. Thanks!

Labels

comp/agent (Core agent loop, run_agent.py, prompt builder)
P2 (Medium: degraded but workaround exists)
type/bug (Something isn't working)

Development

Successfully merging this pull request may close these issues.

Auxiliary call_llm fallback doesn't trigger on provider rate limits (429 daily quota)