fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers #26811
Closed
Bartok9 wants to merge 1 commit into
…ity-error fallback for explicit providers

Closes NousResearch#26803

Root causes:

1. _is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies, e.g. 'Too many tokens per day', 'quota exceeded', 'resource exhausted', 'daily limit'. These are functionally identical to credit exhaustion (provider cannot serve the request) but don't trigger fallback.
2. The call_llm() fallback chain was gated on resolved_provider == 'auto'. When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM proxy, or 'openrouter'), capacity failures (payment/quota/connection) raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider *cannot* serve the request regardless of user intent, so alternatives should always be tried.

Fixes:

- Add quota-related keywords to _is_payment_error(): quota_exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when resolved_provider is not 'auto'. Rate-limit fallback stays gated on is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.
Superseded by #27625 (merged). Your quota-keyword detection in |
This was referenced May 18, 2026
Closes #26803
Root Causes
1. Daily quota exhaustion not classified as fallback-worthy
_is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies:

- "Too many tokens per day" (Bedrock / LiteLLM)
- "quota exceeded" / "quota_exceeded" (Vertex AI, GCP)
- "resource exhausted" (Vertex AI gRPC code)
- "daily limit" / "daily quota" / "tokens per day"

These are functionally identical to credit exhaustion (the provider cannot serve the request until the quota resets), but they did not trigger provider fallback.
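To make the keyword check concrete, here is a minimal sketch of how such a classifier might look. It is not the code from this PR: the function name `is_payment_error_sketch`, the tuple names, and the decision to lowercase the message and replace underscores with spaces (so that `RESOURCE_EXHAUSTED` and `quota_exceeded` match their spaced forms) are all assumptions made for illustration. Only the keyword phrases themselves come from the description above.

```python
# Hypothetical sketch; names and normalization are assumptions, not the PR's code.
BILLING_KEYWORDS = (
    "credits",
    "insufficient funds",
    "billing",
    "payment required",
)

# Quota phrases listed in the PR description.
QUOTA_KEYWORDS = (
    "quota exceeded",
    "too many tokens per day",
    "tokens per day",
    "daily limit",
    "daily quota",
    "resource exhausted",
)


def is_payment_error_sketch(exc: Exception) -> bool:
    """Return True if the error text looks like credit or quota exhaustion."""
    # Assumption: normalize case and underscores so "RESOURCE_EXHAUSTED"
    # and "quota_exceeded" match the spaced keyword forms.
    message = str(exc).lower().replace("_", " ")
    return any(keyword in message for keyword in BILLING_KEYWORDS + QUOTA_KEYWORDS)
```

Under these assumptions, a message such as "Too many tokens per day" is classified the same way as "insufficient funds", while a plain transient rate-limit message is not.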
2. Fallback chain gated on resolved_provider == 'auto' only

When a task resolves to a specific provider (e.g. "custom" for a LiteLLM proxy, or "openrouter"), capacity failures (payment/quota/connection) would raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider cannot serve the request regardless of user intent.

Fixes
- _is_payment_error(): add quota-related keywords: quota exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted.
- Fallback gate: capacity errors (payment/quota + connection) bypass the explicit-provider constraint in both call_llm() and acall_llm(). Transient rate-limit fallback still respects explicit provider choice.
- Tests: 6 new targeted tests for quota-error detection variants (Bedrock daily limit, Vertex AI RESOURCE_EXHAUSTED, generic daily quota phrases, etc.).
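The gating rule described in the fixes can be sketched as a small decision function. This is an illustration only, not the logic inside call_llm(): the `ErrorKind` enum and `should_try_fallback` are hypothetical names, and the handling of errors outside the three named categories (returning False) is an assumption, since the description does not cover them.

```python
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical error categories used only for this sketch."""
    PAYMENT = "payment"        # credit or quota exhaustion
    CONNECTION = "connection"  # provider unreachable
    RATE_LIMIT = "rate_limit"  # transient throttling
    OTHER = "other"


def should_try_fallback(kind: ErrorKind, resolved_provider: str) -> bool:
    """Decide whether to try alternative providers after a failure."""
    is_auto = resolved_provider == "auto"
    # Capacity errors: the provider cannot serve the request at all,
    # so fallback is allowed even for an explicitly chosen provider.
    if kind in (ErrorKind.PAYMENT, ErrorKind.CONNECTION):
        return True
    # Transient rate limits still honour an explicit provider choice.
    if kind is ErrorKind.RATE_LIMIT:
        return is_auto
    # Assumption: anything else is re-raised without fallback.
    return False
```

In this sketch, a quota failure on an explicit "custom" provider would proceed to fallback, whereas a rate-limit failure on the same provider would not.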
Impact