fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers by Bartok9 · Pull Request #26811 · NousResearch/hermes-agent

Bartok9 · 2026-05-16T07:36:57Z

Root Causes

1. Daily quota exhaustion not classified as fallback-worthy

_is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies:

- "Too many tokens per day" (Bedrock / LiteLLM)
- "quota exceeded" / "quota_exceeded" (Vertex AI, GCP)
- "resource exhausted" (Vertex AI gRPC code)
- "daily limit" / "daily quota" / "tokens per day"

These are functionally identical to credit exhaustion — the provider cannot serve the request until the quota resets — but didn't trigger provider fallback.
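As a rough illustration of the classification described above, here is a minimal sketch of keyword-based detection. It is not the actual `_is_payment_error()` from `auxiliary_client.py`: the function name `is_payment_error_sketch`, the two tuples, and the underscore normalization are assumptions made for this example. Only the keyword phrases themselves come from this PR description.

```python
# Hypothetical sketch; the real _is_payment_error() may be structured differently.

# Billing keywords the PR says were already checked.
BILLING_KEYWORDS = ("credits", "insufficient funds", "billing", "payment required")

# Quota keywords this PR adds.
QUOTA_KEYWORDS = (
    "quota exceeded",
    "too many tokens per day",
    "daily limit",
    "tokens per day",
    "daily quota",
    "resource exhausted",
)


def is_payment_error_sketch(exc: Exception) -> bool:
    """Return True if the error text looks like credit or quota exhaustion."""
    # Assumption: lowercase the message and treat underscores as spaces so that
    # "quota_exceeded" and "RESOURCE_EXHAUSTED" match the space-separated phrases.
    message = str(exc).lower().replace("_", " ")
    return any(keyword in message for keyword in BILLING_KEYWORDS + QUOTA_KEYWORDS)
```

Substring matching keeps the check independent of any one provider's exception types, at the cost of possible false positives on unrelated messages that happen to contain one of these phrases.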

2. Fallback chain gated on `resolved_provider == 'auto'` only

When a task resolves to a specific provider (e.g. "custom" for a LiteLLM proxy or "openrouter"), capacity failures (payment/quota/connection) would raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider cannot serve the request regardless of user intent.

Fixes

- _is_payment_error(): add the quota-related keywords quota exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted.
- Fallback gate: capacity errors (payment/quota + connection) bypass the explicit-provider constraint in both call_llm() and acall_llm(). Transient rate-limit fallback still respects explicit provider choice.
- Tests: 6 new targeted tests for quota-error detection variants (Bedrock daily limit, Vertex AI RESOURCE_EXHAUSTED, generic daily quota phrases, etc.).
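The relaxed gate from the fixes above could be sketched roughly as follows. This is an illustration only: `FailureKind` and `should_try_fallback` are hypothetical names that do not appear in the PR, and the treatment of unclassified errors (no fallback) is an assumption. The actual logic lives inside `call_llm()` and `acall_llm()`.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical error classification, for illustration only."""
    PAYMENT_OR_QUOTA = "payment_or_quota"
    CONNECTION = "connection"
    RATE_LIMIT = "rate_limit"
    OTHER = "other"


def should_try_fallback(kind: FailureKind, resolved_provider: str) -> bool:
    """Decide whether to try alternative providers after a failure."""
    is_auto = resolved_provider == "auto"
    # Capacity errors: the provider cannot serve the request at all,
    # so fallback is allowed even for an explicitly chosen provider.
    if kind in (FailureKind.PAYMENT_OR_QUOTA, FailureKind.CONNECTION):
        return True
    # Transient rate limits: honour an explicit provider choice.
    if kind is FailureKind.RATE_LIMIT:
        return is_auto
    # Assumption: any other error is re-raised without fallback.
    return False
```

For example, `should_try_fallback(FailureKind.PAYMENT_OR_QUOTA, "custom")` allows fallback for a LiteLLM proxy that hit its daily limit, while `should_try_fallback(FailureKind.RATE_LIMIT, "openrouter")` does not.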

Impact

- Context compaction no longer silently drops conversation history when the primary provider hits daily limits
- Deployments using LiteLLM → Bedrock (daily token limit) with Anthropic fallback now automatically switch providers
- No behaviour change for transient rate limits with explicit providers

…ity-error fallback for explicit providers

Closes NousResearch#26803

Root causes:

1. _is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies, e.g. 'Too many tokens per day', 'quota exceeded', 'resource exhausted', 'daily limit'. These are functionally identical to credit exhaustion (provider cannot serve the request) but don't trigger fallback.

2. The call_llm() fallback chain was gated on resolved_provider == 'auto'. When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM proxy, or 'openrouter'), capacity failures (payment/quota/connection) silently raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider *cannot* serve the request regardless of user intent, so alternatives should always be tried.

Fixes:

- Add quota-related keywords to _is_payment_error(): quota_exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when resolved_provider is not 'auto'. Rate-limit fallback stays gated on is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.

alt-glitch · 2026-05-16T07:46:32Z

Duplicate of #26809 — both PRs fix the same two root causes in _is_payment_error() (missing quota keywords) and the is_auto fallback gate in auxiliary_client.py. Both close #26803.

teknium1 · 2026-05-18T00:15:56Z

Superseded by #27625 (merged). Your quota-keyword detection in _is_payment_error and the capacity-error gate relaxation were salvaged onto current main — your commit is preserved with your authorship (24c209f), plus your 6 quota-detection tests. On top of your fix we layered @zccyman's #26998 fallback_chain schema and added a main-agent safety net. Thanks!

alt-glitch added the type/bug (Something isn't working), P2 (Medium, degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels May 16, 2026

teknium1 mentioned this pull request May 17, 2026

feat(auxiliary): layered fallback (chain → main agent) + capacity-error gate fix #27625

Merged

teknium1 closed this May 18, 2026

This was referenced May 18, 2026

feat(auxiliary): add configurable fallback chains for auxiliary tasks (#26882) #26998

Closed

fix(auxiliary): detect quota keywords in _is_payment_error and allow fallback for explicit providers #26809

Closed

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/26803-quota-rate-limit-fallback
