feat(auxiliary): layered fallback (chain → main agent) + capacity-error gate fix by teknium1 · Pull Request #27625 · NousResearch/hermes-agent

teknium1 · 2026-05-17T21:46:04Z

Salvages #26811 (@Bartok9) AND #26998 (@zccyman) into a layered auxiliary fallback system. Closes #26803, closes #26882.

What this makes true

Auxiliary tasks (compression, vision, tts, web_extract, session_search, etc.) now follow a 4-step fallback ladder when the primary aux provider fails on a capacity error (402 payment, 429 quota, connection failure):

1. Primary aux provider (existing)
2. User-configured auxiliary.<task>.fallback_chain entries, in order
3. Main agent provider + model (last-resort safety net)
4. User-visible logger.warning + re-raise the original error

For users on auto (no explicit aux provider), the existing auto-detection chain runs instead — its Step 1 already IS the main agent model, so they get the same outcome with zero config.
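The ladder above can be sketched roughly as follows. This is an illustration only, not the code in `agent/auxiliary_client.py`: `CapacityError`, `Provider`, and `call_with_layered_fallback` are hypothetical names, and each provider is modelled as a plain callable.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("aux_fallback_sketch")


class CapacityError(Exception):
    """Hypothetical stand-in for a 402 payment, 429 quota, or connection failure."""


# Hypothetical type: a provider is a callable that takes a prompt and returns text.
Provider = Callable[[str], str]


def call_with_layered_fallback(
    prompt: str,
    primary: Provider,
    fallback_chain: Sequence[Provider] = (),
    main_agent: Optional[Provider] = None,
) -> str:
    """Walk the ladder: primary, chain entries in order, main agent, then warn and re-raise."""
    try:
        return primary(prompt)  # Step 1: primary aux provider
    except CapacityError as original:
        # Step 2 candidates come first, the main agent (step 3) is tried last.
        candidates = list(fallback_chain)
        if main_agent is not None:
            candidates.append(main_agent)
        for candidate in candidates:
            try:
                return candidate(prompt)
            except CapacityError:
                continue  # this layer is also out of capacity, try the next one
        # Step 4: everything failed, so warn and surface the ORIGINAL error.
        logger.warning("all auxiliary fallbacks exhausted; re-raising original error")
        raise original
```

Re-raising the original exception rather than the last fallback's error keeps the message tied to the provider the user actually configured.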

Config schema

auxiliary:
  vision:
    provider: glm
    model: glm-4v-flash
    fallback_chain:
      - provider: openai
        model: gpt-4o-mini
      - provider: nous
        model: anthropic/claude-sonnet-4

If fallback_chain is omitted, the user still gets main-agent fallback for free. The chain is optional ordering preference, not required setup.
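As a rough sketch of how such a config could be read once the YAML has been parsed into a dict: the helper name `fallback_candidates` is hypothetical, and treating a missing `model` as an empty string is an assumption of this example, not confirmed behaviour of the PR.

```python
from typing import Any, Dict, List, Tuple


def fallback_candidates(config: Dict[str, Any], task: str) -> List[Tuple[str, str]]:
    """Return (provider, model) pairs from auxiliary.<task>.fallback_chain, in order.

    Hypothetical helper: an omitted chain yields an empty list, which mirrors
    the "chain is optional" behaviour described above.
    """
    task_cfg = (config.get("auxiliary") or {}).get(task) or {}
    chain = task_cfg.get("fallback_chain") or []
    pairs: List[Tuple[str, str]] = []
    for entry in chain:
        if not isinstance(entry, dict):
            continue  # ignore malformed entries in this sketch
        provider = entry.get("provider")
        if not provider:
            continue  # an entry without a provider cannot be resolved
        # Assumption: a missing model is represented as an empty string.
        pairs.append((provider, entry.get("model") or ""))
    return pairs


# Dict equivalent of the YAML example above.
config = {
    "auxiliary": {
        "vision": {
            "provider": "glm",
            "model": "glm-4v-flash",
            "fallback_chain": [
                {"provider": "openai", "model": "gpt-4o-mini"},
                {"provider": "nous", "model": "anthropic/claude-sonnet-4"},
            ],
        }
    }
}
print(fallback_candidates(config, "vision"))
```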

Underlying bug fixes (from #26811)

_is_payment_error() now recognizes daily/monthly quota exhaustion phrases used by Bedrock, Vertex AI, LiteLLM proxies (quota exceeded, too many tokens per day, daily limit, resource exhausted). Previously these were misclassified as transient rate limits and silently raised on explicit providers.
Fallback gate — capacity errors (payment/quota + connection) bypass the explicit-provider constraint. Transient rate-limit fallback still respects explicit provider choice (a 429 retry-after is a request constraint, not a capacity problem).
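The two fixes can be illustrated with a minimal sketch. The function names and the boolean-flag interface below are hypothetical; only the four phrases and the gating rule come from the description above, and the real `_is_payment_error()` also checks other signals (billing keywords, status codes) that are not reproduced here.

```python
# Quota-exhaustion phrases listed in the PR description.
QUOTA_PHRASES = (
    "quota exceeded",
    "too many tokens per day",
    "daily limit",
    "resource exhausted",
)


def looks_like_quota_exhaustion(message: str) -> bool:
    """Case-insensitive substring match against the quota phrases."""
    text = message.lower()
    return any(phrase in text for phrase in QUOTA_PHRASES)


def should_try_fallback(*, is_auto: bool, is_capacity_error: bool, is_rate_limit: bool) -> bool:
    """Hypothetical gate: capacity errors always fall back, rate limits only on 'auto'."""
    if is_capacity_error:
        # Payment/quota/connection failures bypass the explicit-provider constraint.
        return True
    if is_rate_limit:
        # A transient 429 still respects an explicit provider choice.
        return is_auto
    return False


print(looks_like_quota_exhaustion("Too many tokens per day for this model"))
print(should_try_fallback(is_auto=False, is_capacity_error=True, is_rate_limit=False))
```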

Changes

| File | What |
| --- | --- |
| `agent/auxiliary_client.py` | +176/-4: `_try_main_agent_model_fallback()` helper, layered fallback in `call_llm`/`async_call_llm`, exhaustion warning, quota-keyword detection |
| `tests/agent/test_auxiliary_client.py` | +147: 6 quota-keyword tests, 3 fallback-layering tests, 4 main-agent-helper tests, 2 eviction tests adapted to new gate |
| `scripts/release.py` | +1: AUTHOR_MAP entry for @zccyman noreply email |

Validation

scripts/run_tests.sh tests/agent/test_auxiliary_client.py → 171 passed, 1 pre-existing failure on main (test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api, unrelated to this PR).

E2E verified with real imports: layered fallback resolves the configured chain entries in order, falls back to the user's actual main agent provider+model when chain exhausts, and emits the warning when both layers fail.

Credit

@Bartok9 — original _is_payment_error quota-keyword fix and capacity-error gate relaxation (fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers #26811)
@zccyman — fallback_chain config schema, _try_configured_fallback_chain, _resolve_single_provider (feat(auxiliary): add configurable fallback chains for auxiliary tasks (#26882) #26998)
@teknium1 — main-agent safety net layer, exhaustion warning, ordering, eviction-test fix-up, layering tests

Closes #26803, closes #26882. Also closes #26998 and #26809 as superseded.

…ity-error fallback for explicit providers

Closes #26803

Root causes:

1. _is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies, e.g. 'Too many tokens per day', 'quota exceeded', 'resource exhausted', 'daily limit'. These are functionally identical to credit exhaustion (provider cannot serve the request) but don't trigger fallback.

2. The call_llm() fallback chain was gated on resolved_provider == 'auto'. When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM proxy, or 'openrouter'), capacity failures (payment/quota/connection) silently raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider *cannot* serve the request regardless of user intent, so alternatives should always be tried.

Fixes:

- Add quota-related keywords to _is_payment_error(): quota_exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when resolved_provider is not 'auto'. Rate-limit fallback stays gated on is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.

The two TestAuxiliaryClientPoisonedCacheEviction tests were written when explicit-provider users got no fallback at all on connection errors — they asserted ConnectionError propagated after eviction because the fallback gate blocked the auto chain. After the #26803 fix in the previous commit, capacity errors (payment/quota/connection) now DO trigger fallback even on explicit providers. The tests still verify cache eviction (their actual contract) but now stub _try_payment_fallback so the fallback machinery does not attempt a real network call.

github-actions · 2026-05-17T21:46:40Z

🔎 Lint report: `hermes/hermes-60aba8ad` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8747 on HEAD, 8732 on base (🆕 +15)

🆕 New issues (3):

| Rule | Count |
| --- | --- |
| `unknown-argument` | 2 |
| `no-matching-overload` | 1 |

First entries

agent/auxiliary_client.py:2719: [unknown-argument] unknown-argument: Argument `base_url` does not match any known parameter of function `resolve_provider_client`
agent/auxiliary_client.py:2679: [no-matching-overload] no-matching-overload: No overload of bound method `dict.get` matches arguments
agent/auxiliary_client.py:2720: [unknown-argument] unknown-argument: Argument `api_key` does not match any known parameter of function `resolve_provider_client`

✅ Fixed issues: none

Unchanged: 4603 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@zccyman

… net

Layered fallback for auxiliary tasks (compression, vision, tts, web_extract, session_search, etc.):

1. Primary aux provider (existing)
2. User-configured auxiliary.<task>.fallback_chain (new)
3. Main agent provider + model (new; last-resort safety net)
4. Warn user + re-raise original error (new)

For users on 'auto' (no explicit aux provider), the existing _try_payment_fallback auto-detection chain runs instead. Its Step 1 already IS the main agent model, so they get the same behaviour without configuration.

The configured fallback_chain config schema comes from #26882 / @zccyman; the main-agent safety net + exhaustion warning were added on top.

Closes #26882. Builds on the capacity-error gate fix in the previous commit (#26803 / @Bartok9).

7 new tests:

TestAuxiliaryFallbackLayering (3):

- configured_chain succeeds → main agent fallback NOT consulted
- chain returns nothing → main agent fallback runs and succeeds
- both exhausted → user-visible 'all fallbacks exhausted' warning fires before the original error is re-raised

TestTryMainAgentModelFallback (4):

- returns (None, None, "") when main provider is 'auto'
- returns (None, None, "") when failed provider == main provider (no point retrying the same backend)
- resolves the main provider's client when configured correctly
- skips when main provider is marked unhealthy
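The four TestTryMainAgentModelFallback cases suggest guard logic along these lines. This is a hedged sketch, not the real `_try_main_agent_model_fallback()`: the parameter list, the `resolve_client` callback, the `unhealthy` set, and the meaning of the third tuple element (assumed here to be a provider label) are all assumptions made for illustration.

```python
from typing import Callable, Optional, Set, Tuple

# Sentinel matching the (None, None, "") shape mentioned in the test descriptions.
NO_FALLBACK: Tuple[None, None, str] = (None, None, "")


def try_main_agent_fallback_sketch(
    main_provider: str,
    main_model: str,
    failed_provider: str,
    unhealthy: Set[str],
    resolve_client: Callable[[str, str], Optional[object]],
) -> Tuple[Optional[object], Optional[str], str]:
    """Return (client, model, label) for the main agent, or NO_FALLBACK."""
    if not main_provider or main_provider == "auto":
        return NO_FALLBACK  # nothing concrete to fall back to
    if main_provider == failed_provider:
        return NO_FALLBACK  # no point retrying the backend that just failed
    if main_provider in unhealthy:
        return NO_FALLBACK  # skip providers already marked unhealthy
    client = resolve_client(main_provider, main_model)
    if client is None:
        return NO_FALLBACK
    # Assumption: the third element is a human-readable provider label.
    return client, main_model, main_provider
```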

BoardJames-Bot · 2026-05-17T23:38:01Z

CI triage note from rock-turning: test/build-arm64 were still running when I checked, but the current blocking completed check is CodeQL (6 high "Clear-text logging of sensitive information" annotations in agent/auxiliary_client.py). The annotations point at fallback logging that includes exception/model fields from aux provider config paths.

I made the narrow local fix on top of this PR as commit 8888e60f9 (fix(auxiliary): avoid sensitive fallback log fields): remove the raw exception/model/default-model values from the fallback log records while keeping task/reason/provider/fallback labels. Targeted verification:

HOME=/Users/spencer scripts/run_tests.sh \
  tests/agent/test_auxiliary_client.py::TestIsPaymentError \
  tests/agent/test_auxiliary_client.py::TestAuxiliaryFallbackLayering \
  tests/agent/test_auxiliary_client.py::TestTryMainAgentModelFallback -q
# 20 passed

I also ran the full tests/agent/test_auxiliary_client.py; it hit an existing TestAuxiliaryClientPoisonedCacheEviction::test_codex_timeout_evicts_cached_wrapper failure unrelated to this log-field edit. Pushing the local fix is blocked: BoardJames-Bot gets "Permission to NousResearch/hermes-agent.git denied". Next owner: a maintainer with branch push rights should apply the same four log-call edits (or cherry-pick local 8888e60f9) so CodeQL can rerun.
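The kind of log-field edit described above might look like the sketch below. It is not the content of commit 8888e60f9: `log_fallback` and `reason_label` are hypothetical names, and the only idea taken from the comment is that the record keeps task/reason/provider/fallback labels while the raw exception text stays out of the message.

```python
import logging

logger = logging.getLogger("aux_fallback_log_sketch")


def reason_label(exc: Exception) -> str:
    """Map an exception to a coarse, non-sensitive label (hypothetical categories)."""
    if isinstance(exc, ConnectionError):
        return "connection error"
    return "capacity error"


def log_fallback(task: str, exc: Exception, provider: str, fallback_label: str) -> None:
    """Log a fallback event without interpolating str(exc) or model names."""
    logger.warning(
        "Auxiliary %s: %s on provider %s, trying fallback %s",
        task,
        reason_label(exc),  # a fixed label, never the raw exception text
        provider,
        fallback_label,
    )


# The exception text may carry credentials or URLs; none of it reaches the log.
log_fallback("vision", ConnectionError("api_key=sk-secret"), "glm", "main agent")
```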

BoardJames-Bot · 2026-05-17T23:45:59Z

Follow-up status update: the previously pending test and build-arm64 checks have now completed successfully. The only remaining red check on this PR is still CodeQL for the sensitive clear-text logging annotations described above. I retried pushing local fix 8888e60f9 to hermes/hermes-60aba8ad, but BoardJames-Bot is still denied write access to NousResearch/hermes-agent.git; maintainer/branch-owner action is still needed to apply that log-field edit and rerun CodeQL.

Adds a new 'Auxiliary Capacity-Error Fallback' section to website/docs/user-guide/features/fallback-providers.md covering:

- The 4-step ladder (primary → fallback_chain → main agent → warn)
- Which errors trigger fallback (402, 429 quota, connection) vs which respect explicit provider choice (transient 429 rate limits)
- Optional fallback_chain config schema with vision + compression examples
- Recognized quota-error phrases (Bedrock, Vertex AI, generic)

Updates the bottom summary table: every auxiliary task now shows 'Layered (see above)' instead of 'Auto-detection chain' since explicit-provider users also get the main-agent safety net.

Bartok9 and others added 2 commits May 17, 2026 14:44

alt-glitch added the type/bug, P2, and comp/agent labels May 17, 2026

kagura-agent mentioned this pull request May 17, 2026

fix(auxiliary): detect quota keywords in _is_payment_error and allow fallback for explicit providers #26809

Closed

zccyman and others added 3 commits May 17, 2026 16:25

chore(release): map zccyman noreply email for #26998

a78b331

teknium1 changed the title ~~fix(auxiliary): capacity-error fallback for explicit providers~~ feat(auxiliary): layered fallback (chain → main agent) + capacity-error gate fix May 17, 2026

github-advanced-security AI found potential problems May 17, 2026

teknium1 merged commit 43e566f into main May 18, 2026
21 of 22 checks passed

teknium1 deleted the hermes/hermes-60aba8ad branch May 18, 2026 00:15

This was referenced May 18, 2026

fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers #26811

Closed

feat(auxiliary): add configurable fallback chains for auxiliary tasks (#26882) #26998

Closed

Haderach-Ram mentioned this pull request May 18, 2026

Ecosystem Digest — 2026-05-18 Haderach-Ram/openclaw-radar#11

Open

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

teknium1 merged 6 commits into main from hermes/hermes-60aba8ad


6 participants
