feat(auxiliary): layered fallback (chain → main agent) + capacity-error gate fix #27625

Merged
teknium1 merged 6 commits into main from hermes/hermes-60aba8ad
May 18, 2026

Conversation

@teknium1 teknium1 commented May 17, 2026

Salvages #26811 (@Bartok9) AND #26998 (@zccyman) into a layered auxiliary fallback system. Closes #26803, closes #26882.

What this makes true

Auxiliary tasks (compression, vision, tts, web_extract, session_search, etc.) now follow a 4-step fallback ladder when the primary aux provider fails on a capacity error (402 payment, 429 quota, connection failure):

  1. Primary aux provider (existing)
  2. User-configured auxiliary.<task>.fallback_chain entries, in order
  3. Main agent provider + model (last-resort safety net)
  4. User-visible logger.warning + re-raise the original error

For users on auto (no explicit aux provider), the existing auto-detection chain runs instead — its Step 1 already IS the main agent model, so they get the same outcome with zero config.
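The ladder above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/auxiliary_client.py: `CapacityError`, `Provider`, and `call_with_layered_fallback` are hypothetical names, and providers are modeled as plain callables.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("aux_fallback_sketch")


class CapacityError(Exception):
    """Hypothetical stand-in for a 402 payment, 429 quota, or connection failure."""


# Assumption: a provider is modeled as a callable that returns text or raises.
Provider = Callable[[str], str]


def call_with_layered_fallback(
    prompt: str,
    primary: Provider,
    fallback_chain: Sequence[Provider] = (),
    main_agent: Optional[Provider] = None,
) -> str:
    try:
        return primary(prompt)  # Step 1: primary aux provider
    except CapacityError as original:
        candidates = list(fallback_chain)  # Step 2: configured chain, in order
        if main_agent is not None:
            candidates.append(main_agent)  # Step 3: main agent as last resort
        for candidate in candidates:
            try:
                return candidate(prompt)
            except Exception:
                continue  # try the next layer
        # Step 4: warn, then surface the original failure
        logger.warning("all auxiliary fallbacks exhausted")
        raise original
```

Errors other than `CapacityError` propagate from the primary call unchanged in this sketch, which mirrors the idea that only capacity failures open the ladder.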

Config schema

auxiliary:
  vision:
    provider: glm
    model: glm-4v-flash
    fallback_chain:
      - provider: openai
        model: gpt-4o-mini
      - provider: nous
        model: anthropic/claude-sonnet-4

If fallback_chain is omitted, the user still gets main-agent fallback for free. The chain is optional ordering preference, not required setup.
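As a rough illustration of how such a schema could be consumed, the sketch below reads the chain for one task from an already-parsed config dict. The helper name `read_fallback_chain` and the choice to skip incomplete entries are assumptions made for this example, not behavior confirmed by the PR.

```python
from typing import Any, Dict, List, Tuple


def read_fallback_chain(config: Dict[str, Any], task: str) -> List[Tuple[str, str]]:
    """Return (provider, model) pairs for a task, or [] when no chain is set."""
    task_cfg = (config.get("auxiliary") or {}).get(task) or {}
    chain = task_cfg.get("fallback_chain") or []
    pairs: List[Tuple[str, str]] = []
    for entry in chain:
        if not isinstance(entry, dict):
            continue  # assumption: malformed entries are ignored
        provider = entry.get("provider")
        model = entry.get("model")
        if provider and model:
            pairs.append((str(provider), str(model)))
    return pairs


# Mirrors the YAML above after parsing.
example_config = {
    "auxiliary": {
        "vision": {
            "provider": "glm",
            "model": "glm-4v-flash",
            "fallback_chain": [
                {"provider": "openai", "model": "gpt-4o-mini"},
                {"provider": "nous", "model": "anthropic/claude-sonnet-4"},
            ],
        }
    }
}
```

An empty list here corresponds to the "chain omitted" case, where only the main-agent fallback remains.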

Underlying bug fixes (from #26811)

  • _is_payment_error() now recognizes daily/monthly quota exhaustion phrases used by Bedrock, Vertex AI, LiteLLM proxies (quota exceeded, too many tokens per day, daily limit, resource exhausted). Previously these were misclassified as transient rate limits and silently raised on explicit providers.
  • Fallback gate — capacity errors (payment/quota + connection) bypass the explicit-provider constraint. Transient rate-limit fallback still respects explicit provider choice (a 429 retry-after is a request constraint, not a capacity problem).
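The two fixes can be pictured with the sketch below. The phrase list is taken from the bullets above; the function names and the string-valued `error_kind` are hypothetical and do not reflect the real signature of _is_payment_error().

```python
# Phrases listed in this PR description; matching is case-insensitive here.
QUOTA_PHRASES = (
    "quota exceeded",
    "too many tokens per day",
    "daily limit",
    "resource exhausted",
)


def looks_like_quota_exhaustion(message: str) -> bool:
    """Return True when an error message mentions a known quota phrase."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in QUOTA_PHRASES)


def should_try_fallback(error_kind: str, provider_is_auto: bool) -> bool:
    """Gate sketch: capacity errors always fall back, rate limits only on auto."""
    if error_kind in ("payment", "connection"):
        return True  # the provider cannot serve the request at all
    if error_kind == "rate_limit":
        return provider_is_auto  # respect an explicit provider choice
    return False
```

Under this gate an explicit provider that returns a transient rate limit still raises, while the same provider reporting quota exhaustion moves on to the next layer.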

Changes

| File | What |
| --- | --- |
| agent/auxiliary_client.py | +176/-4: _try_main_agent_model_fallback() helper, layered fallback in call_llm/async_call_llm, exhaustion warning, quota-keyword detection |
| tests/agent/test_auxiliary_client.py | +147: 6 quota-keyword tests, 3 fallback-layering tests, 4 main-agent-helper tests, 2 eviction tests adapted to new gate |
| scripts/release.py | +1: AUTHOR_MAP entry for @zccyman noreply email |

Validation

scripts/run_tests.sh tests/agent/test_auxiliary_client.py → 171 passed, 1 pre-existing failure on main (test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api, unrelated to this PR).

E2E verified with real imports: layered fallback resolves the configured chain entries in order, falls back to the user's actual main agent provider+model when the chain is exhausted, and emits the warning when both layers fail.

Credit

Closes #26803 and #26882. Closes #26998 and #26809 as superseded.

Bartok9 and others added 2 commits May 17, 2026 14:44
…ity-error fallback for explicit providers

Closes #26803

Root causes:
1. _is_payment_error() checked for billing keywords (credits, insufficient
   funds, billing, payment required) but missed daily token quota exhaustion
   phrases used by Bedrock, Vertex AI, and LiteLLM proxies — e.g.
   'Too many tokens per day', 'quota exceeded', 'resource exhausted',
   'daily limit'. These are functionally identical to credit exhaustion
   (provider cannot serve the request) but don't trigger fallback.

2. The call_llm() fallback chain was gated on resolved_provider == 'auto'.
   When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM
   proxy, or 'openrouter'), capacity failures (payment/quota/connection)
   silently raise instead of trying alternatives. This is overly conservative:
   capacity errors mean the provider *cannot* serve the request regardless of
   user intent, so alternatives should always be tried.

Fixes:
- Add quota-related keywords to _is_payment_error(): quota_exceeded,
  too many tokens per day, daily limit, tokens per day, daily quota,
  resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when
  resolved_provider is not 'auto'. Rate-limit fallback stays gated on
  is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.

The two TestAuxiliaryClientPoisonedCacheEviction tests were written
when explicit-provider users got no fallback at all on connection
errors — they asserted ConnectionError propagated after eviction
because the fallback gate blocked the auto chain.

After the #26803 fix in the previous commit, capacity errors
(payment/quota/connection) now DO trigger fallback even on explicit
providers. The tests still verify cache eviction (their actual
contract) but now stub _try_payment_fallback so the fallback
machinery does not attempt a real network call.
@github-actions

github-actions Bot commented May 17, 2026

🔎 Lint report: hermes/hermes-60aba8ad vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8747 on HEAD, 8732 on base (🆕 +15)

🆕 New issues (3):

| Rule | Count |
| --- | --- |
| unknown-argument | 2 |
| no-matching-overload | 1 |

First entries:
agent/auxiliary_client.py:2719: [unknown-argument] unknown-argument: Argument `base_url` does not match any known parameter of function `resolve_provider_client`
agent/auxiliary_client.py:2679: [no-matching-overload] no-matching-overload: No overload of bound method `dict.get` matches arguments
agent/auxiliary_client.py:2720: [unknown-argument] unknown-argument: Argument `api_key` does not match any known parameter of function `resolve_provider_client`

✅ Fixed issues: none

Unchanged: 4603 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch added the type/bug, P2, and comp/agent labels on May 17, 2026
zccyman and others added 3 commits May 17, 2026 16:25
… net

Layered fallback for auxiliary tasks (compression, vision, tts, web_extract,
session_search, etc.):

  1. Primary aux provider (existing)
  2. User-configured auxiliary.<task>.fallback_chain (new)
  3. Main agent provider + model (new — last-resort safety net)
  4. Warn user + re-raise original error (new)

For users on 'auto' (no explicit aux provider), the existing
_try_payment_fallback auto-detection chain runs instead — its Step 1
already IS the main agent model, so they get the same behaviour without
configuration.

The configured fallback_chain config schema comes from #26882 / @zccyman;
the main-agent safety net + exhaustion warning were added on top.

Closes #26882. Builds on the capacity-error gate fix in the previous
commit (#26803 / @Bartok9).

7 new tests:

TestAuxiliaryFallbackLayering (3):
  - configured_chain succeeds → main agent fallback NOT consulted
  - chain returns nothing → main agent fallback runs and succeeds
  - both exhausted → user-visible 'all fallbacks exhausted' warning
    fires before the original error is re-raised

TestTryMainAgentModelFallback (4):
  - returns (None, None, "") when main provider is 'auto'
  - returns (None, None, "") when failed provider == main provider
    (no point retrying the same backend)
  - resolves the main provider's client when configured correctly
  - skips when main provider is marked unhealthy
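
The four helper cases listed above suggest a shape like the following sketch. Everything here is hypothetical: the function name, the `resolve_client` callback, the `unhealthy` set, and the meaning of the third tuple element (assumed to be a label string) are illustration only, since the commit message gives just the `(None, None, "")` return value.

```python
from typing import Any, Callable, Collection, Optional, Tuple

# Sentinel matching the (None, None, "") result named in the test list.
NO_FALLBACK: Tuple[None, None, str] = (None, None, "")


def try_main_agent_fallback(
    failed_provider: str,
    main_provider: str,
    main_model: str,
    resolve_client: Callable[[str, str], Optional[Any]],
    unhealthy: Collection[str] = frozenset(),
) -> Tuple[Optional[Any], Optional[str], str]:
    if main_provider == "auto":
        return NO_FALLBACK  # nothing concrete to fall back to
    if main_provider == failed_provider:
        return NO_FALLBACK  # no point retrying the same backend
    if main_provider in unhealthy:
        return NO_FALLBACK  # assumption: unhealthy providers are skipped
    client = resolve_client(main_provider, main_model)
    if client is None:
        return NO_FALLBACK
    # Assumption: the third element is a human-readable label.
    return client, main_model, f"main-agent:{main_provider}"
```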
@teknium1 changed the title from "fix(auxiliary): capacity-error fallback for explicit providers" to "feat(auxiliary): layered fallback (chain → main agent) + capacity-error gate fix" on May 17, 2026
@BoardJames-Bot

CI triage note from rock-turning: test/build-arm64 were still running when I checked, but the current blocking completed check is CodeQL (6 high "Clear-text logging of sensitive information" annotations in agent/auxiliary_client.py). The annotations point at fallback logging that includes exception/model fields from aux provider config paths.

I made the narrow local fix on top of this PR as commit 8888e60f9 (fix(auxiliary): avoid sensitive fallback log fields): remove the raw exception/model/default-model values from the fallback log records while keeping task/reason/provider/fallback labels. Targeted verification:

HOME=/Users/spencer scripts/run_tests.sh \
  tests/agent/test_auxiliary_client.py::TestIsPaymentError \
  tests/agent/test_auxiliary_client.py::TestAuxiliaryFallbackLayering \
  tests/agent/test_auxiliary_client.py::TestTryMainAgentModelFallback -q
# 20 passed

I also ran the full tests/agent/test_auxiliary_client.py; it hit an existing TestAuxiliaryClientPoisonedCacheEviction::test_codex_timeout_evicts_cached_wrapper failure unrelated to this log-field edit. I could not push the local fix: BoardJames-Bot is denied permission to NousResearch/hermes-agent.git. Next owner: a maintainer with branch push rights should apply the same four log-call edits (or cherry-pick local 8888e60f9) so CodeQL can rerun.
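
For readers unfamiliar with this class of CodeQL finding, the kind of edit described above might look like the sketch below. The logger name, function name, and message format are hypothetical; the only point carried over from the comment is that the record keeps task/reason/provider/fallback labels and omits raw exception text and model values.

```python
import logging

logger = logging.getLogger("aux_fallback_log_sketch")


def log_fallback(task: str, reason: str, provider: str, fallback_label: str) -> None:
    """Log a fallback event using coarse labels only.

    `reason` is assumed to be a classification such as "payment" or
    "connection", never str(exception), so provider responses that might
    echo credentials or config values do not reach the log.
    """
    logger.warning(
        "auxiliary %s: %s error on provider %s, trying %s",
        task,
        reason,
        provider,
        fallback_label,
    )
```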

@BoardJames-Bot

Follow-up status update: the previously pending test and build-arm64 checks have now completed successfully. The only remaining red check on this PR is still CodeQL for the sensitive clear-text logging annotations described above. I retried pushing local fix 8888e60f9 to hermes/hermes-60aba8ad, but BoardJames-Bot is still denied write access to NousResearch/hermes-agent.git; maintainer/branch-owner action is still needed to apply that log-field edit and rerun CodeQL.

Adds a new 'Auxiliary Capacity-Error Fallback' section to
website/docs/user-guide/features/fallback-providers.md covering:

- The 4-step ladder (primary → fallback_chain → main agent → warn)
- Which errors trigger fallback (402, 429 quota, connection) vs
  which respect explicit provider choice (transient 429 rate limits)
- Optional fallback_chain config schema with vision + compression examples
- Recognized quota-error phrases (Bedrock, Vertex AI, generic)

Updates the bottom summary table — every auxiliary task now shows
'Layered (see above)' instead of 'Auto-detection chain' since
explicit-provider users also get the main-agent safety net.

Labels

comp/agent: Core agent loop, run_agent.py, prompt builder
P2: Medium (degraded but workaround exists)
type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

[Feature] Configurable fallback chains for auxiliary tasks
Auxiliary call_llm fallback doesn't trigger on provider rate limits (429 daily quota)

6 participants