feat(auxiliary): add configurable fallback chains for auxiliary tasks (#26882) #26998

Closed
zccyman wants to merge 1 commit into NousResearch:main from atyou2happy:feat/auxiliary-fallback-chains-26882

zccyman commented May 16, 2026

Contributor

Summary

Add a configurable fallback_chain key for auxiliary tasks so that when an explicitly configured provider fails (quota exhaustion, rate limiting, or a connection error), the system automatically tries alternative providers instead of raising immediately.

Closes #26882

Problem

Currently, auxiliary task fallback only works when resolved_provider is "auto" (no explicit provider configured). When a user explicitly configures an auxiliary task (e.g., auxiliary.vision.provider: glm), the is_auto gate at the fallback entry point prevents any fallback:

is_auto = resolved_provider in {"auto", "", None}
if should_fallback and is_auto:  # ← blocks explicit-provider users

This means a user who configures auxiliary.vision.provider: glm gets zero recovery when glm fails — the error propagates directly.
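To make the effect of the gate concrete, here is a minimal sketch that isolates the two-line check quoted above. The function names old_gate and new_gate are hypothetical and exist only for illustration; the second function is an assumed shape of the fix, not the actual code in agent/auxiliary_client.py.

```python
from typing import Optional

# Values that the quoted snippet treats as "no explicit provider configured".
AUTO_VALUES = {"auto", "", None}


def old_gate(should_fallback: bool, resolved_provider: Optional[str]) -> bool:
    """Gate as quoted in the PR: fallback only for auto-resolved providers."""
    is_auto = resolved_provider in AUTO_VALUES
    return should_fallback and is_auto


def new_gate(should_fallback: bool, resolved_provider: Optional[str]) -> bool:
    """Assumed shape of the fix: the is_auto requirement is dropped."""
    return should_fallback


# Compare both gates for an auto user, an unset provider, and an explicit one.
for provider in ("auto", None, "glm"):
    print(provider, old_gate(True, provider), new_gate(True, provider))
```

With the old gate, an explicit provider such as glm never reaches the fallback path even when the error is fallback-worthy.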

Solution

  1. New config key: auxiliary.<task>.fallback_chain — a list of {provider, model, base_url?, api_key?} entries
  2. New function: _try_configured_fallback_chain() — reads the chain from config, tries each entry via resolve_provider_client()
  3. Modified fallback gate: Removed the is_auto requirement from the outer gate; auto-chain runs first (existing behaviour), then configured chain runs as fallback
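The chain walker described in item 2 could look roughly like the sketch below. It is not the PR's _try_configured_fallback_chain(); the signature of resolve_provider_client() is not shown in this thread, so the resolver and the call function are injected as hypothetical callables (resolve_client, call). Skipping entries whose client cannot be resolved is also an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

Entry = Dict[str, Any]


def try_configured_fallback_chain(
    chain: List[Entry],
    resolve_client: Callable[[Entry], Optional[Any]],  # hypothetical stand-in
    call: Callable[[Any, Entry], Any],                  # hypothetical stand-in
) -> Any:
    """Try each chain entry in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for entry in chain:
        client = resolve_client(entry)
        if client is None:
            continue  # assumed: unresolvable providers are skipped
        try:
            return call(client, entry)
        except Exception as exc:
            last_error = exc  # remember the failure and move to the next entry
    if last_error is not None:
        raise last_error
    raise RuntimeError("no usable entry in fallback_chain")


# Demo with fake callables: the first provider fails, the second succeeds.
def fake_resolve(entry: Entry) -> Optional[str]:
    return entry["provider"]


def fake_call(client: str, entry: Entry) -> str:
    if client == "openrouter":
        raise ConnectionError("simulated outage")
    return f"{client}:{entry['model']}"


chain = [
    {"provider": "openrouter", "model": "google/gemini-3-flash-preview"},
    {"provider": "nous", "model": "claude-sonnet-4"},
]
result = try_configured_fallback_chain(chain, fake_resolve, fake_call)
print(result)
```

Injecting the resolver keeps the sketch independent of the real provider-resolution code, which the PR says it reuses rather than duplicates.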

Config example

auxiliary:
  vision:
    provider: glm
    model: glm-4v-flash
    fallback_chain:
      - provider: openrouter
        model: google/gemini-3-flash-preview
      - provider: nous
        model: claude-sonnet-4
  compression:
    provider: openrouter
    fallback_chain:
      - provider: openai
        model: gpt-4o-mini
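As an illustration of how such a config might be read once the YAML has been parsed into a dict, the sketch below extracts and validates a task's chain. The helper read_fallback_chain is hypothetical. Treating provider and model as required follows the `{provider, model, base_url?, api_key?}` shape stated in the Solution section; raising on malformed entries (rather than skipping them) is an assumption.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("provider", "model")
OPTIONAL_KEYS = ("base_url", "api_key")


def read_fallback_chain(config: Dict[str, Any], task: str) -> List[Dict[str, str]]:
    """Return the validated fallback_chain for auxiliary.<task>, or [] if unset."""
    task_cfg = (config.get("auxiliary") or {}).get(task) or {}
    raw = task_cfg.get("fallback_chain") or []
    if not isinstance(raw, list):
        raise ValueError(f"auxiliary.{task}.fallback_chain must be a list")
    chain: List[Dict[str, str]] = []
    for index, item in enumerate(raw):
        if not isinstance(item, dict):
            raise ValueError(f"fallback_chain[{index}] must be a mapping")
        missing = [key for key in REQUIRED_KEYS if not item.get(key)]
        if missing:
            raise ValueError(f"fallback_chain[{index}] is missing {missing}")
        entry = {key: str(item[key]) for key in REQUIRED_KEYS}
        # Copy optional keys only when they are present and non-empty.
        entry.update({key: str(item[key]) for key in OPTIONAL_KEYS if item.get(key)})
        chain.append(entry)
    return chain


# The same structure as the YAML example, already parsed into Python objects.
config = {
    "auxiliary": {
        "vision": {
            "provider": "glm",
            "model": "glm-4v-flash",
            "fallback_chain": [
                {"provider": "openrouter", "model": "google/gemini-3-flash-preview"},
                {"provider": "nous", "model": "claude-sonnet-4"},
            ],
        },
        "compression": {
            "provider": "openrouter",
            "fallback_chain": [{"provider": "openai", "model": "gpt-4o-mini"}],
        },
    }
}
vision_chain = read_fallback_chain(config, "vision")
print(vision_chain)
```

A task with no fallback_chain key yields an empty list, which matches the "purely additive" intent: nothing changes for users who do not opt in.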

Behaviour

  1. Try primary provider first (unchanged)
  2. On fallback-worthy errors (429, 402, connection), try auto-detection chain (unchanged for auto users)
  3. New: If auto-chain doesn't find a fallback (or user has explicit provider), try fallback_chain entries in order
  4. If all entries fail, raise the last error
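The ordering above can be sketched as a single function. Everything here is illustrative: ProviderError, is_fallback_worthy, and call_with_fallback are hypothetical names, and each provider attempt is modeled as a zero-argument callable. Two assumptions are baked in: the auto-detection chain runs only for auto users, and a failing fallback attempt of any kind moves on to the next attempt.

```python
from typing import Any, Callable, List, Optional

Attempt = Callable[[], Any]


class ProviderError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.status = status


def is_fallback_worthy(exc: Exception) -> bool:
    """429 (rate limit), 402 (quota or payment), and connection errors qualify."""
    if isinstance(exc, ConnectionError):
        return True
    return isinstance(exc, ProviderError) and exc.status in {402, 429}


def call_with_fallback(
    primary: Attempt,
    auto_chain: List[Attempt],
    configured_chain: List[Attempt],
    is_auto: bool,
) -> Any:
    # Step 1: the primary provider is always tried first.
    try:
        return primary()
    except Exception as exc:
        if not is_fallback_worthy(exc):
            raise  # other errors propagate unchanged
        last_error: Exception = exc
    # Steps 2 and 3: auto chain (auto users only, assumed), then configured chain.
    attempts = (auto_chain if is_auto else []) + configured_chain
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:
            last_error = exc
    # Step 4: every layer failed, so surface the most recent error.
    raise last_error


def failing(exc: Exception) -> Attempt:
    """Build an attempt that always raises the given exception."""
    def attempt() -> Any:
        raise exc
    return attempt


outcome = call_with_fallback(
    primary=failing(ProviderError("quota exceeded", status=429)),
    auto_chain=[],
    configured_chain=[failing(ConnectionError("down")), lambda: "fallback ok"],
    is_auto=False,
)
print(outcome)
```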

Files Changed

| File | Change |
| --- | --- |
| agent/auxiliary_client.py | +103/-6: added _try_configured_fallback_chain(), _resolve_single_provider(), modified call_llm() and async_call_llm() fallback gates |

Test Results

295 passed, 1 skipped, 23362 deselected (all auxiliary tests)

Design Decisions

  • Config over code: Users define fallback chains in config.yaml, not in Python
  • Non-breaking: Existing auto fallback behaviour is unchanged; fallback_chain is purely additive
  • Task-agnostic: Works for ALL auxiliary tasks (vision, tts, compression, web_extract, etc.)
  • Reuses resolve_provider_client(): No duplicate provider resolution logic

…NousResearch#26882)

When a user explicitly configures an auxiliary task provider (e.g.
auxiliary.vision.provider=glm), the fallback logic is gated behind
is_auto=True and never triggers. This means a configured provider
that fails (quota, rate-limit, connection) raises immediately with
no recovery.

Add auxiliary.<task>.fallback_chain config key — a list of
{provider, model, base_url?, api_key?} entries tried in order when
the primary provider fails. Works for ALL auxiliary tasks (vision,
tts, compression, web_extract, etc.), not just auto-detected ones.

Closes NousResearch#26882
@alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on May 16, 2026
@alt-glitch

Collaborator

Competing with #25878 (vision fallback chain). Parent issue #26882 was previously triaged as duplicate of #22201. Related: #24039 (reuse fallback_providers).

zccyman commented May 16, 2026

Contributor Author

Thanks @alt-glitch for the cross-reference. Good catch on the crowded space.

A clarification on scope differentiation:

Both are convergent solutions to the same class of problem (auxiliary resilience), but at different granularities. Leaving open for upstream to decide on the preferred scope.

Related: #22201, #24039.

@teknium1

Contributor

Superseded by #27625 (merged). Your fallback_chain config schema + _try_configured_fallback_chain plumbing were salvaged and used directly — your commit is preserved on main with your authorship (a574246). On top of your work we added a main-agent safety net layer that always runs after the configured chain (so users without a chain still get fallback), and a user-visible warning when every layer exhausts. The capacity-error gate fix from @Bartok9's #26811 is bundled in the same PR. Thanks for the contribution!

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Configurable fallback chains for auxiliary tasks