The Design Issue
Hermes maintains two parallel fallback systems that don't know about each other:
User's fallback_providers (config.yaml), used by the main agent:
  1. nvidia/ring-2.6-1t:free
  2. nvidia/deepseek-v4-pro
  3. nvidia/glm-5.1
  4. nvidia/minimax-m2.7
  ...

Hardcoded aux fallback chain, used by every aux task (compression, vision, title, web_extract, curator, etc.):
  1. openrouter → google/gemini-3-flash-preview (PAID)
  2. nous → same paid model
  3. custom
  4. api-key providers

The two chains ignore each other.
When the main provider fails, the main agent walks the user's
fallback_providers. But every auxiliary task (compression,
title_generation, vision, web_extract, session_search,
skills_hub, approval, mcp, triage_specifier, curator) walks a
separate, hardcoded fallback chain with hardcoded default models —
mostly paid models like google/gemini-3-flash-preview,
claude-haiku-4-5, glm-4.5-flash, etc.
This violates the single-source-of-truth principle: the user
configured a fallback chain in one place, and most users assume
that is the only place the agent looks.
Why This Matters
Free-tier users who explicitly configure only :free models in
fallback_providers still get charged (or hit per-key spend limits)
because aux tasks invisibly use a paid default. The user has no way to
discover this short of reading the source code.
Even paid users are affected: a user who carefully picked a budget-
friendly fallback chain will see aux tasks silently use a more
expensive model they didn't choose.
Code Evidence
agent/auxiliary_client.py:1823-1841:
def _get_provider_chain() -> List[tuple]:
    return [
        ("openrouter", _try_openrouter),
        ("nous", _try_nous),
        ("local/custom", _try_custom_endpoint),
        ("api-key", _resolve_api_key_provider),
    ]
agent/auxiliary_client.py:391-392:
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"  # paid
_NOUS_MODEL = "google/gemini-3-flash-preview"  # paid
plugins/model-providers/*/__init__.py: every provider's
default_aux_model is a paid model.
Nowhere in this fallback path does the code read the user's
fallback_providers from config.
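The behavior described above can be modeled with a small, self-contained sketch. It is not the code in agent/auxiliary_client.py: make_try, hardcoded_chain, resolve_aux_current, and the "nvidia" provider label are hypothetical stand-ins, and the availability flags are made up for illustration. The point it shows is that the user's list never influences the result.

```python
from typing import Callable, List, Optional, Tuple

# Paid default quoted in the issue; everything else here is a stand-in.
PAID_DEFAULT = "google/gemini-3-flash-preview"

Resolver = Callable[[], Optional[Tuple[str, str]]]


def make_try(provider: str, available: bool) -> Resolver:
    """Build a fake provider probe that returns (provider, model) or None."""
    def _try() -> Optional[Tuple[str, str]]:
        return (provider, PAID_DEFAULT) if available else None
    return _try


def hardcoded_chain() -> List[Tuple[str, Resolver]]:
    # Simplified to two entries; availability is assumed for the demo.
    return [
        ("openrouter", make_try("openrouter", True)),
        ("nous", make_try("nous", True)),
    ]


def resolve_aux_current(
    user_fallback_providers: List[Tuple[str, str]],
) -> Optional[Tuple[str, str]]:
    # The user's list is accepted but never read, mirroring the issue.
    for _label, try_fn in hardcoded_chain():
        result = try_fn()
        if result is not None:
            return result
    return None


# A free-only user chain (provider label is illustrative).
user_chain = [("nvidia", "nvidia/ring-2.6-1t:free")]
picked = resolve_aux_current(user_chain)
```

In this model, picked names the paid default even though the only model the user listed is a :free one.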
Existing Comment Acknowledges Step 1 But Not Step 2
agent/auxiliary_client.py:2451-2457:
"auto" means "use my main chat model for side tasks as well" — no
surprise switches to a cheap fallback model for side tasks.
The comment frames Step 1 (using the main chat model) as preventing
surprise model switches, but Step 2 (the hardcoded fallback chain)
still switches to a paid model without warning, and the comment does
not address it.
Proposed Fix
When Step 1 (main provider) fails for an aux task, walk the user's
fallback_providers list — same order, same models the user picked —
before consulting the hardcoded aux chain. The hardcoded chain
remains as a last-resort default for users with no fallback_providers
configured.
def _resolve_fallback(failed_provider, task):
    # 1. Honor user's fallback_providers first
    for entry in user_fallback_providers:
        if entry.provider == failed_provider:
            continue
        client = try_build_client(entry.provider, entry.model)
        if client:
            return client, entry.model
    # 2. Hardcoded chain only if user didn't configure anything
    if not user_fallback_providers:
        return _try_payment_fallback(failed_provider, task)
    return None
This makes fallback_providers the single source of truth for the
entire agent (main + aux), and respects users who deliberately picked
free-only models.
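A runnable sketch of the proposed resolution order follows. It makes several assumptions: FallbackEntry, resolve_aux_fallback, and the two injected callables are hypothetical names chosen for illustration, and a plain string stands in for a real client object. The actual config schema and client-building helpers in Hermes may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Client = str  # placeholder for a real client object


@dataclass(frozen=True)
class FallbackEntry:
    """One user-configured fallback (hypothetical shape)."""
    provider: str
    model: str


def resolve_aux_fallback(
    failed_provider: str,
    user_fallback_providers: List[FallbackEntry],
    try_build_client: Callable[[str, str], Optional[Client]],
    hardcoded_fallback: Callable[[str], Optional[Tuple[Client, str]]],
) -> Optional[Tuple[Client, str]]:
    # 1. Walk the user's chain in order, skipping the provider that failed.
    for entry in user_fallback_providers:
        if entry.provider == failed_provider:
            continue
        client = try_build_client(entry.provider, entry.model)
        if client is not None:
            return client, entry.model
    # 2. Use the hardcoded chain only when the user configured nothing.
    if not user_fallback_providers:
        return hardcoded_fallback(failed_provider)
    # 3. User chain exhausted: fail rather than switch to an unchosen model.
    return None


def fake_build(provider: str, model: str) -> Optional[Client]:
    # Pretend the provider named "down" is unreachable.
    return None if provider == "down" else f"client:{provider}"


def fake_hardcoded(failed: str) -> Optional[Tuple[Client, str]]:
    return ("client:openrouter", "google/gemini-3-flash-preview")


chain = [
    FallbackEntry("main", "main/model"),
    FallbackEntry("down", "x:free"),
    FallbackEntry("alt", "alt/model:free"),
]
result_user = resolve_aux_fallback("main", chain, fake_build, fake_hardcoded)
result_empty = resolve_aux_fallback("main", [], fake_build, fake_hardcoded)
result_exhausted = resolve_aux_fallback(
    "main", [FallbackEntry("down", "x:free")], fake_build, fake_hardcoded
)
```

Passing the client builder and the hardcoded fallback in as parameters keeps the sketch testable without network access. Returning None when the user's chain is exhausted is the design choice that matters: the aux task fails visibly instead of moving to a paid model the user did not pick.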
Related
See #24029 for the specific symptom (free-only users getting billed via
aux fallback). This issue addresses the underlying design.
Environment
- Hermes Agent v0.13.0 (2026.5.7)
- Affects all users with fallback_providers set + auxiliary.*.provider: auto