
[Bug]: fallback_model not triggered when primary provider auth fails (AuthError at credential resolution) #7230

@kht2199

Bug Description

When the primary provider's OAuth token expires (e.g. openai-codex with HTTP 401), fallback_model configured in config.yaml is never used. The agent returns a fixed ~78-char error message to Discord instead of falling back to the configured backup model.

Steps to Reproduce

  1. Configure config.yaml with openai-codex as primary and a local model as fallback:
    model:
      provider: openai-codex
    fallback_model:
      provider: custom
      model: mlx-community/gemma-4-e4b-it-4bit
      base_url: http://localhost:8080/v1
      api_key: no-key-required
  2. Let the Codex OAuth token expire (or revoke it)
  3. Send a message via Discord gateway
  4. Agent responds with a fixed error string instead of using the fallback model

Expected Behavior

When primary provider auth fails, Hermes should automatically switch to fallback_model (or fallback_providers chain) and respond normally, the same way it handles rate limits mid-conversation.

Actual Behavior

The agent returns a fixed ~78-char error response and logs:

INFO  agent.credential_pool: credential pool: no available entries (all exhausted or empty)
ERROR cron.scheduler: Job failed: RuntimeError: Codex token refresh failed with status 401.
Traceback:
  ...
  hermes_cli.auth.AuthError: Codex token refresh failed with status 401.

Affected Component

  • Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform

  • Discord

Root Cause Analysis

The fallback mechanism has two separate layers:

Layer 1 — run_agent.py:_try_activate_fallback() (lines 4987–5125)
Handles fallback during API call errors (rate limits, HTTP errors mid-conversation). Works correctly.

Layer 2 — credential resolution (gateway/run.py:_resolve_runtime_agent_kwargs(), cron/scheduler.py:run_job())
This runs before the agent defined in run_agent.py is instantiated. When resolve_runtime_provider() raises AuthError, both call sites wrap it in a RuntimeError and re-raise, so Layer 1 is never reached.
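
The ordering problem can be sketched in a few lines. This is an illustrative model, not Hermes code: `resolve_runtime`, `Agent`, `try_activate_fallback`, and `handle_message` are hypothetical stand-ins for the real resolver, the agent class in run_agent.py, `_try_activate_fallback()`, and the gateway call site. It only shows that an exception raised during credential resolution escapes before any agent object exists.

```python
class AuthError(Exception):
    """Stand-in for hermes_cli.auth.AuthError (hypothetical definition)."""


stages = []  # records which stages actually ran


def resolve_runtime(provider: str) -> dict:
    """Layer 2 stand-in: credential resolution with an expired token."""
    stages.append("resolve")
    raise AuthError("Codex token refresh failed with status 401.")


class Agent:
    """Layer 1 stand-in: the fallback hook lives on the agent."""

    def __init__(self, runtime: dict) -> None:
        stages.append("agent_init")
        self.runtime = runtime

    def try_activate_fallback(self) -> None:
        stages.append("layer1_fallback")


def handle_message(provider: str) -> Agent:
    try:
        runtime = resolve_runtime(provider)
    except Exception as exc:
        # Mirrors the call sites: the error is wrapped and re-raised.
        raise RuntimeError(str(exc)) from exc
    return Agent(runtime)  # never reached when resolution fails


try:
    handle_message("openai-codex")
except RuntimeError as exc:
    print(f"failed before agent creation: {exc}")

print(stages)
```

In this model only the "resolve" stage is ever recorded, which is consistent with the `api_calls=0` entry in the gateway log.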

The root cause is in hermes_cli/runtime_provider.py, lines 707–709:

# openai-codex (same pattern for nous, qwen-oauth)
except AuthError:
    if requested_provider != "auto":
        raise  # ← raises immediately when provider is explicitly set
    # only falls through when provider == "auto"

When the provider is explicitly set in config.yaml (e.g. provider: openai-codex), requested_provider is "openai-codex" — not "auto" — so the AuthError is always re-raised before fallback_model is ever consulted.
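
A minimal sketch of that branch, assuming a simplified resolver. `resolve_codex_credentials`, `resolve_provider`, `outcome`, and the `"next-auto-candidate"` return value are hypothetical names used only for illustration; the real resolve_runtime_provider() does considerably more.

```python
class AuthError(Exception):
    """Stand-in for hermes_cli.auth.AuthError (hypothetical definition)."""


def resolve_codex_credentials() -> dict:
    # Stand-in for an expired OAuth token: the refresh always fails.
    raise AuthError("Codex token refresh failed with status 401.")


def resolve_provider(requested_provider: str) -> str:
    """Simplified model of the except-branch quoted above."""
    try:
        resolve_codex_credentials()
        return "openai-codex"
    except AuthError:
        if requested_provider != "auto":
            raise  # explicit provider: the error escapes immediately
        # Hypothetical: in auto mode the resolver keeps looking elsewhere.
        return "next-auto-candidate"


def outcome(requested_provider: str) -> str:
    """Report what a caller would observe for a given provider setting."""
    try:
        return resolve_provider(requested_provider)
    except AuthError as exc:
        return f"AuthError: {exc}"


print(outcome("auto"))
print(outcome("openai-codex"))
```

Only the `"auto"` case survives the failed refresh. With an explicit provider the caller sees the AuthError, and nothing in this path looks at fallback_model.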

Where the fix should go:

gateway/run.py:292, _resolve_runtime_agent_kwargs()

# current: AuthError propagates as RuntimeError, no fallback
try:
    runtime = resolve_runtime_provider(...)
except Exception as exc:
    raise RuntimeError(format_runtime_provider_error(exc)) from exc

cron/scheduler.py:661, run_job()

# current: same — fallback_model is read on line 687 but never used on auth failure
try:
    runtime = resolve_runtime_provider(**runtime_kwargs)
except Exception as exc:
    raise RuntimeError(message) from exc

fallback_model = _cfg.get("fallback_providers") or _cfg.get("fallback_model")  # already here, unused on auth error

Proposed Fix

Catch AuthError specifically at both call sites and retry resolve_runtime_provider() with the fallback config before giving up. The fix is isolated to gateway/run.py and cron/scheduler.py — no changes needed in runtime_provider.py.

Key consideration: only catch AuthError (not all exceptions) to avoid silently swallowing real config errors (e.g. missing Anthropic API key should still surface clearly).
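
One possible shape for this, as a sketch rather than a patch. `resolve_with_fallback` and `_as_chain` are hypothetical helpers, `fake_resolve` stands in for resolve_runtime_provider(), and passing a fallback entry's keys directly as keyword arguments is an assumption: the real resolver's signature may need a mapping step. The sketch accepts either a single fallback_model mapping or a fallback_providers list, and it catches only AuthError.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Callable, Union


class AuthError(Exception):
    """Stand-in for hermes_cli.auth.AuthError (hypothetical definition)."""


FallbackConfig = Union[Mapping[str, Any], Sequence[Mapping[str, Any]], None]


def _as_chain(fallback: FallbackConfig) -> list:
    """Normalize a single mapping or a list of mappings into a list."""
    if not fallback:
        return []
    if isinstance(fallback, Mapping):
        return [fallback]
    return list(fallback)


def resolve_with_fallback(
    resolve: Callable[..., dict],
    primary_kwargs: Mapping[str, Any],
    fallback: FallbackConfig,
) -> dict:
    """Try the primary provider, then each fallback, on AuthError only."""
    try:
        return resolve(**primary_kwargs)
    except AuthError as primary_exc:
        last_exc = primary_exc
        for entry in _as_chain(fallback):
            try:
                return resolve(**entry)
            except AuthError as exc:
                last_exc = exc  # remember the most recent auth failure
        raise last_exc  # nothing worked: surface the auth error


def fake_resolve(provider: str, **extra: Any) -> dict:
    """Test double: the Codex provider always fails authentication."""
    if provider == "openai-codex":
        raise AuthError("Codex token refresh failed with status 401.")
    return {"provider": provider, **extra}


runtime = resolve_with_fallback(
    fake_resolve,
    {"provider": "openai-codex"},
    {
        "provider": "custom",
        "model": "mlx-community/gemma-4-e4b-it-4bit",
        "base_url": "http://localhost:8080/v1",
        "api_key": "no-key-required",
    },
)
print(runtime["provider"])
```

Because only AuthError is caught, any other exception from the resolver (for example a configuration error) still propagates unchanged from the first attempt.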

Operating System

macOS (Darwin 25.4.0)

Python Version

3.11.15

Hermes Version

0.8.0 (2026.4.8)

Relevant Logs / Traceback

INFO  gateway.run: inbound message: platform=discord user=taek chat=...
INFO  agent.credential_pool: credential pool: no available entries (all exhausted or empty)
ERROR cron.scheduler: Job 'resource-monitor-home-discord' failed: RuntimeError: Codex token refresh failed with status 401.
  File "hermes_cli/runtime_provider.py", line 697, in resolve_runtime_provider
    creds = resolve_codex_runtime_credentials()
  File "hermes_cli/auth.py", line 1471, in resolve_codex_runtime_credentials
    tokens = _refresh_codex_auth_tokens(tokens, refresh_timeout_seconds)
  File "hermes_cli/auth.py", line 1334, in refresh_codex_oauth_pure
    raise AuthError: Codex token refresh failed with status 401.
INFO  gateway.run: response ready: platform=discord time=5.3s api_calls=0 response=78 chars

Labels

  • P1 (High: major feature broken, no workaround)
  • area/auth (Authentication, OAuth, credential pools)
  • comp/cron (Cron scheduler and job management)
  • comp/gateway (Gateway runner, session dispatch, delivery)
  • provider/openai (OpenAI / Codex Responses API)
  • type/bug (Something isn't working)
