[Bug]: Credential pool provider fallback is silent — inconsistent with fallback_model notification behavior #10476

@aangelinsf

Bug Description

Hermes version: v0.9.0 (2026.4.13)

Authorship:
Written primarily by Claude, but experienced and reviewed by a human; a human will respond.

Summary

When fallback_model triggers a provider switch, Hermes correctly notifies the user via
_emit_status(), which propagates to Telegram/Discord/etc. When credential pool exhaustion
causes the same provider switch, no notification is sent. The user has no indication that
a different provider's model is now answering.

Why This Is a Bug (Not a Feature Request)

The notification infrastructure already exists: _emit_status() propagates to all gateway
platforms, and the _try_activate_fallback() path already calls it correctly.
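For illustration, here is a minimal sketch of that notification pattern. It assumes a hypothetical `StatusEmitter` that fans one status string out to every registered gateway sender. The class, its methods, and the sender callbacks are illustrative stand-ins, not Hermes' actual API; only the names `_emit_status()` and `_try_activate_fallback()` come from this report.

```python
from typing import Callable, List


class StatusEmitter:
    """Hypothetical stand-in for the agent's _emit_status() fan-out."""

    def __init__(self) -> None:
        # One callback per gateway platform (Telegram, Discord, ...).
        self._senders: List[Callable[[str], None]] = []

    def register(self, sender: Callable[[str], None]) -> None:
        self._senders.append(sender)

    def emit(self, message: str) -> None:
        # Every registered platform receives the same status message.
        for send in self._senders:
            send(message)


def try_activate_fallback(emitter: StatusEmitter, fallback: str) -> str:
    # Mirrors the fallback_model path: the switch is announced to the user.
    emitter.emit(f"Switched to fallback provider {fallback}.")
    return fallback


telegram_inbox: List[str] = []
discord_inbox: List[str] = []
emitter = StatusEmitter()
emitter.register(telegram_inbox.append)
emitter.register(discord_inbox.append)
active = try_activate_fallback(emitter, "openrouter")
```

Because the fan-out is shared, any recovery path that calls the emitter gets gateway delivery for free.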

Steps to Reproduce

  1. Configure a primary provider (e.g. kimi-coding) with a valid API key.
  2. Have a second provider's key also present in .env (e.g. OPENROUTER_API_KEY).
  3. Cause the primary key to be marked exhausted, e.g. via repeated 401s during initial setup.
  4. Send a message via Telegram.
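Step 3 can be modeled with a toy credential pool. This sketch assumes, as the Proposed Fix section states, that `mark_exhausted_and_rotate()` returns the next usable key for the same provider, or `None` once every key is exhausted. `CredentialPool`, its fields, and `current()` are hypothetical; only the method name `mark_exhausted_and_rotate()` comes from this report.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CredentialPool:
    """Hypothetical single-provider key pool."""

    provider: str
    keys: List[str]
    exhausted: Set[str] = field(default_factory=set)

    def current(self) -> Optional[str]:
        # First key not yet marked exhausted, if any.
        for key in self.keys:
            if key not in self.exhausted:
                return key
        return None

    def mark_exhausted_and_rotate(self, key: str) -> Optional[str]:
        # Mark the failing key and return the next usable key from the
        # same provider, or None when the whole pool is exhausted.
        self.exhausted.add(key)
        return self.current()


pool = CredentialPool(provider="kimi-coding", keys=["key-a"])
# After the 401s, the only primary key is marked exhausted.
next_key = pool.mark_exhausted_and_rotate("key-a")
# next_key is None, so the message from step 4 is served by another
# provider (e.g. OpenRouter via OPENROUTER_API_KEY).
```

With a single primary key, one exhaustion is enough to force the cross-provider switch this report is about.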

Expected Behavior

Expected: User receives a notification that the provider has changed.

Actual Behavior

Actual: Response arrives silently from the fallback provider. No indication of the switch.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

Telegram

Debug Report

Debug report uploaded:

Report     https://paste.rs/kL77k
agent.log  https://paste.rs/7Hx5f

Operating System

Linux 6.8.0-107-generic x86_64

Python Version

3.13.5

Hermes Version

0.9.0 (2026.4.13)

Root Cause Analysis (optional)

The _recover_with_credential_pool() path does not call _emit_status() when exhaustion causes
a cross-provider rotation. The user-visible outcome (a provider switch) is the same as with
fallback_model, but only one of the two paths notifies the user.
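The inconsistency can be shown side by side in a simplified model. Both functions below are hypothetical reductions of the two paths named in this report. The `primary`, `next_key`, and `fallback` parameters are illustrative assumptions, with `next_key` standing for the result of `mark_exhausted_and_rotate()`.

```python
from typing import Callable, List, Optional


def try_activate_fallback(emit: Callable[[str], None], fallback: str) -> str:
    # fallback_model path: the provider switch is announced.
    emit(f"Switched to fallback provider {fallback}.")
    return fallback


def recover_with_credential_pool(
    emit: Callable[[str], None], primary: str, next_key: Optional[str], fallback: str
) -> str:
    # Credential-pool path as reported: emit is available, but when the pool
    # is exhausted (next_key is None) the provider changes without calling it.
    if next_key is not None:
        return primary
    return fallback


statuses: List[str] = []
via_fallback_model = try_activate_fallback(statuses.append, "openrouter")
via_pool = recover_with_credential_pool(statuses.append, "kimi-coding", None, "openrouter")
# Both paths end on the same provider, but only one status was emitted.
```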

Impact

In a gateway deployment (Telegram bot), there is no way to know the provider has changed
short of explicitly asking the agent or SSHing into the server to check logs. The user may
unknowingly send sensitive tasks to a different provider, incur unexpected costs, or receive
different capability behavior — all without any signal.

PR #10058 (closed) identified this issue and had a working implementation. The stated reason for closing it was its cooldown duration change (a 24h TTL for exhausted credentials), which is a separate concern.

The notification half of that fix stands on its own regardless of what TTL is used for exhausted credentials, and should be reconsidered independently of the 24h cooldown change.

Proposed Fix (optional)

In run_agent.py, call _emit_status() from _recover_with_credential_pool() when
mark_exhausted_and_rotate() returns None (all credentials are exhausted and cross-provider
fallback is triggered). Same-provider key rotation should remain silent: it is low-signal
infrastructure behavior the user doesn't need to know about.

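```python
# In run_agent.py, inside _recover_with_credential_pool(), on the branch where
# mark_exhausted_and_rotate() returns None (cross-provider fallback triggered).
# Fall back to "unknown" if the agent has no provider attribute set.
_provider_label = getattr(self, "provider", "unknown")
self._emit_status(
    f"⚠️ Primary AI ({_provider_label}) is unavailable — your key may be invalid or expired. "
    "Switched to fallback provider. Responses may behave differently. "
    f"To restore: check your API key and run `hermes auth reset {_provider_label}`."
)
```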
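The gating condition around that call can be sketched as a small pure function, so it is easy to test in isolation. This is a hypothetical reduction, not the actual run_agent.py code. The `next_key` parameter stands for the return value of `mark_exhausted_and_rotate()`, and naming the fallback provider in the message is an assumption that goes beyond the proposed snippet.

```python
from typing import Callable, List, Optional


def recover_with_credential_pool(
    emit: Callable[[str], None], provider: str, next_key: Optional[str], fallback: str
) -> str:
    """Hypothetical fixed recovery path.

    next_key stands for the value returned by mark_exhausted_and_rotate().
    Returns the provider that will serve the next request.
    """
    if next_key is not None:
        # Same-provider key rotation: low-signal, stay silent.
        return provider
    # Pool exhausted: cross-provider fallback, so notify the user.
    emit(
        f"Primary AI ({provider}) is unavailable. "
        f"Switched to fallback provider {fallback}."
    )
    return fallback


statuses: List[str] = []
same = recover_with_credential_pool(statuses.append, "kimi-coding", "key-b", "openrouter")
cross = recover_with_credential_pool(statuses.append, "kimi-coding", None, "openrouter")
```

Keying the notification on the `None` return, rather than on every rotation, keeps same-provider key cycling silent while making the cross-provider switch visible.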

Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR

Metadata

Assignees: No one assigned
Labels: type/bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests