[Feature]: Multi-provider credential pools for cross-provider failover and rotation #11737

@brimdor

Problem or Use Case

Hermes credential pools are currently scoped to a single provider. Each pool key (e.g., "custom:local", "openai-codex", "copilot") maps to credentials that connect to one API endpoint. When all credentials in a pool are exhausted (rate-limited, billing-quota-hit, or auth-failed), the agent errors out even if other providers have available capacity with compatible models.

This is painful for users who:

  1. Pay for multiple providers (OpenRouter, OpenAI, Anthropic, custom endpoints) and want to use them interchangeably for the same model family (e.g., GPT-5.4 via OpenRouter vs. OpenAI direct).
  2. Run custom providers that serve the same model through different gateways (e.g., ollama.com/v1 and a local Ollama instance both serving glm-5.1) and want automatic failover between them.
  3. Want cost optimization by routing to the cheapest available provider for a given model without manual config changes.
  4. Need resilience — when one provider goes down or rate-limits, seamlessly fall through to the next provider serving a compatible model.

Currently, fallback_model/fallback_providers provides provider-level failover, but it requires the agent to fully fail before trying the next provider. It does not pool credentials across providers for smart selection within a single turn. The provider_routing config (only/ignore/order) restricts which providers OpenRouter uses — it does not bridge across different provider types.

What is needed is a way to define a multi-provider credential pool that groups credentials from different providers (potentially with different base_url, api_mode, and api_key values) into a single failover-capable pool that the CredentialPool class can rotate through.

Proposed Solution

Config-level: Add pools section to config.yaml

pools:
  openai-compatible:
    strategy: round_robin      # fill_first | round_robin | random | least_used
    entries:
      - provider: openai-codex
        label: "Codex Plus"
        # Uses pool entry from auth.json (openai-codex)

      - provider: custom
        base_url: https://api.openai.com/v1
        api_key_env: OPENAI_API_KEY_DIRECT
        label: "OpenAI Direct"
        api_mode: chat_completions

      - provider: custom
        base_url: https://ollama.com/v1
        pool: custom:local      # Reference existing custom:local pool from auth.json
        label: "Ollama Cloud"
        api_mode: chat_completions

  anthropic-compatible:
    strategy: fill_first
    entries:
      - provider: anthropic
        # Uses pool entry from auth.json (anthropic)

      - provider: custom
        base_url: https://opencode.ai/zen/go/v1
        pool: custom:opencode-go
        label: "OpenCode Go"
        api_mode: anthropic_messages
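As a rough illustration of how the proposed pools section could be loaded, the sketch below validates the raw mapping and turns it into typed objects. PoolEntry, PoolDefinition, parse_pools, and the fill_first default are hypothetical and not part of Hermes today; the strategy names are the ones listed in the config comment above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Strategy names taken from the proposed config comment.
VALID_STRATEGIES = {"fill_first", "round_robin", "random", "least_used"}


@dataclass
class PoolEntry:
    """One entry of a multi-provider pool (hypothetical shape)."""
    provider: str
    label: Optional[str] = None
    base_url: Optional[str] = None
    api_key_env: Optional[str] = None
    pool: Optional[str] = None       # key of an existing pool in auth.json
    api_mode: Optional[str] = None


@dataclass
class PoolDefinition:
    name: str
    strategy: str = "fill_first"     # assumed default, not confirmed by Hermes
    entries: list[PoolEntry] = field(default_factory=list)


def parse_pools(config: dict[str, Any]) -> dict[str, PoolDefinition]:
    """Turn the raw `pools` mapping into validated PoolDefinition objects."""
    pools: dict[str, PoolDefinition] = {}
    for name, raw in (config.get("pools") or {}).items():
        strategy = raw.get("strategy", "fill_first")
        if strategy not in VALID_STRATEGIES:
            raise ValueError(f"pool {name!r}: unknown strategy {strategy!r}")
        # Unknown keys in an entry raise TypeError, which surfaces typos early.
        entries = [PoolEntry(**entry) for entry in raw.get("entries", [])]
        if not entries:
            raise ValueError(f"pool {name!r}: at least one entry is required")
        pools[name] = PoolDefinition(name=name, strategy=strategy, entries=entries)
    return pools


# Usage: the dict mirrors what a YAML loader would return for config.yaml.
config = {
    "pools": {
        "anthropic-compatible": {
            "strategy": "fill_first",
            "entries": [
                {"provider": "anthropic"},
                {
                    "provider": "custom",
                    "base_url": "https://opencode.ai/zen/go/v1",
                    "pool": "custom:opencode-go",
                    "label": "OpenCode Go",
                    "api_mode": "anthropic_messages",
                },
            ],
        }
    }
}
parsed = parse_pools(config)
```

Rejecting unknown strategies and empty entry lists at load time would keep misconfigured pools from failing later, in the middle of a turn.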

Model config integration

model:
  default: gpt-5.4
  provider: pool:openai-compatible   # Route through a multi-provider pool

  # Or, for single-model failover:
  fallback_providers:
    - provider: pool:openai-compatible
      model: gpt-5.4

Runtime behavior

When provider resolves to a pool:<name> reference, resolve_runtime_provider() would:

  1. Load the named pool definition from config
  2. Iterate entries using the pool strategy (round_robin, least_used, etc.)
  3. For each entry, resolve credentials (pool reference from auth.json, env var, or explicit key)
  4. If the selected entry fails with an auth/rate-limit error, mark it exhausted and rotate to the next entry
  5. Return the full runtime dict (provider, api_key, base_url, api_mode, credential_pool) just like a single-provider pool does today
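Steps 3 and 5 above could look roughly like the sketch below. It assumes a simplified view of auth.json (a mapping from pool key to a list of API keys) and an optional inline api_key field; the real auth.json layout is not described in this issue, and resolve_entry_runtime and first_resolvable are hypothetical helpers. The credential_pool field of the runtime dict is left out for brevity.

```python
import os
from typing import Mapping, Optional


def resolve_entry_runtime(
    entry: Mapping[str, str],
    auth_pools: Mapping[str, list[str]],   # simplified stand-in for auth.json
    env: Mapping[str, str] = os.environ,
) -> Optional[dict]:
    """Resolve one pool entry to a runtime dict, or None if it has no key."""
    api_key = None
    # 1. Stored credentials: an explicit pool reference, else the provider name.
    stored = auth_pools.get(entry.get("pool") or entry["provider"])
    if stored:
        api_key = stored[0]
    # 2. Environment variable named by api_key_env.
    if api_key is None and entry.get("api_key_env"):
        api_key = env.get(entry["api_key_env"])
    # 3. Explicit inline key (assumed field name).
    if api_key is None:
        api_key = entry.get("api_key")
    if not api_key:
        return None
    return {
        "provider": entry["provider"],
        "api_key": api_key,
        "base_url": entry.get("base_url"),
        "api_mode": entry.get("api_mode"),
    }


def first_resolvable(entries, auth_pools, env=os.environ) -> dict:
    """Return the runtime dict of the first entry with usable credentials."""
    for entry in entries:
        runtime = resolve_entry_runtime(entry, auth_pools, env)
        if runtime is not None:
            return runtime
    raise RuntimeError("no entry in the pool has usable credentials")
```

Passing env explicitly keeps the lookup testable without touching the process environment.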

Implementation approach

The core change is in hermes_cli/runtime_provider.py — when resolve_runtime_provider() receives a pool:<name> provider string:

  • Load the pool definition from config
  • Create a MultiProviderPool that wraps multiple CredentialPool instances (or inline entries)
  • MultiProviderPool.select() iterates entries using the configured strategy
  • On mark_exhausted_and_rotate(), advance to the next entry in the pool
  • Each entry contributes a fully-resolved provider, base_url, api_key, api_mode tuple

The existing CredentialPool class stays unchanged — single-provider pools continue working as before. MultiProviderPool is additive.
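To make the wrapper concrete, here is a minimal, self-contained sketch of what MultiProviderPool might look like. Only the class name, select(), and mark_exhausted_and_rotate() come from this proposal; the Entry dataclass, the internal cursor, and the way each strategy is implemented are assumptions, and a real version would wrap CredentialPool instances rather than plain dicts.

```python
import random
from dataclasses import dataclass
from typing import Optional

STRATEGIES = {"fill_first", "round_robin", "random", "least_used"}


@dataclass
class Entry:
    label: str
    runtime: dict              # provider, base_url, api_key, api_mode
    exhausted: bool = False
    uses: int = 0


class MultiProviderPool:
    def __init__(self, entries: list[Entry], strategy: str = "fill_first"):
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy {strategy!r}")
        self.entries = entries
        self.strategy = strategy
        self._cursor = 0                       # next index for round_robin
        self._current: Optional[Entry] = None  # entry returned by last select()

    def select(self) -> Entry:
        """Pick the next non-exhausted entry according to the strategy."""
        live = [e for e in self.entries if not e.exhausted]
        if not live:
            raise RuntimeError("all pool entries are exhausted")
        if self.strategy == "round_robin":
            n = len(self.entries)
            # Walk forward from the cursor, skipping exhausted entries.
            for step in range(n):
                idx = (self._cursor + step) % n
                if not self.entries[idx].exhausted:
                    self._cursor = (idx + 1) % n
                    chosen = self.entries[idx]
                    break
        elif self.strategy == "least_used":
            chosen = min(live, key=lambda e: e.uses)
        elif self.strategy == "random":
            chosen = random.choice(live)
        else:  # fill_first: stay on the first live entry until it is exhausted
            chosen = live[0]
        chosen.uses += 1
        self._current = chosen
        return chosen

    def mark_exhausted_and_rotate(self) -> Entry:
        """Mark the last selected entry exhausted and return the next one."""
        if self._current is not None:
            self._current.exhausted = True
        return self.select()
```

This sketch never clears the exhausted flag. A real implementation would probably need a cooldown so that rate-limited entries become available again.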

Key challenges

  • Model compatibility: A pool entry should declare which models it serves, so the agent does not route a gpt-5.4 request to an Anthropic-only endpoint. Could use the existing custom_providers.models config pattern.
  • api_mode bridging: Entries in the same pool may need different api_mode values (e.g., chat_completions for OpenAI-compatible, anthropic_messages for Claude endpoints). The pool entry must carry its own api_mode.
  • Error classification: Not all errors should trigger pool rotation — only auth (401/403), rate-limit (429), and billing/quota (402) errors. Server errors (500/502/503) should use retry logic instead.
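The model-compatibility and error-classification points can be sketched as two small helpers. The status codes are the ones listed above. Action, classify_status, entries_for_model, and the per-entry models field are hypothetical, and treating an entry without a models list as unrestricted is an assumption made here, not something this proposal specifies.

```python
from enum import Enum


class Action(Enum):
    ROTATE = "rotate"   # mark the entry exhausted and move to the next one
    RETRY = "retry"     # keep the same entry and use normal retry logic
    RAISE = "raise"     # not a credential or availability problem


ROTATE_STATUSES = {401, 402, 403, 429}   # auth, billing/quota, rate limit
RETRY_STATUSES = {500, 502, 503}         # transient server errors


def classify_status(status_code: int) -> Action:
    """Map an HTTP status code to the pool's reaction."""
    if status_code in ROTATE_STATUSES:
        return Action.ROTATE
    if status_code in RETRY_STATUSES:
        return Action.RETRY
    return Action.RAISE


def entries_for_model(entries: list[dict], model: str) -> list[dict]:
    """Keep entries that declare the model, or declare no model list at all."""
    return [e for e in entries if not e.get("models") or model in e["models"]]
```

Keeping the classification in one function would give both the pool and the retry layer a single place to agree on which errors consume an entry.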

Alternatives Considered

  1. Use fallback_providers for cross-provider failover — This already exists but requires a full agent failure before trying the next provider. It does not pre-emptively rotate credentials or do smart selection. It is also a separate config that does not integrate with pool strategies.

  2. Multiple custom_providers entries pointing to different providers — The current custom_providers list only defines endpoint metadata and model lists. It does not create a failover pool. Each custom provider still resolves to one pool key in auth.json.

  3. Write a custom hook that catches errors and re-resolves — The hook system (gateway/hooks.py) fires agent:end events but cannot modify the in-flight request or switch providers mid-turn.

  4. Shell script wrapper — An external tool cannot inject credentials into a running Hermes agent mid-session.

Feature Type

Gateway / messaging improvement

Scope

Medium (2-3 files, ~300-500 lines)

Additional Context

I investigated the full credential pool and runtime provider resolution pipeline:

  • agent/credential_pool.py - CredentialPool class is provider-scoped (keyed by provider name like "openai-codex", "custom:local")
  • agent/credential_pool.py:346 - get_pool_strategy() reads credential_pool_strategies from config, accepts fill_first, round_robin, random, least_used
  • hermes_cli/runtime_provider.py:704 - resolve_runtime_provider() loads a single pool via load_pool(provider) and returns a single credential_pool reference
  • hermes_cli/runtime_provider.py:170-230 - _resolve_runtime_from_pool_entry() builds the runtime dict with provider, base_url, api_key, api_mode, credential_pool
  • run_agent.py:611-1088 - AIAgent.__init__ accepts fallback_model (single dict or list of provider dicts), but this is a separate failover chain, not pool-integrated
  • Existing provider_routing config only controls OpenRouter routing preferences, not cross-provider pooling

The custom:local pool in our deployment uses round_robin strategy across 4 Ollama API keys, which works great for same-provider key rotation. Extending this concept across providers is the natural next step.

Metadata

Labels

  • P3: Low (cosmetic, nice to have)
  • area/auth: Authentication, OAuth, credential pools
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • type/feature: New feature or request
