Problem or Use Case
Hermes credential pools are currently scoped to a single provider. Each pool key (e.g., "custom:local", "openai-codex", "copilot") maps to credentials that connect to one API endpoint. When all credentials in a pool are exhausted (rate-limited, billing-quota-hit, or auth-failed), the agent errors out even if other providers have available capacity with compatible models.
This is painful for users who:
- Pay for multiple providers (OpenRouter, OpenAI, Anthropic, custom endpoints) and want to use them interchangeably for the same model family (e.g., GPT-5.4 via OpenRouter vs. OpenAI direct).
- Run custom providers that serve the same model through different gateways (e.g., ollama.com/v1 and a local Ollama instance both serving glm-5.1) and want automatic failover between them.
- Want cost optimization by routing to the cheapest available provider for a given model without manual config changes.
- Need resilience: when one provider goes down or rate-limits, requests should fall through to the next provider serving a compatible model.
Currently, fallback_model/fallback_providers provides provider-level failover, but it requires the agent to fully fail before trying the next provider. It does not pool credentials across providers for smart selection within a single turn. The provider_routing config (only/ignore/order) restricts which providers OpenRouter uses — it does not bridge across different provider types.
What is needed is a way to define a multi-provider credential pool that groups credentials from different providers (potentially with different base_url, api_mode, and api_key values) into a single failover-capable pool that can be rotated through the same way a CredentialPool is today.
Proposed Solution
Config-level: Add a pools section to config.yaml
pools:
  openai-compatible:
    strategy: round_robin  # fill_first | round_robin | random | least_used
    entries:
      - provider: openai-codex
        label: "Codex Plus"
        # Uses pool entry from auth.json (openai-codex)
      - provider: custom
        base_url: https://api.openai.com/v1
        api_key_env: OPENAI_API_KEY_DIRECT
        label: "OpenAI Direct"
        api_mode: chat_completions
      - provider: custom
        base_url: https://ollama.com/v1
        pool: custom:local  # Reference existing custom:local pool from auth.json
        label: "Ollama Cloud"
        api_mode: chat_completions
  anthropic-compatible:
    strategy: fill_first
    entries:
      - provider: anthropic
        # Uses pool entry from auth.json (anthropic)
      - provider: custom
        base_url: https://opencode.ai/zen/go/v1
        pool: custom:opencode-go
        label: "OpenCode Go"
        api_mode: anthropic_messages
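As a rough illustration of how a pools section like the one above might be validated once config.yaml has been parsed into a dict, here is a minimal sketch. PoolEntry, PoolDefinition, and parse_pools are hypothetical names, not existing Hermes APIs, and the fill_first default is an assumption. Only the four strategy names and the entry fields come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Strategy names come from the proposal; everything else is hypothetical.
VALID_STRATEGIES = {"fill_first", "round_robin", "random", "least_used"}


@dataclass
class PoolEntry:
    provider: str
    label: Optional[str] = None
    base_url: Optional[str] = None
    api_key_env: Optional[str] = None
    pool: Optional[str] = None  # key of an existing pool in auth.json
    api_mode: Optional[str] = None


@dataclass
class PoolDefinition:
    name: str
    strategy: str
    entries: list[PoolEntry] = field(default_factory=list)


def parse_pools(config: dict[str, Any]) -> dict[str, PoolDefinition]:
    """Turn the already-parsed `pools` mapping into typed definitions."""
    pools: dict[str, PoolDefinition] = {}
    for name, raw in (config.get("pools") or {}).items():
        strategy = raw.get("strategy", "fill_first")  # assumed default
        if strategy not in VALID_STRATEGIES:
            raise ValueError(f"pool {name!r}: unknown strategy {strategy!r}")
        entries = [PoolEntry(**entry) for entry in raw.get("entries", [])]
        if not entries:
            raise ValueError(f"pool {name!r}: at least one entry is required")
        pools[name] = PoolDefinition(name=name, strategy=strategy, entries=entries)
    return pools


example = {
    "pools": {
        "anthropic-compatible": {
            "strategy": "fill_first",
            "entries": [
                {"provider": "anthropic"},
                {
                    "provider": "custom",
                    "base_url": "https://opencode.ai/zen/go/v1",
                    "pool": "custom:opencode-go",
                    "label": "OpenCode Go",
                    "api_mode": "anthropic_messages",
                },
            ],
        }
    }
}
parsed = parse_pools(example)
```

Rejecting unknown strategies and empty pools at load time would surface config mistakes before the first request rather than in the middle of a turn.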
Model config integration
model:
  default: gpt-5.4
  provider: pool:openai-compatible  # Route through a multi-provider pool

# Or, for single-model failover:
fallback_providers:
  - provider: pool:openai-compatible
    model: gpt-5.4
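A minimal sketch of how the pool:<name> provider string might be recognized. parse_pool_reference is a hypothetical helper, not an existing Hermes function. It only assumes the pool: prefix shown in the config above.

```python
from typing import Optional

POOL_PREFIX = "pool:"  # prefix from the proposal's `pool:<name>` syntax


def parse_pool_reference(provider: str) -> Optional[str]:
    """Return the pool name for a `pool:<name>` reference, else None."""
    if not provider.startswith(POOL_PREFIX):
        return None
    name = provider[len(POOL_PREFIX):].strip()
    if not name:
        raise ValueError("pool reference is missing a name")
    return name
```

Because the check is a strict prefix match, existing provider keys such as custom:local or openai-codex are not mistaken for pool references and would keep resolving through the current single-provider path.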
Runtime behavior
When provider resolves to a pool: reference, resolve_runtime_provider() would:
- Load the named pool definition from config
- Iterate entries using the pool strategy (round_robin, least_used, etc.)
- For each entry, resolve credentials (pool reference from auth.json, env var, or explicit key)
- If the selected entry fails with an auth/rate-limit error, mark it exhausted and rotate to the next entry
- Return the full runtime dict (provider, api_key, base_url, api_mode, credential_pool) just like a single-provider pool does today
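The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual resolve_runtime_provider() code. The lookup order (explicit key, then env var, then auth.json pool) is an assumption, lookup_auth_pool is a hypothetical stand-in for reading auth.json, and credential_pool is represented here by a pool key string rather than a CredentialPool object.

```python
import os
from typing import Callable, Optional

# Hypothetical stand-in for fetching a usable key from an auth.json pool.
AuthLookup = Callable[[str], Optional[str]]


def resolve_entry_key(entry: dict, lookup_auth_pool: AuthLookup) -> Optional[str]:
    """Resolve an API key: explicit key, then env var, then auth.json pool."""
    if entry.get("api_key"):
        return entry["api_key"]
    env_name = entry.get("api_key_env")
    if env_name and os.environ.get(env_name):
        return os.environ[env_name]
    # Fall back to the referenced pool, or the provider's own pool key.
    return lookup_auth_pool(entry.get("pool") or entry["provider"])


def resolve_runtime_from_pool(entries: list[dict], lookup_auth_pool: AuthLookup) -> dict:
    """Return a runtime dict for the first entry whose credentials resolve."""
    for entry in entries:
        api_key = resolve_entry_key(entry, lookup_auth_pool)
        if api_key is None:
            continue  # nothing usable for this entry, try the next one
        return {
            "provider": entry["provider"],
            "api_key": api_key,
            "base_url": entry.get("base_url"),
            "api_mode": entry.get("api_mode"),
            "credential_pool": entry.get("pool"),  # simplified: key, not object
        }
    raise RuntimeError("no entry in the pool has usable credentials")
```

Keeping the returned dict in the same shape as the single-provider case means callers would not need to know whether the credentials came from one provider or several.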
Implementation approach
The core change is in hermes_cli/runtime_provider.py — when resolve_runtime_provider() receives a pool:<name> provider string:
- Load the pool definition from config
- Create a MultiProviderPool that wraps multiple CredentialPool instances (or inline entries)
- MultiProviderPool.select() iterates entries using the configured strategy
- On mark_exhausted_and_rotate(), advance to the next entry in the pool
- Each entry contributes a fully-resolved provider, base_url, api_key, api_mode tuple
The existing CredentialPool class stays unchanged — single-provider pools continue working as before. MultiProviderPool is additive.
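To make the shape of the proposed wrapper concrete, here is a minimal sketch of MultiProviderPool with select() and mark_exhausted_and_rotate(). The Entry type, the internal cursor, and the exact semantics of each strategy are assumptions for illustration. The real CredentialPool strategies may behave differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    label: str
    uses: int = 0
    exhausted: bool = False


class MultiProviderPool:
    """Hypothetical wrapper that rotates across entries from several providers."""

    def __init__(self, entries, strategy="fill_first"):
        self.entries = list(entries)
        self.strategy = strategy
        self._cursor = 0  # next index to try for round_robin

    def select(self):
        available = [e for e in self.entries if not e.exhausted]
        if not available:
            raise RuntimeError("all entries in the pool are exhausted")
        if self.strategy == "fill_first":
            chosen = available[0]
        elif self.strategy == "round_robin":
            # Walk forward from the cursor to the next non-exhausted entry.
            n = len(self.entries)
            for offset in range(n):
                idx = (self._cursor + offset) % n
                if not self.entries[idx].exhausted:
                    break
            chosen = self.entries[idx]
            self._cursor = (idx + 1) % n
        elif self.strategy == "random":
            chosen = random.choice(available)
        elif self.strategy == "least_used":
            chosen = min(available, key=lambda e: e.uses)
        else:
            raise ValueError(f"unknown strategy {self.strategy!r}")
        chosen.uses += 1
        return chosen

    def mark_exhausted_and_rotate(self, entry):
        """Mark `entry` unusable and return the next selection."""
        entry.exhausted = True
        return self.select()
```

In this sketch an exhausted entry stays exhausted for the lifetime of the pool object. A real implementation would probably want a cooldown so that rate-limited entries become eligible again.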
Key challenges
- Model compatibility: A pool entry should declare which models it serves, so the agent does not route a gpt-5.4 request to an Anthropic-only endpoint. This could reuse the existing custom_providers.models config pattern.
- api_mode bridging: Entries in the same pool may need different api_mode values (e.g., chat_completions for OpenAI-compatible, anthropic_messages for Claude endpoints). The pool entry must carry its own api_mode.
- Error classification: Not all errors should trigger pool rotation — only auth (401/403), rate-limit (429), and billing/quota (402) errors. Server errors (500/502/503) should use retry logic instead.
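The error-classification and model-compatibility points could look something like the sketch below. classify_error and entries_serving are hypothetical helpers. The status-code groupings follow the list above, while treating every other status as a hard failure and excluding entries without a declared models list are assumptions.

```python
ROTATE_STATUSES = {401, 402, 403, 429}  # auth, billing/quota, rate-limit
RETRY_STATUSES = {500, 502, 503}        # server errors: retry the same entry


def classify_error(status_code: int) -> str:
    """Return "rotate", "retry", or "raise" for an HTTP status code."""
    if status_code in ROTATE_STATUSES:
        return "rotate"
    if status_code in RETRY_STATUSES:
        return "retry"
    return "raise"  # assumed: anything else is surfaced to the caller


def entries_serving(entries: list[dict], model: str) -> list[dict]:
    """Keep only entries whose declared `models` list includes `model`."""
    return [e for e in entries if model in e.get("models", [])]
```

Filtering by model before selection keeps rotation from ever landing on an endpoint that cannot serve the requested model, so a rotation never turns a rate-limit into a model-not-found error.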
Alternatives Considered
- Use fallback_providers for cross-provider failover: this already exists but requires a full agent failure before trying the next provider. It does not pre-emptively rotate credentials or do smart selection. It is also a separate config that does not integrate with pool strategies.
- Multiple custom_providers entries pointing to different providers: the current custom_providers list only defines endpoint metadata and model lists. It does not create a failover pool. Each custom provider still resolves to one pool key in auth.json.
- Write a custom hook that catches errors and re-resolves: the hook system (gateway/hooks.py) fires agent:end events but cannot modify the in-flight request or switch providers mid-turn.
- Shell script wrapper: an external tool cannot inject credentials into a running Hermes agent mid-session.
Feature Type
Gateway / messaging improvement
Scope
Medium (2-3 files, ~300-500 lines)
Additional Context
I investigated the full credential pool and runtime provider resolution pipeline:
- agent/credential_pool.py: CredentialPool class is provider-scoped (keyed by provider name like "openai-codex", "custom:local")
- agent/credential_pool.py:346: get_pool_strategy() reads credential_pool_strategies from config, accepts fill_first, round_robin, random, least_used
- hermes_cli/runtime_provider.py:704: resolve_runtime_provider() loads a single pool via load_pool(provider) and returns a single credential_pool reference
- hermes_cli/runtime_provider.py:170-230: _resolve_runtime_from_pool_entry() builds the runtime dict with provider, base_url, api_key, api_mode, credential_pool
- run_agent.py:611-1088: AIAgent.__init__ accepts fallback_model (single dict or list of provider dicts), but this is a separate failover chain, not pool-integrated
- Existing provider_routing config only controls OpenRouter routing preferences, not cross-provider pooling
The custom:local pool in our deployment uses round_robin strategy across 4 Ollama API keys, which works great for same-provider key rotation. Extending this concept across providers is the natural next step.