Single-key credential pool: rate-limit cooldown causes init-time RuntimeError, fallback chain never tried #17929

@leslieyeo

Summary

When a provider's credential_pool contains only one entry and that entry is currently in 429-cooldown (marked exhausted by mark_exhausted_and_rotate), AIAgent.__init__ raises RuntimeError before the fallback chain is constructed. As a result, every fresh agent (cron jobs, new gateway sessions, etc.) fails with a misleading "no API key was found" message even though both:

  • the credential exists in auth.json (status exhausted, not deleted), and
  • a valid fallback_providers entry is configured and has working credentials.

The eager-fallback safety at run_agent.py:11880 only protects in-flight requests, not init.

Affected versions

hermes-agent HEAD 2d137074a (refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)). The misleading branch dates back to whenever the explicit "fail fast instead of silently routing through OpenRouter" behavior was added.

Repro

  1. Configure a single-key pool for any provider with daily-quota rate limits, e.g. alibaba-coding-plan:
    model:
      default: qwen3.6-plus
      provider: alibaba-coding-plan
      fallback_providers:
        - provider: tencent-token-plan
          model: kimi2.5
  2. Drive enough traffic through cron to hit HTTP 429: usage allocated quota exceeded.
  3. After the 429, agent.credential_pool correctly logs "marking coding plan key exhausted (status=429), rotating" followed by "no available entries (all exhausted or empty)".
  4. Every subsequent fresh AIAgent(...) (the next cron firing, the next gateway message that builds a new agent) fails with:
    RuntimeError: Provider 'alibaba-coding-plan' is set in config.yaml but no API key was found.
    Set the ALIBABA_CODING_PLAN_API_KEY environment variable, or switch to a different provider with `hermes model`.
    

Real call site: run_agent.py:1441-1445. The else: branch executes when resolve_provider_client returns None, which it does without raising any error of its own when the pool has entries but none of them are currently available().
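The state that triggers this branch can be modeled in isolation. The following is a minimal sketch, assuming each pool entry carries a cooldown timestamp; `Entry`, `Pool`, `exhausted_until`, and `resolve_client` are hypothetical stand-ins for illustration, not the actual Hermes classes or the real `resolve_provider_client`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical pool entry; exhausted_until is a Unix timestamp (0 = usable)."""
    api_key: str
    exhausted_until: float = 0.0


@dataclass
class Pool:
    """Hypothetical stand-in for a provider's credential pool."""
    entries: List[Entry] = field(default_factory=list)

    def available(self, now: float) -> List[Entry]:
        # Entries in cooldown still exist in the pool but are filtered out here.
        return [e for e in self.entries if e.exhausted_until <= now]


def resolve_client(pool: Pool, now: float) -> Optional[str]:
    """Return a usable key, or None when nothing is currently available."""
    usable = pool.available(now)
    return usable[0].api_key if usable else None


# One key, marked exhausted until t=1000 after a 429.
pool = Pool([Entry(api_key="sk-example", exhausted_until=1000.0)])
during_cooldown = resolve_client(pool, now=500.0)   # None, although the key exists
after_cooldown = resolve_client(pool, now=1500.0)   # the key is usable again
```

In this model, a `None` result says nothing about whether a credential is stored. It only says that none is usable right now, which is the distinction the current error message loses.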

Expected behavior

When a fallback chain is configured, init-time credential exhaustion should not abort. The agent should construct a client against the first usable fallback entry and surface an _emit_status notification ("⚠️ Primary rate-limited, using fallback X/Y").

Observed behavior

Init raises immediately. The user-facing error suggests setting an env var that, by design, is not where Hermes stores this provider's key. This leads to misdiagnosis: the user opens .env, sees no ALIBABA_CODING_PLAN_API_KEY, and concludes the key is gone.
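One possible way to avoid the misdiagnosis is to pick the message from the pool state rather than from the absence of a client. This is only a sketch: `describe_missing_client` and its parameters are hypothetical, and the wording of the messages is an assumption, not existing Hermes output.

```python
def describe_missing_client(provider: str, total_entries: int, available_entries: int) -> str:
    """Pick an error message that matches the actual pool state (hypothetical helper)."""
    if total_entries == 0:
        # Genuinely unconfigured: no credential is stored anywhere.
        return f"Provider '{provider}' has no stored credential."
    if available_entries == 0:
        # Credentials exist, but all are cooling down after rate limiting.
        return (
            f"Provider '{provider}' has {total_entries} stored credential(s), "
            "but all are rate-limited (exhausted) right now."
        )
    return f"Provider '{provider}' has usable credentials."


msg = describe_missing_client("alibaba-coding-plan", total_entries=1, available_entries=0)
```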

Proposed fix

In AIAgent.__init__ (run_agent.py), before raising the "no API key was found" RuntimeError, iterate over the fallback_model argument and call resolve_provider_client for each entry. If any entry resolves, use it as the effective primary client and set self._fallback_activated = True so that the existing _restore_primary_runtime machinery can switch back to the primary after cooldown.

Patch outline (working locally):

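# Before the existing `raise RuntimeError(...)` for missing creds:
# Normalize fallback_model to a list; it may be a single dict or a list of dicts.
fallbacks = fallback_model if isinstance(fallback_model, list) else [fallback_model]
for fb in fallbacks:
    # Skip None or malformed entries instead of failing on a missing key.
    if not isinstance(fb, dict) or "provider" not in fb or "model" not in fb:
        continue
    fb_client, fb_model = resolve_provider_client(
        fb["provider"], model=fb["model"], raw_codex=True,
        explicit_base_url=fb.get("base_url"), explicit_api_key=fb.get("api_key"),
    )
    if fb_client is None:
        continue  # this fallback has no usable credential either
    # Promote the first usable fallback to the effective primary.
    self.provider = fb["provider"]
    self.model = fb_model or fb["model"]
    self._fallback_activated = True  # lets _restore_primary_runtime switch back after cooldown
    client_kwargs = {"api_key": fb_client.api_key, "base_url": str(fb_client.base_url)}
    break
else:
    # No fallback resolved: keep the existing fail-fast behavior.
    raise RuntimeError(...)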
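
The selection logic in the outline can be exercised outside the agent. Below is a self-contained sketch under stated assumptions: `resolve_stub`, `USABLE_KEYS`, and `pick_runtime` are hypothetical stand-ins (the real `resolve_provider_client` returns a client and a model, not a bare key), and the "fallback index/count" status format is one guess at what "X/Y" could mean.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical credential state: provider name -> usable key, or None when exhausted.
USABLE_KEYS: Dict[str, Optional[str]] = {
    "alibaba-coding-plan": None,          # single key, in 429 cooldown
    "tencent-token-plan": "sk-fallback",  # fallback with working credentials
}


def resolve_stub(provider: str) -> Optional[str]:
    """Stand-in for resolve_provider_client: returns a key or None."""
    return USABLE_KEYS.get(provider)


def pick_runtime(primary: str, fallbacks: List[dict]) -> Tuple[str, str, bool, Optional[str]]:
    """Return (provider, api_key, fallback_activated, status_message)."""
    key = resolve_stub(primary)
    if key is not None:
        return primary, key, False, None
    for index, fb in enumerate(fallbacks, start=1):
        # Skip malformed entries rather than aborting the whole chain.
        if not isinstance(fb, dict) or "provider" not in fb:
            continue
        key = resolve_stub(fb["provider"])
        if key is not None:
            status = f"Primary rate-limited, using fallback {index}/{len(fallbacks)}"
            return fb["provider"], key, True, status
    # Nothing resolved: fail fast, as the current code does.
    raise RuntimeError(f"No usable credentials for '{primary}' or any fallback")


provider, api_key, activated, status = pick_runtime(
    "alibaba-coding-plan",
    [{"provider": "tencent-token-plan", "model": "kimi2.5"}],
)
```

With the primary in cooldown, the sketch selects the fallback and marks it as activated. With an empty fallback list it still raises, so the fail-fast path is preserved for setups that have no fallback configured.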

Secondary issue (will file separately if confirmed)

_pool_may_recover_from_rate_limit returns False for len(entries) == 1, which correctly triggers eager fallback in the request loop, but _recover_with_credential_pool for FailoverReason.rate_limit first burns one retry on the same exhausted credential (if not has_retried_429: return False, True). On a known single-credential pool this retry is wasted — quota won't reset within the retry window. Consider gating the "retry once on first 429" path on _pool_may_recover_from_rate_limit(pool).
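
The suggested gating could look roughly like the following. This is a simplified sketch, not the real `_recover_with_credential_pool`: `pool_may_recover` and `recover_from_rate_limit` are hypothetical names, the `len(entries) > 1` rule is an assumption based on the behavior described above, and the rotation step that follows the retry in the real code is omitted.

```python
from typing import List, Tuple


def pool_may_recover(entries: List[str]) -> bool:
    """Assumed rule: a pool with a single entry has nothing to rotate to."""
    return len(entries) > 1


def recover_from_rate_limit(entries: List[str], has_retried_429: bool) -> Tuple[bool, bool]:
    """Return (recovered, has_retried_429) for a 429 failure (hypothetical sketch)."""
    # Only spend the one-time same-credential retry when rotation could help later.
    if not has_retried_429 and pool_may_recover(entries):
        return False, True
    # Single-credential pool, or retry already used: report no recovery so the
    # caller can move on to the fallback chain immediately.
    return False, has_retried_429


single = recover_from_rate_limit(["key-a"], has_retried_429=False)
multi = recover_from_rate_limit(["key-a", "key-b"], has_retried_429=False)
```

In this sketch the single-key pool leaves `has_retried_429` untouched and skips the wasted retry, while the multi-key pool keeps the existing "retry once on first 429" behavior.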

Logs (anonymized)

14:35:59 INFO  root: Fallback activated: qwen3.6-plus → qwen3.6-plus (alibaba-coding-plan)
14:36:01 INFO  agent.credential_pool: marking coding plan key exhausted (status=429), rotating
14:36:01 INFO  agent.credential_pool: no available entries (all exhausted or empty)
14:36:12 ERROR root: API call failed after 3 retries. HTTP 429: usage allocated quota exceeded.
15:06:17 ERROR cron.scheduler: Job '...' failed: RuntimeError: Provider 'alibaba-coding-plan' is set
                in config.yaml but no API key was found.
[repeats every cron tick for 3+ hours until pool cooldown TTL elapses]

Metadata

Assignees

No one assigned

    Labels

    P1: High (major feature broken, no workaround)
    comp/agent: Core agent loop, run_agent.py, prompt builder
    type/bug: Something isn't working
