Summary
When a provider's credential_pool contains only one entry and that entry is currently in 429-cooldown (marked exhausted by mark_exhausted_and_rotate), AIAgent.__init__ raises RuntimeError before the fallback chain is constructed. As a result, every fresh agent (cron jobs, new gateway sessions, etc.) fails with a misleading "no API key was found" message even though both:
- the credential exists in auth.json (status exhausted, not deleted), and
- a valid fallback_providers entry is configured and has working credentials.
The eager-fallback safety at run_agent.py:11880 only protects in-flight requests, not init.
Affected versions
hermes-agent HEAD 2d137074a (refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)). The misleading branch dates back to whenever the explicit "fail fast instead of silently routing through OpenRouter" behavior was added.
Repro
- Configure a single-key pool for any provider with daily-quota rate limits, e.g. alibaba-coding-plan:
    model:
      default: qwen3.6-plus
      provider: alibaba-coding-plan
    fallback_providers:
      - provider: tencent-token-plan
        model: kimi2.5
- Drive enough traffic through cron to hit HTTP 429: usage allocated quota exceeded.
- After the 429, agent.credential_pool correctly logs "marking coding plan key exhausted (status=429), rotating" and "credential pool: no available entries (all exhausted or empty)".
- Every subsequent fresh AIAgent(...) (the next cron firing, the next gateway message that builds a new agent) fails with:
    RuntimeError: Provider 'alibaba-coding-plan' is set in config.yaml but no API key was found.
    Set the ALIBABA_CODING_PLAN_API_KEY environment variable, or switch to a different provider with `hermes model`.
Real call site: run_agent.py:1441-1445. The else: branch executes when resolve_provider_client returns None, which happens cleanly when the pool has entries but none are currently available().
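The failure mode can be reproduced in isolation with a toy model. The sketch below is not Hermes code: PoolEntry, ToyPool and toy_resolve_client are hypothetical stand-ins, and it assumes only what the repro describes, namely that resolution yields None when the pool has entries but none are currently available.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PoolEntry:
    """Hypothetical stand-in for one stored credential."""
    api_key: str
    exhausted_until: float = 0.0  # epoch seconds; 0.0 means never marked exhausted

    def available(self, now: float) -> bool:
        return now >= self.exhausted_until


@dataclass
class ToyPool:
    """Hypothetical stand-in for a provider's credential pool."""
    entries: list = field(default_factory=list)

    def pick(self, now: float) -> Optional[PoolEntry]:
        # None is returned for an empty pool AND for an all-exhausted pool,
        # so the caller cannot tell the two situations apart.
        return next((e for e in self.entries if e.available(now)), None)


def toy_resolve_client(pool: ToyPool, now: float) -> Optional[str]:
    """Assumed behavior: no available entry means no client."""
    entry = pool.pick(now)
    return entry.api_key if entry else None


now = 1_000.0
pool = ToyPool([PoolEntry("sk-demo", exhausted_until=now + 3 * 3600)])

client = toy_resolve_client(pool, now)  # None while the cooldown is active
key_is_stored = len(pool.entries) > 0   # True: the key was never deleted
```

In this model the same None covers "no key stored" and "key stored but cooling down", which is the ambiguity the else: branch turns into the "no API key was found" error.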
Expected behavior
When a fallback chain is configured, init-time credential exhaustion should not abort. The agent should construct a client against the first usable fallback entry and surface an _emit_status notification ("⚠️ Primary rate-limited, using fallback X/Y").
Observed behavior
Init raises immediately. The user-facing error suggests setting an env var that — by design — is not where Hermes stores this provider's key, leading to misdiagnosis (the user opens .env, sees no ALIBABA_CODING_PLAN_API_KEY, and concludes the key is gone).
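Independently of the fallback fix, the error text could distinguish "no key stored" from "key stored but exhausted". The following is a minimal sketch of that idea; explain_unresolved_provider and its stored/available counters are hypothetical and do not correspond to an existing Hermes function.

```python
def explain_unresolved_provider(provider: str, stored: int, available: int) -> str:
    """Build an error message that reflects why no client could be resolved.

    stored    -- number of credentials on record for the provider (hypothetical input)
    available -- how many of those are usable right now (hypothetical input)
    """
    if stored == 0:
        # Genuinely missing key: the existing wording is accurate here.
        return f"Provider '{provider}' is set in config.yaml but no API key was found."
    if available == 0:
        # Key exists but is cooling down: say so instead of asking for an env var.
        return (
            f"Provider '{provider}' has {stored} stored credential(s), but all are "
            "currently exhausted (rate-limited) and no fallback could be resolved."
        )
    return f"Provider '{provider}' could not be resolved despite available credentials."


message = explain_unresolved_provider("alibaba-coding-plan", stored=1, available=0)
```

With a message like this, a user hitting the cooldown case would not be sent to .env to look for a key that is stored elsewhere.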
Proposed fix
In run_agent.py:__init__, before raising the "no API key was found" RuntimeError, iterate the fallback_model argument and try resolve_provider_client for each entry. If any resolves, use it as the effective primary client and set self._fallback_activated = True so the existing _restore_primary_runtime machinery can pick the primary back up after cooldown.
Patch outline (working locally):
# Before the existing `raise RuntimeError(...)` for missing creds:
for fb in (fallback_model if isinstance(fallback_model, list) else [fallback_model]):
    if not isinstance(fb, dict):
        continue  # skips None and malformed entries
    fb_client, fb_model = resolve_provider_client(
        fb["provider"], model=fb["model"], raw_codex=True,
        explicit_base_url=fb.get("base_url"), explicit_api_key=fb.get("api_key"),
    )
    if fb_client is None:
        continue  # this fallback has no usable credential either
    self.provider = fb["provider"]
    self.model = fb_model or fb["model"]
    self._fallback_activated = True
    client_kwargs = {"api_key": fb_client.api_key, "base_url": str(fb_client.base_url)}
    break
else:
    # for/else: reached only when no fallback resolved (the loop never hit `break`)
    raise RuntimeError(...)
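The selection logic of the outline can be exercised outside AIAgent with a stub resolver. This is a sketch under stated assumptions: pick_init_fallback and stub_resolve are hypothetical names, the resolver is reduced to a (provider, model) callable, and the provider "exhausted-demo" is invented for the demo. It also shows where the position for the "fallback X/Y" status text could come from.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical simplified resolver signature: (provider, model) -> client or None.
Resolver = Callable[[str, str], Optional[str]]


def pick_init_fallback(fallback_model: Any, resolve: Resolver) -> Tuple[int, int, dict, str]:
    """Return (position, total, entry, client) for the first fallback that resolves.

    Raises RuntimeError when nothing resolves, like the for/else in the outline.
    """
    chain = fallback_model if isinstance(fallback_model, list) else [fallback_model]
    for position, fb in enumerate(chain, start=1):
        if not isinstance(fb, dict):
            continue  # tolerate None or malformed entries
        client = resolve(fb["provider"], fb["model"])
        if client is not None:
            return position, len(chain), fb, client
    raise RuntimeError("primary has no usable credential and no fallback resolved")


def stub_resolve(provider: str, model: str) -> Optional[str]:
    # Demo assumption: only tencent-token-plan has a usable key right now.
    return f"client:{provider}" if provider == "tencent-token-plan" else None


chain = [
    {"provider": "exhausted-demo", "model": "demo-model"},  # invented, never resolves
    {"provider": "tencent-token-plan", "model": "kimi2.5"},
]
pos, total, entry, client = pick_init_fallback(chain, stub_resolve)
status = f"⚠️ Primary rate-limited, using fallback {pos}/{total}"
```

Returning the position alongside the entry keeps the status message a pure formatting step, so the same helper could serve both init and any later notification.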
Secondary issue (will file separately if confirmed)
_pool_may_recover_from_rate_limit returns False for len(entries) == 1, which correctly triggers eager fallback in the request loop, but _recover_with_credential_pool for FailoverReason.rate_limit first burns one retry on the same exhausted credential (if not has_retried_429: return False, True). On a known single-credential pool this retry is wasted — quota won't reset within the retry window. Consider gating the "retry once on first 429" path on _pool_may_recover_from_rate_limit(pool).
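The suggested gating can be written as a small decision function. The sketch below is a simplified model, not the real _recover_with_credential_pool: NextStep and next_step_after_429 are hypothetical, and the pool is reduced to its entry count.

```python
from enum import Enum


class NextStep(Enum):
    """Hypothetical outcomes after a 429 on the current credential."""
    RETRY_SAME_KEY = "retry_same_key"
    ROTATE_KEY = "rotate_key"
    USE_FALLBACK = "use_fallback"


def next_step_after_429(entry_count: int, has_retried_429: bool) -> NextStep:
    """Decide what to do after a rate-limit response.

    Assumption: a pool with a single entry cannot recover by rotating,
    matching the described len(entries) == 1 check.
    """
    if entry_count <= 1:
        # Retrying the only key is wasted: quota will not reset in the retry window.
        return NextStep.USE_FALLBACK
    if not has_retried_429:
        # Multi-key pool: keep the existing "retry once on first 429" behavior.
        return NextStep.RETRY_SAME_KEY
    return NextStep.ROTATE_KEY


first_429_single_key = next_step_after_429(entry_count=1, has_retried_429=False)
first_429_multi_key = next_step_after_429(entry_count=3, has_retried_429=False)
```

Only the single-key case changes; pools with several entries keep the one-retry-then-rotate sequence.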
Logs (anonymized)
14:35:59 INFO root: Fallback activated: qwen3.6-plus → qwen3.6-plus (alibaba-coding-plan)
14:36:01 INFO agent.credential_pool: marking coding plan key exhausted (status=429), rotating
14:36:01 INFO agent.credential_pool: no available entries (all exhausted or empty)
14:36:12 ERROR root: API call failed after 3 retries. HTTP 429: usage allocated quota exceeded.
15:06:17 ERROR cron.scheduler: Job '...' failed: RuntimeError: Provider 'alibaba-coding-plan' is set
in config.yaml but no API key was found.
[repeats every cron tick for 3+ hours until pool cooldown TTL elapses]