Single-key credential pool: rate-limit cooldown causes init-time RuntimeError, fallback chain never tried (#17929)

## Summary

When a provider's `credential_pool` contains only one entry and that entry is currently in 429-cooldown (marked `exhausted` by `mark_exhausted_and_rotate`), `AIAgent.__init__` raises `RuntimeError` before the fallback chain is constructed. As a result, every fresh agent (cron jobs, new gateway sessions, etc.) fails with a misleading "no API key was found" message even though both:

- the credential exists in `auth.json` (status `exhausted`, not deleted), and
- a valid `fallback_providers` entry is configured and has working credentials.

The eager-fallback safety at `run_agent.py:11880` only protects in-flight requests, not init.

## Affected versions

`hermes-agent` HEAD `2d137074a` (`refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)`). The misleading branch dates back to whenever the explicit "fail fast instead of silently routing through OpenRouter" behavior was added.

## Repro

1. Configure a single-key pool for any provider with daily-quota rate limits, e.g. `alibaba-coding-plan`:
   ```yaml
   model:
     default: qwen3.6-plus
     provider: alibaba-coding-plan
     fallback_providers:
       - provider: tencent-token-plan
         model: kimi2.5
   ```
2. Drive enough traffic through cron to hit `HTTP 429: usage allocated quota exceeded`.
3. After 429, `agent.credential_pool` correctly logs `marking coding plan key exhausted (status=429), rotating` and `credential pool: no available entries (all exhausted or empty)`.
4. Every subsequent fresh `AIAgent(...)` (the next cron firing, the next gateway message that builds a new agent) fails with:
   ```
   RuntimeError: Provider 'alibaba-coding-plan' is set in config.yaml but no API key was found.
   Set the ALIBABA_CODING_PLAN_API_KEY environment variable, or switch to a different provider with `hermes model`.
   ```

The failing call site is `run_agent.py:1441-1445`. The `else:` branch runs when `resolve_provider_client` returns `None`, which it does without any error whenever the pool has entries but none of them is currently `available()`.
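The failure mode can be reproduced in isolation with a small model of the pool. This is only a sketch: `Entry`, `Pool`, `resolve_client`, and `init_agent` are hypothetical stand-ins, not the real Hermes classes. It shows that a key in cooldown and a key that was never configured land in the same branch and produce the same message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one pooled credential."""
    api_key: str
    status: str = "ok"  # "ok" or "exhausted"


@dataclass
class Pool:
    """Hypothetical stand-in for a provider's credential pool."""
    entries: List[Entry] = field(default_factory=list)

    def available(self) -> List[Entry]:
        # Exhausted entries still exist in the pool; they are only filtered out here.
        return [e for e in self.entries if e.status != "exhausted"]


def resolve_client(pool: Pool) -> Optional[str]:
    """Return a usable key, or None when nothing is currently available."""
    usable = pool.available()
    return usable[0].api_key if usable else None


def init_agent(provider: str, pool: Pool) -> str:
    key = resolve_client(pool)
    if key is None:
        # Same branch for "no key configured" and "key in cooldown".
        raise RuntimeError(
            f"Provider '{provider}' is set in config.yaml but no API key was found."
        )
    return key


cooling = Pool([Entry("sk-example", status="exhausted")])  # key exists, in cooldown
empty = Pool()                                             # key truly missing
messages = []
for pool in (cooling, empty):
    try:
        init_agent("alibaba-coding-plan", pool)
    except RuntimeError as exc:
        messages.append(str(exc))
```

Both pools end up with identical entries in `messages`, which is why the error reads as "key is gone" even though `cooling.entries` still holds the credential.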

## Expected behavior

When a fallback chain is configured, init-time credential exhaustion should not abort. The agent should construct a client against the first usable fallback entry and surface an `_emit_status` notification ("⚠️ Primary rate-limited, using fallback X/Y").

## Observed behavior

Init raises immediately. The user-facing error suggests setting an env var that, by design, is not where Hermes stores this provider's key. This leads to misdiagnosis: the user opens `.env`, sees no `ALIBABA_CODING_PLAN_API_KEY`, and concludes the key is gone.

## Proposed fix

In `AIAgent.__init__` (`run_agent.py`), before raising the "no API key was found" `RuntimeError`, iterate over the `fallback_model` argument and try `resolve_provider_client` for each entry. If any entry resolves, use it as the effective primary client and set `self._fallback_activated = True` so the existing `_restore_primary_runtime` machinery can pick the primary back up after cooldown.

Patch outline (working locally):

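```python
# Before the existing `raise RuntimeError(...)` for missing creds:
# `fallback_model` may be a single dict or a list of dicts; normalize to a list.
fallbacks = fallback_model if isinstance(fallback_model, list) else [fallback_model]
for fb in fallbacks:
    # Skip None and malformed entries instead of failing on them.
    if not isinstance(fb, dict) or "provider" not in fb or "model" not in fb:
        continue
    fb_client, fb_model = resolve_provider_client(
        fb["provider"], model=fb["model"], raw_codex=True,
        explicit_base_url=fb.get("base_url"), explicit_api_key=fb.get("api_key"),
    )
    if fb_client is None:
        continue  # this fallback has no usable credentials either; try the next one
    # Promote the first usable fallback to effective primary.
    self.provider = fb["provider"]
    self.model = fb_model or fb["model"]
    self._fallback_activated = True  # lets _restore_primary_runtime switch back later
    client_kwargs = {"api_key": fb_client.api_key, "base_url": str(fb_client.base_url)}
    break
else:
    # No fallback resolved: keep the existing fail-fast behavior.
    raise RuntimeError(...)
```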
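For a regression test, the selection logic can be exercised without a real provider. The sketch below is an assumption-laden reduction of the outline above: `pick_fallback` and `fake_resolve` are hypothetical names, the resolver takes only `(provider, model)`, and the error text is a placeholder rather than the real message.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical resolver shape: (provider, model) -> (client or None, model or None)
Resolver = Callable[[str, str], Tuple[Optional[Any], Optional[str]]]


def pick_fallback(fallback_model: Any, resolve: Resolver) -> Dict[str, Any]:
    """Return the first fallback entry whose client resolves, else raise."""
    entries = fallback_model if isinstance(fallback_model, list) else [fallback_model]
    for fb in entries:
        if not isinstance(fb, dict):
            continue  # covers None and malformed entries
        client, model = resolve(fb["provider"], fb["model"])
        if client is None:
            continue  # no usable credentials for this entry
        return {
            "provider": fb["provider"],
            "model": model or fb["model"],
            "client": client,
        }
    # Placeholder message; the real code would raise its existing error here.
    raise RuntimeError("primary credentials unavailable and no fallback resolved")


def fake_resolve(provider: str, model: str) -> Tuple[Optional[Any], Optional[str]]:
    # Hypothetical resolver: only tencent-token-plan has usable credentials.
    if provider == "tencent-token-plan":
        return object(), None
    return None, None


chain = [
    {"provider": "alibaba-coding-plan", "model": "qwen3.6-plus"},
    {"provider": "tencent-token-plan", "model": "kimi2.5"},
]
picked = pick_fallback(chain, fake_resolve)
```

With this chain, `picked["provider"]` is `tencent-token-plan`, and calling `pick_fallback(None, fake_resolve)` still raises, so the fail-fast path is preserved when no fallback is configured.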

## Secondary issue (will file separately if confirmed)

`_pool_may_recover_from_rate_limit` returns `False` for `len(entries) == 1`, which correctly triggers eager fallback in the request loop, but `_recover_with_credential_pool` for `FailoverReason.rate_limit` first burns one retry on the same exhausted credential (`if not has_retried_429: return False, True`). On a known single-credential pool this retry is wasted — quota won't reset within the retry window. Consider gating the "retry once on first 429" path on `_pool_may_recover_from_rate_limit(pool)`.
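The suggested gating could look roughly like the following. This is a sketch under stated assumptions, not the real function bodies: both functions are simplified stand-ins, entries are plain list items, the return tuple is read as `(recovered, has_retried_429)`, and "rotation succeeded" is approximated as "the pool has more than one entry".

```python
from typing import Sequence, Tuple


def pool_may_recover_from_rate_limit(entries: Sequence[object]) -> bool:
    # Assumption: only the single-entry case is described in this issue;
    # treating any larger pool as recoverable is a simplification.
    return len(entries) > 1


def recover_rate_limit(
    entries: Sequence[object], has_retried_429: bool
) -> Tuple[bool, bool]:
    """Return (recovered, has_retried_429) for a rate-limit failure."""
    # Gated version of `if not has_retried_429: return False, True`:
    # only spend the one-off retry when another credential could take over.
    if not has_retried_429 and pool_may_recover_from_rate_limit(entries):
        return False, True
    # Otherwise go straight to rotation. In this sketch rotation succeeds
    # only when there is another entry to rotate to.
    rotated = len(entries) > 1
    return rotated, False


single_first_429 = recover_rate_limit(["only-key"], has_retried_429=False)
multi_first_429 = recover_rate_limit(["key-a", "key-b"], has_retried_429=False)
multi_second_429 = recover_rate_limit(["key-a", "key-b"], has_retried_429=True)
```

Under these assumptions, a single-key pool skips the retry on its first 429 and reports no recovery immediately, while a multi-key pool keeps the current retry-then-rotate sequence.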

## Logs (anonymized)

```
14:35:59 INFO  root: Fallback activated: qwen3.6-plus → qwen3.6-plus (alibaba-coding-plan)
14:36:01 INFO  agent.credential_pool: marking coding plan key exhausted (status=429), rotating
14:36:01 INFO  agent.credential_pool: no available entries (all exhausted or empty)
14:36:12 ERROR root: API call failed after 3 retries. HTTP 429: usage allocated quota exceeded.
15:06:17 ERROR cron.scheduler: Job '...' failed: RuntimeError: Provider 'alibaba-coding-plan' is set
                in config.yaml but no API key was found.
[repeats every cron tick for 3+ hours until pool cooldown TTL elapses]
```

