
fix(agent): try fallback providers at init when primary credential pool is exhausted (salvage #17958)#18762

Merged
teknium1 merged 1 commit into main from hermes/hermes-3d89efe9
May 2, 2026
Conversation

@teknium1

@teknium1 teknium1 commented May 2, 2026

Contributor

When a provider has a single-key credential_pool and that key is in 429-cooldown, resolve_provider_client returns None and AIAgent.__init__ used to raise a misleading RuntimeError: no API key was found — even when a valid fallback_providers chain was configured. This caused every fresh agent (cron jobs, new gateway sessions) to crash for the entire cooldown window, with the error suggesting the user set an env var that Hermes doesn't actually use for their provider.
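To make the failure mode concrete, here is a minimal sketch of a cooldown-aware pool, assuming a design where each key records when its 429-cooldown ends. `PooledKey`, `CredentialPool`, `mark_rate_limited`, and `acquire` are hypothetical names for illustration, not the Hermes implementation. With a single key in cooldown, `acquire` has nothing to hand out, which is the situation in which `resolve_provider_client` is described as returning None.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PooledKey:
    """One credential plus the time at which its cooldown ends (hypothetical)."""
    value: str
    cooldown_until: float = 0.0  # 0.0 means the key is usable right away


@dataclass
class CredentialPool:
    """Illustrative pool only; the real credential_pool may differ."""
    keys: List[PooledKey] = field(default_factory=list)

    def mark_rate_limited(self, key: PooledKey, seconds: float,
                          now: Optional[float] = None) -> None:
        # Record when the key becomes usable again after a 429 response.
        now = time.monotonic() if now is None else now
        key.cooldown_until = now + seconds

    def acquire(self, now: Optional[float] = None) -> Optional[PooledKey]:
        # Return the first key that is out of cooldown, or None if every
        # key is still cooling down (the "exhausted pool" case).
        now = time.monotonic() if now is None else now
        for key in self.keys:
            if key.cooldown_until <= now:
                return key
        return None
```

The `now` parameter exists only so the behaviour can be checked without sleeping; a single-key pool returns None for the whole cooldown window and hands the key back afterwards.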

Cherry-picked from @luyao618's #17958 onto current main. Clean apply; 2 new tests pass and no regressions in tests/run_agent/ (1192 passed, 2 pre-existing unrelated failures in test_concurrent_interrupt.py confirmed to be on main without this change).

What changed

  • run_agent.py: before raising the "no API key" RuntimeError, iterate fallback_model entries and call resolve_provider_client on each. If one resolves, adopt it as the effective primary, set _fallback_activated=True, and let the existing _restore_primary_runtime machinery promote the primary back once cooldown lifts. Preserves the flag across the later init block that used to reset it unconditionally.
  • tests/run_agent/test_init_fallback_on_exhausted_pool.py: 2 tests — fallback adopted when primary returns None; original error preserved when no fallback is configured.
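
The init-time walk described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in run_agent.py: `resolve_with_fallback` is a hypothetical helper, `resolve` stands in for `resolve_provider_client`, providers are plain strings, and the returned boolean plays the role of `_fallback_activated`.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def resolve_with_fallback(
    primary: str,
    fallbacks: Sequence[str],
    resolve: Callable[[str], Optional[Any]],
) -> Tuple[str, Any, bool]:
    """Return (provider, client, fallback_activated).

    `resolve` is assumed to return None when a provider cannot supply a
    client right now, for example because its only key is in cooldown.
    """
    client = resolve(primary)
    if client is not None:
        return primary, client, False

    # Primary is unavailable: walk the fallback chain in order and adopt
    # the first provider that resolves.
    for name in fallbacks:
        client = resolve(name)
        if client is not None:
            return name, client, True

    # Nothing resolved: keep the original error, which names the primary.
    raise RuntimeError(f"no API key was found for provider {primary!r}")
```

For example, with `clients = {"b": object()}`, calling `resolve_with_fallback("primary", ["a", "b"], clients.get)` skips both the exhausted primary and the exhausted first fallback and returns provider `"b"` with the flag set. With an empty fallback list the same call raises the RuntimeError, so the no-fallback behaviour is unchanged.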

Validation

  • scripts/run_tests.sh tests/run_agent/test_init_fallback_on_exhausted_pool.py → 2 passed.
  • scripts/run_tests.sh tests/run_agent/ → 1192 passed (2 pre-existing failures unrelated to this change).
  • E2E: three scenarios with real AIAgent.__init__ and mocked resolve_provider_client
    1. Primary exhausted + working fallback → agent comes up on fallback, _fallback_activated=True.
    2. Primary exhausted + no fallback → original RuntimeError preserved (message still names the provider and env var).
    3. Primary + first fallback both exhausted, second fallback working → chain walked through and agent adopts the second fallback.

Closes #17929.

…ol is exhausted (#17929)

When a provider's credential pool has a single entry in 429-cooldown,
resolve_provider_client returns None and AIAgent.__init__ raises a
misleading RuntimeError suggesting the API key is missing — even when
valid fallback_providers are configured.

This patch makes __init__ iterate the fallback chain before raising,
mirroring the existing in-flight fallback logic in the request loop.
If a fallback resolves, the agent initializes against it and sets
_fallback_activated=True so _restore_primary_runtime can pick the
primary back up after cooldown.

Closes #17929
@teknium1 teknium1 merged commit 13f344c into main May 2, 2026
8 of 10 checks passed
@github-actions

github-actions Bot commented May 2, 2026

Contributor

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

@teknium1 teknium1 deleted the hermes/hermes-3d89efe9 branch May 2, 2026 09:09
@alt-glitch added the type/bug, P1, comp/agent, and area/auth labels May 2, 2026
@alt-glitch

Collaborator

Salvage of #17958. Closes #17929. Related to #15298 (restore_primary burns retries on exhausted pools).



Labels

area/auth: Authentication, OAuth, credential pools
comp/agent: Core agent loop, run_agent.py, prompt builder
P1: High, major feature broken, no workaround
type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

Single-key credential pool: rate-limit cooldown causes init-time RuntimeError, fallback chain never tried

3 participants