fix(agent): try fallback providers at init when primary credential pool is exhausted (salvage #17958) #18762
Merged
…ol is exhausted (#17929)

When a provider's credential pool has a single entry in 429-cooldown, resolve_provider_client returns None and AIAgent.__init__ raises a misleading RuntimeError suggesting the API key is missing, even when valid fallback_providers are configured.

This patch makes __init__ iterate the fallback chain before raising, mirroring the existing in-flight fallback logic in the request loop. If a fallback resolves, the agent initializes against it and sets _fallback_activated=True so _restore_primary_runtime can pick the primary back up after cooldown.

Closes #17929
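The control flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `run_agent.py`: `SketchAgent`, the `resolve` callable, `fake_resolve`, and the error text are hypothetical stand-ins, and the real `resolve_provider_client` signature is not shown in this PR.

```python
from typing import Callable, Optional, Sequence


class SketchAgent:
    """Hypothetical stand-in for AIAgent; only the init-time fallback is modelled."""

    def __init__(
        self,
        provider: str,
        fallback_providers: Sequence[str],
        resolve: Callable[[str], Optional[object]],
    ) -> None:
        self.provider = provider
        self._fallback_activated = False

        # Assumption: the resolver returns None when the pool is exhausted or cooling down.
        client = resolve(provider)
        if client is None:
            # Walk the fallback chain before giving up.
            for candidate in fallback_providers:
                client = resolve(candidate)
                if client is not None:
                    self.provider = candidate
                    # Remember that we are not on the primary, so it can be restored later.
                    self._fallback_activated = True
                    break
        if client is None:
            raise RuntimeError(f"no usable credentials for provider {provider!r}")
        self.client = client


def fake_resolve(name: str) -> Optional[object]:
    # Pretend "primary" is in 429-cooldown and only "backup" resolves.
    return {"backup": "backup-client"}.get(name)


agent = SketchAgent("primary", ["missing", "backup"], fake_resolve)
```

Raising only after the whole chain has been tried keeps the no-fallback behaviour unchanged: with an empty chain the loop never runs and the original error path is taken.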
Contributor
🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup. Files:

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally; if you're seeing this comment, the finding is worth inspecting.
When a provider has a single-key `credential_pool` and that key is in 429-cooldown, `resolve_provider_client` returns `None` and `AIAgent.__init__` used to raise a misleading `RuntimeError: no API key was found`, even when a valid `fallback_providers` chain was configured. This caused every fresh agent (cron jobs, new gateway sessions) to crash for the entire cooldown window, with the error suggesting the user set an env var that Hermes doesn't actually use for their provider.

Cherry-picked from @luyao618's #17958 onto current main. Clean apply; 2 new tests pass and no regressions in `tests/run_agent/` (1192 passed, 2 pre-existing unrelated failures in `test_concurrent_interrupt.py` confirmed to be on main without this change).

What changed
- `run_agent.py`: before raising the "no API key" `RuntimeError`, iterate `fallback_model` entries and call `resolve_provider_client` on each. If one resolves, adopt it as the effective primary, set `_fallback_activated=True`, and let the existing `_restore_primary_runtime` machinery promote the primary back once cooldown lifts. Preserves the flag across the later init block that used to reset it unconditionally.
- `tests/run_agent/test_init_fallback_on_exhausted_pool.py`: 2 tests: fallback adopted when primary returns None; original error preserved when no fallback is configured.

Validation
- `scripts/run_tests.sh tests/run_agent/test_init_fallback_on_exhausted_pool.py` → 2 passed.
- `scripts/run_tests.sh tests/run_agent/` → 1192 passed (2 pre-existing failures unrelated to this change).
- `AIAgent.__init__` and mocked `resolve_provider_client`: `_fallback_activated=True`.
- `RuntimeError` preserved (message still names the provider and env var).

Closes #17929.
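For readers who want the shape of the two test cases without opening the test file, here is a rough sketch using `unittest.mock`. It does not reproduce `test_init_fallback_on_exhausted_pool.py`: `init_client` is a hypothetical helper that stands in for the relevant part of `AIAgent.__init__`, and the provider names and error text are assumptions.

```python
from unittest import mock


def init_client(primary, fallbacks, resolve):
    """Hypothetical helper mirroring the init-time fallback; returns (client, activated)."""
    client = resolve(primary)
    if client is not None:
        return client, False
    for name in fallbacks:
        client = resolve(name)
        if client is not None:
            return client, True
    raise RuntimeError(f"no API key was found for provider {primary!r}")


def test_fallback_adopted_when_primary_returns_none():
    # First call (primary) yields None, second call (fallback) yields a client.
    resolve = mock.Mock(side_effect=[None, "fallback-client"])
    client, activated = init_client("primary", ["backup"], resolve)
    assert client == "fallback-client"
    assert activated is True
    assert resolve.call_args_list == [mock.call("primary"), mock.call("backup")]


def test_original_error_preserved_without_fallback():
    # With no fallback configured, the error must still name the provider.
    resolve = mock.Mock(return_value=None)
    try:
        init_client("primary", [], resolve)
    except RuntimeError as exc:
        assert "primary" in str(exc)
    else:
        raise AssertionError("expected RuntimeError")
```

Using `side_effect` with a list makes the mocked resolver return a different value per call, which is enough to simulate an exhausted primary followed by a healthy fallback.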