fix(agent): reset _fallback_index at turn start even when no fallback activated#27185
Merged
Conversation
… activated In long-lived interactive sessions, _try_activate_fallback() advances _fallback_index before attempting client resolution. When resolution fails (provider not configured, etc.) the function returns False without ever setting _fallback_activated=True. _restore_primary_runtime() then skips its reset block entirely (guarded by `if not _fallback_activated`), leaving _fallback_index >= len(_fallback_chain) for all subsequent turns. The eager-fallback guard at the top of the retry loop checks `_fallback_index < len(_fallback_chain)`, so the condition fails silently and no fallback is ever attempted again for that session. Cron jobs spawn a fresh AIAgent per run and never hit this path, which is why the same fallback chain works reliably for cron but not interactive. Fix: reset _fallback_index=0 in the `not _fallback_activated` early-return branch so every new turn starts with the full chain available. Fixes #20465
Contributor
🔎 Lint report:
|
This comment was marked as spam.
This comment was marked as spam.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Salvage of #20793 onto current main. Preserves @konsisumer's authorship.
Summary
Interactive CLI sessions now honor
fallback_providerson Codex 429usage_limit_reached(and all other failures), matching cron behavior.Root cause:
_try_activate_fallback()increments_fallback_indexBEFORE resolving the provider's client. When every chain entry's resolver returnsNone(or raises), the recursive walk exhausts_fallback_indexto>= len(_fallback_chain)but never sets_fallback_activated = True. Next turn,_restore_primary_runtime()early-returns because_fallback_activatedis False, so the chain index stays exhausted forever. The eager-fallback check at the top of the retry loop sees the exhausted index and silently skips — no "trying fallback" log line, no status message, just 3 retries and "API call failed after 3 retries."Cron jobs work because each cron run constructs a fresh
AIAgentwith_fallback_index = 0.Changes
run_agent.py: addself._fallback_index = 0in thenot _fallback_activatedearly-return branch of_restore_primary_runtime().tests/run_agent/test_primary_runtime_restore.py: regression test exercising the failed-activation path.Validation
scripts/run_tests.sh tests/run_agent/test_primary_runtime_restore.py tests/run_agent/test_provider_fallback.py -q→ 54/54 passed.Closes #20465. Original PR #20793.