fix(agent): reset fallback index when activation never succeeded (#16677) #18156
Closed
season179 wants to merge 1 commit into
…sResearch#16677)

_try_activate_fallback() advances _fallback_index before activation can fail. If every entry fails to resolve (exhausted credential pool, unconfigured provider) the recursion walks the index to len(chain) without ever flipping _fallback_activated. _restore_primary_runtime() then early-returned without resetting the index, so subsequent fallback attempts short-circuited at the bounds check on line 7438 -- the caller emitted "trying fallback..." then aborted immediately.

Reset _fallback_index in the early-return branch so the chain stays attempt-able across turns.
Collaborator
Likely duplicate of #17824. Same root cause: _fallback_index not reset in _restore_primary_runtime() when previous turn exhausted chain without activating.
Contributor
Author
Closing as a duplicate of #17824. Apologies for the noise.
Closes #16677.
`_try_activate_fallback()` increments `_fallback_index` before activation can fail. When every entry fails to resolve (exhausted credential pool, unconfigured provider, etc.), the recursion walks the index to `len(chain)` without ever flipping `_fallback_activated`. `_restore_primary_runtime()` then early-returned without resetting the index, so the next turn's `_try_activate_fallback()` short-circuits at the bounds check: the user sees `⚠️ trying fallback...` followed by an immediate abort, with no swap on the wire.

The fix is one line: reset `_fallback_index = 0` in the early-return branch so the chain is attempt-able again next turn.

I reproduced the bug deterministically against the real `run_agent.py` code paths (single-entry chain, mocked `resolve_provider_client` returning `(None, None)` to simulate the exhausted-pool case): turn 1 leaves `_fallback_index = 1` / `_fallback_activated = False`, the next turn's `_restore_primary_runtime()` doesn't reset, and even a now-resolvable fallback short-circuits at the bounds check. With the patch, the same script swaps to the fallback on turn 2 as expected.

Tested:

`test_resets_fallback_index_after_failed_activation` in `tests/run_agent/test_primary_runtime_restore.py`. Fails without the patch (third assertion), passes with it.

`tests/run_agent/` suite: 1182 passed, 9 skipped, no regressions.

Note: this only fixes the credential-pool-exhausted / unconfigured-fallback class of #16677. The reporter also describes a separate `auxiliary.vision.provider: auto` resolving to a 16K-context model; that path doesn't apply the `MINIMUM_CONTEXT_LENGTH` floor and deserves its own ticket. Their `status=75/TEMPFAIL` "crash loop" framing is downstream of this same fallback-never-fires bug (75 is the intentional graceful-restart code in `gateway/restart.py`, not a 429 exit).
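For readers without the diff at hand, the stuck-index failure described in this PR can be sketched with a toy model. This is a rough illustration, not the code in `run_agent.py`: the class `FallbackSketch`, the `patched` switch, the injected `resolve` callable, and the helper `run_two_turns` are all hypothetical. It also assumes that the non-early-return branch of `_restore_primary_runtime()` already resets the index. Only the attribute and method names are taken from the description.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for the real resolver: returns (client, model),
# or (None, None) when the entry cannot be resolved.
Resolver = Callable[[str], Tuple[Optional[object], Optional[str]]]


class FallbackSketch:
    """Toy model of the fallback bookkeeping; not the real agent class."""

    def __init__(self, chain, resolve: Resolver, patched: bool):
        self._fallback_chain = list(chain)
        self._resolve = resolve
        self._patched = patched  # hypothetical switch to compare both behaviours
        self._fallback_index = 0
        self._fallback_activated = False

    def _try_activate_fallback(self) -> bool:
        # Bounds check: once the index reaches len(chain), nothing is attempted.
        if self._fallback_index >= len(self._fallback_chain):
            return False
        entry = self._fallback_chain[self._fallback_index]
        self._fallback_index += 1  # advanced before activation can fail
        client, _model = self._resolve(entry)
        if client is None:
            return self._try_activate_fallback()  # recurse to the next entry
        self._fallback_activated = True
        return True

    def _restore_primary_runtime(self) -> None:
        if not self._fallback_activated:
            if self._patched:
                self._fallback_index = 0  # the one-line reset proposed in the PR
            return  # early return: no fallback was ever active
        # Assumed behaviour of the normal restore path.
        self._fallback_activated = False
        self._fallback_index = 0


def run_two_turns(patched: bool) -> bool:
    """Return whether the fallback activates on turn 2."""
    available = {"ok": False}

    def resolve(entry):
        return (object(), entry) if available["ok"] else (None, None)

    agent = FallbackSketch(["backup-provider"], resolve, patched)
    agent._try_activate_fallback()    # turn 1: every entry fails to resolve
    agent._restore_primary_runtime()  # start of turn 2
    available["ok"] = True            # the fallback is resolvable now
    return agent._try_activate_fallback()
```

In this sketch, `run_two_turns(patched=False)` never reaches the resolver on turn 2 because the index is still at `len(chain)`, while `run_two_turns(patched=True)` attempts the chain again from the first entry.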