fix(auxiliary): retry transient transport error once before fallback (#16587)#41885
Merged
Conversation
Contributor
🔎 Lint report:
|
…16587) A one-off transient transport failure (streaming-close / incomplete chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to provider/model fallback (or, for context compression, dropped the summary and entered cooldown), even when an immediate retry on the same provider would have succeeded. Add a single same-target retry at the top of call_llm() and async_call_llm() — before the existing except-chain — gated on a new _is_transient_transport_error() that reuses the canonical _is_connection_error() detector plus a 5xx/408 status check. A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through to first_err and the existing fallback handling unchanged. This lives in call_llm so every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface, rather than each caller re-implementing it. The context compressor needs no change — it calls call_llm and inherits the retry; its existing fallback-to-main path (#18458) now composes naturally (retry the aux model once, then fall back to main only if the retry also fails). Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
4f1cd37 to
53fadd3
Compare
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A one-off transient transport failure on an auxiliary LLM call now retries once on the same provider before escalating — instead of immediately falling back to another provider (or, for context compression, dropping the summary and entering cooldown).
Root cause:
call_llm()/async_call_llm()issued a singlechat.completions.create(), then went straight to the fallback/refresh except-chain on any error. A single streaming-close (peer closed connection/incomplete chunked read) or 5xx/408 blip abandoned an otherwise-healthy provider even though an immediate retry usually succeeds.Changes
agent/auxiliary_client.py:_is_transient_transport_error()— reuses the canonical_is_connection_error()detector + a 5xx/408 status check (no duplicate error list).call_llm()andasync_call_llm(), before the existing except-chain. A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through tofirst_errand the existing fallback handling unchanged.tests/agent/test_auxiliary_client.py: 4 tests — retry on streaming-close, retry on 5xx, no-retry on 400, and second-failure escalation to the existing provider fallback.scripts/release.py: AUTHOR_MAP entry for the contributor.This lives in
call_llmso every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface. The context compressor needs no change — it inherits the retry, and its existing fallback-to-main path (#18458) composes naturally: retry the aux model once, then fall back to main only if the retry also fails.Validation
Targeted:
tests/agent/test_auxiliary_client.py+tests/agent/test_context_compressor.pypass (307), plus the wider aux/compression slice (527). The existing auth-refresh, payment-fallback, pool-rotation, and streaming-fallback suites are green — the retry composes with the except-chain rather than replacing it.Salvages PR #16587 (@ARegalado1). The original fixed only the context-compression caller with a private error classifier; this reimplements the same intent one layer down in
call_llm, so it benefits all auxiliary tasks and reuses the existing_is_connection_errordetector. Co-authored credit preserved.Infographic