fix(agent): retry transient compression summary transport errors #16587
ARegalado1 wants to merge 1 commit into
7061178 to 0312f72
Refresh update:
No CI checks are currently attached to this fork PR from GitHub.
Fixed on […]. The auxiliary Codex Responses path now routes through the same shared converter the main agent transport uses ([…]). Thanks for the fix; closing as resolved by #41714.
Reopening: I closed this in error. This PR is independent of the #5709 …
…16587)

A one-off transient transport failure (streaming-close / incomplete chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to provider/model fallback (or, for context compression, dropped the summary and entered cooldown), even when an immediate retry on the same provider would have succeeded.

Add a single same-target retry at the top of call_llm() and async_call_llm(), before the existing except-chain, gated on a new _is_transient_transport_error() that reuses the canonical _is_connection_error() detector plus a 5xx/408 status check. A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface, rather than each caller re-implementing it. The context compressor needs no change: it calls call_llm and inherits the retry; its existing fallback-to-main path (#18458) now composes naturally (retry the aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
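For readers who want to see the shape of the change, here is a minimal sketch of the gate and the single same-target retry described in the commit message. It is not the merged code: `is_connection_error`, `is_transient_transport_error`, and `call_with_single_retry` are hypothetical stand-ins for the project's private helpers, the marker strings are illustrative, and reading a `status_code` attribute off the exception is an assumption about how HTTP errors surface.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Illustrative substrings only; the real detector's rules are not shown in this PR.
_CONNECTION_MARKERS = (
    "connection error",
    "incomplete chunked read",
    "peer closed connection",
)


def is_connection_error(exc: BaseException) -> bool:
    """Hypothetical stand-in for the project's connection-error detector."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    text = str(exc).lower()
    return any(marker in text for marker in _CONNECTION_MARKERS)


def is_transient_transport_error(exc: BaseException) -> bool:
    """True for connection-level failures or an HTTP 5xx / 408 status."""
    if is_connection_error(exc):
        return True
    # Assumption: HTTP errors expose an integer `status_code` attribute.
    status: Optional[int] = getattr(exc, "status_code", None)
    return isinstance(status, int) and (status == 408 or 500 <= status <= 599)


def call_with_single_retry(send: Callable[[], T]) -> T:
    """Call send(); retry once on the same target if the failure looks transient."""
    try:
        return send()
    except Exception as first_err:
        if not is_transient_transport_error(first_err):
            raise  # auth, other 4xx, malformed payload: no retry
    # A second failure propagates to whatever fallback handling the caller has.
    return send()
```

The retry is deliberately not a loop with backoff: one immediate attempt covers the stream-disconnect case without delaying the existing fallback path when the provider is genuinely down.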
Merged via #41885 (commit 02a4d66).

Your fix idea (retry a transient compression-summary transport error once before giving up) shipped, but generalized one layer down. Instead of retrying inside the context-compression caller with a private error classifier, the retry now lives in […].

The context compressor needed no change after that: it calls […].

Your authorship is preserved as a co-author on the merged commit, and you're in the release AUTHOR_MAP. Thanks for the report and the fix.
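The ordering the commit message describes for the compressor (retry the aux model once, then fall back to main only if the retry also fails) can be sketched as below. This is an illustration under assumptions rather than the project's code: `summarize_with_fallback` and its three callable parameters are hypothetical names, and in the merged change the retry sits inside the shared call path instead of in a wrapper like this.

```python
from typing import Callable


def summarize_with_fallback(
    aux_call: Callable[[], str],
    main_call: Callable[[], str],
    is_transient: Callable[[BaseException], bool],
) -> str:
    """Try the auxiliary model, retry it once on a transient error, then use main."""
    for attempt in (1, 2):
        try:
            return aux_call()
        except Exception as err:
            # Only the first failure may be retried, and only if it looks transient.
            if attempt == 1 and is_transient(err):
                continue
            break
    # Reached after a failed retry or after any non-transient failure.
    return main_call()
```

With an auxiliary call that always raises a transient error, the call order is aux, aux, main; with a non-transient error it is aux, main.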
Summary
[…] APIConnectionError("Connection error."), transient HTTP status-code paths, auth/client no-retry, and second transient failure cooldown.

Context
I hit a transient compression-summary failure where the auxiliary call reported "peer closed connection without sending complete message body (incomplete chunked read)". Today that path goes straight to fallback/cooldown, even though a single retry is often enough for stream/transport disconnects.

This is intentionally narrower than the open Responses API role=tool replay work around #5709. It does not change message conversion or tool replay behavior.

Refresh note: I rebased this branch onto current main, checked for duplicate/superseding PRs or issues for this exact compressor retry path, and added direct coverage for the status-code branches.

Test Plan
python -m py_compile agent/context_compressor.py
python -m pytest tests/agent/test_context_compressor.py -q -o 'addopts=' (73 passed)
git diff --check upstream/main...HEAD