Skip to content

fix(agent): retry transient compression summary transport errors#16587

Closed
ARegalado1 wants to merge 1 commit into
NousResearch:mainfrom
ARegalado1:fix/aux-alerts-2026-04-27
Closed

fix(agent): retry transient compression summary transport errors#16587
ARegalado1 wants to merge 1 commit into
NousResearch:mainfrom
ARegalado1:fix/aux-alerts-2026-04-27

Conversation

@ARegalado1

@ARegalado1 ARegalado1 commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Retry context compression summary generation once for transient auxiliary transport, stream, timeout, and 5xx failures before falling back.
  • Preserve existing no-retry behavior for auth and other non-408 client 4xx errors.
  • Add regression coverage for incomplete chunked reads, generic APIConnectionError("Connection error."), transient HTTP status-code paths, auth/client no-retry, and second transient failure cooldown.

Context

I hit a transient compression-summary failure where the auxiliary call reported peer closed connection without sending complete message body (incomplete chunked read). Today that path goes straight to fallback/cooldown, even though a single retry is often enough for stream/transport disconnects.

This is intentionally narrower than the open Responses API role=tool replay work around #5709. It does not change message conversion or tool replay behavior.

Refresh note: I rebased this branch onto current main, checked for duplicate/superseding PRs or issues for this exact compressor retry path, and added direct coverage for the status-code branches.

Test Plan

  • python -m py_compile agent/context_compressor.py
  • python -m pytest tests/agent/test_context_compressor.py -q -o 'addopts=' — 73 passed
  • git diff --check upstream/main...HEAD

@ARegalado1 ARegalado1 force-pushed the fix/aux-alerts-2026-04-27 branch from 7061178 to 0312f72 Compare May 3, 2026 14:35
@ARegalado1

Copy link
Copy Markdown
Contributor Author

Refresh update:

  • Rebased this PR branch onto current main.
  • Added direct regression coverage for transient HTTP status-code paths (408, 5xx, including response.status_code).
  • Rechecked for duplicate/superseding PRs or issues for this exact compressor retry path and did not find any.
  • Verified locally:
    • python -m py_compile agent/context_compressor.py
    • python -m pytest tests/agent/test_context_compressor.py -q -o 'addopts=' — 73 passed
    • git diff --check upstream/main...HEAD

No CI checks are currently attached to this fork PR from GitHub.

@teknium1

teknium1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Fixed on main via #41714 (commit 2789bf4).

The auxiliary Codex Responses path now routes through the same shared converter the main agent transport uses (_chat_messages_to_responses_input), so tool history is encoded as function_call / function_call_output items with a valid call_id instead of leaking raw role=tool into Responses input[]. Single conversion path means the auxiliary and main paths can't drift apart again — which was the explicit scope of #5709.

Thanks for the fix — closing as resolved by #41714.

@teknium1 teknium1 closed this Jun 8, 2026
@teknium1 teknium1 reopened this Jun 8, 2026
@teknium1

teknium1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Reopening — I closed this in error. This PR is independent of the #5709 role=tool Responses conversion fix (#41714); it adds single-retry handling for transient compression-summary transport failures (incomplete chunked reads, stream disconnects, 5xx) in agent/context_compressor.py, which my change does not touch or supersede. Apologies for the mistaken close. This stands on its own and will be triaged separately on its merits.

teknium1 added a commit that referenced this pull request Jun 8, 2026
…16587)

A one-off transient transport failure (streaming-close / incomplete
chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to
provider/model fallback (or, for context compression, dropped the summary
and entered cooldown), even when an immediate retry on the same provider
would have succeeded.

Add a single same-target retry at the top of call_llm() and
async_call_llm() — before the existing except-chain — gated on a new
_is_transient_transport_error() that reuses the canonical
_is_connection_error() detector plus a 5xx/408 status check. A second
failure (or any non-transient error: auth, other 4xx, malformed payload)
falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush,
title generation, session search, vision) shares one transient-retry
surface, rather than each caller re-implementing it. The context
compressor needs no change — it calls call_llm and inherits the retry; its
existing fallback-to-main path (#18458) now composes naturally (retry the
aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
teknium1 added a commit that referenced this pull request Jun 8, 2026
…16587)

A one-off transient transport failure (streaming-close / incomplete
chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to
provider/model fallback (or, for context compression, dropped the summary
and entered cooldown), even when an immediate retry on the same provider
would have succeeded.

Add a single same-target retry at the top of call_llm() and
async_call_llm() — before the existing except-chain — gated on a new
_is_transient_transport_error() that reuses the canonical
_is_connection_error() detector plus a 5xx/408 status check. A second
failure (or any non-transient error: auth, other 4xx, malformed payload)
falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush,
title generation, session search, vision) shares one transient-retry
surface, rather than each caller re-implementing it. The context
compressor needs no change — it calls call_llm and inherits the retry; its
existing fallback-to-main path (#18458) now composes naturally (retry the
aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
@teknium1

teknium1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Merged via #41885 (commit 02a4d66).

Your fix idea — retry a transient compression-summary transport error once before giving up — shipped, but generalized one layer down. Instead of retrying inside the context-compression caller with a private error classifier, the retry now lives in call_llm() / async_call_llm() (a single same-target retry before the existing fallback chain), gated on the canonical _is_connection_error() detector + a 5xx/408 check. That way every auxiliary task — compression, memory flush, title generation, session search, vision — gets the same transient resilience, and there's no duplicate error list to drift.

The context compressor needed no change after that: it calls call_llm and inherits the retry, and its existing fallback-to-main path composes naturally (retry the aux model once, then fall back to main only if the retry also fails).

Your authorship is preserved as a co-author on the merged commit, and you're in the release AUTHOR_MAP. Thanks for the report and the fix.

@teknium1 teknium1 closed this Jun 8, 2026
a249169329-cpu pushed a commit to a249169329-cpu/hermes-agent that referenced this pull request Jun 8, 2026
…ousResearch#16587)

A one-off transient transport failure (streaming-close / incomplete
chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to
provider/model fallback (or, for context compression, dropped the summary
and entered cooldown), even when an immediate retry on the same provider
would have succeeded.

Add a single same-target retry at the top of call_llm() and
async_call_llm() — before the existing except-chain — gated on a new
_is_transient_transport_error() that reuses the canonical
_is_connection_error() detector plus a 5xx/408 status check. A second
failure (or any non-transient error: auth, other 4xx, malformed payload)
falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush,
title generation, session search, vision) shares one transient-retry
surface, rather than each caller re-implementing it. The context
compressor needs no change — it calls call_llm and inherits the retry; its
existing fallback-to-main path (NousResearch#18458) now composes naturally (retry the
aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#16587)

A one-off transient transport failure (streaming-close / incomplete
chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to
provider/model fallback (or, for context compression, dropped the summary
and entered cooldown), even when an immediate retry on the same provider
would have succeeded.

Add a single same-target retry at the top of call_llm() and
async_call_llm() — before the existing except-chain — gated on a new
_is_transient_transport_error() that reuses the canonical
_is_connection_error() detector plus a 5xx/408 status check. A second
failure (or any non-transient error: auth, other 4xx, malformed payload)
falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush,
title generation, session search, vision) shares one transient-retry
surface, rather than each caller re-implementing it. The context
compressor needs no change — it calls call_llm and inherits the retry; its
existing fallback-to-main path (NousResearch#18458) now composes naturally (retry the
aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants