fix(auxiliary): retry transient transport error once before fallback (#16587) #41885

Merged: teknium1 merged 1 commit into main from hermes/hermes-703737c0 on Jun 8, 2026
teknium1 (Contributor) commented Jun 8, 2026

Summary

A one-off transient transport failure on an auxiliary LLM call now retries once on the same provider before escalating — instead of immediately falling back to another provider (or, for context compression, dropping the summary and entering cooldown).

Root cause: call_llm() / async_call_llm() issued a single chat.completions.create(), then went straight to the fallback/refresh except-chain on any error. A single streaming-close (peer closed connection / incomplete chunked read) or 5xx/408 blip abandoned an otherwise-healthy provider even though an immediate retry usually succeeds.

Changes

  • agent/auxiliary_client.py:
    • New _is_transient_transport_error() — reuses the canonical _is_connection_error() detector + a 5xx/408 status check (no duplicate error list).
    • One same-target retry at the top of both call_llm() and async_call_llm(), before the existing except-chain. A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through to first_err and the existing fallback handling unchanged.
  • tests/agent/test_auxiliary_client.py: 4 tests — retry on streaming-close, retry on 5xx, no-retry on 400, and second-failure escalation to the existing provider fallback.
  • scripts/release.py: AUTHOR_MAP entry for the contributor.

This lives in call_llm so every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface. The context compressor needs no change — it inherits the retry, and its existing fallback-to-main path (#18458) composes naturally: retry the aux model once, then fall back to main only if the retry also fails.
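
The layering described above can be illustrated with a toy model. Everything here is hypothetical: `TransientError`, `make_provider()`, and the simplified `call_llm()` and `compress()` are stand-ins that only mimic the ordering (retry the aux model once, then fall back to main), not the real signatures.

```python
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical marker for a transient transport failure."""


def call_llm(create: Callable[[], str]) -> str:
    """Stand-in for the shared entry point: one same-target retry."""
    try:
        return create()
    except TransientError:
        return create()  # a second failure propagates to the caller


def compress(aux: Callable[[], str], main: Callable[[], str]) -> str:
    """Stand-in caller: try the auxiliary model, then fall back to main."""
    try:
        return call_llm(aux)
    except Exception:
        return call_llm(main)


def make_provider(name: str, failures: int, log: List[str]) -> Callable[[], str]:
    """Build a fake provider that fails `failures` times before succeeding."""
    remaining = [failures]

    def create() -> str:
        log.append(name)  # record every attempt in order
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError(name)
        return f"summary from {name}"

    return create


log: List[str] = []
result = compress(make_provider("aux", 2, log), make_provider("main", 0, log))
print(result, log)
```

With two consecutive aux failures the attempt log is aux, aux, main. With a single blip the second aux attempt succeeds and main is never called.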

Validation

| | Before | After |
|---|---|---|
| transient blip (streaming-close / 5xx / 408) | escalate to fallback immediately | retry once on the same provider → succeed |
| second transient failure | | fall through to existing provider fallback |
| non-transient (auth, 400, payment) | fallback/refresh chain | unchanged |

Targeted: tests/agent/test_auxiliary_client.py + tests/agent/test_context_compressor.py pass (307), plus the wider aux/compression slice (527). The existing auth-refresh, payment-fallback, pool-rotation, and streaming-fallback suites are green — the retry composes with the except-chain rather than replacing it.
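
The four test scenarios listed under Changes could be exercised along these lines. This is a sketch only: `FakeStatusError`, `is_transient()`, the two-argument `call_llm()`, and the `make()` helper are hypothetical stand-ins, and the real tests in tests/agent/test_auxiliary_client.py may be structured quite differently.

```python
from unittest.mock import MagicMock


class FakeStatusError(Exception):
    """Hypothetical HTTP error carrying a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_transient(exc: BaseException) -> bool:
    # Assumption: ConnectionError stands in for a streaming-close failure.
    if isinstance(exc, ConnectionError):
        return True
    status = getattr(exc, "status_code", None)
    return isinstance(status, int) and (status == 408 or 500 <= status <= 599)


def call_llm(primary, fallback):
    """Simplified stand-in: one same-target retry, then provider fallback."""
    try:
        return primary.create()
    except Exception as first_err:
        if is_transient(first_err):
            try:
                return primary.create()
            except Exception:
                pass  # second failure: escalate below
    return fallback.create()


def make(side_effect):
    """Build a primary mock with scripted outcomes and a working fallback."""
    primary, fallback = MagicMock(), MagicMock()
    primary.create.side_effect = side_effect
    fallback.create.return_value = "from fallback"
    return primary, fallback


def test_retry_on_streaming_close():
    primary, fallback = make([ConnectionError("peer closed connection"), "ok"])
    assert call_llm(primary, fallback) == "ok"
    assert primary.create.call_count == 2
    fallback.create.assert_not_called()


def test_retry_on_5xx():
    primary, fallback = make([FakeStatusError(503), "ok"])
    assert call_llm(primary, fallback) == "ok"
    fallback.create.assert_not_called()


def test_no_retry_on_400():
    primary, fallback = make([FakeStatusError(400)])
    assert call_llm(primary, fallback) == "from fallback"
    assert primary.create.call_count == 1


def test_second_failure_escalates():
    primary, fallback = make([FakeStatusError(503), FakeStatusError(503)])
    assert call_llm(primary, fallback) == "from fallback"
    assert primary.create.call_count == 2
```

Scripting the mock's `side_effect` as a list keeps each scenario to one line of setup and makes the attempt count directly assertable.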

Salvages PR #16587 (@ARegalado1). The original fixed only the context-compression caller with a private error classifier; this reimplements the same intent one layer down in call_llm, so it benefits all auxiliary tasks and reuses the existing _is_connection_error detector. Co-authored credit preserved.

[Infographic: aux-retry-one-layer-down]

github-actions bot (Contributor) commented Jun 8, 2026

🔎 Lint report: hermes/hermes-703737c0 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10114 on HEAD, 10114 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5238 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…16587)

A one-off transient transport failure (streaming-close / incomplete
chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to
provider/model fallback (or, for context compression, dropped the summary
and entered cooldown), even when an immediate retry on the same provider
would have succeeded.

Add a single same-target retry at the top of call_llm() and
async_call_llm() — before the existing except-chain — gated on a new
_is_transient_transport_error() that reuses the canonical
_is_connection_error() detector plus a 5xx/408 status check. A second
failure (or any non-transient error: auth, other 4xx, malformed payload)
falls through to first_err and the existing fallback handling unchanged.

This lives in call_llm so every auxiliary task (compression, memory flush,
title generation, session search, vision) shares one transient-retry
surface, rather than each caller re-implementing it. The context
compressor needs no change — it calls call_llm and inherits the retry; its
existing fallback-to-main path (#18458) now composes naturally (retry the
aux model once, then fall back to main only if the retry also fails).

Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>
teknium1 force-pushed the hermes/hermes-703737c0 branch from 4f1cd37 to 53fadd3 on June 8, 2026 07:03
teknium1 changed the title from "fix(compression): retry transient transport error once before cooldown (#16587)" to "fix(auxiliary): retry transient transport error once before fallback (#16587)" on Jun 8, 2026
alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) on Jun 8, 2026
teknium1 merged commit 02a4d66 into main on Jun 8, 2026
29 of 31 checks passed
teknium1 deleted the hermes/hermes-703737c0 branch on June 8, 2026 08:05