fix(auxiliary): cache 402'd providers as unhealthy with TTL to stop per-call retry storms #23597

Merged
teknium1 merged 1 commit into main from hermes/hermes-04bc4ccd on May 11, 2026

@teknium1

Contributor

Closes part of #23570 (depleted-OpenRouter retry storm).

Summary

Stops the auxiliary client from retrying a depleted provider on every compression, title-gen, or session-search call. When any auxiliary call hits HTTP 402 (Payment Required), the provider is marked unhealthy with a 10-minute TTL and skipped by every subsequent aux chain iteration until the TTL expires.

Changes

  • agent/auxiliary_client.py:
    • New TTL cache + helpers (_mark_provider_unhealthy, _is_provider_unhealthy, _log_skip_unhealthy, _reset_aux_unhealthy_cache, _normalize_chain_label)
    • Step-1 of _resolve_auto (main-provider-as-aux path) now consults the cache → bypasses the depleted main provider, falls to Step-2
    • Step-2 chain consults the cache → skips depleted entries, picks the next available
    • _try_payment_fallback consults the cache → second 402 within the same call doesn't re-try the same depleted endpoint
    • call_llm / acall_llm mark the provider unhealthy when they observe a payment error (using _recoverable_pool_provider to derive the actual provider label from base_url when resolved_provider == "auto")
  • tests/agent/test_auxiliary_client.py: new TestAuxUnhealthyCache (7 tests)

Behavior contract

  • TTL is 10 min (_AUX_UNHEALTHY_TTL_SECONDS); expires lazily on next lookup
  • Skip-logs throttled to once per minute per label (no log spam on bursty sessions)
  • Cache is per-process only, by design: multi-profile users with different keys per profile must each hit the 402 once

Validation

| | Before | After |
|---|---|---|
| 402 on every aux call | yes (~1 RTT each) | once per TTL window |
| User-visible signal | repeated 402 traces in agent.log | single WARNING ("marking X unhealthy for 600s") + throttled INFO skips |
| Targeted tests | 0 | 7 (passing) |
| tests/agent/ | | 2679 passed (no regressions) |
| tests/agent/test_auxiliary_* | | 222 passed (existing suite green) |

Refs #23570 (3 of 3: PR #23585 covered the silent config-parse failure; native-image data-URL replay PR is next).

…er-call retry storms

When an auxiliary provider returns HTTP 402 (credit / payment), every
subsequent compression / title-gen / session-search / vision call still
re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402
again, then falling back. On a long Discord/LCM session that meant dozens
of doomed 402s per minute (issue #23570).

Add a per-process unhealthy-provider cache with a 10 min TTL. When any
caller observes a payment error against a provider, the label is marked
unhealthy and skipped by:
  * _resolve_auto Step-1 (main provider use-as-aux path)
  * _resolve_auto Step-2 (aggregator/fallback chain)
  * _try_payment_fallback (used by call_llm/acall_llm on first 402)

Skip-logs are throttled to once per minute per label so a bursty session
doesn't spam agent.log. Entries auto-expire so a topped-up account
recovers without manual intervention. The cache is in-process only by
design — multi-profile users with different keys per profile must each
hit the 402 once.

Refs #23570
@teknium1 teknium1 merged commit 228b7d2 into main May 11, 2026
13 of 16 checks passed
@teknium1 teknium1 deleted the hermes/hermes-04bc4ccd branch May 11, 2026 05:43
@github-actions

Contributor

🔎 Lint report: hermes/hermes-04bc4ccd vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8156 on HEAD, 8155 on base (🆕 +1)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4287 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch added labels on May 11, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder)
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request May 11, 2026
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
AlexFoxD pushed a commit to AlexFoxD/hermes-agent that referenced this pull request May 21, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026