Problem
When the primary provider becomes unresponsive during streaming (no chunks delivered within the stale timeout), Hermes kills the connection and retries with the same provider. It does not activate the fallback_providers chain.
As a result, if a provider is alive but unresponsive (not returning 429/500, just holding the connection open without delivering tokens), the agent retries the same provider until max_retries is exhausted, even though working fallback providers are configured and available.
Reproduction
- Configure a primary provider and a fallback_providers chain in config.yaml
- Send a request with a large context (~80K tokens) that causes the primary to stall (no chunks arrive, but the TCP connection stays alive via SSE keepalives)
- Observe: the stale detector fires at 240s, kills the connection, and retries the same provider
- After max_retries, the turn fails; the fallback chain is never activated
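To illustrate why the connection stays open while the stale detector still fires, here is a minimal simulation. It assumes the stall looks like an SSE stream that sends only comment lines (lines starting with ":", which SSE clients ignore) and no `data:` events. The frame list, the fake timestamps, and `find_stale_kill` are hypothetical and not Hermes code; unlike the real detector, which presumably checks on a timer, this one only checks when a frame arrives.

```python
from typing import Iterable, Optional, Tuple

# Each frame is (timestamp_in_seconds, raw_sse_line). A stalled provider keeps
# the TCP connection busy with comment frames (": keepalive") but sends no data.
Frame = Tuple[float, str]


def find_stale_kill(frames: Iterable[Frame], stale_timeout: float) -> Optional[float]:
    """Return the time at which the stale detector would fire, or None.

    Only `data:` lines count as chunks. SSE comment lines (starting with ':')
    keep the socket alive but do not reset the last-chunk timer.
    """
    last_chunk_time = 0.0
    for ts, line in frames:
        if ts - last_chunk_time > stale_timeout:
            return ts  # simplification: the check runs when the next frame arrives
        if line.startswith("data:"):
            last_chunk_time = ts  # a real token chunk resets the timer
    return None


# Keepalive every 15s for 5 minutes, no data at all.
stalled = [(float(t), ": keepalive") for t in range(15, 301, 15)]
print(find_stale_kill(stalled, stale_timeout=240.0))  # → 255.0
```

Keepalives defeat TCP-level and idle-socket timeouts, which is why only a chunk-level detector like the one in run_agent.py notices the stall at all.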
Expected behavior
After the first stale-stream kill (or after N stale kills), _try_activate_fallback() should be called to switch to the next provider in the chain, the same way empty/malformed responses trigger eager fallback at lines 11513-11519.
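The choice between "first stale kill" and "N stale kills" could be made tunable with a small per-provider counter. The sketch below is one possible shape for that policy; `StaleKillPolicy`, `record_stale_kill`, and the `threshold` parameter are hypothetical names, not existing Hermes code or config keys.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StaleKillPolicy:
    """Decide when repeated stale-stream kills should trigger fallback.

    threshold=1 means "fall back after the first stale kill";
    larger values tolerate a few transient stalls before switching.
    """

    threshold: int = 1
    _kills: Dict[str, int] = field(default_factory=dict, init=False)

    def record_stale_kill(self, provider: str) -> bool:
        """Count one stale kill for `provider`; return True once fallback is due."""
        count = self._kills.get(provider, 0) + 1
        self._kills[provider] = count
        return count >= self.threshold

    def reset(self, provider: str) -> None:
        """Forget earlier kills, e.g. after `provider` streams a full response."""
        self._kills.pop(provider, None)


policy = StaleKillPolicy(threshold=2)
print(policy.record_stale_kill("zai"))  # → False (first stall: retry the same provider)
print(policy.record_stale_kill("zai"))  # → True (second stall: activate fallback)
```

A threshold of 1 matches the eager behavior already used for empty/malformed responses; a higher value trades some latency for fewer unnecessary provider switches.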
Relevant code
Stale stream detection (run_agent.py:7514-7546):
_stale_elapsed = time.time() - last_chunk_time["t"]
if _stale_elapsed > _stream_stale_timeout:
    # ... logs warning, emits status, kills client
    self._emit_status(
        f"⚠️ No response from provider for {int(_stale_elapsed)}s "
        f"(model: {api_kwargs.get('model', 'unknown')}, "
        f"context: ~{_est_ctx:,} tokens). "
        f"Reconnecting..."
    )
    # Closes client, resets timer, continues the while-loop; does NOT call _try_activate_fallback()
Non-streaming stale detection (run_agent.py:6550-6587) has the same gap: it kills the connection and sets TimeoutError, but never activates the fallback chain.
Where fallback IS activated (run_agent.py:11510-11519):
# Empty/malformed responses: correctly triggers fallback
if self._fallback_index < len(self._fallback_chain):
    self._emit_status("⚠️ Empty/malformed response — switching to fallback...")
    if self._try_activate_fallback():
        retry_count = 0
        continue
_try_activate_fallback signature (run_agent.py:7629):
def _try_activate_fallback(self, reason: "FailoverReason | None" = None) -> bool:
FailoverReason.timeout exists in agent/error_classifier.py:40 — can be used as the reason parameter.
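To make the guard `self._fallback_index < len(self._fallback_chain)` concrete, this toy class reuses the names quoted above (`_fallback_chain`, `_fallback_index`, `_try_activate_fallback(reason=...)`). Its behavior is an assumption inferred only from the `-> bool` signature and the guard; the real method in run_agent.py may do considerably more. `FallbackChain` and the local `FailoverReason` stand-in are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class FailoverReason(Enum):
    # Stand-in for agent/error_classifier.FailoverReason. Only the `timeout`
    # member is confirmed by this issue; the string value here is made up.
    timeout = "timeout"


class FallbackChain:
    """Toy model of the _fallback_chain / _fallback_index bookkeeping."""

    def __init__(self, primary: str, fallbacks: List[str]) -> None:
        self.active_provider = primary
        self._fallback_chain = list(fallbacks)
        self._fallback_index = 0
        self.last_reason: Optional[FailoverReason] = None

    def _try_activate_fallback(self, reason: Optional[FailoverReason] = None) -> bool:
        """Switch to the next configured provider; return False once the chain is exhausted."""
        if self._fallback_index >= len(self._fallback_chain):
            return False
        self.active_provider = self._fallback_chain[self._fallback_index]
        self._fallback_index += 1
        self.last_reason = reason  # recorded so a caller could log why it switched
        return True


chain = FallbackChain("zai", ["ollama-cloud"])
print(chain._try_activate_fallback(reason=FailoverReason.timeout), chain.active_provider)  # → True ollama-cloud
print(chain._try_activate_fallback(reason=FailoverReason.timeout), chain.active_provider)  # → False ollama-cloud
```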
Environment
- Hermes Agent v0.12.0
- Provider: zai (glm-5-turbo) as primary, ollama-cloud (minimax-m2.7) as first fallback
- Context: ~81K tokens, stale timeout hit 240s (correct per scaling logic at line 7484)
- Provider was alive (quota not exhausted, non-streaming requests worked) — just not delivering stream chunks
Suggested fix
In _make_streaming_api_call() around line 7546, after the stale-stream kill and timer reset, add:
# FailoverReason is defined in agent/error_classifier.py; import it in run_agent.py if it is not already imported
if self._fallback_index < len(self._fallback_chain):
    self._emit_status("⚠️ Stale stream — switching to fallback...")
    if self._try_activate_fallback(reason=FailoverReason.timeout):
        retry_count = 0
        compression_attempts = 0
        primary_recovery_attempted = False
        break  # exit stale-detection while-loop, retry with fallback
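The effect of the suggested change on a whole turn can be checked with a stripped-down model of the retry loop. `run_turn`, `StaleStream`, and `fake_call` are hypothetical; the model covers only the control flow described in this issue (retry the same provider vs. switch providers and reset `retry_count`), not the real `_make_streaming_api_call()`.

```python
from typing import Callable, List, Tuple


class StaleStream(Exception):
    """Raised by a fake provider call when no chunk arrives before the stale timeout."""


def run_turn(
    providers: List[str],
    call: Callable[[str], str],
    max_retries: int,
    fallback_on_stale: bool,
) -> Tuple[str, List[str]]:
    """Return (result, providers attempted). providers[0] is the primary."""
    active, chain, fallback_index = providers[0], providers[1:], 0
    attempts: List[str] = []
    retry_count = 0
    while retry_count < max_retries:
        attempts.append(active)
        try:
            return call(active), attempts
        except StaleStream:
            retry_count += 1
            if fallback_on_stale and fallback_index < len(chain):
                # Proposed behavior: switch provider and start a fresh retry budget.
                active = chain[fallback_index]
                fallback_index += 1
                retry_count = 0
            # Otherwise (current behavior): loop around and retry the same provider.
    return "turn failed", attempts


def fake_call(provider: str) -> str:
    if provider == "zai":
        raise StaleStream()  # primary stalls every time
    return f"ok from {provider}"


print(run_turn(["zai", "ollama-cloud"], fake_call, max_retries=3, fallback_on_stale=False))
# → ('turn failed', ['zai', 'zai', 'zai'])
print(run_turn(["zai", "ollama-cloud"], fake_call, max_retries=3, fallback_on_stale=True))
# → ('ok from ollama-cloud', ['zai', 'ollama-cloud'])
```

Resetting `retry_count` on every switch only terminates because each provider in the chain is activated at most once; once the chain is exhausted, the last provider gets a normal `max_retries` budget.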
The same logic should be added for the non-streaming stale timeout at lines 6582-6586.
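For the non-streaming path, the same idea can be sketched with the standard library: wait on the call with a deadline, and on timeout close the connection and move to the next provider instead of retrying. In this sketch a `threading.Event` stands in for closing the HTTP client, and `call_with_fallback` and `fake_call` are hypothetical; it does not reflect how run_agent.py implements its non-streaming timeout.

```python
import concurrent.futures
import threading
from typing import Callable, List, Optional

# A provider call receives (provider, cancel_event); setting the event models
# closing the client connection, which unblocks a stalled request.
ProviderCall = Callable[[str, threading.Event], str]


def call_with_fallback(providers: List[str], call: ProviderCall, timeout: float) -> Optional[str]:
    """Try providers in order; a call that exceeds `timeout` triggers fallback."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        for provider in providers:
            cancel = threading.Event()
            future = pool.submit(call, provider, cancel)
            try:
                return future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                cancel.set()  # "kill the connection" so the worker thread unblocks
                # then continue with the next provider rather than retrying this one
    return None  # every provider stalled: the turn fails


def fake_call(provider: str, cancel: threading.Event) -> str:
    if provider == "zai":
        cancel.wait()  # stalls until the caller closes the "connection"
        raise ConnectionError("closed by stale detector")
    return f"ok from {provider}"


print(call_with_fallback(["zai", "ollama-cloud"], fake_call, timeout=0.2))  # → ok from ollama-cloud
```

Setting the cancel event before moving on matters: without it, the `with` block would wait forever for the stalled worker when the executor shuts down.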
Note: FailoverReason.timeout is already defined in agent/error_classifier.py:40 — no new enum value needed.