Stale stream timeout does not trigger fallback_providers chain #25689

### Problem

When the primary provider becomes unresponsive during streaming (no chunks delivered within the stale timeout), Hermes kills the connection and retries with the **same** provider. It does **not** activate the `fallback_providers` chain.

As a result, if a provider is alive but unresponsive (it returns no 429/500 and simply holds the connection open without delivering tokens), the agent retries that same provider until `max_retries` is exhausted, even though working fallback providers are configured and available.

### Reproduction

1. Configure a primary provider and a `fallback_providers` chain in `config.yaml`.
2. Send a request with a large context (~80K tokens) that causes the primary to stall (no chunks, but the TCP connection stays alive via SSE keepalive).
3. Observe: the stale detector fires at 240s, kills the connection, and retries the same provider.
4. After `max_retries`, the turn fails; the fallback chain is never activated.
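
The stall in step 2 can be modelled without a real provider. The following is a minimal sketch, not Hermes code: `FakeClock`, `stalled_stream`, and `consume` are hypothetical names, and the 15-second keepalive interval is an assumption. It only shows how a stream that sends SSE keepalive comments (lines starting with `:`) but no data chunks ends up tripping a 240s stale timeout.

```python
class FakeClock:
    """Deterministic clock so the example never actually sleeps."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


def stalled_stream(clock, keepalive_interval=15.0):
    """Hypothetical provider stub: emits only SSE keepalive comments, never a token chunk."""
    while True:
        clock.advance(keepalive_interval)
        yield ": keepalive"


def consume(stream, clock, stale_timeout=240.0):
    """Return 'stale' once no data chunk has arrived within stale_timeout seconds."""
    last_chunk_time = clock.now
    for event in stream:
        if not event.startswith(":"):
            # A real data chunk resets the stale timer.
            last_chunk_time = clock.now
            continue
        if clock.now - last_chunk_time > stale_timeout:
            return "stale"
    return "done"


clock = FakeClock()
result = consume(stalled_stream(clock), clock)
```

Here the connection looks healthy at the transport level the whole time, which is why the provider never surfaces an HTTP error that would otherwise drive failover.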

### Expected behavior

After the first stale-stream kill (or after N stale kills), `_try_activate_fallback()` should be called to switch to the next provider in the chain, similar to how empty/malformed responses trigger eager fallback at `run_agent.py:11513-11519`.
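
The "first kill or after N kills" choice could be expressed as a small counter. This is only a sketch under assumptions: `StaleFallbackPolicy`, `max_stale_kills`, and `record_stale_kill` are hypothetical names that do not exist in Hermes, and the real fix might simply call `_try_activate_fallback()` unconditionally.

```python
from dataclasses import dataclass


@dataclass
class StaleFallbackPolicy:
    """Hypothetical helper: decide when a stale-stream kill should trigger fallback."""

    max_stale_kills: int = 1  # N: stale kills tolerated on one provider
    stale_kills: int = 0

    def record_stale_kill(self) -> bool:
        """Count a stale kill; return True when the caller should try fallback."""
        self.stale_kills += 1
        return self.stale_kills >= self.max_stale_kills

    def reset(self) -> None:
        """Call after switching providers so the new provider starts clean."""
        self.stale_kills = 0


policy = StaleFallbackPolicy(max_stale_kills=2)
decisions = [policy.record_stale_kill() for _ in range(3)]
```

With `max_stale_kills=1` (the default in this sketch) the very first stale kill requests fallback; a larger value gives the primary a chance to recover from a transient stall first.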

### Relevant code

**Stale stream detection** (`run_agent.py:7514-7546`):
```python
_stale_elapsed = time.time() - last_chunk_time["t"]
if _stale_elapsed > _stream_stale_timeout:
    # ... logs warning, emits status, kills client
    self._emit_status(
        f"⚠️ No response from provider for {int(_stale_elapsed)}s "
        f"(model: {api_kwargs.get('model', 'unknown')}, "
        f"context: ~{_est_ctx:,} tokens). "
        f"Reconnecting..."
    )
    # Closes client, resets timer, continues while-loop — does NOT call _try_activate_fallback()
```

**Non-streaming stale detection** (`run_agent.py:6550-6587`) has the same gap: it kills the connection and sets `TimeoutError`, but never activates fallback.

**Where fallback IS activated** (`run_agent.py:11510-11519`):
```python
# Empty/malformed responses — correctly triggers fallback
if self._fallback_index < len(self._fallback_chain):
    self._emit_status("⚠️ Empty/malformed response — switching to fallback...")
if self._try_activate_fallback():
    retry_count = 0
    continue
```

**`_try_activate_fallback` signature** (`run_agent.py:7629`):
```python
def _try_activate_fallback(self, reason: "FailoverReason | None" = None) -> bool:
```

**`FailoverReason.timeout`** exists in `agent/error_classifier.py:40` and can be passed as the `reason` argument.

### Environment

- Hermes Agent v0.12.0
- Provider: zai (glm-5-turbo) as primary, ollama-cloud (minimax-m2.7) as first fallback
- Context: ~81K tokens, stale timeout hit 240s (correct per scaling logic at line 7484)
- Provider was alive (quota not exhausted, non-streaming requests worked) — just not delivering stream chunks

### Suggested fix

In `_make_streaming_api_call()` around line 7546, after the stale-stream kill and timer reset, add:

```python
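# Assumes FailoverReason is imported from agent.error_classifier and that
# retry_count, compression_attempts, and primary_recovery_attempted are in scope here.
if self._fallback_index < len(self._fallback_chain):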
    self._emit_status("⚠️ Stale stream — switching to fallback...")
if self._try_activate_fallback(reason=FailoverReason.timeout):
    retry_count = 0
    compression_attempts = 0
    primary_recovery_attempted = False
    break  # exit stale-detection while-loop, retry with fallback
```

The same logic should be added to the non-streaming stale timeout at `run_agent.py:6582-6586`.

Note: `FailoverReason.timeout` is already defined in `agent/error_classifier.py:40` — no new enum value needed.
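
A regression test for this behavior might look roughly like the sketch below. It does not import Hermes: `FakeAgent` and `run_turn` are hypothetical stand-ins that model only provider selection, and the internals of `_try_activate_fallback` are assumed (only its `bool` return matches the signature quoted above).

```python
class FakeAgent:
    """Stand-in for the agent; models provider selection only, not Hermes itself."""

    def __init__(self, primary, fallback_chain):
        self.provider = primary
        self._fallback_chain = list(fallback_chain)
        self._fallback_index = 0

    def _try_activate_fallback(self, reason=None):
        # Mirrors the bool return of the real method; the body is an assumption.
        if self._fallback_index >= len(self._fallback_chain):
            return False
        self.provider = self._fallback_chain[self._fallback_index]
        self._fallback_index += 1
        return True


def run_turn(agent, stalled, max_retries=3):
    """Retry loop where a stale timeout triggers fallback; returns the providers tried."""
    tried = []
    retry_count = 0
    while retry_count < max_retries:
        tried.append(agent.provider)
        if agent.provider not in stalled:
            return tried  # stream delivered chunks
        if agent._try_activate_fallback(reason="timeout"):
            retry_count = 0  # fresh retry budget for the new provider
            continue
        retry_count += 1  # no fallback left: retry the same provider
    return tried


agent = FakeAgent("zai", ["ollama-cloud"])
tried = run_turn(agent, stalled={"zai"})
```

In this model the stalled primary is tried once and the turn then completes on the first fallback, instead of burning the whole retry budget on the primary.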
