_restore_primary_runtime() doesn't check credential cooldown — burns retries every turn while provider is exhausted

## Summary

`_restore_primary_runtime()` restores the primary provider at the start of every new turn whenever fallback is active, even when the primary provider's credential is still in exhaustion cooldown. This wastes retries on every turn until the cooldown expires.

## How It Happens

1. Turn N: Primary provider returns 429 → credential marked exhausted → fallback activated
2. Turn N+1: `_restore_primary_runtime()` runs at top of `run_conversation()` → restores primary provider
3. Turn N+1: Primary tried again → 429 (3 retries burned) → fallback activated again
4. Steps 2 and 3 repeat on every turn until the cooldown expires

In long-lived gateway sessions, where the agent is cached across messages, every single message burns 3 retries on the exhausted provider before falling back. Cron jobs are hit harder: each run gets only one turn, so the retries burn through the entire execution.
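
A toy simulation makes the cost concrete. This is a hypothetical model, not Hermes code: `Agent`, `run_turn`, and `total_wasted` are illustrative names, the cooldown window is fixed in advance rather than set by the first 429, and `MAX_RETRIES = 3` is taken from the steps above. It counts requests sent to the exhausted primary across turns, with and without a cooldown guard at the restore step.

```python
from dataclasses import dataclass

MAX_RETRIES = 3  # retries per turn against the primary, per the steps above (assumed)


@dataclass
class Agent:
    """Hypothetical stand-in for the agent's fallback state."""
    fallback_activated: bool = False
    on_primary: bool = True


def run_turn(agent: Agent, now: float, cooldown_until: float, check_cooldown: bool) -> int:
    """Simulate one turn; return how many requests hit the exhausted primary."""
    # Top of the turn: the restore step, optionally guarded by a cooldown check
    if agent.fallback_activated:
        if not (check_cooldown and now < cooldown_until):
            agent.on_primary = True
            agent.fallback_activated = False

    wasted = 0
    if agent.on_primary and now < cooldown_until:
        # Primary still answers 429: every retry is wasted, then fall back
        wasted = MAX_RETRIES
        agent.on_primary = False
        agent.fallback_activated = True
    return wasted


def total_wasted(turns: int, cooldown_until: float, check_cooldown: bool) -> int:
    """Run one turn per time unit on a single cached agent and sum the waste."""
    agent = Agent()
    return sum(
        run_turn(agent, float(t), cooldown_until, check_cooldown)
        for t in range(turns)
    )


if __name__ == "__main__":
    # 10 turns, primary exhausted until t=6
    print("without check:", total_wasted(10, 6.0, check_cooldown=False))
    print("with check:   ", total_wasted(10, 6.0, check_cooldown=True))
```

With ten turns and the primary exhausted until t=6, the unguarded version wastes 18 requests (3 on each of turns 0 through 5), while the guarded version wastes only the 3 that originally triggered fallback.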

## Affected Code

`run_agent.py` — `_restore_primary_runtime()` (line ~6490):

```python
def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False
    # ← No cooldown check here — always restores
    rt = self._primary_runtime
    # ... restores model, provider, client, etc.
```

The docstring correctly explains *why* restoration exists (transient failures shouldn't permanently pin to fallback), but it doesn't account for sustained cooldown periods.

## Proposed Fix

Before restoring, check if the primary provider's credential pool entry is still in cooldown:

```python
import time  # module-level import in run_agent.py


def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False

    rt = self._primary_runtime

    # Stay on fallback if the primary credential is still in cooldown.
    # Requires the `_exhausted_until` helper to be in scope.
    pool = self._credential_pool
    if pool is not None:
        rt_provider = rt.get("provider", "")
        if rt_provider and pool.provider == rt_provider:
            current_entry = pool.peek()
            if current_entry and current_entry.last_status == "exhausted":
                cooldown_until = _exhausted_until(current_entry)
                if cooldown_until is not None and time.time() < cooldown_until:
                    # Primary still cooling down: don't restore yet
                    return False

    # ... existing restore logic using `rt`
```

This preserves the existing behavior for transient failures (cooldown expired → restore works as before) while avoiding the retry burn during sustained outages.

## Environment

- Hermes-agent latest, gateway mode
- Credential pool enabled with `fill_first` strategy
- Observed during extended provider overload (429 errors lasting 1+ hours)