
_restore_primary_runtime() doesn't check credential cooldown — burns retries every turn while provider is exhausted #15298

@mordekai-lab


Summary

_restore_primary_runtime() unconditionally restores the primary provider at the start of every new turn, even when the primary provider's credential is still in exhaustion cooldown. This causes wasted retries on every turn until the cooldown expires.

How It Happens

  1. Turn N: Primary provider returns 429 → credential marked exhausted → fallback activated
  2. Turn N+1: _restore_primary_runtime() runs at top of run_conversation() → restores primary provider
  3. Turn N+1: Primary tried again → 429 (3 retries burned) → fallback activated again
  4. Repeat every turn until cooldown expires

In long-lived gateway sessions (agent caching), every message burns 3 retries on the exhausted provider before falling back. Cron jobs fare worse: each run gets only one turn, so the wasted retries land on the only turn that run has.
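The cost of the loop above can be estimated with a small simulation. This is only an illustrative sketch, not code from run_agent.py: the function name, the timestamps, and the one-turn-per-minute schedule are assumptions, and the figure of 3 retries per turn is taken from the report.

```python
RETRIES_PER_TURN = 3  # from the report: three retries burned per turn


def wasted_retries(turn_times, exhausted_until, check_cooldown):
    """Count retries spent on the exhausted primary across later turns.

    turn_times: start time (seconds) of each turn after the first 429.
    exhausted_until: time at which the primary's cooldown ends.
    check_cooldown: if True, do not restore the primary while it cools down.
    """
    wasted = 0
    for now in turn_times:
        primary_exhausted = now < exhausted_until
        if check_cooldown and primary_exhausted:
            continue  # stay on fallback, no retries spent this turn
        if primary_exhausted:
            # Primary was restored, returns 429 again, retries are burned.
            wasted += RETRIES_PER_TURN
    return wasted


# Hypothetical schedule: ten turns, one per minute, during a 1-hour cooldown.
turns = [60 * i for i in range(1, 11)]
print(wasted_retries(turns, exhausted_until=3600, check_cooldown=False))  # 30
print(wasted_retries(turns, exhausted_until=3600, check_cooldown=True))   # 0
```

Under these assumed numbers, unconditional restoration spends 30 retries on a provider that cannot answer, while skipping restoration during the cooldown spends none.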

Affected Code

run_agent.py, _restore_primary_runtime() (line ~6490):

def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False
    # ← No cooldown check here — always restores
    rt = self._primary_runtime
    # ... restores model, provider, client, etc.

The docstring correctly explains why restoration exists (transient failures shouldn't permanently pin to fallback), but it doesn't account for sustained cooldown periods.

Proposed Fix

Before restoring, check whether the primary provider's credential pool entry is still in cooldown:

def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False

    # Check if primary credential is still in cooldown
    pool = self._credential_pool
    if pool is not None:
        rt_provider = self._primary_runtime.get("provider", "")
        if rt_provider and pool.provider == rt_provider:
            current_entry = pool.peek()
            if current_entry and current_entry.last_status == "exhausted":
                cooldown_until = _exhausted_until(current_entry)
                if cooldown_until is not None and time.time() < cooldown_until:
                    # Stay on fallback — primary still cooling down
                    return False

    rt = self._primary_runtime
    # ... existing restore logic

This preserves the existing behavior for transient failures (cooldown expired → restore works as before) while avoiding the retry burn during sustained outages.
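The guard can also be pulled into a standalone predicate, which makes it easy to test without a live provider. The sketch below is a hedged illustration: `Entry`, `Pool`, the `exhausted_until` field, and `primary_in_cooldown` are hypothetical stand-ins, not the real credential pool API, and the injectable `now` parameter is an addition for testability.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a credential pool entry."""
    last_status: str = "ok"
    exhausted_until: Optional[float] = None  # assumed field, epoch seconds


@dataclass
class Pool:
    """Hypothetical stand-in for the credential pool."""
    provider: str
    entry: Optional[Entry] = None

    def peek(self) -> Optional[Entry]:
        return self.entry


def primary_in_cooldown(pool: Optional[Pool], primary_provider: str,
                        now: Optional[float] = None) -> bool:
    """Return True when restoring the primary should be skipped."""
    if pool is None or not primary_provider or pool.provider != primary_provider:
        return False  # no pool, or pool belongs to another provider
    entry = pool.peek()
    if entry is None or entry.last_status != "exhausted":
        return False  # credential is usable, restore as before
    if entry.exhausted_until is None:
        return False  # no known cooldown end, fall through to restore
    current = time.time() if now is None else now
    return current < entry.exhausted_until


cooling = Pool("primary", Entry("exhausted", exhausted_until=1_000.0))
print(primary_in_cooldown(cooling, "primary", now=500.0))    # True
print(primary_in_cooldown(cooling, "primary", now=1_500.0))  # False
```

With a predicate like this, `_restore_primary_runtime()` would return False early whenever it reports True, and fall through to the existing restore logic otherwise.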

Environment

  • Hermes-agent latest, gateway mode
  • Credential pool enabled with fill_first strategy
  • Observed during extended provider overload (429 errors lasting 1+ hours)

Labels

  • P1: High (major feature broken, no workaround)
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • sweeper:implemented-on-main: Sweeper, behavior already present on current main
  • type/bug: Something isn't working
