Provider cooldown / circuit breaker for persistent failures

## Problem

When a provider returns 401/403 or persistent rate-limit errors, Hermes retries up to `max_retries` times and may try a fallback, but on the **next pass of the iteration loop** it hits the same broken provider again. No memory of prior failures carries across iterations within a session.

This means a rate-limited provider gets hammered on every single tool-call iteration until the session ends or the user intervenes.

## Proposed Solution

Add a lightweight process-scoped `ProviderCooldownTracker` that:
- Records failure reasons per `(provider, base_url)` key
- Implements escalating cooldown: 30s → 60s → 5min for transient errors, 5min → 10min → 30min for permanent errors
- Is checked before each API call in `run_conversation()`
- On cooldown: triggers fallback activation immediately
- Resets on successful calls (circuit breaker close)

The tracker is a thread-safe singleton, so it works across concurrent gateway sessions.
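A minimal sketch of what such a tracker could look like, to make the proposal concrete. Only the class name `ProviderCooldownTracker`, the `(provider, base_url)` key, and the two escalation schedules come from the description above. The method names, the `permanent` flag (the caller decides which errors count as permanent, for example 401/403), the injectable `clock`, and the helpers `get_tracker()` and `pick_provider()` are assumptions made for illustration, not existing Hermes APIs.

```python
import threading
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

# Escalation schedules from the proposal, in seconds.
TRANSIENT_STEPS = (30, 60, 300)      # 30s -> 60s -> 5min
PERMANENT_STEPS = (300, 600, 1800)   # 5min -> 10min -> 30min

Key = Tuple[str, str]  # (provider, base_url)


class ProviderCooldownTracker:
    """Hypothetical sketch: escalating cooldowns per (provider, base_url)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock            # injectable so tests can fake time
        self._lock = threading.Lock()  # guards _state across threads
        # key -> (consecutive failures, cooldown expiry time, last reason)
        self._state: Dict[Key, Tuple[int, float, str]] = {}

    def record_failure(self, provider: str, base_url: str, reason: str,
                       permanent: bool = False) -> float:
        """Record a failure and return the cooldown (seconds) now in effect."""
        steps = PERMANENT_STEPS if permanent else TRANSIENT_STEPS
        key = (provider, base_url)
        with self._lock:
            failures = self._state.get(key, (0, 0.0, ""))[0] + 1
            # Stay on the last step once the schedule is exhausted.
            cooldown = steps[min(failures, len(steps)) - 1]
            self._state[key] = (failures, self._clock() + cooldown, reason)
        return float(cooldown)

    def record_success(self, provider: str, base_url: str) -> None:
        """Close the circuit: forget all failures for this provider."""
        with self._lock:
            self._state.pop((provider, base_url), None)

    def remaining(self, provider: str, base_url: str) -> float:
        """Seconds left on the cooldown, or 0.0 if the provider is usable."""
        with self._lock:
            entry = self._state.get((provider, base_url))
            if entry is None:
                return 0.0
            return max(0.0, entry[1] - self._clock())

    def is_cooling_down(self, provider: str, base_url: str) -> bool:
        return self.remaining(provider, base_url) > 0.0


_tracker: Optional[ProviderCooldownTracker] = None
_tracker_lock = threading.Lock()


def get_tracker() -> ProviderCooldownTracker:
    """Process-wide singleton accessor (assumed name)."""
    global _tracker
    with _tracker_lock:
        if _tracker is None:
            _tracker = ProviderCooldownTracker()
        return _tracker


def pick_provider(candidates: Iterable[Key],
                  tracker: ProviderCooldownTracker) -> Optional[Key]:
    """Return the first candidate not on cooldown, else None.

    Stands in for the check before each API call: if the primary is
    cooling down, the next entry in the fallback chain is chosen at once.
    """
    for provider, base_url in candidates:
        if not tracker.is_cooling_down(provider, base_url):
            return (provider, base_url)
    return None
```

In this sketch the failure count survives cooldown expiry, so a provider that fails again on its first call after the cooldown moves to the next step of the schedule, while a single success clears the entry. Transient and permanent failures share one counter per key; whether they should be tracked separately is an open design choice.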

## Benefits

- Stops hammering broken providers across iterations
- Graceful degradation with automatic recovery
- Pairs naturally with the existing fallback chain


Provider cooldown / circuit breaker for persistent failures #5436
