Problem
When a provider returns 401/403 or persistent rate-limit errors, Hermes retries max_retries times and may try the fallback, but on the next pass of the iteration loop it calls the same broken provider again. No memory of prior failures carries across iterations within a session.
This means a rate-limited provider gets hammered on every single tool-call iteration until the session ends or the user intervenes.
Proposed Solution
Add a lightweight process-scoped ProviderCooldownTracker that:
- Records failure reasons per (provider, base_url) key
- Implements escalating cooldown: 30s → 60s → 5min for transient errors, 5min → 10min → 30min for permanent errors
- Is checked before each API call in run_conversation()
- On cooldown: triggers fallback activation immediately
- Resets on successful calls (circuit breaker close)
The tracker is a thread-safe singleton, so it works across concurrent gateway sessions.
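A minimal sketch of what the tracker could look like, to make the proposal concrete. Only the class name ProviderCooldownTracker, the (provider, base_url) key, and the cooldown schedules come from the proposal above; the method names (record_failure, record_success, remaining), the permanent flag, the injectable clock, and get_tracker() are assumptions made for illustration, not existing Hermes APIs.

```python
import threading
import time
from dataclasses import dataclass

# Escalating cooldowns in seconds, taken from the proposal above.
TRANSIENT_STEPS = (30, 60, 300)      # 30s -> 60s -> 5min
PERMANENT_STEPS = (300, 600, 1800)   # 5min -> 10min -> 30min


@dataclass
class _Entry:
    failures: int = 0    # consecutive failures since the last success
    until: float = 0.0   # clock value at which the cooldown ends
    reason: str = ""     # last recorded failure reason


class ProviderCooldownTracker:
    """Hypothetical process-scoped tracker keyed by (provider, base_url)."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._entries = {}
        self._clock = clock  # injectable so tests need not sleep

    def record_failure(self, provider, base_url, reason, permanent=False):
        """Start or escalate a cooldown; returns its length in seconds."""
        steps = PERMANENT_STEPS if permanent else TRANSIENT_STEPS
        with self._lock:
            entry = self._entries.setdefault((provider, base_url), _Entry())
            # Clamp at the last step once the schedule is exhausted.
            delay = steps[min(entry.failures, len(steps) - 1)]
            entry.failures += 1
            entry.until = self._clock() + delay
            entry.reason = reason
            return delay

    def remaining(self, provider, base_url):
        """Seconds of cooldown left; 0.0 means the provider may be called."""
        with self._lock:
            entry = self._entries.get((provider, base_url))
            if entry is None:
                return 0.0
            return max(0.0, entry.until - self._clock())

    def record_success(self, provider, base_url):
        """Close the circuit: forget all failure state for this key."""
        with self._lock:
            self._entries.pop((provider, base_url), None)


# Module-level instance shared by all threads in the process.
_TRACKER = ProviderCooldownTracker()


def get_tracker():
    return _TRACKER
```

Under these assumptions, run_conversation() would call get_tracker().remaining(provider, base_url) before each API call and activate the fallback immediately when the result is greater than zero. It would call record_failure() once retries are exhausted and record_success() after a successful response.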
Benefits
- Stops hammering broken providers across iterations
- Graceful degradation with automatic recovery
- Pairs naturally with the existing fallback chain