fix: apply fallback cooldown for all failover reasons, make duration configurable by dragonforce2010 · Pull Request #19839 · NousResearch/hermes-agent

dragonforce2010 · 2026-05-04T17:26:13Z

Problem

The rate-limit cooldown (_rate_limited_until) in _try_activate_fallback() was gated behind reason in (FailoverReason.rate_limit, FailoverReason.billing). However, the majority of callsites (11 out of 12) invoke the method without a reason argument, so the cooldown was effectively never applied.

This causes _restore_primary_runtime() to attempt the primary provider on every new turn — even when it's known to be unavailable. For subscription-based providers like OpenAI Codex OAuth where quota resets can take hours, this means:

Every message hits the primary → gets 429 → waits for retry timeout → falls back
User sees "Rate limited — switching to fallback provider..." on every single message
Unnecessary latency on every turn (retry delays before fallback activates)
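
To make the failure mode concrete, here is a minimal sketch of the gated behavior described above. It is not the code from run_agent.py: the `GatedAgent` class, the `now` parameter, and the `primary_on_cooldown` helper are hypothetical, and only the names `_try_activate_fallback`, `_rate_limited_until`, and `FailoverReason` come from this PR.

```python
import time
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    # Only the two members named in the PR; the real enum may have more.
    rate_limit = "rate_limit"
    billing = "billing"


class GatedAgent:
    """Hypothetical stand-in for the agent, showing the old gate only."""

    def __init__(self) -> None:
        self._rate_limited_until = 0.0

    def _try_activate_fallback(
        self, reason: Optional[FailoverReason] = None, now: Optional[float] = None
    ) -> bool:
        now = time.monotonic() if now is None else now
        # Old behavior: the cooldown is set only when a matching reason is passed.
        if reason in (FailoverReason.rate_limit, FailoverReason.billing):
            self._rate_limited_until = now + 60
        return True

    def primary_on_cooldown(self, now: float) -> bool:
        return now < self._rate_limited_until


agent = GatedAgent()
agent._try_activate_fallback(now=100.0)   # typical callsite: no reason argument
no_reason = agent.primary_on_cooldown(101.0)

agent._try_activate_fallback(FailoverReason.rate_limit, now=100.0)
with_reason = agent.primary_on_cooldown(101.0)
```

In this sketch a call without `reason` leaves `_rate_limited_until` untouched, so the next turn is free to try the primary again.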

Fix

Remove the reason gate so cooldown fires on any fallback activation
Make cooldown duration configurable via HERMES_RATE_LIMIT_COOLDOWN env var (default: 3600s = 1 hour, up from hardcoded 60s)
Preserve the existing guard that only starts cooldown when leaving the primary provider (chain-switching between fallbacks is unaffected)
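
The three points above can be sketched as follows. This is an illustration under stated assumptions, not the diff itself: `cooldown_seconds`, `FallbackState`, and its attributes are hypothetical names, and falling back to the default on an unparsable or negative value is an assumption about how bad input might be handled. Only the `HERMES_RATE_LIMIT_COOLDOWN` variable and the 3600s default come from this PR.

```python
import os
from typing import Mapping, Optional

DEFAULT_COOLDOWN_SECONDS = 3600.0  # default stated in the PR


def cooldown_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the cooldown from HERMES_RATE_LIMIT_COOLDOWN, else use the default."""
    env = os.environ if env is None else env
    raw = env.get("HERMES_RATE_LIMIT_COOLDOWN")
    if raw is None:
        return DEFAULT_COOLDOWN_SECONDS
    try:
        value = float(raw)
    except ValueError:
        # Assumption: unparsable input falls back to the default.
        return DEFAULT_COOLDOWN_SECONDS
    # Assumption: negative (or NaN) values also fall back to the default.
    return value if value >= 0 else DEFAULT_COOLDOWN_SECONDS


class FallbackState:
    """Hypothetical model of the provider chain state."""

    def __init__(self, cooldown: float) -> None:
        self.on_primary = True
        self.rate_limited_until = 0.0
        self.cooldown = cooldown

    def activate_fallback(self, now: float) -> None:
        # No reason gate: any fallback activation starts the cooldown,
        # but only when leaving the primary provider.
        if self.on_primary:
            self.rate_limited_until = now + self.cooldown
            self.on_primary = False
        # Switching between fallbacks leaves the deadline unchanged.

    def should_restore_primary(self, now: float) -> bool:
        return not self.on_primary and now >= self.rate_limited_until


state = FallbackState(cooldown_seconds({}))
state.activate_fallback(now=0.0)      # leaving primary: cooldown starts
state.activate_fallback(now=100.0)    # chain switch: deadline not extended
```

Passing the environment as a mapping keeps the parsing testable without mutating `os.environ`.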

Rationale for default change (60s → 3600s)

The original 60-second cooldown was reasonable for transient API outages, but subscription-based providers (Codex OAuth, Anthropic Pro) have quota reset windows measured in hours, not seconds. A 60s cooldown means the agent retries the exhausted provider every minute — adding latency with zero chance of success.

A 1-hour default balances two goals:

Not wasting time on known-exhausted providers
Recovering reasonably quickly when quota does reset

Users who prefer the old behavior can set HERMES_RATE_LIMIT_COOLDOWN=60.
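
As a rough back-of-the-envelope check on the trade-off, the sketch below counts the upper bound on failed primary attempts during one quota window, assuming the agent is used continuously and the primary is retried each time the cooldown expires. The 5-hour window is a hypothetical figure chosen for illustration, not a documented reset period for any provider, and `max_failed_probes` is a hypothetical helper.

```python
def max_failed_probes(quota_window_s: int, cooldown_s: int) -> int:
    """Upper bound on primary retries that fail while quota is exhausted."""
    return quota_window_s // cooldown_s


QUOTA_WINDOW_S = 5 * 3600  # hypothetical 5-hour reset window

old_default = max_failed_probes(QUOTA_WINDOW_S, 60)
new_default = max_failed_probes(QUOTA_WINDOW_S, 3600)
print(old_default, new_default)  # prints "300 5"
```

The flip side is that, in the worst case, the agent stays on the fallback for up to one full cooldown after the quota has actually reset.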

Testing

Verified on a live Hermes Gateway with OpenAI Codex as primary and MiniMax as fallback:

Before fix: every message showed "Rate limited — switching to fallback provider..."
After fix: first message triggers fallback, subsequent messages go directly to MiniMax for 1 hour

…limit/billing

Previously, the rate-limit cooldown (_rate_limited_until) was only set when _try_activate_fallback() received reason=rate_limit or reason=billing. However, the majority of callsites invoke the method without a reason argument, so the cooldown was never applied in practice.

This caused _restore_primary_runtime() to attempt the primary provider on every new turn, even when it's known to be unavailable (e.g. ChatGPT subscription quota exhausted for hours).

Changes:
- Remove the reason gate so cooldown fires on any fallback activation
- Make cooldown duration configurable via HERMES_RATE_LIMIT_COOLDOWN env var (default: 3600s = 1 hour, up from hardcoded 60s)
- Preserve the guard that only starts cooldown when leaving the primary provider (chain-switching between fallbacks is unaffected)

This is particularly important for subscription-based providers like OpenAI Codex OAuth where quota resets can take hours.

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels May 4, 2026

ddoKx mentioned this pull request May 5, 2026

[Bug] Interactive CLI session does not auto-fallback on Codex 429 'usage_limit_reached', while cron jobs with the same fallback chain do #20465

Closed

dragonforce2010 wants to merge 1 commit into NousResearch:main from dragonforce2010:fix/fallback-cooldown-configurable
