
fix(fallback): trigger eager fallback on 503/529 provider overload #15666

Open
pazyork wants to merge 1 commit into NousResearch:main from pazyork:fix/overloaded-fallback-bypass-pool

Conversation

pazyork commented Apr 25, 2026

(Continuation of #11492 — rebased on main, couldn't reopen the old PR after force-push.)


When a provider returns 503/529 (overloaded), Hermes should fall back to another provider. Currently it doesn't — two small things are missing:

  1. error_classifier.py: 503/529 doesn't set should_fallback=True
  2. run_agent.py: even if it did, the credential-pool check would block fallback (rotation can't fix provider overload)

Fix

  1. error_classifier: 503/529 → should_fallback=True (1 line)
  2. run_agent: new eager-fallback block for overloaded, after the rate-limit block. Bypasses pool check entirely.

Why a separate block: overloaded isn't a rate limit, and credential rotation can't fix a saturated provider. Separate = zero risk to existing rate-limit/billing logic.
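
The two changes can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `ClassifiedError`, `classify_status`, and `should_eager_fallback` are hypothetical names, and the 429/402 mappings and the exact shape of the rate-limit branch are assumptions. Only `should_fallback`, the 503/529 status codes, and the idea of a pool-recovery check come from the description above.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the classifier's result type.
@dataclass(frozen=True)
class ClassifiedError:
    reason: str           # "overloaded", "rate_limit", "billing", or "unknown"
    should_fallback: bool


def classify_status(status_code: int) -> ClassifiedError:
    """Map an HTTP status code to a classification (illustrative subset)."""
    if status_code in (503, 529):
        # Change 1: overloaded responses now request fallback.
        return ClassifiedError("overloaded", should_fallback=True)
    if status_code == 429:  # assumed mapping, for illustration only
        return ClassifiedError("rate_limit", should_fallback=True)
    if status_code == 402:  # assumed mapping, for illustration only
        return ClassifiedError("billing", should_fallback=True)
    return ClassifiedError("unknown", should_fallback=False)


def should_eager_fallback(err: ClassifiedError, pool_may_recover: bool) -> bool:
    """Decide whether to switch providers immediately."""
    if err.reason in ("rate_limit", "billing"):
        # Assumed existing behaviour: defer to credential rotation
        # while the pool may still recover.
        return err.should_fallback and not pool_may_recover
    if err.reason == "overloaded":
        # Change 2: separate block that ignores the pool check, because
        # rotating credentials cannot relieve a saturated provider.
        return err.should_fallback
    return False
```

With this shape, a 529 falls back even when the pool reports it may recover, while a 429 in the same situation still waits for rotation.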

What this does NOT touch

  • is_rate_limited tuple — unchanged (still only rate_limit + billing)
  • _pool_may_recover_from_rate_limit — unchanged
  • Nous rate guard — unchanged
  • Compression / context-overflow — unchanged
  • All existing fallback and retry paths — preserved

Related

4 files, +27/-1 lines.

alt-glitch added the type/bug, P1, and comp/agent labels Apr 25, 2026
@alt-glitch

Collaborator

Likely duplicate of #11034 — same root cause: 503/529 overloaded errors missing should_fallback=True in error_classifier.py and credential pool guard blocking fallback. #11492 (closed) had same fix. This PR is more comprehensive with the independent overloaded block.

@alt-glitch

Collaborator

Likely duplicate of #11034

@pazyork pazyork closed this Apr 25, 2026
@pazyork pazyork reopened this Apr 25, 2026
@pazyork pazyork closed this Apr 25, 2026
@pazyork pazyork reopened this Apr 25, 2026
@pazyork pazyork closed this Apr 25, 2026
@pazyork pazyork reopened this Apr 25, 2026
@pazyork pazyork closed this Apr 25, 2026
@pazyork pazyork reopened this Apr 25, 2026
pazyork commented Apr 25, 2026

Author

@alt-glitch same author as #11492 — rebased on main after _pool_may_recover_from_rate_limit landed (#11314). GitHub won't reopen after force-push, so here we are.

Independent block is intentional: overloaded ≠ rate_limit, credential rotation can't fix provider overload. #14055 is complementary (message-pattern path vs status-code path).
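
To illustrate why the two paths are complementary: a provider may signal overload through the status code, through the error text, or both. The sketch below is hypothetical. `is_overloaded` and the pattern list are assumptions made for illustration and do not reflect the actual code in this PR or in #14055. The sample phrase "under high load" is taken from the title of the linked issue.

```python
import re
from typing import Optional

# Status-code path: the codes this PR treats as provider overload.
OVERLOADED_STATUS_CODES = frozenset({503, 529})

# Message-pattern path: hypothetical patterns, for illustration only.
OVERLOADED_MESSAGE_PATTERNS = (
    re.compile(r"overloaded", re.IGNORECASE),
    re.compile(r"under high load", re.IGNORECASE),
)


def is_overloaded(status_code: Optional[int], message: str) -> bool:
    """Return True if either the status code or the message signals overload."""
    if status_code in OVERLOADED_STATUS_CODES:
        return True
    return any(p.search(message) for p in OVERLOADED_MESSAGE_PATTERNS)
```

Checking the status code first keeps the common case cheap. The pattern check only matters when a provider reports overload under a different status code or without one.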

When a provider returns 503 (Service Unavailable) or 529 (Overloaded),
the agent should fall back to an alternate provider immediately.
Credential-pool rotation cannot fix provider-side overload — rotating
keys against the same overloaded servers is useless.

Two minimal changes:
1. error_classifier: set should_fallback=True for 503/529 (consistent
   with rate_limit and billing classifications)
2. run_agent: add independent eager-fallback block for overloaded,
   placed after the rate-limit pool-rotation deferral block. Overloaded
   bypasses the _pool_may_recover_from_rate_limit check because
   credential rotation cannot resolve provider-side capacity issues.

More focused than adding overloaded to the is_rate_limited tuple
and complementary to NousResearch#14055 (message-pattern classification path).

Labels

comp/agent: Core agent loop, run_agent.py, prompt builder
P1: High, major feature broken, no workaround
type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: Persistent HTTP 529 / “server cluster is currently under high load” errors in Hermes when using MiniMax M2.7

3 participants