Bug Report
Description
When the primary provider (Anthropic) returns 'overloaded' errors, the failover logic retries alternate profiles of the same provider instead of escalating to configured fallback providers (Google Gemini, OpenAI).
Configuration
{
"primary": "anthropic/claude-opus-4-6",
"fallbacks": ["google/gemini-3-pro", "openai/gpt-5-2"]
}
What happened
During the Anthropic worldwide outage on March 2, 2026 (~11:30-16:37 UTC):
- Primary model (claude-opus-4-6) returned FailoverError: The AI service is temporarily overloaded
- System retried with an alternate Anthropic profile (anthropic:claude oauth vs anthropic:default token): same provider, same outage
- Fallback providers (Gemini, OpenAI) were never attempted despite being configured and operational
- All requests failed with FailoverError for ~2 hours
- Log evidence: "Profile anthropic:default timed out. Trying next account..." appears, but no log entries show attempts to google/ or openai/ providers
Expected behavior
When the primary provider returns persistent errors (overloaded/timeout), failover should escalate to the next provider in the fallback chain, not just try alternate auth profiles of the same broken provider.
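To make the expected ordering concrete, here is a minimal sketch of the loop structure I would expect. It is not taken from the OpenClaw source: run_with_failover, ProviderOverloaded, FailoverResult, the profiles mapping, and the fake_call stub are all hypothetical names used for illustration. The only assumption about real behavior is that the chain is the primary model followed by the configured fallbacks.

```python
from dataclasses import dataclass, field


class ProviderOverloaded(Exception):
    """Hypothetical stand-in for a provider-side 'overloaded' error."""


@dataclass
class FailoverResult:
    model: str
    profile: str
    reply: str
    attempts: list = field(default_factory=list)


def run_with_failover(config, profiles, call):
    """Try every auth profile of a model, then escalate to the next model.

    config   -- dict with "primary" and "fallbacks" keys (see Configuration)
    profiles -- hypothetical mapping: provider name -> list of auth profile ids
    call     -- hypothetical callable (model, profile) -> reply text
    """
    chain = [config["primary"], *config["fallbacks"]]
    attempts = []
    for model in chain:
        provider = model.split("/", 1)[0]
        # Assumption: a provider with no explicit profiles has one default.
        for profile in profiles.get(provider, [f"{provider}:default"]):
            attempts.append((model, profile))
            try:
                reply = call(model, profile)
                return FailoverResult(model, profile, reply, attempts)
            except (ProviderOverloaded, TimeoutError):
                continue  # next profile of the same provider
        # All profiles of this provider failed: fall through to the next
        # model in the chain instead of giving up here.
    raise RuntimeError(f"all providers failed: {attempts}")


def fake_call(model, profile):
    # Simulates the outage: every Anthropic profile is overloaded.
    if model.startswith("anthropic/"):
        raise ProviderOverloaded("The AI service is temporarily overloaded")
    return f"ok from {model}"


config = {
    "primary": "anthropic/claude-opus-4-6",
    "fallbacks": ["google/gemini-3-pro", "openai/gpt-5-2"],
}
profiles = {"anthropic": ["anthropic:default", "anthropic:claude"]}

result = run_with_failover(config, profiles, fake_call)
print(result.model)  # prints "google/gemini-3-pro"
```

With the simulated outage, both Anthropic profiles are tried first and the third attempt goes to google/gemini-3-pro. The observed behavior matches a loop that stops after the inner profile iteration and never advances the outer one.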
Log excerpts
2026-03-02T16:52:09 ERROR FailoverError: The AI service is temporarily overloaded.
2026-03-02T17:00:48 WARN embedded run agent end: isError=true error=The AI service is temporarily overloaded.
2026-03-02T17:01:06 WARN (retry 2) same error
2026-03-02T17:01:25 WARN (retry 3) same error
2026-03-02T17:01:48 WARN (retry 4) same error
2026-03-02T21:10:30 WARN Profile anthropic:default timed out. Trying next account...
Impact
User was completely unreachable via the bot for ~2 hours despite having two working fallback providers configured. This defeats the purpose of the fallback chain.
Environment
- OpenClaw version: 2026.2.24
- Primary: anthropic/claude-opus-4-6
- Fallbacks: google/gemini-3-pro, openai/gpt-5-2
- Auth profiles: anthropic:default (token), anthropic:claude (oauth)