Description
When the primary model (Anthropic Claude) hits API rate limits, the gateway does NOT automatically switch agents to configured fallback models (OpenRouter/DeepSeek, OpenRouter/Kimi). Agents remain stuck on the rate-limited primary model, causing cascading failures.
Environment
- OpenClaw version: 2026.2.14
- OS: macOS (Intel)
- Node: v22.22.0
- Gateway mode: local
- 8 agents configured with fallback chains
Configuration
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5-20250929",
        "fallbacks": ["openrouter/deepseek/deepseek-v3.2", "openrouter/moonshotai/kimi-k2.5"]
      }
    }
  }
}
Auth profiles exist for both anthropic:manual and openrouter:manual in each agent's auth-profiles.json.
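To make the intended order explicit, here is a minimal TypeScript sketch of how the agents.defaults.model block above could map to an ordered list of candidate models: primary first, then each fallback as listed. It is not OpenClaw's code; ModelConfig and candidateChain are hypothetical names, and skipping duplicate entries is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical shape mirroring agents.defaults.model in the config above.
interface ModelConfig {
  primary: string;
  fallbacks?: string[];
}

// Ordered list of models to try: primary first, then each fallback in the
// order listed. Empty and duplicate entries are skipped (sketch assumption).
function candidateChain(model: ModelConfig): string[] {
  const chain: string[] = [];
  const all = [model.primary].concat(model.fallbacks || []);
  for (const id of all) {
    if (id && chain.indexOf(id) === -1) {
      chain.push(id);
    }
  }
  return chain;
}

const chain = candidateChain({
  primary: "anthropic/claude-sonnet-4-5-20250929",
  fallbacks: ["openrouter/deepseek/deepseek-v3.2", "openrouter/moonshotai/kimi-k2.5"],
});
console.log(chain.join(" -> "));
```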
Steps to Reproduce
- Configure an agent with an Anthropic primary and OpenRouter fallbacks
- Trigger a rate limit on Anthropic (e.g., by activating multiple agents simultaneously)
- Observe that the agents do NOT switch to the fallback models
Expected Behavior
Per model-failover docs: "If all profiles for a provider fail, OpenClaw moves to the next model in agents.defaults.model.fallbacks."
Agents should automatically switch to openrouter/deepseek/deepseek-v3.2 when Anthropic is rate-limited.
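The behavior the docs describe can be sketched as a loop over the candidate models that moves on when a call fails for a failover-worthy reason. This is a minimal illustration under stated assumptions, not OpenClaw's implementation: runWithFallback, shouldFailover and ProviderError are hypothetical names, provider errors are assumed to carry an HTTP status, and treating 429 (rate limited) and 401 (rejected credentials) as reasons to try the next model is this sketch's own policy. It also works at model granularity and ignores per-provider profile rotation. The fake provider reproduces the reported situation, where only Anthropic is rate-limited.

```typescript
// Hypothetical error type: an Error that carries the provider's HTTP status.
interface ProviderError extends Error {
  status?: number;
}

// Sketch policy (an assumption): 429 = rate limited, 401 = rejected or
// expired credentials. Both mean "try the next model"; anything else is fatal.
function shouldFailover(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const status = (err as ProviderError).status;
  return status === 429 || status === 401;
}

interface FallbackResult<T> {
  value: T; // what the successful call returned
  model: string; // which model produced it
  skipped: string[]; // models that failed over, with the error message
}

// Try each model in order. A failover-worthy error moves on to the next
// model; any other error is rethrown at once. If every model fails over,
// throw a single error listing everything that was tried.
function runWithFallback<T>(models: string[], call: (model: string) => T): FallbackResult<T> {
  const skipped: string[] = [];
  for (const model of models) {
    try {
      return { value: call(model), model: model, skipped: skipped };
    } catch (err) {
      if (!shouldFailover(err)) throw err;
      skipped.push(model + ": " + (err as Error).message);
    }
  }
  throw new Error("All models failed: " + skipped.join("; "));
}

// Simulated provider: every Anthropic model is rate-limited, others succeed.
function fakeProvider(model: string): string {
  if (model.indexOf("anthropic/") === 0) {
    const err = new Error("API rate limit reached") as ProviderError;
    err.status = 429;
    throw err;
  }
  return "reply from " + model;
}

const outcome = runWithFallback(
  ["anthropic/claude-sonnet-4-5-20250929", "openrouter/deepseek/deepseek-v3.2", "openrouter/moonshotai/kimi-k2.5"],
  fakeProvider
);
console.log(outcome.model); // openrouter/deepseek/deepseek-v3.2
```

Because the decision is made per call, a rate-limited primary costs one failed attempt on that turn instead of failing the whole turn, which is the contrast with the FailoverError reported below.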
Actual Behavior
- All agents fail with: FailoverError: ⚠️ API rate limit reached. Please try again later.
- No attempt is made to use the fallback models
- Gateway error log shows cascading failures across all agents
- Messages queue up (321 seconds max wait observed)
- Eventually the OAuth token expires (HTTP 401), compounding the issue
Testing Done
- Manual cooldown in auth-profiles.json: set cooldownUntil to a future time for anthropic:manual → the gateway IGNORES file-based cooldowns for existing sessions
- Changed the primary model to DeepSeek in the config: the gateway hot-reloads the config, but existing sessions keep the old model
- Gateway restart with DeepSeek as primary: the agent correctly responds on DeepSeek → confirms the OpenRouter integration works
- Verified OpenRouter API key: a direct curl to OpenRouter returns a successful response
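For comparison, honoring the file-based cooldown at request time could look roughly like the sketch below. It is an illustration only, not OpenClaw's implementation. Assumptions: the ProfileState and ProfileMap shapes are hypothetical (the real auth-profiles.json layout may differ), cooldownUntil is assumed to be a Unix timestamp in milliseconds, and the provider is derived from the text before the first "/" in a model id and before the first ":" in a profile id, based only on the names that appear in this issue.

```typescript
// Hypothetical per-profile state; cooldownUntil assumed to be epoch ms.
interface ProfileState {
  cooldownUntil?: number;
}

// Hypothetical map keyed by profile id, e.g. "anthropic:manual".
type ProfileMap = { [profileId: string]: ProfileState };

// A profile is usable if it has no cooldown or the cooldown has passed.
function isUsable(state: ProfileState, now: number): boolean {
  return state.cooldownUntil === undefined || state.cooldownUntil <= now;
}

// Keep only models whose provider has at least one usable profile.
// Running this on every request, rather than once per session, is what
// would let a cooldown written to disk affect sessions already running.
function modelsWithUsableProfiles(models: string[], profiles: ProfileMap, now: number): string[] {
  return models.filter(function (model) {
    const provider = model.split("/")[0]; // "openrouter/deepseek/..." -> "openrouter"
    return Object.keys(profiles).some(function (id) {
      return id.split(":")[0] === provider && isUsable(profiles[id], now);
    });
  });
}

const now = Date.now();
const usable = modelsWithUsableProfiles(
  ["anthropic/claude-sonnet-4-5-20250929", "openrouter/deepseek/deepseek-v3.2"],
  {
    "anthropic:manual": { cooldownUntil: now + 60000 }, // cooling down for 60 s
    "openrouter:manual": {},
  },
  now
);
console.log(usable.join(", ")); // openrouter/deepseek/deepseek-v3.2
```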
Key Finding
The failover mechanism appears to only work at session creation (after gateway restart), not at runtime when rate limits are encountered. File-based cooldowns in auth-profiles.json are not respected by running sessions.
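The suspected difference can be illustrated with two hypothetical session types: one that captures the model chain when the session is created (consistent with what was observed in Testing Done) and one that re-reads the live config on every turn. ModelChain, ConfigSource, PinnedSession and LiveSession are invented for illustration and do not describe OpenClaw's actual session code.

```typescript
interface ModelChain {
  primary: string;
  fallbacks: string[];
}

// Hypothetical handle on the gateway's current config; a hot reload
// changes what current() returns.
interface ConfigSource {
  current(): ModelChain;
}

// Observed behavior: the chain is captured once, when the session is
// created, so later config changes never reach this session.
class PinnedSession {
  private readonly models: string[];
  constructor(config: ConfigSource) {
    const c = config.current();
    this.models = [c.primary].concat(c.fallbacks);
  }
  modelsForTurn(): string[] {
    return this.models;
  }
}

// Expected behavior: the chain is re-read from the config on every turn.
class LiveSession {
  private readonly config: ConfigSource;
  constructor(config: ConfigSource) {
    this.config = config;
  }
  modelsForTurn(): string[] {
    const c = this.config.current();
    return [c.primary].concat(c.fallbacks);
  }
}

let cfg: ModelChain = {
  primary: "anthropic/claude-sonnet-4-5-20250929",
  fallbacks: ["openrouter/deepseek/deepseek-v3.2"],
};
const source: ConfigSource = { current: () => cfg };
const pinned = new PinnedSession(source);
const live = new LiveSession(source);

// Simulate the hot reload tried in Testing Done: DeepSeek becomes primary.
cfg = { primary: "openrouter/deepseek/deepseek-v3.2", fallbacks: [] };

console.log(pinned.modelsForTurn()[0]); // anthropic/claude-sonnet-4-5-20250929
console.log(live.modelsForTurn()[0]); // openrouter/deepseek/deepseek-v3.2
```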
Error Log Excerpt
2026-02-17T13:45:54Z [diagnostic] lane task error: lane=session:agent:lexi:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:45:58Z [diagnostic] lane task error: lane=session:agent:ressie:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:01Z [diagnostic] lane task error: lane=session:agent:mark:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:16Z [diagnostic] lane task error: lane=session:agent:dessie:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:18Z [diagnostic] lane task error: lane=session:agent:afy:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:24Z [diagnostic] lane task error: lane=session:agent:tech:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:51:01Z [discord] Slow listener detected: DiscordMessageListener took 321 seconds
Workaround
Manually switch the model in the config, then restart the gateway:
# Edit openclaw.json to change agent model
# Then restart gateway to clear sessions
launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway