Model failover does not activate on rate limit - agents stay on primary model #19249

## Description

When the primary model (Anthropic Claude) hits API rate limits, the gateway does NOT automatically switch agents to configured fallback models (OpenRouter/DeepSeek, OpenRouter/Kimi). Agents remain stuck on the rate-limited primary model, causing cascading failures.

## Environment

- OpenClaw version: 2026.2.14
- OS: macOS (Intel)
- Node: v22.22.0
- Gateway mode: local
- 8 agents configured with fallback chains

## Configuration

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5-20250929",
        "fallbacks": ["openrouter/deepseek/deepseek-v3.2", "openrouter/moonshotai/kimi-k2.5"]
      }
    }
  }
}
```

Auth profiles exist for both `anthropic:manual` and `openrouter:manual` in each agent's `auth-profiles.json`.

## Steps to Reproduce

1. Configure an agent with an Anthropic primary model and OpenRouter fallbacks (see Configuration above)
2. Trigger a rate limit on Anthropic (e.g., activate multiple agents simultaneously)
3. Observe that agents do NOT switch to fallback models

## Expected Behavior

Per [model-failover docs](https://docs.openclaw.ai/concepts/model-failover): "If all profiles for a provider fail, OpenClaw moves to the next model in `agents.defaults.model.fallbacks`."

Agents should automatically switch to `openrouter/deepseek/deepseek-v3.2` when Anthropic is rate-limited.
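
To make the expected behavior concrete, here is a minimal sketch of the fallback walk I would expect at runtime. It is not taken from the OpenClaw codebase: `callWithFailover`, `isRateLimit`, and the `call` callback are hypothetical names, and detecting a rate limit via an HTTP 429 `status` property is an assumption about how the provider error surfaces.

```typescript
// Hypothetical sketch; none of these names come from OpenClaw itself.
type ModelCall<T> = (model: string) => Promise<T>;

// Assumption: a rate-limit failure carries an HTTP 429 status on the error object.
function isRateLimit(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: number }).status === 429
  );
}

// Try the primary model first, then each fallback in configured order.
// Only rate-limit errors trigger failover; any other error is rethrown as-is.
async function callWithFailover<T>(
  primary: string,
  fallbacks: string[],
  call: ModelCall<T>,
): Promise<{ model: string; result: T }> {
  const tried: string[] = [];
  for (const model of [primary, ...fallbacks]) {
    try {
      return { model, result: await call(model) };
    } catch (err) {
      if (!isRateLimit(err)) throw err;
      tried.push(model);
    }
  }
  throw new Error(`all models rate-limited: ${tried.join(", ")}`);
}
```

With the configuration above, a 429 from `anthropic/claude-sonnet-4-5-20250929` would cause the same request to be retried on `openrouter/deepseek/deepseek-v3.2` instead of surfacing a `FailoverError` to the agent.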

## Actual Behavior

- All agents fail with `FailoverError: ⚠️ API rate limit reached. Please try again later.`
- No attempt to use fallback models
- Gateway error log shows cascading failures across all agents
- Messages queue up (321 seconds max wait observed)
- Eventually the OAuth token expires (HTTP 401), compounding the issue

## Testing Done

1. **Manual cooldown in auth-profiles.json**: Set `cooldownUntil` in the future for `anthropic:manual` → gateway IGNORES file-based cooldowns for existing sessions
2. **Changed primary model to DeepSeek in config**: Gateway hot-reloads config but existing sessions keep old model
3. **Gateway restart with DeepSeek as primary**: Agent correctly responds on DeepSeek → confirms OpenRouter integration works
4. **Verified OpenRouter API key**: Direct curl to OpenRouter returns successful response

## Key Finding

The failover mechanism appears to only work at session creation (after gateway restart), not at runtime when rate limits are encountered. File-based cooldowns in `auth-profiles.json` are not respected by running sessions.
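
As an illustration of the difference, the sketch below selects a model from the chain on every request and skips providers whose profile is cooling down. This is an assumption about how a fix could look, not the gateway's actual logic: `AuthProfile`, `isCoolingDown`, and `selectModel` are hypothetical names, only the `cooldownUntil` field and the `<provider>:manual` profile keys come from this report, and treating `cooldownUntil` as epoch milliseconds is a guess.

```typescript
// Hypothetical shape: only the `cooldownUntil` field name comes from this report.
// Assumption: the value is a timestamp in epoch milliseconds.
interface AuthProfile {
  cooldownUntil?: number;
}

// A profile is cooling down while its cooldownUntil lies in the future.
function isCoolingDown(profile: AuthProfile, now: number = Date.now()): boolean {
  return profile.cooldownUntil !== undefined && profile.cooldownUntil > now;
}

// Return the first model in the chain whose provider profile exists and is
// not cooling down. Calling this per request, rather than once at session
// creation, is what would let a running session leave a rate-limited provider.
function selectModel(
  chain: string[],
  profiles: Record<string, AuthProfile>,
  now: number = Date.now(),
): string | undefined {
  return chain.find((model) => {
    const provider = model.split("/")[0];
    const profile = profiles[`${provider}:manual`];
    return profile !== undefined && !isCoolingDown(profile, now);
  });
}
```

Under this sketch, a future `cooldownUntil` on `anthropic:manual` would route the next request to the first OpenRouter fallback, and the primary would be picked again once the cooldown has passed.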

## Error Log Excerpt

```
2026-02-17T13:45:54Z [diagnostic] lane task error: lane=session:agent:lexi:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:45:58Z [diagnostic] lane task error: lane=session:agent:ressie:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:01Z [diagnostic] lane task error: lane=session:agent:mark:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:16Z [diagnostic] lane task error: lane=session:agent:dessie:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:18Z [diagnostic] lane task error: lane=session:agent:afy:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:46:24Z [diagnostic] lane task error: lane=session:agent:tech:discord:channel:... error="FailoverError: ⚠️ API rate limit reached."
2026-02-17T13:51:01Z [discord] Slow listener detected: DiscordMessageListener took 321 seconds
```

## Workaround

Manually switch the model in the config, then restart the gateway:

```bash
# Edit openclaw.json to change agent model
# Then restart gateway to clear sessions
launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway
```
