Per-model rate limit on one Google model triggers full provider cooldown, blocking other Google models with available quota

## Summary

When a single Google model (e.g., `gemini-3-flash`) hits its per-model TPM (tokens per minute) rate limit, the entire `google` provider enters cooldown. This prevents failover to other Google models (e.g., `gemini-2.5-flash-lite`) that have completely separate and unused quota.

## Environment

- **OpenClaw version:** 2026.1.30
- **OS:** macOS (Darwin 24.1.0)
- **Google AI Studio tier:** Paid Tier 1

## Model Configuration

- **Primary:** `google/gemini-3-flash-preview`
- **Fallback 1:** `google/gemini-3-pro-preview`
- **Fallback 2:** `google/gemini-2.5-flash-lite`
- **Fallback 3:** `openrouter/moonshotai/kimi-k2.5`

## Google Rate Limits (per model, from AI Studio dashboard)

| Model | RPM | TPM | RPD |
|---|---|---|---|
| Gemini 3 Pro | 25 | 1M | 250 |
| Gemini 3 Flash | 1K | 1M | 10K |
| Gemini 2.5 Flash Lite | 4K | 4M | Unlimited |

## What Happened

1. `gemini-3-flash` exceeded its 1M TPM limit (peaked at 1.82M/1M).
2. Google returned a `429 RESOURCE_EXHAUSTED` error with a 46s retry delay, **specific to the `gemini-3-flash` model**.
3. OpenClaw placed the entire `google` provider in cooldown.
4. Fallback to `gemini-3-pro-preview` failed: `"No available auth profile for google (all in cooldown or unavailable)."`
5. Fallback to `gemini-2.5-flash-lite` also failed with the same cooldown error — **despite having only 38.55K / 4M TPM used (less than 1% of its quota).**
6. All three Google models failed, and only the non-Google fallback (`openrouter/...`) could potentially help.

## Relevant Log Output

```
FailoverError: LLM error: { "error": { "code": 429, "message": "Quota exceeded for metric: ...input_token_count, limit: 1000000, model: gemini-3-flash\nPlease retry in 46.356163766s." } }

lane task error: lane=main error="FailoverError: No available auth profile for google (all in cooldown or unavailable)."

Embedded agent failed before reply: All models failed (3):
  google/gemini-3-flash-preview: LLM error 429 (rate_limit)
  | google/gemini-3-pro-preview: No available auth profile for google (all in cooldown or unavailable). (rate_limit)
  | google/gemini-2.5-flash-lite: No available auth profile for google (all in cooldown or unavailable). (rate_limit)
```

## Expected Behavior

When `gemini-3-flash` is rate-limited, OpenClaw should:
- Put only `gemini-3-flash` in cooldown (or the specific quota that was exceeded)
- Still attempt `gemini-2.5-flash-lite` since it has its own separate quota on Google's side (4M TPM vs 1M TPM for Flash)
- Only enter full provider cooldown if Google returns a provider-wide error (e.g., billing issue, auth revoked)
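
The behavior above could be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual code: `CooldownTracker`, `splitModelId`, and `pickModel` are hypothetical names, and the sketch assumes model IDs follow the `provider/model` form used in the configuration above.

```typescript
// Hypothetical sketch of cooldowns scoped per model, with an optional
// provider-wide scope. None of these names come from OpenClaw itself.
class CooldownTracker {
  // Key is either "provider" (provider-wide) or "provider/model" (one model).
  private until = new Map<string, number>();

  // Pass model = null for provider-wide errors (e.g. billing, revoked auth).
  set(provider: string, model: string | null, durationMs: number, now: number = Date.now()): void {
    const key = model === null ? provider : `${provider}/${model}`;
    this.until.set(key, now + durationMs);
  }

  // A model is usable only if neither its own cooldown nor a
  // provider-wide cooldown is still active.
  isAvailable(provider: string, model: string, now: number = Date.now()): boolean {
    const providerUntil = this.until.get(provider) ?? 0;
    const modelUntil = this.until.get(`${provider}/${model}`) ?? 0;
    return now >= providerUntil && now >= modelUntil;
  }
}

// Split "provider/model" on the first slash only, so IDs such as
// "openrouter/moonshotai/kimi-k2.5" keep the rest as the model part.
function splitModelId(id: string): [string, string] {
  const i = id.indexOf("/");
  return i === -1 ? [id, ""] : [id.slice(0, i), id.slice(i + 1)];
}

// Return the first model in the fallback chain that is not cooling down.
function pickModel(chain: string[], tracker: CooldownTracker, now: number = Date.now()): string | undefined {
  return chain.find((id) => {
    const [provider, model] = splitModelId(id);
    return tracker.isAvailable(provider, model, now);
  });
}
```

With this scoping, a 46s cooldown on `google/gemini-3-flash-preview` would leave `google/gemini-3-pro-preview` and `google/gemini-2.5-flash-lite` eligible, while a provider-wide entry would still skip straight to the `openrouter` fallback.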

## Workaround

Adding a fallback on a **different provider** (e.g., `openrouter/...`) ensures the bot doesn't go fully dark. But users who want to stay within their Google AI Studio free/paid quota across multiple models are forced to route through a second provider unnecessarily.

## Suggestion

Consider implementing per-model cooldown tracking instead of (or in addition to) per-provider cooldown. The Google 429 response already includes the specific model name in the quota violation, which could be used to scope the cooldown:

```json
"quotaDimensions": { "location": "global", "model": "gemini-3-flash" }
```
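
As a fallback when structured details are not available, the model name and retry delay could also be pulled from the error message text. The sketch below assumes the message format shown in the log excerpt above; `parseQuotaError` and `RateLimitScope` are hypothetical names, and reading the structured `quotaDimensions` field would likely be more robust than matching on text.

```typescript
// Hypothetical helper: extracts the cooldown scope from a Google 429 message.
// The regexes assume the wording seen in the log output in this issue.
interface RateLimitScope {
  model: string | null;   // model named in the quota violation, if any
  retryMs: number | null; // suggested retry delay in milliseconds, if any
}

function parseQuotaError(message: string): RateLimitScope {
  // Matches "model: gemini-3-flash" and stops at whitespace or a backslash.
  const modelMatch = /model:\s*([A-Za-z0-9._-]+)/.exec(message);
  // Matches "retry in 46.356163766s".
  const retryMatch = /retry in\s+([0-9.]+)s/.exec(message);
  const seconds = retryMatch?.[1];
  return {
    model: modelMatch?.[1] ?? null,
    // Round up so the cooldown never ends before the server's deadline.
    retryMs: seconds !== undefined ? Math.ceil(parseFloat(seconds) * 1000) : null,
  };
}
```

Note that the name in the error (`gemini-3-flash`) differs from the configured ID (`gemini-3-flash-preview`), so some mapping between quota model names and configured model IDs would presumably be needed before scoping the cooldown.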
