Description
Summary
When a single Google model (e.g., gemini-3-flash) hits its per-model TPM (tokens per minute) rate limit, the entire google provider enters cooldown. This prevents failover to other Google models (e.g., gemini-2.5-flash-lite) that have completely separate and unused quota.
Environment
- OpenClaw version: 2026.1.30
- OS: macOS (Darwin 24.1.0)
- Google AI Studio tier: Paid Tier 1
Model Configuration
- Primary: `google/gemini-3-flash-preview`
- Fallback 1: `google/gemini-3-pro-preview`
- Fallback 2: `google/gemini-2.5-flash-lite`
- Fallback 3: `openrouter/moonshotai/kimi-k2.5`
Google Rate Limits (per model, from AI Studio dashboard)
| Model | RPM | TPM | RPD |
|---|---|---|---|
| Gemini 3 Pro | 25 | 1M | 250 |
| Gemini 3 Flash | 1K | 1M | 10K |
| Gemini 2.5 Flash Lite | 4K | 4M | Unlimited |
What Happened
- `gemini-3-flash` exceeded its 1M TPM limit (peaked at 1.82M/1M).
- Google returned a `429 RESOURCE_EXHAUSTED` error with a 46s retry delay, specific to the `gemini-3-flash` model.
- OpenClaw placed the entire `google` provider in cooldown.
- Fallback to `gemini-3-pro-preview` failed: "No available auth profile for google (all in cooldown or unavailable)."
- Fallback to `gemini-2.5-flash-lite` also failed with the same cooldown error, despite having only 38.55K / 4M TPM used (less than 1% of its quota).
- All three Google models failed, and only the non-Google fallback (`openrouter/...`) could potentially help.
Relevant Log Output
FailoverError: LLM error: { "error": { "code": 429, "message": "Quota exceeded for metric: ...input_token_count, limit: 1000000, model: gemini-3-flash\nPlease retry in 46.356163766s." } }
lane task error: lane=main error="FailoverError: No available auth profile for google (all in cooldown or unavailable)."
Embedded agent failed before reply: All models failed (3):
google/gemini-3-flash-preview: LLM error 429 (rate_limit)
| google/gemini-3-pro-preview: No available auth profile for google (all in cooldown or unavailable). (rate_limit)
| google/gemini-2.5-flash-lite: No available auth profile for google (all in cooldown or unavailable). (rate_limit)
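The 429 message in the log above already carries both the affected model and the retry delay. As a rough sketch, they could be pulled out of the message text like this. `parseQuotaMessage` and `QuotaHint` are hypothetical names for illustration, not OpenClaw functions, and the regular expressions assume the message wording shown in this log.

```typescript
// Hypothetical helper, not part of OpenClaw. It assumes the message wording
// seen in the log above ("model: <name>" and "Please retry in <seconds>s.").
interface QuotaHint {
  model: string | null;        // model named in the quota violation, if any
  retryAfterMs: number | null; // suggested retry delay in milliseconds, if any
}

function parseQuotaMessage(message: string): QuotaHint {
  const modelMatch = message.match(/model:\s*([A-Za-z0-9.-]+)/);
  const delayMatch = message.match(/retry in\s+([0-9]+(?:\.[0-9]+)?)s/i);
  return {
    model: modelMatch ? modelMatch[1] : null,
    // Round up so the cooldown never ends before the delay Google asked for.
    retryAfterMs: delayMatch ? Math.ceil(parseFloat(delayMatch[1]) * 1000) : null,
  };
}

const hint = parseQuotaMessage(
  "Quota exceeded for metric: ...input_token_count, limit: 1000000, " +
    "model: gemini-3-flash\nPlease retry in 46.356163766s."
);
console.log(hint.model, hint.retryAfterMs);
```

Parsing free-form text is fragile, so structured fields in the error body would be preferable wherever they are available.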
Expected Behavior
When gemini-3-flash is rate-limited, OpenClaw should:
- Put only `gemini-3-flash` in cooldown (or the specific quota that was exceeded)
- Still attempt `gemini-2.5-flash-lite` since it has its own separate quota on Google's side (4M TPM vs 1M TPM for Flash)
- Only enter full provider cooldown if Google returns a provider-wide error (e.g., billing issue, auth revoked)
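The scoping rule described in the list above could look roughly like the following. This is a sketch under assumptions: `ProviderError`, `CooldownScope`, and `cooldownScope` are made-up names, and treating HTTP 401/403 as provider-wide is my reading of "billing issue, auth revoked", not something confirmed from OpenClaw's code.

```typescript
// Hypothetical types and function, for illustration only.
type CooldownScope = "model" | "provider" | "none";

interface ProviderError {
  httpStatus: number;
  quotaModel?: string; // model named in the quota violation, when present
}

function cooldownScope(err: ProviderError): CooldownScope {
  // Assumption: auth or permission failures affect every model on the provider.
  if (err.httpStatus === 401 || err.httpStatus === 403) return "provider";
  if (err.httpStatus === 429) {
    // A 429 that names a model only exhausts that model's quota.
    // Without a model name, fall back to the conservative provider-wide scope.
    return err.quotaModel ? "model" : "provider";
  }
  // Other errors are not treated as rate limits in this sketch.
  return "none";
}

console.log(cooldownScope({ httpStatus: 429, quotaModel: "gemini-3-flash" }));
```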
Workaround
Adding a fallback on a different provider (e.g., openrouter/...) ensures the bot doesn't go fully dark. But users who want to stay within their Google AI Studio free/paid quota across multiple models are forced to route through a second provider unnecessarily.
Suggestion
Consider implementing per-model cooldown tracking instead of (or in addition to) per-provider cooldown. The Google 429 response already includes the specific model name in the quota violation, which could be used to scope the cooldown:
"quotaDimensions": { "location": "global", "model": "gemini-3-flash" }