
Per-model rate limit on one Google model triggers full provider cooldown, blocking other Google models with available quota #5744

@davo20019

Summary

When a single Google model (e.g., gemini-3-flash) hits its per-model TPM (tokens per minute) rate limit, the entire google provider enters cooldown. This prevents failover to other Google models (e.g., gemini-2.5-flash-lite) that have completely separate and unused quota.

Environment

  • OpenClaw version: 2026.1.30
  • OS: macOS (Darwin 24.1.0)
  • Google AI Studio tier: Paid Tier 1

Model Configuration

  • Primary: google/gemini-3-flash-preview
  • Fallback 1: google/gemini-3-pro-preview
  • Fallback 2: google/gemini-2.5-flash-lite
  • Fallback 3: openrouter/moonshotai/kimi-k2.5

Google Rate Limits (per model, from AI Studio dashboard)

Model                   RPM   TPM   RPD
Gemini 3 Pro            25    1M    250
Gemini 3 Flash          1K    1M    10K
Gemini 2.5 Flash Lite   4K    4M    Unlimited

(RPM = requests per minute, TPM = tokens per minute, RPD = requests per day)

What Happened

  1. gemini-3-flash exceeded its 1M TPM limit (peaked at 1.82M/1M).
  2. Google returned a 429 RESOURCE_EXHAUSTED error with a 46s retry delay, specific to the gemini-3-flash model.
  3. OpenClaw placed the entire google provider in cooldown.
  4. Fallback to gemini-3-pro-preview failed: "No available auth profile for google (all in cooldown or unavailable)."
  5. Fallback to gemini-2.5-flash-lite also failed with the same cooldown error, despite having used only 38.55K / 4M TPM (less than 1% of its quota).
  6. All three Google models failed, and only the non-Google fallback (openrouter/...) could potentially help.
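The sequence above can be reproduced with a small simulation. This is a minimal sketch, not OpenClaw's actual failover code: `runFallbackChain`, `providerOf`, and the two keying modes are hypothetical names chosen for illustration. It walks the configured fallback chain once with provider-keyed cooldown and once with model-keyed cooldown, assuming only gemini-3-flash-preview is over its quota.

```typescript
// Hypothetical simulation of a fallback chain; not OpenClaw's real code.
type CooldownKeying = "provider" | "model";

// Provider is everything before the first "/" (e.g. "google", "openrouter").
function providerOf(ref: string): string {
  return ref.slice(0, ref.indexOf("/"));
}

// Try each model in order and return the first one that would be used.
// `overQuota` lists the models whose own quota is exhausted (an assumed input).
function runFallbackChain(
  chain: string[],
  overQuota: Set<string>,
  keying: CooldownKeying,
): { used: string | null; failures: string[] } {
  const cooling = new Set<string>(); // keys currently in cooldown
  const failures: string[] = [];
  for (const ref of chain) {
    const key = keying === "provider" ? providerOf(ref) : ref;
    if (cooling.has(key)) {
      failures.push(`${ref}: in cooldown`);
      continue;
    }
    if (overQuota.has(ref)) {
      failures.push(`${ref}: 429`);
      cooling.add(key); // a 429 puts the whole key into cooldown
      continue;
    }
    return { used: ref, failures };
  }
  return { used: null, failures };
}

const chain = [
  "google/gemini-3-flash-preview",
  "google/gemini-3-pro-preview",
  "google/gemini-2.5-flash-lite",
  "openrouter/moonshotai/kimi-k2.5",
];
const overQuota = new Set(["google/gemini-3-flash-preview"]);

// Provider keying: both remaining Google models are skipped; OpenRouter is used.
console.log(runFallbackChain(chain, overQuota, "provider"));
// Model keying: only gemini-3-flash-preview is skipped; gemini-3-pro-preview is used.
console.log(runFallbackChain(chain, overQuota, "model"));
```

With model keying, gemini-3-pro-preview is tried next; whether it actually succeeds depends on its own 25 RPM / 1M TPM limit, which the sketch assumes still has headroom.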

Relevant Log Output

FailoverError: LLM error: { "error": { "code": 429, "message": "Quota exceeded for metric: ...input_token_count, limit: 1000000, model: gemini-3-flash\nPlease retry in 46.356163766s." } }

lane task error: lane=main error="FailoverError: No available auth profile for google (all in cooldown or unavailable)."

Embedded agent failed before reply: All models failed (3):
  google/gemini-3-flash-preview: LLM error 429 (rate_limit)
  | google/gemini-3-pro-preview: No available auth profile for google (all in cooldown or unavailable). (rate_limit)
  | google/gemini-2.5-flash-lite: No available auth profile for google (all in cooldown or unavailable). (rate_limit)
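The 429 message above already names the model, the exceeded limit, and a retry delay. As a minimal sketch, assuming the free-text wording stays stable (it is not a documented contract, so the structured `quotaDimensions` field mentioned under Suggestion is the sturdier source), a hypothetical `parseQuotaMessage` helper could extract all three. The input is the message after JSON decoding, so `\n` is a real newline.

```typescript
// Hypothetical helper: pulls the model, limit, and retry delay out of the
// free-text message of a Google 429 RESOURCE_EXHAUSTED error.
interface QuotaInfo {
  model: string | null;
  limit: number | null;
  retryMs: number | null;
}

function parseQuotaMessage(message: string): QuotaInfo {
  const model = /model:\s*([\w.-]+)/.exec(message);
  const limit = /limit:\s*(\d+)/.exec(message);
  const retry = /retry in\s*([\d.]+)s/.exec(message);
  return {
    model: model ? model[1] : null,
    limit: limit ? Number(limit[1]) : null,
    // Round up so a cooldown never ends before Google's suggested delay.
    retryMs: retry ? Math.ceil(Number(retry[1]) * 1000) : null,
  };
}

const msg =
  "Quota exceeded for metric: ...input_token_count, limit: 1000000, model: gemini-3-flash\n" +
  "Please retry in 46.356163766s.";
console.log(parseQuotaMessage(msg));
// { model: 'gemini-3-flash', limit: 1000000, retryMs: 46357 }
```

Note that the model in the quota violation (gemini-3-flash) is not the same string as the configured model ID (gemini-3-flash-preview), so a cooldown keyed on it would need a mapping back to the requested model.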

Expected Behavior

When gemini-3-flash is rate-limited, OpenClaw should:

  • Put only gemini-3-flash in cooldown (or the specific quota that was exceeded)
  • Still attempt gemini-2.5-flash-lite, since it has its own separate quota on Google's side (4M TPM vs. 1M TPM for Gemini 3 Flash)
  • Only enter full provider cooldown if Google returns a provider-wide error (e.g., billing issue, auth revoked)
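The decision rule in the list above could look roughly like the following. This is a minimal sketch under stated assumptions: `FailedCall`, `cooldownScope`, and their fields are hypothetical, treating 401/403 as the "auth revoked" signal is an assumption, and how a billing problem surfaces is not modeled. It keys a model-scoped cooldown on the requested model ID rather than on the quota dimension's model name.

```typescript
// Hypothetical, simplified view of a failed request; the field names are
// illustrative and not OpenClaw's or Google's actual types.
interface FailedCall {
  provider: string;    // e.g. "google"
  model: string;       // configured model, e.g. "gemini-3-flash-preview"
  httpStatus: number;  // e.g. 429
  quotaModel?: string; // model named in the quota violation, if any
}

type CooldownScope =
  | { kind: "model"; key: string }
  | { kind: "provider"; key: string }
  | { kind: "none" };

function cooldownScope(call: FailedCall): CooldownScope {
  // Assumed auth failures: these affect every model behind the same credentials.
  if (call.httpStatus === 401 || call.httpStatus === 403) {
    return { kind: "provider", key: call.provider };
  }
  if (call.httpStatus === 429) {
    // Rate limit tied to a specific model: cool down only that model.
    if (call.quotaModel) {
      return { kind: "model", key: `${call.provider}/${call.model}` };
    }
    // No model named: keep the conservative provider-wide cooldown.
    return { kind: "provider", key: call.provider };
  }
  // Anything else (e.g. a 5xx) is not treated as a cooldown trigger here.
  return { kind: "none" };
}

console.log(
  cooldownScope({
    provider: "google",
    model: "gemini-3-flash-preview",
    httpStatus: 429,
    quotaModel: "gemini-3-flash",
  }),
);
// { kind: 'model', key: 'google/gemini-3-flash-preview' }
```

Falling back to provider scope when no model is named keeps today's behavior for 429s that might really be provider-wide.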

Workaround

Adding a fallback on a different provider (e.g., openrouter/...) ensures the bot doesn't go fully dark. However, users who want to stay within their Google AI Studio quota (free or paid) across multiple models are forced to route through a second provider unnecessarily.

Suggestion

Consider implementing per-model cooldown tracking instead of (or in addition to) per-provider cooldown. The Google 429 response already includes the specific model name in the quota violation, which could be used to scope the cooldown:

"quotaDimensions": { "location": "global", "model": "gemini-3-flash" }
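A per-model cooldown store could be as small as a map from "provider/model" to an expiry time. The sketch below is hypothetical throughout (`ModelCooldowns` and `findQuotaModel` are illustrative names, not OpenClaw APIs). Because only the `quotaDimensions` fragment is confirmed above, `findQuotaModel` searches the parsed error body recursively instead of assuming where Google nests it, and the example body's surrounding structure is made up for the demo.

```typescript
// Hypothetical per-model cooldown tracker. Keys are "provider/model" so a 429
// on one Google model leaves the others usable.
class ModelCooldowns {
  private until = new Map<string, number>(); // key -> epoch ms when cooldown ends

  mark(provider: string, model: string, durationMs: number, now = Date.now()): void {
    const key = `${provider}/${model}`;
    // Keep the later expiry if the model is already cooling down.
    this.until.set(key, Math.max(this.until.get(key) ?? 0, now + durationMs));
  }

  isCoolingDown(provider: string, model: string, now = Date.now()): boolean {
    const end = this.until.get(`${provider}/${model}`);
    return end !== undefined && now < end;
  }
}

// Return the first `quotaDimensions.model` string found anywhere in a parsed body.
function findQuotaModel(body: unknown): string | undefined {
  if (Array.isArray(body)) {
    for (const item of body) {
      const found = findQuotaModel(item);
      if (found) return found;
    }
  } else if (body !== null && typeof body === "object") {
    const obj = body as Record<string, unknown>;
    const dims = obj["quotaDimensions"];
    if (dims !== null && typeof dims === "object") {
      const model = (dims as Record<string, unknown>)["model"];
      if (typeof model === "string") return model;
    }
    for (const value of Object.values(obj)) {
      const found = findQuotaModel(value);
      if (found) return found;
    }
  }
  return undefined;
}

// Body shape assumed for illustration; only quotaDimensions comes from the issue.
const body = {
  error: {
    code: 429,
    details: [{ violations: [{ quotaDimensions: { location: "global", model: "gemini-3-flash" } }] }],
  },
};

const cooldowns = new ModelCooldowns();
const t0 = 0;
if (findQuotaModel(body)) {
  // Cool down only the requested model, for the ~46 s delay Google suggested.
  cooldowns.mark("google", "gemini-3-flash-preview", 46357, t0);
}
console.log(cooldowns.isCoolingDown("google", "gemini-3-flash-preview", t0 + 1000)); // true
console.log(cooldowns.isCoolingDown("google", "gemini-2.5-flash-lite", t0 + 1000)); // false
```

A provider-wide cooldown could coexist with this by checking a separate provider key first, which matches the "instead of (or in addition to)" option suggested above.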

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests