bug: Incorrect Model Fallback and Retry Logic for 429 Quota Errors #9248

### What happened?


The current model fallback logic for 429 quota errors is unreliable and leads to a poor user experience. It relies on fragile string matching of error messages and a simple counter of consecutive errors. This causes incorrect behavior in both directions: the model is downgraded for transient, short-term limits where a simple retry would suffice, and it is not downgraded when a hard daily limit is hit. In addition, the current implementation only handles one authentication type (`LOGIN_WITH_GOOGLE`) and is not applied to every API call that can hit these quotas.


### What did you expect to happen?

The model fallback and retry logic should be robust, predictable, and apply to all authentication types (Google Sign-In, API Keys for Gemini & Vertex) and relevant API calls. It should intelligently distinguish between short-term (retryable) and long-term (fallback-worthy) quota limits.

When a short-term, per-minute limit is hit, the CLI should retry after the delay specified by the API. The user should not be interrupted.

When a long-term, daily limit is hit, the CLI should immediately and clearly inform the user and suggest falling back to a different model (e.g., Flash), because retrying is unlikely to solve the problem in a reasonable timeframe.
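
A minimal TypeScript sketch of this expected behavior is below. It is not the CLI's actual `retryWithBackoff`: `runWithQuotaHandling`, `QuotaDecision`, and `FallbackRequiredError` are hypothetical names, and the `classify` callback is an assumed stand-in for whatever logic decides whether an error is a short-term or long-term quota error.

```typescript
// Hypothetical outcome of inspecting a caught error (illustrative names only).
type QuotaDecision =
  | { kind: "retry"; delayMs: number } // short-term limit: wait, then retry silently
  | { kind: "fallback" } // long-term limit: stop and offer a fallback model
  | { kind: "unknown" }; // not a recognized quota error

// Thrown when retrying cannot help and the user should be offered another model.
class FallbackRequiredError extends Error {
  readonly original: unknown;
  constructor(original: unknown) {
    super("Long-term quota exhausted; consider switching to a fallback model.");
    this.name = "FallbackRequiredError";
    this.original = original;
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWithQuotaHandling<T>(
  call: () => Promise<T>,
  classify: (err: unknown) => QuotaDecision,
  maxAttempts = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const decision = classify(err);
      if (decision.kind === "fallback") {
        // Daily/long-term quota: retrying will not help, so surface it immediately.
        throw new FallbackRequiredError(err);
      }
      if (decision.kind === "retry" && attempt < maxAttempts) {
        // Per-minute quota: wait exactly as long as the server asked, then retry.
        await wait(decision.delayMs);
        continue;
      }
      // Unrecognized errors, or too many retries: let the caller's existing
      // backoff/reporting path handle it.
      throw err;
    }
  }
}
```

Injecting `wait` keeps the loop testable without real delays, and errors the classifier does not recognize are rethrown unchanged so an existing backoff path can still handle them.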

### Client information

<details>
<summary>Client Information</summary>

Run `gemini` to enter the interactive CLI, then run the `/about` command.

```console
> /about
│ About Gemini CLI                                                │
│                                                                 │
│ CLI Version           0.7.0-nightly.20250918.2722473a           │
│ Git Commit            f46e50b27                                 │
│ Model                 gemini-2.5-pro                            │
│ Sandbox               no sandbox                                │
│ OS                    linux                                     │
│ Auth Method           vertex-ai                                 │
│ GCP Project           gaghosh-project-1                         │
│ IDE Client            VS Code                                   │
```

</details>

### Login information


This issue affects all login types, including Login with Google (served by Gemini Code Assist, GCA), Gemini API keys, and Vertex AI authentication, since all of these services return structured 429 errors that the CLI does not currently use.


### Anything else we need to know?

The backend APIs (GCA, Gemini, Vertex) generally return structured errors for 429 responses that follow the standard Google API error model, with typed entries in a `details` array. We can create a parser for these errors, such as `parseGoogleApiError` in https://gist.github.com/gsquared94/375a7220f6ea0d32961e7ab1a1d63da5, and refactor the `retryWithBackoff` logic to use it.

An example of a structured error is:
```json
{
  "error": {
    "message": "{\n \"error\": {\n \"code\": 429,\n \"message\": \"You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.\\n* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 50\\nPlease retry in 34.074824224s.\",\n \"status\": \"RESOURCE_EXHAUSTED\",\n \"details\": [\n {\n \"@type\": \"type.googleapis.com/google.rpc.QuotaFailure\",\n \"violations\": [\n {\n \"quotaMetric\": \"generativelanguage.googleapis.com/generate_content_free_tier_requests\",\n \"quotaId\": \"GenerateRequestsPerDayPerProjectPerModel-FreeTier\",\n \"quotaDimensions\": {\n \"location\": \"global\",\n \"model\": \"gemini-2.5-pro\"\n },\n \"quotaValue\": \"50\"\n }\n ]\n },\n {\n \"@type\": \"type.googleapis.com/google.rpc.Help\",\n \"links\": [\n {\n \"description\": \"Learn more about Gemini API quotas\",\n \"url\": \"https://ai.google.dev/gemini-api/docs/rate-limits\"\n }\n ]\n },\n {\n \"@type\": \"type.googleapis.com/google.rpc.RetryInfo\",\n \"retryDelay\": \"34s\"\n }\n ]\n }\n}",
    "code": 429,
    "status": "Too Many Requests"
  }
}
```
*Note: The outer `message` field can contain the complete API error as stringified JSON, so the parser needs to unwrap it with `JSON.parse()` while still handling messages that are plain text.*
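
One way to handle this nesting is to unwrap `{ "error": ... }` envelopes repeatedly, parsing `message` whenever it is itself JSON, and keep the innermost error that has a numeric `code`. The sketch below assumes that approach. `extractGoogleApiError` is a hypothetical stand-in for `parseGoogleApiError` and is not the code from the linked gist; it accepts a JSON string, an already-parsed object, or an `Error` whose `message` is JSON.

```typescript
// Subset of the google.rpc.Status JSON shape used by Google APIs.
interface GoogleApiErrorDetail {
  "@type": string;
  [key: string]: unknown;
}

interface GoogleApiError {
  code: number;
  message: string;
  status?: string;
  details: GoogleApiErrorDetail[];
}

// Returns undefined instead of throwing when the text is not JSON.
function tryParseJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Hypothetical parser: returns the innermost error object that carries a
// numeric `code`, or null if none is found within `maxDepth` levels.
function extractGoogleApiError(input: unknown, maxDepth = 5): GoogleApiError | null {
  let current: unknown = typeof input === "string" ? tryParseJson(input) : input;
  let found: GoogleApiError | null = null;

  for (let depth = 0; depth < maxDepth; depth++) {
    if (current === null || typeof current !== "object") break;
    // Unwrap the `{ "error": { ... } }` envelope when present.
    const envelope = (current as { error?: unknown }).error ?? current;
    if (envelope === null || typeof envelope !== "object") break;
    const e = envelope as Partial<GoogleApiError>;

    if (typeof e.code === "number") {
      found = {
        code: e.code,
        message: typeof e.message === "string" ? e.message : "",
        status: typeof e.status === "string" ? e.status : undefined,
        // Detail entries are not validated further in this sketch.
        details: Array.isArray(e.details) ? e.details : [],
      };
    }
    // The message may wrap the real error as stringified JSON: descend into it.
    current = typeof e.message === "string" ? tryParseJson(e.message) : undefined;
  }
  return found;
}
```

For the example above, the outer level (`"status": "Too Many Requests"`, no `details`) is replaced by the inner `RESOURCE_EXHAUSTED` error, which carries the `QuotaFailure`, `Help`, and `RetryInfo` entries.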

**Implementation Proposal (_without too much refactoring_):**

1.  In `retryWithBackoff` (and other relevant error handling locations), call `parseGoogleApiError` on any caught error.
2.  If a structured `GoogleApiError` is parsed, inspect its `details`:
    *   **Check for `QuotaFailure`:**
        *   Examine the `quotaId` in the violations. If it contains substrings like `PerDay`, `Daily`, or other long-term indicators, it's a terminal quota for the session. This should trigger the model fallback flow.
        *   If the `quotaId` contains `PerMinute`, `PerSecond`, or indicates a short-term limit, it's a transient error.
    *   **Check for `RetryInfo`:**
        *   Use the `retryDelay`. A short delay (e.g., < 5 minutes) confirms a transient error, and the system should wait for this duration and retry.
        *   A long delay (e.g., hours) indicates a long-term lockout, which should also trigger the model fallback flow.
3.  **Decision Logic:**
    *   **Trigger Fallback IF:** (`QuotaFailure` exists AND `quotaId` indicates a daily/long-term limit) OR (`RetryInfo` exists AND `retryDelay` is long, e.g., > 5 minutes).
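    *   **Retry Silently IF:** (`QuotaFailure` exists AND `quotaId` indicates a minute/short-term limit) OR (`RetryInfo` exists AND `retryDelay` is short).
    *   **If both conditions match**, fallback should take precedence. The example above has exactly this shape: its `quotaId` (`GenerateRequestsPerDayPerProjectPerModel-FreeTier`) indicates a daily limit, while its `retryDelay` is only `34s`.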
    *   If no structured error is found, the system can revert to the existing (less reliable) backoff mechanism as a legacy fallback.
4.  This new logic should replace the current string-matching and consecutive error counting, and it should be applied to all auth types.
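
The decision rules in step 3 could be sketched as follows. This assumes the `details` array has already been extracted from the parsed error, uses the 5-minute threshold from the proposal, and lets fallback win when both rules match. `decide`, `parseRetryDelayMs`, `quotaTerm`, and the `ErrorDetail` shape are hypothetical names chosen for illustration.

```typescript
// Minimal view of the detail entries this logic reads (hypothetical shape).
interface ErrorDetail {
  "@type": string;
  violations?: { quotaId?: string }[]; // google.rpc.QuotaFailure
  retryDelay?: string; // google.rpc.RetryInfo, Duration in JSON form, e.g. "34s"
}

type Decision = "fallback" | "retry" | "legacy";

const QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure";
const RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo";
// Threshold taken from the proposal: delays above ~5 minutes count as long-term.
const LONG_DELAY_MS = 5 * 60 * 1000;

// Converts a JSON Duration such as "34s" or "1.5s" into milliseconds.
function parseRetryDelayMs(delay: string): number | null {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(delay.trim());
  return match ? Math.round(Number(match[1]) * 1000) : null;
}

// Substring heuristics from step 2; real quota IDs may need more patterns.
function quotaTerm(quotaId: string): "long" | "short" | "unknown" {
  if (/PerDay|Daily/.test(quotaId)) return "long";
  if (/PerMinute|PerSecond/.test(quotaId)) return "short";
  return "unknown";
}

function decide(details: ErrorDetail[]): { decision: Decision; delayMs?: number } {
  let longQuota = false;
  let shortQuota = false;
  let delayMs: number | null = null;

  for (const detail of details) {
    if (detail["@type"] === QUOTA_FAILURE) {
      for (const v of detail.violations ?? []) {
        const term = quotaTerm(v.quotaId ?? "");
        if (term === "long") longQuota = true;
        if (term === "short") shortQuota = true;
      }
    } else if (detail["@type"] === RETRY_INFO && detail.retryDelay) {
      delayMs = parseRetryDelayMs(detail.retryDelay);
    }
  }

  // Fallback wins when both rules match: per the proposal, a daily limit is
  // not worth retrying even if RetryInfo suggests a short wait.
  if (longQuota || (delayMs !== null && delayMs > LONG_DELAY_MS)) {
    return { decision: "fallback" };
  }
  // Short-term limit: retry, honoring the server delay when one was given.
  if (shortQuota || delayMs !== null) {
    return { decision: "retry", delayMs: delayMs ?? undefined };
  }
  // No usable structured detail: use the existing backoff as a legacy path.
  return { decision: "legacy" };
}
```

Applied to the example payload above, `decide` returns `fallback`: the `quotaId` contains `PerDay`, even though `RetryInfo` asks for only a 34-second wait.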

This approach will make the retry/fallback behavior much more accurate and improve the user experience by correctly interpreting the specific reason for the rate limit.
