Unhandled 503 MODEL_CAPACITY_EXHAUSTED for gemini-3.1-pro-high (does not respect RetryInfo) #24815

@jyongchul

When using gemini-3.1-pro-high, the CLI occasionally receives a 503 Service Unavailable error with the reason MODEL_CAPACITY_EXHAUSTED. Although the response includes a RetryInfo detail with a retryDelay (e.g., 9s or 10s), the CLI does not appear to honor that delay; it fails the request instead of waiting and retrying.

Example error:
Trajectory ID: 89ecc052-02de-47e6-a2f9-62fd1f051cd4
Error: HTTP 503 Service Unavailable

{
  "error": {
    "code": 503,
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "domain": "cloudcode-pa.googleapis.com",
        "metadata": {
          "model": "gemini-3.1-pro-high"
        },
        "reason": "MODEL_CAPACITY_EXHAUSTED"
      },
      {
        "@type": "type.googleapis.com/google.rpc.RetryInfo",
        "retryDelay": "9s"
      }
    ],
    "message": "No capacity available for model gemini-3.1-pro-high on the server",
    "status": "UNAVAILABLE"
  }
}

The CLI should parse the RetryInfo detail, sleep for the specified retryDelay, and then retry the request automatically. In this situation there is still sufficient quota; the server is only temporarily out of capacity.
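
To make the requested behavior concrete, here is a minimal TypeScript sketch of what honoring RetryInfo could look like. It is not taken from the CLI's source: the names `parseRetryDelayMs`, `sendWithRetry`, `ApiResponse`, and the `maxAttempts` default are all hypothetical, and it assumes the error body has the shape shown in the example above, with `retryDelay` encoded as decimal seconds followed by `s`.

```typescript
// Hypothetical types and helpers for illustration; not the CLI's actual API.
interface ErrorDetail {
  "@type": string;
  retryDelay?: string;
  [key: string]: unknown;
}

interface ApiErrorBody {
  error?: { code?: number; status?: string; details?: ErrorDetail[] };
}

interface ApiResponse {
  status: number;
  body: ApiErrorBody;
}

const RETRY_INFO_TYPE = "type.googleapis.com/google.rpc.RetryInfo";

// Returns the RetryInfo delay in milliseconds, or undefined when the
// detail is absent or the duration string is not of the form "<seconds>s".
function parseRetryDelayMs(body: ApiErrorBody): number | undefined {
  const details = body.error?.details ?? [];
  for (const detail of details) {
    if (detail["@type"] !== RETRY_INFO_TYPE) continue;
    const match = /^(\d+(?:\.\d+)?)s$/.exec(detail.retryDelay ?? "");
    if (match) return Math.round(Number(match[1]) * 1000);
  }
  return undefined;
}

// Calls `send` up to `maxAttempts` times. A 503 response that carries a
// parseable RetryInfo is retried after the server-suggested delay; any
// other response (or the last attempt) is returned to the caller as-is.
async function sendWithRetry(
  send: () => Promise<ApiResponse>,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<ApiResponse> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    const delayMs =
      response.status === 503 ? parseRetryDelayMs(response.body) : undefined;
    if (delayMs === undefined) break;
    await sleep(delayMs);
    response = await send();
  }
  return response;
}
```

With the error body from this report, `parseRetryDelayMs` would yield 9000, so the wrapper would wait nine seconds before the next attempt. A real implementation would probably also cap the total wait and add jitter; those are left out here to keep the sketch short.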

Labels: area/platform (Issues related to Build infra, Release mgmt, Testing, Eval infra, Capacity, Quota mgmt); 🔒 maintainer only (⛔ Do not contribute. Internal roadmap item.)
