Treat HTTP 503 (and 502/504) as failover-eligible so model fallback triggers #20999

@metadavideth

OpenClaw: Treat HTTP 503 (and 502/504) as failover-eligible so model fallback triggers

Summary

When the primary model’s API returns 503 Service Unavailable (e.g. Google Gemini “This model is currently experiencing high demand…”), OpenClaw retries the same model and never moves to agents.defaults.model.fallbacks. The run can then stall with no reply to the user.

Model fallback is documented to apply to “auth failures, rate limits, and timeouts.” Today, 503 (and 502/504) are not treated as failover-eligible, so they fall through to message-based classification and often end up as “other” → no fallback.

Expected behavior

  • 503, and preferably 502 / 504, should be treated like rate-limit/transient errors: trigger model fallback (next model in agents.defaults.model.fallbacks) instead of only retrying the same model.
  • Same for error messages that clearly indicate overload (e.g. “UNAVAILABLE”, “experiencing high demand”) when the error object doesn’t carry an HTTP status.
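The expected behavior can be sketched as a loop over the primary model and its fallbacks. This is only an illustration, not OpenClaw's implementation: `CallResult`, `isFailoverEligible`, and `runWithFallbacks` are hypothetical names, and including 429 as the rate-limit status is an assumption based on standard HTTP usage rather than on the OpenClaw source.

```typescript
// Hypothetical result shape for one completion attempt (not OpenClaw's real type).
type CallResult =
  | { kind: "ok"; reply: string }
  | { kind: "error"; status: number };

// Assumption: 408 and 429 stand in for the documented timeout / rate-limit cases;
// 502, 503 and 504 are the codes this issue asks to add.
function isFailoverEligible(status: number): boolean {
  return status === 408 || status === 429 ||
    status === 502 || status === 503 || status === 504;
}

// Try each model in order and advance only on failover-eligible errors.
function runWithFallbacks(
  models: string[],
  call: (model: string) => CallResult,
): { model: string; reply: string } | null {
  for (const model of models) {
    const result = call(model);
    if (result.kind === "ok") return { model, reply: result.reply };
    // Other errors do not advance fallback, as the failover docs describe.
    if (!isFailoverEligible(result.status)) return null;
  }
  return null; // every candidate failed with a failover-eligible error
}

// Usage with a fake provider call: the primary is overloaded, the fallback answers.
const fakeCall = (model: string): CallResult =>
  model === "google/gemini-3-flash-preview"
    ? { kind: "error", status: 503 }
    : { kind: "ok", reply: "hello" };

const outcome = runWithFallbacks(
  ["google/gemini-3-flash-preview", "minimax/MiniMax-M2.5"],
  fakeCall,
);
console.log(outcome?.model);
```

With 503 counted as eligible, the run moves on to the second model instead of retrying the first one.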

Actual behavior

  • Primary (e.g. google/gemini-3-flash-preview) returns 503.
  • Run retries the same model; after tool results, the next completion request again returns 503.
  • No switch to fallback (e.g. minimax/MiniMax-M2.5); user gets no reply.

Observed in session transcript: assistant message with "code":503,"status":"Service Unavailable" and message “This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.” Retries stayed on the same model.

Proposed fix

In src/agents/failover-error.ts, in the function that resolves failover reason from an error (e.g. resolveFailoverReasonFromError):

  1. By status code: After handling 408 (timeout), add handling for 503 (and optionally 502, 504) and return "rate_limit" (so existing fallback/cooldown behavior applies).

    Example (pseudocode):

    if (status === 408) return "timeout";
    // Treat server overload / temporary unavailable as rate-limit-like for fallback
    if (status === 503 || status === 502 || status === 504) return "rate_limit";

  2. By message (optional): In classifyFailoverReason (or equivalent message-based classifier), if the text contains 503, UNAVAILABLE, or “high demand” / “experiencing high demand”, return "rate_limit" so that when the error is passed without an HTTP status, fallback still triggers.
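Both steps could be combined roughly as follows. This is a minimal sketch under stated assumptions, not the code in src/agents/failover-error.ts: `FailoverReason`, `reasonFromStatus`, `reasonFromMessage`, and `resolveReason` are hypothetical names, and the error shape (`status`, `message`) is assumed. The word boundaries around 503 are a design choice to avoid matching longer numbers such as token counts.

```typescript
// Hypothetical subset of failover reasons; the real union likely has more members.
type FailoverReason = "timeout" | "rate_limit";

// Overload phrases quoted in this issue. Case-insensitive, no global flag,
// so repeated test() calls carry no state between them.
const OVERLOAD_PATTERN = /\b503\b|\bUNAVAILABLE\b|high demand/i;

// Step 1: classify by HTTP status when the error carries one.
function reasonFromStatus(status: number | undefined): FailoverReason | null {
  if (status === 408) return "timeout";
  // Treat server overload / temporary unavailability as rate-limit-like.
  if (status === 502 || status === 503 || status === 504) return "rate_limit";
  return null;
}

// Step 2: classify by message text when no status is available.
function reasonFromMessage(message: string): FailoverReason | null {
  return OVERLOAD_PATTERN.test(message) ? "rate_limit" : null;
}

// Status takes precedence; the message check is only a fallback.
function resolveReason(
  err: { status?: number; message?: string },
): FailoverReason | null {
  const byStatus = reasonFromStatus(err.status);
  if (byStatus !== null) return byStatus;
  return reasonFromMessage(err.message !== undefined ? err.message : "");
}
```

Returning "rate_limit" rather than introducing a new reason keeps the existing fallback and cooldown paths unchanged, which is the intent of the proposal.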

Environment

  • OpenClaw version: 2026.2.17 (from npm).
  • Config: agents.defaults.model.primary = google/gemini-3-flash-preview, agents.defaults.model.fallbacks = [minimax, kimi, grok-3, ollama].
  • Channel: Telegram (direct).

References

  • Model failover: “This applies to auth failures, rate limits, and timeouts that exhausted profile rotation (other errors do not advance fallback).”
  • Session transcript showed repeated 503 on gemini-3-flash-preview with no switch to fallback.

Thank you for considering this; it would make configured fallbacks actually apply when the primary provider is overloaded (503) or temporarily down (502/504).
