Open
Labels
stale: Marked as stale due to inactivity
Description
### Problem

The current model failover logic is robust for transient errors but can be overly aggressive when faced with permanent, unrecoverable errors like or .

When a primary model's API key is invalid, the system currently attempts to fail over to the next model in the list. However, it retries the exact same prompt, which can be very large. This single large retry can be enough to exhaust the TPM (Tokens Per Minute) quota of the fallback model. Every subsequent request repeats this pattern, creating a self-inflicted denial-of-service where all available models quickly become rate-limited due to the primary model's permanent failure.

### Proposed Solution

Implement a more intelligent failover mechanism that incorporates a circuit breaker pattern, differentiating between transient and permanent errors.

1. Error Categorization:
   * Permanent Errors: Treat HTTP , , and potentially (for invalid model IDs) as permanent configuration issues.
   * Transient Errors: Treat HTTP , , , , as temporary service issues.

2. Circuit Breaker Logic:
   * If a model provider returns a permanent error, the system should immediately "trip the circuit" for that specific auth profile (e.g., ).
   * The unhealthy profile should be placed into a cooldown state for a configurable duration (e.g., 10-15 minutes) to prevent further requests.
   * The system should immediately attempt the request on the next model in the fallback list.
   * A high-priority system notification should be generated, informing the user that a provider has been disabled due to an authentication/configuration error (e.g., ).

3. Failover for Transient Errors:
   * If a model fails with a transient error (like a rate limit), the existing failover logic to the next model is appropriate.

### Benefits

- Preserves Resources: Prevents the system from wasting API calls and burning through the rate limits of healthy fallback models.
- Increases Resilience: Allows the system to gracefully degrade by automatically sidelining a misconfigured provider while continuing to function on others.
- Improves Diagnosability: Provides clear, immediate feedback about which part of the configuration is broken, allowing for faster resolution.
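
To make the proposal concrete, here is a rough, standalone sketch of the categorization and circuit-breaker steps. It is not the project's actual code. The names `ProviderError`, `CircuitBreaker`, and `run_with_failover` are hypothetical, and because the specific HTTP codes did not survive in the text above, the two status sets are assumptions chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# ASSUMPTION: the concrete status codes are missing from the issue text.
# These sets are illustrative placeholders, not the proposal's actual lists.
PERMANENT_STATUSES = {401, 403}
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


@dataclass
class CircuitBreaker:
    """Tracks which auth profiles are in cooldown."""

    cooldown_seconds: float = 600.0  # 10 minutes, low end of the suggested range
    clock: Callable[[], float] = time.monotonic
    _open_until: Dict[str, float] = field(default_factory=dict)

    def trip(self, profile: str) -> None:
        # Put the profile into cooldown starting now.
        self._open_until[profile] = self.clock() + self.cooldown_seconds

    def is_open(self, profile: str) -> bool:
        until = self._open_until.get(profile)
        if until is None:
            return False
        if self.clock() >= until:
            # Cooldown elapsed: allow the profile to be tried again.
            del self._open_until[profile]
            return False
        return True


def run_with_failover(
    profiles: List[str],
    call: Callable[[str], str],
    breaker: CircuitBreaker,
    notify: Callable[[str], None],
) -> str:
    """Try each profile in order, skipping any whose circuit is open."""
    last_error: Optional[ProviderError] = None
    for profile in profiles:
        if breaker.is_open(profile):
            continue  # sidelined by an earlier permanent error
        try:
            return call(profile)
        except ProviderError as err:
            last_error = err
            if err.status in PERMANENT_STATUSES:
                breaker.trip(profile)
                notify(
                    f"Profile '{profile}' disabled for "
                    f"{breaker.cooldown_seconds:.0f}s after HTTP {err.status}"
                )
            elif err.status not in TRANSIENT_STATUSES:
                raise  # unknown status: surface it rather than guess
            # Permanent or transient: move on to the next profile.
    raise RuntimeError("no healthy model profile available") from last_error
```

With a breaker like this, only the first request after a key becomes invalid pays for a failed call to the primary. Later requests inside the cooldown window go straight to the fallback, and the notification fires once per trip rather than once per request.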