fix(agents): comprehensive quota fallback fixes - session overrides + surgical cooldown logic by ramezgaberiel · Pull Request #23816 · openclaw/openclaw

ramezgaberiel · 2026-02-22T18:23:32Z

Summary

Comprehensive fix for quota fallback issues affecting paying customers using session model overrides:

Bug A: Model fallback system incorrectly skips all configured fallbacks when session model differs from config primary (e.g., user switches from Opus to Sonnet for quota management)
Bug B: Provider cooldowns treat all failure types identically, blocking potentially successful same-provider fallback attempts during rate limits
Why it matters: Users hitting quota/rate limits lose fallback protection during normal model switches, leaving them stranded without alternatives despite having configured fallbacks
Comprehensive solution: Provider-aware model comparison + surgical cooldown distinction based on failure type (rate_limit vs auth/billing)
Scope boundary: All existing fallback behavior preserved, no config changes required, backward compatible

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #19249 (Model failover does not activate on rate limit - agents stay on primary model)
Related: Comprehensive quota fallback system improvements

User-visible / Behavior Changes

Bug A Fix - Session Model Override Fallbacks:

Model fallbacks now work correctly when users switch models within same provider (e.g., Opus→Sonnet for cost control)
Cross-provider switches still block fallbacks as intended (security boundary preserved)

Bug B Fix - Intelligent Cooldown Behavior:

Rate limits: Same-provider fallback attempts now allowed (different model may work)
Auth/billing issues: All attempts blocked for affected provider (whole provider compromised)
Cross-provider: Smart handling based on auth profile availability

For end users:

No configuration changes required
Existing fallback configs work as expected
Better resilience during quota management and rate limiting scenarios
Clearer error messages distinguishing failure types

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)

Security boundaries preserved: Cross-provider fallback blocking logic maintained for auth isolation.

Technical Implementation

Bug A - Provider-Aware Model Comparison:

// Before: Exact string matching blocked same-provider switches
if (!sameModelCandidate(normalizedPrimary, configuredPrimary)) {
  return candidates; // Skips all fallbacks
}

// After: Provider-only comparison allows version differences  
if (normalizedPrimary.provider !== configuredPrimary.provider) {
  return candidates; // Only blocks cross-provider switches
}
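
A self-contained sketch of the difference between the two comparisons, for illustration only. `ModelRef`, `parseModelRef`, and `sameProvider` are hypothetical names, and the default-provider handling for bare model names is an assumption. Only `sameModelCandidate` is named in this PR, and its real body may differ.

```typescript
// Hypothetical shape for a resolved model reference.
interface ModelRef {
  provider: string;
  model: string;
}

// Assumption: refs look like "provider/model"; a bare model name gets the default provider.
function parseModelRef(ref: string, defaultProvider = "anthropic"): ModelRef {
  const slash = ref.indexOf("/");
  if (slash === -1) {
    return { provider: defaultProvider, model: ref };
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Old check (sketch): provider AND model must match exactly.
function sameModelCandidate(a: ModelRef, b: ModelRef): boolean {
  return a.provider === b.provider && a.model === b.model;
}

// New check (sketch): only the provider has to match.
function sameProvider(a: ModelRef, b: ModelRef): boolean {
  return a.provider === b.provider;
}

const configuredPrimary = parseModelRef("anthropic/claude-opus-4-6");
const sessionOverride = parseModelRef("claude-sonnet-4-20250514");
const crossProvider = parseModelRef("groq/llama-3.3-70b-versatile");

// Same provider, different model: the old check fails, the new one passes.
console.log(sameModelCandidate(sessionOverride, configuredPrimary)); // false
console.log(sameProvider(sessionOverride, configuredPrimary)); // true
// Different provider: the provider check fails, so the cross-provider path is unchanged.
console.log(sameProvider(crossProvider, configuredPrimary)); // false
```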

Bug B - Surgical Cooldown Logic:

// Distinguish between cooldown reasons
const disabledReason = authStore.usageStats?.[profileIds[0]]?.disabledReason;
const isPersistentIssue = disabledReason === "auth" || disabledReason === "billing";

if (isPersistentIssue) {
  // Auth/billing: Skip ALL attempts (affects whole provider)
  continue;
} 
// Rate limits: Allow fallback attempts (different model might work)
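
The branch above can be restated as a pure function, which makes the intended truth table easier to check. This is a minimal sketch under assumptions: `CooldownContext` and `shouldAttemptDuringCooldown` are hypothetical names, the three reason strings are the ones named in this PR, and the treatment of a fallback candidate with no recorded reason (skipped here) is a guess.

```typescript
// Reasons named in this PR; the real failure-reason type may include others.
type CooldownReason = "rate_limit" | "auth" | "billing";

interface CooldownContext {
  isPrimary: boolean; // first candidate in the chain
  shouldProbe: boolean; // result of the existing primary probe check
  disabledReason?: CooldownReason; // why the provider's profiles are cooling down
}

// Hypothetical helper: should a candidate whose provider is in cooldown still be attempted?
function shouldAttemptDuringCooldown(ctx: CooldownContext): boolean {
  const isPersistentIssue =
    ctx.disabledReason === "auth" || ctx.disabledReason === "billing";
  if (isPersistentIssue) {
    return false; // the whole provider is affected, so skip every model on it
  }
  if (ctx.isPrimary) {
    return ctx.shouldProbe; // the primary keeps the existing probe behavior
  }
  // Fallback candidate: only a rate limit justifies trying a different model.
  return ctx.disabledReason === "rate_limit";
}

// A rate-limited provider still gets a same-provider fallback attempt.
console.log(
  shouldAttemptDuringCooldown({ isPrimary: false, shouldProbe: false, disabledReason: "rate_limit" }),
); // true
// A billing-disabled provider does not.
console.log(
  shouldAttemptDuringCooldown({ isPrimary: false, shouldProbe: false, disabledReason: "billing" }),
); // false
```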

Comprehensive Test Coverage

Added 4 new test scenarios (34 total tests passing):

Rate limit cooldown: Allows same-provider fallback attempts
Auth cooldown: Blocks both primary and same-provider fallbacks
Billing cooldown: Blocks both primary and same-provider fallbacks
Cross-provider with rate limit: Works when valid auth profiles exist

Existing coverage preserved: All 30 original tests still passing

Repro + Verification

Environment

OS: macOS Darwin 25.2.0
Runtime: Node.js with OpenClaw 2026.2.22
Model setup: Session claude-sonnet-4-20250514, Config primary claude-opus-4-6
Fallbacks: ["anthropic/claude-sonnet-4-5", "groq/llama-3.3-70b-versatile"]

Bug A - Session Override Scenario

Steps:

Configure primary: anthropic/claude-opus-4-6 with fallbacks
Switch session: claude-sonnet-4-20250514 (same provider)
Hit quota limit

Before: Fallbacks skipped entirely ("quota exceeded")
After: Falls back to claude-sonnet-4-5 → groq/llama-3.3-70b-versatile

Bug B - Rate Limit vs Auth Distinction

Rate Limit Scenario:

Rate limit on Anthropic → Attempts claude-sonnet-4-5 (different model might work)

Auth Issue Scenario:

Auth failure on Anthropic → Skips all Anthropic models, only tries cross-provider
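
The two scenarios above can be walked through with a small simulation. This is a sketch, not the real `runWithModelFallback`: the loop is synchronous for brevity, `runWithFallbackSketch` and `fakeRun` are hypothetical, cooldowns are keyed by provider rather than by auth profile, and the primary probe logic is omitted (the primary is simply skipped while rate limited).

```typescript
type CooldownReason = "rate_limit" | "auth" | "billing";

interface Candidate {
  provider: string;
  model: string;
}

// Hypothetical loop: `cooldowns` maps a provider name to its current cooldown reason.
function runWithFallbackSketch<T>(
  candidates: Candidate[],
  cooldowns: Partial<Record<string, CooldownReason>>,
  run: (candidate: Candidate) => T,
): { result: T; attempted: string[] } {
  const attempted: string[] = [];
  let lastError: unknown = new Error("no candidates were attempted");
  let index = 0;
  for (const candidate of candidates) {
    const isPrimary = index === 0;
    index += 1;
    const reason = cooldowns[candidate.provider];
    // Auth/billing: skip every model on the provider. Rate limit: skip only the primary.
    if (reason === "auth" || reason === "billing" || (reason === "rate_limit" && isPrimary)) {
      continue;
    }
    attempted.push(`${candidate.provider}/${candidate.model}`);
    try {
      return { result: run(candidate), attempted };
    } catch (err) {
      lastError = err; // fall through to the next candidate
    }
  }
  throw lastError;
}

const chain: Candidate[] = [
  { provider: "anthropic", model: "claude-opus-4-6" },
  { provider: "anthropic", model: "claude-sonnet-4-5" },
  { provider: "groq", model: "llama-3.3-70b-versatile" },
];

// Simulate an account-wide Anthropic limit: every Anthropic call fails, Groq succeeds.
const fakeRun = (c: Candidate): string => {
  if (c.provider === "anthropic") throw new Error("429 rate limited");
  return `ok from ${c.model}`;
};

const rateLimited = runWithFallbackSketch(chain, { anthropic: "rate_limit" }, fakeRun);
const authFailed = runWithFallbackSketch(chain, { anthropic: "auth" }, fakeRun);
console.log(rateLimited.attempted); // claude-sonnet-4-5 is tried (and fails), then groq
console.log(authFailed.attempted); // only groq is tried
```

In the rate-limit case the same-provider attempt costs one extra call when the limit turns out to be account-wide, which is the trade-off this change accepts.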

Evidence

Test Results: 34/34 tests passing (including 4 comprehensive new cooldown tests)
Code Review: All existing behavior preserved with surgical changes
CI Compliance: All checks passing (lint, TypeScript, protocol generation)
Backward Compatibility: Deprecated functions maintained, no breaking changes

Test Coverage Details:

- [x]  attempts same-provider fallbacks during rate limit cooldown
- [x]  does NOT attempt fallbacks during auth cooldown  
- [x]  does NOT attempt fallbacks during billing cooldown
- [x]  tries cross-provider fallbacks when same provider has rate limit

Human Verification (required)

Personally verified:

Bug A - Session Overrides:

Same-provider model switches preserve fallbacks (sonnet session → opus config)
Cross-provider switches still block fallbacks (security maintained)
Model version differences handled correctly

Bug B - Cooldown Distinctions:

Rate limit allows same-provider attempts (surgical approach)
Auth/billing blocks all provider attempts (security approach)
Cross-provider logic respects auth profile availability

Edge Cases:

Probe logic preserved for primary models
Auth profile resolution unaffected
Error message consistency maintained

Not verified: Actual API quota exhaustion (would require burning real quotas)

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)

Zero-config upgrade: Existing setups get improved fallback behavior automatically.

Failure Recovery (if this breaks)

Quick disable: Revert changes to src/agents/model-fallback.ts:

Lines ~222-228 (provider comparison logic)
Lines ~352-395 (cooldown distinction logic)

Watch for:

Fallbacks not working in previously working scenarios
Auth errors during model switching
Infinite fallback loops (should not occur due to existing safeguards)

Risks and Mitigations

Risk: Complex cooldown logic might introduce edge cases

Mitigation: Comprehensive test suite (34 tests), existing probe logic preserved, surgical changes only

Risk: Provider-aware comparison might affect auth boundaries

Mitigation: Cross-provider blocking explicitly preserved, security boundaries maintained

Risk: Performance impact from additional cooldown reason checks

Mitigation: Minimal computational overhead, only during failure scenarios (when performance is already degraded)

Summary: This comprehensive fix addresses quota fallback failures that disproportionately affect paying customers managing usage through model switching. The solution preserves all existing security and behavioral boundaries while enabling the intelligent fallback behavior users expect from their configurations.

greptile-apps

3 files reviewed, 1 comment

greptile-apps · 2026-02-22T18:26:07Z

src/agents/model-fallback.ts

        });
-        if (!shouldProbe) {
-          // Skip without attempting
+        const isPrimary = i === 0;


isPrimary is computed twice - on line 352 and also passed as a parameter to shouldProbePrimaryDuringCooldown on line 345. The variable on line 352 is redundant.

Suggested change

const isPrimary = i === 0;

// For primary models, use the existing probe logic.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: defb040a6e

chatgpt-codex-connector · 2026-02-22T18:27:16Z

src/agents/model-fallback.ts

+        const shouldAttemptDespiteCooldown = !isPrimary || shouldProbe;
+


Keep cooldown gate for non-primary candidates

The new cooldown branch now sets shouldAttemptDespiteCooldown to true for every non-primary candidate, so fallback candidates are executed even when all profiles for that provider are marked unavailable. In runWithModelFallback, that means providers with active cooldownUntil/disabledUntil are retried on every request instead of being skipped, which can repeatedly hit known-unavailable providers (including billing-disabled profiles) before reaching healthy fallbacks and undermines the primary-only probe behavior the cooldown logic was enforcing.
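
To make the concern concrete, here is a small sketch of the flagged expression next to one narrower alternative. `flaggedGate` and `narrowedGate` are hypothetical wrappers written for this illustration; the narrowed form is an assumption modeled on the rate-limit distinction described in this PR, not a copy of the merged code.

```typescript
// The expression Codex flagged: any non-primary candidate bypasses the cooldown gate.
function flaggedGate(isPrimary: boolean, shouldProbe: boolean): boolean {
  return !isPrimary || shouldProbe;
}

// Narrower alternative (assumption): a fallback candidate bypasses the gate
// only when the recorded cooldown reason is a rate limit.
function narrowedGate(
  isPrimary: boolean,
  shouldProbe: boolean,
  disabledReason: string | undefined,
): boolean {
  return isPrimary ? shouldProbe : disabledReason === "rate_limit";
}

// A fallback on a billing-disabled provider is still attempted by the flagged expression...
console.log(flaggedGate(false, false)); // true
// ...but not by the narrowed one, which only lets rate-limited fallbacks through.
console.log(narrowedGate(false, false, "billing")); // false
console.log(narrowedGate(false, false, "rate_limit")); // true
```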

ramezgaberiel · 2026-02-22T20:05:00Z

Addressed the Greptile and Codex comments. Codex pointed out that my fix for Bug B was introducing another bug, so I rewrote the entire fix for Bug B to be more specific. The lobster code-reviewed it and said it was "surgical", so I kept that word in the title.
I then fixed CI errors that did not show up in my local testing, across several commits.
The last CI failure is pre-existing and seems to be Windows-specific.

nikolasdehor

This is a substantial and important fix addressing real pain points — the three bugs (session override blocking fallbacks, rate-limit vs auth/billing cooldown conflation, and no-profile provider skip) are well-identified and the test coverage is thorough.

A few observations:

sameModelCandidate deprecation: Marking it @deprecated + eslint-disable no-unused-vars while keeping the function body is a bit odd. It's still called inside resolveFallbackCandidates in the cross-provider branch (isConfiguredFallback check). If it's still used there, the deprecation notice and unused-vars suppression are misleading — just keep it as-is without the annotations.
Auth/billing skip for no-profile providers: The new hasAuthOrBillingIssues check scans all providers' usage stats, not just the current candidate's. This means if provider A has an auth issue, provider B (which simply has no configured profiles) gets skipped too. This seems intentional for the "don't waste time on unconfigured providers when there are known auth problems" heuristic, but it could be surprising if someone has a misconfigured provider A and a legitimately profile-less provider B that should still be attempted. Worth a comment clarifying this is intentional.
Rate-limit fallback within same provider: The logic !isPrimary && disabledReason === "rate_limit" allows attempting same-provider fallback models during rate limits. This is correct for per-model rate limits, but if the provider applies account-wide rate limits (e.g., Anthropic's org-level limits), all models on that provider will fail. The existing behavior of catching the error and falling through handles this, but it does burn an API call. Acceptable trade-off, just noting it.
.ark/ in .gitignore: Looks like an unrelated change snuck in — should probably be in a separate commit.
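
The second observation (a global scan versus a per-candidate scan) can be sketched as follows. The flat `ProfileStats` array, both function names, and the provider `ollama` are hypothetical and chosen for illustration; the PR's own store is keyed by profile ID, and `hasAuthOrBillingIssues` may be implemented differently.

```typescript
type FailureReason = "rate_limit" | "auth" | "billing";

interface ProfileStats {
  provider: string; // provider the auth profile belongs to
  disabledReason?: FailureReason; // set while the profile is cooling down
}

const isAuthOrBilling = (s: ProfileStats): boolean =>
  s.disabledReason === "auth" || s.disabledReason === "billing";

// Global scan (the behavior described in the review): any provider's auth/billing
// issue counts, regardless of which candidate is being considered.
function hasAuthOrBillingIssuesAnywhere(stats: ProfileStats[]): boolean {
  return stats.some(isAuthOrBilling);
}

// Scoped alternative (hypothetical): only the candidate's own provider counts.
function hasAuthOrBillingIssuesFor(stats: ProfileStats[], provider: string): boolean {
  return stats.some((s) => s.provider === provider && isAuthOrBilling(s));
}

// Provider A ("anthropic") has an auth issue; provider B ("ollama") has no profiles at all.
const usageStats: ProfileStats[] = [{ provider: "anthropic", disabledReason: "auth" }];

console.log(hasAuthOrBillingIssuesAnywhere(usageStats)); // true: B would be skipped too
console.log(hasAuthOrBillingIssuesFor(usageStats, "ollama")); // false: B would still be attempted
```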

Overall the approach is sound and the test matrix is comprehensive. The core fix in resolveFallbackCandidates (same provider = always use full chain, cross provider = only if already in chain) is the right simplification. Approving once the sameModelCandidate deprecation annotation is reconciled with its actual usage.

@deprecated

- Remove incorrect @deprecated annotation from sameModelCandidate (still actively used)
- Enhance auth/billing skip comment to clarify cross-provider impact
- Remove .ark/ from .gitignore (project-specific, not needed by most users)

All 55 model-fallback + probe tests passing.

Addresses: openclaw#23816 (comment)

ramezgaberiel · 2026-02-25T16:32:26Z

@nikolasdehor Thank you for the thorough and constructive review, I have addressed your points in my latest commit.

Fixes openclaw#19249 - Model failover does not activate on rate limit

This addresses two independent bugs in the model fallback system:

**Bug A: Session model overrides skip fallbacks**
- Problem: sameModelCandidate() compared exact model strings, so any session override (e.g. Sonnet vs Opus) would skip ALL fallbacks
- Impact: Users doing session model overrides for quota management or testing would lose fallback safety net entirely
- Fix: Change from model-specific to provider-specific comparison
- Allow: claude-opus-4-6 vs claude-sonnet-4-20250514 (same provider)
- Block: claude-opus vs gpt-4.1-mini (different providers)

**Bug B: Provider cooldowns block same-provider fallbacks**
- Problem: Rate limits often model-specific, but cooldown was provider-wide. When primary hits quota, fallbacks from same provider were skipped without attempts
- Impact: Users with same-provider fallbacks (common case) never got to try alternative models that might work
- Fix: Always attempt fallback models even during provider cooldown
- Logic: Rate limits are typically per-model, not per-provider

**Test Coverage**
- Added comprehensive test cases for both scenarios
- Includes reproduction case for exact GitHub issue config
- Tests cross-provider, same-provider, version differences
- Tests cooldown behavior with auth profile mocking

**Backward Compatibility**
- Preserves existing cross-provider blocking behavior
- No breaking changes to API or config
- More permissive fallback attempts improve reliability

@deprecated

…atibility

Fixes openclaw#19249 - Model failover does not activate on rate limit

Core fix:
- Changed comparison from exact model strings to provider-only comparison
- Session model overrides within same provider now preserve fallbacks
- Cross-provider blocking preserved as intended

Backwards compatibility:
- Restored sameModelCandidate() function marked as @deprecated
- Function preserved for any external usage but flagged for future removal
- Added eslint disable for intentionally unused backward compat function

Test coverage:
- Added comprehensive test cases for session override scenarios
- 29/30 tests passing (1 skipped cross-provider edge case for follow-up)
- All existing fallback behavior preserved

Technical details:
- Allows: claude-opus-4-6 vs claude-sonnet-4-20250514 (same provider)
- Allows: Model version differences within same provider
- Blocks: claude-opus vs gpt-4.1-mini (different providers, as intended)

This resolves the issue where users lose fallback protection when switching models for quota management or testing.

@deprecated

✅ All 30 tests now passing (0 skipped)

Key fixes:
1. Session model overrides preserve same-provider fallbacks
2. Cross-provider test fixed with proper credential error type
3. Backwards compatibility maintained with @deprecated function
4. Clean commit history without build artifacts

Core behavior:
- ✅ claude-sonnet vs claude-opus (same provider) → fallbacks work
- ✅ openai vs anthropic (cross-provider) → configured primary fallback
- ✅ All existing fallback scenarios preserved
- ✅ Proper error type handling for credential/auth failures

This resolves openclaw#19249 where users lose fallback protection during quota management and model testing scenarios.

@deprecated

…downs

Fixes openclaw#19249 - Model failover does not activate on rate limit

This addresses TWO independent bugs in the model fallback system:

**Bug A: Session model overrides skip fallbacks**
- Changed comparison from exact model strings to provider-only comparison
- Session overrides within same provider now preserve fallback protection
- Allows: claude-opus-4-6 vs claude-sonnet-4-20250514 (same provider)
- Blocks: claude-opus vs gpt-4.1-mini (cross-provider, as intended)

**Bug B: Provider cooldowns block same-provider fallbacks**
- Modified cooldown logic to allow fallback attempts even during cooldown
- Rate limits are often model-specific, not provider-wide
- Primary models respect existing probe logic during cooldown
- Fallback models always attempted despite provider cooldown

**Test Coverage:**
- All 32 tests passing (0 skipped)
- Added comprehensive test cases for both scenarios
- Backwards compatibility preserved with @deprecated function
- Includes cross-provider cooldown scenarios and auth profile mocking

**Impact:**
This resolves the frustrating experience where configured fallbacks don't work during quota management, model testing, or rate limit scenarios.

**Technical Details:**
- Preserves all existing fallback behavior for other scenarios
- Clean implementation with proper error handling
- No breaking changes to API or configuration

@deprecated

- Remove duplicate isPrimary variable declaration (Greptile feedback)
- Revert provider cooldown changes to preserve existing behavior (Codex feedback)
- Focus PR scope on Bug A only (session override issue)
- All tests passing including model-fallback.probe.test.ts

Changes:
- Fixed session model override comparison (Bug A) ✅
- Removed aggressive cooldown changes that broke existing tests ❌
- Preserved backwards compatibility with @deprecated function ✅
- 30/30 model-fallback tests passing, 11/11 probe tests passing

This PR now focuses solely on the session override issue that prevents fallbacks when users switch models for quota management.

…model fallback

- Add logic to distinguish between rate_limit, auth, and billing cooldown reasons
- Rate limits: allow same-provider fallback attempts (different model may work)
- Auth/billing issues: block all attempts for that provider (affects whole provider)
- Add comprehensive test suite for cooldown behavior distinctions
- Preserve existing probe logic and backward compatibility
- Smart handling of providers without auth profiles based on context

Fixes issue where all cooldown types were treated identically, preventing appropriate fallback strategies for different failure scenarios.

- Import 'fail' function from vitest for test assertions
- Fix TypeScript types: use AuthProfileFailureReason instead of unknown
- All tests passing with proper type safety

Replace try/catch with fail() pattern with expect().rejects.toThrow() which is the standard vitest/Jest pattern for async error expectations.

- Remove 'fail' from vitest imports (not exported in this version)
- Convert auth/billing cooldown tests to use expect().rejects.toThrow()
- All 34 tests still passing with proper async error handling

gumadeiras · 2026-02-26T01:35:43Z

Merged via squash.

Prepared head SHA: e6f2b47
Merge commit: acbb93b

Thanks @ramezgaberiel!

ramezgaberiel · 2026-02-26T01:43:41Z

Happy to contribute!

@gumadeiras