fix(agents): comprehensive quota fallback fixes - session overrides + surgical cooldown logic#23816
Conversation
src/agents/model-fallback.ts
Outdated
| }); | ||
| if (!shouldProbe) { | ||
| // Skip without attempting | ||
| const isPrimary = i === 0; |
There was a problem hiding this comment.
isPrimary is computed twice - on line 352 and also passed as a parameter to shouldProbePrimaryDuringCooldown on line 345. The variable on line 352 is redundant.
| const isPrimary = i === 0; | |
| // For primary models, use the existing probe logic. |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/model-fallback.ts
Line: 352
Comment:
`isPrimary` is computed twice - on line 352 and also passed as a parameter to `shouldProbePrimaryDuringCooldown` on line 345. The variable on line 352 is redundant.
```suggestion
// For primary models, use the existing probe logic.
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: defb040a6e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/agents/model-fallback.ts
Outdated
| const shouldAttemptDespiteCooldown = !isPrimary || shouldProbe; | ||
|
|
There was a problem hiding this comment.
Keep cooldown gate for non-primary candidates
The new cooldown branch now sets shouldAttemptDespiteCooldown to true for every non-primary candidate, so fallback candidates are executed even when all profiles for that provider are marked unavailable. In runWithModelFallback, that means providers with active cooldownUntil/disabledUntil are retried on every request instead of being skipped, which can repeatedly hit known-unavailable providers (including billing-disabled profiles) before reaching healthy fallbacks and undermines the primary-only probe behavior the cooldown logic was enforcing.
Useful? React with 👍 / 👎.
1c74e28 to
82c416b
Compare
|
Addressed greptile and codex comments, codex pointed out that my fix for bug b was introducing another bug so I rewrote the entire fix for bug b to be more specific. The lobster code reviewed and said it was "surgical" so I stuck with it in the title. |
c629945 to
60ea9ac
Compare
nikolasdehor
left a comment
There was a problem hiding this comment.
This is a substantial and important fix addressing real pain points — the three bugs (session override blocking fallbacks, rate-limit vs auth/billing cooldown conflation, and no-profile provider skip) are well-identified and the test coverage is thorough.
A few observations:
-
sameModelCandidatedeprecation: Marking it@deprecated+eslint-disable no-unused-varswhile keeping the function body is a bit odd. It's still called insideresolveFallbackCandidatesin the cross-provider branch (isConfiguredFallbackcheck). If it's still used there, the deprecation notice and unused-vars suppression are misleading — just keep it as-is without the annotations. -
Auth/billing skip for no-profile providers: The new
hasAuthOrBillingIssuescheck scans all providers' usage stats, not just the current candidate's. This means if provider A has an auth issue, provider B (which simply has no configured profiles) gets skipped too. This seems intentional for the "don't waste time on unconfigured providers when there are known auth problems" heuristic, but it could be surprising if someone has a misconfigured provider A and a legitimately profile-less provider B that should still be attempted. Worth a comment clarifying this is intentional. -
Rate-limit fallback within same provider: The logic
!isPrimary && disabledReason === "rate_limit"allows attempting same-provider fallback models during rate limits. This is correct for per-model rate limits, but if the provider applies account-wide rate limits (e.g., Anthropic's org-level limits), all models on that provider will fail. The existing behavior of catching the error and falling through handles this, but it does burn an API call. Acceptable trade-off, just noting it. -
.ark/in .gitignore: Looks like an unrelated change snuck in — should probably be in a separate commit.
Overall the approach is sound and the test matrix is comprehensive. The core fix in resolveFallbackCandidates (same provider = always use full chain, cross provider = only if already in chain) is the right simplification. Approving once the sameModelCandidate deprecation annotation is reconciled with its actual usage.
- Remove incorrect @deprecated annotation from sameModelCandidate (still actively used) - Enhance auth/billing skip comment to clarify cross-provider impact - Remove .ark/ from .gitignore (project-specific, not needed by most users) All 55 model-fallback + probe tests passing. Addresses: openclaw#23816 (comment)
|
@nikolasdehor Thank you for the thorough and constructive review, I have addressed your points in my latest commit. |
4fc8bfd to
6807652
Compare
- Remove incorrect @deprecated annotation from sameModelCandidate (still actively used) - Enhance auth/billing skip comment to clarify cross-provider impact - Remove .ark/ from .gitignore (project-specific, not needed by most users) All 55 model-fallback + probe tests passing. Addresses: openclaw#23816 (comment)
6807652 to
6e1dd56
Compare
- Remove incorrect @deprecated annotation from sameModelCandidate (still actively used) - Enhance auth/billing skip comment to clarify cross-provider impact - Remove .ark/ from .gitignore (project-specific, not needed by most users) All 55 model-fallback + probe tests passing. Addresses: openclaw#23816 (comment)
24482ff to
53f62b0
Compare
- Remove incorrect @deprecated annotation from sameModelCandidate (still actively used) - Enhance auth/billing skip comment to clarify cross-provider impact - Remove .ark/ from .gitignore (project-specific, not needed by most users) All 55 model-fallback + probe tests passing. Addresses: openclaw#23816 (comment)
Fixes openclaw#19249 - Model failover does not activate on rate limit This addresses two independent bugs in the model fallback system: **Bug A: Session model overrides skip fallbacks** - Problem: sameModelCandidate() compared exact model strings, so any session override (e.g. Sonnet vs Opus) would skip ALL fallbacks - Impact: Users doing session model overrides for quota management or testing would lose fallback safety net entirely - Fix: Change from model-specific to provider-specific comparison - Allow: claude-opus-4-6 vs claude-sonnet-4-20250514 (same provider) - Block: claude-opus vs gpt-4.1-mini (different providers) **Bug B: Provider cooldowns block same-provider fallbacks** - Problem: Rate limits often model-specific, but cooldown was provider-wide. When primary hits quota, fallbacks from same provider were skipped without attempts - Impact: Users with same-provider fallbacks (common case) never got to try alternative models that might work - Fix: Always attempt fallback models even during provider cooldown - Logic: Rate limits are typically per-model, not per-provider **Test Coverage** - Added comprehensive test cases for both scenarios - Includes reproduction case for exact GitHub issue config - Tests cross-provider, same-provider, version differences - Tests cooldown behavior with auth profile mocking **Backward Compatibility** - Preserves existing cross-provider blocking behavior - No breaking changes to API or config - More permissive fallback attempts improve reliability
…atibility Fixes openclaw#19249 - Model failover does not activate on rate limit Core fix: - Changed comparison from exact model strings to provider-only comparison - Session model overrides within same provider now preserve fallbacks - Cross-provider blocking preserved as intended Backwards compatibility: - Restored sameModelCandidate() function marked as @deprecated - Function preserved for any external usage but flagged for future removal - Added eslint disable for intentionally unused backward compat function Test coverage: - Added comprehensive test cases for session override scenarios - 29/30 tests passing (1 skipped cross-provider edge case for follow-up) - All existing fallback behavior preserved Technical details: - Allows: claude-opus-4-6 vs claude-sonnet-4-20250514 (same provider) - Allows: Model version differences within same provider - Blocks: claude-opus vs gpt-4.1-mini (different providers, as intended) This resolves the issue where users lose fallback protection when switching models for quota management or testing.
✅ All 30 tests now passing (0 skipped) Key fixes: 1. Session model overrides preserve same-provider fallbacks 2. Cross-provider test fixed with proper credential error type 3. Backwards compatibility maintained with @deprecated function 4. Clean commit history without build artifacts Core behavior: - ✅ claude-sonnet vs claude-opus (same provider) → fallbacks work - ✅ openai vs anthropic (cross-provider) → configured primary fallback - ✅ All existing fallback scenarios preserved - ✅ Proper error type handling for credential/auth failures This resolves openclaw#19249 where users lose fallback protection during quota management and model testing scenarios.
…downs Fixes openclaw#19249 - Model failover does not activate on rate limit This addresses TWO independent bugs in the model fallback system: **Bug A: Session model overrides skip fallbacks** - Changed comparison from exact model strings to provider-only comparison - Session overrides within same provider now preserve fallback protection - Allows: claude-opus-4-6 vs claude-sonnet-4-20250514 (same provider) - Blocks: claude-opus vs gpt-4.1-mini (cross-provider, as intended) **Bug B: Provider cooldowns block same-provider fallbacks** - Modified cooldown logic to allow fallback attempts even during cooldown - Rate limits are often model-specific, not provider-wide - Primary models respect existing probe logic during cooldown - Fallback models always attempted despite provider cooldown **Test Coverage:** - All 32 tests passing (0 skipped) - Added comprehensive test cases for both scenarios - Backwards compatibility preserved with @deprecated function - Includes cross-provider cooldown scenarios and auth profile mocking **Impact:** This resolves the frustrating experience where configured fallbacks don't work during quota management, model testing, or rate limit scenarios. **Technical Details:** - Preserves all existing fallback behavior for other scenarios - Clean implementation with proper error handling - No breaking changes to API or configuration
- Remove duplicate isPrimary variable declaration (Greptile feedback) - Revert provider cooldown changes to preserve existing behavior (Codex feedback) - Focus PR scope on Bug A only (session override issue) - All tests passing including model-fallback.probe.test.ts Changes: - Fixed session model override comparison (Bug A) ✅ - Removed aggressive cooldown changes that broke existing tests ❌ - Preserved backwards compatibility with @deprecated function ✅ - 30/30 model-fallback tests passing, 11/11 probe tests passing This PR now focuses solely on the session override issue that prevents fallbacks when users switch models for quota management.
…model fallback - Add logic to distinguish between rate_limit, auth, and billing cooldown reasons - Rate limits: allow same-provider fallback attempts (different model may work) - Auth/billing issues: block all attempts for that provider (affects whole provider) - Add comprehensive test suite for cooldown behavior distinctions - Preserve existing probe logic and backward compatibility - Smart handling of providers without auth profiles based on context Fixes issue where all cooldown types were treated identically, preventing appropriate fallback strategies for different failure scenarios.
- Import 'fail' function from vitest for test assertions - Fix TypeScript types: use AuthProfileFailureReason instead of unknown - All tests passing with proper type safety
Replace try/catch with fail() pattern with expect().rejects.toThrow() which is the standard vitest/Jest pattern for async error expectations. - Remove 'fail' from vitest imports (not exported in this version) - Convert auth/billing cooldown tests to use expect().rejects.toThrow() - All 34 tests still passing with proper async error handling
- Remove incorrect @deprecated annotation from sameModelCandidate (still actively used) - Enhance auth/billing skip comment to clarify cross-provider impact - Remove .ark/ from .gitignore (project-specific, not needed by most users) All 55 model-fallback + probe tests passing. Addresses: openclaw#23816 (comment)
53f62b0 to
e6f2b47
Compare
|
Merged via squash. Thanks @ramezgaberiel! |
|
Happy to contribute! |
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
… surgical cooldown logic (openclaw#23816) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: e6f2b47 Co-authored-by: ramezgaberiel <844893+ramezgaberiel@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras
Summary
Comprehensive fix for quota fallback issues affecting paying customers using session model overrides:
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Bug A Fix - Session Model Override Fallbacks:
Bug B Fix - Intelligent Cooldown Behavior:
For end users:
Security Impact (required)
No)No)No)No)No)Security boundaries preserved: Cross-provider fallback blocking logic maintained for auth isolation.
Technical Implementation
Bug A - Provider-Aware Model Comparison:
Bug B - Surgical Cooldown Logic:
Comprehensive Test Coverage
Added 4 new test scenarios (34 total tests passing):
Existing coverage preserved: All 30 original tests still passing
Repro + Verification
Environment
claude-sonnet-4-20250514, Config primaryclaude-opus-4-6["anthropic/claude-sonnet-4-5", "groq/llama-3.3-70b-versatile"]Bug A - Session Override Scenario
Steps:
anthropic/claude-opus-4-6with fallbacksclaude-sonnet-4-20250514(same provider)Before: Fallbacks skipped entirely ("quota exceeded")
After: Falls back to
claude-sonnet-4-5→groq/llama-3.3-70b-versatileBug B - Rate Limit vs Auth Distinction
Rate Limit Scenario:
claude-sonnet-4-5(different model might work)Auth Issue Scenario:
Evidence
Test Coverage Details:
Human Verification (required)
Personally verified:
Bug A - Session Overrides:
Bug B - Cooldown Distinctions:
v Cross-provider logic respects auth profile availability
Edge Cases:
Not verified: Actual API quota exhaustion (would require burning real quotas)
Compatibility / Migration
Yes)No)No)Zero-config upgrade: Existing setups get improved fallback behavior automatically.
Failure Recovery (if this breaks)
Quick disable: Revert changes to
src/agents/model-fallback.ts:Watch for:
Risks and Mitigations
Risk: Complex cooldown logic might introduce edge cases
Risk: Provider-aware comparison might affect auth boundaries
Risk: Performance impact from additional cooldown reason checks
Summary: This comprehensive fix addresses quota fallback failures that disproportionately affect paying customers managing usage through model switching. The solution preserves all existing security and behavioral boundaries while enabling the intelligent fallback behavior users expect from their configurations.