fix(agents): continue fallback loop for unrecognized provider errors by Sid-Qin · Pull Request #26106 · openclaw/openclaw

Sid-Qin · 2026-02-25T04:28:43Z

Summary

Problem: Model fallback stops after 2 models when a provider returns an error that coerceToFailoverError cannot classify, even though 17 fallback models are configured
Why it matters: Users experience 30–60 min downtime waiting for cooldown to expire, even when other providers are healthy
What changed: In runWithModelFallback (src/agents/model-fallback.ts), unrecognized errors now continue the fallback loop instead of immediately rethrowing; rethrow only occurs on the last candidate
What did NOT change: Auth errors, rate-limit cooldown, and context-overflow errors behave exactly as before
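
As a rough illustration of the control flow described in the summary, the sketch below shows a fallback loop that records an unclassified error as an attempt and moves on, rethrowing it only on the last candidate. It is not the code from src/agents/model-fallback.ts: `runWithFallbackSketch`, `Candidate`, `Attempt`, and the `classify` callback (a stand-in for coerceToFailoverError) are hypothetical names. Handling of abort and context-overflow errors is left out, and the summary message only imitates the format quoted in this PR.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
type Candidate = { provider: string; model: string };
type Attempt = { provider: string; model: string; error: string };

// Stand-in classifier: returns a reason for errors it recognizes,
// or undefined for anything it cannot classify.
type Classify = (err: unknown) => "auth" | "rate_limit" | undefined;

async function runWithFallbackSketch<T>(
  candidates: Candidate[],
  run: (candidate: Candidate) => Promise<T>,
  classify: Classify,
): Promise<{ result: T; attempts: Attempt[] }> {
  const attempts: Attempt[] = [];
  for (let i = 0; i < candidates.length; i++) {
    const candidate = candidates[i];
    try {
      return { result: await run(candidate), attempts };
    } catch (err) {
      const isLast = i === candidates.length - 1;
      // Unclassified error on the last candidate: nothing left to try,
      // so surface the original error unchanged.
      if (classify(err) === undefined && isLast) throw err;
      // Otherwise record the failure and continue with the next candidate.
      const message = err instanceof Error ? err.message : String(err);
      attempts.push({ ...candidate, error: message });
    }
  }
  // Reached only when every candidate failed and the last error was classified
  // (or when the candidate list was empty).
  const summary = attempts.map((a) => `${a.provider}/${a.model}: ${a.error}`).join(" | ");
  throw new Error(`All models failed (${attempts.length}): ${summary}`);
}
```

With three candidates where the first two throw an unclassified error, this sketch resolves with the third candidate's result and two recorded attempts; before the change described above, the equivalent loop would have thrown on the first unclassified error.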

Linked Issue/PR

Closes [Bug]: Model fallback stops after 2 models instead of trying all configured fallbacks #25926

User-visible / Behavior Changes

When a provider returns an unrecognized error (not auth, rate-limit, or context-overflow), the system now continues trying the remaining fallback models instead of aborting. Users will see fewer "All models failed (2)" errors when many fallbacks are configured.

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

Model/provider: Multiple providers (e.g., qwen-portal, opencode, nvidia-nim, xfyun)
Relevant config: 17 fallback models across 4 providers

Steps

Configure multiple fallback models across providers
Wait for first two providers to hit rate-limit cooldown
Send a message to trigger model fallback

Expected

System skips cooled-down providers and continues to remaining fallback models
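
The expected behavior can be pictured as a filter over the candidate list. The sketch below is an assumption-laden illustration, not the project's cooldown logic: `partitionByCooldown`, `CooldownCandidate`, and the `cooldownUntil` map (provider name to the epoch-millisecond time at which its cooldown ends) are hypothetical, and the real implementation may track cooldown at a different granularity.

```typescript
// Hypothetical candidate shape for illustration only.
type CooldownCandidate = { provider: string; model: string };

// Splits candidates into those that can be tried now and a list of
// human-readable skip reasons for providers still in cooldown.
function partitionByCooldown(
  candidates: CooldownCandidate[],
  cooldownUntil: Map<string, number>, // provider -> epoch ms when cooldown ends (assumed)
  now: number,
): { available: CooldownCandidate[]; skipped: string[] } {
  const available: CooldownCandidate[] = [];
  const skipped: string[] = [];
  for (const candidate of candidates) {
    const until = cooldownUntil.get(candidate.provider);
    if (until !== undefined && until > now) {
      // Still cooling down: skip it, but keep a note for the final error summary.
      skipped.push(`Provider ${candidate.provider} is in cooldown`);
    } else {
      available.push(candidate);
    }
  }
  return { available, skipped };
}
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the sketch deterministic and easy to test.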

Actual (before fix)

Error after 2 models: All models failed (2): ... Provider X is in cooldown | Provider Y is in cooldown

Evidence

Failing test/log before + passing after
Updated test: unrecognized errors with remaining candidates now trigger fallback to next model
New test: unrecognized error on last candidate is correctly rethrown
All model-fallback tests pass

Human Verification (required)

Verified scenarios: unrecognized error mid-chain continues fallback; unrecognized error on last candidate throws; auth/rate-limit errors behave as before
Edge cases checked: single candidate with fallbacksOverride: [], error on last candidate
What you did not verify: live multi-provider setup with actual rate-limiting

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery (if this breaks)

How to disable/revert this change quickly: Revert commit 265364e
Files/config to restore: src/agents/model-fallback.ts
Known bad symptoms: Non-retryable errors being retried across all candidates instead of failing fast

Risks and Mitigations

Risk: Some errors that were previously non-retryable might now be retried across all candidates, adding latency
- Mitigation: Context-overflow and abort errors are still handled as non-retryable; only truly unclassified errors continue the loop
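
The mitigation amounts to a small decision table. The sketch below is one way to express it, assuming a hypothetical `ErrorKind` classification and `decideAction` helper that do not appear in the PR; the actual code may structure this differently.

```typescript
// Hypothetical error categories for illustration only.
type ErrorKind = "abort" | "context_overflow" | "failover" | "unclassified";
type Action = "rethrow" | "record_and_continue";

function decideAction(kind: ErrorKind, isLastCandidate: boolean): Action {
  // Abort and context-overflow stay non-retryable: fail fast on any candidate.
  if (kind === "abort" || kind === "context_overflow") return "rethrow";
  // Unclassified errors are rethrown only when no candidates remain.
  if (kind === "unclassified" && isLastCandidate) return "rethrow";
  // Everything else is recorded as an attempt; the loop then moves on
  // (or ends with a combined failure if this was the last candidate).
  return "record_and_continue";
}
```

The added latency noted in the risk comes from the `record_and_continue` branch for unclassified errors: each such error now costs one more provider call instead of ending the run.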

When a provider returns an error that coerceToFailoverError cannot classify (e.g., custom error messages without standard HTTP status codes), the fallback loop threw immediately instead of trying the next candidate. This caused fallback to stop after 2 models even when 17 were configured.

Only rethrow unrecognized errors when they occur on the last candidate. For intermediate candidates, record the error as an attempt and continue to the next model.

Closes openclaw#25926

Co-authored-by: Cursor <cursoragent@cursor.com>

steipete · 2026-02-25T04:53:36Z

Landed via temp rebase onto main.

Gate: pnpm test src/agents/model-fallback.test.ts && pnpm check
Land commit: f78bf75
Merge commit: 156f13a

Thanks @Sid-Qin!

@Sid-Qin

…penclaw#26106)

* fix(agents): continue fallback loop for unrecognized provider errors

When a provider returns an error that coerceToFailoverError cannot classify (e.g., custom error messages without standard HTTP status codes), the fallback loop threw immediately instead of trying the next candidate. This caused fallback to stop after 2 models even when 17 were configured.

Only rethrow unrecognized errors when they occur on the last candidate. For intermediate candidates, record the error as an attempt and continue to the next model.

Closes openclaw#25926

Co-authored-by: Cursor <cursoragent@cursor.com>

* test: cover unknown-error fallback telemetry and land openclaw#26106 (thanks @Sid-Qin)

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>

openclaw-barnacle bot added agents Agent runtime and tooling size: XS experienced-contributor labels Feb 25, 2026

SidQin-cyber and others added 2 commits February 25, 2026 04:52

test: cover unknown-error fallback telemetry and land openclaw#26106 (thanks @Sid-Qin)

f78bf75

steipete force-pushed the fix/model-fallback-exhaustion-25926 branch from 265364e to f78bf75 on February 25, 2026 04:53

steipete merged commit 156f13a into openclaw:main Feb 25, 2026
9 checks passed

openclaw-barnacle bot added size: S and removed size: XS labels Feb 25, 2026

github-actions bot mentioned this pull request Feb 25, 2026

📡 Upstream Digest — 2026-02-25 06:54 UTC curtismercier/openclaw-mods#122

Open

arjunaskykok mentioned this pull request Feb 26, 2026

fix/test a2ui bundle preflight #27345

Open

2 tasks
