fix(pipes): run fallback preset when the main model times out or errors (#3914) #3951
Merged
…rs (#3914)
A pipe with main + fallback presets (preset: ["primary", "fallback"]) only advanced to the fallback when the failed preset's circuit breaker tripped. The breaker opens only for a narrow set of text-matched provider errors and never for timeouts or executor crashes, so pick_preset() kept returning the same failing preset, should_retry was false, and the fallback silently never ran. This is exactly what #3914 reports (main model times out / errors / returns no valid response, fallback never kicks in).
Drive in-run fallback by the failed preset's position instead: add PresetFallbackRegistry::pick_preset_with_floor and advance the selection floor past the preset that just failed on ANY failure, bounded by MAX_FALLBACK_DEPTH. Each retry strictly increases the floor so it always terminates. The circuit breaker stays as the cross-run optimization that preemptively skips known-bad presets.
Covered by new unit tests in preset_fallback.rs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
Closes #3914.
A pipe with a main + fallback preset (preset: ["primary", "fallback"]) never ran the fallback when the main model failed. The user reported it for the exact cases the feature exists to cover: the main model "times out, returns an error, hits a rate limit, or fails to provide a valid response", and the pipe just stops instead of falling through.
Root cause
Fallback advancement was gated entirely on the failed preset's circuit breaker tripping:
- record_failure_from_output only trips the breaker when the failure's stderr/stdout text matches a hardcoded set (rate limit, 429, credits, 502/503/529, overloaded, timeout, ...).
So when the main model timed out, crashed, or returned an error whose text did not match those strings, the breaker stayed Closed, pick_preset() re-returned the same main preset, should_retry evaluated to false, and the fallback silently never ran.
Fix
Drive the in-run fallback by the failed preset's position in the list, not by the breaker:
- PresetFallbackRegistry::pick_preset_with_floor(presets, floor) selects from index floor onward.
- The retry path tracks retry_depth and, on any failure, advances the floor past the preset that just ran, bounded by MAX_FALLBACK_DEPTH.
Each retry strictly increases the floor, so it always terminates.
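The termination argument can be illustrated with a minimal sketch of a position-driven retry loop. This is not the code from the PR: `run_with_fallback`, its signature, and the value chosen for `MAX_FALLBACK_DEPTH` are assumptions made for illustration, and circuit breakers are left out entirely so that only the floor advancement is visible.

```rust
/// Hypothetical cap on fallback hops per run; the real value is not given in the PR.
const MAX_FALLBACK_DEPTH: usize = 3;

/// Hypothetical helper: tries presets in list order and moves on to the next
/// one on ANY failure. Returns the preset that succeeded (if any) together
/// with the order in which presets were attempted.
fn run_with_fallback<F>(presets: &[&str], mut execute: F) -> (Option<String>, Vec<String>)
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut attempted = Vec::new();
    let mut floor = 0; // index of the first preset still eligible
    let mut retry_depth = 0; // number of fallback hops taken so far

    while let Some(&preset) = presets.get(floor) {
        attempted.push(preset.to_string());
        if execute(preset).is_ok() {
            return (Some(preset.to_string()), attempted);
        }
        // The failure kind is deliberately ignored: timeouts, crashes and
        // provider errors all advance the floor the same way.
        if retry_depth >= MAX_FALLBACK_DEPTH {
            break;
        }
        floor += 1; // strictly increasing, so the loop must end
        retry_depth += 1;
    }
    (None, attempted)
}
```

Because `floor` only ever grows and the list is finite, the loop ends after at most `presets.len()` attempts, or earlier once the depth cap is reached.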
Testing
New tests cover the regression directly:
- pick_preset_with_floor advances to the next preset with no breaker tripped (the timeout/crash case), skips an open breaker, and clamps a too-large floor.
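The three cases listed above could be exercised against a simplified stand-in like the one below. Only the names `PresetFallbackRegistry` and `pick_preset_with_floor` come from the PR; the struct layout, the `trip` helper, the return type, and the exact clamping behavior (an out-of-range floor falls back to the last preset) are assumptions, since the PR text does not spell them out.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the registry: it only remembers which presets
/// currently have an open circuit breaker.
#[derive(Default)]
struct PresetFallbackRegistry {
    open_breakers: HashSet<String>,
}

impl PresetFallbackRegistry {
    /// Hypothetical helper that marks a preset's breaker as open.
    fn trip(&mut self, preset: &str) {
        self.open_breakers.insert(preset.to_string());
    }

    /// Returns the first preset at or after `floor` whose breaker is not open.
    /// Assumption: a floor past the end of the list is clamped to the last index.
    fn pick_preset_with_floor<'a>(&self, presets: &[&'a str], floor: usize) -> Option<&'a str> {
        let start = floor.min(presets.len().saturating_sub(1));
        presets
            .iter()
            .skip(start)
            .copied()
            .find(|name| !self.open_breakers.contains(*name))
    }
}
```

With presets `["primary", "secondary", "fallback"]` and no breaker open, a floor of 1 selects `secondary`; after `trip("secondary")` the same floor selects `fallback`; and a floor of 99 is clamped and also selects `fallback`.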