fix(pipes): run fallback preset when the main model times out or errors (#3914) #3951

Merged: louis030195 merged 1 commit into main from claude/zen-pasteur-cde8af on Jun 9, 2026

@louis030195


Problem

Closes #3914.

A pipe with a main + fallback preset (preset: ["primary", "fallback"]) never ran the fallback when the main model failed. The user reported it for the exact cases the feature exists to cover: the main model "times out, returns an error, hits a rate limit, or fails to provide a valid response", and the pipe just stops instead of falling through.

Root cause

Fallback advancement was gated entirely on the failed preset's circuit breaker tripping:

  • record_failure_from_output only trips the breaker when the failure's stderr/stdout text matches a hardcoded set (rate limit, 429, credits, 502/503/529, overloaded, timeout, ...).
  • The timeout arm and the executor-crash arm of the runner never call it at all.

So when the main model timed out, crashed, or returned an error whose text did not match those strings, the breaker stayed Closed, pick_preset() re-returned the same main preset, should_retry evaluated to false, and the fallback silently never ran.
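The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the screenpipe code: `TRIP_PATTERNS`, `breaker_after_failure`, and `old_pick` are hypothetical names, and the pattern list is only a subset of the strings the description mentions. It assumes the breaker opens only on a text match and that selection returns the first preset whose breaker is still Closed.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum BreakerState {
    Closed,
    Open,
}

// Hypothetical subset of the hardcoded strings; the real list is longer.
const TRIP_PATTERNS: [&str; 5] = ["rate limit", "429", "503", "overloaded", "timeout"];

// `None` models the timeout and executor-crash arms, which (per the PR
// description) never recorded a failure at all, so the breaker cannot open.
fn breaker_after_failure(output: Option<&str>) -> BreakerState {
    match output {
        Some(text) => {
            let lower = text.to_lowercase();
            if TRIP_PATTERNS.iter().any(|p| lower.contains(p)) {
                BreakerState::Open
            } else {
                BreakerState::Closed
            }
        }
        None => BreakerState::Closed,
    }
}

// Breaker-gated selection: first preset whose breaker is still Closed.
fn old_pick<'a>(presets: &[&'a str], states: &[BreakerState]) -> Option<&'a str> {
    presets
        .iter()
        .zip(states)
        .find(|(_, s)| **s == BreakerState::Closed)
        .map(|(p, _)| *p)
}

fn main() {
    let presets = ["primary", "fallback"];

    // An error whose text matches no pattern leaves the breaker Closed,
    // so the same failing preset is selected again.
    let state = breaker_after_failure(Some("model returned invalid JSON"));
    let picked = old_pick(&presets, &[state, BreakerState::Closed]);
    println!("unmatched error -> {:?}", picked);

    // Only a text-matched error moves selection on to the fallback.
    let state = breaker_after_failure(Some("HTTP 429 rate limit exceeded"));
    let picked = old_pick(&presets, &[state, BreakerState::Closed]);
    println!("matched error -> {:?}", picked);
}
```

In this sketch, a timeout (`None`) and an unmatched error both leave the primary selected, which is the "fallback silently never ran" behavior from the report.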

Fix

Drive the in-run fallback by the failed preset's position in the list, not by the breaker:

  • New PresetFallbackRegistry::pick_preset_with_floor(presets, floor) selects from index floor onward.
  • The runner starts selection at retry_depth and, on any failure, advances the floor past the preset that just ran, bounded by MAX_FALLBACK_DEPTH.
  • The circuit breaker stays exactly as before for its real job: a cross-run optimization that preemptively skips presets known to be down.

Each retry strictly increases the floor, so it always terminates.
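A rough sketch of the position-driven approach follows. Only the names `PresetFallbackRegistry`, `pick_preset_with_floor`, and `MAX_FALLBACK_DEPTH` come from this PR; the struct fields, signatures, the value `3`, the `run_with_fallback` helper, and the choice to clamp a too-large floor to the last index are all assumptions made for illustration.

```rust
use std::collections::HashSet;

// Assumed value; the PR does not state the real constant.
const MAX_FALLBACK_DEPTH: usize = 3;

// Hypothetical shape: the set holds presets whose breaker is currently open.
struct PresetFallbackRegistry {
    open: HashSet<String>,
}

impl PresetFallbackRegistry {
    // Select the first preset at index >= floor whose breaker is not open.
    // Assumption: a too-large floor is clamped to the last index.
    fn pick_preset_with_floor(&self, presets: &[String], floor: usize) -> Option<(usize, String)> {
        if presets.is_empty() {
            return None;
        }
        let start = floor.min(presets.len() - 1);
        presets
            .iter()
            .enumerate()
            .skip(start)
            .find(|(_, p)| !self.open.contains(p.as_str()))
            .map(|(i, p)| (i, p.clone()))
    }
}

// Hypothetical runner loop: on ANY failure, move the floor past the preset
// that just ran. The floor strictly increases, so the loop terminates.
fn run_with_fallback<F>(
    registry: &PresetFallbackRegistry,
    presets: &[String],
    mut run: F,
) -> Option<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut floor = 0; // plays the role of retry_depth
    let mut attempts = 0;
    while attempts <= MAX_FALLBACK_DEPTH && floor < presets.len() {
        let (idx, preset) = registry.pick_preset_with_floor(presets, floor)?;
        match run(&preset) {
            Ok(()) => return Some(preset),
            Err(_) => {
                floor = idx + 1;
                attempts += 1;
            }
        }
    }
    None
}

fn main() {
    let registry = PresetFallbackRegistry { open: HashSet::new() };
    let presets = vec!["primary".to_string(), "fallback".to_string()];

    // The primary times out without tripping any breaker; the fallback runs anyway.
    let winner = run_with_fallback(&registry, &presets, |preset| {
        if preset == "primary" {
            Err("timed out".to_string())
        } else {
            Ok(())
        }
    });
    println!("{:?}", winner);
}
```

The breaker set is consulted only to skip presets already known to be down; whether a retry happens depends solely on the position of the preset that just failed.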

Testing

cargo test -p screenpipe-core preset_fallback
14 passed; 0 failed

New tests cover the regression directly: pick_preset_with_floor advances to the next preset with no breaker tripped (the timeout/crash case), skips an open breaker, and clamps a too-large floor.

Note: 4 unrelated test_should_run_cron_* tests are wall-clock-flaky. They build a cron from Utc::now() and assert grace-window firing, so they pass or fail depending on which second the suite runs in. This happens on main too and is independent of this change.

…rs (#3914)

A pipe with main + fallback presets (preset: ["primary", "fallback"]) only
advanced to the fallback when the failed preset's circuit breaker tripped.
The breaker opens only for a narrow set of text-matched provider errors and
never for timeouts or executor crashes, so pick_preset() kept returning the
same failing preset, should_retry was false, and the fallback silently never
ran. This is exactly what #3914 reports (main model times out / errors /
returns no valid response, fallback never kicks in).

Drive in-run fallback by the failed preset's position instead: add
PresetFallbackRegistry::pick_preset_with_floor and advance the selection
floor past the preset that just failed on ANY failure, bounded by
MAX_FALLBACK_DEPTH. Each retry strictly increases the floor so it always
terminates. The circuit breaker stays as the cross-run optimization that
preemptively skips known-bad presets.

Covered by new unit tests in preset_fallback.rs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@louis030195 louis030195 merged commit 53d1b88 into main Jun 9, 2026
20 of 22 checks passed
