fix(fallback): treat finish_reason:error as candidate_failed (#59524)#59680
Conversation
Greptile SummaryThis PR fixes a fallback-chain regression where a provider returning HTTP 200 with Confidence Score: 5/5
Reviews (1): Last reviewed commit: "fix(fallback): treat finish_reason:error..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31b36d3015
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.meta.stopReason === "error") { | ||
| throw new FailoverError("Provider finish_reason: error", { | ||
| reason: "overloaded", |
There was a problem hiding this comment.
Preserve provider error context when converting stopReason errors
This branch always throws FailoverError("Provider finish_reason: error", { reason: "overloaded" }), which discards the original assistant/provider error details from runEmbeddedPiAgent. In single-candidate runs (or when all candidates fail), downstream handlers only see this synthetic overload error, so billing/rate-limit/role-ordering classification and user-facing recovery messaging regress to a generic failure instead of the real cause. Wrap using the underlying error text/reason (when present) rather than hardcoding overloaded.
Useful? React with 👍 / 👎.
31b36d3 to
57250d6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57250d6124
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| throw new FailoverError("Provider finish_reason: error", { | ||
| reason: "overloaded", | ||
| provider: providerOverride, |
There was a problem hiding this comment.
Classify finish_reason errors before forcing overloaded retries
This branch converts every result.meta.stopReason === "error" into FailoverError(... reason: "overloaded"), which misclassifies non-overload failures as transient retryable overloads. For example, openai-transport-stream.ts maps finish_reason: "content_filter" to stopReason: "error"; with this change, that policy block now gets retried on fallback candidates instead of being surfaced, and downstream logic records the wrong failure reason. Please derive the failover reason from the provider error payload/message (or only map known transient reasons) before triggering fallback.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 329a48c2ba
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.meta.stopReason === "error") { | ||
| throw new FailoverError("Provider finish_reason: error", { | ||
| reason: "overloaded", | ||
| provider: providerOverride, |
There was a problem hiding this comment.
Derive failover reason from stopReason error payload
This branch converts every result.meta.stopReason === "error" into FailoverError(..., { reason: "overloaded" }), but stopReason: "error" is also used for non-transient failures (for example content_filter in src/agents/openai-transport-stream.ts:1304-1306). In cron isolated runs this misclassifies policy/validation errors as transient overloads, which can trigger inappropriate fallback retries and surfaces the wrong failure reason when candidates are exhausted. Please preserve the provider error context (or only map known transient cases) before forcing overloaded.
Useful? React with 👍 / 👎.
329a48c to
fd5f637
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd5f637997
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: baf3848057
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.meta.stopReason === "error") { | ||
| const errorText = | ||
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; | ||
| throw new FailoverError( |
There was a problem hiding this comment.
Skip fallback retry after streaming reply content
This unconditional stopReason === "error" conversion runs after the attempt may already have emitted user-visible output through earlier callbacks in this same run path (onPartialReply/onToolResult). When a provider streams partial text and then terminates with finish_reason:error (for example, a late content filter), this throw triggers runWithModelFallback to retry another candidate, so the user can receive mixed/duplicated replies from multiple attempts and repeated side effects. The retry should be gated on whether the failed attempt already produced outbound content/tool sends.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43b602924c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; |
There was a problem hiding this comment.
Treat unknown finish_reason errors as non-transient failures
This fallback reason default (classifyFailoverReason(errorText) ?? "overloaded") misclassifies unrecognized stopReason === "error" cases as transient overloads. For example, OpenAI finish_reason: "content_filter" is mapped to "Provider finish_reason: content_filter" in src/agents/openai-transport-stream.ts, which does not match current classifiers, so this path retries other candidates and reports an overload instead of surfacing a moderation/policy failure. That changes both retry behavior and user-facing error semantics for non-transient provider stops.
Useful? React with 👍 / 👎.
43b6029 to
e936d60
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e936d60c0f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.meta.stopReason === "error") { | ||
| const errorText = | ||
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; | ||
| throw new FailoverError( |
There was a problem hiding this comment.
Prevent retry after side-effecting stopReason:error attempt
In the cron isolated-agent fallback path, this unconditional stopReason === "error" conversion throws before the caller can evaluate runResult side-effect flags. runEmbeddedPiAgent can complete tool actions (for example message sends or cron mutations) and still return with an error stop reason from the final assistant step; throwing here makes runWithModelFallback run another candidate after those effects already happened, so scheduled runs can emit duplicate outbound messages or duplicate cron changes. Please gate this retry on whether the failed attempt already produced side effects (for example didSendViaMessagingTool / cron-add metadata) before re-entering fallback.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a038fcc037
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.meta.stopReason === "error") { | ||
| const errorText = | ||
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; | ||
| throw new FailoverError( |
There was a problem hiding this comment.
Gate finish_reason retries after side-effecting runs
This unconditional stopReason === "error" conversion retries fallback even when the just-finished attempt may already have executed irreversible actions. runAgentAttempt can return results that include side-effect markers (for example didSendViaMessagingTool / successfulCronAdds from embedded runs), and throwing here re-enters runWithModelFallback, which can send duplicate outbound messages or repeat cron mutations on the next candidate. The retry should be skipped (or handled differently) once the failed attempt has already produced side effects.
Useful? React with 👍 / 👎.
a038fcc to
d84b2e5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe86c6eca7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; |
There was a problem hiding this comment.
Pass provider into failover reason classifier
classifyFailoverReason(errorText) is invoked without the candidate provider, which disables provider-specific classifiers (for example isOpenRouterKeyLimitExceededError in pi-embedded-helpers/errors.ts only matches when provider contains openrouter). In this path, those errors fall through to null and then get forced to "overloaded", so billing/auth-like failures are treated as transient overload retries and logged with the wrong reason. This regresses fallback behavior and cooldown handling for provider-specific error formats.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe86c6eca7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| throw new FailoverError( | ||
| errorText || "Provider finish_reason: error", |
There was a problem hiding this comment.
Avoid turning terminal finish_reason errors into hard failures
Throwing here converts every stopReason === "error" result into an exception, but agentCommandInternal does not translate fallback exhaustion back into a reply payload. In single-candidate configs (or when all candidates fail), runWithModelFallback rethrows this FailoverError, and ingress paths that call agentCommandFromIngress (for example src/gateway/openai-http.ts and src/gateway/openresponses-http.ts) treat it as an internal failure and return generic 500s. Before this change, the same provider-side error surfaced as a structured error payload from the run result.
Useful? React with 👍 / 👎.
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; |
There was a problem hiding this comment.
Pass provider context when classifying failover reason
classifyFailoverReason is called without the provider even though providerOverride is available, so provider-specific generic errors are misclassified (or fall back to "overloaded"). That changes recorded attempt reasons and downstream fallback policy/message behavior (cooldown probing and summary text), especially for provider patterns that only resolve with provider context.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5b4fe25f6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
e5b4fe2 to
866dd09
Compare
|
Rebased on current main (8c11210) to resolve CI failures. Changes after rebase:
Fixes #59524 — |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 866dd09d7d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| result.meta.error?.message ?? | ||
| result.payloads?.find((p) => p.isError)?.text ?? | ||
| ""; | ||
| const derivedReason = classifyFailoverReason(errorText) ?? "overloaded"; |
There was a problem hiding this comment.
Pass provider into finish_reason failover classification
classifyFailoverReason is called with only errorText, so provider-specific classifiers are skipped in the auto-reply fallback path. For errors whose detection depends on provider context (for example OpenRouter-specific key/credit patterns), this falls through to null and then gets coerced to "overloaded", which records the wrong failure reason and can drive incorrect retry/cooldown behavior when a run ends with stopReason === "error".
Useful? React with 👍 / 👎.
rajamukerji
left a comment
There was a problem hiding this comment.
Looks good — bug-fix evidence bar met:
- Symptom evidence: production logs in #59524
- Root cause:
runWithModelFallbackdetects failures only via thrown exceptions;runEmbeddedPiAgentresolves normally onfinish_reason: error, so the chain halts despite the failure signal inresult.meta.stopReason - Fix touches implicated path: three call sites (
src/agents/agent-command.ts,src/auto-reply/reply/agent-runner-execution.ts,src/cron/isolated-agent/run-executor.ts) - Regression tests in
src/agents/model-fallback.test.tscover both the defaultoverloadedpath and arate_limit-classified case
CI all green, rebased onto current main, mirrors the existing LiveSessionModelSwitchError → FailoverError(reason: "overloaded") pattern.
Optional follow-ups (not blocking this PR):
- The
if (result.meta.stopReason === "error") { ... throw new FailoverError(...) }block is duplicated verbatim across the three call sites. A small helper (for examplethrowIfProviderStopReasonError(result, { provider, model })exported fromfailover-error.ts) would keep the three paths in lockstep. classifyFailoverReason(errorText)is called without a provider argument. Persrc/agents/pi-embedded-helpers.isbillingerrormessage.test.ts, provider context matters for some classifications (for example Anthropic-specific billing error text). Falling back tooverloadedis still safe, but passing{ provider }would tighten the classification for provider-specific error surfaces.src/hooks/llm-slug-generator.tsalso callsrunEmbeddedPiAgentdirectly and does not get the same treatment. Slug generation is non-critical, so leaving it is defensible — but worth a follow-up note if you want the pattern applied uniformly.
…w#59524) When a provider returns HTTP 200 with finish_reason:error, runWithModelFallback incorrectly logged candidate_succeeded and halted the fallback chain. The runner resolves normally in this case, so the fallback loop never saw a thrown error. Fix: in all three runWithModelFallback call sites that use runEmbeddedPiAgent, check result.meta.stopReason === "error" after the run resolves and throw a FailoverError(reason: "overloaded") so the fallback loop continues to the next candidate as expected. Affected call sites: - src/agents/agent-command.ts - src/auto-reply/reply/agent-runner-execution.ts - src/cron/isolated-agent/run.ts Regression test added in src/agents/model-fallback.test.ts.
866dd09 to
e19c792
Compare
|
Closing this as implemented after Codex review. Current main has superseding provider-aware failover classification for embedded/runtime assistant errors, and the related hardening shipped in v2026.4.24 via #70743. This PR's direct call-site conversion is no longer the best patch to land. What I checked:
So I’m closing this as already implemented rather than keeping a duplicate issue open. Codex Review notes: reviewed against 7bbd47349ec3; fix evidence: release v2026.4.24. |
Summary
When a provider returns HTTP 200 with
finish_reason: error, the fallback loopincorrectly logs
candidate_succeededand halts the chain. The remainingcandidates are never tried, and the user sees an error instead of a response.
Root Cause
runWithModelFallbackdetects failures exclusively via thrown exceptions.runEmbeddedPiAgentresolves normally onfinish_reason: error(HTTP 200 withan error body), so the fallback loop sees
{ ok: true }and stops. Theresult.meta.stopReason === "error"signal is only visible after the chainhas already terminated.
Trace from the reported production log (
runId 1eeb0fb6):anthropic/claude-opus-4-6→ timeout (408) →candidate_failed✅voidai/claude-opus-4-6→ HTTP 200,finish_reason: error→candidate_succeeded❌Changes
src/agents/agent-command.ts— checkresult.meta.stopReason === "error"afterrunAgentAttemptresolves; throwFailoverError(reason: "overloaded")to re-enter the fallback loopsrc/auto-reply/reply/agent-runner-execution.ts— same check in the embedded Pi run pathsrc/cron/isolated-agent/run.ts— same check in the cron isolated-agent run pathsrc/agents/model-fallback.test.ts— regression test confirming the chain continues past aFailoverErrorthrown forfinish_reason: errorWhy This Is Safe
stopReason === "error", which is an unambiguous provider-side failure signalreason: "overloaded"— retry-eligible, no auth/billing cooldown side-effectsLiveSessionModelSwitchError(wrapped asFailoverError(reason: "overloaded")inmodel-fallback.ts)runEmbeddedPiAgentormodel-fallback.tsinternalsTesting
Regression test in
model-fallback.test.ts:FailoverError("Provider finish_reason: error", { reason: "overloaded" })attempts[0]records the correct provider, model, reason, and error messageFixes #59524
AI-assisted PR — changes reviewed and understood. Lightly tested via unit test.