Model fallback treats finish_reason:error as success, stops fallback chain #59524

@vovapetry

Bug

When a provider returns HTTP 200 but with finish_reason: error in the response body, the model-fallback system marks the attempt as candidate_succeeded and stops the fallback chain. The error is only detected later at agent_end, when it's too late to try the next candidate.

Expected behavior

finish_reason: error should be treated as a failed attempt, triggering the next fallback candidate (same as timeout or HTTP error).

Reproduction

From production logs (2026-04-02 07:30 UTC, runId 1eeb0fb6-6d00-49d4-9fde-563bbb97bcd1):

  1. Attempt 1: anthropic/claude-opus-4-6 → timeout (408) → candidate_failed ✅ correct
  2. Attempt 2: voidai/claude-opus-4-6 → HTTP 200, finish_reason: error → candidate_succeeded ❌ incorrect
  3. Fallback chain stops (2/5 attempts used)
  4. embedded_run_agent_end fires with isError: true, error: Provider finish_reason: error
  5. User sees generic error message instead of a response

Log evidence

// Step 2: fallback considers it success
{"event":"model_fallback_decision", "decision":"candidate_succeeded", "candidateProvider":"voidai", "attempt":2, "total":5}

// Step 4: but agent_end knows it failed  
{"event":"embedded_run_agent_end", "isError":true, "error":"Provider finish_reason: error", "provider":"voidai"}

Suggested fix

In the fallback decision logic, check finish_reason from the provider response. If finish_reason === "error", treat as candidate_failed and continue to the next candidate.
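A minimal sketch of that check, to make the proposal concrete. This is not OpenClaw's actual code: the `AttemptResult` and `Candidate` shapes and the `classifyAttempt` and `runWithFallback` names are hypothetical, and the real fallback logic presumably tracks more state. The only point is that HTTP 200 alone is not enough to declare success.

```typescript
// Hypothetical shapes for illustration; not OpenClaw's real types.
type AttemptResult =
  | { kind: "timeout" }
  | { kind: "response"; status: number; finishReason: string };

type Decision = "candidate_succeeded" | "candidate_failed";

interface Candidate {
  provider: string;
  call: () => Promise<AttemptResult>;
}

// An attempt succeeds only if the HTTP status is 2xx AND the response body
// does not report finish_reason "error".
function classifyAttempt(result: AttemptResult): Decision {
  if (result.kind === "timeout") return "candidate_failed";
  if (result.status < 200 || result.status >= 300) return "candidate_failed";
  if (result.finishReason === "error") return "candidate_failed";
  return "candidate_succeeded";
}

// Tries candidates in order and returns the first provider whose attempt
// is classified as a success, or null if every candidate fails.
async function runWithFallback(candidates: Candidate[]): Promise<string | null> {
  for (const candidate of candidates) {
    const result = await candidate.call();
    if (classifyAttempt(result) === "candidate_succeeded") {
      return candidate.provider;
    }
  }
  return null;
}
```

With this classification, the second attempt in the reproduction above (HTTP 200 with finish_reason: error) would be recorded as candidate_failed, and the loop would move on to the third candidate instead of stopping at 2/5.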

Environment

  • OpenClaw v2026.4.1
  • Fallback chain: anthropic → voidai → (3 more candidates configured but never reached)
