
fix(agents): continue model fallback on failover text payloads#19252

Closed
mahsumaktas wants to merge 2 commits into openclaw:main from mahsumaktas:fix/cron-agent-model-fallback-preservation

Conversation

@mahsumaktas
Contributor

@mahsumaktas commented Feb 17, 2026

Summary

  • detect failover-shaped error payloads returned as successful run results in runWithModelFallback
  • convert those payload-only failures into fallback retries so the chain advances instead of stopping on OpenRouter-style 402 text
  • keep guardrails to avoid false positives for normal instructional text mentioning rate limits
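The fallback behaviour summarized above can be sketched as a loop over candidate models. A thrown error and a "successful" result whose payload looks like a provider failure are treated the same way, so the chain advances instead of stopping. This is a minimal sketch, not the code in model-fallback.ts. `RunResultSketch`, `runChainSketch`, and the `looksLikeFailover` callback are hypothetical names. In the PR, the detection role belongs to resolveFailoverPayloadMessage inside runWithModelFallback. The sketch is kept synchronous for brevity, whereas a real runner would await each attempt.

```typescript
// Hypothetical, simplified run-result shape; the real types may differ.
interface RunResultSketch {
  payloads: { text: string; isError?: boolean }[];
  stopReason?: string;
}

interface AttemptSketch {
  model: string;
  error: string;
}

// Try each candidate model in order. Both a thrown error and a
// failover-shaped payload count as a failed attempt and advance the chain.
function runChainSketch(
  candidates: string[],
  run: (model: string) => RunResultSketch,
  looksLikeFailover: (result: RunResultSketch) => string | undefined,
): { model: string; result: RunResultSketch; attempts: AttemptSketch[] } {
  const attempts: AttemptSketch[] = [];
  for (const model of candidates) {
    try {
      const result = run(model);
      // A "successful" result may still carry provider error text (e.g. a 402).
      const failoverMessage = looksLikeFailover(result);
      if (failoverMessage !== undefined) {
        attempts.push({ model, error: failoverMessage });
        continue; // treat it like a thrown failure and move to the next model
      }
      return { model, result, attempts };
    } catch (err) {
      attempts.push({ model, error: err instanceof Error ? err.message : String(err) });
    }
  }
  // Every candidate failed: surface all recorded reasons together.
  const summary = attempts.map((a) => `${a.model}: ${a.error}`).join("; ");
  throw new Error(`All models failed: ${summary}`);
}
```

With a primary that returns `{ payloads: [{ text: "402 requires more credits", isError: true }] }` and a working fallback, the call returns the fallback's result and records one failed attempt for the primary. Passing the detector in as a callback keeps the loop independent of any particular error heuristic.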

Testing

  • pnpm vitest run --config vitest.e2e.config.ts src/agents/model-fallback.e2e.test.ts
  • pnpm oxlint --type-aware src/agents/model-fallback.ts src/agents/model-fallback.e2e.test.ts

Greptile Summary

Extends the model fallback system to detect, and retry on, failover-shaped error payloads that providers return as "successful" run results. The PR adds:

  • Payload-level failover detection (resolveFailoverPayloadMessage) in model-fallback.ts:89-134 that inspects successful run results for error text payloads and converts them to fallback retries
  • New billing error patterns ("requires more credits", "can only afford") to catch OpenRouter-style 402 messages
  • Context parameter (ModelFallbackRunContext) passed to all run callbacks, enabling callers to know when fallback chains are active
  • probePrimaryDuringCooldown configuration set to "always" across the auto-reply, followup, memory, and CLI flows, so the primary model is always attempted first and the chain falls back only if it is rate-limited
  • Cron agent model merge fix that preserves the default fallbacks when an agent config overrides only the primary model
  • User-facing fallback notices shown when a billing or rate-limit error causes a model switch
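The cron merge fix can be illustrated with a small field-wise merge: when an agent override sets only a primary model, the default fallback list is inherited rather than dropped. This is a hedged sketch. The `ModelSelectionSketch` shape and `mergeAgentModelSketch` name are hypothetical, and the real config keys and merge code in the repository may differ. It assumes that replacing the whole model object with the agent override is the kind of merge that loses the fallbacks, which is the failure mode the branch name suggests.

```typescript
// Hypothetical model-selection shape: a primary model plus ordered fallbacks.
interface ModelSelectionSketch {
  primary?: string;
  fallbacks?: string[];
}

// Merge an agent-level override onto the defaults field by field.
// Fields the override omits are inherited, so overriding only `primary`
// keeps the default fallbacks. An explicit `fallbacks` array (even an
// empty one) replaces the defaults, which lets an agent opt out.
function mergeAgentModelSketch(
  defaults: ModelSelectionSketch,
  override: ModelSelectionSketch | undefined,
): ModelSelectionSketch {
  return {
    primary: override?.primary ?? defaults.primary,
    fallbacks: override?.fallbacks ?? defaults.fallbacks,
  };
}
```

Using `??` rather than `||` matters here: an empty array is a deliberate "no fallbacks" choice and should not be silently replaced by the defaults.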

The detection logic guards against false positives by requiring an error-like signal (the payload is marked isError, stopReason is "error", or a regex matches HTTP status codes or error keywords) before treating text that mentions rate limits as an actual failure. Instructional text without such a signal is returned as a normal result.
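A minimal sketch of that two-part check follows. A payload is flagged only if its text matches a failover pattern and at least one error-like signal is present. The function name `looksLikeFailoverPayload`, the result shape, and every regex except the two billing phrases quoted in the PR are illustrative assumptions, not the actual resolveFailoverPayloadMessage implementation. The text-level signal is deliberately crude: it looks only for a 4xx/5xx code or the word "error" at the very start of the text.

```typescript
// Hypothetical, simplified run-result shape.
interface PayloadResultSketch {
  payloads: { text: string; isError?: boolean }[];
  stopReason?: string;
}

// Phrases suggesting a billing or rate-limit failure. The first two are the
// OpenRouter-style 402 patterns named in the PR; the rest are illustrative.
const FAILOVER_PATTERNS: RegExp[] = [
  /requires more credits/i,
  /can only afford/i,
  /insufficient (credits|balance|quota)/i,
  /rate[ _-]?limit/i,
];

// Text-level error signal: a 4xx/5xx status code or "error" at the start of
// the text. Anchoring at the start reduces matches on ordinary prose.
const ERROR_TEXT_SIGNAL = /^\s*(?:[45]\d\d\b|error\b)/i;

// Returns the offending text if a payload looks like a provider failure,
// otherwise undefined (the result is treated as a genuine success).
function looksLikeFailoverPayload(result: PayloadResultSketch): string | undefined {
  for (const payload of result.payloads) {
    const text = payload.text.trim();
    // Step 1: the text must mention a failover-worthy condition at all.
    if (!FAILOVER_PATTERNS.some((re) => re.test(text))) continue;
    // Step 2: it must also carry an error-like signal, so instructional
    // text that merely mentions rate limits is not flagged.
    const hasErrorSignal =
      payload.isError === true ||
      result.stopReason === "error" ||
      ERROR_TEXT_SIGNAL.test(text);
    if (hasErrorSignal) return text;
  }
  return undefined;
}
```

For example, "402 This request requires more credits, or fewer max_tokens." is flagged through the leading status code, while "To stay under the rate limit, batch your requests." with a normal stop reason is not.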

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-tested with comprehensive e2e tests covering both positive cases (detecting real failover payloads) and negative cases (not treating instructional text as errors). The detection logic includes multiple safeguards against false positives, all integration points are updated consistently, and the cron model merge fix has dedicated unit tests. The changes follow established patterns in the codebase.
  • No files require special attention

Last reviewed commit: 29d6606

@openclaw-barnacle (bot) added the commands (Command implementations), agents (Agent runtime and tooling), and size: M labels on Feb 17, 2026
@mahsumaktas force-pushed the fix/cron-agent-model-fallback-preservation branch from 29d6606 to 3eea4a0 on February 17, 2026 at 15:50
@mahsumaktas
Contributor Author

Rebased on latest main and resolved runner-layer conflicts with a conservative merge strategy (kept the new context/auth wiring plus the fallback improvements).

Validation run locally:
- pnpm vitest run --config vitest.e2e.config.ts src/agents/model-fallback.e2e.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.e2e.test.ts
- pnpm vitest run src/agents/model-fallback.probe.test.ts
- pnpm vitest run src/auto-reply/reply/followup-runner.test.ts src/auto-reply/reply/agent-runner.runreplyagent.test.ts
- pnpm oxlint --type-aware (touched files)

All passed locally; the PR is now mergeable and CI is running.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle
Copy link
Copy Markdown

Closing due to inactivity.
If you believe this PR should be revived, post in #pr-thunderdome-dangerzone on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.


Labels

  • agents (Agent runtime and tooling)
  • commands (Command implementations)
  • size: M
  • stale (Marked as stale due to inactivity)

Projects

None yet


1 participant