fix(failover): recognize 'abort' stop reason as timeout for model fallback #18618
…lback

When streaming providers (GLM, OpenRouter, etc.) return 'stop reason: abort' due to stream interruption, OpenClaw's failover mechanism did not recognize this as a timeout condition. This prevented fallback models from being triggered, leaving users with failed requests instead of graceful failover.

Changes:
- Add abort patterns to ERROR_PATTERNS.timeout in pi-embedded-helpers/errors.ts
- Extend TIMEOUT_HINT_RE regex to include abort patterns in failover-error.ts

Fixes openclaw#18453

Co-authored-by: James <james@openclaw.ai>
const TIMEOUT_HINT_RE =
  /timeout|timed out|deadline exceeded|context deadline exceeded|stop reason:\s*abort|unhandled stop reason:\s*abort/i;
Pattern mismatch with errors.ts
ERROR_PATTERNS.timeout in errors.ts includes the broad /\breason:\s*abort\b/i pattern, but TIMEOUT_HINT_RE here does not. This means the two timeout detection paths behave differently: isTimeoutErrorMessage() (errors.ts) will classify a message containing "reason: abort" as a timeout, but hasTimeoutHint() / isTimeoutError() (this file) will not.
This could cause inconsistent behavior — e.g., a provider error surfaced as an error message string would trigger failover via classifyFailoverReason → isTimeoutErrorMessage, but the same text passed through resolveFailoverReasonFromError → isTimeoutError would miss it.
If reason:\s*abort (without "stop" prefix) is a real pattern from providers, consider adding it here too for consistency:
Suggested change:
- const TIMEOUT_HINT_RE =
-   /timeout|timed out|deadline exceeded|context deadline exceeded|stop reason:\s*abort|unhandled stop reason:\s*abort/i;
+ const TIMEOUT_HINT_RE =
+   /timeout|timed out|deadline exceeded|context deadline exceeded|stop reason:\s*abort|reason:\s*abort|unhandled stop reason:\s*abort/i;
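The mismatch described in this comment can be reproduced in isolation. The sketch below copies only the abort-related patterns quoted in this review; the real `ERROR_PATTERNS.timeout` list presumably holds other entries as well, and the two wrapper functions are hypothetical stand-ins for `isTimeoutErrorMessage()` and `hasTimeoutHint()`, not the project's actual implementations.

```typescript
// Abort patterns as quoted for ERROR_PATTERNS.timeout in errors.ts
// (assumption: other, non-abort entries are omitted here).
const ERRORS_TS_ABORT_PATTERNS: RegExp[] = [
  /\bstop reason:\s*abort\b/i,
  /\breason:\s*abort\b/i,
  /\bunhandled stop reason:\s*abort\b/i,
];

// TIMEOUT_HINT_RE as it appears in this PR's failover-error.ts diff.
const TIMEOUT_HINT_RE =
  /timeout|timed out|deadline exceeded|context deadline exceeded|stop reason:\s*abort|unhandled stop reason:\s*abort/i;

// Hypothetical stand-in for the errors.ts classification path.
function matchesErrorsTs(message: string): boolean {
  return ERRORS_TS_ABORT_PATTERNS.some((re) => re.test(message));
}

// Hypothetical stand-in for the failover-error.ts classification path.
function matchesFailoverError(message: string): boolean {
  return TIMEOUT_HINT_RE.test(message);
}

// Collect every sample message on which the two paths disagree.
const sampleMessages = ["stop reason: abort", "finish reason: abort"];
const disagreements = sampleMessages.filter(
  (m) => matchesErrorsTs(m) !== matchesFailoverError(m),
);
console.log(disagreements);
```

A message such as "finish reason: abort" contains "reason: abort" but not "stop reason: abort", so only the `errors.ts`-style list matches it.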
/\bstop reason:\s*abort\b/i,
/\breason:\s*abort\b/i,
/\bunhandled stop reason:\s*abort\b/i,
Redundant pattern: unhandled stop reason already matched
/\bstop reason:\s*abort\b/i will already match the substring "stop reason: abort" inside "unhandled stop reason: abort" (since \b fires at the word boundary before "stop"). This makes /\bunhandled stop reason:\s*abort\b/i on line 604 redundant — it can never match something that line 602 wouldn't already catch.
Not a bug, but worth simplifying to reduce maintenance surface:
Suggested change:
- /\bstop reason:\s*abort\b/i,
- /\breason:\s*abort\b/i,
- /\bunhandled stop reason:\s*abort\b/i,
+ /\bstop reason:\s*abort\b/i,
+ /\breason:\s*abort\b/i,
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
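The redundancy claim can be spot-checked with a few sample strings. This is an illustrative check, not a proof: the sample messages are invented for the example, and the constant names are hypothetical. The underlying reasoning is that any match of the longer pattern contains "stop reason: abort" preceded by a space, so the `\b` before "stop" always holds.

```typescript
// The two patterns under comparison, copied from the diff above.
const STOP_REASON_ABORT = /\bstop reason:\s*abort\b/i;
const UNHANDLED_STOP_REASON_ABORT = /\bunhandled stop reason:\s*abort\b/i;

// Invented sample messages (assumption: real provider strings may differ).
const samples = [
  "Unhandled stop reason: abort",
  "error: unhandled stop reason:abort (stream closed)",
  "stop reason: abort",
  "stop reason: aborted",
];

// Record which pattern matches each sample.
const results = samples.map((sample) => ({
  sample,
  narrow: UNHANDLED_STOP_REASON_ABORT.test(sample),
  broad: STOP_REASON_ABORT.test(sample),
}));

// The narrow pattern adds value only if it matches something the broad one misses.
const narrowOnly = results.filter((r) => r.narrow && !r.broad);
console.log(narrowOnly.length === 0 ? "narrow pattern is redundant here" : narrowOnly);
```

The last sample also shows the effect of the trailing `\b`: "aborted" is not treated as "abort" by either pattern.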
Problem
When streaming providers (GLM, OpenRouter, NVIDIA NIM, etc.) return `stop reason: abort` due to stream interruption, OpenClaw's failover mechanism did not recognize this as a timeout condition. This prevented fallback models from being triggered, leaving users with failed requests instead of graceful failover to secondary models.
Root Cause
The timeout error pattern detection in two places did not include abort-related patterns:
- `ERROR_PATTERNS.timeout` in `src/agents/pi-embedded-helpers/errors.ts`
- `TIMEOUT_HINT_RE` in `src/agents/failover-error.ts`
Solution
Add abort stop reason patterns to both timeout detection regexes:
- `/\bstop reason:\s*abort\b/i`
- `/\breason:\s*abort\b/i`
- `/\bunhandled stop reason:\s*abort\b/i`
Impact
Affects all users configuring model fallback chains with streaming providers. Before this fix, any abort from the provider would surface as a user-facing error instead of triggering the next model in the fallback chain.
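To make the impact concrete, here is a minimal sketch of a fallback chain in which an abort message is classified as a timeout and therefore advances to the next model. Every name in it (`ModelCall`, `isRetryableOnNextModel`, `runWithFallback`) is hypothetical and not taken from OpenClaw, and the calls are synchronous only to keep the example short; a real chain would be asynchronous.

```typescript
// Hypothetical model entry: a name plus a call that returns text or throws.
type ModelCall = { name: string; run: () => string };

// Abort pattern from this PR, treated as a timeout-like condition.
const ABORT_AS_TIMEOUT = /\bstop reason:\s*abort\b/i;

// Decide whether an error should advance the chain (assumed policy: timeouts only).
function isRetryableOnNextModel(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /timeout|timed out/i.test(message) || ABORT_AS_TIMEOUT.test(message);
}

// Try each model in order; move on only when the failure looks like a timeout.
function runWithFallback(chain: ModelCall[]): { model: string; output: string } {
  let lastError: unknown = new Error("empty fallback chain");
  for (const model of chain) {
    try {
      return { model: model.name, output: model.run() };
    } catch (err) {
      if (!isRetryableOnNextModel(err)) throw err; // surface other errors unchanged
      lastError = err; // timeout-like failure: continue with the next model
    }
  }
  throw lastError; // every model failed with a timeout-like error
}

const result = runWithFallback([
  { name: "primary", run: () => { throw new Error("Unhandled stop reason: abort"); } },
  { name: "secondary", run: () => "ok" },
]);
console.log(result.model);
```

Without the abort pattern in `isRetryableOnNextModel`, the first error would be rethrown and the secondary model would never run, which matches the behavior described before this fix.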
Testing
Fixes #18453
AI-Assisted
This PR was prepared with assistance from Claude Opus 4.5.
Greptile Summary
This PR adds abort stop reason pattern detection to both timeout error classification paths (`failover-error.ts` and `errors.ts`), enabling model fallback when streaming providers (GLM, OpenRouter, NVIDIA NIM) return `stop reason: abort` due to stream interruption.
- Adds `stop reason:\s*abort` and `unhandled stop reason:\s*abort` patterns to `TIMEOUT_HINT_RE` in `failover-error.ts`
- Adds abort patterns (`stop reason:\s*abort`, `reason:\s*abort`, `unhandled stop reason:\s*abort`) to `ERROR_PATTERNS.timeout` in `errors.ts`
- The `reason:\s*abort` pattern exists in `errors.ts` but not in `failover-error.ts`, which means the two timeout detection codepaths (`isTimeoutError` vs `isTimeoutErrorMessage`) will behave differently for error messages containing "reason: abort" without the "stop" prefix
Confidence Score: 3/5
- The `reason:\s*abort` pattern is present in `errors.ts` but missing from `failover-error.ts`, creating an inconsistency between the two timeout classification paths. This means certain error messages may be classified as timeouts in one path but not the other, potentially causing subtle failover behavior differences. The redundant `unhandled stop reason` pattern is a minor style concern.
- `src/agents/failover-error.ts`: missing `reason:\s*abort` pattern that exists in `errors.ts`
Last reviewed commit: d77099a
Local Validation
pnpm build && pnpm check
TypeScript compilation passes. Lint passes. Pattern logic is unit-testable via existing error classification helpers.
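As an illustration of what such a unit test might cover, the sketch below runs a small table of positive and negative cases against the abort patterns. It does not use the project's helpers or test framework: `looksLikeAbortTimeout` is a hypothetical stand-in, and the pattern list is the two-entry form proposed in the review above.

```typescript
// Abort patterns in the simplified two-entry form (assumption for this sketch).
const ABORT_TIMEOUT_PATTERNS: RegExp[] = [
  /\bstop reason:\s*abort\b/i,
  /\breason:\s*abort\b/i,
];

// Hypothetical stand-in for the project's timeout classification helper.
function looksLikeAbortTimeout(message: string): boolean {
  return ABORT_TIMEOUT_PATTERNS.some((re) => re.test(message));
}

// Table of [message, expected classification] pairs, including negatives.
const cases: Array<[string, boolean]> = [
  ["stop reason: abort", true],
  ["Unhandled stop reason:abort", true],
  ["STOP REASON:   ABORT", true],
  ["reason: abort", true],
  ["stop reason: aborted", false],
  ["stop reason: end_turn", false],
  ["request completed", false],
];

// Any case whose actual classification differs from the expectation is a failure.
const failures = cases.filter(
  ([message, expected]) => looksLikeAbortTimeout(message) !== expected,
);
if (failures.length > 0) {
  throw new Error(`misclassified: ${failures.map(([m]) => m).join(", ")}`);
}
```

The negative cases matter as much as the positive ones: they guard against a future pattern change that would route ordinary stop reasons into the failover path.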