fix(agents): mark embedded run-budget timeouts as terminal aborts
Address P2/P3 bot review on PR #62682:
[P2] Embedded run-budget timeouts misclassified — `opts.abortSignal` is
the caller-provided signal (HTTP disconnect, cron wrapper). It is NOT
the embedded runner's private `runAbortController` — when
`scheduleAbortTimer` fires and calls `abortRun(true)`, only the internal
controller is aborted with a TimeoutError. The previous patch's
`isTerminalAbort(opts.abortSignal)` check therefore missed the #60388
case entirely; the fallback chain still tried alternative models even
though the whole run's deadline had elapsed.
Two complementary changes address this:
(1) `EmbeddedRunAttemptResult.timedOutByRunBudget` flag: set explicitly
in attempt.ts when the run-budget timer fires (NOT when the LLM idle
watchdog fires; that is still LLM-phase and benefits from fallback).
Threaded through to failover-policy `shouldRotateAssistant`
alongside the existing `timedOutDuringCompaction` and
`timedOutDuringToolExecution` exemptions. Also gates the
timeout-triggered compaction branch in run.ts (compacting wouldn't
help because the deadline is exhausted). Optional in the result type for
public harness SDK back-compat (matches the #75873 pattern).
(2) `isTerminalAbortFromError(err)` — new helper that mirrors the
existing `isTerminalAbort(signal)` checks but reads the thrown
error's `.cause` chain. Needed because `abortable()` wraps the
embedded controller's TimeoutError in an outer AbortError, and the
fallback layer only sees the thrown error (not the embedded
controller's signal). Used in `runFallbackCandidate.catch` next to
the existing signal check.
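
A minimal sketch of the cause-chain check described in (2), not the code from this PR. The set of terminal error names, the `makeError` helper, the depth limit, and the decision to inspect the thrown error itself as well as its causes are all assumptions made for illustration.

```typescript
// Hypothetical sketch; the real helper in the PR may differ.
// Which error names count as terminal is an assumption drawn from the commit message.
const TERMINAL_ERROR_NAMES = new Set(["TimeoutError", "ClientDisconnectError"]);

// Illustration-only helper: build a named error with an optional cause
// without relying on the ES2022 `ErrorOptions` constructor argument.
function makeError(name: string, message: string, cause?: unknown): Error {
  const err = new Error(message);
  err.name = name;
  if (cause !== undefined) {
    (err as Error & { cause?: unknown }).cause = cause;
  }
  return err;
}

// Walk the error and its `.cause` chain; return true if any link is terminal.
// The depth limit and the seen-set guard against cyclic cause chains.
function isTerminalAbortFromError(err: unknown, maxDepth = 10): boolean {
  const seen = new Set<unknown>();
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth; depth++) {
    if (typeof current !== "object" || current === null || seen.has(current)) {
      return false;
    }
    seen.add(current);
    const { name, cause } = current as { name?: unknown; cause?: unknown };
    if (typeof name === "string" && TERMINAL_ERROR_NAMES.has(name)) {
      return true;
    }
    current = cause;
  }
  return false;
}

// An outer AbortError wrapping the embedded controller's TimeoutError,
// which is the shape the commit message attributes to `abortable()`.
const wrappedTimeout = makeError(
  "AbortError",
  "aborted",
  makeError("TimeoutError", "run budget exhausted"),
);
// A plain AbortError with no terminal cause should still allow fallback.
const genericAbort = makeError("AbortError", "aborted");

console.log(isTerminalAbortFromError(wrappedTimeout)); // true
console.log(isTerminalAbortFromError(genericAbort)); // false
```

In a catch handler such as `runFallbackCandidate.catch`, a check of this kind would rethrow when it returns true and let the fallback chain continue otherwise.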
[P3] CHANGELOG entry added under Unreleased/Fixes.
Tests: 197/197 pass in failover-policy, assistant-failover,
model-fallback, and trajectory-metadata suites. New regression cases
cover: (a) thrown error with TimeoutError in cause chain rethrows; (b)
thrown error with ClientDisconnectError in cause chain rethrows; (c)
generic AbortError still falls back; (d) failover-policy exempts
run-budget timeouts both pre- and post-rotation.
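
The exemption exercised by case (d) can be pictured with the simplified policy below. Only the three flag names come from this commit message; the `AttemptResultFlags` shape, the `timedOut` field, and the reduction of the policy to timeouts alone are assumptions for illustration, and the real `shouldRotateAssistant` likely weighs more inputs.

```typescript
// Hypothetical result shape; only the three `timedOut...` flag names
// below `timedOut` are taken from the commit message.
interface AttemptResultFlags {
  timedOut: boolean; // assumed field: the attempt ended on some timeout
  timedOutDuringCompaction?: boolean;
  timedOutDuringToolExecution?: boolean;
  // Optional so results from older harness builds remain valid.
  timedOutByRunBudget?: boolean;
}

// Simplified policy: rotate to another assistant only for timeouts
// that a different model could plausibly recover from.
function shouldRotateAssistant(result: AttemptResultFlags): boolean {
  if (!result.timedOut) return false;
  if (result.timedOutDuringCompaction) return false;
  if (result.timedOutDuringToolExecution) return false;
  // The whole run's deadline has elapsed, so another model cannot help.
  if (result.timedOutByRunBudget) return false;
  return true;
}

// LLM-phase timeout (for example the idle watchdog): fallback is still useful.
console.log(shouldRotateAssistant({ timedOut: true })); // true
// Run-budget timeout: terminal, no rotation.
console.log(shouldRotateAssistant({ timedOut: true, timedOutByRunBudget: true })); // false
```

Because the flag is optional, a result that omits it behaves exactly as before the change, which is the back-compat property the commit message calls out.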
Closes #60388.
CHANGELOG.md: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ Docs: https://docs.openclaw.ai
### Fixes
+- Agents/failover: stop the model fallback chain when a run-level abort is terminal: the embedded run-budget timer (`scheduleAbortTimer`), an HTTP client disconnect, or a cron job timeout. Previously the fallback layer would continue trying additional models even though the whole run's deadline had elapsed (or the caller had gone away), wasting tokens and time on retries that could not produce a useful result. Detection covers both `signal.reason` (caller-driven) and the thrown error's `.cause` chain (embedded private controller). New `timedOutByRunBudget` attempt result flag; failover policy now skips assistant rotation when set. Closes #60388. Thanks @simonusa.
- Agents/sessions: preserve terminal lifecycle state when final run metadata persists from a stale in-memory snapshot, preventing `main` sessions from staying stuck as running after completed or timed-out turns.
- Status: show the `openai-codex` OAuth profile for `openai/gpt-*` sessions running through the native Codex runtime instead of reporting auth as unknown. (#76197) Thanks @mbelinky.
- Plugins/externalization: keep diagnostics ClawHub packages and persisted bundled-plugin relocation on npm-first install metadata for launch, and omit Discord from the core package now that its external package is published. Thanks @vincentkoc.