fix(agents): exempt tool-execution timeouts from model fallback (#52147)#75873
Conversation
|
Codex review: needs maintainer review before merge. Summary Reproducibility: yes. Configure a short Next step before merge Security Review detailsBest possible solution: Merge the focused active-tool timeout exemption after maintainer review and CI/changed-gate proof, keeping LLM-phase timeouts on the existing fallback path. Do we have a high-confidence way to reproduce the issue? Yes. Configure a short Is this the best way to solve the issue? Yes. The proposed active-tool flag mirrors the established compaction exemption, uses existing tool lifecycle state, keeps the new public attempt-result field optional, and avoids broader timeout-policy refactoring. Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 6a54aac4892c. |
… synthesis Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution through the two remaining timeout-handling branches in run.ts so a long tool call doesn't trigger compaction/retry or surface a misleading "Request timed out before a response was generated" payload before the failover-policy exemption matters. - run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip compaction. Compacting wouldn't help — the LLM already responded; the prompt wasn't the problem. It would only lose in-flight tool context. - run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts skip the synthesized "Request timed out before a response was generated" payload. The assistant did produce a response (a tool call that ran long); the partial tool output already in the session is the correct artifact to surface. Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c22de36 to
d1e3326
Compare
… synthesis Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution through the two remaining timeout-handling branches in run.ts so a long tool call doesn't trigger compaction/retry or surface a misleading "Request timed out before a response was generated" payload before the failover-policy exemption matters. - run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip compaction. Compacting wouldn't help — the LLM already responded; the prompt wasn't the problem. It would only lose in-flight tool context. - run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts skip the synthesized "Request timed out before a response was generated" payload. The assistant did produce a response (a tool call that ran long); the partial tool output already in the session is the correct artifact to surface. Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…K back-compat Address P2 review finding on PR openclaw#75873: EmbeddedRunAttemptResult is re-exported as AgentHarnessAttemptResult in the plugin SDK (src/plugin-sdk/agent-harness-runtime.ts:25), so making the new field required would force every third-party harness to add a field it cannot necessarily observe. - types.ts: field is now optional with documentation explaining the back-compat contract. - run.ts: destructure with `?? false` default at the runner boundary so internal failover-decision code can rely on a strict boolean. Internal embedded-runner attempt code always sets the flag explicitly, so this is purely a defensive default for the public SDK contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2: Remove the stray <<<<<<< HEAD marker on CHANGELOG.md:41 left over from the rebase-conflict resolution. CI's check-no-conflict-markers.mjs caught it as an actionlint failure. P3: Add @simonusa attribution to the Unreleased fixes entry per OpenClaw changelog policy. Both findings from clawsweeper review on PR openclaw#75873.
a1b57e9 to
424cf69
Compare
… synthesis Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution through the two remaining timeout-handling branches in run.ts so a long tool call doesn't trigger compaction/retry or surface a misleading "Request timed out before a response was generated" payload before the failover-policy exemption matters. - run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip compaction. Compacting wouldn't help — the LLM already responded; the prompt wasn't the problem. It would only lose in-flight tool context. - run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts skip the synthesized "Request timed out before a response was generated" payload. The assistant did produce a response (a tool call that ran long); the partial tool output already in the session is the correct artifact to surface. Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…K back-compat Address P2 review finding on PR openclaw#75873: EmbeddedRunAttemptResult is re-exported as AgentHarnessAttemptResult in the plugin SDK (src/plugin-sdk/agent-harness-runtime.ts:25), so making the new field required would force every third-party harness to add a field it cannot necessarily observe. - types.ts: field is now optional with documentation explaining the back-compat contract. - run.ts: destructure with `?? false` default at the runner boundary so internal failover-decision code can rely on a strict boolean. Internal embedded-runner attempt code always sets the flag explicitly, so this is purely a defensive default for the public SDK contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
424cf69 to
5235de7
Compare
Detect run-level timeouts that fire while a tool call is still active and keep them out of assistant model fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread tool-timeout state through timeout-triggered compaction, generic timeout payload synthesis, and the changelog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the new attempt-result field optional for third-party harness compatibility while defaulting absent values at the runner boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5235de7 to
cf156be
Compare
obviyus
left a comment
There was a problem hiding this comment.
Verified the runner/failover bug path: run-level timeouts during active tool execution now avoid model fallback while LLM-phase timeouts keep the existing rotation path.
Maintainer follow-up: rebased onto current main, cleaned the commit messages, trimmed redundant comments, and kept the changelog entry in the active Unreleased fixes block.
Local gate: skipped tests per maintainer request; ran conflict-marker and diff sanity checks during the rebase.
Address P2/P3 bot review on PR openclaw#62682: [P2] Embedded run-budget timeouts misclassified — `opts.abortSignal` is the caller-provided signal (HTTP disconnect, cron wrapper). It is NOT the embedded runner's private `runAbortController` — when `scheduleAbortTimer` fires and calls `abortRun(true)`, only the internal controller is aborted with a TimeoutError. The previous patch's `isTerminalAbort(opts.abortSignal)` check therefore missed the openclaw#60388 case entirely; the fallback chain still tried alternative models even though the whole run's deadline had elapsed. Two complementary changes address this: (1) `EmbeddedRunAttemptResult.timedOutByRunBudget` flag — set explicitly in attempt.ts when the run-budget timer fires (NOT when LLM idle watchdog fires; that's still LLM-phase and benefits from fallback). Threaded through to failover-policy `shouldRotateAssistant` alongside the existing `timedOutDuringCompaction` and `timedOutDuringToolExecution` exemptions. Also gates the timeout-triggered compaction branch in run.ts (compacting wouldn't help — the deadline is exhausted). Optional in result type for public harness SDK back-compat (matches openclaw#75873 pattern). (2) `isTerminalAbortFromError(err)` — new helper that mirrors the existing `isTerminalAbort(signal)` checks but reads the thrown error's `.cause` chain. Needed because `abortable()` wraps the embedded controller's TimeoutError in an outer AbortError, and the fallback layer only sees the thrown error (not the embedded controller's signal). Used in `runFallbackCandidate.catch` next to the existing signal check. [P3] CHANGELOG entry added under Unreleased/Fixes. Tests: 197/197 pass in failover-policy, assistant-failover, model-fallback, and trajectory-metadata suites. New regression cases cover: (a) thrown error with TimeoutError in cause chain rethrows; (b) thrown error with ClientDisconnectError in cause chain rethrows; (c) generic AbortError still falls back; (d) failover-policy exempts run-budget timeouts both pre- and post-rotation. Closes openclaw#60388.
Summary
Closes #52147.
When the embedded run-level timer fires while tool execution is in flight (e.g.
process(poll)loops, browser automation, longexec), the LLM has already responded. Today's failover policy still treats this astimedOut && !timedOutDuringCompaction→ rotates auth profiles, then escalates to model fallback withFailoverError: LLM request timed out.That's wrong on three axes:
LLM request timed outerror message is misleading (no LLM request was in flight)This PR mirrors the existing
timedOutDuringCompactionprecedent (PR #46889): a new flag is set whenabortRun(isTimeout=true)fires while at least one tool is still active, and the failover policy adds a parallel exemption. No behavior change for LLM-phase timeouts.The bot's 2026-04-28 review on #52147 verified the gap on current
mainand endorsed this exact approach as the cleanest fix.Detection
Uses the existing
toolStartDatamap inpi-embedded-subscribe.handlers.tools.ts— entries are inserted at tool start (handleToolExecutionStartline 610) and deleted at tool end (handleToolExecutionEndline 828). NewcountActiveToolExecutions(runId)helper filters byrunId:key prefix since the map is shared across runs.The flag is only set in
abortRunwhenisTimeout && !timedOutDuringCompaction && activeCount > 0. Compaction takes precedence (existing behavior).Plumbing
Mirrors the existing
timedOutDuringCompactionflow exactly:EmbeddedRunAttemptResult.timedOutDuringToolExecution: booleanfailover-policy.tsshouldRotateAssistantadds the exemptionassistant-failover.ts+run.tsthread the paramTest plan
continue_normalregardless of profile-rotation statepnpm vitest run src/agents/pi-embedded-runner/run/failover-policy.test.ts src/agents/pi-embedded-runner/run/assistant-failover.test.tspnpm vitest run src/agents/pi-embedded-runner/run/compaction-timeout.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.tspnpm vitest run src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.tspnpm vitest run src/agents/harness/v2.test.ts src/agents/harness/selection.test.ts src/trajectory/metadata.test.tsScope boundary
Doesn't change behavior for:
timedOutDuringCompaction(untouched)abortRun(true)and is correctly classified as LLM-phase since no tool is in flight by definition during idle wait)Note
AI-assisted (Claude Opus 4.7).
🤖 Generated with Claude Code