fix(agents): exempt tool-execution timeouts from model fallback (#52147) by simonusa · Pull Request #75873 · openclaw/openclaw

simonusa · 2026-05-02T00:44:09Z

Summary

When the embedded run-level timer fires while tool execution is in flight (e.g. process(poll) loops, browser automation, long exec), the LLM has already responded. Today's failover policy still treats this as timedOut && !timedOutDuringCompaction → rotates auth profiles, then escalates to model fallback with FailoverError: LLM request timed out.

That's wrong on three axes:

the LLM was not pending — fallback model has nothing useful to do
the original task is interrupted mid-execution
the LLM request timed out error message is misleading (no LLM request was in flight)

This PR mirrors the existing timedOutDuringCompaction precedent (PR #46889): a new flag is set when abortRun(isTimeout=true) fires while at least one tool is still active, and the failover policy adds a parallel exemption. No behavior change for LLM-phase timeouts.

The bot's 2026-04-28 review on #52147 verified the gap on current main and endorsed this exact approach as the cleanest fix.

Detection

Uses the existing toolStartData map in pi-embedded-subscribe.handlers.tools.ts — entries are inserted at tool start (handleToolExecutionStart line 610) and deleted at tool end (handleToolExecutionEnd line 828). New countActiveToolExecutions(runId) helper filters by runId: key prefix since the map is shared across runs.

The flag is only set in abortRun when isTimeout && !timedOutDuringCompaction && activeCount > 0. Compaction takes precedence (existing behavior).

Plumbing

Mirrors the existing timedOutDuringCompaction flow exactly:

EmbeddedRunAttemptResult.timedOutDuringToolExecution: boolean
threaded through trajectory artifacts + export for diagnostic visibility (parallel field to the compaction flag)
failover-policy.ts shouldRotateAssistant adds the exemption
assistant-failover.ts + run.ts thread the param

Test plan

3 new failover-policy regressions:
- tool-exec timeout → continue_normal regardless of profile-rotation state
- LLM-phase timeout (no active tool) → still rotates as before (locks in current behavior)
All 191 related embedded-runner / harness / trajectory tests pass:
- pnpm vitest run src/agents/pi-embedded-runner/run/failover-policy.test.ts src/agents/pi-embedded-runner/run/assistant-failover.test.ts
- pnpm vitest run src/agents/pi-embedded-runner/run/compaction-timeout.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
- pnpm vitest run src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
- pnpm vitest run src/agents/harness/v2.test.ts src/agents/harness/selection.test.ts src/trajectory/metadata.test.ts

Scope boundary

Doesn't change behavior for:

LLM-phase timeouts (the existing rotation/fallback path)
timedOutDuringCompaction (untouched)
External aborts (untouched)
Idle-watchdog timeouts (untouched — the LLM-idle-timeout handler still calls abortRun(true) and is correctly classified as LLM-phase since no tool is in flight by definition during idle wait)

Note

AI-assisted (Claude Opus 4.7).

🤖 Generated with Claude Code

clawsweeper · 2026-05-02T00:45:25Z

Codex review: needs maintainer review before merge.

Summary
The PR adds a timedOutDuringToolExecution attempt flag and threads it through embedded-run timeout handling, failover policy, trajectory metadata, fixtures, tests, and changelog coverage.

Reproducibility: yes. Configure a short agents.defaults.timeoutSeconds, a fallback model chain, and a long process(poll), browser, or exec tool call; current main can also be source-verified because non-compaction timedOut still feeds assistant failover.

Next step before merge
The implementation PR appears sound; the next action is maintainer review plus CI/changed-gate validation, not an automated repair.

Security
Cleared: The diff changes agent runtime timeout plumbing, diagnostics metadata, tests, and changelog text without new dependencies, CI permissions, secret handling, downloads, or package execution paths.

Review details

Best possible solution:

Merge the focused active-tool timeout exemption after maintainer review and CI/changed-gate proof, keeping LLM-phase timeouts on the existing fallback path.

Do we have a high-confidence way to reproduce the issue?

Yes. Configure a short agents.defaults.timeoutSeconds, a fallback model chain, and a long process(poll), browser, or exec tool call; current main can also be source-verified because non-compaction timedOut still feeds assistant failover.

Is this the best way to solve the issue?

Yes. The proposed active-tool flag mirrors the established compaction exemption, uses existing tool lifecycle state, keeps the new public attempt-result field optional, and avoids broader timeout-policy refactoring.

Acceptance criteria:

pnpm test src/agents/pi-embedded-runner/run/failover-policy.test.ts src/agents/pi-embedded-runner/run/assistant-failover.test.ts
pnpm test src/agents/pi-embedded-runner/run/compaction-timeout.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
pnpm test src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
pnpm test src/agents/harness/v2.test.ts src/agents/harness/selection.test.ts src/trajectory/metadata.test.ts
pnpm check:changed

What I checked:

Current main failover gap: Assistant failover still rotates/falls back on timedOut && !timedOutDuringCompaction, with no active-tool exemption on current main. (src/agents/pi-embedded-runner/run/failover-policy.ts:84, 6a54aac4892c)
Current main timeout classification: The run deadline only checks compaction state before abortRun(true), so a timeout during tool execution is not distinguished before aborting. (src/agents/pi-embedded-runner/run/attempt.ts:2399, 6a54aac4892c)
Existing active-tool signal: Tool starts are stored in toolStartData with a run/tool key and removed on tool completion, giving the PR a concrete lifecycle signal to count by run id. (src/agents/pi-embedded-subscribe.handlers.tools.ts:615, 6a54aac4892c)
Current main secondary timeout branches: Timeout-triggered compaction and generic timeout payload synthesis also only exempt compaction timeouts today, matching the PR's broader plumbing target. (src/agents/pi-embedded-runner/run.ts:1246, 6a54aac4892c)
PR diff coverage: The extracted PR diff adds the new flag, active-tool counter, failover exemption, run.ts compaction/payload exemptions, optional public attempt-result field, trajectory export field, fixtures, failover-policy regressions, and changelog entry. (src/agents/pi-embedded-runner/run/failover-policy.ts:82, a1b57e9449cc)
Related bug report: Linked issue Agent run timeout during tool execution misclassified as LLM timeout, triggers unnecessary model fallback #52147 gives concrete reproduction steps with long process(poll)/exec tool execution, configured model fallback, and the misleading FailoverError: LLM request timed out. symptom.

Likely related people:

steipete: Local blame and history show Peter Steinberger on the current embedded-runner/failover lines and broader failover refactors, and the PR timeline mentions steipete on this thread. (role: recent maintainer and current-main owner; confidence: high; commits: 29dc18d33d50, 6472e03412ef, d2597d5ecf2e; files: src/agents/pi-embedded-runner/run/failover-policy.ts, src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run.ts)
mverrilli: Local history shows Michael Verrilli authored the earlier compaction-timeout deadlock fix, which is the same timeout classification family this PR mirrors. (role: adjacent timeout-precedent contributor; confidence: medium; commits: e6f67d5f3150; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run.ts)
asyncjason: The PR and linked issue explicitly cite merged PR feat: make compaction timeout configurable via agents.defaults.compaction.timeoutSeconds #46889 as the precedent for non-LLM timeout exemptions, credited in the provided context to asyncjason. (role: compaction timeout precedent contributor; confidence: medium; commits: da6467f7ba6f; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/compaction-timeout.ts, CHANGELOG.md)
Vincent Koc: Local history shows repeated work on embedded tool hook correlation and after_tool_call context, adjacent to the toolStartData lifecycle signal used by this PR. (role: tool lifecycle adjacent owner; confidence: medium; commits: 747902a26a63, 0954b6bf5f48, 44183c6eb19e; files: src/agents/pi-embedded-subscribe.handlers.tools.ts)

Remaining risk / open question:

Targeted tests and pnpm check:changed were not run in this read-only review; they should gate merge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6a54aac4892c.

… synthesis Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution through the two remaining timeout-handling branches in run.ts so a long tool call doesn't trigger compaction/retry or surface a misleading "Request timed out before a response was generated" payload before the failover-policy exemption matters. - run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip compaction. Compacting wouldn't help — the LLM already responded; the prompt wasn't the problem. It would only lose in-flight tool context. - run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts skip the synthesized "Request timed out before a response was generated" payload. The assistant did produce a response (a tool call that ran long); the partial tool output already in the session is the correct artifact to surface. Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…K back-compat Address P2 review finding on PR openclaw#75873: EmbeddedRunAttemptResult is re-exported as AgentHarnessAttemptResult in the plugin SDK (src/plugin-sdk/agent-harness-runtime.ts:25), so making the new field required would force every third-party harness to add a field it cannot necessarily observe. - types.ts: field is now optional with documentation explaining the back-compat contract. - run.ts: destructure with `?? false` default at the runner boundary so internal failover-decision code can rely on a strict boolean. Internal embedded-runner attempt code always sets the flag explicitly, so this is purely a defensive default for the public SDK contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@simonusa

P2: Remove the stray <<<<<<< HEAD marker on CHANGELOG.md:41 left over from the rebase-conflict resolution. CI's check-no-conflict-markers.mjs caught it as an actionlint failure. P3: Add @simonusa attribution to the Unreleased fixes entry per OpenClaw changelog policy. Both findings from clawsweeper review on PR openclaw#75873.

… synthesis Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution through the two remaining timeout-handling branches in run.ts so a long tool call doesn't trigger compaction/retry or surface a misleading "Request timed out before a response was generated" payload before the failover-policy exemption matters. - run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip compaction. Compacting wouldn't help — the LLM already responded; the prompt wasn't the problem. It would only lose in-flight tool context. - run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts skip the synthesized "Request timed out before a response was generated" payload. The assistant did produce a response (a tool call that ran long); the partial tool output already in the session is the correct artifact to surface. Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…K back-compat Address P2 review finding on PR openclaw#75873: EmbeddedRunAttemptResult is re-exported as AgentHarnessAttemptResult in the plugin SDK (src/plugin-sdk/agent-harness-runtime.ts:25), so making the new field required would force every third-party harness to add a field it cannot necessarily observe. - types.ts: field is now optional with documentation explaining the back-compat contract. - run.ts: destructure with `?? false` default at the runner boundary so internal failover-decision code can rely on a strict boolean. Internal embedded-runner attempt code always sets the flag explicitly, so this is purely a defensive default for the public SDK contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Detect run-level timeouts that fire while a tool call is still active and keep them out of assistant model fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Thread tool-timeout state through timeout-triggered compaction, generic timeout payload synthesis, and the changelog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Keep the new attempt-result field optional for third-party harness compatibility while defaulting absent values at the runner boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

obviyus

Verified the runner/failover bug path: run-level timeouts during active tool execution now avoid model fallback while LLM-phase timeouts keep the existing rotation path.

Maintainer follow-up: rebased onto current main, cleaned the commit messages, trimmed redundant comments, and kept the changelog entry in the active Unreleased fixes block.

Local gate: skipped tests per maintainer request; ran conflict-marker and diff sanity checks during the rebase.

obviyus · 2026-05-02T13:53:42Z

Landed on main.

Landed source commit: cf156be
Merge commit: 731f640

Thanks @simonusa.

Address P2/P3 bot review on PR openclaw#62682: [P2] Embedded run-budget timeouts misclassified — `opts.abortSignal` is the caller-provided signal (HTTP disconnect, cron wrapper). It is NOT the embedded runner's private `runAbortController` — when `scheduleAbortTimer` fires and calls `abortRun(true)`, only the internal controller is aborted with a TimeoutError. The previous patch's `isTerminalAbort(opts.abortSignal)` check therefore missed the openclaw#60388 case entirely; the fallback chain still tried alternative models even though the whole run's deadline had elapsed. Two complementary changes address this: (1) `EmbeddedRunAttemptResult.timedOutByRunBudget` flag — set explicitly in attempt.ts when the run-budget timer fires (NOT when LLM idle watchdog fires; that's still LLM-phase and benefits from fallback). Threaded through to failover-policy `shouldRotateAssistant` alongside the existing `timedOutDuringCompaction` and `timedOutDuringToolExecution` exemptions. Also gates the timeout-triggered compaction branch in run.ts (compacting wouldn't help — the deadline is exhausted). Optional in result type for public harness SDK back-compat (matches openclaw#75873 pattern). (2) `isTerminalAbortFromError(err)` — new helper that mirrors the existing `isTerminalAbort(signal)` checks but reads the thrown error's `.cause` chain. Needed because `abortable()` wraps the embedded controller's TimeoutError in an outer AbortError, and the fallback layer only sees the thrown error (not the embedded controller's signal). Used in `runFallbackCandidate.catch` next to the existing signal check. [P3] CHANGELOG entry added under Unreleased/Fixes. Tests: 197/197 pass in failover-policy, assistant-failover, model-fallback, and trajectory-metadata suites. New regression cases cover: (a) thrown error with TimeoutError in cause chain rethrows; (b) thrown error with ClientDisconnectError in cause chain rethrows; (c) generic AbortError still falls back; (d) failover-policy exempts run-budget timeouts both pre- and post-rotation. Closes openclaw#60388.

openclaw-barnacle Bot added agents Agent runtime and tooling extensions: codex size: S labels May 2, 2026

simonusa force-pushed the feat/tool-execution-timeout-exemption-52147 branch from c22de36 to d1e3326 Compare May 2, 2026 11:42

jackiedepp mentioned this pull request May 2, 2026

core-plugin-tools prep stage takes ~11s on every embedded run (cached registry, per-run factory cost) #75290

Closed

simonusa marked this pull request as ready for review May 2, 2026 13:23

simonusa requested a review from a team as a code owner May 2, 2026 13:23

obviyus self-assigned this May 2, 2026

obviyus force-pushed the feat/tool-execution-timeout-exemption-52147 branch from a1b57e9 to 424cf69 Compare May 2, 2026 13:47

obviyus force-pushed the feat/tool-execution-timeout-exemption-52147 branch from 424cf69 to 5235de7 Compare May 2, 2026 13:50

simonusa and others added 4 commits May 2, 2026 19:21

fix(agents): classify tool-execution timeouts

557f07e

Detect run-level timeouts that fire while a tool call is still active and keep them out of assistant model fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(agents): skip retry paths for tool timeouts

87430c3

Thread tool-timeout state through timeout-triggered compaction, generic timeout payload synthesis, and the changelog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(plugin-sdk): keep harness timeout flag optional

03bb29e

Keep the new attempt-result field optional for third-party harness compatibility while defaulting absent values at the runner boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

style(agents): trim tool-timeout comments

cf156be

obviyus force-pushed the feat/tool-execution-timeout-exemption-52147 branch from 5235de7 to cf156be Compare May 2, 2026 13:52

obviyus approved these changes May 2, 2026

View reviewed changes

obviyus merged commit 731f640 into openclaw:main May 2, 2026
94 checks passed

This was referenced May 2, 2026

Frequent Timeouts & Agent Ignoring Tool Restrictions (Browser vs. Fetch/Curl) #69190

Open

fix(agents): distinguish terminal aborts from retryable failures (#60388) #62682

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): exempt tool-execution timeouts from model fallback (#52147)#75873

fix(agents): exempt tool-execution timeouts from model fallback (#52147)#75873
obviyus merged 4 commits intoopenclaw:mainfrom
simonusa:feat/tool-execution-timeout-exemption-52147

simonusa commented May 2, 2026

Uh oh!

clawsweeper Bot commented May 2, 2026 •

edited

Loading

Uh oh!

obviyus left a comment

Uh oh!

Uh oh!

obviyus commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

simonusa commented May 2, 2026

Summary

Detection

Plumbing

Test plan

Scope boundary

Note

Uh oh!

clawsweeper Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

obviyus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

obviyus commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clawsweeper Bot commented May 2, 2026 •

edited

Loading