Skip to content

fix(agents): exempt tool-execution timeouts from model fallback (#52147)#75873

Merged
obviyus merged 4 commits intoopenclaw:mainfrom
simonusa:feat/tool-execution-timeout-exemption-52147
May 2, 2026
Merged

fix(agents): exempt tool-execution timeouts from model fallback (#52147)#75873
obviyus merged 4 commits intoopenclaw:mainfrom
simonusa:feat/tool-execution-timeout-exemption-52147

Conversation

@simonusa
Copy link
Copy Markdown
Contributor

@simonusa simonusa commented May 2, 2026

Summary

Closes #52147.

When the embedded run-level timer fires while tool execution is in flight (e.g. process(poll) loops, browser automation, long exec), the LLM has already responded. Today's failover policy still treats this as timedOut && !timedOutDuringCompaction → rotates auth profiles, then escalates to model fallback with FailoverError: LLM request timed out.

That's wrong on three axes:

  • the LLM was not pending — fallback model has nothing useful to do
  • the original task is interrupted mid-execution
  • the LLM request timed out error message is misleading (no LLM request was in flight)

This PR mirrors the existing timedOutDuringCompaction precedent (PR #46889): a new flag is set when abortRun(isTimeout=true) fires while at least one tool is still active, and the failover policy adds a parallel exemption. No behavior change for LLM-phase timeouts.

The bot's 2026-04-28 review on #52147 verified the gap on current main and endorsed this exact approach as the cleanest fix.

Detection

Uses the existing toolStartData map in pi-embedded-subscribe.handlers.tools.ts — entries are inserted at tool start (handleToolExecutionStart line 610) and deleted at tool end (handleToolExecutionEnd line 828). New countActiveToolExecutions(runId) helper filters by runId: key prefix since the map is shared across runs.

The flag is only set in abortRun when isTimeout && !timedOutDuringCompaction && activeCount > 0. Compaction takes precedence (existing behavior).

Plumbing

Mirrors the existing timedOutDuringCompaction flow exactly:

  • EmbeddedRunAttemptResult.timedOutDuringToolExecution: boolean
  • threaded through trajectory artifacts + export for diagnostic visibility (parallel field to the compaction flag)
  • failover-policy.ts shouldRotateAssistant adds the exemption
  • assistant-failover.ts + run.ts thread the param

Test plan

  • 3 new failover-policy regressions:
    • tool-exec timeout → continue_normal regardless of profile-rotation state
    • LLM-phase timeout (no active tool) → still rotates as before (locks in current behavior)
  • All 191 related embedded-runner / harness / trajectory tests pass:
    • pnpm vitest run src/agents/pi-embedded-runner/run/failover-policy.test.ts src/agents/pi-embedded-runner/run/assistant-failover.test.ts
    • pnpm vitest run src/agents/pi-embedded-runner/run/compaction-timeout.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
    • pnpm vitest run src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
    • pnpm vitest run src/agents/harness/v2.test.ts src/agents/harness/selection.test.ts src/trajectory/metadata.test.ts

Scope boundary

Doesn't change behavior for:

  • LLM-phase timeouts (the existing rotation/fallback path)
  • timedOutDuringCompaction (untouched)
  • External aborts (untouched)
  • Idle-watchdog timeouts (untouched — the LLM-idle-timeout handler still calls abortRun(true) and is correctly classified as LLM-phase since no tool is in flight by definition during idle wait)

Note

AI-assisted (Claude Opus 4.7).

🤖 Generated with Claude Code

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling extensions: codex size: S labels May 2, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 2, 2026

Codex review: needs maintainer review before merge.

Summary
The PR adds a timedOutDuringToolExecution attempt flag and threads it through embedded-run timeout handling, failover policy, trajectory metadata, fixtures, tests, and changelog coverage.

Reproducibility: yes. Configure a short agents.defaults.timeoutSeconds, a fallback model chain, and a long process(poll), browser, or exec tool call; current main can also be source-verified because non-compaction timedOut still feeds assistant failover.

Next step before merge
The implementation PR appears sound; the next action is maintainer review plus CI/changed-gate validation, not an automated repair.

Security
Cleared: The diff changes agent runtime timeout plumbing, diagnostics metadata, tests, and changelog text without new dependencies, CI permissions, secret handling, downloads, or package execution paths.

Review details

Best possible solution:

Merge the focused active-tool timeout exemption after maintainer review and CI/changed-gate proof, keeping LLM-phase timeouts on the existing fallback path.

Do we have a high-confidence way to reproduce the issue?

Yes. Configure a short agents.defaults.timeoutSeconds, a fallback model chain, and a long process(poll), browser, or exec tool call; current main can also be source-verified because non-compaction timedOut still feeds assistant failover.

Is this the best way to solve the issue?

Yes. The proposed active-tool flag mirrors the established compaction exemption, uses existing tool lifecycle state, keeps the new public attempt-result field optional, and avoids broader timeout-policy refactoring.

Acceptance criteria:

  • pnpm test src/agents/pi-embedded-runner/run/failover-policy.test.ts src/agents/pi-embedded-runner/run/assistant-failover.test.ts
  • pnpm test src/agents/pi-embedded-runner/run/compaction-timeout.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
  • pnpm test src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
  • pnpm test src/agents/harness/v2.test.ts src/agents/harness/selection.test.ts src/trajectory/metadata.test.ts
  • pnpm check:changed

What I checked:

Likely related people:

  • steipete: Local blame and history show Peter Steinberger on the current embedded-runner/failover lines and broader failover refactors, and the PR timeline mentions steipete on this thread. (role: recent maintainer and current-main owner; confidence: high; commits: 29dc18d33d50, 6472e03412ef, d2597d5ecf2e; files: src/agents/pi-embedded-runner/run/failover-policy.ts, src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run.ts)
  • mverrilli: Local history shows Michael Verrilli authored the earlier compaction-timeout deadlock fix, which is the same timeout classification family this PR mirrors. (role: adjacent timeout-precedent contributor; confidence: medium; commits: e6f67d5f3150; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run.ts)
  • asyncjason: The PR and linked issue explicitly cite merged PR feat: make compaction timeout configurable via agents.defaults.compaction.timeoutSeconds #46889 as the precedent for non-LLM timeout exemptions, credited in the provided context to asyncjason. (role: compaction timeout precedent contributor; confidence: medium; commits: da6467f7ba6f; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/compaction-timeout.ts, CHANGELOG.md)
  • Vincent Koc: Local history shows repeated work on embedded tool hook correlation and after_tool_call context, adjacent to the toolStartData lifecycle signal used by this PR. (role: tool lifecycle adjacent owner; confidence: medium; commits: 747902a26a63, 0954b6bf5f48, 44183c6eb19e; files: src/agents/pi-embedded-subscribe.handlers.tools.ts)

Remaining risk / open question:

  • Targeted tests and pnpm check:changed were not run in this read-only review; they should gate merge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6a54aac4892c.

simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
… synthesis

Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution
through the two remaining timeout-handling branches in run.ts so a long
tool call doesn't trigger compaction/retry or surface a misleading
"Request timed out before a response was generated" payload before the
failover-policy exemption matters.

- run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip
  compaction. Compacting wouldn't help — the LLM already responded; the
  prompt wasn't the problem. It would only lose in-flight tool context.

- run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts
  skip the synthesized "Request timed out before a response was generated"
  payload. The assistant did produce a response (a tool call that ran
  long); the partial tool output already in the session is the correct
  artifact to surface.

Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonusa simonusa force-pushed the feat/tool-execution-timeout-exemption-52147 branch from c22de36 to d1e3326 Compare May 2, 2026 11:42
simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
… synthesis

Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution
through the two remaining timeout-handling branches in run.ts so a long
tool call doesn't trigger compaction/retry or surface a misleading
"Request timed out before a response was generated" payload before the
failover-policy exemption matters.

- run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip
  compaction. Compacting wouldn't help — the LLM already responded; the
  prompt wasn't the problem. It would only lose in-flight tool context.

- run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts
  skip the synthesized "Request timed out before a response was generated"
  payload. The assistant did produce a response (a tool call that ran
  long); the partial tool output already in the session is the correct
  artifact to surface.

Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
…K back-compat

Address P2 review finding on PR openclaw#75873: EmbeddedRunAttemptResult is
re-exported as AgentHarnessAttemptResult in the plugin SDK
(src/plugin-sdk/agent-harness-runtime.ts:25), so making the new field
required would force every third-party harness to add a field it cannot
necessarily observe.

- types.ts: field is now optional with documentation explaining the
  back-compat contract.
- run.ts: destructure with `?? false` default at the runner boundary so
  internal failover-decision code can rely on a strict boolean.

Internal embedded-runner attempt code always sets the flag explicitly,
so this is purely a defensive default for the public SDK contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
P2: Remove the stray <<<<<<< HEAD marker on CHANGELOG.md:41 left over
from the rebase-conflict resolution. CI's check-no-conflict-markers.mjs
caught it as an actionlint failure.

P3: Add @simonusa attribution to the Unreleased fixes entry per
OpenClaw changelog policy.

Both findings from clawsweeper review on PR openclaw#75873.
@simonusa simonusa marked this pull request as ready for review May 2, 2026 13:23
@simonusa simonusa requested a review from a team as a code owner May 2, 2026 13:23
@obviyus obviyus self-assigned this May 2, 2026
@obviyus obviyus force-pushed the feat/tool-execution-timeout-exemption-52147 branch from a1b57e9 to 424cf69 Compare May 2, 2026 13:47
obviyus pushed a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
… synthesis

Address P2 review finding on PR openclaw#75873: thread timedOutDuringToolExecution
through the two remaining timeout-handling branches in run.ts so a long
tool call doesn't trigger compaction/retry or surface a misleading
"Request timed out before a response was generated" payload before the
failover-policy exemption matters.

- run.ts:1250 (timeout-triggered compaction): tool-exec timeouts skip
  compaction. Compacting wouldn't help — the LLM already responded; the
  prompt wasn't the problem. It would only lose in-flight tool context.

- run.ts:2084 (generic timeout payload synthesis): tool-exec timeouts
  skip the synthesized "Request timed out before a response was generated"
  payload. The assistant did produce a response (a tool call that ran
  long); the partial tool output already in the session is the correct
  artifact to surface.

Also adds the missing CHANGELOG entry (P3 finding) under Unreleased/Fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obviyus pushed a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
…K back-compat

Address P2 review finding on PR openclaw#75873: EmbeddedRunAttemptResult is
re-exported as AgentHarnessAttemptResult in the plugin SDK
(src/plugin-sdk/agent-harness-runtime.ts:25), so making the new field
required would force every third-party harness to add a field it cannot
necessarily observe.

- types.ts: field is now optional with documentation explaining the
  back-compat contract.
- run.ts: destructure with `?? false` default at the runner boundary so
  internal failover-decision code can rely on a strict boolean.

Internal embedded-runner attempt code always sets the flag explicitly,
so this is purely a defensive default for the public SDK contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obviyus obviyus force-pushed the feat/tool-execution-timeout-exemption-52147 branch from 424cf69 to 5235de7 Compare May 2, 2026 13:50
simonusa and others added 4 commits May 2, 2026 19:21
Detect run-level timeouts that fire while a tool call is still active and keep them out of assistant model fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread tool-timeout state through timeout-triggered compaction, generic timeout payload synthesis, and the changelog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the new attempt-result field optional for third-party harness compatibility while defaulting absent values at the runner boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obviyus obviyus force-pushed the feat/tool-execution-timeout-exemption-52147 branch from 5235de7 to cf156be Compare May 2, 2026 13:52
Copy link
Copy Markdown
Contributor

@obviyus obviyus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified the runner/failover bug path: run-level timeouts during active tool execution now avoid model fallback while LLM-phase timeouts keep the existing rotation path.

Maintainer follow-up: rebased onto current main, cleaned the commit messages, trimmed redundant comments, and kept the changelog entry in the active Unreleased fixes block.

Local gate: skipped tests per maintainer request; ran conflict-marker and diff sanity checks during the rebase.

@obviyus obviyus merged commit 731f640 into openclaw:main May 2, 2026
94 checks passed
@obviyus
Copy link
Copy Markdown
Contributor

obviyus commented May 2, 2026

Landed on main.

Thanks @simonusa.

simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 2, 2026
Address P2/P3 bot review on PR openclaw#62682:

[P2] Embedded run-budget timeouts misclassified — `opts.abortSignal` is
the caller-provided signal (HTTP disconnect, cron wrapper). It is NOT
the embedded runner's private `runAbortController` — when
`scheduleAbortTimer` fires and calls `abortRun(true)`, only the internal
controller is aborted with a TimeoutError. The previous patch's
`isTerminalAbort(opts.abortSignal)` check therefore missed the openclaw#60388
case entirely; the fallback chain still tried alternative models even
though the whole run's deadline had elapsed.

Two complementary changes address this:

(1) `EmbeddedRunAttemptResult.timedOutByRunBudget` flag — set explicitly
    in attempt.ts when the run-budget timer fires (NOT when LLM idle
    watchdog fires; that's still LLM-phase and benefits from fallback).
    Threaded through to failover-policy `shouldRotateAssistant`
    alongside the existing `timedOutDuringCompaction` and
    `timedOutDuringToolExecution` exemptions. Also gates the
    timeout-triggered compaction branch in run.ts (compacting wouldn't
    help — the deadline is exhausted). Optional in result type for
    public harness SDK back-compat (matches openclaw#75873 pattern).

(2) `isTerminalAbortFromError(err)` — new helper that mirrors the
    existing `isTerminalAbort(signal)` checks but reads the thrown
    error's `.cause` chain. Needed because `abortable()` wraps the
    embedded controller's TimeoutError in an outer AbortError, and the
    fallback layer only sees the thrown error (not the embedded
    controller's signal). Used in `runFallbackCandidate.catch` next to
    the existing signal check.

[P3] CHANGELOG entry added under Unreleased/Fixes.

Tests: 197/197 pass in failover-policy, assistant-failover,
model-fallback, and trajectory-metadata suites. New regression cases
cover: (a) thrown error with TimeoutError in cause chain rethrows; (b)
thrown error with ClientDisconnectError in cause chain rethrows; (c)
generic AbortError still falls back; (d) failover-policy exempts
run-budget timeouts both pre- and post-rotation.

Closes openclaw#60388.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: codex size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Agent run timeout during tool execution misclassified as LLM timeout, triggers unnecessary model fallback

2 participants