fix(gateway): stop chat timeout fallback cascade #87085

Merged
steipete merged 10 commits into main from meow/control-ui-webchat-timeout-fallback on May 27, 2026

Conversation

@BunsDev

@BunsDev BunsDev commented May 26, 2026

Member

Summary

  • Tag server-side chat.send timeout cleanup aborts with a terminal TimeoutError reason.
  • Thread the caller abort signal into model fallback and stop immediately on terminal timeout/client-disconnect aborts instead of retrying every fallback candidate.
  • Add focused regression coverage for timeout abort reasons and terminal caller-abort fallback behavior.
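
The terminal-abort gate described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/agents/model-fallback.ts: `runWithModelFallbackSketch`, the `Candidate` type, and the `run` callback are hypothetical, and only the `isTerminalAbort` name and the two reason names (`TimeoutError`, `ClientDisconnectError`) come from this thread.

```typescript
// Hypothetical sketch; the real runWithModelFallback has a different, larger signature.
type Candidate = { provider: string; model: string };

// A caller abort counts as terminal when its reason is an Error named
// TimeoutError or ClientDisconnectError (the two names cited in this PR).
function isTerminalAbort(signal?: AbortSignal): boolean {
  if (!signal?.aborted) return false;
  const reason: unknown = signal.reason;
  return (
    reason instanceof Error &&
    (reason.name === "TimeoutError" || reason.name === "ClientDisconnectError")
  );
}

async function runWithModelFallbackSketch<T>(params: {
  candidates: Candidate[];
  abortSignal?: AbortSignal;
  run: (candidate: Candidate, signal?: AbortSignal) => Promise<T>;
}): Promise<T> {
  let lastError: unknown;
  for (const candidate of params.candidates) {
    try {
      return await params.run(candidate, params.abortSignal);
    } catch (err) {
      // Terminal caller abort: rethrow instead of trying the next candidate.
      if (isTerminalAbort(params.abortSignal)) throw err;
      // Ordinary provider failure: remember it and fall through to the next candidate.
      lastError = err;
    }
  }
  throw lastError ?? new Error("no fallback candidates configured");
}
```

In this sketch the check runs only after a candidate fails, which matches the described behavior of stopping after the active candidate observes the abort; ordinary provider errors still advance to the next candidate.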

Fixes #83962.
Refs #83963, #62682.

Verification

  • pnpm install after rebasing onto current origin/main to refresh missing post-rebase dependency rastermill; no repo diff retained.
  • node scripts/run-vitest.mjs src/agents/model-fallback.test.ts src/gateway/chat-abort.test.ts
  • node scripts/run-vitest.mjs src/agents/model-fallback.run-embedded.e2e.test.ts src/gateway/server-methods/agent.test.ts src/gateway/server.chat.gateway-server-chat.test.ts
  • git diff --check origin/main..HEAD
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Real behavior proof

Behavior addressed: A stale Control UI/WebChat chat.send maintenance timeout now becomes a terminal TimeoutError at the shared chat abort controller, and model fallback stops after the active candidate observes that terminal caller abort instead of cascading the same expired signal through every fallback candidate.

Real environment tested: Local macOS Codex worktree, branch meow/control-ui-webchat-timeout-fallback, commit c78e976b12d5, rebased on current origin/main 6ef0cbb94f38.

Exact steps or command run after this patch: Ran the focused unit tests for chat-abort and model-fallback, then ran the embedded fallback and gateway chat/agent server-method tests listed above.

Evidence after fix: Focused suite passed 5 files / 160 tests; broader suite passed 3 files / 275 tests; branch autoreview reported no accepted/actionable findings.

Observed result after fix: Timeout-derived chat aborts carry signal.reason.name === "TimeoutError", preserve abortStopReason === "timeout", and terminal caller aborts stop fallback after one failed candidate while ordinary provider failover remains covered by existing tests.

What was not tested: A live browser Control UI startup delay against real provider fallback was not run; this PR uses gateway/fallback boundary regressions for the shipped failure mode.

Copilot AI review requested due to automatic review settings May 26, 2026 23:56
@openclaw-barnacle openclaw-barnacle Bot added the gateway (Gateway runtime) and agents (Agent runtime and tooling) labels May 26, 2026
@openclaw-barnacle openclaw-barnacle Bot added the size: S and maintainer (Maintainer-authored PR) labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 26, 2026, 10:48 PM ET / 02:48 UTC.

Summary
The branch tags gateway chat maintenance timeouts with TimeoutError, threads admitted chat/reply abort signals into model fallback, stops terminal abort fallback attempts, and expands regression coverage.

PR surface: Source +62, Tests +255. Total +317 across 13 files.

Reproducibility: yes. The linked report has concrete WebChat logs showing every fallback candidate receiving the same timeout abort, and the patched source/tests now trace that caller abort signal through the fallback boundary; I did not run a live Control UI repro in this read-only review.

Review metrics: 1 noteworthy metric.

  • Fallback Stop Surface: 1 shared gate added, 4 fallback callers wired. The PR changes a shared fallback decision and threads caller abort state through multiple runtime entry points, so maintainers should review it as a compatibility-sensitive behavior change.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • This deliberately changes fallback compatibility for already-expired gateway/chat abort signals: configured fallback candidates stop being tried once the caller abort is terminal.
  • The related cron-prefix, embedded private runner, and compaction-signal cleanup remains in #62682 (fix(agents): distinguish terminal aborts from retryable failures (#60388)), so maintainers should keep the landing order clear.

Maintainer options:

  1. Land The Scoped Gateway Fix (recommended)
    Accept the changed terminal-abort fallback behavior for gateway/chat paths now and leave the cron/embedded cleanup to the related broader PR.
  2. Ask For Live WebChat Proof
    Require a real Control UI/WebChat timeout run showing the cascade stops after the first terminal abort before merge if maintainers want runtime proof beyond Testbox and focused regressions.
  3. Pause For The Broader PR
    Hold this branch and land only the broader terminal-abort PR if maintainers prefer one cross-surface compatibility decision.

Next step before merge
Human maintainer review/merge is appropriate because the protected maintainer label and fallback compatibility change require acceptance, and I found no narrow automated repair to make.

Security
Cleared: No security or supply-chain concern found; the diff stays in TypeScript runtime/test code and does not touch secrets, dependencies, workflows, install scripts, or package metadata.

Review details

Best possible solution:

Land this scoped gateway/WebChat timeout fix if maintainers accept the fallback compatibility change, then keep the broader terminal-abort sources in #62682.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked report has concrete WebChat logs showing every fallback candidate receiving the same timeout abort, and the patched source/tests now trace that caller abort signal through the fallback boundary; I did not run a live Control UI repro in this read-only review.

Is this the best way to solve the issue?

Yes for the scoped WebChat/gateway outage path. Tagging the timeout source and carrying the admitted run abort signal to the shared fallback boundary is narrower than changing provider timeout behavior, while broader abort sources remain separated for the related PR.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9119492f158a.

Label changes

Label changes:

  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The PR targets a user-visible Control UI/WebChat timeout cascade that can break agent replies for real users.
  • merge-risk: 🚨 compatibility: Merging changes when configured model fallbacks are skipped for terminal gateway/chat aborts, which can alter existing timeout and fallback behavior.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: This is a member/maintainer-labeled PR, so the external-contributor proof gate does not apply; the discussion still includes focused local and Testbox verification.
Evidence reviewed

PR surface:

Source +62, Tests +255. Total +317 across 13 files.

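| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 7 | 71 | 9 | +62 |
| Tests | 6 | 262 | 7 | +255 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 13 | 333 | 16 | +317 |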

What I checked:

  • Terminal abort fallback gate: At PR head, runWithModelFallback accepts an optional caller abort signal, recognizes TimeoutError and ClientDisconnectError reasons, and rethrows terminal abort failures before continuing to fallback candidates. (src/agents/model-fallback.ts:144, f2a46a31af2e)
  • WebChat/reply path signal plumbing: The rewritten auto-reply path computes the effective run abort signal and passes it both to runWithModelFallback and the CLI/embedded candidate runs, addressing the review note that the reported WebChat path previously missed the fallback boundary. (src/auto-reply/reply/agent-runner-execution.ts:1866, f2a46a31af2e)
  • Followup and memory fallback signal plumbing: The followup runner and memory flush runner now pass the admitted reply operation signal to the outer fallback orchestration and candidate runs, closing sibling paths around the same admitted reply operation. (src/auto-reply/reply/followup-runner.ts:627, f2a46a31af2e)
  • Gateway timeout reason tagging: abortChatRunById now aborts timeout cleanup with an Error named TimeoutError, giving downstream fallback code a concrete terminal abort reason while preserving non-timeout abort behavior. (src/gateway/chat-abort.ts:38, f2a46a31af2e)
  • Async gateway agent classification: Gateway agent dispatch now treats rejections as aborted only when its registered abort controller is already aborted, preserving provider timeout errors without a gateway abort as errors while reporting gateway timeout aborts as timeout results. (src/gateway/server-methods/agent.ts:621, f2a46a31af2e)
  • Regression coverage and maintainer proof: The PR adds tests for terminal caller-abort fallback behavior, chat timeout abort reasons, admitted reply signal propagation, and gateway timeout/error classification; the latest maintainer verification comment reports focused Vitest, Testbox changed gate tbx_01kskmhr99rp48k5gcn3pg7h8p, and clean autoreview. (src/gateway/server-methods/agent.test.ts:2691, f2a46a31af2e)
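
The timeout reason tagging noted for src/gateway/chat-abort.ts can be illustrated with a small sketch. The function name `abortChatRunSketch`, its parameters, and the error message are hypothetical; the thread only states that timeout cleanup aborts with an Error named TimeoutError and that non-timeout aborts keep their previous behavior, which is assumed here to be a plain `abort()` with the platform default reason.

```typescript
// Hypothetical sketch: the real abortChatRunById signature is not shown in this thread.
function abortChatRunSketch(controller: AbortController, stopReason: string): void {
  if (stopReason === "timeout") {
    // Tag the abort so downstream fallback code can recognize it as terminal.
    const reason = new Error("chat run timed out");
    reason.name = "TimeoutError";
    controller.abort(reason);
    return;
  }
  // Assumption: non-timeout aborts keep the default reason (an AbortError).
  controller.abort();
}

const timedOut = new AbortController();
abortChatRunSketch(timedOut, "timeout");
console.log(timedOut.signal.reason.name); // "TimeoutError"
```

Tagging at the source keeps the downstream check a simple name comparison on `signal.reason`, rather than requiring the fallback layer to infer why the signal fired.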

Likely related people:

  • Shakker: Current-main blame on the central fallback, gateway abort, and async gateway agent code points to commit 848c38907d, so Shakker is a useful history contact for the current implementation shape. (role: recent area contributor; confidence: medium; commits: 848c38907de1; files: src/agents/model-fallback.ts, src/gateway/chat-abort.ts, src/gateway/server-methods/agent.ts)
  • steipete: After identifying the WebChat path mismatch, steipete authored the rewrite commits on this branch and posted the latest focused/Testbox verification for the fixed gateway fallback path. (role: reviewer and PR rewrite author; confidence: high; commits: 8c0e233f3045, 30d323192124, db1d0cc17f08; files: src/auto-reply/reply/agent-runner-execution.ts, src/auto-reply/reply/followup-runner.ts, src/auto-reply/reply/agent-runner-memory.ts)
  • BunsDev: BunsDev authored the initial branch commit and explicitly scoped the maintainer preference to landing this narrow WebChat/chat-send timeout fix before trimming the broader related PR. (role: initial patch author and maintainer preference signal; confidence: medium; commits: b4e178cef054; files: src/agents/model-fallback.ts, src/gateway/chat-abort.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes WebChat/Control UI timeout handling so a stale chat.send maintenance timeout is treated as a terminal abort and does not cascade through every model fallback candidate.

Changes:

  • Tags chat timeout aborts with a TimeoutError abort reason.
  • Threads the caller abort signal into model fallback and stops fallback on terminal timeout/client-disconnect aborts.
  • Adds regression coverage and a changelog entry for the WebChat timeout cascade fix.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/gateway/chat-abort.ts | Adds timeout-specific abort reason tagging for chat run aborts. |
| src/gateway/chat-abort.test.ts | Verifies timeout aborts carry a TimeoutError reason. |
| src/agents/model-fallback.ts | Detects terminal caller aborts and stops fallback immediately. |
| src/agents/model-fallback.test.ts | Covers terminal timeout abort behavior through runWithModelFallback. |
| src/agents/agent-command.ts | Passes the agent command abort signal into fallback orchestration. |
| CHANGELOG.md | Documents the Control UI/WebChat timeout cascade fix. |

@clawsweeper clawsweeper Bot added the rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.) and status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.) labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Gilded Review Wisp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: sparkles near resolved comments.
Image traits: location diff observatory; accessory miniature diff map; palette moss green and polished brass; mood proud; pose sitting proudly on a smooth stone; shell glossy opal shell; lighting clean product lighting; background soft code-shaped tiles.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added the P1 (High-priority user-facing bug, regression, or broken workflow.) and merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.) labels May 27, 2026
@BunsDev

BunsDev commented May 27, 2026

Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 27, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@simonusa

Contributor

@BunsDev — heads up on overlap with #62682 (you have it in Refs:). Your isTerminalAbort(signal) in model-fallback.ts is the same shape #62682 lands (also a signal.reason.name === "TimeoutError" || === "ClientDisconnectError" check at the candidate boundary), and the chat-abort.ts source tagging is a clean upstream half I don't have in #62682.

Three additional terminal-abort sources #62682 covers that #87085 doesn't, in case useful for one combined landing:

  1. Cron timeout reason strings (src/cron/service/timer.ts timeoutErrorMessage/setupTimeoutErrorMessage/preExecutionTimeoutErrorMessage): these abort with plain strings ("cron: job execution timed out (last phase: ...)"), so signal.reason instanceof Error is false and isTerminalAbort returns false. #62682 adds a TERMINAL_ABORT_REASON_PREFIXES list + prefix-match check for the bare + phase-suffixed shapes.
  2. Embedded runner private runAbortController (closes #60388, "Don't trigger model fallback when abort reason is the run's own timeout budget"): scheduleAbortTimer aborts a private controller the fallback layer never sees; the rejection comes through as AbortError(cause: TimeoutError) via pi-embedded-runner/run/abortable.ts makeAbortError. #62682 adds isTerminalAbortFromError(err) with a module-private Symbol marker on abortable() so SDK consumers can't bypass the hook via the typed API.
  3. Compaction fallback: compactEmbeddedPiSessionDirect calls runWithModelFallback without abortSignal plumbed (flagged by @Lellansin on #62682). #62682 adds abortSignal: params.abortSignal at compact.ts:436.
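
The first item above, string abort reasons slipping past an Error-only check, can be sketched as follows. This is an illustration only: `isTerminalAbortReason` is a hypothetical name, and the prefix list holds a single entry taken from the quoted cron message because the actual contents of TERMINAL_ABORT_REASON_PREFIXES in #62682 are not shown in this thread.

```typescript
// Assumption: one prefix taken from the cron message quoted above; the real list may differ.
const TERMINAL_ABORT_REASON_PREFIXES = ["cron: job execution timed out"] as const;

// Hypothetical helper combining the Error-name check with a string prefix match.
function isTerminalAbortReason(reason: unknown): boolean {
  if (reason instanceof Error) {
    return reason.name === "TimeoutError" || reason.name === "ClientDisconnectError";
  }
  if (typeof reason === "string") {
    // Covers both the bare message and the phase-suffixed variant.
    return TERMINAL_ABORT_REASON_PREFIXES.some((prefix) => reason.startsWith(prefix));
  }
  return false;
}
```

A prefix match is used rather than equality because the cron messages carry a variable "(last phase: ...)" suffix.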

Happy to coordinate landing order — if #87085 merges first, I'll rebase #62682 and trim the chat-side isTerminalAbort duplicate, keeping only the cron-prefix / .cause-chain / compaction-signal pieces. Or if you'd prefer one combined PR, I can fold those into #87085 directly. Whichever you prefer.

@clawsweeper clawsweeper Bot added the rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) labels, and removed the rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.) and status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.) labels May 27, 2026
@BunsDev

BunsDev commented May 27, 2026

Member Author

@simonusa thanks for the overlap map. Maintainer preference here is to land #87085 first as the narrow WebChat/chat-send timeout fix for #83962.

The pieces I’d keep in #87085 are the chat timeout source tagging, the agent-command abort-signal plumbing, and the caller-signal terminal abort check at the model-fallback boundary. The three extra sources you called out are real, but I’d keep them in #62682 / #60388 rather than folding them into this P1 PR:

  • cron timeout reason strings / prefix matching
  • embedded private runAbortController surfaced as AbortError(cause: TimeoutError) with the marker/cause-chain guard
  • compaction fallback abortSignal: params.abortSignal plumbing

So if #87085 lands first, please rebase #62682 afterward and trim the duplicated isTerminalAbort(signal) / chat-side model-fallback caller-abort pieces, while keeping the cron-prefix, .cause-chain, and compaction-signal coverage. That should keep this PR shippable for the Control UI/WebChat outage path without losing the broader terminal-abort cleanup.

@steipete

Contributor

Blocking path mismatch found in review.

The core idea looks right, but this PR does not currently wire the new terminal-abort gate into the webchat path from #83962.

Evidence:

  • #83962's failure log is "Embedded agent failed before reply", which comes from src/auto-reply/reply/agent-runner-execution.ts.
  • The new runWithModelFallback({ abortSignal }) plumbing is only added in src/agents/agent-command.ts, which covers the agent-command path.
  • In src/auto-reply/reply/agent-runner-execution.ts, the top-level runWithModelFallback call still lacks the caller abort signal. The signal is only passed deeper into each candidate run via abortSignal: params.replyOperation?.abortSignal ?? params.opts?.abortSignal.
  • Because model-fallback.ts checks params.abortSignal at the fallback boundary, that webchat caller can still see an aborted candidate and continue to the next fallback with no terminal caller signal visible to the new gate.
  • The same pattern appears in src/auto-reply/reply/followup-runner.ts: the candidate run gets queued.abortSignal, but the surrounding runWithModelFallback call does not.
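
The mismatch listed above comes down to where the signal is passed. The sketch below is a deliberately simplified, synchronous stand-in for the fallback loop: `fallbackSketch`, `RunCandidate`, and the model names are hypothetical. It shows that a signal handed only to each candidate still lets every candidate be attempted, while the same signal handed to the fallback boundary stops after the first.

```typescript
// Hypothetical, synchronous reduction of the fallback loop to the call-site issue.
type RunCandidate = (model: string, candidateSignal: AbortSignal) => void;

function fallbackSketch(
  models: string[],
  run: RunCandidate,
  candidateSignal: AbortSignal,
  boundarySignal?: AbortSignal,
): string[] {
  const attempted: string[] = [];
  for (const model of models) {
    attempted.push(model);
    try {
      run(model, candidateSignal);
      return attempted;
    } catch {
      // The gate can only see the signal given to the fallback boundary.
      const reason: unknown = boundarySignal?.reason;
      if (reason instanceof Error && reason.name === "TimeoutError") break;
    }
  }
  return attempted;
}

const expired = new AbortController();
const timeoutError = new Error("chat run timed out");
timeoutError.name = "TimeoutError";
expired.abort(timeoutError);

const failIfAborted: RunCandidate = (_model, signal) => {
  if (signal.aborted) throw signal.reason;
};
const models = ["primary", "fallback-1", "fallback-2"];

// Signal reaches candidates only: every candidate is tried (the reported cascade).
const withoutBoundary = fallbackSketch(models, failIfAborted, expired.signal);
// Signal also reaches the boundary: the loop stops after the first candidate.
const withBoundary = fallbackSketch(models, failIfAborted, expired.signal, expired.signal);
console.log(withoutBoundary.length, withBoundary.length); // 3 1
```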

Requested fix:

I would not merge this as-is: the design is sound, but the current diff misses the user-reported path.

@steipete steipete force-pushed the meow/control-ui-webchat-timeout-fallback branch from d4dd9bd to f2a46a3 May 27, 2026 02:39
@steipete

Contributor

Verification after rewrite:

Behavior addressed: gateway/chat fallback timeout aborts now propagate the admitted run abort signal through fallback orchestration, memory flush fallback, CLI/embedded candidates, and async gateway agent terminal handling. Provider/runtime timeouts without the gateway abort signal still surface as errors instead of successful aborts.

Real environment tested: local macOS checkout plus Blacksmith Testbox changed gate.

Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-memory.test.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts src/agents/model-fallback.test.ts src/gateway/chat-abort.test.ts src/gateway/server-methods/agent.test.ts
  • pnpm check:changed
  • /Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Evidence after fix:

  • Focused Vitest: 10 files, 639 tests passed.
  • Testbox changed gate: tbx_01kskmhr99rp48k5gcn3pg7h8p / amber-lobster, exit 0.
  • Autoreview: clean, no accepted/actionable findings; overall patch is correct.

Observed result after fix: timeout-aborted fallback attempts stop instead of cascading, wrapped timeout aborts preserve timeout status/stopReason, and provider TimeoutError rejections without the gateway abort signal remain errors.

What was not tested: live provider timeout against a real remote provider.
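
The distinction drawn in this verification, gateway timeout aborts reported as timeouts versus provider TimeoutError rejections that remain errors, can be sketched as a classification on the gateway's own abort signal. The names `classifyRejectionSketch` and `RejectionOutcome` are hypothetical and the result shape is an assumption; the actual logic lives in src/gateway/server-methods/agent.ts and is not quoted in this thread.

```typescript
// Hypothetical result shape; the real gateway result types are not shown here.
type RejectionOutcome =
  | { status: "timeout" }
  | { status: "aborted" }
  | { status: "error"; error: unknown };

function classifyRejectionSketch(err: unknown, gatewaySignal: AbortSignal): RejectionOutcome {
  // No gateway abort: the rejection is a real failure, even if it is a provider TimeoutError.
  if (!gatewaySignal.aborted) return { status: "error", error: err };
  const reason: unknown = gatewaySignal.reason;
  // Gateway abort tagged as a timeout is reported as a timeout result.
  if (reason instanceof Error && reason.name === "TimeoutError") return { status: "timeout" };
  return { status: "aborted" };
}
```

Keying the decision on the registered controller's state, rather than on the shape of the rejected error, is what keeps a provider-side TimeoutError from being mistaken for a gateway abort in this sketch.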

@clawsweeper clawsweeper Bot added the rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected.) label and removed the rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) label May 27, 2026
@steipete steipete force-pushed the meow/control-ui-webchat-timeout-fallback branch from f2a46a3 to 8f42085 May 27, 2026 02:53
@steipete

Contributor

Verification after final rebase:

Behavior addressed: gateway/chat fallback timeout aborts now propagate the admitted run abort signal through fallback orchestration, memory flush fallback, CLI/embedded candidates, and async gateway agent terminal handling. Provider/runtime timeouts without the gateway abort signal still surface as errors instead of successful aborts.

Real environment tested: local macOS checkout plus Blacksmith Testbox changed gate.

Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-memory.test.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts src/agents/model-fallback.test.ts src/gateway/chat-abort.test.ts src/gateway/server-methods/agent.test.ts
  • pnpm check:test-types
  • pnpm check:changed
  • /Users/steipete/Projects/agent-skills/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Evidence after fix:

  • Focused Vitest: 10 files, 648 tests passed.
  • Test types: passed.
  • Testbox changed gate: tbx_01kskn9w6hwh5ys8fmmk399rgk / crimson-hermit, exit 0.
  • Autoreview: clean, no accepted/actionable findings; overall patch is correct.

Observed result after fix: timeout-aborted fallback attempts stop instead of cascading, wrapped timeout aborts preserve timeout status/stopReason, and provider TimeoutError rejections without the gateway abort signal remain errors.

What was not tested: live provider timeout against a real remote provider.

@openclaw-barnacle openclaw-barnacle Bot added the cli (CLI command changes) label May 27, 2026
@steipete steipete merged commit b4f6928 into main May 27, 2026
105 of 106 checks passed
@steipete steipete deleted the meow/control-ui-webchat-timeout-fallback branch May 27, 2026 02:54
simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 27, 2026
…alAbort (closes openclaw#60388)

simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 27, 2026
…alAbort (closes openclaw#60388)

PR openclaw#87085 landed the base isTerminalAbort(signal) check (TimeoutError /
ClientDisconnectError on signal.reason) plus abortSignal threading through
the model-fallback and chat-side callers. This change adds the coverage that
PR openclaw#87085 did not include:

- ClientDisconnectError class + wiring in http-common.ts so the
  reason.name === "ClientDisconnectError" branch PR openclaw#87085 added is actually
  reachable (upstream watchClientDisconnect still aborts bare).
- cron run-budget string reasons (prefix match) — cron timer aborts with a
  plain string, which the Error-only base check skips.
- .cause-chain walking + isTerminalAbortFromError gated on the
  OPENCLAW_ABORTABLE_WRAPPER marker, for the embedded run-budget timer that
  aborts a private controller (not the caller signal).
- compaction-path abortSignal forwarding (compactEmbeddedPiSessionDirect).
- timedOutByRunBudget plumbing through attempt/failover-policy/assistant-failover
  so run-budget timeouts skip the fallback chain and wasted compaction.
simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 28, 2026
…alAbort (closes openclaw#60388)

simonusa added a commit to simonusa/simons-openclaw that referenced this pull request May 28, 2026
…alAbort (closes openclaw#60388)

simonusa added a commit to simonusa/simons-openclaw that referenced this pull request Jun 3, 2026
…alAbort (closes openclaw#60388)

Labels

  • agents: Agent runtime and tooling
  • cli: CLI command changes
  • gateway: Gateway runtime
  • maintainer: Maintainer-authored PR
  • merge-risk: 🚨 compatibility: 🚨 May break existing users, config, migrations, defaults, or upgrade paths.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • rating: 🦞 diamond lobster: Very strong PR readiness with only minor maintainer review expected.
  • size: M
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

Development

Successfully merging this pull request may close these issues.

Control UI webchat request timeout aborts all fallback candidates before embedded Codex run starts

4 participants