fix(cron): use per-attempt abort controller so model fallback chain survives timeout by Sid-Qin · Pull Request #38149 · openclaw/openclaw

Sid-Qin · 2026-03-06T16:36:39Z

Summary

Problem: Cron job timeout aborts a shared AbortController. When the first model attempt times out, subsequent fallback attempts immediately fail because they receive an already-aborted signal.
Why it matters: Model fallback chains become useless under cron timeout — the first timeout kills all alternatives, preventing recovery with a different (possibly faster) model.
What changed: In the run callback passed to runWithModelFallback, create a fresh AbortController for each attempt. Forward the outer signal's abort event to the per-attempt controller, so new attempts start with a non-aborted signal while remaining cancellable. Added 2 unit tests.
What did NOT change: timer.ts timeout logic, runWithModelFallback loop, CLI agent path (uses its own timeout), outer abort semantics.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes Cron job timeout aborts entire model fallback chain via shared AbortController #37505

User-visible / Behavior Changes

Cron jobs with model fallback chains now correctly try alternative models after the primary times out, instead of aborting the entire chain.

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: macOS Darwin 25.3.0 (arm64)
Runtime: Node v25.6.1

Steps

Configure a cron job with a slow primary model and a fallback chain
Set a timeout that triggers before the primary completes

Expected

After primary times out, fallback models are attempted

Actual

Before fix: All fallback attempts fail immediately with "cron: job execution timed out" because they see the aborted signal
After fix: Each fallback attempt gets a fresh abort controller

Evidence

 ✓ src/cron/isolated-agent/run.abort-signal.test.ts (2 tests) 5ms
   ✓ passes a per-attempt abort signal to runEmbeddedPiAgent, not the outer signal
   ✓ forwards outer abort to the per-attempt signal

Human Verification (required)

Verified scenarios: fresh signal per attempt, outer abort forwarded, listener cleanup on success
Edge cases checked: outer already aborted at attempt start, multiple sequential attempts
What I did not verify: Live cron job with real model timeout + fallback

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery (if this breaks)

How to disable/revert: Revert this commit
Files/config to restore: src/cron/isolated-agent/run.ts
Known bad symptoms: If per-attempt controller leaks (unlikely — addEventListener with once: true), abort forwarding could be delayed

Risks and Mitigations

Low — the change is scoped to the run callback closure. The outer abort is still forwarded, so cancellation semantics are preserved. The only behavior change is that new fallback attempts get a fresh signal.

…urvives timeout The cron timeout handler aborts a shared AbortController. When the first model attempt times out, subsequent fallback attempts receive an already- aborted signal and fail immediately. Create a fresh AbortController for each fallback attempt and forward the outer abort to it, so new attempts start with a non-aborted signal while still being cancellable. Closes openclaw#37505

aisle-research-bot · 2026-03-06T16:36:48Z

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

#	Severity	Title
1	🔵 Low	AbortSignal listener leak and ignored pre-aborted signal in runCronIsolatedAgentTurn attempt runner

1. 🔵 AbortSignal listener leak and ignored pre-aborted signal in runCronIsolatedAgentTurn attempt runner

Property	Value
Severity	Low
CWE	CWE-772
Location	`src/cron/isolated-agent/run.ts:471-549`

Description

In runCronIsolatedAgentTurn, the per-attempt abort forwarding registers an abort event listener on the outer abortSignal, but the listener cleanup is not guaranteed on all exit paths.

Impacts:

Listener leak / retained closures: If the attempt exits via an exception/rejection (e.g., runEmbeddedPiAgent(...) throws) or via the runCliAgent(...) early-return path, the abort listener is never removed. Over multiple model fallback attempts and/or multiple prompts within the same cron run, this can accumulate listeners and retain AbortController/closure objects.
DoS / resource overuse risk: runWithModelFallback can invoke run(...) multiple times (per candidate model). Repeated failing attempts can therefore register multiple listeners without cleanup, growing memory use and adding abort propagation overhead.
Cancellation bypass (functional/security impact): The previous code threw immediately when abortSignal.aborted was already true. The new code skips listener registration when already aborted, but also does not abort the per-attempt controller; the attempt proceeds with a non-aborted attemptSignal, potentially continuing expensive work after the job timed out/canceled.

Vulnerable code (listener added, but removed only on one successful path):

const attemptController = new AbortController();
const attemptSignal = attemptController.signal;
const forwardAbort = () => attemptController.abort(abortSignal?.reason);
if (abortSignal && !abortSignal.aborted) {
  abortSignal.addEventListener("abort", forwardAbort, { once: true });
}
...
if (isCliProvider(providerOverride, cfgWithAgentDefaults)) {
  const result = await runCliAgent({ ... });
  ...
  return result; // <- listener is never removed
}
const result = await runEmbeddedPiAgent({ ..., abortSignal: attemptSignal, ... });
...
abortSignal?.removeEventListener("abort", forwardAbort);
return result;

Why this can repeat:

runWithModelFallback loops through fallback candidates and calls params.run(...) for each attempt until one succeeds or all fail, so the above run callback may execute multiple times per cron turn.

Recommendation

Ensure listener cleanup and correct abort propagation on all exit paths by using try/finally, and immediately abort the per-attempt controller when the outer signal is already aborted.

Suggested pattern:

const attemptController = new AbortController();
const attemptSignal = attemptController.signal;
const forwardAbort = () => attemptController.abort(abortSignal?.reason);

if (abortSignal?.aborted) {
  attemptController.abort(abortSignal.reason);
} else if (abortSignal) {
  abortSignal.addEventListener("abort", forwardAbort, { once: true });
}

try {
  if (isCliProvider(providerOverride, cfgWithAgentDefaults)) {
    return await runCliAgent({ /* ... */ });
  }

  const result = await runEmbeddedPiAgent({
    /* ... */
    abortSignal: attemptSignal,
  });
  return result;
} finally {
  abortSignal?.removeEventListener("abort", forwardAbort);
}

This guarantees:

no leaked listeners on exceptions or early returns
per-attempt signal is aborted immediately if the outer job was already canceled/timed out

Analyzed PR: #38149 at commit b02874f

_{Last updated on: 2026-03-06T17:02:08Z}

greptile-apps · 2026-03-06T16:40:37Z

Greptile Summary

This PR correctly fixes a bug where the shared AbortController used by the cron timeout would abort the first model attempt and immediately fail all subsequent fallback attempts.

The fix: Creates a fresh AbortController per runWithModelFallback attempt with addEventListener("abort", ..., { once: true }) to forward the outer signal's abort event. This allows fallback models to start with a clean, non-aborted signal while remaining cancellable by the outer timeout.

Why it works: Each attempt gets its own controller. When the outer signal aborts, the { once: true } listener fires once and self-removes, forwarding the abort to the current attempt's controller. The loop in runWithModelFallback then proceeds to the next fallback with a new controller.

Tests: Well-structured coverage of signal isolation (confirms per-attempt controller is not the outer signal) and abort forwarding behavior.

Confidence Score: 5/5

The fix correctly implements per-attempt abort controllers that preserve fallback chain functionality under timeout.
The core fix is sound: fresh AbortController per attempt with proper signal forwarding ensures fallback attempts start with non-aborted signals while remaining cancellable by the outer timeout. Listener cleanup via { once: true } is sufficient for the job-scoped lifetime. Tests validate signal isolation and forwarding behavior. No functional issues identified.
No files require special attention

_{Last reviewed commit: b02874f}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b02874ff92

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-06T16:41:09Z

+          if (abortSignal && !abortSignal.aborted) {
+            abortSignal.addEventListener("abort", forwardAbort, { once: true });


Abort per-attempt signal when outer timeout is already set

This logic only forwards cancellation when abortSignal is not yet aborted, so if the cron timeout fires between fallback attempts (or the signal is already aborted on entry), the new attemptSignal stays live and the next model attempt still runs. In executeJobCoreWithTimeout (src/cron/service/timer.ts), that outer signal is the job deadline mechanism, so this change can let timed-out jobs keep executing model/tool work (and potentially send side effects) after the scheduler has already treated the run as timed out.

Useful? React with 👍 / 👎.

openclaw-barnacle Bot added size: S experienced-contributor labels Mar 6, 2026

chatgpt-codex-connector Bot reviewed Mar 6, 2026

View reviewed changes

thewilloftheshadow closed this Mar 6, 2026

frankbuild mentioned this pull request Mar 10, 2026

[Bug]: Isolated cron agentTurn jobs hang until timeout — providers reachable, interactive sessions unaffected #42464

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cron): use per-attempt abort controller so model fallback chain survives timeout#38149

fix(cron): use per-attempt abort controller so model fallback chain survives timeout#38149
Sid-Qin wants to merge 1 commit intoopenclaw:mainfrom
Sid-Qin:fix/37505-cron-timeout-per-attempt-abort

Sid-Qin commented Mar 6, 2026

Uh oh!

aisle-research-bot Bot commented Mar 6, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Mar 6, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if (abortSignal && !abortSignal.aborted) {
		abortSignal.addEventListener("abort", forwardAbort, { once: true });

Uh oh!

Conversation

Sid-Qin commented Mar 6, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

aisle-research-bot Bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Aisle Security Analysis

1. 🔵 AbortSignal listener leak and ignored pre-aborted signal in runCronIsolatedAgentTurn attempt runner

Description

Recommendation

Uh oh!

greptile-apps Bot commented Mar 6, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aisle-research-bot Bot commented Mar 6, 2026 •

edited

Loading