Skip to content

fix(cron): cancel timed-out runs before side effects#22411

Merged
Takhoffman merged 2 commits intomainfrom
fix-cron-timeout-side-effects
Feb 22, 2026
Merged

fix(cron): cancel timed-out runs before side effects#22411
Takhoffman merged 2 commits intomainfrom
fix-cron-timeout-side-effects

Conversation

@Takhoffman
Copy link
Copy Markdown
Contributor

@Takhoffman Takhoffman commented Feb 21, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: Cron timer timeouts marked runs as failed, but isolated execution could still finish and emit downstream side effects (announce/main-session summary).
  • Why it matters: This can produce inconsistent outcomes (error state + late delivery) and duplicate/confusing notifications.
  • What changed: Added abort-signal plumbing from timer timeout into cron isolated execution path and guard checks before post-run side effects; wired signal through gateway cron service.
  • What did NOT change (scope boundary): No scheduler concurrency behavior changes; no CLI/API surface changes beyond internal optional signal threading.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Timed-out cron isolated runs are now prevented from emitting late follow-up side effects after timeout.
  • Reduces cases where users see timeout/error while delayed follow-up delivery still appears later.

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS (darwin-arm64)
  • Runtime/container: Node + Vitest
  • Model/provider: N/A (unit/integration tests)
  • Integration/channel (if any): Cron service internals
  • Relevant config (redacted): default test config

Steps

  1. Create an isolated cron job with a very small timeout and delayed runner completion.
  2. Trigger scheduler tick.
  3. Confirm timed-out status and absence of follow-up enqueue side effects.

Expected

  • Run is marked timed out/error and does not emit late follow-up side effects.

Actual

  • Matches expected after fix.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • pnpm vitest run src/cron/service.issue-regressions.test.ts src/cron/service.rearm-timer-when-running.test.ts src/cron/service.restart-catchup.test.ts --maxWorkers=1
  • pnpm vitest run src/cron/service.delivery-plan.test.ts src/cron/service.runs-one-shot-main-job-disables-it.test.ts --maxWorkers=1
  • Edge cases checked:
  • timeoutSeconds=0 semantics remain intact
  • existing timer rearm/restart catch-up behavior still passes
  • What you did not verify:
  • Full live provider/channel end-to-end with real network delivery

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
  • Revert this PR commit from cron timeout-cancellation branch.
  • Files/config to restore:
  • src/cron/service/state.ts, src/cron/service/timer.ts, src/cron/isolated-agent/run.ts, src/gateway/server-cron.ts, src/cron/service.issue-regressions.test.ts
  • Known bad symptoms reviewers should watch for:
  • Timed-out jobs still emitting late announce/main-session summaries.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Over-cancellation could suppress legitimate post-run side effects in borderline timeout races.
    • Mitigation:
    • Abort checks are limited to timeout path and covered with a dedicated regression test for side-effect suppression after timeout.

Greptile Summary

Adds abort signal plumbing to cancel timed-out cron jobs before they emit downstream side effects (announce flows and main-session summaries). The implementation threads an AbortSignal from the timer timeout through executeJobCore, runIsolatedAgentJob, runCronIsolatedAgentTurn, and into deliverOutboundPayloads and runSubagentAnnounceFlow. Guard checks (if (signal?.aborted)) are placed before all side-effect operations to prevent late deliveries after timeout.

Key changes:

  • timer.ts: Creates AbortController on timeout, passes signal to executeJobCore, adds abort checks in main-session wake loop and after isolated job completion
  • isolated-agent/run.ts: Threads signal through execution, adds abort guards before structured delivery and announce flow
  • subagent-announce.ts: Adds abort checks at the start of delivery functions to short-circuit when already timed out
  • state.ts and server-cron.ts: Updates type signatures to accept optional signal parameter

The PR includes comprehensive regression tests covering timeout side-effect suppression for both isolated execution and structured outbound delivery paths.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-designed with proper abort signal plumbing throughout the execution chain, strategic guard checks before all side effects, thorough test coverage including regression tests for the specific bug, and backwards compatibility (signal parameter is optional everywhere). The changes are scoped to the timeout cancellation path only and include proper error normalization.
  • No files require special attention

Last reviewed commit: e4d9061

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: S maintainer Maintainer-authored PR labels Feb 21, 2026
@Takhoffman Takhoffman marked this pull request as ready for review February 21, 2026 04:21
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Feb 21, 2026

Additional Comments (2)

src/cron/isolated-agent/run.ts
Potential race: if timeout fires while deliverOutboundPayloads is executing, delivery completes despite timeout. Add abort check before this call:

        if (payloadsForDelivery.length > 0) {
          if (isAborted()) {
            return withRunSession({ status: "error", error: abortReason(), ...telemetry });
          }
          const deliveryResults = await deliverOutboundPayloads({
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/isolated-agent/run.ts
Line: 654

Comment:
Potential race: if timeout fires while `deliverOutboundPayloads` is executing, delivery completes despite timeout. Add abort check before this call:

```suggestion
        if (payloadsForDelivery.length > 0) {
          if (isAborted()) {
            return withRunSession({ status: "error", error: abortReason(), ...telemetry });
          }
          const deliveryResults = await deliverOutboundPayloads({
```

How can I resolve this? If you propose a fix, please make it concise.

src/cron/isolated-agent/run.ts
Potential race: if timeout fires while runSubagentAnnounceFlow is executing, announce completes despite timeout. Add abort check before this call:

      try {
        if (isAborted()) {
          return withRunSession({ status: "error", error: abortReason(), ...telemetry });
        }
        const didAnnounce = await runSubagentAnnounceFlow({
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/isolated-agent/run.ts
Line: 747

Comment:
Potential race: if timeout fires while `runSubagentAnnounceFlow` is executing, announce completes despite timeout. Add abort check before this call:

```suggestion
      try {
        if (isAborted()) {
          return withRunSession({ status: "error", error: abortReason(), ...telemetry });
        }
        const didAnnounce = await runSubagentAnnounceFlow({
```

How can I resolve this? If you propose a fix, please make it concise.

@Takhoffman Takhoffman marked this pull request as draft February 21, 2026 04:35
@Takhoffman Takhoffman force-pushed the fix-cron-timeout-side-effects branch from 76fd2f8 to e4d9061 Compare February 21, 2026 04:40
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M and removed size: S labels Feb 21, 2026
@Takhoffman Takhoffman marked this pull request as ready for review February 22, 2026 21:28
@Takhoffman Takhoffman force-pushed the fix-cron-timeout-side-effects branch from e4d9061 to 000d28c Compare February 22, 2026 21:34
@openclaw-barnacle openclaw-barnacle Bot removed the gateway Gateway runtime label Feb 22, 2026
@Takhoffman Takhoffman force-pushed the fix-cron-timeout-side-effects branch from 000d28c to 0b922aa Compare February 22, 2026 21:36
@Takhoffman Takhoffman force-pushed the fix-cron-timeout-side-effects branch from 88e6a50 to 7cc1442 Compare February 22, 2026 21:42
@Takhoffman Takhoffman merged commit 556af3f into main Feb 22, 2026
24 of 25 checks passed
@Takhoffman Takhoffman deleted the fix-cron-timeout-side-effects branch February 22, 2026 21:45
@Takhoffman
Copy link
Copy Markdown
Contributor Author

PR #22411 - fix(cron): cancel timed-out runs before side effects (#22411)

Merged via squash.

  • Merge commit: 556af3f
  • Verified: pnpm check
  • Verified: pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1
  • Changes made:
    • Rebased PR branch multiple times to latest main and resolved conflicts in cron timeout cancellation paths.
    • Preserved abort propagation through isolated cron execution (abortSignal compatibility + timeout error semantics).
    • Added/kept regression coverage for timeout side-effect suppression and manual timeout runs.
    • Fixed unrelated CI type regressions in tests so pnpm check passes (src/agents/pi-tools.read.workspace-root-guard.test.ts, src/infra/outbound/outbound-send-service.test.ts, src/node-host/invoke-system-run.test.ts).
  • Why these changes were made:
    • Keep this PR mergeable and behavior-safe while main moved quickly, and remove non-PR typecheck blockers preventing CI green.

Thanks @Takhoffman!

carlosrivera pushed a commit to myascendai/meshiclaw that referenced this pull request Feb 23, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
gabrielkoo pushed a commit to gabrielkoo/openclaw that referenced this pull request Feb 23, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
mreedr pushed a commit to mreedr/openclaw-custom that referenced this pull request Feb 24, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Yuyang-0 pushed a commit to Yuyang-0/openclaw that referenced this pull request Feb 26, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
mylukin pushed a commit to mylukin/openclaw that referenced this pull request Feb 26, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
r4jiv007 pushed a commit to r4jiv007/openclaw that referenced this pull request Feb 28, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
… thanks @Takhoffman

Verified:
- pnpm check
- pnpm vitest run src/memory/qmd-manager.test.ts src/cron/service.issue-regressions.test.ts src/cron/isolated-agent.delivers-response-has-heartbeat-ok-but-includes.test.ts --maxWorkers=1

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant