Skip to content

fix(cron): cancel active cron task runs#90671

Closed
849261680 wants to merge 1 commit into
openclaw:mainfrom
849261680:codex/fix-90630-cron-task-cancel
Closed

fix(cron): cancel active cron task runs#90671
849261680 wants to merge 1 commit into
openclaw:mainfrom
849261680:codex/fix-90630-cron-task-cancel

Conversation

@849261680

@849261680 849261680 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #90630.

  • routes openclaw tasks cancel for cron task records through the live Gateway tasks.cancel method, because the active cron abort signal is process-owned
  • registers cancellable handles for active isolated cron agent runs and finalizes the existing task record through the task registry cancellation path
  • keeps main-session cron jobs explicitly non-cancellable from the cron task row, because those jobs enqueue work into the target session rather than owning an isolated run
  • preserves the cron timeout watchdog after operator cancellation so a runner that ignores abort still gets timeout cleanup

This is an alternative focused fix to #90666 and #90669. #90669 only routes cron through the ACP branch. #90666 is closer, but it couples cancellation state to the duplicate-run active job marker and returns early from the timeout trigger after operator abort, which can remove the watchdog cleanup path for runners that ignore abort.

Verification

  • node scripts/run-vitest.mjs src/tasks/task-executor.test.ts src/gateway/server-methods/tasks.test.ts src/cron/service/timer.test.ts src/commands/tasks.test.ts
  • node scripts/run-vitest.mjs src/cli/program/register.status-health-sessions.test.ts src/commands/tasks.test.ts
  • pnpm exec oxfmt --check --threads=1 src/tasks/cron-task-cancel.ts src/tasks/task-registry.ts src/cron/service/timer.ts src/cron/service/ops.ts src/commands/tasks.ts src/tasks/task-executor.test.ts src/gateway/server-methods/tasks.test.ts src/cron/service/timer.test.ts src/commands/tasks.test.ts
  • git diff --check
  • .agents/skills/autoreview/scripts/autoreview --mode local

Real behavior proof

Behavior addressed: a running isolated cron task listed by the task ledger can now be cancelled through the Gateway-owned cron runtime instead of falling through to Task runtime does not support cancellation yet.

Real environment tested: local OpenClaw source checkout on this branch, exercising the real task ledger, active cron cancellation handle, and cancellation executor in a non-test Node process with an isolated OPENCLAW_STATE_DIR.

Exact steps or command run after this patch: OPENCLAW_STATE_DIR=$(mktemp -d /tmp/openclaw-cron-proof-XXXXXX) node --import tsx -e '<script creating a running cron task row, registering its active cron abort handle, then calling cancelDetachedTaskRunById>'.

Evidence after fix:

stateDir=/tmp/openclaw-cron-proof-gmSRfe
before taskId=5a77ee32-509f-48de-ac6d-3cdf5b6c622b runtime=cron status=running runId=cron:proof-cron-job:live
cancel found=true cancelled=true reason=
signal aborted=true reason=Cancelled by operator.
after taskId=5a77ee32-509f-48de-ac6d-3cdf5b6c622b runtime=cron status=cancelled error=Cancelled by operator.

Observed result after fix: the running cron task row was cancelled, the underlying AbortSignal was aborted with Cancelled by operator., and the task ledger status became cancelled with the same terminal error. Focused regression shards also passed and cover Gateway cancellation of an active cron handle, childless active cron task cancellation, explicit non-cancellable main-session cron behavior, CLI Gateway routing, and watchdog timeout cleanup after operator abort.

What was not tested: a live installed openclaw gateway smoke on this laptop was blocked by an already-running local Gateway/dist startup timeout, and Testbox-through-Crabbox was blocked by the local crabbox binary failing basic sanity checks before lease creation.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime commands Command implementations size: M triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 5, 2026
@clawsweeper

clawsweeper Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 5, 2026, 9:08 AM ET / 13:08 UTC.

Summary
The PR adds a process-local active cron cancellation registry, routes CLI cron cancellations through Gateway tasks.cancel, passes abort controllers into cron runs, and adds regression coverage for scheduled/manual cron cancellation and watchdog cleanup.

PR surface: Source +118, Tests +297. Total +415 across 9 files.

Reproducibility: yes. by source inspection: current main routes cron task cancellation through the generic non-CLI branch and falls through to the unsupported-runtime response. I did not run a live current-main cron job in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Run a live Gateway smoke that cancels an active isolated cron task through openclaw tasks cancel.

Risk before merge

  • [P1] The PR body says a full installed openclaw gateway smoke and Crabbox/Testbox proof were blocked, so the exact live CLI-to-running-Gateway cancellation path still needs maintainer-side verification before landing.

Maintainer options:

  1. Run a live Gateway cancel smoke (recommended)
    Before landing, verify an active isolated cron task can be cancelled through openclaw tasks cancel against a running Gateway, covering the process-bound RPC path not fully proven in the PR body.
  2. Accept the focused proof gap
    Maintainers can accept the supplied non-test Node-process proof plus focused regression tests if the live Gateway smoke remains blocked and the remaining risk is acceptable.
  3. Use a sibling fix instead
    If maintainers prefer the design in fix(cron): cancel active cron task runs #90666 or another branch, pause or close this PR after the chosen implementation lands.

Next step before merge

  • [P2] No automated code repair is needed; maintainers should choose the canonical fix among the open sibling PRs and decide whether the live Gateway smoke gap must be closed before merge.

Security
Cleared: The diff does not add dependencies, workflow changes, credential handling, downloads, or new external code execution paths beyond existing Gateway RPC and in-process abort handling.

Review details

Best possible solution:

Land one focused Gateway-owned cron cancellation implementation that aborts active isolated cron runs, leaves main-session cron rows non-cancellable with clear operator guidance, preserves timeout cleanup, and then close the sibling attempts after merge.

Do we have a high-confidence way to reproduce the issue?

Yes by source inspection: current main routes cron task cancellation through the generic non-CLI branch and falls through to the unsupported-runtime response. I did not run a live current-main cron job in this read-only review.

Is this the best way to solve the issue?

Yes, this appears to be the best narrow fix: the Gateway owns the live cron AbortController, so routing CLI cancellation to Gateway and registering process-local cron handles is cleaner than treating cron as an ACP child-session branch. A live Gateway smoke would still reduce merge risk.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 520992a1defd.

Label changes

Label changes:

  • add P2: This is a normal-priority user-facing bug fix for cancelling runaway cron task records with limited blast radius.
  • add merge-risk: 🚨 availability: The diff changes active cron abort and timeout cleanup behavior, where a missed live-process edge could leave cron runs stuck or incorrectly cleaned up.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a non-test Node process using an isolated state dir, a real task ledger row, active cron abort handle, and cancellation executor; it does not include the full installed Gateway smoke.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal output from a non-test Node process using an isolated state dir, a real task ledger row, active cron abort handle, and cancellation executor; it does not include the full installed Gateway smoke.

Label justifications:

  • P2: This is a normal-priority user-facing bug fix for cancelling runaway cron task records with limited blast radius.
  • merge-risk: 🚨 availability: The diff changes active cron abort and timeout cleanup behavior, where a missed live-process edge could leave cron runs stuck or incorrectly cleaned up.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal output from a non-test Node process using an isolated state dir, a real task ledger row, active cron abort handle, and cancellation executor; it does not include the full installed Gateway smoke.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a non-test Node process using an isolated state dir, a real task ledger row, active cron abort handle, and cancellation executor; it does not include the full installed Gateway smoke.
Evidence reviewed

PR surface:

Source +118, Tests +297. Total +415 across 9 files.

View PR surface stats
Area Files Added Removed Net
Source 5 134 16 +118
Tests 4 299 2 +297
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 9 433 18 +415

What I checked:

  • Current-main failure path: On current main, cancelTaskById only handles CLI, ACP, subagent, and childless Codex-native subagent task cases; a cron task with a child session falls through to Task runtime does not support cancellation yet.. (src/tasks/task-registry.ts:2082, 520992a1defd)
  • Gateway-owned cancellation implementation: The merge result adds cron-specific cancellation before the existing child-session runtime branches, aborting a registered active cron handle by task runId and otherwise returning a clear Gateway ownership/main-session explanation. (src/tasks/task-registry.ts:2084, 2b70f448a451)
  • Process-local active handle registry: The new helper keeps only live process-local abort controllers keyed by cron run id, with release cleanup that avoids deleting a newer controller for the same run id. (src/tasks/cron-task-cancel.ts:7, 2b70f448a451)
  • CLI-to-Gateway route: The tasks CLI now routes cron cancellation through live Gateway RPC before falling back to the local detached task executor for non-cron runtimes, matching the process-owned abort-controller boundary. (src/commands/tasks.ts:89, 2b70f448a451)
  • Cron runtime registration: Scheduled, startup catch-up, and manual cron execution paths register active isolated cron runs with the same abort controller passed to executeJobCoreWithTimeout; main-session cron rows intentionally do not register as cancellable. (src/cron/service/timer.ts:847, 2b70f448a451)
  • Regression coverage: The PR adds focused tests for CLI Gateway routing, Gateway tasks.cancel, active isolated cron cancellation, non-cancellable main-session cron rows, and timeout cleanup after operator abort. (src/cron/service/timer.test.ts:278, 2b70f448a451)

Likely related people:

  • Vincent Koc: Recent history shows substantial task-ledger, task-flow, cancellation seam, and cron detached dispatch work in the affected task/runtime paths. (role: feature-history owner; confidence: high; commits: 53bcd5769e, e57b3618fc, c52fac836c; files: src/tasks/task-registry.ts, src/tasks/task-executor.ts, src/cron/service/timer.ts)
  • Peter Steinberger: Recent history shows repeated cron/task reconciliation, command-test, and async task status work across the touched command, cron, and task files. (role: recent area contributor; confidence: medium; commits: 7d1575b5df, 5a42355d54, 310b5e4f6a; files: src/cron/service/timer.ts, src/commands/tasks.ts, src/tasks/task-registry.ts)
  • Mariano: Recent history includes TaskFlow substrate work and cron next-run/backoff fixes near the same cron and task ownership boundary. (role: adjacent cron/task contributor; confidence: medium; commits: 8bdca2323d, 2fa4c7cc61, 190a4b4869; files: src/cron/service/timer.ts, src/tasks/task-registry.ts)
  • Neerav Makwana: History shows the existing CLI task-cancel behavior for stuck background tasks was added in nearby command/task cancellation paths. (role: adjacent cancellation contributor; confidence: medium; commits: 7f714609f7; files: src/commands/tasks.ts, src/tasks/task-registry.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 5, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels Jun 5, 2026
@849261680

Copy link
Copy Markdown
Contributor Author

Closing this PR as superseded by current main.

While trying to rebase it, the conflicts were all in the cron cancellation implementation surfaces. Current main already contains the same active cron task cancellation path plus follow-up fixes:

  • c3cdd4971b fix(cron): cancel active cron task runs
  • c13802c912 fix(cron): preserve timeout cleanup after cancel
  • 372f85d368 fix: avoid cron cancel runtime cycle
  • 93313c95a5 fix: preserve cron timeout terminal state
  • 24196e05f5 fix: unwind timeout-disabled cron cancellation
  • 9082233a43 fix: unblock timed cron cancellation

The current main implementation has the process-local cron cancellation handle in src/tasks/cron-task-cancel.ts and the cron runtime cancellation branch in src/tasks/task-registry.ts, so keeping this PR open would mean replaying a stale overlapping patch rather than landing the canonical implementation.

Issue #90630 is still open; maintainers can close it separately if the current main behavior is accepted as the canonical fix.

@849261680 849261680 closed this Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands Command implementations gateway Gateway runtime merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Cron task appears in task ledger but cannot be cancelled

1 participant