
feat: Deleting scheduled tasks also clears tasks in the queue. #64110

Open
Lutz2015 wants to merge 6 commits into openclaw:main from Lutz2015:fix/cron-job-remove

Conversation

@Lutz2015

Summary

  • Problem: Deleting a scheduled task did not stop already queued executions, so Feishu channel messages could still be delivered after the task was deleted.
  • Why it matters: This caused unexpected post-deletion notifications, inconsistent scheduler behavior, and reduced user trust because a deleted task was expected to stop all future deliveries.
  • What changed: When a scheduled task is deleted, the system now also removes its queued/pending executions to prevent any further Feishu pushes from that task.
  • What did NOT change (scope boundary): This PR does not change scheduling semantics, delivery logic for active tasks, retry policy, or non-related channel behavior beyond preventing queued executions from firing after deletion.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Scheduled task deletion removed the task definition, but did not clean up executions that had already been queued by the scheduler.
  • Missing detection / guardrail: There was no deletion-time cleanup for pending/queued jobs tied to the deleted schedule, and no regression coverage verifying that deleted tasks stop downstream channel delivery.
  • Contributing context (if known): The scheduler and delivery pipeline were decoupled, so once a job entered the queue it could still be consumed and delivered even after the parent schedule was deleted.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: Scheduler deletion / queued job cleanup test for scheduled-task lifecycle.
  • Scenario the test should lock in: Create a scheduled task, allow one or more executions to enter the queue, delete the task, then verify queued executions are removed and no Feishu message is sent afterward.
  • Why this is the smallest reliable guardrail: The bug is caused by interaction between schedule deletion and the queued execution pipeline, so an integration-level test is the smallest reliable coverage.
  • Existing test that already covers this (if any): None known.
  • If no new test is added, why not: The cleanup behavior was verified manually against the scheduler queue and the Feishu delivery result.
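The scenario above can be sketched as a minimal test. `FakeScheduler` is an illustrative stand-in, not the project's real scheduler or test harness:

```typescript
// Hedged sketch of the scenario the regression test should lock in;
// FakeScheduler is an assumed, simplified stand-in for the cron service.
type Delivery = { jobId: string };

class FakeScheduler {
  jobs = new Set<string>();
  queue: string[] = [];
  delivered: Delivery[] = [];

  create(id: string): void {
    this.jobs.add(id);
  }

  enqueue(id: string): void {
    if (this.jobs.has(id)) this.queue.push(id);
  }

  // Deletion must clear queued executions too, not just the definition.
  remove(id: string): void {
    this.jobs.delete(id);
    this.queue = this.queue.filter((q) => q !== id);
  }

  // Consume the queue; anything still queued gets "delivered".
  drain(): void {
    for (const id of this.queue) this.delivered.push({ jobId: id });
    this.queue = [];
  }
}

// Scenario: create -> enqueue -> delete -> drain -> nothing delivered.
const s = new FakeScheduler();
s.create("task-1");
s.enqueue("task-1");
s.remove("task-1");
s.drain();
```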

User-visible / Behavior Changes

  • Deleting a scheduled task now also cancels/removes its queued pending executions.
  • After deletion, Feishu channel messages from that deleted task should no longer be delivered.
  • No change for active, non-deleted scheduled tasks.
Before:
[delete scheduled task] -> [task definition removed]
                           [queued job remains] -> [Feishu push still sent]

After:
[delete scheduled task] -> [task definition removed]
                           -> [queued jobs removed]
                           -> [no further Feishu push]
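The "After" flow can be sketched as a deletion that filters both the job list and the queue. `Store` and its field names are assumptions for illustration, not openclaw's actual cron store types:

```typescript
// Illustrative only: Store, jobs, and queue are assumed shapes.
type QueuedExecution = { jobId: string; runAt: number };

interface Store {
  jobs: { id: string }[];
  queue: QueuedExecution[];
}

// Remove the task definition AND any queued executions tied to it,
// so nothing left in the queue can fire after deletion.
function removeScheduledTask(store: Store, taskId: string): void {
  store.jobs = store.jobs.filter((j) => j.id !== taskId);
  store.queue = store.queue.filter((q) => q.jobId !== taskId);
}

// Example: deleting "a" clears both its definition and its queued run.
const store: Store = {
  jobs: [{ id: "a" }, { id: "b" }],
  queue: [{ jobId: "a", runAt: 1 }, { jobId: "b", runAt: 2 }],
};
removeScheduledTask(store, "a");
```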

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS / Linux
  • Runtime/container: project default runtime
  • Model/provider: N/A
  • Integration/channel (if any): Feishu
  • Relevant config (redacted): scheduler enabled, Feishu delivery configured

Steps

  1. Create a scheduled task configured to send messages to the Feishu channel.
  2. Wait until at least one execution from that task is queued/pending.
  3. Delete the scheduled task before the queued execution is consumed.
  4. Observe whether any Feishu message is still delivered from the deleted task.

Expected

  • After deleting the scheduled task, any queued/pending executions associated with it are also removed.
  • No Feishu message is delivered from that deleted task.

Actual

  • Before this fix, deleting the scheduled task removed only the task definition.
  • Queued executions still ran and Feishu messages were still delivered.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:

    • Created a scheduled task with Feishu delivery.
    • Confirmed executions could enter the queue.
    • Deleted the scheduled task.
    • Verified queued executions were cleared and no further Feishu message was sent.
  • Edge cases checked:

    • Deleting a task with no queued executions.
    • Deleting a task with queued but not yet consumed executions.

  • What you did not verify:

    • Full cross-channel behavior for all integrations.
    • Large-scale concurrency/performance behavior under heavy queue load.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: Cleanup logic could remove the wrong queued jobs if task-to-queue mapping is too broad.
    Mitigation: Restrict deletion cleanup to queued executions explicitly associated with the deleted scheduled task ID.
  • Risk: Race condition between deletion and queue consumption may allow an in-flight job to execute.
    Mitigation: Best-effort queue cleanup plus validation before delivery/execution if task existence is checked downstream.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR adds a jobExists check after executeJobCoreWithTimeout returns in both the timed (runDueJob in timer.ts) and manual (finishPreparedManualRun in ops.ts) execution paths, marking results as "skipped" when the job was deleted during execution.

  • The check is post-execution: by the time jobExists is called, executeJobCoreWithTimeout has already returned and any Feishu delivery has already occurred. The PR description's claim that "queued executions are removed and no Feishu message is sent afterward" does not match the implementation.
  • The timer-path check is redundant: applyOutcomeToStoredJob already handles deleted jobs gracefully (force-reload + discard if not found) without an extra lock acquisition.
  • The original return statement in runDueJob was commented out instead of deleted, leaving dead code in the committed diff.

Confidence Score: 2/5

Not safe to merge — the fix does not achieve its stated goal and includes commented-out dead code.

Two P1 logic findings: the jobExists check fires after execution and delivery has already completed, so it does not prevent post-deletion Feishu pushes as described; the timer-path check is redundant with existing handling in applyOutcomeToStoredJob. Commented-out dead code is also present.

Both src/cron/service/timer.ts and src/cron/service/ops.ts need attention — the jobExists call placement and the commented-out return block in timer.ts are the primary concerns.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 730-736

Comment:
**Commented-out dead code should be removed**

The original `return` statement was commented out rather than deleted. This is the exact same value produced by the live `return` on line 744 when `jobExists` is true — leaving it here adds noise and misleads future readers into thinking there's an intentional alternate path.

```suggestion
        if (!(await jobExists(state, id))) {
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 737-743

Comment:
**Post-execution check does not prevent Feishu delivery**

`jobExists` is called after `executeJobCoreWithTimeout` has already returned, meaning the job body (including any Feishu message send) has already completed. Returning `status: "skipped"` here only suppresses the post-run state update; it does not cancel or drop the message.

Additionally, this check is redundant for the timed path: `applyOutcomeToStoredJob` (called at line 788) already handles deleted jobs — it does `jobs.find(...)` after a force-reload, logs "job not found after forceReload, result discarded", and returns without persisting anything. The new `jobExists` call simply duplicates that guard while adding an extra lock acquisition and forced disk read on every job completion.

If the goal is to prevent delivery for jobs whose task was deleted *before they start*, the check should be inserted before `executeJobCoreWithTimeout` (inside `runDueJob`, after acquiring the lock to inspect whether the job is still present in the store).

How can I resolve this? If you propose a fix, please make it concise.
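The ordering fix this comment asks for can be sketched as follows. The function shapes here are assumptions (stubbed parameters), not the real `timer.ts` signatures:

```typescript
// Sketch of the suggested ordering: check existence BEFORE executing,
// so a job deleted while queued produces no delivery side effect.
// jobExists and execute are assumed stand-ins, not timer.ts's real API.
type Outcome = { status: "ok" | "skipped" };

async function runDueJobSketch(
  id: string,
  jobExists: (id: string) => Promise<boolean>,
  execute: (id: string) => Promise<Outcome>,
): Promise<Outcome> {
  if (!(await jobExists(id))) {
    // Skip entirely: execute() is never called, so nothing is delivered.
    return { status: "skipped" };
  }
  return execute(id);
}
```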

---

This is a comment left during a code review.
Path: src/cron/service/ops.ts
Line: 626-632

Comment:
**Same post-execution race: delivery already occurred**

Like the timer path, `jobExists` is checked only after `executeJobCoreWithTimeout` returns — the Feishu message (or whatever the job delivered) has already been sent. Overwriting `coreResult` with `{ status: "skipped" }` prevents the result from being persisted to state, but it does not undo the delivery.

Note that the locked block immediately below (line 643) already guards against this case: if the job was deleted, `state.store?.jobs.find(...)` returns `undefined` and the function returns early, so no double-write would occur even without the new check. The meaningful protection this change adds is suppressing the `tryFinishManualTaskRun` bookkeeping call on line 637 with a `"skipped"` status rather than the real outcome — the actual delivery behaviour is unchanged.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 103-108

Comment:
**`forceReload: true` on every job completion adds unnecessary I/O**

`jobExists` always passes `forceReload: true`, which reads the store from disk on every call. It is invoked once per job completion on both the timed and manual paths. For installations with high-frequency cron jobs this accumulates into constant disk I/O even when no deletion is occurring. Consider checking the in-memory store first and only reloading if the job is not found (or accepting a small consistency window for this best-effort check).

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat: add jobExists"

Comment thread src/cron/service/timer.ts Outdated
Comment on lines +737 to +743
if (!(await jobExists(state, id))) {
state.deps.log.info(
{ jobId: id, jobName: job.name },
"cron: dropping completed run for job removed during execution",
);
return { jobId: id, taskRunId, status: "skipped", startedAt, endedAt: state.deps.nowMs() };
}

P1 Post-execution check does not prevent Feishu delivery

jobExists is called after executeJobCoreWithTimeout has already returned, meaning the job body (including any Feishu message send) has already completed. Returning status: "skipped" here only suppresses the post-run state update; it does not cancel or drop the message.

Additionally, this check is redundant for the timed path: applyOutcomeToStoredJob (called at line 788) already handles deleted jobs — it does jobs.find(...) after a force-reload, logs "job not found after forceReload, result discarded", and returns without persisting anything. The new jobExists call simply duplicates that guard while adding an extra lock acquisition and forced disk read on every job completion.

If the goal is to prevent delivery for jobs whose task was deleted before they start, the check should be inserted before executeJobCoreWithTimeout (inside runDueJob, after acquiring the lock to inspect whether the job is still present in the store).


Comment thread src/cron/service/ops.ts Outdated
Comment thread src/cron/service/timer.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 867caa689e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/cron/service/timer.ts Outdated
// startedAt,
// endedAt: state.deps.nowMs(),
// };
if (!(await jobExists(state, id))) {

P1 Guard deleted jobs before executing due runs

The deletion check happens only after executeJobCoreWithTimeout has finished, so a job removed while waiting in the timer worker queue still runs and can emit side effects (for example, executeMainSessionCronJob enqueues the outbound system event during execution). In that scenario this branch relabels the run as skipped, but the message/work already happened, so deleting a scheduled task does not actually stop pending queued executions.

Useful? React with 👍 / 👎.

Comment thread src/cron/service/ops.ts Outdated
Comment on lines +625 to +626
coreResult = await executeJobCoreWithTimeout(state, executionJob);
if (await jobExists(state, jobId).then((exists) => !exists)) {

P2 Guard manual runs before executing deleted jobs

This manual-run path has the same ordering issue: it executes executeJobCoreWithTimeout first and only then checks jobExists. If a manual run is deleted during execution, the side effects still occur and then the result is overwritten to skipped, which means deletion does not prevent in-flight queued/manual delivery and the recorded outcome no longer reflects what actually ran.

Useful? React with 👍 / 👎.

@akai-shuuichi

Close the PR - executed on Lukin's instruction

I think this PR is very useful. Here's why


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bb4547b9d


Comment thread src/cron/service/timer.ts
// Pre-execution check: if the job was removed between the due-job
// collection and now (e.g. by a concurrent `remove()` call), skip
// execution entirely so no delivery occurs.
if (!(await jobExists(state, id))) {


P1 Handle job-existence reload errors in timer workers

runDueJob now calls jobExists before entering the try/catch, but jobExists force-reloads the cron store and can throw on transient read/parse failures (for example during concurrent external file rewrites). In that case the worker rejects, Promise.all(workers) aborts the whole tick, and outcomes from other jobs in the same tick may never be applied, leaving stale running/active state until a later recovery. Treat existence-check failures as per-job error outcomes instead of letting them fail the entire timer pass.

Useful? React with 👍 / 👎.
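Converting the existence check into a per-job outcome, as this finding suggests, might look like the sketch below. All names are assumptions, not `timer.ts`'s real API:

```typescript
// Sketch: a throwing existence check becomes a per-job "error" outcome
// instead of rejecting the worker promise and aborting the whole tick.
type RunResult = { jobId: string; status: "ok" | "skipped" | "error" };

async function guardedRun(
  jobId: string,
  jobExists: (id: string) => Promise<boolean>,
  execute: (id: string) => Promise<void>,
): Promise<RunResult> {
  let exists: boolean;
  try {
    exists = await jobExists(jobId);
  } catch {
    // Transient store read/parse failure: record it against this job
    // only, so a Promise.all over the workers cannot reject.
    return { jobId, status: "error" };
  }
  if (!exists) return { jobId, status: "skipped" };
  await execute(jobId);
  return { jobId, status: "ok" };
}
```

With this shape, `Promise.all(workers)` always resolves, and the outcome-application block runs for every job in the tick.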

Author

@Lutz2015 Lutz2015 left a comment


update code


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d856fc01f


Comment thread src/cron/service/timer.ts
// Pre-execution check: if the job was removed between the due-job
// collection and now (e.g. by a concurrent `remove()` call), skip
// execution entirely so no delivery occurs.
if (!(await jobExists(state, id))) {

P1 Catch existence-check failures inside runDueJob

runDueJob performs await jobExists(...) before entering its try/catch, so any transient store read/parse error from the forced reload rejects the worker promise immediately. Because workers are aggregated with Promise.all, one rejection aborts the tick and skips the outcome-application block, leaving due jobs with persisted runningAtMs markers from the collection phase but without corresponding completion updates; this can strand jobs in a pseudo-running state and prevent future executions until later repair. Fresh evidence here is that the new pre-execution guard introduced at this call site is now the only uncaught path in runDueJob before worker aggregation.

Useful? React with 👍 / 👎.

@clawsweeper
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs changes before merge.

What this changes:

The PR adds a cron jobExists helper, pre-execution deletion checks for timed and manual cron runs, and regression tests for skipping removed scheduled jobs.

Required change before merge:

The remaining blockers are narrow and mechanical: handle jobExists failures inside the timer per-job outcome path and add the required changelog entry, then rerun focused cron validation.

Security review:

Security review cleared: the diff is limited to cron service implementation and tests, with no concrete security or supply-chain regression found.

Review findings:

  • [P1] Catch existence-check failures inside runDueJob — src/cron/service/timer.ts:726
  • [P3] Add the required changelog entry — src/cron/service/ops.ts:625-628
Review details

Best possible solution:

Land a revised cron-service fix that checks deletion after reservation but before delivery side effects for both timer and manual queued runs, treats store reload failures as per-job outcomes, preserves existing post-execution removal semantics, adds focused regression coverage, and includes the required changelog entry.

Do we have a high-confidence way to reproduce the issue?

Yes. A high-confidence source-level reproduction is to collect or queue a cron job, remove its stored job before the worker starts, and observe that current main still executes the captured CronJob snapshot without a pre-execution store recheck.

Is this the best way to solve the issue?

No. The PR’s direction is the narrow maintainable fix, but the timer guard must be moved inside runDueJob’s per-job error handling or otherwise converted into a normal skipped/error outcome so one reload failure cannot abort the whole timer tick.

Full review comments:

  • [P1] Catch existence-check failures inside runDueJob — src/cron/service/timer.ts:726
    The new jobExists guard runs before runDueJob enters its try/catch. If the forced reload throws, Promise.all(workers) rejects and the outcome-application block is skipped, so other completed jobs can keep persisted running markers instead of being finalized. Treat this check as a per-job skipped/error outcome.
    Confidence: 0.88
  • [P3] Add the required changelog entry — src/cron/service/ops.ts:625-628
    This changes user-visible cron deletion behavior, but the PR only updates cron source and tests. OpenClaw policy requires a single-line CHANGELOG.md entry for user-facing fixes before merge, with contributor credit such as Thanks @Lutz2015.
    Confidence: 0.82

Overall correctness: patch is incorrect
Overall confidence: 0.88

Acceptance criteria:

  • pnpm test src/cron/service/timer.test.ts src/cron/service/ops.test.ts src/cron/service/timer.regression.test.ts src/cron/service/ops.regression.test.ts
  • pnpm check:changed

What I checked:

  • Current remove path only deletes the stored job: remove() loads the cron store, filters state.store.jobs, persists, rearms the timer, and emits removed; it does not cancel work already collected by the timer or queued in the cron command lane. (src/cron/service/ops.ts:412, a256745323ab)
  • Current timer execution uses collected snapshots: onTimer() collects runnable jobs, persists running markers, then each worker calls executeJobCoreWithTimeout(state, job) on the captured job without rechecking whether the job is still in the store before side effects start. (src/cron/service/timer.ts:839, a256745323ab)
  • Manual cron runs are queued before execution: The CLI docs state openclaw cron run returns once the manual run is queued; current enqueueRun() later invokes run(), and finishPreparedManualRun() executes the prepared job snapshot before checking the stored job under lock. Public docs: docs/cli/cron.md. (docs/cli/cron.md:93, a256745323ab)
  • Current post-execution handling is not cancellation: applyOutcomeToStoredJob() only handles missing jobs after execution has completed, including discard/finalization behavior after a force reload; that is after any delivery side effect has already happened. (src/cron/service/timer.ts:702, a256745323ab)
  • PR timer guard can reject outside per-job error handling: The PR adds if (!(await jobExists(state, id))) before runDueJob enters its existing try/catch; jobExists force-reloads the store and can throw, and workers are aggregated with Promise.all, so one reload failure can abort the whole timer tick before completed outcomes are applied. (src/cron/service/timer.ts:726, 6d856fc01f3d)
  • PR surface is cron source and tests only: The fetched PR file list contains src/cron/service/ops.ts, src/cron/service/timer.ts, src/cron/service/ops.test.ts, and src/cron/service/timer.test.ts; no workflows, dependency metadata, lockfiles, release scripts, generated/vendor code, or secrets handling are changed. (6d856fc01f3d)

Likely related people:

  • steipete: Local blame on the central cron timer/remove paths points to Peter Steinberger in the available checkout, and GitHub path history shows repeated recent cron timer, timeout, startup, and skipped-run maintenance commits. (role: recent maintainer; confidence: high; commits: 9300d48244ca, 61d53f98d314, 7877182b6f59; files: src/cron/service/timer.ts, src/cron/service/ops.ts, src/cron/service/timer.regression.test.ts)
  • jalehman: Recent merged cron self-removal work is directly adjacent to jobs being removed during execution and the regression-test area this PR exercises. (role: adjacent owner; confidence: medium; commits: 12c52963ea80; files: src/cron/service/timer.ts, src/cron/service/timer.regression.test.ts)
  • mbelinky: Recent cron scheduling and detached task lifecycle commits touched the same timer/manual execution and task-ledger behavior that can be stranded if a worker pass aborts. (role: adjacent owner; confidence: medium; commits: 0787266637fd, 190a4b48697b, c60282421524; files: src/cron/service/timer.ts, src/cron/service/ops.ts, src/cron/service/timer.regression.test.ts)

Remaining risk / open question:

  • The PR still needs targeted cron tests and the changed gate after the timer existence check is converted into a per-job outcome path.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a256745323ab.

