Skip to content

fix(cron): prevent jobs from skipping runs after timer fires#9589

Closed
rodrigouroz wants to merge 3 commits intoopenclaw:mainfrom
rodrigouroz:fix/cron-stale-next-run
Closed

fix(cron): prevent jobs from skipping runs after timer fires#9589
rodrigouroz wants to merge 3 commits intoopenclaw:mainfrom
rodrigouroz:fix/cron-stale-next-run

Conversation

@rodrigouroz
Copy link
Contributor

@rodrigouroz rodrigouroz commented Feb 5, 2026

Summary

When the cron timer fired, onTimer() would call ensureLoaded() which recomputed nextRunAtMs before runDueJobs() checked for due jobs.

This caused jobs to skip their scheduled run:

  1. Timer fires at 06:00 (job scheduled for 06:00)
  2. ensureLoaded() recomputes nextRunAtMs from current time
  3. Croner returns next occurrence: tomorrow 06:00
  4. runDueJobs() sees nextRunAtMs (tomorrow) > now → skips execution

Fix

Skip recomputeNextRuns in ensureLoaded when called from onTimer, then recompute after runDueJobs executes the due jobs.

Testing

  • All existing cron tests pass (56 tests)
  • Manually verified with test cron jobs

Fixes #8424

Greptile Overview

Greptile Summary

This PR changes the cron timer tick flow to avoid skipping runs that become due exactly at the timer boundary. Specifically:

  • ensureLoaded() (src/cron/service/store.ts) gains an option (skipRecompute) to avoid calling recomputeNextRuns() immediately after reloading the cron store.
  • onTimer() (src/cron/service/timer.ts) now reloads the store with skipRecompute: true, runs runDueJobs() against the persisted nextRunAtMs values, then recomputes next runs and persists.
  • recomputeNextRuns() (src/cron/service/jobs.ts) adds a defensive warning + second-pass recalculation if computeJobNextRunAtMs() returns a timestamp still in the past.

These changes fit into the existing cron service design where most API operations (src/cron/service/ops.ts) work against an in-memory store, while the timer tick path force-reloads from disk to pick up external changes before deciding what is due to run.

Confidence Score: 3/5

  • This PR is likely correct in intent, but the post-run global recomputation could reintroduce scheduling edge cases for jobs that just executed.
  • Core change (skip recompute before due-check) matches the described bug, but the new unconditional recomputeNextRuns() after executing due jobs can overwrite per-job nextRunAtMs updates and may cause re-running or schedule drift depending on computeNextRunAtMs semantics. Unable to run tests in this environment (no node/pnpm).
  • src/cron/service/timer.ts; src/cron/service/jobs.ts

When jobs have nextRunAtMs in the past (due to missed runs while gateway was down),
the scheduler could get stuck. This adds:

1. Defensive check in recomputeNextRuns: if computed nextRunAtMs is still in the past,
   log a warning and force recalculation from now+1s to ensure we get a future slot.

2. Info logging in armTimer when next wake time is in the past, to help debug
   any remaining edge cases.

Fixes openclaw#8424
When the cron timer fired, onTimer() would call ensureLoaded() which
recomputed nextRunAtMs BEFORE runDueJobs() checked for due jobs.

This caused jobs to skip their scheduled run:
1. Timer fires at 06:00 (job scheduled for 06:00)
2. ensureLoaded() recomputes nextRunAtMs from current time (06:00)
3. Croner returns next occurrence: tomorrow 06:00
4. runDueJobs() sees nextRunAtMs (tomorrow) > now → skips execution

Fix: Skip recomputeNextRuns in ensureLoaded when called from onTimer,
then recompute AFTER runDueJobs executes the due jobs.

Fixes openclaw#8424
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines 57 to 59
// Now recompute next runs for jobs that weren't executed (e.g., disabled).
recomputeNextRuns(state);
await persist(state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect nextRun recompute
After runDueJobs() you call recomputeNextRuns(state) unconditionally (src/cron/service/timer.ts:57-59). This will recompute nextRunAtMs based on now for all enabled jobs, including ones that just executed and already set nextRunAtMs in finish() / finally (e.g. src/cron/service/timer.ts:117-120 and :229-233). If computeNextRunAtMs treats its nowMs parameter as inclusive, this can re-select the same slot that just ran (or otherwise regress the schedule), causing immediate re-runs on the next tick.

Suggestion: avoid recomputing for jobs that executed in this tick (track executed IDs, or only recompute for jobs where nextRunAtMs is missing/invalid/disabled), or recompute using a monotonic reference time (e.g. endedAt already computed per job).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 57:59

Comment:
**Incorrect nextRun recompute**
After `runDueJobs()` you call `recomputeNextRuns(state)` unconditionally (`src/cron/service/timer.ts:57-59`). This will recompute `nextRunAtMs` based on `now` for *all* enabled jobs, including ones that just executed and already set `nextRunAtMs` in `finish()` / `finally` (e.g. `src/cron/service/timer.ts:117-120` and `:229-233`). If `computeNextRunAtMs` treats its `nowMs` parameter as inclusive, this can re-select the same slot that just ran (or otherwise regress the schedule), causing immediate re-runs on the next tick.

Suggestion: avoid recomputing for jobs that executed in this tick (track executed IDs, or only recompute for jobs where `nextRunAtMs` is missing/invalid/disabled), or recompute using a monotonic reference time (e.g. `endedAt` already computed per job).

How can I resolve this? If you propose a fix, please make it concise.

executeJob() already updates nextRunAtMs for executed jobs in finish()/finally,
so calling recomputeNextRuns() afterward is redundant and could potentially
re-select the same slot if croner treats nowMs as inclusive.
@matthewpapa07
Copy link

Ive got a PR fixing something similar here: #9573

@BuairtRi
Copy link

BuairtRi commented Feb 5, 2026

Can we get this merged? This is a really critical bug that is really hampering my system.

@matthewpapa07
Copy link

Wait my fix is here, the above one is a different PR for same issue. We need to unify the fix
#9393

@matthewpapa07
Copy link

all 3 fixes are pretty different. Gotta love vibe coding lol

@lucasverra
Copy link

i hope this get release very soon 🦞

@ksylvan
Copy link

ksylvan commented Feb 5, 2026

I'm running with this pull request and a few other PRs in my preview branch:

kayvan/preview branch is main + these PRs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cron scheduler bug: missed runs cause permanent stall

5 participants