[Bug]: Cron jobs skipped due to race condition - recomputeNextRuns called before runDueJobs #9661

@chrichts

Summary

Cron jobs are systematically skipped when the timer fires even 1ms after the scheduled time, due to a race condition in the timer execution order: next-run times are recomputed before due jobs are checked. This is likely a root cause of many of the recent cron-related bug reports (#8424, #8298, #9542, #9575, etc.).

Root Cause Analysis

In src/cron/service/timer.ts, the onTimer function calls operations in this order:

await locked(state, async () => {
  await ensureLoaded(state, { forceReload: true }); // ← Calls recomputeNextRuns
  await runDueJobs(state);                          // ← Too late, nextRunAtMs already advanced
  await persist(state);
  armTimer(state);
});

The problem:

  1. ensureLoaded calls recomputeNextRuns() at line 259 of store.ts
  2. recomputeNextRuns uses croner.nextRun(new Date(nowMs)) to calculate nextRunAtMs
  3. If nowMs is 12:00:00.001 (1ms past scheduled time), croner.nextRun() returns 14:00:00.000 (next occurrence)
  4. runDueJobs then checks now >= nextRunAtMs → 12:00:00.001 >= 14:00:00.000 → false → job skipped
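
The four steps above can be modelled without croner or the real store. This is a minimal sketch, not the OpenClaw code: `Job`, `nextOccurrenceAfter`, and `recomputeThenCheck` are hypothetical names, and a fixed two-hour interval stands in for the cron expression.

```typescript
// Hypothetical stand-ins for illustration; none of these names come from the OpenClaw source.
interface Job {
  id: string;
  everyMs: number;      // fixed interval standing in for a cron expression
  nextRunAtMs: number;  // stored next-run timestamp
}

// Next interval boundary strictly after nowMs, mimicking a
// "next occurrence after this instant" query.
function nextOccurrenceAfter(nowMs: number, everyMs: number): number {
  return (Math.floor(nowMs / everyMs) + 1) * everyMs;
}

// Buggy ordering: advance nextRunAtMs first, then perform the due check.
function recomputeThenCheck(job: Job, nowMs: number): boolean {
  job.nextRunAtMs = nextOccurrenceAfter(nowMs, job.everyMs);
  return nowMs >= job.nextRunAtMs; // compares against the future slot
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;
const scheduledMs = Date.parse("2026-02-05T12:00:00.000Z");
const job: Job = { id: "demo", everyMs: TWO_HOURS_MS, nextRunAtMs: scheduledMs };

// Timer fires 1ms late.
const ran = recomputeThenCheck(job, scheduledMs + 1);
console.log("ran:", ran, "next:", new Date(job.nextRunAtMs).toISOString());
// ran: false next: 2026-02-05T14:00:00.000Z
```

Any positive lateness gives the same result, because the stored 12:00 slot is overwritten before it is ever compared against the current time.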

Reproduction

const { Cron } = require("croner");

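// Job scheduled for every 2 hours at minute 0.
// Timezone pinned to UTC so the result does not depend on the machine's local offset.
const cron = new Cron("0 */2 * * *", { timezone: "UTC" });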

// Timer fires 1ms late
const nowMs = Date.parse("2026-02-05T12:00:00.001Z");
const next = cron.nextRun(new Date(nowMs));

console.log("Timer fired at:", new Date(nowMs).toISOString());
console.log("Next run calculated as:", next.toISOString());
// Output: Timer fired at: 2026-02-05T12:00:00.001Z
//         Next run calculated as: 2026-02-05T14:00:00.000Z
// The 12:00 run is SKIPPED because nextRunAtMs is now 14:00

Impact

  • Every cron job is at risk of being skipped
  • setTimeout has inherent imprecision (~1-4ms minimum)
  • System load, GC pauses, or VM scheduling delays make this worse
  • Jobs are randomly skipped in normal operation, not just after crashes/restarts

Proposed Fix

Reorder operations in onTimer to run due jobs before recomputing:

await locked(state, async () => {
  await ensureLoaded(state, { forceReload: true, skipRecompute: true });
  await runDueJobs(state);      // Check stored nextRunAtMs values first
  recomputeNextRuns(state);     // THEN advance to next occurrence
  await persist(state);
  armTimer(state);
});

This requires:

  1. Adding skipRecompute option to ensureLoaded
  2. Calling recomputeNextRuns explicitly after runDueJobs
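
One way these two changes could fit together is sketched below. It is a simplified, synchronous model under stated assumptions: the `Job` and `LoadOptions` shapes, the function signatures, and `nextOccurrenceAfter` are hypothetical, the real `ensureLoaded` also reloads the store from disk, and locking, `persist`, and `armTimer` are omitted.

```typescript
// Hypothetical shapes; the real state and option types in OpenClaw may differ.
interface Job {
  id: string;
  everyMs: number;      // fixed interval standing in for a cron expression
  nextRunAtMs: number;
}

interface LoadOptions {
  forceReload?: boolean;
  skipRecompute?: boolean; // the proposed new option
}

// Next interval boundary strictly after nowMs.
function nextOccurrenceAfter(nowMs: number, everyMs: number): number {
  return (Math.floor(nowMs / everyMs) + 1) * everyMs;
}

function recomputeNextRuns(jobs: Job[], nowMs: number): void {
  for (const j of jobs) {
    j.nextRunAtMs = nextOccurrenceAfter(nowMs, j.everyMs);
  }
}

// Only the recompute step is modelled here; disk reload is left out.
function ensureLoaded(jobs: Job[], nowMs: number, opts: LoadOptions = {}): void {
  if (!opts.skipRecompute) {
    recomputeNextRuns(jobs, nowMs);
  }
}

// Returns the ids of jobs whose stored nextRunAtMs has been reached.
function runDueJobs(jobs: Job[], nowMs: number): string[] {
  return jobs.filter((j) => nowMs >= j.nextRunAtMs).map((j) => j.id);
}

// Proposed ordering: load without recomputing, run due jobs, then advance.
function onTimerFixed(jobs: Job[], nowMs: number): string[] {
  ensureLoaded(jobs, nowMs, { forceReload: true, skipRecompute: true });
  const ranIds = runDueJobs(jobs, nowMs);
  recomputeNextRuns(jobs, nowMs);
  return ranIds;
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;
const scheduledMs = Date.parse("2026-02-05T12:00:00.000Z");
const jobs: Job[] = [{ id: "demo", everyMs: TWO_HOURS_MS, nextRunAtMs: scheduledMs }];

// Timer fires 1ms late, but the stored 12:00 slot is still checked first.
const ranIds = onTimerFixed(jobs, scheduledMs + 1);
console.log("ran:", ranIds, "next:", new Date(jobs[0].nextRunAtMs).toISOString());
// ran: [ 'demo' ] next: 2026-02-05T14:00:00.000Z
```

Making `skipRecompute` opt-in leaves every other caller of `ensureLoaded` unchanged, so only the timer path takes on the new ordering.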

Environment

  • OpenClaw version: 2026.2.x (regression appears in 2026.2.1+)
  • Affects all platforms

Related Issues

This is likely the root cause for the cron-related reports cited in the Summary (#8424, #8298, #9542, #9575, etc.).

Labels

bug, cron, scheduler, regression
