Closed
Description
We hit a silent cron outage in which no jobs finished for ~200 minutes, even though openclaw cron status --json kept reporting the scheduler as enabled and nextWakeAtMs kept advancing.
Symptom
- Fleet-wide largest finished-event gap (last 24h in our run logs):
  - from: 2026-02-06T04:49:38.544Z
  - to: 2026-02-06T08:10:31.957Z
  - gap: 200.89 minutes
- During the gap, the scheduler remained enabled and wake timestamps advanced, but run logs were not appended.
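For anyone who wants to check their own run logs for the same symptom, here is a minimal sketch of how such a gap can be measured. It assumes you have already extracted the finished-event timestamps as ISO 8601 strings; the function name largestFinishedGap and the Gap shape are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical helper, not an OpenClaw API: finds the largest gap between
// consecutive "finished" timestamps (ISO 8601 strings).
interface Gap {
  from: string;
  to: string;
  minutes: number;
}

function largestFinishedGap(finishedAt: string[]): Gap | null {
  // Sort chronologically so that input order does not matter.
  const sorted = finishedAt
    .map((iso) => ({ iso, ms: Date.parse(iso) }))
    .sort((a, b) => a.ms - b.ms);

  let best: Gap | null = null;
  for (let i = 1; i < sorted.length; i++) {
    const minutes = (sorted[i].ms - sorted[i - 1].ms) / 60000;
    if (best === null || minutes > best.minutes) {
      best = { from: sorted[i - 1].iso, to: sorted[i].iso, minutes };
    }
  }
  return best; // null when there are fewer than two events
}

// The two timestamps reported above.
const reportedGap = largestFinishedGap([
  "2026-02-06T04:49:38.544Z",
  "2026-02-06T08:10:31.957Z",
]);
console.log(reportedGap?.minutes.toFixed(2)); // 200.89
```

A gap much larger than the longest job interval is a reasonable signal of this kind of outage, since the scheduler status alone did not show it.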
Suspected root cause
In the timer wake path (onTimer), the cron store is force-reloaded before due jobs are evaluated:
- onTimer calls ensureLoaded(state, { forceReload: true })
- ensureLoaded calls recomputeNextRuns(state)
That recomputes nextRunAtMs from current time, so jobs that are due at wake can be pushed forward before runDueJobs executes. Repeating this on each wake can defer jobs indefinitely without a hard scheduler error.
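The deferral can be illustrated with a simplified model. This is a sketch under stated assumptions, not the actual OpenClaw code: the Job shape, recomputeFromNow, runDue, and simulate are hypothetical stand-ins, and the model assumes that recomputing sets nextRunAtMs to the current time plus the job interval and that the timer fires exactly at the earliest nextRunAtMs.

```typescript
// Simplified model of the suspected behavior; all names are hypothetical.
interface Job {
  id: string;
  everyMs: number;
  nextRunAtMs: number;
}

// Assumed stand-in for recomputeNextRuns: schedules every job relative to now.
function recomputeFromNow(jobs: Job[], nowMs: number): void {
  for (const job of jobs) {
    job.nextRunAtMs = nowMs + job.everyMs;
  }
}

// Assumed stand-in for runDueJobs: runs jobs whose time has come, then
// schedules their next run.
function runDue(jobs: Job[], nowMs: number): string[] {
  const ran: string[] = [];
  for (const job of jobs) {
    if (job.nextRunAtMs <= nowMs) {
      ran.push(job.id);
      job.nextRunAtMs = nowMs + job.everyMs;
    }
  }
  return ran;
}

// Simulates `wakes` timer wake-ups and returns how many job runs happened.
function simulate(forceReload: boolean, wakes: number): number {
  const tenMinutes = 10 * 60 * 1000;
  const jobs: Job[] = [
    { id: "every-10m", everyMs: tenMinutes, nextRunAtMs: tenMinutes },
  ];
  let runs = 0;
  for (let i = 0; i < wakes; i++) {
    // The timer fires at the earliest scheduled run.
    const nowMs = Math.min(...jobs.map((job) => job.nextRunAtMs));
    if (forceReload) {
      // The due job is pushed into the future before it is evaluated.
      recomputeFromNow(jobs, nowMs);
    }
    runs += runDue(jobs, nowMs).length;
  }
  return runs;
}

console.log(simulate(true, 20), simulate(false, 20)); // 0 20
```

In this model the wake time keeps advancing in both cases, which matches the symptom: the scheduler looks healthy while the force-reload variant never runs anything.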
One-line fix
In onTimer, avoid force reload:
- from: await ensureLoaded(state, { forceReload: true });
- to: await ensureLoaded(state);
Local confirmation
Applying that change locally restored cron execution immediately in our environment (new finished events resumed for 10m/15m jobs right after restart).
Environment
- OpenClaw runtime observed: 2026.2.3-1 (d84eb46)
- Install path: Homebrew global install
Happy to provide additional logs or help test a proper upstream patch if useful.