fix(cron): catch up overdue jobs after restart and enable immediate firing#11236
fix(cron): catch up overdue jobs after restart and enable immediate firing#11236VintLin wants to merge 2 commits intoopenclaw:mainfrom
Conversation
| if (options?.allowPast) { | ||
| // To find the "current" or "last" run, we can look back a bit. | ||
| // Croner doesn't have a direct "lastRun" that is efficient without a reference. | ||
| // But we can check if nowMs itself is a run time, or search backwards. | ||
| // However, the requirement is "past-due preservation", so maybe we don't | ||
| // need computeNextRunAtMs to return a past time for cron expressions | ||
| // unless it's explicitly for initial scheduling. | ||
| // Let's try to find the most recent run before or at nowMs. | ||
| const prev = cron.previousRun(new Date(nowMs + 1)); | ||
| if (prev) { | ||
| return prev.getTime(); | ||
| } |
There was a problem hiding this comment.
Past cron run returned
In the allowPast path for cron expressions, cron.previousRun(new Date(nowMs + 1)) can return a timestamp equal to nowMs when nowMs is exactly on a cron boundary. That breaks the usual nextRunAtMs > nowMs invariant in computeNextRunAtMs and can result in a job being scheduled as immediately due at initialization (and potentially re-armed at 0ms repeatedly depending on timer logic). Consider ensuring the returned time is strictly < nowMs (or otherwise handling the equality case explicitly).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/schedule.ts
Line: 63:74
Comment:
**Past cron run returned**
In the `allowPast` path for cron expressions, `cron.previousRun(new Date(nowMs + 1))` can return a timestamp equal to `nowMs` when `nowMs` is exactly on a cron boundary. That breaks the usual `nextRunAtMs > nowMs` invariant in `computeNextRunAtMs` and can result in a job being scheduled as immediately due at initialization (and potentially re-armed at 0ms repeatedly depending on timer logic). Consider ensuring the returned time is strictly `< nowMs` (or otherwise handling the equality case explicitly).
How can I resolve this? If you propose a fix, please make it concise.| it("preserves past-due nextRunAtMs during recomputeNextRuns (#11184)", async () => { | ||
| const now = Date.parse("2026-02-06T10:05:00.000Z"); | ||
| const past = now - 60_000; | ||
|
|
||
| const state = createCronServiceState({ | ||
| cronEnabled: true, | ||
| storePath: "/tmp/dummy.json", | ||
| log: noopLogger, | ||
| nowMs: () => now, | ||
| enqueueSystemEvent: vi.fn(), | ||
| requestHeartbeatNow: vi.fn(), | ||
| runIsolatedAgentJob: vi.fn(), | ||
| }); | ||
|
|
||
| const job: CronJob = { | ||
| id: "past-job", | ||
| name: "past-job", | ||
| enabled: true, | ||
| createdAtMs: past - 1000, | ||
| updatedAtMs: past - 1000, | ||
| schedule: { kind: "every", everyMs: 3600_000, anchorMs: past }, | ||
| sessionTarget: "main", | ||
| payload: { kind: "systemEvent", text: "tick" }, | ||
| state: { nextRunAtMs: past }, | ||
| }; | ||
|
|
||
| state.store = { | ||
| version: 1, | ||
| jobs: [job], | ||
| }; |
There was a problem hiding this comment.
Hardcoded store version
These new regression tests build state.store = { version: 1, ... } directly. If the real store schema version changes (or differs in this repo today), the tests can start failing for reasons unrelated to the regression. Using the same store creation helpers/constants used by the service (or loading a minimal persisted store via existing helpers) would keep these tests stable across migrations.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/service.issue-regressions.test.ts
Line: 348:377
Comment:
**Hardcoded store version**
These new regression tests build `state.store = { version: 1, ... }` directly. If the real store schema version changes (or differs in this repo today), the tests can start failing for reasons unrelated to the regression. Using the same store creation helpers/constants used by the service (or loading a minimal persisted store via existing helpers) would keep these tests stable across migrations.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
This PR addresses remaining edge cases in cron scheduler reliability following #10776, specifically fixing issues where missed jobs are pushed to the future and dynamic additions suffer from polling delays.
Problem
runMissedJobshandles the initial catch-up, the currentrecomputeNextRunslogic is too aggressive. It recalculates thenextRunAtMsto a future slot before an overdue job has a chance to successfully complete, effectively "forgetting" the missed run if it fails once or if the timing window is tight.Changes
computeJobNextRunAtMsto implement a "past-due preservation" policy. IfnextRunAtMsis in the past andlastStatusis notok, the scheduler now preserves the overdue timestamp instead of advancing it.FAILED_RETRY_COOLDOWNto prevent tight infinite retry loops (high CPU) if an overdue job repeatedly fails (e.g., persistent network issues).allowPastflags during initial scheduling, allowingarmTimerto detect adelay <= 0and fire new jobs immediately upon addition.Validation
src/cron/service.issue-regressions.test.tscovering:Greptile Overview
Greptile Summary
This change adjusts cron scheduling to better handle missed/overdue runs and reduce delays when adding new jobs:
computeJobNextRunAtMsnow preserves past-duenextRunAtMsin some cases and applies a 5s retry cooldown after failures.computeNextRunAtMsaccepts anallowPastoption so initial scheduling can return a past timestamp (so timers can fire immediately).Key areas touched are
src/cron/schedule.ts(schedule computations) andsrc/cron/service/jobs.ts(job next-run logic), with new tests insrc/cron/service.issue-regressions.test.ts.Confidence Score: 3/5
allowPastbehavior for cron expressions can returnnowMsat exact boundaries (breaking the prior >now invariant), and the regression tests hardcode a store version which can make them brittle across schema bumps.(2/5) Greptile learns from your feedback when you react with thumbs up/down!