fix(cron): catch up overdue jobs after restart and enable immediate firing by VintLin · Pull Request #11236 · openclaw/openclaw

VintLin · 2026-02-07T15:56:10Z

This PR addresses remaining edge cases in cron scheduler reliability following #10776, specifically fixing issues where missed jobs are pushed to the future and dynamic additions suffer from polling delays.

Problem

Overdue jobs skipped on restart (Cron scheduler doesn't recover from past nextWakeAtMs after gateway restart #11184): While runMissedJobs handles the initial catch-up, the current recomputeNextRuns logic is too aggressive. It recalculates the nextRunAtMs to a future slot before an overdue job has a chance to successfully complete, effectively "forgetting" the missed run if it fails once or if the timing window is tight.
Dynamic addition delay ([Bug] Cron jobs created after gateway start never execute until restart #11217): Newly added jobs with past anchors do not always trigger immediate execution, often waiting for the next minute-level timer tick.

Changes

State-aware Recomputation: Modified computeJobNextRunAtMs to implement a "past-due preservation" policy. If nextRunAtMs is in the past and lastStatus is not ok, the scheduler now preserves the overdue timestamp instead of advancing it.
Failure Backoff: Introduced a 5-second FAILED_RETRY_COOLDOWN to prevent tight infinite retry loops (high CPU) if an overdue job repeatedly fails (e.g., persistent network issues).
Immediate Timer Arming: Enabled allowPast flags during initial scheduling, allowing armTimer to detect a delay <= 0 and fire new jobs immediately upon addition.

Validation

Added regression tests in src/cron/service.issue-regressions.test.ts covering:
- Overdue job preservation across recomputes.
- 0ms execution for dynamically added jobs with past anchors.
- Verification of the 5s cooldown window during consecutive failures.

Greptile Overview

Greptile Summary

This change adjusts cron scheduling to better handle missed/overdue runs and reduce delays when adding new jobs:

computeJobNextRunAtMs now preserves past-due nextRunAtMs in some cases and applies a 5s retry cooldown after failures.
computeNextRunAtMs accepts an allowPast option so initial scheduling can return a past timestamp (so timers can fire immediately).
Adds regression tests for overdue preservation, immediate scheduling for past anchors, and the failure cooldown behavior.

Key areas touched are src/cron/schedule.ts (schedule computations) and src/cron/service/jobs.ts (job next-run logic), with new tests in src/cron/service.issue-regressions.test.ts.

Confidence Score: 3/5

This PR is likely mergeable, but there are a couple edge cases/tests that should be tightened to avoid regressions.
The core intent is clear and the added tests cover the reported issues, but the new allowPast behavior for cron expressions can return nowMs at exact boundaries (breaking the prior >now invariant), and the regression tests hardcode a store version which can make them brittle across schema bumps.
src/cron/schedule.ts, src/cron/service.issue-regressions.test.ts

_{(2/5) Greptile learns from your feedback when you react with thumbs up/down!}

…iring

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-07T16:02:28Z

src/cron/schedule.ts

+  if (options?.allowPast) {
+    // To find the "current" or "last" run, we can look back a bit.
+    // Croner doesn't have a direct "lastRun" that is efficient without a reference.
+    // But we can check if nowMs itself is a run time, or search backwards.
+    // However, the requirement is "past-due preservation", so maybe we don't
+    // need computeNextRunAtMs to return a past time for cron expressions
+    // unless it's explicitly for initial scheduling.
+    // Let's try to find the most recent run before or at nowMs.
+    const prev = cron.previousRun(new Date(nowMs + 1));
+    if (prev) {
+      return prev.getTime();
+    }


Past cron run returned

In the allowPast path for cron expressions, cron.previousRun(new Date(nowMs + 1)) can return a timestamp equal to nowMs when nowMs is exactly on a cron boundary. That breaks the usual nextRunAtMs > nowMs invariant in computeNextRunAtMs and can result in a job being scheduled as immediately due at initialization (and potentially re-armed at 0ms repeatedly depending on timer logic). Consider ensuring the returned time is strictly < nowMs (or otherwise handling the equality case explicitly).

Prompt To Fix With AI

This is a comment left during a code review. Path: src/cron/schedule.ts Line: 63:74 Comment: **Past cron run returned** In the `allowPast` path for cron expressions, `cron.previousRun(new Date(nowMs + 1))` can return a timestamp equal to `nowMs` when `nowMs` is exactly on a cron boundary. That breaks the usual `nextRunAtMs > nowMs` invariant in `computeNextRunAtMs` and can result in a job being scheduled as immediately due at initialization (and potentially re-armed at 0ms repeatedly depending on timer logic). Consider ensuring the returned time is strictly `< nowMs` (or otherwise handling the equality case explicitly). How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-07T16:02:29Z

src/cron/service.issue-regressions.test.ts

+  it("preserves past-due nextRunAtMs during recomputeNextRuns (#11184)", async () => {
+    const now = Date.parse("2026-02-06T10:05:00.000Z");
+    const past = now - 60_000;
+
+    const state = createCronServiceState({
+      cronEnabled: true,
+      storePath: "/tmp/dummy.json",
+      log: noopLogger,
+      nowMs: () => now,
+      enqueueSystemEvent: vi.fn(),
+      requestHeartbeatNow: vi.fn(),
+      runIsolatedAgentJob: vi.fn(),
+    });
+
+    const job: CronJob = {
+      id: "past-job",
+      name: "past-job",
+      enabled: true,
+      createdAtMs: past - 1000,
+      updatedAtMs: past - 1000,
+      schedule: { kind: "every", everyMs: 3600_000, anchorMs: past },
+      sessionTarget: "main",
+      payload: { kind: "systemEvent", text: "tick" },
+      state: { nextRunAtMs: past },
+    };
+
+    state.store = {
+      version: 1,
+      jobs: [job],
+    };


Hardcoded store version

These new regression tests build state.store = { version: 1, ... } directly. If the real store schema version changes (or differs in this repo today), the tests can start failing for reasons unrelated to the regression. Using the same store creation helpers/constants used by the service (or loading a minimal persisted store via existing helpers) would keep these tests stable across migrations.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/cron/service.issue-regressions.test.ts Line: 348:377 Comment: **Hardcoded store version** These new regression tests build `state.store = { version: 1, ... }` directly. If the real store schema version changes (or differs in this repo today), the tests can start failing for reasons unrelated to the regression. Using the same store creation helpers/constants used by the service (or loading a minimal persisted store via existing helpers) would keep these tests stable across migrations. <sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub> How can I resolve this? If you propose a fix, please make it concise.

… fix types

tyler6204 · 2026-02-08T03:47:29Z

Superseded by #11641 (merge commit: 8fae55e). Closing to reduce duplicate PR noise. Please open a new PR only if there is additional scope beyond this fix.

fix(cron): catch up overdue jobs after restart and enable immediate f…

3375f10

…iring

greptile-apps bot reviewed Feb 7, 2026

View reviewed changes

fix(cron): refine past-due logic to strictly use times before now and…

91ae292

… fix types

Takhoffman requested a review from tyler6204 February 7, 2026 16:48

tyler6204 mentioned this pull request Feb 8, 2026

fix(cron): share isolated announce flow + harden cron scheduling/delivery #11641

Merged

tyler6204 closed this Feb 8, 2026

VintLin deleted the fix/cron-catchup-recovery branch February 10, 2026 03:00

VintLin restored the fix/cron-catchup-recovery branch February 10, 2026 03:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cron): catch up overdue jobs after restart and enable immediate firing#11236

fix(cron): catch up overdue jobs after restart and enable immediate firing#11236
VintLin wants to merge 2 commits intoopenclaw:mainfrom
VintLin:fix/cron-catchup-recovery

VintLin commented Feb 7, 2026 •

edited by greptile-apps bot

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 7, 2026

Uh oh!

greptile-apps bot Feb 7, 2026

Uh oh!

tyler6204 commented Feb 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

VintLin commented Feb 7, 2026 • edited by greptile-apps bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Validation

Greptile Overview

Greptile Summary

Confidence Score: 3/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

tyler6204 commented Feb 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

VintLin commented Feb 7, 2026 •

edited by greptile-apps bot

Loading