Skip to content

fix: prevent cron jobs from being skipped when timer fires#9833

Closed
christianhaberl wants to merge 1 commit intoopenclaw:mainfrom
christianhaberl:fix/cron-timer-skip-prevention
Closed

fix: prevent cron jobs from being skipped when timer fires#9833
christianhaberl wants to merge 1 commit intoopenclaw:mainfrom
christianhaberl:fix/cron-timer-skip-prevention

Conversation

@christianhaberl
Copy link

@christianhaberl christianhaberl commented Feb 5, 2026

Problem

Cron jobs are systematically skipped when the timer fires even 1ms after the scheduled time due to a race condition in operation order.

Root cause: In onTimer, ensureLoaded() calls recomputeNextRuns() which advances nextRunAtMs to the next occurrence before runDueJobs() can check if the job is due.

Timer fires at 12:00:00.001 (1ms late)
→ ensureLoaded() → recomputeNextRuns() → nextRunAtMs = 14:00:00
→ runDueJobs() checks: 12:00:00.001 >= 14:00:00.000 → FALSE → SKIPPED

Solution

Reorder operations in onTimer:

  1. ensureLoaded(skipRecompute: true) — load without recomputing
  2. runDueJobs() — check and run using stored nextRunAtMs
  3. recomputeNextRuns() — THEN recompute for next cycle
  4. persist() + armTimer()

This preserves runtime reload functionality (new/modified jobs still get scheduled correctly) while fixing the timer race condition.

Why This PR?

PR #9684 addresses the same issue but includes unrelated changes (Ollama baseUrl defaults, schema changes, numeric string parsing). This PR is focused solely on the cron fix for easier review and reduced regression risk.

#9684 This PR
Cron fix
Ollama defaults ❌ Unrelated ✅ Not included
Schema changes ❌ Unrelated ✅ Not included
Files changed 5 2

Testing

Environment: Raspberry Pi 5 running OpenClaw (hackable install)
Branch: fix/cron-timer-skip-prevention

Tests Performed

1. Basic Cron (*/2 * * * * systemEvent) — 11 consecutive runs over 22 minutes

  • ✅ All fired exactly on schedule
  • No skips observed

2. Isolated agentTurn Cron with Haiku — 6 runs

  • ✅ All executed successfully
  • Model override working correctly

3. Heartbeat (2 min interval) — 9 runs concurrent with crons

  • ✅ All fired reliably
  • No interference between heartbeat and cron timers

4. Complex Cron (curl + conditional output) — Skip conditions tested

  • ✅ Conditional logic works as expected

Key Findings

  • Fix confirmed working: Cron expression-based jobs (kind: "cron") now fire reliably without being skipped
  • Edge case discovered: kind: "every" (interval-based) jobs reset their nextRunAtMs on gateway restart, potentially delaying execution. This is separate from the timer-skip bug but worth noting.
  • No regressions observed in heartbeat or concurrent execution

Related Issues

Fixes #9661
Fixes #9542
Related: #9084, #8424, #8298


Reviewed by: Gemini Code Assist ✅

Greptile Overview

Greptile Summary

This PR changes cron timer processing order to avoid skipping cron jobs when the Node timer fires slightly after the scheduled time. It adds a skipRecompute option to ensureLoaded() so onTimer() can load the persisted store without advancing nextRunAtMs, then runs due jobs using the stored nextRunAtMs, and only afterward recomputes/persists the next run times and re-arms the timer.

Confidence Score: 3/5

  • This PR is close to safe to merge but has a likely missed-due-jobs regression on stores missing nextRunAtMs.
  • The change is small and targeted, but reordering to run due jobs before recomputing relies on persisted state.nextRunAtMs being present/valid; existing migration tests show stores with state: {} which would cause due jobs to be skipped for at least one tick.
  • src/cron/service/timer.ts, src/cron/service/store.ts

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

Root cause: When timer fires even 1ms late, ensureLoaded() calls
recomputeNextRuns() which advances nextRunAtMs to the NEXT occurrence.
Then runDueJobs() checks now >= nextRunAtMs → false → job skipped.

Solution (based on openclaw#9661 analysis and PR openclaw#9684):
1. Add skipRecompute option to ensureLoaded()
2. In onTimer: load with skipRecompute=true → run due jobs → THEN recompute

This preserves runtime reload functionality (Gemini review feedback)
while fixing the timer race condition.

Fixes: openclaw#9542
Related: openclaw#9661, openclaw#9684
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +40 to +47
// Load without recomputing to preserve stored nextRunAtMs values
// This prevents the race condition where nextRunAtMs gets advanced
// to the next occurrence before runDueJobs can check if job is due
await ensureLoaded(state, { forceReload: true, skipRecompute: true });
// Check and run due jobs using stored nextRunAtMs
await runDueJobs(state);
// Then recompute next runs for subsequent executions
recomputeNextRuns(state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First tick can miss jobs

With ensureLoaded(..., skipRecompute: true) on the timer path, runDueJobs() relies on the persisted job.state.nextRunAtMs. If a store file contains jobs with state: {} (e.g. migrated/legacy stores) or otherwise missing nextRunAtMs, those jobs won’t be considered due on that timer tick and will only get a nextRunAtMs after recomputeNextRuns(state) runs later in the same tick, effectively delaying execution by at least one timer cycle. There are tests/fixtures that load jobs with state: {} (e.g. src/cron/service.store.migration.test.ts:41-63), so it seems realistic to encounter stores without nextRunAtMs.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 40:47

Comment:
**First tick can miss jobs**

With `ensureLoaded(..., skipRecompute: true)` on the timer path, `runDueJobs()` relies on the persisted `job.state.nextRunAtMs`. If a store file contains jobs with `state: {}` (e.g. migrated/legacy stores) or otherwise missing `nextRunAtMs`, those jobs won’t be considered due on that timer tick and will only get a `nextRunAtMs` after `recomputeNextRuns(state)` runs later in the same tick, effectively delaying execution by at least one timer cycle. There are tests/fixtures that load jobs with `state: {}` (e.g. `src/cron/service.store.migration.test.ts:41-63`), so it seems realistic to encounter stores without `nextRunAtMs`.

How can I resolve this? If you propose a fix, please make it concise.

@tyler6204
Copy link
Member

Closing as duplicate of #9823, which includes the same fix along with tests and passing CI. This PR had formatting issues and lacked test coverage.

@tyler6204 tyler6204 closed this Feb 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Cron jobs skipped due to race condition - recomputeNextRuns called before runDueJobs Cron scheduler timer not firing

2 participants