[Bug]: one-shot cron jobs silently lost after gateway restart #63657

@myradon

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Bug: one-shot cron jobs silently lost after gateway restart

Version: 2026.4.5
Platform: Docker (Linux)

What happens

When the gateway restarts while a one-shot cron job is mid-execution (runningAtMs is set), the job is permanently lost. No notification is delivered, no run is recorded, and the job is never retried.

Root cause

In src/cron/service/ops.ts (startup logic):

  1. On startup, any job with runningAtMs set gets cleared and added to startupInterruptedJobIds
  2. runMissedJobs is then called with skipJobIds: startupInterruptedJobIds
  3. For one-shot (at:) jobs, this means a job that had started before the gateway died is skipped on recovery instead of retried
  4. Result: runningAtMs cleared, 0 entries in run history, delivery never happens
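
The sequence above can be modelled with a minimal sketch. This is not the code in src/cron/service/ops.ts: only `runningAtMs`, `startupInterruptedJobIds`, `skipJobIds` and `runMissedJobs` are names taken from this report. The job shape, the `startup` wrapper and the body of `runMissedJobs` are assumptions made for illustration.

```typescript
// Hypothetical job shape; only runningAtMs is a field named in this report.
interface CronJob {
  id: string;
  kind: "at" | "every"; // "at" = one-shot
  atMs: number;         // assumed: scheduled time of a one-shot
  state: { runningAtMs?: number; runs: number };
}

// Stand-in for runMissedJobs: runs overdue one-shots unless they are skipped.
function runMissedJobs(
  jobs: CronJob[],
  nowMs: number,
  opts: { skipJobIds: Set<string> },
): string[] {
  const ran: string[] = [];
  for (const job of jobs) {
    if (opts.skipJobIds.has(job.id)) continue; // interrupted jobs never get here
    if (job.kind === "at" && job.atMs <= nowMs && job.state.runs === 0) {
      job.state.runs += 1;
      ran.push(job.id);
    }
  }
  return ran;
}

// Stand-in for the startup logic described in steps 1 and 2.
function startup(jobs: CronJob[], nowMs: number): string[] {
  const startupInterruptedJobIds = new Set<string>();
  for (const job of jobs) {
    if (job.state.runningAtMs !== undefined) {
      delete job.state.runningAtMs;         // step 1: clear the marker
      startupInterruptedJobIds.add(job.id); // step 1: remember the job
    }
  }
  // step 2: the interrupted one-shot is excluded from missed-job recovery
  return runMissedJobs(jobs, nowMs, { skipJobIds: startupInterruptedJobIds });
}

const reminder: CronJob = {
  id: "reminder",
  kind: "at",
  atMs: 1000,
  state: { runningAtMs: 1000, runs: 0 },
};
const ran = startup([reminder], 2000);
console.log(ran, reminder.state);
```

In this model the overdue one-shot is never run: `ran` stays empty, `runs` stays at 0, and `runningAtMs` is gone, which matches the state observed in cron/jobs.json.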

Observed symptoms

  • Scheduled reminder had runningAtMs set in cron/jobs.json
  • After gateway restart: runningAtMs cleared, state.runs = 0
  • No notification delivered to any channel
  • Job not rescheduled, not flagged, silently gone

Workaround

Recurring jobs survive restarts correctly (they compute next run via missed-jobs logic). One-shots do not. Current workaround: a recurring reconciliation job that compares tasks-store.json cron_ids against the active cron list and recreates orphaned one-shots after each restart.
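
A rough sketch of the comparison step in that reconciliation job is shown below. The report only names the `cron_ids` field; the `TaskRecord` shape, the `taskId` field and the function name are hypothetical, and reading tasks-store.json or recreating the jobs is left out.

```typescript
// Assumed shape of one entry in tasks-store.json; only cron_ids is named in the report.
interface TaskRecord {
  taskId: string;     // hypothetical identifier
  cron_ids: string[]; // cron jobs this task expects to exist
}

// Returns, per task, the cron ids the task store expects but the scheduler no longer has.
function findOrphanedCronIds(
  tasks: TaskRecord[],
  activeCronIds: Iterable<string>,
): Map<string, string[]> {
  const active = new Set(activeCronIds);
  const orphans = new Map<string, string[]>();
  for (const task of tasks) {
    const missing = task.cron_ids.filter((id) => !active.has(id));
    if (missing.length > 0) orphans.set(task.taskId, missing);
  }
  return orphans;
}

const storedTasks: TaskRecord[] = [
  { taskId: "t1", cron_ids: ["c1", "c2"] },
  { taskId: "t2", cron_ids: ["c3"] },
];
const orphans = findOrphanedCronIds(storedTasks, ["c1", "c3"]);
console.log(Array.from(orphans.entries()));
```

Each orphaned id can then be recreated as a new one-shot. The check is idempotent, so it is safe to run it on a recurring schedule.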

Impact

Any one-shot reminder or scheduled task that is mid-execution when the gateway restarts (e.g. during a model switch, container restart, or update) is permanently lost without any user-visible signal.

Steps to reproduce

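  1. Schedule a one-shot (at:) cron job
  2. Restart the gateway while the job is mid-execution (runningAtMs is set in cron/jobs.json)
  3. After the restart, check cron/jobs.json and the run history

Result: the job is permanently lost. No notification is delivered, no run
is recorded, and the job is never retried.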

Root cause in src/cron/service/ops.ts (startup):

  1. Jobs with runningAtMs set get cleared and added to startupInterruptedJobIds
  2. runMissedJobs is called with skipJobIds: startupInterruptedJobIds
  3. One-shot (at:) jobs are skipped instead of retried
  4. Result: runningAtMs cleared, 0 runs in history, delivery never happens

Symptoms:

  • Scheduled reminder had runningAtMs set in cron/jobs.json
  • After restart: runningAtMs cleared, state.runs = 0
  • No notification delivered, job not rescheduled, silently gone

Expected behavior

An interrupted one-shot job should be retried on restart, not skipped. Options:

  • Re-execute it immediately (treat as overdue)
  • Flag it with a state.interruptedAt marker and surface it to the user
  • At minimum, do not silently discard it: log a warning or deliver a failure notification
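
One possible shape for this recovery, combining the three options, is sketched below. It is a suggestion under stated assumptions, not a patch against ops.ts: `planStartupRecovery`, `RecoveryPlan` and the job shape are hypothetical, while `runningAtMs`, `skipJobIds` and `state.interruptedAt` follow the names used in this report.

```typescript
// Hypothetical job shape for illustration.
interface RecoverableJob {
  id: string;
  kind: "at" | "every"; // "at" = one-shot
  state: { runningAtMs?: number; interruptedAt?: number; runs: number };
}

interface RecoveryPlan {
  retryNow: string[];      // interrupted one-shots to re-execute as overdue
  skipJobIds: Set<string>; // interrupted recurring jobs, left to their next run
  warnings: string[];      // messages to log or surface to the user
}

function planStartupRecovery(jobs: RecoverableJob[], nowMs: number): RecoveryPlan {
  const plan: RecoveryPlan = { retryNow: [], skipJobIds: new Set<string>(), warnings: [] };
  for (const job of jobs) {
    if (job.state.runningAtMs === undefined) continue; // not interrupted
    delete job.state.runningAtMs;
    job.state.interruptedAt = nowMs; // keep a visible marker instead of erasing all trace
    plan.warnings.push(`cron job ${job.id} was interrupted by a gateway restart`);
    if (job.kind === "at") {
      plan.retryNow.push(job.id);   // one-shots have no next run, so retry them
    } else {
      plan.skipJobIds.add(job.id);  // recurring jobs can wait for the next occurrence
    }
  }
  return plan;
}

const recoveryJobs: RecoverableJob[] = [
  { id: "reminder", kind: "at", state: { runningAtMs: 1000, runs: 0 } },
  { id: "digest", kind: "every", state: { runningAtMs: 1000, runs: 3 } },
  { id: "idle", kind: "at", state: { runs: 0 } },
];
const plan = planStartupRecovery(recoveryJobs, 2000);
console.log(plan.retryNow, Array.from(plan.skipJobIds), plan.warnings);
```

Only interrupted recurring jobs end up in `skipJobIds`, so an interrupted one-shot is always either retried or at least reported. Whether a retry is safe depends on the job being idempotent, which this sketch does not check.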

Actual behavior

runningAtMs is cleared on restart, job is added to startupInterruptedJobIds,
and runMissedJobs skips it entirely. No retry, no warning, no failure
notification. The job disappears silently with 0 runs recorded.

OpenClaw version

2026.4.5

Operating system

Manjaro Linux

Install method

docker

Model

N/A

Provider / routing chain

N/A

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

Workaround: a recurring reconciliation job that compares cron_ids in
tasks-store.json against the active cron list and recreates orphaned
one-shots after each gateway restart.

Impact: any one-shot reminder that is mid-execution when the gateway
restarts (model switch, container restart, update) is permanently lost
without any user-visible signal.

Metadata

Labels

  • bug: Something isn't working
  • bug:behavior: Incorrect behavior without a crash
  • dedupe:parent: Primary canonical item in dedupe cluster
