Cron: gateway restart loop should pause/defer scheduled jobs #59301

@hikiwibot

Problem

When the gateway enters a restart loop (e.g., after a version downgrade), cron jobs can fire during the brief (~5-minute) window between restarts, only to be killed by SIGTERM before they complete. No run record is written, and the next gateway instance schedules the job for its next occurrence, silently dropping the current one.

Observed Behavior

  1. Gateway restarts every ~5 minutes due to instability
  2. A cron job fires at its scheduled time (e.g., 15 19 * * 0)
  3. ~35 seconds later, SIGTERM kills the gateway mid-execution
  4. New gateway starts, sees that the scheduled time has passed, and advances nextRunAtMs to the next occurrence
  5. No run entry is written, so the job is silently lost
  6. openclaw cron runs --id <job-id> shows 0 entries
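
To illustrate steps 4 and 5, here is a simplified Python model of a scheduler that recomputes the next run purely from the current time. It is not OpenClaw's code: `next_run_after` is a hypothetical helper, and the weekly schedule `15 19 * * 0` is approximated by a fixed one-week interval from an assumed anchor date.

```python
from datetime import datetime, timedelta

WEEK = timedelta(weeks=1)

def next_run_after(anchor: datetime, now: datetime, interval: timedelta = WEEK) -> datetime:
    """Return the first occurrence strictly after `now` on the grid anchor + k * interval."""
    if now < anchor:
        return anchor
    periods = (now - anchor) // interval + 1  # timedelta // timedelta yields an int
    return anchor + periods * interval

# Assumed example: the job is due Sunday 2026-03-15 at 19:15, the gateway is
# killed about 35 seconds into the run, and a new instance starts at 19:16.
anchor = datetime(2026, 3, 15, 19, 15)
restarted_at = datetime(2026, 3, 15, 19, 16)
print(next_run_after(anchor, restarted_at))  # 2026-03-22 19:15:00
```

Because nothing recorded that the 19:15 run started, the new instance has no reason to look back, and the run is skipped for a full week.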

Expected Behavior

The gateway should detect restart instability (e.g., >3 restarts in 15 minutes) and defer cron execution until the gateway has been stable for a minimum window (e.g., 5 minutes). This prevents jobs from firing in doomed windows.
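
A minimal sketch of that detection logic, in Python for illustration only. `RestartTracker` and its method names are hypothetical, the thresholds are the example values above, and each recorded gateway start is counted as one restart. In a real gateway the list of start timestamps would have to be persisted to disk, since it must survive the restarts it is counting.

```python
class RestartTracker:
    """Decides whether cron execution should be deferred, based on recent gateway starts."""

    def __init__(self, max_restarts: int = 3, window_s: float = 15 * 60, stable_s: float = 5 * 60):
        self.max_restarts = max_restarts  # more starts than this within window_s means "unstable"
        self.window_s = window_s          # look-back window for counting starts
        self.stable_s = stable_s          # uptime required before cron may run again
        self.starts: list[float] = []     # start timestamps in seconds, oldest first

    def record_start(self, now: float) -> None:
        """Record a gateway start and forget starts that fell out of the window."""
        self.starts.append(now)
        self.starts = [t for t in self.starts if now - t <= self.window_s]

    def is_unstable(self, now: float) -> bool:
        recent = [t for t in self.starts if now - t <= self.window_s]
        return len(recent) > self.max_restarts

    def should_defer(self, now: float) -> bool:
        """Defer cron only while the gateway is both unstable and not yet up for stable_s."""
        if not self.starts:
            return False
        uptime = now - self.starts[-1]
        return self.is_unstable(now) and uptime < self.stable_s
```

With starts recorded at 0, 200, 400, and 600 seconds, `should_defer(660)` is true (four starts in the window, 60 seconds of uptime), and `should_defer(900)` is false once the current instance has been up for five minutes.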

Alternatively, the cron scheduler could write a "started" record before execution, so the next instance knows a job was interrupted and can retry.
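
A rough sketch of the journaling idea, again in Python for illustration. The record fields (`run_id`, `job_id`, `status`) and both function names are assumptions; the actual schema of the runs JSONL is not shown in this issue.

```python
import json
import os
from pathlib import Path

def append_record(path: Path, record: dict) -> None:
    """Append one JSON line and force it to disk so it survives a SIGTERM right after."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())

def interrupted_runs(path: Path) -> list[dict]:
    """Return 'started' records that have no matching 'completed' record."""
    if not path.exists():
        return []
    open_runs: dict[str, dict] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank or torn line, e.g. the process was killed mid-write
        if not isinstance(rec, dict):
            continue
        if rec.get("status") == "started":
            open_runs[rec["run_id"]] = rec
        elif rec.get("status") == "completed":
            open_runs.pop(rec.get("run_id"), None)
    return list(open_runs.values())
```

The scheduler would call `append_record` with a `started` record before firing and a `completed` record afterwards; on startup, anything returned by `interrupted_runs` is a candidate for retry. Retrying is only safe if the job tolerates running twice, so a retry limit or an explicit opt-in per job may be worth considering.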

Environment

  • OpenClaw v2026.3.11
  • macOS (arm64)
  • Cron job using opus model (slow to initialize, making the kill window especially dangerous)

Suggested Approaches

  1. Stability gate: Track gateway start time. Don't fire cron jobs until uptime > N minutes.
  2. Pre-execution journaling: Write a started record to the runs JSONL before firing. On startup, check for started without completed and retry.
  3. Restart detection: If the gateway detects it's been restarted >3 times in 15 min, enter a "degraded" mode that defers non-critical crons.
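
Whichever signal is used (uptime below N minutes, or a restart counter), the important detail is that a deferred job stays due instead of having nextRunAtMs advanced. A small Python sketch of that decision, assuming a hypothetical `critical` flag on jobs and a `degraded` boolean supplied by the gateway:

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    next_run_at_ms: int
    critical: bool = False  # hypothetical flag; not an existing OpenClaw field

def partition_due_jobs(jobs: list[Job], now_ms: int, degraded: bool) -> tuple[list[Job], list[Job]]:
    """Split due jobs into (fire_now, deferred). Jobs that are not yet due are ignored."""
    fire_now: list[Job] = []
    deferred: list[Job] = []
    for job in jobs:
        if job.next_run_at_ms > now_ms:
            continue  # not due yet
        if degraded and not job.critical:
            deferred.append(job)  # next_run_at_ms is left untouched so the job stays due
        else:
            fire_now.append(job)
    return fire_now, deferred
```

Once the gateway leaves degraded mode, the same call with `degraded=False` returns the previously deferred jobs in `fire_now`, so the current occurrence runs late rather than being dropped.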

Labels

dedupe:parent (primary canonical item in dedupe cluster)
