Cron jobs stuck in 'already-running' state after gateway restart — stale runningAtMs never cleared #44920

@agentz-manfred

Bug Report

Version: OpenClaw 2026.3.7 (node, macOS arm64)
Channel: WhatsApp

Description

After batch-updating multiple cron jobs (changing delivery.channel on ~12 jobs), the gateway restarted as expected. However, all cron jobs got stuck in a stale runningAtMs state that persisted across restarts, preventing them from executing.

Steps to Reproduce

  1. Have multiple cron jobs running (we had ~20 active jobs)
  2. Batch-update delivery config on 12 jobs via cron.update (two waves, ~7 then ~11 jobs)
  3. Gateway restarts after updates
  4. Jobs stop running — cron.run returns {"ran": false, "reason": "already-running"}
  5. sessions_list(kinds=["cron"]) shows zero active sessions
  6. The runningAtMs timestamp in job state matches the gateway restart time, not any actual session

Expected Behavior

  • Gateway restart should clear runningAtMs for jobs with no matching active session
  • SIGUSR1 (graceful restart) should also clear stale running flags
  • Jobs should resume their schedules after restart
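For illustration, the first expectation above could be sketched as a reconciliation pass that runs once at startup. This is not OpenClaw's actual code: the `CronJob` shape, the `clearStaleRunningFlags` name, and the idea that active sessions can be reduced to a set of job ids are all assumptions. Only the `runningAtMs` field name comes from the observed job state.

```typescript
// Hypothetical shapes: only `runningAtMs` is taken from the report.
interface CronJobState {
  runningAtMs?: number;
}

interface CronJob {
  id: string;
  state: CronJobState;
}

/**
 * Clears `runningAtMs` on every job that is marked as running but has no
 * matching active session. Returns the ids of the jobs that were repaired.
 * `activeSessionJobIds` is an assumed input: the job ids that currently
 * have a live cron session (empty right after a restart).
 */
function clearStaleRunningFlags(
  jobs: CronJob[],
  activeSessionJobIds: ReadonlySet<string>,
): string[] {
  const repaired: string[] = [];
  for (const job of jobs) {
    const markedRunning = job.state.runningAtMs !== undefined;
    if (markedRunning && !activeSessionJobIds.has(job.id)) {
      delete job.state.runningAtMs; // stale flag: no session owns this run
      repaired.push(job.id);
    }
  }
  return repaired;
}
```

Run against the state described here (every job flagged, zero cron sessions), a pass like this would clear the flag on all jobs, and the list of repaired ids could be logged so the recovery is visible.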

Actual Behavior

  • runningAtMs persists in ~/.openclaw/cron/jobs.json across both SIGUSR1 and full stop/start
  • All jobs with runningAtMs set return already-running when manually triggered
  • No cron sessions are actually running (confirmed via sessions_list)
  • A full openclaw gateway stop && sleep 3 && openclaw gateway start eventually cleared the state, but not reliably; the first attempt, using SIGUSR1, did NOT work

Additional Bug: delivery.mode: "none" with delivery.channel set

When updating jobs to set delivery.channel: "whatsapp" with delivery.mode: "none":

  • OpenClaw still attempts delivery to WhatsApp despite mode: "none"
  • Fails with: "Delivering to WhatsApp requires target <E.164|group JID>"
  • Jobs are marked as error even though the agentTurn payload executed successfully
  • Setting delivery.channel: null via patch is rejected by schema validation (must be string)
  • Workaround: Add delivery.to: "+1234567890" to satisfy the validator, even though delivery should not fire
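To make the expected behavior concrete, here is a minimal sketch of a delivery guard that checks the mode before looking at any other field. The `DeliveryConfig` shape and the `planDelivery` name are hypothetical, and treating every mode other than "none" as "deliver" is an assumption; only the field names `mode`, `channel`, and `to` come from the config described above.

```typescript
// Hypothetical config shape; field names follow the report.
interface DeliveryConfig {
  mode?: string;
  channel?: string;
  to?: string;
}

type DeliveryDecision =
  | { deliver: false }
  | { deliver: true; channel: string; to: string };

function planDelivery(delivery: DeliveryConfig | undefined): DeliveryDecision {
  // Check the mode first, so a leftover channel can never trigger
  // target validation or an actual send.
  if (delivery === undefined || delivery.mode === "none") {
    return { deliver: false };
  }
  // Only modes that really deliver need a channel and a target.
  if (!delivery.channel || !delivery.to) {
    throw new Error("delivery requires both a channel and a target");
  }
  return { deliver: true, channel: delivery.channel, to: delivery.to };
}
```

With this ordering, a job configured with mode: "none" and channel: "whatsapp" but no target never reaches the target check, so a successful agentTurn would not be marked as an error.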

Impact

  • 7+ cron jobs stopped running for ~18 hours (missed morning briefing, klartext publish, satire generator, youtube reports, newsletter)
  • Required manual intervention and full gateway restart to recover
  • User perceived this as "crashes" since the gateway was unresponsive during restart cycles

Environment

  • macOS (Apple Silicon, arm64)
  • OpenClaw 2026.3.7
  • LaunchAgent managed gateway
  • ~20 active cron jobs, mixed schedules (every 10min to weekly)
  • WhatsApp + Telegram channels configured

Suggested Fix

  1. On gateway startup, iterate all jobs with runningAtMs set and check if a matching session exists. If not, clear runningAtMs.
  2. delivery.mode: "none" should skip ALL delivery logic regardless of other delivery fields.
  3. cron.update patch should support removing fields (e.g., setting channel: null to delete the key).
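Item 3 could follow JSON Merge Patch semantics (RFC 7396), where a null value in the patch removes the key. The sketch below is an assumption about how that might look for the delivery block; `Delivery`, `DeliveryPatch`, and `applyDeliveryPatch` are hypothetical names, not part of the cron.update API.

```typescript
// Hypothetical delivery shape; field names follow the report.
interface Delivery {
  mode?: string;
  channel?: string;
  to?: string;
}

// Each field may be set to a new value, or to null to remove it.
type DeliveryPatch = { [K in keyof Delivery]?: Delivery[K] | null };

function applyDeliveryPatch(current: Delivery, patch: DeliveryPatch): Delivery {
  const next: Delivery = { ...current }; // never mutate the stored config
  for (const key of ["mode", "channel", "to"] as const) {
    const value = patch[key];
    if (value === null) {
      delete next[key]; // null means "remove this key"
    } else if (value !== undefined) {
      next[key] = value; // any other value overwrites
    }
    // undefined (key absent from the patch) leaves the field as it was
  }
  return next;
}
```

Under these semantics, patching `{ channel: null }` onto a job with mode: "none" and channel: "whatsapp" would leave only the mode, which removes the need for the dummy delivery.to workaround.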
