Cron runningAtMs zombie state regression — cleared state returns after gateway restart

## Problem

Regression of #18120 (closed as fixed). `runningAtMs` is still not reliably clearing on job completion. 

## Environment
- OpenClaw 2026.3.22 (4dcc39c)
- macOS (arm64), Node v22.22.0 (gateway service) / v25.5.0 (CLI)
- 45 cron jobs configured, ~14 scraper crons running frequently (every 1-2 hours)

## Observed Behavior

9 cron jobs simultaneously stuck with `runningAtMs` set to the **exact same timestamp** (`1775045523045` = 2026-04-01 08:12:03 ET), even though their last runs completed successfully (status: `ok` or `error` with proper `lastDurationMs` values).

The stuck jobs included a mix of:
- Jobs that last ran 3-12 hours ago and completed normally
- Jobs with different schedules and timeouts (2700s, 3600s, null)
- Both `haiku` and `sonnet46` model jobs

The identical `runningAtMs` across all 9 suggests a batch state write (possibly on scheduler wake/catch-up) that sets `runningAtMs` without a corresponding session actually starting.

## Impact

Downstream cron jobs (e.g. `revenue-loop`) stop firing because the scheduler appears to be at concurrency limit or skips scheduling when too many jobs are in `running` state.

## Workaround Applied

1. Edited `~/.openclaw/cron/jobs.json` to set `runningAtMs: null` for all stuck jobs
2. Restarted gateway to pick up clean state
3. Jobs resumed firing normally

## Hypothesis

The fix in #18120 handles the case where a session completes/errors through `applyJobResult`. But there appears to be a path where `runningAtMs` gets set **without a real session starting** — possibly during catch-up scheduling when overdue jobs are detected. The catch-up path may set `runningAtMs` but then skip actual execution (e.g. due to stale delivery), leaving the flag permanently set.

Evidence: gateway log shows `skipping stale delivery` warnings for some of these same jobs around the time the zombie state appeared.

## Possibly Related

Two different Node.js versions observed in gateway logs (`22.22.0` for the service, `25.5.0` for CLI). Unknown if this contributes to state inconsistency.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cron runningAtMs zombie state regression — cleared state returns after gateway restart #59056

Problem

Environment

Observed Behavior

Impact

Workaround Applied

Hypothesis

Possibly Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Cron runningAtMs zombie state regression — cleared state returns after gateway restart #59056

Description

Problem

Environment

Observed Behavior

Impact

Workaround Applied

Hypothesis

Possibly Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions