Cron: lastRunAtMs not persisted until Phase 3 — gateway restart causes duplicate job execution

## Bug

When a cron job executes successfully but the gateway restarts before the outcome persist (Phase 3) completes, `lastRunAtMs` is never written to `jobs.json`. On restart, the stale `lastRunAtMs` (potentially days old) causes `runMissedJobs` to consider the job overdue and execute it again.

**Result:** Duplicate delivery (e.g., two morning briefings).

## Reproduction

1. Cron job fires at 08:00, Phase 1 persists `runningAtMs` ✓
2. Job executes and delivers successfully (Phase 2, no lock held)
3. Gateway restarts (SIGUSR1, crash, deploy) **before** Phase 3 persist
4. On restart: `runningAtMs` is cleared (ops.ts:102-114), but `lastRunAtMs` is still stale
5. `isRunnableJob` (timer.ts:784-801) sees `previousRunAtMs > lastRunAtMs` → job re-runs
6. User receives a second delivery

This happened in production on 2025-04-05: a daily-summary job delivered at 08:00, then again at 12:06 after a restart cascade.

## Root cause

`onTimer` in `timer.ts` uses three phases:

| Phase | What it does | Persisted? |
|---|---|---|
| 1 (L611-616) | Sets `runningAtMs` | ✓ `await persist(state)` |
| 2 (L624-669) | Executes job (unbounded time, **no lock**) | ✗ |
| 3 (L675-689) | Calls `applyOutcomeToStoredJob` → sets `lastRunAtMs` | ✓ but only if reached |

If gateway dies between Phase 1 and Phase 3, `lastRunAtMs` is never updated. The `runningAtMs` marker is cleaned up on restart (ops.ts:102-114), but nothing prevents the missed-job check from using the stale `lastRunAtMs`.

The codebase already acknowledges Phase-1 crash scenarios — see the comment at ops.ts:390-392 referencing #17554 — but only for the `runningAtMs` marker, not for `lastRunAtMs`.

## Suggested fix

Persist `lastRunAtMs` immediately after job execution completes, **before** delivery attempt:

```typescript
// After executeJobCoreWithTimeout returns, before delivery:
await locked(state, async () => {
  await ensureLoaded(state, { forceReload: true, skipRecompute: true });
  const stored = state.store.jobs.find(j => j.id === job.id);
  if (stored) {
    stored.state.lastRunAtMs = startedAt;
    // Optionally: stored.state.lastRunStatus = "executed-pending-delivery"
  }
  await persist(state);
});
```

This way, even if the gateway crashes during or after delivery, the job won't be considered missed on restart. Delivery status can still be updated in Phase 3.

Alternative: on restart, if a `runningAtMs` marker is found and cleared, also set `lastRunAtMs = runningAtMs` before clearing — treating interrupted jobs as "ran but status unknown" rather than "never ran."

## Affected files

- `src/cron/service/timer.ts` — `onTimer` (L572-731), `applyJobResult` (L296-474), `isRunnableJob` (L784-801)
- `src/cron/service/ops.ts` — startup cleanup (L102-114), missed job detection

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cron: lastRunAtMs not persisted until Phase 3 — gateway restart causes duplicate job execution #61343

Bug

Reproduction

Root cause

Suggested fix

Affected files

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Phase	What it does	Persisted?
1 (L611-616)	Sets `runningAtMs`	✓ `await persist(state)`
2 (L624-669)	Executes job (unbounded time, no lock)	✗
3 (L675-689)	Calls `applyOutcomeToStoredJob` → sets `lastRunAtMs`	✓ but only if reached

Uh oh!

Cron: lastRunAtMs not persisted until Phase 3 — gateway restart causes duplicate job execution #61343

Description

Bug

Reproduction

Root cause

Suggested fix

Affected files

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions