[Bug] Cron spin loop via MIN_REFIRE_GAP_MS fallback when computeJobNextRunAtMs returns undefined (incomplete fix for #17821)

## Summary

`applyJobResult` in `src/cron/service/timer.ts` conflates two semantics on the `MIN_REFIRE_GAP_MS` constant:

1. **Lower bound** for the next cron fire, protecting against the case where `computeJobNextRunAtMs` returns a value in the same second as the run just finished (the `#17821` fix).
2. **Fallback value** for when `computeJobNextRunAtMs` returns `undefined` entirely.

The second usage is wrong: when the schedule cannot be computed at all, falling back to `endedAt + 2s` causes the scheduler to re-fire the job 2 seconds later. That run then fails/skips for the same reason, and we're in a tight loop. The structurally identical bug exists in the error-backoff branch.

`#17821` is the closed predecessor and covers the "landed in the same second" case. It does not cover the `undefined` case, and the existing regression tests (`timer.regression.test.ts`: `spin-loop-17821` and `spin-gap-17821`) do not exercise it.

## Root cause walkthrough

### 1. Silent `undefined` return from `computeNextRunAtMs`

`src/cron/schedule.ts` → `computeNextRunAtMs`:

```ts
const cron = resolveCronFromSchedule(schedule);
if (!cron) return;
let next = cron.nextRun(new Date(nowMs));
if (!next) return;
```

If `croner`'s `cron.nextRun()` returns `null` (observed for certain tz/ICU combinations — see environment below), the function silently returns `undefined`. No exception is thrown.

### 2. Silent fall-through in `computeStaggeredCronNextRunAtMs`

`src/cron/schedule.ts` → `computeStaggeredCronNextRunAtMs`:

```ts
for (let attempt = 0; attempt < 4; attempt += 1) {
  const baseNext = computeNextRunAtMs(job.schedule, cursorMs);
  if (baseNext === undefined) return;
  const shifted = baseNext + offsetMs;
  if (shifted > nowMs) return shifted;
  cursorMs = Math.max(cursorMs + 1, baseNext + 1000);
}
// implicit undefined return
```

The 4-attempt loop can exit without returning, implicitly giving `undefined`.

### 3. `applyJobResult` bad fallback in success path

`src/cron/service/timer.ts`, cron success branch:

```ts
} else if (isJobEnabled(job)) {
  let naturalNext: number | undefined;
  try {
    naturalNext = opts?.preserveSchedule && job.schedule.kind === "every"
      ? computeNextWithPreservedLastRun(result.endedAt)
      : computeJobNextRunAtMs(job, result.endedAt);
  } catch (err) {
    recordScheduleComputeError({ state, job, err });
    // no return; execution continues
  }
  if (job.schedule.kind === "cron") {
    const minNext = result.endedAt + MIN_REFIRE_GAP_MS;
    job.state.nextRunAtMs =
      naturalNext !== undefined ? Math.max(naturalNext, minNext) : minNext;
    //                                                           ^^^^^^^ bug
  } else {
    job.state.nextRunAtMs = naturalNext;
  }
}
```

When `naturalNext === undefined`, `nextRunAtMs` becomes `result.endedAt + 2000`. Next tick (2 seconds later) the scheduler re-fires the job.

There's also a subtle secondary issue: `recordScheduleComputeError` already sets `job.state.nextRunAtMs = undefined` for the THROW case, but because the `catch` does not `return`, the subsequent assignment in the same branch overwrites it back to `minNext`.

### 4. Same shape in the error-backoff branch

```ts
} else if (result.status === "error" && isJobEnabled(job)) {
  // ...
  const backoffNext = result.endedAt + backoff;
  job.state.nextRunAtMs =
    normalNext !== undefined ? Math.max(normalNext, backoffNext) : backoffNext;
  //                                                             ^^^^^^^^^^^^ same issue
}
```

Here the fallback is `errorBackoffMs(consecutiveErrors)` rather than `MIN_REFIRE_GAP_MS`, which mitigates the tight loop via exponential backoff — but for the first few errors the interval is still short enough to cause noticeable channel spam, and the structural confusion is the same.

## Observed symptom

A single cron job fires ~10 times in ~10 minutes. Each run completes in 4–6 seconds (agent turn duration); between runs the scheduler wakes up on its newly-set short `nextRunAtMs` and re-fires. Visible as channel spam (Telegram/WeChat messages repeating). On macOS specifically; same version on Ubuntu appears unaffected.

I don't have a deterministic minimal repro for `naturalNext === undefined` — a test `* * * * *` cron in `Asia/Shanghai` tz fired cleanly every 60s in my environment during verification. The `undefined` path appears to need a specific croner/ICU edge case that's not reliably triggered by a simple test job, but the structural bug is present in the code regardless of how often the path is hit.

## Proposed fix

Treat `undefined` as "unknown, clear the schedule" rather than as "use the minimum gap". Success path:

```ts
if (job.schedule.kind === "cron") {
  if (naturalNext !== undefined) {
    const minNext = result.endedAt + MIN_REFIRE_GAP_MS;
    job.state.nextRunAtMs = Math.max(naturalNext, minNext);
  } else {
    // Schedule computation failed; clearing nextRunAtMs avoids a
    // respin loop. The job will be rescheduled on the next heartbeat /
    // timer arm when the schedule can be recomputed from scratch.
    job.state.nextRunAtMs = undefined;
    state.deps.log.warn(
      { jobId: job.id, jobName: job.name },
      "cron: nextRun unresolved — clearing to avoid respin loop",
    );
  }
} else {
  job.state.nextRunAtMs = naturalNext;
}
```

The error-backoff branch should get a parallel treatment. Additionally the `catch` blocks around `computeJobNextRunAtMs` could early-`return` after `recordScheduleComputeError` (or set a flag checked below) so the subsequent assignment doesn't race with the error handler.

## Suggested regression test

In `src/cron/service/timer.regression.test.ts`, alongside `spin-loop-17821` / `spin-gap-17821`:

```ts
it("clears nextRunAtMs when schedule compute returns undefined (does not spin)", async () => {
  // Arrange: enabled cron job whose computeJobNextRunAtMs path returns undefined
  // (mock via a stubbed croner/schedule helper, or a test hook).
  // ...
  applyJobResult(state, job, { status: "ok", startedAt, endedAt });
  expect(job.state.nextRunAtMs).toBeUndefined();
});
```

An analogous test for the error-backoff branch would prevent regression in both paths.

## Environment

- OpenClaw: `2026.4.11` (installed via Homebrew, `/opt/homebrew/lib/node_modules/openclaw/`)
- OS: macOS 26.4.1 (Darwin 25.4.0), arm64, MacBook Pro 14" (M5)
- Node: `v25.9.0`
- Affected job combinations observed: `sessionTarget: isolated + payload.kind: agentTurn` and `sessionTarget: main + payload.kind: systemEvent + wakeMode: next-heartbeat`
- Not observed on Ubuntu running the same OpenClaw version, suggesting croner/ICU/tz edge case is platform-specific at the trigger layer — though the structural bug above is platform-independent.

## Related

- `#17821` — closed; added the `MIN_REFIRE_GAP_MS` safety net for the same-second case. This issue reports the `undefined` case that fix missed.
- `#52097` — open; "Cron job runs repeatedly after manual trigger". Symptom similar, root cause not confirmed the same but worth cross-linking for triage.

## Local patch (for reference)

I've been running the success-path fix above on the shipped bundle since the bug was diagnosed, with cron jobs behaving correctly. The new `warn` log line has not fired in my environment, consistent with the note above that the `undefined` path is rare on my specific setup — the patch functions as a safety net rather than a hot path.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] Cron spin loop via MIN_REFIRE_GAP_MS fallback when computeJobNextRunAtMs returns undefined (incomplete fix for #17821) #66019

Summary

Root cause walkthrough

1. Silent `undefined` return from `computeNextRunAtMs`

2. Silent fall-through in `computeStaggeredCronNextRunAtMs`

3. `applyJobResult` bad fallback in success path

4. Same shape in the error-backoff branch

Observed symptom

Proposed fix

Suggested regression test

Environment

Related

Local patch (for reference)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Bug] Cron spin loop via MIN_REFIRE_GAP_MS fallback when computeJobNextRunAtMs returns undefined (incomplete fix for #17821) #66019

Description

Summary

Root cause walkthrough

1. Silent undefined return from computeNextRunAtMs

2. Silent fall-through in computeStaggeredCronNextRunAtMs

3. applyJobResult bad fallback in success path

4. Same shape in the error-backoff branch

Observed symptom

Proposed fix

Suggested regression test

Environment

Related

Local patch (for reference)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. Silent `undefined` return from `computeNextRunAtMs`

2. Silent fall-through in `computeStaggeredCronNextRunAtMs`

3. `applyJobResult` bad fallback in success path