Cron: `every` schedule jobs stop firing after repeated errors, no catch-up mechanism #10403

## Summary

`every` (interval) schedule jobs stopped firing for ~23 hours after encountering repeated LLM errors (rate limits, timeouts). The scheduler jumped `nextRunAtMs` far into the future instead of retrying on the normal interval.

## Environment
- OpenClaw version: 2026.2.3-1
- OS: macOS 15.7.3 (arm64)
- Node: 25.6.0

## Steps to Reproduce

1. Create an `every` schedule job (e.g., hourly):
```json
{
  "schedule": {
    "kind": "every",
    "everyMs": 3600000
  }
}
```

2. Let the job encounter multiple consecutive errors (rate limits, timeouts, connection errors)

3. Observe that `nextRunAtMs` jumps far into the future (e.g., 24+ hours) instead of retrying on the next interval

## Expected Behavior

- After transient errors, the job should retry on the next scheduled interval (1 hour later), not jump 24+ hours ahead
- Optional: A configurable "max catch-up" setting to handle missed runs after downtime
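
A "max catch-up" setting could work roughly as follows. This is a minimal sketch, not OpenClaw code: the `maxCatchUp` option name, the `CatchUpOptions` type, and the `missedRunsToReplay` helper are all hypothetical, and it assumes that when more runs were missed than the cap allows, the most recent ones are the ones worth replaying.

```typescript
// Hypothetical option; neither the name nor the semantics come from OpenClaw.
interface CatchUpOptions {
  /** Upper bound on how many missed runs are replayed after downtime. */
  maxCatchUp: number;
}

// Returns the scheduled timestamps of missed runs that should be replayed.
// If more runs were missed than maxCatchUp allows, only the most recent are kept.
function missedRunsToReplay(
  lastRunAtMs: number,
  nowMs: number,
  everyMs: number,
  opts: CatchUpOptions,
): number[] {
  if (everyMs <= 0) {
    throw new RangeError("everyMs must be positive");
  }
  // Number of whole intervals that elapsed since the last run.
  const missed = Math.max(0, Math.floor((nowMs - lastRunAtMs) / everyMs));
  const replayCount = Math.min(missed, Math.max(0, opts.maxCatchUp));
  const due: number[] = [];
  for (let i = missed - replayCount + 1; i <= missed; i++) {
    due.push(lastRunAtMs + i * everyMs);
  }
  return due;
}
```

With an hourly job that last ran 5.5 hours ago and `maxCatchUp: 2`, this returns the two most recent missed slots (4 and 5 hours after the last run); with `maxCatchUp: 0` it returns an empty list, which would mean "skip missed runs entirely".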

## Actual Behavior

- After several errors around 09:00 EST on Feb 5, the hourly jobs didn't fire again until manually recreated on Feb 6
- The `nextRunAtMs` was set to ~09:00 EST the next day, skipping ~23 hourly runs
- Run history showed errors like:
  ```
  Error: All models failed (4): anthropic/claude-opus-4-5: LLM request timed out. (unknown) | anthropic/claude-sonnet-4-5: No available auth profile (rate_limit) | ...
  ```
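
The "~23 hourly runs" figure follows from the interval arithmetic. A small sketch, assuming the last run was at 09:00 and `nextRunAtMs` landed exactly 24 hours later; the `skippedRuns` helper and the timestamps are illustrative, not taken from the OpenClaw source:

```typescript
// Hypothetical helper: counts the interval ticks that fall strictly between
// the last run and the scheduled next run.
function skippedRuns(lastRunAtMs: number, nextRunAtMs: number, everyMs: number): number {
  const gapMs = nextRunAtMs - lastRunAtMs;
  if (gapMs <= everyMs) {
    return 0; // next run is on or before the normal next tick
  }
  return Math.ceil(gapMs / everyMs) - 1;
}

const HOUR_MS = 3_600_000;
const lastRunAtMs = 9 * HOUR_MS; // 09:00, as ms since midnight (illustrative)
const nextRunAtMs = lastRunAtMs + 24 * HOUR_MS; // 09:00 the next day

console.log(skippedRuns(lastRunAtMs, nextRunAtMs, HOUR_MS)); // 23
```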

## Workaround

Delete and recreate the job with a fresh `anchorMs` to reset the schedule state:
```bash
openclaw cron remove <job-id>
openclaw cron add --schedule.kind=every --schedule.everyMs=3600000 --schedule.anchorMs=<recent-timestamp> ...
```

## Additional Context

- `cron` expression jobs (e.g., `0 7 * * *`) were unaffected and continued running normally
- Only `every` (interval) jobs exhibited this behavior
- The gateway was running continuously during this period (not restarted until troubleshooting)

## Suggested Fix

1. After an error, calculate next run as `max(now, lastRunAtMs) + everyMs` rather than jumping to a much later time
2. Consider a `maxSkip` or `catchUp` option for interval schedules
3. Add logging when a job's next run is calculated to be significantly later than expected
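
Items 1 and 3 could be sketched as below. This is only an illustration under stated assumptions: the function names, the `toleranceFactor` parameter, and the log message are hypothetical, and the actual scheduler code may be structured quite differently.

```typescript
// Item 1: after an error, schedule the next run one interval after the later
// of "now" and the last run, so the job never jumps more than one interval ahead.
function nextRunAfterError(nowMs: number, lastRunAtMs: number, everyMs: number): number {
  return Math.max(nowMs, lastRunAtMs) + everyMs;
}

// Item 3: warn when a computed next run is much later than one interval away.
// toleranceFactor is an assumed knob; 2 means "more than two intervals out".
function warnIfFarOut(
  jobId: string,
  nowMs: number,
  nextRunAtMs: number,
  everyMs: number,
  toleranceFactor: number = 2,
): boolean {
  const delayMs = nextRunAtMs - nowMs;
  const isFarOut = delayMs > everyMs * toleranceFactor;
  if (isFarOut) {
    console.warn(
      `cron: job ${jobId} next run is ${delayMs} ms away; expected about ${everyMs} ms`,
    );
  }
  return isFarOut;
}
```

For the hourly job in this report, `nextRunAfterError` would place the next run one hour after the failure, and `warnIfFarOut` would have flagged the observed jump of roughly 24 hours.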
