Closed
Description
Summary
every (interval) schedule jobs stopped firing for ~23 hours after encountering repeated LLM errors (rate limits, timeouts). The scheduler jumped nextRunAtMs far into the future instead of retrying on the normal interval.
Environment
- OpenClaw version: 2026.2.3-1
- OS: macOS 15.7.3 (arm64)
- Node: 25.6.0
Steps to Reproduce
- Create an every schedule job (e.g., hourly):
{
  "schedule": {
    "kind": "every",
    "everyMs": 3600000
  }
}
- Let the job encounter multiple consecutive errors (rate limits, timeouts, connection errors)
- Observe that nextRunAtMs jumps far into the future (e.g., 24+ hours) instead of retrying on the next interval
Expected Behavior
- After transient errors, the job should retry on the next scheduled interval (1 hour later), not jump 24+ hours ahead
- Optional: A configurable "max catch-up" setting to handle missed runs after downtime
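The expected behavior above can be sketched as follows. This is a minimal illustration, not OpenClaw's actual scheduler code: the helper names nextAlignedRun and runsToCatchUp and the maxCatchUp parameter are hypothetical, and it assumes interval jobs fire on ticks aligned to anchorMs, which the issue does not confirm.

```typescript
// Hypothetical helper: the smallest anchorMs + k * everyMs that is strictly
// after nowMs. Assumes interval jobs are aligned to anchorMs.
function nextAlignedRun(anchorMs: number, everyMs: number, nowMs: number): number {
  if (everyMs <= 0) {
    throw new RangeError("everyMs must be positive");
  }
  if (nowMs < anchorMs) {
    return anchorMs; // the schedule has not started yet
  }
  const elapsedIntervals = Math.floor((nowMs - anchorMs) / everyMs);
  return anchorMs + (elapsedIntervals + 1) * everyMs;
}

// Hypothetical "max catch-up": how many missed runs to replay after downtime,
// capped at maxCatchUp so a long outage does not trigger a burst of runs.
function runsToCatchUp(
  lastRunAtMs: number,
  everyMs: number,
  nowMs: number,
  maxCatchUp: number,
): number {
  const missed = Math.max(0, Math.floor((nowMs - lastRunAtMs) / everyMs));
  return Math.min(missed, maxCatchUp);
}
```

With an hourly job, a failure a few minutes past the hour would put the next run at the top of the following hour under this rule, regardless of how many consecutive errors preceded it.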
Actual Behavior
- After several errors around 09:00 EST on Feb 5, the hourly jobs didn't fire again until manually recreated on Feb 6
- The nextRunAtMs was set to ~09:00 EST the next day, skipping ~23 hourly runs
- Run history showed errors like:
Error: All models failed (4): anthropic/claude-opus-4-5: LLM request timed out. (unknown) | anthropic/claude-sonnet-4-5: No available auth profile (rate_limit) | ...
Workaround
Delete and recreate the job with a fresh anchorMs to reset the schedule state:
openclaw cron remove <job-id>
openclaw cron add --schedule.kind=every --schedule.everyMs=3600000 --schedule.anchorMs=<recent-timestamp> ...
Additional Context
- cron expression jobs (e.g., 0 7 * * *) were unaffected and continued running normally
- Only every (interval) jobs exhibited this behavior
- The gateway was running continuously during this period (not restarted until troubleshooting)
Suggested Fix
- After an error, calculate next run as max(now, lastRunAtMs) + everyMs rather than jumping to a much later time
- Consider a maxSkip or catchUp option for interval schedules
- Add logging when a job's next run is calculated to be significantly later than expected
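A rough sketch of the first and third suggestions, for illustration only. The IntervalJobState type, the function names, and the toleranceIntervals threshold are assumptions made for this example; only everyMs, lastRunAtMs, and nextRunAtMs come from the report, and none of this reflects OpenClaw's real internals.

```typescript
// Hypothetical shape of an interval job's state; field names mirror the
// issue text, not OpenClaw's actual types.
interface IntervalJobState {
  everyMs: number;
  lastRunAtMs?: number;
}

// Suggested rule: next run = max(now, lastRunAtMs) + everyMs.
// If the job has never run, fall back to now as the base.
function nextRunAfterError(job: IntervalJobState, nowMs: number): number {
  const base = Math.max(nowMs, job.lastRunAtMs ?? nowMs);
  return base + job.everyMs;
}

// Returns a warning message when the computed next run is more than
// toleranceIntervals intervals away from now, otherwise null.
// The default of 2 intervals is an arbitrary choice for this sketch.
function describeLateNextRun(
  job: IntervalJobState,
  nextRunAtMs: number,
  nowMs: number,
  toleranceIntervals: number = 2,
): string | null {
  const delayMs = nextRunAtMs - nowMs;
  const allowedMs = job.everyMs * toleranceIntervals;
  if (delayMs <= allowedMs) {
    return null;
  }
  return `next run is ${delayMs} ms away, expected at most ${allowedMs} ms`;
}
```

A caller could pass the result of describeLateNextRun to whatever logger the gateway uses, so that a jump like the ~23-hour one described here shows up in the logs at the moment it is scheduled.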