Rapid cron create/delete cycles can freeze the scheduler timer #18121

@100menotu001

Problem

When many cron jobs are deleted and recreated in quick succession (e.g., running
ensure-crons for a workflow with 6+ agents), the gateway's internal scheduler
timer can enter a bad state in which it stops executing ALL crons, not only the
ones that were just recreated.

After this happens, every cron shows a nextRunAtMs in the past, but none of them
fire. The only recovery found so far is restarting the gateway process.
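
The frozen state described above can be recognized from job data alone, which may help a watchdog decide when a restart is needed. This is a minimal sketch, assuming each job is available as a dict carrying the nextRunAtMs field mentioned in this report. The `looks_frozen` helper, the dict shape, and the 60-second grace period are hypothetical and not part of the gateway.

```python
import time


def looks_frozen(jobs, now_ms=None, grace_ms=60_000):
    """Return True if every job is overdue by more than grace_ms.

    jobs: iterable of dicts with a 'nextRunAtMs' key (assumed shape).
    A single overdue job is normal just before it fires; all jobs being
    overdue well past the grace period matches the symptom in this report.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    jobs = list(jobs)
    if not jobs:
        # No jobs means nothing can be overdue.
        return False
    return all(now_ms - job["nextRunAtMs"] > grace_ms for job in jobs)
```

The grace period should be longer than the scheduler's normal firing latency, otherwise a healthy gateway could be flagged right before a tick.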

Steps to Reproduce

  1. Create 6+ cron jobs via the API
  2. Delete all of them rapidly (sequential API calls, no delay)
  3. Immediately recreate them
  4. Observe that no crons fire — all show overdue nextRunAtMs

Expected Behavior

The scheduler should handle rapid create/delete operations gracefully,
or at minimum recover on the next tick.
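
One way to get the "recover on the next tick" behavior is to rebuild the timer from the job table on every change instead of updating timer state incrementally. The sketch below illustrates that idea only. The gateway's actual scheduler code is not shown in this report, and `RearmingTimer` and all of its methods are hypothetical names.

```python
import threading


class RearmingTimer:
    """Hypothetical sketch: the timer is always rebuilt from the job table."""

    def __init__(self, on_tick):
        self._on_tick = on_tick      # callback invoked when the timer fires
        self._jobs = {}              # job id -> nextRunAtMs
        self._timer = None
        self._lock = threading.Lock()

    @staticmethod
    def delay_ms(jobs, now_ms):
        """Delay until the earliest job; 0 if overdue; None if no jobs."""
        if not jobs:
            return None
        return max(0, min(jobs.values()) - now_ms)

    def upsert(self, job_id, next_run_at_ms, now_ms):
        with self._lock:
            self._jobs[job_id] = next_run_at_ms
            self._rearm(now_ms)

    def remove(self, job_id, now_ms):
        with self._lock:
            self._jobs.pop(job_id, None)
            self._rearm(now_ms)

    def _rearm(self, now_ms):
        # Cancel whatever was scheduled and derive the next wake-up from
        # the current table, so a burst of deletes and creates cannot
        # leave a stale or missing timer behind.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        delay = self.delay_ms(self._jobs, now_ms)
        if delay is not None:
            self._timer = threading.Timer(delay / 1000, self._on_tick)
            self._timer.daemon = True
            self._timer.start()
```

Because an overdue job clamps the delay to 0, a table full of past nextRunAtMs values leads to an immediate tick rather than a timer that never fires. A real `on_tick` would run the due jobs, update their next run times, and re-arm.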

Workaround

  • Add small delays (50ms) between sequential cron operations
  • Add a 200ms settling delay between bulk delete and bulk create
  • If frozen, restart the gateway: launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway
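
The two delay workarounds above can be combined into a single helper. This is a sketch under stated assumptions: `delete_job` and `create_job` are caller-supplied callables standing in for whatever API client is in use (the gateway's cron API is not specified in this report), and `replace_crons` is a hypothetical name. The 50ms and 200ms defaults are the values from the workaround list.

```python
import time


def replace_crons(job_ids, delete_job, create_job,
                  op_delay_s=0.05, settle_delay_s=0.2, sleep=time.sleep):
    """Delete and recreate jobs with pacing between operations.

    delete_job / create_job: callables taking a job id (placeholders for
    the real API calls). sleep is injectable so the pacing can be tested.
    """
    for job_id in job_ids:
        delete_job(job_id)
        sleep(op_delay_s)        # small gap between sequential operations

    sleep(settle_delay_s)        # settling delay between delete and create

    for job_id in job_ids:
        create_job(job_id)
        sleep(op_delay_s)
```

For six jobs this adds under one second in total, which is usually cheaper than a gateway restart.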

Environment

  • OpenClaw gateway v2026.2.x
  • macOS, launchd-managed gateway process

Metadata

Labels: stale (marked as stale due to inactivity)