| 1 | +--- |
| 2 | +summary: "RFC: Cron jobs + wakeups for Clawd/Clawdis (main vs isolated sessions)" |
| 3 | +read_when: |
| 4 | + - Designing scheduled jobs, alarms, or wakeups |
| 5 | + - Adding Gateway methods or CLI commands for automation |
| 6 | + - Adjusting heartbeat behavior or session routing |
| 7 | +--- |
| 8 | + |
| 9 | +# RFC: Cron jobs + wakeups for Clawd |
| 10 | + |
| 11 | +Status: Draft |
| 12 | +Last updated: 2025-12-13 |
| 13 | + |
| 14 | +## Context |
| 15 | + |
| 16 | +Clawdis already has: |
| 17 | +- A **periodic reply heartbeat** that runs the agent with `HEARTBEAT /think:high` and suppresses `HEARTBEAT_OK` (`src/web/auto-reply.ts`). |
| 18 | +- A lightweight, in-memory **system event queue** (`enqueueSystemEvent`) that is injected into the next **main session** turn (`drainSystemEvents` in `src/auto-reply/reply.ts`). |
| 19 | +- A WebSocket **Gateway** daemon that is intended to be always-on (`docs/gateway.md`). |
| 20 | + |
| 21 | +This RFC adds a small “cron job system” so Clawd can schedule future work and reliably wake itself up: |
| 22 | +- **Delayed**: run on the *next* normal heartbeat tick |
| 23 | +- **Immediate**: run *now* (trigger a heartbeat immediately) |
| 24 | +- **Isolated jobs**: optionally run in their own session that does not pollute the main session and can run concurrently (within configured limits). |
| 25 | + |
| 26 | +## Goals |
| 27 | + |
| 28 | +- Provide a **persistent job store** and an **in-process scheduler** owned by the Gateway. |
| 29 | +- Allow each job to target either: |
| 30 | + - `sessionTarget: "main"`: inject as `System:` lines and rely on the main heartbeat (or trigger it immediately). |
| 31 | + - `sessionTarget: "isolated"`: run an agent turn in a dedicated session key (job session), optionally delivering a message and/or posting a summary back to main. |
| 32 | +- Expose a stable control surface: |
| 33 | + - **Gateway methods** (`cron.*`, `wake`) for programmatic usage (mac app, CLI, agents). |
| 34 | + - **CLI commands** (`clawdis cron ...`) to add/remove/edit/list and to debug `run`. |
| 35 | +- Produce clear, structured **logs** for job lifecycle and execution outcomes. |
| 36 | + |
| 37 | +## Non-goals (v1) |
| 38 | + |
| 39 | +- Multi-host distributed scheduling. |
| 40 | +- Exactly-once semantics across crashes (we aim for “at-least-once with idempotency hooks”). |
| 41 | +- A full Unix-cron parser as the only schedule format (we can support it, but v1 should not require complex cron features to be useful). |
| 42 | + |
| 43 | +## Terminology |
| 44 | + |
| 45 | +- **Wake**: a request to ensure the agent gets a turn soon (either right now or next heartbeat). |
| 46 | +- **Main session**: the canonical session bucket (default key `"main"`) that receives `System:` events. |
| 47 | +- **Isolated session**: a per-job session key (e.g. `cron:<jobId>`) with its own session id / session file. |
| 48 | + |
| 49 | +## User stories |
| 50 | + |
| 51 | +- “Remind me in 20 minutes” → add a one-shot job that triggers an immediate heartbeat at T+20m. |
| 52 | +- “Every weekday at 7:30, wake me up and start music” → recurring job, isolated session, deliver to WhatsApp. |
| 53 | +- “Every hour, check battery; only interrupt me if < 20%” → isolated job that decides whether to deliver; may also post a brief status to main. |
| 54 | +- “Next heartbeat, please check calendar” → delayed wake targeting main session. |
| 55 | + |
| 56 | +## Job model |
| 57 | + |
| 58 | +### Storage schema (v1) |
| 59 | + |
| 60 | +Each job is a JSON object with stable keys (unknown keys ignored for forward compatibility): |
| 61 | + |
| 62 | +- `id: string` (UUID) |
| 63 | +- `name?: string` |
| 64 | +- `enabled: boolean` |
| 65 | +- `createdAtMs: number` |
| 66 | +- `updatedAtMs: number` |
| 67 | +- `schedule` (one of) |
| 68 | + - `{"kind":"at","atMs":number}` (one-shot) |
| 69 | + - `{"kind":"every","everyMs":number,"anchorMs"?:number}` (simple interval) |
| 70 | + - `{"kind":"cron","expr":string,"tz"?:string}` (optional; see “Schedule parsing”) |
| 71 | +- `sessionTarget: "main" | "isolated"` |
| 72 | +- `wakeMode: "next-heartbeat" | "now"` |
| 73 | + - For `sessionTarget:"isolated"`, `wakeMode:"now"` means “run immediately when due”. |
| 74 | + - For `sessionTarget:"main"`, `wakeMode` controls whether we trigger the heartbeat immediately or just enqueue and wait. |
| 75 | +- `payload` (one of) |
| 76 | + - `{"kind":"systemEvent","text":string}` (enqueue as `System:`) |
| 77 | + - `{"kind":"agentTurn","message":string,"deliver"?:boolean,"channel"?: "last"|"whatsapp"|"telegram","to"?:string,"timeoutSeconds"?:number}` |
| 78 | +- `isolation` (optional; only meaningful for isolated jobs) |
| 79 | + - `{"postToMain": boolean, "postToMainPrefix"?: string}` |
| 80 | +- `runtime` (optional) |
| 81 | + - `{"maxAttempts"?:number,"retryBackoffMs"?:number}` (best-effort retries; defaults off) |
| 82 | +- `state` (runtime-maintained) |
| 83 | + - `{"nextRunAtMs":number,"lastRunAtMs"?:number,"lastStatus"?: "ok"|"error"|"skipped","lastError"?:string,"lastDurationMs"?:number}` |
| 84 | + |
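The schema above could be expressed as TypeScript types roughly as follows. This is a sketch: the field names are copied from the list, but the type names (`CronSchedule`, `CronPayload`, `CronJobState`) and the `makeReminderJob` helper are assumptions made for illustration, not existing Clawdis code.

```typescript
// Discriminated unions mirror the "one of" entries in the schema list.
type CronSchedule =
  | { kind: "at"; atMs: number }
  | { kind: "every"; everyMs: number; anchorMs?: number }
  | { kind: "cron"; expr: string; tz?: string };

type CronPayload =
  | { kind: "systemEvent"; text: string }
  | {
      kind: "agentTurn";
      message: string;
      deliver?: boolean;
      channel?: "last" | "whatsapp" | "telegram";
      to?: string;
      timeoutSeconds?: number;
    };

interface CronJobState {
  nextRunAtMs: number;
  lastRunAtMs?: number;
  lastStatus?: "ok" | "error" | "skipped";
  lastError?: string;
  lastDurationMs?: number;
}

interface CronJob {
  id: string;
  name?: string;
  enabled: boolean;
  createdAtMs: number;
  updatedAtMs: number;
  schedule: CronSchedule;
  sessionTarget: "main" | "isolated";
  wakeMode: "next-heartbeat" | "now";
  payload: CronPayload;
  isolation?: { postToMain: boolean; postToMainPrefix?: string };
  runtime?: { maxAttempts?: number; retryBackoffMs?: number };
  state: CronJobState;
}

// Hypothetical helper: the "remind me in 20 minutes" story as a job record.
function makeReminderJob(id: string, nowMs: number, text: string): CronJob {
  const atMs = nowMs + 20 * 60 * 1000;
  return {
    id,
    name: "reminder",
    enabled: true,
    createdAtMs: nowMs,
    updatedAtMs: nowMs,
    schedule: { kind: "at", atMs },
    sessionTarget: "main",
    wakeMode: "now",
    payload: { kind: "systemEvent", text },
    state: { nextRunAtMs: atMs },
  };
}
```

Modeling `schedule` and `payload` as unions tagged by `kind` lets the compiler check that, for example, `atMs` is only read from an `at` schedule.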
| 85 | +### Key behavior |
| 86 | + |
| 87 | +- `sessionTarget:"main"` jobs always enqueue `payload.kind:"systemEvent"` (directly or derived from `agentTurn` results; see below). |
| 88 | +- `sessionTarget:"isolated"` jobs create/use a stable session key: `cron:<jobId>`. |
| 89 | + |
| 90 | +## Storage location |
| 91 | + |
| 92 | +The job store could live directly under `~/.clawdis` without a subfolder, but a dedicated folder leaves room for future artifacts (per-job state, migration backups, run history). |
| 93 | + |
| 94 | +Current behavior (v1): |
| 95 | +- Default store: `~/.clawdis/cron.json` |
| 96 | +- If `~/.clawdis/cron/jobs.json` exists, it is preferred (and is a good location for future per-cron artifacts). |
| 97 | +- Any path can be forced via `cron.store` in config. |
| 98 | + |
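The lookup order above can be sketched as a small pure function. The name `resolveCronStorePath` and the injected `exists` predicate are assumptions for illustration; the sketch also leaves `~` expansion to the caller.

```typescript
// Precedence: explicit `cron.store` config, then the folder layout if the
// file already exists, then the flat default.
function resolveCronStorePath(
  configuredStore: string | undefined,
  exists: (path: string) => boolean,
): string {
  if (configuredStore) return configuredStore;
  const folderStore = "~/.clawdis/cron/jobs.json";
  if (exists(folderStore)) return folderStore;
  return "~/.clawdis/cron.json";
}
```

Passing `exists` in (rather than calling the filesystem directly) keeps the precedence rules easy to unit test.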
| 99 | +The scheduler should never require additional configuration for the base directory (Clawdis already treats `~/.clawdis` as fixed). |
| 100 | + |
| 101 | +## Enabling |
| 102 | + |
| 103 | +Cron execution should be opt-in via config: |
| 104 | + |
| 105 | +```json5 |
| 106 | +{ |
| 107 | + cron: { |
| 108 | + enabled: true, |
| 109 | + // optional: |
| 110 | + store: "~/.clawdis/cron.json", |
| 111 | + maxConcurrentRuns: 1 |
| 112 | + } |
| 113 | +} |
| 114 | +``` |
| 115 | + |
| 116 | +## Scheduler design |
| 117 | + |
| 118 | +### Ownership |
| 119 | + |
| 120 | +The Gateway owns: |
| 121 | +- the scheduler timer, |
| 122 | +- job store reads/writes, |
| 123 | +- job execution (enqueue system events and/or agent turns). |
| 124 | + |
| 125 | +This keeps scheduling unified with the always-on process and prevents “two schedulers” when multiple CLIs run. |
| 126 | + |
| 127 | +### Timer strategy |
| 128 | + |
| 129 | +- Maintain an in-memory heap/array of enabled jobs keyed by `state.nextRunAtMs`. |
| 130 | +- Use a **single `setTimeout`** to wake at the earliest next run. |
| 131 | +- On wake: |
| 132 | + - compute all due jobs (now >= nextRunAtMs), |
| 133 | + - mark them “in flight” (in memory), |
| 134 | + - persist updated `state` (at least bump `nextRunAtMs` / `lastRunAtMs`) before starting execution to minimize duplicate runs on crash, |
| 135 | + - execute jobs (with concurrency limits), |
| 136 | + - persist final `lastStatus/lastError/lastDurationMs`, |
| 137 | + - re-arm timer for the next earliest run. |
| 138 | + |
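A minimal sketch of the wake step for `at` and `every` schedules, under stated assumptions: the names `computeNextRunAtMs`, `takeDueJobs`, and `msUntilNextWake` are hypothetical, `anchorMs` is assumed to define the grid that interval runs snap to, and a finished one-shot is assumed to be disabled rather than deleted. The caller is expected to persist the bumped state before executing the returned jobs.

```typescript
type SimpleSchedule =
  | { kind: "at"; atMs: number }
  | { kind: "every"; everyMs: number; anchorMs?: number };

interface TimedJob {
  id: string;
  enabled: boolean;
  schedule: SimpleSchedule;
  state: { nextRunAtMs: number; lastRunAtMs?: number };
}

// Next run strictly after `nowMs`, or undefined once a one-shot has passed.
function computeNextRunAtMs(schedule: SimpleSchedule, nowMs: number): number | undefined {
  if (schedule.kind === "at") {
    return schedule.atMs > nowMs ? schedule.atMs : undefined;
  }
  const every = Math.max(1, Math.floor(schedule.everyMs));
  const anchor = schedule.anchorMs ?? nowMs;
  if (nowMs < anchor) return anchor;
  const steps = Math.floor((nowMs - anchor) / every) + 1;
  return anchor + steps * every;
}

// Select due jobs and bump their state in memory before execution starts.
function takeDueJobs(jobs: TimedJob[], nowMs: number): TimedJob[] {
  const due = jobs.filter((job) => job.enabled && job.state.nextRunAtMs <= nowMs);
  for (const job of due) {
    job.state.lastRunAtMs = nowMs;
    const next = computeNextRunAtMs(job.schedule, nowMs);
    if (next === undefined) {
      job.enabled = false; // assumption: completed one-shots are disabled
    } else {
      job.state.nextRunAtMs = next;
    }
  }
  return due;
}

// Delay for the single re-armed timer; undefined when nothing is scheduled.
function msUntilNextWake(jobs: TimedJob[], nowMs: number): number | undefined {
  const times = jobs.filter((job) => job.enabled).map((job) => job.state.nextRunAtMs);
  if (times.length === 0) return undefined;
  return Math.max(0, Math.min(...times) - nowMs);
}
```

When re-arming, note that Node's `setTimeout` treats delays above 2147483647 ms (about 24.8 days) as 1 ms, so a real scheduler would likely cap the delay and re-check on wake.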
| 139 | +### Schedule parsing |
| 140 | + |
| 141 | +V1 can ship with `at` + `every` without extra deps. |
| 142 | + |
| 143 | +If we add `"kind":"cron"`: |
| 144 | +- Use a well-maintained parser (we use `croner`) and support: |
| 145 | + - 5-field cron (`min hour dom mon dow`) at minimum |
| 146 | + - optional `tz` |
| 147 | +- Store `nextRunAtMs` computed by the parser; re-compute after each run. |
| 148 | + |
| 149 | +## Execution semantics |
| 150 | + |
| 151 | +### Main session jobs |
| 152 | + |
| 153 | +Main session jobs do not run the agent directly by default. |
| 154 | + |
| 155 | +When due: |
| 156 | +1) `enqueueSystemEvent(job.payload.text)` (or a derived message) |
| 157 | +2) If `wakeMode:"now"`, trigger an immediate heartbeat run (see “Heartbeat wake hook”). |
| 158 | +3) Otherwise do nothing else (the next scheduled heartbeat will pick up the system event). |
| 159 | + |
| 160 | +Why: This keeps the main session’s “proactive” behavior centralized in the heartbeat rules and avoids ad-hoc agent turns that might fight with inbound message processing. |
| 161 | + |
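The three steps above can be sketched as follows. The real `enqueueSystemEvent` signature is not shown in this RFC, so the sketch injects both side effects through a hypothetical `MainJobDeps` interface; `runMainSessionJob` and `requestHeartbeatNow` are likewise illustrative names.

```typescript
interface MainJobDeps {
  enqueueSystemEvent: (text: string) => void;
  requestHeartbeatNow: (opts: { reason: string }) => void;
}

interface MainSessionJob {
  id: string;
  wakeMode: "next-heartbeat" | "now";
  payload: { kind: "systemEvent"; text: string };
}

function runMainSessionJob(job: MainSessionJob, deps: MainJobDeps): void {
  // 1) Always enqueue; the event is injected into the next main-session turn.
  deps.enqueueSystemEvent(job.payload.text);
  // 2) Only "now" jobs ask for an immediate heartbeat.
  if (job.wakeMode === "now") {
    deps.requestHeartbeatNow({ reason: `cron:${job.id}` });
  }
  // 3) Otherwise nothing else happens: the regular heartbeat drains the queue.
}
```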
| 162 | +### Isolated session jobs |
| 163 | + |
| 164 | +Isolated jobs run an agent turn in a dedicated session key, intended to be separate from main. |
| 165 | + |
| 166 | +When due: |
| 167 | +- Build a message body that includes schedule metadata, e.g.: |
| 168 | + - `"[cron:<jobId>] <job.name>: <payload.message>"` |
| 169 | +- Execute via the same agent runner path as other command-mode runs, but pinned to: |
| 170 | + - `sessionKey = cron:<jobId>` |
| 171 | + - `sessionId = store[sessionKey].sessionId` (create if missing) |
| 172 | +- Optionally deliver output (`payload.deliver === true`) to the configured channel/to. |
| 173 | +- If `isolation.postToMain` is true, enqueue a summary system event to main, e.g.: |
| 174 | + - `System: Cron "<name>" completed: <1-line summary>` |
| 175 | + |
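The string formats above are simple enough to sketch directly. The function names are hypothetical, and two details are assumptions: the name segment is dropped when `name` is absent, and the "1-line summary" is taken to be the first line of the agent output.

```typescript
interface IsolatedJobInfo {
  id: string;
  name?: string;
  message: string;
}

// Stable per-job session key, as described under "Key behavior".
function isolatedSessionKey(jobId: string): string {
  return `cron:${jobId}`;
}

// "[cron:<jobId>] <job.name>: <payload.message>"
function buildIsolatedMessage(job: IsolatedJobInfo): string {
  const label = job.name ? `${job.name}: ` : "";
  return `[cron:${job.id}] ${label}${job.message}`;
}

// Text for the summary system event posted back to main.
function buildMainSummary(name: string, agentOutput: string): string {
  const firstLine = (agentOutput.trim().split("\n")[0] ?? "").trim();
  return `Cron "${name}" completed: ${firstLine}`;
}
```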
| 176 | +### “Run in parallel to main” |
| 177 | + |
| 178 | +Clawdis currently serializes command execution through a global in-process queue (`src/process/command-queue.ts`) to avoid collisions. |
| 179 | + |
| 180 | +To support isolated cron jobs running “in parallel”, we should introduce **lanes** (keyed queues) plus a global concurrency cap: |
| 181 | +- Lane `"main"`: inbound auto-replies + main heartbeat. |
| 182 | +- Lane `"cron"` (or `cron:<jobId>`): isolated jobs. |
| 183 | +- Configurable `cron.maxConcurrentRuns` (default 1 or 2). |
| 184 | + |
| 185 | +This yields: |
| 186 | +- isolated jobs can overlap with the main lane (up to cap), |
| 187 | +- each lane still preserves ordering for its own work (optional), |
| 188 | +- we retain safety knobs to prevent runaway resource contention. |
| 189 | + |
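One way the lane idea could look, as a sketch rather than a description of `src/process/command-queue.ts`: each lane runs at most one task at a time (preserving per-lane order), and a global cap bounds how many lanes run at once. The class name `LaneQueue` and the choice to count the main lane against the same cap are assumptions.

```typescript
type LaneTask<T> = () => Promise<T>;

interface WaitingEntry {
  lane: string;
  start: () => void;
}

class LaneQueue {
  private readonly busyLanes = new Set<string>();
  private readonly waiting: WaitingEntry[] = [];
  private readonly maxConcurrent: number;
  private active = 0;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = Math.max(1, maxConcurrent);
  }

  get activeCount(): number {
    return this.active;
  }

  enqueue<T>(lane: string, task: LaneTask<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.busyLanes.add(lane);
        this.active += 1;
        const release = (): void => {
          this.busyLanes.delete(lane);
          this.active -= 1;
          this.pump();
        };
        // Wrapping in `new Promise` turns a synchronous throw into a rejection.
        new Promise<T>((settle) => settle(task())).then(
          (value) => { release(); resolve(value); },
          (error) => { release(); reject(error); },
        );
      };
      this.waiting.push({ lane, start });
      this.pump();
    });
  }

  // Start the oldest waiting task whose lane is idle, while under the cap.
  private pump(): void {
    let i = 0;
    while (i < this.waiting.length && this.active < this.maxConcurrent) {
      const entry = this.waiting[i];
      if (entry === undefined) break;
      if (this.busyLanes.has(entry.lane)) {
        i += 1;
        continue;
      }
      this.waiting.splice(i, 1);
      entry.start();
    }
  }
}
```

Using `cron:<jobId>` as the lane key would keep runs of the same job from overlapping, while a single `"cron"` lane would serialize all isolated jobs against each other.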
| 190 | +## Heartbeat wake hook (immediate vs next heartbeat) |
| 191 | + |
| 192 | +We need a way for the Gateway (or the scheduler) to request an immediate heartbeat without duplicating heartbeat logic. |
| 193 | + |
| 194 | +Design: |
| 195 | +- `monitorWebProvider` owns the real `runReplyHeartbeat()` function (it already has all the local state needed). |
| 196 | +- Add a small global hook module: |
| 197 | + - `setReplyHeartbeatWakeHandler(fn | null)` installed by `monitorWebProvider` |
| 198 | + - `requestReplyHeartbeatNow({ reason, coalesceMs? })` |
| 199 | +- If the handler is absent (provider not connected), the request is stored as “pending”; the next time the handler is installed, it runs once. |
| 200 | +- Coalesce rapid calls and respect the existing “skip when queue busy” behavior (prefer retrying soon vs dropping). |
| 201 | + |
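A minimal sketch of the hook module, using the two function names from the design above. The handler signature (`(reason: string) => void`), the 250 ms default for `coalesceMs`, and the module-level variable names are assumptions; the "skip when queue busy, retry soon" behavior is not modeled here.

```typescript
type ReplyHeartbeatWakeHandler = (reason: string) => void;

let wakeHandler: ReplyHeartbeatWakeHandler | null = null;
let pendingWakeReason: string | null = null;
let wakeTimer: ReturnType<typeof setTimeout> | null = null;

// Installed by the provider monitor; flushes one pending wake, if any.
function setReplyHeartbeatWakeHandler(fn: ReplyHeartbeatWakeHandler | null): void {
  wakeHandler = fn;
  if (fn && pendingWakeReason !== null) {
    const reason = pendingWakeReason;
    pendingWakeReason = null;
    fn(reason);
  }
}

function requestReplyHeartbeatNow(opts: { reason: string; coalesceMs?: number }): void {
  if (!wakeHandler) {
    // Provider not connected: remember a single pending wake (latest reason wins).
    pendingWakeReason = opts.reason;
    return;
  }
  if (wakeTimer !== null) return; // a wake is already scheduled; coalesce
  wakeTimer = setTimeout(() => {
    wakeTimer = null;
    const fn = wakeHandler;
    if (fn) {
      fn(opts.reason);
    } else {
      pendingWakeReason = opts.reason; // handler went away while waiting
    }
  }, opts.coalesceMs ?? 250);
}
```

Because only one pending reason is stored, any number of requests made while the provider is disconnected collapse into a single heartbeat once the handler is installed.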
| 202 | +## Run history log (JSONL) |
| 203 | + |
| 204 | +In addition to normal structured logs, the Gateway writes an append-only run history “ledger” (JSONL) whenever a job finishes. This is intended for quick debugging (“did the job run, when, and what happened?”). |
| 205 | + |
| 206 | +Path rules: |
| 207 | +- If the cron store path basename is `jobs.json` (e.g. `~/.clawdis/cron/jobs.json`), logs go to `.../runs/<jobId>.jsonl` (e.g. `~/.clawdis/cron/runs/<jobId>.jsonl`). |
| 208 | +- Otherwise logs go to `<storeBase>.runs.jsonl` in the same directory (e.g. `~/.clawdis/cron.json` → `~/.clawdis/cron.runs.jsonl`). |
| 209 | + |
| 210 | +Retention: |
| 211 | +- Best-effort pruning when the file grows beyond ~2MB; keep the newest ~2000 lines. |
| 212 | + |
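The path rules and pruning policy might be sketched like this. Both function names are hypothetical, POSIX-style `/` separators are assumed, and the size check uses string length as an approximation of bytes.

```typescript
// Map a store path to its run-history file, following the two path rules.
function resolveRunLogPath(storePath: string, jobId: string): string {
  const slash = storePath.lastIndexOf("/");
  const dir = slash >= 0 ? storePath.slice(0, slash) : ".";
  const base = slash >= 0 ? storePath.slice(slash + 1) : storePath;
  if (base === "jobs.json") return `${dir}/runs/${jobId}.jsonl`;
  const dot = base.lastIndexOf(".");
  const stem = dot > 0 ? base.slice(0, dot) : base;
  return `${dir}/${stem}.runs.jsonl`;
}

// Best-effort pruning: once the log exceeds the size limit, keep the newest lines.
function pruneRunLog(content: string, maxSize = 2000000, keepLines = 2000): string {
  if (content.length <= maxSize) return content;
  const lines = content.split("\n").filter((line) => line.length > 0);
  return lines.slice(-keepLines).join("\n") + "\n";
}
```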
| 213 | +## Gateway API |
| 214 | + |
| 215 | +New methods (names can be bikeshed; `cron.*` is suggested): |
| 216 | + |
| 217 | +- `wake` |
| 218 | + - params: `{ mode: "now" | "next-heartbeat", text: string }` |
| 219 | + - effect: `enqueueSystemEvent(text)`, plus optional immediate heartbeat trigger |
| 220 | + |
| 221 | +- `cron.list` |
| 222 | + - params: optional `{ includeDisabled?: boolean }` |
| 223 | + - returns: `{ jobs: CronJob[] }` |
| 224 | + |
| 225 | +- `cron.add` |
| 226 | + - params: job payload without `id/state` (server generates and returns created job) |
| 227 | + |
| 228 | +- `cron.update` |
| 229 | + - params: `{ id: string, patch: Partial<CronJobWritableFields> }` |
| 230 | + |
| 231 | +- `cron.remove` |
| 232 | + - params: `{ id: string }` |
| 233 | + |
| 234 | +- `cron.run` |
| 235 | + - params: `{ id: string, mode?: "due" | "force" }` (debugging; does not change schedule unless `force` requires it) |
| 236 | + |
| 237 | +- `cron.runs` |
| 238 | + - params: `{ id?: string, limit?: number }` |
| 239 | + - returns: `{ entries: CronRunLogEntry[] }` |
| 240 | + - note: if the store layout is `.../jobs.json`, `id` is required (runs are stored per-job). |
| 241 | + |
| 242 | +The Gateway should broadcast a `cron` event for UI/debug: |
| 243 | +- event: `cron` |
| 244 | + - payload: `{ jobId, action: "added"|"updated"|"removed"|"started"|"finished", status?, error?, nextRunAtMs? }` |
| 245 | + |
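As an illustration of how `cron.update` could treat its `patch`, the sketch below copies only an allowlist of fields and bumps `updatedAtMs`. `CronJobWritableFields` is not defined in this RFC, so the allowlist here (`name`, `enabled`) and the names `JobRecord`, `JobPatch`, and `applyCronPatch` are assumptions; a real implementation would also accept schedule and payload changes and recompute `state.nextRunAtMs`.

```typescript
interface JobRecord {
  id: string;
  name?: string;
  enabled: boolean;
  createdAtMs: number;
  updatedAtMs: number;
  state: { nextRunAtMs: number };
}

type JobPatch = Partial<Pick<JobRecord, "name" | "enabled">>;

// Returns an updated copy; `id`, `createdAtMs`, and `state` stay server-owned
// even if the incoming patch object carries extra keys at runtime.
function applyCronPatch(job: JobRecord, patch: JobPatch, nowMs: number): JobRecord {
  const next: JobRecord = { ...job, updatedAtMs: nowMs };
  if (patch.name !== undefined) next.name = patch.name;
  if (patch.enabled !== undefined) next.enabled = patch.enabled;
  return next;
}
```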
| 246 | +## CLI surface |
| 247 | + |
| 248 | +Add a `cron` command group (all commands should also support `--json` where sensible): |
| 249 | + |
| 250 | +- `clawdis cron list [--json] [--all]` |
| 251 | +- `clawdis cron add ...` |
| 252 | + - schedule flags: |
| 253 | + - `--at <iso8601|ms|relative>` (one-shot) |
| 254 | + - `--every <duration>` (e.g. `10m`, `1h`) |
| 255 | + - `--cron "<expr>" [--tz "<tz>"]` |
| 256 | + - target flags: |
| 257 | + - `--session main|isolated` |
| 258 | + - `--wake now|next` |
| 259 | + - payload flags (choose one): |
| 260 | + - `--system-event "<text>"` |
| 261 | + - `--message "<agent message>" [--deliver] [--channel last|whatsapp|telegram] [--to <dest>]` |
| 262 | + |
| 263 | +- `clawdis cron edit <id> ...` (patch-by-flags, non-interactive) |
| 264 | +- `clawdis cron rm <id>` |
| 265 | +- `clawdis cron enable <id>` / `clawdis cron disable <id>` |
| 266 | +- `clawdis cron run <id> [--force]` (debug) |
| 267 | + |
| 268 | +Additionally: |
| 269 | +- `clawdis wake --mode now|next --text "<text>"` as a thin wrapper around `wake` for agents to call. |
| 270 | + |
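Two of the flags need a small translation step before they reach the job schema: `--every` takes a human duration, and `--wake next` corresponds to the stored value `"next-heartbeat"`. A sketch, assuming the helper names `parseDurationMs` and `toWakeMode` and the unit set `ms`, `s`, `m`, `h`, `d` (the RFC only shows `10m` and `1h`):

```typescript
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// "10m" -> 600000, "1h" -> 3600000; throws on anything it cannot parse.
function parseDurationMs(input: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)$/.exec(input.trim());
  if (!match) throw new Error(`invalid duration: ${input}`);
  const value = Number(match[1]);
  const factor = UNIT_TO_MS[match[2] ?? ""];
  if (factor === undefined || !Number.isFinite(value)) {
    throw new Error(`invalid duration: ${input}`);
  }
  return Math.round(value * factor);
}

// CLI flag value -> schema value for `wakeMode`.
function toWakeMode(flag: string): "now" | "next-heartbeat" {
  if (flag === "now") return "now";
  if (flag === "next" || flag === "next-heartbeat") return "next-heartbeat";
  throw new Error(`invalid --wake value: ${flag}`);
}
```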
| 271 | +## Examples |
| 272 | + |
| 273 | +### Run once at a specific time |
| 274 | + |
| 275 | +One-shot reminder that targets the main session and triggers a heartbeat immediately at the scheduled time: |
| 276 | + |
| 277 | +```bash |
| 278 | +clawdis cron add \ |
| 279 | + --at "2025-12-14T07:00:00-08:00" \ |
| 280 | + --session main \ |
| 281 | + --wake now \ |
| 282 | + --system-event "Alarm: wake up (meeting in 30 minutes)." |
| 283 | +``` |
| 284 | + |
| 285 | +### Run daily (calendar-accurate) |
| 286 | + |
| 287 | +Daily at 07:00 in a specific timezone (preferred over “every 24h” to avoid DST drift): |
| 288 | + |
| 289 | +```bash |
| 290 | +clawdis cron add \ |
| 291 | + --cron "0 7 * * *" \ |
| 292 | + --tz "America/Los_Angeles" \ |
| 293 | + --session isolated \ |
| 294 | + --wake now \ |
| 295 | + --message "Daily check: scan calendar + inbox; deliver only if urgent." \ |
| 296 | + --deliver \ |
| 297 | + --channel last |
| 298 | +``` |
| 299 | + |
| 300 | +### Run weekly (every Wednesday) |
| 301 | + |
| 302 | +Every Wednesday at 09:00: |
| 303 | + |
| 304 | +```bash |
| 305 | +clawdis cron add \ |
| 306 | + --cron "0 9 * * 3" \ |
| 307 | + --tz "America/Los_Angeles" \ |
| 308 | + --session isolated \ |
| 309 | + --wake now \ |
| 310 | + --message "Weekly: summarize status and remind me of goals." \ |
| 311 | + --deliver \ |
| 312 | + --channel last |
| 313 | +``` |
| 314 | + |
| 315 | +### “Next heartbeat” |
| 316 | + |
| 317 | +Enqueue a note for the main session but let the existing heartbeat cadence pick it up: |
| 318 | + |
| 319 | +```bash |
| 320 | +clawdis wake --mode next --text "Next heartbeat: check battery + upcoming meetings." |
| 321 | +``` |
| 322 | + |
| 323 | +## Logging & observability |
| 324 | + |
| 325 | +Logging requirements: |
| 326 | +- Use `getChildLogger({ module: "cron", jobId, runId, name })` for every run. |
| 327 | +- Log lifecycle: |
| 328 | + - store load/save (debug; include job count) |
| 329 | + - schedule recompute (debug; include nextRunAt) |
| 330 | + - job start/end (info) |
| 331 | + - job skipped (info; include reason) |
| 332 | + - job error (warn; include error + stack where available) |
| 333 | +- Emit a concise user-facing line to stdout when running in CLI mode (similar to heartbeat logs). |
| 334 | + |
| 335 | +Suggested log events: |
| 336 | +- `cron: scheduler started` (jobCount, nextWakeAt) |
| 337 | +- `cron: job started` (jobId, scheduleKind, sessionTarget, wakeMode) |
| 338 | +- `cron: job finished` (status, durationMs, nextRunAtMs) |
| 339 | + |
| 340 | +## Safety & security |
| 341 | + |
| 342 | +- Respect existing allowlists/routing rules: delivery defaults should not send to arbitrary destinations unless explicitly configured. |
| 343 | +- Provide a global “kill switch”: |
| 344 | + - `cron.enabled: boolean` in config. The default is still open (`true`, or `false` until explicitly enabled); the opt-in approach proposed under “Enabling” implies `false`. |
| 345 | + - The Gateway method `set-heartbeats` already exists; cron should have an equivalent. |
| 346 | +- Avoid persisting sensitive payloads unless requested; job text may contain private content. |
| 347 | + |
| 348 | +## Testing plan (v1) |
| 349 | + |
| 350 | +- Unit tests: |
| 351 | + - schedule computation for `at` and `every` |
| 352 | + - job store read/write + migration behavior |
| 353 | + - lane concurrency: main vs cron overlap is bounded |
| 354 | + - “wake now” coalescing and pending behavior when provider not ready |
| 355 | +- Integration tests: |
| 356 | + - start Gateway with `CLAWDIS_SKIP_PROVIDERS=1`, add jobs, list/edit/remove |
| 357 | + - simulate due jobs and assert `enqueueSystemEvent` called + cron events broadcast |
| 358 | + |
| 359 | +## Rollout plan |
| 360 | + |
| 361 | +1) Add the `wake` primitive + heartbeat wake hook (no persistent jobs yet). |
| 362 | +2) Add `cron.*` API and CLI wrappers with `at` + `every`. |
| 363 | +3) Add optional cron expression parsing (`kind:"cron"`) if needed. |
| 364 | +4) Add UI surfacing in WebChat/macOS app (optional). |