Commit f9409cb

Cron: add scheduler, wakeups, and run history
1 parent 572d17f commit f9409cb

26 files changed

Lines changed: 3393 additions & 334 deletions

docs/configuration.md

Lines changed: 24 additions & 0 deletions
@@ -112,6 +112,30 @@ Array of E.164 phone numbers allowed to trigger the AI. Use `["*"]` to allow eve
112112

113113
> Quick start: If you omit `inbound.reply`, CLAWDIS falls back to the bundled `@mariozechner/pi-coding-agent` with `--mode rpc`, per-sender sessions, and a 200k-token window. No extra install or config needed to get a reply.
114114
115+
### `cron`
116+
117+
Cron is a Gateway-owned scheduler for wakeups and scheduled jobs. See `docs/cron.md` for the full RFC and CLI examples.
118+
119+
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `false` | Enable the cron scheduler inside the Gateway |
| `store` | string | *(auto)* | Override the cron job store path (defaults to `~/.clawdis/cron/jobs.json` if present, otherwise `~/.clawdis/cron.json`) |
| `maxConcurrentRuns` | number | `1` | Max concurrent isolated cron runs (command-queue lane `"cron"`) |
124+
125+
Run history:
126+
- The Gateway appends a JSONL run ledger on each job completion (see `docs/cron.md`). Location is derived from `cron.store` / the resolved store path.
127+
128+
Example:
129+
130+
```json5
{
  cron: {
    enabled: true,
    maxConcurrentRuns: 2
  }
}
```
138+
115139
### Template Variables
116140

117141
Use these in your command:

docs/cron.md

Lines changed: 364 additions & 0 deletions
@@ -0,0 +1,364 @@
1+
---
2+
summary: "RFC: Cron jobs + wakeups for Clawd/Clawdis (main vs isolated sessions)"
3+
read_when:
4+
- Designing scheduled jobs, alarms, or wakeups
5+
- Adding Gateway methods or CLI commands for automation
6+
- Adjusting heartbeat behavior or session routing
7+
---
8+
9+
# RFC: Cron jobs + wakeups for Clawd
10+
11+
Status: Draft
12+
Last updated: 2025-12-13
13+
14+
## Context
15+
16+
Clawdis already has:
17+
- A **periodic reply heartbeat** that runs the agent with `HEARTBEAT /think:high` and suppresses `HEARTBEAT_OK` (`src/web/auto-reply.ts`).
18+
- A lightweight, in-memory **system event queue** (`enqueueSystemEvent`) that is injected into the next **main session** turn (`drainSystemEvents` in `src/auto-reply/reply.ts`).
19+
- A WebSocket **Gateway** daemon that is intended to be always-on (`docs/gateway.md`).
20+
21+
This RFC adds a small “cron job system” so Clawd can schedule future work and reliably wake itself up:
22+
- **Delayed**: run on the *next* normal heartbeat tick
23+
- **Immediate**: run *now* (trigger a heartbeat immediately)
24+
- **Isolated jobs**: optionally run in their own session that does not pollute the main session and can run concurrently (within configured limits).
25+
26+
## Goals
27+
28+
- Provide a **persistent job store** and an **in-process scheduler** owned by the Gateway.
- Allow each job to target either:
  - `sessionTarget: "main"`: inject as `System:` lines and rely on the main heartbeat (or trigger it immediately).
  - `sessionTarget: "isolated"`: run an agent turn in a dedicated session key (job session), optionally delivering a message and/or posting a summary back to main.
- Expose a stable control surface:
  - **Gateway methods** (`cron.*`, `wake`) for programmatic usage (mac app, CLI, agents).
  - **CLI commands** (`clawdis cron ...`) to add/remove/edit/list and to debug `run`.
- Produce clear, structured **logs** for job lifecycle and execution outcomes.
36+
37+
## Non-goals (v1)
38+
39+
- Multi-host distributed scheduling.
40+
- Exactly-once semantics across crashes (we aim for “at-least-once with idempotency hooks”).
41+
- A full Unix-cron parser as the only schedule format (we can support it, but v1 should not require complex cron features to be useful).
42+
43+
## Terminology
44+
45+
- **Wake**: a request to ensure the agent gets a turn soon (either right now or next heartbeat).
46+
- **Main session**: the canonical session bucket (default key `"main"`) that receives `System:` events.
47+
- **Isolated session**: a per-job session key (e.g. `cron:<jobId>`) with its own session id / session file.
48+
49+
## User stories
50+
51+
- “Remind me in 20 minutes” → add a one-shot job that triggers an immediate heartbeat at T+20m.
52+
- “Every weekday at 7:30, wake me up and start music” → recurring job, isolated session, deliver to WhatsApp.
53+
- “Every hour, check battery; only interrupt me if < 20%” → isolated job that decides whether to deliver; may also post a brief status to main.
54+
- “Next heartbeat, please check calendar” → delayed wake targeting main session.
55+
56+
## Job model
57+
58+
### Storage schema (v1)
59+
60+
Each job is a JSON object with stable keys (unknown keys ignored for forward compatibility):
61+
62+
- `id: string` (UUID)
- `name?: string`
- `enabled: boolean`
- `createdAtMs: number`
- `updatedAtMs: number`
- `schedule` (one of)
  - `{"kind":"at","atMs":number}` (one-shot)
  - `{"kind":"every","everyMs":number,"anchorMs"?:number}` (simple interval)
  - `{"kind":"cron","expr":string,"tz"?:string}` (optional; see “Schedule parsing”)
- `sessionTarget: "main" | "isolated"`
- `wakeMode: "next-heartbeat" | "now"`
  - For `sessionTarget:"isolated"`, `wakeMode:"now"` means “run immediately when due”.
  - For `sessionTarget:"main"`, `wakeMode` controls whether we trigger the heartbeat immediately or just enqueue and wait.
- `payload` (one of)
  - `{"kind":"systemEvent","text":string}` (enqueue as `System:`)
  - `{"kind":"agentTurn","message":string,"deliver"?:boolean,"channel"?: "last"|"whatsapp"|"telegram","to"?:string,"timeoutSeconds"?:number}`
- `isolation` (optional; only meaningful for isolated jobs)
  - `{"postToMain": boolean, "postToMainPrefix"?: string}`
- `runtime` (optional)
  - `{"maxAttempts"?:number,"retryBackoffMs"?:number}` (best-effort retries; defaults off)
- `state` (runtime-maintained)
  - `{"nextRunAtMs":number,"lastRunAtMs"?:number,"lastStatus"?: "ok"|"error"|"skipped","lastError"?:string,"lastDurationMs"?:number}`
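
The schema above maps naturally onto a discriminated union. The following is a minimal sketch of how it might be typed; `CronJob` is the name used in the Gateway API section, while `CronSchedule`, `CronPayload`, and `CronJobState` are hypothetical names chosen for illustration.

```typescript
// Sketch only: field names follow the storage schema above; the type names
// CronSchedule, CronPayload, and CronJobState are assumptions.
type CronSchedule =
  | { kind: "at"; atMs: number }
  | { kind: "every"; everyMs: number; anchorMs?: number }
  | { kind: "cron"; expr: string; tz?: string };

type CronPayload =
  | { kind: "systemEvent"; text: string }
  | {
      kind: "agentTurn";
      message: string;
      deliver?: boolean;
      channel?: "last" | "whatsapp" | "telegram";
      to?: string;
      timeoutSeconds?: number;
    };

interface CronJobState {
  nextRunAtMs: number;
  lastRunAtMs?: number;
  lastStatus?: "ok" | "error" | "skipped";
  lastError?: string;
  lastDurationMs?: number;
}

interface CronJob {
  id: string;
  name?: string;
  enabled: boolean;
  createdAtMs: number;
  updatedAtMs: number;
  schedule: CronSchedule;
  sessionTarget: "main" | "isolated";
  wakeMode: "next-heartbeat" | "now";
  payload: CronPayload;
  isolation?: { postToMain: boolean; postToMainPrefix?: string };
  runtime?: { maxAttempts?: number; retryBackoffMs?: number };
  state: CronJobState;
}

// Example: a one-shot reminder for the main session.
const exampleJob: CronJob = {
  id: "00000000-0000-0000-0000-000000000001",
  name: "reminder",
  enabled: true,
  createdAtMs: 0,
  updatedAtMs: 0,
  schedule: { kind: "at", atMs: 1_200_000 },
  sessionTarget: "main",
  wakeMode: "now",
  payload: { kind: "systemEvent", text: "Reminder: stretch." },
  state: { nextRunAtMs: 1_200_000 },
};
```

Switching on `schedule.kind` or `payload.kind` then lets the compiler check that each variant is handled.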
84+
85+
### Key behavior
86+
87+
- `sessionTarget:"main"` jobs always enqueue `payload.kind:"systemEvent"` (directly or derived from `agentTurn` results; see below).
88+
- `sessionTarget:"isolated"` jobs create/use a stable session key: `cron:<jobId>`.
89+
90+
## Storage location
91+
92+
We can store this directly under `~/.clawdis` without a subfolder, but a folder gives us room for future artifacts (per-job state, migration backups, run history).
93+
94+
Current behavior (v1):
95+
- Default store: `~/.clawdis/cron.json`
96+
- If `~/.clawdis/cron/jobs.json` exists, it is preferred (and is a good location for future per-cron artifacts).
97+
- Any path can be forced via `cron.store` in config.
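
The resolution order above can be sketched as a small pure function. `resolveCronStorePath` and its parameters are hypothetical names, and the `~/` expansion is an assumption about how a configured path would be handled; the existence check is injected so the rules can be exercised without touching disk.

```typescript
// Hypothetical helper illustrating the v1 store path rules.
// Assumes POSIX-style separators and that a leading "~/" is expanded.
function resolveCronStorePath(
  homeDir: string,
  configuredStore: string | undefined,
  fileExists: (path: string) => boolean,
): string {
  // 1) An explicit `cron.store` always wins.
  if (configuredStore) {
    return configuredStore.startsWith("~/")
      ? `${homeDir}/${configuredStore.slice(2)}`
      : configuredStore;
  }
  // 2) Prefer the folder layout when it already exists.
  const folderStore = `${homeDir}/.clawdis/cron/jobs.json`;
  if (fileExists(folderStore)) return folderStore;
  // 3) Otherwise fall back to the flat default.
  return `${homeDir}/.clawdis/cron.json`;
}
```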
98+
99+
The scheduler should never require additional configuration for the base directory (Clawdis already treats `~/.clawdis` as fixed).
100+
101+
## Enabling
102+
103+
Cron execution should be opt-in via config:
104+
105+
```json5
{
  cron: {
    enabled: true,
    // optional:
    store: "~/.clawdis/cron.json",
    maxConcurrentRuns: 1
  }
}
```
115+
116+
## Scheduler design
117+
118+
### Ownership
119+
120+
The Gateway owns:
121+
- the scheduler timer,
122+
- job store reads/writes,
123+
- job execution (enqueue system events and/or agent turns).
124+
125+
This keeps scheduling unified with the always-on process and prevents “two schedulers” when multiple CLIs run.
126+
127+
### Timer strategy
128+
129+
- Maintain an in-memory heap/array of enabled jobs keyed by `state.nextRunAtMs`.
- Use a **single `setTimeout`** to wake at the earliest next run.
- On wake:
  - compute all due jobs (now >= nextRunAtMs),
  - mark them “in flight” (in memory),
  - persist updated `state` (at least bump `nextRunAtMs` / `lastRunAtMs`) before starting execution to minimize duplicate runs on crash,
  - execute jobs (with concurrency limits),
  - persist final `lastStatus/lastError/lastDurationMs`,
  - re-arm timer for the next earliest run.
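
The wake steps above can be sketched with two pure helpers: one that claims due jobs and bumps their state before execution, and one that computes the delay for the single re-armed `setTimeout`. All names here (`ScheduledJob`, `claimDueJobs`, `nextTimerDelayMs`) are hypothetical, and disabling a one-shot job once it has been claimed is an assumption rather than stated behavior.

```typescript
// Simplified job shape for illustration; the real schema has more fields.
interface ScheduledJob {
  id: string;
  enabled: boolean;
  everyMs?: number; // set for interval jobs, absent for one-shots
  state: { nextRunAtMs: number; lastRunAtMs?: number };
}

// Collect due jobs and bump their state *before* running them, so a crash
// mid-run is less likely to cause a duplicate run after restart.
function claimDueJobs(jobs: ScheduledJob[], nowMs: number): ScheduledJob[] {
  const due = jobs.filter((job) => job.enabled && nowMs >= job.state.nextRunAtMs);
  for (const job of due) {
    job.state.lastRunAtMs = nowMs;
    if (job.everyMs !== undefined) {
      job.state.nextRunAtMs = nowMs + job.everyMs;
    } else {
      job.enabled = false; // assumption: a one-shot does not fire again
    }
  }
  return due;
}

// Delay for the single timer: time until the earliest enabled job,
// or null when nothing is scheduled.
function nextTimerDelayMs(jobs: ScheduledJob[], nowMs: number): number | null {
  const times = jobs.filter((job) => job.enabled).map((job) => job.state.nextRunAtMs);
  if (times.length === 0) return null;
  return Math.max(0, Math.min(...times) - nowMs);
}
```

A caller would persist the store between `claimDueJobs` and execution, then pass the result of `nextTimerDelayMs` to `setTimeout`.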
138+
139+
### Schedule parsing
140+
141+
V1 can ship with `at` + `every` without extra deps.
142+
143+
If we add `"kind":"cron"`:
144+
- Use a well-maintained parser (we use `croner`) and support:
  - 5-field cron (`min hour dom mon dow`) at minimum
  - optional `tz`
- Store `nextRunAtMs` computed by the parser; re-compute after each run.
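
For the dependency-free `at` and `every` kinds, the next run time can be computed with plain arithmetic. This is a sketch under assumptions: `computeNextRunAtMs` is a hypothetical name, and it assumes interval ticks align to `anchorMs + k * everyMs` (with the current time as the anchor when none is stored).

```typescript
type SimpleSchedule =
  | { kind: "at"; atMs: number }
  | { kind: "every"; everyMs: number; anchorMs?: number };

// Returns the next run strictly after `nowMs`, or null when a one-shot
// schedule is already in the past.
function computeNextRunAtMs(schedule: SimpleSchedule, nowMs: number): number | null {
  if (schedule.kind === "at") {
    return schedule.atMs > nowMs ? schedule.atMs : null;
  }
  const everyMs = Math.max(1, Math.floor(schedule.everyMs)); // guard against 0 or fractions
  const anchorMs = schedule.anchorMs ?? nowMs;
  if (nowMs < anchorMs) return anchorMs;
  // Number of whole intervals elapsed since the anchor, plus one for the next tick.
  const steps = Math.floor((nowMs - anchorMs) / everyMs) + 1;
  return anchorMs + steps * everyMs;
}
```

Aligning to an anchor keeps an interval job from drifting when individual runs start late.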
148+
149+
## Execution semantics
150+
151+
### Main session jobs
152+
153+
Main session jobs do not run the agent directly by default.
154+
155+
When due:
156+
1) `enqueueSystemEvent(job.payload.text)` (or a derived message)
157+
2) If `wakeMode:"now"`, trigger an immediate heartbeat run (see “Heartbeat wake hook”).
158+
3) Otherwise do nothing else (the next scheduled heartbeat will pick up the system event).
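
A minimal sketch of these three steps, with the two side effects injected so the flow can be tested in isolation. `enqueueSystemEvent` and `requestReplyHeartbeatNow` are named in this RFC, but the signatures used here, and the `MainJob`, `MainJobDeps`, and `runMainSessionJob` names, are assumptions.

```typescript
interface MainJob {
  wakeMode: "next-heartbeat" | "now";
  payload: { kind: "systemEvent"; text: string };
}

// Injected dependencies; signatures are assumed for illustration.
interface MainJobDeps {
  enqueueSystemEvent: (text: string) => void;
  requestReplyHeartbeatNow: (opts: { reason: string }) => void;
}

function runMainSessionJob(job: MainJob, deps: MainJobDeps): void {
  // 1) Always enqueue the system event for the main session.
  deps.enqueueSystemEvent(job.payload.text);
  // 2) Only "now" jobs ask for an immediate heartbeat.
  if (job.wakeMode === "now") {
    deps.requestReplyHeartbeatNow({ reason: "cron" });
  }
  // 3) Otherwise the next scheduled heartbeat drains the event.
}
```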
159+
160+
Why: This keeps the main session’s “proactive” behavior centralized in the heartbeat rules and avoids ad-hoc agent turns that might fight with inbound message processing.
161+
162+
### Isolated session jobs
163+
164+
Isolated jobs run an agent turn in a dedicated session key, intended to be separate from main.
165+
166+
When due:
167+
- Build a message body that includes schedule metadata, e.g.:
  - `"[cron:<jobId>] <job.name>: <payload.message>"`
- Execute via the same agent runner path as other command-mode runs, but pinned to:
  - `sessionKey = cron:<jobId>`
  - `sessionId = store[sessionKey].sessionId` (create if missing)
- Optionally deliver output (`payload.deliver === true`) to the configured channel/to.
- If `isolation.postToMain` is true, enqueue a summary system event to main, e.g.:
  - `System: Cron "<name>" completed: <1-line summary>`
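
The string formats above might be produced by helpers like the following. The function names are hypothetical; the sketch assumes the `System:` prefix is added when the event is injected into the main session, and that `postToMainPrefix` defaults to `Cron`.

```typescript
// Stable per-job session key.
function isolatedSessionKey(jobId: string): string {
  return `cron:${jobId}`;
}

// Message body handed to the agent runner for an isolated job.
function buildIsolatedMessage(jobId: string, name: string | undefined, message: string): string {
  const label = name ? `${name}: ` : "";
  return `[cron:${jobId}] ${label}${message}`;
}

// One-line summary posted back to main when isolation.postToMain is true.
// Only the first line of the agent output is kept.
function buildMainSummary(name: string, output: string, prefix = "Cron"): string {
  const firstLine = (output.split("\n")[0] ?? "").trim();
  return `${prefix} "${name}" completed: ${firstLine}`;
}
```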
175+
176+
### “Run in parallel to main”
177+
178+
Clawdis currently serializes command execution through a global in-process queue (`src/process/command-queue.ts`) to avoid collisions.
179+
180+
To support isolated cron jobs running “in parallel”, we should introduce **lanes** (keyed queues) plus a global concurrency cap:
181+
- Lane `"main"`: inbound auto-replies + main heartbeat.
182+
- Lane `"cron"` (or `cron:<jobId>`): isolated jobs.
183+
- Configurable `cron.maxConcurrentRuns` (default `1`).
184+
185+
This yields:
186+
- isolated jobs can overlap with the main lane (up to cap),
187+
- each lane still preserves ordering for its own work (optional),
188+
- we retain safety knobs to prevent runaway resource contention.
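
A minimal sketch of the lane idea, assuming a per-lane cap (the `"cron"` cap would mirror `cron.maxConcurrentRuns`) rather than the combined global cap described above. `LaneQueue` and its methods are hypothetical and do not reflect the current `src/process/command-queue.ts` API.

```typescript
type Task = () => Promise<void>;

class LaneQueue {
  private readonly caps: Record<string, number>;
  private readonly queues = new Map<string, Task[]>();
  private readonly active = new Map<string, number>();

  constructor(caps: Record<string, number>) {
    this.caps = caps;
  }

  enqueue(lane: string, task: Task): void {
    const queue = this.queues.get(lane) ?? [];
    queue.push(task);
    this.queues.set(lane, queue);
    this.pump(lane);
  }

  activeCount(lane: string): number {
    return this.active.get(lane) ?? 0;
  }

  pendingCount(lane: string): number {
    return this.queues.get(lane)?.length ?? 0;
  }

  // Start queued tasks in FIFO order until the lane's cap is reached.
  private pump(lane: string): void {
    const cap = this.caps[lane] ?? 1; // unknown lanes stay fully serialized
    const queue = this.queues.get(lane) ?? [];
    while (this.activeCount(lane) < cap) {
      const task = queue.shift();
      if (!task) break;
      this.active.set(lane, this.activeCount(lane) + 1);
      const done = (): void => {
        this.active.set(lane, this.activeCount(lane) - 1);
        this.pump(lane);
      };
      // Success or failure both free the slot.
      task().then(done, done);
    }
  }
}
```

With `{ main: 1, cron: 2 }`, the main lane stays strictly ordered while up to two isolated jobs overlap with it.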
189+
190+
## Heartbeat wake hook (immediate vs next heartbeat)
191+
192+
We need a way for the Gateway (or the scheduler) to request an immediate heartbeat without duplicating heartbeat logic.
193+
194+
Design:
195+
- `monitorWebProvider` owns the real `runReplyHeartbeat()` function (it already has all the local state needed).
- Add a small global hook module:
  - `setReplyHeartbeatWakeHandler(fn | null)` installed by `monitorWebProvider`
  - `requestReplyHeartbeatNow({ reason, coalesceMs? })`
- If the handler is absent (provider not connected), the request is stored as “pending”; the next time the handler is installed, it runs once.
- Coalesce rapid calls and respect the existing “skip when queue busy” behavior (prefer retrying soon vs dropping).
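
One possible shape for the hook module is sketched below. The two exported function names come from the design above; the handler signature, the 250 ms default coalescing window, and the "latest reason wins" rule are assumptions, and the "skip when queue busy" retry is not modeled.

```typescript
type WakeHandler = (reason: string) => void;

let handler: WakeHandler | null = null;
let pendingReason: string | null = null;
let timer: ReturnType<typeof setTimeout> | null = null;

// Run the handler once for whatever request is pending, if possible.
function flushPendingWake(): void {
  if (handler && pendingReason !== null) {
    const reason = pendingReason;
    pendingReason = null;
    handler(reason);
  }
}

// Installed by the provider monitor; pass null when the provider disconnects.
function setReplyHeartbeatWakeHandler(fn: WakeHandler | null): void {
  handler = fn;
  flushPendingWake(); // a request made while disconnected runs once now
}

function requestReplyHeartbeatNow(opts: { reason: string; coalesceMs?: number }): void {
  pendingReason = opts.reason; // assumption: the most recent reason wins
  if (!handler || timer) return; // stay pending, or a flush is already scheduled
  timer = setTimeout(() => {
    timer = null;
    flushPendingWake();
  }, opts.coalesceMs ?? 250);
}
```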
201+
202+
## Run history log (JSONL)
203+
204+
In addition to normal structured logs, the Gateway writes an append-only run history “ledger” (JSONL) whenever a job finishes. This is intended for quick debugging (“did the job run, when, and what happened?”).
205+
206+
Path rules:
207+
- If the cron store path basename is `jobs.json` (e.g. `~/.clawdis/cron/jobs.json`), logs go to `.../runs/<jobId>.jsonl` (e.g. `~/.clawdis/cron/runs/<jobId>.jsonl`).
208+
- Otherwise logs go to `<storeBase>.runs.jsonl` in the same directory (e.g. `~/.clawdis/cron.json` → `~/.clawdis/cron.runs.jsonl`).
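
The two path rules can be sketched as a pure string transform. `resolveRunLogPath` is a hypothetical name, and the sketch assumes POSIX-style `/` separators.

```typescript
// Derive the run ledger path from the resolved store path.
function resolveRunLogPath(storePath: string, jobId: string): string {
  const slash = storePath.lastIndexOf("/");
  const dir = slash >= 0 ? storePath.slice(0, slash) : ".";
  const base = storePath.slice(slash + 1);
  // Folder layout: one ledger file per job under runs/.
  if (base === "jobs.json") return `${dir}/runs/${jobId}.jsonl`;
  // Flat layout: a single shared ledger next to the store file.
  const stem = base.endsWith(".json") ? base.slice(0, -".json".length) : base;
  return `${dir}/${stem}.runs.jsonl`;
}
```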
209+
210+
Retention:
211+
- Best-effort pruning when the file grows beyond ~2MB; keep the newest ~2000 lines.
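
A minimal sketch of the pruning rule, operating on file contents so it stays independent of I/O. `pruneRunLog` is a hypothetical name, and string length is used as an approximation of byte size, which assumes mostly-ASCII JSONL.

```typescript
// Keep only the newest `keepLines` entries once the ledger grows past `maxBytes`.
function pruneRunLog(
  contents: string,
  maxBytes: number = 2 * 1024 * 1024,
  keepLines: number = 2000,
): string {
  if (contents.length <= maxBytes) return contents; // under the threshold: leave untouched
  const lines = contents.split("\n").filter((line) => line.length > 0);
  return lines.slice(-keepLines).join("\n") + "\n";
}
```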
212+
213+
## Gateway API
214+
215+
New methods (names can be bikeshed; `cron.*` is suggested):
216+
217+
- `wake`
  - params: `{ mode: "now" | "next-heartbeat", text: string }`
  - effect: `enqueueSystemEvent(text)`, plus optional immediate heartbeat trigger

- `cron.list`
  - params: optional `{ includeDisabled?: boolean }`
  - returns: `{ jobs: CronJob[] }`

- `cron.add`
  - params: job payload without `id/state` (server generates and returns created job)

- `cron.update`
  - params: `{ id: string, patch: Partial<CronJobWritableFields> }`

- `cron.remove`
  - params: `{ id: string }`

- `cron.run`
  - params: `{ id: string, mode?: "due" | "force" }` (debugging; does not change schedule unless `force` requires it)

- `cron.runs`
  - params: `{ id?: string, limit?: number }`
  - returns: `{ entries: CronRunLogEntry[] }`
  - note: if the store layout is `.../jobs.json`, `id` is required (runs are stored per-job).
241+
242+
The Gateway should broadcast a `cron` event for UI/debug:
243+
- event: `cron`
244+
- payload: `{ jobId, action: "added"|"updated"|"removed"|"started"|"finished", status?, error?, nextRunAtMs? }`
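
For clients consuming this surface, a few of the shapes above might be typed as follows. The type names and the `describeCronEvent` helper are hypothetical, and typing `status` like `state.lastStatus` is an assumption.

```typescript
type WakeParams = { mode: "now" | "next-heartbeat"; text: string };

type CronRunParams = { id: string; mode?: "due" | "force" };

type CronEventPayload = {
  jobId: string;
  action: "added" | "updated" | "removed" | "started" | "finished";
  status?: "ok" | "error" | "skipped"; // assumed to mirror state.lastStatus
  error?: string;
  nextRunAtMs?: number;
};

// Render a broadcast `cron` event as a single debug line.
function describeCronEvent(event: CronEventPayload): string {
  const parts = [`cron ${event.action}`, `job=${event.jobId}`];
  if (event.status) parts.push(`status=${event.status}`);
  if (event.error) parts.push(`error=${event.error}`);
  if (event.nextRunAtMs !== undefined) {
    parts.push(`next=${new Date(event.nextRunAtMs).toISOString()}`);
  }
  return parts.join(" ");
}
```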
245+
246+
## CLI surface
247+
248+
Add a `cron` command group (all commands should also support `--json` where sensible):
249+
250+
- `clawdis cron list [--json] [--all]`
- `clawdis cron add ...`
  - schedule flags:
    - `--at <iso8601|ms|relative>` (one-shot)
    - `--every <duration>` (e.g. `10m`, `1h`)
    - `--cron "<expr>" [--tz "<tz>"]`
  - target flags:
    - `--session main|isolated`
    - `--wake now|next`
  - payload flags (choose one):
    - `--system-event "<text>"`
    - `--message "<agent message>" [--deliver] [--channel last|whatsapp|telegram] [--to <dest>]`

- `clawdis cron edit <id> ...` (patch-by-flags, non-interactive)
- `clawdis cron rm <id>`
- `clawdis cron enable <id>` / `clawdis cron disable <id>`
- `clawdis cron run <id> [--force]` (debug)
267+
268+
Additionally:
269+
- `clawdis wake --mode now|next --text "<text>"` as a thin wrapper around `wake` for agents to call.
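
The `--every <duration>` flag needs a small parser to turn values such as `10m` or `1h` into `everyMs`. The sketch below is one way to do it; `parseDurationMs` is a hypothetical name, and the unit set (`ms`, `s`, `m`, `h`, `d`) is an assumption, since only `m` and `h` appear in the examples above.

```typescript
// Milliseconds per supported unit (unit set is assumed).
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Parse "<number><unit>" (e.g. "10m", "1.5h") into milliseconds.
function parseDurationMs(input: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)$/.exec(input.trim());
  if (!match) throw new Error(`Invalid duration: ${input}`);
  const factor = UNIT_MS[match[2] ?? ""];
  if (factor === undefined) throw new Error(`Unknown unit in duration: ${input}`);
  return Math.round(Number(match[1]) * factor);
}
```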
270+
271+
## Examples
272+
273+
### Run once at a specific time
274+
275+
One-shot reminder that targets the main session and triggers a heartbeat immediately at the scheduled time:
276+
277+
```bash
clawdis cron add \
  --at "2025-12-14T07:00:00-08:00" \
  --session main \
  --wake now \
  --system-event "Alarm: wake up (meeting in 30 minutes)."
```
284+
285+
### Run daily (calendar-accurate)
286+
287+
Daily at 07:00 in a specific timezone (preferred over “every 24h” to avoid DST drift):
288+
289+
```bash
clawdis cron add \
  --cron "0 7 * * *" \
  --tz "America/Los_Angeles" \
  --session isolated \
  --wake now \
  --message "Daily check: scan calendar + inbox; deliver only if urgent." \
  --deliver \
  --channel last
```
299+
300+
### Run weekly (every Wednesday)
301+
302+
Every Wednesday at 09:00:
303+
304+
```bash
clawdis cron add \
  --cron "0 9 * * 3" \
  --tz "America/Los_Angeles" \
  --session isolated \
  --wake now \
  --message "Weekly: summarize status and remind me of goals." \
  --deliver \
  --channel last
```
314+
315+
### “Next heartbeat”
316+
317+
Enqueue a note for the main session but let the existing heartbeat cadence pick it up:
318+
319+
```bash
clawdis wake --mode next --text "Next heartbeat: check battery + upcoming meetings."
```
322+
323+
## Logging & observability
324+
325+
Logging requirements:
326+
- Use `getChildLogger({ module: "cron", jobId, runId, name })` for every run.
- Log lifecycle:
  - store load/save (debug; include job count)
  - schedule recompute (debug; include nextRunAt)
  - job start/end (info)
  - job skipped (info; include reason)
  - job error (warn; include error + stack where available)
- Emit a concise user-facing line to stdout when running in CLI mode (similar to heartbeat logs).
334+
335+
Suggested log events:
336+
- `cron: scheduler started` (jobCount, nextWakeAt)
337+
- `cron: job started` (jobId, scheduleKind, sessionTarget, wakeMode)
338+
- `cron: job finished` (status, durationMs, nextRunAtMs)
339+
340+
## Safety & security
341+
342+
- Respect existing allowlists/routing rules: delivery defaults should not send to arbitrary destinations unless explicitly configured.
343+
- Provide a global “kill switch”:
  - `cron.enabled: boolean` in config, defaulting to `false` so that cron stays off until explicitly enabled.
  - The Gateway method `set-heartbeats` already exists; cron should have a similar method.
346+
- Avoid persistence of sensitive payloads unless requested; job text may contain private content.
347+
348+
## Testing plan (v1)
349+
350+
- Unit tests:
  - schedule computation for `at` and `every`
  - job store read/write + migration behavior
  - lane concurrency: main vs cron overlap is bounded
  - “wake now” coalescing and pending behavior when provider not ready
- Integration tests:
  - start Gateway with `CLAWDIS_SKIP_PROVIDERS=1`, add jobs, list/edit/remove
  - simulate due jobs and assert `enqueueSystemEvent` called + cron events broadcast
358+
359+
## Rollout plan
360+
361+
1) Add the `wake` primitive + heartbeat wake hook (no persistent jobs yet).
362+
2) Add `cron.*` API and CLI wrappers with `at` + `every`.
363+
3) Add optional cron expression parsing (`kind:"cron"`) if needed.
364+
4) Add UI surfacing in WebChat/macOS app (optional).
