Commit 1fae716

fix: recover stale cron task records
1 parent 9d21200 commit 1fae716

15 files changed

Lines changed: 609 additions & 15 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ Docs: https://docs.openclaw.ai
 - Installer: load nvm before Node.js detection so `curl | bash` installs respect nvm-managed Node instead of stale system Node. Fixes #49556. Thanks @heavenlxj.
 - CLI/Volta: respawn raw `openclaw` CLI runs through the named `node` shim when the current Node executable resolves to `volta-shim`, avoiding direct shim execution failures in non-interactive shells. Fixes #68672. Thanks @sanchezm86.
 - Installer: warn when multiple npm global roots contain OpenClaw installs, showing active Node/npm/openclaw plus each install path and version so stale version-manager installs are visible. Fixes #40839. Thanks @zhixianio.
+- Cron/tasks: recover completed cron task ledger records from durable run logs and job state before marking them `lost`, reducing false `backing session missing` audit errors for isolated cron runs and keeping offline CLI audit from treating its empty local cron active-job set as authoritative. Fixes #71963.
 - Docker: copy patched dependency files into runtime images so downstream `pnpm install` layers keep working. Fixes #69224. Thanks @gucasbrg.
 - Agents/runtime: submit heartbeat, cron, and exec wakeups as transient runtime context instead of visible user prompts, keeping synthetic system work out of chat transcripts. Fixes #66496 and #66814. Thanks @jeades and @mandomaker.
 - Telegram: include native quote excerpts automatically for threaded replies and reply tags when the original Telegram text is available, without adding another config knob. Fixes #6975. Thanks @rex05ai.

docs/automation/cron-jobs.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ Cron is the Gateway's built-in scheduler. It persists jobs, wakes the agent at t
 <a id="maintenance"></a>

 <Note>
-Task reconciliation for cron is runtime-owned: an active cron task stays live while the cron runtime still tracks that job as running, even if an old child session row still exists. Once the runtime stops owning the job and the 5-minute grace window expires, maintenance can mark the task `lost`.
+Task reconciliation for cron is runtime-owned first, durable-history-backed second: an active cron task stays live while the cron runtime still tracks that job as running, even if an old child session row still exists. Once the runtime stops owning the job and the 5-minute grace window expires, maintenance checks persisted run logs and job state for the matching `cron:<jobId>:<startedAt>` run. If that durable history shows a terminal result, the task ledger is finalized from it; otherwise Gateway-owned maintenance can mark the task `lost`. Offline CLI audit can recover from durable history, but it does not treat its own empty in-process active-job set as proof that a Gateway-owned cron run is gone.
 </Note>

 ## Schedule types
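The ordering in the updated note (runtime ownership, then the 5-minute grace window, then durable run history, and only then `lost`) can be sketched as a pure decision function. This is an illustrative sketch, not the commit's implementation: `CronTaskView`, `reconcileCronTask`, the `lastActiveAt` field used to start the grace window, and the `isGatewayAuthority` flag are all hypothetical names.

```typescript
// Hypothetical shapes; the real task registry types are not part of this diff.
type TerminalStatus = "succeeded" | "failed" | "timed_out" | "cancelled" | "lost";

interface CronTaskView {
  jobId: string;
  startedAt: number; // ms since epoch; forms the `cron:<jobId>:<startedAt>` run key
  lastActiveAt: number; // assumed start of the grace window, in ms since epoch
}

type Decision = { kind: "keep" } | { kind: "finalize"; status: TerminalStatus };

// The 5-minute grace window stated in docs/automation/cron-jobs.md.
const GRACE_WINDOW_MS = 5 * 60 * 1000;

function reconcileCronTask(
  task: CronTaskView,
  now: number,
  runtimeOwnsJob: (jobId: string) => boolean,
  durableTerminalStatus: (runKey: string) => TerminalStatus | undefined,
  isGatewayAuthority: boolean,
): Decision {
  // 1. Runtime-owned first: a job the cron runtime still tracks stays live.
  if (runtimeOwnsJob(task.jobId)) return { kind: "keep" };
  // 2. Give the run the grace window before drawing any conclusion.
  if (now - task.lastActiveAt < GRACE_WINDOW_MS) return { kind: "keep" };
  // 3. Durable-history-backed second: finalize from persisted run logs/job state.
  const recovered = durableTerminalStatus(`cron:${task.jobId}:${task.startedAt}`);
  if (recovered !== undefined) return { kind: "finalize", status: recovered };
  // 4. Only the Gateway may mark the run lost; an offline CLI audit leaves it alone.
  return isGatewayAuthority ? { kind: "finalize", status: "lost" } : { kind: "keep" };
}
```

Passing the two lookups in as functions keeps the sketch independent of how run logs and job state are actually stored; the last branch encodes the rule that an offline CLI audit keeps the task instead of declaring it lost.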

docs/automation/tasks.md

Lines changed: 15 additions & 5 deletions
@@ -25,8 +25,12 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat
 - Tasks are **records**, not schedulers — cron and heartbeat decide _when_ work runs, tasks track _what happened_.
 - ACP, subagents, all cron jobs, and CLI operations create tasks. Heartbeat turns do not.
 - Each task moves through `queued → running → terminal` (succeeded, failed, timed_out, cancelled, or lost).
-- Cron tasks stay live while the cron runtime still owns the job; chat-backed CLI tasks stay live only while their owning run context is still active.
-- Completion is push-driven: detached work can notify directly or wake the requester session/heartbeat when it finishes, so status polling loops are usually the wrong shape.
+- Cron tasks stay live while the cron runtime still owns the job; if the
+in-memory runtime state is gone, task maintenance first checks durable cron
+run history before marking a task lost.
+- Completion is push-driven: detached work can notify directly or wake the
+requester session/heartbeat when it finishes, so status polling loops are
+usually the wrong shape.
 - Isolated cron runs and subagent completions best-effort clean up tracked browser tabs/processes for their child session before final cleanup bookkeeping.
 - Isolated cron delivery suppresses stale interim parent replies while descendant subagent work is still draining, and it prefers final descendant output when that arrives before delivery.
 - Completion notifications are delivered directly to a channel or queued for the next heartbeat.
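The documented lifecycle `queued → running → terminal` can be written down as a small transition check. This sketch reuses only the status names listed in the lifecycle bullet; encoding strictly the arrows as drawn (no direct step from `queued` to a terminal status) is an assumption, and `canTransition` is a hypothetical helper, so the real task registry may allow transitions this table rejects.

```typescript
// Status names come from docs/automation/tasks.md; the transition rules are an illustrative assumption.
type TaskStatus =
  | "queued"
  | "running"
  | "succeeded"
  | "failed"
  | "timed_out"
  | "cancelled"
  | "lost";

const TERMINAL_STATUSES: ReadonlySet<TaskStatus> = new Set<TaskStatus>([
  "succeeded",
  "failed",
  "timed_out",
  "cancelled",
  "lost",
]);

function isTerminal(status: TaskStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Hypothetical helper: allows only queued -> running and running -> any terminal status.
function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  if (from === "queued") return to === "running";
  if (from === "running") return isTerminal(to);
  return false; // terminal statuses never move again
}
```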
@@ -143,8 +147,14 @@ Agent run completion is authoritative for active task records. A successful deta

 - ACP tasks: backing ACP child session metadata disappeared.
 - Subagent tasks: backing child session disappeared from the target agent store.
-- Cron tasks: the cron runtime no longer tracks the job as active.
-- CLI tasks: isolated child-session tasks use the child session; chat-backed CLI tasks use the live run context instead, so lingering channel/group/direct session rows do not keep them alive. Gateway-backed `openclaw agent` runs also finalize from their run result, so completed runs do not sit active until the sweeper marks them `lost`.
+- Cron tasks: the cron runtime no longer tracks the job as active and durable
+cron run history does not show a terminal result for that run. Offline CLI
+audit does not treat its own empty in-process cron runtime state as authority.
+- CLI tasks: isolated child-session tasks use the child session; chat-backed
+CLI tasks use the live run context instead, so lingering
+channel/group/direct session rows do not keep them alive. Gateway-backed
+`openclaw agent` runs also finalize from their run result, so completed runs
+do not sit active until the sweeper marks them `lost`.

 ## Delivery and notifications

@@ -236,7 +246,7 @@ openclaw tasks notify <lookup> state_changes
 Reconciliation is runtime-aware:

 - ACP/subagent tasks check their backing child session.
-- Cron tasks check whether the cron runtime still owns the job.
+- Cron tasks check whether the cron runtime still owns the job, then recover terminal status from persisted cron run logs/job state before falling back to `lost`. Only the Gateway process is authoritative for the in-memory cron active-job set; offline CLI audit uses durable history but does not mark a cron task lost solely because that local Set is empty.
 - Chat-backed CLI tasks check the owning live run context, not just the chat session row.

 Completion cleanup is also runtime-aware:
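Recovering a terminal result from persisted run logs amounts to finding the `finished` entry for the matching `cron:<jobId>:<startedAt>` run. A minimal sketch, assuming a task's `startedAt` equals the log entry's `runAtMs` and assuming `ok`/`error` map to `succeeded`/`failed`; the entry fields mirror those written in `src/cron/run-log.test.ts` in this commit, while `parseRunKey` and `recoverTerminalStatus` are hypothetical helpers.

```typescript
// Entry fields mirror those written in src/cron/run-log.test.ts; everything else is hypothetical.
interface RunLogEntry {
  ts: number;
  jobId: string;
  action: string;
  status?: "ok" | "error" | "skipped";
  runAtMs?: number;
  durationMs?: number;
}

type RecoveredStatus = "succeeded" | "failed";

// Parses `cron:<jobId>:<startedAt>`; the greedy group lets jobId itself contain ':'.
function parseRunKey(runKey: string): { jobId: string; startedAt: number } | undefined {
  const match = /^cron:(.+):(\d+)$/.exec(runKey);
  if (!match) return undefined;
  return { jobId: String(match[1]), startedAt: Number(match[2]) };
}

function recoverTerminalStatus(
  runKey: string,
  entries: readonly RunLogEntry[],
): RecoveredStatus | undefined {
  const key = parseRunKey(runKey);
  if (!key) return undefined;
  // Newest first, assuming entries are in append order, so the latest record wins.
  for (const entry of [...entries].reverse()) {
    if (entry.jobId !== key.jobId || entry.action !== "finished") continue;
    if (entry.runAtMs !== key.startedAt) continue; // assumed: startedAt === runAtMs
    if (entry.status === "ok") return "succeeded";
    if (entry.status === "error") return "failed";
    return undefined; // "skipped" or no status: no mapping assumed
  }
  // No terminal record: the caller falls back to the runtime-ownership rules.
  return undefined;
}
```

Returning `undefined` rather than guessing keeps the `lost` decision with the caller, which matches the documented fallback order.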

docs/cli/tasks.md

Lines changed: 4 additions & 0 deletions
@@ -84,6 +84,10 @@ openclaw tasks maintenance [--apply] [--json]
 ```

 Previews or applies task and Task Flow reconciliation, cleanup stamping, and pruning.
+For cron tasks, reconciliation uses persisted run logs/job state before marking an
+old active task `lost`, so completed cron runs do not become false audit errors
+just because the in-memory Gateway runtime state is gone. Offline CLI audit is
+not authoritative for the Gateway's process-local cron active-job set.

 ### `flow`

src/commands/status.summary.test.ts

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ vi.mock("../infra/system-events.js", () => ({
 }));

 vi.mock("../tasks/task-registry.maintenance.js", () => ({
+  configureTaskRegistryMaintenance: vi.fn(),
   getInspectableTaskRegistrySummary: vi.fn(() => ({
     total: 0,
     active: 0,

src/commands/status.summary.ts

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,7 @@ import { resolveStorePath } from "../config/sessions/paths.js";
 import { readSessionStoreReadOnly } from "../config/sessions/store-read.js";
 import { resolveSessionTotalTokens, type SessionEntry } from "../config/sessions/types.js";
 import type { OpenClawConfig } from "../config/types.js";
+import { resolveCronStorePath } from "../cron/store.js";
 import { listGatewayAgentsBasic } from "../gateway/agent-list.js";
 import { resolveHeartbeatSummaryForAgent } from "../infra/heartbeat-summary.js";
 import { peekSystemEvents } from "../infra/system-events.js";
@@ -151,6 +152,9 @@ export async function getStatusSummary(
   const mainSessionKey = resolveMainSessionKey(cfg);
   const queuedSystemEvents = peekSystemEvents(mainSessionKey);
   const taskMaintenanceModule = await loadTaskRegistryMaintenanceModule();
+  taskMaintenanceModule.configureTaskRegistryMaintenance({
+    cronStorePath: resolveCronStorePath(cfg.cron?.store),
+  });
   const tasks = taskMaintenanceModule.getInspectableTaskRegistrySummary();
   const taskAudit = taskMaintenanceModule.getInspectableTaskAuditSummary();

src/commands/tasks.ts

Lines changed: 12 additions & 1 deletion
@@ -1,3 +1,5 @@
+import { loadConfig } from "../config/config.js";
+import { resolveCronStorePath } from "../cron/store.js";
 import type { RuntimeEnv } from "../runtime.js";
 import { normalizeOptionalString } from "../shared/string-coerce.js";
 import { getTaskById, updateTaskNotifyPolicyById } from "../tasks/runtime-internal.js";
@@ -24,6 +26,7 @@ import { compareTaskAuditFindingSortKeys } from "../tasks/task-registry.audit.sh
 import {
   getInspectableTaskAuditSummary,
   getInspectableTaskRegistrySummary,
+  configureTaskRegistryMaintenance,
   previewTaskRegistryMaintenance,
   runTaskRegistryMaintenance,
 } from "../tasks/task-registry.maintenance.js";
@@ -44,10 +47,16 @@ const RUN_PAD = 10;
 const info = theme.info;

 async function loadTaskCancelConfig() {
-  const { loadConfig } = await import("../config/config.js");
   return loadConfig();
 }

+function configureTaskMaintenanceFromConfig(): void {
+  const cfg = loadConfig();
+  configureTaskRegistryMaintenance({
+    cronStorePath: resolveCronStorePath(cfg.cron?.store),
+  });
+}
+
 function truncate(value: string, maxChars: number) {
   if (value.length <= maxChars) {
     return value;
@@ -417,6 +426,7 @@ export async function tasksAuditCommand(
   },
   runtime: RuntimeEnv,
 ) {
+  configureTaskMaintenanceFromConfig();
   const severityFilter = opts.severity?.trim() as TaskSystemAuditSeverity | undefined;
   const codeFilter = opts.code?.trim() as TaskSystemAuditCode | undefined;
   const { allFindings, filteredFindings, taskFindings, summary } = toSystemAuditFindings({
@@ -491,6 +501,7 @@ export async function tasksMaintenanceCommand(
   opts: { json?: boolean; apply?: boolean },
   runtime: RuntimeEnv,
 ) {
+  configureTaskMaintenanceFromConfig();
   const auditBefore = getInspectableTaskAuditSummary();
   const flowAuditBefore = getInspectableTaskFlowAuditSummary();
   const taskMaintenance = opts.apply
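`getStatusSummary`, `tasksAuditCommand`, and `tasksMaintenanceCommand` now all call `configureTaskRegistryMaintenance({ cronStorePath })` before reading any audit summary. That ordering matters if the maintenance module keeps the path in module state, which is an assumption here; the sketch below shows that pattern with hypothetical names (`configureMaintenance`, `durableHistoryAvailable`) rather than the module's real internals.

```typescript
// Hypothetical stand-in for the maintenance module's configuration hook.
interface MaintenanceOptions {
  cronStorePath?: string;
}

// Module-level state: whatever the last configure call set is what later reads see.
let maintenanceOptions: MaintenanceOptions = {};

function configureMaintenance(opts: MaintenanceOptions): void {
  // Replace rather than merge, so a path from an earlier call cannot leak into a later one.
  maintenanceOptions = { ...opts };
}

function durableHistoryAvailable(): boolean {
  // Without a store path there are no run logs or job state to consult,
  // so reconciliation could only fall back to runtime ownership.
  const storePath = maintenanceOptions.cronStorePath;
  return typeof storePath === "string" && storePath.length > 0;
}
```

Under that assumption, a summary read before configuration would see no durable history at all, which is why each entry point resolves `cfg.cron?.store` through `resolveCronStorePath` first.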

src/cron/run-log.test.ts

Lines changed: 31 additions & 0 deletions
@@ -9,6 +9,7 @@ import {
   getPendingCronRunLogWriteCountForTests,
   readCronRunLogEntries,
   readCronRunLogEntriesPage,
+  readCronRunLogEntriesSync,
   resolveCronRunLogPruneOptions,
   resolveCronRunLogPath,
 } from "./run-log.js";
@@ -96,6 +97,36 @@ describe("cron run log", () => {
     });
   });

+  it("reads run-log entries synchronously for task reconciliation", async () => {
+    await withRunLogDir("openclaw-cron-log-sync-", async (dir) => {
+      const logPath = path.join(dir, "runs", "job-1.jsonl");
+      await appendCronRunLog(logPath, {
+        ts: 1000,
+        jobId: "job-1",
+        action: "finished",
+        status: "ok",
+        runAtMs: 900,
+        durationMs: 100,
+      });
+      await appendCronRunLog(logPath, {
+        ts: 2000,
+        jobId: "job-2",
+        action: "finished",
+        status: "error",
+      });
+
+      expect(readCronRunLogEntriesSync(logPath, { jobId: "job-1" })).toEqual([
+        expect.objectContaining({
+          jobId: "job-1",
+          status: "ok",
+          runAtMs: 900,
+          durationMs: 100,
+        }),
+      ]);
+      expect(readCronRunLogEntriesSync(path.join(dir, "runs", "missing.jsonl"))).toEqual([]);
+    });
+  });
+
   it.skipIf(process.platform === "win32")(
     "writes run log files with secure permissions",
     async () => {

src/cron/run-log.ts

Lines changed: 18 additions & 0 deletions
@@ -1,4 +1,5 @@
 import { randomBytes } from "node:crypto";
+import fsSync from "node:fs";
 import fs from "node:fs/promises";
 import path from "node:path";
 import { parseByteSize } from "../cli/parse-bytes.js";
@@ -198,6 +199,23 @@ export async function readCronRunLogEntries(
   return page.entries.toReversed();
 }

+export function readCronRunLogEntriesSync(
+  filePath: string,
+  opts?: { limit?: number; jobId?: string },
+): CronRunLogEntry[] {
+  const limit = Math.max(1, Math.min(5000, Math.floor(opts?.limit ?? 200)));
+  let raw: string;
+  try {
+    raw = fsSync.readFileSync(path.resolve(filePath), "utf-8");
+  } catch (error) {
+    if (typeof error === "object" && error !== null && "code" in error && error.code === "ENOENT") {
+      return [];
+    }
+    throw error;
+  }
+  return parseAllRunLogEntries(raw, { jobId: opts?.jobId }).slice(-limit);
+}
+
 function normalizeRunStatusFilter(status?: string): CronRunLogStatusFilter {
   if (status === "ok" || status === "error" || status === "skipped" || status === "all") {
     return status;
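`readCronRunLogEntriesSync` relies on `parseAllRunLogEntries`, which this diff does not show. A self-contained approximation of the same flow (clamp `limit` to 1..5000 with a default of 200, treat `ENOENT` as an empty history, filter by `jobId`, keep the newest `limit` entries) might look like the following; `readJsonlSync` is a hypothetical name, and skipping unparseable lines is an assumption about the real parser.

```typescript
import * as fs from "node:fs";

// Minimal entry shape for this sketch; the real CronRunLogEntry type is not shown in the diff.
interface LogEntry {
  jobId?: string;
  [key: string]: unknown;
}

function readJsonlSync(filePath: string, opts?: { limit?: number; jobId?: string }): LogEntry[] {
  // Same clamp as readCronRunLogEntriesSync: default 200, bounded to 1..5000.
  const limit = Math.max(1, Math.min(5000, Math.floor(opts?.limit ?? 200)));
  let raw: string;
  try {
    raw = fs.readFileSync(filePath, "utf-8");
  } catch (error) {
    // A missing log file means "no history yet", not a failure.
    if ((error as { code?: unknown } | null)?.code === "ENOENT") return [];
    throw error;
  }
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // assumed: skip torn or partial lines instead of failing the read
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) continue;
    const entry = parsed as LogEntry;
    if (opts?.jobId !== undefined && entry.jobId !== opts.jobId) continue;
    entries.push(entry);
  }
  // Keep the newest `limit` entries in file order, as slice(-limit) does.
  return entries.slice(-limit);
}
```

Treating `ENOENT` as an empty result matches the original function and its `missing.jsonl` test case; any other read error still propagates to the caller.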

src/cron/store.test.ts

Lines changed: 14 additions & 1 deletion
@@ -3,7 +3,7 @@ import os from "node:os";
 import path from "node:path";
 import { setTimeout as scheduleNativeTimeout } from "node:timers";
 import { afterAll, afterEach, beforeAll, beforeEach, describe, expect, it, vi } from "vitest";
-import { loadCronStore, resolveCronStorePath, saveCronStore } from "./store.js";
+import { loadCronStore, loadCronStoreSync, resolveCronStorePath, saveCronStore } from "./store.js";
 import type { CronStoreFile } from "./types.js";

 let fixtureRoot = "";
@@ -125,6 +125,19 @@ describe("cron store", () => {
     });
   });

+  it("loads split cron state synchronously for task reconciliation", async () => {
+    const { storePath } = await makeStorePath();
+    await saveCronStore(storePath, makeStore("job-sync", true));
+
+    const loaded = loadCronStoreSync(storePath);
+
+    expect(loaded.jobs[0]).toMatchObject({
+      id: "job-sync",
+      state: expect.any(Object),
+      updatedAtMs: expect.any(Number),
+    });
+  });
+
   it("does not create a backup file when saving unchanged content", async () => {
     const store = await makeStorePath();
     const payload = makeStore("job-1", true);

0 commit comments
