[Bug] Cron isolated agentTurn: "already-running" survives restart, run history always empty

## Bug Description

Cron jobs with `sessionTarget: isolated` and `payload.kind: agentTurn` enter a stuck "already-running" state and never recover — even after gateway restart. The `openclaw cron runs --id <jobId>` history shows 0 entries, and manual `cron run` returns `{ok: true, ran: false, reason: "already-running"}` indefinitely.

This is a confirmed manifestation of issue #43452, with one additional detail: the stuck state survives a gateway restart. That suggests the `runningAtMs` flag is persisted in the cron scheduler's internal state file and is not cleared on startup.

## Environment

- Version: OpenClaw 2026.4.14 (323493f)
- Gateway: systemd, local loopback
- Node: node 24.14.1, Linux 6.17.0-20-generic
- Affected job: heartbeat-dispatch (sessionTarget=isolated, payload.kind=agentTurn, agentId=qa)

## Evidence

### 1. Stuck running state survives restart

Manual trigger while stuck:
```
$ openclaw cron run <jobId> --timeout 60000
{ok: true, enqueued: true, runId: "manual:<id>:..."}   <- initial manual trigger
$ openclaw cron run <jobId> --timeout 60000
{ok: true, ran: false, reason: "already-running"}         <- stuck forever
```

Gateway restart does NOT clear the stuck state. The job re-enters "already-running" within seconds of restart.

### 2. heartbeat-dispatch.sh executes but scheduler history is empty

The script runs every 30 min and writes to its own log:
```
[2026-04-17T14:30:09Z] heartbeat-dispatch: Subagent track check: STALE:0 REPORTED:0
[2026-04-17T14:30:09Z] heartbeat-dispatch: No PENDING tasks
```

Yet `openclaw cron runs --id <jobId>` returns `total: 0`. The script executes; the scheduler never records it.

### 3. STUCK_RUN_MS logic

From `jobs-cnkUBFyc.js`:
```javascript
const STUCK_RUN_MS = 7200 * 1e3; // 2 hours
if (typeof runningAt === "number" && nowMs - runningAt > STUCK_RUN_MS) {
  state.deps.log.warn({jobId, runningAtMs: runningAt}, "cron: clearing stuck running marker");
  job.state.runningAtMs = void 0;
  changed = true;
}
```

The stuck marker is cleared only once it is more than 2 hours old. heartbeat-dispatch fires every 30 minutes, so the job is re-triggered several times before that threshold is reached.
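The interaction between the 30-minute schedule and the 2-hour threshold can be sketched as a small simulation. Whether a trigger rewrites `runningAtMs` while the job is already marked as running has not been confirmed in the source, so the sketch models both cases. `simulate` and `refreshOnTrigger` are illustrative names, not OpenClaw code.

```javascript
const STUCK_RUN_MS = 7200 * 1e3;        // 2 hours, as in jobs-cnkUBFyc.js
const TRIGGER_INTERVAL_MS = 1800 * 1e3; // 30 minutes, the heartbeat-dispatch schedule

// Returns the time (ms since the marker was first set) at which the stuck
// check first clears the marker, or null if it never does within horizonMs.
function simulate({ refreshOnTrigger, horizonMs }) {
  let runningAt = 0; // marker set at t=0 by a run that never finishes
  for (let nowMs = TRIGGER_INTERVAL_MS; nowMs <= horizonMs; nowMs += TRIGGER_INTERVAL_MS) {
    // Same condition as the STUCK_RUN_MS check quoted above.
    if (nowMs - runningAt > STUCK_RUN_MS) return nowMs;
    // Assumption under test: the trigger path rewrites the marker.
    if (refreshOnTrigger) runningAt = nowMs;
  }
  return null;
}

const day = 24 * 3600 * 1e3;
console.log(simulate({ refreshOnTrigger: true, horizonMs: day }));  // null
console.log(simulate({ refreshOnTrigger: false, horizonMs: day })); // 9000000 (2 h 30 min)
```

If the marker were left untouched, the check would be expected to clear it on the first tick after the 2-hour mark. A job that stays stuck for longer than that would point toward the marker being rewritten on each trigger, or toward the stuck check not running on this code path.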

## Root Cause Hypothesis

The `runningAtMs` flag is written before the isolated session executes. If the isolated session fails to start, the flag is never cleared, and every subsequent run sees "already-running" immediately. The 2-hour `STUCK_RUN_MS` threshold exists, but the job re-triggers before it expires.
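The hypothesized failure mode can be reproduced in miniature. The sketch below is not taken from the OpenClaw source: `runJobLeaky`, `runJobGuarded`, and `startSession` are hypothetical names, and the job object only mimics the `job.state.runningAtMs` field seen in the snippet above. It contrasts a runner that sets the marker and clears it only on success with one that clears it on every exit path.

```javascript
// Hypothetical runner: marker is written before the session starts and
// cleared only if the session completes.
async function runJobLeaky(job, startSession) {
  if (typeof job.state.runningAtMs === "number") {
    return { ok: true, ran: false, reason: "already-running" };
  }
  job.state.runningAtMs = Date.now();
  await startSession(job); // if this throws, the marker is never cleared
  job.state.runningAtMs = undefined;
  return { ok: true, ran: true };
}

// Same flow, but the marker is cleared in finally, so a failed start
// cannot leave the job marked as running.
async function runJobGuarded(job, startSession) {
  if (typeof job.state.runningAtMs === "number") {
    return { ok: true, ran: false, reason: "already-running" };
  }
  job.state.runningAtMs = Date.now();
  try {
    await startSession(job);
    return { ok: true, ran: true };
  } catch (err) {
    return { ok: false, ran: false, reason: String(err.message) };
  } finally {
    job.state.runningAtMs = undefined;
  }
}

async function demo() {
  const failingStart = async () => { throw new Error("session failed to start"); };
  const leaky = { state: {} };
  await runJobLeaky(leaky, failingStart).catch(() => {}); // first run fails
  const guarded = { state: {} };
  await runJobGuarded(guarded, failingStart);
  return {
    leakyStuck: typeof leaky.state.runningAtMs === "number",
    leakyNext: await runJobLeaky(leaky, failingStart), // second attempt
    guardedStuck: typeof guarded.state.runningAtMs === "number",
  };
}

demo().then((result) => console.log(result));
```

In this model the leaky runner stays marked as running after the failed start and its second attempt returns `reason: "already-running"`, matching the symptom in this report, while the guarded runner is left unmarked.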

## Recommended Patch (Option C — Self-healing on restart)

On gateway startup, check all isolated agentTurn jobs with `runningAtMs` set. If the associated isolated session is not actually running, clear the flag immediately. This prevents the "survives restart" behavior and is the lowest-risk fix:

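```javascript
// On cron scheduler init / gateway startup.
// Must run inside an async function, since the session check is awaited.
for (const job of Object.values(state.jobs)) {
  if (job.config.sessionTarget === "isolated" &&
      job.config.payload?.kind === "agentTurn" &&
      typeof job.state.runningAtMs === "number") {
    // Check whether an isolated session is actually running for this job.
    // isJobSessionRunning is assumed here; it may need to be added.
    const isRunning = await state.deps.sessionManager.isJobSessionRunning(job.id);
    if (!isRunning) {
      state.deps.log.warn({jobId: job.id}, "cron: clearing orphaned runningAtMs on startup");
      job.state.runningAtMs = void 0;
      // If the marker is persisted in the state file, the cleared value
      // must also be written back, or it will reappear on the next load.
    }
  }
}
```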
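To exercise the startup check in isolation, the loop can be wrapped in a function and run against stubbed dependencies. This is a sketch under assumptions: `clearOrphanedRunningMarkers`, `makeStubState`, the shape of `state`, and `isJobSessionRunning` are all illustrative, and OpenClaw's real structures may differ.

```javascript
// Standalone version of the startup check.
// Returns the ids of jobs whose marker was cleared, so the caller can
// decide whether the state needs to be persisted.
async function clearOrphanedRunningMarkers(state) {
  const cleared = [];
  for (const job of Object.values(state.jobs)) {
    const isIsolatedAgentTurn =
      job.config.sessionTarget === "isolated" &&
      job.config.payload?.kind === "agentTurn";
    if (!isIsolatedAgentTurn || typeof job.state.runningAtMs !== "number") continue;
    // Hypothetical lookup: does a live isolated session exist for this job?
    const isRunning = await state.deps.sessionManager.isJobSessionRunning(job.id);
    if (!isRunning) {
      state.deps.log.warn({ jobId: job.id }, "cron: clearing orphaned runningAtMs on startup");
      job.state.runningAtMs = undefined;
      cleared.push(job.id);
    }
  }
  return cleared;
}

// Stubbed state: one orphaned job and one job with a live session.
function makeStubState() {
  const config = { sessionTarget: "isolated", payload: { kind: "agentTurn" } };
  return {
    jobs: {
      orphan: { id: "orphan", config, state: { runningAtMs: 1 } },
      live: { id: "live", config, state: { runningAtMs: 2 } },
    },
    deps: {
      log: { warn: () => {} },
      sessionManager: { isJobSessionRunning: async (id) => id === "live" },
    },
  };
}

clearOrphanedRunningMarkers(makeStubState()).then((ids) => console.log(ids));
```

With this stub, only the `orphan` job has its marker cleared; the `live` job keeps its marker because its session is reported as running. Returning the cleared ids is a design choice that lets the caller write the state file only when something changed.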

## Workaround (for users)

Add a pre-flight reset to heartbeat-dispatch.sh:
```bash
openclaw cron disable <jobId>
sleep 2
openclaw cron enable <jobId>
```

## References

- #43452 — Cron manual run enqueued but never appears in run history
- #65225 — Cron isolated session fails to execute: task stuck in running state  
- #44232 — Cron manual runs enqueue but do not appear in run history (regression)

