Summary
Non-interval heartbeat wake reasons (exec-event, wake, hook, cron:*, background-task, notifications-event) bypass the per-agent interval gate entirely, causing two related problems:
- Burst runs — multiple heartbeats fire in rapid succession when activity resumes after idle periods
- Silent gaps — during true inactivity (no user messages, no exec completions), heartbeats stop entirely because the
setTimeout-based interval timer appears to stall (macOS App Nap / process suspension), and no activity-driven events exist to compensate
Environment
- openclaw version: 2026.4.5
- Platform: macOS (Darwin 24.5.0, Apple Silicon)
- Mode: local gateway, Tailscale serve
- Heartbeat config: defaults only (30m interval, no explicit
heartbeat: block on any agent)
Observed Behavior
Over ~24 hours of activity-log.jsonl (a script that is called when the heartbeat runs) heartbeat entries:
| Metric |
Value |
| Total heartbeats |
59 |
| Expected at 30min intervals |
~47 |
| Gaps > 45 min |
7 |
| Largest gap |
489 min (overnight idle) |
| Burst example |
15 heartbeats in 12 minutes (23:45–23:59) |
Heartbeats correlate almost perfectly with user/message activity. When the gateway has no inbound messages or exec completions, heartbeats stop. When interaction resumes, queued events drain and heartbeats burst.
Root Cause (code trace)
In heartbeat-runner run() (~line 1022 of the bundled dist):
for (const agent of state.agents.values()) {
if (isInterval && now < agent.nextDueMs) continue; // ← only interval checks timing
// ...
advanceAgentSchedule(agent, now); // resets nextDueMs = now + intervalMs
}
isInterval is reason === "interval". All other wake reasons skip the time gate, run immediately, then call advanceAgentSchedule() which pushes nextDueMs forward — effectively "satisfying" the interval from the runner's perspective.
Meanwhile, the interval timer in scheduleNext() uses setTimeout(...).unref() (line 935). On macOS, when the gateway process is idle with no I/O, the OS can suspend the process (App Nap), delaying the timer indefinitely. There's no keepalive or system-level scheduler backing the interval.
Call sites that trigger non-interval heartbeats (all bypass the gate)
| File |
Reason |
Trigger |
exec-defaults |
exec-event |
Background exec notifyOnExit |
server-node-events |
exec-event |
Node exec completion |
server-node-events |
notifications-event |
OS notification change |
server (cron) |
cron:<id> |
Cron job completion |
server |
wake |
Restart sentinel, wake mode |
server (hooks) |
hook:wake, hook:<id> |
Hook triggers |
runtime-internal |
background-task |
Task completion/blocked |
Expected Behavior
- Non-interval wake reasons should still respect a minimum interval between heartbeat runs (e.g., the configured interval or a floor like 60s), even if they're allowed to preempt the next scheduled run
- The interval timer should be resilient to process suspension — either via a system-level scheduler (launchd timer, etc.) or by checking wall-clock drift on any event loop wake and firing overdue heartbeats immediately
advanceAgentSchedule on non-interval runs should not push the next interval-based run further into the future than it already was (i.e., nextDueMs = max(existing nextDueMs, now + intervalMs) rather than unconditionally resetting)
Suggested Fix
In run(), add a minimum-interval check for non-interval reasons:
for (const agent of state.agents.values()) {
if (isInterval && now < agent.nextDueMs) continue;
// Add: enforce minimum spacing for non-interval triggers too
if (!isInterval && typeof agent.lastRunMs === 'number'
&& (now - agent.lastRunMs) < agent.intervalMs) continue;
// ...
}
And in scheduleNext(), add drift detection:
const delay = Math.max(0, nextDue - now);
// If we overslept (App Nap, suspension), fire immediately
if (delay === 0 && nextDue < now) {
requestHeartbeatNow({ reason: "interval", coalesceMs: 0 });
return;
}
Related
Reproduction
- Configure openclaw with default heartbeat (30m) on macOS
- Stop interacting with the gateway for 2+ hours
- Observe
activity-log.jsonl (a script that is called when the heartbeat runs) — no heartbeat entries appear during the idle period
- Resume sending messages — observe a burst of heartbeats in quick succession
- Monitor the
reason field in heartbeat events — non-interval reasons dominate
Summary
Non-interval heartbeat wake reasons (
exec-event,wake,hook,cron:*,background-task,notifications-event) bypass the per-agent interval gate entirely, causing two related problems:setTimeout-based interval timer appears to stall (macOS App Nap / process suspension), and no activity-driven events exist to compensateEnvironment
heartbeat:block on any agent)Observed Behavior
Over ~24 hours of
activity-log.jsonl(a script that is called when the heartbeat runs) heartbeat entries:Heartbeats correlate almost perfectly with user/message activity. When the gateway has no inbound messages or exec completions, heartbeats stop. When interaction resumes, queued events drain and heartbeats burst.
Root Cause (code trace)
In
heartbeat-runnerrun()(~line 1022 of the bundled dist):isIntervalisreason === "interval". All other wake reasons skip the time gate, run immediately, then calladvanceAgentSchedule()which pushesnextDueMsforward — effectively "satisfying" the interval from the runner's perspective.Meanwhile, the interval timer in
scheduleNext()usessetTimeout(...).unref()(line 935). On macOS, when the gateway process is idle with no I/O, the OS can suspend the process (App Nap), delaying the timer indefinitely. There's no keepalive or system-level scheduler backing the interval.Call sites that trigger non-interval heartbeats (all bypass the gate)
exec-defaultsexec-eventnotifyOnExitserver-node-eventsexec-eventserver-node-eventsnotifications-eventserver(cron)cron:<id>serverwakeserver(hooks)hook:wake,hook:<id>runtime-internalbackground-taskExpected Behavior
advanceAgentScheduleon non-interval runs should not push the next interval-based run further into the future than it already was (i.e.,nextDueMs = max(existing nextDueMs, now + intervalMs)rather than unconditionally resetting)Suggested Fix
In
run(), add a minimum-interval check for non-interval reasons:And in
scheduleNext(), add drift detection:Related
Reproduction
activity-log.jsonl(a script that is called when the heartbeat runs) — no heartbeat entries appear during the idle periodreasonfield in heartbeat events — non-interval reasons dominate