Skip to content

Heartbeat: non-interval wake reasons bypass interval enforcement, causing bursts and silent gaps #62294

@bryanpearson

Description

@bryanpearson

Summary

Non-interval heartbeat wake reasons (exec-event, wake, hook, cron:*, background-task, notifications-event) bypass the per-agent interval gate entirely, causing two related problems:

  1. Burst runs — multiple heartbeats fire in rapid succession when activity resumes after idle periods
  2. Silent gaps — during true inactivity (no user messages, no exec completions), heartbeats stop entirely because the setTimeout-based interval timer appears to stall (macOS App Nap / process suspension), and no activity-driven events exist to compensate

Environment

  • openclaw version: 2026.4.5
  • Platform: macOS (Darwin 24.5.0, Apple Silicon)
  • Mode: local gateway, Tailscale serve
  • Heartbeat config: defaults only (30m interval, no explicit heartbeat: block on any agent)

Observed Behavior

Over ~24 hours of activity-log.jsonl (a script that is called when the heartbeat runs) heartbeat entries:

Metric Value
Total heartbeats 59
Expected at 30min intervals ~47
Gaps > 45 min 7
Largest gap 489 min (overnight idle)
Burst example 15 heartbeats in 12 minutes (23:45–23:59)

Heartbeats correlate almost perfectly with user/message activity. When the gateway has no inbound messages or exec completions, heartbeats stop. When interaction resumes, queued events drain and heartbeats burst.

Root Cause (code trace)

In heartbeat-runner run() (~line 1022 of the bundled dist):

for (const agent of state.agents.values()) {
  if (isInterval && now < agent.nextDueMs) continue;  // ← only interval checks timing
  // ...
  advanceAgentSchedule(agent, now);  // resets nextDueMs = now + intervalMs
}

isInterval is reason === "interval". All other wake reasons skip the time gate, run immediately, then call advanceAgentSchedule() which pushes nextDueMs forward — effectively "satisfying" the interval from the runner's perspective.

Meanwhile, the interval timer in scheduleNext() uses setTimeout(...).unref() (line 935). On macOS, when the gateway process is idle with no I/O, the OS can suspend the process (App Nap), delaying the timer indefinitely. There's no keepalive or system-level scheduler backing the interval.

Call sites that trigger non-interval heartbeats (all bypass the gate)

File Reason Trigger
exec-defaults exec-event Background exec notifyOnExit
server-node-events exec-event Node exec completion
server-node-events notifications-event OS notification change
server (cron) cron:<id> Cron job completion
server wake Restart sentinel, wake mode
server (hooks) hook:wake, hook:<id> Hook triggers
runtime-internal background-task Task completion/blocked

Expected Behavior

  1. Non-interval wake reasons should still respect a minimum interval between heartbeat runs (e.g., the configured interval or a floor like 60s), even if they're allowed to preempt the next scheduled run
  2. The interval timer should be resilient to process suspension — either via a system-level scheduler (launchd timer, etc.) or by checking wall-clock drift on any event loop wake and firing overdue heartbeats immediately
  3. advanceAgentSchedule on non-interval runs should not push the next interval-based run further into the future than it already was (i.e., nextDueMs = max(existing nextDueMs, now + intervalMs) rather than unconditionally resetting)

Suggested Fix

In run(), add a minimum-interval check for non-interval reasons:

for (const agent of state.agents.values()) {
  if (isInterval && now < agent.nextDueMs) continue;
  // Add: enforce minimum spacing for non-interval triggers too
  if (!isInterval && typeof agent.lastRunMs === 'number' 
      && (now - agent.lastRunMs) < agent.intervalMs) continue;
  // ...
}

And in scheduleNext(), add drift detection:

const delay = Math.max(0, nextDue - now);
// If we overslept (App Nap, suspension), fire immediately
if (delay === 0 && nextDue < now) {
  requestHeartbeatNow({ reason: "interval", coalesceMs: 0 });
  return;
}

Related

Reproduction

  1. Configure openclaw with default heartbeat (30m) on macOS
  2. Stop interacting with the gateway for 2+ hours
  3. Observe activity-log.jsonl (a script that is called when the heartbeat runs) — no heartbeat entries appear during the idle period
  4. Resume sending messages — observe a burst of heartbeats in quick succession
  5. Monitor the reason field in heartbeat events — non-interval reasons dominate

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions