Skip to content

Heartbeat flood: exec-event bypasses interval check, causing runaway heartbeat runs #17797

@ghostllm

Description

@ghostllm

Bug Description

requestHeartbeatNow() with non-interval reasons (e.g. exec:SESSION:exit, exec-event) bypasses the heartbeat interval check in the runner's run() function, causing rapid-fire heartbeat runs instead of respecting the configured interval (e.g. 30 minutes).

Root Cause

In the heartbeat runner, the run() function only enforces the interval for reason === "interval":

const isInterval = reason === "interval";
// ...
for (const agent of state.agents.values()) {
    if (isInterval && now < agent.nextDueMs) continue;  // only checked for "interval"
    // ... runOnce fires regardless of nextDueMs for non-interval reasons
}

Meanwhile, in the subagent registry:

// Every exec tool completion triggers this:
requestHeartbeatNow({ reason: `exec:${session.id}:exit` });

// Every exec system event triggers this:
requestHeartbeatNow({ reason: "exec-event" });

And in the schedule() function's finally block:

finally {
    running = false;
    if (pendingWake || scheduled) schedule(delay, "normal");  // immediately reschedules
}

The Cascade

  1. Normal heartbeat fires (reason: "interval") — agent runs, possibly using exec tool or spawning sub-agents
  2. Exec tool completes → requestHeartbeatNow({ reason: "exec:xxx:exit" }) → sets pendingWake
  3. Since running === true (heartbeat is still in progress), the schedule() function defers
  4. When the current heartbeat run finishes, the finally block sees pendingWake and immediately reschedules
  5. Next run fires with reason "exec:xxx:exit"not "interval" — so the interval check is skipped
  6. advanceAgentSchedule sets nextDueMs = now + intervalMs, but this is irrelevant since the check is bypassed
  7. If any concurrent sub-agents or exec processes are running, their completions keep feeding requestHeartbeatNow()
  8. Result: infinite loop of heartbeat runs, each completing in ~3-7 seconds

Observed Impact

  • 244 heartbeat runs in ~2.5 hours instead of the expected ~5 (30-minute interval)
  • All runs used claude-opus-4-6-thinking (the configured main model)
  • Quota exhausted, estimated ~$90 in wasted tokens
  • Normal heartbeat interval: 1,800,000ms (30 min), actual interval during flood: ~3-10 seconds
  • Triggered by sub-agent exec completions feeding back into the heartbeat wake system

Evidence from Logs

# 244 heartbeat runs, all in the same session
244 messageChannel=heartbeat  (sessionId=caa61517-...)

# Runs per minute during flood peak:
10  2026-02-16T03:40
21  2026-02-16T04:10
10  2026-02-16T04:38
10  2026-02-16T04:42
10  2026-02-16T04:44

# 352 exec tool calls across all runs that day, each generating exit events

Suggested Fix

Non-interval reasons like exec-event and exec:SESSION:exit should still respect the heartbeat interval (or at minimum a cooldown period). Options:

  1. Enforce nextDueMs for all reasons (simplest): Remove the isInterval gate so all reasons respect the schedule
  2. Add a minimum cooldown: Even for exec-event reasons, require at least N seconds since the last heartbeat run
  3. Debounce exec-event wakes: Coalesce multiple exec completions into a single heartbeat wake with a longer delay (e.g. 30-60 seconds)
  4. Separate exec-event handling: Don't use the heartbeat runner for exec-event delivery — handle it through a different path that doesn't trigger full heartbeat runs

Option 2 or 3 seems most appropriate since exec-events should eventually trigger a heartbeat check, just not at the rate of one per exec completion.

Environment

  • OpenClaw version: latest beta (wizard version 2026.2.13)
  • Node.js: v22.22.0
  • OS: Ubuntu 24.04
  • Config: heartbeat.every = 1800000 (30 min)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions