
BUG: heartbeat fires duplicate runs when external wake events (openclaw agent CLI) arrive during scheduled heartbeat #64016

@foreverxdord

Description


Bug type

Behavior bug (duplicate/extra heartbeat runs beyond configured interval)

Summary

When heartbeat is configured with isolatedSession=true and every=60m, extra heartbeat runs occur within the same hourly interval window. These duplicate heartbeats are triggered by external wake events (e.g. openclaw agent CLI calls) that arrive while a scheduled heartbeat is already running.

The heartbeat wake system (heartbeat-wake.ts) has a coalescing mechanism that queues incoming wake requests when a heartbeat is already running. Once the running heartbeat completes, the queued wake fires immediately, causing a second heartbeat run within the same interval window.
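To make the coalescing behavior concrete, here is a minimal, hypothetical model of it. `simulateCoalescing`, the request shape, and the single-slot pending queue are assumptions for illustration only, not the actual heartbeat-wake.ts code. Each heartbeat is modeled as a fixed runtime, and a wake that arrives mid-run is held and then started the moment the run ends.

```typescript
// Hypothetical model of the coalescing wake queue described in this issue.
// Names and the single-slot queue are assumptions, not heartbeat-wake.ts itself.
type Reason = "interval" | "wake";

interface Run {
  startMs: number;
  endMs: number;
  reason: Reason;
}

function simulateCoalescing(
  requests: { atMs: number; reason: Reason }[],
  runtimeMs: number,
): Run[] {
  const sorted = [...requests].sort((a, b) => a.atMs - b.atMs);
  const runs: Run[] = [];
  let busyUntil = -Infinity; // end time of the heartbeat currently running
  let queued: Reason | null = null; // at most one coalesced wake is held

  const start = (atMs: number, reason: Reason): void => {
    runs.push({ startMs: atMs, endMs: atMs + runtimeMs, reason });
    busyUntil = atMs + runtimeMs;
  };

  for (const req of sorted) {
    // A queued wake fires as soon as the running heartbeat completes.
    if (queued !== null && busyUntil <= req.atMs) {
      const reason: Reason = queued;
      queued = null;
      start(busyUntil, reason);
    }
    if (req.atMs < busyUntil) {
      queued = queued ?? req.reason; // coalesce into the single pending slot
    } else {
      start(req.atMs, req.reason);
    }
  }
  if (queued !== null) start(busyUntil, queued); // flush after the last run
  return runs;
}

// Scheduled heartbeat at t=0 (60 s runtime); CLI wake arrives at t=30 s.
const runs = simulateCoalescing(
  [
    { atMs: 0, reason: "interval" },
    { atMs: 30_000, reason: "wake" },
  ],
  60_000,
);
console.log(runs.map((r) => `${r.reason}@${r.startMs}`).join(", "));
// interval@0, wake@60000
```

In this model the wake is not dropped, only deferred, so a second full heartbeat starts 60 s after the first even though the configured interval is 60 minutes.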

Steps to reproduce

  1. Configure an agent with heartbeat enabled: every=60m, isolatedSession=true, session pointing to a feishu group
  2. Start the gateway
  3. While a scheduled heartbeat is running (takes 30-60s), send a message via CLI: openclaw agent --agent alice --session-id test-123 --message test
  4. Observe: a second heartbeat fires within minutes of the first one

Expected behavior

One heartbeat per configured interval. External wake events should not trigger additional heartbeat runs.

Actual behavior

Evidence from raw.log (4 days):

Date       | Hour  | Count            | Notes
-----------|-------|------------------|------------------------------
2026-04-07 | 22:xx | 7 runs in 19 min | Worst case
2026-04-08 | 08:xx | 2 runs           | One labeled isolated session
2026-04-08 | 14:xx | 3 runs           |
2026-04-08 | 15:xx | 3 runs           | One labeled isolated session
2026-04-08 | 23:xx | 6 runs           | Second worst
2026-04-09 | 16:xx | 3 runs           | One labeled isolated session

Pattern: duplicate clusters correlate with active openclaw agent CLI usage periods.

Example cluster (2026-04-07 22:38-22:57):
22:38 scheduled
22:41 duplicate (3 min later)
22:48 duplicate
22:51 duplicate
22:52 duplicate
22:57 duplicate
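A cluster like this can be checked against the configured interval mechanically. `findDuplicates` below is a hypothetical helper, not part of OpenClaw; it assumes "HH:MM" timestamps from a single day (no midnight wrap) and flags any run that starts less than one interval after the previous run.

```typescript
// Hypothetical helper: flags runs that start less than `intervalMin`
// minutes after the previous run. Assumes "HH:MM" times from one day.
function findDuplicates(times: string[], intervalMin: number): string[] {
  const toMinutes = (t: string): number => {
    const [h, m] = t.split(":").map(Number);
    return h * 60 + m;
  };
  const duplicates: string[] = [];
  let previous: number | null = null;
  for (const t of times) {
    const current = toMinutes(t);
    if (previous !== null && current - previous < intervalMin) {
      duplicates.push(t);
    }
    previous = current;
  }
  return duplicates;
}

const cluster = ["22:38", "22:41", "22:48", "22:51", "22:52", "22:57"];
console.log(findDuplicates(cluster, 60).join(" "));
// 22:41 22:48 22:51 22:52 22:57
```

With every=60m, every run after the 22:38 scheduled heartbeat falls inside the same window (gaps of 3, 7, 3, 1 and 5 minutes).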

Each heartbeat writes a git checkpoint, resulting in 4 git commits in 19 minutes.

Root cause analysis

In heartbeat-wake.ts, when a wake arrives during a running heartbeat, it is queued. After the heartbeat completes, the queued wake fires immediately.

In heartbeat-runner.ts, the wake handler calls run(), which calls advanceAgentSchedule(agent, now) and resets nextDueMs to now + intervalMs regardless of the wake reason. The requests-in-flight guard is only checked at the START of runOnce. By the time the queued wake fires, the first heartbeat has already released the queue, so the guard does not block the second run.
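The sequence above can be sketched as a small model. The names `run`, `advanceAgentSchedule`, `nextDueMs` and `intervalMs` are taken from this issue, but their signatures and bodies here are assumptions; `ReportedRunner` and its `inFlight` flag are hypothetical stand-ins for the runner and its requests-in-flight guard. Each `run()` call represents a whole heartbeat, so the guard is already released when the queued wake is processed.

```typescript
// Hypothetical model of the reported heartbeat-runner.ts behavior.
// Names mirror the issue; signatures and bodies are assumptions.
type Reason = "interval" | "wake";

interface AgentSchedule {
  intervalMs: number;
  nextDueMs: number;
}

// As described: the schedule is re-anchored to `now` on every run.
function advanceAgentSchedule(agent: AgentSchedule, nowMs: number): void {
  agent.nextDueMs = nowMs + agent.intervalMs;
}

class ReportedRunner {
  private inFlight = false; // stand-in for the requests-in-flight guard
  readonly starts: { atMs: number; reason: Reason }[] = [];

  constructor(private readonly agent: AgentSchedule) {}

  run(nowMs: number, reason: Reason): boolean {
    if (this.inFlight) return false; // guard is only checked at the start
    this.inFlight = true;
    try {
      this.starts.push({ atMs: nowMs, reason });
      advanceAgentSchedule(this.agent, nowMs); // applied for any reason
      // ...the ~60 s LLM heartbeat would execute here...
    } finally {
      this.inFlight = false; // released when the heartbeat completes
    }
    return true;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const agent: AgentSchedule = { intervalMs: HOUR_MS, nextDueMs: 0 };
const runner = new ReportedRunner(agent);

runner.run(0, "interval"); // scheduled heartbeat, completes at ~60 s
runner.run(60_000, "wake"); // queued wake fires after the guard is released
console.log(runner.starts.length, agent.nextDueMs); // 2 3660000
```

In this model the second heartbeat passes the guard and also shifts nextDueMs from 3600000 to 3660000, so each further wake keeps re-anchoring the schedule instead of being absorbed by it.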

Impact

  • Wasted API calls: Each heartbeat is a full LLM call (17 check items, ~60s runtime)
  • Git noise: Each heartbeat creates a git checkpoint commit
  • raw.log bloat: Duplicate heartbeat reports pollute daily logs

Environment

  • OpenClaw version: 2026.4.8 (also observed on 2026.3.31)
  • OS: macOS 26.2 (ARM64)
  • Install: npm global
  • Heartbeat config: every=60m, isolatedSession=true
  • Model: glm-5-turbo (zai)

Suggested fix

  1. Skip wake-triggered heartbeats that fall within the current interval window: in run(), when reason is not interval, check if now < agent.nextDueMs and skip if so.
  2. Or: Do not call advanceAgentSchedule() for non-interval reasons (wake, exec-event, etc.); only advance the schedule on actual interval triggers.
  3. Or: Add a minimum coalescing window larger than the heartbeat runtime.
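Fixes 1 and 2 could look roughly like this. This is a hypothetical sketch, not OpenClaw code: `GuardedRunner` and the `Reason` union are assumptions, and fix 3 (a longer coalescing window) is not modeled. Fix 1 drops non-interval wakes that land before nextDueMs; fix 2 lets only interval triggers move the schedule.

```typescript
// Hypothetical sketch of suggested fixes 1 and 2; not the actual OpenClaw code.
type Reason = "interval" | "wake" | "exec-event";

interface AgentSchedule {
  intervalMs: number;
  nextDueMs: number; // start of the next interval window
}

class GuardedRunner {
  readonly starts: { atMs: number; reason: Reason }[] = [];

  constructor(private readonly agent: AgentSchedule) {}

  run(nowMs: number, reason: Reason): boolean {
    // Fix 1: a non-interval wake inside the current window is skipped.
    if (reason !== "interval" && nowMs < this.agent.nextDueMs) {
      return false;
    }
    this.starts.push({ atMs: nowMs, reason });
    // Fix 2: only real interval triggers advance the schedule.
    if (reason === "interval") {
      this.agent.nextDueMs = nowMs + this.agent.intervalMs;
    }
    return true;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const agent: AgentSchedule = { intervalMs: HOUR_MS, nextDueMs: 0 };
const runner = new GuardedRunner(agent);

const first = runner.run(0, "interval"); // scheduled heartbeat runs
const queuedWake = runner.run(60_000, "wake"); // CLI wake 60 s later is dropped
const next = runner.run(HOUR_MS, "interval"); // next scheduled heartbeat runs
console.log(first, queuedWake, next, runner.starts.length); // true false true 2
```

Fix 1 alone is enough to stop the duplicates in this model; fix 2 additionally keeps the cadence anchored to interval triggers if a wake does get through. In this sketch a skipped wake is simply discarded, so any wake reason that genuinely needs a prompt heartbeat would need separate handling.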
