fix: drop heartbeat runs that arrive while another run is active by mcaxtr · Pull Request #25610 · openclaw/openclaw

mcaxtr · 2026-02-24T16:23:14Z

Summary

Problem: Heartbeat runs that arrive while another agent run is active are enqueued as followups. When the queue drains later, they trigger duplicate agent runs that send multiple response branches to users.
Why it matters: Users receive 2-4 duplicate messages instead of 1, which is confusing and noisy.
What changed: Added early-return guard in runReplyAgent that drops heartbeat runs when isActive: true, before they reach the enqueue path. Also added markRunComplete() calls to all early-return paths for consistency.
What did NOT change: Non-heartbeat runs continue to enqueue normally when another run is active; followup mechanism unchanged; heartbeat runs that arrive when no run is active execute normally
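
The three outcomes described above can be sketched as a small decision function. This is only an illustration of the guard's ordering, not the code in runReplyAgent: `RunContext`, `RunAction`, and `decideRunAction` are hypothetical names, and only the heartbeat flag and `isActive` come from this description.

```typescript
// Hypothetical sketch: names below are illustrative, not taken from the repository.
type RunAction = "drop" | "enqueue" | "run";

interface RunContext {
  isHeartbeat: boolean; // the run was triggered by a heartbeat interval
  isActive: boolean; // another agent run is already active for the session
}

function decideRunAction(ctx: RunContext): RunAction {
  // The guard is checked first, so a heartbeat never reaches the enqueue path.
  if (ctx.isHeartbeat && ctx.isActive) {
    return "drop";
  }
  // Unchanged behavior: other runs that arrive during an active run become followups.
  if (ctx.isActive) {
    return "enqueue";
  }
  // Nothing is active, so heartbeat and non-heartbeat runs both execute.
  return "run";
}

console.log(decideRunAction({ isHeartbeat: true, isActive: true }));
```

The order of the two checks is the point: placing the heartbeat check after the generic `isActive` check would send heartbeats back into the followup queue.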

Linked Issue/PR

Closes [Bug]: Heartbeat runs create duplicate response branches when enqueued as followups #25606
Related [Bug]: Heartbeat sends multiple response branches due to followup-runner and delivery-mirror #8063 (original issue, auto-closed as stale)
Related fix: drop heartbeat runs that arrive while another run is active #12786 (original PR, auto-closed as stale)

User-visible / Behavior Changes

Users with heartbeat enabled will no longer receive duplicate messages when heartbeat intervals coincide with active agent runs. Heartbeat runs that arrive while another run is active are now silently dropped, and the next heartbeat interval independently re-checks the session.

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: Linux / macOS / all platforms
Runtime/container: Node 22+
Model/provider: Any (tested with gpt-4o-mini, ministral-3b)
Integration/channel: All channels affected
Relevant config: Heartbeat enabled with any interval

Steps

Configure heartbeat with default settings
Trigger a heartbeat check while another agent run is active for the same session
Wait for the active run to complete and the followup queue to drain
Observe messages delivered to user

Expected

Single message delivered (either HEARTBEAT_OK or brief status update)

Actual

Before fix: 2-4 duplicate messages due to stale heartbeat followup creating additional agent run branches
After fix: Single message only (heartbeat dropped when isActive, next interval re-checks)

Evidence

New test suite with 3 test cases:

Heartbeat + active → returns undefined, does not enqueue, does not run
Non-heartbeat + active → returns undefined, enqueues normally
Heartbeat + inactive → executes normally, returns payload

All tests pass. Existing agent-runner and heartbeat test suites pass.
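
A rough sketch of how the three cases could be asserted is shown below. It does not reproduce the actual test file: `runReplyAgentStub`, `Spies`, and `SketchRun` are hypothetical stand-ins, since the real runReplyAgent signature and mocks are not shown in this description.

```typescript
// Hypothetical stand-ins; the real runReplyAgent takes different arguments.
interface Spies {
  enqueued: string[]; // ids of runs pushed onto the followup queue
  executed: string[]; // ids of runs that actually executed
}

interface SketchRun {
  id: string;
  isHeartbeat: boolean;
}

function runReplyAgentStub(
  run: SketchRun,
  isActive: boolean,
  spies: Spies,
): { text: string } | undefined {
  if (run.isHeartbeat && isActive) return undefined; // dropped, nothing recorded
  if (isActive) {
    spies.enqueued.push(run.id); // followup path
    return undefined;
  }
  spies.executed.push(run.id);
  return { text: `reply for ${run.id}` };
}

function runScenarios(): Spies {
  const spies: Spies = { enqueued: [], executed: [] };
  const expect = (cond: boolean, label: string): void => {
    if (!cond) throw new Error(`scenario failed: ${label}`);
  };

  // 1. Heartbeat + active: returns undefined, does not enqueue, does not run.
  const dropped = runReplyAgentStub({ id: "hb-1", isHeartbeat: true }, true, spies);
  expect(
    dropped === undefined && spies.enqueued.length === 0 && spies.executed.length === 0,
    "heartbeat + active",
  );

  // 2. Non-heartbeat + active: returns undefined, enqueues.
  const queued = runReplyAgentStub({ id: "user-1", isHeartbeat: false }, true, spies);
  expect(queued === undefined && spies.enqueued.length === 1, "non-heartbeat + active");

  // 3. Heartbeat + inactive: executes and returns a payload.
  const ran = runReplyAgentStub({ id: "hb-2", isHeartbeat: true }, false, spies);
  expect(ran !== undefined && spies.executed.length === 1, "heartbeat + inactive");

  return spies;
}

runScenarios();
```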

Human Verification (required)

Verified scenarios:

New test suite passes (3 tests covering heartbeat drop, non-heartbeat enqueue, heartbeat execute)
Existing agent-runner tests pass (13 test files)
Existing heartbeat tests pass (43 tests)
pnpm build && pnpm check pass (oxfmt, oxlint, tsgo all clean)

Edge cases checked:

Non-heartbeat runs still enqueue normally when isActive
Heartbeat runs execute normally when not active
Early-return paths all call markRunComplete() before cleanup()

What I did not verify:

End-to-end testing with real heartbeat intervals and live channels (requires production setup with timing coordination)

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery (if this breaks)

How to disable/revert this change quickly: Revert commit or cherry-pick the inverse
Files/config to restore: None (no config changes)
Known bad symptoms reviewers should watch for: Heartbeat messages not being delivered at all (would indicate guard is triggering incorrectly)

Risks and Mitigations

Risk: A heartbeat might be dropped too aggressively if the isActive logic has false positives
- Mitigation: The next heartbeat interval re-checks independently; the isActive flag is managed by the existing typing controller, which is well-tested
Risk: Subtle differences in markRunComplete() timing across early-return paths
- Mitigation: Added markRunComplete() to all early-return paths for consistency, following the pattern from the finally block
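
To illustrate why the early-return paths need their own markRunComplete() call, here is a minimal sketch. It assumes the finally block calls markRunComplete() and then cleanup(), as the mitigation implies; `TypingControllerSketch` and `runWithTyping` are hypothetical names, and the real typing controller certainly does more than record calls.

```typescript
// Hypothetical stand-in that only records the order of calls.
class TypingControllerSketch {
  readonly calls: string[] = [];
  markRunComplete(): void {
    this.calls.push("markRunComplete");
  }
  cleanup(): void {
    this.calls.push("cleanup");
  }
}

function runWithTyping(
  typing: TypingControllerSketch,
  opts: { isHeartbeat: boolean; isActive: boolean },
  work: () => string,
): string | undefined {
  // This return happens before the try block is entered, so the finally
  // block below never runs for it. The same two calls are made by hand.
  if (opts.isHeartbeat && opts.isActive) {
    typing.markRunComplete();
    typing.cleanup();
    return undefined;
  }

  try {
    return work();
  } finally {
    // Assumed pattern: mark the run complete, then clean up.
    typing.markRunComplete();
    typing.cleanup();
  }
}

const typing = new TypingControllerSketch();
runWithTyping(typing, { isHeartbeat: true, isActive: true }, () => "payload");
console.log(typing.calls.join(" -> "));
```

With this shape, the dropped-heartbeat path and the normal path leave the controller with the same call sequence, which is the consistency the mitigation refers to.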

Greptile Summary

This PR fixes a bug where heartbeat runs arriving while another agent run is active would get enqueued as followups, causing duplicate agent runs and multiple response branches delivered to users (2-4 duplicate messages instead of 1).

Adds an early-return guard in runReplyAgent that silently drops heartbeat runs when isActive: true, before they reach the enqueue path. The next heartbeat interval independently re-checks the session, so no heartbeat cycles are permanently lost.
Adds markRunComplete() calls to all pre-try early-return paths (steered, heartbeat-dropped, and enqueue paths) for consistency. These paths return before the try/finally block, so without an explicit markRunComplete() call the typing controller would skip that step.
Non-heartbeat runs continue to enqueue normally when another run is active; the followup mechanism is unchanged.
New test suite covers all three key scenarios: heartbeat + active (dropped), non-heartbeat + active (enqueued), heartbeat + inactive (executes normally).

Confidence Score: 5/5

This PR is safe to merge — the change is narrowly scoped, correctly guarded, and well-tested.
The fix is minimal and surgical: a single conditional guard that only affects heartbeat runs when isActive is true. The logic is straightforward (two boolean checks), placement is correct (before the enqueue path), and the consistency improvements to markRunComplete() calls are harmless since they target early-return paths outside the try/finally block. No existing behavior is altered for non-heartbeat runs. The test suite covers the critical scenarios.
No files require special attention.

Last reviewed commit: d77683e

…nclaw#25606)

Co-authored-by: Marcus Castro <mcaxtr@gmail.com>

steipete · 2026-02-25T01:59:10Z

Implemented and landed on main as c736778b3.

What I changed (reimplemented after code review):

Added an early guard in runReplyAgent to drop heartbeat runs when isActive is already true, before the followup enqueue path.
Kept non-heartbeat active-run behavior unchanged (still enqueues followups).
Added regression coverage in src/auto-reply/reply/agent-runner.runreplyagent.test.ts for:
- heartbeat + active => dropped (no enqueue)
- non-heartbeat + active => enqueued

Validation:

pnpm lint
pnpm build
pnpm test

Thanks for the report and original PR, @mcaxtr.

Co-authored-by: Marcus Castro <mcaxtr@gmail.com>
