
fix(heartbeat): skip heartbeat execution while a reply run is active #2326

Open
BingqingLyu wants to merge 1 commit into main from fork-pr-64963-fix-heartbeat-preempt-active-reply

BingqingLyu commented Apr 28, 2026

Summary

Resubmission of openclaw#64823, which the active-PR limit bot auto-closed about a minute after filing despite a Greptile 5/5 review. The code is unchanged, and authorship is preserved via cherry-pick so @EronFan / aoao is credited in git history.

Adds a guard in runHeartbeatOnce (src/infra/heartbeat-runner.ts) that skips heartbeat execution when resolveActiveReplyRunSessionId(sessionKey) returns a truthy value. The guard sits after preflight resolves the session key and before the existing session-lane queue check. It mirrors the lane-busy skip path: it emits the same heartbeat event and returns { status: "skipped", reason: "requests-in-flight" }, so the wake-layer retry reschedules the heartbeat automatically.
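The guard ordering described above can be sketched as follows. This is not the code from src/infra/heartbeat-runner.ts: resolveActiveReplyRunSessionId and getQueueSize are injected as stubs, and HeartbeatDeps, runHeartbeatOnceSketch, wakeWithRetry and the `session:<key>` lane-key format are hypothetical names used only for illustration. The retry helper is a synchronous stand-in for whatever scheduling the wake layer actually does.

```typescript
// Sketch only. resolveActiveReplyRunSessionId and getQueueSize are injected
// stand-ins for the real registry and lane-queue lookups; every other name
// (HeartbeatDeps, runHeartbeatOnceSketch, wakeWithRetry, the lane-key format)
// is hypothetical.
type HeartbeatResult =
  | { status: "ran" }
  | { status: "skipped"; reason: "requests-in-flight" };

interface HeartbeatDeps {
  resolveActiveReplyRunSessionId: (sessionKey: string) => string | undefined;
  getQueueSize: (laneKey: string) => number;
  emitHeartbeatEvent: (result: HeartbeatResult) => void;
  runHeartbeatTurn: (sessionKey: string) => void;
}

function runHeartbeatOnceSketch(sessionKey: string, deps: HeartbeatDeps): HeartbeatResult {
  const skipped: HeartbeatResult = { status: "skipped", reason: "requests-in-flight" };

  // New guard: a reply run may still be active after its lane has drained.
  if (deps.resolveActiveReplyRunSessionId(sessionKey)) {
    deps.emitHeartbeatEvent(skipped);
    return skipped;
  }

  // Existing guard: work still queued on the session lane.
  const sessionLaneKey = `session:${sessionKey}`; // hypothetical key format
  if (deps.getQueueSize(sessionLaneKey) > 0) {
    deps.emitHeartbeatEvent(skipped);
    return skipped;
  }

  deps.runHeartbeatTurn(sessionKey);
  return { status: "ran" };
}

// Hypothetical, synchronous stand-in for the wake-layer retry: keep re-running
// the heartbeat until it is not skipped. Returns the attempt that ran, or -1.
function wakeWithRetry(run: () => HeartbeatResult, maxAttempts: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (run().status === "ran") return attempt;
  }
  return -1;
}

// Usage: the lane is already drained, but the reply run stays active for the
// first two lookups (e.g. while provider/output cleanup finishes).
let lookups = 0;
let turns = 0;
const events: HeartbeatResult[] = [];
const deps: HeartbeatDeps = {
  resolveActiveReplyRunSessionId: () => (lookups++ < 2 ? "reply-run-1" : undefined),
  getQueueSize: () => 0,
  emitHeartbeatEvent: (result) => { events.push(result); },
  runHeartbeatTurn: () => { turns++; },
};

const attempts = wakeWithRetry(() => runHeartbeatOnceSketch("main", deps), 5);
console.log(attempts, events.length, turns); // 3 2 1
```

Because the new check runs first and returns the same skipped result as the lane-busy path, a caller that already retries on requests-in-flight needs no changes; in this usage the heartbeat runs on the third attempt, once the reply run is no longer reported as active.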

Why the existing lane-busy check is insufficient

A reply run can remain active for a session even after the command lane itself has drained, for example while the active assistant turn is still finishing provider/output cleanup. In that window, a heartbeat or async system-event wake landing on the same session lane can race the user-visible reply and effectively swallow it: the original turn is never replayed and the user never sees a final answer. This is the failure class described in openclaw#64810 and reproduced by @jackiedepp and @EronFan.
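A toy model makes the gap concrete. Assuming a session passes through the three illustrative phases below, a guard that only consults the lane queue lets a heartbeat through as soon as the lane drains, while also checking for an active reply run closes that cleanup window. The SessionSnapshot type and the phase names are hypothetical and not taken from the codebase.

```typescript
// Toy model of one session over time. Phase names and fields are hypothetical;
// they only capture the two signals the two guards look at.
interface SessionSnapshot {
  phase: string;
  laneQueueSize: number;     // what getQueueSize(sessionLaneKey) would report
  activeReplyRunId?: string; // what resolveActiveReplyRunSessionId would report
}

const timeline: SessionSnapshot[] = [
  { phase: "reply running, lane busy", laneQueueSize: 1, activeReplyRunId: "run-1" },
  { phase: "lane drained, cleanup pending", laneQueueSize: 0, activeReplyRunId: "run-1" },
  { phase: "reply finished", laneQueueSize: 0 },
];

// Pre-fix behaviour: only the lane queue is consulted.
const laneOnlySkips = (s: SessionSnapshot): boolean => s.laneQueueSize > 0;
// Post-fix behaviour: an active reply run also blocks the heartbeat.
const withReplyGuardSkips = (s: SessionSnapshot): boolean =>
  Boolean(s.activeReplyRunId) || s.laneQueueSize > 0;

// Phases in which each version would let a heartbeat run on this session.
const allowedBefore = timeline.filter((s) => !laneOnlySkips(s)).map((s) => s.phase);
const allowedAfter = timeline.filter((s) => !withReplyGuardSkips(s)).map((s) => s.phase);

console.log(JSON.stringify(allowedBefore)); // ["lane drained, cleanup pending","reply finished"]
console.log(JSON.stringify(allowedAfter));  // ["reply finished"]
```

The only phase that differs between the two guards is the one where the lane is empty but the reply is still finishing, which is exactly the window in which the reply can be swallowed.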

Changes

  • src/infra/heartbeat-runner.ts: +15 lines. New import of resolveActiveReplyRunSessionId from auto-reply/reply/reply-run-registry.js plus the guard block inside runHeartbeatOnce before the existing sessionLaneKey queue check.
  • src/infra/heartbeat-runner.skips-busy-session-lane.test.ts: +57 lines. New regression test that seeds a main session, queues a system event on it, starts a live reply operation in "running" phase, and asserts the heartbeat runner skips with requests-in-flight without invoking the reply spy.

Total: 2 files, +72 / -0. Identical to openclaw#64823.
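The regression test's shape can be sketched without the test framework. The in-memory registry, the event queue, fakeResolveActiveReplyRunSessionId and runHeartbeatOnceSketch below are hypothetical stand-ins; the real test in src/infra/heartbeat-runner.skips-busy-session-lane.test.ts uses the project's own registry, session store and reply spy, which this sketch does not reproduce.

```typescript
import { deepStrictEqual, strictEqual } from "node:assert";

// Framework-free sketch of the regression test's shape. Everything here is a
// hypothetical in-memory stand-in for the project's real registry and store.
type ReplyPhase = "running" | "done";

const replyRuns = new Map<string, { runId: string; phase: ReplyPhase }>();
const systemEvents = new Map<string, string[]>();
const replySpy = { calls: 0 }; // stands in for the reply spy

// Hypothetical stand-in for resolveActiveReplyRunSessionId.
function fakeResolveActiveReplyRunSessionId(sessionKey: string): string | undefined {
  const run = replyRuns.get(sessionKey);
  return run && run.phase === "running" ? run.runId : undefined;
}

function runHeartbeatOnceSketch(sessionKey: string) {
  if (fakeResolveActiveReplyRunSessionId(sessionKey)) {
    return { status: "skipped", reason: "requests-in-flight" } as const;
  }
  replySpy.calls++;                  // the reply path would be invoked here
  systemEvents.set(sessionKey, []);  // pretend the heartbeat consumed the events
  return { status: "ran" } as const;
}

// 1. Seed the main session and queue a system event on it.
systemEvents.set("main", ["async-system-event"]);
// 2. Start a live reply operation in the "running" phase.
replyRuns.set("main", { runId: "reply-1", phase: "running" });
// 3. The heartbeat must skip without invoking the reply path,
//    leaving the queued system event in place for a later retry.
const result = runHeartbeatOnceSketch("main");
deepStrictEqual(result, { status: "skipped", reason: "requests-in-flight" });
strictEqual(replySpy.calls, 0);
deepStrictEqual(systemEvents.get("main"), ["async-system-event"]);
```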

Testing

node scripts/test-projects.mjs src/infra/heartbeat-runner.skips-busy-session-lane.test.ts

(Same test invocation as the original PR.)

Code-level exposure confirmation

Verified on an Ubuntu 24.04 VPS deployment running v2026.4.9 (0512059) with agents.defaults.heartbeat.every: "1h" and Telegram as the primary channel:

  • resolveActiveReplyRunSessionId is exported from the reply-run registry in the installed bundle
  • it is not referenced from the heartbeat-runner module
  • runHeartbeatOnce only guards on getQueueSize(sessionLaneKey)

So the guard is absent on v2026.4.9 even though the symbol needed to add it already ships in that bundle; hosts with the same configuration shape are directly exposed to this failure class.

Fixes

Fixes openclaw#64810
Supersedes openclaw#64823 (auto-closed by PR-limit bot)

Credits

I'm opening this because the original PR was closed mechanically, and the rule I follow is: when a fix is small, well reviewed, and the original author can be credited cleanly, resubmit it rather than leave the code orphaned in a closed PR. No code changes from me.

A reply run can remain active for a session even after the command lane
has drained, for example while the active assistant turn is still
finishing provider/output cleanup. In that window, a heartbeat or async
system-event wake landing on the same session lane can race the
user-visible reply and effectively swallow it — the original turn never
gets replayed and the user sees no final answer.

Adds a guard in runHeartbeatOnce that skips execution when
resolveActiveReplyRunSessionId(sessionKey) returns truthy, placed after
preflight resolves the session key and before the existing session-lane
queue check. Symmetric with the lane-busy skip path: emits the same
heartbeat event and returns { status: "skipped", reason: "requests-in-flight" }
so the wake-layer retry re-schedules automatically.

Covered by a new regression test (heartbeat-runner.skips-busy-session-lane.test.ts)
that seeds a main session, queues a system event on it, starts a live
reply operation in "running" phase, and asserts the heartbeat runner
skips with requests-in-flight without invoking the reply spy.

Fixes openclaw#64810.

This is a resubmission of the fix originally proposed in openclaw#64823, which
was auto-closed by the active-PR limit bot approximately one minute
after filing despite a Greptile 5/5 review. The code is unchanged from
that PR; authorship is preserved via cherry-pick so the original author
is properly credited in git history.

Verified code-level exposure on an Ubuntu 24.04 VPS deployment running
v2026.4.9 with agents.defaults.heartbeat.every set to 1h and Telegram as
the primary channel: resolveActiveReplyRunSessionId is exported from the
reply-run registry in the installed bundle but is not referenced from
the heartbeat-runner module, so the guard is absent and hosts that share
that configuration shape are directly exposed to this failure class.

Co-authored-by: EronFan <EronFan@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

[Bug]: Heartbeat / async system events can interrupt and effectively swallow in-progress replies in Telegram topic sessions
