fix(heartbeat): skip heartbeat execution while a reply run is active #2326
Open
BingqingLyu wants to merge 1 commit into
A reply run can remain active for a session even after the command lane
has drained, for example while the active assistant turn is still
finishing provider/output cleanup. In that window, a heartbeat or async
system-event wake landing on the same session lane can race the
user-visible reply and effectively swallow it — the original turn never
gets replayed and the user sees no final answer.
Adds a guard in runHeartbeatOnce that skips execution when
resolveActiveReplyRunSessionId(sessionKey) returns truthy, placed after
preflight resolves the session key and before the existing session-lane
queue check. Symmetric with the lane-busy skip path: emits the same
heartbeat event and returns { status: "skipped", reason: "requests-in-flight" }
so the wake-layer retry re-schedules automatically.
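The guard ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: only runHeartbeatOnce, resolveActiveReplyRunSessionId, getQueueSize and the skip result shape come from this PR. The in-memory maps, the emit callback, the "ran" status and the lane-key format are hypothetical stand-ins.

```typescript
// Sketch only. The maps, the emit callback and the lane-key format are
// hypothetical stand-ins for the project's real registry and queue.
type HeartbeatResult =
  | { status: "ran" }
  | { status: "skipped"; reason: "requests-in-flight" };

const activeReplyRuns = new Map<string, string>(); // sessionKey -> sessionId
const laneQueueSizes = new Map<string, number>(); // sessionLaneKey -> queued commands

function resolveActiveReplyRunSessionId(sessionKey: string): string | undefined {
  return activeReplyRuns.get(sessionKey);
}

function getQueueSize(sessionLaneKey: string): number {
  return laneQueueSizes.get(sessionLaneKey) ?? 0;
}

function runHeartbeatOnce(
  sessionKey: string,
  emit: (result: HeartbeatResult) => void,
): HeartbeatResult {
  const skipped: HeartbeatResult = { status: "skipped", reason: "requests-in-flight" };

  // Guard added by this PR: a reply run can still be active for the session
  // even though the command lane has already drained.
  if (resolveActiveReplyRunSessionId(sessionKey)) {
    emit(skipped); // same event as the lane-busy path
    return skipped;
  }

  // Pre-existing lane-busy check (the lane-key format is assumed here).
  const sessionLaneKey = `session:${sessionKey}`;
  if (getQueueSize(sessionLaneKey) > 0) {
    emit(skipped);
    return skipped;
  }

  const ran: HeartbeatResult = { status: "ran" };
  emit(ran);
  return ran;
}
```

Because both skip paths return the same result, a caller that already retries on requests-in-flight needs no changes to handle the new guard.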
Covered by a new regression test (heartbeat-runner.skips-busy-session-lane.test.ts)
that seeds a main session, queues a system event on it, starts a live
reply operation in "running" phase, and asserts the heartbeat runner
skips with requests-in-flight without invoking the reply spy.
Fixes openclaw#64810.
This is a resubmission of the fix originally proposed in openclaw#64823, which
was auto-closed by the active-PR limit bot approximately one minute
after filing despite a Greptile 5/5 review. The code is unchanged from
that PR; authorship is preserved via cherry-pick so the original author
is properly credited in git history.
Verified code-level exposure on an Ubuntu 24.04 VPS deployment running
v2026.4.9 with agents.defaults.heartbeat.every set to 1h and Telegram as
the primary channel: resolveActiveReplyRunSessionId is exported from the
reply-run registry in the installed bundle but is not referenced from
the heartbeat-runner module. The guard is therefore absent, so hosts that
share this configuration shape are directly exposed to the race.
Co-authored-by: EronFan <EronFan@users.noreply.github.com>
Summary
Resubmission of openclaw#64823 (auto-closed ~1 minute after filing by the active-PR limit bot despite a Greptile 5/5 review). Code is unchanged; authorship preserved via cherry-pick so @EronFan/aoao is credited in git history.
Adds a guard in runHeartbeatOnce (src/infra/heartbeat-runner.ts) that skips heartbeat execution when resolveActiveReplyRunSessionId(sessionKey) returns truthy. Placed after preflight resolves the session key and before the existing session-lane queue check. Symmetric with the lane-busy skip path: emits the same heartbeat event and returns { status: "skipped", reason: "requests-in-flight" } so the wake-layer retry re-schedules automatically.
Why the existing lane-busy check is insufficient
A reply run can remain active for a session even after the command lane itself has drained, for example while the active assistant turn is still finishing provider/output cleanup. In that window, a heartbeat or async system-event wake landing on the same session lane races the user-visible reply and can effectively swallow it: the original turn never replays and the user sees no final answer. This is the class described in openclaw#64810 and reproduced by @jackiedepp + @EronFan.
Changes
- src/infra/heartbeat-runner.ts: +15 lines. New import of resolveActiveReplyRunSessionId from auto-reply/reply/reply-run-registry.js plus the guard block inside runHeartbeatOnce before the existing sessionLaneKey queue check.
- src/infra/heartbeat-runner.skips-busy-session-lane.test.ts: +57 lines. New regression test that seeds a main session, queues a system event on it, starts a live reply operation in "running" phase, and asserts the heartbeat runner skips with requests-in-flight without invoking the reply spy.
Total: 2 files, +72 / -0. Identical to openclaw#64823.
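For readers without the diff at hand, the scenario the regression test covers can be approximated with the self-contained sketch below. It is not the test file from this PR: the dependency-injected runHeartbeatOnce, the HeartbeatDeps interface, the replyRuns map and the hand-rolled spy array are all hypothetical, and the queued system event from the real test is omitted for brevity. Only the function names and the requests-in-flight skip result are taken from the description.

```typescript
// Sketch only: a dependency-injected stand-in for the heartbeat runner.
type HeartbeatOutcome =
  | { status: "ran" }
  | { status: "skipped"; reason: "requests-in-flight" };

interface HeartbeatDeps {
  resolveActiveReplyRunSessionId(sessionKey: string): string | undefined;
  getQueueSize(sessionLaneKey: string): number;
  getReply(sessionKey: string): void; // the call the test spies on
}

function runHeartbeatOnce(sessionKey: string, deps: HeartbeatDeps): HeartbeatOutcome {
  if (deps.resolveActiveReplyRunSessionId(sessionKey) || deps.getQueueSize(sessionKey) > 0) {
    return { status: "skipped", reason: "requests-in-flight" };
  }
  deps.getReply(sessionKey);
  return { status: "ran" };
}

// Seed a main session with a live reply run in "running" phase.
const sessionKey = "main";
const replyRuns = new Map<string, { sessionId: string; phase: "running" | "done" }>([
  [sessionKey, { sessionId: "session-1", phase: "running" }],
]);
const replySpyCalls: string[] = [];

const outcome = runHeartbeatOnce(sessionKey, {
  resolveActiveReplyRunSessionId: (key) => {
    const run = replyRuns.get(key);
    return run?.phase === "running" ? run.sessionId : undefined;
  },
  getQueueSize: () => 0, // the command lane has already drained
  getReply: (key) => {
    replySpyCalls.push(key);
  },
});

// The runner must skip with requests-in-flight and never reach the reply call.
if (outcome.status !== "skipped" || outcome.reason !== "requests-in-flight") {
  throw new Error(`expected a requests-in-flight skip, got ${JSON.stringify(outcome)}`);
}
if (replySpyCalls.length !== 0) {
  throw new Error("reply spy must not be invoked while a reply run is active");
}
```

Reporting a queue size of zero is the important detail: it models the window where the lane-busy check alone would let the heartbeat through.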
Testing
(Same test invocation as the original PR.)
Code-level exposure confirmation
Verified on an Ubuntu 24.04 VPS deployment running v2026.4.9 (0512059) with agents.defaults.heartbeat.every: "1h" and Telegram as the primary channel:
- resolveActiveReplyRunSessionId is exported from the reply-run registry in the installed bundle
- runHeartbeatOnce only guards on getQueueSize(sessionLaneKey)
So the guard is absent on v2026.4.9, and the symbol needed to add it is already available in that bundle. This is a direct-hit class on hosts that share that configuration shape.
Fixes
Fixes openclaw#64810
Supersedes openclaw#64823 (auto-closed by PR-limit bot)
Credits
- @jackiedepp: original bug report with clean repro ([Bug]: Heartbeat / async system events can interrupt and effectively swallow in-progress replies in Telegram topic sessions, openclaw/openclaw#64810)
- @EronFan/aoao: root-cause analysis and the fix + regression test (fix: avoid heartbeat preempting active reply runs, openclaw/openclaw#64823, preserved as commit author here)
Opening this because the original PR is mechanically closed and the memory rule I operate by is: when a fix is small, well-reviewed, and we can credit the original author cleanly, resubmit rather than leave the code orphaned in a closed PR. No code change from me.