fix(heartbeat): prevent ack-only pending delivery loops#80357
Conversation
shouldSkipHeartbeatPendingFinalDelivery was using the default 300-char threshold regardless of per-agent heartbeat config. Replace with inline logic that resolves ackMaxChars from cfg.agents[agentId].heartbeat -> cfg.agents.defaults.heartbeat -> DEFAULT_HEARTBEAT_ACK_MAX_CHARS. Also fix: store the stripped text (remainder after HEARTBEAT_OK) rather than the raw payload text. Previously pendingFinalDeliveryText would have contained the HEARTBEAT_OK prefix, causing heartbeat-runner to re-deliver it verbatim on retry. Resolves clawsweeper P2 review finding on #79270.
|
Codex review: needs real behavior proof before merge. Summary Reproducibility: yes. Current main source shows a concrete loop path: ack text can be persisted before stripping, heartbeat replay refreshes Real behavior proof Next step before merge Security Review detailsBest possible solution: Land a focused heartbeat/session-state fix after maintainer review, preserving durable replay for real heartbeat content while keeping ack-only heartbeat text out of pending replay state. Do we have a high-confidence way to reproduce the issue? Yes. Current main source shows a concrete loop path: ack text can be persisted before stripping, heartbeat replay refreshes Is this the best way to solve the issue? Yes, the code direction is the narrow maintainable fix: reuse the existing heartbeat token classifier at write, replay, and defer points rather than redesigning durable delivery. The remaining blocker is proof and maintainer handling, not an obvious patch defect. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 3ba2ab7a0950. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Landed via rebase onto
Thanks @hclsys for the original fix work, and @haumanto for the report. |
Summary
main, preserving @hclsys's commits and changelog creditHEARTBEAT_OK, while preserving real heartbeat notification contentackMaxCharspolicy through write, stale replay cleanup, and defer-guard paths; normalize stale replay content to the stripped real remainderFixes #79258.
Supersedes/replaces #79270.
Real Behavior Proof
Behavior or issue addressed: Heartbeat ack text such as
HEARTBEAT_OKor shortHEARTBEAT_OKremainder text should not become durable pending final-delivery replay state; stale ack-only pending entries should self-heal; real heartbeat content must still replay/deliver according to configuredackMaxChars.Real environment tested: Local OpenClaw checkout on PR head
b74aa475f9, Node 22 runtime through the repopnpm testwrapper. The focused verification created real temporary session-store JSON files and exercised production TypeScript modules (runReplyAgent,getReplyFromConfig,runHeartbeatOnce,stripHeartbeatToken) with real file reads/writes; delivery/model edges were controlled by repo test harnesses.Exact steps or command run after this patch:
Evidence after fix: Terminal output from the focused run:
Terminal output from the acceptance-adjacent run:
Observed result after fix: Ack-only pending delivery no longer self-refreshes the heartbeat skip loop. Existing ack-only pending state is cleared before replay. With
ackMaxChars: 0, short post-token content remains real pending delivery and still defers as in-flight. Real heartbeat notification text remains durable for recovery.What was not tested: No live 60-minute gateway daemon interval and no external Telegram/Discord adapter were run for this PR; proof is local production-module behavior with real session-store files plus CI.
Verification
Thanks @haumanto for the detailed report and @hclsys for the original fix work in #79270.