fix(heartbeat): clear pendingFinalDelivery* on send success#83187
fix(heartbeat): clear pendingFinalDelivery* on send success#83187agocs wants to merge 3 commits into
Conversation
|
Codex review: needs changes before merge. Reviewed May 30, 2026, 3:02 AM ET / 07:02 UTC. Summary PR surface: Source +11, Tests +98. Total +109 across 2 files. Reproducibility: yes. Source inspection shows Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Copy recommended automerge instructionNext step before merge
Security Review findings
Review detailsBest possible solution: Gate the pending-final-delivery cleanup to actual visible heartbeat delivery, add regression coverage for the suppressed-send case, and make an explicit maintainer call on whether duplicate-suppression cleanup lands here or separately. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows Is this the best way to solve the issue? No, not quite. Clearing after a real sent heartbeat is the right fix shape, but the branch should distinguish Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against f90b8cffc744. Label changesLabel justifications:
Evidence reviewedPR surface: Source +11, Tests +98. Total +109 across 2 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg: ✨ hatched 🥚 common Clockwork Signal Puff. Rarity: 🥚 common. Trait: hums during re-review. DetailsShare on X: post this hatch
About:
|
|
Sibling PR #87583 was closed as superseded by this canonical cleanup path, which makes sense for the send-success portion. One remaining gap from #87583 may still be worth porting here: duplicate-suppression cleanup. #83187 clears
I ran a disposable local OpenClaw source-runtime proof on #87583 ( Relevant duplicate-suppression proof: {
"scenario": "duplicate-skip",
"result": { "status": "ran", "durationMs": 51 },
"sendCallCount": 0,
"before": {
"pendingFinalDelivery": true,
"pendingFinalDeliveryText": "PR87583 duplicate-skip heartbeat payload",
"pendingFinalDeliveryCreatedAt": 1779957646044,
"pendingFinalDeliveryLastAttemptAt": 1779957706044,
"pendingFinalDeliveryAttemptCount": 2,
"pendingFinalDeliveryLastError": "redacted prior delivery failure",
"pendingFinalDeliveryContext": { "channel": "telegram", "to": "proof-chat" },
"pendingFinalDeliveryIntentId": "intent-duplicate-skip",
"lastHeartbeatText": "PR87583 duplicate-skip heartbeat payload",
"lastHeartbeatSentAt": 1779957706044
},
"after": {
"pendingFinalDelivery": "<absent>",
"pendingFinalDeliveryText": "<absent>",
"pendingFinalDeliveryCreatedAt": "<absent>",
"pendingFinalDeliveryLastAttemptAt": "<absent>",
"pendingFinalDeliveryAttemptCount": "<absent>",
"pendingFinalDeliveryLastError": "<absent>",
"pendingFinalDeliveryContext": "<absent>",
"pendingFinalDeliveryIntentId": "<absent>",
"lastHeartbeatText": "PR87583 duplicate-skip heartbeat payload",
"lastHeartbeatSentAt": 1779957706044
},
"clearedAllPendingFields": true,
"deliveryEvidence": "duplicate suppression skipped outbound send because lastHeartbeatText matched within 24h"
}If maintainers want this PR to remain the single canonical fix, please consider porting the duplicate-suppression cleanup and regression test from #87583 rather than reopening the sibling PR. |
|
Heads up: this PR needs to be updated against current |
|
Button clicked, thanks @RomneyDa |
Heartbeat-driven agent runs can leave pendingFinalDelivery set on the session; runHeartbeatOnce wrote lastHeartbeatText/lastHeartbeatSentAt on send success but never cleared the recovery fields, so the next heartbeat hit the get-reply redelivery short-circuit and skipped the agent. Extend the existing post-success updateSessionStore block to null all seven pendingFinalDelivery* fields, mirroring clearPendingFinalDeliveryAfterSuccess in dispatch-from-config.ts. Refs: openclaw#83184
20311f3 to
e2f3d73
Compare
|
Confirming this on a production deployment — the fix is correct and the approach is the right one. We hit this on OpenClaw What makes @agocs' method the right fix specifically:
#80357 already landed the ack-only side of this in 2026.5.26 — clearing on send-success here closes the remaining substantive-text gap in the same heartbeat path, so the two together cover both cases at the source rather than downstream. (Our current workarounds are both stopgaps for this exact issue: manually nulling the Checks are green and there are no conflicts — would love to see this prioritized for merge. 😊 🙏🏻 Happy to validate against our repro if that's useful. |
Summary
Fixes #83184.
The heartbeat send path in
src/infra/heartbeat-runner.ts:2011-2022wrotelastHeartbeatText/lastHeartbeatSentAtafter a successfulsendDurableMessageBatchbut never nulled thependingFinalDelivery*recovery fields. A heartbeat-driven agent run that previously setpendingFinalDelivery: true(viasrc/auto-reply/reply/agent-runner.ts:2169) therefore left the session permanently stuck: subsequent heartbeats short-circuited in get-reply's redelivery path, the agent was never re-invoked, and the symptom surfaced as silent loss of notifications.This change extends the existing post-success
updateSessionStoreblock to clear all sevenpendingFinalDelivery*fields, mirroring the canonicalclearPendingFinalDeliveryAfterSuccesshelper atsrc/auto-reply/reply/dispatch-from-config.ts:365-391(already invoked from the user-message dispatch path at:1671).Affected versions: verified on
2026.5.7through2026.5.16-beta.2; fix applies against currentmain(2026.5.17).Verification
Commands run locally against this branch on Ubuntu 24.04, Node 22.22.2, pnpm 11.1.0:
node scripts/run-vitest.mjs src/infra/heartbeat-runner.clears-pending-final-delivery.test.ts— new regression test, 1 passed (also confirmed RED before the fix landed in the same branch).node scripts/run-vitest.mjs <all 16 src/infra/heartbeat-runner.*.test.ts non-e2e files>— 16 files / 152 tests passed.pnpm check:changed— green: conflict markers, changelog attributions, wildcard re-export guards, duplicate coverage, dependency pin guard, package patch guard, typecheck core, typecheck core tests, lint core (0/0), media download helper guard, runtime sidecar loader guard, runtime import cycles, webhook body guard, pairing store guard, pairing account guard.oxfmt --checkand core oxlint — both clean on the two touched files.The new test seeds a session with
pendingFinalDelivery: true+ a heartbeat-ackpendingFinalDeliveryText(so the heartbeat-defer check atheartbeat-runner.ts:~1328does not short-circuit), manually patches in the remaining fivependingFinalDelivery*fields the seeder does not expose, runsrunHeartbeatOncewith a substantive reply, and asserts all seven fields areundefinedafterward. It also assertslastHeartbeatText/lastHeartbeatSentAtare written, proving the success branch was actually taken.Real behavior proof
Behavior addressed: heartbeat send-success path leaves
pendingFinalDelivery*set, causing the next heartbeat to skip the agent via the get-reply redelivery short-circuit.Real environment tested: Docker-deployed gateway on Ubuntu 24.04 (kernel 6.8.0-111-generic). Prior to this patch the gateway was on
OpenClaw 2026.5.12; the bug was deterministic there, confirmed by an out-of-band cron job (*/10 * * * *) that nulls the stuck fields wheneverpendingFinalDeliveryTextmatcheslastHeartbeatText. Cron log baseline showed one "cleared stuck" entry per hour (e.g.,2026-05-17T15:50:01Zand2026-05-17T16:50:01Z, bothagent:main:main) — i.e., every substantive heartbeat left the session stuck and the cron had to clean up.Exact steps or command run after this patch:
fix/heartbeat-clear-pending-final-delivery, currentmain+ the diff in this PR; reportedOpenClaw 2026.5.17).DOCKER_BUILDKIT=1 docker build -t openclaw-source:fix-heartbeat-pending-delivery ..docker compose build && docker compose down && docker compose up -d. New container came up at2026-05-17T17:04:33Z.docker compose exec openclaw-gateway grep -c "pendingFinalDeliveryContext: void 0" /app/dist/heartbeat-runner-*.js→1in the send-path bundle (heartbeat-runner-CecDa3Rk.js),0in the runtime stub (heartbeat-runner-DPGtQP6X.js). Context grep aroundlastHeartbeatSentAt: startedAtshows the sevenpendingFinalDelivery*: void 0lines immediately after it, in the sameupdateSessionStorewrite that recordslastHeartbeatText/lastHeartbeatSentAt.agent:main:mainin the host-mountedsessions.jsonand the cron log over the next ~2 hours of real heartbeats.Evidence after fix:
src/infra/heartbeat-runner.clears-pending-final-delivery.test.ts) goes RED onmain(expected true to be undefinedat thependingFinalDeliveryassertion) and GREEN with this patch. Sister tests insrc/infra/heartbeat-runner.*(16 files / 152 tests) remain green.agent:main:main, host-mounted sessions store, redacted preview):2026-05-17T17:04:33Z— patched container started (OpenClaw 2026.5.17).2026-05-17T17:04:41Z— heartbeat scheduler loggedheartbeat: started intervalMs=1800000. Subsequent ticks ran but were all HEARTBEAT_OK (silent, no store write), keepinglastHeartbeatSentAtat the pre-redeploy2026-05-17T16:40:18Zfor ~2.5 hours.2026-05-17T19:10:18Z— first substantive heartbeat after the redeploy.lastHeartbeatSentAtadvanced to1779045018769,lastHeartbeatTextwritten ("One mildly notable item — everything else is promos: …").2026-05-17T19:11:04Z(≤46s after the send write, well before the next*/10cron tick at19:20:00Z) —pendingFinalDelivery: None,pendingFinalDeliveryText: "". This clean state is attributable only to the patched send-path write, not to the cron.2026-05-17T19:20:00Zcron tick — zero new lines inclear-stuck-pending-delivery.log(baseline was 2 lines, post-tick was still 2 lines). On the unpatched image this is the tick that would have logged a "cleared stuck" line, matching the prior15:50/16:50cadence.Observed result after fix: a real substantive heartbeat in the deployed gateway now leaves all seven
pendingFinalDelivery*fields cleared in the sameupdateSessionStorewrite that recordslastHeartbeatText/lastHeartbeatSentAt, and the standing cron mitigation no longer finds anything to clean up.What was not tested:
pendingFinalDelivery: truerecovery branch where final delivery failed and a retry is genuinely needed — that path is still gated by theclearPendingFinalDeliveryAfterSuccessguard indispatch-from-config.ts; this PR only adds the equivalent clear in the heartbeat send-success path.