fix(heartbeat): suppress stale final replay#81313
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 7:06 PM ET / 23:06 UTC. Summary PR surface: Source -20, Tests +104. Total +84 across 5 files. Reproducibility: yes. Current main has a source-clear path where heartbeat returns non-ack pending final text, and the PR includes sanitized live incident context plus after-fix Telegram terminal proof; I did not rerun the live Telegram scenario. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land this narrow suppression only if maintainers accept that heartbeats should never directly replay pending user finals; otherwise add route-aware recovery through the captured delivery context. Do we have a high-confidence way to reproduce the issue? Yes. Current main has a source-clear path where heartbeat returns non-ack pending final text, and the PR includes sanitized live incident context plus after-fix Telegram terminal proof; I did not rerun the live Telegram scenario. Is this the best way to solve the issue? Yes. Removing direct heartbeat replay is the narrowest fix for wrong-target leakage, with route-aware durable recovery as the safer long-term alternative if maintainers need heartbeat recovery to deliver pending finals. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against da5fe990d8b3. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source -20, Tests +104. Total +84 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
Added the live Telegram reproduction summary to the PR body and linked this to #74257. I kept the reproduction sanitized because the original live trace included private deployment/customer data, but the important shape is captured: one topic gets the intended task reply, then a heartbeat/default route re-emits stale prior task output into another Telegram context in the same minute. |
10c689c to
5c5b13d
Compare
|
I rechecked this PR against current origin/main on 2026-05-26. The branch still has unique patch content, but GitHub currently reports it as merge-conflicted, so a rebase/refresh is likely needed before maintainer review. @clawsweeper re-review to refresh the current author-facing status and blockers, please. |
|
🦞👀 Command router queued. I will update this comment with the next step. |
|
I have refactored this branch against the current I have retained both sides of the conflicted heartbeat changes: the current main's pending-final sanitization coverage and this PR's stale final replay suppression. Additionally, I have preserved the newer reply-run registry skip path and wired the isolated heartbeat active-run guard through the current I have verified the changes on the head branch pnpm test src/auto-reply/reply/get-reply.fast-path.test.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.ts
pnpm exec oxfmt --check --threads=1 src/auto-reply/get-reply-options.types.ts src/auto-reply/reply/get-reply.fast-path.test.ts src/auto-reply/reply/get-reply.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.ts src/infra/heartbeat-runner.test-utils.ts src/infra/heartbeat-runner.tsPlease review the changes. |
|
The branch has been narrowed against the current Head now: Current PR diff files:
Validation run locally: /tmp/openclaw-pr76920-merge/node_modules/.bin/oxfmt --check --threads=1 \\
src/auto-reply/get-reply-options.types.ts \\
src/auto-reply/reply/get-reply.fast-path.test.ts \\
src/auto-reply/reply/get-reply.ts \\
src/infra/heartbeat-runner.skips-busy-session-lane.test.ts \\
src/infra/heartbeat-runner.test-utils.ts \\
src/infra/heartbeat-runner.ts
node scripts/test-projects.mjs \\
src/auto-reply/reply/get-reply.fast-path.test.ts \\
src/infra/heartbeat-runner.skips-busy-session-lane.test.tsResults: formatter passed; focused Vitest passed 2 shards / 29 tests. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Updated this PR to address the latest ClawSweeper finding. Head: What changed:
Reviewer/bot feedback addressed:
Proof:
Validation: pnpm exec oxfmt --write src/infra/heartbeat-runner.skips-busy-session-lane.test.ts
pnpm exec oxfmt --check --threads=1 src/infra/heartbeat-runner.skips-busy-session-lane.test.ts
git diff --check
timeout -k5s 90s pnpm vitest src/infra/heartbeat-runner.skips-busy-session-lane.test.ts -t "active reply run|immediate heartbeat wakes|unscoped active reply"
timeout -k5s 120s node scripts/run-vitest.mjs src/infra/heartbeat-runner.skips-busy-session-lane.test.ts -t "active reply run|immediate heartbeat wakes|unscoped active reply"
timeout -k5s 30s node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --noEmit --pretty falseResult:
Current state:
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Refreshed this branch against current This was a CI cleanup refresh, not a behavior expansion. The effective PR diff is still limited to the six heartbeat/reply files, and the stale CI failures are addressed by carrying current main forward:
Validation run locally on the refreshed head: pnpm exec oxfmt --check --threads=1 src/auto-reply/get-reply-options.types.ts src/auto-reply/reply/get-reply.fast-path.test.ts src/auto-reply/reply/get-reply.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.ts src/infra/heartbeat-runner.test-utils.ts src/infra/heartbeat-runner.ts
pnpm lint --threads=8
pnpm run lint:extensions:bundled
pnpm check:test-types
pnpm test src/auto-reply/reply/get-reply.fast-path.test.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.tsAll passed on @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
Live Telegram heartbeat stale-final proof, head
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
Landing proof for head e1d6f32: Behavior addressed: heartbeat runs no longer directly replay non-ack durable pending final text; ack-only heartbeat pending state still clears, and isolated heartbeat runs now skip when their reply run is active. Real environment tested: local OpenClaw checkout on the PR branch plus GitHub CI for the PR head. Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix: stale non-ack pending final text remains durable but is not returned as the heartbeat reply; heartbeat ack-only pending delivery is still cleared; heartbeat runner passes only normal heartbeat options and avoids replacing an active isolated heartbeat session. What was not tested: no new live production Telegram incident replay was run during landing; the PR body retains the existing sanitized live/staging proof. |
What
Why
A stale pending final answer can belong to a prior user task and route, while a later heartbeat may deliver to the agent's default heartbeat target. Returning that pending text from the heartbeat reply path can leak a previous answer into the wrong channel.
This keeps durable pending final recovery available for paths that opt into it, but makes heartbeat runs suppress direct pending-final replay.
Related to #74257.
Evidence
Live reproduction from a Telegram deployment:
pendingFinalDeliveryTextfrom session state and returning it as the heartbeat reply.The reproduction is the same failure class as #74257: asynchronous/heartbeat follow-up delivery can surface prior task output in the wrong Telegram context.
Real behavior proof
Behavior or issue addressed: Heartbeat/default-route delivery must not replay a non-ack pending final answer that was created by a previous user task in another Telegram context.
Real environment tested: Local OpenClaw checkout for PR fix(heartbeat): suppress stale final replay #81313 at head
b2a620dd5044e975aba215edd2edbbb0437814b1, running under Node on Linux. The live proof used@kesslerClawBotin the real Telegram staging forum group-1003908474243,#Generaltopic1, with a temporary isolated session store so production state was not modified.Exact steps or command run after this patch: Ran a redacted runtime proof script from
/tmp/openclaw-pr81313that seeded a heartbeat session with a unique non-ack pending final marker, routed heartbeat delivery to the staging Telegram topic, invokedrunHeartbeatOnce, and used the real Telegram sender through the heartbeat outbound adapter. The model reply was stubbed toHEARTBEAT_OKso the proof focused on stale pending-final suppression and actual Telegram delivery behavior.Evidence after fix: Redacted terminal output from the post-patch branch; public proof comment: fix(heartbeat): suppress stale final replay #81313 (comment)
Observed result after fix: The heartbeat path passed
suppressPendingFinalDeliveryReplay=true, sent onlyHEARTBEAT_OKthrough the real Telegram transport, and did not send the unique stale pending-final marker. The marker remained in durable pending state with attempt count0, so heartbeat did not consume or replay it.What was not tested: I did not deploy this patch to the private production Telegram gateway before opening the PR; the original live reproduction contains private customer data and is summarized rather than pasted. The live staging proof uses a stubbed heartbeat model reply to keep the proof deterministic while still exercising the heartbeat runner, pending-final suppression flag, outbound adapter, and real Telegram send.
Proof limitations or environment constraints: The proof uses a temporary isolated session store and a deterministic heartbeat reply stub, so it proves the stale-final suppression flag, heartbeat runner path, outbound adapter, and real Telegram send without modifying production state.
Tested against
Tests
pnpm test src/auto-reply/reply/get-reply.fast-path.test.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.tspnpm exec oxfmt --check --threads=1 src/auto-reply/get-reply-options.types.ts src/auto-reply/reply/get-reply.ts src/auto-reply/reply/get-reply.fast-path.test.ts src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.ts src/infra/heartbeat-runner.test-utils.tspnpm check:changed --base upstream/mainpnpm lint --threads=8pnpm run lint:extensions:bundledpnpm check:test-typesNotes
test/scripts/prompt-snapshots.test.tsduringbuild-artifacts; this PR does not touch prompt snapshot code or fixtures.AI Assistance
Implemented with AI assistance. Reviewed with focused correctness, test-adequacy, and ce-simplify subagent passes; one isolated-heartbeat guard finding and one duplicate skip-return simplification were fixed before opening this PR.