
[Bug]: Heartbeat-driven agent replies leave pendingFinalDelivery stuck, blocking subsequent heartbeats #83184

@agocs

Description


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

runHeartbeatOnce writes lastHeartbeatText / lastHeartbeatSentAt after a successful send but never nulls the pendingFinalDelivery* fields, so any heartbeat-driven agent run that previously set pendingFinalDelivery: true leaves the session permanently stuck on the get-reply redelivery short-circuit.

Steps to reproduce

  1. Drive the heartbeat path until getReplyFromConfig produces a substantive (non-HEARTBEAT_OK) text payload. The reply-agent path (src/auto-reply/reply/agent-runner.ts:2169) sets pendingFinalDelivery: true on the session at the end of substantive runs regardless of trigger.
  2. Let the heartbeat finish: sendDurableMessageBatch succeeds and runHeartbeatOnce writes lastHeartbeatText / lastHeartbeatSentAt (src/infra/heartbeat-runner.ts:2011-2022).
  3. Wait for the next heartbeat tick.
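The repro steps can be modeled as a small state machine. This is a hypothetical, simplified TypeScript sketch, not OpenClaw code: `SessionEntry`, `shortCircuitsOnPendingDelivery`, and `simulateHeartbeatTick` are illustrative names, and the real get-reply short-circuit and heartbeat runner involve far more state. It only shows why the first tick invokes the agent and every later tick does not.

```typescript
// Hypothetical, simplified model of the session fields named in this issue.
// Only fields the report mentions are modeled; the real store has more.
interface SessionEntry {
  pendingFinalDelivery?: boolean;
  pendingFinalDeliveryText?: string;
  lastHeartbeatText?: string;
  lastHeartbeatSentAt?: number;
  updatedAt?: number;
}

// Hypothetical stand-in for the get-reply redelivery short-circuit:
// an unresolved final delivery means "do not invoke the agent".
function shortCircuitsOnPendingDelivery(session: SessionEntry): boolean {
  return session.pendingFinalDelivery === true;
}

// Hypothetical model of one heartbeat tick, following the repro steps.
// Returns the number of agent invocations made on this tick (0 or 1).
function simulateHeartbeatTick(session: SessionEntry, replyText: string, now: number): number {
  if (shortCircuitsOnPendingDelivery(session)) {
    return 0; // silent: no model call, no send, no error
  }
  // Step 1: the agent run ends with a substantive reply and marks it pending.
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = replyText;
  // Step 2: the send succeeds; only the lastHeartbeat* fields are written.
  session.lastHeartbeatText = replyText;
  session.lastHeartbeatSentAt = now;
  session.updatedAt = now;
  // The bug: nothing clears pendingFinalDelivery* here.
  return 1;
}

// Step 3: run three ticks against the same session.
const session: SessionEntry = {};
const invocations = [1, 2, 3].map((tick) =>
  simulateHeartbeatTick(session, "Reminder: standup in 10 minutes", tick * 1000),
);
console.log(invocations); // [ 1, 0, 0 ]
```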

Expected behavior

After step 2, pendingFinalDelivery* is cleared (the same way the user-message dispatch path does it via clearPendingFinalDeliveryAfterSuccess at src/auto-reply/reply/dispatch-from-config.ts:365-391, called from :1671). Subsequent heartbeats observe a clean session and invoke the agent normally.
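For comparison, the clearing behavior described for clearPendingFinalDeliveryAfterSuccess can be sketched as a pure function. This is a hypothetical illustration, not the actual implementation: the name `clearPendingFinalDelivery` and its signature are assumptions, and because the issue names only two of the seven fields (`pendingFinalDelivery`, `pendingFinalDeliveryText`), the sketch assumes all seven share the `pendingFinalDelivery` prefix and removes every key that matches it.

```typescript
// Hypothetical session shape: only two of the seven pendingFinalDelivery*
// fields are named in the issue, so the rest are covered by the index signature.
type SessionEntry = {
  pendingFinalDelivery?: boolean;
  pendingFinalDeliveryText?: string;
  updatedAt?: number;
  [key: string]: unknown;
};

// Assumption: every field to clear starts with this prefix.
const PENDING_PREFIX = "pendingFinalDelivery";

// Hypothetical pure equivalent of clearPendingFinalDeliveryAfterSuccess:
// returns a copy with every pendingFinalDelivery* key removed and
// updatedAt refreshed. The input entry is not mutated.
function clearPendingFinalDelivery(entry: SessionEntry, now: number): SessionEntry {
  const next: SessionEntry = { ...entry };
  for (const key of Object.keys(next)) {
    if (key.startsWith(PENDING_PREFIX)) {
      delete next[key];
    }
  }
  next.updatedAt = now;
  return next;
}

// Usage: a session left behind by a successful heartbeat send.
const before: SessionEntry = {
  pendingFinalDelivery: true,
  pendingFinalDeliveryText: "Reminder: standup in 10 minutes",
  lastHeartbeatText: "Reminder: standup in 10 minutes",
  updatedAt: 1,
};
const after = clearPendingFinalDelivery(before, 2);
console.log(Object.keys(after)); // [ 'lastHeartbeatText', 'updatedAt' ]
```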

Actual behavior

pendingFinalDelivery remains true indefinitely. On the next heartbeat, the get-reply fast path treats the session as having an unresolved final delivery and short-circuits before invoking the agent. The user observes silent loss: no notifications, no model calls, no errors surfaced.

OpenClaw version

Verified affected: 2026.5.7 through 2026.5.16-beta.2. Diagnosis applies to current main (2026.5.17).

Operating system

Linux (Docker deployment).

Install method

docker

Model

NOT_ENOUGH_INFO (model-independent; bug is in core heartbeat send path).

Provider / routing chain

NOT_ENOUGH_INFO (provider-independent).

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Code-level diagnosis (paths and line numbers refer to openclaw/openclaw main):

src/auto-reply/reply/agent-runner.ts:2169
  Sets `pendingFinalDelivery: true` at the end of every substantive agent run,
  regardless of trigger source (user message vs heartbeat).

src/auto-reply/reply/dispatch-from-config.ts:365-391
  Defines `clearPendingFinalDeliveryAfterSuccess` which nulls all seven
  pendingFinalDelivery* fields plus refreshes updatedAt.

src/auto-reply/reply/dispatch-from-config.ts:1671
  Calls `clearPendingFinalDeliveryAfterSuccess` after a successful
  user-message-driven dispatch. The heartbeat path has no equivalent call.

src/infra/heartbeat-runner.ts:1979-2022
  Heartbeat send path. On success it writes `lastHeartbeatText` and
  `lastHeartbeatSentAt` via `updateSessionStore`, but never clears any of
  the seven `pendingFinalDelivery*` fields. This is the entire bug.

Local out-of-band workaround in our deployment: a cron job running every
10 minutes (*/10) that nulls the stuck fields when `pendingFinalDeliveryText`
matches `lastHeartbeatText`. The job's clearing condition matches after every
active-hours heartbeat that produces substantive text, confirming the bug is
fully deterministic on the affected versions.
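A workaround of this kind might look like the following sketch. The actual job is not shown in this issue, so this is an assumption-laden illustration: it operates on an in-memory object keyed by session key rather than the real store format, `sweepStuckHeartbeatDeliveries` is a hypothetical name, and it additionally requires `pendingFinalDelivery === true` before clearing. Loading and saving the store, and scheduling (for example a `*/10 * * * *` crontab entry), are left out.

```typescript
// Hypothetical session shape; the index signature stands in for the
// pendingFinalDelivery* fields the issue does not name.
type SessionEntry = {
  pendingFinalDelivery?: boolean;
  pendingFinalDeliveryText?: string;
  lastHeartbeatText?: string;
  updatedAt?: number;
  [key: string]: unknown;
};

// Hypothetical sweep over an in-memory copy of the session store.
// Clears only entries whose pending text equals the last heartbeat text,
// i.e. the heartbeat reply was already sent. Returns the cleared keys.
function sweepStuckHeartbeatDeliveries(store: Record<string, SessionEntry>, now: number): string[] {
  const cleared: string[] = [];
  for (const [sessionKey, entry] of Object.entries(store)) {
    const stuck =
      entry.pendingFinalDelivery === true &&
      entry.pendingFinalDeliveryText !== undefined &&
      entry.pendingFinalDeliveryText === entry.lastHeartbeatText;
    if (!stuck) continue; // leave genuinely undelivered replies alone
    for (const key of Object.keys(entry)) {
      // Assumption: all seven fields share the pendingFinalDelivery prefix.
      if (key.startsWith("pendingFinalDelivery")) delete entry[key];
    }
    entry.updatedAt = now;
    cleared.push(sessionKey);
  }
  return cleared;
}

// Usage: one stuck heartbeat session, one session with a real pending reply.
const store: Record<string, SessionEntry> = {
  "heartbeat-session": {
    pendingFinalDelivery: true,
    pendingFinalDeliveryText: "Backup finished",
    lastHeartbeatText: "Backup finished",
    updatedAt: 1,
  },
  "user-session": {
    pendingFinalDelivery: true,
    pendingFinalDeliveryText: "Here is the summary you asked for",
    lastHeartbeatText: "Backup finished",
    updatedAt: 1,
  },
};
const clearedKeys = sweepStuckHeartbeatDeliveries(store, 2);
console.log(clearedKeys); // [ 'heartbeat-session' ]
```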

Impact and severity

  • Affected: any deployment whose heartbeat-driven agent runs ever produce substantive (non-HEARTBEAT_OK) text. Heartbeat-only setups silently lose every subsequent heartbeat after the first substantive run.
  • Severity: High. No user-visible error path; the agent simply stops responding to heartbeats and no model calls are made.
  • Frequency: Deterministic. Every heartbeat that produces substantive text leaves the session stuck.
  • Consequence: Missed notifications, missed model invocations, no surfaced error. Detection requires inspecting the session store directly.

Additional information

Suggested fix: extend the existing updateSessionStore call at src/infra/heartbeat-runner.ts:2011-2022 so that it also nulls the seven pendingFinalDelivery* fields, mirroring clearPendingFinalDeliveryAfterSuccess. A regression test that reproduces the bug and pins the fix: seed pendingFinalDelivery: true with a heartbeat-ack pendingFinalDeliveryText (so the heartbeat-defer check at :1328 does not short-circuit), run runHeartbeatOnce with a substantive reply, and assert that all seven fields are undefined afterward.
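A minimal sketch of the suggested fix, expressed as a pure updater rather than the real `updateSessionStore` call (whose signature is not shown in this issue). `applyHeartbeatSendSuccess` is a hypothetical name, and the prefix-based clear assumes all seven fields share the `pendingFinalDelivery` prefix. The check at the end mirrors only the final assertion of the suggested regression test; a real test would drive runHeartbeatOnce itself.

```typescript
// Hypothetical session shape; unnamed pendingFinalDelivery* fields fall
// under the index signature.
type SessionEntry = {
  pendingFinalDelivery?: boolean;
  pendingFinalDeliveryText?: string;
  lastHeartbeatText?: string;
  lastHeartbeatSentAt?: number;
  updatedAt?: number;
  [key: string]: unknown;
};

// Hypothetical updater for the heartbeat success branch: the existing
// lastHeartbeat* writes plus the missing pendingFinalDelivery* clear.
function applyHeartbeatSendSuccess(entry: SessionEntry, text: string, sentAt: number): SessionEntry {
  const next: SessionEntry = {
    ...entry,
    lastHeartbeatText: text,
    lastHeartbeatSentAt: sentAt,
    updatedAt: sentAt,
  };
  for (const key of Object.keys(next)) {
    if (key.startsWith("pendingFinalDelivery")) delete next[key];
  }
  return next;
}

// Regression-style check modeled on the suggested test: seed a pending
// heartbeat-ack delivery, apply a substantive send, expect no pending fields.
const seeded: SessionEntry = {
  pendingFinalDelivery: true,
  pendingFinalDeliveryText: "HEARTBEAT_OK",
  updatedAt: 1,
};
const afterSend = applyHeartbeatSendSuccess(seeded, "Reminder: standup in 10 minutes", 2);
const leftover = Object.keys(afterSend).filter((k) => k.startsWith("pendingFinalDelivery"));
console.log(leftover.length === 0 ? "ok" : `stuck: ${leftover.join(", ")}`); // prints "ok"
```

Clearing by prefix keeps the updater in step with clearPendingFinalDeliveryAfterSuccess if fields are added later; an explicit list of the seven field names would be stricter and easier to review.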

PR follow-up will be linked here.

Metadata


Assignees

No one assigned

    Labels

    • P1: High-priority user-facing bug, regression, or broken workflow.
    • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
    • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
    • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
    • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
    • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
    • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
