[Bug]: Stale Tool-Turn Finalization Delivered After Newer WhatsApp Message #76905

@sbmilburn

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading OpenClaw from 2026.4.29 to 2026.5.2, WhatsApp direct-message replies can become stale/out-of-order. A previous tool-backed assistant turn may finish after a newer user message arrives, and that old final response is delivered into the WhatsApp chat as if it were the answer to the latest message.

Steps to reproduce

A likely reproduction path:

  1. Run OpenClaw 2026.5.2 with WhatsApp auto-reply enabled.
  2. In a WhatsApp direct chat, send a tool-backed request, e.g.:
    • “Write a report to a file and email it to me.”
  3. Let the assistant execute one or more tools, especially external/tool-backed actions.
  4. Before the assistant’s final answer for that tool-backed request has fully completed/delivered, send a newer unrelated WhatsApp message, e.g.:
    • “Now analyze the code and tell me what caused this.”
  5. Observe whether the assistant delivers the final response from step 2 after step 4, making it look like the answer to the newer message.

Expected behavior

For a foreground WhatsApp direct-message turn:

  • Assistant output should be bound to the specific inbound message/run that produced it.
  • If a newer user message has arrived in the same chat/session, an older foreground final response should not be delivered as the answer to the newer message.
  • Old tool-turn finalizations should either be suppressed, marked as stale, or delivered only if explicitly routed as a background completion notification.
  • The latest inbound user message should always win when deciding what the visible WhatsApp reply is answering.

Actual behavior

A concrete observed sequence:

  1. User sent a WhatsApp message asking the assistant to write up a bug report and email the file.
  2. Assistant performed tool calls, wrote the Markdown file, and sent the email successfully.
  3. Before a final assistant response for that tool-backed turn was delivered/settled, the user sent a newer WhatsApp message asking the assistant to analyze the OpenClaw code and identify the suspected cause.
  4. Assistant then replied to the newer message with the stale completion from the previous task:

Done — wrote it up and emailed you the Markdown file.

  5. That reply was correct for the previous “write/email the bug report” request, but incorrect for the newer “analyze the code” request.

OpenClaw version

2026.5.2

Operating system

Ubuntu 24.04

Install method

npm global

Model

codex -> gpt-5.5

Provider / routing chain

openclaw -> codex (oauth)

Additional provider/model setup details

  • OpenClaw broken version: 2026.5.2 (8b2a6e5 observed locally)
  • Channel: WhatsApp direct message
  • Provider/runtime observed: openai-codex/gpt-5.5 via OpenAI Codex Responses
  • Host: Linux x64
  • Session type: main agent WhatsApp DM session

Logs, screenshots, and evidence

Conditions that seem more likely to trigger the bug:

  • Tool-heavy turns with multiple tool calls.
  • A newer inbound message arriving immediately after the final tool result but before the final assistant text is delivered.
  • Context/preflight recovery, mid-turn continuation, or retry behavior.
  • Long-running agent sessions where transcript compaction/windowing may occur.

Impact and severity

  • User receives a reply to an older task after sending a newer unrelated message.
  • The assistant appears to ignore the latest WhatsApp message.
  • Tool-completion/finalization text can leak into a later conversational turn.
  • This breaks trust in messaging channels because replies are no longer anchored to the latest inbound message.

Additional information

Diagnosis / Suspected Cause

Code comparison between v2026.4.29 and v2026.5.2 suggests the WhatsApp adapter is probably not the root cause.

WhatsApp routing looked unchanged

Reviewed WhatsApp inbound/direct routing, dedupe, and route-session construction. For a direct DM, the session key still resolves in the expected form:

agent:main:whatsapp:direct:<peer>

The main WhatsApp changes in 2026.5.2 appear related to newsletter/channel support, not ordinary direct-message routing. Inbound message normalization and dedupe did not show an obvious regression matching this symptom.

More likely: shared embedded-run/reply pipeline regression

The stronger suspect is in the shared agent run/reply finalization path, especially these areas:

  • src/agents/pi-embedded-runner/run.ts
  • src/agents/pi-embedded-runner/run/attempt.ts
  • src/auto-reply/reply/agent-runner.ts
  • src/auto-reply/reply/followup-runner.ts
  • related reply queue/follow-up/finalization plumbing

Relevant behavioral changes noticed in 2026.5.2 include:

  • More structured mid-turn precheck/recovery handling.
  • A continuation prompt along the lines of:
    “Continue from the current transcript after the latest tool result. Do not repeat the original user request, and do not rerun completed tools unless the transcript shows they are still needed.”
  • Retry/continuation paths that can replace the original user prompt with a generic “continue from transcript” instruction.
  • Reply queue behavior changes around queued followups while another run is active.

This creates a plausible failure mode:

  1. Run A receives user message A and performs tools.
  2. Tool result for Run A completes.
  3. Before Run A’s final assistant text is delivered, user message B arrives.
  4. Runtime/session state now contains or is focused on message B, but Run A still has a pending finalization/continuation.
  5. The continuation/finalization from Run A is delivered into the WhatsApp session after message B, appearing as the reply to B.

In short: the stale output appears to be a delayed foreground final answer from an older tool-backed run, not a misrouted WhatsApp inbound message.
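
The sequence above can be modeled as a small deterministic timeline. This is only an illustrative sketch: the event shapes, `deliverNaively`, and the message IDs are hypothetical and are not taken from the OpenClaw codebase. It assumes a delivery path that sends each final reply as soon as it is ready, without comparing the run's origin message against the newest inbound message in the session.

```typescript
// Hypothetical model of the suspected race; none of these names come from OpenClaw.
type SessionEvent =
  | { kind: "inbound"; messageId: string }
  | { kind: "final"; originMessageId: string; text: string };

interface Delivery {
  text: string;
  originMessageId: string;
  // The message a person reading the chat would assume this reply answers.
  appearsToAnswer: string;
}

// Naive delivery: every final reply is sent when it becomes ready, with no
// check against the newest inbound message in the same session.
function deliverNaively(events: SessionEvent[]): Delivery[] {
  let latestInbound: string | undefined;
  const deliveries: Delivery[] = [];
  for (const event of events) {
    if (event.kind === "inbound") {
      latestInbound = event.messageId;
    } else {
      deliveries.push({
        text: event.text,
        originMessageId: event.originMessageId,
        appearsToAnswer: latestInbound ?? event.originMessageId,
      });
    }
  }
  return deliveries;
}

// Message B arrives after Run A's tools finish but before Run A's final text.
const timeline: SessionEvent[] = [
  { kind: "inbound", messageId: "A" },
  { kind: "inbound", messageId: "B" },
  { kind: "final", originMessageId: "A", text: "final reply for message A" },
];

// Deliveries whose origin differs from the message they appear to answer.
const misattributed = deliverNaively(timeline).filter(
  (d) => d.originMessageId !== d.appearsToAnswer,
);
console.log(misattributed);
```

In this model the reply produced for message A is the only delivery, and it lands after message B, which matches the observed symptom without requiring any fault in inbound routing.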

Related Log Evidence

Another issue was observed in logs:

sessions.resolve ... INVALID_REQUEST ... No session found: current

This indicates a separate or adjacent problem resolving the “current” session alias. It may be worth fixing, but based on the transcript evidence it does not appear to be the primary cause of the stale WhatsApp reply, which can be explained by delayed finalization of the previous tool-backed turn.

There was also an earlier restart/listener issue:

No active WhatsApp Web listener (account: default)

That may be a separate WhatsApp listener/reconnect problem unless further evidence connects it to stale foreground reply delivery.

Suggested Fix

Add freshness checks before delivering foreground assistant replies to messaging channels:

  1. Bind output to an origin run/message
    • Track runId, session key, inbound provider message ID, and/or latest user transcript sequence for every pending foreground reply.
  2. Check staleness before channel delivery
    • Before sending final assistant text to WhatsApp, verify that the same session/chat contains no user message newer than the one that started the run.
  3. Suppress stale foreground finalizations
    • If a newer user message exists, do not deliver the older foreground final answer as a normal chat reply.
    • Optionally log it as suppressed stale output.
  4. Separate background completions from foreground replies
    • If an older run genuinely needs to notify the user after a newer message, deliver it as an explicitly labeled background/task completion, not as the assistant answer to the latest message.
  5. Guard continuation/retry prompts
    • Do not use “continue from current transcript after latest tool result” for a foreground messaging run if the latest transcript item is a newer user message from the same chat.
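
A minimal sketch of how these checks might fit together is shown below. Everything in it is an assumption made for illustration: `PendingReply`, `SessionState`, `decideDelivery`, `mayUseContinuationPrompt`, the `notifyIfStale` flag, and the background label text are hypothetical and do not reflect OpenClaw's actual types or APIs. It also assumes each user message gets a monotonically increasing transcript sequence number within a session.

```typescript
// Hypothetical types; field names are illustrative, not OpenClaw's actual API.
interface PendingReply {
  runId: string;
  sessionKey: string; // e.g. "agent:main:whatsapp:direct:<peer>"
  originUserSeq: number; // sequence of the user message that started the run
  text: string;
  notifyIfStale: boolean; // assumed opt-in for a labeled background notice
}

interface SessionState {
  sessionKey: string;
  latestUserSeq: number; // sequence of the newest user message in the session
}

type DeliveryDecision =
  | { action: "deliver"; text: string }
  | { action: "background"; text: string }
  | { action: "suppress"; reason: string };

// Steps 1-4: compare the run's origin message with the newest user message
// before anything is sent to the channel.
function decideDelivery(reply: PendingReply, session: SessionState): DeliveryDecision {
  if (reply.sessionKey !== session.sessionKey) {
    return { action: "suppress", reason: `run ${reply.runId}: session key mismatch` };
  }
  if (session.latestUserSeq <= reply.originUserSeq) {
    // No newer user message: this is still the foreground answer.
    return { action: "deliver", text: reply.text };
  }
  if (reply.notifyIfStale) {
    // Stale, but worth surfacing as an explicitly labeled completion.
    return { action: "background", text: `[Background task finished] ${reply.text}` };
  }
  return {
    action: "suppress",
    reason: `run ${reply.runId}: superseded by user message ${session.latestUserSeq}`,
  };
}

type TranscriptRole = "user" | "assistant" | "tool";

// Step 5: only continue "from the latest tool result" when the newest
// transcript item is not a user message.
function mayUseContinuationPrompt(transcriptRoles: TranscriptRole[]): boolean {
  if (transcriptRoles.length === 0) return false;
  return transcriptRoles[transcriptRoles.length - 1] !== "user";
}
```

A sequence-number comparison is used here because it is cheap and does not depend on provider message IDs being ordered; comparing inbound provider message IDs or timestamps would be an alternative if the transcript does not expose a sequence.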

Workaround

Operational workaround only:

  • The assistant can try to manually re-anchor to the latest user message every turn and suppress stale continuation chatter.

This reduces visible confusion but cannot fully fix the runtime/channel delivery race.
