fix(auto-reply): suppress stale foreground replies by MkDev11 · Pull Request #76970 · openclaw/openclaw

MkDev11 · 2026-05-03T22:19:23Z

Summary

Problem: older in-flight foreground auto-reply runs could deliver a final reply after a newer inbound message started for the same session target.
Why it matters: messaging channels can show stale tool-turn completions as if they answered the latest user message.
What changed: added a shared buffered-dispatch freshness guard that suppresses superseded foreground replies before delivery.
What did NOT change (scope boundary): no provider replay/cache behavior, WhatsApp-specific routing, or background completion notification semantics were changed.

Change Type

Scope

Linked Issue/PR

Closes [Bug]: Stale Tool-Turn Finalization Delivered After Newer WhatsApp Message #76905
Related #
This PR fixes a bug or regression

Real behavior proof

Behavior or issue addressed: older in-flight foreground auto-reply finals are suppressed when a newer inbound WhatsApp-style turn starts for the same channel/account/session/target before the older final is delivered.
Real environment tested: Linux container in /root/mkdev11/openclaw, Node v22.22.1, pnpm 10.33.2, PR branch fix/issue-76905-whatsapp-stale-final at 92148e40c818bdca1d9e5ef3554712159eddae40.
Exact steps or command run after this patch: Ran a local Node harness with node --import tsx -e that imported the real dispatchInboundMessageWithBufferedDispatcher, created two finalized WhatsApp-style inbound contexts for the same session target, held the older reply resolver until after the newer dispatch completed, then released the older final.
Evidence after fix: Terminal output from the real dispatch module run:

{
  "newerResult": {
    "queuedFinal": true,
    "counts": {
      "tool": 0,
      "block": 0,
      "final": 1
    }
  },
  "olderResult": {
    "queuedFinal": false,
    "counts": {
      "tool": 0,
      "block": 0,
      "final": 0
    }
  },
  "deliveries": [
    {
      "kind": "final",
      "text": "new final"
    }
  ]
}

Observed result after fix: The newer final was delivered, the older final was not delivered, and the older dispatch reconciled to queuedFinal=false with counts.final=0.
What was not tested: A live WhatsApp account plus live provider race was not replayed; this proof exercises the real shared OpenClaw buffered dispatch path with local WhatsApp-style contexts and a deterministic delayed resolver.
Before evidence (optional but encouraged): Before this patch, the shared buffered dispatch path had no foreground freshness fence, so an older final could still enter delivery after a newer same-target turn began.

Root Cause

Root cause: buffered foreground reply delivery did not verify that the originating inbound turn was still the newest active turn for the same channel/account/session/target before delivering a final payload.
Missing detection / guardrail: no shared freshness guard existed in the generic auto-reply delivery path.
Contributing context (if known): active-run steering can record a newer inbound message while an older tool-backed run is still finalizing.

Regression Test Plan

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/auto-reply/dispatch.freshness.test.ts
Scenario the test should lock in: an older foreground final is suppressed after a newer inbound turn starts for the same session target.
Why this is the smallest reliable guardrail: it exercises the shared buffered dispatch path directly without requiring live WhatsApp or provider timing.
Existing test that already covers this (if any): N/A
If no new test is added, why not: N/A

User-visible / Behavior Changes

Older stale foreground replies are suppressed when a newer inbound message starts for the same channel/account/session/target before the older dispatch finalizes.

Diagram

Before:
message A -> long tool run -> message B starts -> final from A is delivered

After:
message A -> long tool run -> message B starts -> final from A is suppressed

Security Impact

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux
Runtime/container: local checkout
Model/provider: N/A
Integration/channel (if any): shared auto-reply dispatch; regression models WhatsApp-style foreground context
Relevant config (redacted): default test config

Steps

Start a foreground buffered dispatch for one session target.
Start a newer foreground buffered dispatch for the same session target before the older final is delivered.
Let the newer final deliver, then let the older final attempt delivery.

Expected

The newer final is delivered.
The older final is cancelled before delivery and removed from queued final counts.

Actual

Matches expected with the new regression test.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios:
- stale same-session same-target final is suppressed
- concurrent different-target finals sharing a session are not suppressed
- existing dispatch lifecycle tests still pass
Edge cases checked:
- missing or different target keys do not collide
- cancellation reconciles queued final counts
- repo formatting and changed-file checks pass
What you did not verify:
- live WhatsApp plus Codex provider reproduction

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: the guard could suppress a reply that intentionally completes after a newer foreground message.
- Mitigation: the key is scoped by channel, account, session, chat type, and target, and only applies to the shared foreground buffered delivery path.

clawsweeper · 2026-05-03T22:22:32Z

Codex review: needs maintainer review before merge.

Summary
The PR adds a key-scoped foreground auto-reply freshness fence in shared buffered dispatch, regression coverage for stale same-target final suppression and target isolation, and a changelog entry for the linked stale-reply bug.

Reproducibility: yes. source-reproducible: current main routes WhatsApp-style contexts through the shared buffered dispatcher and delivery awaits beforeDeliver/deliver without a same-target freshness guard. I did not run a live WhatsApp or WebChat provider race in this read-only review.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal output from a real dispatch-module harness showing the newer final delivered while the older same-target final is suppressed.

Next step before merge
No repair lane is appropriate: the patch already contains the focused fix and proof, but exact-head CI/merge readiness needs maintainer handling.

Security
Cleared: The diff changes in-memory TypeScript dispatch logic, regression tests, and changelog text without dependency, workflow, permission, secret, network, or package-execution changes.

Review details

Best possible solution:

Land the shared buffered-dispatch freshness fence after maintainer review and green relevant checks, keeping the guard scoped to channel/account/session/chat-target facts.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main routes WhatsApp-style contexts through the shared buffered dispatcher and delivery awaits beforeDeliver/deliver without a same-target freshness guard. I did not run a live WhatsApp or WebChat provider race in this read-only review.

Is this the best way to solve the issue?

Yes: a shared buffered-dispatch freshness fence is narrower than a WhatsApp-only workaround and covers the async beforeDeliver window. The remaining issue is merge/CI readiness, not a known defect in the proposed code.

Acceptance criteria:

pnpm test src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts
pnpm check:changed -- --base upstream/main
Review the latest PR check runs for check-additional, checks-node-core, build-smoke, build-artifacts, and check-additional-boundaries-d on head 25e8a81.

What I checked:

Current main lacks freshness guard: On current main, dispatchInboundMessageWithBufferedDispatcher passes the configured or message_sending beforeDeliver hook straight into createReplyDispatcherWithTyping without any same-target generation/freshness fence. (src/auto-reply/dispatch.ts:203, afe26f51a3f8)
Async delivery window exists: The reply dispatcher awaits beforeDeliver and then calls deliver, so an older queued final can remain pending while a newer inbound turn starts. (src/auto-reply/reply/reply-dispatcher.ts:257, afe26f51a3f8)
WhatsApp supplies stable key inputs: The WhatsApp inbound context includes SessionKey, AccountId, Provider/Surface, OriginatingChannel, and OriginatingTo, which are enough to scope a shared foreground freshness key to the same channel/account/session/target. (extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts:246, afe26f51a3f8)
PR adds shared fence: At PR head, dispatch.ts resolves a foreground key from channel/account/session/chat-target facts, begins a generation fence before dispatch, and rechecks supersession before and after the async beforeDeliver hook. (src/auto-reply/dispatch.ts:49, 25e8a819d5d1)
PR regression coverage matches bug class: The new test file covers stale same-target final suppression, the pending-beforeDeliver race, and different-target isolation within a shared session key. (src/auto-reply/dispatch.freshness.test.ts:86, 25e8a819d5d1)
Real behavior proof supplied: The PR body includes terminal JSON from a Node harness importing the real dispatch module: the newer final is delivered, the older same-target final is suppressed, and older counts reconcile to queuedFinal=false/counts.final=0. (25e8a819d5d1)

Likely related people:

steipete: Recent main history includes shared channel turn routing, durable outbound lifecycle, WhatsApp auto-reply tracking, and message-lifecycle migration near this dispatch path. (role: recent area contributor; confidence: high; commits: 9a9cd0c0abdf, 2ead1502c9bf, 05eda57b3c72; files: src/auto-reply/dispatch.ts, src/auto-reply/reply/reply-dispatcher.ts, extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts)
jzakirov: Introduced the shared message_sending beforeDeliver delivery hook path, which is one of the async windows this PR fences. (role: hook path introducer; confidence: high; commits: 52267a6b758c; files: src/auto-reply/dispatch.ts, src/auto-reply/reply/reply-dispatcher.ts)
obviyus: Merged history includes duplicate final-delivery accounting work in reply-dispatcher, and the latest PR head simplifies the foreground freshness fence implementation. (role: adjacent reply-dispatch contributor; confidence: medium; commits: 4b1c37a15239, 25e8a819d5d1; files: src/auto-reply/reply/reply-dispatcher.ts, src/auto-reply/dispatch.ts)

Remaining risk / open question:

The latest exact PR head has failing aggregate/build check runs, so merge readiness still depends on maintainer CI review or a refreshed validation run.
The branch is diverged from current main, so the final merge result should be checked against the current dispatch/changelog surface before landing.

Codex review notes: model gpt-5.5, reasoning high; reviewed against afe26f51a3f8.

MkDev11 · 2026-05-03T22:55:55Z

Codex review: needs changes before merge.

Summary The PR adds a shared foreground auto-reply freshness fence in src/auto-reply/dispatch.ts, regression coverage in src/auto-reply/dispatch.freshness.test.ts, and a changelog entry for #76905.

Reproducibility: yes. source-reproducible: WhatsApp records stable session/account/origin target fields and routes through the shared buffered dispatcher, while current main has no generation guard. The PR models the same-session race; this read-only review did not run tests.

Next step before merge A narrow automated repair can add the post-hook freshness recheck and one regression test on the existing PR branch.

Security Cleared: The diff only changes in-memory TypeScript dispatch logic, tests, and changelog text; it adds no dependency, CI, permission, secret, network, or package-execution surface.

Review findings

[P2] Recheck freshness after async beforeDeliver hooks — src/auto-reply/dispatch.ts:243

Review details

@clawsweeper, addressed in 819d1d4.

The foreground freshness fence now rechecks supersession after async beforeDeliver hooks resolve, so an older reply is cancelled if a newer same-key inbound starts while the hook is pending.

Added regression coverage for that hook race in src/auto-reply/dispatch.freshness.test.ts.

Verified:

pnpm test src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts
pnpm exec oxfmt --check --threads=1 src/auto-reply/dispatch.ts src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts
pnpm check:changed -- --base upstream/main

sbmilburn · 2026-05-04T00:51:36Z

Awesome! Looking forward to getting this fix.

christian-lallo · 2026-05-08T16:01:57Z

Hi @sbmilburn — we're hitting this same bug class on WebChat (same responseId mismatch pattern you described, confirmed via JSONL inspection of our session store). I'd like to apply this fix locally on top of openclaw@2026.5.7, but the PR's last main-merge (May 7, 11:46 UTC) is ~9 hours behind 2026.5.7's release tag. Would you mind syncing from main again so the patched dist would include 2026.5.7's other fixes? Happy to share reproduction details if useful for review acceleration. Thanks for the work on this.

sbmilburn · 2026-05-08T21:14:16Z

@christian-lallo I didn't write the PR or code the fix, that would be @MkDev11. I just filed Bug #76905.

christian-lallo · 2026-05-09T02:47:07Z

Apologies @sbmilburn, my mistake — and thanks for the redirect. @MkDev11 — would you mind re-syncing this PR from main? It was about ~9 hours behind 2026.5.7 when 2026.5.7 was published, so building from the current branch would inadvertently roll back some of the 2026.5.7 fixes when we apply the patched dist locally. Happy to share our WebChat-side reproduction details if useful for review acceleration. Appreciate the work.

MkDev11 · 2026-05-09T02:48:21Z

Apologies @sbmilburn, my mistake — and thanks for the redirect. @MkDev11 — would you mind re-syncing this PR from main? It was about ~9 hours behind 2026.5.7 when 2026.5.7 was published, so building from the current branch would inadvertently roll back some of the 2026.5.7 fixes when we apply the patched dist locally. Happy to share our WebChat-side reproduction details if useful for review acceleration. Appreciate the work.

sure

christian-lallo · 2026-05-09T14:16:15Z

Thanks @MkDev11 — much appreciated, fast turn. Will report back if our local build surfaces anything worth flagging.

obviyus · 2026-05-10T12:34:42Z

Landed via rebase onto main.

Scoped tests: pnpm test src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts; pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/auto-reply/dispatch.ts src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts; pnpm check:changed -- --base origin/main
Changelog: CHANGELOG.md updated
Source head before merge: 3d26832
Landed commits: 7308f40, cf7e01a, 12520e7
Merge commit: 12520e7

Thanks @MkDev11!

openclaw-barnacle Bot added the size: M label May 3, 2026

MkDev11 force-pushed the fix/issue-76905-whatsapp-stale-final branch 3 times, most recently from bb86df0 to 819d1d4 Compare May 3, 2026 22:54

clawsweeper Bot mentioned this pull request May 4, 2026

[Bug]: Stale Tool-Turn Finalization Delivered After Newer WhatsApp Message #76905

Closed

openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 7, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026

clawsweeper Bot mentioned this pull request May 7, 2026

fix/issue-76905-whatsapp-stale-final #79008

Closed

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

MkDev11 and others added 3 commits May 10, 2026 17:57

fix(auto-reply): suppress stale foreground replies

152828e

refactor(auto-reply): simplify foreground freshness fence

50a1532

docs(changelog): credit stale reply fix author

3d26832

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

obviyus force-pushed the fix/issue-76905-whatsapp-stale-final branch from 25e8a81 to 3d26832 Compare May 10, 2026 12:31

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

obviyus merged commit 12520e7 into openclaw:main May 10, 2026
109 of 112 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(auto-reply): suppress stale foreground replies#76970

fix(auto-reply): suppress stale foreground replies#76970
obviyus merged 3 commits into
openclaw:mainfrom
MkDev11:fix/issue-76905-whatsapp-stale-final

MkDev11 commented May 3, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 3, 2026 •

edited

Loading

Uh oh!

MkDev11 commented May 3, 2026

Uh oh!

sbmilburn commented May 4, 2026

Uh oh!

christian-lallo commented May 8, 2026

Uh oh!

sbmilburn commented May 8, 2026

Uh oh!

christian-lallo commented May 9, 2026

Uh oh!

MkDev11 commented May 9, 2026

Uh oh!

christian-lallo commented May 9, 2026

Uh oh!

Uh oh!

obviyus commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

MkDev11 commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Change Type

Scope

Linked Issue/PR

Real behavior proof

Root Cause

Regression Test Plan

User-visible / Behavior Changes

Diagram

Security Impact

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

clawsweeper Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MkDev11 commented May 3, 2026

Uh oh!

sbmilburn commented May 4, 2026

Uh oh!

christian-lallo commented May 8, 2026

Uh oh!

sbmilburn commented May 8, 2026

Uh oh!

christian-lallo commented May 9, 2026

Uh oh!

MkDev11 commented May 9, 2026

Uh oh!

christian-lallo commented May 9, 2026

Uh oh!

Uh oh!

obviyus commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MkDev11 commented May 3, 2026 •

edited

Loading

clawsweeper Bot commented May 3, 2026 •

edited

Loading