Skip to content

fix(auto-reply): suppress stale foreground replies#76970

Merged
obviyus merged 3 commits into
openclaw:mainfrom
MkDev11:fix/issue-76905-whatsapp-stale-final
May 10, 2026
Merged

fix(auto-reply): suppress stale foreground replies#76970
obviyus merged 3 commits into
openclaw:mainfrom
MkDev11:fix/issue-76905-whatsapp-stale-final

Conversation

@MkDev11

@MkDev11 MkDev11 commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: older in-flight foreground auto-reply runs could deliver a final reply after a newer inbound message started for the same session target.
  • Why it matters: messaging channels can show stale tool-turn completions as if they answered the latest user message.
  • What changed: added a shared buffered-dispatch freshness guard that suppresses superseded foreground replies before delivery.
  • What did NOT change (scope boundary): no provider replay/cache behavior, WhatsApp-specific routing, or background completion notification semantics were changed.

Change Type

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof

  • Behavior or issue addressed: older in-flight foreground auto-reply finals are suppressed when a newer inbound WhatsApp-style turn starts for the same channel/account/session/target before the older final is delivered.
  • Real environment tested: Linux container in /root/mkdev11/openclaw, Node v22.22.1, pnpm 10.33.2, PR branch fix/issue-76905-whatsapp-stale-final at 92148e40c818bdca1d9e5ef3554712159eddae40.
  • Exact steps or command run after this patch: Ran a local Node harness with node --import tsx -e that imported the real dispatchInboundMessageWithBufferedDispatcher, created two finalized WhatsApp-style inbound contexts for the same session target, held the older reply resolver until after the newer dispatch completed, then released the older final.
  • Evidence after fix: Terminal output from the real dispatch module run:
{
  "newerResult": {
    "queuedFinal": true,
    "counts": {
      "tool": 0,
      "block": 0,
      "final": 1
    }
  },
  "olderResult": {
    "queuedFinal": false,
    "counts": {
      "tool": 0,
      "block": 0,
      "final": 0
    }
  },
  "deliveries": [
    {
      "kind": "final",
      "text": "new final"
    }
  ]
}
  • Observed result after fix: The newer final was delivered, the older final was not delivered, and the older dispatch reconciled to queuedFinal=false with counts.final=0.
  • What was not tested: A live WhatsApp account plus live provider race was not replayed; this proof exercises the real shared OpenClaw buffered dispatch path with local WhatsApp-style contexts and a deterministic delayed resolver.
  • Before evidence (optional but encouraged): Before this patch, the shared buffered dispatch path had no foreground freshness fence, so an older final could still enter delivery after a newer same-target turn began.

Root Cause

  • Root cause: buffered foreground reply delivery did not verify that the originating inbound turn was still the newest active turn for the same channel/account/session/target before delivering a final payload.
  • Missing detection / guardrail: no shared freshness guard existed in the generic auto-reply delivery path.
  • Contributing context (if known): active-run steering can record a newer inbound message while an older tool-backed run is still finalizing.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/auto-reply/dispatch.freshness.test.ts
  • Scenario the test should lock in: an older foreground final is suppressed after a newer inbound turn starts for the same session target.
  • Why this is the smallest reliable guardrail: it exercises the shared buffered dispatch path directly without requiring live WhatsApp or provider timing.
  • Existing test that already covers this (if any): N/A
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Older stale foreground replies are suppressed when a newer inbound message starts for the same channel/account/session/target before the older dispatch finalizes.

Diagram

Before:
message A -> long tool run -> message B starts -> final from A is delivered

After:
message A -> long tool run -> message B starts -> final from A is suppressed

Security Impact

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: local checkout
  • Model/provider: N/A
  • Integration/channel (if any): shared auto-reply dispatch; regression models WhatsApp-style foreground context
  • Relevant config (redacted): default test config

Steps

  1. Start a foreground buffered dispatch for one session target.
  2. Start a newer foreground buffered dispatch for the same session target before the older final is delivered.
  3. Let the newer final deliver, then let the older final attempt delivery.

Expected

  • The newer final is delivered.
  • The older final is cancelled before delivery and removed from queued final counts.

Actual

  • Matches expected with the new regression test.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • stale same-session same-target final is suppressed
    • concurrent different-target finals sharing a session are not suppressed
    • existing dispatch lifecycle tests still pass
  • Edge cases checked:
    • missing or different target keys do not collide
    • cancellation reconciles queued final counts
    • repo formatting and changed-file checks pass
  • What you did not verify:
    • live WhatsApp plus Codex provider reproduction

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: the guard could suppress a reply that intentionally completes after a newer foreground message.
    • Mitigation: the key is scoped by channel, account, session, chat type, and target, and only applies to the shared foreground buffered delivery path.

@clawsweeper

clawsweeper Bot commented May 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Summary
The PR adds a key-scoped foreground auto-reply freshness fence in shared buffered dispatch, regression coverage for stale same-target final suppression and target isolation, and a changelog entry for the linked stale-reply bug.

Reproducibility: yes. source-reproducible: current main routes WhatsApp-style contexts through the shared buffered dispatcher and delivery awaits beforeDeliver/deliver without a same-target freshness guard. I did not run a live WhatsApp or WebChat provider race in this read-only review.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal output from a real dispatch-module harness showing the newer final delivered while the older same-target final is suppressed.

Next step before merge
No repair lane is appropriate: the patch already contains the focused fix and proof, but exact-head CI/merge readiness needs maintainer handling.

Security
Cleared: The diff changes in-memory TypeScript dispatch logic, regression tests, and changelog text without dependency, workflow, permission, secret, network, or package-execution changes.

Review details

Best possible solution:

Land the shared buffered-dispatch freshness fence after maintainer review and green relevant checks, keeping the guard scoped to channel/account/session/chat-target facts.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main routes WhatsApp-style contexts through the shared buffered dispatcher and delivery awaits beforeDeliver/deliver without a same-target freshness guard. I did not run a live WhatsApp or WebChat provider race in this read-only review.

Is this the best way to solve the issue?

Yes: a shared buffered-dispatch freshness fence is narrower than a WhatsApp-only workaround and covers the async beforeDeliver window. The remaining issue is merge/CI readiness, not a known defect in the proposed code.

Acceptance criteria:

  • pnpm test src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts
  • pnpm check:changed -- --base upstream/main
  • Review the latest PR check runs for check-additional, checks-node-core, build-smoke, build-artifacts, and check-additional-boundaries-d on head 25e8a81.

What I checked:

  • Current main lacks freshness guard: On current main, dispatchInboundMessageWithBufferedDispatcher passes the configured or message_sending beforeDeliver hook straight into createReplyDispatcherWithTyping without any same-target generation/freshness fence. (src/auto-reply/dispatch.ts:203, afe26f51a3f8)
  • Async delivery window exists: The reply dispatcher awaits beforeDeliver and then calls deliver, so an older queued final can remain pending while a newer inbound turn starts. (src/auto-reply/reply/reply-dispatcher.ts:257, afe26f51a3f8)
  • WhatsApp supplies stable key inputs: The WhatsApp inbound context includes SessionKey, AccountId, Provider/Surface, OriginatingChannel, and OriginatingTo, which are enough to scope a shared foreground freshness key to the same channel/account/session/target. (extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts:246, afe26f51a3f8)
  • PR adds shared fence: At PR head, dispatch.ts resolves a foreground key from channel/account/session/chat-target facts, begins a generation fence before dispatch, and rechecks supersession before and after the async beforeDeliver hook. (src/auto-reply/dispatch.ts:49, 25e8a819d5d1)
  • PR regression coverage matches bug class: The new test file covers stale same-target final suppression, the pending-beforeDeliver race, and different-target isolation within a shared session key. (src/auto-reply/dispatch.freshness.test.ts:86, 25e8a819d5d1)
  • Real behavior proof supplied: The PR body includes terminal JSON from a Node harness importing the real dispatch module: the newer final is delivered, the older same-target final is suppressed, and older counts reconcile to queuedFinal=false/counts.final=0. (25e8a819d5d1)

Likely related people:

  • steipete: Recent main history includes shared channel turn routing, durable outbound lifecycle, WhatsApp auto-reply tracking, and message-lifecycle migration near this dispatch path. (role: recent area contributor; confidence: high; commits: 9a9cd0c0abdf, 2ead1502c9bf, 05eda57b3c72; files: src/auto-reply/dispatch.ts, src/auto-reply/reply/reply-dispatcher.ts, extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts)
  • jzakirov: Introduced the shared message_sending beforeDeliver delivery hook path, which is one of the async windows this PR fences. (role: hook path introducer; confidence: high; commits: 52267a6b758c; files: src/auto-reply/dispatch.ts, src/auto-reply/reply/reply-dispatcher.ts)
  • obviyus: Merged history includes duplicate final-delivery accounting work in reply-dispatcher, and the latest PR head simplifies the foreground freshness fence implementation. (role: adjacent reply-dispatch contributor; confidence: medium; commits: 4b1c37a15239, 25e8a819d5d1; files: src/auto-reply/reply/reply-dispatcher.ts, src/auto-reply/dispatch.ts)

Remaining risk / open question:

  • The latest exact PR head has failing aggregate/build check runs, so merge readiness still depends on maintainer CI review or a refreshed validation run.
  • The branch is diverged from current main, so the final merge result should be checked against the current dispatch/changelog surface before landing.

Codex review notes: model gpt-5.5, reasoning high; reviewed against afe26f51a3f8.

@MkDev11 MkDev11 force-pushed the fix/issue-76905-whatsapp-stale-final branch 3 times, most recently from bb86df0 to 819d1d4 Compare May 3, 2026 22:54
@MkDev11

MkDev11 commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

Codex review: needs changes before merge.

Summary The PR adds a shared foreground auto-reply freshness fence in src/auto-reply/dispatch.ts, regression coverage in src/auto-reply/dispatch.freshness.test.ts, and a changelog entry for #76905.

Reproducibility: yes. source-reproducible: WhatsApp records stable session/account/origin target fields and routes through the shared buffered dispatcher, while current main has no generation guard. The PR models the same-session race; this read-only review did not run tests.

Next step before merge A narrow automated repair can add the post-hook freshness recheck and one regression test on the existing PR branch.

Security Cleared: The diff only changes in-memory TypeScript dispatch logic, tests, and changelog text; it adds no dependency, CI, permission, secret, network, or package-execution surface.

Review findings

  • [P2] Recheck freshness after async beforeDeliver hooks — src/auto-reply/dispatch.ts:243

Review details

@clawsweeper, addressed in 819d1d4.

The foreground freshness fence now rechecks supersession after async beforeDeliver hooks resolve, so an older reply is cancelled if a newer same-key inbound starts while the hook is pending.

Added regression coverage for that hook race in src/auto-reply/dispatch.freshness.test.ts.

Verified:

  • pnpm test src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts
  • pnpm exec oxfmt --check --threads=1 src/auto-reply/dispatch.ts src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts
  • pnpm check:changed -- --base upstream/main

@sbmilburn

Copy link
Copy Markdown

Awesome! Looking forward to getting this fix.

@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@christian-lallo

Copy link
Copy Markdown

Hi @sbmilburn — we're hitting this same bug class on WebChat (same responseId mismatch pattern you described, confirmed via JSONL inspection of our session store). I'd like to apply this fix locally on top of openclaw@2026.5.7, but the PR's last main-merge (May 7, 11:46 UTC) is ~9 hours behind 2026.5.7's release tag. Would you mind syncing from main again so the patched dist would include 2026.5.7's other fixes? Happy to share reproduction details if useful for review acceleration. Thanks for the work on this.

@sbmilburn

Copy link
Copy Markdown

@christian-lallo I didn't write the PR or code the fix, that would be @MkDev11. I just filed Bug #76905.

@christian-lallo

Copy link
Copy Markdown

Apologies @sbmilburn, my mistake — and thanks for the redirect. @MkDev11 — would you mind re-syncing this PR from main? It was about ~9 hours behind 2026.5.7 when 2026.5.7 was published, so building from the current branch would inadvertently roll back some of the 2026.5.7 fixes when we apply the patched dist locally. Happy to share our WebChat-side reproduction details if useful for review acceleration. Appreciate the work.

@MkDev11

MkDev11 commented May 9, 2026

Copy link
Copy Markdown
Contributor Author

Apologies @sbmilburn, my mistake — and thanks for the redirect. @MkDev11 — would you mind re-syncing this PR from main? It was about ~9 hours behind 2026.5.7 when 2026.5.7 was published, so building from the current branch would inadvertently roll back some of the 2026.5.7 fixes when we apply the patched dist locally. Happy to share our WebChat-side reproduction details if useful for review acceleration. Appreciate the work.

sure

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026
@christian-lallo

Copy link
Copy Markdown

Thanks @MkDev11 — much appreciated, fast turn. Will report back if our local build surfaces anything worth flagging.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@obviyus obviyus force-pushed the fix/issue-76905-whatsapp-stale-final branch from 25e8a81 to 3d26832 Compare May 10, 2026 12:31
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@obviyus obviyus merged commit 12520e7 into openclaw:main May 10, 2026
109 of 112 checks passed
@obviyus

obviyus commented May 10, 2026

Copy link
Copy Markdown
Contributor

Landed via rebase onto main.

  • Scoped tests: pnpm test src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts; pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/auto-reply/dispatch.ts src/auto-reply/dispatch.freshness.test.ts src/auto-reply/reply/before-deliver.test.ts; pnpm check:changed -- --base origin/main
  • Changelog: CHANGELOG.md updated
  • Source head before merge: 3d26832
  • Landed commits: 7308f40, cf7e01a, 12520e7
  • Merge commit: 12520e7

Thanks @MkDev11!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

proof: supplied External PR includes structured after-fix real behavior proof. size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Stale Tool-Turn Finalization Delivered After Newer WhatsApp Message

4 participants