
fix(whatsapp): dedupe captioned MEDIA auto-replies #78770

Merged
mcaxtr merged 2 commits into openclaw:main from ai-hpc:fix/whatsapp-media-directive-dedupe
May 7, 2026

Conversation

@ai-hpc
Contributor

@ai-hpc ai-hpc commented May 7, 2026

Summary

  • Problem: WhatsApp inbound auto-reply can deliver a captioned MEDIA: directive twice: first as an empty media-only message, then again as the final captioned media reply.
  • Why it matters: Users receive duplicate WhatsApp images even though the assistant emitted one final response with one MEDIA: directive.
  • What changed: Buffer media-only interim WhatsApp auto-reply payloads and drop the buffered payload when a later block/final payload carries the same media URL with visible caption text.
  • What did NOT change (scope boundary): Low-level WhatsApp media sending, CLI message send --media, and legitimate tool-only/media-only replies are preserved.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: WhatsApp inbound auto-reply delivered media-only interim payloads immediately, then later delivered the final captioned payload with the same media URL.
  • Missing detection / guardrail: The dispatcher did not coalesce same-media interim/final deliveries before handing them to WhatsApp delivery.
  • Contributing context (if known): The assistant trajectory can contain one final visible response with MEDIA:..., while the dispatcher lifecycle can still surface an earlier media-only payload for the same media.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.test.ts
  • Scenario the test should lock in: a media-only interim payload with /tmp/generated.jpg is not sent when a later block payload contains the same media URL plus caption text.
  • Why this is the smallest reliable guardrail: The duplicate is created in the WhatsApp inbound dispatch lifecycle before low-level delivery, so the dispatcher unit test catches it without live WhatsApp credentials.
  • Existing test that already covers this (if any): existing delivery tests cover low-level media send behavior, but not lifecycle duplicate coalescing.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

WhatsApp auto-replies for captioned MEDIA: directives now send one captioned media message instead of an empty media message followed by the same captioned media message.

Diagram (if applicable)

Before:
media-only interim -> WhatsApp send
final caption + same media -> WhatsApp send again

After:
media-only interim -> buffer
final caption + same media -> replace buffered payload -> WhatsApp sends once
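
The "After" flow can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ReplyPayload` shape, `createMediaCoalescer`, `push`, and `flush` are hypothetical names, and only `pendingMediaOnlyPayloads` echoes a name mentioned in the review discussion. It assumes one media URL per payload.

```typescript
// Hypothetical payload shape, loosely modeled on the logged fields (text, mediaUrl).
interface ReplyPayload {
  text: string;
  mediaUrl?: string;
}

type Deliver = (payload: ReplyPayload) => void;

function createMediaCoalescer(deliver: Deliver) {
  // Media-only payloads wait here until the turn completes.
  let pendingMediaOnlyPayloads: ReplyPayload[] = [];

  return {
    push(payload: ReplyPayload): void {
      const hasCaption = payload.text.trim() !== "";
      if (payload.mediaUrl !== undefined && !hasCaption) {
        // Defer instead of sending immediately.
        pendingMediaOnlyPayloads.push(payload);
        return;
      }
      if (payload.mediaUrl !== undefined) {
        // Captioned media replaces any buffered payload for the same URL.
        pendingMediaOnlyPayloads = pendingMediaOnlyPayloads.filter(
          (pending) => pending.mediaUrl !== payload.mediaUrl,
        );
      }
      deliver(payload);
    },
    flush(): void {
      // No captioned replacement arrived, so these are legitimate media-only replies.
      for (const pending of pendingMediaOnlyPayloads) deliver(pending);
      pendingMediaOnlyPayloads = [];
    },
  };
}

// Usage: a media-only interim payload followed by the captioned final payload.
const sent: ReplyPayload[] = [];
const coalescer = createMediaCoalescer((p) => sent.push(p));
coalescer.push({ text: "", mediaUrl: "/tmp/generated.jpg" });
coalescer.push({ text: "WhatsApp inbound MEDIA directive live test", mediaUrl: "/tmp/generated.jpg" });
coalescer.flush();
```

In this sketch `flush` stands in for whatever hook runs when the dispatcher completes. Calling it unconditionally is what keeps tool-only media replies from being suppressed.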

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Ubuntu/Linux VPS
  • Runtime/container: Node 22, local OpenClaw checkout
  • Model/provider: existing configured agent
  • Integration/channel (if any): WhatsApp default, linked and healthy
  • Relevant config (redacted): loopback gateway, systemd user service

Steps

  1. Ensure a small local image exists, e.g. media/whatsapp-tests/live-media-directive.png.
  2. From WhatsApp, send:
    Reply with exactly these two lines, no markdown, no code fence, no extra text:
    WhatsApp inbound MEDIA directive live test
    MEDIA:media/whatsapp-tests/live-media-directive.png
    
  3. Watch redacted logs for Sent media reply and auto-reply sent (media).

Expected

  • One WhatsApp media reply with caption text.

Actual

  • Before this fix, WhatsApp sent the same image twice: first media-only, then captioned media.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Redacted failing live log shape:

Sent media reply to +<redacted> (0.00MB)
auto-reply sent (media) {"text":"","mediaUrl":"<outbound-media>","mediaKind":"image"}
Sent media reply to +<redacted> (0.00MB)
auto-reply sent (media) {"text":"WhatsApp inbound MEDIA directive live test","mediaUrl":"<outbound-media>","mediaKind":"image"}
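
A log shape like this could be checked mechanically. The sketch below is only an illustration under assumptions: `parseSentMedia` and `hasDuplicateMediaSend` are hypothetical helpers, not part of OpenClaw, and they assume each `auto-reply sent (media)` line carries a single JSON object with `text` and `mediaUrl` fields. Because redaction collapses every URL to `<outbound-media>`, a match on redacted logs is a hint rather than proof.

```typescript
interface SentMediaEntry {
  text: string;
  mediaUrl: string;
}

// Extract the JSON object from a line that mentions "auto-reply sent (media)".
function parseSentMedia(line: string): SentMediaEntry | undefined {
  if (!line.includes("auto-reply sent (media)")) return undefined;
  const start = line.indexOf("{");
  const end = line.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    const parsed = JSON.parse(line.slice(start, end + 1)) as Partial<SentMediaEntry>;
    if (typeof parsed.text !== "string" || typeof parsed.mediaUrl !== "string") {
      return undefined;
    }
    return { text: parsed.text, mediaUrl: parsed.mediaUrl };
  } catch {
    // Not valid JSON: ignore the line rather than fail the whole scan.
    return undefined;
  }
}

// True when an empty-caption send is later followed by a captioned send of the same media.
function hasDuplicateMediaSend(lines: string[]): boolean {
  const emptyCaptionUrls = new Set<string>();
  for (const line of lines) {
    const entry = parseSentMedia(line);
    if (!entry) continue;
    if (entry.text === "") emptyCaptionUrls.add(entry.mediaUrl);
    else if (emptyCaptionUrls.has(entry.mediaUrl)) return true;
  }
  return false;
}

// The redacted failing log shape from above.
const failingLog = [
  "Sent media reply to +<redacted> (0.00MB)",
  'auto-reply sent (media) {"text":"","mediaUrl":"<outbound-media>","mediaKind":"image"}',
  "Sent media reply to +<redacted> (0.00MB)",
  'auto-reply sent (media) {"text":"WhatsApp inbound MEDIA directive live test","mediaUrl":"<outbound-media>","mediaKind":"image"}',
];
const duplicated = hasDuplicateMediaSend(failingLog);
```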

Validation:

pnpm exec oxfmt --check --threads=1 extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.test.ts CHANGELOG.md
pnpm test extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.test.ts extensions/whatsapp/src/auto-reply/deliver-reply.test.ts -- --reporter=verbose
pnpm check:changed
git diff --check

Real behavior proof

  • Behavior or issue addressed: WhatsApp inbound auto-reply for one captioned MEDIA: directive sent the same image twice, first as empty media and then as the captioned media reply.
  • Real environment tested: Ubuntu 24.04 VPS, Node 22, local OpenClaw checkout, linked and healthy WhatsApp default channel, loopback Gateway, live WhatsApp direct chat.
  • Exact steps or command run after this patch:
    1. Ran the local OpenClaw Gateway from this checkout.
    2. Sent this WhatsApp prompt to the linked OpenClaw number:
      Reply with exactly these two lines, no markdown, no code fence, no extra text:
      WhatsApp inbound MEDIA directive live test
      MEDIA:media/whatsapp-tests/live-media-directive.png
      
    3. Tailed redacted OpenClaw runtime logs for Inbound message, Sent media reply, and auto-reply sent (media).
  • Evidence after fix: Redacted runtime log from the live WhatsApp run:
    2026-05-07T03:01:32.497Z info gateway/channels/whatsapp/inbound Inbound message +<redacted> -> +<redacted> (direct, 294 chars)
    2026-05-07T03:01:48.688Z info gateway/channels/whatsapp/outbound Sent media reply to +<redacted> (0.00MB)
    2026-05-07T03:01:48.690Z info web-auto-reply {"text":"WhatsApp inbound MEDIA directive live test","mediaUrl":"<outbound-media>","mediaSizeBytes":91,"mediaKind":"image"} auto-reply sent (media)
    
  • Observed result after fix: The live WhatsApp run produced one captioned media reply and no second auto-reply sent (media) line after waiting.
  • What was not tested: No additional gaps beyond the redacted live WhatsApp setup above; automated regression tests cover the dispatcher replacement behavior and the tool-only media flush edge case.

Human Verification (required)

  • Verified scenarios: targeted WhatsApp dispatcher regression, low-level WhatsApp delivery tests, and previous live VPS reproduction with redacted logs.
  • Edge cases checked: tool-only media still flushes and sends when no final replacement arrives; provider rejection still avoids rememberSentText.
  • What you did not verify: additional handset/browser screenshots beyond the redacted live WhatsApp log proof above.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: buffering media-only interim payloads could suppress legitimate media-only replies.
    • Mitigation: pending media-only payloads flush after the dispatcher completes when no same-media final replacement arrives, with unit coverage for tool-only media turns.

@openclaw-barnacle openclaw-barnacle Bot added channel: whatsapp-web Channel integration: whatsapp-web size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 7, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 7, 2026

Codex review: needs changes before merge.

Summary
Buffers WhatsApp media-only interim auto-reply payloads, drops same-URL duplicates when later captioned media arrives, updates dispatcher tests, and adds a changelog entry.

Reproducibility: yes, source-level. Current main has a clear dispatcher path and an existing test sequence where tool media and later captioned block media with the same URL are delivered separately; the linked issue and PR proof add live WhatsApp logs/screenshots.

Real behavior proof
Sufficient (logs): The PR body has redacted after-fix live WhatsApp runtime logs, and inspected before/after screenshots show duplicate media before and one captioned media reply after the patch.

Next step before merge
A single URL-granular coalescer repair plus a focused regression test is narrow enough for an automated fix attempt on the PR branch.

Security
Cleared: The diff only changes WhatsApp dispatcher logic, tests, and changelog text, with no dependency, workflow, package metadata, permission, secret, or new code-execution surface change.

Review findings

  • [P2] Preserve non-overlapping deferred media — extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts:171-173
Review details

Best possible solution:

Keep dispatcher-level coalescing, but make dedupe URL-granular so only duplicate media is suppressed while non-overlapping deferred media still sends.

Do we have a high-confidence way to reproduce the issue?

Yes, source-level. Current main has a clear dispatcher path and an existing test sequence where tool media and later captioned block media with the same URL are delivered separately; the linked issue and PR proof add live WhatsApp logs/screenshots.

Is this the best way to solve the issue?

No, not yet. The dispatcher is the right seam, but the current implementation suppresses a whole deferred media payload on partial URL overlap; filtering or splitting only duplicate URLs is the safer narrow fix.

Full review comments:

  • [P2] Preserve non-overlapping deferred media — extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts:171-173
    When a deferred media-only payload contains multiple mediaUrls, this skips the whole candidate as soon as one URL overlaps the later captioned payload. A pending [a, b] followed by captioned [a] would lose b, even though WhatsApp delivery supports multi-attachment media. Filter or split the pending payload so only duplicated URLs are suppressed.
    Confidence: 0.9
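
The URL-granular filtering requested here might look like the sketch below. It is an assumption-laden illustration rather than the code that was merged: `MediaPayload` and `dropDuplicateUrls` are hypothetical names, and only the `mediaUrls` field comes from the review comment.

```typescript
// Hypothetical payload shape with multiple attachments.
interface MediaPayload {
  text?: string;
  mediaUrls: string[];
}

// Remove from each pending payload only the URLs that the later captioned
// payload already carries; payloads left with no URLs are dropped entirely.
function dropDuplicateUrls(pending: MediaPayload[], later: MediaPayload): MediaPayload[] {
  const laterUrls = new Set(later.mediaUrls);
  return pending
    .map((p) => ({ ...p, mediaUrls: p.mediaUrls.filter((url) => !laterUrls.has(url)) }))
    .filter((p) => p.mediaUrls.length > 0);
}

// The case from the review: pending [a, b] followed by captioned [a] keeps b.
const remaining = dropDuplicateUrls(
  [{ mediaUrls: ["a", "b"] }],
  { text: "caption", mediaUrls: ["a"] },
);
```

Filtering per URL rather than per payload means a full overlap still suppresses the whole pending payload, while a partial overlap keeps the non-duplicated attachments.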

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm exec oxfmt --check --threads=1 extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.test.ts CHANGELOG.md
  • pnpm test extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.test.ts extensions/whatsapp/src/auto-reply/deliver-reply.test.ts -- --reporter=verbose
  • pnpm check:changed
  • git diff --check

What I checked:

Likely related people:

  • Peter Steinberger: Local history and blame for the current WhatsApp inbound dispatcher/test path point to commit 610e882, which added the relevant files in this checkout's available history. (role: introduced behavior in available current-main history; confidence: medium; commits: 610e882dbf81; files: extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts, extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.test.ts, extensions/whatsapp/src/auto-reply/deliver-reply.ts)
  • @mcaxtr: The PR timeline assigns this work to @mcaxtr, and the current PR head includes a mcaxtr commit adding the changelog entry after review feedback. (role: assigned maintainer and recent PR follow-up owner; confidence: medium; commits: 54abdd786d09; files: CHANGELOG.md)
  • hclsys: The discussion says hclsys independently verified the dispatch-order root cause while investigating the linked issue, matching the affected inbound dispatcher surface. (role: adjacent investigator; confidence: medium; files: extensions/whatsapp/src/auto-reply/monitor/inbound-dispatch.ts)

Remaining risk / open question:

  • Latest discussion says some CI checks began failing after conflict resolution; those checks need inspection or rerun before merge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c233e813a5fa.

@ai-hpc
Contributor Author

ai-hpc commented May 7, 2026

before-fix real WhatsApp behavior proof

image

after-fix real WhatsApp behavior proof

image

@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from a88d67d to 9571f1a Compare May 7, 2026 04:03
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 7, 2026
@bwlee-dix

Clean fix for the WhatsApp captioned media dedupe issue. The pendingMediaOnlyPayloads buffer with URL overlap detection is an elegant way to avoid the empty media message before the final captioned reply. One small suggestion: consider documenting the overlap behavior in a brief code comment — future readers might wonder why media-only payloads are deferred rather than delivered immediately. Tests cover both the dedupe and the final delivery path well. LGTM.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from 9571f1a to 5f520bb Compare May 7, 2026 04:17
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from 5f520bb to 543596b Compare May 7, 2026 04:26
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from 543596b to 1a5560b Compare May 7, 2026 04:28
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from 1a5560b to 8f76756 Compare May 7, 2026 04:36
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from 8f76756 to ef11b2f Compare May 7, 2026 04:48
@openclaw-barnacle openclaw-barnacle Bot added the commands Command implementations label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from ef11b2f to e56cdd4 Compare May 7, 2026 05:07
@openclaw-barnacle openclaw-barnacle Bot removed the commands Command implementations label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from e56cdd4 to dfde64a Compare May 7, 2026 05:11
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@ai-hpc ai-hpc force-pushed the fix/whatsapp-media-directive-dedupe branch from dfde64a to 1ede151 Compare May 7, 2026 05:17
@mcaxtr mcaxtr self-assigned this May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@hclsys
Contributor

hclsys commented May 7, 2026

Verified the duplicate-dispatch root cause independently while investigating #78767. The path does run before the final reply assembler flushes, which creates the two-send window for captioned media. The fix here is the right seam — deduplicate at the dispatch site rather than in the outbound delivery layer. CI green, approach looks solid.

@ai-hpc
Contributor Author

ai-hpc commented May 7, 2026

@mcaxtr All CI is green now

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026
@mcaxtr mcaxtr merged commit a2efabf into openclaw:main May 7, 2026
112 checks passed
@mcaxtr
Member

mcaxtr commented May 7, 2026

Merged via squash.

Thanks @ai-hpc!

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* fix(whatsapp): dedupe captioned MEDIA auto-replies

* docs: note whatsapp media directive dedupe

---------

Co-authored-by: Marcus Castro <mcaxtr@openclaw.ai>
rogerdigital pushed a commit to rogerdigital/openclaw that referenced this pull request May 9, 2026
* fix(whatsapp): dedupe captioned MEDIA auto-replies

* docs: note whatsapp media directive dedupe

---------

Co-authored-by: Marcus Castro <mcaxtr@openclaw.ai>

Labels

  • channel: whatsapp-web (Channel integration: whatsapp-web)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: WhatsApp auto-reply sends MEDIA directive attachment twice for captioned replies

4 participants