feat(whatsapp): expand live QA coverage by mcaxtr · Pull Request #90480 · openclaw/openclaw

mcaxtr · 2026-06-04T23:36:26Z

Summary

Expands the WhatsApp live QA lane from a small smoke set into a broader regression lane for the extension's transport, Gateway, and WhatsApp Web integration boundaries.

This PR also expands the WhatsApp QA driver/send surface and mock-provider support so the lane can exercise structured WhatsApp capabilities deterministically instead of relying on private test-only hooks or frontier-model wording.

What Changed

Expands the WhatsApp QA driver and active send API support for:
- media captions and audio observations
- quoted reply metadata
- reactions
- polls
- contact cards
- locations
- stickers
- stale-observation filtering with observedAfter
Adds QA Gateway/live-lane helpers for:
- send
- poll
- message.action
- workspace media fixtures
- Gateway DM sends routed to the driver peer
- observed-message recording and post-send assertions
- redacted timeout diagnostics for unmatched WhatsApp observations
Adds deterministic mock-openai support for WhatsApp-specific scripted responses, including audio-preflight markers and long final/chunked response checks.
Updates WhatsApp inbound extraction and listener test harness coverage for structured live events used by the expanded lane.
Aligns shared live-transport baseline coverage so whatsapp-group-allowlist-block is the WhatsApp allowlist-block standard scenario.
Documents the expanded WhatsApp QA lane, scenario catalog, environment variables, and output artifacts in docs/concepts/qa-e2e-automation.md.
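The stale-observation filtering (`observedAfter`) and redacted timeout diagnostics listed above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR. The `Observation` shape, `matchObservation`, and `redactJid` are hypothetical names. The sketch assumes each observation carries a millisecond timestamp and a sender JID.

```typescript
// Hypothetical shape of one message observed by the QA driver (not the PR's real type).
interface Observation {
  senderJid: string; // e.g. "15551234567@s.whatsapp.net"
  text: string;
  timestampMs: number; // when the driver observed the message
}

// Replace the user part of a JID so diagnostics do not leak phone numbers or group IDs.
function redactJid(jid: string): string {
  const at = jid.indexOf("@");
  return at === -1 ? "<redacted>" : `<redacted>${jid.slice(at)}`;
}

type MatchResult =
  | { ok: true; observation: Observation }
  | { ok: false; diagnostic: string };

// Return the first observation newer than `observedAfter` that satisfies `predicate`.
// Observations at or before the cutoff count as stale (for example, left over from an
// earlier scenario) and are ignored even if they would otherwise match.
function matchObservation(
  observations: readonly Observation[],
  observedAfter: number,
  predicate: (o: Observation) => boolean,
): MatchResult {
  const fresh = observations.filter((o) => o.timestampMs > observedAfter);
  const hit = fresh.find(predicate);
  if (hit) return { ok: true, observation: hit };
  // On timeout, summarize what was seen without exposing raw JIDs or message bodies.
  const seen = fresh.map((o) => `${redactJid(o.senderJid)} (${o.text.length} chars)`);
  return {
    ok: false,
    diagnostic:
      `no match among ${fresh.length} fresh observation(s); ` +
      `ignored ${observations.length - fresh.length} stale: [${seen.join(", ")}]`,
  };
}
```

Filtering by time before matching keeps a marker that arrived before the current send from satisfying a new assertion.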

WhatsApp QA Coverage

The WhatsApp QA catalog now covers 35 scenarios total.

Scenario families covered in this branch include:

Basic DM/group smoke and gating:
- canary
- pairing gate
- mention gating
- top-level reply shape
- restart/resume
Native command UX:
- /help
- /status
- /commands
- /tools compact
- /whoami
- /context list
- /new
Usage/final-output behavior:
- tool-only usage footer
- long streamed final message accounting
- long quoted reply chunking
Reply/quote behavior:
- reply-to mode quotes the triggering message
- fresh Gateway sends do not reuse prior quote context
Inbound media/structured messages:
- image caption
- audio preflight transcript
- document caption
- poll
- location
- group audio gating
Outbound send/action behavior:
- image
- document
- audio/voice
- multi-media
- document filename preservation
- native poll
- message.action react
- message.action upload-file
Access control:
- DM open
- DM disabled
- group open
- group disabled
- group allowlist block
Native approvals:
- exec allow-once
- exec deny
- exec reaction approval
- plugin allow-once
Status reactions:
- observable WhatsApp status reactions
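The access-control and mention-gating families above exercise a decision of roughly the following shape. This is an illustrative sketch under stated assumptions. The policy values (`open`, `disabled`, `allowlist`), the `AccessConfig` and `InboundContext` shapes, and `shouldHandleInbound` are hypothetical and are not taken from the extension's config schema. The pairing gate is not modeled.

```typescript
// Hypothetical policy values; the real config keys may differ.
type DmPolicy = "open" | "disabled";
type GroupPolicy = "open" | "disabled" | "allowlist";

interface AccessConfig {
  dmPolicy: DmPolicy;
  groupPolicy: GroupPolicy;
  groupAllowlist: ReadonlySet<string>; // group JIDs permitted when groupPolicy is "allowlist"
  requireMentionInGroups: boolean;
}

interface InboundContext {
  chatJid: string;
  isGroup: boolean;
  mentionsBot: boolean;
}

type Decision =
  | "handle"
  | "drop:dm-disabled"
  | "drop:group-disabled"
  | "drop:not-allowlisted"
  | "drop:no-mention";

function shouldHandleInbound(cfg: AccessConfig, msg: InboundContext): Decision {
  if (!msg.isGroup) {
    return cfg.dmPolicy === "open" ? "handle" : "drop:dm-disabled";
  }
  if (cfg.groupPolicy === "disabled") return "drop:group-disabled";
  if (cfg.groupPolicy === "allowlist" && !cfg.groupAllowlist.has(msg.chatJid)) {
    return "drop:not-allowlisted";
  }
  // Mention gating applies only after the group itself is permitted.
  if (cfg.requireMentionInGroups && !msg.mentionsBot) return "drop:no-mention";
  return "handle";
}
```

Returning a named drop reason, rather than a bare boolean, lets a scenario assert why a message was ignored. For example, an allowlist block and a missing mention both produce no reply.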

Contact-card and sticker send support are part of the expanded QA driver/send API surface, but they are covered by focused driver/send API tests rather than current live scenario IDs.

Default Lane Behavior

live-frontier remains intentionally small at 8 default scenarios for fast live smoke coverage.
mock-openai runs 29 deterministic WhatsApp scenarios by default.
Approval scenarios and a few heavier/blocking checks remain explicit-only unless selected by scenario ID.
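Per-provider-mode defaults with explicit-only scenarios can be sketched as a simple catalog filter. This is a hypothetical illustration. `ScenarioDef`, `selectScenarios`, and the field name `defaultProviderModes` used here are assumptions modeled on the review notes. The real selection also consults `standardId`, which this sketch omits.

```typescript
type ProviderMode = "live-frontier" | "mock-openai";

interface ScenarioDef {
  id: string;
  // Provider modes in which this scenario runs without being named explicitly.
  // An empty list means explicit-only (for example, the native approval scenarios).
  defaultProviderModes: readonly ProviderMode[];
}

function selectScenarios(
  catalog: readonly ScenarioDef[],
  mode: ProviderMode,
  explicitIds: readonly string[] = [],
): ScenarioDef[] {
  if (explicitIds.length > 0) {
    // Explicit --scenario selection bypasses the per-mode defaults entirely.
    const byId = new Map(catalog.map((s) => [s.id, s] as const));
    return explicitIds.map((id) => {
      const def = byId.get(id);
      if (!def) throw new Error(`unknown scenario: ${id}`);
      return def;
    });
  }
  return catalog.filter((s) => s.defaultProviderModes.includes(mode));
}
```

Under this model, a catalog of 35 entries where 8 list `live-frontier` and 29 list `mock-openai` would reproduce the default counts above. Approval scenarios stay reachable only through explicit IDs.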

The mocked-provider scenarios still run through the real WhatsApp transport; only the model/provider response is mocked so the lane can assert exact markers, chunk counts, and structured behavior without frontier-model variance.
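Scripted provider replies make marker and chunk-count assertions deterministic. The sketch below shows the idea with a naive fixed-width splitter. The marker strings, `SCRIPT`, `scriptedReply`, `chunkText`, and the chunk size are hypothetical. The real mock-openai server and WhatsApp chunker are not reproduced here; a production chunker would likely prefer paragraph or sentence boundaries.

```typescript
// Hypothetical script: when the inbound prompt contains a marker, reply with fixed text.
const SCRIPT: ReadonlyArray<{ marker: string; reply: string }> = [
  { marker: "QA-CANARY", reply: "QA-CANARY-OK" },
  { marker: "QA-LONG-FINAL", reply: "x".repeat(2500) },
];

function scriptedReply(prompt: string): string | undefined {
  return SCRIPT.find((entry) => prompt.includes(entry.marker))?.reply;
}

// Split text into consecutive chunks of at most maxChars characters.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// With a fixed reply length, the expected chunk count is known before the run starts.
function expectedChunkCount(text: string, maxChars: number): number {
  return Math.ceil(text.length / maxChars);
}
```

Because the reply text is fixed, the test can assert an exact count (here, 3 chunks for 2500 characters at 1000 per chunk). A frontier model's varying reply length would not allow this.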

Verification

WhatsApp QA proof on the local HEAD after the history rewrite:

Branch/head tested: feat/whatsapp-qa-scenarios @ 8b17055
Base: origin/main @ 6da3b1f
Command:
OPENCLAW_QA_ALLOW_INSECURE_HTTP=1 \
OPENCLAW_QA_CONVEX_SITE_URL=http://127.0.0.1:<local-broker> \
OPENCLAW_QA_CONVEX_SECRET_MAINTAINER=<redacted> \
OPENCLAW_QA_WHATSAPP_GROUP_JID=<configured/redacted> \
pnpm openclaw qa whatsapp \
  --credential-source convex \
  --credential-role maintainer \
  --provider-mode mock-openai \
  --sut-account work \
  --output-dir .artifacts/qa-e2e/whatsapp-full-current-8b17055ef7-20260605-213904 \
  --scenario whatsapp-canary \
  --scenario whatsapp-pairing-block \
  --scenario whatsapp-mention-gating \
  --scenario whatsapp-top-level-reply-shape \
  --scenario whatsapp-restart-resume \
  --scenario whatsapp-help-command \
  --scenario whatsapp-status-command \
  --scenario whatsapp-commands-command \
  --scenario whatsapp-tools-compact-command \
  --scenario whatsapp-whoami-command \
  --scenario whatsapp-context-command \
  --scenario whatsapp-tool-only-usage-footer \
  --scenario whatsapp-reply-to-message \
  --scenario whatsapp-reply-context-isolation \
  --scenario whatsapp-inbound-image-caption \
  --scenario whatsapp-audio-preflight \
  --scenario whatsapp-outbound-media-matrix \
  --scenario whatsapp-outbound-document-preserves-filename \
  --scenario whatsapp-outbound-poll \
  --scenario whatsapp-message-actions \
  --scenario whatsapp-inbound-structured-messages \
  --scenario whatsapp-group-audio-gating \
  --scenario whatsapp-access-control-dm-open \
  --scenario whatsapp-access-control-dm-disabled \
  --scenario whatsapp-access-control-group-open \
  --scenario whatsapp-access-control-group-disabled \
  --scenario whatsapp-reply-delivery-shape \
  --scenario whatsapp-stream-final-message-accounting \
  --scenario whatsapp-native-new-command \
  --scenario whatsapp-approval-exec-deny-native \
  --scenario whatsapp-status-reactions \
  --scenario whatsapp-group-allowlist-block \
  --scenario whatsapp-approval-exec-native \
  --scenario whatsapp-approval-exec-reaction-native \
  --scenario whatsapp-approval-plugin-native
Provider mode: mock-openai
Driver account: default
SUT account: work
QA group: <configured/redacted>
Result: 35 passed / 0 failed / 0 skipped, discovered 35 scenarios
Artifacts:
- .artifacts/qa-e2e/whatsapp-full-current-8b17055ef7-20260605-213904/whatsapp-qa-report.md
- .artifacts/qa-e2e/whatsapp-full-current-8b17055ef7-20260605-213904/whatsapp-qa-summary.json
- .artifacts/qa-e2e/whatsapp-full-current-8b17055ef7-20260605-213904/whatsapp-qa-observed-messages.json
- .artifacts/qa-e2e/whatsapp-full-current-8b17055ef7-20260605-213904/gateway-debug/ (not created; preserved only on scenario failure)
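The result line and the gateway-debug retention rule imply two simple invariants. These can be sketched as follows. The `QaSummary` field names are hypothetical and are not the actual schema of `whatsapp-qa-summary.json`. The functions only restate the policies described above.

```typescript
// Hypothetical summary shape; the real whatsapp-qa-summary.json schema may differ.
interface QaSummary {
  discovered: number;
  passed: number;
  failed: number;
  skipped: number;
}

// A run is complete when every discovered scenario ended in exactly one outcome.
function isComplete(s: QaSummary): boolean {
  return s.passed + s.failed + s.skipped === s.discovered;
}

// Mirrors the stated artifact policy: gateway-debug/ is preserved only on scenario failure.
function shouldPreserveGatewayDebug(s: QaSummary): boolean {
  return s.failed > 0;
}

// Render the one-line result in the same format as the report above.
function formatResult(s: QaSummary): string {
  return `${s.passed} passed / ${s.failed} failed / ${s.skipped} skipped, discovered ${s.discovered} scenarios`;
}
```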

Focused local checks:

pnpm test extensions/qa-lab/src/providers/mock-openai/server.test.ts extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.test.ts extensions/whatsapp/src/qa-driver.runtime.test.ts extensions/whatsapp/src/inbound/send-api.test.ts
pnpm test extensions/qa-lab/src/live-transports/shared/live-gateway.runtime.test.ts
pnpm tsgo:extensions
pnpm lint:extensions
git diff --check origin/main..HEAD

clawsweeper · 2026-06-04T23:38:04Z

Codex review: needs maintainer review before merge. Reviewed June 7, 2026, 10:32 PM ET / 02:32 UTC.

Summary
This PR expands the WhatsApp live QA lane, QA driver/send surface, mock-openai scripted responses, tests, and QA docs to cover 35 WhatsApp scenarios.

PR surface: Source +2443, Tests +1814, Docs +32. Total +4289 across 18 files.

Reproducibility: not applicable, as this is a feature PR. Current main lacks the new scenario IDs, while the PR body provides a real WhatsApp mock-openai lane result of 35 passed and focused local checks.

Review metrics: 1 noteworthy metric.

WhatsApp QA defaults: live-frontier 8 default scenarios; mock-openai 29 default scenarios; 35 total catalog. This is the maintainer-visible behavior change that can affect routine WhatsApp QA runtime and operator expectations before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
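The "weaker of proof and patch quality" rule is a minimum over an ordered ladder. The ladder order below is taken from the rank legend in this comment. `overallRank` is a hypothetical illustration of the stated rule, not ClawSweeper's implementation.

```typescript
// Rank ladder from the legend, ordered weakest to strongest.
// "off-meta tidepool" is excluded because it means the rating does not apply.
const LADDER = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof LADDER)[number];

// The overall rating follows whichever of the two inputs is lower on the ladder.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return LADDER.indexOf(proof) <= LADDER.indexOf(patchQuality) ? proof : patchQuality;
}
```

For this PR, both inputs are "platinum hermit", so the overall rating is also "platinum hermit". A "diamond lobster" patch with "gold shrimp" proof would be capped at "gold shrimp".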

Rank-up moves:

Let the relevant head checks finish before landing.
Confirm maintainers accept the mock-openai WhatsApp default lane expanding to 29 scenarios.

Risk before merge

[P1] The mock-openai WhatsApp default lane grows to 29 scenarios and still uses the real WhatsApp transport, so existing maintainer/operator QA runs may become longer or more sensitive to account and group-state stability.
[P1] The PR touches WhatsApp send and observed-message classification paths used by live channel behavior; no concrete bug surfaced, but missed Baileys shape drift could affect message-delivery proof quality.
[P1] The branch is maintainer-labeled and size XL, so landing should wait for owner review of the broad QA/default-lane tradeoff rather than treating green unit checks as the whole decision.

Maintainer options:

Land after QA owner signoff (recommended)
Maintain the expanded defaults and live-driver surface if WhatsApp/QA owners accept the longer deterministic lane and current head validation is satisfactory.
Trim defaults before merge
If the default mock-openai runtime is too broad, keep the new scenarios selectable but move heavier cases to explicit scenario IDs before landing.
Pause for live-account stability
If the WhatsApp credential pool or group state is not stable enough for this broader lane, hold the PR until the operator runbook and account lease behavior are ready.

Next step before merge

[P2] Protected maintainer PR with broad live QA/default-lane changes; the remaining action is owner review and validation, not a narrow automated repair.

Security
Cleared: No concrete security or supply-chain regression was found; the diff does not change dependencies, lockfiles, workflows, permissions, or secret handling beyond redacted QA artifact behavior.

Review details

Best possible solution:

Keep this PR as the canonical WhatsApp QA expansion, then land it only after maintainers accept the broader default lane and the relevant head checks/proof are satisfactory.

Do we have a high-confidence way to reproduce the issue?

Not applicable, as this is a feature PR. Current main lacks the new scenario IDs, while the PR body provides a real WhatsApp mock-openai lane result of 35 passed and focused local checks.

Is this the best way to solve the issue?

Yes, the direction fits the owner boundary: QA Lab imports WhatsApp through @openclaw/whatsapp/api.js and the production listener contract stays narrow. The remaining decision is whether the broader default lane cost is acceptable.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 4780546c124d.

Label changes


add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external contributor proof gate is not applicable because this is a maintainer/MEMBER PR, though the body includes live WhatsApp lane output and focused checks.
remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

P2: This is a normal-priority WhatsApp QA and extension improvement with limited product blast radius but meaningful review scope.
merge-risk: 🚨 automation: The PR changes the live WhatsApp QA lane, mock-provider scripting, scenario defaults, and artifacts that maintainer automation may rely on.
merge-risk: 🚨 compatibility: The PR changes default WhatsApp QA lane behavior for operators, especially the broader mock-openai default scenario set.
merge-risk: 🚨 message-delivery: The PR touches WhatsApp send API helpers and observed-message classification used to prove transport delivery.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external contributor proof gate is not applicable because this is a maintainer/MEMBER PR, though the body includes live WhatsApp lane output and focused checks.

Evidence reviewed

PR surface:

Source +2443, Tests +1814, Docs +32. Total +4289 across 18 files.


Area	Files	Added	Removed	Net
Source	11	2645	202	+2443
Tests	6	1828	14	+1814
Docs	1	62	30	+32
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	18	4535	246	+4289
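The table can be cross-checked mechanically. This small sketch re-derives the Total row from the per-area rows and checks that each Net equals Added minus Removed. The Config, Generated, and Other rows are all zero and are omitted. `SurfaceRow`, `rowIsConsistent`, and `totals` are illustrative names.

```typescript
interface SurfaceRow {
  area: string;
  files: number;
  added: number;
  removed: number;
  net: number;
}

// Non-zero rows copied from the table above.
const rows: SurfaceRow[] = [
  { area: "Source", files: 11, added: 2645, removed: 202, net: 2443 },
  { area: "Tests", files: 6, added: 1828, removed: 14, net: 1814 },
  { area: "Docs", files: 1, added: 62, removed: 30, net: 32 },
];

// Each row's Net must equal Added minus Removed.
function rowIsConsistent(r: SurfaceRow): boolean {
  return r.added - r.removed === r.net;
}

// Sum every numeric column to reproduce the Total row.
function totals(rs: readonly SurfaceRow[]) {
  return rs.reduce(
    (t, r) => ({
      files: t.files + r.files,
      added: t.added + r.added,
      removed: t.removed + r.removed,
      net: t.net + r.net,
    }),
    { files: 0, added: 0, removed: 0, net: 0 },
  );
}
```

Summing the rows gives 18 files, 4535 added, 246 removed, and +4289 net, which matches the Total row.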

What I checked:

Root policy read: Read the full root AGENTS.md and applied its maintainer/protected-label, extension-boundary, docs, and merge-risk guidance. (AGENTS.md:1, 4780546c124d)
Scoped policy read: Read the scoped extensions and docs AGENTS.md files; the PR stays within the extension public-boundary model and uses root-relative Mintlify docs links. (extensions/AGENTS.md:1, 4780546c124d)
Current main lacks the central scenario expansion: Current main has no matches for the new scenario IDs such as whatsapp-outbound-media-matrix, whatsapp-inbound-structured-messages, whatsapp-stream-final-message-accounting, or whatsapp-group-allowlist-block. (4780546c124d)
PR branch adds the expanded catalog: The PR head defines the expanded WhatsApp scenario union and WHATSAPP_QA_SCENARIOS catalog, including the new structured inbound, outbound media, stream-final, and allowlist scenarios. (extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.ts:55, 25e5c0bef7a8)
Default lane behavior changed: The PR selects default scenarios from standardId plus defaultProviderModes, making mock-openai the broader deterministic default while keeping live-frontier smaller. (extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.ts:1488, 25e5c0bef7a8)
WhatsApp send surface inspected: The PR adds contact, location, and sticker helpers through createWebSendApi while keeping ActiveWebListener narrowed to the existing sendMessage/sendPoll/sendReaction/sendComposingTo runtime surface. (extensions/whatsapp/src/inbound/send-api.ts:202, 25e5c0bef7a8)
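Keeping the listener contract narrow while the send API grows can be expressed in TypeScript with `Pick`. This is a hypothetical sketch. The method names `sendContact`, `sendLocation`, and `sendSticker`, all signatures, and `toListenerSurface` are assumptions. Only the four pre-existing method names come from the review note above.

```typescript
// Hypothetical payloads; the real send API types in extensions/whatsapp are not reproduced.
interface ContactCard { displayName: string; phone: string }
interface LocationPin { latitude: number; longitude: number; name?: string }

// A broad send surface, loosely modeled on what createWebSendApi is described as exposing.
interface WebSendApi {
  sendMessage(to: string, text: string): Promise<string>;
  sendPoll(to: string, question: string, options: string[]): Promise<string>;
  sendReaction(to: string, messageId: string, emoji: string): Promise<void>;
  sendComposingTo(to: string): Promise<void>;
  sendContact(to: string, contact: ContactCard): Promise<string>;
  sendLocation(to: string, location: LocationPin): Promise<string>;
  sendSticker(to: string, sticker: Uint8Array): Promise<string>;
}

// The runtime listener contract exposes only the four pre-existing methods.
type ActiveWebListenerSurface = Pick<
  WebSendApi,
  "sendMessage" | "sendPoll" | "sendReaction" | "sendComposingTo"
>;

// Copy only the allowed methods so the extra helpers are not reachable at runtime either.
function toListenerSurface(api: WebSendApi): ActiveWebListenerSurface {
  return {
    sendMessage: api.sendMessage.bind(api),
    sendPoll: api.sendPoll.bind(api),
    sendReaction: api.sendReaction.bind(api),
    sendComposingTo: api.sendComposingTo.bind(api),
  };
}
```

`Pick` narrows the surface only at compile time. Copying the four methods into a fresh object also keeps the QA-only helpers off the runtime listener.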

Likely related people:

Vincent Koc: Current main blame and -S history for the WhatsApp QA lane, QA driver, send API, and mock provider all route through the recent baseline/import commits that currently own these files. (role: recent area contributor; confidence: medium; commits: 0f855ea71acc, 2e08f0f4221f; files: extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.ts, extensions/whatsapp/src/qa-driver.runtime.ts, extensions/whatsapp/src/inbound/send-api.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
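The command rules above amount to a small permission table. The sketch below encodes them. The `Role` names, the `COMMANDS` table, `parseCommand`, and `authorize` are hypothetical and are not ClawSweeper's actual implementation. "writer" stands for a user with repository write access.

```typescript
type Role = "author" | "writer" | "maintainer" | "other";
type CommandKind = "fresh-review" | "repair-or-merge" | "control";

const ANYONE_WITH_ACCESS: readonly Role[] = ["author", "writer", "maintainer"];
const MAINTAINERS_ONLY: readonly Role[] = ["maintainer"];

// Command table derived from the rules listed above.
const COMMANDS = new Map<string, { kind: CommandKind; allowed: readonly Role[] }>([
  ["re-review", { kind: "fresh-review", allowed: ANYONE_WITH_ACCESS }],
  ["re-run", { kind: "fresh-review", allowed: ANYONE_WITH_ACCESS }],
  ["review", { kind: "fresh-review", allowed: MAINTAINERS_ONLY }],
  ["autofix", { kind: "repair-or-merge", allowed: MAINTAINERS_ONLY }],
  ["automerge", { kind: "repair-or-merge", allowed: MAINTAINERS_ONLY }],
  ["fix ci", { kind: "repair-or-merge", allowed: MAINTAINERS_ONLY }],
  ["address review", { kind: "repair-or-merge", allowed: MAINTAINERS_ONLY }],
  ["explain", { kind: "control", allowed: MAINTAINERS_ONLY }],
  ["stop", { kind: "control", allowed: MAINTAINERS_ONLY }],
]);

// Extract the command after "@clawsweeper", normalizing case and inner whitespace.
function parseCommand(body: string): string | undefined {
  const match = /^@clawsweeper\s+(.+)$/i.exec(body.trim());
  return match?.[1]?.trim().replace(/\s+/g, " ").toLowerCase();
}

// Return the command and its kind if the commenter's role may run it, else undefined.
function authorize(body: string, role: Role): { command: string; kind: CommandKind } | undefined {
  const command = parseCommand(body);
  if (command === undefined) return undefined;
  const entry = COMMANDS.get(command);
  if (entry === undefined || !entry.allowed.includes(role)) return undefined;
  return { command, kind: entry.kind };
}
```

Classifying each command by kind keeps fresh-review requests separate from repair and merge flows. A fresh-review command therefore never starts an autofix or automerge.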

mcaxtr · 2026-06-08T01:04:24Z

@clawsweeper re-review

clawsweeper · 2026-06-08T01:04:26Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Failed
Detail: The targeted re-review did not finish cleanly. Check the workflow run for details.
Run: https://github.com/openclaw/clawsweeper/actions/runs/27110390326
Updated: 2026-06-08T01:17:09.245Z

openclaw-barnacle Bot added the docs, channel: whatsapp-web, extensions: qa-lab, size: XL, and maintainer labels on Jun 4, 2026

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch from 2acaf5d to d86971d on June 4, 2026 23:41

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch from d86971d to 65f008d on June 4, 2026 23:55

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch 2 times, most recently from 0c3e0d4 to 373b051 on June 5, 2026 01:00

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch from cff84f5 to acc68bb on June 5, 2026 05:10

clawsweeper Bot temporarily deployed to qa-live-shared on June 6, 2026 01:54 (inactive)

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch from aedbe8a to 0157cd7 on June 6, 2026 02:01

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch from 0157cd7 to 6ab719b on June 6, 2026 03:58

mcaxtr force-pushed the feat/whatsapp-qa-scenarios branch 5 times, most recently from cc3945d to 4e1ec22 on June 8, 2026 00:23

mcaxtr added 4 commits on June 7, 2026 23:17:
- feat(whatsapp): expand qa driver message support (d6d7a6c)
- feat(qa-lab): add deterministic whatsapp mock replies (3ff8303)
- feat(qa-lab): expand whatsapp live qa scenarios (7df9de1)
- docs(qa): document whatsapp live qa coverage (25e5c0b)

github-actions Bot mentioned this pull request Jun 8, 2026

📡 Upstream Digest — 2026-06-08 08:46 UTC curtismercier/openclaw-mods#1037

Open

clawsweeper Bot mentioned this pull request Jun 10, 2026

Feature Request: WhatsApp sticker send support #7476

Open


mcaxtr merged 4 commits into main from feat/whatsapp-qa-scenarios
