feat(whatsapp): expand live QA coverage #90480
Codex review: needs maintainer review before merge. Reviewed June 7, 2026, 10:32 PM ET / 02:32 UTC.
Summary: PR surface: Source +2443, Tests +1814, Docs +32; total +4289 across 18 files.
Reproducibility: not applicable, as this is a feature PR. Current main lacks the new scenario IDs, while the PR body provides a real WhatsApp mock-openai lane result of 35 passed plus focused local checks.
Review metrics: 1 noteworthy metric.
Merge readiness: overall readiness follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
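The "weaker of the two" rule is a minimum over an ordered scale. A minimal sketch, assuming a hypothetical five-step rank scale (the review's actual crustacean rank names and ordering are not shown here):

```typescript
// Hypothetical rank scale, lowest to highest; the real ClawSweeper ranks may differ.
const RANKS = ["none", "weak", "fair", "strong", "shiny"] as const;
type Rank = (typeof RANKS)[number];

// Overall readiness is capped by whichever input sits lower on the scale.
function overallReadiness(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// A strong patch with weak proof is capped at "weak".
console.log(overallReadiness("weak", "strong")); // weak
```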
Review details
Best possible solution: Keep this PR as the canonical WhatsApp QA expansion, then land it only after maintainers accept the broader default lane and the relevant head checks and proof are satisfactory.
Do we have a high-confidence way to reproduce the issue? Not applicable, as this is a feature PR. Current main lacks the new scenario IDs, while the PR body provides a real WhatsApp mock-openai lane result of 35 passed plus focused local checks.
Is this the best way to solve the issue? Yes. The direction fits the owner boundary: QA Lab imports WhatsApp through @openclaw/whatsapp/api.js, and the production listener contract stays narrow. The remaining decision is whether the cost of the broader default lane is acceptable.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 4780546c124d.
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Branch updates: 2acaf5d to d86971d; d86971d to 65f008d; 0c3e0d4 to 373b051; cff84f5 to acc68bb; aedbe8a to 0157cd7; 0157cd7 to 6ab719b; cc3945d to 4e1ec22.
@clawsweeper re-review

🦞🧹 I asked ClawSweeper to review this item again.
Summary
Expands the WhatsApp live QA lane from a small smoke set into a broader regression lane for the extension's transport, Gateway, and WhatsApp Web integration boundaries.
This PR also expands the WhatsApp QA driver/send surface and mock-provider support so the lane can exercise structured WhatsApp capabilities deterministically instead of relying on private test-only hooks or frontier-model wording.
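Scripted provider responses are what make exact assertions possible: the reply text is fixed, so the marker and the number of outbound chunks are known in advance. A minimal sketch, assuming a hypothetical scripted-reply table, marker string, and a fixed 100-character chunk limit (the real mock-openai server and WhatsApp chunking rules are not shown in this PR):

```typescript
// Hypothetical sketch: the key, marker, and limit are illustrative assumptions,
// not the qa-lab mock-openai API.
const CHUNK_LIMIT = 100; // assumed maximum characters per outbound message

// Deterministic reply keyed by a hypothetical scenario name.
const scriptedReplies = {
  "long-final": "QA-MARKER-LONG-FINAL " + "x".repeat(230),
};

// Split text into fixed-size chunks, as a transport might before sending.
function chunkText(text: string, limit: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += limit) {
    chunks.push(text.slice(i, i + limit));
  }
  return chunks;
}

const reply = scriptedReplies["long-final"]; // 251 characters
const chunks = chunkText(reply, CHUNK_LIMIT);
// Because the reply is scripted, the scenario can assert exact values.
console.log(chunks.length); // 3
console.log(chunks[0]?.startsWith("QA-MARKER-LONG-FINAL")); // true
```

With a frontier model the wording and length would vary between runs, so neither the marker nor the chunk count could be asserted exactly.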
What Changed
- observedAfter
- send
- poll
- message.action
- mock-openai support for WhatsApp-specific scripted responses, including audio-preflight markers and long final/chunked response checks.
- whatsapp-group-allowlist-block is the WhatsApp allowlist-block standard scenario.
- docs/concepts/qa-e2e-automation.md.

WhatsApp QA Coverage
The WhatsApp QA catalog now covers 35 scenarios total.
Scenario families covered in this branch include:
- /help, /status, /commands, /tools compact, /whoami, /context list, /new
- message.action react
- message.action upload-file

Contact-card and sticker send support are part of the expanded QA driver/send API surface, but they are covered by focused driver/send API tests rather than by current live scenario IDs.
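The command family above lines up with scenario IDs from the Verification run. A minimal sketch of a coverage check over that family; the command-to-ID mapping is inferred from the ID names, and the `commandScenarios` table and `uncoveredCommands` helper are hypothetical, not the qa-lab catalog format:

```typescript
// Mapping inferred from scenario ID names; the real catalog format may differ.
const commandScenarios: Record<string, string> = {
  "/help": "whatsapp-help-command",
  "/status": "whatsapp-status-command",
  "/commands": "whatsapp-commands-command",
  "/tools compact": "whatsapp-tools-compact-command",
  "/whoami": "whatsapp-whoami-command",
  "/context list": "whatsapp-context-command",
  "/new": "whatsapp-native-new-command",
};

// Return the commands whose scenario ID is missing from the given catalog.
function uncoveredCommands(catalog: Set<string>): string[] {
  return Object.entries(commandScenarios)
    .filter(([, id]) => !catalog.has(id))
    .map(([command]) => command);
}

// Example catalog deliberately missing the /new scenario.
const partialCatalog = new Set([
  "whatsapp-help-command",
  "whatsapp-status-command",
  "whatsapp-commands-command",
  "whatsapp-tools-compact-command",
  "whatsapp-whoami-command",
  "whatsapp-context-command",
]);
console.log(uncoveredCommands(partialCatalog)); // logs the single gap: /new
```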
Default Lane Behavior
- live-frontier remains intentionally small, with 8 default scenarios for fast live smoke coverage.
- mock-openai runs 29 deterministic WhatsApp scenarios by default.

The mocked-provider scenarios still run through the real WhatsApp transport; only the model/provider response is mocked, so the lane can assert exact markers, chunk counts, and structured behavior without frontier-model variance.
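Per-mode defaults can be modeled as a membership list on each catalog entry. A minimal sketch, assuming a hypothetical `defaultIn` field and that explicitly requested scenarios replace the default lane (the Verification run passes 35 explicit IDs, more than either default). The entry IDs are placeholders, not real scenarios:

```typescript
type ProviderMode = "live-frontier" | "mock-openai";

interface ScenarioEntry {
  id: string;
  defaultIn: ProviderMode[]; // modes that run this entry when no scenario is requested
}

// Placeholder entries: IDs and lane membership are illustrative only.
const entries: ScenarioEntry[] = [
  { id: "example-smoke", defaultIn: ["live-frontier", "mock-openai"] },
  { id: "example-exact-markers", defaultIn: ["mock-openai"] },
  { id: "example-opt-in", defaultIn: [] },
];

// Assumption: explicit IDs win; otherwise fall back to the mode's default lane.
function selectScenarios(mode: ProviderMode, explicit: string[] = []): string[] {
  if (explicit.length > 0) return explicit;
  return entries.filter((e) => e.defaultIn.includes(mode)).map((e) => e.id);
}

console.log(selectScenarios("live-frontier")); // one entry: example-smoke
console.log(selectScenarios("mock-openai")); // example-smoke, example-exact-markers
```

Keeping the live lane a strict subset of cheap smoke checks while the mocked lane carries the deterministic bulk matches the 8 versus 29 split described above.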
Verification
WhatsApp QA proof on local rewritten HEAD:
OPENCLAW_QA_ALLOW_INSECURE_HTTP=1 \
OPENCLAW_QA_CONVEX_SITE_URL=http://127.0.0.1:<local-broker> \
OPENCLAW_QA_CONVEX_SECRET_MAINTAINER=<redacted> \
OPENCLAW_QA_WHATSAPP_GROUP_JID=<configured/redacted> \
pnpm openclaw qa whatsapp \
  --credential-source convex \
  --credential-role maintainer \
  --provider-mode mock-openai \
  --sut-account work \
  --output-dir .artifacts/qa-e2e/whatsapp-full-current-8b17055ef7-20260605-213904 \
  --scenario whatsapp-canary \
  --scenario whatsapp-pairing-block \
  --scenario whatsapp-mention-gating \
  --scenario whatsapp-top-level-reply-shape \
  --scenario whatsapp-restart-resume \
  --scenario whatsapp-help-command \
  --scenario whatsapp-status-command \
  --scenario whatsapp-commands-command \
  --scenario whatsapp-tools-compact-command \
  --scenario whatsapp-whoami-command \
  --scenario whatsapp-context-command \
  --scenario whatsapp-tool-only-usage-footer \
  --scenario whatsapp-reply-to-message \
  --scenario whatsapp-reply-context-isolation \
  --scenario whatsapp-inbound-image-caption \
  --scenario whatsapp-audio-preflight \
  --scenario whatsapp-outbound-media-matrix \
  --scenario whatsapp-outbound-document-preserves-filename \
  --scenario whatsapp-outbound-poll \
  --scenario whatsapp-message-actions \
  --scenario whatsapp-inbound-structured-messages \
  --scenario whatsapp-group-audio-gating \
  --scenario whatsapp-access-control-dm-open \
  --scenario whatsapp-access-control-dm-disabled \
  --scenario whatsapp-access-control-group-open \
  --scenario whatsapp-access-control-group-disabled \
  --scenario whatsapp-reply-delivery-shape \
  --scenario whatsapp-stream-final-message-accounting \
  --scenario whatsapp-native-new-command \
  --scenario whatsapp-approval-exec-deny-native \
  --scenario whatsapp-status-reactions \
  --scenario whatsapp-group-allowlist-block \
  --scenario whatsapp-approval-exec-native \
  --scenario whatsapp-approval-exec-reaction-native \
  --scenario whatsapp-approval-plugin-native

Focused local checks:
pnpm test extensions/qa-lab/src/providers/mock-openai/server.test.ts extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.test.ts extensions/whatsapp/src/qa-driver.runtime.test.ts extensions/whatsapp/src/inbound/send-api.test.ts
pnpm test extensions/qa-lab/src/live-transports/shared/live-gateway.runtime.test.ts
pnpm tsgo:extensions
pnpm lint:extensions
git diff --check origin/main..HEAD
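The full-lane proof command repeats --scenario once per ID, which is easy to get wrong by hand across 35 entries. A minimal sketch of building that argument list; the flag names come from the proof command, while `buildQaArgs` and its duplicate check are hypothetical and not part of openclaw:

```typescript
// Hypothetical helper: flags are taken from the proof command; the function is illustrative.
function buildQaArgs(providerMode: string, scenarioIds: string[]): string[] {
  // Reject duplicates so a long hand-maintained list cannot run a scenario twice.
  if (new Set(scenarioIds).size !== scenarioIds.length) {
    throw new Error("duplicate scenario ID in list");
  }
  const args = ["openclaw", "qa", "whatsapp", "--provider-mode", providerMode];
  for (const id of scenarioIds) {
    // One --scenario flag per ID, repeated.
    args.push("--scenario", id);
  }
  return args;
}

const qaArgs = buildQaArgs("mock-openai", [
  "whatsapp-canary",
  "whatsapp-pairing-block",
  "whatsapp-mention-gating",
]);
console.log(["pnpm", ...qaArgs].join(" "));
```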