Commit 181238f

feat(whatsapp): expand live QA coverage (#90480)

* feat(whatsapp): expand qa driver message support
* feat(qa-lab): add deterministic whatsapp mock replies
* feat(qa-lab): expand whatsapp live qa scenarios
* docs(qa): document whatsapp live qa coverage
1 parent 4780546 commit 181238f

18 files changed

Lines changed: 4535 additions & 246 deletions

docs/concepts/qa-e2e-automation.md

Lines changed: 62 additions & 30 deletions
@@ -48,6 +48,7 @@ script aliases; both forms are supported.
 | `qa telegram` | Live transport lane against a real private Telegram group. |
 | `qa discord` | Live transport lane against a real private Discord guild channel. |
 | `qa slack` | Live transport lane against a real private Slack channel. |
+| `qa whatsapp` | Live transport lane against real WhatsApp Web accounts. |
 | `qa mantis` | Before and after verification runner for live transport bugs, with Discord status-reactions evidence, Crabbox desktop/browser smoke, and Slack-in-VNC smoke. See [Mantis](/concepts/mantis) and [Mantis Slack Desktop Runbook](/concepts/mantis-slack-desktop-runbook). |

 ## Operator flow
@@ -168,15 +169,16 @@ decision still comes from the Discord REST oracle.

 CI uses the same command surface in `.github/workflows/qa-live-transports-convex.yml`. Scheduled and default manual runs execute the fast Matrix profile with live frontier credentials, `--fast`, and `OPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000`. Manual `matrix_profile=all` fans out into the five profile shards so the exhaustive catalog can run in parallel while keeping one artifact directory per shard.

-For transport-real Telegram, Discord, and Slack smoke lanes:
+For transport-real Telegram, Discord, Slack, and WhatsApp smoke lanes:

 ```bash
 pnpm openclaw qa telegram
 pnpm openclaw qa discord
 pnpm openclaw qa slack
+pnpm openclaw qa whatsapp
 ```

-They target a pre-existing real channel with two bots (driver + SUT). Required env vars, scenario lists, output artifacts, and the Convex credential pool are documented in [Telegram, Discord, and Slack QA reference](#telegram-discord-and-slack-qa-reference) below.
+They target a pre-existing real channel with two bots or accounts (driver + SUT). Required env vars, scenario lists, output artifacts, and the Convex credential pool are documented in [Telegram, Discord, Slack, and WhatsApp QA reference](#telegram-discord-slack-and-whatsapp-qa-reference) below.

 For a full Slack desktop VM run with VNC rescue, run:

@@ -276,10 +278,10 @@ coverage helpers, and scenario-selection helper from
 | Telegram | x | x | x | | | | | | | x | |
 | Discord | x | x | x | | | | | | | | x |
 | Slack | x | x | x | x | x | x | x | x | | | |
+| WhatsApp | x | x | | x | x | x | | | x | x | |

 This keeps `qa-channel` as the broad product-behavior suite while Matrix,
-Telegram, and future live transports share one explicit transport-contract
-checklist.
+Telegram, and other live transports share one explicit transport-contract checklist.

 For a disposable Linux VM lane without bringing Docker into the QA path, run:

@@ -308,25 +310,25 @@ guest: env-based provider keys, the QA live provider config path, and
 `CODEX_HOME` when present. Keep `--output-dir` under the repo root so the guest
 can write back through the mounted workspace.

-## Telegram, Discord, and Slack QA reference
+## Telegram, Discord, Slack, and WhatsApp QA reference

-Matrix has a [dedicated page](/concepts/qa-matrix) because of its scenario count and Docker-backed homeserver provisioning. Telegram, Discord, and Slack are smaller - a handful of scenarios each, no profile system, against pre-existing real channels - so their reference lives here.
+Matrix has a [dedicated page](/concepts/qa-matrix) because of its scenario count and Docker-backed homeserver provisioning. Telegram, Discord, Slack, and WhatsApp run against pre-existing real transports, so their reference lives here.

 ### Shared CLI flags

 These lanes register through `extensions/qa-lab/src/live-transports/shared/live-transport-cli.ts` and accept the same flags:

-| Flag | Default | Description |
-| ------------------------------------- | --------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
-| `--scenario <id>` | - | Run only this scenario. Repeatable. |
-| `--output-dir <path>` | `<repo>/.artifacts/qa-e2e/{telegram,discord,slack}-<timestamp>` | Where reports/summary/observed messages and the output log are written. Relative paths resolve against `--repo-root`. |
-| `--repo-root <path>` | `process.cwd()` | Repository root when invoking from a neutral cwd. |
-| `--sut-account <id>` | `sut` | Temporary account id inside the QA gateway config. |
-| `--provider-mode <mode>` | `live-frontier` | `mock-openai` or `live-frontier` (legacy `live-openai` still works). |
-| `--model <ref>` / `--alt-model <ref>` | provider default | Primary/alternate model refs. |
-| `--fast` | off | Provider fast mode where supported. |
-| `--credential-source <env\|convex>` | `env` | See [Convex credential pool](#convex-credential-pool). |
-| `--credential-role <maintainer\|ci>` | `ci` in CI, `maintainer` otherwise | Role used when `--credential-source convex`. |
+| Flag | Default | Description |
+| ------------------------------------- | -------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
+| `--scenario <id>` | - | Run only this scenario. Repeatable. |
+| `--output-dir <path>` | `<repo>/.artifacts/qa-e2e/<transport>-<timestamp>` | Where reports/summary/observed messages and the output log are written. Relative paths resolve against `--repo-root`. |
+| `--repo-root <path>` | `process.cwd()` | Repository root when invoking from a neutral cwd. |
+| `--sut-account <id>` | `sut` | Temporary account id inside the QA gateway config. |
+| `--provider-mode <mode>` | `live-frontier` | `mock-openai` or `live-frontier` (legacy `live-openai` still works). |
+| `--model <ref>` / `--alt-model <ref>` | provider default | Primary/alternate model refs. |
+| `--fast` | off | Provider fast mode where supported. |
+| `--credential-source <env\|convex>` | `env` | See [Convex credential pool](#convex-credential-pool). |
+| `--credential-role <maintainer\|ci>` | `ci` in CI, `maintainer` otherwise | Role used when `--credential-source convex`. |

 Each lane exits non-zero on any failed scenario. `--allow-failures` writes artifacts without setting a failing exit code.

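The exit-code rule in the last context line of this hunk can be read as a small decision function. The sketch below is illustrative only: `ScenarioResult`, `resolveExitCode`, and the use of `1` as the failing code are assumptions, not names or values taken from the QA Lab source.

```typescript
// Hypothetical result shape; the real QA Lab types are not shown in this commit.
type ScenarioResult = { id: string; status: "pass" | "fail" };

// Returns a non-zero code when any scenario failed, unless failures are allowed.
// The value 1 is an assumption; the docs only say "non-zero".
function resolveExitCode(results: readonly ScenarioResult[], allowFailures: boolean): number {
  const anyFailed = results.some((result) => result.status === "fail");
  return anyFailed && !allowFailures ? 1 : 0;
}

const sampleResults: ScenarioResult[] = [
  { id: "whatsapp-canary", status: "pass" },
  { id: "whatsapp-mention-gating", status: "fail" },
];

console.log(resolveExitCode(sampleResults, false)); // 1
console.log(resolveExitCode(sampleResults, true)); // 0
```

In both cases the artifacts would still be written; only the process exit code differs.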
@@ -688,22 +690,52 @@ Required env when `--credential-source env`:

 Optional:

-- `OPENCLAW_QA_WHATSAPP_GROUP_JID` enables `whatsapp-mention-gating`.
+- `OPENCLAW_QA_WHATSAPP_GROUP_JID` enables group scenarios such as
+  `whatsapp-mention-gating` and `whatsapp-group-allowlist-block`.
 - `OPENCLAW_QA_WHATSAPP_CAPTURE_CONTENT=1` keeps message bodies in
   observed-message artifacts.

-Scenarios (`extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.ts`):
-
-- `whatsapp-canary`
-- `whatsapp-pairing-block`
-- `whatsapp-mention-gating`
-- `whatsapp-approval-exec-native` - opt-in native WhatsApp exec approval
-  scenario. Requests an exec approval through the gateway, verifies the
-  WhatsApp message has native reaction approval affordances, resolves it, and
-  verifies the resolved WhatsApp follow-up.
-- `whatsapp-approval-plugin-native` - opt-in native WhatsApp plugin approval
-  scenario. Enables exec and plugin approval forwarding together, then verifies
-  the same pending/resolved native WhatsApp path.
+Scenario catalog (`extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.ts`):
+
+- Baseline and group gating: `whatsapp-canary`, `whatsapp-pairing-block`,
+  `whatsapp-mention-gating`, `whatsapp-top-level-reply-shape`,
+  `whatsapp-restart-resume`, `whatsapp-group-allowlist-block`.
+- Native commands: `whatsapp-help-command`, `whatsapp-status-command`,
+  `whatsapp-commands-command`, `whatsapp-tools-compact-command`,
+  `whatsapp-whoami-command`, `whatsapp-context-command`,
+  `whatsapp-native-new-command`.
+- Reply and final-output behavior: `whatsapp-tool-only-usage-footer`,
+  `whatsapp-reply-to-message`, `whatsapp-reply-context-isolation`,
+  `whatsapp-reply-delivery-shape`, `whatsapp-stream-final-message-accounting`.
+- Inbound media and structured messages: `whatsapp-inbound-image-caption`,
+  `whatsapp-audio-preflight`, `whatsapp-inbound-structured-messages`,
+  `whatsapp-group-audio-gating`. These send real WhatsApp image, audio,
+  document, location, contact, and sticker events through the driver.
+- Outbound Gateway and message action coverage:
+  `whatsapp-outbound-media-matrix`,
+  `whatsapp-outbound-document-preserves-filename`, `whatsapp-outbound-poll`,
+  `whatsapp-message-actions`.
+- Access-control coverage: `whatsapp-access-control-dm-open`,
+  `whatsapp-access-control-dm-disabled`, `whatsapp-access-control-group-open`,
+  `whatsapp-access-control-group-disabled`, `whatsapp-group-allowlist-block`.
+- Native approvals: `whatsapp-approval-exec-deny-native`,
+  `whatsapp-approval-exec-native`, `whatsapp-approval-exec-reaction-native`,
+  `whatsapp-approval-plugin-native`.
+- Status reactions: `whatsapp-status-reactions`.
+
+The catalog currently contains 35 scenarios. The `live-frontier` default lane is
+kept small at 8 scenarios for fast smoke coverage. The `mock-openai` default
+lane runs 29 deterministic scenarios through the real WhatsApp transport while
+mocking only model output. Approval scenarios and a few heavier/blocking checks
+remain explicit by scenario id.
+
+The WhatsApp QA driver observes structured live events (`text`, `media`,
+`location`, `reaction`, and `poll`) and can actively send media, polls,
+contacts, locations, and stickers. QA Lab imports that driver through the
+`@openclaw/whatsapp/api.js` package surface instead of reaching into private
+WhatsApp runtime files. Message content is redacted by default. Outbound
+poll and upload-file coverage run through deterministic gateway `poll` and
+`message.action` calls instead of model-prompt-only tool invocation.

 Output artifacts:

extensions/qa-lab/src/live-transports/shared/live-artifacts.test.ts

Lines changed: 20 additions & 1 deletion
@@ -1,8 +1,14 @@
 // Qa Lab tests cover live artifacts plugin behavior.
 import { describe, expect, it } from "vitest";
-import { redactQaLiveLaneIssues } from "./live-artifacts.js";
+import { redactQaLiveLaneDetails, redactQaLiveLaneIssues } from "./live-artifacts.js";

 describe("live transport artifacts", () => {
+  it("uses a stable public metadata redaction marker", () => {
+    expect(redactQaLiveLaneDetails()).toBe(
+      "details redacted (OPENCLAW_QA_REDACT_PUBLIC_METADATA=1)",
+    );
+  });
+
   it("preserves cleanup phase labels while redacting details", () => {
     expect(
       redactQaLiveLaneIssues([
@@ -14,4 +20,17 @@ describe("live transport artifacts", () => {
       "live gateway cleanup: details redacted (OPENCLAW_QA_REDACT_PUBLIC_METADATA=1)",
     ]);
   });
+
+  it("redacts multi-line artifact errors without preserving later section labels", () => {
+    expect(
+      redactQaLiveLaneIssues([
+        [
+          "WhatsApp QA failed before scenario completion.",
+          "raw startup error with +15550000002",
+          "Artifacts:",
+          "- gatewayDebug: /tmp/openclaw-whatsapp-qa/gateway-debug",
+        ].join("\n"),
+      ]),
+    ).toEqual(["details redacted (OPENCLAW_QA_REDACT_PUBLIC_METADATA=1)"]);
+  });
 });

extensions/qa-lab/src/live-transports/shared/live-artifacts.ts

Lines changed: 8 additions & 5 deletions
@@ -4,17 +4,20 @@ import { formatErrorMessage } from "openclaw/plugin-sdk/error-runtime";
 const REDACTED_QA_LIVE_LANE_ISSUE_DETAILS =
   "details redacted (OPENCLAW_QA_REDACT_PUBLIC_METADATA=1)";

+export function redactQaLiveLaneDetails() {
+  return REDACTED_QA_LIVE_LANE_ISSUE_DETAILS;
+}
+
 export function appendQaLiveLaneIssue(issues: string[], label: string, error: unknown) {
   issues.push(`${label}: ${formatErrorMessage(error)}`);
 }

 export function redactQaLiveLaneIssues(issues: readonly string[]) {
   return issues.map((issue) => {
-    const separatorIndex = issue.indexOf(":");
-    const label = separatorIndex < 0 ? "" : issue.slice(0, separatorIndex).trim();
-    return label
-      ? `${label}: ${REDACTED_QA_LIVE_LANE_ISSUE_DETAILS}`
-      : REDACTED_QA_LIVE_LANE_ISSUE_DETAILS;
+    const firstLine = issue.split(/\r?\n/u, 1)[0] ?? "";
+    const separatorIndex = firstLine.indexOf(":");
+    const label = separatorIndex < 0 ? "" : firstLine.slice(0, separatorIndex).trim();
+    return label ? `${label}: ${redactQaLiveLaneDetails()}` : redactQaLiveLaneDetails();
   });
 }

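The removed lines searched the whole issue string for the first colon. For a multi-line error whose first colon sits on a later line (such as `Artifacts:`), everything before that colon, raw error text included, would survive as the "label". The standalone sketch below reproduces both behaviors side by side using the input from the new test. `redactLegacy` and `redactFirstLine` are illustrative names for the pre-change and post-change logic, not exports of `live-artifacts.ts`.

```typescript
// Marker string as it appears in live-artifacts.ts.
const REDACTED = "details redacted (OPENCLAW_QA_REDACT_PUBLIC_METADATA=1)";

// Pre-change logic: the label is everything before the first colon anywhere
// in the issue, even when that colon is on a later line.
function redactLegacy(issues: readonly string[]): string[] {
  return issues.map((issue) => {
    const separatorIndex = issue.indexOf(":");
    const label = separatorIndex < 0 ? "" : issue.slice(0, separatorIndex).trim();
    return label ? `${label}: ${REDACTED}` : REDACTED;
  });
}

// Post-change logic: only the first line is searched for a label.
function redactFirstLine(issues: readonly string[]): string[] {
  return issues.map((issue) => {
    const firstLine = issue.split(/\r?\n/u, 1)[0] ?? "";
    const separatorIndex = firstLine.indexOf(":");
    const label = separatorIndex < 0 ? "" : firstLine.slice(0, separatorIndex).trim();
    return label ? `${label}: ${REDACTED}` : REDACTED;
  });
}

// Same multi-line input as the new test case in live-artifacts.test.ts.
const multiLineIssue = [
  "WhatsApp QA failed before scenario completion.",
  "raw startup error with +15550000002",
  "Artifacts:",
  "- gatewayDebug: /tmp/openclaw-whatsapp-qa/gateway-debug",
].join("\n");

const legacyOutput = redactLegacy([multiLineIssue]);
const fixedOutput = redactFirstLine([multiLineIssue, "live gateway cleanup: boom"]);

// The legacy label still carries the phone-number-like string.
console.log(legacyOutput.some((line) => line.includes("+15550000002"))); // true
// The fixed version drops it and keeps only a genuine first-line label.
console.log(fixedOutput);
```

Restricting the label search to the first line keeps single-line phase labels such as `live gateway cleanup` while ensuring that nothing from later lines can reach the redacted output.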
extensions/qa-lab/src/live-transports/shared/live-transport-scenarios.test.ts

Lines changed: 7 additions & 0 deletions
@@ -108,9 +108,16 @@ describe("live transport scenario helpers", () => {
       standardId: "thread-follow-up",
       scenarioId: "slack-thread-follow-up",
     });
+    expect(lanes.find((lane) => lane.transportId === "whatsapp")?.members).toContainEqual({
+      standardId: "allowlist-block",
+      scenarioId: "whatsapp-group-allowlist-block",
+    });
     expect(
       lanes.find((lane) => lane.transportId === "discord")?.baselineMissingStandardScenarioIds,
     ).toEqual(["allowlist-block", "top-level-reply-shape", "restart-resume"]);
+    expect(
+      lanes.find((lane) => lane.transportId === "whatsapp")?.baselineMissingStandardScenarioIds,
+    ).toEqual([]);
   });

   it("keeps coverage report lane summaries aligned with runtime lanes", () => {

extensions/qa-lab/src/live-transports/shared/live-transport-scenarios.ts

Lines changed: 5 additions & 1 deletion
@@ -72,8 +72,12 @@ export const LIVE_TRANSPORT_COVERAGE_LANES: readonly LiveTransportCoverageLane[]
     commandName: "whatsapp",
     members: [
       { standardId: "canary", scenarioId: "whatsapp-canary" },
-      { standardId: "allowlist-block", scenarioId: "whatsapp-pairing-block" },
       { standardId: "mention-gating", scenarioId: "whatsapp-mention-gating" },
+      { standardId: "top-level-reply-shape", scenarioId: "whatsapp-top-level-reply-shape" },
+      { standardId: "restart-resume", scenarioId: "whatsapp-restart-resume" },
+      { standardId: "help-command", scenarioId: "whatsapp-help-command" },
+      { standardId: "reaction-observation", scenarioId: "whatsapp-status-reactions" },
+      { standardId: "allowlist-block", scenarioId: "whatsapp-group-allowlist-block" },
     ],
   },
 ] as const;

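The test change above expects the WhatsApp lane's `baselineMissingStandardScenarioIds` to become empty once these members are added. The commit does not show how that value is derived, so the following is only a sketch under assumptions: it treats the baseline as the five ids `canary`, `mention-gating`, `allowlist-block`, `top-level-reply-shape`, and `restart-resume` (inferred from the Discord expectation in the test), and `findMissingBaseline` is a hypothetical helper name.

```typescript
type CoverageMember = { standardId: string; scenarioId: string };

// Assumed baseline checklist, inferred from the test expectations in this
// commit; the real list lives in live-transport-scenarios.ts and may differ.
const ASSUMED_BASELINE_STANDARD_IDS = [
  "canary",
  "mention-gating",
  "allowlist-block",
  "top-level-reply-shape",
  "restart-resume",
] as const;

// Hypothetical helper: baseline ids that no member of the lane covers.
function findMissingBaseline(members: readonly CoverageMember[]): string[] {
  const covered = new Set(members.map((member) => member.standardId));
  return ASSUMED_BASELINE_STANDARD_IDS.filter((id) => !covered.has(id));
}

// WhatsApp members before this commit.
const whatsappBefore: CoverageMember[] = [
  { standardId: "canary", scenarioId: "whatsapp-canary" },
  { standardId: "allowlist-block", scenarioId: "whatsapp-pairing-block" },
  { standardId: "mention-gating", scenarioId: "whatsapp-mention-gating" },
];

// WhatsApp members after this commit.
const whatsappAfter: CoverageMember[] = [
  { standardId: "canary", scenarioId: "whatsapp-canary" },
  { standardId: "mention-gating", scenarioId: "whatsapp-mention-gating" },
  { standardId: "top-level-reply-shape", scenarioId: "whatsapp-top-level-reply-shape" },
  { standardId: "restart-resume", scenarioId: "whatsapp-restart-resume" },
  { standardId: "help-command", scenarioId: "whatsapp-help-command" },
  { standardId: "reaction-observation", scenarioId: "whatsapp-status-reactions" },
  { standardId: "allowlist-block", scenarioId: "whatsapp-group-allowlist-block" },
];

console.log(findMissingBaseline(whatsappBefore)); // ["top-level-reply-shape", "restart-resume"]
console.log(findMissingBaseline(whatsappAfter)); // []
```

Under that assumed baseline, the commit also moves the `allowlist-block` mapping from `whatsapp-pairing-block` to `whatsapp-group-allowlist-block` without changing which standard ids are covered.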