fix(whatsapp): respect audioAsVoice flag in outbound delivery#68744
fix(whatsapp): respect audioAsVoice flag in outbound delivery#68744masatohoshino wants to merge 4 commits into
Conversation
|
For a bit of extra context on the follow-up note in the PR body: While working on this fix, I noticed a few other channel adapters that may be worth checking for similar Happy to prepare a separate follow-up PR for the cross-channel side if maintainers think that would be useful — otherwise I'll leave the observation here for the record. |
Greptile SummaryThis PR fixes the WhatsApp adapter to correctly route audio replies flagged with Confidence Score: 5/5Safe to merge — all previously flagged P1 issues have been addressed and the remaining changes are well-tested. The previously flagged bug (audio-kind branch overwriting forceVoiceDelivery) is fixed in the latest commit with the !forceVoiceDelivery guard on line 133 of send.ts. The mimetype normalization logic is consistent across all four touched files. No new logic paths are left unguarded, and no remaining P1 or P0 findings exist. No files require special attention. Reviews (2): Last reviewed commit: "fix(whatsapp): guard audio-kind branch a..." | Re-trigger Greptile |
|
Thanks @greptile-apps for the catch. Addressed in the latest commit:
Verification:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 566b640784
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
All security findings raised by the analysis bots on this PR have been addressed or scoped to follow-ups as discussed above. The PR is ready for human review when convenient. Two items intentionally left for separate follow-ups:
Happy to split either of those into their own PRs if maintainers want them tracked separately. |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
ece86aa to
098b4b0
Compare
🔒 Aisle Security AnalysisWe found 1 potential security issue(s) in this PR:
1. 🟡 Audio trust boundary bypass via isVerifiedAudioSource relying on caller-provided media.kind
DescriptionThe new In particular:
Vulnerable code: export function isVerifiedAudioSource(media: {
kind?: string | null;
contentType?: string | null;
}): boolean {
return media.kind === "audio";
}Because this function is explicitly intended as a trust boundary (“verified audio source”), it should not rely solely on a caller-controlled classification. RecommendationTreat Options (from strongest to weakest):
export function isVerifiedAudioSource(media: {
kind?: string | null;
sniffedMime?: string | null;
}): boolean {
return media.kind === "audio" && Boolean(media.sniffedMime?.startsWith("audio/"));
}
return media.kind === "audio" && media.sniffedMime === "audio/ogg";
Also consider failing closed: if sniffing is inconclusive, do not treat the media as verified audio for voice-note delivery. Analyzed PR: #68744 at commit Last updated on: 2026-04-25T00:08:29Z |
098b4b0 to
c0b5777
Compare
|
Second rebase and security review update (2026-04-24): Rebase Rebased onto the latest main after #69813 (canonicalize outbound media
Tests: Aisle response
Cross-channel note (reiterated) The shared helpers introduced here — |
|
Aisle CWE-345 follow-up (2026-04-24): The Medium finding flagged in the latest scan
Tests updated in
Verification:
The broader "plumb sniffed content type through loadWebMedia and have |
57a349a to
ab8bbda
Compare
ab8bbda to
76f1fdd
Compare
…leName utilities
Channel-agnostic helpers in src/media/mime.ts for voice-note delivery
safety, exported through openclaw/plugin-sdk/media-runtime.
- isVerifiedAudioSource(media): predicate gating voice-note (PTT)
routing on media.kind === 'audio' or sanitized audio/* mime
- sanitizeMediaMime(input, opts): validates MIME for outbound headers,
rejects ASCII control characters (CWE-93), normalizes to lowercase
base type, optionally preserves a sanitized codecs= param for audio
- sanitizeFileName(input): strips ASCII control characters and Unicode
bidirectional/invisible format characters (CWE-451), replaces path
separators and quotes, caps length at 128 chars, linear-time
iteration bounded against attacker-controlled input
Signatures are kept minimal ({ kind, contentType } / string | null)
so Telegram (resolveTelegramVoiceSend), Discord
(sendVoiceMessageDiscord), and Matrix can adopt the same helpers in a
follow-up for consistent cross-channel voice-note hardening.
Fixes openclaw#66053. The WhatsApp adapter previously dropped the audioAsVoice flag from ChannelOutboundContext, so replies carrying the [[audio_as_voice]] directive were delivered as document attachments instead of PTT voice notes. Layers - sendMessageWhatsApp: adds audioAsVoice option. When true and the loaded media is a verified audio source (isVerifiedAudioSource), mediaType is rebuilt from a sanitized MIME with opus codec preservation/fallback so the outbound frame reaches WhatsApp as a voice note. Non-audio-verified payloads stay on the document path. - createWhatsAppOutboundBase (outbound-base.ts): forwards audioAsVoice through the sendMedia factory option so outbound-adapter.ts no longer needs per-adapter wiring (upstream canonicalization landed in openclaw#69813). - deliverWebReply (auto-reply/deliver-reply.ts): routes kind='audio' and audioAsVoice-verified media to PTT with sanitized opus mimetype; image/video/document branches apply sanitized allowlisted mimetypes; document fileName is passed through sanitizeFileName. Security hardening (applied on top of upstream's normalizeWhatsAppLoadedMedia helper) - sanitizeMediaMime rejects control characters (CWE-93) and preserves codecs params only in the audio path. - sanitizeFileName strips ASCII control and bidi/invisible Unicode to prevent filename UI spoofing (CWE-451). - isVerifiedAudioSource gates forceVoiceDelivery so an audioAsVoice reply cannot coerce non-audio bytes into a voice-note payload. Tests (extensions/whatsapp) - send.test.ts: audioAsVoice=true+audio routing, forceVoiceDelivery override guard, mimetype sanitization across kinds, document fileName sanitization. - deliver-reply.test.ts: voice-coercion rejection, control-character fallbacks for audio/image/video/document, fileName sanitization, audioAsVoice-unset document path. - outbound-base.test.ts / outbound-adapter.sendpayload.test.ts: audioAsVoice propagation through the factory. Rebased onto upstream/main after openclaw#69813. Audio mimetype canonicalization ("audio/ogg" -> "audio/ogg; codecs=opus") is now owned by normalizeWhatsAppLoadedMedia; this change layers PTT routing and security hardening on top. Closes openclaw#66053
isVerifiedAudioSource previously returned true for any sanitized contentType starting with audio/, allowing an attacker who controls a mediaUrl response to spoof Content-Type: audio/... and coerce WhatsApp PTT delivery of non-audio bytes (Aisle CWE-345). Tighten the helper to only accept media.kind === "audio". The contentType parameter is retained on the signature as a seam for a future sniffedMime-based extension once magic-byte sniffing is plumbed through loadWebMedia / detectMime cross-channel. Addresses Aisle re-scan finding on PR openclaw#68744.
76f1fdd to
6abf1f4
Compare
|
Thanks @masatohoshino. I landed a narrower mainline fix for the issue in c2a2a481b2 What landed:
I intentionally did not land the broader MIME/filename helper bundle from this PR in this pass. The current root bug was the dropped shared payload flag, and the existing WhatsApp send path already maps audio MIME payloads to Baileys Dependency contract checked locally: installed Baileys types expose Verification:
Closing this PR as superseded by the landed main commit. |
Summary
Fixes #66053. The WhatsApp adapter dropped the
audioAsVoiceflag fromChannelOutboundContext, so replies with[[audio_as_voice]]weredelivered as document attachments instead of PTT voice notes. This PR
wires the flag through
createWhatsAppOutboundBase→sendMessageWhatsApp/deliverWebReply, and adds channel-agnosticMIME/filename hardening.
Changes
Commit 1 —
feat(media): adds tosrc/media/mime.ts(exported viaopenclaw/plugin-sdk/media-runtime):isVerifiedAudioSource({ kind, contentType })— predicate gatingvoice-note routing.
sanitizeMediaMime(input, { preserveCodecsParam? })— rejects controlcharacters (CWE-93), preserves
codecs=param only for audio.sanitizeFileName(input)— strips ASCII control + Unicode GeneralCategory Cf characters (
\p{Cf}, CWE-451), caps length at 128.Signatures are channel-agnostic so Telegram, Discord, and Matrix can
adopt the same helpers in a follow-up.
Commit 2 —
fix(whatsapp): consumes the new helpers.sendMessageWhatsApp:forceVoiceDeliverygated onisVerifiedAudioSource; PTT mimetype rebuilt with opus codecpreservation/fallback.
createWhatsAppOutboundBase: forwardsaudioAsVoicethroughsendMedia.deliverWebReply: audio / audio-as-voice paths route to PTT withsanitized opus mime; image/video/document branches use sanitized
allowlisted mime; document
fileNamesanitized.Commit 3 —
fix(media): widenssanitizeFileNameinvisible-charactercoverage to all Unicode Category Cf (addresses Aisle Finding 1).
Rebased onto
mainafter #69813 landed; the audio mimetypecanonicalization that previously lived inline is now delegated to
normalizeWhatsAppLoadedMedia, and this PR layers PTT routing +security hardening on top.
Verification
pnpm test extensions/whatsapp→ 567 passed / 0 failedpnpm test src/media/mime.test.ts→ 106 passed / 0 failed (after Aisle fix: add @lid format support and allowFrom wildcard handling #1 fix)pnpm tsgo:prod→ exit 0codex review --base upstream/main→ no P1 findingsSecurity notes
cross-channel follow-up. The upstream auto-reply path already lands
ptt: truebased onmedia.kind === "audio"(derived fromdetectMimeextension/header fallback), so the correct fix plumbs asniffed content-type through
loadWebMediaand has Telegram,Discord, Matrix, and WhatsApp all adopt the verified source together.
Follow-ups
loadWebMedia/detectMime(seesecurity notes).
in
src/media/mime.ts.