Summary
I hit what appears to be a real WhatsApp audio delivery bug in OpenClaw v2026.4.8.
There seem to be at least two closely related failures:
- A clean isolated tts(...) call to WhatsApp produces no delivered media, and downstream logs indicate hasMedia:false.
- Direct WhatsApp media sends via message(action=send, filePath=...) accept the request but return an empty messageId, and no audio arrives.
There is also a likely follow-on issue: audioAsVoice is not honored by the WhatsApp sender, even when media survives far enough to be sent.
Environment
- OpenClaw: v2026.4.8
- Channel: WhatsApp
- Model used during investigation: openai-codex/gpt-5.4
- TTS provider was configured and validated successfully before the clean repro below
Reproduction A: clean isolated tts(...)
Run a clean isolated TTS send to WhatsApp:
tts(text="Hello Ivan. Clean isolated voice note test from Iris.", channel="whatsapp")
Expected
A WhatsApp audio message should be sent. Ideally, when voice-compatible, it should arrive as a native voice note / PTT.
Actual
No audio arrives.
Observed behavior from the investigation:
- TTS generation itself appears to succeed
- reply-payload assembly appears to preserve media upstream
- downstream behavior indicates hasMedia:false
- this strongly suggests media is being stripped before the actual WhatsApp send step
Reproduction B: direct WhatsApp media send
Generate an audio file locally, for example OGG/Opus, then send it via the message tool:
message(
action="send",
channel="whatsapp",
target="<recipient>",
filePath="/path/to/audio.ogg",
asVoice=true
)
I also reproduced the same behavior with a regular audio attachment, not just asVoice=true.
Expected
The media send should return a non-empty messageId, and the recipient should receive the audio.
Actual
The send result includes a runId and destination JID, but messageId is empty and no audio arrives.
Related observation
The MEDIA: reply-directive path was also tested for the same voice-note scenario and did not result in successful delivery either.
That means, from a user perspective, WhatsApp audio / voice-note delivery currently appears broken across all practical pathways we tried.
Investigation notes
I traced the relevant code and found a likely primary bug plus a likely secondary bug.
Likely primary bug: TTS audio path is dropped before send
TTS output is written to a temp path under /tmp/openclaw/tts-*, then passed through as raw mediaUrl by all three TTS entrypoints I checked:
- dist/extensions/speech-core/runtime-api.js
  - textToSpeech(...)
  - maybeApplyTtsToPayload(...)
- dist/pi-embedded-CNTNdlGw.js
  - createTtsTool(...)
- dist/commands-handlers.runtime-Akj_Dqoi.js
  - /tts audio
However, the normal reply-media path normalizer appears to only allow managed outbound media locations for absolute local paths.
That means a TTS temp path like /tmp/openclaw/tts-* is very likely being dropped during reply normalization, which matches the observed downstream symptom of hasMedia:false.
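To make the suspected failure mode concrete, here is a minimal sketch of a normalizer that behaves the way the symptoms suggest. It is not the actual OpenClaw code: `normalizeReplyMedia`, `isInside`, `MANAGED_MEDIA_ROOT`, and the example path `tts-abc123/voice.ogg` are all made up for illustration.

```javascript
const path = require("node:path");

// Hypothetical managed outbound media root; the real location may differ.
const MANAGED_MEDIA_ROOT = "/home/user/.openclaw/media/outbound";

// Returns true when `candidate` is `root` itself or lives somewhere below it.
function isInside(root, candidate) {
  const rel = path.relative(root, candidate);
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}

// Illustrative normalizer: remote URLs pass through, absolute local paths
// survive only under the managed root, and anything else loses its media.
function normalizeReplyMedia(payload, managedRoot = MANAGED_MEDIA_ROOT) {
  const { mediaUrl, ...rest } = payload;
  if (!mediaUrl) return { ...rest, hasMedia: false };
  if (/^https?:\/\//i.test(mediaUrl)) return { ...payload, hasMedia: true };
  if (path.isAbsolute(mediaUrl) && isInside(managedRoot, mediaUrl)) {
    return { ...payload, hasMedia: true };
  }
  // Unmanaged local path: mediaUrl is dropped without raising an error.
  return { ...rest, hasMedia: false };
}

// A TTS temp path (example name) falls outside the managed root.
const ttsPayload = { text: "Hello", mediaUrl: "/tmp/openclaw/tts-abc123/voice.ogg" };
console.log(normalizeReplyMedia(ttsPayload));
```

If the real normalizer works anything like this, the drop is silent: the payload still looks valid downstream, only without its media, which would fit a hasMedia:false log line with no send error.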
Likely secondary bug: WhatsApp sender does not honor audioAsVoice
In the WhatsApp send path I traced, mediaUrl is handled, but I could not find audioAsVoice being threaded through to the actual send implementation.
So even after the primary media-loss bug is fixed, there may still be a second bug where audio can only arrive as a generic attachment rather than a native WhatsApp voice note.
Direct media-send path also appears broken
Earlier investigation of message(action=send, filePath=...) suggested a separate problem in the WhatsApp send path: media requests can return an empty messageId, even though text sends work.
That may be a distinct bug from the TTS temp-path issue, but it contributes to the same user-visible result: audio delivery fails.
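Whatever the root cause turns out to be, it might help if an empty messageId on a media send surfaced as an error instead of looking like success. A minimal sketch of such a guard follows. The `runId` and `messageId` fields come from the send result I observed; the function name `assertMediaDelivered` and the sample values are hypothetical.

```javascript
// Treats a send result as failed unless it carries a non-empty messageId.
function assertMediaDelivered(result) {
  const raw = result && typeof result.messageId === "string" ? result.messageId : "";
  const messageId = raw.trim();
  if (!messageId) {
    const runId = result && result.runId ? result.runId : "unknown";
    throw new Error(`WhatsApp media send returned no messageId (runId=${runId})`);
  }
  return messageId;
}

// Example: a result shaped like the one observed in Reproduction B.
try {
  assertMediaDelivered({ runId: "run-123", messageId: "" });
} catch (err) {
  console.error(err.message);
}
```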
Why I think the TTS diagnosis is sound
The reasoning chain is:
- clean isolated tts(...) produced no delivered media
- logs later indicated hasMedia:false, not a transport send failure with media present
- TTS definitely produces an audioPath
- the TTS entrypoints forward that raw temp path directly as mediaUrl
- upstream tool-result and payload assembly appear to preserve media correctly
- reply-media normalization has restrictive absolute-path rules
- /tmp/openclaw/tts-* does not appear to fit those allowed managed reply-media locations
That makes the temp-path / normalization mismatch the strongest explanation for the main failure.
Suggested fix
Patch 1, required
Persist TTS output into managed outbound media before returning reply payloads, instead of forwarding raw temp paths.
Apply consistently to:
- auto-TTS
- tts(...)
- /tts audio
In other words, do not emit raw temp-file paths into normal reply payloads.
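As a rough illustration of Patch 1, the sketch below copies the TTS temp file into a managed directory and builds the reply payload from the new path. It assumes a plain file copy is enough. `persistTtsAudio`, `buildTtsReplyPayload`, the `managedRoot` parameter, and the file naming scheme are hypothetical, and the real managed-media helper in OpenClaw may look quite different.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Copies a TTS temp file into the managed outbound directory and returns
// the new path. The directory layout and naming here are assumptions.
function persistTtsAudio(audioPath, managedRoot) {
  fs.mkdirSync(managedRoot, { recursive: true });
  const name = `tts-${crypto.randomUUID()}${path.extname(audioPath)}`;
  const target = path.join(managedRoot, name);
  fs.copyFileSync(audioPath, target);
  return target;
}

// Builds the reply payload from the managed copy, never the raw temp path.
function buildTtsReplyPayload(audioPath, managedRoot) {
  return { mediaUrl: persistTtsAudio(audioPath, managedRoot) };
}

// Usage with throwaway directories standing in for the real locations.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), "tts-demo-"));
const tempAudio = path.join(scratch, "voice.ogg");
fs.writeFileSync(tempAudio, "placeholder bytes, not real audio");
const managedRoot = path.join(scratch, "managed-outbound");
const payload = buildTtsReplyPayload(tempAudio, managedRoot);
console.log(payload.mediaUrl);
```

Copying rather than moving keeps the temp file available to any other consumer of audioPath. Whether that matters depends on how the temp directory is cleaned up.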
Patch 2, likely also needed
Wire audioAsVoice all the way through the WhatsApp send path so voice-compatible audio can actually arrive as a native voice note / PTT.
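A sketch of what that wiring might look like, assuming the sender ultimately hands a Baileys-style content object to its WhatsApp client, where `ptt: true` marks an audio message as a voice note. I have not confirmed that this is the shape OpenClaw uses internally. `buildWhatsAppAudioContent` is a hypothetical name, and the OGG/Opus mimetype is my assumption about what counts as voice-compatible audio.

```javascript
// Maps a reply payload to a Baileys-style audio content object.
// Assumption: the transport accepts { audio: { url }, ptt, mimetype }.
function buildWhatsAppAudioContent({ mediaUrl, audioAsVoice = false }) {
  if (!mediaUrl) {
    throw new Error("mediaUrl is required to build an audio message");
  }
  const content = { audio: { url: mediaUrl } };
  if (audioAsVoice) {
    // Request a native voice note (PTT) instead of a generic attachment.
    content.ptt = true;
    content.mimetype = "audio/ogg; codecs=opus";
  }
  return content;
}

console.log(buildWhatsAppAudioContent({ mediaUrl: "/path/to/audio.ogg", audioAsVoice: true }));
```

The point is only that the flag has to reach the place where the outgoing message object is built. If it is dropped anywhere earlier, the audio can at best arrive as a regular attachment.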
Test plan after patching
- Re-run clean isolated tts(...)
- Verify outbound logs show media present, not hasMedia:false
- Confirm the recipient actually receives audio
- Verify whether it arrives as native voice note vs generic audio attachment
- Re-test direct message(action=send, filePath=...) audio sends
Impact
This breaks a core user workflow:
- asking the assistant to send a voice note via WhatsApp
- using TTS as an audio reply path
- sending audio media directly to WhatsApp from the assistant
File references used during investigation
dist/extensions/speech-core/runtime-api.js
dist/pi-embedded-CNTNdlGw.js
dist/commands-handlers.runtime-Akj_Dqoi.js
dist/agent-runner.runtime-BEglkhP6.js
dist/send-Dw4UBtXk.js
Additional context
I wrote a longer internal investigation report with more detailed code-path notes and reasoning. Happy to condense or split this into separate issues if you would prefer one issue for TTS path loss and another for direct WhatsApp media sends.