WhatsApp audio / voice-note delivery is broken across tts(...) and direct media sends #63110

@2vanm

Summary

I hit what appears to be a real WhatsApp audio delivery bug in OpenClaw v2026.4.8.

There seem to be at least two closely related failures:

  1. A clean isolated tts(...) call to WhatsApp produces no delivered media, and downstream logs indicate hasMedia:false.
  2. Direct WhatsApp media sends via message(action=send, filePath=...) accept the request but return an empty messageId, and no audio arrives.

There is also a likely follow-on issue: audioAsVoice does not appear to be honored by the WhatsApp sender, even when media survives far enough to be sent.

Environment

  • OpenClaw: v2026.4.8
  • Channel: WhatsApp
  • Model used during investigation: openai-codex/gpt-5.4
  • TTS provider was configured and validated successfully before the clean repro below

Reproduction A: clean isolated tts(...)

Run a clean isolated TTS send to WhatsApp:

tts(text="Hello Ivan. Clean isolated voice note test from Iris.", channel="whatsapp")

Expected

A WhatsApp audio message should be sent. Ideally, when voice-compatible, it should arrive as a native voice note / PTT.

Actual

No audio arrives.

Observed behavior from the investigation:

  • TTS generation itself appears to succeed
  • reply-payload assembly appears to preserve media upstream
  • downstream logs indicate hasMedia:false
  • this strongly suggests media is being stripped before the actual WhatsApp send step

Reproduction B: direct WhatsApp media send

Generate an audio file locally, for example OGG/Opus, then send it via the message tool:

message(
  action="send",
  channel="whatsapp",
  target="<recipient>",
  filePath="/path/to/audio.ogg",
  asVoice=true
)

I also reproduced the same behavior with a regular audio attachment, not just asVoice=true.

Expected

The media send should return a non-empty messageId, and the recipient should receive the audio.

Actual

The send result includes a runId and destination JID, but messageId is empty and no audio arrives.

Related observation

The MEDIA: reply-directive path was also tested for the same voice-note scenario and did not result in successful delivery either.

That means, from a user perspective, WhatsApp audio / voice-note delivery currently appears broken across every pathway I tried.

Investigation notes

I traced the relevant code and found a likely primary bug plus a likely secondary bug.

Likely primary bug: TTS audio path is dropped before send

TTS output is written to a temp path under /tmp/openclaw/tts-*, then passed through as raw mediaUrl by all three TTS entrypoints I checked:

  • dist/extensions/speech-core/runtime-api.js
    • textToSpeech(...)
    • maybeApplyTtsToPayload(...)
  • dist/pi-embedded-CNTNdlGw.js
    • createTtsTool(...)
  • dist/commands-handlers.runtime-Akj_Dqoi.js
    • /tts audio

However, the normal reply-media path normalizer appears to accept an absolute local path only when it points into a managed outbound media location.

That means a TTS temp path like /tmp/openclaw/tts-* is very likely being dropped during reply normalization, which matches the observed downstream symptom of hasMedia:false.
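To make the suspected failure mode concrete, here is a minimal sketch of a normalizer with that kind of allow-list. It is not the OpenClaw code: isAllowedMediaUrl, normalizeReplyPayload, the mediaUrls field, the MANAGED_ROOTS value and the example temp file name are all made up for illustration. It only shows how a path outside the managed roots would end up as hasMedia:false without any send error.

```javascript
// Hypothetical sketch: none of these names are taken from the OpenClaw source.
// MANAGED_ROOTS is a placeholder for whatever managed outbound media
// locations the real normalizer accepts.
const MANAGED_ROOTS = ["/managed/outbound/"];

function isAllowedMediaUrl(mediaUrl) {
  // Remote URLs pass through untouched in this sketch.
  if (/^https?:\/\//.test(mediaUrl)) return true;
  // Absolute local paths must live under a managed root.
  if (mediaUrl.startsWith("/")) {
    return MANAGED_ROOTS.some((root) => mediaUrl.startsWith(root));
  }
  // Relative paths and unknown schemes are rejected here.
  return false;
}

function normalizeReplyPayload(payload) {
  // Disallowed entries are silently filtered out, not reported as errors.
  const mediaUrls = (payload.mediaUrls ?? []).filter(isAllowedMediaUrl);
  return { ...payload, mediaUrls, hasMedia: mediaUrls.length > 0 };
}

// A TTS temp path (file name invented for the example) is dropped.
const ttsReply = normalizeReplyPayload({
  text: "",
  mediaUrls: ["/tmp/openclaw/tts-abc123/voice.ogg"],
});
console.log(ttsReply.hasMedia); // false
```

If the real normalizer behaves like this, the drop is silent by design, which would explain why nothing upstream reports a failure.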

Likely secondary bug: WhatsApp sender does not honor audioAsVoice

In the WhatsApp send path I traced, mediaUrl is handled, but I could not find audioAsVoice being threaded through to the actual send implementation.

So even after the primary media-loss bug is fixed, there may still be a second bug where audio can only arrive as a generic attachment rather than a native WhatsApp voice note.
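As a rough illustration of what "threading audioAsVoice through" could look like, the sketch below maps the flag onto the message content handed to the sender. buildWhatsAppAudioContent is a hypothetical helper, and the { audio, ptt, mimetype } shape follows the convention used by Baileys-style WhatsApp libraries; whether OpenClaw's sender builds its content this way is an assumption I have not verified.

```javascript
// Hypothetical helper, not an OpenClaw function.
// Assumption: the underlying WhatsApp library marks voice notes with
// `ptt: true` on an audio message, as Baileys-style libraries do.
function buildWhatsAppAudioContent({ mediaUrl, audioAsVoice = false }) {
  // Treat OGG/Opus as the voice-compatible format for this sketch.
  const isOpus = /\.(ogg|opus)$/i.test(mediaUrl);
  const content = {
    audio: { url: mediaUrl },
    // Request a voice note only when the caller asked for one
    // and the file is voice-compatible.
    ptt: audioAsVoice === true && isOpus,
  };
  if (isOpus) content.mimetype = "audio/ogg; codecs=opus";
  return content;
}

const voiceNote = buildWhatsAppAudioContent({
  mediaUrl: "/managed/outbound/voice.ogg",
  audioAsVoice: true,
});
console.log(voiceNote.ptt); // true
```

The point is only that the flag has to reach the place where the message content is built; if it is dropped earlier, the best possible outcome is a generic audio attachment.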

Direct media-send path also appears broken

Earlier investigation of message(action=send, filePath=...) suggested a separate problem in the WhatsApp send path: media requests can return an empty messageId, even though text sends work.

That may be a distinct bug from the TTS temp-path issue, but it contributes to the same user-visible result: audio delivery fails.
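Independent of the root cause, an empty messageId currently looks like success to the caller. A small guard along these lines would at least surface the failure. This is a sketch only: assertDelivered is a hypothetical name, and the result field names (runId, messageId) are inferred from what I saw in the send result, not from a documented contract.

```javascript
// Hypothetical guard; the result shape is an assumption based on the
// observed send result (runId and destination JID present, messageId empty).
function assertDelivered(result) {
  const id =
    typeof result?.messageId === "string" ? result.messageId.trim() : "";
  if (id === "") {
    // Fail loudly instead of reporting an accepted-but-undelivered send.
    throw new Error(
      `WhatsApp send for run ${result?.runId ?? "unknown"} returned no messageId`
    );
  }
  return id;
}

try {
  assertDelivered({ runId: "run-1", messageId: "" });
} catch (err) {
  console.log(err.message);
}
```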

Why I think the TTS diagnosis is sound

The reasoning chain is:

  1. clean isolated tts(...) produced no delivered media
  2. logs later indicated hasMedia:false, not a transport send failure with media present
  3. TTS definitely produces an audioPath
  4. the TTS entrypoints forward that raw temp path directly as mediaUrl
  5. upstream tool-result and payload assembly appear to preserve media correctly
  6. reply-media normalization has restrictive absolute-path rules
  7. /tmp/openclaw/tts-* does not appear to fit those allowed managed reply-media locations

That makes the temp-path / normalization mismatch the strongest explanation for the main failure.

Suggested fix

Patch 1, required

Persist TTS output into managed outbound media before returning reply payloads, instead of forwarding raw temp paths.

Apply consistently to:

  • auto-TTS
  • tts(...)
  • /tts audio

In other words, do not emit raw temp-file paths into normal reply payloads.
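A minimal sketch of that idea follows. Everything here is illustrative: persistTtsOutput, the managed directory and the io.copyFile hook are hypothetical names, and the temp file name is invented. The copy step is injected so the sketch stays self-contained; in Node it could be backed by fs.copyFileSync.

```javascript
// Hypothetical sketch of Patch 1: copy TTS output into a managed
// outbound location and return that path instead of the raw temp path.
function persistTtsOutput(tempPath, managedDir, io) {
  // Keep the last two path segments so files from different TTS temp
  // directories do not collide, e.g. "tts-abc123-voice.ogg".
  const fileName = tempPath.split("/").filter(Boolean).slice(-2).join("-");
  const dir = managedDir.endsWith("/") ? managedDir : managedDir + "/";
  const managedPath = dir + fileName;
  io.copyFile(tempPath, managedPath);
  return managedPath;
}

// Usage with a fake copy step that only records what would be copied.
const copied = [];
const mediaUrl = persistTtsOutput(
  "/tmp/openclaw/tts-abc123/voice.ogg",
  "/managed/outbound",
  { copyFile: (from, to) => copied.push({ from, to }) }
);
console.log(mediaUrl); // /managed/outbound/tts-abc123-voice.ogg
```

Doing this in one shared helper, rather than in each of the three entrypoints separately, would keep auto-TTS, tts(...) and /tts audio from drifting apart again.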

Patch 2, likely also needed

Wire audioAsVoice all the way through the WhatsApp send path so voice-compatible audio can actually arrive as a native voice note / PTT.

Test plan after patching

  1. Re-run clean isolated tts(...)
  2. Verify outbound logs show media present, not hasMedia:false
  3. Confirm the recipient actually receives audio
  4. Verify whether it arrives as native voice note vs generic audio attachment
  5. Re-test direct message(action=send, filePath=...) audio sends

Impact

This breaks a core user workflow:

  • asking the assistant to send a voice note via WhatsApp
  • using TTS as an audio reply path
  • sending audio media directly to WhatsApp from the assistant

File references used during investigation

  • dist/extensions/speech-core/runtime-api.js
  • dist/pi-embedded-CNTNdlGw.js
  • dist/commands-handlers.runtime-Akj_Dqoi.js
  • dist/agent-runner.runtime-BEglkhP6.js
  • dist/send-Dw4UBtXk.js

Additional context

I wrote a longer internal investigation report with more detailed code-path notes and reasoning. Happy to condense or split this into separate issues if you would prefer one issue for TTS path loss and another for direct WhatsApp media sends.
