WhatsApp audio / voice-note delivery is broken across tts(...) and direct media sends

## Summary
I hit what appears to be a real WhatsApp audio delivery bug in OpenClaw `v2026.4.8`.

There seem to be at least two closely related failures:

1. A clean isolated `tts(...)` call to WhatsApp produces no delivered media, and downstream logs indicate `hasMedia:false`.
2. Direct WhatsApp media sends via `message(action=send, filePath=...)` accept the request but return an empty `messageId`, and no audio arrives.

There is also a likely follow-on issue: the WhatsApp sender does not appear to honor `audioAsVoice`, even when media survives far enough to be sent.

## Environment
- OpenClaw: `v2026.4.8`
- Channel: WhatsApp
- Model used during investigation: `openai-codex/gpt-5.4`
- TTS provider was configured and validated successfully before the clean repro below

## Reproduction A: clean isolated `tts(...)`
Run a clean isolated TTS send to WhatsApp:

```js
tts(text="Hello Ivan. Clean isolated voice note test from Iris.", channel="whatsapp")
```

## Expected
A WhatsApp audio message should be sent. Ideally, when voice-compatible, it should arrive as a native voice note / PTT.

## Actual
No audio arrives.

Observed behavior from the investigation:
- TTS generation itself appears to succeed
- reply-payload assembly appears to preserve media upstream
- downstream behavior indicates `hasMedia:false`
- this strongly suggests media is being stripped before the actual WhatsApp send step

## Reproduction B: direct WhatsApp media send
Generate an audio file locally, for example OGG/Opus, then send it via the message tool:

```js
message(
  action="send",
  channel="whatsapp",
  target="<recipient>",
  filePath="/path/to/audio.ogg",
  asVoice=true
)
```

I also reproduced the same behavior with a regular audio attachment, not just `asVoice=true`.

## Expected
The media send should return a non-empty `messageId`, and the recipient should receive the audio.

## Actual
The send result includes a `runId` and destination JID, but `messageId` is empty and no audio arrives.

## Related observation
The `MEDIA:` reply-directive path was also tested for the same voice-note scenario and did not result in successful delivery either.

That means, from a user perspective, WhatsApp audio / voice-note delivery currently appears broken across every pathway I tried.

## Investigation notes
I traced the relevant code and found a likely primary bug plus a likely secondary bug.

### Likely primary bug: TTS audio path is dropped before send
TTS output is written to a temp path under `/tmp/openclaw/tts-*`, then passed through as raw `mediaUrl` by all three TTS entrypoints I checked:

- `dist/extensions/speech-core/runtime-api.js`
  - `textToSpeech(...)`
  - `maybeApplyTtsToPayload(...)`
- `dist/pi-embedded-CNTNdlGw.js`
  - `createTtsTool(...)`
- `dist/commands-handlers.runtime-Akj_Dqoi.js`
  - `/tts audio`

However, the normal reply-media path normalizer appears to accept an absolute local path only when it points into a managed outbound media location.

That means a TTS temp path like `/tmp/openclaw/tts-*` is very likely being dropped during reply normalization, which matches the observed downstream symptom of `hasMedia:false`.
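To make the suspected failure mode concrete, here is a minimal sketch of an allowlist-style normalizer. It is not OpenClaw's actual code: `normalizeReplyMedia`, `isUnderRoot`, and the managed root `/var/lib/openclaw/media/outbound` are hypothetical names chosen for illustration. It only shows how a rule of this shape would turn a TTS temp path into `hasMedia:false` without raising any error.

```js
// Hypothetical sketch, NOT OpenClaw's real normalizer.
const path = require("node:path").posix;

// Assumed managed outbound root; the real location may differ.
const MANAGED_ROOTS = ["/var/lib/openclaw/media/outbound"];

// True when filePath resolves to a location strictly inside root.
function isUnderRoot(filePath, root) {
  const rel = path.relative(root, path.resolve(filePath));
  const escapes = rel === ".." || rel.startsWith("../");
  return rel !== "" && !escapes && !path.isAbsolute(rel);
}

// Keeps remote URLs and managed local paths; silently drops anything else.
function normalizeReplyMedia(payload) {
  const url = payload.mediaUrl;
  if (!url) return { ...payload, hasMedia: false };

  const isRemote = /^https?:\/\//.test(url);
  const allowed =
    isRemote || MANAGED_ROOTS.some((root) => isUnderRoot(url, root));

  if (!allowed) {
    // The media reference is removed; the text part of the reply survives.
    const { mediaUrl, ...rest } = payload;
    return { ...rest, hasMedia: false };
  }
  return { ...payload, hasMedia: true };
}

// A TTS temp path is outside the managed root, so the media is dropped.
console.log(
  normalizeReplyMedia({ text: "hi", mediaUrl: "/tmp/openclaw/tts-abc/voice.ogg" })
);
```

If the real normalizer behaves anything like this, the send step would never see the audio, which would explain why the logs show missing media rather than a transport failure.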

### Likely secondary bug: WhatsApp sender does not honor `audioAsVoice`
In the WhatsApp send path I traced, `mediaUrl` is handled, but I could not find `audioAsVoice` being threaded through to the actual send implementation.

So even after the primary media-loss bug is fixed, there may still be a second bug where audio can only arrive as a generic attachment rather than a native WhatsApp voice note.
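As a rough illustration of what "threading `audioAsVoice` through" could mean, the sketch below maps the flag onto a Baileys-style audio content object, where `ptt: true` marks a voice note. That the sender builds content of this shape is an assumption, `buildWhatsAppAudioContent` is a hypothetical helper, and gating on OGG audio is my reading of "voice-compatible", not confirmed behavior.

```js
// Hypothetical sketch; only `mediaUrl` and `audioAsVoice` come from the issue.
function buildWhatsAppAudioContent({ mediaUrl, mimetype, audioAsVoice }) {
  // Assumption: only OGG (typically Opus) audio is treated as voice-compatible.
  const voiceCompatible = /^audio\/ogg/i.test(mimetype ?? "");

  return {
    audio: { url: mediaUrl },
    mimetype: mimetype ?? "audio/mpeg",
    // Request a voice note only when asked for AND the format qualifies.
    ptt: Boolean(audioAsVoice) && voiceCompatible,
  };
}

console.log(
  buildWhatsAppAudioContent({
    mediaUrl: "/path/to/audio.ogg",
    mimetype: "audio/ogg; codecs=opus",
    audioAsVoice: true,
  })
);
```

The point is only that the flag has to reach the place where the outgoing content is built; if it is dropped earlier, the audio can at best arrive as a generic attachment.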

### Direct media-send path also appears broken
Earlier investigation of `message(action=send, filePath=...)` suggested a separate problem in the WhatsApp send path: media requests can return an empty `messageId`, even though text sends work.

That may be a distinct bug from the TTS temp-path issue, but it contributes to the same user-visible result: audio delivery fails.
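Whatever the root cause, an empty `messageId` currently looks like success to the caller. A small guard like the following would at least surface the failure. This is a sketch only: `requireMessageId` is a hypothetical helper, and the result shape (`runId`, `messageId`) is inferred from the fields I saw in the send result.

```js
// Hypothetical guard: treat a missing or empty messageId as a failed send.
function requireMessageId(result) {
  const id =
    typeof result?.messageId === "string" ? result.messageId.trim() : "";
  if (!id) {
    const runId = result?.runId ?? "unknown";
    throw new Error(`WhatsApp send returned no messageId (runId=${runId})`);
  }
  return id;
}

try {
  requireMessageId({ runId: "run-123", messageId: "" });
} catch (err) {
  console.error(err.message);
}
```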

## Why I think the TTS diagnosis is sound
The reasoning chain is:

1. clean isolated `tts(...)` produced no delivered media
2. logs later indicated `hasMedia:false`, not a transport send failure with media present
3. TTS definitely produces an `audioPath`
4. the TTS entrypoints forward that raw temp path directly as `mediaUrl`
5. upstream tool-result and payload assembly appear to preserve media correctly
6. reply-media normalization has restrictive absolute-path rules
7. `/tmp/openclaw/tts-*` does not appear to fit those allowed managed reply-media locations

That makes the temp-path / normalization mismatch the strongest explanation for the main failure.

## Suggested fix
### Patch 1, required
Persist TTS output into managed outbound media before returning reply payloads, instead of forwarding raw temp paths.

Apply consistently to:
- auto-TTS
- `tts(...)`
- `/tts audio`

In other words, do not emit raw temp-file paths into normal reply payloads.
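A minimal sketch of that persistence step, using only Node's standard library. `persistTtsOutput` and the `managedDir` argument are hypothetical; the real fix should reuse whatever helper OpenClaw already uses to store managed outbound media, if one exists.

```js
// Hypothetical helper, not an existing OpenClaw API.
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// Copies a TTS temp file into a managed outbound directory and returns the
// new path, which is what should then be emitted as `mediaUrl`.
function persistTtsOutput(audioPath, managedDir) {
  fs.mkdirSync(managedDir, { recursive: true });

  // Keep the extension so downstream MIME detection still works.
  const ext = path.extname(audioPath);
  const dest = path.join(managedDir, `tts-${crypto.randomUUID()}${ext}`);

  fs.copyFileSync(audioPath, dest);
  return dest;
}
```

Each of the three entrypoints would then return the persisted path instead of the raw `/tmp/openclaw/tts-*` path, so the normalizer sees a location it already accepts.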

### Patch 2, likely also needed
Wire `audioAsVoice` all the way through the WhatsApp send path so voice-compatible audio can actually arrive as a native voice note / PTT.

## Test plan after patching
1. Re-run clean isolated `tts(...)`
2. Verify outbound logs show media present, not `hasMedia:false`
3. Confirm the recipient actually receives audio
4. Verify whether it arrives as native voice note vs generic audio attachment
5. Re-test direct `message(action=send, filePath=...)` audio sends

## Impact
This breaks a core user workflow:
- asking the assistant to send a voice note via WhatsApp
- using TTS as an audio reply path
- sending audio media directly to WhatsApp from the assistant

## File references used during investigation
- `dist/extensions/speech-core/runtime-api.js`
- `dist/pi-embedded-CNTNdlGw.js`
- `dist/commands-handlers.runtime-Akj_Dqoi.js`
- `dist/agent-runner.runtime-BEglkhP6.js`
- `dist/send-Dw4UBtXk.js`

## Additional context
I wrote a longer internal investigation report with more detailed code-path notes and reasoning. Happy to condense or split this into separate issues if you would prefer one issue for TTS path loss and another for direct WhatsApp media sends.
