BlueBubbles native iOS voice-memo delivery broken end-to-end with ElevenLabs (and other non-Azure TTS providers) #72506

## Summary

Sending TTS audio to a BlueBubbles iMessage chat using the bundled `tts` agent tool (or `tts.convert` RPC) currently always renders as a **plain audio attachment** in iMessage, never as a **native iOS voice memo** (the bubble with the waveform / scrubber UI). Two distinct upstream gaps in the same pipeline are conspiring to make this delivery mode unreachable for any non-Azure TTS provider, even though every individual link in the chain otherwise works.

## Pipeline (what *should* happen)

For native voice-memo rendering, the chain must complete:

1. TTS provider returns `voiceCompatible: true` for the synthesized clip.
2. The bundled `tts` agent tool sets `details.media.audioAsVoice = true` based on `result.audioAsVoice || result.voiceCompatible` (`src/agents/tools/tts-tool.ts:97`).
3. The reply-delivery layer propagates `audioAsVoice` through to the BlueBubbles channel monitor (`extensions/bluebubbles/src/monitor-processing.ts:1689` reads `payload.audioAsVoice === true` into `asVoice`).
4. `extensions/bluebubbles/src/attachments.ts:134-188` flips `wantsVoice = true` and adds the `isAudioMessage=true` form field on the upload.
5. The BlueBubbles server converts MP3 → CAF and posts via the private API as a native iMessage voice memo.
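To make the flag hand-off in steps 2 to 4 concrete, here is a minimal sketch. The type and function names (`TtsResult`, `ReplyPayload`, `toolAudioAsVoice`, `monitorAsVoice`, `uploadFields`, `deliver`) are hypothetical stand-ins rather than the repository's actual code. Only the three expressions taken from the cited lines are meant to mirror it.

```typescript
// Hypothetical shapes; the real types in the repository may differ.
interface TtsResult {
  audioAsVoice?: boolean;
  voiceCompatible?: boolean;
}

interface ReplyPayload {
  audioAsVoice?: boolean;
}

// Step 2: the tts tool derives the media flag from the synthesis result.
function toolAudioAsVoice(result: TtsResult): boolean {
  return Boolean(result.audioAsVoice || result.voiceCompatible);
}

// Step 3: the BlueBubbles monitor treats only a strict `true` as a voice request.
function monitorAsVoice(payload: ReplyPayload): boolean {
  return payload.audioAsVoice === true;
}

// Step 4: the upload carries the isAudioMessage form field only when voice is wanted.
function uploadFields(asVoice: boolean): Record<string, string> {
  const fields: Record<string, string> = {};
  if (asVoice) fields["isAudioMessage"] = "true";
  return fields;
}

// Chains steps 2-4 for a given provider result (step 1).
function deliver(result: TtsResult): Record<string, string> {
  const payload: ReplyPayload = { audioAsVoice: toolAudioAsVoice(result) };
  return uploadFields(monitorAsVoice(payload));
}
```

In this sketch, the `isAudioMessage` field appears only if the provider result in step 1 carries `voiceCompatible: true` (or `audioAsVoice: true`). Every later step depends on that first flag.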

## Where the chain breaks

### Gap 1 — `target=voice-note` is never set when delivering TTS to BlueBubbles

`extensions/elevenlabs/speech-provider.ts:514` only marks `voiceCompatible: true` when `req.target === "voice-note"`. But no code path sets `target = "voice-note"` automatically based on the destination channel:

- The `tts.convert` RPC handler (`src/gateway/server-methods/tts.ts:92-144`) does not accept a `target` param. It calls `textToSpeech({ text, cfg, channel, overrides, disableFallback })`, so the channel is forwarded, but I cannot find any branch in the runtime that maps `channel === "bluebubbles"` → `target = "voice-note"`.
- The bundled `tts` agent tool (`src/agents/tools/tts-tool.ts`) likewise has no `target` param in `TtsToolSchema` and does not set one explicitly.
- Adding `[[audio_as_voice]]` to the input text (or passing `"target": "voice-note"` directly to `tts.convert`) does not flip the synthesis target: `voiceCompatible` stays `false` (verified on v2026.4.24, see repro below).

### Gap 2 — ElevenLabs returns opus for voice-note target, but BlueBubbles rejects opus

Even if Gap 1 were closed, `extensions/elevenlabs/speech-provider.ts:469-513` defaults to `opus_48000_64` with file extension `.opus` whenever `req.target === "voice-note"`:

```ts
const outputFormat =
  trimToUndefined(overrides.outputFormat) ??
  (req.target === "voice-note" ? "opus_48000_64" : "mp3_44100_128");
// ...
fileExtension: req.target === "voice-note" ? ".opus" : ".mp3",
voiceCompatible: req.target === "voice-note",
```

But `extensions/bluebubbles/src/attachments.ts:170-188` requires MP3 or CAF for `isAudioMessage=true` and explicitly rejects opus:

```ts
if (isAudioMessage) {
  const voiceInfo = resolveVoiceInfo(filename, contentType);
  if (!voiceInfo.isAudio) { throw new Error("BlueBubbles voice messages require audio media (mp3 or caf)."); }
  if (voiceInfo.isMp3) { /* ok */ }
  else if (voiceInfo.isCaf) { /* ok */ }
  else { throw new Error("BlueBubbles voice messages require mp3 or caf audio (convert before sending)."); }
}
```

Overriding `outputFormat: "mp3_44100_128"` to coax MP3 out does not fix it either, because `fileExtension` is hardcoded to `.opus` whenever `target === "voice-note"`, regardless of the actual format. BlueBubbles would receive MP3 bytes under a `.opus` filename, so `voiceInfo.isMp3`, which is derived from the filename, would be false and the upload would fall through to the rejecting `else` branch.
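The mismatch can be illustrated with a small sketch. `providerFile` is a hypothetical reduction of the provider excerpt above (without `trimToUndefined`), and `isMp3ByName` is an assumed extension-only check. The real `resolveVoiceInfo` also receives `contentType` and may behave differently.

```typescript
// Hypothetical reduction of the provider logic quoted above.
function providerFile(
  target: string | undefined,
  outputFormatOverride?: string,
): { outputFormat: string; fileExtension: string } {
  const outputFormat =
    outputFormatOverride ??
    (target === "voice-note" ? "opus_48000_64" : "mp3_44100_128");
  // The extension follows the target, not the resolved format.
  const fileExtension = target === "voice-note" ? ".opus" : ".mp3";
  return { outputFormat, fileExtension };
}

// Assumption: MP3 detection looks only at the filename extension.
function isMp3ByName(filename: string): boolean {
  return filename.toLowerCase().endsWith(".mp3");
}

// Pinning MP3 for a voice-note target still yields a ".opus" filename.
const pinned = providerFile("voice-note", "mp3_44100_128");
const pinnedFilename = `clip${pinned.fileExtension}`;
console.log(pinned.outputFormat, pinnedFilename, isMp3ByName(pinnedFilename));
```

Under these assumptions, the resolved format is `mp3_44100_128` but the filename ends in `.opus`, so a name-based MP3 check fails even though the bytes are MP3.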

### Net effect

No TTS provider today (other than possibly Azure Speech, which has an explicit `voiceNoteOutputFormat` config) can produce a clip that BlueBubbles will accept as a native voice memo. The `isAudioMessage`/`asVoice` plumbing on the BlueBubbles side is fully wired and works (`extensions/bluebubbles/src/actions.ts:448` accepts an explicit `asVoice` param on direct attachment posts), but the agent-facing surfaces (`tts` tool, `tts.convert`, auto-reply delivery) cannot reach it for synthesized speech.

## Reproduction

Environment:
- OpenClaw v2026.4.24 (file-backed secrets, macOS LaunchAgent, BlueBubbles bundled channel)
- BlueBubbles server with private API enabled (verified separately — `asVoice` works for non-TTS attachments via `bluebubbles_send_attachment` with `asVoice: true`)

Config:

```json5
{
  messages: {
    tts: {
      provider: "elevenlabs",
      providers: {
        elevenlabs: {
          apiKey: "<literal sk_… key (workaround for #72496)>",
          voiceId: "<voice-id>",
          model: "eleven_v3",
          outputFormat: "mp3_44100_128"
        }
      }
    }
  }
}
```

Tests (all return `voiceCompatible: false`):

```sh
openclaw gateway call tts.convert --params '{"text":"hi"}'
openclaw gateway call tts.convert --params '{"text":"hi","channel":"bluebubbles"}'
openclaw gateway call tts.convert --params '{"text":"hi","channel":"bluebubbles","target":"voice-note"}'
openclaw gateway call tts.convert --params '{"text":"[[audio_as_voice]] hi","channel":"bluebubbles"}'
```

The agent-facing `tts` tool gives the same result: the tool result `details` shows `provider: "elevenlabs"`, `details.media` has no `audioAsVoice` flag, and BlueBubbles renders a generic audio attachment instead of a native voice memo.

## Suggested fix

Two complementary changes that together unblock the pipeline:

1. **Auto-target voice-note for voice-capable channels (or expose `target` on the agent surface).** When `textToSpeech({ channel })` is called with a channel whose downstream supports voice-memo rendering (BlueBubbles, WhatsApp, Telegram voice notes, etc.), set `target = "voice-note"` by default. Alternatively, or in addition, expose `target` as a parameter on `tts.convert` and the bundled `tts` agent tool's input schema so callers can opt in explicitly. Also consider honoring `[[audio_as_voice]]` reply directives at the synthesis stage (today they only affect downstream delivery).

2. **Honor `outputFormat` override for voice-note in ElevenLabs (and friends), and align `fileExtension`.** In `extensions/elevenlabs/speech-provider.ts:469-513`, derive `fileExtension` from the resolved `outputFormat` rather than hardcoding `.opus` for voice-note. That lets users pin `outputFormat: "mp3_44100_128"` and have ElevenLabs return MP3 with `.mp3` extension while still marking `voiceCompatible: true`. (Optional: add a sibling `voiceNoteOutputFormat` config field matching the Azure provider's pattern, for symmetry.)

Both changes are relatively contained, but neither is sufficient alone: closing only Gap 1 routes delivery into the opus-rejection trap, and the fix for Gap 2 is unreachable while Gap 1 remains open.
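A rough sketch of how the two changes could fit together, not a proposed patch. `VOICE_CAPABLE_CHANNELS`, `resolveTarget`, `extensionForFormat`, and `planSynthesis` are hypothetical names, and the channel ids other than `bluebubbles` are assumptions. It also assumes the codec is the prefix of the format string before the first underscore, as in `mp3_44100_128` and `opus_48000_64`.

```typescript
// Hypothetical channel ids; the real set of voice-capable channels is an assumption.
const VOICE_CAPABLE_CHANNELS: ReadonlySet<string> = new Set([
  "bluebubbles",
  "whatsapp",
  "telegram",
]);

interface SynthesisPlan {
  target: string | undefined;
  outputFormat: string;
  fileExtension: string;
  voiceCompatible: boolean;
}

// Change 1: an explicit target wins; otherwise default by destination channel.
function resolveTarget(channel?: string, explicitTarget?: string): string | undefined {
  if (explicitTarget) return explicitTarget;
  return channel !== undefined && VOICE_CAPABLE_CHANNELS.has(channel)
    ? "voice-note"
    : undefined;
}

// Change 2: derive the extension from the resolved format, not from the target.
// Assumes formats look like "<codec>_<rate>_<bitrate>".
function extensionForFormat(outputFormat: string): string {
  const codec = outputFormat.split("_")[0] ?? outputFormat;
  return `.${codec}`;
}

function planSynthesis(
  channel?: string,
  outputFormatOverride?: string,
  explicitTarget?: string,
): SynthesisPlan {
  const target = resolveTarget(channel, explicitTarget);
  const outputFormat =
    outputFormatOverride ??
    (target === "voice-note" ? "opus_48000_64" : "mp3_44100_128");
  return {
    target,
    outputFormat,
    fileExtension: extensionForFormat(outputFormat),
    voiceCompatible: target === "voice-note",
  };
}
```

With `outputFormat` pinned to `mp3_44100_128`, `planSynthesis("bluebubbles", "mp3_44100_128")` yields a `.mp3` extension with `voiceCompatible: true`. Without the pin, the sketch still defaults to opus for BlueBubbles, so a per-channel or per-provider voice-note format (such as the `voiceNoteOutputFormat` field mentioned above) would still be needed for a working default.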

## Related

- **#72496** — same bug family for `talk.config` SecretRef redaction, also blocking iOS/macOS Talk Mode end-to-end.
- **#68690** — umbrella SecretRef coverage gaps; explicitly lists `messages.tts.providers.<id>.apiKey` siblings as broken (compounds this issue when secrets are stored as SecretRefs).

## No PII

All voice IDs, key material, file paths, and account-specific identifiers are placeholders. Reproduces on a clean LaunchAgent install with any ElevenLabs voice and a BlueBubbles server with the private API enabled.
