Summary
Sending TTS audio to a BlueBubbles iMessage chat using the bundled tts agent tool (or tts.convert RPC) currently always renders as a plain audio attachment in iMessage, never as a native iOS voice memo (the bubble with the waveform / scrubber UI). Two distinct upstream gaps in the same pipeline are conspiring to make this delivery mode unreachable for any non-Azure TTS provider, even though every individual link in the chain otherwise works.
Pipeline (what should happen)
For native voice-memo rendering, the chain must complete:
- TTS provider returns voiceCompatible: true for the synthesized clip.
- The bundled tts agent tool sets details.media.audioAsVoice = true based on result.audioAsVoice || result.voiceCompatible (src/agents/tools/tts-tool.ts:97).
- The reply-delivery layer propagates audioAsVoice through to the BlueBubbles channel monitor (extensions/bluebubbles/src/monitor-processing.ts:1689 reads payload.audioAsVoice === true into asVoice).
- extensions/bluebubbles/src/attachments.ts:134-188 flips wantsVoice = true and adds the isAudioMessage=true form field on the upload.
- The BlueBubbles server converts MP3 → CAF and posts via the private API as a native iMessage voice memo.
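The gating in the middle of this chain can be sketched as a few pure functions. This is only an illustration: the two expressions are the ones quoted above, but the type shapes and the function names (toMediaDetails, resolveAsVoice, wouldSendAsVoiceMemo) are hypothetical and are not taken from the OpenClaw source.

```typescript
// Hypothetical, simplified shapes; the real types live in the OpenClaw source.
interface TtsResult {
  audioAsVoice?: boolean;
  voiceCompatible?: boolean;
}

interface MediaDetails {
  audioAsVoice: boolean;
}

// Mirrors the expression cited at src/agents/tools/tts-tool.ts:97.
function toMediaDetails(result: TtsResult): MediaDetails {
  return { audioAsVoice: Boolean(result.audioAsVoice || result.voiceCompatible) };
}

// Mirrors the strict comparison cited at monitor-processing.ts:1689.
function resolveAsVoice(payload: { audioAsVoice?: unknown }): boolean {
  return payload.audioAsVoice === true;
}

// Combined view: would the upload carry isAudioMessage=true?
function wouldSendAsVoiceMemo(result: TtsResult): boolean {
  return resolveAsVoice(toMediaDetails(result));
}
```

Under these assumptions, a result with neither flag set never reaches the voice-memo path, which is why everything hinges on the provider reporting voiceCompatible: true.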
Where the chain breaks
Gap 1 — target=voice-note is never set when delivering TTS to BlueBubbles
extensions/elevenlabs/speech-provider.ts:514 only marks voiceCompatible: true when req.target === "voice-note". But there's no path that sets target = "voice-note" automatically based on the destination channel:
- The tts.convert RPC handler (src/gateway/server-methods/tts.ts:92-144) does not accept a target param. It calls textToSpeech({ text, cfg, channel, overrides, disableFallback }). The channel is forwarded, but I cannot find any branch in the runtime that maps channel === "bluebubbles" → target = "voice-note".
- The bundled tts agent tool (src/agents/tools/tts-tool.ts) likewise has no target param in TtsToolSchema and does not set one explicitly.
- Adding [[audio_as_voice]] to the input text (or passing "target": "voice-note" directly to tts.convert) does not cause the synthesis to flip; voiceCompatible stays false (verified on v2026.4.24, see repro below).
Gap 2 — ElevenLabs returns opus for voice-note target, but BlueBubbles rejects opus
Even if Gap 1 were closed, extensions/elevenlabs/speech-provider.ts:469-513 defaults to opus_48000_64 with file extension .opus whenever req.target === "voice-note":

    const outputFormat =
      trimToUndefined(overrides.outputFormat) ??
      (req.target === "voice-note" ? "opus_48000_64" : "mp3_44100_128");
    // ...
    fileExtension: req.target === "voice-note" ? ".opus" : ".mp3",
    voiceCompatible: req.target === "voice-note",

But extensions/bluebubbles/src/attachments.ts:170-188 requires MP3 or CAF for isAudioMessage=true and explicitly rejects opus:

    if (isAudioMessage) {
      const voiceInfo = resolveVoiceInfo(filename, contentType);
      if (!voiceInfo.isAudio) { throw new Error("BlueBubbles voice messages require audio media (mp3 or caf)."); }
      if (voiceInfo.isMp3) { /* ok */ }
      else if (voiceInfo.isCaf) { /* ok */ }
      else { throw new Error("BlueBubbles voice messages require mp3 or caf audio (convert before sending)."); }
    }

So overriding outputFormat: "mp3_44100_128" to coax MP3 out doesn't fix it either, because fileExtension is hardcoded to .opus whenever target === "voice-note", regardless of the actual format. BlueBubbles would receive an .opus filename carrying MP3 bytes, so voiceInfo.isMp3, derived from the filename, would be false.
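The mismatch can be reproduced in isolation. The sketch below copies the resolution expressions from the excerpt above into a standalone function; trimToUndefined is a stand-in for the real helper, and acceptedAsVoice is a hypothetical filename-only check that approximates the BlueBubbles validation as described in this report (the real resolveVoiceInfo also receives a content type).

```typescript
interface SynthRequest {
  target?: string;
}

interface ResolvedAudio {
  outputFormat: string;
  fileExtension: string;
  voiceCompatible: boolean;
}

// Stand-in for the helper used in speech-provider.ts (assumed behavior).
function trimToUndefined(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Same expressions as the excerpt: the extension follows the target,
// not the resolved format.
function resolveCurrent(req: SynthRequest, overrides: { outputFormat?: string }): ResolvedAudio {
  const outputFormat =
    trimToUndefined(overrides.outputFormat) ??
    (req.target === "voice-note" ? "opus_48000_64" : "mp3_44100_128");
  return {
    outputFormat,
    fileExtension: req.target === "voice-note" ? ".opus" : ".mp3",
    voiceCompatible: req.target === "voice-note",
  };
}

// Hypothetical approximation of the mp3-or-caf check, filename only.
function acceptedAsVoice(filename: string): boolean {
  const lower = filename.toLowerCase();
  return lower.endsWith(".mp3") || lower.endsWith(".caf");
}

const pinned = resolveCurrent({ target: "voice-note" }, { outputFormat: "mp3_44100_128" });
const pinnedAccepted = acceptedAsVoice(`clip${pinned.fileExtension}`);
```

With the format pinned to mp3_44100_128 and a voice-note target, the sketch resolves an MP3 format under a clip.opus name, and the filename check rejects it (pinnedAccepted is false).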
Net effect
There is no provider+channel combination today (other than possibly Azure Speech, which has explicit voiceNoteOutputFormat config) that can produce a TTS clip BlueBubbles will accept as a native voice memo. The isAudioMessage/asVoice plumbing on the BlueBubbles side is fully wired and works (extensions/bluebubbles/src/actions.ts:448 accepts an explicit asVoice param on direct attachment posts) — but the agent-facing surfaces (tts tool, tts.convert, auto-reply delivery) cannot reach it for synthesized speech.
Reproduction
Environment:
- OpenClaw v2026.4.24 (file-backed secrets, macOS LaunchAgent, BlueBubbles bundled channel)
- BlueBubbles server with private API enabled (verified separately: asVoice works for non-TTS attachments via bluebubbles_send_attachment with asVoice: true)
Config:
    {
      messages: {
        tts: {
          provider: "elevenlabs",
          providers: {
            elevenlabs: {
              apiKey: "<literal sk_… key (workaround for #72496)>",
              voiceId: "<voice-id>",
              model: "eleven_v3",
              outputFormat: "mp3_44100_128"
            }
          }
        }
      }
    }

Tests (all return voiceCompatible: false):

    openclaw gateway call tts.convert --params '{"text":"hi"}'
    openclaw gateway call tts.convert --params '{"text":"hi","channel":"bluebubbles"}'
    openclaw gateway call tts.convert --params '{"text":"hi","channel":"bluebubbles","target":"voice-note"}'
    openclaw gateway call tts.convert --params '{"text":"[[audio_as_voice]] hi","channel":"bluebubbles"}'

Same result via the agent-facing tts tool: the tool result details show provider: "elevenlabs" with no audioAsVoice flag in details.media, and BlueBubbles renders a generic audio attachment instead of a native voice memo.
Suggested fix
Two complementary changes that together unblock the pipeline:
- Auto-target voice-note for voice-capable channels (or expose target on the agent surface). When textToSpeech({ channel }) is called with a channel whose downstream supports voice-memo rendering (BlueBubbles, WhatsApp, Telegram voice notes, etc.), set target = "voice-note" by default. Alternatively/additionally, expose target as a parameter on tts.convert and the bundled tts agent tool's input schema so callers can opt in explicitly. Also consider honoring [[audio_as_voice]] reply directives at the synthesis stage (today they only affect downstream delivery).
- Honor outputFormat override for voice-note in ElevenLabs (and friends), and align fileExtension. In extensions/elevenlabs/speech-provider.ts:469-513, derive fileExtension from the resolved outputFormat rather than hardcoding .opus for voice-note. That lets users pin outputFormat: "mp3_44100_128" and have ElevenLabs return MP3 with .mp3 extension while still marking voiceCompatible: true. (Optional: add a sibling voiceNoteOutputFormat config field matching the Azure provider's pattern, for symmetry.)
Both changes are relatively contained. Either one alone is insufficient: closing only Gap 1 routes us into the opus-rejection trap, and closing only Gap 2 fixes a code path that stays unreachable.
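A minimal sketch of what the two changes could look like together. Everything here is illustrative: VOICE_NOTE_CHANNELS, resolveTarget, extensionForFormat and resolveProposed are hypothetical names, the channel set is a placeholder, and the extension mapping assumes format ids of the form codec_sampleRate_bitrate, as in the two ids quoted in this report.

```typescript
// Hypothetical list; the real set of voice-capable channels is a design decision.
const VOICE_NOTE_CHANNELS = new Set<string>(["bluebubbles"]);

// Change 1: an explicit target wins, otherwise default by channel.
function resolveTarget(channel?: string, explicitTarget?: string): string | undefined {
  if (explicitTarget) return explicitTarget;
  return channel !== undefined && VOICE_NOTE_CHANNELS.has(channel) ? "voice-note" : undefined;
}

// Assumes the codec is the prefix of the format id (e.g. "mp3_44100_128").
function extensionForFormat(outputFormat: string): string {
  const codec = outputFormat.split("_")[0];
  return `.${codec}`;
}

// Stand-in for the helper used in speech-provider.ts (assumed behavior).
function trimToUndefined(value?: string): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Change 2: the extension follows the resolved format, not the target.
function resolveProposed(
  req: { target?: string },
  overrides: { outputFormat?: string },
): { outputFormat: string; fileExtension: string; voiceCompatible: boolean } {
  const isVoiceNote = req.target === "voice-note";
  const outputFormat =
    trimToUndefined(overrides.outputFormat) ??
    (isVoiceNote ? "opus_48000_64" : "mp3_44100_128");
  return {
    outputFormat,
    fileExtension: extensionForFormat(outputFormat),
    voiceCompatible: isVoiceNote,
  };
}

const target = resolveTarget("bluebubbles");
const fixed = resolveProposed({ target }, { outputFormat: "mp3_44100_128" });
```

Under these assumptions, a BlueBubbles delivery with outputFormat pinned to mp3_44100_128 resolves to an .mp3 extension with voiceCompatible: true. Without the pin, the default stays opus, so BlueBubbles would still need either the pin or a channel-aware format default.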
Related
- talk.config SecretRef redaction, also blocking iOS/macOS Talk Mode end-to-end.
- messages.tts.providers.<id>.apiKey siblings as broken (compounds this issue when secrets are stored as SecretRefs).
No PII
All voice IDs, key material, file paths, and account-specific identifiers are placeholders. Reproduces on a clean LaunchAgent install with any ElevenLabs voice and a BlueBubbles server with the private API enabled.