Talk Mode: Support on-device TTS (iOS AVSpeechSynthesizer) as alternative to ElevenLabs #42630

@charlesnjohn-bot

Feature Request

Current behavior: Talk Mode only supports ElevenLabs for text-to-speech synthesis, requiring audio to be streamed from the cloud on every response.

Proposed behavior: Add support for on-device TTS as a Talk Mode provider option, starting with iOS AVSpeechSynthesizer. This would allow the iPhone to speak responses locally using Apple's built-in voices instead of streaming audio from ElevenLabs.

Why this matters

  • Lower latency on the TTS step: no network round-trip for audio streaming
  • TTS works offline: speech synthesis doesn't depend on an external API
  • No API costs: ElevenLabs usage adds up, especially for frequent Talk Mode users
  • Apple's higher-quality voices are good: the downloadable enhanced and premium voices sound natural and run entirely on-device
  • Privacy: response text stays on-device rather than being sent to a third-party TTS API

Suggested implementation

  • New Talk provider option (e.g., "provider": "system" or "provider": "native") in talk config
  • On iOS, use AVSpeechSynthesizer with configurable voice identifier
  • Could extend to macOS (AVSpeechSynthesizer is also available there; NSSpeechSynthesizer is the older AppKit API) and Android (TextToSpeech) as well
  • Keep ElevenLabs as the default for users who prefer cloud voices
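
The provider switch suggested above could be sketched roughly as follows. This is only an illustration of the shape, not OpenClaw's actual code: `TalkProviderId`, `TalkSpeaker`, `ElevenLabsSpeaker`, `NativeSpeaker`, and `createSpeaker` are all hypothetical names, and `speak` returns a description string instead of playing audio so the sketch stays self-contained.

```typescript
// Hypothetical names throughout; none of these come from the OpenClaw codebase.
type TalkProviderId = "elevenlabs" | "native";

interface TalkSpeaker {
  readonly provider: TalkProviderId;
  // Returns a description of what would happen; a real speaker would play audio.
  speak(text: string): string;
}

// Cloud path: a real implementation would stream audio from the ElevenLabs API.
class ElevenLabsSpeaker implements TalkSpeaker {
  readonly provider: TalkProviderId = "elevenlabs";

  speak(text: string): string {
    return `stream from cloud: ${text}`;
  }
}

// On-device path: a real implementation would ask the iOS node to speak the
// text with AVSpeechSynthesizer, using the configured voice identifier.
class NativeSpeaker implements TalkSpeaker {
  readonly provider: TalkProviderId = "native";
  private readonly voiceId: string | undefined;

  constructor(voiceId?: string) {
    this.voiceId = voiceId;
  }

  speak(text: string): string {
    const voice = this.voiceId ?? "system default voice";
    return `speak on device (${voice}): ${text}`;
  }
}

// ElevenLabs stays the default when no provider is given.
function createSpeaker(
  provider: TalkProviderId = "elevenlabs",
  voiceId?: string,
): TalkSpeaker {
  return provider === "native"
    ? new NativeSpeaker(voiceId)
    : new ElevenLabsSpeaker();
}
```

Keeping both providers behind one small interface would let the rest of Talk Mode stay unaware of where the audio is produced, and leaves room for macOS or Android speakers later.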

Config example

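{
  talk: {
    provider: "native",  // or "elevenlabs" (default)
    // native-specific settings
    nativeVoiceId: "com.apple.voice.premium.en-US.Zoe",  // an AVSpeechSynthesisVoice identifier
    nativeRate: 0.5,  // AVSpeechUtterance rate, 0.0 to 1.0; 0.5 is the system default
  }
}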
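
One way the `talk` block might be normalized before use is sketched below. The field names mirror the config example, but `TalkConfigInput`, `ResolvedTalkConfig`, and `resolveTalkConfig` are hypothetical, and the fallback behavior (unknown providers resolve to ElevenLabs, out-of-range rates are clamped) is an assumption rather than anything the project defines. The rate bounds follow AVSpeechUtterance, which uses a 0.0 to 1.0 range with a default of 0.5.

```typescript
// Hypothetical shapes: field names follow the config example, everything
// else (types, defaults, fallback rules) is an assumption for illustration.
interface TalkConfigInput {
  provider?: string;
  nativeVoiceId?: string;
  nativeRate?: number;
}

interface ResolvedTalkConfig {
  provider: "elevenlabs" | "native";
  nativeVoiceId: string | null; // null means "use the system default voice"
  nativeRate: number;
}

// AVSpeechUtterance rate range and default.
const MIN_RATE = 0.0;
const MAX_RATE = 1.0;
const DEFAULT_RATE = 0.5;

function resolveTalkConfig(input: TalkConfigInput = {}): ResolvedTalkConfig {
  // Anything other than an explicit "native" falls back to ElevenLabs.
  const provider = input.provider === "native" ? "native" : "elevenlabs";

  // Use the default when the rate is missing or not a finite number.
  const rawRate =
    typeof input.nativeRate === "number" && Number.isFinite(input.nativeRate)
      ? input.nativeRate
      : DEFAULT_RATE;

  return {
    provider,
    nativeVoiceId: input.nativeVoiceId ?? null,
    // Clamp into the range the synthesizer accepts.
    nativeRate: Math.min(MAX_RATE, Math.max(MIN_RATE, rawRate)),
  };
}
```

Silently falling back to ElevenLabs keeps existing configs working unchanged; rejecting unknown provider values with an error would be a reasonable alternative if stricter validation is preferred.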

Platform: iOS (OpenClaw iOS node app)
Talk Mode docs: docs/nodes/talk.md
