Voicebox

Voicebox Documentation

Voicebox is a local-first voice cloning studio -- a free and open-source alternative to ElevenLabs. Clone voices from a few seconds of audio, generate speech in 23 languages across 5 TTS engines, apply post-processing effects, and compose multi-voice projects with a timeline editor.

[Screenshot: Voicebox app]

  • Complete privacy -- models and voice data stay on your machine
  • 5 TTS engines -- Qwen3-TTS, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, and HumeAI TADA
  • 23 languages -- from English to Arabic, Japanese, Hindi, Swahili, and more
  • Post-processing effects -- pitch shift, reverb, delay, chorus, compression, and filters
  • Expressive speech -- paralinguistic tags like [laugh], [sigh], [gasp] via Chatterbox Turbo
  • Unlimited length -- auto-chunking with crossfade for scripts, articles, and chapters
  • Stories editor -- multi-track timeline for conversations, podcasts, and narratives
  • API-first -- REST API for integrating voice synthesis into your own projects
  • Native performance -- built with Tauri (Rust), not Electron
  • Runs everywhere -- macOS (MLX/Metal), Windows (CUDA), Linux, AMD ROCm, Intel Arc, Docker
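The "auto-chunking with crossfade" feature above can be illustrated with a minimal sketch. It is not Voicebox's actual implementation: the function names `split_into_chunks` and `crossfade`, the 200-character limit, and the linear fade are all assumptions chosen for illustration. The idea is to split a long script at sentence boundaries, synthesize each chunk separately, and blend the seams so the joins are not audible.

```python
import re

def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks, aiming for max_chars or fewer.

    Hypothetical helper: a single sentence longer than max_chars is kept
    whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def crossfade(a, b, overlap):
    """Join two lists of audio samples with a linear crossfade.

    The last `overlap` samples of `a` are blended with the first `overlap`
    samples of `b`, so the result is len(a) + len(b) - overlap samples long.
    """
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[:len(a) - overlap] + blended + b[overlap:]
```

In use, each text chunk would be passed to a TTS engine and the resulting sample lists folded together with `crossfade`. A real pipeline would more likely operate on sample arrays and might use an equal-power fade instead of a linear one.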

Download

Platform                Download
macOS (Apple Silicon)   Download DMG
macOS (Intel)           Download DMG
Windows                 Download MSI
Docker                  docker compose up

View all releases

Get Started