Voicebox Documentation
Voicebox is a local-first voice cloning studio -- a free and open-source alternative to ElevenLabs. Clone voices from a few seconds of audio, generate speech in 23 languages across 5 TTS engines, apply post-processing effects, and compose multi-voice projects with a timeline editor.

- Complete privacy -- models and voice data stay on your machine
- 5 TTS engines -- Qwen3-TTS, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, and HumeAI TADA
- 23 languages -- from English to Arabic, Japanese, Hindi, Swahili, and more
- Post-processing effects -- pitch shift, reverb, delay, chorus, compression, and filters
- Expressive speech -- paralinguistic tags like `[laugh]`, `[sigh]`, `[gasp]` via Chatterbox Turbo
- Unlimited length -- auto-chunking with crossfade for scripts, articles, and chapters
- Stories editor -- multi-track timeline for conversations, podcasts, and narratives
- API-first -- REST API for integrating voice synthesis into your own projects
- Native performance -- built with Tauri (Rust), not Electron
- Runs everywhere -- macOS (MLX/Metal), Windows (CUDA), Linux, AMD ROCm, Intel Arc, Docker
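
To make the "auto-chunking with crossfade" idea concrete, here is a minimal sketch of the general technique. It is not Voicebox's actual implementation: the function names `chunk_text` and `crossfade`, the `max_chars` limit, and the linear fade curve are all assumptions chosen for illustration. Long text is split at sentence boundaries, each chunk would be synthesized separately, and adjacent audio segments are blended over a short overlap so the joins are less audible.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text into chunks, breaking at sentence ends.

    Chunks stay within max_chars where possible; a single sentence longer
    than max_chars is kept whole as its own oversized chunk.
    (Hypothetical helper, not part of Voicebox.)
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a, b, overlap):
    """Join two lists of audio samples with a linear crossfade.

    The last `overlap` samples of `a` are blended with the first `overlap`
    samples of `b`. (Hypothetical helper; a linear fade is an assumption.)
    """
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(a[start + i] * (1 - w) + b[i] * w)
    return list(a[:start]) + blended + list(b[overlap:])


chunks = chunk_text("One. Two. Three.", max_chars=9)
joined = crossfade([1.0] * 4, [0.0] * 4, overlap=2)
```

In a real pipeline the sample lists would come from a TTS engine, one per chunk, and the overlap would typically be a few tens of milliseconds of audio rather than two samples.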
Download
| Platform | Download |
|---|---|
| macOS (Apple Silicon) | Download DMG |
| macOS (Intel) | Download DMG |
| Windows | Download MSI |
| Docker | `docker compose up` |
Get Started
- Installation -- download and install Voicebox
- Quick Start -- get up and running in 5 minutes
- API Reference -- integrate voice synthesis into your apps