Voicebox

Your voice, your machine.

Open source voice cloning studio with support for multiple TTS engines. Clone any voice, generate natural speech, and compose multi-voice projects — all running locally.

macOS, Windows, Linux

Jarvis (en): Dry wit, composed British AI assistant
Samuel L. Jackson (en): Commanding intensity with sharp, punchy delivery
Bob Ross (en): Gentle, soothing voice full of quiet encouragement
Sam Altman (en): Measured, thoughtful Silicon Valley cadence
Morgan Freeman (en): Rich, warm baritone with gravitas and calm authority
Linus Tech Tips (en): Enthusiastic, fast-paced tech explainer energy
Fireship (en): Rapid-fire, deadpan tech humor with zero filler
Scarlett Johansson (en): Smooth, low alto with understated warmth
Dario Amodei (en): Calm, precise articulation with academic depth
David Attenborough (en): Warm, reverent narration with wonder and precision
Zendaya (en): Relaxed, modern delivery with effortless cool
Barack Obama (en): Measured cadence with rhythmic pauses and gravitas

Morgan Freeman (en, Qwen 1.7B, 0:08, 2 minutes ago)
The neural pathways of human speech contain more complexity than any language model can fully capture, yet we keep pushing the boundaries of what is possible.
Samuel L. Jackson (en, Qwen 1.7B, 0:07, 15 minutes ago)
In a world increasingly shaped by artificial intelligence, the human voice remains our most powerful tool for connection and storytelling.
Jarvis (en, Qwen 0.6B, 0:09, 1 hour ago)
The architecture of modern text-to-speech systems reveals an elegant interplay between transformer models and acoustic feature prediction.
Bob Ross (en, Chatterbox, 0:06, 3 hours ago)
Welcome to the next chapter. Every great story begins with a single voice, and today that voice can be yours.
Linus Tech Tips (en, Qwen 1.7B, 0:05, 5 hours ago)
Local inference gives you complete control over your voice data. No cloud, no subscriptions, no compromises.

Professional voice tools, zero compromise

Everything you need to clone voices, generate speech, and produce multi-voice content — running entirely on your machine.

Near-Perfect Voice Cloning

Multiple TTS engines for exceptional voice quality. Clone any voice from a few seconds of audio with natural intonation and emotion.

Stories Editor

Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations between characters.

Audio Effects Pipeline

Apply pitch shift, reverb, delay, compression, and more — then save as presets. Preview effects live and set defaults per voice profile.

Local or Remote

Run GPU inference locally with Metal, CUDA, ROCm, Intel Arc, or DirectML — or connect to a remote machine. One-click server setup with automatic discovery.

Audio Transcription

Powered by Whisper for accurate speech-to-text. Automatically extract reference text from voice samples.

Long-Form Generation

Generate up to 50,000 characters in one go. Text is auto-split at sentence boundaries, generated per chunk, and crossfaded seamlessly.
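
The split-and-crossfade flow described above can be sketched roughly as follows. This is an illustration only, not Voicebox's actual implementation: the function names `split_into_chunks` and `crossfade`, the 200-character default, the naive sentence rule, and the linear fade are all assumptions made for the example.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Naive sentence rule (assumption): split after ., !, or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample buffers, linearly blending `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming buffer
        blended.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[: len(a) - overlap] + blended + b[overlap:]
```

In this sketch each chunk would be synthesized on its own and the resulting audio buffers joined pairwise with `crossfade`, so the seam between two chunks is a short blend rather than a hard cut.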

Clone any voice in seconds

Three ways to capture a voice sample: upload a clip, record from your microphone, or capture audio playing on your system. Voicebox clones the voice from as little as 3 seconds of audio.

Upload a clip
Drag and drop any audio file — WAV, MP3, FLAC, or WebM.
Record from microphone
Live waveform preview while you record. Up to 30 seconds.
System audio capture
Clone a voice from a YouTube video, podcast, or any app playing audio.

Multi-Engine Architecture

Choose the right model for every job. All models run locally on your hardware — download once, use forever.

Qwen3-TTS

by Alibaba
1.7B, 0.6B

High-quality multilingual voice cloning with natural prosody. The only engine with delivery instructions — control tone, pace, and emotion with natural language.

10 languages, delivery instructions

Chatterbox

by Resemble AI

Production-grade voice cloning with the broadest language support. 23 languages with zero-shot cloning and emotion exaggeration control.

23 languages

Chatterbox Turbo

by Resemble AI
350M

Lightweight and fast. Supports paralinguistic tags — embed [laugh], [sigh], [gasp] and more directly in your text for expressive, natural speech.

350M params, [laugh] [sigh] tags
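
To make the inline tag syntax concrete, here is a small sketch that separates spoken text from bracketed tags such as [laugh] or [sigh]. It is only a way to inspect or validate a script before generation; the helper name `split_tags` and the lowercase-letters-only pattern are assumptions for this example, and nothing here reflects how Chatterbox Turbo processes tags internally.

```python
import re

# Assumed tag shape for this example: lowercase letters in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def split_tags(text: str) -> list[tuple[str, str]]:
    """Return ("text", ...) and ("tag", ...) segments in reading order."""
    segments: list[tuple[str, str]] = []
    position = 0
    for match in TAG_PATTERN.finditer(text):
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("tag", match.group(1)))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


segments = split_tags("Well [sigh] that was close. [laugh]")
```

A pass like this makes it easy to catch a misspelled or unclosed tag in a long script before spending time on generation.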

LuxTTS

by ZipVoice

Ultra-fast, CPU-friendly voice cloning at 48 kHz. Exceeds 150x realtime on CPU and needs only about 1 GB of VRAM when run on a GPU. The fastest engine for quick iterations.

150x realtime, 48kHz output
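
As a rough guide to what a realtime factor means in practice, the arithmetic is simply audio duration divided by the factor. The helper name `synthesis_seconds` is made up for this example, and actual throughput will depend on your hardware and text.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate compute time for a clip at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One minute of speech at the quoted 150x realtime figure.
estimate = synthesis_seconds(60.0, 150.0)
```

At 150x realtime, one minute of audio works out to about 0.4 seconds of compute under this simple model.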

Download Voicebox

Available for macOS, Windows, and Linux. No dependencies required.