Your voice, your machine.

Open source voice cloning studio with support for multiple TTS engines. Clone any voice, generate natural speech, and compose multi-voice projects — all running locally.

Download | View on GitHub

macOS, Windows, Linux

Voicebox

Jarvis

Dry wit, composed British AI assistant

Samuel L. Jackson

Commanding intensity with sharp, punchy delivery

Bob Ross

Gentle, soothing voice full of quiet encouragement

Sam Altman

Measured, thoughtful Silicon Valley cadence

Morgan Freeman

Rich, warm baritone with gravitas and calm authority

Linus Tech Tips

Enthusiastic, fast-paced tech explainer energy

Fireship

Rapid-fire, deadpan tech humor with zero filler

Scarlett Johansson

Smooth, low alto with understated warmth

Dario Amodei

Calm, precise articulation with academic depth

David Attenborough

Warm, reverent narration with wonder and precision

Zendaya

Relaxed, modern delivery with effortless cool

Barack Obama

Measured cadence with rhythmic pauses and gravitas

Morgan Freeman

en | Qwen 1.7B | 0:08

2 minutes ago

The neural pathways of human speech contain more complexity than any language model can fully capture, yet we keep pushing the boundaries of what is possible.

Samuel L. Jackson

en | Qwen 1.7B | 0:07

15 minutes ago

In a world increasingly shaped by artificial intelligence, the human voice remains our most powerful tool for connection and storytelling.

Jarvis

en | Qwen 0.6B | 0:09

1 hour ago

The architecture of modern text-to-speech systems reveals an elegant interplay between transformer models and acoustic feature prediction.

Bob Ross

en | Chatterbox | 0:06

3 hours ago

Welcome to the next chapter. Every great story begins with a single voice, and today that voice can be yours.

Linus Tech Tips

en | Qwen 1.7B | 0:05

5 hours ago

Local inference gives you complete control over your voice data. No cloud, no subscriptions, no compromises.

Professional voice tools, zero compromise

Everything you need to clone voices, generate speech, and produce multi-voice content — running entirely on your machine.

Near-Perfect Voice Cloning

Multiple TTS engines for exceptional voice quality. Clone any voice from a few seconds of audio with natural intonation and emotion.

Stories Editor

Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations between characters.

Audio Effects Pipeline

Apply pitch shift, reverb, delay, compression, and more — then save as presets. Preview effects live and set defaults per voice profile.
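One way to picture an effects chain saved as a preset is as an ordered list of functions applied to a buffer of samples. The sketch below is only an illustration under that assumption: `gain`, `delay`, and `make_preset` are hypothetical names, not Voicebox's actual API, and the effects are deliberately simplistic.

```python
from typing import Callable

# An effect takes a list of samples and returns a new list of samples.
Effect = Callable[[list[float]], list[float]]


def gain(factor: float) -> Effect:
    """Hypothetical effect: scale every sample by a constant factor."""
    return lambda samples: [s * factor for s in samples]


def delay(delay_samples: int, mix: float = 0.5) -> Effect:
    """Hypothetical effect: add one echo, delay_samples later, scaled by mix."""
    def apply(samples: list[float]) -> list[float]:
        # Extend the buffer so the echo tail is not cut off.
        out = list(samples) + [0.0] * delay_samples
        for i, s in enumerate(samples):
            out[i + delay_samples] += s * mix
        return out
    return apply


def make_preset(*effects: Effect) -> Effect:
    """Bundle effects into one reusable callable that applies them in order."""
    def run(samples: list[float]) -> list[float]:
        for effect in effects:
            samples = effect(samples)
        return samples
    return run


# A preset is just a saved chain that can be reused across clips.
soft_echo = make_preset(gain(0.5), delay(2, mix=0.5))
processed = soft_echo([1.0, 0.0, 0.0])
```

Because a preset is itself an effect, presets can be nested or assigned as a default for a voice profile without special handling.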

Local or Remote

Run GPU inference locally with Metal, CUDA, ROCm, Intel Arc, or DirectML — or connect to a remote machine. One-click server setup with automatic discovery.

Audio Transcription

Long-Form Generation

Generate up to 50,000 characters in one go. Text is auto-split at sentence boundaries, generated per-chunk, and crossfaded seamlessly.
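The split-then-crossfade approach described above can be sketched roughly as follows. This is a minimal illustration, not Voicebox's implementation: the function names, the `max_chars` chunk size, the regex used to find sentence boundaries, and the linear fade curve are all assumptions made for the example.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Naive sentence boundary: whitespace that follows ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # A single sentence longer than max_chars is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample buffers, blending the end of a into the start of b."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises linearly
        mixed.append(tail[i] * (1 - w) + b[i] * w)
    return head + mixed + b[overlap:]


chunks = split_into_chunks("One. Two! Three? Four.", max_chars=10)
joined = crossfade([1.0] * 4, [0.0] * 4, overlap=2)
```

In a real pipeline each chunk would be synthesized separately and the resulting audio buffers joined pairwise with `crossfade`, so the overlap shortens the total length by `overlap` samples per join.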

Clone any voice in seconds

There are three ways to capture a voice sample: upload a clip, record from your microphone, or capture audio playing on your system. Voicebox clones the voice from as little as 3 seconds of audio.

Upload a clip

Drag and drop any audio file — WAV, MP3, FLAC, or WebM.

Record from microphone

Live waveform preview while you record. Up to 30 seconds.

System audio capture

Clone a voice from a YouTube video, podcast, or any app playing audio.

Multi-Engine Architecture

Choose the right model for every job. All models run locally on your hardware — download once, use forever.

Qwen3-TTS

by Alibaba

1.7B | 0.6B

High-quality multilingual voice cloning with natural prosody. The only engine with delivery instructions — control tone, pace, and emotion with natural language.

10 languages | Delivery instructions