Portable local text-to-speech built on Higgs Audio v3 TTS — expressive speech in 100+ languages, zero-shot voice cloning, an AI text director, plus Podcast and Audiobook modes. 100% offline, one click.
🚀 One-click cross-platform install via Pinokio:
Works on Windows / Linux (x64 & aarch64) / macOS · NVIDIA / AMD / Apple Silicon / CPU. No `install.bat` needed: Pinokio sets up CUDA, Python 3.12, PyTorch and dependencies for you. Launcher: timoncool/HiggsAudio-Studio-pinokio
Generate expressive speech in 100+ languages, clone any voice from a reference clip, let a local LLM direct the delivery, and produce full podcasts and audiobooks — entirely on your machine. 100% offline, no cloud, no API keys. Everything lives inside the folder: Python, dependencies, models, cache. Delete the folder — the app is gone.
Built on Higgs Audio v3 TTS by Boson AI — a 4B expressive TTS model with native multi-speaker and inline emotion/prosody control.
- 🎙️ TTS — text → speech, 100+ languages; temperature, top-p/k, seed lock; autoplay; ⏹ Stop interrupts generation on the fly (at the inference level); per-frame progress in the terminal.
- 🎭 Expression + 🤖 Director — insert control tags (`<|emotion|>`, `<|sfx|>`, `<|prosody|>`, `<|style|>`) with buttons, plus an Enrich button: a lightweight local LLM normalizes the text and places emotion/sound/prosody tags by meaning.
- 🧬 Voice cloning — zero-shot from a reference clip with auto-transcription (Moonshine ASR); preset library + on-demand Russian voice pack.
- 🎬 Podcast / Dialogue — the LLM writes a multi-speaker script and each speaker gets its own voice; the clips are stitched together with loudness leveling across speakers (target −16 LUFS, a common podcast level, so no speaker ends up quieter than another).
- 📚 Audiobook — narrator/character attribution with a persistent roster (same character = same voice), long-form with timbre carry-over + loudness normalization.
- 📦 Batch — a list of texts → mass synthesis with a live log.
- 💾 Output format — WAV / MP3 / FLAC / OGG; results saved to `output/` with timestamps.
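The cross-speaker leveling in Podcast mode can be illustrated with a minimal sketch. It is not the app's code: the real measurement is credited to pyloudnorm (EBU R128), while the version below substitutes plain RMS level for the K-weighted, gated LUFS measurement so that it runs with the standard library alone. The function names and the sine-wave "speakers" are hypothetical.

```python
import math

TARGET_DB = -16.0  # stands in for the -16 LUFS target; RMS dBFS is NOT true LUFS


def rms_db(samples):
    """Return the RMS level of float samples (range -1..1) in dB full scale."""
    if not samples:
        raise ValueError("empty clip")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def level_clip(samples, target_db=TARGET_DB):
    """Scale a clip so its RMS level matches target_db; silent clips pass through."""
    current = rms_db(samples)
    if current == float("-inf"):
        return list(samples)
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [s * gain for s in samples]


def sine(amplitude, n=4800, freq=220.0, rate=48000):
    """Hypothetical test signal: a sine wave standing in for one speaker's clip."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


loud_speaker = sine(0.8)
quiet_speaker = sine(0.05)

# Level every speaker to the same target before stitching the clips together.
leveled = [level_clip(clip) for clip in (loud_speaker, quiet_speaker)]
print([round(rms_db(clip), 2) for clip in leveled])
```

Both clips end up at the same measured level, which is the property the feature describes. A production version would measure integrated loudness instead of RMS and would also guard against clipping after the gain is applied.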
43 control tags: 21 emotions, 10 prosody, 3 styles, 9 sounds. The AI director uses a switchable GGUF model, Qwen3.5-9B (default, ~5.5 GB) or Qwen3.5-4B (light, ~2.5 GB), running on GPU via llama.cpp. Higgs TTS quantization is applied on the fly (⚗️ experimental) and chosen automatically based on available VRAM. The UI is available in Russian and English.
| OS | GPU | Status | Acceleration |
|---|---|---|---|
| Windows 10/11 | NVIDIA RTX 30xx–50xx | ✅ tested | CUDA 12.8 + Triton (torch.compile ~2×) |
| Windows 10/11 | NVIDIA RTX 20xx | ✅ expected | CUDA 12.8 + Triton |
| Linux x64 | NVIDIA RTX 20xx–50xx | ✅ expected | CUDA 12.8 + Triton |
| Linux aarch64 | NVIDIA DGX Spark / Jetson | ✅ expected | CUDA 13.0 |
| Windows | AMD RDNA3+ | ✅ expected | DirectML |
| Linux | AMD RDNA3+ | ✅ expected | ROCm 6.3 |
| macOS | Apple Silicon M1–M4 | ✅ expected | MPS |
| macOS | Intel | | torch CPU |
| Any | CPU only | | CPU |
Higgs uses PyTorch SDPA (flash kernels built in) and does not need external Flash-Attention 2. The local `install.bat` build targets NVIDIA GPUs on Windows; full cross-platform support is via Pinokio.
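The compatibility matrix above can be read as a simple lookup. The sketch below transcribes it into a hypothetical `pick_backend` helper; the argument names and string values are assumptions made for illustration, and the GPU-generation requirements from the table (RTX 20xx or newer, RDNA3+) are not checked.

```python
def pick_backend(os_name, gpu_vendor, arch="x64"):
    """Return the acceleration listed in the matrix for a platform (hypothetical helper).

    os_name: "windows", "linux" or "macos"; gpu_vendor: "nvidia", "amd",
    "apple" or anything else for no supported GPU.
    """
    os_name = os_name.lower()
    gpu_vendor = gpu_vendor.lower()

    if gpu_vendor == "nvidia" and os_name in ("windows", "linux"):
        # The matrix lists CUDA 13.0 only for Linux aarch64 (DGX Spark / Jetson).
        if os_name == "linux" and arch == "aarch64":
            return "CUDA 13.0"
        return "CUDA 12.8 + Triton"
    if gpu_vendor == "amd" and os_name == "windows":
        return "DirectML"
    if gpu_vendor == "amd" and os_name == "linux":
        return "ROCm 6.3"
    if gpu_vendor == "apple" and os_name == "macos":
        return "MPS"
    # Intel Macs and machines without a supported GPU run on the CPU.
    return "CPU"


print(pick_backend("Windows", "NVIDIA"))
print(pick_backend("linux", "nvidia", arch="aarch64"))
print(pick_backend("macos", "intel"))
```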
| VRAM | TTS mode | LLM director |
|---|---|---|
| 24 GB+ | bf16 (~11 GB) | 9–12B in 4-bit (~6–8 GB) |
| 12 GB | 8-bit (~6–7 GB) | 4–9B in 4-bit (~3–6 GB) |
| 6–8 GB | 4-bit (~3.5 GB) | 2–4B in 4-bit (~1.5–3 GB) |
| CPU | works, very slow | — |
Models (~9 GB TTS + LLM) download automatically on first run.
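As a rough illustration of how the VRAM tiers above map to settings, here is a minimal sketch. The `pick_profile` function is hypothetical and is not the app's actual auto-selection logic. The table lists only 24 GB+, 12 GB and 6–8 GB, so the handling of in-between sizes (for example 16 GB using the 12 GB row) and the CPU fallback below 6 GB are assumptions.

```python
def pick_profile(vram_gb):
    """Map available VRAM in GB (None means no GPU) to a row of the table above."""
    if vram_gb is None or vram_gb < 6:
        # Assumption: anything under the smallest listed tier falls back to CPU.
        return {"tts": "cpu", "llm": None}
    if vram_gb >= 24:
        return {"tts": "bf16", "llm": "9-12B 4-bit"}
    if vram_gb >= 12:
        # Assumption: sizes between 12 and 24 GB use the 12 GB row.
        return {"tts": "8-bit", "llm": "4-9B 4-bit"}
    return {"tts": "4-bit", "llm": "2-4B 4-bit"}


for gb in (None, 8, 16, 24):
    print(gb, pick_profile(gb))
```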
- Download this repository.
- Install — run `install.bat`, pick your GPU (CUDA 11.8 / 12.6 / 12.8 or CPU). It sets up portable Python, PyTorch and dependencies.
- Run — run `run.bat`; the app opens in the browser, models download on first launch. Update with `update.bat`.
Or install one-click cross-platform via Pinokio — no `install.bat` needed.
Other Projects by @timoncool
| Project | Description |
|---|---|
| VoxCPM2 Portable | Multilingual TTS + Voice Design + LoRA fine-tuning |
| Qwen3-TTS | Portable text-to-speech with voice cloning |
| ACE-Step Studio | AI music studio — songs, vocals, covers, videos |
| Foundation Music Lab | Music generation + timeline editor |
| VibeVoice ASR | Portable speech recognition |
| LavaSR | Portable audio enhancement |
| SuperCaption Qwen3-VL | Portable image captioning |
| VideoSOS | AI video production in the browser |
- Nerual Dreming — Telegram | neuro-cartel.com | ArtGeneration.me
- Нейро-Софт — Telegram | portable AI builds
- Boson AI — Higgs Audio v3 — the TTS model
- multimodalart — transformers port of the model
- Slait/russia_voices — 743 Russian voice presets
- Moonshine ASR — reference auto-transcription
- pyloudnorm — EBU R128 loudness leveling
- Gradio — UI framework
I build open-source software and do AI research. Most of what I create is free and available to everyone. Your donations help me keep creating without worrying about where the next meal comes from =)
All donation methods | dalink.to/nerual_dreming | boosty.to/neuro_art
- BTC: `1E7dHL22RpyhJGVpcvKdbyZgksSYkYeEBC`
- ETH (ERC20): `0xb5db65adf478983186d4897ba92fe2c25c594a0c`
- USDT (TRC20): `TQST9Lp2TjK6FiVkn4fwfGUee7NmkxEE7C`
The wrapper code is open. The Higgs Audio v3 weights are distributed by Boson AI under a Research & Non-Commercial license — this application is non-commercial. Voice cloning only with the consent of the voice owner; impersonation, fraud and any illegal use are prohibited. See the model card.