Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

📰 News

2026.01.14 - Soprano-1.1-80M released! 95% fewer hallucinations and a 63% preference rate over Soprano-80M.
2026.01.13 - Soprano-Factory released! You can now train/fine-tune your own Soprano models.
2025.12.22 - Soprano-80M released! Model | Demo

Overview

Soprano is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Soprano was designed with the following features:

Up to 20x real-time generation on CPU and 2000x real-time on GPU
Lossless streaming with <250 ms latency on CPU, <15 ms on GPU
<1 GB memory usage with a compact 80M parameter architecture
Infinite generation length with automatic text splitting
Highly expressive, crystal clear audio generation at 32kHz
Widespread support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
Supports OpenAI-compatible endpoint, ONNX, WebUI, CLI, and ComfyUI for easy and production-ready inference

soprano-demo.mp4

Installation

Install with wheel (CUDA)

pip install soprano-tts[lmdeploy]

Install with wheel (CPU/MPS)

pip install soprano-tts

To get the latest features, you can install from source instead.

Install from source (CUDA)

git clone https://github.com/ekwek1/soprano.git
cd soprano
pip install -e .[lmdeploy]

Install from source (CPU/MPS)

git clone https://github.com/ekwek1/soprano.git
cd soprano
pip install -e .

⚠️ Warning: Windows CUDA users

On Windows with CUDA, pip will install a CPU-only PyTorch build. To ensure CUDA support works as expected, reinstall PyTorch explicitly with the correct CUDA wheel after installing Soprano:
pip uninstall -y torch
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

Usage

WebUI

Start WebUI:

soprano-webui # hosted on http://127.0.0.1:7860 by default

Tip: You can increase cache size and decoder batch size to increase inference speed at the cost of higher memory usage. For example:
soprano-webui --cache-size 1000 --decoder-batch-size 4

CLI

soprano "Soprano is an extremely lightweight text to speech model."

optional arguments:
  --output, -o                  Output audio file path (non-streaming only). Defaults to 'output.wav'
  --model-path, -m              Path to local model directory (optional)
  --device, -d                  Device to use for inference. Supported: auto, cuda, cpu, mps. Defaults to 'auto'
  --backend, -b                 Backend to use for inference. Supported: auto, transformers, lmdeploy. Defaults to 'auto'
  --cache-size, -c              Cache size in MB (for lmdeploy backend). Defaults to 100
  --decoder-batch-size, -bs     Decoder batch size. Defaults to 1
  --streaming, -s               Enable streaming playback to speakers

Tip: You can increase cache size and decoder batch size to increase inference speed at the cost of higher memory usage.

Note: The CLI will reload the model every time it is called. As a result, inference speed will be slower than other methods.

OpenAI-compatible endpoint

Start server:

uvicorn soprano.server:app --host 0.0.0.0 --port 8000

Use the endpoint like this:

curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Soprano is an extremely lightweight text to speech model."
  }' \
  --output speech.wav

Note: Currently, this endpoint only supports nonstreaming output.

Python script

from soprano import SopranoTTS

model = SopranoTTS(backend='auto', device='auto', cache_size_mb=100, decoder_batch_size=1)

Tip: You can increase cache_size_mb and decoder_batch_size to increase inference speed at the cost of higher memory usage.

# Basic inference
out = model.infer("Soprano is an extremely lightweight text to speech model.") # can achieve 2000x real-time with sufficiently long input!

# Save output to a file
out = model.infer("Soprano is an extremely lightweight text to speech model.", "out.wav")

# Custom sampling parameters
out = model.infer(
    "Soprano is an extremely lightweight text to speech model.",
    temperature=0.3,
    top_p=0.95,
    repetition_penalty=1.2,
)


# Batched inference
out = model.infer_batch(["Soprano is an extremely lightweight text to speech model."] * 10) # can achieve 2000x real-time with sufficiently large input size!

# Save batch outputs to a directory
out = model.infer_batch(["Soprano is an extremely lightweight text to speech model."] * 10, "/dir")


# Streaming inference
from soprano.utils.streaming import play_stream
stream = model.infer_stream("Soprano is an extremely lightweight text to speech model.", chunk_size=1)
play_stream(stream) # plays audio with <15 ms latency!

3rd-party tools

ONNX

https://github.com/KevinAHM/soprano-web-onnx

ComfyUI Nodes

https://github.com/jo-nike/ComfyUI-SopranoTTS

https://github.com/SanDiegoDude/ComfyUI-Soprano-TTS

Usage tips:

When quoting, use double quotes instead of single quotes.
Soprano works best when each sentence is between 2 and 30 seconds long.
Although Soprano recognizes numbers and some special characters, it occasionally mispronounces them. Best results can be achieved by converting these into their phonetic form. (1+1 -> one plus one, etc)
If Soprano produces unsatisfactory results, you can easily regenerate it for a new, potentially better generation. You may also change the sampling settings for more varied results.

Roadmap

Limitations

Soprano is currently English-only and does not support voice cloning. In addition, Soprano was trained on only 1,000 hours of audio (~100x less than other TTS models), so mispronunciation of uncommon words may occur. This is expected to diminish as Soprano is trained on more data.

Acknowledgements

Soprano uses and/or is inspired by the following projects:

License

This project is licensed under the Apache-2.0 license. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 90 Commits
examples		examples
soprano		soprano
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

📰 News

Overview

Table of Contents

Installation

Install with wheel (CUDA)

Install with wheel (CPU/MPS)

Install from source (CUDA)

Install from source (CPU/MPS)

⚠️ Warning: Windows CUDA users

Usage

WebUI

CLI

OpenAI-compatible endpoint

Python script

3rd-party tools

ONNX

ComfyUI Nodes

Usage tips:

Roadmap

Limitations

Acknowledgements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

📰 News

Overview

Table of Contents

Installation

Install with wheel (CUDA)

Install with wheel (CPU/MPS)

Install from source (CUDA)

Install from source (CPU/MPS)

⚠️ Warning: Windows CUDA users

Usage

WebUI

CLI

OpenAI-compatible endpoint

Python script

3rd-party tools

ONNX

ComfyUI Nodes

Usage tips:

Roadmap

Limitations

Acknowledgements

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages