Skip to content

ekwek1/soprano

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

90 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

Alt Text Alt Text

soprano-github

📰 News

2026.01.14 - Soprano-1.1-80M released! 95% fewer hallucinations and a 63% preference rate over Soprano-80M.
2026.01.13 - Soprano-Factory released! You can now train/fine-tune your own Soprano models.
2025.12.22 - Soprano-80M released! Model | Demo


Overview

Soprano is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Soprano was designed with the following features:

  • Up to 20x real-time generation on CPU and 2000x real-time on GPU
  • Lossless streaming with <250 ms latency on CPU, <15 ms on GPU
  • <1 GB memory usage with a compact 80M parameter architecture
  • Infinite generation length with automatic text splitting
  • Highly expressive, crystal clear audio generation at 32kHz
  • Widespread support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
  • Supports OpenAI-compatible endpoint, ONNX, WebUI, CLI, and ComfyUI for easy and production-ready inference
soprano-demo.mp4

Table of Contents

Installation

Install with wheel (CUDA)

pip install soprano-tts[lmdeploy]

Install with wheel (CPU/MPS)

pip install soprano-tts

To get the latest features, you can install from source instead.

Install from source (CUDA)

git clone https://github.com/ekwek1/soprano.git
cd soprano
pip install -e .[lmdeploy]

Install from source (CPU/MPS)

git clone https://github.com/ekwek1/soprano.git
cd soprano
pip install -e .

⚠️ Warning: Windows CUDA users

On Windows with CUDA, pip will install a CPU-only PyTorch build. To ensure CUDA support works as expected, reinstall PyTorch explicitly with the correct CUDA wheel after installing Soprano:

pip uninstall -y torch
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

Usage

WebUI

Start WebUI:

soprano-webui # hosted on http://127.0.0.1:7860 by default

Tip: You can increase cache size and decoder batch size to increase inference speed at the cost of higher memory usage. For example:

soprano-webui --cache-size 1000 --decoder-batch-size 4

CLI

soprano "Soprano is an extremely lightweight text to speech model."

optional arguments:
  --output, -o                  Output audio file path (non-streaming only). Defaults to 'output.wav'
  --model-path, -m              Path to local model directory (optional)
  --device, -d                  Device to use for inference. Supported: auto, cuda, cpu, mps. Defaults to 'auto'
  --backend, -b                 Backend to use for inference. Supported: auto, transformers, lmdeploy. Defaults to 'auto'
  --cache-size, -c              Cache size in MB (for lmdeploy backend). Defaults to 100
  --decoder-batch-size, -bs     Decoder batch size. Defaults to 1
  --streaming, -s               Enable streaming playback to speakers

Tip: You can increase cache size and decoder batch size to increase inference speed at the cost of higher memory usage.

Note: The CLI will reload the model every time it is called. As a result, inference speed will be slower than other methods.

OpenAI-compatible endpoint

Start server:

uvicorn soprano.server:app --host 0.0.0.0 --port 8000

Use the endpoint like this:

curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Soprano is an extremely lightweight text to speech model."
  }' \
  --output speech.wav

Note: Currently, this endpoint only supports nonstreaming output.

Python script

from soprano import SopranoTTS

model = SopranoTTS(backend='auto', device='auto', cache_size_mb=100, decoder_batch_size=1)

Tip: You can increase cache_size_mb and decoder_batch_size to increase inference speed at the cost of higher memory usage.

# Basic inference
out = model.infer("Soprano is an extremely lightweight text to speech model.") # can achieve 2000x real-time with sufficiently long input!

# Save output to a file
out = model.infer("Soprano is an extremely lightweight text to speech model.", "out.wav")

# Custom sampling parameters
out = model.infer(
    "Soprano is an extremely lightweight text to speech model.",
    temperature=0.3,
    top_p=0.95,
    repetition_penalty=1.2,
)


# Batched inference
out = model.infer_batch(["Soprano is an extremely lightweight text to speech model."] * 10) # can achieve 2000x real-time with sufficiently large input size!

# Save batch outputs to a directory
out = model.infer_batch(["Soprano is an extremely lightweight text to speech model."] * 10, "/dir")


# Streaming inference
from soprano.utils.streaming import play_stream
stream = model.infer_stream("Soprano is an extremely lightweight text to speech model.", chunk_size=1)
play_stream(stream) # plays audio with <15 ms latency!

3rd-party tools

ONNX

https://github.com/KevinAHM/soprano-web-onnx

ComfyUI Nodes

https://github.com/jo-nike/ComfyUI-SopranoTTS

https://github.com/SanDiegoDude/ComfyUI-Soprano-TTS

Usage tips:

  • When quoting, use double quotes instead of single quotes.
  • Soprano works best when each sentence is between 2 and 30 seconds long.
  • Although Soprano recognizes numbers and some special characters, it occasionally mispronounces them. Best results can be achieved by converting these into their phonetic form. (1+1 -> one plus one, etc)
  • If Soprano produces unsatisfactory results, you can easily regenerate it for a new, potentially better generation. You may also change the sampling settings for more varied results.

Roadmap

  • Add model and inference code
  • Seamless streaming
  • Batched inference
  • Command-line interface (CLI)
  • CPU support
  • Server / API inference
  • ROCm support (see #29)
  • Additional LLM backends
  • Voice cloning
  • Multilingual support

Limitations

Soprano is currently English-only and does not support voice cloning. In addition, Soprano was trained on only 1,000 hours of audio (~100x less than other TTS models), so mispronunciation of uncommon words may occur. This is expected to diminish as Soprano is trained on more data.


Acknowledgements

Soprano uses and/or is inspired by the following projects:


License

This project is licensed under the Apache-2.0 license. See LICENSE for details.

About

Soprano: Instant, Ultra-Realistic Text-to-Speech

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages