Convert text to natural-sounding speech using the Speechmatics TTS API.
Generate high-quality speech audio from text in under 5 minutes with minimal setup.
What you'll learn:
- How to use the Speechmatics TTS SDK
- How to generate speech from text with different voices
- How to save audio output to WAV files
- Which voice options are available (UK/US English)

Prerequisites:
- Speechmatics API Key: Get one from portal.speechmatics.com
- Python 3.8+
Step 1: Create and activate a virtual environment

On Windows:

```bash
cd python
python -m venv .venv
.venv\Scripts\activate
```

On Mac/Linux:

```bash
cd python
python3 -m venv .venv
source .venv/bin/activate
```

Step 2: Install dependencies

```bash
pip install -r requirements.txt
```

Step 3: Configure your API key

```bash
cp ../.env.example .env
```

Open the `.env` file and add your API key:

```
SPEECHMATICS_API_KEY=your_actual_api_key_here
```
Important: Why `.env`? Never commit API keys to version control. The `.env` file keeps secrets out of your code.
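A minimal sketch of failing fast when the key is missing, assuming the variable name `SPEECHMATICS_API_KEY` from Step 3. `require_api_key` is a hypothetical helper for illustration, not part of the Speechmatics SDK. It reads from a mapping, so it can be pointed at `os.environ` after `load_dotenv()` has run.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper: not provided by the Speechmatics SDK.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECHMATICS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SPEECHMATICS_API_KEY is not set. Add it to your .env file "
            "or export it in your shell."
        )
    return key
```

Calling `require_api_key()` at the top of `main()` turns a confusing authentication failure into an immediate, readable error.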
Step 4: Run the example

```bash
python main.py
```

This is the complete code to generate speech:
```python
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.tts import AsyncClient, Voice, OutputFormat

# Load SPEECHMATICS_API_KEY from the .env file into the environment
load_dotenv()


async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    text = "Hello! Welcome to Speechmatics text to speech."

    async with AsyncClient(api_key=api_key) as client:
        # Generate speech
        response = await client.generate(
            text=text,
            voice=Voice.SARAH,
            output_format=OutputFormat.WAV_16000,
        )

        # Read complete audio response and save to file
        audio_data = await response.read()
        with open("output.wav", "wb") as f:
            f.write(audio_data)


if __name__ == "__main__":
    asyncio.run(main())
```

| Voice | Language | Gender | ID |
|---|---|---|---|
| Sarah | English (UK) | Female | sarah |
| Theo | English (UK) | Male | theo |
| Megan | English (US) | Female | megan |
| Jack | English (US) | Male | jack |
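The table can also be kept as data, which is handy when the voice is chosen from user settings. This is a small sketch: `VoiceInfo`, `VOICES`, and `find_voice` are hypothetical names, and the entries are simply copied from the table above, not fetched from the API.

```python
from typing import NamedTuple, Optional


class VoiceInfo(NamedTuple):
    voice_id: str
    language: str
    gender: str


# Copied by hand from the table above; update as new voices are released.
VOICES = [
    VoiceInfo("sarah", "English (UK)", "Female"),
    VoiceInfo("theo", "English (UK)", "Male"),
    VoiceInfo("megan", "English (US)", "Female"),
    VoiceInfo("jack", "English (US)", "Male"),
]


def find_voice(language: str, gender: str) -> Optional[str]:
    """Return the first voice ID matching language and gender, or None."""
    for voice in VOICES:
        if (voice.language.lower() == language.lower()
                and voice.gender.lower() == gender.lower()):
            return voice.voice_id
    return None
```

For example, `find_voice("English (US)", "male")` returns `"jack"`.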
Note: More voices are coming soon. Check the Speechmatics TTS documentation for the latest available voices.
```python
from speechmatics.tts import Voice

# UK Female
voice = Voice.SARAH

# UK Male
voice = Voice.THEO

# US Female
voice = Voice.MEGAN

# US Male (use string ID)
voice = "jack"
```

Note: The jack voice may not be available in the `Voice` enum yet. Use the string `"jack"` directly.
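One way to handle voices that may or may not be in the enum is to look the name up and fall back to the string ID. The sketch below is an illustration only: `resolve_voice` is a hypothetical helper, and `DemoVoice` is a stand-in for `speechmatics.tts.Voice` so the example runs without the SDK installed. It relies on the note above that a plain string ID is accepted.

```python
from enum import Enum
from typing import Union


def resolve_voice(name: str, voice_enum: type) -> Union[Enum, str]:
    """Return the enum member named like `name`, else the lowercase ID.

    Hypothetical helper: assumes member names are upper-case voice names.
    """
    member = getattr(voice_enum, name.upper(), None)
    return member if member is not None else name.lower()


# Stand-in for speechmatics.tts.Voice (assumption: JACK is not defined yet).
class DemoVoice(Enum):
    SARAH = "sarah"
    THEO = "theo"
    MEGAN = "megan"
```

With the real SDK you would pass `Voice` instead of `DemoVoice`, for example `voice = resolve_voice("jack", Voice)`.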
| Format | Description | Sample Rate |
|---|---|---|
| WAV_16000 | Complete WAV file | 16 kHz, 16-bit mono |
| PCM_16000 | Raw PCM data | 16 kHz, 16-bit mono |
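The difference between the two formats is only the container: raw PCM has no header, so most players cannot open it directly. The sketch below uses Python's standard `wave` module to wrap raw PCM in a WAV container and to read back the clip length. It assumes the PCM samples are signed 16-bit little-endian, which is the usual WAV convention but is not stated in this guide; `pcm_to_wav` and `wav_duration_seconds` are hypothetical helper names.

```python
import io
import wave

SAMPLE_RATE = 16000  # Hz, from the table above
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16 kHz, 16-bit mono PCM in a WAV container.

    Assumption: samples are signed little-endian, as WAV expects.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(CHANNELS)
        writer.setsampwidth(SAMPLE_WIDTH)
        writer.setframerate(SAMPLE_RATE)
        writer.writeframes(pcm)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the WAV header and return the clip length in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnframes() / reader.getframerate()
```

At 16 kHz, 16-bit mono, one second of audio is 32,000 bytes of PCM, so `wav_duration_seconds` is also a quick sanity check that a download is complete.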
Expected output:

```
Speechmatics Text-to-Speech Demo
========================================
Text: Hello! Welcome to Speechmatics text to speech. This is a demonstration of natural sounding speech synthesis.
Voice: Sarah (English UK Female)
Output: output.wav
Generating speech...
Audio saved to: assets/output.wav
Available voices:
- sarah: English Female (UK)
- theo: English Male (UK)
- megan: English Female (US)
- jack: English Male (US)
```
| Feature | Description |
|---|---|
| Low Latency | Under 200ms initial latency for streaming |
| Natural Speech | High-quality, natural-sounding voices |
| Multiple Voices | UK and US English male/female options |
| Simple API | Single method call to generate speech |
Note
- Load Environment - Load your API key from the `.env` file
- Define Text - Set the text you want to convert to speech
- Create Client - Initialize `AsyncClient` with your API key
- Generate - Call `client.generate()` with text, voice, and output format
- Save Audio - Read the response and write the audio bytes to a file
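The generate and save steps can be folded into one reusable coroutine. This is a sketch rather than SDK code: `synthesize_to_file` is a hypothetical helper that accepts any client object exposing the async `generate()` call used in this guide, and it creates the output folder if it does not exist.

```python
from pathlib import Path


async def synthesize_to_file(client, text: str, path: str, **generate_kwargs) -> int:
    """Generate speech with `client` and write the audio bytes to `path`.

    `client` is any object with the async `generate()` method shown in this
    guide; voice and output_format are forwarded via `generate_kwargs`.
    Returns the number of bytes written.
    """
    response = await client.generate(text=text, **generate_kwargs)
    audio = await response.read()

    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # e.g. create assets/
    out.write_bytes(audio)
    return len(audio)
```

Inside the `async with AsyncClient(api_key=api_key) as client:` block, this would be called as `await synthesize_to_file(client, text, "assets/output.wav", voice=Voice.SARAH, output_format=OutputFormat.WAV_16000)`.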
Now that you have basic TTS working, explore:
- Hello World - Transcribe audio (STT)
- Batch vs Real-time - Understand different STT modes
- LiveKit Voice Assistant - Build a complete voice agent with STT + LLM + TTS
Error: "Invalid API key"
- Check your `.env` file has the correct `SPEECHMATICS_API_KEY`
- Verify your key at portal.speechmatics.com
- Make sure there are no extra spaces or quotes around the key
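These checks can be automated on the value your program actually sees. The sketch below is a hypothetical diagnostic, not an SDK feature: `diagnose_api_key` only inspects the string locally and never contacts Speechmatics, so an empty result does not prove the key is valid.

```python
from typing import List

PLACEHOLDER = "your_actual_api_key_here"  # value from the .env example


def diagnose_api_key(raw: str) -> List[str]:
    """Return a list of likely formatting problems with the key string."""
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    stripped = raw.strip()
    if len(stripped) >= 2 and stripped[0] == stripped[-1] and stripped[0] in "'\"":
        problems.append("wrapped in quotes")
    if not stripped or stripped == PLACEHOLDER:
        problems.append("empty or still the placeholder value")
    return problems
```

For example, `diagnose_api_key(os.getenv("SPEECHMATICS_API_KEY", ""))` after `load_dotenv()` shows whether the loaded value has any of these issues.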
Error: "Module not found"
- Make sure you installed dependencies: `pip install -r requirements.txt`
- Verify you're in the activated virtual environment
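Both checks can be done from Python itself using only the standard library. This is a minimal sketch; `in_virtualenv` and `is_installed` are hypothetical helper names, and the module names to check (`dotenv`, `speechmatics.tts`) are taken from the imports in the example code.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when running inside an environment created with `python -m venv`."""
    return sys.prefix != sys.base_prefix


def is_installed(module_name: str) -> bool:
    """True when `module_name` can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "speechmatics" for
        # "speechmatics.tts") raises instead of returning None.
        return False
```

Running `print(in_virtualenv(), is_installed("dotenv"), is_installed("speechmatics.tts"))` with the same interpreter you use for `main.py` shows which of the two causes applies.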
No audio output
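- Check the `assets/` folder for the `output.wav` file (the minimal code listing in this guide writes `output.wav` to the current directory instead)
- Try playing the file with your default audio player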
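Because the file may land in either location, a quick lookup can help. The sketch below assumes the two paths mentioned in this guide (`assets/output.wav` and `output.wav`) relative to the directory you run it from; `find_output` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_output(
    candidates: Iterable[str] = ("assets/output.wav", "output.wav"),
) -> Optional[Path]:
    """Return the first candidate that exists and is non-empty, else None."""
    for name in candidates:
        path = Path(name)
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None
```

A result of `None` means no audio was written, which usually points back to an earlier error in the run; an existing but empty file is skipped for the same reason.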
Help us improve this guide:
- Found an issue? Report it
- Have suggestions? Open a discussion
Time to Complete: 5 minutes | Difficulty: Beginner | SDK: TTS