How do I get an API key?

Sign up for a free TTS.ai account, then navigate to your account dashboard and click "Generate API Key." Your key will be prefixed with sk-tts- and can be used immediately. Free accounts receive 15 credits to get started.

Is the API compatible with OpenAI's format?

Yes, our API follows OpenAI-compatible request and response formats. If you have existing code that uses OpenAI's TTS API, you can switch to TTS.ai by changing the base URL and API key with minimal code changes.

What programming languages are supported?

The REST API works with any language that can make HTTP requests, including Go, Ruby, Java, C#, and PHP. This page provides code examples in Python, JavaScript (Node.js and browser), and cURL.

What are the API rate limits?

Free accounts are limited to 3 requests per hour. Paid plans have higher limits based on your subscription tier: Starter (60/hour), Professional (300/hour), Enterprise (unlimited). Rate limit headers are included in every API response.

How do API pricing and credits work?

API usage consumes credits based on the model tier and text length. Free models use 0 credits, standard models use 2 credits per 1K characters, and premium models use 4 credits per 1K characters. Credits are included in all paid plans and can also be purchased separately.

What endpoints are available?

The API provides endpoints for text-to-speech (POST /v1/tts/), speech-to-text (POST /v1/stt/), voice cloning (POST /v1/tts/clone/), voice conversion (POST /v1/voice-convert/), speech translation (POST /v1/speech-translate/), audio enhancement (POST /v1/audio/enhance/), vocal removal and stem splitting (POST /v1/audio/separate/), key and BPM analysis (POST /v1/audio/analyze/), and more.

What audio formats does the API return?

The text-to-speech endpoint returns MP3 by default. You can specify the output format (mp3, wav, ogg, flac) using the format parameter. MP3 is recommended for web applications, WAV for further audio processing.

Is there a streaming API for real-time TTS?

Yes. The text-to-speech endpoint accepts a stream parameter that enables a streaming response. There is also an async API that returns a job UUID you can poll for results; the polling endpoint returns the audio URL when processing is complete. For supported models like Kokoro, audio generation is fast enough for near-real-time applications.

How do I handle errors in the API?

The API returns standard HTTP status codes (400 for bad requests, 401 for auth errors, 429 for rate limits, 500 for server errors) with JSON error messages. Always check the status code and error field in responses for proper error handling.

Can I use the API for commercial applications?

Yes, the API is designed for commercial use. Audio generated through the API can be used in your products, applications, and services. All models use open-source licenses, and there are no additional royalties on generated audio.

Is there a sandbox or testing environment?

Free-tier models (Kokoro, Piper, VITS, MeloTTS) work well as a sandbox: they use zero credits and are available to all accounts. Test your integration with free models before switching to premium models for production use.

How do I list available voices and models via the API?

Use GET /v1/voices/ to list all available voices with filtering options (model, language, gender). Use GET /v1/models/ to list all available TTS models with their capabilities and tier information. Both endpoints return JSON responses.

TTS.ai API Documentation - Text to Speech REST API

Overview

The TTS.ai API provides programmatic access to all platform features: text-to-speech synthesis, speech-to-text transcription, voice cloning, audio enhancement, and more. The API uses standard REST conventions with JSON request/response bodies.

API Key

Get your API key from Account Settings. API access is available on Pro and Enterprise plans.

Base URL

https://api.tts.ai/v1/

Auth

Bearer token via Authorization header

Authentication

All API requests require authentication via a Bearer token in the Authorization header.

HTTP Header

Authorization: Bearer sk-tts-your-api-key-here

Keep your API key secret. Do not share it in client-side code, public repositories, or logs. Rotate keys regularly from your account settings.
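One way to keep the key out of source code is to read it from an environment variable when building the header. The sketch below assumes a variable named TTS_AI_API_KEY, which is a hypothetical name chosen for illustration and not something the API defines.

```python
import os


def auth_headers(env_var: str = "TTS_AI_API_KEY") -> dict:
    """Build the Authorization header from an environment variable.

    The variable name is a hypothetical choice for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    # Keys are described as prefixed with "sk-tts-"; anything else is likely a typo.
    if not api_key.startswith("sk-tts-"):
        raise ValueError("API key does not start with the expected 'sk-tts-' prefix")
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned dictionary as the headers argument of whichever HTTP client you use.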

SDKs

Official SDKs make it easy to integrate TTS.ai into your application. Both are open source and available on GitHub.

Python

pip install ttsai

from tts_ai import TTSClient

client = TTSClient(api_key="sk-tts-...")
audio = client.generate(
    text="Hello world!",
    model="kokoro"
)
client.save(audio, "output.wav")

JavaScript / Node.js

npm install @ttsainpm/ttsai

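const { TTSClient } = require('@ttsainpm/ttsai');

const client = new TTSClient({
  apiKey: 'sk-tts-...'
});

// CommonJS modules cannot use top-level await, so wrap the calls in an async function.
async function main() {
  const audio = await client.generate({
    input: 'Hello world!',
    model: 'kokoro'
  });
  await client.saveToFile(audio, 'output.wav');
}

main().catch(console.error);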

Base URL

Base URL: https://api.tts.ai/v1/

All endpoints are relative to this base URL. For example, the TTS endpoint is:

POST https://api.tts.ai/v1/tts/

Rate Limits

API rate limits vary by plan:

Plan	Requests/min	Concurrent	Max Text Length
Pro	60	5	5,000 chars
Enterprise	300	20	50,000 chars

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
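A client can use these headers to pause before it hits a 429. The following is a minimal sketch that assumes X-RateLimit-Reset holds a Unix timestamp in seconds; the format of that header is not stated here, so verify it against a real response before relying on this.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

The lookup is case-sensitive on a plain dict. Header containers from HTTP libraries such as requests are case-insensitive and can be passed in directly.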

Credit Costs

Service	Cost	Unit
TTS (Free models: Piper, VITS, MeloTTS)	1 credit	per 1,000 characters
TTS (Standard models: Kokoro, CosyVoice 2, etc.)	2 credits	per 1,000 characters
TTS (Premium models: Tortoise, Chatterbox, etc.)	4 credits	per 1,000 characters
Speech to Text	2 credits	per minute of audio
Voice Cloning	4 credits	per 1,000 characters
Voice Changer	3 credits	per minute of audio
Audio Enhancement	2 credits	per minute of audio
Vocal Removal / Stem Splitting	3-4 credits	per minute of audio
Speech Translation	5 credits	per minute of audio
Voice Chat	3 credits	per turn
Key & BPM Finder	Free	--
Audio Converter	Free	--
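To budget before sending a request, the per-1,000-character TTS rates in the table can be turned into a rough estimate. This sketch assumes usage is billed in whole 1,000-character blocks, rounded up. That rounding rule is an assumption, so treat the X-Credits-Used response header as the authoritative figure.

```python
import math

# Credits per 1,000 characters, copied from the Credit Costs table above.
TTS_RATES = {"free": 1, "standard": 2, "premium": 4}


def estimate_tts_credits(text: str, tier: str) -> int:
    """Estimate the credits a TTS request would consume for a given model tier."""
    # Assumption: billing rounds up to the next full 1,000-character block.
    blocks = math.ceil(len(text) / 1000)
    return blocks * TTS_RATES[tier]
```

Under that assumption, a 2,500-character text on a standard model is estimated at 3 blocks, or 6 credits.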

Text to Speech

POST /v1/tts/

Converts text to speech and returns the audio file in the requested format.

Request Body

Parameter	Type	Required	Description
model	string	Yes	Model ID (e.g., `kokoro`, `chatterbox`, `piper`)
text	string	Yes	Text to convert to speech (max 5,000 chars for Pro, 50,000 for Enterprise)
voice	string	Yes	Voice ID (use `/v1/voices/` to list available voices)
format	string	No	Output format: `mp3` (default), `wav`, `flac`, `ogg`
speed	float	No	Speaking speed multiplier. Default: `1.0`. Range: `0.5` to `2.0`
language	string	No	Language code (e.g., `en`, `es`). Auto-detected if omitted.
stream	boolean	No	Enable streaming response. Default: `false`
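Validating the request body on the client side avoids spending a round trip on a 400 response. The helper below is a sketch that enforces the constraints from the parameter table; the function name and the max_chars argument are illustrative, not part of the API.

```python
ALLOWED_FORMATS = {"mp3", "wav", "flac", "ogg"}


def build_tts_payload(model, text, voice, audio_format="mp3", speed=1.0,
                      language=None, stream=False, max_chars=5000):
    """Return a JSON-ready body for POST /v1/tts/, or raise ValueError."""
    if not model or not text or not voice:
        raise ValueError("model, text, and voice are required")
    if len(text) > max_chars:  # 5,000 for Pro, 50,000 for Enterprise
        raise ValueError(f"text exceeds {max_chars} characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"model": model, "text": text, "voice": voice,
               "format": audio_format, "speed": speed, "stream": stream}
    if language is not None:  # omit to let the API auto-detect the language
        payload["language"] = language
    return payload
```

The result can be passed as the JSON body of the request, for example build_tts_payload("kokoro", "Hello from TTS.ai!", "af_bella").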

Example Request

cURL

curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "text": "Hello from TTS.ai! This is a test.",
    "voice": "af_bella",
    "format": "mp3"
  }' \
  --output output.mp3

Response

Returns the audio file as binary data with appropriate Content-Type header (audio/mpeg, audio/wav, etc.).

Response Headers

Content-Type: audio/mpeg
Content-Length: 48256
X-Credits-Used: 2
X-Credits-Remaining: 498

Speech to Text

POST /v1/stt/

Transcribe audio to text. Supports 99 languages with auto-detection.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
file	file	Yes	Audio file (MP3, WAV, FLAC, OGG, M4A, MP4, WebM). Max 100MB.
model	string	No	STT model: `whisper` (default), `faster-whisper`, `sensevoice`
language	string	No	Language code. `auto` for auto-detection (default).
timestamps	boolean	No	Include word-level timestamps. Default: `false`
diarize	boolean	No	Enable speaker diarization. Default: `false`

Response

JSON Response

{
  "text": "Hello, this is a transcription test.",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.8,
      "text": "Hello, this is",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.8,
      "end": 3.5,
      "text": "a transcription test.",
      "speaker": "SPEAKER_00"
    }
  ]
}
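The segments array is convenient for building a readable transcript. Here is a small sketch that formats each segment as one line; it assumes the speaker field may be absent when diarization is off, which is a guess about the response shape.

```python
def format_segments(result: dict) -> list:
    """Turn an STT response into lines like '[0.0-1.8] SPEAKER_00: text'."""
    lines = []
    for segment in result.get("segments", []):
        # Assumption: "speaker" may be missing when diarize is false.
        speaker = segment.get("speaker", "UNKNOWN")
        lines.append(
            f"[{segment['start']:.1f}-{segment['end']:.1f}] {speaker}: {segment['text']}"
        )
    return lines
```

Applied to the sample response above, the first line produced is "[0.0-1.8] SPEAKER_00: Hello, this is".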

Voice Cloning

POST /v1/tts/clone/

Generate speech in a cloned voice. Upload a reference audio and text.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
reference_audio	file	Yes	Reference voice audio (10-30 seconds recommended). Max 20MB.
text	string	Yes	Text to speak in the cloned voice.
model	string	No	Clone model: `chatterbox` (default), `cosyvoice2`, `gpt-sovits`
format	string	No	Output format: `mp3` (default), `wav`, `flac`
language	string	No	Target language code. Must be supported by the chosen model.

Response

Returns the audio file as binary data, same as the TTS endpoint.

Voice Changer

POST /v1/voice-convert/

Convert audio to sound like a different voice. Upload source audio and choose a target voice.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
file	file	Yes	Source audio file (MP3, WAV, FLAC). Max 50MB.
target_voice	string	Yes	Target voice ID to convert to (use `/v1/voices/` to list available voices)
model	string	No	Voice conversion model: `openvoice` (default), `knn-vc`
format	string	No	Output format: `wav` (default), `mp3`, `flac`

Example Request

cURL

curl -X POST https://api.tts.ai/v1/voice-convert/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@source_audio.mp3" \
  -F "target_voice=af_bella" \
  -F "model=openvoice" \
  -o converted.wav

Response

Returns the converted audio file as binary data.

Speech Translation

POST /v1/speech-translate/

Translate spoken audio from one language to another. Combines speech-to-text, translation, and text-to-speech in a single call.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
file	file	Yes	Source audio file in the original language. Max 100MB.
target_language	string	Yes	Target language code (e.g., `es`, `fr`, `de`, `ja`)
voice	string	No	Voice for translated output. Auto-selected if omitted.
preserve_voice	boolean	No	Attempt to preserve the original speaker's voice in the translated output

Response

JSON Response

{
  "original_text": "Hello, how are you?",
  "translated_text": "Hola, ¿cómo estás?",
  "source_language": "en",
  "target_language": "es",
  "audio_url": "https://api.tts.ai/v1/results/translate_abc123.mp3",
  "credits_used": 5
}

Speech to Speech

POST /v1/speech-to-speech/

Transform speech style, emotion, or delivery while keeping the content. Useful for adjusting tone, pacing, and expressiveness.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
file	file	Yes	Source speech audio file. Max 50MB.
voice	string	Yes	Target voice ID for the output speech
model	string	No	Model: `openvoice` (default), `chatterbox`
emotion	string	No	Target emotion: `neutral`, `happy`, `sad`, `angry`, `excited`
speed	float	No	Speed adjustment. Default: `1.0`. Range: `0.5` to `2.0`

Response

Returns the transformed audio file as binary data.

Audio Tools

Audio processing endpoints for enhancement, vocal removal, stem splitting, and more.

POST /v1/audio/enhance/

Enhance audio quality: denoise, improve clarity, super resolution.

Parameter	Type	Description
file	file	Audio file to enhance
denoise	boolean	Enable denoising (default: true)
enhance_clarity	boolean	Enhance speech clarity (default: true)
super_resolution	boolean	Upscale audio quality (default: false)
strength	integer	1-3 (light, medium, strong). Default: 2

POST /v1/audio/separate/

Separate vocals from instrumentals (vocal removal) or split into stems.

Parameter	Type	Description
file	file	Audio file to separate
model	string	`demucs` (default) or `spleeter`
stems	integer	Number of stems: 2, 4, 5, or 6 (default: 2)
format	string	Output format: `wav`, `mp3`, `flac`

POST /v1/audio/dereverb/

Remove echo and reverb from audio recordings.

Parameter	Type	Description
file	file	Audio file to process
type	string	`echo` or `reverb` (default: both)
intensity	integer	1-5 (default: 3)

POST /v1/audio/analyze/ (Free)

Analyze audio to detect key, BPM, and time signature.

Parameter	Type	Description
file	file	Audio file to analyze

Response

{
  "key": "C",
  "scale": "Major",
  "bpm": 120.0,
  "time_signature": "4/4",
  "camelot": "8B",
  "compatible_keys": ["C Major", "G Major", "F Major", "A Minor"]
}

POST /v1/audio/convert/ (Free)

Convert audio between formats.

Parameter	Type	Description
file	file	Audio file to convert
format	string	Target format: `mp3`, `wav`, `flac`, `ogg`, `m4a`, `aac`
bitrate	integer	Output bitrate in kbps: 64, 128, 192, 256, 320
sample_rate	integer	Sample rate: 22050, 44100, 48000
channels	string	`mono` or `stereo`

Voice Chat

POST /v1/voice-chat/

Send audio or text and receive an AI response with synthesized speech.

Request Body (multipart/form-data or JSON)

Parameter	Type	Required	Description
audio	file	No*	Audio input (either `audio` or `text` required)
text	string	No*	Text input (either `audio` or `text` required)
voice	string	No	Voice for AI response. Default: `af_bella`
tts_model	string	No	TTS model for response. Default: `kokoro`
system_prompt	string	No	Custom system prompt for the AI
conversation_id	string	No	Continue an existing conversation

Response

JSON Response

{
  "conversation_id": "conv_abc123",
  "user_text": "What is the capital of France?",
  "ai_text": "The capital of France is Paris.",
  "audio_url": "https://api.tts.ai/v1/audio/tmp/resp_xyz.mp3",
  "credits_used": 3
}
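To continue a conversation, send the conversation_id from the previous response with the next turn. A minimal sketch of building the JSON body for a text turn follows; the helper name is illustrative, and the defaults mirror the parameter table.

```python
def voice_chat_payload(text, previous_response=None, voice="af_bella", tts_model="kokoro"):
    """Build the JSON body for a text turn sent to POST /v1/voice-chat/."""
    payload = {"text": text, "voice": voice, "tts_model": tts_model}
    if previous_response is not None:
        # Reuse the ID so the server treats this as the same conversation.
        payload["conversation_id"] = previous_response["conversation_id"]
    return payload
```

The first turn omits previous_response. Later turns pass the parsed JSON from the prior reply.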

Batch TTS

POST /v1/tts/batch/

Submit multiple texts for parallel TTS generation. Optionally receive a webhook callback when all jobs complete.

Parameters

Parameter	Type	Description
texts	array	Array of objects: `{text, model, voice}`. Max 50 items.
webhook_url	string	Optional URL to POST results when batch completes.

Response

JSON Response

{
  "batch_id": "abc123",
  "total": 3,
  "completed": 0,
  "status": "processing"
}

Poll progress with GET /v1/tts/batch/result/?batch_id=abc123
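Because a batch is capped at 50 items, larger jobs need to be split, and each batch then polled until it finishes. The sketch below shows both steps with the HTTP call injected as a function, so it runs without network access. It assumes the polling response has the same fields as the submit response and that any status other than "processing" is final; both are assumptions.

```python
import time

MAX_BATCH_ITEMS = 50  # limit stated in the parameter table


def chunk_items(items, size=MAX_BATCH_ITEMS):
    """Yield request bodies with at most `size` entries each."""
    for start in range(0, len(items), size):
        yield {"texts": items[start:start + size]}


def wait_for_batch(fetch_status, batch_id, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll until the batch leaves the 'processing' state.

    fetch_status is a caller-supplied function that performs
    GET /v1/tts/batch/result/?batch_id=... and returns the parsed JSON.
    """
    for _ in range(max_polls):
        status = fetch_status(batch_id)
        # Assumption: any status other than "processing" is terminal.
        if status["status"] != "processing":
            return status
        sleep(interval)
    raise TimeoutError(f"batch {batch_id} still processing after {max_polls} polls")
```

Passing a webhook_url in the request body avoids polling altogether when your service can receive callbacks.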

Voice Embedding

POST /v1/voice-embed/

Pre-compute a voice embedding from reference audio. Use the returned embed_id in subsequent voice cloning requests for near-instant generation.

Parameters

Parameter	Type	Description
file	file	Reference audio file (WAV, MP3, FLAC).
model	string	Cloning model (default: chatterbox). Supported: chatterbox, cosyvoice2, openvoice, gpt-sovits, spark, indextts2, qwen3-tts.

Response

JSON Response

{
  "embed_id": "emb_abc123",
  "model": "chatterbox",
  "duration_ms": 450
}

Health Check

GET /v1/health/

Check GPU server status, loaded models, and queue size. No authentication required. Cached for 30 seconds.

Response

JSON Response

{
  "status": "online",
  "latency_ms": 45,
  "queue_size": 3,
  "models_loaded": ["kokoro", "chatterbox", "cosyvoice2"]
}
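The health response can be used to prefer a model that is already loaded, which may avoid a 503 model_loading error. This is a sketch of that selection logic; the function name is illustrative, and treating "online" as the only healthy status is an assumption based on the sample above.

```python
def first_loaded(health: dict, candidates: list):
    """Return the first candidate model that is already loaded, or None."""
    # Assumption: "online" is the only status that means the server is usable.
    if health.get("status") != "online":
        return None
    loaded = set(health.get("models_loaded", []))
    for model in candidates:
        if model in loaded:
            return model
    return None
```

With the sample response, first_loaded(health, ["piper", "chatterbox"]) skips piper and returns "chatterbox".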

List Models

GET /v1/models/

Returns a list of all available models with their capabilities.

Response

JSON Response

{
  "models": [
    {
      "id": "kokoro",
      "name": "Kokoro",
      "type": "tts",
      "tier": "standard",
      "languages": ["en", "ja", "ko", "zh", "fr"],
      "supports_cloning": false,
      "supports_streaming": true,
      "credits_per_1k_chars": 2
    },
    {
      "id": "chatterbox",
      "name": "Chatterbox",
      "type": "tts",
      "tier": "premium",
      "languages": ["en"],
      "supports_cloning": true,
      "supports_streaming": true,
      "credits_per_1k_chars": 4
    }
  ]
}
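The capability flags make it straightforward to pick models programmatically. As an illustration, this hypothetical helper filters the parsed response by any combination of fields shown in the sample above.

```python
def models_with(models_response: dict, **required) -> list:
    """Return the IDs of models whose fields match every keyword given."""
    return [
        model["id"]
        for model in models_response["models"]
        if all(model.get(field) == value for field, value in required.items())
    ]
```

For the sample response, models_with(data, supports_cloning=True) selects only chatterbox, and models_with(data, tier="standard") selects only kokoro.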

List Voices

GET /v1/voices/

Returns a list of all available voices, optionally filtered by model or language.

Query Parameters

Parameter	Type	Description
model	string	Filter by model ID (e.g., `kokoro`)
language	string	Filter by language code (e.g., `en`)
gender	string	Filter by gender: `male`, `female`, `neutral`

Response

JSON Response

{
  "voices": [
    {
      "id": "af_bella",
      "name": "Bella",
      "model": "kokoro",
      "language": "en",
      "gender": "female",
      "preview_url": "https://api.tts.ai/v1/voices/preview/af_bella.mp3"
    }
  ],
  "total": 142
}
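The filters are ordinary query-string parameters. The following sketch builds the request URL with the standard library and leaves out any filter that is not set; the helper name is illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.tts.ai/v1/"


def voices_url(model=None, language=None, gender=None) -> str:
    """Build the GET /v1/voices/ URL, including only the filters provided."""
    filters = {"model": model, "language": language, "gender": gender}
    query = urlencode({key: value for key, value in filters.items() if value is not None})
    url = BASE_URL + "voices/"
    return f"{url}?{query}" if query else url
```

For example, voices_url(model="kokoro", language="en") yields https://api.tts.ai/v1/voices/?model=kokoro&language=en.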

Code Examples

Text to Speech

Python - requests

import requests

API_KEY = "sk-tts-your-key"

# Text to Speech
response = requests.post(
    "https://api.tts.ai/v1/tts/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "kokoro",
        "text": "Hello from TTS.ai!",
        "voice": "af_bella",
        "format": "mp3"
    },
    timeout=60,
)
response.raise_for_status()  # Stop on 4xx/5xx instead of saving an error body as audio

# The response body is the binary audio file
with open("output.mp3", "wb") as f:
    f.write(response.content)

print(f"Credits used: {response.headers.get('X-Credits-Used')}")

Speech to Text

Python - requests

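import requests

API_KEY = "sk-tts-your-key"

# Speech to Text
with open("recording.mp3", "rb") as f:
    response = requests.post(
        "https://api.tts.ai/v1/stt/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},  # sent as multipart/form-data
        data={"model": "faster-whisper", "timestamps": "true"},
        timeout=300,
    )
response.raise_for_status()  # Raise on 4xx/5xx before parsing the body

result = response.json()
print(result["text"])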

Voice Cloning

Python - requests

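import requests

API_KEY = "sk-tts-your-key"

# Voice Cloning
with open("reference.wav", "rb") as ref:
    response = requests.post(
        "https://api.tts.ai/v1/tts/clone/",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"reference_audio": ref},  # sent as multipart/form-data
        data={
            "text": "This speech uses a cloned voice.",
            "model": "chatterbox"
        },
        timeout=300,
    )
response.raise_for_status()  # Stop on 4xx/5xx instead of saving an error body as audio

# The default output format is mp3
with open("cloned_output.mp3", "wb") as f:
    f.write(response.content)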

Text to Speech

JavaScript - fetch

const API_KEY = 'sk-tts-your-key';

// Text to Speech
const response = await fetch('https://api.tts.ai/v1/tts/', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'kokoro',
    text: 'Hello from TTS.ai!',
    voice: 'af_bella',
    format: 'mp3'
  })
});

if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();

Speech to Text

JavaScript - fetch

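// Speech to Text
// API_KEY is defined as in the Text to Speech example.
// audioFile is a File or Blob, e.g. from an <input type="file"> element.
const formData = new FormData();
formData.append('file', audioFile);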
formData.append('model', 'faster-whisper');

const response = await fetch('https://api.tts.ai/v1/stt/', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${API_KEY}` },
  body: formData
});

const result = await response.json();
console.log(result.text);

Text to Speech

cURL

# Text to Speech
curl -X POST https://api.tts.ai/v1/tts/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"kokoro","text":"Hello!","voice":"af_bella","format":"mp3"}' \
  -o output.mp3

Speech to Text

cURL

# Speech to Text
curl -X POST https://api.tts.ai/v1/stt/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@recording.mp3" \
  -F "model=faster-whisper" \
  -F "timestamps=true"

Voice Cloning

cURL

# Voice Cloning
curl -X POST https://api.tts.ai/v1/tts/clone/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "reference_audio=@reference.wav" \
  -F "text=This uses a cloned voice." \
  -F "model=chatterbox" \
  -o cloned.mp3

Audio Enhancement

cURL

# Audio Enhancement
curl -X POST https://api.tts.ai/v1/audio/enhance/ \
  -H "Authorization: Bearer sk-tts-your-key" \
  -F "file=@noisy_audio.mp3" \
  -F "denoise=true" \
  -F "enhance_clarity=true" \
  -o enhanced.mp3

Error Codes

All errors return a JSON response with an error field.

Error Response Format

{
  "error": {
    "code": "insufficient_credits",
    "message": "You do not have enough credits for this request.",
    "credits_required": 4,
    "credits_available": 2
  }
}

HTTP Status	Error Code	Description
400	`bad_request`	Invalid request parameters. Check the error message for details.
401	`unauthorized`	Missing or invalid API key.
402	`insufficient_credits`	Not enough credits. Purchase more at /pricing/.
403	`forbidden`	API access not available on your plan.
404	`not_found`	Model or voice not found.
413	`file_too_large`	Uploaded file exceeds the size limit.
429	`rate_limited`	Too many requests. Check rate limit headers.
500	`internal_error`	Server error. Try again later.
503	`model_loading`	Model is loading. Retry in a few seconds.
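A client usually wants to turn these responses into exceptions and retry only the transient ones. The sketch below parses the error body shown above and treats 429, 500, and 503 as retryable, based on the table descriptions. The class name, the helper names, and the backoff schedule are illustrative choices, not part of the API.

```python
import json

RETRYABLE_STATUSES = {429, 500, 503}  # rate limited, server error, model loading


class TTSAPIError(Exception):
    """Error raised for a non-2xx API response (hypothetical helper class)."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def error_from_response(status: int, body: str) -> TTSAPIError:
    """Build an exception from the HTTP status and the raw response body."""
    try:
        error = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        error = {}  # body was not the documented JSON shape
    if not isinstance(error, dict):
        error = {"message": str(error)}
    return TTSAPIError(status, error.get("code", "unknown"), error.get("message", body))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: 1, 2, 4, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

A retry loop would sleep for backoff_delay(attempt) whenever the raised error has retryable set, and re-raise immediately otherwise.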

Webhooks

For long-running tasks (stem splitting, batch TTS), you can provide a webhook_url parameter. When the task completes, we will POST the result to your URL.

Webhook Payload

{
  "event": "task.completed",
  "task_id": "task_abc123",
  "status": "success",
  "result_url": "https://api.tts.ai/v1/results/task_abc123",
  "credits_used": 12,
  "created_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:30:45Z"
}

Webhook results are available for download for 24 hours after completion. Make sure to download them promptly.
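On the receiving side, a handler typically parses the payload and records where to download the result. This is a minimal sketch of that parsing step using the fields from the sample payload. It does not authenticate the sender, and the function names are illustrative.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_webhook(raw_body) -> dict:
    """Extract the fields a handler usually needs from a task.completed payload."""
    payload = json.loads(raw_body)
    if payload.get("event") != "task.completed":
        raise ValueError(f"unexpected event: {payload.get('event')!r}")
    elapsed = parse_timestamp(payload["completed_at"]) - parse_timestamp(payload["created_at"])
    return {
        "task_id": payload["task_id"],
        "succeeded": payload["status"] == "success",
        "result_url": payload["result_url"],
        "elapsed_seconds": elapsed.total_seconds(),
    }
```

For the sample payload above, elapsed_seconds is 45.0. Fetch result_url promptly, since results expire after 24 hours.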
