# Audio Services

Text-to-speech and sound effects generation

Convert text to natural-sounding speech or generate sound effects — all through a single API with no account setup required.

## Quick Example

```typescript
import fs from "fs";
import { createFetch } from "@sapiom/fetch";

// Create a Sapiom-tracked fetch function
const sapiomFetch = createFetch({
  apiKey: process.env.SAPIOM_API_KEY,
  agentName: "my-agent",
});

// Convert text to speech - SDK handles payment/auth automatically
const response = await sapiomFetch(
  "https://elevenlabs.services.sapiom.ai/v1/text-to-speech/EXAVITQu4vr4xnSDxMaL",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Hello! Welcome to Sapiom. This is a test of the text-to-speech API.",
      model_id: "eleven_multilingual_v2",
    }),
  }
);

// Save the audio to a file
const buffer = await response.arrayBuffer();
fs.writeFileSync("output.mp3", Buffer.from(buffer));
console.log("Audio saved to output.mp3");
```

## How It Works

Sapiom routes audio requests to [ElevenLabs](https://elevenlabs.io). The SDK handles payment negotiation automatically: text-to-speech is billed by character count, and sound effects are billed at a flat rate per generation.

The service supports two operations:

1. **Text-to-Speech** — Convert text to natural-sounding audio
2. **Sound Effects** — Generate sound effects from text descriptions

## Provider

Powered by [ElevenLabs](https://elevenlabs.io). ElevenLabs provides industry-leading voice synthesis with natural intonation and emotional range across 29 languages.

## API Reference

### Text-to-Speech

**Endpoint:** `POST https://elevenlabs.services.sapiom.ai/v1/text-to-speech/{voiceId}`

Convert text to natural-sounding speech. The voice ID is specified in the URL path.

**Popular voice IDs:**
- `EXAVITQu4vr4xnSDxMaL` — Sarah (female, soft)
- `JBFqnCBsd6RMkjVDRZzb` — George (male, narrative)
- `21m00Tcm4TlvDq8ikWAM` — Rachel (female, calm)
- `AZnzlk1XvdvUeBnXmlld` — Domi (female, strong)

#### Request

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | string | Yes | Text to convert to speech (max 5000 characters) |
| `model_id` | string | No | Model for synthesis (default: `eleven_multilingual_v2`) |
| `output_format` | string | No | Audio format (default: `mp3_44100_128`) |

**Output format options:**
- MP3: `mp3_22050_32`, `mp3_44100_64`, `mp3_44100_128`, `mp3_44100_192`
- PCM: `pcm_16000`, `pcm_22050`, `pcm_24000`, `pcm_44100`
- Opus: `opus_48000_64`, `opus_48000_128`

```json
{
  "text": "Welcome to our application. How can I help you today?",
  "model_id": "eleven_multilingual_v2"
}
```
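A request body can be checked locally before it is sent, so obvious mistakes are caught before the request leaves your code. The sketch below only restates the limit and format names from the tables above. `buildTtsBody` and `TtsRequest` are hypothetical names and not part of the Sapiom SDK, and sending `output_format` in the JSON body is an assumption based on the parameter table.

```typescript
// Hypothetical helper, not part of the Sapiom SDK.
// The limit and format names are copied from the tables above.
const MAX_TEXT_LENGTH = 5000;
const OUTPUT_FORMATS: readonly string[] = [
  "mp3_22050_32", "mp3_44100_64", "mp3_44100_128", "mp3_44100_192",
  "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
  "opus_48000_64", "opus_48000_128",
];

interface TtsRequest {
  text: string;
  model_id?: string;
  output_format?: string;
}

function buildTtsBody(request: TtsRequest): string {
  if (request.text.length === 0) {
    throw new Error("text is required");
  }
  // String length counts UTF-16 code units; the service's own character
  // count may differ for some characters (assumption).
  if (request.text.length > MAX_TEXT_LENGTH) {
    throw new Error(`text exceeds ${MAX_TEXT_LENGTH} characters`);
  }
  const format = request.output_format;
  if (format !== undefined && OUTPUT_FORMATS.indexOf(format) === -1) {
    throw new Error(`unsupported output_format: ${format}`);
  }
  // Optional fields left undefined are omitted by JSON.stringify.
  return JSON.stringify(request);
}
```

The returned string can be passed as the `body` of the `POST` request shown in the Quick Example.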

#### Response

The response is binary audio data with the appropriate `Content-Type` header:

- `audio/mpeg` for MP3 formats
- `audio/pcm` for PCM formats
- `audio/basic` for μ-law/A-law formats

The `X-Character-Count` header contains the number of characters processed.
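
When saving the audio, the response headers can drive the file extension and give the billed character count. The following is a minimal sketch under the assumption that the header values look exactly as listed above. Both function names are hypothetical, and the extensions chosen for raw formats are a local convention rather than anything the service dictates.

```typescript
// Hypothetical helpers, not part of the Sapiom SDK.

// Maps the Content-Type values listed above to a file extension.
function extensionForContentType(contentType: string | null): string {
  // Drop parameters such as "; rate=16000" and normalize case.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  switch (mime) {
    case "audio/mpeg":
      return "mp3";
    case "audio/pcm":
      return "pcm"; // headerless samples
    case "audio/basic":
      return "raw"; // μ-law/A-law samples
    default:
      return "bin"; // unknown type: keep the bytes, pick a neutral name
  }
}

// Parses the X-Character-Count header; returns null if absent or malformed.
function parseCharacterCount(headerValue: string | null): number | null {
  if (headerValue === null) {
    return null;
  }
  const count = parseInt(headerValue, 10);
  return isNaN(count) ? null : count;
}
```

With a fetch `Response`, the inputs would come from `response.headers.get("Content-Type")` and `response.headers.get("X-Character-Count")`.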

### Sound Effects

**Endpoint:** `POST https://elevenlabs.services.sapiom.ai/v1/sound-generation`

Generate sound effects from text descriptions.

#### Request

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | string | Yes | Description of the sound effect to generate |
| `duration_seconds` | number | No | Duration in seconds, 0.5-22.0 (default: 2.0) |
| `prompt_influence` | number | No | How literally to follow the prompt, 0.0-1.0 (default: 0.3) |

```json
{
  "text": "Cinematic braam, horror atmosphere",
  "duration_seconds": 3.0,
  "prompt_influence": 0.5
}
```
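
The documented ranges for `duration_seconds` and `prompt_influence` can be enforced before the request is made. This sketch assumes the ranges in the table above are inclusive. `buildSoundEffectBody` and `SoundEffectOptions` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not part of the Sapiom SDK.
interface SoundEffectOptions {
  duration_seconds?: number;
  prompt_influence?: number;
}

function buildSoundEffectBody(text: string, options: SoundEffectOptions = {}): string {
  if (text.length === 0) {
    throw new Error("text is required");
  }
  const duration = options.duration_seconds;
  // The negated form also rejects NaN.
  if (duration !== undefined && !(duration >= 0.5 && duration <= 22.0)) {
    throw new RangeError("duration_seconds must be between 0.5 and 22.0");
  }
  const influence = options.prompt_influence;
  if (influence !== undefined && !(influence >= 0.0 && influence <= 1.0)) {
    throw new RangeError("prompt_influence must be between 0.0 and 1.0");
  }
  // Undefined options are omitted, so the service applies its defaults.
  return JSON.stringify({
    text: text,
    duration_seconds: duration,
    prompt_influence: influence,
  });
}
```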

#### Response

The response is binary MP3 audio data with `Content-Type: audio/mpeg`.

### Price Estimation

**Endpoints:**
- `POST https://elevenlabs.services.sapiom.ai/v1/text-to-speech/{voiceId}/price`
- `POST https://elevenlabs.services.sapiom.ai/v1/sound-generation/price`

Get the estimated cost before making a request. Each price endpoint accepts the same parameters as its main endpoint and returns the estimate as JSON:

```json
{
  "price": "$0.012",
  "currency": "USD"
}
```
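
An estimate like this can gate a request on a spending cap. The sketch below assumes the `price` field is always a string with a leading currency symbol, as in the sample above. `parsePrice` and `withinBudget` are hypothetical helpers, not SDK functions.

```typescript
// Hypothetical helpers, not part of the Sapiom SDK.
interface PriceEstimate {
  price: string; // e.g. "$0.012", format assumed from the sample response
  currency: string;
}

// Strips everything except digits and the decimal point, then parses.
function parsePrice(estimate: PriceEstimate): number {
  const amount = parseFloat(estimate.price.replace(/[^0-9.]/g, ""));
  if (isNaN(amount)) {
    throw new Error(`unrecognized price: ${estimate.price}`);
  }
  return amount;
}

// True only for USD estimates at or below the cap.
function withinBudget(estimate: PriceEstimate, maxUsd: number): boolean {
  return estimate.currency === "USD" && parsePrice(estimate) <= maxUsd;
}
```

A caller could fetch the estimate from the `/price` endpoint, call `withinBudget(estimate, 0.05)`, and skip the real request when it returns `false`.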

### List Voices

**Endpoint:** `GET https://elevenlabs.services.sapiom.ai/v2/voices`

List all available ElevenLabs voices. This endpoint is free and requires no payment.

```typescript
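// Reuses the `sapiomFetch` function created in the Quick Example
const response = await sapiomFetch("https://elevenlabs.services.sapiom.ai/v2/voices");
const data = await response.json();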

for (const voice of data.voices) {
  console.log(`${voice.name} (${voice.voice_id})`);
}
```
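
Once the list is loaded, a voice ID for the text-to-speech URL can be looked up by name. This sketch assumes only the `name` and `voice_id` fields used in the loop above. `findVoiceId` is a hypothetical helper, and the `name` values the service actually returns may differ from the short labels used here.

```typescript
// Hypothetical helper, not part of the Sapiom SDK.
interface Voice {
  name: string;
  voice_id: string;
}

// Case-insensitive lookup by name; returns null when nothing matches.
function findVoiceId(voices: Voice[], name: string): string | null {
  const wanted = name.trim().toLowerCase();
  for (const voice of voices) {
    if (voice.name.toLowerCase() === wanted) {
      return voice.voice_id;
    }
  }
  return null;
}

// Sample data using IDs from the "Popular voice IDs" list.
const sampleVoices: Voice[] = [
  { name: "Sarah", voice_id: "EXAVITQu4vr4xnSDxMaL" },
  { name: "George", voice_id: "JBFqnCBsd6RMkjVDRZzb" },
];

console.log(findVoiceId(sampleVoices, "george"));
```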

### Error Codes

| Code | Description |
|------|-------------|
| 400 | Invalid request — check parameters |
| 402 | Payment required — ensure you're using the Sapiom SDK |
| 404 | Voice or model not found |
| 413 | Text or audio too large |
| 429 | Rate limit exceeded |

## Complete Example

```typescript
import { createFetch } from "@sapiom/fetch";

const sapiomFetch = createFetch({
  apiKey: process.env.SAPIOM_API_KEY,
  agentName: "my-agent",
});

const baseUrl = "https://elevenlabs.services.sapiom.ai/v1";

async function createPodcastIntro(title: string, host: string) {
  // Generate podcast intro with TTS
  const script = `Welcome to ${title}. I'm your host, ${host}. Let's dive in.`;

  const response = await sapiomFetch(`${baseUrl}/text-to-speech/JBFqnCBsd6RMkjVDRZzb`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: script,
      model_id: "eleven_multilingual_v2",
    }),
  });

  return Buffer.from(await response.arrayBuffer());
}

async function generateTransitionSound() {
  // Create a custom sound effect
  const response = await sapiomFetch(`${baseUrl}/sound-generation`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Soft whoosh transition, podcast style",
      duration_seconds: 1.5,
    }),
  });

  return Buffer.from(await response.arrayBuffer());
}

// Usage
const introAudio = await createPodcastIntro("Tech Weekly", "Alex");
console.log("Intro audio size:", introAudio.byteLength, "bytes");

const transitionSfx = await generateTransitionSound();
console.log("Transition audio size:", transitionSfx.byteLength, "bytes");
```

## Pricing

| Operation | Price | Unit |
|-----------|-------|------|
| Text-to-Speech | $0.24 | per 1,000 characters |
| Sound Effects | $0.08 | flat per generation |

**Minimums:**
- Text-to-Speech: $0.001 minimum per request

**Example costs:**
- 500-character TTS request: $0.12 (500 / 1,000 × $0.24)
- Sound effect: $0.08
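
The pricing table can be turned into a rough local estimator. This is a sketch that assumes cost scales linearly with character count and that the minimum applies per request. How the service rounds the final amount is not stated here, so the price estimation endpoints remain the authoritative source. The constant and function names are hypothetical.

```typescript
// Rates copied from the pricing table above.
const TTS_USD_PER_1000_CHARS = 0.24;
const TTS_MIN_USD = 0.001;
const SOUND_EFFECT_USD = 0.08;

// Linear estimate with the per-request minimum applied (rounding is not modeled).
function estimateTtsCost(characterCount: number): number {
  const cost = (characterCount / 1000) * TTS_USD_PER_1000_CHARS;
  return Math.max(TTS_MIN_USD, cost);
}

// Total for a batch of TTS requests plus some number of sound effects.
function estimateJobCost(ttsCharacterCounts: number[], soundEffects: number): number {
  let total = soundEffects * SOUND_EFFECT_USD;
  for (const count of ttsCharacterCounts) {
    total += estimateTtsCost(count);
  }
  return total;
}

console.log(estimateTtsCost(500).toFixed(3));
console.log(estimateJobCost([500], 1).toFixed(3));
```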