Extract insights from audio: sentiment analysis, topic detection, and automatic summaries - all in one API call.
Go beyond transcription to understand WHAT was said and HOW it was said!
- How to enable sentiment analysis (positive/negative/neutral)
- Topic detection for content categorization
- Automatic summarization of conversations
- Combining intelligence features with transcription
- Speechmatics API Key: Get one from portal.speechmatics.com
- Python 3.8+
- Completed Configuration Guide
Step 1: Create and activate a virtual environment
On Windows:
cd python
python -m venv .venv
.venv\Scripts\activate
On Mac/Linux:
cd python
python3 -m venv .venv
source .venv/bin/activate
Step 2: Install dependencies and run
pip install -r requirements.txt
cp ../.env.example .env
python main.py
Note
This example demonstrates audio intelligence through the following steps:
- Create JobConfig - Configure transcription with intelligence features
- Enable Sentiment Analysis - Detect emotional tone in speech
- Enable Topic Detection - Identify discussion topics automatically
- Enable Summarization - Generate bullet-point summaries
- Submit Job - Process audio with all intelligence features
- Wait for Completion - Job processes asynchronously
- Extract Results - Access transcript, sentiment, topics, and summary
Audio intelligence runs alongside transcription, enriching your results with insights.
Detect the emotional tone of speech segments:
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    SentimentAnalysisConfig,
)
config = JobConfig(
    type=JobType.TRANSCRIPTION,
    transcription_config=TranscriptionConfig(language="en"),
    sentiment_analysis_config=SentimentAnalysisConfig(),
)
Output:
{
    "text": "I'm really happy with this service!",
    "sentiment": "positive",
    "confidence": 0.95
}
Use cases:
- Customer satisfaction analysis
- Call center quality monitoring
- Social media monitoring
- Market research
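Once a job completes, the per-segment labels can be aggregated into an overall picture. A minimal sketch, assuming the segment shape shown above (`text`/`sentiment`/`confidence`); the helper name `sentiment_breakdown` and the sample payload are illustrative, not part of the SDK:

```python
from collections import Counter

def sentiment_breakdown(sentiment_analysis: dict) -> dict:
    """Tally sentiment labels across the 'segments' of a sentiment payload."""
    segments = sentiment_analysis.get("segments", [])
    return dict(Counter(seg.get("sentiment", "neutral") for seg in segments))

# Illustrative payload shaped like the output above
payload = {
    "segments": [
        {"text": "I'm really happy with this service!", "sentiment": "positive", "confidence": 0.95},
        {"text": "The wait time was frustrating.", "sentiment": "negative", "confidence": 0.88},
    ]
}
print(sentiment_breakdown(payload))  # {'positive': 1, 'negative': 1}
```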
Automatically categorize content by topics:
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    TopicDetectionConfig,
)
config = JobConfig(
    type=JobType.TRANSCRIPTION,
    transcription_config=TranscriptionConfig(language="en"),
    topic_detection_config=TopicDetectionConfig(),  # Default categories
)
# Or with custom topics:
config = JobConfig(
    type=JobType.TRANSCRIPTION,
    transcription_config=TranscriptionConfig(language="en"),
    topic_detection_config=TopicDetectionConfig(
        topics=["pricing", "deployment", "languages"]
    ),
)
Output:
{
    "topics": ["Business & Finance", "Education"]
}
Use cases:
- Content categorization
- Meeting summarization
- Research analysis
- News monitoring
Generate concise summaries of long conversations:
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    SummarizationConfig,
)
config = JobConfig(
    type=JobType.TRANSCRIPTION,
    transcription_config=TranscriptionConfig(language="en"),
    summarization_config=SummarizationConfig(
        content_type="conversational",  # or "informative", "auto"
        summary_length="brief",         # or "detailed"
        summary_type="paragraphs",      # or "bullets"
    ),
)
Configuration Options:
| Parameter | Values | Default | Description |
|---|---|---|---|
| `content_type` | `"auto"`, `"informative"`, `"conversational"` | `"auto"` | `auto` selects automatically based on transcript analysis; `conversational` is best for dialogues (calls, meetings, discussions); `informative` is best for structured content (videos, podcasts, lectures, presentations) |
| `summary_length` | `"brief"`, `"detailed"` | `"brief"` | `brief` gives a succinct summary in a few sentences; `detailed` gives a longer, structured summary with sections |
| `summary_type` | `"paragraphs"`, `"bullets"` | `"paragraphs"` | `paragraphs` returns the summary as continuous text; `bullets` returns it as bullet points |
Examples:
# Brief conversational summary (calls, meetings)
SummarizationConfig(
    content_type="conversational",
    summary_length="brief",
    summary_type="paragraphs"
)
# Detailed informative summary with bullets (lectures, presentations)
SummarizationConfig(
    content_type="informative",
    summary_length="detailed",
    summary_type="bullets"
)
# Auto-detect with detailed summary
SummarizationConfig(
    content_type="auto",
    summary_length="detailed"
)
Output Example (brief, paragraphs):
Customer called to inquire about product features.
Representative explained key capabilities and pricing.
Customer expressed satisfaction and requested follow-up documentation.
Output Example (detailed, bullets):
• Customer inquired about account permissions issue
• Representative identified role change as root cause
• Solution provided: restore admin role
• Additional topics discussed:
- New reporting feature demo offered
- 15-minute onboarding call scheduled
- Tutorial link to be sent via email
Use cases:
- Meeting notes
- Call summaries
- Content briefs
- Executive summaries
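When `summary_type="bullets"` is used, the summary arrives as a single newline-separated string (as shown in the output example above). A small sketch for splitting it into a list of points; the helper name `summary_bullets` is an assumption for illustration, not an SDK function:

```python
def summary_bullets(summary: dict) -> list[str]:
    """Split a bullets-style summary 'content' string into individual points."""
    content = summary.get("content", "")
    # Strip leading bullet characters and surrounding whitespace from each line
    return [line.lstrip("• -").strip() for line in content.splitlines() if line.strip()]

# Illustrative bullets-style payload
example = {"content": "• Point 1\n• Point 2\n• Point 3"}
print(summary_bullets(example))  # ['Point 1', 'Point 2', 'Point 3']
```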
Combine all intelligence features:
import asyncio
import os
from dotenv import load_dotenv
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    SentimentAnalysisConfig,
    TopicDetectionConfig,
    SummarizationConfig,
)
load_dotenv()
async def analyze_audio():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    audio_file = "call.wav"
    async with AsyncClient(api_key=api_key) as client:
        # Enable all intelligence features
        config = JobConfig(
            type=JobType.TRANSCRIPTION,
            transcription_config=TranscriptionConfig(
                language="en",
                diarization="speaker",  # Also identify speakers
            ),
            sentiment_analysis_config=SentimentAnalysisConfig(),
            topic_detection_config=TopicDetectionConfig(),
            summarization_config=SummarizationConfig(
                content_type="conversational",
                summary_length="brief",
            ),
        )
        # Submit and wait for results
        job = await client.submit_job(audio_file, config=config)
        result = await client.wait_for_completion(job.id)
        # Access intelligence data
        print(f"Transcript: {result.transcript_text}")
        if result.sentiment_analysis:
            print(f"Sentiment: {result.sentiment_analysis}")
        if result.topics:
            print(f"Topics: {result.topics}")
        if result.summary:
            print(f"Summary: {result.summary.get('content')}")
asyncio.run(analyze_audio())

| Field | Type | Description |
|---|---|---|
| `transcript_text` | `str` | Full transcript as plain text |
| `format` | `str` | JSON format version |
| `job` | `JobInfo` | Job metadata and information |
| `metadata` | `RecognitionMetadata` | Recognition process metadata |
| `results` | `list[RecognitionResult]` | Detailed results with timing |
| Field | Type | Enabled By | Package | Description |
|---|---|---|---|---|
| `sentiment_analysis` | `dict` | `SentimentAnalysisConfig()` | batch | Sentiment per segment + summary |
| `topics` | `dict` | `TopicDetectionConfig()` | batch | Topic categorization + counts |
| `summary` | `dict` | `SummarizationConfig()` | batch | Auto-generated summary |
| `chapters` | `list[dict]` | `AutoChaptersConfig()` | batch | Auto-generated chapter markers |
| `translations` | `dict` | `TranslationConfig()` | batch, rt | Translations by language code |
| `audio_events` | `list[dict]` | `AudioEventsConfig()` | batch, rt | Music, laughter, etc. with timestamps |
| `audio_event_summary` | `dict` | `AudioEventsConfig()` | batch | Summary of audio events |
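Only the features you enabled in `JobConfig` produce data, so most of these fields will be empty on any given job. A hedged sketch for checking which ones came back, using a stand-in object (`SimpleNamespace`) rather than a real SDK result; the helper name is illustrative:

```python
from types import SimpleNamespace

# Field names taken from the table of intelligence fields
INTELLIGENCE_FIELDS = [
    "sentiment_analysis", "topics", "summary", "chapters",
    "translations", "audio_events", "audio_event_summary",
]

def available_intelligence(result) -> list[str]:
    """List which intelligence fields are present and non-empty on a result."""
    return [f for f in INTELLIGENCE_FIELDS if getattr(result, f, None)]

# Stand-in for a completed job result with two features enabled
fake_result = SimpleNamespace(sentiment_analysis={"segments": []}, summary={"content": "..."})
print(available_intelligence(fake_result))  # ['sentiment_analysis', 'summary']
```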
Sentiment analysis response structure:
{
    "segments": [
        {
            "text": "I'm really happy with this service!",
            "sentiment": "positive",
            "confidence": 0.95,
            "start_time": 10.5,
            "end_time": 12.3
        }
    ],
    "summary": { ... }  # Summary stats if available
}
Access pattern:
if result.sentiment_analysis:
    segments = result.sentiment_analysis.get('segments', [])
    for segment in segments:
        sentiment = segment.get('sentiment')  # 'positive', 'negative', 'neutral'
Two modes of topic detection:
Mode 1: Auto-detect (Default 10 Categories)
# Detect from standard categories
config = JobConfig(
    type=JobType.TRANSCRIPTION,
    transcription_config=TranscriptionConfig(language="en"),
    topic_detection_config=TopicDetectionConfig(),  # No topics specified
)
Mode 2: Custom Topic List
# Detect specific custom topics
config = JobConfig(
    type=JobType.TRANSCRIPTION,
    transcription_config=TranscriptionConfig(language="en"),
    topic_detection_config=TopicDetectionConfig(
        topics=["pricing", "deployment", "languages"]  # Custom topics
    ),
)
Response structure:
{
    "segments": [
        {
            "text": "...",
            "topics": [{"topic": "Business & Finance"}],
            "start_time": 20.76,
            "end_time": 27.88
        }
    ],
    "summary": {
        "overall": {
            # When using default detection: all 10 categories with counts
            "Business & Finance": 2,
            "Education": 1,
            "Entertainment": 0,
            "Events & Attractions": 0,
            "Food & Drink": 0,
            "News & Politics": 0,
            "Science": 0,
            "Sports": 0,
            "Technology & Computing": 0,
            "Travel": 0
            # When using custom topics: your specified topics with counts
            # "pricing": 5,
            # "deployment": 2,
            # "languages": 3
        }
    }
}
Structure:
- `segments` - Array of topic assignments per text segment
- `summary.overall` - Contains topic counts
  - Default mode: all 10 standard categories with counts
  - Custom mode: your specified topics with counts
- Categories/topics with `count > 0` indicate detected topics
- Categories/topics with `count = 0` were not detected
Access pattern:
if result.topics:
    # Get overall topic counts
    overall = result.topics.get('summary', {}).get('overall', {})
    # Filter to only detected topics (count > 0)
    detected = [topic for topic, count in overall.items() if count > 0]
    # Access specific topic count
    finance_count = overall.get('Business & Finance', 0)
    pricing_count = overall.get('pricing', 0)  # For custom topics
Default Topic Categories (10 total): When no custom topics are specified, these categories are detected:
- Business & Finance
- Education
- Entertainment
- Events & Attractions
- Food & Drink
- News & Politics
- Science
- Sports
- Technology & Computing
- Travel
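Pulling the detected categories out of `summary.overall` is a one-liner. A sketch assuming the response shape shown above; the helper name `detected_topics` is illustrative:

```python
def detected_topics(topics_result: dict) -> list[str]:
    """Return topics whose count is greater than zero, sorted alphabetically."""
    overall = topics_result.get("summary", {}).get("overall", {})
    return sorted(topic for topic, count in overall.items() if count > 0)

# Illustrative payload shaped like the response structure above
example = {"summary": {"overall": {"Business & Finance": 2, "Education": 1, "Sports": 0}}}
print(detected_topics(example))  # ['Business & Finance', 'Education']
```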
Summary response structure:
{
    "content": "Customer called to inquire about account permissions...",
    # Or with bullets:
    # "content": "• Point 1\n• Point 2\n• Point 3"
}
Configuration:
SummarizationConfig(
    content_type="conversational",  # "auto", "informative", "conversational"
    summary_length="brief",         # "brief", "detailed"
    summary_type="paragraphs",      # "paragraphs", "bullets"
)
Access pattern:
if result.summary:
    content = result.summary.get('content', '')
    print(f"Summary: {content}")
Chapters output example (from AutoChaptersConfig()):
[
    {
        "start_time": 0.0,
        "end_time": 120.5,
        "title": "Introduction and Problem Statement"
    },
    {
        "start_time": 120.5,
        "end_time": 245.0,
        "title": "Solution Discussion"
    }
]
Audio events output example (from AudioEventsConfig()):
[
    {
        "type": "music",
        "start_time": 0.0,
        "end_time": 5.2
    },
    {
        "type": "laughter",
        "start_time": 45.3,
        "end_time": 46.1
    }
]
Event types: music, laughter, applause, etc.
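Because each event carries start/end timestamps, totals per event type are easy to compute. A minimal sketch assuming the list shape shown above; `event_durations` is a hypothetical helper, not an SDK function:

```python
from collections import defaultdict

def event_durations(audio_events: list) -> dict:
    """Sum the duration in seconds of each audio event type."""
    totals = defaultdict(float)
    for event in audio_events:
        totals[event.get("type", "unknown")] += (
            event.get("end_time", 0.0) - event.get("start_time", 0.0)
        )
    return dict(totals)

# Illustrative events shaped like the output above
events = [
    {"type": "music", "start_time": 0.0, "end_time": 5.2},
    {"type": "laughter", "start_time": 45.3, "end_time": 46.1},
]
print(event_durations(events))  # music: ~5.2s, laughter: ~0.8s
```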
Note
About SPEAKER UU: The label "UU" (Unknown/Unidentified) appears when speaker diarization is disabled. To get distinct speaker labels like S1, S2, etc., enable diarization with diarization="speaker" in your TranscriptionConfig. See Configuration Guide for details.
When you run the audio intelligence example, you'll see:
======================================================================
AUDIO INTELLIGENCE - Sentiment + Summaries
======================================================================
Transcribing with audio intelligence...
- Sentiment analysis
- Topic Detection
- Summarization
Job ID: 7zm3vjhm8p
Processing...
======================================================================
RESULTS
======================================================================
Transcript:
----------------------------------------------------------------------
SPEAKER UU: Hi. Thanks for calling our housing support. This is Alex. How can I help you today? Hi, Alex. I'm having trouble accessing my team dashboard. It keeps showing a permissions error. Oh, I'm sorry to hear that. Jordan, let me take a look. Can I confirm the email address associated with your account? Sure. It's Jordan at Skyline Group.com. Perfect. Thank you. Uh, I can see that your account is active, but your team role was recently changed to editor instead of admin, which would explain the permission issue. Um, I can either update your role or send a request to your current admin to grant full access. Which would you prefer? Um, if you could update it for me, that would be great. I've just switched your role back to admin. Uh, could you please refresh your browser and try opening the dashboard again? Yep. It's working now. Thank you. Awesome. While I have you, would you like me to walk you through our new reporting feature? It lets you create custom analytics dashboards in just a few clicks. Yeah, sure, that sounds super useful. I'll send you a quick tutorial link via email and if you like, we can schedule a 15 minute onboarding call to go over the advanced settings. Yeah, that would be perfect. I've booked you for tomorrow at 10 a.m. you'll get a confirmation email shortly. Is there anything else I can help you with today? No. That's all. Thanks again. Alex. You're welcome. Jordan, thanks for calling housing support and have a productive day.
----------------------------------------------------------------------
Sentiment: neutral
Topics:
• Business & Finance
• Technology & Computing
Summary:
Key Topics:
- Team dashboard access
- Permissions error
- Role change (editor vs admin)
- Reporting feature tutorial
- Onboarding call scheduling
Discussion:
- Jordan reported a permissions error when accessing the team dashboard for Skyline Group.
- Alex identified that Jordan's role was changed from admin to editor, causing restricted access.
- Alex updated Jordan's role back to admin, resolving the dashboard access issue.
- Alex offered to introduce Jordan to a new reporting feature that enables custom analytics dashboards.
- Alex sent a tutorial link via email and scheduled a 15-minute onboarding call for tomorrow at 10 a.m. to review advanced settings.
Audio intelligence analysis complete!
- Transcript - Full conversation with speaker labels
- Sentiment - Overall emotional tone (positive/negative/neutral)
- Topics - Detected categories from the 10 standard topics
- Summary - Structured summary with:
  - Key topics section (bullet points)
  - Detailed discussion points (when using `summary_length="detailed"`)

The summary format changes based on your config:
- `summary_type="bullets"` - Structured with sections and bullet points
- `summary_type="paragraphs"` - Continuous narrative text
- `summary_length="brief"` - A few sentences
- `summary_length="detailed"` - Comprehensive breakdown with sections
Sentiment Analysis:
- Segment-level emotion detection
- Positive, negative, neutral classification
- Confidence scores for each segment
Topic Detection:
- 10 standard topic categories
- Automatic topic identification
- Topic counts and distribution
Summarization:
- Configurable summary types (bullets/paragraphs)
- Length control (brief/detailed)
- Content type optimization (conversational/informational)
Job Management:
- Asynchronous batch processing
- Job status tracking with wait_for_completion
- Timeout handling for long files
"Job timed out"
- Increase the timeout parameter: `wait_for_completion(timeout=600)`
- Check job status manually using `get_job_status(job_id)`
- Very large files may take several minutes
"No sentiment detected"
- Sentiment requires clear emotional cues in speech
- Works best with conversational audio
- May return neutral for factual/monotone content
"Summary too short/long"
- Adjust the `summary_length` parameter (`"brief"` vs `"detailed"`)
- Brief: 1-2 sentences per key point
- Detailed: Comprehensive multi-section breakdown
"Topics not relevant"
- Topics are from 10 standard categories
- Best for general conversation, meetings, calls
- May not match highly specialized domain content
- Multilingual & Translation - Work across languages
- Turn Detection - Real-time turn detection for conversations
- Voice Agent Turn Detection - Advanced presets for voice agents
Help us improve this guide:
- Found an issue? Report it
- Have suggestions? Open a discussion
Time to Complete: 15 minutes
Difficulty: Intermediate
API Mode: Batch (sentiment, topics, summary, chapters) | Batch + RT (translations, audio events)