🤖 Local Voice Agent

A real-time voice conversational AI assistant that listens, understands, thinks, and speaks. Built for Apple Silicon Macs with local-first processing.

🎯 Features

🎤 Real-time Speech Recognition - MLX Whisper (local, GPU-accelerated on Apple Silicon)
🧠 Intelligent Responses - Choose between:
- Groq LLM (ultra-fast, cloud-based, free tier)
- Local LLaMA via llama.cpp (fully offline, runs locally)
🔊 Natural Speech Synthesis - Kokoro TTS or macOS fallback
⏱️ Performance Metrics - Real-time timing for each step (recording, transcribe, LLM, TTS)
💬 Conversation Memory - Multi-turn conversation with context
🛡️ Error Resilience - Automatic fallbacks and retry logic

📋 Tech Stack

Component	Technology	Notes
STT	MLX Whisper	Local GPU-accelerated, no auth needed
LLM	Groq or llama.cpp	Switch between cloud/local
TTS	Kokoro or macOS say	Fallback to built-in TTS
Audio	sounddevice, scipy	Cross-platform audio capture/playback
Framework	Python 3.12+	Async-ready, minimal dependencies

🚀 Quick Start

Prerequisites

macOS (tested on Apple Silicon / M-series)
Python 3.10+
pip package manager

Installation

Clone or navigate to the project:
```
cd /path/to/voice-agent
```

Create virtual environment:

python -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Or manually:

pip install numpy sounddevice soundfile scipy mlx-whisper groq requests python-dotenv

Install TTS (optional but recommended):

# Kokoro TTS (better quality)
pip install kokoro-onnx

# Or use macOS built-in 'say' command (fallback, no install needed)

Install local LLM (optional):

# For offline LLM, install llama.cpp
brew install llama.cpp

Environment Setup

Create a .env file in the project root:

# Groq API Key (get from https://console.groq.com)
GROQ_API_KEY=your_groq_api_key_here

# Hugging Face Token (optional, for model access)
HF_TOKEN=your_hf_token_here

⚙️ Configuration

Edit agent.py to customize:

# ─── CONFIG ───
SAMPLE_RATE = 16000              # Audio sample rate
SILENCE_THRESHOLD = 0.01         # Silence detection threshold
SILENCE_DURATION = 1.5           # Seconds to wait before ending recording
GROQ_MODEL = "meta-llama/..."   # Groq model ID

# LLM Selection
USE_LOCAL_LLM = False            # True = local llama.cpp, False = Groq
LOCAL_LLM_URL = "http://127.0.0.1:8080"  # llama.cpp endpoint

LLM Options

Option 1: Groq (Recommended for Quick Start)

USE_LOCAL_LLM = False

Setup:

Get free API key from console.groq.com
Add to .env: GROQ_API_KEY=your_key
No additional setup needed!

Option 2: Local LLaMA via llama.cpp

USE_LOCAL_LLM = True
LOCAL_LLM_URL = "http://127.0.0.1:8080"

Setup:

Start llama.cpp server:

# Using ollama
ollama serve

# Or direct llama.cpp
./llama-server -m path/to/model.gguf -p "port 8080"

Verify it's running: curl http://127.0.0.1:8080/v1/models

▶️ Running the Agent

Basic Run

source .venv/bin/activate
python agent.py

Output Example

==================================================
🤖 Voice Bot Ready! (Ctrl+C to quit)
   STT: MLX Whisper (mlx-community/whisper-tiny)
   LLM: Groq (meta-llama/llama-4-scout-17b-16e-instruct)
   TTS: Kokoro / macOS say
==================================================
🎤 Listening... (speak now)
📝 Captured 1.6s of audio
🗣️  You: Hello, how are you?
   ⏱️  Recording: 1.80s | Transcribe: 0.35s
🤖 Bot: I'm doing great, thanks for asking! How can I help you today?
   ⏱️  LLM: 0.82s
   🔊 Using Kokoro TTS
   ⏱️  TTS: 2.34s
   ⏱️  Total: 5.31s
--------------------------------------------------

Stop the Agent

Press Ctrl+C to exit gracefully.

📊 Performance Metrics

The agent prints real-time timing for each step:

Recording - Time to capture audio until silence detected
Transcribe - STT conversion (audio → text)
LLM - Time to generate response
TTS - Text-to-speech synthesis
Total - Complete conversation cycle

🔧 Troubleshooting

❌ "401 Unauthorized" - Hugging Face Auth

Problem: MLX Whisper can't download model

Solutions:

Set HF_TOKEN in .env with your Hugging Face token
Or run huggingface-cli login once
Or use a smaller model: WHISPER_MODEL = "mlx-community/whisper-tiny"

❌ Audio Hardware Error

Problem: PortAudioError: Error starting stream

Solutions:

Check microphone is connected: System Settings → Sound → Input
Restart the agent (automatic retry included)
Try different input device (check sounddevice.query_devices())

❌ Kokoro TTS Not Found

Problem: Kokoro failed: [Errno 2] No such file or directory

Solution:

Install: pip install kokoro-onnx
Or let it fall back to macOS say command automatically

❌ Can't Connect to llama.cpp

Problem: Cannot connect to llama.cpp at http://127.0.0.1:8080

Solutions:

Start the llama.cpp server (see Configuration section)
Check URL is correct and port 8080 is open
Run curl http://127.0.0.1:8080/v1/models to verify

❌ No Speech Detected

Problem: Keeps saying "(no speech detected, listening again...)"

Solutions:

Adjust SILENCE_THRESHOLD (make it more sensitive):

SILENCE_THRESHOLD = 0.005  # Lower = more sensitive

Check microphone volume
Speak louder or closer to mic

📁 Project Structure

voice-agent/
├── agent.py           # Main voice agent logic
├── main.py            # Alternative entry point (optional)
├── README.md          # This file
├── pyproject.toml     # Project metadata
├── .env              # Configuration (create this)
├── .venv/            # Virtual environment
└── requirements.txt   # Python dependencies (create with pip freeze)

🛠️ Development

Generate requirements.txt

source .venv/bin/activate
pip freeze > requirements.txt

Enable Debug Logging

Add to agent.py:

import logging
logging.basicConfig(level=logging.DEBUG)

📝 API References

Groq API: https://console.groq.com/keys
Hugging Face Models: https://huggingface.co/mlx-community
MLX Whisper: https://github.com/ml-explore/mlx-examples
llama.cpp: https://github.com/ggerganov/llama.cpp
Kokoro TTS: https://github.com/thewh1teagle/kokoro-onnx

🎓 How It Works

Listen 🎤 → Records audio until silence detected
Transcribe 📝 → MLX Whisper converts audio to text
Think 🧠 → LLM generates intelligent response
Speak 🔊 → TTS converts response back to speech
Repeat ↩️ → Maintains conversation history

Each step is timed and logged for performance analysis.

⚡ Performance Tips

Faster responses: Use Groq instead of local LLM
Offline mode: Use local llama.cpp (slower but no cloud)
Lower latency: Use whisper-tiny (already set)
Better quality: Switch to whisper-small (slower)

📄 License

MIT

🤝 Contributing

Contributions welcome! Areas for improvement:

Multi-language support
Custom wake words
Streaming responses
Long-term memory (persistent context)

Made for Apple Silicon Macs 🍎 | Local-first AI 🔐 | Real-time voice ⚡

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
agent.py		agent.py
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

🤖 Local Voice Agent

🎯 Features

📋 Tech Stack

🚀 Quick Start

Prerequisites

Installation

Environment Setup

⚙️ Configuration

LLM Options

Option 1: Groq (Recommended for Quick Start)

Option 2: Local LLaMA via llama.cpp

▶️ Running the Agent

Basic Run

Output Example

Stop the Agent

📊 Performance Metrics

🔧 Troubleshooting

❌ "401 Unauthorized" - Hugging Face Auth

❌ Audio Hardware Error

❌ Kokoro TTS Not Found

❌ Can't Connect to llama.cpp

❌ No Speech Detected

📁 Project Structure

🛠️ Development

Generate requirements.txt

Enable Debug Logging

📝 API References

🎓 How It Works

⚡ Performance Tips

📄 License

🤝 Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages