A real-time voice conversational AI assistant that listens, understands, thinks, and speaks. Built for Apple Silicon Macs with local-first processing.
- 🎤 Real-time Speech Recognition - MLX Whisper (local, GPU-accelerated on Apple Silicon)
- 🧠 Intelligent Responses - Choose between:
- Groq LLM (ultra-fast, cloud-based, free tier)
- Local LLaMA via llama.cpp (fully offline, runs locally)
- 🔊 Natural Speech Synthesis - Kokoro TTS or macOS fallback
- ⏱️ Performance Metrics - Real-time timing for each step (recording, transcribe, LLM, TTS)
- 💬 Conversation Memory - Multi-turn conversation with context
- 🛡️ Error Resilience - Automatic fallbacks and retry logic
| Component | Technology | Notes |
|---|---|---|
| STT | MLX Whisper | Local GPU-accelerated, no auth needed |
| LLM | Groq or llama.cpp | Switch between cloud/local |
| TTS | Kokoro or macOS say | Fallback to built-in TTS |
| Audio | sounddevice, scipy | Cross-platform audio capture/playback |
| Framework | Python 3.12+ | Async-ready, minimal dependencies |
- macOS (tested on Apple Silicon / M-series)
- Python 3.10+
- pip package manager
-
Clone or navigate to the project:
cd /path/to/voice-agent -
Create virtual environment:
python -m venv .venv source .venv/bin/activate -
Install dependencies:
pip install -r requirements.txt
Or manually:
pip install numpy sounddevice soundfile scipy mlx-whisper groq requests python-dotenv
-
Install TTS (optional but recommended):
# Kokoro TTS (better quality) pip install kokoro-onnx # Or use macOS built-in 'say' command (fallback, no install needed)
-
Install local LLM (optional):
# For offline LLM, install llama.cpp brew install llama.cpp
Create a .env file in the project root:
# Groq API Key (get from https://console.groq.com)
GROQ_API_KEY=your_groq_api_key_here
# Hugging Face Token (optional, for model access)
HF_TOKEN=your_hf_token_hereEdit agent.py to customize:
# ─── CONFIG ───
SAMPLE_RATE = 16000 # Audio sample rate
SILENCE_THRESHOLD = 0.01 # Silence detection threshold
SILENCE_DURATION = 1.5 # Seconds to wait before ending recording
GROQ_MODEL = "meta-llama/..." # Groq model ID
# LLM Selection
USE_LOCAL_LLM = False # True = local llama.cpp, False = Groq
LOCAL_LLM_URL = "http://127.0.0.1:8080" # llama.cpp endpointUSE_LOCAL_LLM = FalseSetup:
- Get free API key from console.groq.com
- Add to
.env:GROQ_API_KEY=your_key - No additional setup needed!
USE_LOCAL_LLM = True
LOCAL_LLM_URL = "http://127.0.0.1:8080"Setup:
- Start llama.cpp server:
# Using ollama ollama serve # Or direct llama.cpp ./llama-server -m path/to/model.gguf -p "port 8080"
- Verify it's running:
curl http://127.0.0.1:8080/v1/models
source .venv/bin/activate
python agent.py==================================================
🤖 Voice Bot Ready! (Ctrl+C to quit)
STT: MLX Whisper (mlx-community/whisper-tiny)
LLM: Groq (meta-llama/llama-4-scout-17b-16e-instruct)
TTS: Kokoro / macOS say
==================================================
🎤 Listening... (speak now)
📝 Captured 1.6s of audio
🗣️ You: Hello, how are you?
⏱️ Recording: 1.80s | Transcribe: 0.35s
🤖 Bot: I'm doing great, thanks for asking! How can I help you today?
⏱️ LLM: 0.82s
🔊 Using Kokoro TTS
⏱️ TTS: 2.34s
⏱️ Total: 5.31s
--------------------------------------------------
Press Ctrl+C to exit gracefully.
The agent prints real-time timing for each step:
- Recording - Time to capture audio until silence detected
- Transcribe - STT conversion (audio → text)
- LLM - Time to generate response
- TTS - Text-to-speech synthesis
- Total - Complete conversation cycle
Problem: MLX Whisper can't download model
Solutions:
- Set
HF_TOKENin.envwith your Hugging Face token - Or run
huggingface-cli loginonce - Or use a smaller model:
WHISPER_MODEL = "mlx-community/whisper-tiny"
Problem: PortAudioError: Error starting stream
Solutions:
- Check microphone is connected:
System Settings → Sound → Input - Restart the agent (automatic retry included)
- Try different input device (check
sounddevice.query_devices())
Problem: Kokoro failed: [Errno 2] No such file or directory
Solution:
- Install:
pip install kokoro-onnx - Or let it fall back to macOS
saycommand automatically
Problem: Cannot connect to llama.cpp at http://127.0.0.1:8080
Solutions:
- Start the llama.cpp server (see Configuration section)
- Check URL is correct and port 8080 is open
- Run
curl http://127.0.0.1:8080/v1/modelsto verify
Problem: Keeps saying "(no speech detected, listening again...)"
Solutions:
- Adjust
SILENCE_THRESHOLD(make it more sensitive):SILENCE_THRESHOLD = 0.005 # Lower = more sensitive
- Check microphone volume
- Speak louder or closer to mic
voice-agent/
├── agent.py # Main voice agent logic
├── main.py # Alternative entry point (optional)
├── README.md # This file
├── pyproject.toml # Project metadata
├── .env # Configuration (create this)
├── .venv/ # Virtual environment
└── requirements.txt # Python dependencies (create with pip freeze)
source .venv/bin/activate
pip freeze > requirements.txtAdd to agent.py:
import logging
logging.basicConfig(level=logging.DEBUG)- Groq API: https://console.groq.com/keys
- Hugging Face Models: https://huggingface.co/mlx-community
- MLX Whisper: https://github.com/ml-explore/mlx-examples
- llama.cpp: https://github.com/ggerganov/llama.cpp
- Kokoro TTS: https://github.com/thewh1teagle/kokoro-onnx
- Listen 🎤 → Records audio until silence detected
- Transcribe 📝 → MLX Whisper converts audio to text
- Think 🧠 → LLM generates intelligent response
- Speak 🔊 → TTS converts response back to speech
- Repeat ↩️ → Maintains conversation history
Each step is timed and logged for performance analysis.
- Faster responses: Use Groq instead of local LLM
- Offline mode: Use local llama.cpp (slower but no cloud)
- Lower latency: Use
whisper-tiny(already set) - Better quality: Switch to
whisper-small(slower)
MIT
Contributions welcome! Areas for improvement:
- Multi-language support
- Custom wake words
- Streaming responses
- Long-term memory (persistent context)
Made for Apple Silicon Macs 🍎 | Local-first AI 🔐 | Real-time voice ⚡