An AI-powered technical interviewer that conducts voice-based interviews about your project. Built with LiveKit for real-time voice communication and RAG (Retrieval Augmented Generation) for context-aware questioning.
Full technical walkthrough: Watch on YouTube
The video covers:
- Token generation with capacity management
- Function calling for on-demand RAG
- RAG setup with ChromaDB
- Trade-offs discussion
- Live demo showing the voice interview in action
- 🎤 Real-time Voice Interview: Natural conversation with AI interviewer using LiveKit's voice pipeline
- 📄 RAG-Powered Questions: AI reads your project documentation to ask informed, specific questions
- 📝 Live Transcription: See conversation transcript in real-time as you speak
- 🎯 Intelligent Follow-ups: Agent challenges vague answers and probes for technical depth
- 📊 Structured Feedback: Receive detailed feedback on strengths and areas for improvement
- 🔒 Capacity Management: Controlled concurrency to manage costs (max 5 concurrent sessions)
- Python 3.11+ with the `uv` package manager (install uv)
- Node.js 18+ and npm
- LiveKit Cloud Account (sign up free)
- API Keys:
- OpenAI API key (for LLM and embeddings) - Get your API key
- LiveKit credentials (URL, API key, API secret) - Sign up free
# Clone the repository
git clone <your-repo-url>
cd backend
# Install backend dependencies
uv sync
# Install frontend dependencies
cd frontend
npm install
cd ..

Create `.env.local` in the `backend/` directory:
# LiveKit Configuration
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-key

Note: See `.env.example` for a template.
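The servers read these values from the environment at startup. If you want to verify the file is being picked up, here is a minimal sketch (assuming the `python-dotenv` package, which may or may not be what the backend actually uses):

```python
# Sketch only: confirm .env.local is loaded before starting the servers.
# Assumes python-dotenv; the backend's actual loading mechanism may differ.
import os

from dotenv import load_dotenv

load_dotenv(".env.local")  # LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, OPENAI_API_KEY

for key in ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "OPENAI_API_KEY"):
    assert os.environ.get(key), f"{key} is missing - check backend/.env.local"
```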
Place your project documentation PDF in backend/data/project_doc_long.pdf, then run:
cd backend
uv run ingest.py

Expected output:
Split 1 documents into 147 chunks
Stored 15 chunks in Chroma at data/chroma_db
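For reference, the ingestion step follows the usual LangChain load → split → embed → persist flow. A minimal sketch of the same idea (illustrative only; `backend/ingest.py` is the actual script and may differ in loader and collection details):

```python
# Sketch of the ingestion flow; see backend/ingest.py for the real script.
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("data/project_doc_long.pdf").load()

# Chunking settings match the README: 1000 characters per chunk, 200 overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
print(f"Split {len(docs)} documents into {len(chunks)} chunks")

# Embed the chunks and persist them to the local Chroma store.
Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="data/chroma_db",
)
print(f"Stored {len(chunks)} chunks in Chroma at data/chroma_db")
```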
You need three terminals running simultaneously:
cd backend
uv run token_server.py

Expected output:
Token Server Started!
cd backend
uv run agent.py dev

Expected output:
Agent Server Starting...
============================================================
LiveKit URL: wss://your-project.livekit.cloud
Connect at: https://meet.livekit.io
============================================================
cd frontend
npm run dev

Expected output:
VITE v5.x.x ready in xxx ms
➜ Local: http://localhost:5173/
➜ Network: use --host to expose
- Open the app: Navigate to http://localhost:5173 in your browser
- Start interview: Click the "Start Interview" button
- Grant permissions: Allow microphone access when prompted
- Wait for greeting: The AI interviewer will introduce itself (2-5 seconds)
- Speak naturally: Answer questions as you would in a real interview
- View transcript: Click "Transcript" button to see live conversation text
- End interview: Click "End Interview" when finished
- Speak clearly: Use a good microphone in a quiet environment
- Be specific: Vague answers will trigger follow-up questions
- Pause naturally: VAD detects when you stop speaking (~500ms silence)
- Check browser: Chrome and Edge have best WebRTC support
interview-agent/
├── backend/ # Backend Python code
│ ├── agent.py # Main AI agent with voice pipeline
│ ├── token_server.py # FastAPI server for LiveKit tokens
│ ├── tools.py # Function tools (RAG search, feedback)
│ ├── rag.py # Vector store and retrieval logic
│ ├── prompts.py # System prompts for interviewer
│ ├── ingest.py # Script to load docs into vector DB
│ ├── data/
│ │ ├── project_doc_long.pdf # Your project documentation
│ │ └── chroma_db/ # Vector database (created by ingest.py)
│ └── .env.local # API keys and configuration
├── frontend/ # React frontend
│ ├── src/
│ │ ├── App.tsx # Main app with connection logic
│ │ ├── components/
│ │ │ ├── CallInterface.tsx # Call UI and controls
│ │ │ ├── Transcript.tsx # Live transcription display
│ │ │ ├── AudioVisualizer.tsx # Audio waveform visualization
│ │ │ └── VideoRenderer.tsx # Video track rendering
│ │ ├── types.ts # TypeScript type definitions
│ │ └── index.css # Global styles
│ └── package.json
├── DESIGN.md # Architecture and design decisions
└── README.md # This file
User speaks → VAD detects speech → STT transcribes → LLM processes → TTS speaks → User hears
(LiveKit carries the real-time audio, while the RAG system supplies project context to the LLM.)
- Voice Activity Detection (VAD): Silero model detects when you start/stop speaking
- Speech-to-Text (STT): AssemblyAI transcribes your speech to text
- RAG Search: Agent searches your project docs using semantic search (ChromaDB + OpenAI embeddings)
- LLM Processing: GPT-4o-mini generates contextual follow-up questions
- Text-to-Speech (TTS): Cartesia Sonic synthesizes agent's response
- Audio Delivery: LiveKit streams audio back to your browser
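In code, these stages come together in a LiveKit `AgentSession`. The sketch below is illustrative only, reusing the model identifiers from the configuration section further down; the real wiring (prompts, tools, session bookkeeping) lives in `backend/agent.py`:

```python
# Illustrative sketch of the voice pipeline wiring; see backend/agent.py for the real version.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()  # join the LiveKit room assigned to this job

    session = AgentSession(
        stt="assemblyai/universal-streaming:en",  # Speech-to-Text
        llm="openai/gpt-4o-mini",                 # Language Model
        tts="cartesia/sonic-3:9626c31c-...",      # Text-to-Speech (placeholder voice id)
        vad=silero.VAD.load(),                    # Voice Activity Detection
    )

    # The real agent passes the interviewer prompts (prompts.py) and tools (tools.py) here.
    await session.start(room=ctx.room, agent=Agent(instructions="You are a technical interviewer."))
    await session.generate_reply(instructions="Greet the candidate and start the interview.")


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```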
project_doc_long.pdf → Chunking (1000 chars, 200 overlap) → Embeddings → ChromaDB
↓
Agent asks question → Semantic search → Relevant chunks → LLM context
Deduplication: Uses content fingerprinting (first 150 chars) to remove duplicate chunks from retrieval results.
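The fingerprinting idea is simple enough to show directly; a sketch (the helper name is hypothetical, see `backend/rag.py` for the real logic):

```python
# Hypothetical helper illustrating fingerprint-based deduplication (first 150 characters).
def deduplicate_chunks(chunks: list[str], fingerprint_len: int = 150) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        fingerprint = chunk[:fingerprint_len].strip().lower()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(chunk)
    return unique
```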
The RAG system is integrated into the agent's decision-making through function calling:
- Document Ingestion: PDFs are chunked and embedded into ChromaDB (see RAG Settings)
- On-Demand Retrieval: When the agent needs project-specific context, it calls the `search_project_docs` function tool (sketched below)
- Context Injection: Retrieved chunks are added to the LLM context, enabling informed follow-up questions
- Deduplication: Overlapping chunks are filtered using content fingerprinting to reduce redundancy
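A sketch of what the `search_project_docs` tool can look like with the LiveKit Agents `function_tool` decorator and a Chroma retriever, reusing the deduplication helper sketched earlier (illustrative only; the real code lives in `backend/tools.py` and `backend/rag.py`):

```python
# Sketch only; assumes langchain-chroma and langchain-openai, and reuses the
# deduplicate_chunks helper sketched above.
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from livekit.agents import RunContext, function_tool

vector_store = Chroma(
    persist_directory="data/chroma_db",
    embedding_function=OpenAIEmbeddings(),
)


@function_tool
async def search_project_docs(context: RunContext, query: str) -> str:
    """Search the candidate's project documentation for details relevant to the query."""
    results = vector_store.similarity_search(query, k=4)
    chunks = deduplicate_chunks([doc.page_content for doc in results])
    # The returned string is injected into the LLM context as the tool result.
    return "\n\n".join(chunks)
```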
For detailed RAG architecture and assumptions, see DESIGN.md - RAG Integration Details and DESIGN.md - Design Decisions.
- LiveKit Agents SDK: Voice pipeline, WebRTC handling, STT/TTS integration
- OpenAI: GPT-4o-mini as the LLM for conversation, plus OpenAI embeddings for RAG
- AssemblyAI: Speech-to-text transcription
- Cartesia Sonic: Text-to-speech synthesis
- LangChain: Document processing and RAG utilities
- ChromaDB: Local vector database for semantic search
- FastAPI: Token server for LiveKit authentication
- Silero VAD: Voice activity detection
- React + TypeScript: UI framework
- Vite: Build tool and dev server
- LiveKit React SDK: WebRTC components and hooks
For detailed design decisions and trade-offs, see DESIGN.md.
Note: This is a summary. For detailed analysis of trade-offs, limitations, and alternatives considered, see DESIGN.md.
- Hosting: Local development setup; production deployment requires infrastructure changes (see DESIGN.md - Deployment Recommendations)
- RAG: ChromaDB suitable for small-to-medium document collections; production may need Pinecone/Weaviate
- Concurrency: 5 concurrent sessions limit for cost control
- Voice Pipeline: LiveKit handles all WebRTC complexity
- RAG Chunking: 1000 chars, 200 overlap (see DESIGN.md - RAG Chunk Size)
- Vector DB: ChromaDB (local) - see DESIGN.md - Vector Store for production alternatives
- LLM: GPT-4o-mini for speed/cost balance (see DESIGN.md - LLM Choice)
- Session Management: 15-minute idle timeout, 3-second empty room grace period
- Maximum 5 concurrent sessions (configurable)
- ChromaDB not suitable for production scale (thousands of docs)
- No authentication in current setup (development only)
- See DESIGN.md for complete trade-offs and limitations
Base URL: http://localhost:8000
Returns the LiveKit WebSocket URL.
Response:
{
"url": "wss://your-project.livekit.cloud"
}

Generates a LiveKit access token for the specified room and username.
Parameters:
- `room` (string): Room name
- `username` (string): Participant identity
Response:
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Error (429):
{
"detail": "Maximum number of interviews reached. Please try again later."
}

Checks if there's capacity for a new session.
Response:
{
"has_capacity": true,
"active_sessions": 2,
"max_sessions": 5,
"message": "Capacity available"
}
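Put together, token generation with capacity management can be sketched as a small FastAPI app on top of the LiveKit server SDK. Endpoint paths and session tracking are assumptions here; `backend/token_server.py` is the real implementation:

```python
# Illustrative sketch of token generation with capacity management; not the real token_server.py.
import os

from fastapi import FastAPI, HTTPException
from livekit import api

app = FastAPI()
MAX_CONCURRENT_SESSIONS = 5
active_sessions: set[str] = set()  # the real server also tracks idle timeouts and empty rooms


@app.get("/token")
async def get_token(room: str, username: str) -> dict:
    if room not in active_sessions and len(active_sessions) >= MAX_CONCURRENT_SESSIONS:
        raise HTTPException(
            status_code=429,
            detail="Maximum number of interviews reached. Please try again later.",
        )
    active_sessions.add(room)

    token = (
        api.AccessToken(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
        .with_identity(username)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .to_jwt()
    )
    return {"token": token}


@app.get("/capacity")
async def get_capacity() -> dict:
    has_capacity = len(active_sessions) < MAX_CONCURRENT_SESSIONS
    return {
        "has_capacity": has_capacity,
        "active_sessions": len(active_sessions),
        "max_sessions": MAX_CONCURRENT_SESSIONS,
        "message": "Capacity available" if has_capacity else "At capacity",
    }
```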
Capacity settings:

MAX_CONCURRENT_SESSIONS = 5 # Maximum simultaneous interviews
IDLE_TIMEOUT = 900 # 15 minutes - auto-cleanup idle sessions
ACTIVITY_CHECK_INTERVAL = 60 # Check every 60 seconds
EMPTY_ROOM_GRACE_PERIOD = 3 # Wait 3 seconds before cleanup

RAG settings:

chunk_size = 1000 # Characters per chunk
chunk_overlap = 200 # Overlap between chunks
k = 4 # Number of chunks to retrieve
deduplicate = True # Remove duplicate chunks

Voice pipeline:

AgentSession(
stt="assemblyai/universal-streaming:en", # Speech-to-Text
llm="openai/gpt-4o-mini", # Language Model
tts="cartesia/sonic-3:9626c31c-...", # Text-to-Speech (British male)
vad=silero.VAD.load(), # Voice Activity Detection
)

- Check: Is the token server running on port 8000?
- Fix: Run `cd backend && uv run token_server.py` in a separate terminal
- Check: Is the agent server running?
- Check: Browser microphone permissions granted?
- Fix: Open browser console (F12) and check for WebRTC errors
- Check: Did you run `uv run ingest.py`?
- Check: Does `backend/data/chroma_db/` exist?
- Fix: Re-run ingestion and restart the agent
- Cause: 5 other interviews are active
- Fix: Wait for sessions to end, or increase `MAX_CONCURRENT_SESSIONS` in `agent.py`
- Wait: First transcription may take 5-10 seconds
- Check: Open browser console (F12) for errors
- Note: Transcriptions appear after you stop speaking (VAD detects end-of-speech)
- Cause: OpenAI/AssemblyAI API quota exceeded
- Fix: Check your API key usage dashboards
- Prevention: Reduce `MAX_CONCURRENT_SESSIONS` to control costs
This project was built using AI-assisted development:
- IDE: Cursor - AI-powered code editor
- AI Model: Claude Sonnet 4.5 for code generation, architecture design, and iterative development
- Workflow: Iterative development with AI pair programming for rapid prototyping and refinement
# Frontend linting
cd frontend
npm run lint
# Python formatting (if using black/ruff)
cd backend
uv run ruff format .

- Frontend: Vite provides instant HMR (Hot Module Replacement)
- Backend: Restart agent/token server manually after code changes
- Authentication: Add OAuth/JWT to token server
- Rate limiting: Prevent token farming
- CORS restriction: Change from `*` to specific domains (see the sketch after this list)
- Input validation: Sanitize room names and usernames
- Secrets management: Use AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault
- HTTPS only: Force SSL for all connections
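For the CORS item above, a minimal sketch using FastAPI's middleware (the domain is a placeholder):

```python
# Sketch: restrict CORS on the token server to known origins instead of "*".
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://interviews.example.com"],  # placeholder domain, not "*"
    allow_credentials=True,
    allow_methods=["GET"],
    allow_headers=["*"],
)
```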
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is for educational purposes. Check with LiveKit, OpenAI, AssemblyAI, and Cartesia for their respective license terms.
- LiveKit: Real-time voice infrastructure and Agents SDK
- OpenAI: GPT-4o-mini for natural conversations and embeddings
- AssemblyAI: High-quality speech-to-text
- Cartesia: Ultra-low latency text-to-speech
- LangChain: RAG and document processing utilities
- ChromaDB: Vector database for semantic search
- Architecture & Design: See DESIGN.md for detailed design decisions, trade-offs, and system architecture
- Quick Reference: See Configuration Options for runtime settings
- Troubleshooting: See Troubleshooting for common issues
- Post-interview analytics dashboard
- Persistent interview history (database integration)
- Resume parsing for better context
- Multi-language support
- Video analysis (body language, eye contact)
- Custom interviewer voices
- Screen sharing for code walkthroughs
Built with ❤️ using LiveKit, OpenAI, and modern web technologies