Project Interview Coach

An AI-powered technical interviewer that conducts voice-based interviews about your project. Built with LiveKit for real-time voice communication and RAG (Retrieval Augmented Generation) for context-aware questioning.

📹 Video Walkthrough

Full technical walkthrough: Watch on YouTube

The video covers:

  • Token generation with capacity management
  • Function calling for on-demand RAG
  • RAG setup with ChromaDB
  • Trade-offs discussion
  • Live demo showing the voice interview in action

Features

  • 🎤 Real-time Voice Interview: Natural conversation with AI interviewer using LiveKit's voice pipeline
  • 📄 RAG-Powered Questions: AI reads your project documentation to ask informed, specific questions
  • 📝 Live Transcription: See conversation transcript in real-time as you speak
  • 🎯 Intelligent Follow-ups: Agent challenges vague answers and probes for technical depth
  • 📊 Structured Feedback: Receive detailed feedback on strengths and areas for improvement
  • 🔒 Capacity Management: Controlled concurrency to manage costs (max 5 concurrent sessions)

Quick Start

Prerequisites

  • Python 3 and the uv package manager
  • Node.js and npm
  • A LiveKit Cloud project (URL, API key, and API secret)
  • An OpenAI API key

Installation

# Clone the repository
git clone <your-repo-url>
cd interview-agent

# Install backend dependencies
cd backend
uv sync
cd ..

# Install frontend dependencies
cd frontend
npm install
cd ..

Configuration

Create .env.local in the backend/ directory:

# LiveKit Configuration
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-key

Note: See .env.example for a template.
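
The backend scripts read these variables at startup. A minimal sketch of loading them with python-dotenv (illustrative; the actual loading code lives in token_server.py and agent.py):

from dotenv import load_dotenv

# Load backend/.env.local before creating any LiveKit or OpenAI clients
load_dotenv(dotenv_path=".env.local")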

Ingest Your Project Documentation

Place your project documentation PDF in backend/data/project_doc_long.pdf, then run:

cd backend
uv run ingest.py

Expected output:

Split 1 documents into 147 chunks
Stored 147 chunks in Chroma at data/chroma_db
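
Under the hood, ingestion follows the standard LangChain pattern: load the PDF, split it into overlapping chunks, embed them, and persist to Chroma. A minimal sketch (illustrative only; the real logic lives in ingest.py and rag.py):

from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF and split it into overlapping chunks (settings from rag.py)
docs = PyPDFLoader("data/project_doc_long.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed each chunk with OpenAI embeddings and persist to the local Chroma store
Chroma.from_documents(chunks, embedding=OpenAIEmbeddings(), persist_directory="data/chroma_db")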


Running the Application

You need three terminals running simultaneously:

Terminal 1: Token Server

cd backend
uv run token_server.py

Expected output:

Token Server Started!

Terminal 2: AI Agent

cd backend
uv run agent.py dev

Expected output:

Agent Server Starting...
============================================================
LiveKit URL: wss://your-project.livekit.cloud
Connect at: https://meet.livekit.io
============================================================

Terminal 3: Frontend

cd frontend
npm run dev

Expected output:

  VITE v5.x.x  ready in xxx ms

  ➜  Local:   http://localhost:5173/
  ➜  Network: use --host to expose

Usage

  1. Open the app: Navigate to http://localhost:5173 in your browser
  2. Start interview: Click the "Start Interview" button
  3. Grant permissions: Allow microphone access when prompted
  4. Wait for greeting: The AI interviewer will introduce itself (2-5 seconds)
  5. Speak naturally: Answer questions as you would in a real interview
  6. View transcript: Click "Transcript" button to see live conversation text
  7. End interview: Click "End Interview" when finished

Tips for Best Experience

  • Speak clearly: Use a good microphone in a quiet environment
  • Be specific: Vague answers will trigger follow-up questions
  • Pause naturally: VAD detects when you stop speaking (~500ms silence)
  • Check browser: Chrome and Edge have best WebRTC support

Project Structure

interview-agent/
├── backend/                # Backend Python code
│   ├── agent.py           # Main AI agent with voice pipeline
│   ├── token_server.py    # FastAPI server for LiveKit tokens
│   ├── tools.py           # Function tools (RAG search, feedback)
│   ├── rag.py             # Vector store and retrieval logic
│   ├── prompts.py         # System prompts for interviewer
│   ├── ingest.py          # Script to load docs into vector DB
│   ├── data/
│   │   ├── project_doc_long.pdf  # Your project documentation
│   │   └── chroma_db/            # Vector database (created by ingest.py)
│   └── .env.local         # API keys and configuration
├── frontend/              # React frontend
│   ├── src/
│   │   ├── App.tsx              # Main app with connection logic
│   │   ├── components/
│   │   │   ├── CallInterface.tsx    # Call UI and controls
│   │   │   ├── Transcript.tsx       # Live transcription display
│   │   │   ├── AudioVisualizer.tsx  # Audio waveform visualization
│   │   │   └── VideoRenderer.tsx    # Video track rendering
│   │   ├── types.ts             # TypeScript type definitions
│   │   └── index.css            # Global styles
│   └── package.json
├── DESIGN.md              # Architecture and design decisions
└── README.md              # This file

How It Works

Voice Pipeline

User speaks → VAD detects speech → STT transcribes → LLM processes → TTS speaks → User hears
     ↓                                                      ↓
   LiveKit                                          RAG System
  1. Voice Activity Detection (VAD): Silero model detects when you start/stop speaking
  2. Speech-to-Text (STT): AssemblyAI transcribes your speech to text
  3. RAG Search: Agent searches your project docs using semantic search (ChromaDB + OpenAI embeddings)
  4. LLM Processing: GPT-4o-mini generates contextual follow-up questions
  5. Text-to-Speech (TTS): Cartesia Sonic synthesizes agent's response
  6. Audio Delivery: LiveKit streams audio back to your browser

RAG System

project_doc_long.pdf → Chunking (1000 chars, 200 overlap) → Embeddings → ChromaDB
                                                                          ↓
Agent asks question → Semantic search → Relevant chunks → LLM context

Deduplication: Uses content fingerprinting (first 150 chars) to remove duplicate chunks from retrieval results.
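
A minimal sketch of that fingerprinting step (illustrative; the actual implementation lives in rag.py):

def deduplicate(chunks: list[str], fingerprint_len: int = 150) -> list[str]:
    """Drop any chunk whose first 150 characters match an earlier chunk."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        fingerprint = chunk[:fingerprint_len]
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(chunk)
    return unique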

How RAG Was Integrated

The RAG system is integrated into the agent's decision-making through function calling:

  1. Document Ingestion: PDFs are chunked and embedded into ChromaDB (see RAG Settings)
  2. On-Demand Retrieval: When the agent needs project-specific context, it calls the search_project_docs function tool
  3. Context Injection: Retrieved chunks are added to the LLM context, enabling informed follow-up questions
  4. Deduplication: Overlapping chunks are filtered using content fingerprinting to reduce redundancy

For detailed RAG architecture and assumptions, see DESIGN.md - RAG Integration Details and DESIGN.md - Design Decisions.
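
A rough sketch of such a function tool using the LiveKit Agents SDK (names and wiring here are assumptions; see tools.py for the real version, and the deduplicate helper sketched above):

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from livekit.agents import RunContext, function_tool

vector_store = Chroma(persist_directory="data/chroma_db", embedding_function=OpenAIEmbeddings())

@function_tool
async def search_project_docs(context: RunContext, query: str) -> str:
    """Search the candidate's project documentation for relevant context."""
    docs = vector_store.similarity_search(query, k=4)    # semantic search over ChromaDB
    texts = deduplicate([d.page_content for d in docs])  # fingerprint dedup (sketched above)
    return "\n\n".join(texts)                            # injected into the LLM context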


Tools & Frameworks

Backend

  • LiveKit Agents SDK: Voice pipeline, WebRTC handling, STT/TTS integration
  • OpenAI: GPT-4o-mini for conversation, plus OpenAI embeddings for RAG semantic search
  • AssemblyAI: Speech-to-text transcription
  • Cartesia Sonic: Text-to-speech synthesis
  • LangChain: Document processing and RAG utilities
  • ChromaDB: Local vector database for semantic search
  • FastAPI: Token server for LiveKit authentication
  • Silero VAD: Voice activity detection

Frontend

  • React + TypeScript: UI framework
  • Vite: Build tool and dev server
  • LiveKit React SDK: WebRTC components and hooks

For detailed design decisions and trade-offs, see DESIGN.md.


Design Decisions & Assumptions

Note: This is a summary. For detailed analysis of trade-offs, limitations, and alternatives considered, see DESIGN.md.

Key Assumptions

  • Hosting: Local development setup; production deployment requires infrastructure changes (see DESIGN.md - Deployment Recommendations)
  • RAG: ChromaDB suitable for small-to-medium document collections; production may need Pinecone/Weaviate
  • Concurrency: 5 concurrent sessions limit for cost control
  • Voice Pipeline: LiveKit handles all WebRTC complexity

Quick Reference

Limitations

  • Maximum 5 concurrent sessions (configurable)
  • ChromaDB not suitable for production scale (thousands of docs)
  • No authentication in current setup (development only)
  • See DESIGN.md for complete trade-offs and limitations

API Endpoints

Token Server (FastAPI)

Base URL: http://localhost:8000

GET /livekit-url

Returns the LiveKit WebSocket URL.

Response:

{
  "url": "wss://your-project.livekit.cloud"
}

GET /token?room=<room>&username=<user>

Generates a LiveKit access token for the specified room and username.

Parameters:

  • room (string): Room name
  • username (string): Participant identity

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Error (429):

{
  "detail": "Maximum number of interviews reached. Please try again later."
}

GET /capacity-check

Checks if there's capacity for a new session.

Response:

{
  "has_capacity": true,
  "active_sessions": 2,
  "max_sessions": 5,
  "message": "Capacity available"
}
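
A condensed sketch of how the token endpoint can mint a token and enforce the session cap (simplified; exact names are assumptions, see token_server.py for the real version):

import os

from fastapi import FastAPI, HTTPException
from livekit import api

app = FastAPI()
MAX_SESSIONS = 5
active_rooms: set[str] = set()  # naive in-memory session tracking

@app.get("/token")
def get_token(room: str, username: str):
    # Reject new rooms once the cap is reached (joining an existing room is allowed)
    if room not in active_rooms and len(active_rooms) >= MAX_SESSIONS:
        raise HTTPException(status_code=429, detail="Maximum number of interviews reached. Please try again later.")
    active_rooms.add(room)
    token = (
        api.AccessToken(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
        .with_identity(username)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .to_jwt()
    )
    return {"token": token}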

Configuration Options

Concurrency Settings (agent.py)

MAX_CONCURRENT_SESSIONS = 5  # Maximum simultaneous interviews
IDLE_TIMEOUT = 900           # 15 minutes - auto-cleanup idle sessions
ACTIVITY_CHECK_INTERVAL = 60 # Check every 60 seconds
EMPTY_ROOM_GRACE_PERIOD = 3  # Wait 3 seconds before cleanup
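
These settings imply a periodic sweeper along these lines (hypothetical helper names; the real loop lives in agent.py):

import asyncio
import time

async def sweep_idle_sessions() -> None:
    # Every ACTIVITY_CHECK_INTERVAL seconds, end sessions idle longer than IDLE_TIMEOUT
    while True:
        await asyncio.sleep(ACTIVITY_CHECK_INTERVAL)
        now = time.time()
        for session_id, last_active in list(last_activity.items()):
            if now - last_active > IDLE_TIMEOUT:
                await end_session(session_id)  # hypothetical cleanup helper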

RAG Settings (rag.py)

chunk_size = 1000      # Characters per chunk
chunk_overlap = 200    # Overlap between chunks
k = 4                  # Number of chunks to retrieve
deduplicate = True     # Remove duplicate chunks

Voice Models (agent.py)

AgentSession(
    stt="assemblyai/universal-streaming:en",  # Speech-to-Text
    llm="openai/gpt-4o-mini",                  # Language Model
    tts="cartesia/sonic-3:9626c31c-...",       # Text-to-Speech (British male)
    vad=silero.VAD.load(),                     # Voice Activity Detection
)

Troubleshooting

"Failed to connect to token server"

  • Check: Is the token server running on port 8000?
  • Fix: Run cd backend && uv run token_server.py in a separate terminal

"No audio from agent"

  • Check: Is the agent server running?
  • Check: Browser microphone permissions granted?
  • Fix: Open browser console (F12) and check for WebRTC errors

"Agent asks generic questions (not using my docs)"

  • Check: Did you run uv run ingest.py?
  • Check: Does backend/data/chroma_db/ exist?
  • Fix: Re-run ingestion and restart agent

"Maximum concurrent sessions reached"

  • Cause: 5 other interviews are active
  • Fix: Wait for sessions to end, or increase MAX_CONCURRENT_SESSIONS in agent.py

"Transcript not showing"

  • Wait: First transcription may take 5-10 seconds
  • Check: Open browser console (F12) for errors
  • Note: Transcriptions appear after you stop speaking (VAD detects end-of-speech)

"Rate limit errors"

  • Cause: OpenAI/AssemblyAI API quota exceeded
  • Fix: Check your API key usage dashboards
  • Prevention: Reduce MAX_CONCURRENT_SESSIONS to control costs

Development

Development Tools

This project was built using AI-assisted development:

  • IDE: Cursor - AI-powered code editor
  • AI Model: Claude Sonnet 4.5 for code generation, architecture design, and iterative development
  • Workflow: Iterative development with AI pair programming for rapid prototyping and refinement

Code Quality

# Frontend linting
cd frontend
npm run lint

# Python formatting (if using black/ruff)
cd backend
uv run ruff format .

Hot Reload

  • Frontend: Vite provides instant HMR (Hot Module Replacement)
  • Backend: Restart agent/token server manually after code changes

Security Notes

⚠️ This is a development setup. Do NOT use in production without:

  1. Authentication: Add OAuth/JWT to token server
  2. Rate limiting: Prevent token farming
  3. CORS restriction: Change from * to specific domains (see the sketch after this list)
  4. Input validation: Sanitize room names, usernames
  5. Secrets management: Use AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault
  6. HTTPS only: Force SSL for all connections
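
For item 3, locking down CORS on the FastAPI token server looks like this (replace the placeholder origin with your real domain):

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-app.example.com"],  # instead of the "*" wildcard
    allow_methods=["GET"],
    allow_headers=["*"],
)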

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is for educational purposes. Check with LiveKit, OpenAI, AssemblyAI, and Cartesia for their respective license terms.


Acknowledgments

  • LiveKit: Real-time voice infrastructure and Agents SDK
  • OpenAI: GPT-4o-mini for natural conversations and OpenAI embeddings for retrieval
  • AssemblyAI: High-quality speech-to-text
  • Cartesia: Ultra-low latency text-to-speech
  • LangChain: RAG and document processing utilities
  • ChromaDB: Vector database for semantic search

Roadmap

  • Post-interview analytics dashboard
  • Persistent interview history (database integration)
  • Resume parsing for better context
  • Multi-language support
  • Video analysis (body language, eye contact)
  • Custom interviewer voices
  • Screen sharing for code walkthroughs

Built with ❤️ using LiveKit, OpenAI, and modern web technologies
