Talk naturally with specialized AI agents in a calm, minimal video-call style room.
Arlo lets you have a real spoken conversation with an AI agent — no typing, no prompts. Pick an agent, click Start, and just talk. The agent listens, thinks, and responds in a natural human voice. It knows when you're done speaking and won't cut you off.
Agents:
- CMSC420 TA — Advanced Data Structures (trees, heaps, graphs, spatial indexing, complexity)
- CMSC417 TA — Computer Networks (TCP, DNS, routing, congestion control, protocols)
- Telehealth Intake — Calm, structured pre-visit intake assistant
- Interview Coach — Behavioral and technical interview practice
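The project structure places agent definitions, system prompts, and voice IDs in lib/personas.ts. The sketch below shows one plausible shape for those definitions, covering the four agents listed above. The `Persona` interface, the `id` strings, the prompt wording, and the voice IDs are hypothetical placeholders, not the project's actual schema.

```typescript
// Hypothetical shape of an agent definition; the real lib/personas.ts may differ.
interface Persona {
  id: string;           // URL-safe key for selecting the agent (assumed)
  title: string;        // label shown on the agent card
  focus: string;        // one-line description of the agent's domain
  systemPrompt: string; // instructions sent to the LLM when the call starts
  voiceId: string;      // ElevenLabs voice ID (placeholders below)
}

const PERSONAS: Persona[] = [
  {
    id: "cmsc420-ta",
    title: "CMSC420 TA",
    focus: "Advanced Data Structures",
    systemPrompt:
      "You are a CMSC420 teaching assistant. Explain trees, heaps, graphs, spatial indexing, and complexity in short spoken answers.",
    voiceId: "VOICE_ID_PLACEHOLDER_1",
  },
  {
    id: "cmsc417-ta",
    title: "CMSC417 TA",
    focus: "Computer Networks",
    systemPrompt:
      "You are a CMSC417 teaching assistant. Explain TCP, DNS, routing, congestion control, and protocols in short spoken answers.",
    voiceId: "VOICE_ID_PLACEHOLDER_2",
  },
  {
    id: "telehealth-intake",
    title: "Telehealth Intake",
    focus: "Pre-visit intake",
    systemPrompt:
      "You are a calm, structured intake assistant. Collect pre-visit information one question at a time.",
    voiceId: "VOICE_ID_PLACEHOLDER_3",
  },
  {
    id: "interview-coach",
    title: "Interview Coach",
    focus: "Interview practice",
    systemPrompt:
      "You are an interview coach. Run behavioral and technical practice questions and give brief spoken feedback.",
    voiceId: "VOICE_ID_PLACEHOLDER_4",
  },
];

// Looks up an agent by id; returns undefined for unknown ids.
function findPersona(id: string): Persona | undefined {
  for (const p of PERSONAS) {
    if (p.id === id) return p;
  }
  return undefined;
}
```

Keeping the system prompt and voice ID on the same record means selecting an agent is a single lookup that yields everything the call needs.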
Features:
- Voice conversation with natural turn-taking — no button to press when done
- Human-sounding ElevenLabs voices via Vapi
- Visual avatar that animates while listening, thinking, and speaking
- Auto-captured session notes + full transcript
- Paste document URLs (Google Docs, lecture notes, course pages) as context so the agent knows your specific course material

Setup:

```bash
npm install
```

Vapi (handles the voice pipeline: STT, LLM, TTS, turn-taking)
- Sign up at vapi.ai — free $10 credit on signup
- Go to Account → copy your Public Key

Anthropic (optional: the key is only used by the /api/chat fallback route; during a Vapi call, Claude runs inside Vapi's pipeline)
- Get a key at console.anthropic.com

Copy the example environment file:

```bash
cp .env.example .env.local
```

Fill in .env.local:

```bash
ANTHROPIC_API_KEY=sk-ant-...
NEXT_PUBLIC_VAPI_PUBLIC_KEY=pk_...
```

Start the dev server:

```bash
npm run dev
```

Open the URL shown in your terminal (usually http://localhost:3000).
Note: Use Chrome or Edge — they have the best Web Speech API support.
Vapi chains together three services into a seamless real-time call:
- Deepgram Nova-2: streaming speech-to-text with voice activity detection (VAD), so it can tell when you've finished speaking
- Claude Haiku: fast, concise AI responses tuned for spoken conversation
- ElevenLabs: natural-sounding TTS voices, one per agent
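In a Vapi call, end-of-turn detection happens inside the pipeline. The project also includes pause-detection logic as a fallback (lib/turnTaking.ts). To illustrate the idea, here is a minimal silence-timeout detector, assuming the speech recognizer calls `onSpeech` whenever it delivers a new result. The class name, method names, and the 1200 ms default are assumptions for illustration, not the project's code.

```typescript
// Silence-based end-of-turn detector (hypothetical; not the project's turnTaking.ts).
class PauseDetector {
  private lastSpeechAt: number | null = null;
  private readonly silenceMs: number;

  constructor(silenceMs = 1200) {
    this.silenceMs = silenceMs;
  }

  // Call whenever the recognizer delivers a new interim or final result.
  onSpeech(nowMs: number): void {
    this.lastSpeechAt = nowMs;
  }

  // True once the user has spoken and then stayed silent for at least silenceMs.
  isTurnOver(nowMs: number): boolean {
    return this.lastSpeechAt !== null && nowMs - this.lastSpeechAt >= this.silenceMs;
  }

  // Clear state once the agent takes its turn.
  reset(): void {
    this.lastSpeechAt = null;
  }
}
```

A fixed timeout is the simplest policy: shorter values make replies snappier but risk cutting off a user who pauses mid-thought, which is the failure the VAD-based path is meant to avoid.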
Each agent has a distinct human-face avatar that reacts in real time:
- Idle — gentle breathing animation
- Listening — ripple rings expand outward
- Thinking — face dims, dots bounce below
- Speaking — mouth opens and closes, waveform pulses
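The four avatar states can be modeled as a small state machine driven by call events. This sketch assumes a hypothetical set of signals (`callStarted`, `userStoppedTalking`, and so on) that an app would map from whatever events the voice SDK actually emits; these names are not Vapi's event names.

```typescript
type AvatarState = "idle" | "listening" | "thinking" | "speaking";

// Hypothetical call signals; real SDK events would be translated into these.
type CallSignal =
  | "callStarted"
  | "userStoppedTalking"
  | "agentStartedTalking"
  | "agentStoppedTalking"
  | "callEnded";

function nextAvatarState(state: AvatarState, signal: CallSignal): AvatarState {
  switch (signal) {
    case "callStarted":
      return "listening";
    case "userStoppedTalking":
      // Only start "thinking" if the agent was waiting on the user.
      return state === "listening" ? "thinking" : state;
    case "agentStartedTalking":
      // Ignore speech events that arrive outside a call.
      return state === "idle" ? state : "speaking";
    case "agentStoppedTalking":
      // Hand the turn back to the user.
      return state === "speaking" ? "listening" : state;
    case "callEnded":
      return "idle";
  }
}
```

Ignoring signals that don't fit the current state (for example, `userStoppedTalking` while the agent is speaking) keeps a stray event from flashing the wrong animation.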
On the agent selection page, paste up to 3 URLs before starting a session. Arlo fetches the text and injects it into the agent's context, so the agent can answer questions about your specific lecture notes or class materials.
Supported sources:
- Google Docs (set to "Anyone with link can view")
- Google Slides
- Any public HTML page (course websites, lecture notes, wikis)
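The document flow can be sketched in two steps: normalize the pasted links (keeping at most 3), then append the fetched text to the agent's system prompt. This is a minimal sketch, not Arlo's implementation. It assumes a link-shared Google Doc can be read as plain text through its `/export?format=txt` URL; Slides and other pages are passed through unchanged, and the server-side HTML extraction the project does with Cheerio is not attempted here. The function names, the 8000-character per-document cap, and the prompt wording are hypothetical.

```typescript
const MAX_DOCS = 3;

interface FetchedDoc {
  url: string;
  text: string;
}

// Returns the URL to fetch for one pasted link, or null if it is not a valid http(s) URL.
function toFetchUrl(raw: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(raw.trim());
  } catch {
    return null;
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;

  // Assumption: a link-shared Google Doc is readable as plain text via /export?format=txt.
  if (parsed.hostname === "docs.google.com") {
    const m = parsed.pathname.match(/^\/document\/d\/([\w-]+)/);
    if (m) return `https://docs.google.com/document/d/${m[1]}/export?format=txt`;
  }
  // Everything else (Slides, course pages, wikis) is fetched as-is.
  return parsed.toString();
}

// Normalizes pasted links, drops invalid ones and duplicates, and keeps at most MAX_DOCS.
function prepareDocUrls(inputs: string[]): string[] {
  const out: string[] = [];
  for (const raw of inputs) {
    if (out.length === MAX_DOCS) break;
    const u = toFetchUrl(raw);
    if (u !== null && out.indexOf(u) === -1) out.push(u);
  }
  return out;
}

// Appends fetched text to the agent's system prompt, truncating each document to maxChars.
function withDocContext(systemPrompt: string, docs: FetchedDoc[], maxChars = 8000): string {
  if (docs.length === 0) return systemPrompt;
  const sections = docs.map(
    (d, i) => `--- Document ${i + 1} (${d.url}) ---\n${d.text.slice(0, maxChars)}`,
  );
  return `${systemPrompt}\n\nCourse material provided by the student:\n\n${sections.join("\n\n")}`;
}
```

Capping each document keeps one long page from crowding the others out of the model's context; labeling each section with its URL lets the agent say which source an answer came from.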
The session room has a right panel with two tabs:
- Notes: automatically captures a key point from each agent response, plus a free-form typing area
- Transcript: full back-and-forth conversation log
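One simple way to feed both panels from the same stream of turns is sketched below. Every turn goes into the transcript, and each agent reply contributes its first sentence (capped at 140 characters) as a note. The first-sentence heuristic, the `SessionLog` class, and the length cap are assumptions; the README does not say how Arlo picks its key points.

```typescript
type Speaker = "user" | "agent";

interface TranscriptEntry {
  speaker: Speaker;
  text: string;
  at: number; // epoch milliseconds
}

// Assumed heuristic: first sentence of a reply, whitespace-collapsed and length-capped.
function keyPoint(reply: string, maxLen = 140): string {
  const text = reply.replace(/\s+/g, " ").trim();
  const match = text.match(/^.*?[.!?](?=\s|$)/);
  const first = match ? match[0] : text;
  if (first.length <= maxLen) return first;
  return first.slice(0, maxLen - 1).replace(/\s+$/, "") + "…";
}

class SessionLog {
  readonly transcript: TranscriptEntry[] = [];
  readonly notes: string[] = [];

  // Every turn goes into the transcript; agent turns also yield one note.
  add(speaker: Speaker, text: string, at: number = Date.now()): void {
    this.transcript.push({ speaker, text, at });
    if (speaker === "agent") {
      const note = keyPoint(text);
      if (note) this.notes.push(note);
    }
  }
}
```

The sentence split is deliberately naive (an abbreviation such as "e.g." ends the "sentence" early); it is cheap enough to run on every reply during a live call.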
| Layer | Technology |
|---|---|
| Frontend | Next.js 16 App Router, TypeScript, Tailwind CSS |
| Voice pipeline | Vapi (@vapi-ai/web) |
| AI | Anthropic Claude Haiku (claude-haiku-4-5-20251001) |
| TTS | ElevenLabs (via Vapi) |
| STT | Deepgram Nova-2 (via Vapi) |
| Document parsing | Cheerio (server-side HTML extraction) |

Project structure:

```
app/
  page.tsx                # Landing page
  select/page.tsx         # Agent + document selection
  session/page.tsx        # Live call room
  api/chat/route.ts       # Claude API endpoint (fallback)
  api/fetch-doc/route.ts  # Document URL fetcher
components/
  Avatar.tsx              # Animated human face avatar (4 styles, 4 states)
  AgentCard.tsx           # Selectable agent card
  DocumentInput.tsx       # URL input for study materials
  Notepad.tsx             # Auto-capture + freeform notes panel
  Transcript.tsx          # Conversation log
  StatusPill.tsx          # Idle / Listening / Thinking / Speaking indicator
  ControlBar.tsx          # Mic / End call controls
lib/
  personas.ts             # Agent definitions, system prompts, voice IDs
  claude.ts               # /api/chat client helper
  speech.ts               # Browser Web Speech API utilities
  turnTaking.ts           # Pause detection logic (fallback)
types/
  speech.d.ts             # Web Speech API type declarations
```
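The /api/chat fallback route sends the conversation to Claude. The Anthropic Messages API takes a top-level `system` prompt and a `messages` array of `user` and `assistant` turns, so a call transcript needs converting first. This sketch keeps roles alternating, merges back-to-back turns from the same speaker, and starts with a user turn (dropping an agent greeting). `toClaudeMessages`, `buildChatRequest`, and the `max_tokens` value are hypothetical; the body follows the public Messages API, not anything confirmed about this project's route.

```typescript
interface Turn {
  speaker: "user" | "agent";
  text: string;
}

interface ClaudeMessage {
  role: "user" | "assistant";
  content: string;
}

// Converts a transcript into alternating user/assistant messages that start with a user turn.
function toClaudeMessages(turns: Turn[]): ClaudeMessage[] {
  const messages: ClaudeMessage[] = [];
  for (const t of turns) {
    const role = t.speaker === "user" ? "user" : "assistant";
    const text = t.text.trim();
    if (!text) continue; // skip empty recognizer results
    if (messages.length === 0 && role !== "user") continue; // drop a leading agent greeting
    const last = messages[messages.length - 1];
    if (last && last.role === role) {
      last.content += "\n" + text; // merge consecutive same-speaker turns
    } else {
      messages.push({ role, content: text });
    }
  }
  return messages;
}

// Request body in the shape of the Anthropic Messages API.
function buildChatRequest(systemPrompt: string, turns: Turn[]) {
  return {
    model: "claude-haiku-4-5-20251001",
    max_tokens: 300, // assumed: short replies suit spoken conversation
    system: systemPrompt,
    messages: toClaudeMessages(turns),
  };
}
```

The route would POST this body to https://api.anthropic.com/v1/messages with `x-api-key` and `anthropic-version` headers, or hand it to the official SDK. Dropping the opening greeting loses a little context; folding it into the system prompt is an alternative.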

Environment variables:

| Variable | Required | Description |
|---|---|---|
| NEXT_PUBLIC_VAPI_PUBLIC_KEY | Yes | Vapi public key (enables voice calls) |
| ANTHROPIC_API_KEY | Optional | Anthropic key for the /api/chat fallback route |
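A startup check for the variables in the table above can fail fast when the required key is missing. This is a minimal sketch; `readArloEnv` and `ArloEnv` are hypothetical names, not part of the project.

```typescript
interface ArloEnv {
  vapiPublicKey: string;
  anthropicApiKey?: string; // only needed by the /api/chat fallback route
}

// Validates the environment; throws if the required Vapi key is missing or blank.
function readArloEnv(env: Record<string, string | undefined>): ArloEnv {
  const vapiPublicKey = env.NEXT_PUBLIC_VAPI_PUBLIC_KEY?.trim();
  if (!vapiPublicKey) {
    throw new Error("NEXT_PUBLIC_VAPI_PUBLIC_KEY is not set; voice calls cannot start.");
  }
  // Treat an empty optional key the same as an absent one.
  const anthropicApiKey = env.ANTHROPIC_API_KEY?.trim() || undefined;
  return { vapiPublicKey, anthropicApiKey };
}
```

In Next.js, NEXT_PUBLIC_ values are inlined into the browser bundle only where code references them literally (for example `process.env.NEXT_PUBLIC_VAPI_PUBLIC_KEY`), so on the client, build the argument from those literal references rather than passing `process.env` itself. ANTHROPIC_API_KEY has no NEXT_PUBLIC_ prefix, so it is only available server-side, which is where the /api/chat route runs.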