Arlo

Talk naturally with specialized AI agents in a calm, minimal video-call style room.

What it does

Arlo lets you have a real spoken conversation with an AI agent — no typing, no prompts. Pick an agent, click Start, and just talk. The agent listens, thinks, and responds in a natural human voice. It knows when you're done speaking and won't cut you off.

Agents:

CMSC420 TA — Advanced Data Structures (trees, heaps, graphs, spatial indexing, complexity)
CMSC417 TA — Computer Networks (TCP, DNS, routing, congestion control, protocols)
Telehealth Intake — Calm, structured pre-visit intake assistant
Interview Coach — Behavioral and technical interview practice

Features:

Voice conversation with natural turn-taking — no button to press when done
Human-sounding ElevenLabs voices via Vapi
Visual avatar that animates while listening, thinking, and speaking
Auto-captured session notes + full transcript
Paste document URLs (Google Docs, lecture notes, course pages) as context so the agent knows your specific course material

Setup

1. Install dependencies

npm install

2. Get your API keys

Vapi (handles voice pipeline — STT, LLM, TTS, turn-taking)

Sign up at vapi.ai — free $10 credit on signup
Go to Account → copy your Public Key

Anthropic (optional: Claude powers the AI responses, but your own key is only needed for the `/api/chat` fallback route)

Get a key at console.anthropic.com

3. Configure environment

cp .env.example .env.local

Fill in .env.local:

ANTHROPIC_API_KEY=sk-ant-...
NEXT_PUBLIC_VAPI_PUBLIC_KEY=pk_...

4. Run locally

npm run dev

Open the URL shown in your terminal (usually http://localhost:3000).

Note: Use Chrome or Edge — they have the best Web Speech API support.

How it works

Voice pipeline

Vapi chains together three services into a seamless real-time call:

Deepgram Nova-2 — streaming speech-to-text with smart VAD (knows when you're done)
Claude Haiku — fast, concise AI responses tuned for spoken conversation
ElevenLabs — natural-sounding TTS voices, one per agent
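As a rough illustration of how the three services might be wired together, the sketch below builds a single assistant configuration object. The `PersonaSketch` shape and `buildAssistantConfig` helper are hypothetical (the real definitions live in `lib/personas.ts`), and the field names and provider strings follow Vapi's assistant configuration as commonly documented, so check them against the current Vapi reference before relying on them.

```typescript
// Hypothetical persona shape for illustration; not the actual type in lib/personas.ts.
interface PersonaSketch {
  label: string;
  systemPrompt: string;
  voiceId: string; // ElevenLabs voice ID, one per agent
}

// Builds one config object covering STT, LLM, and TTS.
// Provider strings ("deepgram", "anthropic", "11labs") are assumptions
// based on Vapi's documented assistant schema.
function buildAssistantConfig(persona: PersonaSketch) {
  return {
    name: persona.label,
    // Speech-to-text: Deepgram Nova-2
    transcriber: { provider: "deepgram", model: "nova-2" },
    // LLM: Claude Haiku, with the persona's system prompt
    model: {
      provider: "anthropic",
      model: "claude-haiku-4-5-20251001",
      messages: [{ role: "system", content: persona.systemPrompt }],
    },
    // Text-to-speech: ElevenLabs voice chosen per agent
    voice: { provider: "11labs", voiceId: persona.voiceId },
  };
}
```

With `@vapi-ai/web`, an object like this would typically be passed to `start(...)` on a client created from the public key. That wiring is omitted here so the sketch stays dependency-free.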

Avatar states

Each agent has a distinct human face avatar that reacts in real time:

Idle — gentle breathing animation
Listening — ripple rings expand outward
Thinking — face dims, dots bounce below
Speaking — mouth opens and closes, waveform pulses
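The four states above can be modeled as a small state machine. The sketch below is one way to do that. The event names are hypothetical placeholders, not the events the Vapi SDK or `Avatar.tsx` actually use, so map them onto whatever the call client emits.

```typescript
type AvatarState = "idle" | "listening" | "thinking" | "speaking";

// Hypothetical event names, chosen for readability.
type CallEvent =
  | "call-started"
  | "user-finished-speaking"
  | "agent-started-speaking"
  | "agent-finished-speaking"
  | "call-ended";

// Pure transition function: given the current state and an event,
// return the state the avatar should animate next.
function nextAvatarState(current: AvatarState, event: CallEvent): AvatarState {
  switch (event) {
    case "call-started":
      return "listening";
    case "user-finished-speaking":
      // Only move to "thinking" if we were actually listening.
      return current === "listening" ? "thinking" : current;
    case "agent-started-speaking":
      return "speaking";
    case "agent-finished-speaking":
      // Hand the turn back to the user.
      return current === "speaking" ? "listening" : current;
    case "call-ended":
      return "idle";
  }
}
```

Keeping the transitions in a pure function means the animation component only has to render whichever state it is given.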

Document context

On the agent selection page, paste up to 3 URLs before starting a session. Arlo fetches the text and injects it into the agent's context — so it can answer questions about your specific lecture notes or class materials.

Supported sources:

Google Docs (set to "Anyone with link can view")
Google Slides
Any public HTML page (course websites, lecture notes, wikis)
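To make the fetch-and-inject step concrete, here is a minimal sketch under stated assumptions. `toFetchableUrl`, `injectDocs`, the prompt wording, and the 4000-character cap are all hypothetical; the actual logic in `api/fetch-doc/route.ts` may differ. The Google Docs rewrite relies on the `/export?format=txt` form of a document link, which returns plain text for documents shared by link.

```typescript
const MAX_DOCS = 3; // the selection page accepts up to 3 URLs

// Rewrites a Google Docs share link to its plain-text export form.
// Any other URL is returned trimmed but otherwise unchanged.
function toFetchableUrl(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^https:\/\/docs\.google\.com\/document\/d\/([\w-]+)/);
  return match
    ? `https://docs.google.com/document/d/${match[1]}/export?format=txt`
    : trimmed;
}

interface FetchedDoc {
  url: string;
  text: string; // text already extracted from the page
}

// Appends the fetched text to the agent's system prompt.
// The per-document cap is a hypothetical value to keep the prompt small.
function injectDocs(
  systemPrompt: string,
  docs: FetchedDoc[],
  maxCharsPerDoc: number = 4000
): string {
  const sections = docs
    .slice(0, MAX_DOCS)
    .map((doc, i) => `[Document ${i + 1}: ${doc.url}]\n${doc.text.slice(0, maxCharsPerDoc)}`);
  if (sections.length === 0) return systemPrompt;
  return `${systemPrompt}\n\nReference material supplied by the user:\n\n${sections.join("\n\n")}`;
}
```

Capping each document is a design choice rather than a requirement: it trades completeness for lower latency, which matters in a spoken conversation.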

Notes & transcript

The session room has a right panel with two tabs:

Notes — auto-captures a key point from each agent response, plus free-form typing area
Transcript — full back-and-forth conversation log
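The README does not say how a key point is chosen from each response. One simple heuristic, shown here purely as an assumption, is to take the first sentence and cap its length. `extractKeyPoint` and the 140-character limit are hypothetical and may not match what `Notepad.tsx` does.

```typescript
// Returns the first sentence of an agent response, capped at maxLength characters.
// A sentence ends at ".", "!", or "?" followed by whitespace or end of text,
// so decimals such as "3.5" do not split the sentence.
function extractKeyPoint(response: string, maxLength: number = 140): string {
  const normalized = response.replace(/\s+/g, " ").trim();
  if (normalized === "") return "";
  const match = normalized.match(/^.*?[.!?](?=\s|$)/);
  const sentence = match ? match[0] : normalized;
  if (sentence.length <= maxLength) return sentence;
  // Too long: cut and mark the truncation with an ellipsis.
  return sentence.slice(0, maxLength - 1).trimEnd() + "…";
}
```

A heuristic like this is cheap and runs entirely in the browser. A summarization call to the model would give better notes at the cost of extra latency.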

Tech stack

Layer	Technology
Frontend	Next.js 16 App Router, TypeScript, Tailwind CSS
Voice pipeline	Vapi (`@vapi-ai/web`)
AI	Anthropic Claude Haiku (`claude-haiku-4-5-20251001`)
TTS	ElevenLabs (via Vapi)
STT	Deepgram Nova-2 (via Vapi)
Document parsing	Cheerio (server-side HTML extraction)

File structure

app/
  page.tsx                 # Landing page
  select/page.tsx          # Agent + document selection
  session/page.tsx         # Live call room
  api/chat/route.ts        # Claude API endpoint (fallback)
  api/fetch-doc/route.ts   # Document URL fetcher

components/
  Avatar.tsx               # Animated human face avatar (4 styles, 4 states)
  AgentCard.tsx            # Selectable agent card
  DocumentInput.tsx        # URL input for study materials
  Notepad.tsx              # Auto-capture + freeform notes panel
  Transcript.tsx           # Conversation log
  StatusPill.tsx           # Idle / Listening / Thinking / Speaking indicator
  ControlBar.tsx           # Mic / End call controls

lib/
  personas.ts              # Agent definitions, system prompts, voice IDs
  claude.ts                # /api/chat client helper
  speech.ts                # Browser Web Speech API utilities
  turnTaking.ts            # Pause detection logic (fallback)

types/
  speech.d.ts              # Web Speech API type declarations

Environment variables

Variable	Required	Description
`NEXT_PUBLIC_VAPI_PUBLIC_KEY`	Yes	Vapi public key — enables voice calls
`ANTHROPIC_API_KEY`	Optional	Anthropic key for the `/api/chat` fallback route
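
A startup check along these lines can surface a missing key early. This is a sketch only: `checkEnv` is a hypothetical helper, not something the repository ships, and it takes the environment as a plain record so it can run anywhere.

```typescript
// Hypothetical helper; not part of the Arlo codebase.
type EnvRecord = Record<string, string | undefined>;

interface EnvCheck {
  missingRequired: string[];
  missingOptional: string[];
}

// Variable names and required/optional status come from the table above.
const REQUIRED_VARS = ["NEXT_PUBLIC_VAPI_PUBLIC_KEY"];
const OPTIONAL_VARS = ["ANTHROPIC_API_KEY"];

// Reports which variables are unset or blank.
function checkEnv(env: EnvRecord): EnvCheck {
  const isUnset = (key: string): boolean => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  };
  return {
    missingRequired: REQUIRED_VARS.filter(isUnset),
    missingOptional: OPTIONAL_VARS.filter(isUnset),
  };
}
```

On the server this could be called as `checkEnv(process.env)`, failing fast when `missingRequired` is non-empty and only logging a warning for `missingOptional`.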
