A voice-first curiosity companion for kids. Hoot is a friendly owl who answers "why is the sky blue?", explores subjects on demand, and tells real stories — interactive or just-narrate-to-me — with a kid's choices actually shaping where the plot goes.
- Ask Hoot — tap the mic, speak any question. Hoot answers conversationally with full follow-up context. Voice barge-in works: speak over Hoot and he stops mid-sentence.
- Explore — 10 subjects (Space, Animals, The Human Body, Dinosaurs, …). Each subject shows curated wonder cards. Tap one to start an Ask Hoot session pre-loaded with that question.
- Story — two flavours:
  - Interactive — Hoot narrates a complete story scene by scene, asking real questions: branching choices that change the next scene, and reflective ones ("do you think it was fair?"). Stories are pre-planned by GPT-4o before narration starts, so they have a real arc and a real ending, not a wandering "and then…".
  - Listen — Hoot tells the whole story straight through, ~1–2 min, with a satisfying close.
- Pause / Resume — stop the story or conversation any time, tap continue, and Hoot picks up the same story (or same Q&A thread) — same characters, same plot, same context.
- Multilingual — pick English, Spanish, or Hindi in Settings. Wonder cards, story plans, narration, and Q&A all switch language. One TTS voice speaks all of them.
- Themes — 5 colour palettes the kid can switch in Settings; the choice persists via `localStorage`.
- UI: Next.js 15 (App Router) · React 19 · TypeScript · Tailwind. All custom inline-SVG doodle icons — no icon library, no raster assets.
- Voice: ElevenLabs Speech Engine (bring-your-own-LLM) via `@elevenlabs/elevenlabs-js` + `@elevenlabs/react`. STT, TTS, turn-taking, and voice-activity-driven interruption are all handled by ElevenLabs; the LLM call lives on our server.
- LLM: OpenAI Responses API, streamed. `gpt-4o-mini` for live voice turns (low latency). `gpt-4o` for story-plan generation (quality matters more there).
- State: `localStorage` only (no DB, no auth): the profile, the theme, cached wonder lists.
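A minimal sketch of what `localStorage`-only state can look like. The storage key, the `KeyValueStore` interface, and the function names are hypothetical and are not taken from `lib/profile.ts`; only the profile fields (name, age, theme, language) come from this README. The small interface matches the subset of the browser `Storage` API the sketch uses, so `window.localStorage` can be passed in directly.

```typescript
// Subset of the browser Storage API that the sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Profile {
  name: string;
  age: number;
  theme: string; // palette name; the actual palette names are not listed here
  language: "en" | "es" | "hi";
}

// Hypothetical key; the real app may use a different one.
const PROFILE_KEY = "hoot.profile";

function saveProfile(store: KeyValueStore, profile: Profile): void {
  store.setItem(PROFILE_KEY, JSON.stringify(profile));
}

function loadProfile(store: KeyValueStore): Profile | null {
  const raw = store.getItem(PROFILE_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as Profile;
  } catch {
    // Corrupt or hand-edited value: treat it as "no profile yet".
    return null;
  }
}

// In-memory stand-in so the sketch also runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.has(key) ? (this.data.get(key) as string) : null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In the browser this would be called as `loadProfile(window.localStorage)`; on the server nothing is read, since there is no DB.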
Full step-by-step setup, including ngrok and engine creation, is in `SETUP.md`. The short version:
pnpm install
cp .env.example .env # fill in ELEVENLABS_API_KEY + OPENAI_API_KEY
ngrok http 3001 # terminal 1, leave running
# put the wss://…/ws URL into PUBLIC_WS_URL in .env
pnpm run speech-engine:create # one-time — gets you a seng_… ID for .env
pnpm run speech-engine:server # terminal 2
pnpm run dev # terminal 3
Open http://localhost:3000.
browser your machine
─────── ─────────────────────────────
┌──────────┐ ┌─────────────────────────────┐
│ Next.js │ WebRTC audio ┌──▶│ ElevenLabs Speech Engine │
│ (React) │ ◀──────────────┘ │ (STT + TTS + VAD + turns) │
└────┬─────┘ └────────────┬────────────────┘
│ │ /ws (ngrok in dev)
│ POST /api/token ▼
│ POST /api/voice-context ┌──────────────┐
│ POST /api/wonders │ server.mts │
│ POST /api/story-plan │ (Node WS) │
▼ └──────┬───────┘
┌──────────────────┐ │ onTranscript(messages, signal)
│ Next.js API │ ◀── shared file ──┐ ▼
│ routes │ (.cache/ │ ┌──────────────────┐
│ (api/*) │ voice- │ │ OpenAI │
└────────┬─────────┘ context.json)│ │ (streamed, │
│ │ │ AbortSignal │
▼ │ │ for barge-in) │
┌──────────────────┐ │ └──────────────────┘
│ OpenAI │ ◀──────────────────┘
│ (wonders + │
│ story plans) │
└──────────────────┘
Voice loop, end to end:
- Client requests a WebRTC token from `/api/token`.
- Client opens an ElevenLabs voice session and POSTs the kid's profile plus any starting context (preset wonder, story plan) to `/api/voice-context`, keyed by the new `conversationId`.
- ElevenLabs streams the kid's speech to `server.mts` on `/ws`.
- `server.mts` `onTranscript` reads the per-session context, builds Hoot's system prompt (Q&A or Story flavour), and streams an OpenAI response back through `session.sendResponse(stream)`.
- The `AbortSignal` ElevenLabs provides is forwarded into the OpenAI call, so a real voice barge-in cancels the in-flight LLM stream and Hoot stops speaking immediately.
Discovery (Explore, Home wonders) and Story all reuse this exact loop — there is no second voice system.
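The barge-in step of the voice loop can be illustrated without either SDK. The sketch below is an assumption-heavy stand-in: `fakeLlmStream` replaces the real OpenAI stream and `relayUntilAborted` is a hypothetical helper, not code from `server.mts`. It only shows the cancellation pattern: chunks are relayed until the `AbortSignal` fires, and the upstream stream is then closed.

```typescript
// Stand-in for a streamed LLM response (hypothetical; the real stream comes from OpenAI).
async function* fakeLlmStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    // Yield to the event loop between chunks, like a network stream would.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
    yield chunk;
  }
}

// Relay chunks until the signal is aborted (for example, by a voice barge-in).
async function* relayUntilAborted(
  source: AsyncIterable<string>,
  signal: AbortSignal,
): AsyncGenerator<string> {
  for await (const chunk of source) {
    // Returning from inside for-await also closes the upstream iterator.
    if (signal.aborted) return;
    yield chunk;
  }
}

// Usage: the kid interrupts after two chunks, so the rest is never spoken.
async function demoBargeIn(): Promise<string[]> {
  const controller = new AbortController();
  const spoken: string[] = [];
  const llm = fakeLlmStream(["The sky ", "is blue ", "because ", "of scattering."]);
  for await (const chunk of relayUntilAborted(llm, controller.signal)) {
    spoken.push(chunk);
    if (spoken.length === 2) controller.abort(); // simulated barge-in
  }
  return spoken;
}
```

In the real server the same signal would also be passed to the HTTP request itself, so the network call is cancelled rather than just ignored.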
app/
api/
story-plan/ POST — generates a structured story outline
token/ GET — issues a WebRTC conversation token
voice-context/ POST — per-session profile + plan + history
wonders/ POST — generates curated wonder cards
ask/ /ask (mic screen)
explore/ /explore + /explore/[subject]
story/ /story (chooser + in-story view)
settings/ /settings (name, age, language, theme)
layout.tsx, page.tsx, globals.css
components/
icons/ Inline SVG: Hoot mascot, subject icons, nav glyphs
primitives.tsx Card · Pill · SmallSquare · WaveBg · BgWaves
screens/ One file per screen (Onboarding, Home, Ask, Story, …)
Nav.tsx TopNav (md+) + BottomNav (mobile)
AppShell.tsx Page chrome (nav + bg waves)
ThemeBoot.tsx Applies saved palette to <html data-theme>
useProfile.ts Profile hooks + gated route helper
lib/
hoot-prompt.ts System prompt templates (Q&A + Story) + plan formatter
assistant.ts OpenAI streaming wrapper for live voice turns
voice-context-store.ts File-based store shared by Next.js + server.mts
wonders-client.ts Client fetch + localStorage cache for wonder cards
profile.ts localStorage profile (name, age, theme, language)
palettes.ts 5 theme palettes
data.ts SUBJECTS, seeded WONDERS, HOME_WONDERS
server.mts Speech Engine WebSocket server (binds /ws on :3001)
scripts/
create-engine.mts Upsert the Speech Engine + push tuned config
| Layer | Multilingual? |
|---|---|
| LLM responses (Q&A, story narration) | yes — language is templated into every system prompt |
| Wonder cards (Home + per-subject) | yes — cache key includes language, refetched on switch |
| Story plan + interactive questions | yes — LLM is told to write all prose in target language |
| Opening greeting | yes — localized for EN / ES / HI |
| Voice (TTS) | yes if you set HOOT_TTS_MODEL_ID=eleven_flash_v2_5 (~32 languages, low latency) |
| Voice (STT) | yes — Scribe auto-detects |
| Static UI strings ("Home", "Settings", etc.) | no — hardcoded English. Would need full i18n. |
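Two rows of the table above (language templated into the system prompt, and a cache key that includes the language) can be sketched as follows. The key format, the prompt wording, and both function names are assumptions for illustration; the actual templates live in `lib/hoot-prompt.ts` and `lib/wonders-client.ts` and may differ.

```typescript
type Language = "en" | "es" | "hi";

const LANGUAGE_NAMES: Record<Language, string> = {
  en: "English",
  es: "Spanish",
  hi: "Hindi",
};

// Hypothetical key format: including the language means a language switch
// misses the cache and triggers a refetch instead of showing stale cards.
function wonderCacheKey(subject: string, language: Language): string {
  return `wonders:${language}:${subject}`;
}

// Hypothetical prompt wording: appends the target language to any base prompt.
function withLanguage(basePrompt: string, language: Language): string {
  return `${basePrompt}\nAlways respond in ${LANGUAGE_NAMES[language]}.`;
}
```

With this shape, `wonderCacheKey("space", "en")` and `wonderCacheKey("space", "hi")` are distinct keys, which is what makes the "refetched on switch" behaviour fall out of the cache lookup.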
See .env.example for the full list. Required:
- `ELEVENLABS_API_KEY`: server-only, your ElevenLabs key
- `ELEVENLABS_SPEECH_ENGINE_ID`: `seng_…` from `pnpm run speech-engine:create`
- `OPENAI_API_KEY`: server-only, your OpenAI key
- `PUBLIC_WS_URL`: the public `wss://…/ws` URL the Speech Engine connects to (ngrok in dev, a real domain in prod)
Optional:
- `OPENAI_MODEL`: live-voice LLM (default `gpt-4o-mini`)
- `OPENAI_STORY_PLAN_MODEL`: plan generator (default `gpt-4o`)
- `HOOT_VOICE_ID`: any ElevenLabs voice ID (default: engine default)
- `HOOT_TTS_MODEL_ID`: TTS model (default: engine default; set to `eleven_flash_v2_5` for low-latency multilingual narration)
After changing voice/model env vars, re-run pnpm run speech-engine:create —
it's upsert-aware and updates the existing engine in place.
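A fail-fast check of these variables at server start-up might look like the sketch below. `readConfig` and the `HootConfig` shape are hypothetical names, not part of the repo; the variable names and defaults are the ones listed above. The function takes the environment as a plain record, so `process.env` can be passed in on the server.

```typescript
interface HootConfig {
  elevenLabsApiKey: string;
  speechEngineId: string;
  openAiApiKey: string;
  publicWsUrl: string;
  openAiModel: string;
  storyPlanModel: string;
  voiceId?: string;    // undefined means "use the engine default"
  ttsModelId?: string; // undefined means "use the engine default"
}

const REQUIRED_VARS = [
  "ELEVENLABS_API_KEY",
  "ELEVENLABS_SPEECH_ENGINE_ID",
  "OPENAI_API_KEY",
  "PUBLIC_WS_URL",
] as const;

function readConfig(env: Record<string, string | undefined>): HootConfig {
  // Collect every missing name so the error reports them all at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return {
    elevenLabsApiKey: env.ELEVENLABS_API_KEY as string,
    speechEngineId: env.ELEVENLABS_SPEECH_ENGINE_ID as string,
    openAiApiKey: env.OPENAI_API_KEY as string,
    publicWsUrl: env.PUBLIC_WS_URL as string,
    // Empty strings fall back to the documented defaults as well.
    openAiModel: env.OPENAI_MODEL || "gpt-4o-mini",
    storyPlanModel: env.OPENAI_STORY_PLAN_MODEL || "gpt-4o",
    voiceId: env.HOOT_VOICE_ID || undefined,
    ttsModelId: env.HOOT_TTS_MODEL_ID || undefined,
  };
}
```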
Short answer: yes you can deploy on Vercel, but only the Next.js app.
server.mts needs to live somewhere that supports long-running
WebSockets — Vercel functions don't. See Deployment
below for the full options.
The build was scoped as a hackathon-quality demo. The following are intentionally left to a later pass:
- Real auth + parent-managed accounts (currently `localStorage` only).
- Parent dashboard / transcript review.
- Server-side content moderation beyond OpenAI's defaults.
- Full UI internationalization (static button labels, screen titles).
- Persistent conversation history across browser sessions.
- Multi-device sync of the kid's profile.
There are two moving pieces in production and they can either share one host or live separately.
Piece 1, the Next.js app, is the static + serverless-route part. It deploys cleanly on Vercel.
# Push to GitHub, then on Vercel:
# New Project → import repo → set env vars:
# ELEVENLABS_API_KEY
# ELEVENLABS_SPEECH_ENGINE_ID
# OPENAI_API_KEY
# PUBLIC_WS_URL ← the public wss URL of your Piece 2 host
# HOOT_VOICE_ID (optional)
# HOOT_TTS_MODEL_ID (optional)
# Deploy.
Piece 2, `server.mts`, is a long-running Node process that ElevenLabs opens a persistent WebSocket to. It cannot run on Vercel: Vercel functions are short-lived (10 s on Hobby / 60 s on Pro / max ~5 min on serverless), and ElevenLabs needs the socket open for the whole conversation.
Host it somewhere that supports long-running processes with a stable public hostname:
- Railway, Fly.io, Render, Koyeb — all good fits.
- A small EC2 / DigitalOcean droplet / any always-on box also works.
Setup on that host:
- Deploy the repo and set the same env vars as on Vercel (you only need `ELEVENLABS_API_KEY`, `OPENAI_API_KEY`, `ELEVENLABS_SPEECH_ENGINE_ID`, `OPENAI_MODEL`).
- Run `pnpm run speech-engine:server` (e.g. as the start command).
- Get the host's public URL (`https://hoot-engine.fly.dev` or similar).
- Set `PUBLIC_WS_URL=wss://hoot-engine.fly.dev/ws` in both Vercel's and this host's env.
- Run `pnpm run speech-engine:create` once (locally, with the prod `PUBLIC_WS_URL` in your `.env`, and `ELEVENLABS_SPEECH_ENGINE_ID` either blank to create a new engine or set to update the existing one) to point your engine at the prod URL.
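Deriving `PUBLIC_WS_URL` from the host's public URL is a mechanical step: swap the scheme and append `/ws`. The helper below is a hypothetical convenience, not a script in the repo, and it assumes the server keeps listening on the `/ws` path.

```typescript
// Convert a host's public HTTP(S) URL into the WebSocket URL the engine should call.
function toPublicWsUrl(hostUrl: string): string {
  const url = new URL(hostUrl); // throws on malformed input
  // Plain http maps to ws (local testing); anything else is treated as TLS.
  const scheme = url.protocol === "http:" ? "ws" : "wss";
  return `${scheme}://${url.host}/ws`;
}

// toPublicWsUrl("https://hoot-engine.fly.dev") gives "wss://hoot-engine.fly.dev/ws"
```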
`voice-context-store.ts` writes per-conversation context to a local
JSON file (`.cache/hoot-voice-context.json`). That's fine when Next.js
and `server.mts` run on the same machine. In split deploys (Vercel +
remote WS host) the two processes can't see each other's filesystem, so
you'd need to swap that store for Redis or Postgres. For a demo or
hackathon, the simplest production setup is to deploy both pieces on
one host (Railway, Fly, etc.) so the file-based store keeps working.
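One way to make that swap cheap is to put the store behind a small async interface, so a Redis- or Postgres-backed class can replace the file-backed one without touching callers. The interface, the `VoiceContext` shape, and the in-memory class below are all hypothetical; the actual exports of `lib/voice-context-store.ts` are not shown in this README.

```typescript
// Hypothetical shape of the per-conversation context.
interface VoiceContext {
  profile: { name: string; age: number; language: string };
  storyPlan?: string;
}

// Async on purpose: a network-backed store (Redis, Postgres) cannot answer synchronously.
interface VoiceContextStore {
  set(conversationId: string, context: VoiceContext): Promise<void>;
  get(conversationId: string): Promise<VoiceContext | undefined>;
}

// In-memory implementation, useful for tests and as a template for other backends.
class InMemoryVoiceContextStore implements VoiceContextStore {
  private contexts = new Map<string, VoiceContext>();

  async set(conversationId: string, context: VoiceContext): Promise<void> {
    this.contexts.set(conversationId, context);
  }

  async get(conversationId: string): Promise<VoiceContext | undefined> {
    return this.contexts.get(conversationId);
  }
}
```

A shared-backend implementation would keep the same two methods and key everything by `conversationId`, which is the key the voice loop already uses.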
| Scenario | Recommendation |
|---|---|
| Local hackathon demo | localhost + ngrok, as documented in SETUP.md |
| Single-host prod-ish demo | Railway / Fly / Render running both next start and tsx server.mts |
| Truly serverless prod | Vercel for next + a small WS host for server.mts + Redis for shared context |
- Both API keys (ElevenLabs + OpenAI) are server-only. They're never shipped to the browser. The WebRTC token endpoint mints short-lived tokens so the browser only ever holds those.
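- `.env` is `.gitignore`d. Don't commit it.
- The runtime voice-context store under `.cache/` is also `.gitignore`d.
- The kid's profile is stored only in `localStorage`; there is no account or database. It is sent to the server per voice session (via `/api/voice-context`) so Hoot's prompt can use it.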
Hackathon demo. Use it however you like.