A real-time voice tutor that answers questions about your lecture content using RAG (Retrieval-Augmented Generation), streaming STT, LLM, and TTS with full barge-in/interrupt support.
- Voice when ElevenLabs fails: If ElevenLabs returns an error (e.g. quota exceeded or voice not found), the server sends `tts.fallback` with the reply text and the browser uses built-in TTS (your OS voice) so you still get spoken replies.
- Echo fix: The mic was being transcribed while the tutor was speaking, so the speaker output was showing up as your next message. Now the server does not forward mic audio to STT while the tutor is in the "speaking" state, and the STT buffer is flushed when the tutor starts speaking, so the tutor's voice is never transcribed as you. Use the interrupt (stop) button to cut off the tutor and ask a new question.
- ElevenLabs 404 voice_not_found: If the configured `ELEVENLABS_VOICE_ID` is not found (404), the app retries once with the default voice so TTS can work without changing `.env`.
- Interrupt: Interrupt (stop) now also cancels browser TTS (SpeechSynthesis) so playback stops immediately.
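The 404 retry above can be sketched as a single catch-and-retry wrapper. This is an assumption-laden sketch: `synthesize`, the injected `ttsRequest`, and `DEFAULT_VOICE_ID` are illustrative names, not the project's actual `tts.ts` API:

```typescript
// Hypothetical sketch of the voice-not-found retry described above.
const DEFAULT_VOICE_ID = "default";

async function synthesize(
  text: string,
  voiceId: string,
  ttsRequest: (text: string, voiceId: string) => Promise<Buffer>
): Promise<Buffer> {
  try {
    return await ttsRequest(text, voiceId);
  } catch (err: any) {
    // On a 404 (voice_not_found), retry once with the default voice.
    if (err?.status === 404 && voiceId !== DEFAULT_VOICE_ID) {
      return ttsRequest(text, DEFAULT_VOICE_ID);
    }
    throw err; // anything else (quota, auth) propagates to the fallback path
  }
}
```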
- Stop old voice when starting a new question: When you type or speak a new question while the tutor is still talking, the old TTS is stopped so the new answer's voice plays from the start. The client stops playback when: (1) you send a new message (Send or Enter), and (2) the server signals a new response (`tutor.state === 'thinking'`), so the transition to the next answer is smooth with no overlap.
- Conversation memory: The tutor now receives the full conversation history for the current session (all previous questions and answers), not just the last few turns, so it can refer back to earlier questions and give consistent, context-aware answers.
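Conversation memory amounts to replaying every prior turn into the LLM's message list. A minimal sketch, where `Turn` and `buildMessages` are illustrative names rather than the project's actual code:

```typescript
// Sketch of assembling the full session history for the LLM call.
interface Turn { question: string; answer: string; }
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

function buildMessages(systemPrompt: string, history: Turn[], question: string): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: "system", content: systemPrompt }];
  // Include every previous turn, not just the last few,
  // so the tutor can refer back to earlier questions.
  for (const turn of history) {
    messages.push({ role: "user", content: turn.question });
    messages.push({ role: "assistant", content: turn.answer });
  }
  messages.push({ role: "user", content: question });
  return messages;
}
```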
Where the voice tutor lives: The tutor UI with all the above fixes is at `/courses/[courseId]/study-buddy` (e.g. open Dashboard → click a course → click StudyBuddy). The root `/` shows the dashboard (course cards), not the old single-page tutor.
- Lecture vs Assignment mode: In Session Setup you can choose Topic type: Lecture or Assignment. For Lecture, the tutor uses RAG over that lecture’s slides (same as before). For Assignment, it uses the selected assignment’s markdown: it gives overview, deliverables, and hints only — it will not give full solutions. If the student asks for the answer or solution, it politely declines and offers hints instead.
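The hints-only behavior for assignments can be sketched as a system-prompt switch. These prompts are illustrative, not the project's actual prompts:

```typescript
// Illustrative system-prompt selection for the two topic types.
type TopicType = "lecture" | "assignment";

function systemPromptFor(type: TopicType): string {
  if (type === "assignment") {
    return [
      "You are a tutor for a programming assignment.",
      "Give an overview, the deliverables, and hints only.",
      "Never provide full solutions; if asked for the answer, politely decline and offer a hint instead.",
    ].join(" ");
  }
  return "You are a tutor. Answer using the retrieved lecture slides and cite slide numbers.";
}
```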
- Node.js 18+ (recommend 20+)
- npm 9+
- API keys (see below)
```
npm install
cp .env.example .env
```

Edit `.env` and add your API keys:
| Key | Required? | Notes |
|---|---|---|
| `OPENAI_API_KEY` | Yes (if using OpenAI) | For embeddings + LLM |
| `DEEPGRAM_API_KEY` | Recommended | Streaming speech-to-text |
| `ELEVENLABS_API_KEY` | Optional | Voice output; text-only without it |
| `ELEVENLABS_VOICE_ID` | Optional | Defaults to a built-in voice |
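Putting the table together, a minimal `.env` might look like this (all values below are placeholders, not real keys):

```
OPENAI_API_KEY=sk-your-key-here
DEEPGRAM_API_KEY=your-deepgram-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# ELEVENLABS_VOICE_ID is optional; omit it to use the built-in default voice
```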
```
npm run build:shared
npm run dev
```

This starts:

- Backend at `http://localhost:3001` (WebSocket at `ws://localhost:3001/ws`)
- Frontend at `http://localhost:3000`
- Open `http://localhost:3000` (Dashboard).
- Click a course (e.g. CPSC1020), then click StudyBuddy to open the voice tutor (`/courses/CPSC1020/study-buddy`).
- Select a lecture from the dropdown and click Start Tutor.
- Click the mic button or type a question.
- The tutor answers using voice + text with slide citations.
- Interrupt anytime by clicking the red stop button or by sending a new message (old voice stops, new answer plays).
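The client-side interrupt flow above can be sketched as one function. `AudioQueue` and the message shape are assumptions for illustration, not the project's exact API:

```typescript
// Hypothetical sketch of the client interrupt; names are illustrative.
interface AudioQueue { stop(): void; }

function interrupt(ws: { send(data: string): void }, audio: AudioQueue): void {
  audio.stop(); // stop any queued ElevenLabs audio immediately
  const synth = (globalThis as any).speechSynthesis;
  if (synth) synth.cancel(); // also cancel the browser-TTS fallback
  ws.send(JSON.stringify({ type: "interrupt" })); // tell the server to abort LLM/TTS
}
```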
Place your lectures under the `lectures/` directory (or set `LECTURE_ROOT` in `.env`):
```
lectures/
  CPSC2120/
    Lecture01/
      slides.txt    # or slides.pdf
    Lecture02/
      slides.pdf
  STAT3090/
    Lecture01/
      slides.txt
```
Use `--- slide N ---` markers:

```
--- slide 1 ---
Title and content of slide 1...
--- slide 2 ---
Content of slide 2...
```
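A minimal parser for this marker format, as a sketch of how the server might split `slides.txt` (`parseSlides` is an illustrative name, not the actual `lectureStore` code):

```typescript
// Split slide text on "--- slide N ---" marker lines.
interface Slide { n: number; text: string; }

function parseSlides(raw: string): Slide[] {
  // The capture group keeps each slide number in the split output.
  const parts = raw.split(/^[ \t]*---[ \t]*slide[ \t]+(\d+)[ \t]*---[ \t]*$/im);
  const slides: Slide[] = [];
  // parts = [preamble, "1", text1, "2", text2, ...]
  for (let i = 1; i < parts.length; i += 2) {
    slides.push({ n: Number(parts[i]), text: (parts[i + 1] ?? "").trim() });
  }
  return slides;
}
```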
Each page is treated as one slide. Text is extracted automatically (best-effort).
```
studybuddy/
├── apps/
│   ├── server/                  # Node.js + Express + WebSocket backend
│   │   └── src/
│   │       ├── index.ts         # Entry point
│   │       ├── config.ts        # Environment config
│   │       ├── session.ts       # WebSocket session handler
│   │       └── services/
│   │           ├── lectureStore.ts  # Scans & parses lectures
│   │           ├── embeddings.ts    # OpenAI embeddings
│   │           ├── vectorIndex.ts   # In-memory vector search
│   │           ├── llm.ts           # Streaming LLM
│   │           ├── tts.ts           # ElevenLabs TTS
│   │           └── stt.ts           # Deepgram/Whisper STT
│   └── web/                     # Next.js frontend
│       └── src/
│           ├── app/page.tsx     # Main UI
│           └── lib/
│               ├── ws-client.ts     # WebSocket client
│               ├── mic-capture.ts   # Mic → PCM16
│               └── audio-player.ts  # Audio queue + playback
├── packages/
│   └── shared/                  # Shared TypeScript types
├── lectures/                    # Sample lecture content
├── .env.example
└── README.md
```
WebSocket messages use JSON. See `packages/shared/src/index.ts` for full type definitions.
Client → Server:
- `session.start` — begin tutoring session for a course/lecture
- `audio.chunk` — streaming mic audio (PCM16 base64)
- `user.text` — typed question
- `interrupt` — cancel current response
- `session.stop` — end session
Server → Client:
- `session.ready` — available courses/lectures
- `stt.partial` / `stt.final` — speech transcription
- `rag.citations` — retrieved slide references
- `llm.token` — streaming LLM tokens
- `tts.audio` — audio chunks (MP3 base64)
- `tutor.state` — listening / thinking / speaking
- `error` — error message
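The client-to-server side of this protocol can be modeled as a discriminated union. The field names below are a plausible sketch; the authoritative definitions live in `packages/shared/src/index.ts` and may differ:

```typescript
// Illustrative shapes for client → server messages.
type ClientMessage =
  | { type: "session.start"; courseId: string; lectureId: string }
  | { type: "audio.chunk"; audio: string } // PCM16 audio, base64-encoded
  | { type: "user.text"; text: string }
  | { type: "interrupt" }
  | { type: "session.stop" };

// Every message travels as one JSON text frame over the WebSocket.
function encode(msg: ClientMessage): string {
  return JSON.stringify(msg);
}
```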
- Ensure you're on `localhost` (or HTTPS) — browsers require secure contexts for mic access
- Check browser permissions: click the lock icon in the address bar
- Try Chrome or Edge (best WebAudio support)
- The app mutes STT while the tutor is speaking: mic audio is not sent to speech-to-text during TTS playback, so the speaker output is not transcribed as your next question. Use the interrupt (stop) button to cut off the tutor and ask a new question.
- Use the correct page: Open Dashboard → click a course → click StudyBuddy. The tutor is at `/courses/[courseId]/study-buddy`.
- Echo with browser TTS: If ElevenLabs fails (e.g. invalid API key), the app uses browser TTS. The client now gates the mic: it does not send audio to the server while tutor voice is playing (ElevenLabs or browser), so you should no longer see the tutor's reply transcribed as your message. Restart the app and do a hard refresh (Cmd+Shift+R / Ctrl+Shift+R).
- Check that `LECTURE_ROOT` in `.env` points to a directory with the correct structure
- Click "Rescan Lectures" in the UI
- Check server logs for scan output
- ElevenLabs quota: If the server logs `[tts] ElevenLabs error 401` with `quota_exceeded`, your API key has run out of credits. The app automatically falls back to browser TTS (your OS voice) so you still get spoken replies; text output is unchanged.
- To use ElevenLabs again: top up credits at elevenlabs.io or add a new API key in `.env`.
- Without `ELEVENLABS_API_KEY` at all, the tutor runs in text-only mode (or uses browser TTS when the server sends `tts.fallback`).
- Make sure the server is running on port 3001
- Check for firewall/proxy issues
- Look at browser console for connection errors
- Verify `OPENAI_API_KEY` is set and valid
- Check server logs for API errors
- The app caches embeddings in `.cache/embeddings.json` for faster restarts
- Frontend: Next.js 14, React 18, Web Audio API
- Backend: Node.js, Express, ws (WebSocket)
- STT: Deepgram (streaming) or Whisper (fallback)
- LLM: OpenAI GPT-4o-mini (or Azure OpenAI)
- TTS: ElevenLabs
- RAG: In-memory vector search with OpenAI embeddings
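An in-memory vector search of the kind the stack describes reduces to cosine similarity plus a sort. This is a minimal sketch for illustration, not the project's `vectorIndex.ts`:

```typescript
// Rank documents by cosine similarity to a query embedding.
interface Doc { id: string; vec: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k);
}
```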