abhinandan2699/study-buddy
StudyBuddy — Clemson Lecture-Aware Voice Tutor

A real-time voice tutor that answers questions about your lecture content using RAG (Retrieval-Augmented Generation), streaming STT, LLM, and TTS with full barge-in/interrupt support.


What was done (for reference)

  • Voice when ElevenLabs fails: If ElevenLabs returns an error (e.g. quota exceeded or voice not found), the server sends tts.fallback with the reply text and the browser uses built-in TTS (your OS voice) so you still get spoken replies.
  • Echo fix: The mic was being transcribed while the tutor was speaking, so the speaker output was showing up as your next message. Now the server does not forward mic audio to STT while the tutor is in the "speaking" state, and the Azure STT buffer is flushed when the tutor starts speaking, so the tutor’s voice is never transcribed as you. Use the interrupt (stop) button to cut off the tutor and ask a new question.
  • ElevenLabs 404 voice_not_found: If the configured ELEVENLABS_VOICE_ID is not found (404), the app retries once with the default voice so TTS can work without changing .env.
  • Interrupt: Interrupt (stop) now also cancels browser TTS (SpeechSynthesis) so playback stops immediately.
  • Stop old voice when starting a new question: When you type or speak a new question while the tutor is still talking, the old TTS is stopped so the new answer's voice plays from the start. The client stops playback when (1) you send a new message (Send or Enter) and (2) the server signals a new response (tutor.state === 'thinking'), so the transition to the next answer is smooth with no overlap.
  • Conversation memory: The tutor now receives the full conversation history for the current session (all previous questions and answers), not just the last few turns, so it can refer back to earlier questions and give consistent, context-aware answers.
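
The echo fix above can be sketched as a small state gate on the server: mic chunks are only forwarded to STT while the tutor is not speaking, and the STT buffer is flushed on the transition into "speaking". This is an illustrative sketch, not the actual session.ts implementation; the class and method names are assumptions.

```typescript
// Hypothetical sketch of the echo-gating logic described above.
type TutorState = "listening" | "thinking" | "speaking";

class SessionGate {
  private state: TutorState = "listening";

  setState(next: TutorState): void {
    if (next === "speaking" && this.state !== "speaking") {
      this.flushSttBuffer(); // discard partial transcript of the tutor's own voice
    }
    this.state = next;
  }

  // Returns true only when a mic chunk should reach speech-to-text.
  shouldForwardAudio(): boolean {
    return this.state !== "speaking";
  }

  private flushSttBuffer(): void {
    // reset the recognizer's pending audio (provider-specific in the real app)
  }
}
```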

Where the voice tutor lives: The tutor UI with all the above fixes is at /courses/[courseId]/study-buddy (e.g. open Dashboard → click a course → click StudyBuddy). The root / shows the dashboard (course cards), not the old single-page tutor.

  • Lecture vs Assignment mode: In Session Setup you can choose Topic type: Lecture or Assignment. For Lecture, the tutor uses RAG over that lecture’s slides (same as before). For Assignment, it uses the selected assignment’s markdown: it gives overview, deliverables, and hints only — it will not give full solutions. If the student asks for the answer or solution, it politely declines and offers hints instead.
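
The Lecture/Assignment split above amounts to choosing a different system prompt per topic type. A minimal sketch, assuming hypothetical prompt wording and function names (the app's actual prompts may differ):

```typescript
// Illustrative mode switch for the tutor's system prompt.
type TopicType = "lecture" | "assignment";

function systemPrompt(mode: TopicType): string {
  if (mode === "lecture") {
    // Lecture mode: ground answers in retrieved slide excerpts.
    return "Answer using the retrieved slide excerpts and cite slide numbers.";
  }
  // Assignment mode: overview, deliverables, and hints only.
  return [
    "You are helping with an assignment.",
    "Give an overview, the deliverables, and hints only.",
    "Never provide full solutions; if asked for the answer, politely decline and offer a hint instead.",
  ].join(" ");
}
```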

Quick Start

Prerequisites

  • Node.js 18+ (recommend 20+)
  • npm 9+
  • API keys (see below)

1. Install dependencies

npm install

2. Configure environment

cp .env.example .env

Edit .env and add your API keys:

Key                  Required?              Notes
OPENAI_API_KEY       Yes (if using OpenAI)  For embeddings + LLM
DEEPGRAM_API_KEY     Recommended            Streaming speech-to-text
ELEVENLABS_API_KEY   Optional               Voice output; text-only without it
ELEVENLABS_VOICE_ID  Optional               Defaults to a built-in voice
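
A resulting .env might look like this (placeholder values only; the real keys come from your provider dashboards):

```shell
OPENAI_API_KEY=sk-your-key-here
DEEPGRAM_API_KEY=your-deepgram-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# Optional: leave unset to use the default voice
ELEVENLABS_VOICE_ID=your-voice-id
```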

3. Build shared types

npm run build:shared

4. Start the app

npm run dev

This starts:

  • Backend at http://localhost:3001 (WebSocket at ws://localhost:3001/ws)
  • Frontend at http://localhost:3000

5. Use it

  1. Open http://localhost:3000 (Dashboard).
  2. Click a course (e.g. CPSC1020), then click StudyBuddy to open the voice tutor (/courses/CPSC1020/study-buddy).
  3. Select a lecture from the dropdown and click Start Tutor.
  4. Click the mic button or type a question.
  5. The tutor answers using voice + text with slide citations.
  6. Interrupt anytime by clicking the red stop button or by sending a new message (old voice stops, new answer plays).

Lecture Directory Structure

Place your lectures under the lectures/ directory (or set LECTURE_ROOT in .env):

lectures/
  CPSC2120/
    Lecture01/
      slides.txt    # or slides.pdf
    Lecture02/
      slides.pdf
  STAT3090/
    Lecture01/
      slides.txt

Text file format

Use --- slide N --- markers:

--- slide 1 ---
Title and content of slide 1...

--- slide 2 ---
Content of slide 2...
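
Parsing this format is a matter of splitting on the marker lines. A minimal sketch (the function name is illustrative, not the actual lectureStore.ts code):

```typescript
interface Slide {
  number: number;
  text: string;
}

// Split the file on `--- slide N ---` markers; the captured group is the
// slide number, so parts alternate [preamble, num1, body1, num2, body2, ...].
function parseSlides(raw: string): Slide[] {
  const parts = raw.split(/^---\s*slide\s+(\d+)\s*---\s*$/im);
  const slides: Slide[] = [];
  for (let i = 1; i + 1 < parts.length; i += 2) {
    slides.push({ number: Number(parts[i]), text: parts[i + 1].trim() });
  }
  return slides;
}
```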

PDF files

Each page is treated as one slide. Text is extracted automatically (best-effort).

Architecture

studybuddy/
├── apps/
│   ├── server/          # Node.js + Express + WebSocket backend
│   │   └── src/
│   │       ├── index.ts          # Entry point
│   │       ├── config.ts         # Environment config
│   │       ├── session.ts        # WebSocket session handler
│   │       └── services/
│   │           ├── lectureStore.ts  # Scans & parses lectures
│   │           ├── embeddings.ts    # OpenAI embeddings
│   │           ├── vectorIndex.ts   # In-memory vector search
│   │           ├── llm.ts           # Streaming LLM
│   │           ├── tts.ts           # ElevenLabs TTS
│   │           └── stt.ts           # Deepgram/Whisper STT
│   └── web/             # Next.js frontend
│       └── src/
│           ├── app/page.tsx        # Main UI
│           └── lib/
│               ├── ws-client.ts    # WebSocket client
│               ├── mic-capture.ts  # Mic → PCM16
│               └── audio-player.ts # Audio queue + playback
├── packages/
│   └── shared/          # Shared TypeScript types
├── lectures/            # Sample lecture content
├── .env.example
└── README.md

Realtime Protocol

WebSocket messages use JSON. See packages/shared/src/index.ts for full type definitions.

Client → Server:

  • session.start — begin tutoring session for a course/lecture
  • audio.chunk — streaming mic audio (PCM16 base64)
  • user.text — typed question
  • interrupt — cancel current response
  • session.stop — end session

Server → Client:

  • session.ready — available courses/lectures
  • stt.partial / stt.final — speech transcription
  • rag.citations — retrieved slide references
  • llm.token — streaming LLM tokens
  • tts.audio — audio chunks (MP3 base64)
  • tutor.state — listening / thinking / speaking
  • error — error message
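
As discriminated unions, the messages above might look roughly like this. These shapes are illustrative; the authoritative definitions live in packages/shared/src/index.ts and the field names here are assumptions:

```typescript
// Illustrative protocol message shapes (not the real shared types).
type ClientMessage =
  | { type: "session.start"; courseId: string; lectureId: string }
  | { type: "audio.chunk"; pcm16: string } // base64-encoded PCM16
  | { type: "user.text"; text: string }
  | { type: "interrupt" }
  | { type: "session.stop" };

type ServerMessage =
  | { type: "session.ready"; courses: string[] }
  | { type: "stt.partial"; text: string }
  | { type: "stt.final"; text: string }
  | { type: "rag.citations"; slides: number[] }
  | { type: "llm.token"; token: string }
  | { type: "tts.audio"; mp3: string } // base64-encoded MP3
  | { type: "tutor.state"; state: "listening" | "thinking" | "speaking" }
  | { type: "error"; message: string };

// Narrowing on `type` gives typed access to the payload.
function isSpeaking(msg: ServerMessage): boolean {
  return msg.type === "tutor.state" && msg.state === "speaking";
}
```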

Troubleshooting

"Microphone not working"

  • Ensure you're on localhost (or HTTPS) — browsers require secure contexts for mic access
  • Check browser permissions: click the lock icon in the address bar
  • Try Chrome or Edge (best WebAudio support)

"Tutor's own voice is transcribed as my next message" (echo)

  • The app mutes STT while the tutor is speaking: mic audio is not sent to speech-to-text during TTS playback, so the speaker output is not transcribed as your next question. Use the interrupt (stop) button to cut off the tutor and ask a new question.

Still seeing echo or old voice playing after a new question?

  • Use the correct page: Open Dashboard → click a course → click StudyBuddy. The tutor is at /courses/[courseId]/study-buddy.
  • Echo with browser TTS: If ElevenLabs fails (e.g. invalid API key), the app falls back to browser TTS. The client now gates the mic: it does not send audio to the server while the tutor's voice is playing (ElevenLabs or browser), so the tutor's reply should no longer be transcribed as your message. Restart the app and do a hard refresh (Cmd+Shift+R / Ctrl+Shift+R).

"No courses found"

  • Check that LECTURE_ROOT in .env points to a directory with the correct structure
  • Click "Rescan Lectures" in the UI
  • Check server logs for scan output

"No voice output" / "Voice used to work but now it's silent"

  • ElevenLabs quota: If the server logs [tts] ElevenLabs error 401 with quota_exceeded, your API key has run out of credits. The app will automatically fall back to browser TTS (your OS voice) so you still get spoken replies; text is unchanged.
  • To use ElevenLabs again: top up credits at elevenlabs.io or add a new API key in .env.
  • Without ELEVENLABS_API_KEY at all, the tutor uses text-only mode (or browser TTS when the server sends tts.fallback).
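
The browser-TTS fallback amounts to speaking the tts.fallback text with the Web Speech API. A minimal sketch; the interface indirection just keeps it self-contained (in the browser you would pass window.speechSynthesis and wrap the text in a SpeechSynthesisUtterance), and the handler wiring is an assumption:

```typescript
// Minimal shape of the speech synthesizer the sketch needs.
interface Synth {
  cancel(): void;
  speak(utterance: { text: string }): void;
}

// Speak a fallback reply, cutting off any utterance still playing first
// (this is what keeps old and new answers from overlapping).
function speakFallback(synth: Synth, text: string): void {
  synth.cancel();
  synth.speak({ text }); // browser: new SpeechSynthesisUtterance(text)
}
```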

"WebSocket connection failed"

  • Make sure the server is running on port 3001
  • Check for firewall/proxy issues
  • Look at browser console for connection errors

"Embeddings failed"

  • Verify OPENAI_API_KEY is set and valid
  • Check server logs for API errors
  • The app caches embeddings in .cache/embeddings.json for faster restarts
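
The cache mentioned above works because embedding calls are deterministic per chunk of text, so unchanged slides never need a second API call. A sketch of the idea, keyed by a content hash (in-memory here for illustration; the cache class and its shape are assumptions, not the app's actual code):

```typescript
import { createHash } from "crypto";

// Illustrative embeddings cache: unchanged text reuses the stored vector.
class EmbeddingCache {
  private store = new Map<string, number[]>();

  private key(text: string): string {
    return createHash("sha256").update(text).digest("hex");
  }

  async embed(
    text: string,
    compute: (t: string) => Promise<number[]> // the real embeddings API call
  ): Promise<number[]> {
    const k = this.key(text);
    const hit = this.store.get(k);
    if (hit) return hit;              // cache hit: no API call
    const vec = await compute(text);  // cache miss: compute and remember
    this.store.set(k, vec);
    return vec;
  }
}
```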

Tech Stack

  • Frontend: Next.js 14, React 18, Web Audio API
  • Backend: Node.js, Express, ws (WebSocket)
  • STT: Deepgram (streaming) or Whisper (fallback)
  • LLM: OpenAI GPT-4o-mini (or Azure OpenAI)
  • TTS: ElevenLabs
  • RAG: In-memory vector search with OpenAI embeddings
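
The in-memory vector search in the last bullet reduces to cosine similarity over embedded slide chunks. A minimal sketch of the retrieval step (illustrative, not the actual vectorIndex.ts code):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding and keep the best k.
function topK(
  query: number[],
  docs: { id: number; vec: number[] }[],
  k: number
): { id: number; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```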
