A FastAPI-based REST API for downloading Panopto streams with an audio-first workflow.
Features:

- Download Panopto streams and persist MP3 audio as the canonical artifact
- Extract audio tracks (MP3 via ffmpeg) alongside each download
- Transcribe audio tracks to text through the ElevenLabs Speech-to-Text API
- Upload PDF slide decks for local storage (future processing)
- Track download progress and status
- List and manage downloaded videos
- RESTful API with CORS support
Prerequisites:

- Python 3.8 or higher
- pip or uv package manager
- ffmpeg (required for video processing)
Linux (Ubuntu/Debian/WSL):

```bash
sudo apt update
sudo apt install -y ffmpeg
```

macOS:

```bash
brew install ffmpeg
```

Windows: Download from ffmpeg.org or use:

```bash
choco install ffmpeg
```

Verify installation:

```bash
ffmpeg -version
```

Install dependencies:
```bash
pip install -r requirements.txt
```

Note: Due to dependency conflicts, you may need to install some packages separately:

```bash
pip install "yarl>=1.9.0" "multidict>=4.0" "propcache>=0.2.1" --no-deps
pip install -r requirements.txt --no-deps
```

If you're using uv as your package manager:

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv pip install -r requirements.txt
```

Create a .env.local (or .env) file at the project root and define the ElevenLabs API key:

```bash
ELEVENLABS_API_KEY=your_elevenlabs_key
```
Optional overrides include ELEVENLABS_MODEL_ID, ELEVENLABS_LANGUAGE_CODE, ELEVENLABS_DIARIZE, and ELEVENLABS_TAG_AUDIO_EVENTS. Restart the API server any time you change these values.
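For reference, one way such settings are typically read at startup; this is an illustrative sketch, not the project's actual code, and the `env_flag` helper plus the fallback values (`scribe_v1`, `en`) are assumptions:

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("1", "true", "yes", "on") from the environment."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

# Fallbacks shown here are illustrative defaults, not necessarily what the project ships with.
model_id = os.getenv("ELEVENLABS_MODEL_ID", "scribe_v1")
language = os.getenv("ELEVENLABS_LANGUAGE_CODE", "en")
diarize = env_flag("ELEVENLABS_DIARIZE")
tag_audio_events = env_flag("ELEVENLABS_TAG_AUDIO_EVENTS", default=True)
```

Because values are read once at import time, this is also why the server needs a restart after the `.env` file changes.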
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The --reload flag enables auto-reload on code changes (useful for development).

With uv:

```bash
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Alternatively:

```bash
python -m app.main
```

Or:

```bash
python app/main.py
```

Once the server is running, you can access:
- API Base URL: http://localhost:8000
- API Documentation (Swagger UI): http://localhost:8000/docs
- Alternative API Docs (ReDoc): http://localhost:8000/redoc
- Health Check: http://localhost:8000/api/health
- `GET /api/health` - Check server health and storage status
- `POST /api/videos/download` - Start downloading a video
- `GET /api/videos` - List all stored videos
- `GET /api/videos/active` - List active downloads
- `GET /api/videos/{video_id}` - Get video metadata
- `GET /api/videos/{video_id}/status` - Get download status
- `GET /api/audio/{video_id}` - Download the MP3 artifact for a lecture (primary route)
- `GET /api/videos/{video_id}/file` - Legacy download route; returns audio when available and falls back to MP4 for archived entries
- `DELETE /api/videos/{video_id}` - Delete a video
`POST /api/videos/download` defaults to `{"audio_only": true}` to avoid persisting MP4s unless explicitly requested.
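A client can mirror that default when constructing the request body. The helper below is an illustrative sketch, not project code; it only shows how the `audio_only` default described above would behave from the caller's side:

```python
from typing import Any, Dict, Optional

def build_download_request(
    stream_url: str,
    title: str,
    audio_only: Optional[bool] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Build the JSON body for POST /api/videos/download.

    When audio_only is not given, we mirror the server-side default of True,
    so callers must opt in explicitly to keep the MP4.
    """
    body: Dict[str, Any] = {"stream_url": stream_url, "title": title}
    body["audio_only"] = True if audio_only is None else audio_only
    body.update(extra)  # e.g. source_url, course_id, course_name
    return body
```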
- `POST /api/documents/upload` - Upload PDF slides; the file is saved under `storage/documents/` and metadata recorded in `data/documents.json`
- `GET /api/documents` - List metadata for every stored PDF (document ID, filename, paths, slide description details)
- `GET /api/documents/{document_id}` - Retrieve metadata for a single stored document
- `DELETE /api/documents/{document_id}` - Remove a PDF and any slide descriptions on disk
- `POST /api/courses` - Create a course record in SQLite (`course_id` returned)
- `GET /api/courses` - List available courses for UI dropdowns/extensions
- `POST /api/courses/{course_id}/units` - Create a unit for the specified course (title/description/position)
- `GET /api/courses/{course_id}/units` - List all units for a course
- `POST /api/units/{unit_id}/topics` - Create a topic inside a unit
- `GET /api/units/{unit_id}/topics` - List topics belonging to a unit
- `POST /api/chat` - Send a message to the StudyBuddy agent (requires `course_id`; response includes `session_id` so clients can associate history)
- `POST /api/chat/stream` - Stream the same response over SSE; the initial event contains the `session_id`
- `GET /api/courses/{course_id}/chat/history` - Persisted chat sessions/messages for a course (filterable via `?user_id=`)
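A thin client-side wrapper can take care of threading the `session_id` between calls. In this sketch the `message` and `user_id` request field names are assumptions about the payload shape, not confirmed by this README; the actual HTTP call is left to the caller:

```python
from typing import Any, Dict, Optional

class ChatSession:
    """Client-side helper that carries session_id across /api/chat calls."""

    def __init__(self, course_id: str, user_id: Optional[str] = None) -> None:
        self.course_id = course_id
        self.user_id = user_id
        self.session_id: Optional[str] = None

    def request_body(self, message: str) -> Dict[str, Any]:
        # The first call omits session_id; the server mints one and returns it.
        body: Dict[str, Any] = {"message": message, "course_id": self.course_id}
        if self.user_id is not None:
            body["user_id"] = self.user_id
        if self.session_id is not None:
            body["session_id"] = self.session_id
        return body

    def record_response(self, response_json: Dict[str, Any]) -> None:
        # Remember the session_id so follow-up messages join the same history.
        self.session_id = response_json.get("session_id", self.session_id)
```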
```bash
curl -X POST "http://localhost:8000/api/videos/download" \
  -H "Content-Type: application/json" \
  -d '{
    "stream_url": "https://example.panopto.com/stream/...",
    "title": "My Video",
    "source_url": "https://example.panopto.com/...",
    "course_id": "course_20250101_120000_000000",
    "course_name": "CSC282 - Algorithms",
    "audio_only": true
  }'
```

Check a download's status:

```bash
curl "http://localhost:8000/api/videos/{video_id}/status"
```

List all videos:

```bash
curl "http://localhost:8000/api/videos"
```

```bash
# Transcript chunks for a lecture (first 3 chunks only)
PYTHONPATH=$PWD scripts/export_chunks.py --video-id video_20250101_120000_000000 --limit 3

# Slide chunks from a processed document
PYTHONPATH=$PWD scripts/export_chunks.py --document-id doc_20250102_130000_000000 --limit 3
```

Outputs land in data/chunks/ for quick inspection before sending to Chroma.
```bash
PYTHONPATH=$PWD scripts/ingest_chroma.py \
  --course-id course_20250101_120000_000000 \
  --user-id alice@example.com \
  --lectures video_20250105_101010_000001 \
  --documents doc_20250106_123000_000000 \
  --lecture-collection course_lectures \
  --slide-collection course_slides \
  --chroma-path data/chroma_db
```

`--lectures` defaults to all lectures stored for that course; `--documents` is optional and expects slide decks that already have slides/describe output. Lecture and slide chunks are inserted into separate Chroma collections (specified via `--lecture-collection` and `--slide-collection`) so your agent can query them independently.
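When querying the ingested collections, metadata filters let an agent scope results to one course or user. Assuming the ingest script stamps `course_id` and `user_id` onto each chunk (suggested by the CLI flags above, but not verified here), a filter builder might look like this; Chroma requires multiple conditions to be combined explicitly with `$and`:

```python
from typing import Any, Dict, List, Optional

def chroma_where(course_id: str, user_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a Chroma metadata filter scoping a query to one course
    (and optionally one user)."""
    clauses: List[Dict[str, Any]] = [{"course_id": {"$eq": course_id}}]
    if user_id:
        clauses.append({"user_id": {"$eq": user_id}})
    # A single condition is passed bare; multiple conditions need $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}

# Usage against the ingested collections (requires `pip install chromadb`):
#   client = chromadb.PersistentClient(path="data/chroma_db")
#   lectures = client.get_collection("course_lectures")
#   lectures.query(query_texts=["dynamic programming"], n_results=5,
#                  where=chroma_where("course_20250101_120000_000000"))
```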
```
studybuddy-fastapi/
├── app/
│   ├── main.py               # FastAPI application and routes
│   ├── models.py             # Pydantic models
│   ├── downloader.py         # Video download logic
│   ├── document_storage.py   # PDF storage utilities
│   ├── chunkings/
│   │   └── chunking.py       # Timestamp-aware chunking strategy
│   ├── storage.py            # Local storage management
│   └── transcriber.py        # ElevenLabs speech-to-text integration
├── storage/
│   ├── videos/               # Downloaded video files
│   └── audio/                # Extracted audio files (mp3)
├── data/                     # Metadata and data files
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
To proxy CopilotKit traffic from the Vite UI into StudyBuddy’s existing agent:

1. Expose AG-UI:

   ```bash
   python -m agent.dev_agui   # serves http://localhost:8001/agui (override via AGUI_PORT)
   ```

   The script wraps `StudyBuddyChatAgent` with the Agno v2 `AGUI` interface, so you interact with the same retrieval pipeline as `/api/chat`.

2. Start the CopilotKit bridge (Node 18+):

   ```bash
   cd dev/copilotkit-server
   npm install @ag-ui/agno @copilotkit/runtime cors dotenv express \
     && npm install -D @types/express @types/node ts-node typescript
   npx ts-node --project tsconfig.json server.ts   # hosts http://localhost:3000/api/copilotkit
   ```

   Optional env vars: `AGNO_AGENT_URL` (defaults to http://localhost:8001/agui) and `COPILOTKIT_PORT` (defaults to 3000).

3. Point Vite’s CopilotChat runtime at http://localhost:3000/api/copilotkit. Messages now flow: Vite → CopilotKit bridge → AG-UI → StudyBuddy agent.
The server runs with auto-reload enabled when using the --reload flag, so changes to the code will automatically restart the server.
- Videos are stored in the `storage/videos/` directory
- Audio-only files are stored in `storage/audio/` and share the same `video_id` filename
- Uploaded PDFs are stored in `storage/documents/` with metadata in `data/documents.json`
- Metadata (status, file paths, etc.) lives in `data/videos.json`, while transcript text and segments are stored per-lecture under `data/transcripts/` and `data/transcript_segments/`
- CORS is enabled for all origins (configure appropriately for production)
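Given these conventions, a client or script can locate the audio artifact from the ID alone. This is a hypothetical helper built on the filename convention stated above, not code from the project:

```python
from pathlib import Path

STORAGE_ROOT = Path("storage")

def audio_path(video_id: str) -> Path:
    """Locate the MP3 artifact for a lecture; audio files reuse the
    video_id as their filename under storage/audio/."""
    return STORAGE_ROOT / "audio" / f"{video_id}.mp3"
```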
To feed transcripts into Agno’s knowledge base while keeping the precise ElevenLabs timecodes, use the `TimestampAwareChunking` strategy defined in `app/chunkings/chunking.py`. It converts the stored `transcript_segments` (word-level timestamps) into chunks that include `start_ms`/`end_ms` metadata so the frontend and agents can jump directly to the right moment in a lecture.
```python
import json
from pathlib import Path

from agno.knowledge.document.base import Document
from app.chunkings.chunking import TimestampAwareChunking

transcript_text = Path(metadata["transcript_path"]).read_text(encoding="utf-8")
segments = json.loads(Path(metadata["transcript_segments_path"]).read_text(encoding="utf-8"))

doc = Document(
    id=video_id,
    name=metadata["title"],
    content=transcript_text,
    meta_data={
        "segments": segments,
        "lecture_id": video_id,
        "source": "transcript",
        "course_id": metadata.get("course_id"),
    },
)

chunker = TimestampAwareChunking(max_words=110, max_duration_ms=75_000, overlap_ms=12_000)
chunks = chunker.chunk(doc)
```

Each emitted chunk inherits the transcript context and includes the millisecond offsets required for semantic search + timestamp previews in your UI.
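To surface those offsets in a UI, a chunk’s `start_ms` can be rendered as a human-readable timestamp. A small illustrative formatter (not part of the project):

```python
def format_offset(ms: int) -> str:
    """Render a millisecond offset as M:SS, or H:MM:SS past the hour mark."""
    total_seconds, _ = divmod(ms, 1000)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

With the chunker settings above, a chunk starting at `start_ms=75_000` would be labeled `1:15`.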