saat-sy/timbre

Timbre - Real-Time AI Music Generation for Videos

Create adaptive cinematic soundtracks powered by Google Lyria.

Built with FastAPI, Next.js 14, Google Lyria, and Docker • MIT License • PRs Welcome

🎬 Watch Live Demo • 🌐 Try it Now


🎼 What Is Timbre?

Timbre is a multimodal, adaptive scoring engine that generates context-aware music for your videos in real time. By reading visual cues and spoken dialogue, Timbre estimates the mood of each moment and uses Google Lyria to generate a synchronized soundtrack that evolves with your story.


✨ Key Features

  • 🎵 Real-time music generation using Google Lyria's streaming API
  • 📽️ Automatic scene segmentation (PySceneDetect + OpenCV)
  • 🧠 Multimodal LLM analysis for mood, emotion, pacing
  • ⚡ Low-latency WebSocket audio streaming with custom buffering layer
  • 🛡️ Fault-tolerant session manager using Redis + resumable Lyria sessions

🏗️ System Overview

System Architecture


⚙️ How It Works

Analysis Phase

  1. Video Upload - Client sends video file via multipart upload
  2. Parallel Processing - Concurrent frame extraction (OpenCV + PySceneDetect) and audio transcription
  3. LLM Musical Script - AI analyzes visual/audio content to generate tempo, key, mood timeline
  4. Session Creation - Redis stores analysis results and streaming configuration
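
The analysis steps above can be sketched roughly as follows. This is a minimal illustration, not Timbre's actual code: `SceneCue`, `extract_scenes`, `transcribe_audio`, `build_script`, and `analyze` are hypothetical names, the two extraction functions return canned data instead of calling PySceneDetect/OpenCV or a speech-to-text service, and a trivial rule stands in for the LLM call. The only point is the shape of the pipeline: two blocking jobs run concurrently, then their results are merged into a timeline of musical cues.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SceneCue:
    """One entry in the musical script (hypothetical shape)."""
    start_s: float
    end_s: float
    mood: str
    bpm: int


def extract_scenes(video_path: str) -> list[tuple[float, float]]:
    # Stand-in for scene detection; returns (start, end) times in seconds.
    return [(0.0, 4.0), (4.0, 9.5)]


def transcribe_audio(video_path: str) -> list[tuple[float, str]]:
    # Stand-in for speech-to-text; returns (timestamp, text) pairs.
    return [(1.0, "We made it."), (5.0, "Run!")]


def build_script(scenes, transcript) -> list[SceneCue]:
    # Stand-in for the LLM step: a trivial rule replaces the model call.
    cues = []
    for start, end in scenes:
        lines = [text for t, text in transcript if start <= t < end]
        urgent = any(line.endswith("!") for line in lines)
        mood, bpm = ("tense", 140) if urgent else ("calm", 80)
        cues.append(SceneCue(start, end, mood, bpm))
    return cues


async def analyze(video_path: str) -> list[SceneCue]:
    # Run the two blocking steps concurrently in worker threads.
    scenes, transcript = await asyncio.gather(
        asyncio.to_thread(extract_scenes, video_path),
        asyncio.to_thread(transcribe_audio, video_path),
    )
    return build_script(scenes, transcript)


if __name__ == "__main__":
    for cue in asyncio.run(analyze("clip.mp4")):
        print(cue)
```

In a real backend the resulting list would be serialized and stored with the session so the streaming phase can read it back.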

Streaming Phase

  1. WebSocket Connection - Real-time bidirectional communication established
  2. Lyria Integration - Google's RT API receives musical prompts and streams audio
  3. Dynamic Adaptation - System adjusts musical parameters based on scene changes
  4. Seamless Delivery - 2-second audio chunks with smooth crossfading
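
To illustrate the crossfading in the last step, here is a small sketch of a linear crossfade between two consecutive chunks. It assumes mono audio as plain lists of float samples, and the `crossfade` function and its `overlap` parameter are hypothetical. The project's own algorithm may use a different curve or operate on raw PCM buffers.

```python
def crossfade(prev_chunk: list[float], next_chunk: list[float], overlap: int) -> list[float]:
    """Join two mono sample lists, blending `overlap` samples linearly."""
    # Never overlap more samples than either chunk actually has.
    overlap = min(overlap, len(prev_chunk), len(next_chunk))
    if overlap <= 0:
        return prev_chunk + next_chunk

    head = prev_chunk[:-overlap]       # untouched part of the old chunk
    tail = next_chunk[overlap:]        # untouched part of the new chunk
    fade_start = len(prev_chunk) - overlap

    blended = []
    for i in range(overlap):
        t = (i + 0.5) / overlap        # fade position, strictly between 0 and 1
        old = prev_chunk[fade_start + i] * (1.0 - t)
        new = next_chunk[i] * t
        blended.append(old + new)

    return head + blended + tail
```

The output is `overlap` samples shorter than the two chunks combined, so a player that schedules chunks by duration has to account for the shared region.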

🧠 Engineering Challenges & Solutions

  • Inference Speed: Switched to Groq and parallelized scene analysis because waiting for LLMs is boring.
  • Lyria Stability: Engineered a custom heartbeat and reconnection system to keep the Google Lyria WebSocket alive during long sessions.
  • Audio Artifacts: Wrote a crossfading algorithm to smooth out jarring "pops" between generated audio chunks.
  • Redis Latency: Implemented pipelining and connection pooling to prevent bottlenecks during high-frequency state updates.
  • Error Recovery: Added automatic retries and state migration so a single network blip doesn't crash the whole stream.
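
As a rough illustration of the retry idea in the last bullet, the sketch below wraps an async operation with exponential backoff and jitter. `with_retries` and its parameters are hypothetical names, not part of Timbre or of any Lyria client; a real reconnection layer would also need to restore session state after reconnecting.

```python
import asyncio
import random


async def with_retries(operation, attempts: int = 4, base_delay: float = 0.5,
                       retry_on: tuple = (ConnectionError, TimeoutError)):
    """Await operation(); on a retryable error, back off and try again."""
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Exponential backoff plus jitter to avoid synchronized reconnects.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

Usage would look like `await with_retries(connect_to_stream)`, where `connect_to_stream` is whatever coroutine function opens the connection.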

🛠️ Tech Stack

Backend

  • FastAPI - High-performance async API framework
  • Python 3.13 - Latest language features and performance
  • Redis - Session state and real-time data management
  • PySceneDetect - Intelligent video scene analysis
  • OpenCV - Computer vision and frame processing
  • Google Lyria RT - Real-time music generation
  • WebSockets - Low-latency bidirectional communication

Frontend

  • Next.js 14 - React framework with App Router
  • React 19 - Latest React features and concurrent rendering
  • AWS Amplify - Authentication (Cognito) and deployment
  • Tailwind CSS - Utility-first styling
  • Framer Motion - Smooth animations and transitions
  • TypeScript - Type-safe development

Infrastructure

  • Docker Compose - Containerized development environment
  • Turborepo - Monorepo build system and caching
  • pnpm - Fast, disk space efficient package manager
  • uv - Ultra-fast Python package installer and resolver

📁 Project Structure

timbre/
├── apps/
│   ├── backend/                # FastAPI application
│   │   ├── service/            # Core business logic
│   │   │   ├── auth/           # Authentication services
│   │   │   ├── global_eval/    # Video analysis engine
│   │   │   ├── lyria/          # Lyria API integration
│   │   │   └── video/          # Video processing utilities
│   │   ├── utils/              # Shared utilities
│   │   │   ├── audio/          # Audio processing
│   │   │   ├── video/          # Video manipulation
│   │   │   ├── llm/            # LLM integration & prompts
│   │   │   └── helper/         # Common utilities
│   │   ├── models/             # Data models
│   │   └── tests/              # Test suite
│   └── frontend/               # Next.js application
│       ├── src/app/            # App Router pages
│       ├── src/components/     # React components
│       └── src/lib/            # Frontend utilities
├── packages/                   # Shared packages
│   ├── eslint-config/          # Linting configuration
│   └── typescript-config/      # TypeScript settings
└── docker-compose.yml         # Development environment

🚀 Installation & Running Locally

Prerequisites

  • Docker & Docker Compose
  • Node.js 18+ and pnpm
  • Python 3.13+ and uv
  • Google Cloud Project with Lyria API access

Quick Start

# Clone the repository
git clone https://github.com/saat-sy/timbre.git
cd timbre

# Install Node.js dependencies
pnpm install

# Set up environment variables
cp apps/backend/.env.example apps/backend/.env
cp apps/frontend/.env.example apps/frontend/.env
# Configure your API keys and database URLs

# Start the development environment
docker-compose up -d

# Run both frontend and backend
pnpm dev

🎉 That's it!


🗺️ Roadmap

Core Features

  • Export mode - Export an MP4 file with background music
  • Advanced scene detection - More advanced emotion detection to understand the scene in depth
  • Multi-character emotional arcs - Track and score individual character journeys

DevOps & Production

  • CI/CD Pipeline - GitHub Actions for automated testing and deployment
  • Production deployment - Live staging and production environments
  • Monitoring & observability - Error tracking, performance metrics, and alerting
  • Load testing - Performance validation under high concurrent usage
  • Rate limiting - DDoS protection and API throttling

Testing & Quality

  • Frontend test suite - React component and integration testing
  • Backend test expansion - Increased unit test coverage and API testing
  • End-to-end tests - Full user workflow automation
  • Security scanning - Automated vulnerability detection
  • Performance benchmarking - Latency and throughput optimization

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📬 Contact & About Me

Saatwik Yajaman - MSCS Student at USC
Building the future of AI-powered creative tools.

Always excited to discuss AI, music technology, and creative engineering!
