
CineWave

Transform audio into visual narratives. Every song deserves a screen.

CineWave is a multimodal platform that analyzes music (audio + lyrics), extracts emotional arcs and narrative structure, and generates cinematic storyboards and music video trailers using AI. It democratizes visual storytelling for independent artists who can't afford $5K–$20K production budgets.

Built for the Vultr hackathon — heavy cloud compute on Vultr; structured data in Snowflake.


Motivation

The Problem

Music discovery today is visual-first: TikTok, Instagram Reels, YouTube Shorts. Songs without compelling visual content struggle to reach new audiences. High-quality music videos are expensive—production crews, cinematographers, editors—and label-backed artists dominate feeds because they can produce content at scale. Creativity alone is not enough; production scale wins.

The Deeper Problem

This isn’t just a video-generation problem—it’s a multimodal understanding problem. Current platforms rely on shallow metadata and genre tags. AI video tools are generic; they don’t understand emotional arcs, lyrical meaning, or sonic progression. Most treat music as background audio. CineWave treats music as structured emotional data.

The Opportunity

Every song has an emotional trajectory: verse, chorus, bridge, build, release. From audio features, lyrics, and metadata, we can build:

  • Emotional signature embeddings → “Songs that feel similar” instead of “songs that sound similar”
  • Narrative mapping → Visual storytelling grounded in structure
  • Emotion-aware discovery → Reels-style feed driven by feeling, not genre

Solution

CineWave combines:

  1. Audio analysis — Waveform, spectral features, energy peaks, emotional arcs
  2. Lyric analysis — Themes, sentiment, imagery, semantic parsing
  3. Multimodal alignment — Lyrics + audio + structure → coherent narrative
  4. AI visual generation — Cinematic storyboards and video via Luma (and extensible providers)
  5. Reels feed — Discovery driven by generated visuals and emotional similarity

The video is the output layer; the core product is emotion-aware creative amplification.


Features

  • Upload or import — Audio files, YouTube URLs (via yt-dlp)
  • Analysis pipeline — Spectrogram, features, emotional peaks (external audio-analysis service or local fallback)
  • Storyboard generation — Per-shot prompts from audio features + lyrics + style preset
  • Music video trailer — Multi-shot generation with Luma, continuity or hard cuts, optional LLM refinement
  • Stitch & export — FFmpeg-based concatenation, audio overlay, subtitle burn
  • Reels feed — Scrollable clips linked to tracks and regions
  • Signature-based similarity — Global and segment embeddings for “songs similar to this moment”
  • Resolution selection — 512/768/1024 (images), 720p/1080p (video)
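The signature-based similarity feature above reduces to nearest-neighbor search over embedding vectors. A minimal sketch using cosine similarity, with toy 4-dimensional signatures (real signature embeddings would be far wider, and `most_similar` and the catalog names are illustrative, not the repo's API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two signature embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query: list[float], catalog: dict[str, list[float]]) -> str:
    """Track id whose signature is closest to the query (global or segment) embedding."""
    return max(catalog, key=lambda tid: cosine_similarity(query, catalog[tid]))

# Hypothetical signatures for two tracks
catalog = {
    "track_a": [0.9, 0.1, 0.0, 0.2],
    "track_b": [0.1, 0.8, 0.3, 0.0],
}
print(most_similar([0.85, 0.15, 0.05, 0.1], catalog))  # track_a
```

Per-segment embeddings make "songs similar to this moment" the same query with a segment vector instead of the global one.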

Vultr Cloud Compute (Hackathon Track)

CineWave uses Vultr for heavy cloud compute — our chosen hackathon track. Serverless (Vercel) cannot run long-running audio processing, FFmpeg, or Python ML pipelines. Vultr provides the compute tier that makes the product possible.

What Runs on Vultr

| Component | Role |
| --- | --- |
| Backend API | Next.js API routes, or Node/Python API when split |
| Ingest Worker | BullMQ worker: ffmpeg, yt-dlp, calls to audio-analysis |
| Audio Analysis Service | FastAPI (Python): spectrograms, librosa, energy curves, peaks |
| Features API | Transcript, LLM tags, director specs, NLP |
| Redis | Job queue for async ingest |
| Vultr Object Storage | S3-compatible storage for audio, spectrograms, media |

Why Vultr

  • Compute limits — FFmpeg, yt-dlp, librosa, and long-running analysis need real VMs, not serverless timeouts
  • Docker deployment — Full stack (Next.js, worker, audio-analysis, features-api, Redis) runs via Docker Compose on a Vultr VM
  • Object Storage — S3-compatible; spectrograms and audio artifacts stored in Vultr Object Storage
  • Cost-effective — Dedicated CPU for predictable performance and cost

See deploy/ for Vultr setup: deploy/README.md, deploy/.env.example, Docker Compose config.


Snowflake Data Layer

Snowflake is our analytical data warehouse. All structured outputs from analysis and generation are stored in Snowflake for reproducibility, versioning, and analytics — not just blobs in object storage.

What Lives in Snowflake

| Table / Area | Purpose |
| --- | --- |
| TRACK_ANALYSIS_EVENTS | Every analysis run: track_id, job_run_id, spectrogram_url, spectrogram_key, features_json, peaks_json, transcript, director_specs_json |
| TRACK_AUDIO_ANALYSIS | Legacy analysis records: analysis_id, features_json, spectrogram_url, params |
| GENERATIONS | Generated assets: song, type (image/video), title, description, thumbnail_url, media_url |
| SONGS_2025 | Top 25 2025 seed data (prod) |
| Reels / Featured | Featured reels for the landing page, sourced from Snowflake |

Why Snowflake

  • Structured storage — Features, peaks, transcripts, and prompts as queryable JSON (VARIANT)
  • Reproducibility — Every analysis and generation is versioned and traceable
  • Analytics — Query patterns, usage, and insights over time
  • Hackathon alignment — Enterprise data warehouse instead of ad-hoc SQLite blobs

The ingest worker writes TRACK_ANALYSIS_EVENTS; the audio-analysis service writes to Snowflake; the app reads generations and featured reels from Snowflake when configured. See deploy/snowflake.sql, lib/snowflake-events.ts, lib/snowflake-generations.ts.


Architecture

High-Level Flow

```
User Browser
    → Next.js (Vercel / local)
    → API Routes (upload, analyze, generate, runs, stitch)
    → Redis (BullMQ) → Ingest Worker on Vultr (ffmpeg, yt-dlp, audio-analysis)
    → Prisma + SQLite (local) / Postgres (prod)
    → Snowflake (analysis events, generations, reels)  ← data warehouse
    → Vultr Object Storage / S3 (audio, spectrograms, assets)
    → Luma (video generation)
    → FFmpeg (stitch)
```

Core Components

| Layer | Responsibility |
| --- | --- |
| Frontend | Next.js 14 (App Router), Tailwind, shadcn/ui, wavesurfer.js, Framer Motion |
| Auth | Clerk |
| API | Next.js API routes for tracks, songs, uploads, generations, runs, reels |
| Queue | BullMQ + Redis for track ingest (upload, YouTube) |
| Worker | workers/ingest-worker.ts — runs on Vultr; analysis, S3 upload, Snowflake write |
| Orchestrator | src/lib/runOrchestrator.ts — Luma job creation, polling, continuity |
| Storage | Vultr Object Storage or S3 (audio, spectrograms, media) |
| Data | Snowflake — analysis events, generations, featured reels, signatures |

Technical Architecture

Feature Extraction Pipelines

Two pipelines combine metadata, lyrics, and audio/spectrogram features to produce gen AI prompts:

Pipeline 1: Features API (features/extract.py, features_app.py)

| Step | Component | Data | Output |
| --- | --- | --- | --- |
| 1 | Whisper/faster-whisper | Audio | Segments with timestamps |
| 2 | features/audio_analysis.py (librosa) | Audio | BPM, beat times, RMS loudness curve |
| 3 | features/lyrics_analysis.py | Segments | Lyric lines, stats |
| 4 | features/structure.py | Lines + duration | Sections (verse/chorus) |
| 5 | LLM (features/llm_tags.py) | Metadata + lyric lines | Global style, per-line tags (emotion, topics, imagery, prompt_atoms) |
| 6 | features/scene_plan.py | LLM tags + RMS curve + structure | Scene plan (prompts per lyric line) |

Pipeline 2: Storyboard / Run (spectrogram.py + src/lib/storyboardPrompts.ts)

| Step | Component | Data | Output |
| --- | --- | --- | --- |
| 1 | spectrogram.py (librosa STFT) | Audio | Tempo, RMS, spectral centroid, onset peaks, frequency bands, sections |
| 2 | services/audio_analysis/prompt_builder.py | Spectrogram features + lyrics | PromptCandidate (style, mood, story, scene_arc, key_moments, visual_motifs) |
| 3 | storyboardPrompts.ts | PromptCandidate + shot plan | Per-shot prompts |
| 4 | LLM (optional, useLlm: true) | Deterministic prompts + context | Refined per-shot prompts |

LLM Usage

  • Model: OpenAI GPT-4o-mini (configurable via LLM_MODEL, default gpt-4o-mini)
  • Features API: One call per song — metadata + lyric lines → global_style + per_line tags (vibe_keywords, visual_style_preset, color_palette, prompt_atoms: setting/subject/action/mood/lighting)
  • Storyboard refinement: Optional second call — style/mood/story + shot count → refined { index, prompt } per shot
  • Fallback: Stub/heuristic tags when OPENAI_API_KEY is missing
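The stub fallback can be sketched as a simple key-presence check. This is an illustrative pattern only: `tag_lyric_line` and the stub fields are hypothetical stand-ins, not the actual code in features/llm_tags.py.

```python
import os

def tag_lyric_line(line: str) -> dict:
    """Tag one lyric line, falling back to deterministic stub tags
    when OPENAI_API_KEY is missing (hypothetical helper, not the repo's API)."""
    if not os.environ.get("OPENAI_API_KEY"):
        # Stub path: the pipeline keeps producing usable tags without an LLM
        return {
            "emotion": "neutral",
            "topics": [],
            "prompt_atoms": {"subject": "performer", "mood": "neutral"},
            "source": "stub",
        }
    # Real path would call the configured model (default gpt-4o-mini)
    model = os.environ.get("LLM_MODEL", "gpt-4o-mini")
    return {"source": "llm", "model": model}  # actual API call elided
```

The same check-then-degrade shape applies to any optional external dependency in the pipeline.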

Spectrogram Analysis (spectrogram.py)

Extracted via librosa (STFT, n_fft=2048, hop_length=512):

| Feature | Description |
| --- | --- |
| rhythm.tempo_bpm | Detected tempo |
| rhythm.beat_times_s | Beat timestamps (capped) |
| energy.rms_summary | Mean/min/max/percentiles of RMS |
| energy.dynamic_range_rms_p95_minus_p5 | Dynamic range |
| brightness.spectral_centroid_hz_summary | Spectral brightness |
| moments.peak_moments_s | Top-k onset peaks (configurable) |
| frequency_bands.band_energy_ratios | Sub-bass, bass, mids, presence, air |
| sections | Structural segments from spectral changes |
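The band_energy_ratios feature amounts to mapping STFT bin frequencies into named bands. A pure-Python sketch for a single frame (the band edges here are assumed, and the real service computes this with librosa across many frames):

```python
# Assumed band edges in Hz; the service's actual boundaries may differ
BANDS = {
    "sub_bass": (20, 60),
    "bass": (60, 250),
    "mids": (250, 4000),
    "presence": (4000, 10000),
    "air": (10000, 20000),
}

def band_energy_ratios(magnitudes: list[float], sr: int = 22050,
                       n_fft: int = 2048) -> dict:
    """Share of total spectral energy per named band for one STFT frame.

    `magnitudes` holds n_fft // 2 + 1 bin magnitudes; bin i sits at
    i * sr / n_fft Hz, matching the STFT parameters above.
    """
    energies = {name: 0.0 for name in BANDS}
    total = 0.0
    for i, mag in enumerate(magnitudes):
        freq = i * sr / n_fft
        e = mag * mag  # power, not amplitude
        total += e
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                energies[name] += e
                break
    return {name: e / total if total else 0.0 for name, e in energies.items()}
```

With n_fft=2048 and sr=22050 the bin spacing is about 10.8 Hz, so the lowest bands span only a handful of bins.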

Prompt Assembly

  • Scene plan (features pipeline): preset + vibe + setting + subject + action + mood + lighting + colors; motion_intensity from RMS at line start; chorus sections get +0.2 intensity boost
  • Storyboard (run pipeline): style + energy + brightness + motifs + moment_label; optional LLM refinement for more varied descriptions
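The motion_intensity rule above (RMS at line start, +0.2 chorus boost) can be sketched as follows. Normalizing by the curve's maximum is an assumption, and the function name is illustrative:

```python
from bisect import bisect_right

def motion_intensity(rms_curve: list[float], frame_times: list[float],
                     line_start_s: float, in_chorus: bool) -> float:
    """Motion intensity for a lyric line: normalized RMS at the line's
    start time, plus the chorus boost described above, clamped to [0, 1]."""
    # Index of the last RMS frame at or before the line start
    i = max(0, bisect_right(frame_times, line_start_s) - 1)
    peak = max(rms_curve) or 1.0  # guard against an all-zero curve
    intensity = rms_curve[i] / peak
    if in_chorus:
        intensity += 0.2  # chorus sections read as more energetic
    return min(1.0, max(0.0, intensity))
```

Clamping keeps the boosted value in range when a chorus line already sits at an RMS peak.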

Data Model

Prisma (app DB) — Users, tracks, runs, shots, jobs, assets.

Snowflake (data warehouse) — Analysis events, generations, featured reels, signatures. All structured outputs (features, peaks, transcripts, prompts) are queryable for reproducibility and analytics.

| Prisma | Snowflake |
| --- | --- |
| User, Track, Analysis, Moment | TRACK_ANALYSIS_EVENTS (features_json, peaks_json, spectrogram_url) |
| Generation, Reel | GENERATIONS (media_url, type, metadata) |
| GenerationRun, Shot, ProviderJob, Asset | (app DB only) |
| CanonicalSong, SongDraft | SONGS_2025 (prod seed) |
| Signature | Signatures / similarity index (planned) |

Key Flows

1. Track Ingest

  • User uploads file or provides YouTube URL
  • API creates Track, enqueues BullMQ job
  • Worker (Vultr): downloads (yt-dlp) or processes file, calls audio-analysis service
  • Spectrogram → Vultr Object Storage / S3; features, peaks → Snowflake (TRACK_ANALYSIS_EVENTS); Track → READY

2. Music Video Generation

  • User starts run from Create or song page (track/song draft)
  • POST /api/runs → getAudioFeaturesForRun → generateStoryboardPrompts → buildStoryboardAndShots
  • Shots + ProviderJobs created; processRun submits Luma jobs with continuity
  • Luma callback or polling updates ProviderJob status; Assets store video URLs
  • POST /api/runs/[runId]/stitch concatenates clips, overlays audio, returns final video
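The stitch step can be sketched with ffmpeg's concat demuxer: write a clip list, concatenate, then map in the original track audio. The flags below mirror common ffmpeg usage, not necessarily the exact command the repo's stitcher builds:

```python
from pathlib import Path

def build_stitch_command(clips: list[str], audio: str, out: str,
                         list_path: str = "concat.txt") -> list[str]:
    """Write an ffmpeg concat-demuxer list file and return the stitch
    command as an argv list (illustrative helper, not the repo's stitcher)."""
    # Concat demuxer expects one "file '<path>'" line per clip, in order
    Path(list_path).write_text("".join(f"file '{c}'\n" for c in clips))
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,  # input 0: joined clips
        "-i", audio,                                     # input 1: original track
        "-map", "0:v", "-map", "1:a",                    # video from clips, audio from track
        "-c:v", "copy",                                  # no re-encode of video
        "-shortest", out,                                # stop at the shorter stream
    ]
```

The returned argv list would typically be handed to `subprocess.run`; subtitle burn-in would add a filter pass on top.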

3. Storyboard / Image Generation

  • POST /api/tracks/[id]/generate or /api/generate (mock)
  • Generations stored with resolution; linked to Reels for feed

System Design

API Surface

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/tracks/upload | POST | Multipart upload, enqueue ingest |
| /api/tracks/import-youtube | POST | YouTube URL → enqueue yt-dlp + ingest |
| /api/tracks/[id] | GET | Track status, analysis metadata |
| /api/tracks/[id]/analyze | POST | Trigger spectrogram analysis (or use worker) |
| /api/runs | POST | Create generation run (track/song draft + duration) |
| /api/runs/[id] | GET | Run status, shot progress |
| /api/runs/[id]/stitch | POST | Concatenate clips, overlay audio, burn lyrics |
| /api/providers/luma/callback | POST | Luma webhook (optional) |

External Service APIs

| Service | Purpose |
| --- | --- |
| Audio Analysis (/analyze, /build-prompt) | Spectrogram, features; prompt candidates from audio + lyrics |
| Features API (/songs/upload, /songs/{id}/process, /songs/{id}/features) | Transcribe, LLM tags, scene plan, Snowflake write |
| Luma API | Video generation from text prompts |

Job Queue (BullMQ)

  • Queue: track-ingest
  • Job types: processUpload, downloadYoutubeAudio
  • Retries: 3 attempts, exponential backoff (2s base)
  • Retention: Last 100 completed jobs
  • Connection: Redis (from REDIS_URL)
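With 3 attempts and a 2 s base, the exponential backoff above yields two retries. A sketch of the delay schedule, using the base * 2^(n-1) formula that BullMQ documents for its exponential strategy:

```python
def retry_delays(attempts: int = 3, base_ms: int = 2000) -> list[int]:
    """Delays (ms) before each retry under exponential backoff.

    With `attempts` total tries there are attempts - 1 retries; the n-th
    retry waits base_ms * 2^(n-1), matching BullMQ's exponential strategy.
    """
    return [base_ms * 2 ** (n - 1) for n in range(1, attempts)]

print(retry_delays())  # [2000, 4000]
```

So a job that fails all three tries spends roughly 6 s in backoff before landing in the failed set.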

Storage Modes

| Mode | Audio | Spectrograms | Notes |
| --- | --- | --- | --- |
| local | UPLOADS_DIR / volume | Local disk | Default; single-node |
| s3 | Vultr Object Storage / S3 | S3 bucket | Multi-node; set S3_* env vars |

Scalability

Horizontal Scaling

  • Worker: Run multiple npm run worker instances; BullMQ distributes jobs
  • Redis: Single instance sufficient for ~10K jobs/day; cluster for higher throughput
  • Audio Analysis / Features API: Stateless; scale behind load balancer
  • Next.js: Stateless; scale behind nginx/load balancer

Limits & Timeouts

| Config | Default | Env |
| --- | --- | --- |
| Upload size | 50 MB | MAX_UPLOAD_MB |
| Analysis timeout | 120 s | ANALYSIS_TIMEOUT_S |
| Nginx client body | 50 M | client_max_body_size |
| Proxy read/send | 150 s | nginx.conf |

Bottlenecks

  1. Audio analysis — CPU-bound (librosa); long tracks (5+ min) may hit timeout; consider ANALYSIS_TIMEOUT_S increase
  2. Luma API — External rate limits; run orchestrator polls; continuity mode creates jobs sequentially
  3. Transcription — faster-whisper is GPU-acceleratable; CPU-only on smaller VMs
  4. Stitch — FFmpeg CPU-bound; large runs (10+ shots) may take 1–2 min

Resource Recommendations

| Deployment | CPU | RAM | Notes |
| --- | --- | --- | --- |
| Local dev | 2 | 4 GB | SQLite, single worker |
| Small prod | 4 | 8 GB | Docker Compose, 1 worker |
| Production | 8+ | 16 GB+ | Separate worker nodes, S3, Snowflake |

Tech Stack

| Category | Technology |
| --- | --- |
| Cloud Compute | Vultr (VM, Docker, Object Storage) |
| Data Warehouse | Snowflake (analysis events, generations, reels) |
| Framework | Next.js 14 (App Router) |
| Language | TypeScript |
| UI | React 18, Tailwind CSS, shadcn/ui, Framer Motion |
| Auth | Clerk |
| DB | Prisma + SQLite (dev) |
| Queue | BullMQ, Redis |
| Storage | Vultr Object Storage, S3-compatible (AWS SDK) |
| Video | Luma API, FFmpeg |
| Waveform | wavesurfer.js |
| External | yt-dlp, ffmpeg, ffprobe |

Getting Started

Prerequisites

  • Node.js 18+
  • Redis (for ingest jobs)
  • ffmpeg, ffprobe (for audio/video)
  • yt-dlp (optional, for YouTube import)

Setup

```shell
# Install dependencies
npm install

# Copy environment
cp .env.example .env
# Edit .env: DATABASE_URL, REDIS_URL, Clerk keys

# Initialize database
npm run db:push

# Optional: seed data
npm run db:seed
# Or: npm run db:seed-songs-2025
```

Run

```shell
# Terminal 1: Next.js dev server
npm run dev

# Terminal 2: Ingest worker (processes uploads, YouTube, analysis)
npm run worker
```

Open http://localhost:3000.

Optional — Features API (transcribe, LLM tags, scene plan):

```shell
pip install -r features_requirements.txt
uvicorn features_app:app --host 0.0.0.0 --port 8000
# Set API_BACKEND_URL=http://localhost:8000 or NEXT_PUBLIC_API_BASE_URL
```

Optional — Audio Analysis (spectrogram, build-prompt): see services/audio_analysis/; or use AUDIO_ANALYSIS_SERVICE_URL pointing to a deployed instance.

Deployment (Vultr / Docker)

Prerequisites: Docker, Docker Compose; Vultr VM or any Linux host.

Vultr VM setup:

```shell
ssh root@YOUR_VULTR_IP
curl -fsSL https://get.docker.com | sh
systemctl enable docker && systemctl start docker
apt-get update && apt-get install -y docker-compose-plugin
```

Deploy:

```shell
cd deploy
cp .env.example .env
# Edit .env: CLERK keys, SNOWFLAKE_*, S3_*, OPENAI_API_KEY
# Run Snowflake DDL: deploy/snowflake.sql (in a Snowflake worksheet)
docker compose up -d --build
```

Docker services (behind nginx on port 80):

| Service | Internal Port | Role |
| --- | --- | --- |
| nginx | 80 | Reverse proxy; routes /, /audio-analysis/, /features-api/ |
| web | 3000 | Next.js |
| audio-analysis | 8001 | Spectrogram + features (FastAPI) |
| features-api | 8000 | Transcript, LLM tags, director (FastAPI) |
| redis | 6379 | BullMQ |
| worker | (none) | BullMQ ingest (ffmpeg, yt-dlp, analysis calls) |

Nginx routing (deploy/nginx.conf):

  • / → web:3000
  • /audio-analysis/* → audio-analysis:8001
  • /features-api/* → features-api:8000
  • client_max_body_size 50M; proxy_read_timeout 150s

Volumes: redis_data, app_data (SQLite + uploads), audio_artifacts

Health checks:

```shell
curl http://localhost/                       # Next.js
curl http://localhost/audio-analysis/health  # audio-analysis service
curl http://localhost/features-api/docs      # Features API Swagger
```

Troubleshooting:

  • Worker failing → docker compose ps; ensure Redis healthy
  • Analysis timeout → increase ANALYSIS_TIMEOUT_S in .env
  • DB errors → docker compose exec web npx prisma db push
  • Upload 404 → check UPLOADS_DIR, app_data volume

S3 (Vultr Object Storage):

  1. Create bucket in Vultr Object Storage
  2. Set S3_BUCKET, S3_ENDPOINT, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY in .env
  3. Set STORAGE_MODE=s3 for audio; spectrograms go to S3 when configured

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | Yes | Prisma DB URL (file:./dev.db for SQLite) |
| REDIS_URL | Yes | Redis for BullMQ (redis://localhost:6379) |
| NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY | Yes | Clerk publishable key |
| CLERK_SECRET_KEY | Yes | Clerk secret key |
| APP_URL | No | Base URL for worker callbacks (e.g. https://yourdomain.com) |
| AUDIO_ANALYSIS_SERVICE_URL | No | Audio analysis API (in Docker: http://nginx/audio-analysis) |
| FEATURES_API_URL | No | Features API (in Docker: http://nginx/features-api) |
| STORAGE_MODE | No | local or s3 |
| S3_BUCKET, S3_ENDPOINT, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY | For S3 | Vultr Object Storage or S3 |
| SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD | Prod | Snowflake — analysis events, generations, reels |
| SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA | No | Defaults: COMPUTE_WH, SPECTRA, APP |
| OPENAI_API_KEY | For LLM | LLM tags; stub used if missing |
| LLM_MODEL | No | Default gpt-4o-mini |
| LUMA_API_KEY | No | Luma API for video generation |
| MAX_UPLOAD_MB | No | Default 50 |
| ANALYSIS_TIMEOUT_S | No | Default 120 |
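Numeric defaults like MAX_UPLOAD_MB=50 and ANALYSIS_TIMEOUT_S=120 are typically read defensively so a malformed value falls back rather than crashing the service. A sketch (`int_env` is an illustrative helper, not from the repo):

```python
import os

def int_env(name: str, default: int) -> int:
    """Read an integer env var, falling back on missing or malformed values."""
    try:
        return int(os.environ.get(name, ""))
    except ValueError:
        return default

max_upload_mb = int_env("MAX_UPLOAD_MB", 50)             # default 50
analysis_timeout_s = int_env("ANALYSIS_TIMEOUT_S", 120)  # default 120
```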

API Examples

Create run from track:

```shell
curl -X POST http://localhost:3000/api/runs \
  -H "Content-Type: application/json" \
  -H "Cookie: <auth-cookie>" \
  -d '{"trackId":"TRACK_ID","durationSeconds":15,"stylePreset":"cinematic","mode":"continuity"}'
```

Create run from song input (URL + metadata):

```shell
curl -X POST http://localhost:3000/api/runs \
  -H "Content-Type: application/json" \
  -H "Cookie: <auth-cookie>" \
  -d '{
    "songInput": {
      "audioSource": {"type":"URL","url":"https://example.com/track.mp3"},
      "lyrics": {"raw":"Verse lyrics..."},
      "metadata": {"title":"Song","artist":"Artist"}
    },
    "durationSeconds":12,
    "stylePreset":"noir"
  }'
```

Features API (upload → process → features):

```shell
curl -X POST -F "file=@song.mp3" -F "title=My Song" -F "artist=Me" http://localhost:8000/songs/upload
curl -X POST http://localhost:8000/songs/{song_id}/process
curl http://localhost:8000/songs/{song_id}/features
```

Scripts

| Script | Description |
| --- | --- |
| npm run dev | Start Next.js dev server |
| npm run build | Production build |
| npm run start | Start production server |
| npm run worker | Start ingest worker |
| npm run db:push | Push Prisma schema |
| npm run db:seed | Seed database |
| npm run db:studio | Open Prisma Studio |
| npm run db:seed-songs-2025 | Seed Top 25 2025 songs (local) |
| npm run db:seed-songs-2025-snowflake | Seed Top 25 2025 songs into Snowflake |
| npm run db:seed-generations-snowflake | Seed Snowflake GENERATIONS table |
| npm run db:test-snowflake | Test Snowflake connection |
| npm run build:viva-demo | Build Viva music video demo |

Project Structure

```
├── app/                    # Next.js App Router
│   ├── (landing)/          # Landing page
│   ├── (main)/             # Dashboard, create, library, discover, reels, video, storyboard, timeline
│   ├── auth/               # Clerk auth routes
│   ├── api/                # API routes (tracks, runs, providers, etc.)
│   └── demo/               # Demo pages
├── components/             # React components (landing, create, processing, ui)
├── deploy/                 # Docker, nginx, Vultr deployment
│   ├── Dockerfile.*        # web, worker, features, audio-analysis
│   ├── docker-compose.yml
│   ├── nginx.conf
│   └── snowflake.sql
├── features/               # Python features API (transcribe, LLM tags, scene plan)
│   ├── extract.py          # Main pipeline: audio → lyrics → LLM → scene_plan
│   ├── llm_tags.py         # GPT-4o-mini lyric tagging
│   ├── scene_plan.py       # Prompts from LLM + RMS
│   └── audio_analysis.py   # librosa (BPM, beats, RMS)
├── lib/                    # DB, auth, audio-analysis, storage, snowflake
├── services/
│   └── audio_analysis/     # FastAPI spectrogram service
│       ├── app.py
│       ├── analysis_runner.py
│       └── prompt_builder.py
├── spectrogram.py          # librosa STFT, peaks, bands, sections
├── src/
│   ├── lib/                # runOrchestrator, storyboardPrompts, shotPlanner, stitcher
│   ├── providers/          # Luma adapter
│   └── schemas/            # Zod schemas
├── workers/                # Ingest worker (BullMQ)
├── prisma/
│   ├── schema.prisma
│   └── seed.ts
├── scripts/                # Seed, build demo
└── server/                 # Queue definition
```

Security

  • Auth: Clerk handles sign-in; API routes check auth() for protected endpoints
  • Uploads: File type validation (audio extensions); size limit via MAX_UPLOAD_MB
  • Secrets: OPENAI_API_KEY, CLERK_SECRET_KEY, SNOWFLAKE_*, S3_* — never commit; use .env
  • CORS: Next.js same-origin by default; configure for custom domains if needed
  • Storage: S3 presigned URLs or public bucket; ensure bucket policy restricts writes

Further Documentation

| Doc | Contents |
| --- | --- |
| deploy/README.md | Vultr VM, Docker Compose, troubleshooting |
| features/README.md | Features API env vars, Snowflake tables |
| lyrics_ts/README.md | Transcription (faster-whisper) |

License

ISC

About

Hackathon Winner
