Transform audio into visual narratives. Every song deserves a screen.
CineWave is a multimodal platform that analyzes music (audio + lyrics), extracts emotional arcs and narrative structure, and generates cinematic storyboards and music video trailers using AI. It democratizes visual storytelling for independent artists who can't afford $5K–$20K production budgets.
Built for the Vultr hackathon — heavy cloud compute on Vultr; structured data in Snowflake.
Music discovery today is visual-first: TikTok, Instagram Reels, YouTube Shorts. Songs without compelling visual content struggle to reach new audiences. High-quality music videos are expensive—production crews, cinematographers, editors—and label-backed artists dominate feeds because they can produce content at scale. Creativity alone is not enough; production scale wins.
This isn’t just a video-generation problem—it’s a multimodal understanding problem. Current platforms rely on shallow metadata and genre tags. AI video tools are generic; they don’t understand emotional arcs, lyrical meaning, or sonic progression. Most treat music as background audio. CineWave treats music as structured emotional data.
Every song has an emotional trajectory: verse, chorus, bridge, build, release. From audio features, lyrics, and metadata, we can build:
- Emotional signature embeddings → “Songs that feel similar” instead of “songs that sound similar”
- Narrative mapping → Visual storytelling grounded in structure
- Emotion-aware discovery → Reels-style feed driven by feeling, not genre
CineWave combines:
- Audio analysis — Waveform, spectral features, energy peaks, emotional arcs
- Lyric analysis — Themes, sentiment, imagery, semantic parsing
- Multimodal alignment — Lyrics + audio + structure → coherent narrative
- AI visual generation — Cinematic storyboards and video via Luma (and extensible providers)
- Reels feed — Discovery driven by generated visuals and emotional similarity
The video is the output layer; the core product is emotion-aware creative amplification.
- Upload or import — Audio files, YouTube URLs (via yt-dlp)
- Analysis pipeline — Spectrogram, features, emotional peaks (external audio-analysis service or local fallback)
- Storyboard generation — Per-shot prompts from audio features + lyrics + style preset
- Music video trailer — Multi-shot generation with Luma, continuity or hard cuts, optional LLM refinement
- Stitch & export — FFmpeg-based concatenation, audio overlay, subtitle burn
- Reels feed — Scrollable clips linked to tracks and regions
- Signature-based similarity — Global and segment embeddings for “songs similar to this moment”
- Resolution selection — 512/768/1024 (images), 720p/1080p (video)
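
The signature-based similarity above can be sketched as a cosine ranking over emotional-signature embeddings. A minimal sketch — the vectors and track IDs here are hypothetical placeholders, not the real analysis output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two signature embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query: list[float], catalog: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank catalog tracks by emotional similarity to the query signature."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [track_id for track_id, _ in ranked[:k]]
```

The same ranking works per segment ("songs similar to this moment") by swapping global embeddings for segment embeddings.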
CineWave uses Vultr for heavy cloud compute — our chosen hackathon track. Serverless (Vercel) cannot run long-running audio processing, FFmpeg, or Python ML pipelines. Vultr provides the compute tier that makes the product possible.
| Component | Role |
|---|---|
| Backend API | Next.js API routes, or Node/Python API when split |
| Ingest Worker | BullMQ worker: ffmpeg, yt-dlp, calls to audio-analysis |
| Audio Analysis Service | FastAPI (Python): spectrograms, librosa, energy curves, peaks |
| Features API | Transcript, LLM tags, director specs, NLP |
| Redis | Job queue for async ingest |
| Vultr Object Storage | S3-compatible storage for audio, spectrograms, media |
- Compute limits — FFmpeg, yt-dlp, librosa, and long-running analysis need real VMs, not serverless timeouts
- Docker deployment — Full stack (Next.js, worker, audio-analysis, features-api, Redis) runs via Docker Compose on a Vultr VM
- Object Storage — S3-compatible; spectrograms and audio artifacts stored in Vultr Object Storage
- Cost-effective — Dedicated CPU for predictable performance and cost
See deploy/ for Vultr setup: deploy/README.md, deploy/.env.example, Docker Compose config.
Snowflake is our analytical data warehouse. All structured outputs from analysis and generation are stored in Snowflake for reproducibility, versioning, and analytics — not just blobs in object storage.
| Table / Area | Purpose |
|---|---|
| TRACK_ANALYSIS_EVENTS | Every analysis run: track_id, job_run_id, spectrogram_url, spectrogram_key, features_json, peaks_json, transcript, director_specs_json |
| TRACK_AUDIO_ANALYSIS | Legacy analysis records: analysis_id, features_json, spectrogram_url, params |
| GENERATIONS | Generated assets: song, type (image/video), title, description, thumbnail_url, media_url |
| SONGS_2025 | Top 25 2025 seed data (prod) |
| Reels / Featured | Featured reels for landing page sourced from Snowflake |
- Structured storage — Features, peaks, transcripts, and prompts as queryable JSON (VARIANT)
- Reproducibility — Every analysis and generation is versioned and traceable
- Analytics — Query patterns, usage, and insights over time
- Hackathon alignment — Enterprise data warehouse instead of ad-hoc SQLite blobs
The ingest worker writes TRACK_ANALYSIS_EVENTS; the audio-analysis service writes to Snowflake; the app reads generations and featured reels from Snowflake when configured. See deploy/snowflake.sql, lib/snowflake-events.ts, lib/snowflake-generations.ts.
```
User Browser
  → Next.js (Vercel / local)
  → API Routes (upload, analyze, generate, runs, stitch)
  → Redis (BullMQ) → Ingest Worker on Vultr (ffmpeg, yt-dlp, audio-analysis)
  → Prisma + SQLite (local) / Postgres (prod)
  → Snowflake (analysis events, generations, reels) ← data warehouse
  → Vultr Object Storage / S3 (audio, spectrograms, assets)
  → Luma (video generation)
  → FFmpeg (stitch)
```
| Layer | Responsibility |
|---|---|
| Frontend | Next.js 14 (App Router), Tailwind, shadcn/ui, wavesurfer.js, Framer Motion |
| Auth | Clerk |
| API | Next.js API routes for tracks, songs, uploads, generations, runs, reels |
| Queue | BullMQ + Redis for track ingest (upload, YouTube) |
| Worker | workers/ingest-worker.ts — runs on Vultr; analysis, S3 upload, Snowflake write |
| Orchestrator | src/lib/runOrchestrator.ts — Luma job creation, polling, continuity |
| Storage | Vultr Object Storage or S3 (audio, spectrograms, media) |
| Data | Snowflake — analysis events, generations, featured reels, signatures |
Two pipelines combine metadata, lyrics, and audio/spectrogram features to produce generative-AI prompts:
Pipeline 1: Features API (`features/extract.py`, `features_app.py`)
| Step | Component | Data | Output |
|---|---|---|---|
| 1 | Whisper/faster-whisper | Audio | Segments with timestamps |
| 2 | `features/audio_analysis.py` (librosa) | Audio | BPM, beat times, RMS loudness curve |
| 3 | `features/lyrics_analysis.py` | Segments | Lyric lines, stats |
| 4 | `features/structure.py` | Lines + duration | Sections (verse/chorus) |
| 5 | LLM (`features/llm_tags.py`) | Metadata + lyric lines | Global style, per-line tags (emotion, topics, imagery, prompt_atoms) |
| 6 | `features/scene_plan.py` | LLM tags + RMS curve + structure | Scene plan (prompts per lyric line) |
Pipeline 2: Storyboard / Run (`spectrogram.py` + `src/lib/storyboardPrompts.ts`)
| Step | Component | Data | Output |
|---|---|---|---|
| 1 | `spectrogram.py` (librosa STFT) | Audio | Tempo, RMS, spectral centroid, onset peaks, frequency bands, sections |
| 2 | `services/audio_analysis/prompt_builder.py` | Spectrogram features + lyrics | PromptCandidate (style, mood, story, scene_arc, key_moments, visual_motifs) |
| 3 | `storyboardPrompts.ts` | PromptCandidate + shot plan | Per-shot prompts |
| 4 | LLM (optional, `useLlm: true`) | Deterministic prompts + context | Refined per-shot prompts |
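
A deterministic per-shot prompt (step 3 above) can be assembled roughly like this. The field names follow the PromptCandidate description; the template itself is an illustrative assumption, not the exact `storyboardPrompts.ts` logic:

```python
def shot_prompt(candidate: dict, shot_index: int, total_shots: int) -> str:
    """Compose a deterministic prompt for one shot from a PromptCandidate-like dict."""
    arc = candidate["scene_arc"]
    # Map the shot index onto the scene arc so shots progress through the story.
    beat = arc[min(shot_index * len(arc) // total_shots, len(arc) - 1)]
    motifs = ", ".join(candidate["visual_motifs"][:2])
    return (f"{candidate['style']} shot, {candidate['mood']} mood: {beat}. "
            f"Motifs: {motifs}.")
```

An optional LLM pass (step 4) would then rewrite each string for more varied phrasing while keeping the index-to-beat mapping.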
- Model: OpenAI GPT-4o-mini (configurable via `LLM_MODEL`, default `gpt-4o-mini`)
- Features API: One call per song — metadata + lyric lines → `global_style` + `per_line` tags (vibe_keywords, visual_style_preset, color_palette, prompt_atoms: setting/subject/action/mood/lighting)
- Storyboard refinement: Optional second call — style/mood/story + shot count → refined `{ index, prompt }` per shot
- Fallback: Stub/heuristic tags when `OPENAI_API_KEY` is missing
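
The fallback path can be sketched roughly like this — the `heuristic_tags` shape is illustrative, not the exact schema used by `features/llm_tags.py`:

```python
import os

def tag_lyric_lines(lines: list[str]) -> list[dict]:
    """Return per-line tags; fall back to heuristic stubs when no API key is set."""
    if not os.environ.get("OPENAI_API_KEY"):
        # Stub path: trivial keyword-based tags so the pipeline keeps working offline.
        return [
            {"line": line,
             "emotion": "melancholy" if "rain" in line.lower() else "neutral",
             "prompt_atoms": {"mood": "moody" if "rain" in line.lower() else "calm"}}
            for line in lines
        ]
    # Real path would call the LLM here (omitted in this sketch).
    raise NotImplementedError("LLM call not included in this sketch")
```

The point of the stub is degraded-but-deterministic output, so storyboard generation never hard-fails on a missing key.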
Extracted via librosa (STFT, `n_fft=2048`, `hop_length=512`):

| Feature | Description |
|---|---|
| `rhythm.tempo_bpm` | Detected tempo |
| `rhythm.beat_times_s` | Beat timestamps (capped) |
| `energy.rms_summary` | Mean/min/max/percentiles of RMS |
| `energy.dynamic_range_rms_p95_minus_p5` | Dynamic range |
| `brightness.spectral_centroid_hz_summary` | Spectral brightness |
| `moments.peak_moments_s` | Top-k onset peaks (configurable) |
| `frequency_bands.band_energy_ratios` | Sub-bass, bass, mids, presence, air |
| `sections` | Structural segments from spectral changes |
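
For intuition, `energy.rms_summary` and the dynamic-range feature reduce to percentile statistics over a per-frame RMS curve. A dependency-free sketch under those assumptions (the real service derives RMS from librosa STFT frames):

```python
import math

def frame_rms(samples: list[float], frame: int = 2048, hop: int = 512) -> list[float]:
    """Per-frame RMS over a raw sample buffer (hop mirrors hop_length=512)."""
    rms = []
    for start in range(0, max(len(samples) - frame, 0) + 1, hop):
        window = samples[start:start + frame]
        rms.append(math.sqrt(sum(x * x for x in window) / len(window)))
    return rms

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile; enough for summary statistics."""
    ordered = sorted(values)
    idx = min(int(round(p / 100 * (len(ordered) - 1))), len(ordered) - 1)
    return ordered[idx]

def rms_summary(rms: list[float]) -> dict:
    """Summary stats matching the shape of energy.rms_summary."""
    return {
        "mean": sum(rms) / len(rms),
        "min": min(rms),
        "max": max(rms),
        "dynamic_range_rms_p95_minus_p5": percentile(rms, 95) - percentile(rms, 5),
    }
```

A track with a quiet verse and loud chorus shows a large p95 − p5 gap, which is exactly what the dynamic-range feature captures.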
- Scene plan (features pipeline): preset + vibe + setting + subject + action + mood + lighting + colors; `motion_intensity` from RMS at line start; chorus sections get a +0.2 intensity boost
- Storyboard (run pipeline): style + energy + brightness + motifs + moment_label; optional LLM refinement for more varied descriptions
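
The `motion_intensity` rule can be sketched as a lookup of normalized RMS at the lyric line's start time plus the chorus boost. The boost constant mirrors the description above; the peak normalization is an assumption:

```python
def motion_intensity(rms_curve: list[float], hop_s: float,
                     line_start_s: float, section: str,
                     boost: float = 0.2) -> float:
    """Normalized RMS at a line's start; chorus lines get +0.2, clamped to 1.0."""
    idx = min(int(line_start_s / hop_s), len(rms_curve) - 1)
    peak = max(rms_curve) or 1.0
    intensity = rms_curve[idx] / peak           # normalize to 0..1
    if section == "chorus":
        intensity += boost                      # chorus sections read as higher energy
    return min(intensity, 1.0)
```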
Prisma (app DB) — Users, tracks, runs, shots, jobs, assets.
Snowflake (data warehouse) — Analysis events, generations, featured reels, signatures. All structured outputs (features, peaks, transcripts, prompts) are queryable for reproducibility and analytics.
| Prisma | Snowflake |
|---|---|
| User, Track, Analysis, Moment | TRACK_ANALYSIS_EVENTS (features_json, peaks_json, spectrogram_url) |
| Generation, Reel | GENERATIONS (media_url, type, metadata) |
| GenerationRun, Shot, ProviderJob, Asset | — |
| CanonicalSong, SongDraft | SONGS_2025 (prod seed) |
| Signature | Signatures / similarity index (planned) |
1. Track Ingest
- User uploads file or provides YouTube URL
- API creates Track, enqueues BullMQ job
- Worker (Vultr): downloads (yt-dlp) or processes file, calls audio-analysis service
- Spectrogram → Vultr Object Storage / S3; features, peaks → Snowflake (`TRACK_ANALYSIS_EVENTS`); Track → READY
2. Music Video Generation
- User starts run from Create or song page (track/song draft)
- `POST /api/runs` → `getAudioFeaturesForRun` → `generateStoryboardPrompts` → `buildStoryboardAndShots`
- Shots + ProviderJobs created; `processRun` submits Luma jobs with continuity
- Luma callback or polling updates ProviderJob status; Assets store video URLs
- `POST /api/runs/[runId]/stitch` concatenates clips, overlays audio, returns the final video
3. Storyboard / Image Generation
- `POST /api/tracks/[id]/generate` or `/api/generate` (mock)
- Generations stored with resolution; linked to Reels for feed
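
The stitch step in flow 2 can be sketched as an FFmpeg concat-demuxer invocation built in Python. Paths and codec flags here are illustrative; the real stitcher lives in `src/lib/` on the TypeScript side:

```python
def build_stitch_command(clips: list[str], audio_path: str,
                         out_path: str, list_file: str = "clips.txt"):
    """Build an ffmpeg argv that concatenates shot clips and overlays the track audio."""
    # The concat demuxer reads "file '<path>'" lines from a list file.
    entries = "\n".join(f"file '{c}'" for c in clips)
    argv = [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_file,  # stitched shot clips
        "-i", audio_path,                               # original track audio
        "-map", "0:v", "-map", "1:a",                   # video from clips, audio from track
        "-c:v", "libx264", "-shortest", out_path,
    ]
    return entries, argv
```

`-shortest` trims the output to the shorter of video and audio, which keeps a short trailer from trailing silent video.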
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/tracks/upload` | POST | Multipart upload, enqueue ingest |
| `/api/tracks/import-youtube` | POST | YouTube URL → enqueue yt-dlp + ingest |
| `/api/tracks/[id]` | GET | Track status, analysis metadata |
| `/api/tracks/[id]/analyze` | POST | Trigger spectrogram analysis (or use worker) |
| `/api/runs` | POST | Create generation run (track/song draft + duration) |
| `/api/runs/[id]` | GET | Run status, shot progress |
| `/api/runs/[id]/stitch` | POST | Concatenate clips, overlay audio, burn lyrics |
| `/api/providers/luma/callback` | POST | Luma webhook (optional) |
| Service | Purpose |
|---|---|
| Audio Analysis (`/analyze`, `/build-prompt`) | Spectrogram, features; prompt candidates from audio + lyrics |
| Features API (`/songs/upload`, `/songs/{id}/process`, `/songs/{id}/features`) | Transcribe, LLM tags, scene plan, Snowflake write |
| Luma API | Video generation from text prompts |
- Queue: `track-ingest`
- Job types: `processUpload`, `downloadYoutubeAudio`
- Retries: 3 attempts, exponential backoff (2 s base)
- Retention: last 100 completed jobs
- Connection: Redis (from `REDIS_URL`)
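
With a 2-second base and 3 attempts, exponential backoff yields roughly 2 s and then 4 s between the two retries. A sketch of that schedule (the exact BullMQ formula may differ slightly in rounding):

```python
def backoff_delays(base_s: float = 2.0, attempts: int = 3) -> list[float]:
    """Delay before each retry: base * 2^n. Three attempts means up to two retries."""
    return [base_s * (2 ** i) for i in range(attempts - 1)]
```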
| Mode | Audio | Spectrograms | Notes |
|---|---|---|---|
| `local` | `UPLOADS_DIR` / volume | Local disk | Default; single-node |
| `s3` | Vultr Object Storage / S3 | S3 bucket | Multi-node; set `S3_*` env vars |
- Worker: Run multiple `npm run worker` instances; BullMQ distributes jobs
- Audio Analysis / Features API: Stateless; scale behind load balancer
- Next.js: Stateless; scale behind nginx/load balancer
| Config | Default | Env |
|---|---|---|
| Upload size | 50 MB | MAX_UPLOAD_MB |
| Analysis timeout | 120 s | ANALYSIS_TIMEOUT_S |
| Nginx client body | 50 M | client_max_body_size |
| Proxy read/send | 150 s | nginx.conf |
- Audio analysis — CPU-bound (librosa); long tracks (5+ min) may hit the timeout; consider raising `ANALYSIS_TIMEOUT_S`
- Luma API — external rate limits; the run orchestrator polls; continuity mode creates jobs sequentially
- Transcription — faster-whisper is GPU-acceleratable; CPU-only on smaller VMs
- Stitch — FFmpeg CPU-bound; large runs (10+ shots) may take 1–2 min
| Deployment | CPU | RAM | Notes |
|---|---|---|---|
| Local dev | 2 | 4 GB | SQLite, single worker |
| Small prod | 4 | 8 GB | Docker Compose, 1 worker |
| Production | 8+ | 16 GB+ | Separate worker nodes, S3, Snowflake |
| Category | Technology |
|---|---|
| Cloud Compute | Vultr (VM, Docker, Object Storage) |
| Data Warehouse | Snowflake (analysis events, generations, reels) |
| Framework | Next.js 14 (App Router) |
| Language | TypeScript |
| UI | React 18, Tailwind CSS, shadcn/ui, Framer Motion |
| Auth | Clerk |
| DB | Prisma + SQLite (dev) |
| Queue | BullMQ, Redis |
| Storage | Vultr Object Storage, S3-compatible (AWS SDK) |
| Video | Luma API, FFmpeg |
| Waveform | wavesurfer.js |
| External | yt-dlp, ffmpeg, ffprobe |
- Node.js 18+
- Redis (for ingest jobs)
- ffmpeg, ffprobe (for audio/video)
- yt-dlp (optional, for YouTube import)
```bash
# Install dependencies
npm install

# Copy environment
cp .env.example .env
# Edit .env: DATABASE_URL, REDIS_URL, Clerk keys

# Initialize database
npm run db:push

# Optional: seed data
npm run db:seed
# Or: npm run db:seed-songs-2025
```

```bash
# Terminal 1: Next.js dev server
npm run dev

# Terminal 2: Ingest worker (processes uploads, YouTube, analysis)
npm run worker
```

Open http://localhost:3000.
Optional — Features API (transcribe, LLM tags, scene plan):

```bash
pip install -r features_requirements.txt
uvicorn features_app:app --host 0.0.0.0 --port 8000
# Set API_BACKEND_URL=http://localhost:8000 or NEXT_PUBLIC_API_BASE_URL
```

Optional — Audio Analysis (spectrogram, build-prompt): see `services/audio_analysis/`, or set `AUDIO_ANALYSIS_SERVICE_URL` to point at a deployed instance.
Prerequisites: Docker, Docker Compose; Vultr VM or any Linux host.
Vultr VM setup:
```bash
ssh root@YOUR_VULTR_IP
curl -fsSL https://get.docker.com | sh
systemctl enable docker && systemctl start docker
apt-get update && apt-get install -y docker-compose-plugin
```

Deploy:

```bash
cd deploy
cp .env.example .env
# Edit .env: CLERK keys, SNOWFLAKE_*, S3_*, OPENAI_API_KEY
# Run the Snowflake DDL: deploy/snowflake.sql (in a Snowflake worksheet)
docker compose up -d --build
```

Docker services (behind nginx on port 80):
| Service | Internal Port | Role |
|---|---|---|
| nginx | 80 | Reverse proxy; routes /, /audio-analysis/, /features-api/ |
| web | 3000 | Next.js |
| audio-analysis | 8001 | Spectrogram + features (FastAPI) |
| features-api | 8000 | Transcript, LLM tags, director (FastAPI) |
| redis | 6379 | BullMQ |
| worker | — | BullMQ ingest (ffmpeg, yt-dlp, analysis calls) |
Nginx routing (deploy/nginx.conf):
- `/` → web:3000
- `/audio-analysis/*` → audio-analysis:8001
- `/features-api/*` → features-api:8000
- `client_max_body_size 50M`; `proxy_read_timeout 150s`
Volumes: `redis_data`, `app_data` (SQLite + uploads), `audio_artifacts`
Health checks:
```bash
curl http://localhost/                        # Next.js
curl http://localhost/audio-analysis/health
curl http://localhost/features-api/docs       # Features API Swagger
```

Troubleshooting:

- Worker failing → `docker compose ps`; ensure Redis is healthy
- Analysis timeout → increase `ANALYSIS_TIMEOUT_S` in `.env`
- DB errors → `docker compose exec web npx prisma db push`
- Upload 404 → check `UPLOADS_DIR` and the `app_data` volume
S3 (Vultr Object Storage):
- Create a bucket in Vultr Object Storage
- Set `S3_BUCKET`, `S3_ENDPOINT`, `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY` in `.env`
- Set `STORAGE_MODE=s3` for audio; spectrograms go to S3 when configured
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | Prisma DB URL (`file:./dev.db` for SQLite) |
| `REDIS_URL` | Yes | Redis for BullMQ (`redis://localhost:6379`) |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | Yes | Clerk publishable key |
| `CLERK_SECRET_KEY` | Yes | Clerk secret key |
| `APP_URL` | No | Base URL for worker callbacks (e.g. `https://yourdomain.com`) |
| `AUDIO_ANALYSIS_SERVICE_URL` | No | Audio analysis API (in Docker: `http://nginx/audio-analysis`) |
| `FEATURES_API_URL` | No | Features API (in Docker: `http://nginx/features-api`) |
| `STORAGE_MODE` | No | `local` or `s3` |
| `S3_BUCKET`, `S3_ENDPOINT`, `S3_ACCESS_KEY_ID`, `S3_SECRET_ACCESS_KEY` | S3 | Vultr Object Storage or S3 |
| `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD` | Prod | Snowflake — analysis events, generations, reels |
| `SNOWFLAKE_WAREHOUSE`, `SNOWFLAKE_DATABASE`, `SNOWFLAKE_SCHEMA` | No | Defaults: `COMPUTE_WH`, `SPECTRA`, `APP` |
| `OPENAI_API_KEY` | LLM | For LLM tags; stub used if missing |
| `LLM_MODEL` | No | Default `gpt-4o-mini` |
| `LUMA_API_KEY` | No | Luma API for video generation |
| `MAX_UPLOAD_MB` | No | Default 50 |
| `ANALYSIS_TIMEOUT_S` | No | Default 120 |
Create run from track:
```bash
curl -X POST http://localhost:3000/api/runs \
  -H "Content-Type: application/json" \
  -H "Cookie: <auth-cookie>" \
  -d '{"trackId":"TRACK_ID","durationSeconds":15,"stylePreset":"cinematic","mode":"continuity"}'
```

Create run from song input (URL + metadata):
```bash
curl -X POST http://localhost:3000/api/runs \
  -H "Content-Type: application/json" \
  -H "Cookie: <auth-cookie>" \
  -d '{
    "songInput": {
      "audioSource": {"type":"URL","url":"https://example.com/track.mp3"},
      "lyrics": {"raw":"Verse lyrics..."},
      "metadata": {"title":"Song","artist":"Artist"}
    },
    "durationSeconds":12,
    "stylePreset":"noir"
  }'
```

Features API (upload → process → features):
```bash
curl -X POST -F "file=@song.mp3" -F "title=My Song" -F "artist=Me" http://localhost:8000/songs/upload
curl -X POST http://localhost:8000/songs/{song_id}/process
curl http://localhost:8000/songs/{song_id}/features
```

| Script | Description |
|---|---|
| `npm run dev` | Start Next.js dev server |
| `npm run build` | Production build |
| `npm run start` | Start production server |
| `npm run worker` | Start ingest worker |
| `npm run db:push` | Push Prisma schema |
| `npm run db:seed` | Seed database |
| `npm run db:studio` | Open Prisma Studio |
| `npm run db:seed-songs-2025` | Seed Top 25 2025 songs (local) |
| `npm run db:seed-songs-2025-snowflake` | Seed Top 25 2025 songs into Snowflake |
| `npm run db:seed-generations-snowflake` | Seed Snowflake GENERATIONS table |
| `npm run db:test-snowflake` | Test Snowflake connection |
| `npm run build:viva-demo` | Build Viva music video demo |
```
├── app/                  # Next.js App Router
│   ├── (landing)/        # Landing page
│   ├── (main)/           # Dashboard, create, library, discover, reels, video, storyboard, timeline
│   ├── auth/             # Clerk auth routes
│   ├── api/              # API routes (tracks, runs, providers, etc.)
│   └── demo/             # Demo pages
├── components/           # React components (landing, create, processing, ui)
├── deploy/               # Docker, nginx, Vultr deployment
│   ├── Dockerfile.*      # web, worker, features, audio-analysis
│   ├── docker-compose.yml
│   ├── nginx.conf
│   └── snowflake.sql
├── features/             # Python features API (transcribe, LLM tags, scene plan)
│   ├── extract.py        # Main pipeline: audio → lyrics → LLM → scene_plan
│   ├── llm_tags.py       # GPT-4o-mini lyric tagging
│   ├── scene_plan.py     # Prompts from LLM + RMS
│   └── audio_analysis.py # librosa (BPM, beats, RMS)
├── lib/                  # DB, auth, audio-analysis, storage, snowflake
├── services/
│   └── audio_analysis/   # FastAPI spectrogram service
│       ├── app.py
│       ├── analysis_runner.py
│       └── prompt_builder.py
├── spectrogram.py        # librosa STFT, peaks, bands, sections
├── src/
│   ├── lib/              # runOrchestrator, storyboardPrompts, shotPlanner, stitcher
│   ├── providers/        # Luma adapter
│   └── schemas/          # Zod schemas
├── workers/              # Ingest worker (BullMQ)
├── prisma/
│   ├── schema.prisma
│   └── seed.ts
├── scripts/              # Seed, build demo
└── server/               # Queue definition
```
- Auth: Clerk handles sign-in; API routes check `auth()` for protected endpoints
- Uploads: file-type validation (audio extensions); size limit via `MAX_UPLOAD_MB`
- Secrets: `OPENAI_API_KEY`, `CLERK_SECRET_KEY`, `SNOWFLAKE_*`, `S3_*` — never commit; use `.env`
- CORS: Next.js is same-origin by default; configure for custom domains if needed
- Storage: S3 presigned URLs or a public bucket; ensure the bucket policy restricts writes
| Doc | Contents |
|---|---|
| `deploy/README.md` | Vultr VM, Docker Compose, troubleshooting |
| `features/README.md` | Features API env vars, Snowflake tables |
| `lyrics_ts/README.md` | Transcription (faster-whisper) |
ISC