💡 Inspiration

Most music ads feel like jarring interruptions. Whether it's a sudden volume spike or a total shift in genre, the transition usually breaks the listener's immersion. We built Flow to explore how generative AI can integrate promotional content into the musical context itself. By using AI inpainting, we can insert ads that match the "DNA" of a track, preserving the user experience while providing a flexible monetization tool.

🎨 What it does

Flow is a text-to-integrated-audio utility that automates the ad insertion process.

  • Upload: Accepts source audio in standard formats.
  • Prompt: Generates ad content based on natural language descriptions (e.g., "Upbeat coffee shop promo with acoustic guitar").
  • Inpaint: Uses MusicGPT to replace a specific time window of the song with generated audio that matches the original's tempo, key, and texture.
  • Export: Returns a production-ready MP3 with the ad seamlessly blended into the track.

🔧 How we built it

The system is built as an asynchronous service designed to bridge generative AI with traditional audio handling.

  • FastAPI & Uvicorn: Power the REST API layer and handle non-blocking request routing.
  • MusicGPT API: Provides the generative /v1/inpaint and /v1/MusicAI endpoints.
  • Async Job Management: A background task architecture that processes long-running AI operations (30s+) without blocking the main thread or timing out the client.
  • httpx & aiofiles: Manage non-blocking network communication with AI providers and asynchronous local file I/O.
  • pydantic-settings: Handles environment-based configuration for API keys and project constants.
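
The async job management described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `JobStore`, `Job`, and `JobState` are hypothetical names, the store is in-memory only, and the `work` coroutine stands in for the real call to the generation service.

```python
import asyncio
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, Dict, Optional, Set


class JobState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Job:
    id: str
    state: JobState = JobState.PENDING
    result: Optional[str] = None  # e.g. path to the exported MP3
    error: Optional[str] = None


class JobStore:
    """Hypothetical in-memory job registry; a real service would persist it."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._tasks: Set[asyncio.Task] = set()

    def submit(self, work: Callable[[], Awaitable[str]]) -> str:
        """Register a job, schedule it in the background, return its id at once.

        Must be called from inside a running event loop.
        """
        job = Job(id=uuid.uuid4().hex)
        self._jobs[job.id] = job
        task = asyncio.create_task(self._run(job, work))
        # Keep a reference so the task is not garbage-collected mid-flight.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job.id

    async def _run(self, job: Job, work: Callable[[], Awaitable[str]]) -> None:
        job.state = JobState.RUNNING
        try:
            job.result = await work()
            job.state = JobState.DONE
        except Exception as exc:  # record the failure instead of crashing
            job.error = str(exc)
            job.state = JobState.FAILED

    def get(self, job_id: str) -> Optional[Job]:
        return self._jobs.get(job_id)

    async def wait_all(self) -> None:
        """Wait for every in-flight job (useful in tests and on shutdown)."""
        if self._tasks:
            await asyncio.gather(*list(self._tasks))
```

In a web service, the submit step would typically live in the upload handler and return the job id immediately, while a separate status route calls `get` so the client can check progress without holding a connection open.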

💪 Challenges we ran into

  • Asynchronous Orchestration: AI generation is slow. We had to implement a robust polling and status-tracking architecture to handle MusicGPT's nested JSON payloads and variable job states.
  • Transition Physics: Seamless results depend heavily on the "replace window" timing. We spent significant time testing how much surrounding context the AI needs to maintain a song's rhythm.
  • Environment Stability: Standardizing the runtime for audio processing (Python 3.12+ with FFmpeg dependencies) was crucial for consistent output quality across different deployment environments.
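
The polling and status tracking mentioned above might look roughly like the sketch below. The payload shape is an assumption: the key path `("conversion", "status")` and the status strings are placeholders, not the documented MusicGPT response format, and `fetch_status` is a hypothetical coroutine that would wrap the real HTTP call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Sequence


def dig(payload: Any, path: Sequence[str], default: Any = None) -> Any:
    """Safely read a value from nested dicts; return default if any key is missing."""
    current = payload
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[Dict[str, Any]]],
    *,
    status_path: Sequence[str] = ("conversion", "status"),  # assumed shape
    done: Sequence[str] = ("COMPLETED",),                   # assumed values
    failed: Sequence[str] = ("FAILED", "ERROR"),            # assumed values
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until the job finishes, fails, or times out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        payload = await fetch_status()
        status = dig(payload, status_path)
        if status in done:
            return payload
        if status in failed:
            raise RuntimeError(f"job failed with status {status!r}")
        if loop.time() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        # A missing or unknown status is treated as "still in progress".
        await asyncio.sleep(interval)
```

Treating an unknown status as "still in progress" keeps the loop tolerant of intermediate states it has never seen, while the deadline guarantees it cannot spin forever.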

πŸ† Accomplishments that we're proud of

  • Zero-Asset Ad Creation: Developed a workflow that produces high-quality ad-integrated tracks with zero external audio files, only a text prompt.
  • Reliable Background Execution: Built a stable job-tracking system that survives the high latency of generative AI models.
  • End-to-End Automation: Created a clean path from raw upload to a fully processed, downloadable MP3.

📚 What we learned

  • Context is Key: The quality of an AI inpaint is highly dependent on the "buffers" (the audio before and after the ad window). Providing the right context is the difference between a jump-scare and a smooth transition.
  • Async by Default: For media generation, synchronous patterns are a dealbreaker. Learning to manage persistent job states was our most significant technical takeaway.
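
To make the "buffers" idea concrete, here is one way the replace window and its surrounding context could be computed. Snapping the window to bar boundaries and using four bars of context on each side are our assumptions for illustration, not values taken from the project; `plan_window` and `InpaintPlan` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InpaintPlan:
    context_start: float  # seconds: where the leading buffer begins
    replace_start: float  # seconds: where generated audio begins
    replace_end: float    # seconds: where generated audio ends
    context_end: float    # seconds: where the trailing buffer ends


def plan_window(
    track_len: float,
    start: float,
    end: float,
    *,
    bpm: float,
    beats_per_bar: int = 4,  # assumed 4/4 time
    context_bars: int = 4,   # assumed buffer size, tune by ear
) -> InpaintPlan:
    """Widen a requested ad window to whole bars and add context on both sides."""
    if not 0 <= start < end <= track_len:
        raise ValueError("need 0 <= start < end <= track_len")
    if bpm <= 0:
        raise ValueError("bpm must be positive")

    bar = beats_per_bar * 60.0 / bpm  # length of one bar in seconds
    # Expand outward to the nearest bar lines so the cut lands on a downbeat.
    replace_start = math.floor(start / bar) * bar
    replace_end = min(math.ceil(end / bar) * bar, track_len)

    buffer = context_bars * bar
    return InpaintPlan(
        context_start=max(0.0, replace_start - buffer),
        replace_start=replace_start,
        replace_end=replace_end,
        context_end=min(track_len, replace_end + buffer),
    )


# At 120 BPM in 4/4 one bar lasts 2 seconds, so a request for 61.0 to 74.5
# becomes a 60.0 to 76.0 replace window with context from 52.0 to 84.0.
plan = plan_window(180.0, 61.0, 74.5, bpm=120)
```

Clamping the buffers to the track boundaries means a window near the start or end of a song simply gets less context on that side instead of raising an error.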

🚀 What's next for Flow

  • User Accounts & Projects: Implementing a persistent database to allow users to save their work, manage multiple "Flows," and revisit previous generations.
  • Integrated Music Library: Building a searchable library of licensed tracks so users can select a song to monetize without needing to provide their own audio file.
