Inspiration

We all love singing with friends (privately!). Karaoke has always been a favorite pastime of ours, and we wanted to expand that hobby to the digital world.

While exploring existing digital karaoke platforms, we realized there was an opportunity to build something more interactive and safer. We wanted users not only to sing their favorite songs, but also to creatively remix them. To do this, we integrated AI-powered lyric transformation and allowed users to tweak lyrics for humor, personalization, and safety.

Safety was a core design principle. Because digital spaces should be inclusive for younger audiences as well, our customizable lyrics provide content moderation and transformation options, ensuring that explicit content can be filtered or rewritten appropriately. Our goal was to combine creativity, safety, and social connection all in one cohesive platform.

What We Learned

Technical Skills

Through building this project, we gained hands-on experience with:

  • Supabase for database management and authentication.
  • LRC file formatting to synchronize lyrics with audio timing.
  • WebSockets to power real-time karaoke lobbies.
  • ElevenLabs for high-quality AI voice generation.
  • Google OAuth for secure and seamless authentication.
  • API orchestration and media-processing pipelines.

Team & Product Skills

Beyond the technical stack, we learned how to function as an effective engineering team by:

  • Delegating responsibilities based on individual strengths.
  • Writing clear product requirements before implementation.
  • Planning system architecture before touching code.
  • Integrating multiple complex services/API calls into a cohesive pipeline.

Strong communication helped us develop strong code.


How We Built Our Project

Our system combines multiple technologies into a cohesive karaoke platform:

Next.js

We used Next.js to build both:

  • The frontend interface (interactive lobbies, lyric editor, playback UI).
  • Backend API routes for securely orchestrating AI and audio-processing calls.

This framework simplified deployment and reduced architectural overhead.
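As a rough illustration of the server-side orchestration idea (the file path, handler shape, and placeholder transform below are ours, not the project's actual code), an App Router route handler keeps API keys off the client and validates input before calling downstream services:

```typescript
// Illustrative sketch of app/api/remix/route.ts.
// Keeping this on the server means Gemini/ElevenLabs keys never reach the browser.
export async function POST(req: Request): Promise<Response> {
  const { lyrics, style } = await req.json();
  if (typeof lyrics !== "string" || !lyrics.trim()) {
    return new Response(JSON.stringify({ error: "lyrics required" }), {
      status: 400,
      headers: { "Content-Type": "application/json" },
    });
  }
  // Placeholder transform: in the real pipeline this is where the AI and
  // audio-processing calls would be orchestrated.
  const remixed = `[${style ?? "parody"}] ${lyrics}`;
  return new Response(JSON.stringify({ remixed }), {
    headers: { "Content-Type": "application/json" },
  });
}
```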


Supabase

Supabase handled:

  • User authentication and session management.
  • Storage of user-generated lyric variations.
  • Lobby state persistence.

Its real-time capabilities also complemented our WebSocket implementation for lobby updates.


Google Gemini 2.5 Flash API

We used Google Gemini 2.5 Flash to transform lyrics based on user input. Whether users wanted:

  • A funny parody,
  • A thematic rewrite,
  • Or explicit-content moderation,

Gemini handled controlled text transformation with structured prompts.
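A sketch of how such a structured prompt might be assembled (the request body follows the shape of Google's `generateContent` REST API; the mode names and instruction wording are illustrative, not our exact prompts):

```typescript
// Builds the JSON body for a generateContent call to gemini-2.5-flash.
// Mode names and instructions are examples of "structured prompts".
type RemixMode = "parody" | "theme" | "clean";

function buildGeminiRequest(lyrics: string, mode: RemixMode, theme?: string) {
  const instructions: Record<RemixMode, string> = {
    parody: "Rewrite these lyrics as a humorous parody. Keep the syllable count per line.",
    theme: `Rewrite these lyrics around the theme "${theme ?? "friendship"}". Keep the rhyme scheme.`,
    clean: "Rewrite these lyrics with explicit content replaced by family-friendly wording.",
  };
  return {
    contents: [
      { role: "user", parts: [{ text: `${instructions[mode]}\n\nLyrics:\n${lyrics}` }] },
    ],
    generationConfig: { temperature: 0.8 },
  };
}
```

Constraining the rewrite (syllable count, rhyme scheme) is what keeps the transformed lyrics singable over the original melody.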


LRC API

To enable lyric timing synchronization, we used an LRC API to fetch timestamped lyrics in .lrc format. This allowed us to match text with audio playback for karaoke mode.
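The `.lrc` format is plain text with `[mm:ss.xx]` timestamp tags in front of each lyric line. A minimal parser sketch (helper names are ours) that turns it into sorted `{timeMs, text}` entries for playback:

```typescript
// Parses LRC text into timestamped lyric lines for karaoke rendering.
interface LrcLine {
  timeMs: number;
  text: string;
}

function parseLrc(lrc: string): LrcLine[] {
  const lines: LrcLine[] = [];
  const tag = /\[(\d{2}):(\d{2})(?:\.(\d{1,3}))?\]/g;
  for (const raw of lrc.split("\n")) {
    const text = raw.replace(tag, "").trim();
    tag.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = tag.exec(raw)) !== null) {
      const min = parseInt(m[1], 10);
      const sec = parseInt(m[2], 10);
      // Fractional part may be hundredths or thousandths of a second.
      const frac = m[3] ? parseInt(m[3].padEnd(3, "0"), 10) : 0;
      lines.push({ timeMs: (min * 60 + sec) * 1000 + frac, text });
    }
  }
  return lines.sort((a, b) => a.timeMs - b.timeMs);
}
```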


Tailwind CSS

Tailwind allowed us to rapidly prototype and maintain a clean, responsive UI. Given the real-time nature of the app, minimizing styling overhead helped us focus on system integration.


ElevenLabs

We used ElevenLabs to generate high-quality AI voice audio from modified lyrics. This allowed users to hear their parody versions performed dynamically.
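A sketch of the request we build server-side (the `/v1/text-to-speech/{voiceId}` endpoint and `xi-api-key` header follow ElevenLabs' public API; the model ID and voice settings here are placeholder assumptions):

```typescript
// Builds a fetch-ready request for ElevenLabs text-to-speech.
function buildTtsRequest(text: string, voiceId: string, apiKey: string) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text,
        model_id: "eleven_multilingual_v2", // assumed model choice
        voice_settings: { stability: 0.5, similarity_boost: 0.75 },
      }),
    },
  };
}
```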


Replicate (RVC Model)

To make the generated voice sound more like singing rather than plain speech, we used an RVC (Retrieval-Based Voice Conversion) model hosted on Replicate. This improved vocal realism and pitch adaptation.
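Replicate models are invoked by POSTing `{ version, input }` to its predictions endpoint; a sketch of the request we'd assemble (the version hash and input field names are placeholders for whichever RVC model is used):

```typescript
// Builds a fetch-ready request for a Replicate prediction.
function buildRvcRequest(audioUrl: string, version: string, token: string) {
  return {
    url: "https://api.replicate.com/v1/predictions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      // "audio" is a placeholder input name; each model defines its own schema.
      body: JSON.stringify({ version, input: { audio: audioUrl } }),
    },
  };
}
```

Predictions are asynchronous, so the response must then be polled (or delivered via webhook) until the converted audio is ready.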


Spleeter (via pip)

For karaoke mode, we used Spleeter to isolate instrumentals from full-track audio files. This allowed users to:

  • Remove original vocals,
  • Sing over clean instrumentals,
  • Or combine AI-generated vocals with background tracks.
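Spleeter's `separate` command with the `2stems` model splits a track into `vocals.wav` and `accompaniment.wav`; a sketch of how the CLI invocation could be assembled from a Node backend (the paths are examples):

```typescript
// Builds the argument list for the Spleeter CLI
// (e.g. to pass to spawnSync("spleeter", args) from node:child_process).
function spleeterArgs(inputPath: string, outDir: string): string[] {
  // 2stems yields vocals.wav and accompaniment.wav per input track.
  return ["separate", "-p", "spleeter:2stems", "-o", outDir, inputPath];
}
```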

Challenges We Faced

1. Cross-Branch Feature Integration

With multiple team members building different components simultaneously, merging branches became complex. Features that worked independently sometimes conflicted when integrated.


2. Building the Data & Audio Pipeline

Our pipeline looked roughly like: Original Lyrics --> AI Transformation --> Voice Generation --> Voice Conversion --> Audio Mixing

Each step introduced latency, formatting differences, and potential failure points. Ensuring consistent data flow across APIs required careful error handling and standardization.
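One way to sketch that error-handling idea (stage and function names are illustrative; the stage functions are injected so each external service can be stubbed):

```typescript
// A staged pipeline where every failure is wrapped with the stage name,
// making it traceable which external API broke the flow.
type Stage<I, O> = (input: I) => Promise<O>;

async function runStage<I, O>(name: string, stage: Stage<I, O>, input: I): Promise<O> {
  try {
    return await stage(input);
  } catch (err) {
    throw new Error(`pipeline stage "${name}" failed: ${(err as Error).message}`);
  }
}

async function remixPipeline(
  lyrics: string,
  stages: {
    transform: Stage<string, string>;      // Gemini lyric rewrite
    synthesize: Stage<string, Uint8Array>; // ElevenLabs TTS
    convert: Stage<Uint8Array, Uint8Array>; // RVC voice conversion
    mix: Stage<Uint8Array, Uint8Array>;    // combine with instrumental
  },
): Promise<Uint8Array> {
  const remixed = await runStage("transform", stages.transform, lyrics);
  const speech = await runStage("synthesize", stages.synthesize, remixed);
  const sung = await runStage("convert", stages.convert, speech);
  return runStage("mix", stages.mix, sung);
}
```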


3. Real-Time Synchronization

Maintaining synchronization across:

  • Multiple users in a lobby,
  • Audio playback,
  • Timed lyric rendering,

required careful WebSocket event design and time-state management.
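The core of the time-state idea can be sketched as follows (a simplified illustration, not our exact implementation): the lobby agrees on one server start timestamp, and each client derives the active lyric line locally instead of receiving per-line events.

```typescript
// Given the shared playback start time, find which lyric line is active now.
interface TimedLine {
  timeMs: number;
  text: string;
}

function activeLineIndex(lines: TimedLine[], startedAtMs: number, nowMs: number): number {
  const elapsed = nowMs - startedAtMs;
  let index = -1; // -1 means playback hasn't reached the first line yet
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].timeMs <= elapsed) index = i;
    else break; // lines are sorted by timestamp
  }
  return index;
}
```

Broadcasting only the start timestamp keeps lobby members in sync even when individual WebSocket messages arrive late.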

Overall

This project pushed us to combine AI, audio processing, and real-time networking into one cohesive platform. We definitely won’t be auditioning anytime soon, but we’re proud of the stage we built.
