Inspiration

We were inspired by how difficult and isolating it can be to practice music alone. Many musicians don't have regular access to a band, and practicing along with full songs can feel unrealistic without proper control over the instruments. We wanted to create something that makes practice feel interactive, immersive, and real, almost like you're playing live with a band. At the same time, we saw an opportunity to combine AI and audio processing to solve a real problem in a creative way.

What it does

PitchPurrfect is an AI-powered music companion that allows users to play along with music as if they are part of a live band. Users can upload or select audio and perform alongside it, while the system analyzes their performance and provides feedback.

The platform detects tempo, extracts beat timing, and identifies when the user plays notes. It compares the user’s timing against the reference track to determine whether they are ahead, behind, or on time. In addition to rhythm analysis, the system performs chord detection using harmonic features, giving insight into the musical structure of the performance.

To enhance the experience, PitchPurrfect includes an AI coaching system powered by Gemini, which transforms raw performance data into natural, human-like feedback such as “You’re slightly behind the beat—try locking in with the drums.” This feedback is then converted into real-time voice using ElevenLabs, creating an interactive and engaging coaching experience that feels like practicing with a real instructor.

How we built it

We built PitchPurrfect using a modular architecture that integrates backend systems, audio intelligence, AI feedback, and frontend interaction into a unified pipeline.

The backend was developed using FastAPI, which handles file uploads, processes audio data, and exposes endpoints for analysis. This allowed seamless communication between the frontend and the audio processing engine.

For audio intelligence, we used Python and librosa to analyze sound signals. We implemented beat tracking to estimate tempo and extract beat timestamps from reference audio. We also built onset detection to determine when the user plays notes, enabling precise timing comparison against the reference track. These components allow the system to classify whether a user is ahead, behind, or on time.
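The ahead/behind/on-time classification can be sketched roughly as follows, assuming beat and onset timestamps (in seconds) have already been extracted, e.g. with `librosa.beat.beat_track` and `librosa.onset.onset_times`. The function name and the 50 ms tolerance are illustrative choices, not the project's actual values.

```python
import numpy as np

def classify_timing(beat_times, onset_times, tolerance=0.05):
    """Label each user onset relative to the nearest reference beat.

    beat_times, onset_times: sequences of timestamps in seconds.
    tolerance: how far (in seconds) an onset may sit from a beat
               and still count as "on time".
    """
    beats = np.asarray(beat_times, dtype=float)
    labels = []
    for onset in onset_times:
        # Find the reference beat closest to this onset.
        nearest = beats[np.argmin(np.abs(beats - onset))]
        delta = onset - nearest
        if abs(delta) <= tolerance:
            labels.append("on time")
        elif delta < 0:
            labels.append("ahead")   # played before the beat
        else:
            labels.append("behind")  # played after the beat
    return labels
```

Matching each onset to its nearest beat (rather than assuming a one-to-one ordering) keeps the comparison robust when the player skips or doubles notes.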

We extended the system further by implementing chord detection using chroma features, which capture pitch-class information from the audio. By comparing these features against chord templates, we were able to infer chord progressions and add harmonic awareness to the system.
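The template-matching idea can be illustrated with a small sketch: binary major and minor triad templates are scored against a 12-bin chroma vector (as produced by, e.g., `librosa.feature.chroma_cqt`) using a normalized dot product. The triad-only chord vocabulary and function names are simplifications for illustration.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary pitch-class templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        major = np.zeros(12)
        major[[root, (root + 4) % 12, (root + 7) % 12]] = 1  # root, maj 3rd, 5th
        minor = np.zeros(12)
        minor[[root, (root + 3) % 12, (root + 7) % 12]] = 1  # root, min 3rd, 5th
        templates[NOTE_NAMES[root]] = major
        templates[NOTE_NAMES[root] + "m"] = minor
    return templates

def detect_chord(chroma_vector):
    """Return the chord label whose template best matches a 12-bin chroma frame."""
    chroma = np.asarray(chroma_vector, dtype=float)
    chroma = chroma / (np.linalg.norm(chroma) + 1e-9)
    best, best_score = None, -1.0
    for name, tpl in chord_templates().items():
        score = float(chroma @ (tpl / np.linalg.norm(tpl)))
        if score > best_score:
            best, best_score = name, score
    return best
```

Running this over successive chroma frames (and smoothing the labels over time) yields the inferred chord progression.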

To make the experience more interactive and human-like, we integrated the Gemini API to generate intelligent coaching feedback based on the user’s performance metrics. This transforms raw numerical outputs into meaningful insights. We then used ElevenLabs to convert this feedback into natural-sounding speech, allowing the system to “talk” to the user in real time.
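The first step of that pipeline, turning numeric timing metrics into an LLM prompt, might look something like this sketch. The metric names and prompt wording are assumptions for illustration; the resulting string would be sent to the Gemini API, and the reply passed to ElevenLabs for speech synthesis.

```python
# Illustrative only: how raw performance metrics might become a coaching prompt.
# Metric keys and phrasing are hypothetical, not the project's actual schema.
def build_coaching_prompt(metrics):
    return (
        "You are a friendly music coach. In one or two sentences, give the "
        "player encouraging, specific feedback based on these stats: "
        f"{metrics['on_time_pct']}% of notes on time, "
        f"{metrics['ahead_pct']}% early, {metrics['behind_pct']}% late, "
        f"average offset {metrics['avg_offset_ms']:+.0f} ms."
    )
```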

On the frontend, we used modern web technologies to build an interface where users can upload audio, select instruments, and view feedback. The frontend communicates with the backend via API calls, sending audio data and receiving structured results that are presented both visually and through audio.

Challenges we ran into

One of the biggest challenges was achieving reliable beat detection across different types of audio. Not all audio files have clear rhythmic patterns, which sometimes resulted in inaccurate tempo estimation or missing beat data. We had to experiment with different inputs and refine our approach to improve consistency.

Another challenge was aligning user input with the reference track. Timing differences can be very subtle, so even small inaccuracies in onset detection or beat tracking could affect the feedback. Ensuring synchronization between these components required careful debugging.

Integrating multiple systems, including audio processing, backend APIs, AI feedback generation, and voice synthesis, also introduced complexity. Coordinating these components and ensuring smooth data flow across the pipeline required strong collaboration and careful system design.

Accomplishments that we're proud of

We are proud of building a complete end-to-end system that transforms raw audio into meaningful, real-time feedback. Successfully integrating beat tracking, onset detection, timing comparison, and chord detection into a single pipeline was a major accomplishment.

We are especially proud of the AI coaching feature, which uses Gemini to generate intelligent feedback and ElevenLabs to deliver it through natural voice. This elevates the project from a technical tool to an engaging user experience.

Most importantly, we created a system that feels like a real product—one that musicians could genuinely use to improve their skills.

What we learned

Through this project, we gained hands-on experience with audio signal processing, including beat tracking, onset detection, tempo estimation, and chroma-based chord analysis. Working with real-world audio taught us how inconsistent and complex sound data can be, and how important it is to design systems that can handle noise and variation.

On the engineering side, we learned how to build and structure APIs using FastAPI, manage data flow between frontend and backend systems, and integrate multiple components into a single pipeline. We also gained experience working with AI services, particularly using the Gemini API to transform raw performance data into meaningful, human-readable feedback.

One of the most impactful learning experiences was working with ElevenLabs for the first time. We learned how to take dynamically generated text and convert it into realistic, natural-sounding speech, which added an entirely new layer of interactivity to our project. Understanding how to integrate voice synthesis into a real-time system challenged us to think beyond traditional UI design and consider user experience in a more immersive way. It showed us how powerful voice AI can be in making applications feel more human and engaging, and it opened our eyes to new possibilities for building conversational and interactive systems in the future.

Overall, this project helped us grow both technically and creatively, teaching us how to combine AI, audio processing, and user experience into a cohesive and impactful product.

What's next for PitchPurrfect

Moving forward, we want to make the system fully real-time so users can receive instant feedback while playing. We also plan to improve the accuracy of chord detection and expand it to evaluate whether the user is playing the correct chords.

We aim to enhance the AI coaching system by making feedback more personalized and adaptive to different skill levels. Additionally, we want to improve the user interface and add visualizations that make performance insights easier to understand.

In the long term, we envision PitchPurrfect becoming a full platform for music practice and learning, combining AI-driven feedback, real-time interaction, and collaborative features to redefine how musicians practice.

Built With

Python · FastAPI · librosa · Gemini API · ElevenLabs
