Inspiration

Anyone who has used a language app knows they are great for vocabulary and grammar. But what happens when you try to speak? You quickly realize that knowing the words is only half the battle. For many non-native English speakers, the struggle is not vocabulary or grammar but pronunciation: how to actually sound out words and soften an accent. Most apps can tell you whether your sentence is correct, but not how to say it. Accent bias remains a real source of prejudice, so we wanted to build something that targets this pain point. That gap made a perfect hackathon problem: a practical tool that zeroes in on the mechanics of speech, giving learners the specific, phoneme-level feedback they actually need to improve.

What it does

Parrot is an AI phonetics (pronunciation) training platform that uses Azure Speech Services to analyze pronunciation accuracy and Gemini AI to generate targeted practice sentences. Users practice with live lip tracking via MediaPipe FaceMesh, receive detailed phoneme-level (individual-sound) feedback, and track progress through interactive dashboards and data visualizations. Parrot provides an overall pronunciation score based on accuracy, fluency, intonation, and completeness. Not only is each word graded on accuracy, but so is every individual sound that makes up the word. If you are struggling with a specific sound, such as "th", Parrot recommends phrases to practice until you really get it down. The platform supports multiple languages with adaptive features: English users get full phoneme analysis, while other languages receive pronunciation scoring.
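The overall score described above can be sketched as a weighted blend of the four component scores. The weights and field names here are illustrative assumptions, not Parrot's actual formula:

```typescript
// Component scores from a pronunciation assessment, each on a 0-100 scale.
interface ComponentScores {
  accuracy: number;
  fluency: number;
  intonation: number;
  completeness: number;
}

// Hypothetical weights -- Parrot's real blend may differ.
const WEIGHTS: ComponentScores = {
  accuracy: 0.4,
  fluency: 0.3,
  intonation: 0.2,
  completeness: 0.1,
};

// Weighted average of the four components, rounded to a whole score.
function overallScore(s: ComponentScores): number {
  const total =
    s.accuracy * WEIGHTS.accuracy +
    s.fluency * WEIGHTS.fluency +
    s.intonation * WEIGHTS.intonation +
    s.completeness * WEIGHTS.completeness;
  return Math.round(total);
}
```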

How we built it

Built with Next.js 14, TypeScript, and Tailwind CSS, with Supabase for authentication and the database. Parrot combines Azure Speech Services for pronunciation assessment, Google Gemini AI for sentence generation, and MediaPipe FaceMesh for lip tracking, and uses Recharts and Plotly for data visualization.
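Azure's detailed assessment result includes a per-phoneme accuracy score for each word, which is what drives the "practice this sound" recommendations. A minimal sketch of surfacing the weakest sounds from such a result (the interface shapes are our simplified assumption, not Azure's exact JSON layout):

```typescript
// Simplified shape of phoneme-level detail in an assessment result.
interface PhonemeResult {
  phoneme: string;       // e.g. "th"
  accuracyScore: number; // 0-100
}
interface WordResult {
  word: string;
  phonemes: PhonemeResult[];
}

// Collect every phoneme whose worst score falls below `threshold`,
// ordered worst-first, so the app can generate practice sentences
// targeting those sounds.
function weakPhonemes(words: WordResult[], threshold = 60): string[] {
  const worst = new Map<string, number>();
  for (const w of words) {
    for (const p of w.phonemes) {
      const prev = worst.get(p.phoneme);
      if (prev === undefined || p.accuracyScore < prev) {
        worst.set(p.phoneme, p.accuracyScore);
      }
    }
  }
  return [...worst.entries()]
    .filter(([, score]) => score < threshold)
    .sort((a, b) => a[1] - b[1])
    .map(([phoneme]) => phoneme);
}
```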

Challenges we ran into

Azure Speech Services only supports phoneme-level assessment for English, which required conditional rendering and fallbacks for other languages. MediaPipe FaceMesh caused performance issues, which we solved with frame throttling and canvas optimization. Managing state across authentication and UI updates required custom events and async data loading. Chart-library compatibility issues led us to migrate from Plotly.js to Recharts.
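The frame-throttling fix can be illustrated with a small helper that gates how often the expensive FaceMesh call runs. The target FPS and structure are assumptions for illustration, not Parrot's exact code:

```typescript
// Gate expensive per-frame work (e.g. running FaceMesh on a video frame)
// to a target FPS. Returns a predicate that is true only when enough time
// has elapsed since the last processed frame.
function makeFrameGate(targetFps: number): (nowMs: number) => boolean {
  const minIntervalMs = 1000 / targetFps;
  let lastProcessed = -Infinity;
  return (nowMs: number) => {
    if (nowMs - lastProcessed >= minIntervalMs) {
      lastProcessed = nowMs;
      return true;
    }
    return false;
  };
}

// In a render loop, this would wrap the tracking call, e.g.:
//   const gate = makeFrameGate(15);
//   if (gate(performance.now())) await faceMesh.send({ image: video });
```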

Accomplishments that we're proud of

What we are most proud of is building something that genuinely tackles the problem we set out to solve. We fit in a respectable number of features, fleshed each of them out, and delivered it all in a relatively polished web application.

What we learned

Performance optimization is crucial when working with real-time computer vision APIs: frame throttling and efficient rendering are essential. We also learned a lot about managing complex state across multiple services. Working around AI service limitations forced us to build adaptive features that degrade gracefully.

What's next for Parrot: AI Phonetics Coach

The obvious next step is to expand phoneme-level feedback to other languages like French, Spanish, and Mandarin. More gamification features could also increase engagement.
