Inspiration
Millions of people with hearing loss or speech difficulties struggle not because sound is missing, but because feedback is missing. When a person speaks, they cannot feel whether their lips, tongue, and airflow are forming the correct phonemes. Most speech therapy tools are visual or audio-based. We asked a different question:
What if speech could be felt?
HapticPhonix was inspired by the idea that vibration is the only truly universal feedback channel: it works even when sound and vision fail. By converting speech, lip movement, and phonemes into real-time haptic patterns, we create a new sensory pathway for learning to speak and understand speech.
What it does
HapticPhonix is a real-time speech learning system that combines:
- AI-powered lip reading
- Speech-to-haptic translation
- Phoneme-based vibration lessons
A student opens the mobile web app and sees their own face, lip landmarks, and phoneme feedback. As they speak, their phone vibrates in patterns that represent how their voice and mouth are moving. This allows students to feel their own speech in real time, creating immediate tactile feedback for learning proper phoneme formation.
How we built it
HapticPhonix is a real-time system built around two synchronized pipelines:
- Visual Lip Analysis: The student's phone streams camera frames to our backend over WebSockets. We use MediaPipe Face Mesh to track lip landmarks and crop the mouth region. These frames are sent to Gemini Vision, which performs AI lip reading, phoneme extraction, and confidence scoring.
- Phoneme & Lesson Engine: Lessons are stored as JSON phoneme timelines. A background engine advances through the lesson and triggers vibration patterns whenever a phoneme becomes active. This allows students to physically feel when to form sounds like "B", "M", or "S". As students speak, their voice is analyzed for loudness and rhythm, which is converted into vibration intensity, providing immediate haptic feedback on their speech patterns. All communication happens over WebSockets, enabling real-time feedback with minimal latency.
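The lesson engine described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the timeline schema, the class name, and the per-phoneme patterns (millisecond on/off durations, as a browser vibration API would consume them) are all assumptions.

```python
import json

# Illustrative vibration patterns (ms on/off pairs); the real
# phoneme-to-pattern mapping in HapticPhonix may differ.
PHONEME_PATTERNS = {
    "B": [80, 40, 80],         # two short plosive pulses
    "M": [200],                # one sustained nasal buzz
    "S": [30, 30, 30, 30, 30], # rapid fricative flutter
}

class PhonemeLessonEngine:
    """Advances through a JSON phoneme timeline and emits vibration patterns."""

    def __init__(self, lesson_json: str):
        # Assumed lesson schema: [{"phoneme": str, "start": float, "end": float}, ...]
        self.timeline = json.loads(lesson_json)
        self.cursor = 0

    def tick(self, t: float):
        """Return (phoneme, pattern) for any phoneme active at time t (seconds)."""
        # Skip past entries that have already ended.
        while self.cursor < len(self.timeline) and self.timeline[self.cursor]["end"] < t:
            self.cursor += 1
        if self.cursor < len(self.timeline):
            entry = self.timeline[self.cursor]
            if entry["start"] <= t <= entry["end"]:
                return entry["phoneme"], PHONEME_PATTERNS.get(entry["phoneme"], [100])
        return None, []

lesson = json.dumps([
    {"phoneme": "B", "start": 0.0, "end": 0.5},
    {"phoneme": "M", "start": 0.6, "end": 1.2},
])
engine = PhonemeLessonEngine(lesson)
print(engine.tick(0.2))   # → ('B', [80, 40, 80])
print(engine.tick(0.55))  # → (None, []) — between phonemes
```

In the real system a loop like this would run server-side and push each active pattern to the client over the WebSocket connection.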
Challenges we ran into
- Latency: We built a parallel RMS-based vibration path to provide instant feedback while AI transcription runs, ensuring students feel their speech in real time.
- Lip reading: No off-the-shelf lip reading model works well on live webcam video, so we engineered a hybrid system using MediaPipe for geometry and Gemini for interpretation.
- Mobile haptics: Browsers limit vibration APIs, so we had to design compact vibration patterns that still convey speech rhythm and intensity.
- Real-time sync: Keeping the visual feedback, haptic patterns, and phoneme analysis perfectly synchronized required careful WebSocket management.
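To illustrate the parallel RMS path mentioned above, here is a minimal sketch of loudness-to-vibration mapping. The 16-bit mono PCM frame format, the noise-gate threshold, and the mapping constants are assumptions for illustration, not our production values.

```python
import math
import struct

def rms_of_frame(pcm_bytes: bytes) -> float:
    """RMS loudness of a 16-bit little-endian mono PCM frame, normalized to 0..1."""
    n = len(pcm_bytes) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm_bytes[: n * 2])
    mean_sq = sum(s * s for s in samples) / n
    return math.sqrt(mean_sq) / 32768.0

def rms_to_vibration_ms(rms: float, max_ms: int = 200) -> int:
    """Map normalized loudness to a pulse duration a browser vibration API
    could play back; quiet frames produce no pulse."""
    if rms < 0.02:  # noise gate (threshold is an assumed value)
        return 0
    return min(max_ms, int(rms * max_ms))

# A near-full-scale frame maps close to the maximum pulse length:
loud = struct.pack("<4h", 32000, -32000, 32000, -32000)
print(rms_to_vibration_ms(rms_of_frame(loud)))  # → 195
```

Because this path needs no model inference, it can run on every audio chunk and keep vibrating while the slower AI transcription catches up.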
Accomplishments that we're proud of
- We created a working real-time lip reading pipeline using live camera video.
- We built speech-to-haptics, allowing someone to feel their own voice as they speak.
- We implemented dual haptic channels: pitch feedback and phoneme pattern guidance.
- We designed a phoneme lesson engine that turns speech training into tactile patterns.
- We built a fully functional accessibility platform that works on mobile devices.
What we learned
What's next for HapticPhonix
Built With
- clerk
- elevenlabs-api
- fastapi
- google-gemini-vision-api
- json
- laryngeal
- mediapipe-face-mesh
- next.js-16
- phonemeengine
- pyaudio
- python
- react-19
- tailwind-css-4
- typescript
- uvicorn
- web-audio-api
- websockets
- yilan
