Inspiration
presenceAI sprang from a simple observation: most public-speaking and interview-prep apps focus only on vocal delivery, ignoring the body language and facial cues that carry just as much weight.
What it does
We close that gap with a web platform where users rehearse live on camera while three pipelines work in tandem: OpenAI Whisper for speech metrics, MediaPipe pose landmarks for posture and gestures, and OpenCV facial-expression analysis for emotional tone. Together they produce clear scores and trend graphs for measurable progress.
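As an illustration of how three per-modality results might roll up into one session score, here is a minimal sketch. The 0-100 scale and the weights are assumptions for illustration, not presenceAI's actual scoring formula.

```python
def overall_score(speech: float, posture: float, expression: float,
                  weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted average of the three per-modality scores (each 0-100).

    The weights are illustrative; a real system would tune them
    against rated reference videos.
    """
    scores = (speech, posture, expression)
    for s in scores:
        if not 0.0 <= s <= 100.0:
            raise ValueError("scores must be in [0, 100]")
    return round(sum(w * s for w, s in zip(weights, scores)), 1)
```

Logging this value per session is enough to drive the trend graphs.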
How we built it
React, TypeScript, Vite, and Tailwind drive the UI; Node.js and Python micro-services run Whisper, MediaPipe, and OpenCV in Web Workers and store metrics in MongoDB. We also used Gemini to process the voice, body-language, and facial-expression assessments from the user's video and to surface insights back to them.
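A hypothetical sketch of how the three assessments could be flattened into a single prompt for Gemini; the field names, modality keys, and prompt wording here are illustrative, not presenceAI's actual schema or prompt.

```python
import json

def build_insight_prompt(assessments: dict) -> str:
    """Flatten voice/body/face assessments into one coaching prompt.

    assessments: dict keyed by modality, each value a metrics dict
    (keys shown in the loop are assumed names for this sketch).
    """
    sections = []
    for modality in ("voice", "body_language", "facial_expression"):
        data = assessments.get(modality, {})
        sections.append(f"{modality}: {json.dumps(data, sort_keys=True)}")
    return ("You are a public-speaking coach. Given these assessments, "
            "give the user concise, actionable feedback.\n"
            + "\n".join(sections))
```

The resulting string would then be sent to the Gemini API by the Node.js service.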
Challenges we ran into
We wrestled with webcam variability, lighting changes, and landmark jitter.
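One common fix for landmark jitter is an exponential moving average (EMA) per coordinate. The sketch below uses plain (x, y) tuples in place of MediaPipe's landmark objects, and the alpha value is an assumed setting, not the one we shipped.

```python
def smooth_landmarks(frames, alpha=0.3):
    """EMA-smooth a sequence of per-frame landmark lists.

    frames: list of frames; each frame is a list of (x, y) tuples.
    alpha: smoothing factor; lower values damp jitter more but lag more.
    Returns a new list of frames with smoothed coordinates.
    """
    smoothed, prev = [], None
    for frame in frames:
        if prev is None:
            # First frame: nothing to smooth against yet.
            prev = [list(pt) for pt in frame]
        else:
            prev = [[alpha * x + (1 - alpha) * px,
                     alpha * y + (1 - alpha) * py]
                    for (x, y), (px, py) in zip(frame, prev)]
        smoothed.append([tuple(pt) for pt in prev])
    return smoothed
```

The alpha trade-off (jitter suppression versus responsiveness) is exactly the kind of tuning webcam variability forced on us.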
Accomplishments that we're proud of
The result is a privacy-minded, browser-based mentor that offers actionable, real-time feedback with no human coach required. It has proven accurate at rating videos across a range of public-speaking skill levels.
What we learned
Building this two-stream (audio and video) coach taught us how hard real-time multimodal fusion can be.
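One concrete fusion headache worth illustrating: audio metrics (e.g. per Whisper segment) and video metrics (per frame) arrive at different rates, so they must be aligned in time before scoring. The nearest-neighbour join below is a minimal sketch of that step; the event format and max_gap threshold are assumptions.

```python
def align_streams(audio_events, video_events, max_gap=0.5):
    """Pair each audio event with the nearest video event in time.

    Each event is a (timestamp_seconds, value) tuple. Events farther
    apart than max_gap seconds are left unpaired rather than fused.
    """
    pairs = []
    for a_t, a_v in audio_events:
        best = min(video_events, key=lambda e: abs(e[0] - a_t),
                   default=None)
        if best is not None and abs(best[0] - a_t) <= max_gap:
            pairs.append((a_t, a_v, best[1]))
    return pairs
```

Even this toy version shows the core difficulty: choosing a gap tolerance means deciding how stale a video cue can be before it should no longer color an audio judgment.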
What's next for PresenceAI
We’ll keep refining accuracy, expand mobile support, and add personalized training modules to deepen user progress tracking.