Inspiration

presenceAI sprang from a simple observation: most public-speaking and interview-prep apps focus only on vocal delivery, ignoring the body language and facial cues that carry just as much weight.

What it does

We close that gap with a web platform where users rehearse live on camera while three pipelines work in tandem: OpenAI Whisper for speech metrics, MediaPipe pose landmarks for posture and gestures, and OpenCV facial-expression analysis for emotional tone. Together they produce clear scores and trend graphs for measurable progress.
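To make the scoring concrete, here is a minimal sketch of how three per-pipeline scores could be fused into one session score. The 0-100 scale and the weights are assumptions for illustration, not presenceAI's actual values:

```python
# Hypothetical fusion of the three pipeline scores (speech, posture,
# expression) into one overall score. Weights and 0-100 scale are
# assumptions, not presenceAI's real parameters.

def fuse_scores(speech: float, posture: float, expression: float,
                weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted average of three pipeline scores, each on a 0-100 scale."""
    for s in (speech, posture, expression):
        if not 0.0 <= s <= 100.0:
            raise ValueError("scores must be in [0, 100]")
    w_speech, w_posture, w_expr = weights
    total = w_speech + w_posture + w_expr
    return (speech * w_speech + posture * w_posture + expression * w_expr) / total

print(fuse_scores(80, 70, 60))
```

A weighted average keeps each pipeline's contribution transparent, which matters when the feedback has to explain *why* a session scored the way it did.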

How we built it

React, TypeScript, Vite, and Tailwind drive the UI; Node.js and Python micro-services run Whisper, MediaPipe, and OpenCV in Web Workers and store metrics in MongoDB. We also used Gemini to process the voice, body-language, and facial-expression assessments from the user's video and to turn them into insights for the user.
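As one example of the kind of metric the pose pipeline can derive, here is a small sketch of a shoulder-tilt check computed from MediaPipe-style landmarks. Landmark indices 11 and 12 are MediaPipe Pose's left and right shoulders; the function itself is illustrative, not our actual scoring code:

```python
import math

# Illustrative posture metric from MediaPipe-style pose landmarks:
# shoulder tilt. Landmarks are normalized (x, y) pairs; indices 11/12
# are MediaPipe Pose's left/right shoulders. Hypothetical helper.

def shoulder_tilt_degrees(landmarks: list[tuple[float, float]]) -> float:
    """Angle of the shoulder line relative to horizontal, in degrees."""
    lx, ly = landmarks[11]  # left shoulder
    rx, ry = landmarks[12]  # right shoulder
    return abs(math.degrees(math.atan2(ly - ry, lx - rx)))

# A level shoulder line gives ~0 degrees; a tilted (slouching) one does not.
flat = [(0.0, 0.0)] * 33
flat[11], flat[12] = (0.6, 0.5), (0.4, 0.5)
print(shoulder_tilt_degrees(flat))  # 0.0
```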

Challenges we ran into

We wrestled with webcam variability, lighting changes, and landmark jitter.
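One standard remedy for landmark jitter is temporal smoothing. Below is a minimal exponential-moving-average sketch of the idea; the alpha value is a tuning assumption (lower means smoother but laggier), not the filter we ultimately shipped:

```python
# Sketch of exponential-moving-average smoothing for a noisy landmark
# stream. alpha is an assumed tuning value: lower = smoother, laggier.

def ema_smooth(points: list[float], alpha: float = 0.3) -> list[float]:
    """Smooth a 1-D stream of landmark coordinates frame by frame."""
    smoothed: list[float] = []
    prev: float | None = None
    for p in points:
        prev = p if prev is None else alpha * p + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

jittery = [0.50, 0.52, 0.49, 0.51, 0.90, 0.50]  # one outlier spike
print(ema_smooth(jittery))
```

The spike at frame 4 gets damped rather than passed straight through to the posture score, which is exactly the behavior jitter calls for.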

Accomplishments that we're proud of

The result is a privacy-minded, browser-based mentor that offers actionable, real-time feedback with no human coach required. In our testing it rated videos of speakers across a range of public-speaking ability consistently with their actual skill level.

What we learned

Building this two-stream (audio and video) coach taught us how hard real-time multimodal fusion can be.
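One concrete fusion sub-problem: the audio pipeline emits word-level timestamps (as Whisper can), while the video pipelines emit per-frame metrics. This hypothetical helper maps each word to the video frame active at its temporal midpoint so the streams share a timeline; the field names and fps default are assumptions:

```python
# Hypothetical alignment helper: map Whisper-style word timestamps
# (seconds) onto video frame indices so audio and video metrics can be
# compared on one timeline. Field names and fps are assumptions.

def words_to_frames(words: list[dict], fps: float = 30.0) -> list[tuple[str, int]]:
    """Pair each word with the frame index at its temporal midpoint."""
    pairs = []
    for w in words:
        midpoint = (w["start"] + w["end"]) / 2.0
        pairs.append((w["word"], int(midpoint * fps)))
    return pairs

words = [{"word": "hello", "start": 0.0, "end": 0.4},
         {"word": "world", "start": 0.5, "end": 1.1}]
print(words_to_frames(words))  # [('hello', 6), ('world', 24)]
```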

What's next for PresenceAI

We’ll keep refining accuracy, expanding mobile support, and adding personalized training modules to deepen user progress tracking.

Built With

React, TypeScript, Vite, Tailwind, Node.js, Python, Whisper, MediaPipe, OpenCV, MongoDB, Gemini
