Inspiration
Many accessibility tools focus on word-for-word translation and miss emotional context. We built VitalSign to help Deaf and non-verbal users communicate more naturally when interpreters aren't available, preserving both meaning and emotion.
What it does
VitalSign is a real-time web application that translates ASL gestures and facial expressions into natural, emotion-aware speech. Users sign in front of their webcam, and the system detects gestures, refines the text with AI, and converts it to expressive speech.
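At a high level, that is a three-stage pipeline: gesture detection, text refinement, and speech synthesis. A minimal TypeScript sketch of the middle stage, with the MediaPipe and ElevenLabs stages stubbed out and every name invented for illustration (this is not VitalSign's actual code):

```typescript
type Emotion = "happy" | "sad" | "neutral";

// Stage 1: per-frame gesture labels from the webcam (MediaPipe in VitalSign).
const detectedGlosses = ["HELLO", "MY", "NAME"]; // stubbed detector output

// Stage 2: turn raw sign glosses into a natural sentence. VitalSign sends
// this to Cohere; here a trivial stand-in keeps the sketch self-contained.
function refine(glosses: string[], emotion: Emotion): string {
  const sentence = glosses.join(" ").toLowerCase();
  const capitalized = sentence.charAt(0).toUpperCase() + sentence.slice(1);
  return emotion === "happy" ? `${capitalized}!` : `${capitalized}.`;
}

// Stage 3: send the refined text plus the detected emotion to text-to-speech
// (the ElevenLabs call is omitted in this sketch).
const utterance = refine(detectedGlosses, "happy");
```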
How we built it
We built VitalSign with Next.js 15 and React 19, using MediaPipe for real-time hand tracking and gesture recognition, Cohere for refining recognized signs into natural sentences, the ElevenLabs API for emotion-aware text-to-speech, and face-api.js for facial emotion detection. All processing happens in the browser, with server-side API routes handling secure key management.
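The emotion-aware part comes down to mapping the emotion detected by face-api.js onto the synthesis parameters. The field names below match ElevenLabs' `voice_settings` object; the numeric values are illustrative guesses, not VitalSign's actual tuning:

```typescript
// Emotions roughly as face-api.js reports them (subset).
type Emotion = "happy" | "sad" | "angry" | "neutral";

// Shape of ElevenLabs' voice_settings payload.
interface VoiceSettings {
  stability: number;        // lower = more expressive variation
  similarity_boost: number; // adherence to the base voice
  style: number;            // style exaggeration
}

// Example mapping: expressive for strong emotions, flat for neutral.
function voiceSettingsFor(emotion: Emotion): VoiceSettings {
  switch (emotion) {
    case "happy": return { stability: 0.3, similarity_boost: 0.8, style: 0.7 };
    case "sad":   return { stability: 0.7, similarity_boost: 0.8, style: 0.3 };
    case "angry": return { stability: 0.2, similarity_boost: 0.8, style: 0.9 };
    default:      return { stability: 0.5, similarity_boost: 0.8, style: 0.0 };
  }
}
```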
Challenges we ran into
Managing real-time performance while running video capture, gesture recognition, emotion detection, and API calls simultaneously. Keeping gesture recognition accurate across different lighting conditions and camera angles. Integrating multiple APIs (Cohere, ElevenLabs) with proper error handling and fallbacks. Balancing latency against quality in the translation pipeline.
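One common way to tame noisy per-frame predictions is to accept a gesture only after it clears a confidence threshold for several consecutive frames. This is a generic debouncing sketch under that assumption, not VitalSign's exact filter:

```typescript
interface Prediction {
  label: string;      // recognized gesture, e.g. "HELLO"
  confidence: number; // 0..1 score from the recognizer
}

class GestureDebouncer {
  private last = "";
  private streak = 0;

  constructor(
    private threshold = 0.8,   // minimum confidence to count a frame
    private framesNeeded = 5,  // consecutive frames required
  ) {}

  // Feed one frame's prediction; returns the label exactly once,
  // when it first becomes stable, otherwise null.
  push(p: Prediction): string | null {
    if (p.confidence < this.threshold || p.label !== this.last) {
      // Reset the streak; start counting if this frame itself qualifies.
      this.last = p.confidence >= this.threshold ? p.label : "";
      this.streak = p.confidence >= this.threshold ? 1 : 0;
      return null;
    }
    this.streak += 1;
    return this.streak === this.framesNeeded ? p.label : null;
  }
}
```

Raising `framesNeeded` trades latency for stability, which is exactly the latency-versus-quality balance described above.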
Accomplishments that we're proud of
Built a complete end-to-end pipeline from sign detection to speech output, all within 24 hours. Implemented emotion-aware synthesis that adjusts voice tone based on detected emotions. Created a web-based solution requiring no special hardware, just a browser and webcam. Achieved real-time performance suitable for live conversations. Designed an intuitive UI with voice selection and volume controls.
What we learned
The importance of combining multiple AI technologies (computer vision, NLP, TTS) to create a cohesive solution. How to structure Next.js API routes for secure third-party API integration. Accessibility considerations beyond basic functionality—preserving emotional nuance matters. The challenges of real-time video processing and managing multiple async operations in React.
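The API-route pattern mentioned above can be sketched as a hypothetical App Router handler (e.g. `app/api/speak/route.ts`): the ElevenLabs key lives only in a server-side environment variable, and the browser talks to this route instead of the third-party API directly. The route path, voice id, and error messages are illustrative:

```typescript
const VOICE_ID = "your-voice-id"; // placeholder, set per deployment

export async function POST(req: Request): Promise<Response> {
  // The key is read server-side only and never shipped to the client.
  const apiKey = process.env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    return new Response(JSON.stringify({ error: "TTS not configured" }), { status: 500 });
  }

  const { text, voiceSettings } = await req.json();
  if (typeof text !== "string" || !text.trim()) {
    return new Response(JSON.stringify({ error: "Missing text" }), { status: 400 });
  }

  // Forward the request to ElevenLabs and stream the audio back.
  const upstream = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ text, voice_settings: voiceSettings }),
    },
  );
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { "Content-Type": "audio/mpeg" },
  });
}
```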
What's next for VitalSign
Expand the gesture vocabulary to 50+ common ASL signs. Improve gesture recognition accuracy with better filtering and confidence thresholds. Add support for multi-word phrases and sentence construction. Integrate with video conferencing platforms (Zoom, Teams) for meeting accessibility. Add support for other sign languages beyond ASL. Implement user customization for gesture sensitivity and personal voice preferences.
Built With
- cohere
- elevenlabs
- faceapi
- mediapipe
- next
- vercel