Inspiration
Many accessibility tools focus on word-for-word translation and miss emotional context. We built VitalSign to help Deaf and non-verbal users communicate more naturally when interpreters aren't available, preserving both meaning and emotion.
What it does
VitalSign is a real-time web application that translates ASL gestures and facial expressions into natural, emotion-aware speech. Users sign in front of their webcam, and the system detects gestures, refines the text with AI, and converts it to expressive speech.
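At a high level, that is a three-stage pipeline: gesture detection, text refinement, and speech synthesis. A minimal TypeScript sketch of the middle stage, with the MediaPipe and ElevenLabs stages stubbed out and every name invented for illustration (this is not VitalSign's actual code):

```typescript
type Emotion = "happy" | "sad" | "neutral";

// Stage 1: per-frame gesture labels from the webcam (MediaPipe in VitalSign).
const detectedGlosses = ["HELLO", "MY", "NAME"]; // stubbed detector output

// Stage 2: turn raw sign glosses into a natural sentence. VitalSign sends
// this to Cohere; here a trivial stand-in keeps the sketch self-contained.
function refine(glosses: string[], emotion: Emotion): string {
  const sentence = glosses.join(" ").toLowerCase();
  const capitalized = sentence.charAt(0).toUpperCase() + sentence.slice(1);
  return emotion === "happy" ? `${capitalized}!` : `${capitalized}.`;
}

// Stage 3: send the refined text plus the detected emotion to text-to-speech
// (the ElevenLabs call is omitted in this sketch).
const utterance = refine(detectedGlosses, "happy");
```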
How we built it
We built VitalSign with Next.js 15 and React 19, using MediaPipe for real-time hand tracking and gesture recognition, Cohere for refining recognized signs into natural sentences, the ElevenLabs API for emotion-aware text-to-speech, and face-api.js for facial emotion detection. All processing happens in the browser, with server-side API routes handling secure key management.
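The emotion-aware part comes down to mapping the emotion detected by face-api.js onto the synthesis parameters. The field names below match ElevenLabs' `voice_settings` object; the numeric values are illustrative guesses, not VitalSign's actual tuning:

```typescript
// Emotions roughly as face-api.js reports them (subset).
type Emotion = "happy" | "sad" | "angry" | "neutral";

// Shape of ElevenLabs' voice_settings payload.
interface VoiceSettings {
  stability: number;        // lower = more expressive variation
  similarity_boost: number; // adherence to the base voice
  style: number;            // style exaggeration
}

// Example mapping: expressive for strong emotions, flat for neutral.
function voiceSettingsFor(emotion: Emotion): VoiceSettings {
  switch (emotion) {
    case "happy": return { stability: 0.3, similarity_boost: 0.8, style: 0.7 };
    case "sad":   return { stability: 0.7, similarity_boost: 0.8, style: 0.3 };
    case "angry": return { stability: 0.2, similarity_boost: 0.8, style: 0.9 };
    default:      return { stability: 0.5, similarity_boost: 0.8, style: 0.0 };
  }
}
```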
Challenges we ran into
Managing real-time performance while running video capture, gesture recognition, emotion detection, and API calls simultaneously. Keeping gesture recognition accurate across different lighting conditions and camera angles. Integrating multiple APIs (Cohere, ElevenLabs) with proper error handling and fallbacks. Balancing latency against quality in the translation pipeline.
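One common way to tame noisy per-frame predictions is to accept a gesture only after it clears a confidence threshold for several consecutive frames. This is a generic debouncing sketch under that assumption, not VitalSign's exact filter:

```typescript
interface Prediction {
  label: string;      // recognized gesture, e.g. "HELLO"
  confidence: number; // 0..1 score from the recognizer
}

class GestureDebouncer {
  private last = "";
  private streak = 0;

  constructor(
    private threshold = 0.8,   // minimum confidence to count a frame
    private framesNeeded = 5,  // consecutive frames required
  ) {}

  // Feed one frame's prediction; returns the label exactly once,
  // when it first becomes stable, otherwise null.
  push(p: Prediction): string | null {
    if (p.confidence < this.threshold || p.label !== this.last) {
      // Reset the streak; start counting if this frame itself qualifies.
      this.last = p.confidence >= this.threshold ? p.label : "";
      this.streak = p.confidence >= this.threshold ? 1 : 0;
      return null;
    }
    this.streak += 1;
    return this.streak === this.framesNeeded ? p.label : null;
  }
}
```

Raising `framesNeeded` trades latency for stability, which is exactly the latency-versus-quality balance described above.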
Accomplishments that we're proud of
Built a complete end-to-end pipeline from sign detection to speech output, all within 24 hours. Implemented emotion-aware synthesis that adjusts voice tone based on detected emotions. Created a web-based solution requiring no special hardware, just a browser and webcam. Achieved real-time performance suitable for live conversations. Designed an intuitive UI with voice selection and volume controls.
What we learned
The importance of combining multiple AI technologies (computer vision, NLP, TTS) to create a cohesive solution. How to structure Next.js API routes for secure third-party API integration. Accessibility considerations beyond basic functionality—preserving emotional nuance matters. The challenges of real-time video processing and managing multiple async operations in React.
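The API-route pattern mentioned above can be sketched as a hypothetical App Router handler (e.g. `app/api/speak/route.ts`): the ElevenLabs key lives only in a server-side environment variable, and the browser talks to this route instead of the third-party API directly. The route path, voice id, and error messages are illustrative:

```typescript
const VOICE_ID = "your-voice-id"; // placeholder, set per deployment

export async function POST(req: Request): Promise<Response> {
  // The key is read server-side only and never shipped to the client.
  const apiKey = process.env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    return new Response(JSON.stringify({ error: "TTS not configured" }), { status: 500 });
  }

  const { text, voiceSettings } = await req.json();
  if (typeof text !== "string" || !text.trim()) {
    return new Response(JSON.stringify({ error: "Missing text" }), { status: 400 });
  }

  // Forward the request to ElevenLabs and stream the audio back.
  const upstream = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ text, voice_settings: voiceSettings }),
    },
  );
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { "Content-Type": "audio/mpeg" },
  });
}
```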
What's next for VitalSign
Expand the gesture vocabulary to 50+ common ASL signs. Improve gesture recognition accuracy with better filtering and confidence thresholds. Add support for multi-word phrases and sentence construction. Integrate with video conferencing platforms (Zoom, Teams) for meeting accessibility. Add support for other sign languages beyond ASL. Implement user customization for gesture sensitivity and personal voice preferences.
Built With
- cohere
- elevenlabs
- faceapi
- mediapipe
- next
- vercel