About EmotiSign Mentor

The Inspiration

Learning sign language in isolation is incredibly lonely. Traditional apps teach you signs, but they can't tell when you're frustrated, confused, or making progress. They don't provide the emotional support that human instructors naturally offer.

I wanted to create an AI companion that doesn't just watch your hands—it reads your face, understands your emotions, and responds with empathy and encouragement. The vision was simple: what if your practice sessions felt like conversations with a patient, emotionally intelligent teacher?

What I Learned

Building EmotiSign Mentor taught me that multimodal AI is the future of educational technology. Combining computer vision, emotion recognition, natural language processing, and speech synthesis creates genuinely interactive learning experiences.

Key insights:

  • Real-time emotion detection makes feedback feel far more personal and supportive
  • Streaming APIs are essential for fluid, conversational interfaces
  • Browser-based AI democratizes access to sophisticated learning tools
  • Modular architecture enables rapid iteration and easy AI service integration

How I Built It

Core Architecture

EmotiSign Mentor uses Next.js 14 with a modular API architecture orchestrating multiple AI services. The frontend combines React 18 for real-time interactions, Tailwind CSS 4 for dark-first design with glassmorphism effects, and WebRTC & Canvas API for live video capture.

AI Integration & APIs

Hume AI - Emotion Recognition: Analyzes facial expressions from webcam frames every second, providing real-time emotional insights. Chosen for its nuanced emotional understanding with low latency—perfect for immediate feedback during practice.
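The once-per-second cadence can be enforced with a small sampler that gates the capture loop. This is an illustrative sketch, not the project's actual code; `FrameSampler` and its API are assumed names:

```typescript
// Minimal fixed-interval sampler: decides whether enough time has passed
// to send another webcam frame for emotion analysis (~1 fps).
// `FrameSampler` is an illustrative name, not from the actual codebase.
class FrameSampler {
  private lastSentMs = -Infinity;

  constructor(private readonly intervalMs: number = 1000) {}

  // Returns true (and records the timestamp) when a frame should be sent.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentMs >= this.intervalMs) {
      this.lastSentMs = nowMs;
      return true;
    }
    return false;
  }
}
```

In the browser, a check like this would gate a `setInterval` or `requestAnimationFrame` loop that draws the `<video>` element to a canvas and posts the encoded frame to the emotion endpoint, so bursts of callbacks never exceed the 1 fps budget.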

Claude API - Conversational Coach: Streams user questions with emotion context to create empathetic, personalized responses. Claude excels at educational conversations and intelligently interprets emotional context for targeted coaching.
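Folding emotion context into the coaching request could look roughly like the following. The `EmotionSummary` shape, function name, and prompt wording are all assumptions for illustration, not the project's actual implementation:

```typescript
// Hypothetical sketch: combine the learner's question with the latest
// emotion summary before sending it to the conversational model.
// Field names and wording are assumed, not taken from the real codebase.
interface EmotionSummary {
  dominant: string;   // e.g. "frustration"
  confidence: number; // 0..1 score from the emotion model
}

function buildCoachPrompt(question: string, emotion: EmotionSummary): string {
  return [
    `The learner currently appears to be feeling ${emotion.dominant}`,
    `(confidence ${emotion.confidence.toFixed(2)}).`,
    `Respond as a patient, encouraging sign language coach.`,
    `Learner: ${question}`,
  ].join(" ");
}
```

Keeping the emotion context as a short natural-language preamble (rather than raw per-frame scores) is what lets the model interpret it for targeted coaching.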

LMNT - Text-to-Speech: Converts Claude's responses into natural, emotional speech. Their neural TTS produces lifelike voice responses that complete the multimodal interaction loop.

Browser APIs: Integrated Web Speech API for hands-free voice input and MediaDevices API for high-quality webcam streams.

Challenges Faced & Solutions

Real-time Performance

Challenge: Processing webcam frames every second while maintaining smooth UI performance. Solution: Implemented smart frame sampling, asynchronous processing, and intelligent caching to keep the interface responsive.

API Coordination

Challenge: Coordinating multiple streaming APIs without blocking the UI. Solution: Built a state machine that manages the conversation flow (Idle → Listening → Processing → Speaking) with clear visual indicators for each state.
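The flow above can be sketched as a small transition table. The state names come from the writeup; the event names and API are illustrative:

```typescript
// Sketch of the Idle → Listening → Processing → Speaking conversation flow.
// State names are from the writeup; event names are assumed for illustration.
type State = "Idle" | "Listening" | "Processing" | "Speaking";
type Event = "startListening" | "questionCaptured" | "responseReady" | "playbackDone";

const transitions: Record<State, Partial<Record<Event, State>>> = {
  Idle:       { startListening: "Listening" },
  Listening:  { questionCaptured: "Processing" },
  Processing: { responseReady: "Speaking" },
  Speaking:   { playbackDone: "Idle" },
};

function next(state: State, event: Event): State {
  // Events that are invalid in the current state leave it unchanged,
  // so stray callbacks from overlapping streams cannot corrupt the flow.
  return transitions[state][event] ?? state;
}
```

A table-driven machine like this makes it trivial to drive the visual indicators: the UI simply renders whatever the current `State` is.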

Cross-browser Compatibility

Challenge: Inconsistent Web Speech API and WebRTC support. Solution: Progressive enhancement with feature detection, manual text input backup, and WebRTC polyfills.
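Feature detection with a text-input fallback might look like this sketch. The `BrowserLike` shape is an assumption that lets the check be illustrated (and tested) outside a real browser:

```typescript
// Progressive-enhancement sketch: choose an input mode based on what the
// browser actually exposes. `BrowserLike` stands in for `window` so this
// can run outside a browser; it is an assumption, not the real code.
interface BrowserLike {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

type InputMode = "voice" | "text";

function pickInputMode(w: BrowserLike): InputMode {
  // Chromium ships the prefixed constructor; fall back to typed input elsewhere.
  return w.SpeechRecognition || w.webkitSpeechRecognition ? "voice" : "text";
}
```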

Emotion Integration

Challenge: Meaningfully combining emotion data with conversational AI. Solution: Created an emotion summarization system sending only significant emotional shifts—dominant emotions, confidence trends, and session overviews.
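Reducing raw per-frame scores to "significant shifts only" could be sketched like this. The field names and the shift rule are assumptions for illustration, not the project's actual code:

```typescript
// Illustrative sketch: collapse a frame of raw emotion scores into the
// dominant emotion, and flag whether the dominant emotion has shifted
// since the last summary (only shifts would be forwarded to the coach).
interface EmotionFrame {
  scores: Record<string, number>; // e.g. { joy: 0.7, confusion: 0.2 }
}

interface Summary {
  dominant: string;
  changed: boolean; // true only when the dominant emotion shifted
}

function summarize(frame: EmotionFrame, previousDominant?: string): Summary {
  const dominant = Object.entries(frame.scores)
    .reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
  return { dominant, changed: dominant !== previousDominant };
}
```

Forwarding only frames where `changed` is true keeps the conversational API's context small while still capturing the trends that matter for coaching.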

Technical Innovation

The most innovative aspect is the multimodal emotion-aware conversation system. By combining:

  • Computer vision (webcam analysis)
  • Emotion AI (facial recognition)
  • Conversational AI (coaching)
  • Speech synthesis (voice response)
  • Speech recognition (voice input)

The result is a sign language practice tool that understands both what you're doing AND how you're feeling about it.

This represents a new paradigm: emotionally intelligent AI tutoring that removes the loneliness of practicing alone by making solo sessions feel like supportive conversations with an empathetic teacher.

EmotiSign Mentor suggests that the future of education isn't just AI that teaches, but AI that understands and cares about how students feel while learning.

Built With

  • Frontend: React 18, Next.js 14, Tailwind CSS 4
  • Languages: JavaScript/TypeScript
  • APIs: Hume AI, Claude API (Anthropic), LMNT
  • Browser tech: WebRTC, Canvas API, MediaDevices API, Web Speech API, Web Audio API, Stream API, Blob API
  • Architecture: RESTful APIs, streaming APIs, modular architecture, state management, Node.js
  • Formats: JSON, base64, JPEG, MP3/WAV
  • Design: glassmorphism, CSS animations, CSS Grid/Flexbox, responsive
  • Tools: npm/yarn, Git