Inspiration
As a team, we wanted to use technology to make communication more inclusive. We realized how challenging it can be for people who use sign language to interact with those who don’t, especially in everyday situations.
That inspired us to create SayLess, a tool that connects speech, text, and sign language in real time. Our goal was to bridge that communication gap and make it easier for everyone to be understood, no matter how they communicate.
What it does
- The app recognizes your sign language in real time.
- It converts your signs into spoken words so others can hear what you are saying.
- The app also listens to speech and converts it into written text for you to read.
- When the button turns red, it indicates that voice processing is active.
- This enables smooth, two-way communication between people who use sign language and those who speak.
How we built it
Backend: Python (FastAPI), Docker, Render
- Forked a MediaPipe-based ML model that converts frames/images of sign language to English text
- Trained the model ourselves and saved the training data in a pickle file, so the model can be retrained every time the backend server is spun up
- Created a FastAPI server, containerized with Docker, with endpoints providing the following capabilities:
- ASL -> text -> speech: a POST endpoint that takes an image of a person signing in ASL, calls the model to get text back, passes that text to the ElevenLabs API, and returns the decoded audio file
- Speech -> text: a POST endpoint that takes a .WAV speech recording, calls the Gemini API, and returns the transcribed text
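The ASL -> text -> speech flow above can be sketched in plain Python. The function and helper names here (`predict_sign`, `text_to_speech`) are assumptions standing in for the real MediaPipe-based classifier and the ElevenLabs call; the FastAPI routing and Docker wrapping are omitted so the core handler logic is visible:

```python
import base64

# Hypothetical stand-ins for the real pieces (names are assumptions):
def predict_sign(image_bytes: bytes) -> str:
    """Run the trained ASL classifier on one frame; returns predicted text."""
    return "hello"  # stub: the real model returns the recognized word/sign

def text_to_speech(text: str) -> bytes:
    """Stand-in for the ElevenLabs API call; returns raw audio bytes."""
    return b"RIFF....WAVEfmt "  # stub: the real call returns an audio file

def asl_to_speech(image_b64: str) -> dict:
    """Core of the POST handler: base64 image in, base64 audio out."""
    image_bytes = base64.b64decode(image_b64)
    text = predict_sign(image_bytes)
    audio = text_to_speech(text)
    # Audio is base64-encoded so it survives a JSON response body.
    return {"text": text, "audio_b64": base64.b64encode(audio).decode("ascii")}

frame = base64.b64encode(b"fake-jpeg-bytes").decode("ascii")
response = asl_to_speech(frame)
```

Encoding the returned audio as base64 keeps the endpoint JSON-friendly, at the cost of ~33% larger payloads.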
Frontend: HTML, CSS, TypeScript (React), Vercel
- Batched audio into intervals and sent it to the backend periodically as a .WAV file to generate text in near real time
- Batched images at intervals during video capture so the backend could perform ASL inference on them
- Batching keeps us from sending too many requests in a given time interval
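The interval batching described above is framework-agnostic; here is a minimal Python sketch of the idea (the class name, callback shape, and flush policy are assumptions, not the actual frontend code):

```python
import time

class FrameBatcher:
    """Collects frames and flushes them as one request per interval,
    so the backend is hit at most once per time window."""

    def __init__(self, interval_s: float, send):
        self.interval_s = interval_s
        self.send = send                     # callback that posts the batch
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, frame):
        self.buffer.append(frame)
        # Only flush once the interval has elapsed since the last send.
        if time.monotonic() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()

sent = []
batcher = FrameBatcher(0.0, sent.append)  # zero interval: flush on every add
for frame in ("frame1", "frame2"):
    batcher.add(frame)
```

In the real app the same pattern applies to both audio chunks and video frames; the interval is the knob that trades latency against request volume.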
Challenges we ran into
- It was difficult to find a ready-to-use/trainable model that took ASL images as input and output strings
- CORS mysteries
- ElevenLabs and Gemini API integration
Accomplishments that we're proud of
- First time using Docker
- Finding a model for ASL to English and figuring out how to run it and train it
- Creating separate functions to train the model on backend initialization and call model on a single frame
- Making sure deployments work successfully
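The train-on-initialization / predict-per-frame split mentioned above can be sketched as two functions. Everything here is a toy stand-in (a nearest-centroid classifier over landmark vectors instead of the real MediaPipe pipeline, and an assumed pickle format of `(features, labels)`):

```python
import pickle

def load_training_data(blob: bytes):
    """Unpickle (features, labels) from the shipped training-data file."""
    return pickle.loads(blob)

def train(features, labels):
    """Fit a toy nearest-centroid classifier: one mean vector per label.
    Stand-in for the real model fit at backend startup."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        counts[y] = counts.get(y, 0) + 1
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Classify one frame's feature vector by nearest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(centroids[y], x))

# Simulate the pickle file loaded when the server spins up (assumed format).
blob = pickle.dumps(([[0.0, 0.0], [1.0, 1.0]], ["A", "B"]))
features, labels = load_training_data(blob)
model = train(features, labels)  # runs once at initialization
```

Separating `train` from `predict` means the expensive fit happens once per server boot, while each incoming frame only pays for a single lightweight `predict` call.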
What we learned
- Containerization using Docker
- Using Postman to test FastAPI endpoints
What's next for SayLess
- Finding a model that accepts short video clips as input instead of still images
