Inspiration
According to the National Institutes of Health, over 15% of American adults, more than 37 million people, are deaf or have trouble hearing, and about one in eight people has hearing loss in both ears. Not being able to hear, or to freely express your thoughts to the rest of the world, can leave deaf people isolated. Yet only an estimated 250,000 to 500,000 people in America are said to know ASL. We strongly believe that no one's disability should hold them back from expressing themselves to the world, so we built Sign Sync, an end-to-end, real-time communication app that bridges the language barrier between a deaf and a non-deaf person. Using Natural Language Processing to analyze spoken text and Computer Vision models to translate sign language to English, and vice versa, our app brings us closer to a more inclusive and understanding world.
What it does
Our app connects a deaf person, who signs American Sign Language into their device's camera, with a non-deaf person, who hears the translation through text-to-speech output. The non-deaf person responds by recording their voice, and their sentences are translated directly into sign language visuals for the deaf person to see and understand. After seeing the sign language visuals, the deaf person can sign back to the camera to continue the conversation.
We believe real-time communication is the key to a fluid conversation, so we use automatic speech-to-text and text-to-speech translation. Sign Sync is a web app designed for both desktop and mobile devices, with a clean, easy-to-read interface and a chat box that lets a deaf person follow along without missing any part of the conversation.
How we built it
For our project, precision and user-friendliness were at the forefront of our considerations. We were determined to achieve two critical objectives:
1. Precision in Real-Time Detection: Our foremost goal was an exceptionally accurate model capable of detecting and classifying signs in real time, since fast, reliable recognition is pivotal to keeping the conversation flowing.
2. Seamless Website Navigation: Equally essential was ensuring that our website offers a seamless, intuitive experience, with an interface anyone can navigate effortlessly and no obstacles in the way of our users.
- Frontend Development with Vue.js: To rapidly prototype a user interface that seamlessly adapts to both desktop and mobile devices, we turned to Vue.js. Its flexibility and speed in UI development were instrumental in shaping our user experience.
- Backend Powered by Flask: Flask provides the robust foundation of our API and backend. It exposes the endpoints our frontend calls to retrieve essential data; a minimal endpoint sketch appears after this list.
- Speech-to-Text Transformation: To transform spoken language into text, we integrated the browser's webkitSpeechRecognition API (part of the Web Speech API). This technology forms the backbone of our speech recognition system, facilitating communication with our app.
- NLTK for Language Preprocessing: Recognizing that sign language has distinct grammar, punctuation, and syntax compared to spoken English, we used the NLTK library to preprocess spoken sentences into a format comprehensible to sign language users (see the preprocessing sketch after this list).
- Translating Sign Language to Text: A pivotal aspect of our project is translating the intricate hand and arm movements of sign language into text. To accomplish this, we employed a MobileNetV2 convolutional neural network, trained to identify individual characters through the device's camera, which achieves an accuracy rate of 97%. It classifies video-stream frames into one of the 26 letters of the sign language alphabet or one of the three punctuation marks used in sign language, and the resulting characters are pieced together to form complete sentences (a per-frame classification sketch follows this list).
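To make the frontend/backend split above concrete, here is a minimal sketch of the kind of Flask endpoint our Vue frontend could call. The route name, payload shape, and the to_gloss stub are illustrative rather than our exact API.

```python
# Minimal sketch (not our exact API): a Flask endpoint the Vue frontend
# could POST recognized speech to, getting back a sign-ready word list.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder for the NLTK preprocessing shown in the next sketch.
def to_gloss(sentence: str) -> list[str]:
    return [w.upper() for w in sentence.split() if w.isalpha()]

@app.route("/api/translate", methods=["POST"])
def translate():
    data = request.get_json(silent=True) or {}
    sentence = data.get("text", "")
    return jsonify({"gloss": to_gloss(sentence)})

if __name__ == "__main__":
    app.run(debug=True)
```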
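The NLTK preprocessing step can be sketched roughly as follows; the DROP set and upper-cased output are simplifications for illustration, and the real conversion from English to sign order involves more rules.

```python
# Rough sketch of the NLTK preprocessing described above: tokenize the spoken
# sentence and drop words that signed output typically omits (articles, forms
# of "to be"). The DROP set here is only illustrative.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # needed by newer NLTK releases

DROP = {"a", "an", "the", "is", "are", "am", "was", "were", "be", "been"}

def to_gloss(sentence: str) -> list[str]:
    tokens = word_tokenize(sentence.lower())
    return [t.upper() for t in tokens if t.isalnum() and t not in DROP]

print(to_gloss("The weather is nice today"))  # ['WEATHER', 'NICE', 'TODAY']
```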
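Finally, a sketch of the per-frame classification step, assuming a Keras model with a MobileNetV2 backbone fine-tuned on 29 classes (the 26 letters plus three extra signs); the model path, label names, and 224x224 input size are assumptions for illustration.

```python
# Sketch of classifying one webcam frame into an ASL character, assuming a
# fine-tuned MobileNetV2 saved as "asl_mobilenetv2.h5" (hypothetical path).
import numpy as np
import tensorflow as tf

CLASSES = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["SPACE", "DEL", "NOTHING"]
model = tf.keras.models.load_model("asl_mobilenetv2.h5")

def classify_frame(frame_bgr: np.ndarray) -> str:
    """Return the predicted character for a single BGR webcam frame."""
    img = tf.image.resize(frame_bgr[..., ::-1].astype("float32"), (224, 224))  # BGR -> RGB
    img = tf.keras.applications.mobilenet_v2.preprocess_input(img)             # scale to [-1, 1]
    probs = model.predict(img[tf.newaxis, ...], verbose=0)[0]
    return CLASSES[int(np.argmax(probs))]
```

In the app, successive frame predictions are then pieced together into words and sentences before being shown in the chat box.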
Challenges we ran into
Since we used multiple AI models, integrating them seamlessly with our Vue frontend was tough. Because we also access the webcam through the website, it was a massive challenge to capture video footage, run real-time detection and classification on it, and show the results on the webpage simultaneously. Finding open-source ASL datasets was another challenge: with a short budget and timeline we could not cover whole ASL words, so we had to resort to spelling words out letter by letter. We also had trouble figuring out how to run real-time computer vision on a continuous stream of ASL hand gestures.
Accomplishments that we're proud of
We are really proud to be working on a project that can have a profound impact on the lives of deaf individuals and contribute to greater accessibility and inclusivity. Some accomplishments that we are proud of are:
- Accessibility and Inclusivity: Our app is a significant step towards improving accessibility for the deaf community.
- Innovative Technology: Developing a system that seamlessly translates sign language involves cutting-edge technologies such as computer vision, natural language processing, and speech recognition. Mastering these technologies and making them work harmoniously in our app is a major achievement.
- User-Centered Design: Crafting an app that's user-friendly and intuitive for both deaf and hearing users has been a priority.
- Speech Recognition: Our success in implementing speech recognition technology is a source of pride.
- Multiple AI Models: We also loved merging natural language processing and computer vision in the same application.
What we learned
We learned a lot about how accessibility works for individuals in the deaf community; our research surfaced a lot of new information that we found ways to fold into the project. We also learned a lot about Natural Language Processing, Computer Vision, and CNNs, and picked up new technologies over the weekend. As a team of individuals with different skill sets, we were also able to collaborate and learn to focus on our individual strengths while working on a project.
What's next?
We have a ton of ideas planned for Sign Sync next!
- Translate between languages other than English
- Translate between other sign languages, not just ASL
- Native mobile app with no internet access required for more seamless usage
- Usage of more sophisticated datasets that can recognize words and not just letters
- Use video to demonstrate the sign language output, instead of static images
Built With
- flask
- javascript
- python
- tensorflow-js
- vue
