Inspiration
Millions of deaf, mute, and hard-of-hearing people around the world rely on sign language as a primary means of communication. The ability to learn the language and use it effectively is a basic human right, especially in underprivileged areas. I recently learned about MediaPipe, Google's hand tracking solution, and decided to use it to build a project that can transcribe sign language letters in real time.
What it does
The project, LexiSign, is a Python program that uses your camera to transcribe American Sign Language (ASL) letters into text. It uses the MediaPipe hand tracking solution to find the coordinates of 21 unique points on your hand and compares them to the points found in a set of reference images. The images come from the ASL Alphabet dataset on Kaggle by Akash. The dataset provides 3,000 images for each letter, but this program uses only 5. This lets the system classify different letters quickly and display them on the screen.
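The core idea above can be sketched as a nearest-reference classifier: each reference image is reduced to the (x, y) coordinates of the 21 hand landmarks, and a camera frame is assigned the letter whose reference landmarks it most closely matches. This is a minimal illustration, not the project's actual code; the function name, data shapes, and use of a plain summed distance are all assumptions.

```python
import numpy as np

def classify_letter(frame_landmarks, references):
    """Return the letter whose reference landmarks best match the frame.

    frame_landmarks: (21, 2) array of hand landmark coordinates.
    references: dict mapping each letter to a list of (21, 2) arrays
                (e.g. 5 reference images per letter, as in LexiSign).
    """
    best_letter, best_score = None, float("inf")
    for letter, ref_list in references.items():
        for ref in ref_list:
            # Sum of per-landmark distances; smaller means more similar.
            score = np.linalg.norm(frame_landmarks - ref, axis=1).sum()
            if score < best_score:
                best_letter, best_score = letter, score
    return best_letter
```

With only a handful of references per letter, this brute-force scan stays fast enough for real-time use (26 letters x 5 references = 130 comparisons per frame).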
How I built it
The program mainly consists of classifying camera frames, comparing them to the reference images, and displaying the formatted results in a window. I used MediaPipe for the first two tasks and OpenCV for the third. A set of vectors is computed from each reference image and compared with the corresponding vectors from the camera feed. The reference image most similar to the camera frame (the one with the smallest total difference across all the vectors) is displayed as the predicted letter. I also created a simple GUI with OpenCV for the window, which displays the camera feed, the transcribed text, and some controls (start, pause, and stop).
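The vector comparison described above can be sketched as follows. Working with vectors between landmarks, normalized to unit length, makes the match invariant to where the hand sits in the frame and how large it appears. The landmark index pairs below (wrist to each fingertip, following MediaPipe's hand landmark numbering) are an illustrative subset; the actual project may choose different vectors.

```python
import numpy as np

# Wrist (index 0) to each fingertip (thumb, index, middle, ring, pinky)
# in MediaPipe's 21-point hand landmark numbering. Illustrative choice.
BONES = [(0, 4), (0, 8), (0, 12), (0, 16), (0, 20)]

def hand_vectors(landmarks):
    """Turn a (21, 2) landmark array into unit vectors between landmark
    pairs, so the comparison ignores hand position and overall scale."""
    vecs = np.array([landmarks[b] - landmarks[a] for a, b in BONES], dtype=float)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.where(norms == 0, 1, norms)

def vector_difference(landmarks, ref_landmarks):
    """Total difference between the two vector sets; the reference
    with the smallest total is taken as the predicted letter."""
    diff = hand_vectors(landmarks) - hand_vectors(ref_landmarks)
    return np.linalg.norm(diff, axis=1).sum()
```

Because the vectors are normalized, a hand that is shifted or closer to the camera still produces a near-zero difference against its own reference, which is what allows a small reference set to cover multiple viewing distances.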
Challenges I ran into
I wanted the project to be scalable, so that more reference images could be added at any time to improve accuracy. I also wanted it to run in real time without noticeable lag. I eventually found a good mix of images and vectors that accurately predicts most letters from multiple angles. Building the GUI was also a challenge, since I had never done it before, but I was able to create a small, simple one using OpenCV's drawing and window functions.
Accomplishments that I'm proud of
I am proud that the program is able to recognize many letters despite the small number of reference images. Nevertheless, I do intend to add more references in the future to further improve the accuracy.
What I learned
I learned a lot about MediaPipe and the hand tracking system, different functions of OpenCV, and even picked up a little sign language as I was developing the project.
What's next for LexiSign
As mentioned above, I would like to add more reference images (from a dataset) to improve the program's accuracy. This would also let me add support for other sign languages, such as British Sign Language (BSL). Perhaps most importantly, I would like it to analyze hand gestures in motion, rather than just the alphabet. For example, it should pick up phrases like "how are you?" or "thank you," which use moving hand gestures.
Built With
- mediapipe
- opencv-python
- pycharm
- python