Best Main Prize - ML for social good - Sign4Good

Hand tracking (GIF)
Working demo (GIF)
Multi Gesture Translation (GIF)
Words to Try: Yellow, Red, Green, Bright, Opaque, Light-Blue (one GIF each)

Inspiration

Today, around one million people use sign language as their main way to communicate, according to https://www.csd.org/. We decided to create an application that helps bridge the communication gap for people with impaired hearing, and that also helps people who do not know sign language. With this app, communication becomes easier for both parties, and people whose voices are drowned out have a way to speak up.

What it does

Sign4Good lets users sign a word (a full gesture), which is then translated into text. This allows people to communicate without having to fully learn sign language.

How I built it

The hand tracking application was built with OpenCV. It segments the hand from each frame using masking techniques. The sign is translated by a deep neural network: a CNN first extracts features from the image, and these features are then fed into an RNN (a stack of LSTM layers), which compares the high-level features across frames.
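The masking step can be illustrated with a minimal sketch. It is not the project's actual code: it uses plain NumPy as a stand-in for an OpenCV range threshold, and the function name `segment_by_color`, the RGB bounds, and the synthetic frame are all assumptions made for illustration. The pink target color is borrowed from the pink glove mentioned under Limitations.

```python
import numpy as np


def segment_by_color(frame_rgb, lower, upper):
    """Keep only pixels whose R, G and B values all fall within [lower, upper].

    Returns a boolean mask of shape (H, W) and a copy of the frame in which
    every pixel outside the mask is set to black.
    """
    lower = np.asarray(lower, dtype=np.uint8)
    upper = np.asarray(upper, dtype=np.uint8)
    # A pixel is selected only if every channel lies inside its range.
    mask = np.all((frame_rgb >= lower) & (frame_rgb <= upper), axis=-1)
    # Broadcast the 2-D mask over the channel axis and zero out the rest.
    segmented = np.where(mask[..., None], frame_rgb, 0).astype(frame_rgb.dtype)
    return mask, segmented


# Synthetic 4x4 RGB frame: black background with a 2x2 pink patch
# standing in for the gloved hand (hypothetical data).
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1:3, 1:3] = (255, 105, 180)

# Hypothetical RGB bounds for "pink"; real footage would need tuning.
mask, segmented = segment_by_color(
    frame, lower=(200, 50, 130), upper=(255, 160, 230)
)
```

On real video the bounds would have to be tuned to the lighting, and thresholding in a color space such as HSV is often more robust than raw RGB.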

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 256, 256, 32)      896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 128, 128, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 128, 128, 64)      18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 64, 64, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 64, 64, 128)       73856     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 32, 32, 128)       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 32, 32, 256)       295168    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 16, 16, 256)       0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 16, 4096)          0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 16, 64)            1065216   
_________________________________________________________________
lstm_2 (LSTM)                (None, 16, 32)            12416     
_________________________________________________________________
lstm_3 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 198       
=================================================================
Total params: 1,474,566
Trainable params: 1,474,566
Non-trainable params: 0
_________________________________________________________________
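The parameter counts in the summary above can be checked by hand. The sketch below recomputes them with the standard formulas for convolution, LSTM, and dense layers. The 3x3 kernel size and the 3-channel (RGB) input are assumptions: they are not stated in the summary, but they are consistent with the 896 parameters of conv2d_1. The helper names are hypothetical. Note that the Reshape layer turns the 16 x 16 x 256 feature map into 16 steps of 4096 features each, which is the sequence the first LSTM consumes, and that the 6 output units match the six words listed at the top of the page.

```python
def conv2d_params(kernel, in_channels, out_channels):
    # One kernel x kernel x in_channels filter plus one bias per output channel.
    return kernel * kernel * in_channels * out_channels + out_channels


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


counts = {
    "conv2d_1": conv2d_params(3, 3, 32),     # assumes RGB input, 3x3 kernel
    "conv2d_2": conv2d_params(3, 32, 64),
    "conv2d_3": conv2d_params(3, 64, 128),
    "conv2d_4": conv2d_params(3, 128, 256),
    "lstm_1": lstm_params(16 * 256, 64),     # 4096 features per step
    "lstm_2": lstm_params(64, 32),
    "lstm_3": lstm_params(32, 32),
    "dense_1": dense_params(32, 6),          # 6 output classes
}

total = sum(counts.values())
assert total == 1_474_566  # matches "Total params" in the summary
```

The first LSTM accounts for roughly 72% of all parameters, because each of its 16 input steps carries 4096 features.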

Challenges I ran into

Segmenting the fingers from the hand in the hand detection
Handling the large amounts of data

Accomplishments that I'm proud of

Being able to achieve a model with high accuracy
Segmenting fingers for the hand tracking
Supporting multi gesture translation

What I learned

Working with Video Detection

Limitations

Because some signs closely resemble one another, our current model has difficulty distinguishing words such as "opaque" and "green": their gestures look similar from the point of view of the masked pink glove.

At the moment, Google's MediaPipe supports only single-hand detection, which prevented us from training on gestures that require both hands. MediaPipe is open-source software with very good tracking algorithms. Once it supports two-hand recognition, this limitation can be resolved.

What's next for Sign4Good

Train using more words
Train using different sign languages

Dataset Used

Sign Language Used

Built With

artificial-intelligence
google-mediapipe
keras
opencv
python
tensorflow

Updates

Zafir Khalid started this project — Jul 18, 2020 07:43 AM EDT
