Inspiration

With new AI technologies for pose estimation emerging, we were inspired to serve the ASL community by bridging the gap between ASL speakers and non-speakers.

What does it do?

Simply is a mobile application that combines Meta's Sapiens pose-estimation model, Apple's native speech-to-text library, and various other technologies to create a simulated translation of English speech into ASL.

Build Process

We first generated our dataset by scripting against Handspeak to collect hundreds of ASL videos and their corresponding words. We converted these English words to their ASL gloss equivalents, and used Apple's native speech-to-text library in Swift to tokenize spoken audio in our mobile app. With the data in place, we sequenced the tokenized audio into gloss, fetched the corresponding videos, and ran pose estimation on them with Sapiens. We then used intermediate frame generation to smooth the transitions between poses. Finally, we used OpenCV to render the poses into a video, which a Flask backend sends back to the app for display.
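Two of the steps above — mapping English tokens to gloss and generating intermediate frames between pose clips — can be sketched in a few lines. This is a minimal illustration, not our actual implementation: the names (GLOSS_MAP, to_gloss, interpolate_poses) are hypothetical, the gloss table is a tiny stand-in for the Handspeak-derived lookup, and poses are simplified to lists of 2D keypoints.

```python
# Hypothetical stand-in for the Handspeak-derived word-to-gloss lookup.
GLOSS_MAP = {
    "my": "MY",
    "name": "NAME",
    "is": None,  # some English words have no gloss token and are dropped
}

def to_gloss(tokens):
    """Map English tokens to an ASL gloss sequence, dropping unglossed words."""
    out = []
    for t in tokens:
        g = GLOSS_MAP.get(t.lower())
        if g:
            out.append(g)
    return out

def interpolate_poses(pose_a, pose_b, steps):
    """Linearly blend two 2D keypoint sets to smooth a clip boundary.

    pose_a, pose_b: lists of (x, y) keypoints of equal length.
    Returns `steps` intermediate frames between them.
    """
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        frames.append([
            ((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
            for (xa, ya), (xb, yb) in zip(pose_a, pose_b)
        ])
    return frames

print(to_gloss(["My", "name", "is"]))               # ['MY', 'NAME']
print(interpolate_poses([(0.0, 0.0)], [(1.0, 1.0)], 3))
```

In the real pipeline, each gloss token selects a pose-estimated video clip, and the interpolated frames bridge the last pose of one clip to the first pose of the next before OpenCV writes the combined frames out as a video.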

Challenges

Originally, we worked hard to build a 3D pose model. While we had some of the infrastructure for it, we lacked a large enough dataset to accurately predict 3D poses. This forced us to fall back to 2D pose estimation, which affected several efforts in progress, such as our Blender animation and our use of multiple cameras, since they added little value in 2D.

Accomplishments

We are proud to have generated sample videos from spoken English words and to have a working demonstration within the hackathon time frame. In addition, we built substantial infrastructure for improving this project in the future.

The Future

We aim to incorporate our existing work on 3D pose estimation and 3D Blender animation into Simply-ASL. To accomplish this, we want to crowdsource a 3D pose-estimation dataset fitting our use case and parameters. Lastly, our team is working to deliver a live animation feed rather than a generated video.
