AR and Speech to Text

Our goal was to create an AR Android app that detects faces and draws a text bubble coming out of a person's mouth. When the speaker talks, the spoken words would appear in the text bubble. The plan was to run speech-to-text on the audio from the phone and use the camera for AR. We hit quite a few time-consuming bumps getting ramped up with Android development and AR in general. We also spent a lot of time investigating the available speech-to-text APIs (Revspeech, Watson, GCP) to find the best option for real-time speech-to-text. We even experimented with a recent Google Research paper that uses video and audio spectrograms to separate speakers into different audio channels.

In the end, we don't consider this a finished project, but we did get the AR portion working.

Built With

  • arcore