Inspiration

We wanted to create a project that posed a unique challenge and could have a lasting impact. A web-based code dictation application provided the challenge of parsing and interpreting live speech transcripts, requiring features that distinguish it from other options. It is also an application that could make a real difference: programmers who are missing arms or hands, who have carpal tunnel syndrome, or who are living with degenerative conditions like Parkinson's disease can use this app to keep pursuing their passion.

What it does

This project is a web-based code editor that requires only your voice. You dictate directly to the webpage, inserting and editing code without ever touching the keyboard.

How we built it

We built the project with standard web technologies: HTML5, JavaScript, jQuery, and CSS. The application is hosted on Google Cloud's Compute Engine service and exposed at a public domain registered through Domain.com. The speech recognition component uses Chrome's built-in webkitSpeechRecognition API.
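The wiring for Chrome's webkitSpeechRecognition might look like the sketch below. This is an illustrative assumption, not the project's actual code; the `extractFinalTranscripts` and `handlePhrase` names are invented for the example, and only the pure transcript-extraction helper is meant to run outside a browser.

```javascript
// Pure helper: collect the finalized phrases from a SpeechRecognition
// result list, starting at the index where new results begin.
function extractFinalTranscripts(results, startIndex) {
  const finals = [];
  for (let i = startIndex; i < results.length; i++) {
    if (results[i].isFinal) {
      finals.push(results[i][0].transcript.trim());
    }
  }
  return finals;
}

// Browser-only wiring; webkitSpeechRecognition exists only in Chrome.
if (typeof webkitSpeechRecognition !== "undefined") {
  const recognition = new webkitSpeechRecognition();
  recognition.continuous = true;      // keep listening between phrases
  recognition.interimResults = true;  // stream partial hypotheses too
  recognition.onresult = (event) => {
    const phrases = extractFinalTranscripts(event.results, event.resultIndex);
    phrases.forEach(handlePhrase);    // handlePhrase: dispatch to insert/edit logic
  };
  recognition.start();
}
```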

We took a two-fold approach to interpreting the speech recognition results. For inserting code, we devised a custom lexicon and grammar to parse and interpret programming code from the transcript of the user's audio stream. For editing code, we employed Microsoft Azure's Language Understanding service to extract an intent from the user's command; our software then acts on the user's intent to move around inside the code editor.
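A minimal sketch of the lexicon-and-grammar idea is shown below. The mappings and the `transcriptToCode` function are illustrative assumptions, not the project's actual grammar; the real lexicon would be far larger and handle spacing and indentation.

```javascript
// Hypothetical spoken-token lexicon: multi-word phrases map to code tokens.
const LEXICON = {
  "open paren": "(",
  "close paren": ")",
  "open brace": "{",
  "close brace": "}",
  "semicolon": ";",
  "equals": "=",
  "plus": "+",
};

// Greedily match two-word lexicon entries first, then single words,
// passing unrecognized words through as identifiers or keywords.
function transcriptToCode(transcript) {
  const words = transcript.toLowerCase().split(/\s+/).filter(Boolean);
  const out = [];
  let i = 0;
  while (i < words.length) {
    const pair = words.slice(i, i + 2).join(" ");
    if (LEXICON[pair] !== undefined) {
      out.push(LEXICON[pair]);
      i += 2;
    } else if (LEXICON[words[i]] !== undefined) {
      out.push(LEXICON[words[i]]);
      i += 1;
    } else {
      out.push(words[i]);
      i += 1;
    }
  }
  return out.join(" ");
}

// transcriptToCode("x equals x plus one semicolon") → "x = x + one ;"
```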

The code editor itself is generated, embedded, and manipulated through the open-source Ace editor API.
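Driving Ace from a recognized editing intent could be sketched as follows. The intent names (`MoveDown`, `MoveUp`) and the `nextCursorRow` helper are assumptions for illustration; `ace.edit`, `getCursorPosition`, and `gotoLine` are part of Ace's public API, and only the pure helper runs outside a browser.

```javascript
// Pure helper: compute the target row for a cursor-movement intent.
function nextCursorRow(currentRow, intentName, count) {
  if (intentName === "MoveDown") return currentRow + count;
  if (intentName === "MoveUp") return Math.max(0, currentRow - count);
  return currentRow; // unrecognized intent: stay put
}

// Browser-only wiring; `ace` is the global exposed by the Ace editor build.
if (typeof ace !== "undefined") {
  const editor = ace.edit("editor"); // attach to <div id="editor">
  editor.session.setMode("ace/mode/javascript");

  function handleMoveIntent(intentName, count) {
    const { row } = editor.getCursorPosition();          // 0-based row
    editor.gotoLine(nextCursorRow(row, intentName, count) + 1); // gotoLine is 1-based
  }
}
```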

Challenges we ran into

Among the many challenges we faced during development, incorporating Google Cloud services into our project proved the most difficult. Leveraging our past experience with Django, a Python web framework, we eventually found a workable way to deploy our service publicly on Google Cloud. We also continue to have issues with the voice recognition failing to detect certain words; this is something we care deeply about and are actively working to improve.

Accomplishments that we're proud of

Overall, we are proud of the way we explored speech recognition and natural language processing, a field none of us was familiar with, and learned everything we needed to implement a fully functioning application controlled entirely by voice. We are also proud that we overcame the technicalities of deploying the application as a publicly hosted web service on Google Cloud.

What we learned

The past 24 hours have shown us the intricacies of web development and voice recognition software. We discovered alternative methods for translating speech transcripts into software commands, including how to harness natural language processing to infer the user's intent from a spoken command.

What's next for Orator

With our service still in its early form, we expect to do much more development and tweaking to ensure the user always has a seamless experience. Voice recognition software often requires copious tuning to capture phrases accurately and respond in a natural manner. We foresee running more trials and gathering more voice data to refine the entire user experience.
