Inspiration

We wanted to improve the accessibility of captioning for everyone, particularly people who are hard of hearing and/or have difficulty recognizing emotion in verbal and textual communication. We recognized a gap in accessibility across many settings and set out to build a captioning service that could fill as many of those gaps as possible. This service strives to reach not only deaf communities but also people with certain neurodivergences, and to work at many levels of use: lectures, presentations, streaming, games and voice chat; the possibilities are endless! In pursuit of that goal, we aimed to incorporate dynamic caption customization, the ability to view and distribute transcriptions, sentiment analysis, multilingual support, and more. Captioning is incredibly useful for connecting people, building stronger relationships, and opening new doors of opportunity. Our service would foster a more inclusive and accommodating community, not just for people with certain impairments, but also for businesses that could leverage the tool to increase productivity and strengthen their connections with customers, clients, and colleagues.

What it does

Our uJet application takes in audio from the microphone, or identifies where the audio is coming from locally. Those streams are sent to a transcription API, and the results are formatted and output to the uJet start page with the proper interpretation and source label.
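As a rough sketch of the microphone half of that pipeline: plain Java can capture audio with `javax.sound.sampled`. The format below (16 kHz, 16-bit, mono, signed PCM) and the 100 ms chunk size are our assumptions about what a streaming transcription service typically expects, not the exact values our app uses.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

public class MicCapture {
    // 16 kHz, 16-bit, mono, signed, little-endian PCM -- a common
    // input format for streaming speech-to-text APIs (an assumption here).
    static final AudioFormat FORMAT = new AudioFormat(16_000f, 16, 1, true, false);

    /** Bytes needed to hold `millis` of audio in FORMAT. */
    static int chunkSizeBytes(int millis) {
        int bytesPerSecond = (int) FORMAT.getSampleRate() * FORMAT.getFrameSize();
        return bytesPerSecond * millis / 1000;
    }

    public static void main(String[] args) throws Exception {
        int chunk = chunkSizeBytes(100); // 100 ms per chunk -> 3200 bytes
        System.out.println("chunk bytes: " + chunk);

        DataLine.Info info = new DataLine.Info(TargetDataLine.class, FORMAT);
        if (!AudioSystem.isLineSupported(info)) {
            System.out.println("no microphone line available; skipping capture");
            return;
        }
        try (TargetDataLine mic = (TargetDataLine) AudioSystem.getLine(info)) {
            mic.open(FORMAT);
            mic.start();
            byte[] buf = new byte[chunk];
            int read = mic.read(buf, 0, buf.length); // one chunk; this is what gets streamed out
            System.out.println("captured bytes: " + read);
        }
    }
}
```

In a loop, each filled buffer would be handed off to the transcription API while the next chunk is being read.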

How we built it

We built this application in JavaFX, using Java libraries for the backend.

We utilized AssemblyAI to transcribe/caption our audio (on top of emotion/tone analysis of the sentences spoken), and used PulseAudio to deal with system-related audio (e.g., application audio). We used JNA (Java Native Access) to interface with PulseAudio, although we couldn't get it to function as we wanted in the end.

Challenges we ran into

We ran into issues integrating AssemblyAI, a third-party AI transcription service, with our project. We hit Maven issues when trying to use their Java SDK, and after wrestling with the dependencies, we were greeted with nothing but undocumented errors from the API. Eventually, through trial and error, we got transcription working, only to discover that the sentiment analysis feature we had planned to leverage was not available for real-time transcription. This was not made apparent in the documentation; we had to learn it from customer support.
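For context on the integration, real-time transcription with a service like AssemblyAI runs over a WebSocket rather than a batch REST call, which the JDK's built-in `java.net.http` client can speak without any SDK. The endpoint path, `sample_rate` parameter, and authorization header below are our assumptions about the service's streaming interface; verify them against AssemblyAI's current real-time documentation. The connect step only runs when an API key is supplied.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class RealtimeSketch {
    // Assumed streaming endpoint; check AssemblyAI's real-time docs.
    static URI endpoint(int sampleRate) {
        return URI.create("wss://api.assemblyai.com/v2/realtime/ws?sample_rate=" + sampleRate);
    }

    public static void main(String[] args) {
        URI uri = endpoint(16_000);
        System.out.println(uri);

        String apiKey = System.getenv("ASSEMBLYAI_API_KEY");
        if (apiKey == null) {
            System.out.println("no API key set; not connecting");
            return;
        }
        WebSocket ws = HttpClient.newHttpClient().newWebSocketBuilder()
                .header("Authorization", apiKey) // assumed auth scheme
                .buildAsync(uri, new WebSocket.Listener() {
                    @Override
                    public CompletionStage<?> onText(WebSocket s, CharSequence data, boolean last) {
                        // Partial and final transcripts arrive here as JSON text frames.
                        System.out.println("message: " + data);
                        return WebSocket.Listener.super.onText(s, data, last);
                    }
                }).join();
        // Audio chunks would then be streamed as binary frames:
        //   ws.sendBinary(pcmBuffer, true);
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
    }
}
```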

Another massive hurdle was trying to obtain both the raw audio and the source of that audio from applications. Since no existing API tackles that problem, it would have meant a monstrous amount of C code to even attempt getting both the raw audio bytes and where they came from. We tried interfacing with PulseAudio to get that source information on top of the raw audio, but unfortunately, after well over 12 hours of trial and error, it proved impossible to have both without greatly changing PulseAudio's C implementation. It was very disappointing to learn that we could not achieve what we wanted by interfacing with PulseAudio alone, after investing so much time reading the documentation and testing things out, thinking we were getting closer and closer to our goal.
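To illustrate the gap we ran into: the standard Java sound API can enumerate audio mixers (devices), but nothing in it attributes a stream to the application that produced it; that metadata lives inside the sound server (PulseAudio) and is exactly what we could not pull across. A quick pure-JDK check makes the limitation visible:

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Mixer;

public class ListMixers {
    public static void main(String[] args) {
        Mixer.Info[] mixers = AudioSystem.getMixerInfo();
        System.out.println(mixers.length + " mixer(s) visible to the JVM");
        for (Mixer.Info m : mixers) {
            // Name and description identify hardware or the sound server itself,
            // never the application feeding audio into it.
            System.out.println(m.getName() + " -- " + m.getDescription());
        }
    }
}
```

On a PulseAudio system this typically lists the server's default devices; per-application "sink inputs" never appear, which is why we had to drop down to JNA and native calls in the first place.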

Accomplishments that we're proud of

We are proud that we built something that works at a basic level, even if it is not exactly what we hoped for. Getting transcription working, even in a minimal way, is a huge win for us.

What we learned

We learned a number of lessons the hard way. The general theme was that audio can be very tricky to deal with.

We learned to consider our options more carefully. We committed early on to using AssemblyAI, but wasted a good portion of our time trying to fix external issues that were not clearly documented. Had we pivoted to another service, we might have spent less time on integration and more on the features we planned to add.

Another lesson: if we're going to do anything audio-related at this depth, we need to really hammer down on writing our own C/C++ code to get close to the hardware; almost nothing exists that can identify the source application transmitting audio to a mixer. And even then, there isn't much documentation or example code to guide us toward that goal.

What's next for uJet

Given more time, the audio analysis on the system side could have worked out much better. This is still something we would like to pursue in the long run, with a much closer look at the problem and a better implementation of our audio handling.

A lot of the code is also poorly structured; we would like to refactor it into a much cleaner structure, on top of building out the true capabilities we set out to create.

Another issue we would like to address is that uJet is currently usable only on Linux. PulseAudio is Linux-only software, meaning our project will not work on any other OS. We would like to have solutions not just for Linux, but one day for Windows and macOS as well.

Overall, this is not something we want to drop; we intend to persist until we get it working.

Built With

  • assemblyai
  • java
  • pulseaudio