Speak&Sync

The current Speak&Sync UI on our website.

Inspiration

Our team has experience on both the customer side and the service worker's side. Each side presents its own challenges that can make the interaction difficult. The customer may not have a clear idea of what the issue is, or may not describe it in a structured way. This makes the service worker's job harder, since they must sift through the customer's statements to pinpoint the issue. The result can be lengthy conversations and confusion for both parties.

What it does

Speak&Sync transcribes the customer's statement and summarizes it into key points for the worker to reference. Using spaCy's natural language processing (NLP) model, Speak&Sync identifies the customer's name, the main problem, the steps the customer has already taken to solve it, and any additional relevant details.
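The four fields Speak&Sync extracts can be pictured as a small record. The sketch below is not the spaCy pipeline the project uses: it is a standard-library stand-in that splits a transcript into sentences and sorts them with hand-picked cue phrases, only to show the shape of the summary. The names `CallSummary` and `extract_key_points`, and every cue phrase, are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallSummary:
    """Key points pulled from one customer statement (hypothetical structure)."""
    name: Optional[str] = None
    problem: Optional[str] = None
    steps_taken: List[str] = field(default_factory=list)
    details: List[str] = field(default_factory=list)


# Hypothetical cue phrases standing in for spaCy's statistical model.
# The intro phrase is case-insensitive; the captured name must be capitalized.
NAME_PATTERN = re.compile(r"\b(?i:my name is|this is)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)")
STEP_CUES = ("i tried", "i've tried", "i have tried", "i already")
PROBLEM_CUES = ("not working", "doesn't work", "won't", "broken", "error", "problem", "issue")


def extract_key_points(transcript: str) -> CallSummary:
    """Sort each sentence of a transcript into name, problem, steps, or details."""
    summary = CallSummary()
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    for sentence in sentences:
        lowered = sentence.lower()
        if summary.name is None:
            match = NAME_PATTERN.search(sentence)
            if match:
                summary.name = match.group(1)
                continue  # treat the introduction sentence as name-only
        if any(cue in lowered for cue in STEP_CUES):
            summary.steps_taken.append(sentence)
        elif summary.problem is None and any(cue in lowered for cue in PROBLEM_CUES):
            summary.problem = sentence  # first problem-like sentence wins
        elif sentence:
            summary.details.append(sentence)
    return summary


if __name__ == "__main__":
    print(extract_key_points(
        "Hi, my name is Dana Lee. My internet is not working since this morning. "
        "I tried restarting the router. I live on the third floor."
    ))
```

A statistical pipeline such as spaCy's can recognize person names from context instead of relying on fixed phrases like these, which is the main advantage over this stand-in.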

How we built it

We used several Python libraries to record the customer's voice and save it as a .wav file, convert that file to text with the SpeechRecognition library's speech-to-text, and finally pass the resulting string to a spaCy model to extract the required information.
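The three stages above can be wired together as plain functions. In this sketch the recording step is simulated by writing raw audio bytes to a .wav file with Python's built-in `wave` module (16 kHz, mono, 16-bit are assumed settings, not ones the project documents), and the transcription and extraction stages are passed in as callables so either can be swapped. In the real project those slots would hold the SpeechRecognition call and the spaCy step; `write_wav` and `run_pipeline` are hypothetical names.

```python
import os
import tempfile
import wave
from typing import Callable

SAMPLE_RATE = 16_000  # assumed: 16 kHz is a common rate for speech-to-text
SAMPLE_WIDTH = 2      # 16-bit PCM samples
CHANNELS = 1          # mono


def write_wav(path: str, pcm_frames: bytes) -> None:
    """Save raw 16-bit mono PCM audio to a .wav file (stands in for the microphone step)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm_frames)


def run_pipeline(
    pcm_frames: bytes,
    transcribe: Callable[[str], str],
    extract: Callable[[str], dict],
) -> dict:
    """Record -> .wav -> speech-to-text -> key-point extraction."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        wav_path = os.path.join(tmp_dir, "customer.wav")
        write_wav(wav_path, pcm_frames)
        text = transcribe(wav_path)  # must run before the temp directory is deleted
    return extract(text)


# Example of a real transcriber, assuming the SpeechRecognition package
# (not run here; recognize_google sends the audio to Google's web API):
#
#   import speech_recognition as sr
#
#   def transcribe_with_sr(path: str) -> str:
#       recognizer = sr.Recognizer()
#       with sr.AudioFile(path) as source:
#           audio = recognizer.record(source)
#       return recognizer.recognize_google(audio)
```

Keeping each stage behind a simple function signature is one way to make it easier to later replace a locally run model with a hosted one without touching the rest of the pipeline.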

Challenges we ran into

We had trouble accessing larger NLP models and LLMs. For speech-to-text, we initially planned to use OpenAI's Whisper or Google's Speech-to-Text V1. Because we ran our language model locally, we also struggled to integrate the program into the website, since running the model overloaded the server's memory. Finally, we could not email the summary to the customer, because our email-sending code did not work as intended.

Accomplishments that we're proud of

We got the core of our program working and learned a lot about generative AI and web development tools.

What we learned

Each of us learned several new languages, frameworks, and APIs, including React, Flask, JavaScript, CSS, HTML, and Python generative AI libraries.

What's next for Speak&Sync

We plan to properly connect the front end and back end using Google's Speech-to-Text or the Gemini API. This would give us access to a larger LLM that produces more accurate outputs, and would run the AI on Google's servers, resolving the memory overload on our own hardware.

We also want to integrate the software into a company's telecom lines. The system would automatically transcribe and summarize the phone conversation between the service worker and the customer, or record only the customer if they choose not to speak directly with a service worker. A final improvement would be fixing our email-sending code so it can send the customer a summary after the call.
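The planned follow-up email could be assembled with Python's standard `email` package and handed to `smtplib` for delivery. This is a sketch of that idea, not the team's email code: `build_summary_email`, `send_summary`, the addresses, and the SMTP host, port, and credentials are all hypothetical placeholders, and the summary is assumed to arrive as a dictionary mapping field labels to text.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict


def build_summary_email(sender: str, recipient: str, summary: Dict[str, str]) -> EmailMessage:
    """Turn extracted key points into a plain-text email (hypothetical format)."""
    msg = EmailMessage()
    msg["Subject"] = "Summary of your support call"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{label}: {text}" for label, text in summary.items()]
    msg.set_content("Here is a summary of your call:\n\n" + "\n".join(lines))
    return msg


def send_summary(msg: EmailMessage, host: str = "smtp.example.com", port: int = 587,
                 user: str = "", password: str = "") -> None:
    """Deliver the message over SMTP with STARTTLS (placeholder host and credentials)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user:
            server.login(user, password)
        server.send_message(msg)
```

Separating message construction from delivery lets the email content be checked without a mail server, which may help when debugging an email step that is not behaving as intended.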