Inspiration

What inspired us to create the app was the realization that speech therapy and practice are crucial not only for individuals with speech disorders, but also for anyone facing speech anxiety or a lack of confidence in speaking. Our inspiration stemmed from recognizing that the challenges of speech anxiety extend beyond specific populations and can affect people from all walks of life.

Drawing upon insights from the National Library of Medicine, which identifies public speaking anxiety as a social anxiety disorder affecting 15% to 30% of the general population, we understood the significant impact speech anxiety has on individuals' daily lives. We were motivated to develop a solution that could address this issue and provide valuable support.

Moreover, we observed the rising costs of traditional speech therapy, with sessions often exceeding $100 per hour, making it financially burdensome for many families. The COVID-19 pandemic, which brought heightened inflation and economic strain, further emphasized the need for an accessible and affordable alternative.

Combining these factors, we embarked on creating our speech therapy/practice app. While originally inspired by the needs of children with speech disorders, we aimed to cater to a broader audience by addressing speech anxiety and fostering confidence in speaking. Our app, Speech Splendid, was designed to provide innovative tools and feedback to enhance speech performance and boost confidence while speaking. By creating a user-friendly and accessible platform, we aspire to offer a valuable resource that enables personal growth and communication skills development for all users.

What it does

Our app is an AI-driven speech practice buddy. Once a recording of a speech is uploaded to the platform, the app splits the video into its linguistic and facial components, each of which is analyzed with machine learning and natural language processing to generate insightful metrics, displayed to the user alongside feedback for improvement. For the linguistic component, the app performs sentiment analysis, topic identification, and behavioral analysis (i.e., identifying the tone of the text). We also give feedback on the usage of low-confidence filler words, such as "uh", which may decrease the effectiveness of the speech. For the facial component, the app samples a percentage of the frames throughout the video and passes them through a deep convolutional neural network to determine the overall sentiment of the facial expressions in the speech. In a matter of seconds, our program outputs all of this information, which can be invaluable for improving speech confidence and performance. This method of online speech analysis is quick, helpful, and intuitive.
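As a minimal sketch of how the low-confidence-word feedback might work (not our production code), the transcript can be scanned against a word list; the `FILLER_WORDS` set and the `filler_word_report` helper below are illustrative names, and the real app uses NLP models on top of this kind of counting:

```python
import re
from collections import Counter

# Illustrative set of low-confidence filler words; the app's actual list is broader.
FILLER_WORDS = {"uh", "um", "like", "basically"}

def filler_word_report(transcript: str) -> dict:
    """Count filler-word occurrences and their share of total words."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w in FILLER_WORDS)
    total = len(words)
    rate = sum(counts.values()) / total if total else 0.0
    return {"counts": dict(counts), "filler_rate": rate}

report = filler_word_report("Uh, today I want to, uh, talk about, um, climate change.")
print(report["counts"])  # {'uh': 2, 'um': 1}
```

Even this simple rate gives actionable feedback: a speaker can rehearse, re-upload, and watch the filler rate drop.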

How we built it

We built our project using Python, Streamlit, machine learning, transfer learning, IBM Cloud, VADER, py-feat, expert.ai, and natural language processing.

Challenges we ran into

The most arduous problem was deciding which model architecture to use for the expression analysis: a more complex system of models would mean sacrificing the number of frames analyzed. Originally, we used the Python Facial Expression Analysis Toolbox (py-feat), which let us import pretrained transfer-learning models for frame analysis. Each frame was run through five models, four of which were deep neural networks, performing face detection, alignment, landmarking, muscle contraction analysis, and emotion analysis. While the added face detection did slightly improve the analysis, the large amount of computation per frame limited the number of frames we could analyze from the video, since we had to balance accuracy against runtime and memory management. This complex system of models often crashed the app when hosted on Streamlit due to spikes in memory usage. The solution was to replace this approach with a lighter model: DeepFace, a nine-layer CNN developed by researchers at Facebook and trained on over four million images. DeepFace drastically decreased runtime while still maintaining a high level of accuracy, and the lightweight model also enabled us to sample more frames from the speech video, improving the accuracy of our expression scores.
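The sampling-versus-cost trade-off above can be sketched as follows. This is a simplified illustration, not our exact code: `sample_indices` picks evenly spaced frames to analyze, and `overall_sentiment` aggregates per-frame dominant emotions (such as those returned by a per-frame emotion model like DeepFace) by majority vote; both function names are ours for this example.

```python
from collections import Counter

def sample_indices(n_frames: int, fraction: float) -> list:
    """Pick evenly spaced frame indices covering `fraction` of the video's frames."""
    k = max(1, int(n_frames * fraction))  # a lighter model lets us raise `fraction`
    step = n_frames / k
    return [int(i * step) for i in range(k)]

def overall_sentiment(per_frame_emotions: list) -> str:
    """Majority vote over each sampled frame's dominant emotion label."""
    return Counter(per_frame_emotions).most_common(1)[0][0]

# E.g., analyze 10% of a 300-frame clip: 30 evenly spaced frames.
idx = sample_indices(300, 0.1)
```

Because the per-frame cost with DeepFace is a fraction of the five-model pipeline's, the same compute budget covers many more sampled frames, which is what stabilized the expression scores.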

Accomplishments that we're proud of

Bridging natural language processing with image detection and analysis to create an app that helps people address a problem as common as public speaking anxiety.

What's next for SpeechSplendid

Adding practice functionalities to the app, further optimizing memory usage, adding an option for live recordings, and hosting the app on a server.
