Inspiration
In our last hackathon, we struggled with preparing and practicing our presentation for the judges. Without feedback from anyone outside the project, we could not identify the weak parts of our speech. We realized that a website that could automatically find these issues and give us advice would be incredibly useful.
What it does
Our project takes in an audio file of a practice presentation, the user's intended tone, and a target timeframe. Using this information, the website analyzes the speech patterns, grades the overall writing, and saves everything to the user's project. It reports the total duration of the speech, the total word count, the average speaking rate in words per minute, and the number of filler words (along with filler words as a percentage of the speech). It also gives an overall performance breakdown: a time-accuracy statistic, a grade for coherence and tone, a score for filler words, and a grade for pause control. It then offers specific data and advice to help users improve. Each data point has its own tab with more detail, including a summary of each section of the speech and the full transcript.
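The headline numbers are simple arithmetic once a transcript and duration are known. A minimal sketch (the filler-word list, names, and types here are illustrative, not the actual code):

```typescript
// Illustrative only: the real analysis comes from Gemini, but the
// summary statistics the site reports can be computed like this.
const FILLER_WORDS = new Set(["um", "uh", "like", "so", "basically", "actually"]);

interface SpeechMetrics {
  wordCount: number;
  wordsPerMinute: number;
  fillerCount: number;
  fillerPercent: number;
}

function analyzeTranscript(transcript: string, durationSeconds: number): SpeechMetrics {
  // Split the transcript into lowercase word tokens.
  const words = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
  const fillerCount = words.filter((w) => FILLER_WORDS.has(w)).length;
  return {
    wordCount: words.length,
    // Speaking rate: words per minute over the whole recording.
    wordsPerMinute: durationSeconds > 0 ? (words.length / durationSeconds) * 60 : 0,
    // Filler words as a percentage of all words spoken.
    fillerPercent: words.length > 0 ? (fillerCount / words.length) * 100 : 0,
    fillerCount,
  };
}
```

For example, a 60-second recording transcribed as "um hello uh world" would score 4 words, 4 WPM, and 50% filler words.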
How we built it
We built the website's UI and backend in Next.js and used Google Gemini to analyze the audio recordings in depth. We used Prisma as an ORM over MongoDB to store the results of every saved project on the website.
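As a rough idea of how a saved project could be modeled, here is a hypothetical Prisma schema fragment for MongoDB (model and field names are illustrative, not the actual schema):

```prisma
// Hypothetical model: one record per saved practice presentation.
model Project {
  id         String   @id @default(auto()) @map("_id") @db.ObjectId
  title      String
  tone       String   // the user's intended tone
  targetSecs Int      // target timeframe in seconds
  transcript String
  metrics    Json     // duration, WPM, filler stats, grades
  createdAt  DateTime @default(now())
}
```

Storing the analysis as a single JSON field keeps the Gemini output flexible while still letting the dashboard query projects by title and date.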
Challenges we ran into
One major challenge we ran into was converting the analysis information from Gemini into text that could be displayed on the website. We had built the site to accept the audio recording and had integrated Gemini to analyze it, so we could see the analysis text in our server code, but we struggled to figure out how to send that text back to the page.
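One common wrinkle in this kind of pipeline is that models often wrap JSON answers in markdown code fences, which breaks naive parsing on the server. A small hypothetical helper (not our actual code) that strips the fence before handing structured data to the page:

```typescript
// Hypothetical helper: model replies sometimes arrive as
// "```json ... ```", so strip the fence before JSON.parse,
// then return the object for the UI to render.
function extractJson(modelReply: string): unknown {
  const fenced = modelReply.match(/```(?:json)?\s*([\s\S]*?)```/);
  const raw = fenced ? fenced[1] : modelReply;
  return JSON.parse(raw.trim());
}
```

An API route can then return the parsed object with `Response.json(...)`, and the React components render its fields directly instead of raw model text.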
Accomplishments that we're proud of
One accomplishment we are proud of is getting the website to accurately detect all the filler words in a speech. At first, the model would only detect a few major pauses, but after some significant changes, we got it to detect each individual minor pause and filler word as well.
What we learned
We learned a lot about creating the UI of a website. We had put off the design of our project, thinking that focusing on the functionality was more important. However, after building and displaying the main features, we realized that working on the layout and graphical elements was just as vital. We ended up spending a large amount of time on the design as well, which made our website look much cleaner.
What's next for Speakify
Next, we would like to expand the website to also analyze eye contact, posture, and other body language during a presentation. Users would upload a video of their speech, and using an AI model, the website could give additional recommendations about their body language. We would also like to turn the website into a mobile app.
Built With
- better-auth
- gemini-api
- google-gmail-oauth
- mongodb
- next.js
- prisma
- react-native