Inspiration
We are college students who occasionally (often) miss our lectures and end up watching the recordings instead. We wanted a better way to study from recorded videos, so we created Waffle!
What it does
Waffle turns a video into a custom chatbot. Users can ask questions about the content, read a summary of the main ideas, request additional information, and find supporting resources online to explore the topic further.
How we built it
The backend is written in Python. First, we use Whisper-JAX to transcribe the video's audio stream into a transcript string. We then feed the transcript to an OpenAI LLM through LangChain for document summarization and interactive question answering. We also use the Metaphor API to find online resources that are relevant to the main content of the video. Finally, we use FastAPI to expose all of this as a RESTful API that the frontend can call.
For the frontend, we use React and Chakra UI. We stuck with a waffle theme and derived the color scheme from it.
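The stages described above can be sketched roughly as follows. This is a minimal illustration and not Waffle's actual code: `VideoChat`, `build_chat`, and the `transcribe`, `llm`, and `search` callables are hypothetical names, standing in for the Whisper-JAX, LangChain/OpenAI, and Metaphor calls respectively. The prompts are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VideoChat:
    """Holds one video's transcript plus the stages that act on it."""

    transcript: str
    # Hypothetical stand-in for the LLM call: prompt in, completion out.
    llm: Callable[[str], str]
    # Hypothetical stand-in for the web search call: query in, URLs out.
    search: Callable[[str], List[str]]

    def summarize(self) -> str:
        """Ask the LLM for the main ideas of the transcript."""
        prompt = f"Summarize the main ideas of this transcript:\n\n{self.transcript}"
        return self.llm(prompt)

    def ask(self, question: str) -> str:
        """Answer a user question, grounded in the transcript."""
        prompt = (
            "Answer the question using only the transcript below.\n\n"
            f"Transcript:\n{self.transcript}\n\nQuestion: {question}"
        )
        return self.llm(prompt)

    def resources(self) -> List[str]:
        """Use the summary as a search query for related material."""
        return self.search(self.summarize())


def build_chat(
    audio_path: str,
    transcribe: Callable[[str], str],
    llm: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> VideoChat:
    """Run transcription once, then reuse the transcript for every request."""
    transcript = transcribe(audio_path)
    return VideoChat(transcript=transcript, llm=llm, search=search)
```

Passing the stages in as plain callables keeps the sketch independent of any particular library version, and it means each API endpoint (summary, question answering, resources) can map onto one method.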
Challenges we ran into
Initially, we used the default Whisper API from OpenAI, but its runtime scaled poorly. Fortunately, we found Whisper-JAX, an implementation of Whisper built on JAX, Google's machine learning framework, and optimized for TPUs. That gave us a 70x speedup in transcription.
On the non-technical side, one of our members fell sick and couldn't make it, and two others had their flight canceled and did not arrive on campus until 2 AM on Saturday. Despite this, we're grateful to still have been able to hack together!
Accomplishments that we're proud of
We're happy to have finished both our initial vision of Waffle as a basic chatbot and the additional functionality for summarizing the video and retrieving relevant information from the internet using Metaphor.
What we learned
We learned a lot more about LLM integration using LangChain. Specifically, we learned how LangChain composes LLM calls and how we can use it to speed up AI-based development. We also gained experience with FastAPI and Render, neither of which we had used for backend development before.
What's next for Waffle
We currently have the general data pipeline working, but it only accepts videos as YouTube links. Later, we will expand it so that users can upload any type of video file for parsing. We also intend to add a database so that users can store their chat history for the videos they interact with and review it later.
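As one possible shape for the planned chat history storage, here is a minimal sketch using Python's built-in `sqlite3` module. Nothing here exists in Waffle yet: the `messages` table, its columns, and the helper names `open_history`, `add_message`, and `get_history` are all assumptions made for illustration, and a deployed version might well use a different database.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id  TEXT NOT NULL,
            video_id TEXT NOT NULL,
            role     TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
            content  TEXT NOT NULL
        )
        """
    )
    return conn


def add_message(
    conn: sqlite3.Connection, user_id: str, video_id: str, role: str, content: str
) -> None:
    """Append one chat message; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO messages (user_id, video_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (user_id, video_id, role, content),
        )


def get_history(
    conn: sqlite3.Connection, user_id: str, video_id: str
) -> List[Tuple[str, str]]:
    """Return (role, content) pairs for one user and video, oldest first."""
    cursor = conn.execute(
        "SELECT role, content FROM messages "
        "WHERE user_id = ? AND video_id = ? ORDER BY id",
        (user_id, video_id),
    )
    return cursor.fetchall()
```

Keying each message by both user and video lets a user reopen a specific video and pick up the earlier conversation without mixing in chats about other videos.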
Built With
- chakra-ui
- fastapi
- gunicorn
- javascript
- langchain
- metaphor
- openai
- python
- react
- render
- whisper-jax