🦑 What the Sigma?
Chat, Clip That! 🎬 is a wearable device that captures live video and audio, then automatically generates and edits video clips in real time with our unique multi-layer, multi-agent architecture. A livestream of the feed is made available on our platform for anyone to tune in, and saying the infamous meme phrase "Chat, Clip That" yields a fully and programmatically edited video clip of the past thirty seconds, powered by semantic analysis and a breadth of meme editing assets and techniques, and publishes it to the website for viewing and downloading.
🗿 How the Skibidi Does It Work?
Chat, Clip That! 🎬 's pipeline begins by bridging our camera-and-microphone-equipped Raspberry Pi 🍓 to our processing server with a custom-built TCP tunnel that carries two separate streams, one for audio and one for video. The audio is then run through a pre-trained speech-to-text model to listen for our trigger phrase, and we do arithmetic on the audio chunks to determine their placement on the timeline, producing the audio-video timestamps that are integral to the project.
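The "arithmetic with the audio chunks" boils down to converting raw PCM byte counts into seconds. Here's a minimal sketch of that timestamp math; the sample rate, sample width, and function names are our own assumptions for illustration, not the actual project code:

```python
# Hypothetical constants -- the real capture settings may differ.
SAMPLE_RATE = 16_000   # samples per second from the Pi's mic (assumed)
BYTES_PER_SAMPLE = 2   # 16-bit PCM (assumed)
CHANNELS = 1           # mono (assumed)

def chunk_duration(chunk: bytes) -> float:
    """Seconds of audio represented by one raw PCM chunk."""
    return len(chunk) / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)

def phrase_timestamp(chunks_before_phrase: list[bytes]) -> float:
    """Stream-relative time at which the trigger phrase landed,
    found by summing the durations of every chunk received before it."""
    return sum(chunk_duration(c) for c in chunks_before_phrase)
```

Because both streams share the tunnel's clock, a timestamp derived this way from the audio side can be mapped straight onto the buffered video frames.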
These timestamps are then utilized by our 'Brainrot Bot', a prompt-tuned LLM designed to perform semantic analysis on the contents of the video clip and decide both where we should cut the video and how we should edit it to fit the context and maximize humor. We parse Brainrot Bot's output to programmatically edit the video clip to its desires, and finalize the clip.
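Parsing an LLM's edit decisions into something a video editor can execute might look like the sketch below. The JSON schema, field names, and edit types here are hypothetical stand-ins, not Brainrot Bot's actual output format:

```python
import json

# Hypothetical example of what a Brainrot-Bot-style edit plan could look like.
RAW = '''
{"start": 4.2, "end": 11.8,
 "edits": [{"at": 6.0, "type": "zoom"},
           {"at": 9.5, "type": "vine_boom"}]}
'''

BUFFER_SECONDS = 30.0  # the clip only ever covers the past thirty seconds

def parse_edit_plan(raw: str) -> dict:
    """Turn the model's JSON into a sanitized, executable edit plan."""
    plan = json.loads(raw)
    # Clamp the cut points to the buffer, and drop any edit the model
    # hallucinated outside the clip bounds.
    plan["start"] = max(0.0, plan["start"])
    plan["end"] = min(BUFFER_SECONDS, plan["end"])
    plan["edits"] = [e for e in plan["edits"]
                     if plan["start"] <= e["at"] <= plan["end"]]
    return plan
```

Sanitizing the plan before handing it to the editor matters because prompt-tuned models occasionally emit timestamps outside the clip.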
Finally, a second TCP tunnel and an HTTP endpoint living on our second server (yes, this project contains TWO servers) power our livestream website, where a public endpoint allows anyone to tune in, view, and download any of the clips that were saved and edited by Chat, Clip That!.
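A public clip endpoint like the one described can be sketched with nothing but the standard library. The route, port, and file names below are assumptions for illustration, not the project's real API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the saved clips on disk (hypothetical names).
CLIPS = ["clip_001.mp4", "clip_002.mp4"]

class ClipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /clips returns a JSON listing of every saved, edited clip.
        if self.path == "/clips":
            body = json.dumps(CLIPS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

def run(port: int = 8080) -> None:
    """Serve the public clip listing until interrupted."""
    HTTPServer(("0.0.0.0", port), ClipHandler).serve_forever()
```

In the real system this sits next to the second TCP tunnel, which handles the live feed, while HTTP handles the finished clips.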
🥊 Damn... 'Chat, Clip That' Got Hands
We were especially challenged by the software stack of this project -- creating a split TCP tunnel and taking the data through transcription, semantic analysis, and editing, all only after packet assembly and audio-video syncing were handled through network optimization. While this project's user-facing ethos is whimsical and silly, we all agree this was the most technically difficult project we've submitted at a hackathon.
🏀 Smiling Through It All! Can't Believe This My Life
Here's what we're especially proud of:
- Successfully integrated the microphone, camera, and lights on our wearable hat
- Split-TCP tunnels for audio-video pre-editing, and secondary server for livestreaming and hosting
- Programmatic editing of buffered video clips and low-latency turnaround
- Real-time transcription of the audio stream, rather than periodic scanning, to keep latency down
- Front-end livestreaming website and clip hosting, with lovely UX/UI
- Duck hardware theming (always a W)
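The "buffered video clips" bullet above hinges on only ever holding the most recent thirty seconds of frames. A rolling deque makes that nearly free; the FPS value and class name here are our own assumptions, not the project's actual buffer code:

```python
from collections import deque

FPS = 30            # assumed frame rate of the Pi camera
BUFFER_SECONDS = 30 # "Chat, Clip That" grabs the past thirty seconds

class ClipBuffer:
    """Rolling buffer that keeps only the most recent 30 s of frames."""

    def __init__(self):
        # deque with maxlen silently evicts the oldest frame on overflow.
        self.frames = deque(maxlen=FPS * BUFFER_SECONDS)

    def push(self, frame):
        self.frames.append(frame)

    def snapshot(self):
        # Freeze the last 30 s the instant the trigger phrase lands,
        # so editing can proceed while the live buffer keeps filling.
        return list(self.frames)
```

Snapshotting into a plain list is what lets the editing pipeline run on a stable clip while the livestream keeps writing into the deque.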
🍑 What Do We Gyatt Planned?
We have an exciting vision for enhancing the range of options our Brainrot Bot has to edit the videos through semantic analysis -- namely through object detection or facial recognition models. We think the more specific information we're able to feed to our Brainrot Bot means it can be more granular with the kinds of edits it wishes to make, and ultimately improve the humor of the system.