Inspiration
A lot of us students feel lazy, especially when we have to study course content. There's no music, no graphics, no games, and no celebrities. Our project aims to fix that while keeping us hooked!
Scientific Inspiration
Our attention spans are shrinking (source: studies by USC's Dr. Albright). As time goes on, it becomes harder and harder to focus on dry content. Within minutes you'll be wondering: when will something happen? How do platforms like TikTok and Instagram prey on this shifting mindset?
"In psychological terms [it's] called random reinforcement," Dr. Albright says. "It means sometimes you win, sometimes you lose. And that's how these platforms are designed... they're exactly like a slot machine. Well, the one thing we know is slot machines are addictive. But we don't often talk about how our devices and these platforms and these apps do have these same addictive qualities baked into them."
What it does
With Pulse, we take advantage of the ever-changing content that keeps you hooked on short-form video on social media. We combine coursework with the addictiveness of TikTok to produce short clips summarizing the most important content of a course. More concretely, when you upload a lecture recording or slides, Pulse uses AI to generate a short series of TikToks covering the lecture's key points. The videos produced come in many forms! You could be watching satisfying videos while LeBron James tweets about "thermodynamics," or you could be watching Elon Musk lecture you himself! With Pulse, your favorite celebrities teach you only what you need to know.
How we built it
On our web app, users can upload course lectures or slides (mp3 and pdf formats) and view summarized notes of the coursework. The web app is backed by a Flask server and uses a combination of MongoDB and Firebase to store data. We use OpenAI's Whisper speech recognition model, hosted on a GPU, to transcribe lecture audio, and the strength of large language models to extract the most important content. For video and image processing we combine several Python libraries (PIL, OpenCV, and MoviePy/FFmpeg), and we use the TikTok API for text-to-speech.
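At a high level, the upload-to-clips flow chains four stages. Here is a minimal sketch of that chain; every function body is a placeholder stub of ours (the real app backs `transcribe()` with Whisper on a GPU, `summarize()` with an LLM, and `make_clips()` with the PIL/OpenCV/MoviePy rendering code):

```python
"""Pipeline sketch: upload -> transcribe -> summarize -> render clips.
All function bodies are illustrative stubs, not the production code."""
from dataclasses import dataclass


@dataclass
class Upload:
    filename: str
    data: bytes  # raw mp3/pdf payload received by the Flask endpoint


def transcribe(upload: Upload) -> str:
    # Stub: production runs OpenAI Whisper on a GPU here.
    return "Entropy increases. Work is path dependent."


def summarize(transcript: str) -> list[str]:
    # Stub: production asks an LLM for the most important points;
    # here we just split the transcript into sentences.
    return [s.strip() for s in transcript.split(".") if s.strip()]


def make_clips(points: list[str]) -> list[str]:
    # Stub: production renders one short video per point with
    # PIL / OpenCV / MoviePy and adds TikTok-API text-to-speech.
    return [f"clip:{p}" for p in points]


def run_pipeline(upload: Upload) -> list[str]:
    # One clip series per uploaded lecture.
    return make_clips(summarize(transcribe(upload)))
```

The point of the sketch is the shape of the data flow: each stage is a pure function over the previous stage's output, which is what let us swap implementations (e.g., slides vs. audio input) without touching the rest of the chain.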
To generate deepfakes of a few popular celebrities, we run a modified Wav2Lip implementation on GPU-configured containers on Modal Labs, which lets us lip-sync with high accuracy and quality. Additionally, to clone those celebrities' voices, we use few-shot prompting via ElevenLabs' API for text-to-speech.
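For the voice side, each caption becomes one text-to-speech request. A stdlib-only sketch of building such a request is below; the endpoint path and payload fields follow ElevenLabs' public text-to-speech API, but the model choice and `voice_settings` values are placeholders of ours, and you would need a real voice ID and API key to actually send it:

```python
"""Builds (but does not send) an ElevenLabs text-to-speech request.
Model and voice_settings values are placeholders."""
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # placeholder model choice
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",  # voice_id selects the cloned voice
        data=json.dumps(payload).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(req) would then return synthesized audio bytes,
# which we feed into Wav2Lip as the driving audio track.
```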
Challenges we ran into
We ran into a significant number of errors and challenges throughout the hackathon. Everything from lack of sleep to synchronizing the creation of multiple videos at once served as a hurdle over the last 36 hours.

One error we dealt with was related to our databases. We started with Firebase Firestore to store media files, handle authentication, and serve as our database, but as we expanded our scope and began developing our mobile extension with React Native, we found significant friction between Expo Go and our production environment. Determined to meet all of our goals for the project, we moved to a two-layer database: we kept Firebase for user authentication and media storage, but introduced MongoDB as our core database to store user content. This solution ended up efficient, sleek, and secure enough for us to continue development and move past an error that had stumped us for hours.

Using PIL and MoviePy (FFmpeg) for video/image processing in Python proved quite problematic as well, primarily due to deprecated library usage and slow processing runtimes. We sped this up by switching to frame-by-frame traversal using OpenCV (while keeping some operations in PIL/MoviePy).
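The speedup came from streaming frames one at a time instead of materializing whole clips in memory. The sketch below shows that pattern with stdlib stand-ins: `frame_stream` plays the role of `cv2.VideoCapture`, and the per-frame operation stands in for our caption/overlay code; all names here are illustrative:

```python
"""Frame-by-frame traversal sketch (stdlib only). In the real pipeline
cv2.VideoCapture yields frames and cv2.VideoWriter consumes them."""
from typing import Callable, Iterator

Frame = bytearray  # one grayscale frame, width * height pixel values


def frame_stream(n_frames: int, width: int, height: int) -> Iterator[Frame]:
    # Stand-in for cv2.VideoCapture: yields one frame at a time, so we
    # never hold the whole clip in memory (the key win over editing the
    # entire clip at once in MoviePy).
    for i in range(n_frames):
        yield Frame([i % 256] * (width * height))


def brighten(frame: Frame, delta: int) -> Frame:
    # Example per-frame operation; ours drew captions and overlays.
    return Frame(min(255, p + delta) for p in frame)


def process(n_frames: int, width: int, height: int,
            op: Callable[[Frame], Frame]) -> list[Frame]:
    # Traverse and transform frame by frame; a real writer would append
    # each result to the output video instead of collecting a list.
    return [op(f) for f in frame_stream(n_frames, width, height)]
```

Because each frame is processed and released independently, memory stays flat with clip length, and the per-frame operation is easy to swap or parallelize.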
Accomplishments that we're proud of
Going into the hackathon we were unsure of how much we could accomplish within 36 hours. But after two nights of straight work and a hefty amount of caffeine, we learned how to work with Modal and React Native, built out a consistent video generation pipeline, let users process both lecture slides and videos, and extended our project with a mobile app. We're extremely proud that all of it ended up working!
What we learned
This was one of the largest projects any of us has built in such a short span. From working with cloud cluster services like Modal, to extending our core application with a React Native app, to solidifying AI video captioning with Whisper, we got the chance to expand our knowledge of web development and machine learning.
What's next for Pulse
Pulse has significant potential.
Built With
- gpt
- modal
- next.js
- openai
- python
- pytorch
- react
- react-native