Inspiration
We live in a digital world fueled and filled with more content than ever. After conducting extensive market research with numerous Twitch and YouTube creators, we stumbled upon a rather niche issue: content creators face tremendous difficulty editing lengthy videos because their tools are often designed for experts. As a result, many waste a lot of time learning these tools or resort to outsourcing their work. On top of that, the user experience of much editing software feels archaic, overwhelming the user with a disproportionate number of tools.
What it does
In comes Chopsticks, the premier AI-powered editing software that uses deep learning to improve efficiency, enhance user experience, and (amazingly) increase creator profits. Our platform is a two-part system: a chat-powered interface where we take text queries from the user and perform the corresponding video manipulation, and an analysis engine where we find the most entertaining and important parts of a video (based on metrics we developed) and present the user with many different clips of short-form content. Chopsticks is a first-of-its-kind software coming to market, and here is how it works:
How we built it
- Retrieves voice transcription using Whisper, chat logs using OCR/Web Scraping, and creator expressions using OpenCV.
- Uses a fine-tuned RoBERTa model to analyze viewer engagement with the creator through the chat logs, along with their sentiment.
- Uses a fine-tuned T5 model to run text-to-text analysis on the stream transcript and gauge the streamer's key moments.
- Uses a DeepFace model to read the streamer's reactions and weighs this metric together with the models above to produce valuable insights into the key moments of a stream. We normalize these metrics and generate "spikes" at certain time intervals, representing high levels of engagement between the streamer and the viewers.
- This data is fed into our LLM-based video clipping tool, to autogenerate or chop clips into short-form content.
- Using Reflex, we created a simple user interface that allows users of any level to be able to edit their videos seamlessly.
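As a rough sketch of the scoring step described above, the per-interval metrics are normalized, combined with custom weights (chat weighted highest, as noted below), and turned into "spikes" where engagement stands out. The exact weights, threshold, and function names here are illustrative, not our production values:

```python
def engagement_spikes(chat_scores, transcript_scores, face_scores,
                      weights=(0.5, 0.3, 0.2), threshold=1.0):
    """Normalize each per-interval metric, take a weighted sum, and flag
    intervals whose combined score rises more than `threshold` standard
    deviations above the stream's average engagement."""
    def normalize(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    metrics = [normalize(m) for m in (chat_scores, transcript_scores, face_scores)]
    combined = [sum(w * m[i] for w, m in zip(weights, metrics))
                for i in range(len(chat_scores))]

    mean = sum(combined) / len(combined)
    std = (sum((c - mean) ** 2 for c in combined) / len(combined)) ** 0.5 or 1.0
    # "Spikes" = time intervals well above the stream's average engagement.
    return [i for i, c in enumerate(combined) if (c - mean) / std > threshold]
```

For example, a chat burst, transcript hit, and strong facial reaction landing in the same interval would mark that interval as a spike.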
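Downstream, the clipping tool turns those spike intervals into clip boundaries. A hedged sketch of how adjacent spikes might be merged into padded short-form clips (interval length, padding, and gap tolerance are assumptions for illustration):

```python
def spikes_to_clips(spike_indices, interval_sec=10, pad_sec=5, max_gap=1):
    """Merge runs of adjacent spike intervals into (start, end) clip
    boundaries in seconds, padding each clip on both sides so the
    clip doesn't start or end mid-moment."""
    runs, run_start, prev = [], None, None
    for i in spike_indices:
        if run_start is None:
            run_start = i
        elif i - prev > max_gap:          # gap too large: close the run
            runs.append((run_start, prev))
            run_start = i
        prev = i
    if run_start is not None:
        runs.append((run_start, prev))
    # Convert interval indices to padded timestamps in seconds.
    return [(max(0, s * interval_sec - pad_sec), (e + 1) * interval_sec + pad_sec)
            for s, e in runs]
```

With 10-second intervals, spikes at indices 2, 3, and 7 would yield two clips: one spanning roughly 15–45 s and one spanning 65–85 s.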
Challenges we ran into
- One of our most complex challenges was determining whether something was "entertaining" or not. When dealing with human emotions, classifying data in a meaningful way becomes less binary and harder to quantify. To overcome some of this friction, we spent a lot of time identifying the relevant factors that contribute to this metric. We decided to give custom weights to certain inputs (chats being the highest, since we have more consistent data to rely on), leading to an overall better model.
- Another big technical problem was the memory and time needed to classify our inputs. On our first run with a 30-minute video, gathering transcription data, chat logs, and facial-emotion data took well over an hour combined. Thinking about the consumer, we realized this wouldn't be sustainable, so we cleaned up our algorithms, discarding certain data to significantly reduce overhead. By the end, we could classify the same video in under 20 minutes by running our scripts concurrently and using better hardware.
- Coming into TreeHacks, we were initially on track to build a project that analyzes research papers for beginner researchers. When we talked to a mentor here (shoutout to Luke), he asked us the hard but important questions. Discussing who our consumers would be and the real use case of our product, we realized that research was not a track we wanted to pursue. Four hours into hacking, we went back to the drawing board and chose a project a different way.
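The speed-up described above came largely from running the extraction passes at the same time instead of one after another. A minimal sketch of that pattern using Python's standard `concurrent.futures` (the extractor functions here are stand-ins for our actual Whisper, OCR, and DeepFace scripts):

```python
from concurrent.futures import ThreadPoolExecutor

def gather_metrics(video_path, extractors):
    """Run each extraction pass (transcription, chat OCR, face analysis)
    in its own thread so total wall-clock time is bounded by the slowest
    pass rather than the sum of all three."""
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = {name: pool.submit(fn, video_path)
                   for name, fn in extractors.items()}
        return {name: f.result() for name, f in futures.items()}

# Stand-in extractors for illustration; the real ones invoke the models.
extractors = {
    "transcript": lambda path: f"transcript of {path}",
    "chat": lambda path: f"chat log of {path}",
    "faces": lambda path: f"emotions in {path}",
}
```

Because each pass is dominated by I/O and external model calls, even thread-based concurrency recovers most of the lost time; CPU-bound passes could be moved to a `ProcessPoolExecutor` with the same interface.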
Accomplishments that we're proud of
- Although coming up with an idea on the spot was difficult at first, we approached it by conducting heavy market research across many different fields, which led us down the content-creation path. Getting over this hump in our journey was not only a breath of relief but also gave us newfound motivation to put all our effort into a single goal we all believed had potential.
- On top of this, we have all grown tremendously as a team in the technical space. Being introduced to new sponsor technologies like Reflex, we were able to create a compelling web app using only Python.
- Lastly, our proudest moment was when our first output was generated. We had selected a random PewDiePie Minecraft stream, and when we saw the quality of the short-form videos it produced, we knew that all the work we had put in was not in vain and that our project had a future.
What we learned
We came in with a diverse range of skill sets, but one thing we failed to grasp on our first night was how to split work efficiently. After our first team meeting the next day, we divided the work better, letting each member produce quality work in the areas where they are strongest. This reduced our workload (still two all-nighters) and allowed us to get significantly more done.
What's next for Chopsticks?
As a potentially (very) successful startup, our goal for Chopsticks is to push directly into the market. One big constraint during TreeHacks was simply time: our models were efficient but sometimes didn't classify our inputs perfectly. With time to fine-tune our custom models, generate better metrics for clips, and reduce overhead, we can scale quickly and efficiently beat everyone to market. We hope to launch initially as open-source software to gain traction in the industry, then transition to a subscription-based model, which will pay for the new hardware required to run our algorithm as fast as possible.
In terms of pure concept, our company has the potential to do good in our community. Not only is our product significantly cheaper than our direct competitors, but our software also has limitless applications for social good, especially in education. By quickly extracting the important bits of lectures into viewable content, students with short attention spans could easily learn the material without being bored to death.
We hope to secure funding for this idea so we can keep spending time on a project we are all so passionate about.
Built With
- deepface
- huggingface
- openai
- python
- reflex
- t5
- whisper