Inspiration

We were inspired by YouTube comment sections, which are informative, funny, and sometimes chaotic, and where everyone on the internet interacts with pop culture. We thought: what if we could teach a model to talk like the internet's YouTube comments?

What it does

Our Streamlit demo generates YouTube-style comments based on user-selected video categories (like Gaming or Music) and an optional seed prompt. It captures the tone, randomness, and flavor of real comment sections using various models.

How we built it

We started by merging two datasets of ~4000 YouTube videos with 400,000+ real user comments. We filtered for English-only content and cleaned the text while keeping fun elements like emojis. Using this data, we trained a basic trigram model that predicts the next word from the two preceding words. Then we moved on to more advanced techniques such as TF-IDF. We wrapped it all in a Streamlit interface to make it interactive and fun!
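To illustrate the trigram idea, here is a minimal sketch in Python. It is not our exact project code: the function names `train_trigram` and `generate`, the whitespace tokenization, and the fallback for unseen seeds are all assumptions made for this example.

```python
import random
from collections import defaultdict

def train_trigram(comments):
    """Map each pair of consecutive words to the words seen right after it."""
    model = defaultdict(list)
    for comment in comments:
        words = comment.split()  # assumption: simple whitespace tokenization
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)].append(w3)
    return model

def generate(model, seed, max_words=20, rng=None):
    """Extend the seed one word at a time using the last two words as context."""
    rng = rng or random.Random()
    words = seed.split()
    if not model:
        return seed
    # Assumed fallback: if the seed gives no usable context, append a random known pair.
    if len(words) < 2 or tuple(words[-2:]) not in model:
        words = words + list(rng.choice(sorted(model)))
    while len(words) < max_words:
        followers = model.get((words[-2], words[-1]))
        if not followers:
            break  # no observed continuation for this pair
        # Duplicates in the list make frequent continuations more likely.
        words.append(rng.choice(followers))
    return " ".join(words)
```

Calling `generate(train_trigram(comments), "this song")` would then continue the seed with words that followed "this song" in the training comments. Storing followers as a list with repeats keeps the sampling proportional to observed frequency without any explicit probability table.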

Challenges we ran into

  1. Cleaning chaotic YouTube comment data (random links, weird symbols, mentions, etc.) without stripping out too much personality.
  2. Ensuring category alignment between videos and their comments.
  3. Balancing randomness in generation while maintaining coherence.
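For the first challenge, a cleaning step along these lines is one way to drop links and mentions while leaving emojis alone. This is a rough sketch, not our actual pipeline: the regular expressions and the `clean_comment` name are assumptions, and real comments would likely need more rules.

```python
import re

# Assumed patterns for this sketch; real data may need broader rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")

def clean_comment(text):
    """Remove links and @mentions, collapse whitespace, keep emojis and punctuation."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()
```

Because the function only removes what it explicitly matches, emojis, slang, and odd punctuation pass through untouched. One known rough edge: the mention pattern would also clip part of an email address.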

Accomplishments that we're proud of

  1. Built a working end-to-end text generator in under 48 hours (and put it on Streamlit for demo and interaction)!
  2. Learned more about language processing models; preserved the chaotic nature of YouTube comments while staying within categories.
  3. Learned to turn messy, real-world data into something functional.

What we learned

We learned how to implement TF-IDF and n-gram models, and how to build an interactive demo with Streamlit.
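As a refresher on what TF-IDF computes, here is a small pure-Python sketch. It uses the plain textbook formula (term frequency times log of inverse document frequency) and a hypothetical `tf_idf` helper; it is not the code from our project, and libraries such as scikit-learn apply smoothing by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores
```

Words that appear in every document score zero, while words concentrated in a few documents score higher, which is what makes the measure useful for spotting vocabulary typical of a given set of comments.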

What's next for Youtube Comment Generator

Better training to make the generated comments more interesting and more human-like, of course!
