Inspiration
We were inspired by the informative, funny, and sometimes chaotic world of YouTube comments, a place where everyone on the internet reacts to pop culture together. We thought: what if we could teach a model to talk like a YouTube comment section?
What it does
Our Streamlit demo generates YouTube-style comments based on a user-selected video category (such as Gaming or Music) and an optional seed prompt. Using several models, it captures the tone, randomness, and flavor of real comment sections.
How we built it
We started by merging two datasets covering ~4,000 YouTube videos and 400,000+ real user comments. We filtered for English-only content and cleaned the text while keeping fun elements like emojis. Using this data, we trained a basic trigram model that predicts the next word from the two preceding words. Then we experimented with more advanced techniques such as TF-IDF. Finally, we wrapped everything in a Streamlit interface to make it interactive and fun!
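The trigram step can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code. The function names, the `<s>`/`</s>` padding tokens, the `temperature` knob, and the toy per-category comments are all hypothetical. Training one model per category is one simple way to keep generations on-topic, and the seed prompt just supplies the first words of context.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical padding tokens marking comment boundaries


def train_trigrams(comments):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for text in comments:
        tokens = [START, START] + text.split() + [END]
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            counts[(w1, w2)][w3] += 1
    return counts


def generate(counts, seed="", max_words=20, temperature=1.0, rng=None):
    """Extend `seed` word by word, sampling the next word from the trigram counts."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    words = seed.split()
    context = ([START, START] + words)[-2:]
    for _ in range(max_words):
        followers = counts.get(tuple(context))
        if not followers:  # unseen context: stop rather than guess
            break
        candidates = list(followers)
        # Raise counts to 1/temperature: <1 sharpens toward frequent words, >1 flattens.
        weights = [followers[w] ** (1.0 / temperature) for w in candidates]
        next_word = rng.choices(candidates, weights=weights, k=1)[0]
        if next_word == END:
            break
        words.append(next_word)
        context = [context[1], next_word]
    return " ".join(words)


# Toy, made-up training data: one model per category.
by_category = {
    "Gaming": ["this boss fight was insane", "this boss fight made me rage quit"],
    "Music": ["this song hits different at 3am", "this song never gets old"],
}
models = {cat: train_trigrams(texts) for cat, texts in by_category.items()}
print(generate(models["Music"], seed="this song", rng=random.Random(0)))
```

Raising `temperature` above 1 flattens the distribution and makes the output more random. Lowering it toward 0 favors the most common continuations. This is one way to trade randomness against coherence.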
Challenges we ran into
- Cleaning chaotic YouTube comment data (random links, weird symbols, mentions, etc.) without stripping out too much personality.
- Ensuring category alignment between videos and their comments
- Balancing randomness in generation while maintaining coherence
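The following sketch shows the general idea behind handling the cleaning and category-alignment challenges. It makes some hypothetical assumptions. Video metadata is reduced to a `video_id -> category` mapping, and comments are reduced to `(video_id, text)` pairs. Comment text may contain HTML entities such as `&#39;`. The code removes links and @-mentions but leaves other characters alone, including emojis. The names `clean_comment` and `group_by_category` are illustrative, not from the project.

```python
import html
import re
from collections import defaultdict

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Handles like @some.user-name, without swallowing a trailing period.
MENTION_RE = re.compile(r"@[\w.-]*\w")
# Control characters, zero-width space, and BOM; U+200D (zero-width joiner) is kept.
INVISIBLE_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f\u200b\ufeff]")


def clean_comment(raw):
    """Strip links, mentions, and invisible debris while keeping emojis and slang."""
    text = html.unescape(raw)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = INVISIBLE_RE.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def group_by_category(video_categories, comments):
    """Attach each comment to its video's category; drop comments with no known video."""
    by_category = defaultdict(list)
    for video_id, raw in comments:
        category = video_categories.get(video_id)
        if category is None:  # comment whose video is missing from the metadata
            continue
        text = clean_comment(raw)
        if text:  # skip comments that were only links or mentions
            by_category[category].append(text)
    return dict(by_category)
```

The code deliberately leaves the zero-width joiner (U+200D) intact. Multi-part emojis are built from it, so stripping every "invisible" character would break them.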
Accomplishments that we're proud of
- Built a working end-to-end text generator in under 48 hours (and wrapped it in a Streamlit app for demos and interaction)!
- Learned more about natural language processing models, and preserved the chaotic nature of YouTube comments while keeping generations within their categories.
- Learned to turn messy, real-world data into something functional.
What we learned
We learned how to implement TF-IDF and n-gram models, and how to build an interactive demo with Streamlit.
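The following is a refresher on TF-IDF. This minimal sketch treats all comments from one category as a single document. As a result, words that appear in every category get zero weight, and category-specific slang rises to the top. It illustrates only the standard tf × idf formula, with tf as a word's share of the document and idf as log(N / df). The function names and toy data are hypothetical. The sketch does not necessarily reflect how the demo applies TF-IDF.

```python
import math
from collections import Counter


def tfidf_by_category(docs):
    """docs maps category -> list of cleaned comments; returns category -> {word: weight}."""
    term_counts = {
        cat: Counter(word for comment in comments for word in comment.split())
        for cat, comments in docs.items()
    }
    n_docs = len(term_counts)
    # Document frequency: in how many categories does each word appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    weights = {}
    for cat, counts in term_counts.items():
        total = sum(counts.values())
        weights[cat] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


def top_terms(word_weights, k=3):
    """Highest-weighted words first; ties broken alphabetically."""
    ranked = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


docs = {
    "Gaming": ["gg boss fight", "this boss fight"],
    "Music": ["this song slaps", "this beat slaps"],
}
weights = tfidf_by_category(docs)
print(top_terms(weights["Gaming"], k=2))  # ['boss', 'fight']
print(weights["Gaming"]["this"])  # 0.0, since "this" appears in both categories
```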
What's next for Youtube Comment Generator
Better training, to make the generated comments more interesting and more human-like, of course!
Built With
- python
- streamlit