Inspiration
For a while I've dreamed about an idea called Databate, where any topic could be broken down into pros and cons with automated fact-checking of user responses. This is NOT that. But the thought stayed with me, and I wanted to do something with fact-checking at Gigacontext scale, something I could only do with Modal's ability to parallelize at scale.
Polarization is everywhere, and more and more sources (cough cough, Twitter) are trying to fact-check information with varying degrees of success. Unfortunately, even tweet-level fact checks are too far removed from the context of the conversation.
What if you could have a conversation with a friend and fact-check it in real time, ideally in a Zoom call or video interface, where you could see that you were wrong, correct yourself immediately, and not waste time operating on wrong information? That's what this is, or at least a very basic MVP.
What it does
You have or record a conversation. The conversation is automatically transcribed faster than you thought possible and fact-checked in real time. Wrong statements show the correct information; vague or nuanced statements are labeled as such.
How I built it
- I started with the frontend, using React to build a UI and the data shape I wanted the API to send.
- Then I looked for the lowest-latency STT providers and decided on Cartesia Ink, the newest STT model. It's been truly accurate and fast in my testing.
- I then needed a corpus of "factual data" to validate against. I used Modal to ingest all of Wikipedia in 15 minutes, thanks to the massive parallel processing it provides.
- I used qwen-3-embedding (also running on Modal) to embed chunks into turbopuffer as the primary vector store.
- I used Cerebras running qwen-3-235b at ~1200 tokens/s to make responses feel instantaneous.
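The ingestion step fans Wikipedia out across Modal workers; each worker splits articles into overlapping chunks before embedding. A minimal sketch of that per-article chunking (the window size and overlap here are illustrative, not the exact values used):

```python
def chunk_article(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split an article into overlapping word windows ready for embedding.

    Illustrative parameters: a 200-word window with a 40-word overlap so
    facts straddling a chunk boundary still land intact in some chunk.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        if window:
            chunks.append(" ".join(window))
        if start + max_words >= len(words):
            break  # last window already covered the tail of the article
    return chunks
```

Each chunk then gets embedded with qwen-3-embedding and upserted into turbopuffer, with Modal mapping the function over dump shards in parallel.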
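The fact-check call itself just pairs a claimed statement with the top turbopuffer hits and asks the LLM for a verdict. A sketch of the prompt assembly, where the label set and wording are my illustrative assumptions rather than the exact prompt:

```python
VERDICTS = ("correct", "wrong", "nuanced")  # illustrative label set

def build_factcheck_prompt(claim: str, passages: list[str]) -> str:
    """Assemble the LLM prompt: retrieved passages first, then the claim."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "You are a real-time fact-checker. Using only the passages below, "
        f"label the claim as one of: {', '.join(VERDICTS)}. "
        "If it is wrong, give the correct information in one sentence.\n\n"
        f"Passages:\n{context}\n\nClaim: {claim}"
    )
```

Sending this to qwen-3-235b on Cerebras at ~1200 tokens/s is what makes the verdict feel instantaneous in the UI.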
Challenges we ran into
- Every single tool I used was new or rusty to me. I hadn't touched Modal, turbopuffer, Hugging Face, Cognition, or even Python in a good few months. I also had very limited time, only being able to start hacking at 8 PM yesterday.
- Getting Modal to work took longer than I expected: because all these libraries update frequently, a lot of the code I generated with LLMs was outdated and wrong.
Accomplishments that I'm proud of
- Just being able to get as far as I did in the 8 hours I was able to work on this.
What I learned
- DeepWiki is incredible, and I'll be using it frequently on my company codebase.
- Modal's parallelism turns things I would have assumed take a few days into things that take a few minutes. Pre-hackathon, I never would have guessed I'd get this far.
What's next for You're wrong
- Allowing user-submitted media for fact-checking.
- Refreshing the corpus with updates from news, more of Wikipedia, company knowledge bases, YouTube videos, Twitter, stock information, etc.
Built With
- cartesia
- cerebras
- cognition
- modal
- react