Inspiration
Short videos spread fast, but so do bad claims. Friends and family send clips every single day, with confident statements but no sources. We wanted a “pause → check → show sources” system that works in seconds, not hours.
What it does
A user uploads a video; ReelGuardian transcribes it, extracts 3–6 fact-checkable claims, and returns a verdict for each: supported, refuted, or needs context. Every verdict includes a brief rationale, a confidence score, and up to 3 citations, biased toward primary sources (.gov and .edu domains, for example). ReelGuardian also builds a visual summary from sampled frames, so it fact-checks what was on screen, not just what was said.
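A single verdict record might look something like the sketch below (the field names are illustrative, not the project's exact schema):

```python
# Hypothetical per-claim verdict record; field names are illustrative.
verdict = {
    "claim": "The city banned gas stoves in all new buildings in 2023.",
    "verdict": "needs context",  # one of: supported, refuted, needs context
    "confidence": 0.72,          # 0..1
    "rationale": "A ban passed, but it covers only new low-rise construction.",
    "citations": [               # up to 3, biased toward primary sources
        "https://www.example.gov/press-release",
        "https://www.example.edu/policy-brief",
    ],
}
```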
How we built it
The pipeline runs on Modal: Whisper handles transcription, LLMs via OpenRouter extract and verify claims, Tavily powers the source search, and a Next.js frontend ties it together.
Challenges we ran into
The problems we hit, and how we solved them:
- Cold starts & CUDA problems: fixed with a cron GPU warmer.
- LLM JSON chaos: added a tolerant parser and a strict schema to keep the pipeline flowing.
- Search noise: biased to primary sources, URL dedupe.
- Timestamp alignment: mapping loose claims back to exact ASR segments for context.
- API rate/latency spikes: model rotation via OpenRouter, plus a CPU fallback, to avoid user-facing failures.
- Lack of visual information: passing sampled frames as supplementary context.
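The tolerant-parser idea above can be sketched like this (a minimal version; the project's actual parser and schema validation are more involved):

```python
import json
import re

def parse_llm_json(text: str) -> dict:
    """Extract the first JSON object from LLM output that may be wrapped
    in markdown fences or surrounded by chatty prose."""
    # Strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Fall back to the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    raw = text[start : end + 1]
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Common LLM slip: trailing commas before } or ].
        return json.loads(re.sub(r",\s*([}\]])", r"\1", raw))
```

A strict schema check then runs on the parsed dict, so malformed fields fail loudly instead of propagating downstream.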
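The search-noise fix (primary-source bias plus URL dedupe) can be approximated as follows; the suffix list and normalization rules here are assumptions, not the exact ones we shipped:

```python
from urllib.parse import urlparse

# Hypothetical list of suffixes treated as "primary" sources.
PRIMARY_SUFFIXES = (".gov", ".edu")

def normalize(url: str) -> str:
    """Canonical key for dedupe: host (minus www.) plus path."""
    p = urlparse(url)
    return p.netloc.lower().removeprefix("www.") + p.path.rstrip("/")

def rank_sources(urls: list[str]) -> list[str]:
    """Dedupe URLs, then sort primary sources (.gov/.edu) first."""
    seen, unique = set(), []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)

    def score(url: str) -> int:
        host = urlparse(url).netloc.lower()
        return 0 if host.endswith(PRIMARY_SUFFIXES) else 1

    return sorted(unique, key=score)  # stable sort keeps search order within tiers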
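Timestamp alignment can be done crudely with word overlap against Whisper-style segments, as in this sketch (the real pipeline could just as well use embeddings; this is the simplest version):

```python
def best_segment(claim: str, segments: list[dict]) -> dict:
    """Map a paraphrased claim back to the ASR segment it most likely
    came from, by bag-of-words overlap. Each segment is
    {"start": float, "end": float, "text": str}, the shape Whisper emits.
    """
    claim_words = set(claim.lower().split())

    def overlap(seg: dict) -> int:
        return len(claim_words & set(seg["text"].lower().split()))

    return max(segments, key=overlap)
```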
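Model rotation with retries looks roughly like this (model IDs and retry counts are placeholders; `call_model` stands in for the real OpenRouter client):

```python
import time

# Hypothetical model list; in practice these would be OpenRouter model IDs.
MODELS = ["primary/model-a", "backup/model-b", "backup/model-c"]

def call_with_rotation(prompt, call_model, models=MODELS, retries_per_model=2):
    """Try each model in order, retrying transient failures, so a
    rate-limit or latency spike on one provider never surfaces as a
    user-facing error."""
    last_err = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return call_model(model, prompt)
            except Exception as err:  # rate limit, timeout, 5xx, ...
                last_err = err
                time.sleep(0.1 * 2 ** attempt)  # brief exponential backoff
    raise RuntimeError("all models failed") from last_err
```

The real pipeline adds one more tier after this: a CPU fallback, so the request degrades to slower inference rather than failing outright.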
Accomplishments that we're proud of
End-to-end upload → transcript → claims → verdicts with sources in one pass.
Solid Modal stack: cache-hot models, warmed GPU, identical behavior on web/worker.
Resilient LLM layer (multi-model + loose parsing) that doesn’t derail on imperfect outputs.
Clean, human-readable output with verdicts, confidence, and citations that people can trust.
What we learned
Infra simplicity wins: one pinned image plus a shared cache delivers excellent speed.
Warmers and caching matter more for UX than fancy prompts.
Biasing toward primary sources (and simply citing sources at all) immediately improves trust (and judge confidence), which compensates for occasional LLM slip-ups.
What's next for ReelGuardian
Browser extension: inline overlays on TikTok, YouTube Shorts, Reels.
Real-time mode for livestreams/spaces; incremental ASR + rolling verdicts.
Richer vision: reliable OCR for on-screen text, logo/channel provenance.
Proper dialogue isolation and voice detection (narrator vs. dialogue, etc.).
Multilingual ASR + diarization.
Shareable reports and an API for moderators, creators, and newsrooms.
Built With
- github
- modal
- nextjs
- openrouter
- tavily
- whisper