Inspiration

Sento, meaning “I Hear”, was inspired by how visual-first the internet is. Blind and visually impaired users are often excluded from platforms like YouTube, where critical information is communicated through visuals. We wanted to create a system where sound becomes the center of the experience, turning video into real-time, meaningful audio narration. Our goal was to build something calm, accessible, and trustworthy: a tool that lets users hear the internet rather than struggle to see it.

What it does

Sento is split into two parts that work together:

Website – Real-Time Video Narration

The Sento website is responsible for accessibility and narration. It allows users to:

- Paste in any YouTube link for instant accessibility.
- Receive real-time AI narration of what’s happening on screen.
- Get continuous live transcript analysis to describe visual events, gestures, and scene changes.
- Hear everything through soft, calm, human-like text-to-speech designed specifically for blind and visually impaired users.
- Experience a minimal, gentle interface built for clarity and ease of use.

This part of Sento focuses entirely on making video content audible, understandable, and comfortable.

Chrome Extension – Safety & Filtering Layer

The Sento Chrome Extension focuses on safety while browsing YouTube, not narration. It provides:

- Real-time misinformation detection on video titles.
- Clickbait detection for exaggerated or misleading thumbnails and captions.
- A kid-friendly browsing mode that blocks or warns about unsafe videos.
- Automatic filtering of content involving:
  - Violence
  - Hate speech
  - Sexual content
  - Harmful or disturbing topics
- Continuous live scanning while users browse YouTube.

This keeps users protected before they even choose a video.
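
To make the filtering layer's allow, warn, or block decision concrete, here is a minimal sketch in Python. It is an illustration only, not the extension's actual code: the keyword lists, the names `scan_title`, `ScanResult`, and `Action`, and the rule that kid-friendly mode blocks where normal mode only warns are all assumptions. A real build would presumably swap the keyword lookup for a trained classifier and run inside the extension in JavaScript.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


# Hypothetical keyword lists, one per category named in the write-up.
# They stand in for whatever classifier the extension really uses.
CATEGORY_KEYWORDS = {
    "violence": {"fight", "shooting", "gore"},
    "hate_speech": {"slur"},
    "sexual_content": {"nsfw", "explicit"},
    "disturbing": {"graphic", "self-harm"},
}


@dataclass(frozen=True)
class ScanResult:
    action: Action
    categories: Tuple[str, ...]


def scan_title(title: str, kid_mode: bool) -> ScanResult:
    """Flag a video title and decide what to do with it.

    Assumed policy: any flagged category blocks the video in
    kid-friendly mode and produces a warning otherwise.
    """
    # Lowercase and split into words, keeping hyphenated terms whole.
    words = set(re.findall(r"[a-z][a-z\-]*", title.lower()))
    hits = tuple(
        sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws)
    )
    if not hits:
        return ScanResult(Action.ALLOW, ())
    return ScanResult(Action.BLOCK if kid_mode else Action.WARN, hits)
```

Calling `scan_title("Street fight caught on camera", kid_mode=True)` would flag the `violence` category and return a block decision, while the same title with `kid_mode=False` would only produce a warning.
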

How we built it

We built Sento using modern AI and web technologies:

- Speech generation powered by neural text-to-speech models
- Real-time transcript parsing from YouTube’s live caption systems
- Custom AI pipelines for scene understanding and narration timing
- Chrome Extension APIs for safe content scanning
- Lightweight front-end optimized for screen readers and keyboard navigation

Each system was built independently and then connected to ensure clear separation between accessibility and safety.

Challenges we ran into

- Synchronizing narration timing with fast-moving video scenes
- Preventing narration from overwhelming the user with too much information
- Building a calm, non-intrusive voice system
- Detecting misinformation and clickbait in real time without slowing down browsing
- Designing an interface that works purely by sound and keyboard controls

Accomplishments that we're proud of

- Creating truly real-time narration instead of delayed summaries
- Achieving live, calming speech output with almost no noticeable lag
- Building a functional misinformation and safety filter in a browser extension
- Keeping the system modular so the website and extension do not interfere with each other

What we learned

We learned that accessibility isn’t just about adding features; it’s about removing friction. Small things like tone of voice, timing, and simplicity matter a lot when building for blind and visually impaired users. We also learned how powerful AI can be when it’s used not for speed, but for clarity and empathy.

What’s next for Sento – "I Hear"

In the future, we plan to:

- Expand beyond YouTube to platforms like TikTok, Twitch, and news sites
- Add multilingual real-time narration
- Offer adjustable narration styles and voice personalities
- Create a mobile version for hands-free listening

Sento’s mission is simple: If you can’t see it, you should be able to hear it.
