Inspiration
Millions of people interact with social media every day, but for those with motor disabilities or limited mobility, even that simple scroll and tap can be a barrier. We wanted to reimagine what it means to be hands-free on the internet. HandsFree started as a question: what if you could control Twitter with nothing but a gesture?
What it does
HandsFree is a Chrome extension that lets you control Twitter/X entirely through hand gestures detected by your webcam. Give a thumbs up to like the post in front of you. Flash a peace sign to open the compose page. Do the shaka to scroll to the next post, which is read aloud automatically. Hold up an open palm to hear the current post read out loud, or to stop it mid-sentence. On the compose page, you dictate your tweet by voice and confirm it with a thumbs up to post, no keyboard required.
How we built it
We split the work across three layers. The gesture recognition layer uses MediaPipe Hands to track 21 hand landmarks per frame and classify them into named gestures in real time; it runs on a hosted Vercel page to sidestep Chrome extension CSP restrictions on WebAssembly. The extension layer is a Manifest V3 Chrome extension built with Vite and CRXJS: a background service worker routes gesture events, a content script performs DOM actions on Twitter, and a side panel hosts the UI. The hosted page and the extension communicate via chrome.runtime.sendMessage, with the page whitelisted through externally_connectable in the manifest. Audio feedback uses Chrome's TTS API, and tweet dictation uses the Web Speech API.
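The landmark-heuristic side of the gesture layer can be sketched as a pure function over MediaPipe's 21-point hand model. The landmark indices below match MediaPipe's published model (y grows downward in the image), but the rules and thresholds are illustrative, not our exact tuning:

```javascript
// MediaPipe Hands landmark indices for fingertips and PIP joints.
const TIPS = { thumb: 4, index: 8, middle: 12, ring: 16, pinky: 20 };
const PIPS = { index: 6, middle: 10, ring: 14, pinky: 18 };

// A finger counts as "curled" when its tip sits below its PIP joint
// (larger y means lower on screen).
const isCurled = (lm, finger) => lm[TIPS[finger]].y > lm[PIPS[finger]].y;

function classifyGesture(lm) {
  const fingers = ["index", "middle", "ring", "pinky"];
  const curled = fingers.filter((f) => isCurled(lm, f));
  // Thumb reads as "up" when its tip is above the thumb MCP joint (index 2).
  const thumbUp = lm[TIPS.thumb].y < lm[2].y;

  if (thumbUp && curled.length === 4) return "thumbs_up"; // like / confirm
  if (thumbUp && curled.length === 3 && !isCurled(lm, "pinky"))
    return "shaka"; // next post
  if (curled.length === 2 && !isCurled(lm, "index") && !isCurled(lm, "middle"))
    return "peace"; // open compose
  if (curled.length === 0) return "open_palm"; // read aloud / stop
  return null; // no confident match
}
```

In the extension these names map to the actions above; the trained classifier mentioned under "What's next" would replace these hand-written rules.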
Challenges we ran into
Chrome's Manifest V3 Content Security Policy blocks loading WebAssembly from CDNs inside extension pages, which is exactly what MediaPipe needs. After several failed approaches, we pivoted to running gesture detection on a separate hosted page that messages the extension, which unlocked everything. Programmatically filling Twitter's React-based contenteditable compose box was also tricky: a plain text write is ignored, so we combined the Selection API with execCommand to fire the input events React's state management listens for. Keeping gesture state, TTS state, and page context in sync across the content script and background worker required careful design to avoid race conditions.
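The compose-box trick looks roughly like this: select the box's current contents, then issue execCommand("insertText") so the edit arrives as real user input. The data-testid selector in the usage comment is an assumption about X's current DOM:

```javascript
// Replace a contenteditable compose box's contents in a way React registers.
function fillComposeBox(editable, text) {
  editable.focus();
  // Select everything currently in the box so insertText replaces it.
  const range = document.createRange();
  range.selectNodeContents(editable);
  const selection = window.getSelection();
  selection.removeAllRanges();
  selection.addRange(range);
  // execCommand is deprecated, but it is still what makes a contenteditable
  // edit fire the input events React's synthetic event system expects.
  document.execCommand("insertText", false, text);
}

// Usage (selector is illustrative):
// const box = document.querySelector('[data-testid="tweetTextarea_0"]');
// fillComposeBox(box, "Posted hands-free!");
```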
Accomplishments that we're proud of
Getting the full end-to-end pipeline working — webcam to MediaPipe to Chrome extension to live Twitter interaction — felt like a real milestone. The post confirmation flow (audio prompt before submitting) shows that accessibility and safety aren't mutually exclusive. We're also proud of how context-aware the gestures are: the same thumbs up likes a post on the feed and submits a draft on the compose page.
What we learned
MV3 is significantly more restrictive than MV2, and understanding its CSP model was a hard-won lesson. We also learned a lot about React's synthetic event system and why programmatic DOM interaction often requires more than a simple .click(). Most importantly, we learned that building for accessibility requires thinking about the full user journey — not just individual features, but how they chain together.
What's next for HandsFree
- Fingerspelling support to compose tweets letter by letter using ASL hand signs.
- Expanding to other platforms: the gesture layer is completely platform-agnostic, so Instagram and TikTok support would be straightforward.
- A smarter gesture engine using a trained ML classifier instead of landmark heuristics, for better accuracy across different hand sizes and lighting conditions.
- A proper onboarding flow so any user can set it up without touching a config file.
Built With
- chromeextension
- chrometts
- crxjs
- googlechrome
- javascript
- manifestv3
- mediapipe
- vercel
- vite
- webrtc
- webspeech