About AccessiGesture
Our Inspiration: Accessibility for All
In a world that runs on digital interaction, the mouse and keyboard remain a fundamental barrier for millions. We were inspired by the daily challenge faced by individuals with motor disabilities, arthritis, repetitive strain injuries (RSI), or other conditions that make using a traditional mouse difficult or even painful. We asked ourselves: why should digital accessibility be a luxury, gated behind expensive, specialized hardware?
Our project, AccessiGesture, is our answer. It’s built on the philosophy that accessibility should be built-in, not bolted-on. We envisioned a tool that could transform any standard webcam—a device already built into most computers—into a high-fidelity, intuitive, and ergonomic gesture controller. We wanted to create something that didn't just work, but felt empowering, requiring no special setup, just the user and their hands.
What It Does: Fluent, Hands-Free Control
AccessiGesture runs as a lightweight, unobtrusive application that translates a user's hand movements into direct, real-time cursor control. We meticulously designed a set of gestures to be both ergonomic and comprehensive, allowing for the full spectrum of mouse operations.
- Cursor Movement: An open-hand gesture (`[1, 1, 1, 1, 1]`) puts the system into "mouse mode." The application maps the hand's position within a user-defined "active zone" to the entire desktop, allowing for smooth, 1-to-1 cursor movement.
- Left Click & Drag: A pinch between the thumb and index finger triggers a "left-click-down" event. This was a critical design choice: by holding the pinch, the user can naturally drag files, highlight text, or interact with any drag-and-drop interface. Releasing the pinch immediately fires the "left-click-up" event.
- Right Click: A pinch between the thumb and middle finger executes a single, discrete right-click. We intentionally chose a different pinch to make this action algorithmically distinct and prevent accidental right-clicks.
- Scrolling: We mapped vertical motion to intuitive, vertical gestures. A "thumbs-up" gesture (`[1, 0, 0, 0, 0]`) initiates continuous scrolling up, while a "thumbs-down" gesture scrolls down. The user can hold the pose to scroll through long documents or websites effortlessly.
- Pause/Resume: Perhaps most critical for usability is the pause gesture (`[1, 1, 0, 0, 1]`). It allows the user to pause and resume tracking at any time, freeing them to rest, type, or use their hands naturally without the cursor flying across the screen or clicking accidentally: a simple feature that is the key to making the application practical for all-day use. (Every control gesture can be remapped.)
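As a minimal illustration of the gesture vocabulary above, the finger-state patterns could be resolved with a simple lookup. This is a sketch with our own illustrative names, not the project's actual code:

```python
# Hypothetical gesture table: keys are five finger states
# (thumb..pinky, 1 = open), matching the patterns listed above.
GESTURES = {
    (1, 1, 1, 1, 1): "mouse_mode",    # open hand: cursor control
    (1, 0, 0, 0, 0): "scroll_up",     # thumbs-up: continuous scroll up
    (1, 1, 0, 0, 1): "pause_toggle",  # pause/resume tracking
}

def classify(finger_states):
    """Map a five-element finger-state list to a gesture name, or None."""
    return GESTURES.get(tuple(finger_states))
```

A table like this also makes the "Each Control Gesture Changeable" feature natural: remapping a gesture is just editing a dictionary entry.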
How We Built It: A Symphony of Python Libraries
The project was built entirely in Python, which served as the "glue" for a stack of powerful, specialized libraries, each chosen for a specific purpose to create a seamless, real-time experience.
- OpenCV: This was our high-speed, low-latency "eye." We used it for the initial camera capture, frame-by-frame processing, and image manipulation. Crucially, OpenCV also served as our visual debugging tool, allowing us to draw the hand landmarks and our program's current "state" (e.g., "PINCH") directly onto the video feed.
- Google MediaPipe (Hands): This is the core machine-learning engine and the "brain" of our recognition. We leveraged its incredibly high-fidelity, pre-trained model, which provides 21 distinct 3D landmarks for the hand in real-time. This level of detail is what allowed us to move beyond simple motion tracking and reliably distinguish between nuanced gestures like an open hand, a pinch, and a thumbs-up.
- PyAutoGUI: This library was our "hand," serving as the bridge to the operating system. It programmatically translates our detected gestures into universal, OS-level mouse movements (`pyautogui.moveTo`), clicks (`pyautogui.click`), and scroll events (`pyautogui.scroll`), allowing AccessiGesture to control any application on the user's computer.
- Tkinter: We used Tkinter to elevate the project from a simple script to a polished application. It provides the essential graphical user interface (GUI) where users can adjust critical parameters in real time: gesture-to-action mapping, sensitivity sliders, detection confidence, and the boundaries of the "active zone," making the tool truly personalizable.
Challenges We Faced (And Our "Aha!" Moments)
Our development process was a story of hitting walls and finding breakthroughs.
Our first challenge was Framework Selection. We initially explored a sophisticated web-based solution using Next.js, drawn to the idea of a modern, browser-based tool. However, we quickly ran into the sandboxed limitations of the web. Gaining the simple, direct, low-latency access to the webcam and, more importantly, the system-level cursor control we needed was a massive, complex hurdle. We made the hard pivot to a native Python application, which proved to be the right decision, offering us the raw power and simplicity we needed.
Our primary technical hurdle was Gesture-Clash Resolution. Our first prototypes were a mess of false positives. A "fist" might be misread as a "thumbs-up." A hand moving into position would be misread as a "scroll." We learned that a gesture that feels natural to a human is not always algorithmically distinct for a machine. This led to our biggest "aha!" moment. Instead of just checking if a finger was "up" or "down," we re-engineered our `get_finger_states` function to be rotation-invariant. It now calculates the 2D distance of each fingertip relative to the wrist. A finger is only "open" if its tip is farther from the wrist than its middle (PIP) joint. This single change dramatically improved accuracy and made the system robust to different hand angles.
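A sketch of that rotation-invariant check, using MediaPipe's standard landmark indexing (0 is the wrist; each finger has a tip and a PIP/IP joint index). The details of the project's actual function may differ:

```python
import math

# MediaPipe hand-landmark indices: 0 = wrist; (tip, joint) pairs per finger
# (the thumb uses its IP joint; the other fingers use the PIP joint).
FINGERS = [(4, 3), (8, 6), (12, 10), (16, 14), (20, 18)]

def get_finger_states(landmarks):
    """landmarks: 21 (x, y) points in MediaPipe order. A finger counts as
    'open' (1) only if its tip is farther from the wrist than its joint,
    which holds regardless of how the hand is rotated in the frame."""
    wx, wy = landmarks[0]
    dist = lambda i: math.hypot(landmarks[i][0] - wx, landmarks[i][1] - wy)
    return [1 if dist(tip) > dist(joint) else 0 for tip, joint in FINGERS]
```

Comparing distances from the wrist, rather than raw y-coordinates, is what makes the check independent of hand rotation: a curled finger folds its tip back toward the wrist no matter which way the hand is tilted.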
This directly led to one of our proudest accomplishments: building a powerful visual debugger. By drawing the landmark data and our program's state (like "PINCH DETECTED") directly onto the camera feed, we could see exactly where our logic was failing and iterate rapidly.
Finally, our first prototype was functional but suffered from unusable input lag. This taught us how to profile and debug performance bottlenecks. We learned firsthand how blocking I/O (camera capture) and CPU-intensive tasks (MediaPipe processing) were fighting for resources in our main loop. We had to profile, debug, and hunt for every redundant calculation. We optimized our pipeline and fine-tuned MediaPipe's parameters, finding the perfect balance between accuracy and performance.
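The kind of per-stage timing used to find such bottlenecks can be sketched with nothing but the standard library; the stage names and structure here are illustrative:

```python
import time

def timed(stage_times, name, fn, *args):
    """Run fn(*args) and accumulate its wall-clock cost under `name`:
    a bare-bones way to see which pipeline stage dominates each frame."""
    t0 = time.perf_counter()
    result = fn(*args)
    stage_times[name] = stage_times.get(name, 0.0) + time.perf_counter() - t0
    return result
```

Wrapping each stage of the main loop (capture, landmark inference, gesture logic, cursor output) this way makes it obvious where the frame budget goes before reaching for a full profiler.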
Accomplishments That We're Proud Of
This project was a journey of firsts, but a few accomplishments stand out:
- The Visual Debugger: We are incredibly proud of the real-time diagnostic overlay. By drawing our program's state (`gesture_detected`, `fingers_list`) onto the camera feed, we created an essential engineering tool that allowed us to see our logic working (or failing) and iterate with incredible speed.
- The Standalone App: We didn't just build a script; we built a full desktop application with a robust UI, packaged as a standalone executable. This was a huge step in learning how to deliver a real product to users.
- Mastering the Stack: We dove headfirst into a professional-grade development environment, learning to integrate complex libraries like OpenCV and MediaPipe within VS Code.
- The Ultimate Test (Beating *osu!*): We knew we had solved our latency problem, but we needed a final test. We launched the high-speed rhythm game *osu!*... and were able to successfully play a map using only our gestures. It was the ultimate, hilarious proof that our system was truly real-time.
What We Learned: A Deep Dive into HCI
This project was a masterclass in the architecture of real-time computer vision applications.
- Stateful Program Design: We learned the critical importance of managing "state." A user's intent is not just in a single frame; it's a process. We had to build logic to track `is_pinching` vs. `was_pinching` to distinguish between a discrete "click" and a continuous "drag."
- Human-Computer Interaction (HCI): We learned that gesture selection is a deep, complex trade-off. A gesture must be intuitive for the user, ergonomic to hold, and mathematically unambiguous for the machine. This core conflict is the central challenge of HCI design.
- Performance Profiling: We learned that in real-time apps, "good enough" code isn't. Every line matters. We learned how to hunt for performance bottlenecks and understand the give-and-take between a model's accuracy and its processing speed.
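The click-versus-drag state tracking described above reduces to edge detection on the pinch signal. A hedged sketch, with illustrative names standing in for the project's `is_pinching`/`was_pinching` logic:

```python
class PinchState:
    """Track pinch transitions so a press fires one mouse-down event and
    a release fires one mouse-up event, enabling both clicks and drags."""

    def __init__(self):
        self.was_pinching = False

    def update(self, is_pinching):
        event = None
        if is_pinching and not self.was_pinching:
            event = "mouse_down"  # rising edge: pinch just started
        elif not is_pinching and self.was_pinching:
            event = "mouse_up"    # falling edge: completes a click or drag
        self.was_pinching = is_pinching
        return event
```

Firing on the edges rather than the level is what lets a single gesture serve as both a click (a quick pinch) and a drag (a held pinch), without emitting a flood of click events while the pinch is held.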
What's Next for AccessiGesture
- Trainable Models: Our next big leap is to move from our hard-coded (heuristic) gesture rules to a proper machine-learned model. This would not only make detection more robust but would also allow users to train the app to recognize their own unique set of custom gestures.
- Expanding the Vocabulary: Right now, we only speak "mouse." We want to expand AccessiGesture's features to include keyboard shortcuts (like Ctrl-C, Ctrl-V) and a pop-up virtual keyboard, transforming it from a mouse replacement into a complete, hands-free operating system navigator.

