
IRIS — Intelligent Reach & Interaction System

A spatial audio prosthetic that turns any smartphone into an object-finding guide for visually impaired users.

No special hardware. No wearables. No app install. Just open a URL on the phone you already own.


The Problem

2.2 billion people worldwide have some form of vision impairment. For many, the simple act of finding everyday objects — keys on a counter, a phone on a table, medication on a nightstand — requires asking someone for help or painstakingly sweeping their hands across a surface and hoping for contact.

Existing assistive apps like Be My Eyes and Seeing AI can describe what a camera sees: "Your keys are to the left." But description is where they stop. The user is left to translate a verbal hint into physical action, with no feedback on whether they're getting closer or drifting further away.

The gap: No existing solution provides continuous, real-time physical guidance from detection to touch.

What IRIS Does

IRIS bridges that gap with a three-phase closed-loop guidance system:

Phase 1 — Voice. The user taps Start and speaks naturally: "Find my keys." The app understands — no rigid commands, no button navigation.
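The "no rigid commands" behavior can be sketched as a small utterance normalizer. The filler patterns and the `parseTarget` name are illustrative assumptions, not the app's actual code: the real system may hand the raw utterance to Gemini instead.

```typescript
// Hypothetical sketch: normalize a free-form Web Speech API utterance into a
// target description. Common filler phrases ("find my", "where are", "look
// for") are stripped; whatever remains is treated as the object description.
const FILLERS = [
  /^(please\s+)?(find|locate|look for|search for|where (is|are))\s+/i,
  /^(my|the|a|an)\s+/i,
];

function parseTarget(utterance: string): string {
  let target = utterance.trim().toLowerCase();
  for (const filler of FILLERS) {
    target = target.replace(filler, "");
  }
  return target.trim();
}
```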

Phase 2 — Sweep. The user holds their phone and slowly sweeps it across the table. IRIS sends one camera frame per second to Gemini, which judges how close the phone is to the target object. The phone vibrates faster as it gets closer — like a metal detector for everyday objects. Gemini also provides directional hints ("try moving left"), which are spoken aloud.
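The metal-detector mapping can be sketched as a pure function (the score range, pulse timings, and function name are assumptions; the README does not specify them). On Android Chrome the pulse itself would be fired with `navigator.vibrate(PULSE_MS)`:

```typescript
const PULSE_MS = 50;          // duration of each haptic pulse
const MIN_INTERVAL_MS = 100;  // near the target: rapid buzzing
const MAX_INTERVAL_MS = 1200; // far away: slow ticks

// Map an assumed 0-1 proximity score from Gemini to a pulse interval:
// closer means faster pulses.
function pulseInterval(proximity: number): number {
  const p = Math.min(1, Math.max(0, proximity)); // clamp to [0, 1]
  // Linear interpolation: proximity 0 -> MAX, proximity 1 -> MIN.
  return Math.round(MAX_INTERVAL_MS - p * (MAX_INTERVAL_MS - MIN_INTERVAL_MS));
}
```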

Phase 3 — Touch. When the phone is directly over the target, IRIS says "Guiding now." The user props the phone and moves their hand into the camera view. Stereo audio beeps guide the hand: pitch and tempo increase with proximity, stereo panning indicates direction. When the hand reaches the object, IRIS confirms: "Found it!"
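As a sketch of the Phase 3 audio mapping (the frequency range and function shape are assumptions): the hand-to-target offset drives the value written to a StereoPannerNode's pan, and normalized distance drives the OscillatorNode's pitch.

```typescript
const MIN_HZ = 300;  // far from target: low tone
const MAX_HZ = 1200; // on target: high tone

// dx, dy: pixel offset from hand to target; frameDiag: frame diagonal in px.
// Returns values that would be written to OscillatorNode.frequency and
// StereoPannerNode.pan on each animation frame.
function beepParams(dx: number, dy: number, frameDiag: number) {
  const dist = Math.min(1, Math.hypot(dx, dy) / frameDiag); // 0 = on target
  return {
    frequency: MAX_HZ - dist * (MAX_HZ - MIN_HZ),           // closer -> higher
    pan: Math.max(-1, Math.min(1, (dx / frameDiag) * 2)),   // left/right steer
  };
}
```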

The entire interaction — from "find my keys" to fingers on the keys — takes about 30 seconds, eyes closed.

Why This Is Different

                        Be My Eyes             Seeing AI            IRIS
Identifies objects      Yes (human + AI)       Yes (AI)             Yes (Gemini AI)
Describes location      Yes ("to your left")   Limited              Yes
Guides you there        No                     No                   Yes — continuous feedback until touch
Confirms you found it   No                     No                   Yes — audio + Gemini visual confirmation
Works on any object     Yes                    Fixed classes only   Yes — describe anything in natural language
Requires install        App download           App download         No — runs in browser
Requires hardware       Phone                  Phone                Phone (same one you have)
Feedback modality       Voice description      Voice + some haptic  Haptic + spatial audio + voice

Be My Eyes is a pair of remote eyes. IRIS is a pair of remote hands.

The key technical differentiator: IRIS uses Gemini 2.5 Flash as a zero-shot semantic object detector. Unlike YOLO or MobileNet, which are trained on fixed object classes, Gemini can find anything you can describe in words — "the small white pill bottle behind the mug" — with no retraining. This makes IRIS flexible enough to handle essentially any object a user can name.
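A minimal sketch of the detection request, assuming the standard Gemini generateContent payload shape (a text part plus an inline JPEG part). The prompt wording and response schema here are assumptions about how IRIS phrases the task, not the repository's actual prompt:

```typescript
// Build a generateContent request body asking Gemini to locate an arbitrary,
// natural-language target and reply with a normalized bounding box.
function detectionRequest(target: string, jpegBase64: string) {
  return {
    contents: [{
      parts: [
        {
          text:
            `Find "${target}" in this image. Respond with JSON only: ` +
            `{"found": boolean, "box": [ymin, xmin, ymax, xmax]} ` +
            `with coordinates normalized to 0-1000.`,
        },
        { inlineData: { mimeType: "image/jpeg", data: jpegBase64 } },
      ],
    }],
  };
}
```

The server-side Vercel route would POST this body to the Gemini REST endpoint with the API key attached, so the key never reaches the browser.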

How It Works (Technical)

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Phone Browser                         │
│                                                          │
│  Camera (getUserMedia) ──→ Frame capture (JPEG)          │
│          │                        │                      │
│          ▼                        ▼ (1 req/sec)          │
│  MediaPipe WASM ◄──┐     Vercel API Routes               │
│  Hand tracking     │     ┌────────────────────┐          │
│  ~30fps client     │     │ /api/proximity      │──→ Gemini│
│          │         │     │ /api/detect         │   2.5    │
│          ▼         │     │ /api/confirm        │   Flash  │
│  Geometry Engine   │     └────────────────────┘          │
│  (distance + pan)  │              │                      │
│          │         │              ▼                      │
│          ▼         │     Bounding box / proximity        │
│  Sonifier ─────────┘                                     │
│          │                                               │
│          ▼                                               │
│  Web Audio API          navigator.vibrate()              │
│  (stereo beeps)         (haptic pulses)                  │
│          │                     │                         │
│          ▼                     ▼                         │
│      Earbuds              Phone motor                    │
└─────────────────────────────────────────────────────────┘

Everything except Gemini runs client-side. Hand tracking, geometry, audio synthesis, haptics, and speech all execute on the phone. The only server round-trip is one JPEG frame per second to the Gemini API through a Vercel edge function (which also keeps the API key server-side and secure).

Tech Stack

  • Framework: Next.js 14 (App Router), TypeScript, Tailwind CSS
  • Deployment: Vercel
  • Object Detection: Google Gemini 2.5 Flash (zero-shot, via REST API)
  • Hand Tracking: MediaPipe HandLandmarker (WASM, client-side, ~30fps)
  • Audio: Web Audio API (OscillatorNode + StereoPannerNode)
  • Haptics: Vibration API (navigator.vibrate(), Android Chrome)
  • Speech: Browser SpeechRecognition + SpeechSynthesis (free, no API)

Key Design Decisions

Bbox smoothing. Gemini returns slightly different bounding boxes each poll. Raw coordinates cause the target to "jump," confusing the audio guidance. IRIS maintains a rolling average of the last 3 bbox centers, giving stable guidance while still tracking movement.
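The smoother described above can be sketched in a few lines (the class name is illustrative): keep the last 3 bbox centers and guide toward their mean.

```typescript
const CENTER_WINDOW = 3; // number of recent bbox centers to average

class BboxSmoother {
  private centers: Array<[number, number]> = [];

  // Record the latest bbox center and return the rolling average.
  push(cx: number, cy: number): [number, number] {
    this.centers.push([cx, cy]);
    if (this.centers.length > CENTER_WINDOW) this.centers.shift();
    const n = this.centers.length;
    const sx = this.centers.reduce((s, [x]) => s + x, 0);
    const sy = this.centers.reduce((s, [, y]) => s + y, 0);
    return [sx / n, sy / n];
  }
}
```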

Resolution-independent arrival detection. The "arrived" threshold is 10% of the frame diagonal, not a fixed pixel count. This ensures consistent behavior whether the camera provides 640×480 or 1920×1080.
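A sketch of that check (parameter names are assumptions): the hand counts as "arrived" when its distance to the target is under 10% of the frame diagonal, whatever the resolution.

```typescript
const ARRIVAL_FRACTION = 0.1; // 10% of the frame diagonal

// All coordinates are in pixels of the same camera frame.
function hasArrived(
  handX: number, handY: number,
  targetX: number, targetY: number,
  frameW: number, frameH: number,
): boolean {
  const diag = Math.hypot(frameW, frameH);
  const dist = Math.hypot(handX - targetX, handY - targetY);
  return dist < ARRIVAL_FRACTION * diag;
}
```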

Rolling window arrival. Instead of requiring N consecutive frames where the hand is "close enough" (which fails due to MediaPipe jitter), IRIS uses a rolling window: if 8 of the last 15 frames register arrival, it declares found. This tolerates natural hand tremor without false positives.
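The rolling-window logic is simple enough to sketch directly (class name is illustrative): declare "found" once 8 of the last 15 per-frame arrival flags are true.

```typescript
const FRAME_WINDOW = 15; // frames of history to keep
const REQUIRED = 8;      // close frames needed within the window

class ArrivalDetector {
  private flags: boolean[] = [];

  // Feed one per-frame "close enough" flag; returns true once arrival
  // is confirmed despite jittery individual frames.
  update(closeEnough: boolean): boolean {
    this.flags.push(closeEnough);
    if (this.flags.length > FRAME_WINDOW) this.flags.shift();
    return this.flags.filter(Boolean).length >= REQUIRED;
  }
}
```

A consecutive-frame rule would reset on every jittery miss; the windowed count only requires a majority of recent frames, which is why tremor does not block detection.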

Phase-gated MediaPipe loading. The ~10MB MediaPipe WASM model is preloaded during Phase 2 (while the user is sweeping) so Phase 3 starts instantly with no loading delay.
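The preload pattern can be sketched as a memoized promise: kick off the slow download during Phase 2, then await the same promise when Phase 3 needs the model. The `getHandLandmarker` wrapper below is a hypothetical example, not the repository's code.

```typescript
// Wrap an async loader so the underlying work runs at most once; every
// caller shares the same in-flight (or resolved) promise.
function memoizePreload<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => (pending ??= load());
}

// Hypothetical usage with MediaPipe's HandLandmarker:
// const getHandLandmarker = memoizePreload(() =>
//   HandLandmarker.createFromOptions(vision, options));
// Phase 2: void getHandLandmarker();              // warm the cache while sweeping
// Phase 3: const model = await getHandLandmarker(); // instant if already loaded
```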

Running Locally

git clone https://github.com/Karthikgaur8/IRIS.git
cd IRIS
npm install

Create .env.local with your Gemini API key:

GOOGLE_API_KEY=your-gemini-api-key

Then start the dev server:

npm run dev

Open http://localhost:3000 on your laptop. To test on a phone, use your computer's address on the same WiFi network — but note that browsers only allow camera and microphone access over HTTPS (or on localhost), so the Vercel deployment is the easiest path for phone testing.

Deploying to Vercel

vercel
vercel env add GOOGLE_API_KEY    # paste your key
vercel --prod

Open the Vercel URL on any Android phone with Chrome. Grant camera + microphone permissions. Plug in wired earbuds. Tap Start.

Usage

  1. Tap START → grant camera + mic permissions
  2. Say what you're looking for: "red earbuds case"
  3. Sweep your phone slowly over the table — feel vibrations intensify as you get closer
  4. When IRIS says "Guiding now" — prop the phone, move your hand into frame
  5. Follow the stereo beeps to the object
  6. "Found it!"

Accessibility

  • All interactive elements have aria-label attributes
  • Status updates use aria-live="polite" regions for screen reader compatibility
  • Works with VoiceOver (iOS) and TalkBack (Android) for initial navigation
  • Zero visual dependency during use — the entire UX is audio + haptic
  • Voice input with text fallback if speech recognition is unavailable

Limitations

  • Haptics: navigator.vibrate() works on Android Chrome only. iOS Safari does not support it — haptic feedback is gracefully disabled.
  • 2D camera: A single phone camera cannot perceive true depth. IRIS compensates by using Gemini's understanding of apparent object size as a proxy for distance during Phase 2.
  • Latency: Each Gemini API call takes 0.5–1.5 seconds. Phase 2 (proximity) and Phase 3 (bbox) poll once per second — fast enough for a tabletop scenario, not for navigation.
  • Bluetooth earbuds: Add 100–300ms audio latency, which desynchronizes the beeps from hand movement. Wired earbuds are recommended.

Project Origin

Built for a hackathon. Started as a Python prototype with OpenCV + MediaPipe + sounddevice, then ported to the web for universal phone access. The original spec called for the Gemini Live API, but standard generateContent proved more reliable for structured JSON responses (bounding boxes, proximity scores).

The original codename "Ariadne" references the Greek myth — the thread that guided Theseus out of the labyrinth. IRIS (Intelligent Reach & Interaction System) is the deployment name.

License

MIT
