Inspiration 🌍

Over 253 million people worldwide live with visual impairment, including some of our closest family members. Tens of millions more face motor disabilities or age-related limitations that make modern technology like apps and websites genuinely inaccessible. And yet, the most powerful tools we've built as a society are locked behind screens and keyboards.

We kept asking ourselves: what does independence actually look like for someone who is blind, or elderly, or has limited motor control? Not a screen reader that reads labels aloud. Not a voice assistant that sets timers. To us, true independence means being able to speak and have the world respond: technology that doesn't just inform you, but acts for you.

Haven was born from that question.

What it does 🚀

Haven is a voice-first AI agent that gives blind, visually impaired, motor-impaired, and senior users the ability to take real action in the digital and physical world entirely by speaking.

  • 🎙️ Always-on voice interface: Haven listens continuously and responds naturally using the OpenAI Realtime API over WebRTC
  • ☁️ Fully cloud-based, zero install: Haven runs entirely in the browser, with no software to download and no setup. A caregiver, family member, or support worker can open Haven on any laptop and give a user full remote access to another computer, enabling independence from anywhere in the world without touching a local installation
  • 💡 Smart home control: Haven controls physical devices like Govee smart lamps on voice command for users who cannot reach a switch or navigate a companion app
  • 🛒 Autonomous web tasks: Haven navigates websites and completes real purchases on Amazon using browser automation; users who have never independently ordered anything online can do so simply by asking
  • 🚨 Proactive emergency alerts: Haven monitors live X/Twitter feeds every 5 seconds, detects geopolitical and military emergencies using an LLM classifier, and immediately announces threats aloud; it then takes protective action without waiting to be asked, turning on lights and ordering emergency supplies automatically

How we built it 🛠️

  • 🎧 Voice Layer: OpenAI Realtime API over browser-direct WebRTC. No audio relay through our servers, low latency, high reliability, always listening.

  • 🌐 Browser Automation: OpenClaw, a browser-use-powered gateway, executes real multi-step tasks on the user's machine (navigating to Amazon, completing checkout, controlling smart home apps), all driven by natural-language instructions from the voice agent.

  • 🧠 Threat Intelligence Worker: A Python/APScheduler background worker polls the Twitter API every 5 seconds. Each post is evaluated by an LLM classifier that assesses full context to distinguish genuine emergencies from noise.
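
A minimal sketch of one polling tick, assuming a dedupe-then-classify shape (the keyword check is a hypothetical stand-in for the LLM classifier, and the function names are illustrative, not our actual code):

```python
SEEN_IDS = set()  # post ids already evaluated, so reposts aren't re-classified

def classify(post_text):
    # Hypothetical stand-in for the LLM classifier call: in the real
    # worker, the full post context goes to a language model.
    keywords = ("missile", "strike", "evacuate", "explosion")
    return any(k in post_text.lower() for k in keywords)

def poll_once(posts):
    """One 5-second tick: skip posts we've seen, classify the rest,
    and return only those flagged as genuine emergencies."""
    alerts = []
    for post in posts:
        if post["id"] in SEEN_IDS:
            continue
        SEEN_IDS.add(post["id"])
        if classify(post["text"]):
            alerts.append(post)
    return alerts
```

In the real worker this function runs on an APScheduler interval trigger; the dedupe set keeps a repeated tweet from firing the alert pipeline twice.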

  • 📡 Alert Pipeline: When a threat is detected, the worker pushes a watchtower.alert event via Server-Sent Events to the frontend, which injects it directly into the live OpenAI Realtime session. The voice agent interrupts, announces the alert, and dispatches tool calls to OpenClaw.
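
The wire format for that push is plain Server-Sent Events; a minimal sketch of how a `watchtower.alert` frame could be serialized (field names here are illustrative):

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame: an `event:` line naming
    the event type, a `data:` line with the JSON payload, and the
    blank line that terminates the frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_frame("watchtower.alert", {"severity": "high", "summary": "strike reported"})
```

The frontend's `EventSource` listener keys on the `event:` name, so alert frames can share the stream with other event types without extra routing logic.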

  • ⚙️ Backend: FastAPI serving tool dispatch, session management, and SSE event streaming. Supabase for persistence. LiveKit for real-time screen sharing.

Challenges we ran into 🧩

  • Token Burn from Browser Automation: Our largest challenge was computation cost. OpenClaw navigates websites by taking screenshots and feeding them as images into each model input; without tight guardrails, this becomes prohibitively expensive very quickly. A single multi-step task can balloon into dozens of vision-model calls, each carrying a full screenshot. We had to carefully constrain how OpenClaw collects visual context, limit screenshot frequency, and tune instructions to minimize redundant image captures, which keeps the system usable without burning through our API budget in a single session.
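
One way to cap that cost, sketched as an illustrative gate (not the actual OpenClaw internals): enforce a minimum interval between captures and drop frames whose pixels haven't changed since the last one sent.

```python
import hashlib
import time

class ScreenshotGate:
    """Decide whether a screenshot is worth sending to the vision model:
    rate-limit captures and skip byte-identical frames."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval  # seconds between model-bound frames
        self.last_time = 0.0
        self.last_hash = None

    def should_send(self, png_bytes, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_time < self.min_interval:
            return False  # too soon after the last frame we sent
        digest = hashlib.sha256(png_bytes).hexdigest()
        if digest == self.last_hash:
            return False  # page hasn't visibly changed
        self.last_time = now
        self.last_hash = digest
        return True
```

Even this simple pair of checks cuts out the two biggest sources of waste: rapid-fire captures during page loads and repeated shots of a static page.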

  • Voice Session Injection: Interrupting a live conversation mid-session to deliver an urgent emergency alert without corrupting WebRTC state, creating duplicate responses, or triggering the "active response in progress" error required careful management of the OpenAI Realtime API data channel and response lifecycle tracking.
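
The shape of that fix, sketched with hypothetical names: track `response.created` / `response.done` events from the data channel and defer any alert injection while a response is in flight, flushing the queue once the model finishes speaking.

```python
from collections import deque

class RealtimeInjector:
    """Guard alert injection against the 'active response in progress'
    error by tracking the Realtime response lifecycle."""

    def __init__(self):
        self.active = False      # is a model response currently in flight?
        self.pending = deque()   # alerts deferred until the response ends
        self.sent = []           # alerts actually injected into the session

    def on_event(self, event_type):
        # Called for each lifecycle event arriving on the data channel.
        if event_type == "response.created":
            self.active = True
        elif event_type == "response.done":
            self.active = False
            self._flush()

    def inject(self, alert):
        if self.active:
            self.pending.append(alert)  # defer: injecting now would error
        else:
            self.sent.append(alert)     # safe to inject immediately

    def _flush(self):
        while self.pending and not self.active:
            self.sent.append(self.pending.popleft())
```

The same bookkeeping also prevents duplicate responses: each alert is injected exactly once, either immediately or on the next `response.done`.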

  • Real Autonomous Purchasing: Amazon's checkout flow is dynamic, personalized, and constantly changing. Building a reliable pipeline that could navigate it end-to-end, from cart state to address confirmation to order placement, all without human input, required extensive iteration on the OpenClaw instruction design.

Accomplishments that we're proud of 🏆

Zero-install independence from anywhere. Haven runs entirely in the browser, no software, no configuration, no IT support needed. A caregiver can open a laptop anywhere in the world, navigate to Haven, and give a blind or motor-impaired user full autonomous access to another computer's capabilities within seconds. That kind of reach has never been this frictionless. ✨

The first truly cloud-based voice agent for physical-world action. Most cloud tools stay in the cloud. Haven closes the loop: a browser session on one machine can control smart home devices, complete purchases, and act on another computer for a user anywhere, all driven by voice.

Haven also represents true independence for users who've never had it. A visually impaired or motor-impaired user can now independently order emergency supplies and be proactively informed of breaking global events, entirely by voice, without touching a screen, from any device with a browser.

Fully automated end-to-end emergency response. From a tweet about a military strike to a spoken alert, a light turning on, and a first-aid kit ordered: under 30 seconds, zero user interaction. A senior living alone doesn't need to understand what happened because Haven handles it all.

A new interaction model. Rather than an assistant that waits to be asked, Haven monitors the world, detects what matters, and acts before the user has to say a word, redefining what an accessibility tool can be.

What we learned 📚

Building for accessibility forced us to rethink what "working software" means. A feature that works 90% of the time is fine for a general user. For someone who depends on Haven because they have no alternative, 90% is not enough. ❤️

We learned that voice-first design is fundamentally different from voice-added design. Every assumption baked into a traditional UI, that the user can see state, retry on failure, and read an error message, has to be rebuilt from scratch for a fully voice-driven experience.

What's next for Haven: Your Voice, Our Priority 🔮

  1. 🌐 Broader Task Vocabulary: Expanding autonomous browser skills beyond purchasing, including essentials like booking medical appointments and navigating government portals, giving users genuine independence across the entire internet.

  2. 🏠 Deeper Smart Home Integration: Door locks, thermostats, security cameras, and alarm systems, all controllable by voice for users who cannot interact with physical controls or companion apps.

  3. 🛟 Personalized Emergency Playbooks: User-configured response plans per emergency type, different lights, different purchases, different alert scripts, tailored to each person's home, mobility level, and needs.

  4. 👨‍👩‍👧 Caregiver Dashboard: A real-time view for family members and caregivers showing what Haven has done, what alerts have fired, and what actions were taken, giving full accountability without removing user autonomy.
