https://github.com/Interpause/EchoPath-web

Education is usually thought of in academic terms, but the role our senses play in giving us a strong foundation for learning is often taken for granted. Blind people must learn in a fundamentally different way, because they lack what is arguably the most essential sense: vision.

Inspiration

EchoPath was inspired by this gap: most tools assume visual interaction, even when they include accessibility features. We wanted to build something that treats audio as the primary interface, not a secondary add-on.
Our core idea was simple: combine camera perception, voice interaction, and spatial sound into one assistant that helps blind and low-vision users navigate and ask context-aware questions hands-free.

What it does

EchoPath is a real-time voice-and-vision navigation assistant. While the camera is running, it:

  • Streams camera frames to the backend for perception.
  • Continuously listens for the wake phrase “hey john.”
  • Captures a spoken command after wake word detection.
  • Sends the command + current frame to the backend (query_llm).
  • Speaks back the backend response (query_llm_response) in concise, non-visual language.
  • Provides spatial audio cues for nearby obstacles using 3D coordinate data.
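The wake-word → command → query part of that loop can be sketched as a small state machine. This is a Python sketch for clarity only: the actual frontend is TypeScript in the browser, and every name here except the `query_llm` message type and the "hey john" wake phrase is illustrative.

```python
from dataclasses import dataclass, field

WAKE_PHRASE = "hey john"

@dataclass
class VoiceLoop:
    """Illustrative state machine: passive listening -> wake -> capture command."""
    awake: bool = False
    outbox: list = field(default_factory=list)  # messages queued for the backend

    def on_transcript(self, text: str, frame_id: str) -> None:
        text = text.lower().strip()
        if not self.awake:
            if WAKE_PHRASE in text:
                self.awake = True  # wake phrase heard; next utterance is the command
            return
        # Awake: treat this utterance as the command and pair it with the
        # most recent camera frame, matching the query_llm contract.
        self.outbox.append({"type": "query_llm", "command": text, "frame": frame_id})
        self.awake = False  # return to passive listening
```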

How we built it

We built EchoPath as a web/mobile-friendly frontend connected to an async backend over WebSocket.

  • Frontend: React + Tailwind UI written in TypeScript, with Capacitor for real-time, mobile-ready camera access.
  • Real-time transport: WebSocket for continuous frame upload and streaming response messages.
  • Voice loop: Browser speech recognition for wake-word (hey john) + command capture, and browser speech synthesis for spoken responses.
  • Perception UX: Detection overlays and distance points for real-time visual debugging/validation of backend outputs.
  • Spatial audio: Custom Web Audio engine that maps object position into directional cue sounds.
  • Backend: FastAPI server orchestrating the pipeline, Hugging Face Transformers for depth estimation, Ultralytics YOLO for object detection, and a llama.cpp server exposed through an OpenAI-compatible API for language responses.

We implemented strict message contracts so the frontend and backend stay aligned:

  • image for continuous frame streaming
  • query_llm for wake-word-triggered command + frame
  • query_llm_response for spoken assistant output
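To make the contract concrete, here is one plausible shape for each message plus a tiny validator. The `type` values come from the list above; every other field name is an assumption for illustration, not the exact schema.

```python
import json

# Hypothetical payloads for the three contract message types.
EXAMPLES = {
    "image": {"type": "image", "frame": "<base64 jpeg>"},
    "query_llm": {"type": "query_llm",
                  "command": "what's ahead of me",
                  "frame": "<base64 jpeg>"},
    "query_llm_response": {"type": "query_llm_response",
                           "text": "Clear path ahead; doorway on your left."},
}

REQUIRED_FIELDS = {
    "image": {"frame"},
    "query_llm": {"command", "frame"},
    "query_llm_response": {"text"},
}

def validate(raw: str) -> dict:
    """Parse a message off the WebSocket and check it against the contract."""
    msg = json.loads(raw)
    missing = REQUIRED_FIELDS[msg["type"]] - msg.keys()
    if missing:
        raise ValueError(f"{msg['type']} missing fields: {missing}")
    return msg
```

Checking messages at both ends of the socket is what keeps a frontend and backend built in parallel from silently drifting apart.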

Challenges we ran into

  • Getting stable wake-word behavior in continuous speech recognition.
  • Avoiding stale transcript/state bugs where old commands repeated.
  • Managing retry behavior when speech recognition or the WebSocket connection failed.
  • Keeping responses useful for blind users without relying on visual descriptors.
  • Handling real-time timing constraints across capture, network, inference, and TTS.

Accomplishments that we're proud of

  • Built a fully working wake-word → command → backend → spoken-response loop.
  • Integrated live camera streaming with backend vision/LLM querying.
  • Added spatial audio cues to communicate obstacle direction and proximity.
  • Improved robustness with timeout logic, retries, and failure-safe behavior.
  • Kept the interaction flow hands-free and accessibility-first from end to end.
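The spatial audio cue mentioned above boils down to mapping an obstacle's 3D position to a stereo pan and a loudness gain. A minimal sketch of that mapping, shown in Python for illustration (the real engine runs in the browser on the Web Audio API, and this particular formula is an assumption):

```python
import math

def pan_and_gain(x: float, y: float, z: float, max_dist: float = 5.0):
    """Map a camera-centred 3D position (metres; x right, y up, z forward)
    to a stereo pan in [-1, 1] and a loudness gain in [0, 1]."""
    azimuth = math.atan2(x, z)               # 0 = straight ahead, positive = right
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    dist = math.sqrt(x * x + y * y + z * z)
    gain = max(0.0, 1.0 - dist / max_dist)   # closer obstacles sound louder
    return pan, gain
```

In the browser, `pan` would feed something like a StereoPannerNode and `gain` a GainNode, so an obstacle ahead-right is heard ahead-right and grows louder as it approaches.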

What we learned

  • Accessibility-first design changes architecture decisions, not just UI wording.
  • Reliability and state management matter as much as model quality in real-time systems.
  • Fast iteration with clear protocol contracts is critical in hackathon environments.
  • “Helpful” responses for blind users must be concise, actionable, and sensory-aware.

What's next for EchoPath

  • Add stronger on-device/offline fallback for voice commands.
  • Improve personalization (voice style, verbosity, route preferences).
  • Expand route safety signals (surface changes, curb detection, crossing cues).
  • Introduce confidence-aware responses when model certainty is low.
  • Run broader user testing with blind and low-vision participants to refine guidance quality.

Built With

  • fastapi
  • llama
  • react
  • tailwind
  • transformers
  • yolo