Inspiration

Accesso was born out of the desire to make travel and daily exploration more inclusive, especially for individuals who are visually impaired. While many tourist apps and AR experiences exist, few are built with accessibility at their core. We wanted to create an AI-powered assistant that not only helps tourists intelligently discover their surroundings, but also enables visually impaired users to navigate and understand the world around them through voice and vision. Our dual-mode system ensures everyone can experience the magic of movement, regardless of physical limitations or language barriers.

What it does

Accesso is an Android application designed with two core modes:

🧭 Tourist Mode

- Capture scenes and get AI-generated image descriptions with contextual understanding.
- Translate foreign text (e.g., Japanese billboards) into the user’s language.
- Ask follow-up questions about the image (e.g., "What’s the name of this place?", "Is this a restaurant?").
- Discover and store favorite spots using MongoDB Atlas Vector Search, so users can revisit or get recommendations based on past preferences.
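The follow-up questions above ride on the same image: each question is sent alongside the captured photo as one multimodal message. A minimal sketch of that request payload, assuming Nebius exposes an OpenAI-compatible chat completions endpoint (the model id and field layout here are illustrative, not taken from our source):

```kotlin
// Builds the body of a follow-up question about a captured image.
// Assumption: Nebius accepts OpenAI-style multimodal "messages"; the
// model id below is a placeholder, not necessarily the one we deploy.
fun buildImageQuestion(imageBase64: String, question: String): Map<String, Any> =
    mapOf(
        "model" to "Qwen/Qwen2-VL-72B-Instruct",  // hypothetical model id
        "messages" to listOf(
            mapOf(
                "role" to "user",
                "content" to listOf(
                    mapOf(
                        "type" to "image_url",
                        "image_url" to mapOf("url" to "data:image/jpeg;base64,$imageBase64")
                    ),
                    mapOf("type" to "text", "text" to question)
                )
            )
        )
    )
```

Keeping the image and question in one message is what lets the model answer "Is this a restaurant?" with the scene still in context.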

🦯 Visually Impaired Mode

- One-tap capture for real-time scene description via speech.
- Voice command-based interactions: ask follow-up questions or navigate through voice.
- Audio feedback is immediate, making navigation and understanding seamless.
- All interactions are powered by image recognition + Q&A using Qwen models via the Nebius API.

How we built it

Technology Stack

Frontend: Kotlin for Tourist Mode.

Backend (AI Agent): Qwen model via the Nebius API for image-based scene recognition and conversational Q&A.

OCR + Translation: Image text extraction and foreign-language translation.

TTS + Voice Input: Android’s TextToSpeech and SpeechRecognizer for offline/low-latency audio feedback.
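The OCR, translation, and speech stages compose into one capture-to-audio path. A minimal sketch of that wiring, with the interface names being ours (in the app these would be backed by the Nebius/Qwen calls and Android's TextToSpeech, which are not reproduced here):

```kotlin
// Illustrative pipeline interfaces; the real implementations call out to
// the OCR service, translation backend, and Android TextToSpeech.
interface TextExtractor { fun extract(imageBytes: ByteArray): String }
interface Translator { fun translate(text: String, targetLang: String): String }
interface Speaker { fun speak(text: String) }

class ScenePipeline(
    private val ocr: TextExtractor,
    private val translator: Translator,
    private val speaker: Speaker,
) {
    /** Extracts text from a captured frame, translates it, and speaks the result. */
    fun describeSignage(imageBytes: ByteArray, targetLang: String): String {
        val raw = ocr.extract(imageBytes)
        val translated = translator.translate(raw, targetLang)
        speaker.speak(translated)
        return translated
    }
}
```

Separating the stages behind interfaces is what lets the two UX modes share one backend: Tourist Mode renders the returned string, while Visually Impaired Mode relies on the `Speaker` call.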

Database: MongoDB Atlas for user data; Atlas Vector Search stores and matches scenes and preferred locations based on image/text embeddings.
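Atlas Vector Search queries run as a `$vectorSearch` aggregation stage. A sketch of the stage we'd build for scene lookup, with the index and field names being placeholders rather than our actual schema:

```kotlin
// Builds the Atlas Vector Search aggregation stage for finding scenes
// similar to a query embedding. Index name and path are assumptions.
fun sceneSearchStage(queryEmbedding: List<Double>, limit: Int = 5): Map<String, Any> =
    mapOf(
        "\$vectorSearch" to mapOf(
            "index" to "scene_vector_index",   // assumed Atlas index name
            "path" to "embedding",             // field holding the scene embedding
            "queryVector" to queryEmbedding,
            "numCandidates" to limit * 20,     // Atlas recommends candidates well above limit
            "limit" to limit
        )
    )
```

`numCandidates` controls the approximate-search beam width: raising it trades latency for recall, which matters for the false-positive issue noted under Challenges.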

🧩 Key Implementation Details

- Dual-Mode Flow: Separate UX tracks for tourists and visually impaired users, with a shared AI backend.
- Image + Query Handling: Image captured → sent to Qwen → response processed → shown or spoken.
- Multilingual Support: Embedded translation pipeline for cross-language scenarios (e.g., foreign signage).
- Scene Embedding Matching: Similar scenes grouped or recommended using MongoDB’s vector similarity.
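The scene-matching step above ranks stored embeddings by vector similarity; the standard measure for text/image embeddings is cosine similarity. A self-contained sketch of the comparison (in production this runs inside Atlas, not on the device):

```kotlin
import kotlin.math.sqrt

/** Cosine similarity between two embedding vectors: 1.0 means same direction, 0.0 orthogonal. */
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "embedding dimensions must match" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}
```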

Challenges we ran into

- Real-Time Performance: Balancing cloud-based Qwen processing with fast TTS and voice interactions for a smooth user experience.
- Voice Command Mapping: Designing an intuitive voice command system that feels natural and responsive.
- Accessibility UI: Crafting an interface that works for both sighted and non-sighted users without overwhelming either.
- Embedding Search Accuracy: Ensuring scene-similarity retrieval doesn’t return false positives from the vector database.
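One way we kept voice command mapping predictable was to route a few fixed keywords to app intents and treat everything else as a free-form question for the vision model. A sketch of that router, with the intent names invented for this example:

```kotlin
// Illustrative command router: a small keyword table keeps core actions
// deterministic, and unmatched utterances fall through to the Q&A model.
enum class VoiceIntent { CAPTURE, REPEAT, SAVE_PLACE, ASK_MODEL }

fun mapVoiceCommand(utterance: String): VoiceIntent {
    val text = utterance.trim().lowercase()
    return when {
        text.startsWith("capture") || text.startsWith("take a picture") -> VoiceIntent.CAPTURE
        text.startsWith("repeat") || text.startsWith("say that again") -> VoiceIntent.REPEAT
        text.startsWith("save") -> VoiceIntent.SAVE_PLACE
        else -> VoiceIntent.ASK_MODEL  // free-form follow-up question
    }
}
```

The fallback branch is what makes the system feel natural: users never hit a "command not recognized" dead end, since anything unmatched becomes a question about the current scene.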

Accomplishments that we're proud of

- 🧭 Dual Experience: Designed a unified system that serves both tourists and visually impaired users.
- 🗣️ Conversational AI on Images: Users can chat about images, not just get static descriptions.
- 🌐 Global Ready: Easily usable in multilingual scenarios with integrated text recognition + translation.
- 💾 Memory with Meaning: Uses scene embeddings to “remember” user preferences, like a smart travel companion.

What we learned

- Multimodal AI Integration: Image, text, and audio can work together seamlessly with the right flow.
- Voice UX Design: Simplicity is key; minimal gestures and clean voice commands make or break accessibility.
- Vector DB Power: MongoDB Atlas Vector Search enables context-aware recommendations and memory recall.
- Cloud + Edge Balance: Combining the Nebius API with local Android components leads to scalable, fast performance.

What's next for Accesso

- Offline Mode: Introduce lightweight on-device scene recognition and a TTS fallback when offline.
- Crowdsourced Sight Tags: Let users contribute scene insights for popular locations (e.g., “Great for sunset photos”).
- Navigation Assistance: Integrate GPS + footpath tracking to guide visually impaired users in real time.
- AR Overlay (Tourist Mode): Show visual overlays of translated signs or place tags using ARCore.

Built With

- Kotlin (Android)
- Qwen via Nebius API
- MongoDB Atlas Vector Search
- Android TextToSpeech & SpeechRecognizer
