Inspiration
We noticed that travelers love capturing photos of monuments, art, and local scenes, but rarely understand the deeper meaning behind them. Most travel experiences stay on the surface, focused on visuals rather than stories.
Hermes aims to change that by using multimodal AI and cultural intelligence to bring context back into travel. It transforms images and short prompts into personalized cultural insights and builds an evolving travel journal that grows with every journey.
What It Does
Hermes is an AI travel companion that helps users understand the world as they explore it. Travelers can capture photos, voice notes, or text, and instantly receive cultural insights about their surroundings.
It interprets landmarks, art, monuments, and signs to explain their historical and social context. Each interaction is saved to a personal travel journal that includes location data, images, and reflections, creating an intelligent memory companion that grows with every journey.
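As a rough illustration of what one saved interaction could look like, here is a minimal sketch of a journal record. Every class and field name below (`JournalEntry`, `TravelJournal`, `place_name`, `image_uris`, and so on) is an assumption made for this example and not Hermes's actual schema; the write-up only states that entries include location data, images, and reflections.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class JournalEntry:
    """One saved interaction. All field names are illustrative assumptions."""
    latitude: float
    longitude: float
    place_name: str
    insight: str                      # cultural insight shown to the traveler
    image_uris: List[str] = field(default_factory=list)
    reflection: Optional[str] = None  # the traveler's own note, if any
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class TravelJournal:
    """An append-only list of entries, standing in for a stored collection."""
    entries: List[JournalEntry] = field(default_factory=list)

    def add(self, entry: JournalEntry) -> None:
        self.entries.append(entry)

    def places_visited(self) -> List[str]:
        # Drop duplicate place names while preserving first-seen order.
        return list(dict.fromkeys(e.place_name for e in self.entries))
```

In a real deployment these records would presumably be written to a database instead of an in-memory list; the sketch only shows the shape of the data.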
How we built it
Frontend: React Native + Expo for mobile capture, chat UI, and map-based journaling
Backend: FastAPI + Agents ADK, with four agents: geo_agent, context_agent, conversation_agent, and journal_agent. Agents ADK and the A2A protocol form the core reasoning framework that connects the agents, managing prompt templates, agent routing, and context memory across the workflow
AI Stack: Gemini Vision & Text for multimodal understanding, Google Maps API for reverse geocoding, ElevenLabs for voice conversation, and Firebase for authentication with Firestore for data storage.
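To make the agent hand-off more concrete, the following is a minimal sketch of a sequential pipeline in plain Python. It does not use Agents ADK, A2A, Gemini, or the Google Maps API; each agent is a stub function, and the ordering (geo, then context, then conversation, then journal), the `Context` dictionary, and `run_pipeline` are all assumptions made for illustration. Only the four agent names come from the project description.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Agent = Callable[[Context], Context]


def geo_agent(ctx: Context) -> Context:
    # Stub: a real version would reverse-geocode the coordinates.
    return {**ctx, "place": f"({ctx['lat']:.4f}, {ctx['lng']:.4f})"}


def context_agent(ctx: Context) -> Context:
    # Stub: a real version would ask a multimodal model about the image.
    return {**ctx, "insight": f"Cultural notes for {ctx['place']}"}


def conversation_agent(ctx: Context) -> Context:
    # Stub: combines the insight with the traveler's prompt into a reply.
    return {**ctx, "reply": f"{ctx['insight']}. You asked: {ctx['prompt']}"}


def journal_agent(ctx: Context) -> Context:
    # Stub: appends a record to an in-memory journal instead of a database.
    entry = {"place": ctx["place"], "reply": ctx["reply"]}
    return {**ctx, "journal": ctx.get("journal", []) + [entry]}


# Assumed ordering; the real routing may branch or run agents on demand.
PIPELINE: List[Agent] = [geo_agent, context_agent, conversation_agent, journal_agent]


def run_pipeline(ctx: Context, agents: List[Agent] = PIPELINE) -> Context:
    """Pass a shared context through each agent in turn."""
    for agent in agents:
        ctx = agent(ctx)
    return ctx
```

Calling `run_pipeline({"lat": 1.0, "lng": 2.0, "prompt": "What is this?"})` returns a context that carries the place, insight, reply, and a one-entry journal. Passing a shared context between steps is one simple way to keep context memory across the workflow; an agent framework would typically handle this routing itself.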
Challenges we ran into
We faced challenges aligning multimodal reasoning with consistent location data and keeping contextual insights relevant and concise. Integrating multiple APIs and managing latency across the perception, conversation, and journaling workflows required significant optimization and a complex architecture.
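One common way to reduce latency when a request depends on several independent API calls is to run them concurrently and put a time limit on the whole step. The sketch below shows that general technique with `asyncio`; it is not taken from the Hermes codebase, and `lookup_place`, `describe_image`, `perceive`, and the sleep-based stand-ins for network calls are all hypothetical.

```python
import asyncio
from typing import Tuple


async def lookup_place(lat: float, lng: float) -> str:
    # Stand-in for a reverse-geocoding request.
    await asyncio.sleep(0.05)
    return f"place@{lat},{lng}"


async def describe_image(image_id: str) -> str:
    # Stand-in for a vision-model request.
    await asyncio.sleep(0.05)
    return f"description of {image_id}"


async def perceive(
    lat: float, lng: float, image_id: str, timeout: float = 1.0
) -> Tuple[str, str]:
    """Run both lookups concurrently and fail fast if they exceed `timeout`."""
    place, description = await asyncio.wait_for(
        asyncio.gather(lookup_place(lat, lng), describe_image(image_id)),
        timeout=timeout,
    )
    return place, description
```

With this shape, the perception step takes roughly as long as the slower of the two calls instead of their sum, and `asyncio.wait_for` raises a timeout error if the limit is exceeded so the caller can fall back to a partial answer.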
Accomplishments that we're proud of
We successfully built an end-to-end multimodal experience that allows users to interact naturally with their environment through images, voice, and text. The modular backend enables scalable agent orchestration and smooth integration of multiple AI services. We are especially proud of achieving a seamless bridge between visual recognition, cultural storytelling, and memory preservation.
What we learned
We learned how to coordinate multiple AI agents using Agents ADK for a mobile application, how to manage multimedia data pipelines, and how to balance technical complexity with user experience. We also deepened our understanding of designing culturally aware AI systems that deliver meaningful insights without overwhelming the user.
What's next for Hermes
AR Overlay of Culture and Context: Enable travelers to view cultural facts and historical details directly through their camera, blending real-world visuals with AI-powered insights in real time.
Meta Ray-Ban Glasses Integration: Bring Hermes to smart glasses for hands-free exploration, allowing users to see and hear context about what they’re looking at while on the move.
Gamified "Learn the World" Badges: Reward users for discovering landmarks, local art, or hidden gems, turning travel into an interactive and educational experience.
Curated Community Feed: Create a shared space where travelers can post journal entries, insights, and photos, encouraging cultural exchange and collective storytelling.
Built With
- fastapi
- firestore
- gemini-vision
- google-adk
- langchain
- python
- react-native
- typescript
