Inspiration
We wanted to build an AI travel and location agent that feels more like a live guide than a chatbot. Most assistants can answer questions, but they do not help users see places, understand local context, move through destinations, or switch naturally between maps, Street View, voice, vision, and live web information. That inspired us to create Google AI Guide as a multimodal exploration experience.
What it does
Google AI Guide turns a map into an interactive AI guide. Users can search destinations, open Street View, discover places, get current local context, explore city briefings, watch location-related media, translate signs and speech, analyze meals, scan scenes, identify plants, and run guided tourist-style experiences. The goal is to help people understand a place from multiple angles instead of only reading text.
How we built it
We built the project as a unified web app with a Node.js/Express backend and a browser-based frontend written in JavaScript, HTML, and CSS. The app is deployed on Google Cloud Run as a single service: the backend serves the frontend and orchestrates the AI and Google API calls.
The core experience combines:
- Google Maps JavaScript API for the live map UI
- Places API for discovery, autocomplete, and place details
- Street View for immersive destination exploration
- Geocoding and Directions APIs for location lookup, routing, and movement
- Gemini for reasoning, summaries, multimodal understanding, and guided narration
- Gemini Live for real-time conversational interaction
- Vertex AI grounded search for fresher web-backed answers
- Google Cloud Text-to-Speech for the voice of Joy, the app's AI guide
- Google Cloud Vision API for visual analysis and OCR-based features
- Google Cloud Translation API for translation workflows
- YouTube Data API for location-related video exploration
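Several of the services above are plain HTTPS web services that the backend can call directly. The sketch below assumes the server-side Geocoding and Directions web service endpoints (not the browser Maps JavaScript API) and an API key kept on the server. It shows one way to build request URLs and read the first geocoding match. The helper names are hypothetical, and `geocode` requires Node 18+ for the global `fetch`.

```javascript
const GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json";
const DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json";

// Build a Geocoding request; URLSearchParams handles escaping of the address.
function geocodeUrl(address, apiKey) {
  const url = new URL(GEOCODE_URL);
  url.searchParams.set("address", address);
  url.searchParams.set("key", apiKey);
  return url.toString();
}

// Build a Directions request; walking is an assumed default for a city guide.
function directionsUrl(origin, destination, apiKey, mode = "walking") {
  const url = new URL(DIRECTIONS_URL);
  url.searchParams.set("origin", origin);
  url.searchParams.set("destination", destination);
  url.searchParams.set("mode", mode);
  url.searchParams.set("key", apiKey);
  return url.toString();
}

// Extract the top match from a Geocoding JSON response, or null if none.
function firstLocation(json) {
  if (json.status !== "OK" || !json.results?.length) return null;
  const top = json.results[0];
  return { address: top.formatted_address, placeId: top.place_id, ...top.geometry.location };
}

// Server-side lookup combining the helpers above.
async function geocode(address, apiKey) {
  const res = await fetch(geocodeUrl(address, apiKey));
  if (!res.ok) throw new Error(`Geocoding HTTP ${res.status}`);
  return firstLocation(await res.json());
}
```

Returning `placeId` alongside the coordinates lets a later step hand the same place to Places details or Street View without a second text lookup.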
We also designed the UI to support multiple modes inside one product instead of splitting everything into separate demos. That meant building reusable popup systems, draggable/resizable panels, mobile-friendly controls, and map-aware result cards that connect search, narration, and Street View in one flow.
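Draggable and resizable panels need a rule that keeps them on screen after a drag, a resize, or a rotation on mobile. One possible approach is a pure function that clamps the panel rectangle to the viewport and switches to a docked bottom sheet on narrow screens. The names (`clampPanel`, `layoutPanel`), the 240 by 160 px minimum size, and the 600 px mobile breakpoint are all assumptions for illustration.

```javascript
const MIN_SIZE = { width: 240, height: 160 }; // assumed minimum panel size in px
const MOBILE_BREAKPOINT = 600; // assumed width in px below which panels dock

// Clamp size to [min, viewport], then keep the whole panel inside the viewport.
function clampPanel(rect, viewport, min = MIN_SIZE) {
  const width = Math.min(Math.max(rect.width, min.width), viewport.width);
  const height = Math.min(Math.max(rect.height, min.height), viewport.height);
  const x = Math.min(Math.max(rect.x, 0), viewport.width - width);
  const y = Math.min(Math.max(rect.y, 0), viewport.height - height);
  return { x, y, width, height };
}

// On narrow screens, ignore the dragged geometry and dock as a bottom sheet.
function layoutPanel(rect, viewport) {
  if (viewport.width < MOBILE_BREAKPOINT) {
    const height = Math.round(viewport.height / 2);
    return { x: 0, y: viewport.height - height, width: viewport.width, height };
  }
  return clampPanel(rect, viewport);
}
```

In the browser, a `pointermove` handler would compute the new rectangle from the pointer delta, pass it through `layoutPanel` with `window.innerWidth` and `window.innerHeight`, and write the result to the panel's style. Keeping the geometry pure makes it testable outside the DOM.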
Challenges we ran into
One of the biggest challenges was orchestration. Many features needed multiple services to work together at once. For example, a single search might need Gemini reasoning, Maps lookup, place photos, Street View actions, and current web context. Making those outputs feel unified instead of fragmented took a lot of iteration.
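One common way to make a multi-service search feel like a single result is to fan out to every service in parallel and give each one a time budget. The results are then merged, so a slow or failing service degrades one section rather than the whole card. The sketch below works under assumptions: the `services` map, the 4-second default timeout, and the card shape are illustrative, not the project's actual code.

```javascript
// Reject if a service takes longer than its budget; always clear the timer.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fold settled outcomes into one card: successes become sections, failures become errors.
function mergeResults(query, names, settled) {
  const card = { query, sections: {}, errors: {} };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") card.sections[names[i]] = outcome.value;
    else card.errors[names[i]] = String(outcome.reason?.message ?? outcome.reason);
  });
  card.complete = Object.keys(card.errors).length === 0;
  return card;
}

// services: { name: (query) => Promise<value> }, e.g. hypothetical places/web/gemini fetchers.
async function orchestrateSearch(query, services, timeoutMs = 4000) {
  const names = Object.keys(services);
  const settled = await Promise.allSettled(
    names.map((name) =>
      // Promise.resolve().then(...) also captures synchronous throws as rejections.
      withTimeout(Promise.resolve().then(() => services[name](query)), timeoutMs, name)
    )
  );
  return mergeResults(query, names, settled);
}
```

`Promise.allSettled` is the key choice here. Unlike `Promise.all`, it never short-circuits on the first rejection, so a quota error from one API still leaves the other sections renderable.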
Another challenge was balancing freshness with clarity. General web search results often felt vague, so we improved the result pipeline with intent-aware search handling, better source ranking, de-duplication, and current-news routing when users ask for up-to-date information.
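The result pipeline steps named above are intent-aware handling, source ranking, de-duplication, and current-news routing. Together they could look roughly like the sketch below. The keyword regexes, the domain weights, and the three-day recency decay are all assumptions chosen for illustration. The project's real heuristics are not described here.

```javascript
// Assumed keyword heuristics for routing; word boundaries keep "now" from matching "snow".
const FRESHNESS = /\b(today|tonight|latest|now|current|news|this week|breaking)\b/i;
const PLACE_LOOKUP = /\b(near|nearby|restaurants?|cafes?|museums?|hotels?|parks?)\b/i;
// Hypothetical trust weights per domain; subdomains inherit the parent's weight.
const SOURCE_WEIGHTS = { "wikipedia.org": 3, "reuters.com": 3, "apnews.com": 3 };
const MS_PER_DAY = 86_400_000;

// Decide which search route a query should take.
function classifyIntent(query) {
  if (FRESHNESS.test(query)) return "news";
  if (PLACE_LOOKUP.test(query)) return "places";
  return "general";
}

// De-duplication key: drop "www.", trailing slashes, the fragment, and utm_* params.
function canonicalUrl(raw) {
  const url = new URL(raw);
  url.hash = "";
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith("utm_")) url.searchParams.delete(key);
  }
  const host = url.hostname.replace(/^www\./, "");
  return host + url.pathname.replace(/\/+$/, "") + url.search;
}

function weightFor(host) {
  for (const [domain, weight] of Object.entries(SOURCE_WEIGHTS)) {
    if (host === domain || host.endsWith(`.${domain}`)) return weight;
  }
  return 1;
}

// Keep the first copy of each URL, score by source weight (+ recency for news), sort descending.
function rankResults(results, intent, now = Date.now()) {
  const seen = new Set();
  const unique = results.filter((r) => {
    const key = canonicalUrl(r.url);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  const score = (r) => {
    let s = weightFor(new URL(r.url).hostname.replace(/^www\./, ""));
    const published = Date.parse(r.publishedAt ?? "");
    if (intent === "news" && Number.isFinite(published)) {
      const ageDays = (now - published) / MS_PER_DAY;
      s += Math.min(3, Math.max(0, 3 - ageDays)); // +3 when new, linear decay to 0 at 3 days
    }
    return s;
  };
  return unique.map((r) => ({ ...r, score: score(r) })).sort((a, b) => b.score - a.score);
}
```

Recency only counts when the query was classified as a news request. A historical question about a landmark therefore still favors trusted reference sources over whatever was published most recently.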
We also spent a lot of time on UI/UX quality. Because this is a highly visual product, small layout issues had a big impact. We refined popup sizing, search result presentation, mobile behavior, Street View reliability, image-heavy result cards, and Joy’s tone so the experience felt polished and less mechanical.
What we learned
We learned that multimodal AI products feel strongest when the model is not working alone. The best results came from combining Gemini with structured Google services such as Maps, Places, Street View, Vision, and Translation, all deployed together as one service on Cloud Run. We also learned that prompt design is only part of the solution; ranking, layout, action design, and response formatting matter just as much.
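Response formatting is easier to control when the model is asked for structured output and the server validates it before rendering. The sketch below assumes a hypothetical card schema with `title`, `summary`, and `places` fields. It tolerates replies wrapped in a Markdown code fence and falls back to a plain-text card when the JSON is missing or malformed. Gemini's `responseMimeType: "application/json"` generation setting reduces the need for this kind of check but does not remove it.

```javascript
// Matches a fenced block such as ```json ... ``` (backticks written as `{3}).
const FENCED = /`{3}(?:json)?\s*([\s\S]*?)`{3}/;

// Parse a model reply into a hypothetical guide-card shape, or fall back to plain text.
function parseGuideCard(modelText) {
  const fenced = modelText.match(FENCED);
  const candidate = (fenced ? fenced[1] : modelText).trim();
  try {
    const data = JSON.parse(candidate);
    if (typeof data.title !== "string" || typeof data.summary !== "string") {
      throw new Error("missing required fields");
    }
    // Keep only places the map can actually plot.
    const places = Array.isArray(data.places)
      ? data.places.filter(
          (p) => typeof p?.name === "string" && Number.isFinite(p.lat) && Number.isFinite(p.lng)
        )
      : [];
    return { kind: "structured", title: data.title, summary: data.summary, places };
  } catch {
    // Show the raw text rather than failing the whole response.
    return { kind: "text", title: "Guide", summary: modelText.trim(), places: [] };
  }
}
```

Dropping malformed places instead of rejecting the whole card keeps the narration usable even when one coordinate is missing.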
Most importantly, we learned how to design an AI agent that moves beyond text-only chat. Google AI Guide does not just answer questions; it helps users see, hear, search, interpret, and navigate places in a more natural way.
What's next for Google AI Guide
Next, we want to make the experience even more adaptive and real-time. We plan to improve live guidance and interruption handling, add richer interleaved responses, and build stronger mobile-first camera workflows. We also want to expand destination storytelling, real-time local awareness, and personalized place guidance so that Google AI Guide becomes an even stronger location-intelligence companion.
Built With
- ai
- api
- cloud
- css
- data
- directions
- docker
- express.js
- gemini
- geocoding
- grounded
- html
- javascript
- live
- maps
- node.js
- places
- run
- search
- street
- text-to-speech
- translation
- vertex
- view
- vision
- websockets
- youtube