Google AI Guide

Inspiration

We wanted to build an AI travel and location agent that feels more like a live guide than a chatbot. Most assistants can answer questions, but they do not help users see places, understand local context, move through destinations, or switch naturally between maps, Street View, voice, vision, and live web information. That inspired us to create Google AI Guide as a multimodal exploration experience.

What it does

Google AI Guide turns a map into an interactive AI guide. Users can search destinations, open Street View, discover places, get current local context, explore city briefings, watch location-related media, translate signs and speech, analyze meals, scan scenes, identify plants, and run guided tourist-style experiences. The goal is to help people understand a place from multiple angles instead of only reading text.

How we built it

We built the project as a unified web app with a Node.js/Express backend and a browser-based frontend using JavaScript, HTML, and CSS. The app is deployed on Google Cloud Run as a single service: the backend serves the frontend and orchestrates the calls to Gemini and the other Google APIs.
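
To make the single-service layout concrete, here is a minimal sketch of one process that answers API requests and hands everything else to the frontend. It is not the project's actual code: it uses Node's built-in http module instead of Express so it runs without dependencies, and the route names, the `resolveRoute` helper, and the `serveStatic` callback are all hypothetical. The one grounded detail is that Cloud Run passes the listening port in the `PORT` environment variable.

```javascript
// Sketch only: built-in http stands in for Express to avoid dependencies.
const http = require("node:http");

// Hypothetical API handlers. In a real app these would call Gemini, Places, etc.
const apiRoutes = {
  "GET /api/health": async () => ({ ok: true }),
  "GET /api/search": async (params) => ({ query: params.get("q") ?? "", results: [] }),
};

// Decide whether a request is an API call, a frontend asset, or unknown.
function resolveRoute(method, rawUrl) {
  const url = new URL(rawUrl, "http://localhost");
  const handler = apiRoutes[`${method} ${url.pathname}`];
  if (handler) return { kind: "api", handler, params: url.searchParams };
  if (url.pathname.startsWith("/api/")) return { kind: "not_found" };
  return { kind: "static", file: url.pathname === "/" ? "/index.html" : url.pathname };
}

// Build the server. `serveStatic(file, res)` is supplied by the caller and is
// responsible for sending the frontend file (an assumed helper, not shown).
function createServer(serveStatic) {
  return http.createServer(async (req, res) => {
    const route = resolveRoute(req.method, req.url);
    if (route.kind === "api") {
      const body = await route.handler(route.params);
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
    } else if (route.kind === "static") {
      serveStatic(route.file, res);
    } else {
      res.writeHead(404, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "not found" }));
    }
  });
}

// Cloud Run injects PORT; 8080 is a local fallback.
const port = Number(process.env.PORT) || 8080;
// To start it: createServer(myStaticHandler).listen(port);
```

Keeping the routing decision in a plain function like `resolveRoute` makes it easy to check without opening a socket.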

The core experience combines:

Google Maps JavaScript API for the live map UI
Places API for discovery, autocomplete, and place details
Street View for immersive destination exploration
Geocoding + Directions for movement, routing, and location lookup
Gemini for reasoning, summaries, multimodal understanding, and guided narration
Gemini Live for real-time conversational interaction
Vertex AI grounded search for fresher web-backed answers
Google Cloud Text-to-Speech for the voice of Joy, the guide persona
Google Cloud Vision API for visual analysis and OCR-based features
Google Cloud Translation API for translation workflows
YouTube Data API for location-related video exploration

We also designed the UI to support multiple modes inside one product instead of splitting everything into separate demos. That meant building reusable popup systems, draggable/resizable panels, mobile-friendly controls, and map-aware result cards that connect search, narration, and Street View in one flow.

Challenges we ran into

One of the biggest challenges was orchestration. Many features needed multiple services to work together at once. For example, a single search might need Gemini reasoning, Maps lookup, place photos, Street View actions, and current web context. Making those outputs feel unified instead of fragmented took a lot of iteration.
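
One way to approach this kind of fan-out is to call every service in parallel and fold the outcomes into a single response, so that one slow or failing service does not sink the whole search. The sketch below illustrates that pattern under stated assumptions: the `services` map and its three stub functions are hypothetical stand-ins for real Gemini, Places, and grounded-search calls, and the response shape is invented for illustration.

```javascript
// Hypothetical stand-ins for real service calls. Each takes the query and
// resolves to a partial result; one is made to fail on purpose.
const services = {
  summary: async (q) => `Overview of ${q}`,
  places: async (q) => [{ name: `${q} Old Town` }],
  webContext: async () => {
    throw new Error("search backend unavailable");
  },
};

// Run all services in parallel and merge the outcomes into one object.
// A failed service is recorded under `errors` instead of rejecting the search.
async function orchestrateSearch(query, serviceMap = services) {
  const names = Object.keys(serviceMap);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws.
    names.map((name) => Promise.resolve().then(() => serviceMap[name](query)))
  );

  const response = { query, data: {}, errors: {} };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      response.data[names[i]] = outcome.value;
    } else {
      response.errors[names[i]] = String(outcome.reason?.message ?? outcome.reason);
    }
  });
  return response;
}
```

With this shape, the frontend can render whatever arrived under `data` and quietly omit the panels listed under `errors`.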

Another challenge was balancing freshness with clarity. General web search results often felt vague, so we improved the result pipeline with intent-aware search handling, better source ranking, de-duplication, and current-news routing when users ask for up-to-date information.
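
The pipeline steps named above (intent detection, de-duplication, ranking) can be sketched roughly as follows. This is an illustration, not the project's implementation: the keyword list, the `trusted` and `publishedAt` fields, and the scoring formula are all assumptions chosen to keep the example small.

```javascript
// Assumed keyword hints that a query wants up-to-date information.
const FRESHNESS_HINTS = /\b(today|tonight|now|latest|current|currently|news)\b/i;

function classifyIntent(query) {
  return FRESHNESS_HINTS.test(query) ? "current" : "general";
}

// Reduce a URL to host + path so trivially different links compare equal.
// Dropping the query string is a simplification that could merge distinct pages.
function canonicalUrl(raw) {
  const u = new URL(raw);
  return (u.hostname.replace(/^www\./, "") + u.pathname.replace(/\/$/, "")).toLowerCase();
}

// Keep the first result seen for each canonical URL.
function dedupe(results) {
  const seen = new Set();
  return results.filter((r) => {
    const key = canonicalUrl(r.url);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Trusted sources get a fixed boost; for "current" queries, newer pages score higher.
function rankResults(results, intent, now = Date.now()) {
  const score = (r) => {
    const trust = r.trusted ? 1 : 0;
    if (intent !== "current") return trust;
    const published = Date.parse(r.publishedAt);
    if (Number.isNaN(published)) return trust; // no usable date, no freshness bonus
    const ageDays = Math.max((now - published) / 86_400_000, 0);
    return trust + 1 / (1 + ageDays);
  };
  return [...results].sort((a, b) => score(b) - score(a));
}

function processResults(query, results, now = Date.now()) {
  const intent = classifyIntent(query);
  return { intent, results: rankResults(dedupe(results), intent, now) };
}
```

For example, a query such as "latest tram news in Lisbon" would be classified as "current", so a day-old article would rank above an older page with the same trust level.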

We also spent a lot of time on UI/UX quality. Because this is a highly visual product, small layout issues had a big impact. We refined popup sizing, search result presentation, mobile behavior, Street View reliability, image-heavy result cards, and Joy’s tone so the experience felt polished and less mechanical.

What we learned

We learned that multimodal AI products feel strongest when the model is not working alone. The best results came from combining Gemini with structured Google services like Maps, Places, Street View, Vision, and Translation, all running behind a single Cloud Run deployment. We also learned that prompt design is only part of the solution; ranking, layout, action design, and response formatting matter just as much.

Most importantly, we learned how to design an AI agent that moves beyond text-only chat. Google AI Guide does not just answer questions; it helps users see, hear, search, interpret, and navigate places in a more natural way.

What's next for Google AI Guide

Next, we want to make the experience even more adaptive and real-time. We plan to improve live guidance and interruption handling, add richer interleaved responses, and strengthen mobile-first camera workflows. We also want to expand destination storytelling, real-time local awareness, and personalized place guidance so Google AI Guide becomes an even stronger location intelligence companion.

Built With

ai
api
cloud
css
data
directions
docker
express.js
gemini
geocoding
google
grounded
html
javascript
live
maps
node.js
places
run
search
street
text-to-speech
translation
vertex
view
vision
websockets
youtube

Updates

Al Moreau started this project — Mar 09, 2026 01:02 PM EDT
