VocGuide - Project Story

Inspiration

Traveling to a new destination is thrilling, but planning an itinerary can be overwhelming. We wanted to create an AI-powered companion that not only plans your trip but also speaks to you like a local historian at each stop. The vision was simple: what if your phone could be your personal tour guide, crafting personalized adventures and narrating the story of every place you visit?

The breakthrough came when we discovered Gemini's Native Audio capabilities—the ability to generate natural, expressive speech directly from the AI model. This opened the door to creating truly immersive, voice-first travel experiences.

What it does

VocGuide is an AI-powered travel companion that:

  1. Generates personalized itineraries based on your destination, dates, budget, and mood (Adventure, Relaxed, Foodie, Culture)
  2. Creates a dynamic color theme matching your destination (green for Kerala's forests, blue for coastal getaways)
  3. Provides audio guides for each point of interest using Gemini's native speech synthesis
  4. Enables smart timeline management with drag-and-drop reordering that automatically recalculates schedules
  5. Shows locations on an interactive map with direct links to Google Maps

How we built it

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   React + Vite  │────▶│   FastAPI       │────▶│  Gemini API     │
│   (Frontend)    │     │   (Backend)     │     │  (AI Services)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │
                               ▼
                        ┌─────────────────┐
                        │   PostgreSQL    │
                        │   (Database)    │
                        └─────────────────┘

  • Frontend: React with Vite, TailwindCSS, Framer Motion for animations, @dnd-kit for drag-and-drop, React-Leaflet for maps
  • Backend: FastAPI (Python) with async SQLAlchemy for database operations
  • AI Integration: Google's google-genai SDK with both REST and Live API (WebSocket) for native audio streaming
  • Database: PostgreSQL for persisting itineraries and user preferences

Key Technical Challenges

The most significant challenge was integrating Gemini's Native Audio models. We discovered that gemini-2.5-flash-native-audio requires the Live API (bidirectional WebSocket streaming) rather than standard REST calls. We implemented:

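pcm_audio = bytearray()  # accumulates the raw PCM bytes sent by the model
async with client.aio.live.connect(model=model_id, config={"response_modalities": ["AUDIO"]}) as session:
    await session.send(input=prompt, end_of_turn=True)
    async for response in session.receive():
        # Stream and collect raw PCM audio; messages without audio carry no data
        if response.data:
            pcm_audio.extend(response.data)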

The raw PCM audio stream required wrapping in a WAV header for browser playback compatibility.
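
The WAV wrapping step can be sketched with Python's standard `wave` module. This is a minimal illustration, not necessarily how VocGuide does it: the function name `pcm_to_wav` is hypothetical, and the 16-bit mono layout is an assumption (only the 24 kHz sample rate is stated in this write-up).

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Assumes 16-bit mono audio by default; adjust if the stream differs.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)  # the header is written along with the frames
    return buffer.getvalue()


# Usage: one second of silence at 24 kHz, 16-bit mono
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav(silence)
```

The returned bytes can be served with an `audio/wav` content type so that a browser `<audio>` element can play them.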

Challenges we ran into

  1. Model Compatibility: Each model needed a different integration path, so switching between gemini-2.0-flash-exp (REST audio), gemini-2.5-flash (text-only), and gemini-2.5-flash-native-audio (Live API only) was complex.

  2. Rate Limits: Free-tier quotas surfaced as HTTP 429 errors, which we had to handle gracefully and reduce through careful model selection.

  3. Audio Format Conversion: The Live API returns raw PCM at 24 kHz, so we implemented a WAV header generator to make the audio playable in browsers.

  4. Dynamic Theming: Ensuring Gemini reliably returns a valid hex color for each destination required careful prompt engineering.
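
One common way to handle the 429 errors from item 2 gracefully is to retry with exponential backoff. The sketch below is an illustration under assumptions, not the project's actual code: `RateLimitError` is a hypothetical stand-in for whatever exception the SDK raises on a quota error, and `call_with_backoff` is a made-up helper name.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised on an HTTP 429 response."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller fall back or report
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus a little jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the helper can be tested without real waiting. A caller could catch the final `RateLimitError` and switch to a different model.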

Accomplishments that we're proud of

  • Successfully integrated Gemini's Live API for real-time audio streaming—a cutting-edge feature
  • Created a genuinely beautiful, professional UI with destination-adaptive theming
  • Built a complete full-stack application in a rapid development cycle
  • Implemented smart timeline management with automatic schedule recalculation
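
The schedule recalculation after a drag-and-drop reorder can be sketched as follows. This is a simplified illustration: the `Stop` fields, the function names, and the fixed 15-minute gap between stops are all assumptions, and a real implementation would likely use actual travel times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stop:
    name: str
    duration_min: int  # minutes spent at this stop


def move_stop(stops: list[Stop], old_index: int, new_index: int) -> list[Stop]:
    """Return a new list with one stop moved, as a drag-and-drop would do."""
    reordered = stops.copy()
    reordered.insert(new_index, reordered.pop(old_index))
    return reordered


def recalculate(stops: list[Stop], day_start: datetime,
                buffer_min: int = 15) -> list[tuple[str, datetime, datetime]]:
    """Assign start and end times in order, with a fixed gap between stops."""
    schedule = []
    cursor = day_start
    for stop in stops:
        end = cursor + timedelta(minutes=stop.duration_min)
        schedule.append((stop.name, cursor, end))
        cursor = end + timedelta(minutes=buffer_min)  # assumed travel buffer
    return schedule


# Usage: move the last stop to the front, then rebuild the timeline
stops = [Stop("Fort", 60), Stop("Cafe", 30), Stop("Museum", 90)]
reordered = move_stop(stops, 2, 0)
timeline = recalculate(reordered, datetime(2026, 1, 1, 9, 0))
```

Recomputing the whole timeline from the new order keeps the logic simple, since every stop after the moved one shifts anyway.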

What we learned

  • The nuances of Google's Gemini API ecosystem and model capabilities
  • WebSocket-based AI streaming for real-time applications
  • Audio processing fundamentals (PCM encoding, WAV format structure)
  • The power of prompt engineering for consistent structured outputs
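
Prompt engineering for structured outputs usually pairs well with validation on the receiving side. As a minimal sketch of that idea applied to the theme color, the helper below extracts a hex color from model text and falls back to a default when none is valid. The name `extract_hex_color` and the fallback color are hypothetical, not taken from the project.

```python
import re

# Matches #rrggbb or #rgb, and rejects longer or shorter runs of hex digits
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")
DEFAULT_THEME = "#2563eb"  # hypothetical fallback color


def extract_hex_color(model_text: str, default: str = DEFAULT_THEME) -> str:
    """Return the first valid hex color in model_text as lowercase #rrggbb."""
    match = HEX_COLOR.search(model_text)
    if match is None:
        return default
    color = match.group(0).lower()
    if len(color) == 4:  # expand shorthand such as #0f0 to #00ff00
        color = "#" + "".join(ch * 2 for ch in color[1:])
    return color


# Usage: tolerate extra words around the color in the model's reply
theme = extract_hex_color("A lush green works well here: #2E8B57.")
```

Normalizing to one canonical form means the frontend only ever has to handle a six-digit lowercase value.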

What's next for VocGuide

  • Offline mode with cached itineraries and pre-generated audio
  • Multi-language support for international travelers
  • Real-time weather and event integration
  • Collaborative trip planning for groups
  • Expense tracking integrated with budget recommendations

Built With

  • gemini
  • javascript
  • fastapi
  • framer-motion
  • @dnd-kit
  • react-leaflet
  • postgresql
  • python
  • react
  • sqlalchemy
  • tailwindcss
  • vite