VocGuide - Project Story

Inspiration

Traveling to a new destination is thrilling, but planning an itinerary can be overwhelming. We wanted to create an AI-powered companion that not only plans your trip but also speaks to you like a local historian at each stop. The vision was simple: what if your phone could be your personal tour guide, crafting personalized adventures and narrating the story of every place you visit?

The breakthrough came when we discovered Gemini's Native Audio capabilities: the ability to generate natural, expressive speech directly from the AI model. This opened the door to creating truly immersive, voice-first travel experiences.

What it does

VocGuide is an AI-powered travel companion that:

Generates personalized itineraries based on your destination, dates, budget, and mood (Adventure, Relaxed, Foodie, Culture)
Creates a dynamic color theme matching your destination (green for Kerala's forests, blue for coastal getaways)
Provides audio guides for each point of interest using Gemini's native speech synthesis
Enables smart timeline management with drag-and-drop reordering that automatically recalculates schedules
Shows locations on an interactive map with direct links to Google Maps
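The drag-and-drop reordering with automatic schedule recalculation described above can be illustrated with a small Python sketch. This is not VocGuide's actual code. The `Stop` type, the `reorder_and_reschedule` function and the fixed `travel_minutes` buffer between stops are all hypothetical. The sketch moves one stop, then walks the list once to reassign every start time.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Stop:
    """Hypothetical itinerary stop: a name, a visit length, and a computed start time."""
    name: str
    duration_minutes: int
    start: Optional[datetime] = None


def reorder_and_reschedule(
    stops: list[Stop],
    from_index: int,
    to_index: int,
    day_start: datetime,
    travel_minutes: int = 15,  # assumption: one fixed travel buffer between stops
) -> list[Stop]:
    """Move one stop (as a drag-and-drop would), then recompute all start times."""
    reordered = list(stops)  # copy so the caller's list is left untouched
    moved = reordered.pop(from_index)
    reordered.insert(to_index, moved)

    scheduled: list[Stop] = []
    cursor = day_start
    for stop in reordered:
        scheduled.append(replace(stop, start=cursor))
        # The next stop begins after this visit plus the travel buffer.
        cursor += timedelta(minutes=stop.duration_minutes + travel_minutes)
    return scheduled


if __name__ == "__main__":
    day = datetime(2026, 1, 10, 9, 0)
    plan = [Stop("Museum", 90), Stop("Market", 60), Stop("Beach", 45)]
    # Drag "Beach" from the last slot to the first.
    for stop in reorder_and_reschedule(plan, from_index=2, to_index=0, day_start=day):
        print(stop.name, stop.start.strftime("%H:%M"))
    # Beach 09:00 / Museum 10:00 / Market 11:45
```

Each start time depends only on the stops before it, so a single pass after every drop keeps the whole day consistent. A real planner would probably replace the fixed buffer with per-leg travel estimates.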

How we built it

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   React + Vite  │────▶│   FastAPI       │────▶│  Gemini API     │
│   (Frontend)    │     │   (Backend)     │     │  (AI Services)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │
                               ▼
                        ┌─────────────────┐
                        │   PostgreSQL    │
                        │   (Database)    │
                        └─────────────────┘

Frontend: React with Vite, TailwindCSS, Framer Motion for animations, @dnd-kit for drag-and-drop, React-Leaflet for maps
Backend: FastAPI (Python) with async SQLAlchemy for database operations
AI Integration: Google's google-genai SDK, using both the standard REST API and the Live API (WebSocket), the latter for native audio streaming
Database: PostgreSQL for persisting itineraries and user preferences

Key Technical Challenges

The most significant challenge was integrating Gemini's Native Audio models. We discovered that gemini-2.5-flash-native-audio requires the Live API (bidirectional WebSocket streaming) rather than standard REST calls. We implemented:

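from google import genai
async def collect_pcm(client: genai.Client, model_id: str, prompt: str) -> bytes:
    pcm = bytearray()  # accumulates the raw PCM audio streamed back by the model
    async with client.aio.live.connect(model=model_id, config={"response_modalities": ["AUDIO"]}) as session:
        await session.send(input=prompt, end_of_turn=True)
        async for response in session.receive():  # ends once the model's turn is complete
            pcm.extend(response.data or b"")  # .data is None for messages without audio
    return bytes(pcm)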

The raw PCM audio stream required wrapping in a WAV header for browser playback compatibility.
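Wrapping raw PCM in a WAV header means prepending a fixed 44-byte RIFF/WAVE header that describes the sample format. Below is a minimal Python sketch of such a header generator. It assumes mono, 16-bit little-endian samples at 24 kHz, so adjust the defaults if the stream differs. `pcm_to_wav` is a hypothetical helper name, not the project's code.

```python
import struct


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000, channels: int = 1,
               bits_per_sample: int = 16) -> bytes:
    """Prepend a canonical 44-byte RIFF/WAVE header to raw little-endian PCM bytes."""
    byte_rate = sample_rate * channels * bits_per_sample // 8
    block_align = channels * bits_per_sample // 8
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF",
        36 + len(pcm),    # RIFF chunk size: total file size minus the first 8 bytes
        b"WAVE",
        b"fmt ",
        16,               # size of the fmt sub-chunk for plain PCM
        1,                # audio format 1 = uncompressed PCM
        channels,
        sample_rate,
        byte_rate,        # bytes of audio per second
        block_align,      # bytes per sample frame across all channels
        bits_per_sample,
        b"data",
        len(pcm),         # size of the audio payload in bytes
    )
    return header + pcm
```

Python's standard `wave` module can write the same header. Spelling out the `struct` layout simply makes each field visible. The resulting bytes can be served to the browser with the `audio/wav` content type.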

Challenges we ran into

Model Compatibility: Navigating between gemini-2.0-flash-exp (REST audio), gemini-2.5-flash (text-only), and gemini-2.5-flash-native-audio (Live API only) was complex.
Rate Limits: Free-tier quotas (HTTP 429 errors) required graceful error handling and careful model selection.
Audio Format Conversion: The Live API returns raw PCM at 24 kHz, so we had to implement a WAV header generator to make it playable in browsers.
Dynamic Theming: Ensuring Gemini reliably returns a valid hex color for each destination required careful prompt engineering.
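A common way to handle 429 responses gracefully is exponential backoff with jitter, falling back to another model once retries run out. The Python sketch below shows that pattern under stated assumptions. `RateLimitError` is a hypothetical stand-in for however the client surfaces an HTTP 429. `call` is a hypothetical wrapper around the real SDK request. The retry count and delays are illustrative, not VocGuide's settings.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 (quota exceeded) error."""


def call_with_fallback(
    call: Callable[[str], T],
    models: Sequence[str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call(model)` for each model in order until one succeeds.

    Each model is retried up to `max_retries` times on RateLimitError, waiting
    base_delay * 2**attempt seconds plus up to 1 s of random jitter between tries.
    """
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(max_retries):
            try:
                return call(model)
            except RateLimitError as err:
                last_error = err
                if attempt < max_retries - 1:
                    sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
        # Retries exhausted for this model: fall through to the next one.
    raise RuntimeError("every model was rate-limited") from last_error
```

Injecting `sleep` keeps the function testable without real waiting. The jitter spreads retries out so that several requests hitting the quota at once do not retry in lockstep.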

Accomplishments that we're proud of

Successfully integrated Gemini's Live API, a cutting-edge feature, for real-time audio streaming
Created a genuinely beautiful, professional UI with destination-adaptive theming
Built a complete full-stack application in a rapid development cycle
Implemented smart timeline management with automatic schedule recalculation

What we learned

The nuances of Google's Gemini API ecosystem and model capabilities
WebSocket-based AI streaming for real-time applications
Audio processing fundamentals (PCM encoding, WAV format structure)
The power of prompt engineering for consistent structured outputs
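Even a well-engineered prompt sometimes returns JSON wrapped in a Markdown fence, or a color name where a hex code was expected. For that reason, structured output also benefits from validation on the backend. Below is a minimal Python sketch of that check. It assumes the prompt asks for a JSON object with a `theme_color` field. The field name, the regex and the fallback color `#2563EB` are hypothetical choices, not VocGuide's actual schema.

```python
import json
import re
from typing import Any

# Hypothetical fallback used when the model's color is missing or invalid.
DEFAULT_THEME_COLOR = "#2563EB"
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")
# Matches a JSON payload wrapped in a Markdown code fence, with or without a "json" tag.
FENCED_JSON = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_itinerary_reply(text: str) -> dict[str, Any]:
    """Parse a model reply that should be a JSON object and sanitize its theme color."""
    match = FENCED_JSON.search(text)
    payload = match.group(1) if match else text.strip()
    data = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    color = str(data.get("theme_color", "")).strip()
    if not color.startswith("#"):
        color = "#" + color  # accept "2E7D32" as well as "#2E7D32"
    # Keep a valid hex code (normalized to upper case); otherwise use the fallback.
    data["theme_color"] = color.upper() if HEX_COLOR.fullmatch(color) else DEFAULT_THEME_COLOR
    return data
```

Malformed JSON raises an error instead of silently defaulting, which lets the caller decide whether to re-prompt the model. Only the color, which the UI cannot render without, gets a quiet fallback.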

What's next for VocGuide

Offline mode with cached itineraries and pre-generated audio
Multi-language support for international travelers
Real-time weather and event integration
Collaborative trip planning for groups
Expense tracking integrated with budget recommendations

Built With

gemini
javascript
languages: python, javascript/jsx; frameworks & libraries: react + vite, fastapi, tailwindcss, framer motion, @dnd-kit (drag-and-drop), react-leaflet (maps), sqlalchemy (orm); apis & cloud services: google gemini api (2.5-flash
postgresql
python
react
sqlalchemy
tailwindcss
vite

Updates

Adhil A Backer started this project — Jan 09, 2026 07:49 AM EST
