Voice Waitress

[Image: architecture diagram showing interactions between Twilio, Cloud Run, and Gemini]
[Image: logo showing emojis of a hamburger, a phone, and a robot over a synthwave background]

Inspiration

Let me know if this sounds familiar: you call a restaurant to place a food order. You're excited to get your burger or salad.

Oftentimes they're too busy to pick up. There's tons to do in a restaurant, and phone duty is just another chore. Even making a reservation can be a challenge.

Or maybe they do pick up, but you can't hear each other over the noise of the kitchen and the customers. Pure chaos.

What if there was a better way? We're in 2025 after all. The future is here! Introducing Voice Waitress, an AI agent capable of taking food orders and answering FAQs without human intervention.

How we built it

Voice Waitress is built atop a foundation of Twilio and Google's Agent Development Kit (ADK). ADK lets us create agents with live bidirectional streaming, allowing agents to respond in real time with minimal latency. FastAPI acts as a bridge between Twilio and ADK, coordinating the flow of data in both directions.
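To give a feel for what the bridge has to handle, here is a minimal sketch of the message handling on the Twilio side. It assumes the call audio arrives over a Twilio Media Streams WebSocket, where each message is a JSON object with an "event" field and audio travels as a base64 payload. The function names parse_twilio_message and build_twilio_media are made up for illustration and are not taken from the project's code.

```python
import base64
import json


def parse_twilio_message(raw: str):
    """Turn one Twilio Media Streams message into an (event, data) pair.

    Hypothetical helper: returns the stream SID for "start", the raw
    mu-law audio bytes for "media", and None for everything else.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        # The stream SID is needed later to address audio back to the call.
        return event, msg["start"]["streamSid"]
    if event == "media":
        # Audio arrives base64-encoded inside the JSON envelope.
        return event, base64.b64decode(msg["media"]["payload"])
    return event, None


def build_twilio_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap mu-law audio bytes in a "media" message to send back to Twilio."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        }
    )
```

In a FastAPI app, functions like these would sit inside a WebSocket endpoint: inbound "media" payloads get forwarded to the agent, and the agent's audio gets wrapped and sent back on the same socket.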

Challenges we ran into

Twilio and ADK use fundamentally different audio encodings. ADK takes 16-bit 16 kHz PCM, while Twilio only deals with 8-bit 8 kHz μ-law. The audio stream needs to be resampled and transcoded in real time so the two systems can interact. Overcoming this obstacle required learning how audio works.
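The Twilio-to-ADK direction of that conversion can be sketched in plain Python. This is a simplified illustration, not the project's actual code: it decodes G.711 μ-law with the standard bit-level expansion, then doubles the sample rate with naive linear interpolation. The function names are hypothetical, and a production bridge would more likely use a proper resampling library.

```python
import struct

MULAW_BIAS = 0x84  # 132, the bias G.711 mu-law adds before companding


def mulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
    sign = u & 0x80             # top bit: 1 means negative
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit position within the segment
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between neighbors.

    Naive linear interpolation; the last sample is simply repeated.
    """
    out = []
    for i, sample in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + nxt) // 2)
    return out


def twilio_to_pcm16_16k(mulaw_audio: bytes) -> bytes:
    """Convert 8 kHz mu-law bytes to 16 kHz little-endian 16-bit PCM."""
    samples_8k = [mulaw_byte_to_pcm16(b) for b in mulaw_audio]
    samples_16k = upsample_2x(samples_8k)
    return struct.pack(f"<{len(samples_16k)}h", *samples_16k)
```

With this sketch, 160 bytes of μ-law (20 ms at 8 kHz) become 640 bytes of PCM: twice as many samples, each two bytes wide. The return path needs the reverse steps, downsampling the agent's audio and encoding it back to μ-law.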

What we learned

Audio Encoding Crash Course
ADK Tools can modify Agent State
Automated deploys using GitHub Actions
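As an illustration of the point about tools and state, here is a rough sketch of a tool that records an order item. It assumes ADK's convention that a function tool can accept a tool_context parameter whose state attribute behaves like a dictionary. The add_to_order function and the "order" key are hypothetical, and FakeToolContext is only a stand-in so the sketch runs without google-adk installed.

```python
class FakeToolContext:
    """Stand-in for ADK's ToolContext (assumption: state is dict-like)."""

    def __init__(self):
        self.state = {}


def add_to_order(item: str, quantity: int, tool_context) -> dict:
    """Hypothetical tool: add an item to the order kept in agent state.

    In a real ADK agent, tool_context would be annotated as ToolContext
    and supplied by the framework rather than by the model.
    """
    # Copy before editing, then assign back so the change is an explicit write.
    order = dict(tool_context.state.get("order", {}))
    order[item] = order.get(item, 0) + quantity
    tool_context.state["order"] = order
    return {"status": "ok", "order": order}


# Example usage with the stand-in context.
ctx = FakeToolContext()
add_to_order("burger", 2, ctx)
add_to_order("salad", 1, ctx)
result = add_to_order("burger", 1, ctx)
```

Because the order lives in state rather than in the conversation text, later tools or prompts can read it back without asking the caller to repeat themselves.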

What's next for Voice Waitress

Improvements to prompts, tools, and MCP integrations
Integrations with inventory and POS software
Guardrails and evaluations
Making agents, menus, and FAQs configurable by restaurant owners

Built With

docker
fastapi
google-adk
twilio

Updates

Julian Hecker started this project — Nov 10, 2025 03:30 AM EST
