Inspiration

Let me know if this sounds familiar: you call a restaurant to place a food order. You're excited to get your burger or salad.

Oftentimes, they're too busy to pick up. There's tons to do in a restaurant, and phone duty is just another chore. Even making a reservation can be a challenge.

Or maybe they do pick up, but you can't hear each other over the noise of the kitchen and the customers. Pure chaos.

What if there were a better way? We're in 2025, after all. The future is here! Introducing Voice Waitress, an AI agent capable of taking food orders and answering FAQs without human intervention.

How we built it

Voice Waitress is built on Twilio and Google's Agent Development Kit (ADK). ADK lets us create agents with live bidirectional streaming, so they can respond in real time with minimal latency. FastAPI acts as a bridge between Twilio and ADK, coordinating the flow of data in both directions.
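
To give a feel for the Twilio side of that bridge, here is a minimal, framework-free sketch of handling Twilio Media Streams messages, which arrive as JSON text frames carrying base64-encoded audio. The class name `TwilioStreamSession` and its methods are hypothetical and are not taken from our codebase; in a real deployment something like this would presumably sit inside a FastAPI WebSocket endpoint and pass the decoded audio on to the ADK agent.

```python
import base64
import json


class TwilioStreamSession:
    """Hypothetical helper: tracks one call and unwraps Twilio stream messages."""

    def __init__(self):
        self.stream_sid = None  # set when Twilio sends the "start" event
        self.closed = False     # set when Twilio sends the "stop" event

    def handle_message(self, raw):
        """Return mu-law audio bytes for "media" events, otherwise None."""
        msg = json.loads(raw)
        event = msg.get("event")
        if event == "start":
            self.stream_sid = msg["start"]["streamSid"]
        elif event == "media":
            return base64.b64decode(msg["media"]["payload"])
        elif event == "stop":
            self.closed = True
        return None

    def outbound_media(self, mulaw_audio):
        """Wrap mu-law audio bytes in the JSON message Twilio plays to the caller."""
        return json.dumps({
            "event": "media",
            "streamSid": self.stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        })
```

Keeping the message parsing separate from the web framework makes it easy to exercise without placing a real call.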

Challenges we ran into

Twilio and ADK use fundamentally different audio encodings. ADK takes 16-bit 16 kHz PCM, while Twilio only deals with 8-bit 8 kHz μ-law. The audio stream needs to be resampled and transcoded in real time so the two systems can interact. Overcoming this obstacle required learning how audio works.
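
As an illustration of the conversion described above, here is a minimal pure-Python sketch of the Twilio-to-ADK direction: decode G.711 μ-law bytes to 16-bit samples, then double the sample rate by linear interpolation. The function names are hypothetical and this is not our production code. Linear interpolation is a deliberately simple stand-in for a proper resampling filter, and each chunk is handled independently, so interpolation does not carry across chunk boundaries. The encoder is included only to show the inverse transcoding step.

```python
import struct

BIAS = 0x84   # G.711 mu-law bias added before encoding
CLIP = 32635  # largest magnitude the encoder accepts


def ulaw_to_pcm16(byte):
    """Decode one mu-law byte into a signed 16-bit sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if u & 0x80 else sample


def pcm16_to_ulaw(sample):
    """Encode one signed 16-bit sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not magnitude & mask:
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def upsample_2x(samples):
    """Double the sample rate (e.g. 8 kHz to 16 kHz) by linear interpolation."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)  # midpoint between neighbouring samples
    return out


def twilio_to_adk(mulaw_chunk):
    """Convert 8 kHz mu-law bytes to 16 kHz little-endian 16-bit PCM bytes."""
    samples = upsample_2x([ulaw_to_pcm16(b) for b in mulaw_chunk])
    return struct.pack("<%dh" % len(samples), *samples)
```

For example, a 20 ms chunk of 160 μ-law bytes becomes 640 bytes of PCM: twice as many samples, at two bytes each.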

What we learned

  • Audio Encoding Crash Course
  • ADK Tools can modify Agent State
  • Automated deploys using GitHub Actions

What's next for Voice Waitress

  • Improvements to prompts, tools, and MCP integrations
  • Integrations with inventory and POS software
  • Guardrails and evaluations
  • Making agents, menus, and FAQs configurable by restaurant owners
