Inspiration

Pirates and the parrots that perch on their shoulders (Polly) are awesome. Imagine having one today, but taken to the next level: a modern pirate's parrot would sing songs, dance, and make funny quips.

That's what Polly does. We couldn't find a parrot plushie, so we used a crow. Same thing.

What it does

Polly is your shoulder-side companion that:

  • Chats with you and makes fun of you
  • Plays literally any song on the internet
  • Remembers your tastes and preferences
  • Reads your facial expressions to react in the moment
  • Perches on your shoulder and yaps like the real deal

How we built it

  • VAPI for short conversation handling
  • Databricks for memory and insights on your preferences (RAG)
  • Our own event loop, which extends Groq with a tool-driven LLM approach (play_music, transcribe, say_joke)
  • Computer vision with the GCP Vision API, STT with Deepgram + voices, and some TTS with ElevenLabs
  • Client + server architecture with WebSockets for real-time data streaming
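
To illustrate the tool-driven approach in the list above, here is a minimal sketch of how a tool call emitted by the model might be routed to a function. Only the tool names (play_music, transcribe, say_joke) come from the project; the JSON shape, the argument names, the `dispatch` helper, and the stub bodies are assumptions made for illustration, not Polly's actual code.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-ins for the real tools; the actual versions would call
# the music, speech-to-text, and joke backends.
def play_music(query: str) -> str:
    return f"playing: {query}"

def transcribe(audio_id: str) -> str:
    return f"transcript of {audio_id}"

def say_joke(topic: str = "pirates") -> str:
    return f"joke about {topic}"

# Registry mapping the tool name the model emits to the function to run.
TOOLS: Dict[str, Callable[..., str]] = {
    "play_music": play_music,
    "transcribe": transcribe,
    "say_joke": say_joke,
}

def dispatch(tool_call_json: str) -> str:
    """Run one tool call (assumed shape: {"name": ..., "arguments": {...}})."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    args = call.get("arguments", {})
    tool = TOOLS.get(name)
    if tool is None:
        # Report unknown tools as text so the event loop can keep running.
        return f"unknown tool: {name}"
    try:
        return tool(**args)
    except TypeError as exc:
        # Raised when the model supplies wrong or missing argument names.
        return f"bad arguments for {name}: {exc}"
```

Returning errors as plain text rather than raising is one possible design: the result can be fed back to the model as the tool's output, so a malformed call does not stall the conversation.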

Challenges we ran into

  • Reliably low latency in longer, more involved conversations: we had to switch everything to streaming to remove the latency of waiting for batches
  • How to prioritize cues (for example, facial recognition vs. speech: which one is more important in context?)
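
One possible way to frame the cue-prioritization question is a priority queue that also discards stale cues. The sketch below is an assumption-laden illustration, not how Polly resolves it: the priority weights (speech before face), the `MAX_AGE_S` staleness limit, and the `Cue` and `CueQueue` names are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed weights: a lower number is handled first.
PRIORITY = {"speech": 0, "face": 1}
# Assumed limit: cues older than this many seconds are dropped as stale.
MAX_AGE_S = 2.0

@dataclass(order=True)
class Cue:
    priority: int
    seq: int  # tie-breaker that keeps arrival order within a priority
    source: str = field(compare=False)
    payload: str = field(compare=False)
    created_at: float = field(compare=False)

class CueQueue:
    def __init__(self) -> None:
        self._heap: List[Cue] = []
        self._counter = itertools.count()

    def push(self, source: str, payload: str, now: float) -> None:
        cue = Cue(PRIORITY[source], next(self._counter), source, payload, now)
        heapq.heappush(self._heap, cue)

    def pop(self, now: float) -> Optional[Cue]:
        """Return the most important cue that is still fresh, or None."""
        while self._heap:
            cue = heapq.heappop(self._heap)
            if now - cue.created_at <= MAX_AGE_S:
                return cue
        return None
```

A usage example under the same assumptions:

```python
q = CueQueue()
q.push("face", "smile", now=0.0)
q.push("speech", "play a song", now=0.5)
first = q.pop(now=1.0)  # the speech cue, despite arriving second
```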

Accomplishments that we're proud of

  • A functioning parrot on the shoulder (even if it's a crow)
  • Different speech modes (commenting, where Polly does not speak every turn; conversation; cue words) and a robust music-picking system based on natural language
  • Facial emotion recognition to extract potential insights from visual context
  • A RAG-based approach to prompting
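
As a rough illustration of the speech modes listed above, the sketch below decides whether to reply on a given turn. The mode names follow the writeup, but the decision rules are assumptions: the cue word "polly", the every-third-turn threshold for commenting mode, and the `should_speak` helper are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    COMMENTING = "commenting"      # does not speak every turn
    CONVERSATION = "conversation"  # replies to every turn
    CUE_WORDS = "cue_words"        # replies only when a cue word is heard

CUE_WORDS = {"polly"}  # assumed cue word
COMMENT_EVERY = 3      # assumed: comment once at least 3 turns have passed

def should_speak(mode: Mode, transcript: str, turns_since_reply: int) -> bool:
    """Decide whether to respond to the latest transcribed turn."""
    if mode is Mode.CONVERSATION:
        return True
    if mode is Mode.CUE_WORDS:
        # Normalize words so "Polly," matches the cue word "polly".
        words = {w.strip(".,!?").lower() for w in transcript.split()}
        return bool(words & CUE_WORDS)
    # Commenting mode: stay quiet until enough turns have passed.
    return turns_since_reply >= COMMENT_EVERY
```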

What we learned

  • It's awesome to have a parrot on your shoulder to joke around with you
  • Filming Toronto-style street interviews is not for the weak

What's next for Polly

  • Facial recognition of individual people so Polly can recall their names
  • A more natural way to interrupt the parrot (crow) while it talks
  • Better computer vision with YOLOv10 or other object recognition models to have visual awareness
  • A motor in the mouth or wings to simulate movement

Built With

  • databricks
  • deepgram
  • elevenlabs
  • gcp
  • groq
  • nextjs
  • python
  • railway
  • vapi
  • vercel