Inspiration
Pirates and the parrots that perch on their shoulders (Polly) are awesome. Imagine having one today, but taken to the next level: if a pirate had a parrot these days, it would sing songs, dance, and make funny quips.
That's what Polly does. We couldn't find a parrot plushie, so we used a crow. Same thing.
What it does
Polly is your shoulder-side companion that:
- Chats with you and makes fun of you
- Plays literally any song on the internet
- Remembers your tastes and preferences
- Reads your facial expressions to react in the moment
- Perches on your shoulder and yaps like the real deal
How we built it
- VAPI for short conversation handling
- Databricks for memory and insights on your preferences (RAG)
- Our own event loop, which extends Groq with a tool-driven LLM approach (play_music, transcribe, say_joke)
- Computer vision with the GCP Vision API, STT with Deepgram, and voices plus some TTS with ElevenLabs
- Client + server architecture, with WebSockets handling real-time data streaming
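
A tool-driven loop like the one listed above could be sketched roughly as follows. This is a minimal illustration, not Polly's actual code: the tool names come from the list, but the function bodies, the `TOOLS` registry, the `dispatch` helper, and the JSON shape of a tool call are all assumptions, and the call to the model is replaced by a hard-coded string.

```python
import json
from typing import Callable, Dict

# Hypothetical tool implementations; real ones would call music,
# speech-to-text, and joke backends.
def play_music(query: str) -> str:
    return f"playing: {query}"

def transcribe(audio_id: str) -> str:
    return f"transcript of {audio_id}"

def say_joke(topic: str = "pirates") -> str:
    return f"joke about {topic}"

# Registry mapping tool names (as the LLM would emit them) to functions.
TOOLS: Dict[str, Callable[..., str]] = {
    "play_music": play_music,
    "transcribe": transcribe,
    "say_joke": say_joke,
}

def dispatch(tool_call_json: str) -> str:
    """Run one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**call.get("arguments", {}))

# Stand-in for a model response; a real loop would receive this from the LLM.
fake_model_output = '{"name": "play_music", "arguments": {"query": "sea shanty"}}'
print(dispatch(fake_model_output))
```

Keeping the tools in a plain dictionary means a new ability is one function plus one registry entry, and an unknown tool name from the model fails softly instead of crashing the loop.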
Challenges we ran into
- Reliably LOW latency in longer, more involved conversations: we had to switch everything to streaming to remove the latency of waiting for batches
- How to prioritize cues (like facial recognition vs. speech: which one is more important contextually?)
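
To illustrate the streaming switch from the first challenge, here is a small sketch that hands text onward sentence by sentence as chunks arrive, rather than waiting for the whole reply. Splitting on sentence punctuation is an assumption made for the example, and `sentences_from_stream` and `fake_llm_stream` are hypothetical names; in a real pipeline each yielded sentence would presumably go straight to TTS.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")

def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as it is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently sitting in the buffer.
        while True:
            positions = [buffer.find(p) for p in SENTENCE_END]
            cut = min((i for i in positions if i != -1), default=-1)
            if cut == -1:
                break
            sentence, buffer = buffer[:cut + 1].strip(), buffer[cut + 1:]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()

# Stand-in for streamed model output arriving in arbitrary pieces.
fake_llm_stream = ["Ahoy ", "matey! Nice", " hat. Want a", " song"]
for sentence in sentences_from_stream(fake_llm_stream):
    print(sentence)
```

With this shape, the first sentence can be spoken while later chunks are still arriving, which is where the latency saving comes from.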
Accomplishments that we're proud of
- A functioning parrot on the shoulder (even if it's a crow)
- Different speech modes: commenting (does not speak every turn), conversation, and cue words, plus a robust music-picking system based on natural language
- Facial emotion recognition to extract potential insights from visual context
- RAG-based approach for prompting
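
The speech modes mentioned above could be modeled along these lines. This is only a sketch under stated assumptions: the mode names mirror the list, but the cue-word set, the "every third turn" rule for commenting mode, and the `should_speak` helper are all invented for illustration.

```python
from enum import Enum

class SpeechMode(Enum):
    COMMENTING = "commenting"      # chimes in only occasionally
    CONVERSATION = "conversation"  # replies every turn
    CUE_WORDS = "cue_words"        # replies only when a cue word is heard

# Assumed values, chosen for illustration only.
CUE_WORDS = {"polly", "parrot"}
COMMENT_EVERY_N_TURNS = 3

def should_speak(mode: SpeechMode, utterance: str, turn: int) -> bool:
    """Decide whether the bird speaks on this turn (turns count from 1)."""
    if mode is SpeechMode.CONVERSATION:
        return True
    if mode is SpeechMode.CUE_WORDS:
        # Normalize words so "Polly," still matches the cue word "polly".
        words = {w.strip(".,!?").lower() for w in utterance.split()}
        return bool(words & CUE_WORDS)
    # COMMENTING: stay quiet most turns, speak on every Nth one.
    return turn % COMMENT_EVERY_N_TURNS == 0

print(should_speak(SpeechMode.CUE_WORDS, "Hey Polly, play a song!", turn=1))
```

A fixed turn interval keeps the example deterministic; a real commenting mode might instead let the model decide whether a remark is worth making.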
What we learned
- It's awesome to have a parrot on your shoulder to joke around with you
- Filming Toronto-style street interviews is not for the weak
What's next for Polly
- Facial recognition of individual people to help recall their names
- A more natural way to interrupt the parrot (crow) while it talks
- Better computer vision with YOLOv10 or other object recognition models for visual awareness
- A motor in the mouth, or in the wings, to simulate movement
