Three guppies. One TV. Two MacBooks. A phone number. Decisions get made: yes or yes.
A caller dials a Twilio number, talks to a snarky "fish" voice agent, two options appear on a TV behind a fish tank, the guppies vote by swimming to one side of the tank or the other (majority wins), and the winning option is then executed by a remote MacBook in real time.
```
caller ── Twilio ──► bridge (Fly.io) ──► Deepgram (STT)
                       ▲                        │
                       │                        ▼
               ElevenLabs (TTS)          Claude Haiku 4.5
                       ▲                        │
                       └────── tool calls ──────┤
                                                ▼
                      web (Vercel) ── Redis ── Pusher Channels
                                                │
                    ┌───────────────────────────┼───────────────────────────┐
                    ▼                           ▼                           ▼
                /display              TV vision (Mac mini)             remote-agent
            (A/B + countdown)          (OpenCV @ 30 Hz)          (Playwright + osascript)
```
- Caller dials the Twilio number; audio streams into the bridge.
- Bridge transcribes with Deepgram, feeds the running transcript to Claude, and speaks Claude's replies back via ElevenLabs.
- Claude listens for whatever mess the caller is in, frames it as a binary choice, and calls `present_options(stage, A, B)`. The bridge POSTs that to the web orchestration API, which writes Redis state and publishes to Pusher.
- The TV display subscribes to `options` and renders A vs B with a countdown. Vision publishes per-frame fish counts on `fish-pos` at 30 Hz; the display tallies which side has the majority over a 1-second rolling window.
- When the countdown ends, `/display` publishes a `decisions` event with `{stage, chosen, text, vote}`. The bridge's `wait_for_decision` tool unblocks, Claude announces the winner, then calls `dispatch_action(stage, chosen, text)`.
- The remote-agent is subscribed to `agent-tasks`, picks up the dispatch, and carries the winning option out on a second MacBook: composing an iMessage, setting a Reminder, drafting an Outlook email, posting to LinkedIn, whatever the option calls for. Status updates stream back through Pusher to the TV.
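The 1-second rolling-window tally can be sketched like this. This is a hypothetical illustration, not the repo's actual display code: the `Frame` shape and the left-side-maps-to-A convention are assumptions.

```typescript
// One vision frame: timestamp plus how many fish are on each side of the tank.
// (Assumed shape -- the real `fish-pos` payload may differ.)
type Frame = { t: number; left: number; right: number };

// Sum fish counts over the last `windowMs` and return the majority side.
// At 30 Hz a 1-second window holds roughly 30 frames.
function tallyVote(
  frames: Frame[],
  now: number,
  windowMs = 1000
): "A" | "B" | "tie" {
  const recent = frames.filter((f) => now - f.t <= windowMs);
  let left = 0;
  let right = 0;
  for (const f of recent) {
    left += f.left;
    right += f.right;
  }
  if (left === right) return "tie";
  return left > right ? "A" : "B"; // assume the left half of the tank is option A
}
```

Summing over a window rather than reading a single frame keeps one twitchy guppy from flipping the vote at the buzzer.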
The persona is built around real-life "should I, or should I really" moments: breakups, missed meetings, getting laid off, and so on. Claude turns the caller's situation into a pair of options (one defensible, one regrettable), the council picks, and the remote agent makes it happen.
| Vibe | Sample option A | Sample option B |
|---|---|---|
| Just-broke-up energy | Text a coworker | Text the ex |
| Forgot a meeting | Set a reminder | Email your boss to frick off |
| Got laid off | Beg for a job on LinkedIn | Post your .env on LinkedIn |
The option set isn't fixed: Claude composes options live from the call, and new outcomes can be wired into the remote agent without touching the bridge.
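The `present_options` tool the bridge hands to Claude could look roughly like this, following Anthropic's tool-use schema format. The name and parameters come from the flow above; the descriptions are assumptions.

```typescript
// Hypothetical tool definition the bridge could register with Claude.
// Only `present_options(stage, A, B)` itself is from the project docs;
// the description text is illustrative.
const presentOptionsTool = {
  name: "present_options",
  description:
    "Put a binary choice on the TV and let the fish council vote on it.",
  input_schema: {
    type: "object",
    properties: {
      stage: { type: "string", description: "id for this decision round" },
      A: { type: "string", description: "the defensible option" },
      B: { type: "string", description: "the regrettable option" },
    },
    required: ["stage", "A", "B"],
  },
} as const;
```

Because the options are free-form strings, adding a new vibe is a prompt change, not a schema change.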
```
web/           Next.js + Vercel — TV display + orchestration HTTP API + Redis state
bridge/        Node service on Fly.io — Twilio call audio → Deepgram → Claude → ElevenLabs
vision/        Python on Mac mini — DJI Osmo Pocket 3 capture, OpenCV multi-fish detection,
               live MJPEG preview server on :8765
remote-agent/  Node on remote MacBook — Playwright + osascript runner that executes the chosen option
scripts/       Mock event publisher for end-to-end smoke tests
docs/          Per-component docs + hour-by-hour workflow + demo script
```
- `docs/00-overview.md` – system map and glossary.
- `docs/03-bridge.md` – call flow, tool calls, persona prompt.
- `docs/05-remote-agent.md` – how the remote agent picks up and runs an option.
- `docs/workflow.md` – hour-by-hour build order.
- Each runtime has its own README inside its folder.
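How the remote agent turns a dispatched option into an AppleScript call can be sketched as below. This is a minimal illustration, not the repo's actual runner: `reminderScript`, `makeReminder`, and the script text are assumptions; only "osascript sets a Reminder" comes from the project description.

```typescript
import { execFile } from "node:child_process";

// Build the AppleScript for a Reminders entry (hypothetical helper).
// Escape double quotes so the option text can't break out of the
// AppleScript string literal.
function reminderScript(title: string): string {
  const safe = title.replace(/"/g, '\\"');
  return `tell application "Reminders" to make new reminder with properties {name:"${safe}"}`;
}

// Run it on the remote MacBook. `osascript -e <script>` is macOS's
// standard way to execute AppleScript from the shell.
function makeReminder(title: string): void {
  execFile("osascript", ["-e", reminderScript(title)], (err) => {
    if (err) console.error("osascript failed:", err);
  });
}
```

Playwright covers the browser-shaped outcomes (LinkedIn, Outlook web); osascript covers the native ones (iMessage, Reminders), so one runner can serve both.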
- Twilio (phone) → Fly.io bridge → Deepgram (STT) → Claude Haiku 4.5 (brain + tool use) → ElevenLabs (TTS)
- Vercel hosts the Next.js display + orchestration API
- Pusher Channels for cross-runtime realtime pub/sub: `fish-pos`, `options`, `decisions`, `agent-tasks`, `agent-status`
- Upstash / Redis Cloud (via Vercel Marketplace) for per-call state
- Playwright (headed Chromium) + osascript drive the remote MacBook
- OpenCV + DJI Osmo Pocket 3 for fish position detection
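The cross-runtime event contract can be pinned down in a few types. The `decisions` payload fields (`stage`, `chosen`, `text`, `vote`) come straight from the flow description above; the other shapes and the `decisionEvent` helper are assumptions for illustration.

```typescript
// The payload /display publishes when a countdown ends (fields from the docs).
type Decision = { stage: string; chosen: "A" | "B"; text: string; vote: string };

// Hypothetical map of channel name -> payload for the five Pusher channels.
type Events = {
  "fish-pos": { t: number; fish: { x: number; y: number }[] }; // assumed shape
  options: { stage: string; A: string; B: string };            // assumed shape
  decisions: Decision;
  "agent-tasks": { stage: string; chosen: "A" | "B"; text: string }; // assumed shape
  "agent-status": { stage: string; status: string };                 // assumed shape
};

// Build the message the orchestration API would hand to Pusher when a
// round ends (hypothetical helper, not from the repo).
function decisionEvent(d: Decision) {
  return { channel: "decisions" as const, event: "decision", data: d };
}
```

Keeping the channel payloads in one typed map lets the display, bridge, and remote-agent disagree about nothing but fish.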
```bash
./setup.sh                               # installs deps for all four runtimes
cd web && npm run dev
open http://localhost:3000/display/dev   # click-through preview of every TV state
cd scripts && npm run mock -- demo       # full ig-swipe stage runs on /display
```

YES or YES: because when you put a goldfish in charge of your life, every option is the right one.