A meeting bot that joins your Zoom call, introduces itself, listens, then tells you what you just agreed to. With a real voice. Built in one evening.
This is the plumbing. Clone it, drop your personality in, dispatch your bot. Six hours from zero to "holy shit, it works."
You invite pennybot to a Zoom (or Google Meet, or Teams). A participant with the name you chose appears in the tile grid, your avatar as her face. She introduces herself in two sentences and shuts up. You have your meeting. When you say "can I get a recap?" she speaks a concise summary of decisions and action items. When you say "send Alex a follow-up draft," she drafts it and waits for your "yes" before doing anything.
She's a real voice agent, not a silent note-taker.
What it is:
- A working voice-in-meetings agent. Not a proposal, not a plan, a thing that runs.
- Wake-word gated. She stays silent unless addressed. Meeting noise doesn't trigger her.
- Context-aware. Hears everything, speaks only on invitation, recalls what you said.
- Customizable. Swap the personality, the voice, the avatar, the wake words.
- Cheap. ~$1 per hour of meeting, all-in (Recall + OpenAI Realtime).
- Open. MIT license. Fork it, change it, run it however you want.
What it isn't:
- A product. It's a weekend build. No SLA, no support, no roadmap. You run it, you own it.
- A replacement for Otter / Fireflies / Fathom for pure note-taking. Those services transcribe post-hoc and don't speak. This bot actively participates.
- A replacement for Tavus / HeyGen for animated avatars. The avatar here is a static image, not a lip-synced video. Plug Tavus in on top of this if you want live lip-sync.
- Concurrent across meetings. One bot handles one meeting, for now. Scaling is out of scope.
| | This | Otter / Fireflies / Fathom | Loom meeting recordings | Tavus / HeyGen | Zoom AI Companion |
|---|---|---|---|---|---|
| Joins meeting as participant | ✅ | ✅ (silent) | ❌ (screen only) | ✅ | ✅ (host-only) |
| Speaks in the meeting | ✅ | ❌ | ❌ | ✅ | ❌ |
| Wake-word gated | ✅ | ❌ (always listens) | ❌ | ❌ | partial |
| Real-time recap on demand | ✅ | post-meeting only | post-meeting only | ❌ | ✅ |
| Custom personality | ✅ (full prompt) | no | no | no | no |
| Custom avatar | ✅ (any image) | no face | no | ✅ (paid tier) | no |
| Runs on your infra | ✅ | no | no | no | no |
| Cost per hour | ~$1 | ~$0.30 | free + $$$/seat | $$$ | $$ |
| Vendor lock-in | none | full | full | full | full |
Longer comparison with honest tradeoffs: docs/comparison.md.
The honest framing: Otter / Fireflies are better products for pure note-taking. Zoom's AI Companion is better integrated. Tavus is better looking. But none of them are a bot that talks when you address it, stays quiet when you don't, and gives you a spoken recap mid-meeting. That's the niche this fills. It also happens to be the niche Loom got acquired for ~$975M to eventually build.
If this is worth ~$1B built by a company of hundreds, it's worth publishing as a recipe built by a person in a night.
```
Zoom / Meet / Teams
│ meeting audio + video
▼
Recall.ai bot (handles joining, waiting-room admission, audio routing)
│
▼
webpage rendered as bot's "camera" (your avatar + audio capture)
│
▼
cloudflared / ngrok tunnel (exposes your local relay to the internet)
│
▼
python-server/server.py (WebSocket relay, wake-word gate, session config injection)
│
▼
OpenAI Realtime API (STT + LLM + TTS in one service, gpt-realtime-1.5)
│
▼
audio flows back the reverse path → meeting
```
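Every hop in that diagram carries audio, and the relay's job at the OpenAI end is to frame it the way the Realtime API expects. The sketch below is not the repo's code. It assumes the Realtime API's default input format (pcm16, 24 kHz, mono) and that float samples from the browser get converted to 16-bit PCM somewhere before the relay forwards them; the helper names (`float_to_pcm16`, `audio_append_event`, `chunk_duration_ms`) are hypothetical.

```python
import base64
import json
import struct

# Assumption: OpenAI Realtime's default input format, pcm16 at 24 kHz, mono.
SAMPLE_RATE_HZ = 24_000


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clamp float samples to [-1, 1] and pack them as little-endian signed 16-bit PCM."""
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_append_event(pcm: bytes) -> str:
    """Wrap one chunk of PCM16 audio in an input_audio_buffer.append event (base64 payload)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


def chunk_duration_ms(pcm: bytes) -> float:
    """Duration of a mono PCM16 chunk: 2 bytes per sample."""
    return len(pcm) * 1000 / (2 * SAMPLE_RATE_HZ)
```

The return path is the mirror image: the model streams base64 audio deltas (`response.output_audio.delta` in the GA API, `response.audio.delta` in the older beta), which the relay decodes and hands back toward the meeting.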
The relay is ~250 lines of Python. The client is ~150 lines of React. Everything else is config and a prompt.
Deep dive: docs/architecture.md.
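"Session config injection" means the relay, not the browser, decides who the bot is: it sends a `session.update` event to OpenAI carrying the prompt, voice, and turn-detection settings. Here's a minimal sketch, assuming the GA Realtime API's nested layout (`session.audio.output.voice`, `session.audio.input.turn_detection`); the older beta API put `voice` and `turn_detection` directly on `session`, so check which one your model expects. Setting `create_response: False` is one plausible way to make a gate possible: the server still detects turns and transcribes, but the relay decides whether to call `response.create`. The function names are hypothetical.

```python
import json


def realtime_url(model: str) -> str:
    """WebSocket endpoint for a Realtime session; the model is chosen in the query string."""
    return f"wss://api.openai.com/v1/realtime?model={model}"


def session_update_event(instructions: str, voice: str) -> dict:
    """Build a session.update event (GA-style layout, an assumption; see lead-in)."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "instructions": instructions,
            "audio": {
                "input": {
                    # Transcripts of what people say, so the relay can look for wake words.
                    "transcription": {"model": "whisper-1"},
                    # Detect end of speech, but don't auto-reply: the relay decides.
                    "turn_detection": {"type": "server_vad", "create_response": False},
                },
                "output": {"voice": voice},
            },
        },
    }


if __name__ == "__main__":
    print(realtime_url("gpt-realtime-1.5"))
    print(json.dumps(session_update_event("You are Penny. Be brief.", "marin"), indent=2))
```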
- Recall.ai account + API key (note your region: us-west-2 / us-east-1 / eu-central-1 / ap-northeast-1)
- OpenAI API key with Realtime API access and credits on the project. Without credits you'll connect and get silence.
- Python 3.10+
- Node 18+
- `cloudflared` (preferred, no signup: `brew install cloudflared`) OR `ngrok` with an authtoken
```bash
git clone https://github.com/shawnpetros/pennybot.git
cd pennybot

# 1. Python relay
cd python-server
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # fill in OPENAI_API_KEY and RECALL_API_KEY

# 2. React client (avatar + audio pipeline)
cd ../client
npm install
npm run build

# 3. (optional) Drop your avatar at client/public/avatar.jpg (1024×1024 recommended),
#    then re-run `npm run build` so it is copied into dist/.
#    If you skip, a placeholder silhouette SVG renders instead.

# 4. Serve the built client (in a subshell, so this shell stays in client/)
(cd dist && python3 -m http.server 8001) &

# 5. Run the relay, starting from the repo root
cd ..
(cd python-server && source venv/bin/activate && python server.py) &

# 6. Expose both via tunnels (two separate cloudflared quick-tunnels)
cloudflared tunnel --url http://localhost:3000 &   # relay: copy the URL
cloudflared tunnel --url http://localhost:8001 &   # avatar: copy the URL

# 7. Dispatch the bot (edit the URLs in the command; the relay tunnel's
#    https:// URL becomes wss://)
./scripts/join_meeting.sh "<zoom_meeting_url>" "wss://<your-relay-tunnel>"
```

Bot appears in 15-30 seconds. Admit her from the waiting room. Say the wake word ("penny" by default) and she responds.
Careful version with debug checklist: docs/setup.md.
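Step 7 asks Recall to send a bot into the meeting whose "camera" is your avatar page. The sketch below shows roughly what such a dispatch looks like. It is not the contents of `scripts/join_meeting.sh`. It assumes Recall's create-bot endpoint at `https://<region>.recall.ai/api/v1/bot/` with `Token` auth and an `output_media.camera` webpage config, so verify those field names against Recall's docs. How the page learns the relay URL is an assumption too: the `?wss=` query parameter and `bot_name="Penny"` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def https_to_wss(tunnel_url: str) -> str:
    """Quick-tunnel URLs are https://...; the matching WebSocket URL uses wss://."""
    if tunnel_url.startswith("https://"):
        return "wss://" + tunnel_url[len("https://"):]
    if tunnel_url.startswith("http://"):
        return "ws://" + tunnel_url[len("http://"):]
    return tunnel_url


def build_bot_request(region: str, api_key: str, meeting_url: str,
                      avatar_url: str, relay_url: str,
                      bot_name: str = "Penny") -> urllib.request.Request:
    """Build (but don't send) a create-bot request. Field names are assumptions, see lead-in."""
    # Hypothetical: pass the relay's WebSocket URL to the avatar page as a query parameter.
    page_url = f"{avatar_url}?wss={quote(https_to_wss(relay_url), safe='')}"
    body = {
        "meeting_url": meeting_url,
        "bot_name": bot_name,
        "output_media": {"camera": {"kind": "webpage", "config": {"url": page_url}}},
    }
    return urllib.request.Request(
        f"https://{region}.recall.ai/api/v1/bot/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is one more line, `urllib.request.urlopen(req)`. Building the request separately makes it easy to print and check the payload when the bot fails to show up. Use the region your Recall account lives in, because keys don't work across regions.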
Everything tunable is in python-server/config.py: name, voice, personality, wake words, model, and turn-detection behavior, all in one file.
Full customization guide: docs/customization.md.
Highlights:
- `PENNY_VOICE`: one of 10 OpenAI Realtime voices. Default `marin`.
- `PENNY_INSTRUCTIONS`: full system prompt. Replace it with your bot's personality. Keep the anti-slop guardrails (banned words, banned openers, no em dashes) even if you change everything else.
- `PENNY_WAKE_REGEX`: regex for what wakes her up. Default matches the bot's name plus meeting-scribe keywords (`recap`, `action items`, `summarize`, `what did we decide`).
- `PENNY_REALTIME_MODEL`: `gpt-realtime-1.5` (default), `gpt-realtime-mini` (faster, less accurate), `gpt-realtime` (older).
Known limitations:
- OpenAI Realtime voices all sound slightly "AI." If you want a truly nuanced voice, swap to Cartesia Sonic or ElevenLabs via a Pipecat pipeline. That's the next major build.
- Cloudflared quick-tunnels change URL on restart. For production, use a named tunnel (`cloudflared tunnel create`) or ngrok with an authtoken.
- ~1.5s baseline latency between user end-of-speech and bot start-of-speech. ~1s is achievable with `gpt-realtime-mini` at the cost of some instruction-following. True sub-second requires the Pipecat + Cartesia rebuild.
- If OpenAI rejects the session config, the bot falls back to the default "helpful assistant" persona. Use a model name your account actually has (check OpenAI's `/v1/models` endpoint) and a voice that model supports.
- Em dashes still sneak into transcripts despite being banned in the prompt. Audible only as pauses, cosmetic.
- No tools wired yet. The bot answers from in-conversation context only. Vault search, calendar lookup, email drafting, and CRM updates require wiring OpenAI Realtime function-calling; a template is included in `docs/customization.md`.
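For a feel of what wiring a tool involves, here's a minimal sketch of the round trip. It is not the template in `docs/customization.md`, and the `lookup_calendar` tool and its stub are hypothetical. It assumes the GA Realtime API's flat function-tool shape in `session.tools`: when the model finishes a call, the relay gets `response.function_call_arguments.done` with `name`, `call_id`, and JSON `arguments`, answers with a `function_call_output` item, and then asks for a new response.

```python
import json

# Hypothetical tool definition, registered via session.update.
LOOKUP_CALENDAR = {
    "type": "function",
    "name": "lookup_calendar",
    "description": "Look up the user's calendar events for one date.",
    "parameters": {
        "type": "object",
        "properties": {"date": {"type": "string", "description": "ISO date, e.g. 2025-01-31"}},
        "required": ["date"],
    },
}


def tools_session_update() -> dict:
    return {"type": "session.update", "session": {"type": "realtime", "tools": [LOOKUP_CALENDAR]}}


def lookup_calendar(date: str) -> dict:
    """Stub: a real version would query your calendar API."""
    return {"date": date, "events": []}


HANDLERS = {"lookup_calendar": lookup_calendar}


def handle_function_call(event: dict) -> list[str]:
    """Run a finished function call and return the messages that send its result back."""
    if event.get("type") != "response.function_call_arguments.done":
        return []
    args = json.loads(event.get("arguments") or "{}")
    result = HANDLERS[event["name"]](**args)
    return [
        json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(result),
            },
        }),
        # Ask the model to speak again, now that it has the tool result.
        json.dumps({"type": "response.create"}),
    ]
```

Anything with side effects, like sending that follow-up email, should still wait for the spoken "yes" described at the top. The tool can return a draft and let the model ask.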
License: MIT. Fork it, change it, ship it.
Forked and heavily customized from recallai/voice-agent-demo (also MIT). That repo's the starting point for the Recall + OpenAI Realtime integration; this repo layers a wake-word gate, session-config injection, anti-slop prompt template, React avatar renderer, and opinionated defaults on top.
Built by @shawnpetros in a night because the tools now make a night enough.
If you build on this, ping me. I'd love to see what you make her do.