pennybot

A meeting bot that joins your Zoom call, introduces itself, listens, then tells you what you just agreed to. With a real voice. Built in one evening.

This is the plumbing. Clone it, drop your personality in, dispatch your bot. Six hours from zero to "holy shit, it works."

What it actually does

You invite pennybot to a Zoom (or Google Meet, or Teams). A participant with the name you chose appears in the tile grid, your avatar as her face. She introduces herself in two sentences and shuts up. You have your meeting. When you say "can I get a recap?" she speaks a concise summary of decisions and action items. When you say "send Alex a follow-up draft," she drafts it and waits for your "yes" before doing anything.

She's a real voice agent, not a silent note-taker.

What this IS and ISN'T

IS

A working voice-in-meetings agent. Not a proposal, not a plan, a thing that runs.
Wake-word gated. She stays silent unless addressed. Meeting noise doesn't trigger her.
Context-aware. Hears everything, speaks only on invitation, recalls what you said.
Customizable. Swap the personality, the voice, the avatar, the wake words.
Cheap. ~$1 per hour of meeting, all-in (Recall + OpenAI Realtime).
Open. MIT license. Fork it, change it, run it however you want.

ISN'T

A product. It's a weekend build. No SLA, no support, no roadmap. You run it, you own it.
A replacement for Otter / Fireflies / Fathom for pure note-taking. Those services transcribe post-hoc and don't speak. This bot actively participates.
A replacement for Tavus / HeyGen for animated avatars. The avatar here is a static image, not a lip-synced video. Plug Tavus in on top of this if you want live lip-sync.
Multi-meeting concurrent. One bot, one meeting, for now. Scaling is out of scope.

How it compares

| Feature | This | Otter / Fireflies / Fathom | Loom meeting recordings | Tavus / HeyGen | Zoom AI Companion |
| --- | --- | --- | --- | --- | --- |
| Joins meeting as participant | ✅ | ✅ (silent) | ❌ (screen only) | ✅ | ✅ (host-only) |
| Speaks in the meeting | ✅ | ❌ | ❌ | ✅ | ❌ |
| Wake-word gated | ✅ | ❌ (always listens) | ❌ | ❌ | partial |
| Real-time recap on demand | ✅ | post-meeting only | post-meeting only | ❌ | ✅ |
| Custom personality | ✅ (full prompt) | no | no | no | no |
| Custom avatar | ✅ (any image) | no face | no | ✅ (paid tier) | no |
| Runs on your infra | ✅ | no | no | no | no |
| Cost per hour | ~$1 | ~$0.30 | free + $$$/seat | $$$ | $$ |
| Vendor lock-in | none | full | full | full | full |

Longer comparison with honest tradeoffs: docs/comparison.md.

The honest framing: Otter / Fireflies are better products for pure note-taking. Zoom's AI Companion is better integrated. Tavus is better looking. But none of them is a bot that talks when you address it, stays quiet when you don't, and gives you a spoken recap mid-meeting. That's the niche this fills. It also happens to be the niche Loom was acquired for ~$975M to eventually build.

If this is worth ~$1B built by a company of hundreds, it's worth publishing as a recipe built by a person in a night.

Architecture (in one diagram)

Zoom / Meet / Teams
         │ meeting audio + video
         ▼
Recall.ai bot  (handles joining, waiting-room admission, audio routing)
         │
         ▼
webpage rendered as bot's "camera"  (your avatar + audio capture)
         │
         ▼
cloudflared / ngrok tunnel  (exposes your local relay to the internet)
         │
         ▼
python-server/server.py  (WebSocket relay, wake-word gate, session config injection)
         │
         ▼
OpenAI Realtime API  (STT + LLM + TTS in one service, `gpt-realtime-1.5`)
         │
         ▼
audio flows back the reverse path → meeting

The relay is ~250 lines of Python. The client is ~150 lines of React. Everything else is config and a prompt.

Deep dive: docs/architecture.md.
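
To make "session config injection" concrete: before any audio flows, the relay can send one `session.update` event on the OpenAI socket carrying the personality and voice. A minimal sketch, not the code in server.py. `build_session_update` is a hypothetical helper, and the flat `instructions` / `voice` layout is an assumption, since the session field layout differs between Realtime API versions.

```python
import json


def build_session_update(instructions: str, voice: str) -> str:
    """Serialize a Realtime `session.update` event.

    The flat field layout below is an assumption; check it against the
    API version you are actually connecting to.
    """
    event = {
        "type": "session.update",
        "session": {
            "instructions": instructions,  # the personality prompt
            "voice": voice,                # e.g. "marin", the README default
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(build_session_update("You are a concise meeting scribe.", "marin"))
```

Sending this as the first message after the upstream socket opens is what keeps the bot from answering in the default persona.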

Prerequisites

Recall.ai account + API key (note your region: us-west-2 / us-east-1 / eu-central-1 / ap-northeast-1)
OpenAI API key with Realtime API access and credits on the project. Without credits you'll connect and get silence.
Python 3.10+
Node 18+
cloudflared (preferred, no signup: brew install cloudflared) OR ngrok with an authtoken
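
A quick preflight check can catch most of these before you dispatch anything. The sketch below is not part of the repo. `missing_prereqs` is a hypothetical helper, and it assumes the two API keys are exported in your shell environment (the Quickstart puts them in python-server/.env instead, so adapt as needed). It does not verify the Node version or whether your OpenAI project has credits.

```python
import os
import shutil
import sys
from typing import Callable, List, Mapping, Optional, Tuple


def missing_prereqs(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
    python_version: Tuple[int, int] = sys.version_info[:2],
) -> List[str]:
    """Return a human-readable list of unmet prerequisites (empty if none)."""
    problems = []
    if python_version < (3, 10):
        problems.append("Python 3.10+ required")
    for key in ("OPENAI_API_KEY", "RECALL_API_KEY"):
        if not env.get(key):
            problems.append(f"{key} is not set")
    if which("node") is None:
        problems.append("node not found on PATH")
    # Either tunnel tool is enough.
    if which("cloudflared") is None and which("ngrok") is None:
        problems.append("neither cloudflared nor ngrok found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prereqs(os.environ):
        print("MISSING:", problem)
```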

Quickstart

git clone https://github.com/shawnpetros/pennybot.git
cd pennybot

# 1. Python relay
cd python-server
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env       # fill in OPENAI_API_KEY and RECALL_API_KEY

# 2. React client (avatar + audio pipeline)
cd ../client
npm install
npm run build

# 3. (optional) Drop your avatar at client/public/avatar.jpg (1024×1024 recommended).
#    If you skip, a placeholder silhouette SVG renders instead.

# 4. Serve the built client (in a subshell, so this shell stays in client/)
(cd dist && python3 -m http.server 8001) &

# 5. Run the relay
cd ../python-server
source venv/bin/activate
python server.py &

# 6. Expose both via tunnels (two separate cloudflared quick-tunnels)
cloudflared tunnel --url http://localhost:3000 &   # relay: copy the URL
cloudflared tunnel --url http://localhost:8001 &   # avatar: copy the URL

# 7. Dispatch the bot from the repo root (edit the URLs in the command)
cd ..
./scripts/join_meeting.sh "<zoom_meeting_url>" "wss://<your-relay-tunnel>"

Bot appears in 15-30 seconds. Admit from the waiting room. Say the wake word ("penny" by default) and she responds.

Careful version with debug checklist: docs/setup.md.
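
If you would rather dispatch from Python than from the shell script, the request probably looks something like the sketch below. Treat all of it as assumption: the endpoint path, the `output_media` layout, and the `wss` query parameter follow my reading of Recall.ai's webpage-as-camera pattern, not the contents of scripts/join_meeting.sh. `build_bot_request` is a hypothetical helper that only builds the request; it sends nothing.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_bot_request(
    region: str,
    meeting_url: str,
    bot_name: str,
    avatar_url: str,
    relay_wss: str,
) -> Tuple[str, dict]:
    """Build (endpoint, JSON payload) for a bot dispatch. Field names are assumed."""
    endpoint = f"https://{region}.recall.ai/api/v1/bot/"
    # The avatar page learns where the relay lives from a query parameter.
    page_url = f"{avatar_url}?{urlencode({'wss': relay_wss})}"
    payload = {
        "meeting_url": meeting_url,
        "bot_name": bot_name,
        "output_media": {
            "camera": {"kind": "webpage", "config": {"url": page_url}},
        },
    }
    return endpoint, payload


if __name__ == "__main__":
    endpoint, payload = build_bot_request(
        "us-west-2",
        "https://zoom.us/j/0000000000",
        "Penny",
        "https://avatar.example.com",
        "wss://relay.example.com",
    )
    print(endpoint)
    print(payload)
```

Check the field names against the script and Recall.ai's current docs before POSTing this with your API key.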

Customizing

Everything tunable lives in one file, python-server/config.py: the name, voice, personality, wake words, model, and turn-detection behavior.

Full customization guide: docs/customization.md.

Highlights:

PENNY_VOICE — one of 10 OpenAI Realtime voices. Default marin.
PENNY_INSTRUCTIONS — full system prompt. Replace with your bot's personality. Keep the anti-slop guardrails (banned words, banned openers, no em dashes) even if you change everything else.
PENNY_WAKE_REGEX — regex for what wakes her up. Default matches bot's name + meeting-scribe keywords (recap, action items, summarize, what did we decide).
PENNY_REALTIME_MODEL — gpt-realtime-1.5 (default), gpt-realtime-mini (faster, less accurate), gpt-realtime (older).
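
For a feel of how a wake-word gate behaves, here is a minimal sketch. The pattern below is illustrative only; the actual default lives in python-server/config.py and may differ. `WAKE_REGEX` and `should_respond` are hypothetical names.

```python
import re

# Illustrative pattern: the bot's name plus a few meeting-scribe keywords.
WAKE_REGEX = re.compile(
    r"\b(penny|recap|action items?|summari[sz]e|what did we decide)\b",
    re.IGNORECASE,
)


def should_respond(transcript: str) -> bool:
    """Return True only when the utterance contains a wake phrase."""
    return WAKE_REGEX.search(transcript) is not None


if __name__ == "__main__":
    for line in ("Penny, can I get a recap?", "Let's move the deadline to Friday."):
        print(should_respond(line), "-", line)
```

A gate this simple also fires on "a penny for your thoughts," so pick a name that rarely comes up in your meetings, or tighten the pattern.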

Known limitations

OpenAI Realtime voices all sound slightly "AI." If you want truly nuanced voice, swap to Cartesia Sonic or ElevenLabs via a Pipecat pipeline. That's the next major build.
Cloudflared quick-tunnels change URL on restart. For production, use a named tunnel (cloudflared tunnel create) or ngrok with an authtoken.
~1.5s baseline latency between user end-of-speech and bot start-of-speech. ~1s achievable with gpt-realtime-mini at the cost of some instruction-following. True sub-second requires the Pipecat + Cartesia rebuild.
If OpenAI session config is rejected, the bot falls back to the default "helpful assistant" persona. Use valid voice/model names for your model version (check OpenAI's /v1/models endpoint).
Em dashes still sneak into transcripts despite being banned in the prompt. Audible only as pauses, cosmetic.
No tools wired yet. The bot answers from in-conversation context only. Vault search, calendar lookup, email drafting, and CRM updates require wiring OpenAI Realtime function-calling; a template is included in docs/customization.md.
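
As a taste of what wiring a tool involves, the sketch below defines one function tool and a handler that refuses to act without confirmation, matching the "waits for your yes" behavior described earlier. It is not the template from docs/customization.md. `draft_follow_up`, `handle_tool_call`, and the `confirmed` flag are hypothetical, and how you register the tool on the session depends on your Realtime API version.

```python
import json

# Tool definition in the flat shape Realtime sessions use for function tools.
DRAFT_FOLLOW_UP_TOOL = {
    "type": "function",
    "name": "draft_follow_up",
    "description": "Draft a follow-up email. Never send without a spoken yes.",
    "parameters": {
        "type": "object",
        "properties": {
            "recipient": {"type": "string"},
            "summary": {"type": "string"},
        },
        "required": ["recipient", "summary"],
    },
}


def handle_tool_call(name: str, arguments_json: str, confirmed: bool) -> dict:
    """Run a tool call; the model supplies its arguments as a JSON string."""
    if name != DRAFT_FOLLOW_UP_TOOL["name"]:
        return {"error": f"unknown tool: {name}"}
    args = json.loads(arguments_json)
    draft = f"Hi {args['recipient']}, following up: {args['summary']}"
    status = "approved" if confirmed else "awaiting_confirmation"
    return {"draft": draft, "status": status}


if __name__ == "__main__":
    call_args = json.dumps({"recipient": "Alex", "summary": "we ship Friday"})
    print(handle_tool_call("draft_follow_up", call_args, confirmed=False))
```

Keeping the confirmation check in the handler rather than in the prompt means a misheard wake word can produce a draft but never a sent email.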

License

MIT. Fork it, change it, ship it.

Credit

Forked and heavily customized from recallai/voice-agent-demo (also MIT). That repo's the starting point for the Recall + OpenAI Realtime integration; this repo layers a wake-word gate, session-config injection, anti-slop prompt template, React avatar renderer, and opinionated defaults on top.

Built by @shawnpetros in a night because the tools now make a night enough.

If you build on this, ping me. I'd love to see what you make her do.
