Say "clip that" — and the last 30 seconds of audio are saved and transcribed.
clip-thing is a small, always-listening recorder for meeting rooms and desks. It keeps a rolling 30-second buffer of audio in memory; when it hears the wake word, it writes that buffer to a clip, transcribes it, and lists it in a self-hosted web UI where you can play and search your clips.
It runs two ways from one codebase:
- Laptop mode: `pip install`, run, and it uses your built-in mic. Zero hardware.
- Device mode: a Raspberry Pi 4/5 + ReSpeaker mic that sits in a room.
Everything is open source — the code, the hardware build guide, and the wiring schematics.
```
mic → rolling 30s buffer (in memory) → wake word ("clip that")
    → save WAV → transcribe → SQLite → web UI (play / read / search)
```
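The rolling buffer at the front of that pipeline is what makes "the last 30 seconds" possible. Here is a minimal sketch, assuming 16 kHz, 16-bit mono audio arriving in fixed 80 ms frames. The sample rate, frame size, and the `RollingBuffer` class are illustrative assumptions, not clip-thing's actual code. A `collections.deque` with `maxlen` drops the oldest frame automatically, so memory stays bounded and old audio is continuously overwritten.

```python
from collections import deque

SAMPLE_RATE = 16_000      # assumed: 16 kHz mono capture
BYTES_PER_SAMPLE = 2      # assumed: 16-bit PCM
FRAME_SAMPLES = 1_280     # assumed: 80 ms frames
BUFFER_SECONDS = 30


class RollingBuffer:
    """Hypothetical fixed-length buffer holding only the most recent audio."""

    def __init__(self, seconds: int = BUFFER_SECONDS) -> None:
        max_frames = seconds * SAMPLE_RATE // FRAME_SAMPLES
        # deque(maxlen=...) silently discards the oldest frame on overflow.
        self._frames: deque[bytes] = deque(maxlen=max_frames)

    def append(self, frame: bytes) -> None:
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        # Join the current frames into one PCM blob; capture can keep appending.
        return b"".join(self._frames)

    def duration_seconds(self) -> float:
        total_samples = sum(len(f) for f in self._frames) // BYTES_PER_SAMPLE
        return total_samples / SAMPLE_RATE


if __name__ == "__main__":
    buf = RollingBuffer()
    silence = bytes(FRAME_SAMPLES * BYTES_PER_SAMPLE)
    for _ in range(1_000):        # 80 seconds of frames
        buf.append(silence)
    print(buf.duration_seconds())  # 30.0
```

However long capture runs, the buffer never holds more than 30 seconds of audio.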
- Wake word: openWakeWord — open source, no API keys, runs on a Pi.
- Transcription: on-device by default (faster-whisper); an optional cloud backend is one config line away.
- Self-contained: each unit captures, transcribes, stores, and serves its own UI. No central server, no account.
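Because each unit stores its own clips, the SQLite step can be as simple as one table. The sketch below is illustrative only. The `clips` schema and the helper names are hypothetical, and search is a plain `LIKE` substring match (case-insensitive for ASCII) rather than full-text indexing.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical clips database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS clips (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT NOT NULL,            -- ISO 8601, UTC
            wav_path   TEXT NOT NULL,            -- saved clip on disk
            transcript TEXT NOT NULL DEFAULT ''
        )
        """
    )
    return conn


def add_clip(conn: sqlite3.Connection, wav_path: str, transcript: str) -> int:
    """Record a transcribed clip and return its row id."""
    cur = conn.execute(
        "INSERT INTO clips (created_at, wav_path, transcript) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), wav_path, transcript),
    )
    conn.commit()
    return cur.lastrowid


def search_clips(conn: sqlite3.Connection, query: str) -> list[tuple[int, str, str]]:
    """Return clips whose transcript contains `query`, newest first."""
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, wav_path, transcript FROM clips "
        "WHERE transcript LIKE ? ESCAPE '\\' ORDER BY id DESC",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```

Parameter placeholders (`?`) keep transcripts and search terms out of the SQL text itself.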
clip-thing is an always-on microphone. Be transparent with anyone in the room.
- The rolling 30-second buffer lives in memory only and is continuously overwritten. Audio is written to disk only when the wake word fires.
- With the default `local` transcription backend, audio never leaves the device: capture and transcription are entirely on-device.
- The web UI has no authentication. It is built for a trusted LAN: anyone who can reach the address can play and read every clip. Do not expose it to the public internet. Set `web.host: 127.0.0.1` to keep it local-only.
- Switching to the `cloud` transcription backend sends clip audio to a third-party API. That is opt-in.
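The rule that audio reaches disk only when the wake word fires can be sketched with the standard-library `wave` module. This assumes the in-memory buffer holds raw 16-bit mono PCM at 16 kHz. `save_clip` and the `clip_<timestamp>.wav` naming are hypothetical. Nothing is written until the function is called from the wake-word handler.

```python
import wave
from datetime import datetime, timezone
from pathlib import Path

SAMPLE_RATE = 16_000   # assumed capture rate
SAMPLE_WIDTH = 2       # assumed 16-bit samples (bytes per sample)
CHANNELS = 1           # assumed mono


def save_clip(pcm: bytes, clips_dir: Path) -> Path:
    """Write a snapshot of the rolling buffer to a WAV file (hypothetical helper).

    Intended to be the only code path that puts audio on disk; until the wake
    word fires, audio exists solely in the in-memory buffer.
    """
    clips_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S_%fZ")
    path = clips_dir / f"clip_{stamp}.wav"
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)   # header is finalized when the file closes
    return path
```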
Requires Python 3.11+.
```bash
git clone https://github.com/yoelgal/clip-thing.git
cd clip-thing
python -m venv .venv && source .venv/bin/activate
pip install -e ".[local]"
clip-thing run
```

Open http://localhost:8080. Say the wake word near your mic (out of the box that is "hey jarvis"); within a few seconds a transcribed clip appears.
The first run downloads the wake-word model (a few MB) and, on first transcription, a Whisper model (about 75 MB for `tiny`).
Full walkthrough: docs/setup-laptop.md.
Build a Raspberry Pi + ReSpeaker 2-Mic HAT unit and have it autostart on boot:
- Hardware build, bill of materials, and wiring: hardware/README.md
- Pi software setup and autostart: docs/setup-pi.md
openWakeWord ships no stock "clip that" model, so:
- Out of the box, clip-thing uses a stock model (`hey_jarvis`), so the pipeline works immediately, with no training and no setup.
- For the actual "clip that" phrase, train a model once with `clip-thing train-wakeword "clip that"`, then point `wakeword.model_path` at the resulting `models/clip_that.onnx`.
- Want a different phrase? Run the same command with your phrase; see docs/training-wakeword.md.
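Whichever model you use, openWakeWord produces a confidence score for each short audio frame, so something has to turn that score stream into discrete "clip" events without firing several times for one utterance. A common approach is a threshold plus a cooldown. The sketch below assumes that approach. The 0.5 threshold, the 2-second cooldown, and the `WakeTrigger` class are hypothetical, not clip-thing settings.

```python
from dataclasses import dataclass, field


@dataclass
class WakeTrigger:
    """Hypothetical debouncer that turns per-frame wake-word scores into events."""

    threshold: float = 0.5    # assumed score needed to fire
    cooldown_s: float = 2.0   # assumed minimum gap between events, in seconds
    _last_fire: float = field(default=float("-inf"), init=False, repr=False)

    def update(self, score: float, now_s: float) -> bool:
        """Feed one frame's score; return True only when a new event fires."""
        if score < self.threshold:
            return False
        if now_s - self._last_fire < self.cooldown_s:
            # Still inside the cooldown: treat as the same utterance.
            return False
        self._last_fire = now_s
        return True
```

Each `True` marks the moment to snapshot the rolling buffer and save a clip. Consecutive high-scoring frames from the same utterance fall inside the cooldown and are ignored.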
| Command | What it does |
|---|---|
| `clip-thing run` | Start the capture pipeline and web UI |
| `clip-thing devices` | List input devices (to set `audio.input_device`) |
| `clip-thing train-wakeword "phrase"` | Prepare/train a custom wake word |
Copy `config.example.yaml` to `config.yaml` and edit it.
Every option is documented in docs/configuration.md.
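A "copy the example, then edit" scheme is often paired with layering the user's values over built-in defaults, so a config file only needs the keys you change. The sketch below assumes that behaviour and works on plain dicts (YAML parsing omitted). The default values and the `transcription.backend` key name are hypothetical. `web.host`, `audio.input_device`, and `wakeword.model_path` are option names this README already uses.

```python
from copy import deepcopy
from typing import Any

# Hypothetical defaults; the real ones live in config.example.yaml.
DEFAULTS: dict[str, Any] = {
    "audio": {"input_device": None},
    "wakeword": {"model_path": None},          # None -> stock model (assumed)
    "transcription": {"backend": "local"},     # key name assumed
    "web": {"host": "0.0.0.0", "port": 8080},  # host default assumed
}


def merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively overlay `override` on a copy of `base` (base is untouched)."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def get(config: dict[str, Any], dotted: str) -> Any:
    """Look up a dotted option name such as 'web.host'."""
    node: Any = config
    for part in dotted.split("."):
        node = node[part]
    return node
```

With this layering, a config containing only `web.host: 127.0.0.1` and a `wakeword.model_path` would keep every other default.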
```bash
pip install -e ".[local,dev]"
pytest
```

Documented but intentionally not built yet, to keep the first build bare-bones:
- A physical "clip that" button as a manual backup to the voice trigger.
- A small OLED/e-ink display showing last-clip status or a QR code to the UI.
- `whisper.cpp` as an alternative on-device transcription backend for slower Pis.
MIT — code and docs. The hardware build uses off-the-shelf parts (no custom PCB), so the wiring guide is documentation under the same license.
