Train your own wake word, locally, and ship it as a ~220 KB ONNX file.
wakeword-forge trains a custom wake-word detector for a phrase you choose (Hey Nova, Okay Atlas, anything). You collect local audio, review positives and hard negatives, train a WavLM teacher, distill it into a compact RepCNN student, and export wakeword.onnx.
Audio is local by default. The project is for building a custom detector, not just selecting from a fixed pretrained vocabulary.
Requires Git, Python 3.10+, make, and a microphone. Run commands from the repo root; Make targets create and use the repo-local .venv. Some features require working CUDA/cuDNN drivers.
git clone https://github.com/H-Ali13381/wakeword-forge.git
cd wakeword-forge

Choose one install path:

Standard install: dashboard plus lightweight local TTS backends.
make start DIR=./projects/default

Full install: includes QwenTTS; recommended for users with CUDA-compatible NVIDIA hardware.
make install-qwentts
make start DIR=./projects/default

DIR is your local wake-word project folder. The default ./projects/default workspace stays inside the checkout but is ignored by git. It will contain samples/, output/wakeword.onnx, and output/wakeword.json.
Name map: the repo and CLI are named wakeword-forge; source code lives in forge/; your local training workspace is whatever DIR=... points at.
Terminal-only: make cli-run DIR=./projects/default
The dashboard guides you through:
- Choose a wake phrase.
- Record or import positive samples.
- Add background, silence, partial phrases, and near-misses.
- Review real and generated clips.
- Train the WavLM teacher and compact RepCNN student.
- Export output/wakeword.onnx.
- Run a live mic check before accepting the model.
A reproducible local pipeline that goes from your voice to a deployable runtime artifact:
| Metric | Value |
|---|---|
| Teacher (training-only) | WavLM-base, 94.4 M params |
| Student (export) | RepCNN, ~40 K params after reparameterization |
| Exported model size | 217 KB ONNX file |
| Inference latency | ~15 ms per 3 s clip on CPU |
| Audio frontend | 16 kHz mono, 40-mel log-mel, 25 ms / 10 ms |
| FAR operating point | 1 % false-accept budget, threshold + EER stored in wakeword.json |
| Minimum data to train | 10 positives, 5 negatives, 150 background, 100 partials (multi-word) |
The 94 M-parameter teacher is discarded after distillation. The 40 K-parameter student ships.
Public cross-speaker benchmark sweeps are not published yet; see Limitations.
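The table's frontend and size figures can be sanity-checked with a few lines of arithmetic. This is only a sketch: the frame count assumes no padding at the clip edges, which may not match what the actual frontend does, and the ratio uses the rounded 94 M and 40 K figures.

```python
SAMPLE_RATE = 16_000          # Hz, from the table
WINDOW_MS, HOP_MS = 25, 10    # analysis window and hop
CLIP_SECONDS = 3

win = SAMPLE_RATE * WINDOW_MS // 1000   # samples per analysis window
hop = SAMPLE_RATE * HOP_MS // 1000      # samples between window starts
clip = SAMPLE_RATE * CLIP_SECONDS       # samples in a full-length clip

# Assumption: no edge padding, so only windows that fit entirely are counted.
frames = 1 + (clip - win) // hop

# Rounded parameter counts from the table (teacher vs. exported student).
reduction = 94_000_000 // 40_000

# ~15 ms of compute for 3000 ms of audio.
realtime_factor = (CLIP_SECONDS * 1000) / 15

print(win, hop, frames)            # 400 160 298
print(reduction, realtime_factor)  # 2350 200.0
```

Under those assumptions a full 3 s clip becomes roughly a 40 × 298 log-mel matrix, and the quoted CPU latency corresponds to about 200× real time.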
- End-to-end ML pipeline: guided data collection, review gates, training, export, live validation, and model acceptance.
- Teacher-student design: WavLM-base is used only during training; a ~40 K-param RepCNN ships as ONNX.
- Trust boundaries: local-first storage, provenance docs, consent rules, and fingerprinted approvals.
- Deployment focus: exported ONNX model plus threshold/config metadata for app integration.
- False-positive discipline: background, silence, partial phrases, and near-misses are required training data, not afterthoughts.
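To make the 1 % false-accept budget concrete, here is a minimal sketch of one common way to pick a threshold from held-out negative scores. It is not taken from the wakeword-forge source; `threshold_for_far` and `error_rates` are hypothetical helper names, and the project may calibrate differently.

```python
def threshold_for_far(negative_scores, far_budget=0.01):
    """Return a threshold so that at most far_budget of negatives score above it."""
    if not negative_scores or not 0 <= far_budget < 1:
        raise ValueError("need at least one score and 0 <= far_budget < 1")
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(far_budget * len(ranked))  # negatives permitted to fire
    # Only scores strictly above ranked[allowed] fire: at most `allowed` of them.
    return ranked[allowed]


def error_rates(positive_scores, negative_scores, threshold):
    """False-accept and false-reject rates for the rule `score > threshold`."""
    far = sum(s > threshold for s in negative_scores) / len(negative_scores)
    frr = sum(s <= threshold for s in positive_scores) / len(positive_scores)
    return far, frr


# Toy scores, purely illustrative.
negatives = [i / 200 for i in range(200)]
positives = [0.99, 0.97, 0.999, 0.90]
thr = threshold_for_far(negatives, far_budget=0.01)
far, frr = error_rates(positives, negatives, thr)
```

The more background, silence, partial-phrase, and near-miss clips feed the negative set, the more the chosen threshold reflects real false-accept behavior rather than easy negatives.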
Alternative open-source wake-word solutions ship pretrained models for a fixed vocabulary. wakeword-forge is for the case where you need to build the model:
- Your phrase, your voice. Record Hey Nova from your own mic, or import existing audio.
- Local-first. Samples and training stay on your machine. Nothing uploaded by default.
- Review gates. Samples, generated clips, live checks, and final acceptance are explicit, fingerprinted approvals.
- Hard negatives are a first-class input. Background speech, silence, partial phrases, and near-misses get their own training surface.
- 2350× parameter reduction. A 94 M-param WavLM teacher distills into a 40 K-param RepCNN student exported as a single 217 KB wakeword.onnx.
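The "after reparameterization" qualifier on the student's parameter count refers to structural reparameterization: parallel training-time branches are folded into one convolution before export. The sketch below shows the generic idea on a 1-D block with a 3-tap branch, a 1-tap branch, and an identity skip. The branch layout is an assumption for illustration, not the project's actual RepCNN block, and BatchNorm folding is omitted.

```python
import numpy as np


def conv1d_same(x, w, b):
    """Cross-correlation with zero 'same' padding.

    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,)
    """
    k = w.shape[2]
    xpad = np.pad(x, ((0, 0), (k // 2, k // 2)))
    # windows[i, t, :] holds the K input samples centred on time step t
    windows = np.lib.stride_tricks.sliding_window_view(xpad, k, axis=1)
    return np.einsum("oik,itk->ot", w, windows) + b[:, None]


rng = np.random.default_rng(0)
C, T = 8, 50
x = rng.standard_normal((C, T))
w3, b3 = rng.standard_normal((C, C, 3)), rng.standard_normal(C)
w1, b1 = rng.standard_normal((C, C, 1)), rng.standard_normal(C)

# Training-time block: three parallel branches summed.
y_train = conv1d_same(x, w3, b3) + conv1d_same(x, w1, b1) + x

# Export-time block: fold the 1-tap kernel and the identity into the centre tap.
w_fused = w3.copy()
w_fused[:, :, 1] += w1[:, :, 0]
w_fused[:, :, 1] += np.eye(C)
b_fused = b3 + b1
y_fused = conv1d_same(x, w_fused, b_fused)

params_train = w3.size + b3.size + w1.size + b1.size
params_fused = w_fused.size + b_fused.size
print(np.allclose(y_train, y_fused), params_train, params_fused)
```

Because convolution is linear, the fused kernel produces the same output as the three branches while storing fewer weights and running a single op at inference time.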
The dashboard enforces the order. Each review gate is fingerprinted against the underlying audio — if you change samples, prior approvals invalidate.
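One way such a fingerprint check can work is to hash the sample files and store the digest with each approval. This is a minimal sketch under that assumption; `fingerprint_samples`, `approval_is_valid`, and the approval dictionary layout are hypothetical and may not match the scheme described in docs/architecture.md.

```python
import hashlib
from pathlib import Path


def fingerprint_samples(sample_dir):
    """Digest over the relative path and content of every .wav under sample_dir."""
    digest = hashlib.sha256()
    for path in sorted(Path(sample_dir).rglob("*.wav")):
        digest.update(path.relative_to(sample_dir).as_posix().encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def approval_is_valid(approval, sample_dir):
    """An approval only holds while the audio it was granted for is unchanged."""
    return approval.get("fingerprint") == fingerprint_samples(sample_dir)
```

Adding, removing, renaming, or editing any clip changes the digest, so an approval recorded earlier no longer matches and the gate has to be passed again.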
- Audio stays under the project directory you pass as DIR; it is not uploaded by default.
- Treat voice clips as personal data.
- Only record, import, publish, or contribute voices when the speaker's consent and the license allow it.
- Generated, TTS, or voice-clone clips must be reviewed before use.
- See DATA_PROVENANCE.md and SECURITY.md before sharing datasets or trained models.
After make train, your project directory has:
- output/wakeword.onnx: RepCNN detector; input waveform (float32, 16 kHz mono, up to 3 s), output score (0–1)
- output/wakeword.json: threshold, sample rate (16000), mel settings (40 mel, 25 ms / 10 ms), EER
Run it with onnxruntime:
import json
import numpy as np
import onnxruntime as ort

with open("output/wakeword.json") as f:
    cfg = json.load(f)
sess = ort.InferenceSession("output/wakeword.onnx")
# audio_16khz_f32: mono float32 NumPy array, resampled to 16 kHz, up to 3 seconds
score = sess.run(None, {"waveform": audio_16khz_f32[None, :]})[0]
if score.item() > cfg["threshold"]:
    print("wake!")

Or test it on your mic:
make mic-test DIR=./projects/default
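The model expects a mono float32 waveform at 16 kHz, so a clip on disk has to be converted first. Below is a minimal sketch using only the standard library `wave` module and NumPy. `load_clip` is a hypothetical helper; it assumes the file is already 16 kHz 16-bit PCM and that keeping the first 3 seconds is an acceptable way to crop longer clips.

```python
import wave

import numpy as np

MAX_SAMPLES = 16_000 * 3  # the model accepts up to 3 s at 16 kHz


def load_clip(path):
    """Read a 16 kHz, 16-bit PCM WAV file as mono float32 in [-1, 1]."""
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != 16_000 or wav.getsampwidth() != 2:
            raise ValueError("expected 16 kHz, 16-bit PCM; resample first")
        channels = wav.getnchannels()
        pcm = np.frombuffer(wav.readframes(wav.getnframes()), dtype="<i2")
    # Average interleaved channels down to mono, then scale int16 to [-1, 1].
    audio = pcm.reshape(-1, channels).mean(axis=1) / 32768.0
    # Assumption: crop from the start; a real app may prefer the latest 3 s.
    return audio[:MAX_SAMPLES].astype(np.float32)
```

The returned array can stand in for `audio_16khz_f32` in the onnxruntime snippet above. Audio at other sample rates needs a resampling step first, which this sketch deliberately leaves out.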
| Task | Command |
|---|---|
| Open dashboard | make start DIR=./projects/default |
| Terminal wizard | make cli-run DIR=./projects/default |
| Show status | make info DIR=./projects/default |
| Task | Command |
|---|---|
| Record positives | make record DIR=./projects/default PHRASE='Hey Nova' N=20 |
| Generate TTS positives | make synth DIR=./projects/default PHRASE='Hey Nova' N=300 |
| Import background negatives | make import-negatives DIR=./projects/default NEG_SOURCE_DIR=~/clips NEG_LIMIT=150 |
| Review samples | make review DIR=./projects/default |
| Audit generated clips | make audit DIR=./projects/default |
| Task | Command |
|---|---|
| Train and export ONNX | make train DIR=./projects/default |
| Live quality check | make quality-check DIR=./projects/default |
| Accept the model | make accept-model DIR=./projects/default |
| Test accepted model on mic input | make mic-test DIR=./projects/default |
Full reference, negative imports, synthesis backends, and voice-clone staging are in docs/advanced-usage.md.
- docs/advanced-usage.md — full commands, negative imports, synthesis, training output
- docs/architecture.md — review gates, fingerprinting, ONNX export
- CHANGELOG.md — source release history
- RELEASING.md — source release checklist and tagging commands
- DATA_PROVENANCE.md — consent rules and data sources
- SECURITY.md — handling private audio
- CONTRIBUTING.md · THIRD_PARTY_NOTICES.md · SUPPORT.md
- okay-hermes-repcnn-onnx — example ONNX model output for a Hermes wake phrase, packaged as a small model-card repository.
- okay-hermes-voice — example runtime implementation: an always-on local voice daemon that gates Hermes Agent voice interactions behind an ONNX wake-word detector.
- Single-speaker training generalizes weakly to other speakers, mics, and rooms.
- Benchmark numbers (EER, FAR/FRR sweeps across speakers) are not yet published.
- TTS voices and datasets carry their own license terms — see DATA_PROVENANCE.md.
Apache-2.0. See LICENSE, NOTICE, and CITATION.cff.
Created and maintained by Hasan Ali. See CONTRIBUTING.md for project workflow and support expectations.


