GitHub - H-Ali13381/wakeword-forge: Local-first personal wake-word training with ONNX export.

Train your own wake word, locally, and ship it as a ~220 KB ONNX file.

What you get · Technical highlights · Privacy · Use the model · Related projects · Advanced docs

wakeword-forge trains a custom wake-word detector for a phrase you choose (Hey Nova, Okay Atlas, anything). You collect local audio, review positives and hard negatives, train a WavLM teacher, distill it into a compact RepCNN student, and export wakeword.onnx.

Audio is local by default. The project is for building a custom detector, not just selecting from a fixed pretrained vocabulary.
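This README does not spell out the teacher-to-student distillation step. As a rough illustration only, one common formulation blends a hard-label loss with a loss against the teacher's score. The function names, the `alpha` weight, and the choice of binary cross-entropy are assumptions for this sketch, not the project's training code.

```python
import math

def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability against a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def distillation_loss(student_probs, teacher_probs, labels, alpha=0.5):
    """Mean of alpha * hard-label BCE + (1 - alpha) * BCE against the teacher score."""
    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, labels):
        total += alpha * bce(s, y) + (1.0 - alpha) * bce(s, t)
    return total / len(labels)

# A student that agrees with teacher and labels gets a lower loss than one that disagrees.
print(distillation_loss([0.9, 0.1], [0.95, 0.05], [1.0, 0.0]))
print(distillation_loss([0.1, 0.9], [0.95, 0.05], [1.0, 0.0]))
```

The soft term lets the small student learn from the teacher's graded confidence on near-misses rather than only from 0/1 labels.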

Quickstart

Requires Git, Python 3.10+, make, and a microphone. Run commands from the repo root; the Make targets create and use the repo-local .venv. Some features require working CUDA/cuDNN drivers.

git clone https://github.com/H-Ali13381/wakeword-forge.git
cd wakeword-forge

Choose one install path:

Standard install — dashboard plus lightweight local TTS backends:

make start DIR=./projects/default

Full install — includes QwenTTS; recommended for users with CUDA-compatible NVIDIA hardware:

make install-qwentts
make start DIR=./projects/default

DIR is your local wake-word project folder. The default ./projects/default workspace stays inside the checkout but is ignored by git. It will contain samples/, output/wakeword.onnx, and output/wakeword.json.
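To check whether a project folder already holds the artifacts listed above, a small helper along these lines may be handy. It is a sketch: `missing_artifacts` is a hypothetical name and is not part of the wakeword-forge CLI.

```python
from pathlib import Path

# Paths the README says appear under DIR; the helper itself is hypothetical.
EXPECTED = ("samples", "output/wakeword.onnx", "output/wakeword.json")

def missing_artifacts(project_dir):
    """Return the expected relative paths that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

# An empty list means samples/ and both exported files are present.
print(missing_artifacts("./projects/default"))
```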

Name map: the repo and CLI are named wakeword-forge; source code lives in forge/; your local training workspace is whatever DIR=... points at.

Terminal-only: make cli-run DIR=./projects/default

The dashboard guides you through:

Choose a wake phrase.
Record or import positive samples.
Add background, silence, partial phrases, and near-misses.
Review real and generated clips.
Train the WavLM teacher and compact RepCNN student.
Export output/wakeword.onnx.
Run a live mic check before accepting the model.

What you get

A reproducible local pipeline that goes from your voice to a deployable runtime artifact:

Metric	Value
Teacher (training-only)	WavLM-base, 94.4 M params
Student (export)	RepCNN, ~40 K params after reparameterization
Exported model size	217 KB ONNX file
Inference latency	~15 ms per 3 s clip on CPU
Audio frontend	16 kHz mono, 40-mel log-mel, 25 ms / 10 ms
FAR operating point	1 % false-accept budget, threshold + EER stored in `wakeword.json`
Minimum data to train	10 positives, 5 negatives, 150 background, 100 partials (multi-word)
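The minimum-data row can be turned into a quick pre-training check. The sketch below assumes you already have clip counts per category. The category keys are hypothetical labels, and treating the partials minimum as applying only to multi-word phrases is one reading of the table, not confirmed behaviour.

```python
# Minimum clip counts quoted in the table; the category keys are hypothetical labels.
MINIMUMS = {"positives": 10, "negatives": 5, "background": 150, "partials": 100}

def shortfalls(counts, multi_word=True):
    """Return {category: clips still needed} for every category below its minimum."""
    needed = {}
    for category, minimum in MINIMUMS.items():
        if category == "partials" and not multi_word:
            continue  # assumption: partial-phrase clips only apply to multi-word phrases
        have = counts.get(category, 0)
        if have < minimum:
            needed[category] = minimum - have
    return needed

print(shortfalls({"positives": 12, "negatives": 5, "background": 100}))
```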

The 94 M-parameter teacher is discarded after distillation. The 40 K-parameter student ships.

Public cross-speaker benchmark sweeps are not published yet; see Limitations.
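The table counts student parameters "after reparameterization". In general, structural reparameterization trains with several parallel branches and then folds them into one convolution for inference. The toy 1-D, single-channel sketch below shows that idea only: it omits batch normalization and multi-channel kernels, and the branch layout (3-tap conv, 1-tap conv, identity) is an assumption, not the project's documented architecture.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded so the length is kept."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(padded[i + j] * k for j, k in enumerate(kernel)) for i in range(len(x))]

def multi_branch(x, k3, k1_weight):
    """Training-time form: 3-tap conv + 1-tap conv + identity, summed."""
    wide = conv1d_same(x, k3)
    return [w + k1_weight * v + v for w, v in zip(wide, x)]

def merge_branches(k3, k1_weight):
    """Inference-time form: fold the 1-tap conv and identity into the centre tap."""
    merged = list(k3)
    merged[1] += k1_weight + 1.0
    return merged

x = [1.0, -2.0, 0.5, 3.0]
k3, k1 = [0.2, -0.1, 0.4], 0.3
# Both forms produce the same output; only the merged kernel needs to ship.
print(multi_branch(x, k3, k1))
print(conv1d_same(x, merge_branches(k3, k1)))
```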

Technical highlights

End-to-end ML pipeline: guided data collection, review gates, training, export, live validation, and model acceptance.
Teacher-student design: WavLM-base is used only during training; a ~40 K-param RepCNN ships as ONNX.
Trust boundaries: local-first storage, provenance docs, consent rules, and fingerprinted approvals.
Deployment focus: exported ONNX model plus threshold/config metadata for app integration.
False-positive discipline: background, silence, partial phrases, and near-misses are required training data, not afterthoughts.
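How a threshold relates to a false-accept budget can be sketched as follows. It assumes you have held-out scores for negative clips and that a detection fires when the score is strictly above the threshold, as in the onnxruntime example in this README. The function names are hypothetical, and the project's own threshold selection may differ.

```python
import math

def threshold_for_far(negative_scores, far_budget=0.01):
    """Pick a threshold so that at most far_budget of the negatives score above it.

    Requires a non-empty list of scores.
    """
    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives that are allowed to fire within the budget.
    allowed = min(math.floor(far_budget * len(ranked)), len(ranked) - 1)
    return ranked[allowed]

def false_accept_rate(negative_scores, threshold):
    """Fraction of negative clips that would trigger a detection."""
    return sum(s > threshold for s in negative_scores) / len(negative_scores)

def false_reject_rate(positive_scores, threshold):
    """Fraction of positive clips that would be missed."""
    return sum(s <= threshold for s in positive_scores) / len(positive_scores)

negatives = [i / 1000 for i in range(200)]  # synthetic scores, for illustration only
t = threshold_for_far(negatives)
print(t, false_accept_rate(negatives, t))
```

Tightening the budget raises the threshold, which in turn raises the false-reject rate on positives, so both rates are worth checking on your own clips.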

Why use it

Many open-source wake-word projects ship pretrained models for a fixed vocabulary. wakeword-forge is for the case where you need to build the model yourself:

Your phrase, your voice. Record Hey Nova from your own mic, or import existing audio.
Local-first. Samples and training stay on your machine. Nothing is uploaded by default.
Review gates. Samples, generated clips, live checks, and final acceptance are explicit, fingerprinted approvals.
Hard negatives are a first-class input. Background speech, silence, partial phrases, and near-misses get their own training surface.
Roughly 2350× parameter reduction. A 94 M-param WavLM teacher distills into a 40 K-param RepCNN student exported as a single 217 KB wakeword.onnx.
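The reduction figure follows directly from the parameter counts quoted in this README. A quick arithmetic check, using the rounded and unrounded teacher counts (the student count is itself approximate):

```python
TEACHER_PARAMS = 94_400_000  # WavLM-base, from the table in this README
STUDENT_PARAMS = 40_000      # approximate RepCNN count after reparameterization

def reduction_factor(teacher, student):
    """How many times fewer parameters the student has than the teacher."""
    return teacher / student

print(reduction_factor(TEACHER_PARAMS, STUDENT_PARAMS))  # 2360.0
print(reduction_factor(94_000_000, STUDENT_PARAMS))      # 2350.0
```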

How it works

The dashboard enforces the order of the steps. Each review gate is fingerprinted against the underlying audio: if you change the samples, prior approvals are invalidated.
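The idea of fingerprinting an approval against the audio can be illustrated with a content hash. This is a minimal sketch, assuming a SHA-256 digest over file paths and bytes and a plain dict as the approval record. The actual scheme is described in docs/architecture.md and may differ, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path

def fingerprint_samples(sample_dir):
    """SHA-256 over the relative path and bytes of every file, in sorted order."""
    root = Path(sample_dir)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between the name and the content
        digest.update(path.read_bytes())
    return digest.hexdigest()

def approval_is_valid(approval, sample_dir):
    """An approval holds only while the samples still hash to the stored fingerprint."""
    return approval.get("fingerprint") == fingerprint_samples(sample_dir)
```

Adding, removing, renaming, or editing any clip changes the digest, so a stored approval stops matching and the gate has to be passed again.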

Privacy and consent

Audio stays under the project directory you pass as DIR; it is not uploaded by default.
Treat voice clips as personal data.
Only record, import, publish, or contribute voices when the speaker's consent and the license allow it.
Generated, TTS, or voice-clone clips must be reviewed before use.
See DATA_PROVENANCE.md and SECURITY.md before sharing datasets or trained models.

Use the exported model

After make train, your project directory has:

output/wakeword.onnx — RepCNN detector, input waveform (float32, 16 kHz mono, up to 3 s), output score (0–1)
output/wakeword.json — threshold, sample rate (16000), mel settings (40 mel, 25 ms / 10 ms), EER
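To get a feel for the mel settings, the number of 25 ms frames at a 10 ms hop in a full 3 s clip can be worked out directly. The sketch assumes frames are taken without centre padding, which this README does not state, so a padded frontend would give a slightly different count.

```python
SAMPLE_RATE = 16_000
WIN_MS, HOP_MS, N_MELS = 25, 10, 40

def frame_count(num_samples, sample_rate=SAMPLE_RATE, win_ms=WIN_MS, hop_ms=HOP_MS):
    """Number of full analysis windows that fit, assuming no padding."""
    win = sample_rate * win_ms // 1000  # 400 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000  # 160 samples at 16 kHz
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop

frames = frame_count(3 * SAMPLE_RATE)
print(frames, "frames x", N_MELS, "mel bins")  # 298 frames x 40 mel bins
```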

Run it with onnxruntime:

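import json

import numpy as np
import onnxruntime as ort

# Load the threshold and frontend settings written next to the model.
with open("output/wakeword.json") as f:
    cfg = json.load(f)
sess = ort.InferenceSession("output/wakeword.onnx")

# audio_16khz_f32: mono float32 NumPy array, resampled to 16 kHz, up to 3 seconds
# Add a batch dimension, run the model, and take its first output (the score).
score = sess.run(None, {"waveform": audio_16khz_f32[None, :]})[0]
if score.item() > cfg["threshold"]:
    print("wake!")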

Or test it on your mic: make mic-test DIR=./projects/default
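For recordings longer than 3 s, one option is to score overlapping windows and report where the threshold is exceeded. The sketch below is an assumption-laden illustration: `sliding_windows` and `detect` are hypothetical helpers, the 0.5 s hop is arbitrary, and `score_fn` stands in for a function that wraps the `sess.run` call above and returns a float.

```python
def sliding_windows(audio, sample_rate=16_000, window_s=3.0, hop_s=0.5):
    """Yield (start_sample, window) pairs covering audio with a fixed hop."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    last_start = max(len(audio) - window, 0)
    for start in range(0, last_start + 1, hop):
        yield start, audio[start:start + window]

def detect(audio, score_fn, threshold, sample_rate=16_000, window_s=3.0, hop_s=0.5):
    """Return start times (seconds) of windows whose score exceeds threshold."""
    hits = []
    for start, window in sliding_windows(audio, sample_rate, window_s, hop_s):
        if score_fn(window) > threshold:
            hits.append(start / sample_rate)
    return hits

# Stand-in scorer for illustration: fires when the window contains a loud sample.
audio = [0.0] * (16_000 * 5)
audio[60_000] = 1.0
print(detect(audio, score_fn=max, threshold=0.5))
```

Slicing works on both Python lists and NumPy arrays, so the same helpers can be fed the float32 array used in the onnxruntime example.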

Common commands

Start

Task	Command
Open dashboard	`make start DIR=./projects/default`
Terminal wizard	`make cli-run DIR=./projects/default`
Show status	`make info DIR=./projects/default`

Collect and review

Task	Command
Record positives	`make record DIR=./projects/default PHRASE='Hey Nova' N=20`
Generate TTS positives	`make synth DIR=./projects/default PHRASE='Hey Nova' N=300`
Import background negatives	`make import-negatives DIR=./projects/default NEG_SOURCE_DIR=~/clips NEG_LIMIT=150`
Review samples	`make review DIR=./projects/default`
Audit generated clips	`make audit DIR=./projects/default`

Train and use

Task	Command
Train and export ONNX	`make train DIR=./projects/default`
Live quality check	`make quality-check DIR=./projects/default`
Accept the model	`make accept-model DIR=./projects/default`
Test accepted model on mic input	`make mic-test DIR=./projects/default`

Full reference, negative imports, synthesis backends, and voice-clone staging are in docs/advanced-usage.md.

Documentation

docs/advanced-usage.md — full commands, negative imports, synthesis, training output
docs/architecture.md — review gates, fingerprinting, ONNX export
CHANGELOG.md — source release history
RELEASING.md — source release checklist and tagging commands
DATA_PROVENANCE.md — consent rules and data sources
SECURITY.md — handling private audio
CONTRIBUTING.md · THIRD_PARTY_NOTICES.md · SUPPORT.md

Related projects

okay-hermes-repcnn-onnx — example ONNX model output for a Hermes wake phrase, packaged as a small model-card repository.
okay-hermes-voice — example runtime implementation: an always-on local voice daemon that gates Hermes Agent voice interactions behind an ONNX wake-word detector.

Limitations

Single-speaker training generalizes weakly to other speakers, mics, and rooms.
Benchmark numbers (EER, FAR/FRR sweeps across speakers) are not yet published.
TTS voices and datasets carry their own license terms — see DATA_PROVENANCE.md.

License

Apache-2.0. See LICENSE, NOTICE, and CITATION.cff.

Created and maintained by Hasan Ali. See CONTRIBUTING.md for project workflow and support expectations.
