Live mic → MFCCs → small MLP → ARKit blendshapes → 3D mouth.
Runs in your browser. Train it on your own voice in the
Train tab, or use the formant heuristic
out of the box.
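The page doesn't describe the formant heuristic. One common way to build such a fallback, sketched here purely as an assumption, is to take estimates of the first two formants (F1, F2) and score each vowel viseme by its distance to reference values. The reference centres below are rough textbook adult-male averages. The function name, the log-distance softmax, the `temperature` default, and the F1-to-jaw mapping are all hypothetical. Estimating F1/F2 in the first place (for example by LPC or spectral peak picking) is not shown.

```javascript
// Hypothetical formant heuristic: NOT the app's actual implementation.
// Approximate (F1, F2) reference centres in Hz for each vowel viseme.
const VOWEL_FORMANTS = {
  aa: [730, 1090],
  oh: [570, 840],
  oo: [300, 870],
  ee: [270, 2290],
};

// Soft-assign vowel visemes from estimated formants f1, f2 (Hz).
// Distances use log frequency so F1 and F2 errors are comparable;
// `temperature` controls how peaked the resulting weights are.
function formantVisemes(f1, f2, temperature = 0.1) {
  const scores = {};
  let total = 0;
  for (const [name, [r1, r2]] of Object.entries(VOWEL_FORMANTS)) {
    const d1 = Math.log(f1 / r1);
    const d2 = Math.log(f2 / r2);
    const dist = Math.sqrt(d1 * d1 + d2 * d2);
    scores[name] = Math.exp(-dist / temperature);
    total += scores[name];
  }
  // Normalise the vowel weights so they sum to 1.
  for (const name of Object.keys(scores)) scores[name] /= total;
  // Jaw opening tracks F1: map roughly 250..800 Hz onto 0..1 (assumed range).
  scores.jawOpen = Math.min(1, Math.max(0, (f1 - 250) / 550));
  return scores;
}
```

Soft weights rather than a hard argmax keep the mouth from snapping between shapes when the formant estimates jitter between neighbouring vowels.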
Live
click to grant mic access
mapper: checking for model…
Viseme weights
jaw open
aa
oo
oh
ee
consonant
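The six weights shown here are the output of the pipeline's blendshape→viseme reduce step. The page doesn't say how the 52 ARKit coefficients are combined, so the sketch below only illustrates one simple option: each viseme is a clamped weighted sum of a few related blendshapes. The blendshape names are real ARKit identifiers (MediaPipe's face blendshapes use the same names), but `VISEME_MAP`, its groupings, and its weights are hypothetical.

```javascript
// Hypothetical viseme mapping: [blendshapeName, weight] pairs per viseme.
// Names are ARKit blendshapes; groupings and weights are illustrative only.
const VISEME_MAP = {
  jawOpen: [["jawOpen", 1.0]],
  aa: [["jawOpen", 0.7], ["mouthLowerDownLeft", 0.15], ["mouthLowerDownRight", 0.15]],
  oo: [["mouthPucker", 0.6], ["mouthFunnel", 0.4]],
  oh: [["mouthFunnel", 0.6], ["jawOpen", 0.4]],
  ee: [["mouthStretchLeft", 0.3], ["mouthStretchRight", 0.3],
       ["mouthSmileLeft", 0.2], ["mouthSmileRight", 0.2]],
  consonant: [["mouthClose", 0.4], ["mouthPressLeft", 0.3], ["mouthPressRight", 0.3]],
};

// Reduce a {blendshapeName: coefficient} object to viseme weights in [0, 1].
// Blendshapes missing from the input count as 0.
function reduceToVisemes(blendshapes) {
  const out = {};
  for (const [viseme, terms] of Object.entries(VISEME_MAP)) {
    let sum = 0;
    for (const [name, w] of terms) sum += w * (blendshapes[name] ?? 0);
    out[viseme] = Math.min(1, Math.max(0, sum));
  }
  return out;
}
```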
Pipeline
mic → Web Audio FFT →
13-D MFCCs
(src/mfcc.js) →
9-frame window →
MLP (117 → 128 → 64 → 52)
(src/model.js, TF.js) →
blendshape→viseme reduce →
3D mouth (src/avatar.js).
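The MFCC stage follows the standard recipe: mel-spaced triangular filters over the FFT power spectrum, a log, then a DCT-II keeping the first 13 coefficients. The sketch below is a generic version of that recipe, not a copy of src/mfcc.js; the filter count, sample rate, log floor, and HTK-style mel formula are assumptions.

```javascript
// Generic MFCC sketch with assumed parameters; not the app's src/mfcc.js.
// HTK-style mel scale.
const hzToMel = (hz) => 2595 * Math.log10(1 + hz / 700);
const melToHz = (mel) => 700 * (10 ** (mel / 2595) - 1);

// Triangular filters equally spaced on the mel scale from 0 Hz to Nyquist,
// defined over `numBins` FFT bins (bin k is about k * sampleRate / (2 * numBins) Hz).
function melFilterbank(numFilters, numBins, sampleRate) {
  const nyquist = sampleRate / 2;
  const maxMel = hzToMel(nyquist);
  const edges = [];
  for (let i = 0; i < numFilters + 2; i++) {
    const hz = melToHz((maxMel * i) / (numFilters + 1));
    edges.push(Math.min(numBins - 1, Math.floor((hz / nyquist) * numBins)));
  }
  const filters = [];
  for (let m = 1; m <= numFilters; m++) {
    const [lo, mid, hi] = [edges[m - 1], edges[m], edges[m + 1]];
    const f = new Float32Array(numBins);
    for (let k = lo; k < mid; k++) f[k] = (k - lo) / (mid - lo); // rising edge
    for (let k = mid; k <= hi; k++) f[k] = hi === mid ? 1 : (hi - k) / (hi - mid); // falling edge
    filters.push(f);
  }
  return filters;
}

// power: one frame's FFT power spectrum (length numBins).
// Returns the first `numCoeffs` terms of an unnormalised DCT-II of the log filter energies.
function mfcc(power, filters, numCoeffs = 13) {
  const logE = filters.map((f) => {
    let e = 0;
    for (let k = 0; k < f.length; k++) e += f[k] * power[k];
    return Math.log(e + 1e-10); // small floor avoids log(0) on silent frames
  });
  const M = logE.length;
  const out = new Float32Array(numCoeffs);
  for (let n = 0; n < numCoeffs; n++) {
    let s = 0;
    for (let m = 0; m < M; m++) s += logE[m] * Math.cos((Math.PI * n * (m + 0.5)) / M);
    out[n] = s;
  }
  return out;
}
```

In a browser, `AnalyserNode.getFloatFrequencyData()` fills `frequencyBinCount` (fftSize / 2) bins with magnitudes in decibels, so each bin would be converted to power with `10 ** (db / 10)` before calling `mfcc`. For example, `melFilterbank(26, 1024, 48000)` matches an analyser with `fftSize = 2048` at 48 kHz.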
Training data comes from MediaPipe FaceLandmarker: while you
read aloud, it tracks your face through the webcam, and its
blendshape estimates become the labels for the audio. Training
runs in the browser too (TF.js). The same architecture can also
be trained with training/train_jax.py and exported to LiteRT
for shipping to Android XR or Quest.
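Because the webcam and the audio analysis run at different rates, each audio feature frame has to be paired with the face frame closest to it in time. The page doesn't describe how this is done; below is a minimal nearest-timestamp sketch, assuming both streams carry millisecond timestamps on a shared clock (for example from `performance.now()`). The function name, record shapes, and the `maxGapMs` cutoff are hypothetical.

```javascript
// Pair each audio frame with the nearest-in-time blendshape frame.
// audioFrames: [{t, mfcc}], faceFrames: [{t, blendshapes}], both sorted by t (ms).
// Audio frames with no face sample within maxGapMs (e.g. face lost) are dropped.
function alignFrames(audioFrames, faceFrames, maxGapMs = 50) {
  const pairs = [];
  let j = 0;
  for (const a of audioFrames) {
    // Two-pointer walk: advance while the next face frame is at least as close.
    while (
      j + 1 < faceFrames.length &&
      Math.abs(faceFrames[j + 1].t - a.t) <= Math.abs(faceFrames[j].t - a.t)
    ) {
      j++;
    }
    const f = faceFrames[j];
    if (f && Math.abs(f.t - a.t) <= maxGapMs) {
      pairs.push({ mfcc: a.mfcc, blendshapes: f.blendshapes });
    }
  }
  return pairs;
}
```

Since both lists are sorted, the pointer only moves forward, so alignment is linear in the total number of frames.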
Capture
click start to grant permissions
How to record
Do each of the 3 takes below once — they
cover different phonetic territory, so reading distinct
material gives the model much more variety than repeating
the same script. Each take is ~75 seconds. Vary your
volume and pace; move your head a bit.
Take 1 — sentences + silence + vowels
"The birch canoe slid on the smooth planks. Glue the
sheet to the dark blue background. It's easy to tell the
depth of a well. These days a chicken leg is a rare
dish. Rice is often served in round bowls. The juice of
lemons makes fine punch."
"Sssssss, ffffff, shhhhh, zzzzzz, vvvvvv, mmmmm,
nnnnnn. Pa pa pa, ta ta ta, ka ka ka, ba ba ba, da da
da, ga ga ga. Pip, top, kick, big, dot, get."
[pause 4 seconds — silent]
"Peter Piper picked a peck of pickled peppers. She
sells sea shells by the sea shore. How now brown cow.
Red lorry, yellow lorry, red lorry, yellow lorry.
Unique New York, unique New York. The Leith police
dismisseth us."
[pause 4 seconds — silent]
──────────────
Take 3 — pangrams + casual speech
"The quick brown fox jumps over the lazy dog. Pack my
box with five dozen liquor jugs. Sphinx of black
quartz, judge my vow. Bright vixens jump dozy fowl
quack. Five quacking zephyrs jolt my wax bed."
[pause 4 seconds — silent]
Then, in a casual voice, as if chatting to a
friend, read:
"I went to the shop this morning, and they were
completely out of milk, which is really annoying
because I was planning to make pancakes for breakfast
and now I have to walk all the way to the other shop
on the corner. Anyway, how are you doing today?
Anything interesting happen?"
[pause 4 seconds — silent]
Drop all three downloaded JSON files on the
Train tab when done.
Training data
Drop JSON files here.
You need ~1,000+ frames to train; 30k+ for good quality.
What this does
Trains a small MLP (117 → 128 → 64 → 52) in your browser
with TensorFlow.js. Each input is a 9-frame window of 13-D MFCCs
(9 × 13 = 117 values); the targets are the 52 ARKit blendshape
coefficients MediaPipe captured for that audio.
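The sketch below shows the inference path in plain JavaScript: nine consecutive 13-D MFCC frames are concatenated into one 117-D vector and pushed through a 117 → 128 → 64 → 52 network. It is illustrative only, not src/model.js. Centring the window on the current frame, clamping at the edges, ReLU hidden layers, a sigmoid output (to keep coefficients in 0..1), and He-uniform random initialisation are all assumptions; the app itself trains this network with TF.js.

```javascript
const N_MFCC = 13, WINDOW = 9, LAYERS = [117, 128, 64, 52];

// Concatenate the 9 frames centred on index i (indices clamped at the edges) into a 117-D vector.
function stackWindow(frames, i) {
  const half = Math.floor(WINDOW / 2);
  const x = new Float32Array(N_MFCC * WINDOW);
  for (let w = 0; w < WINDOW; w++) {
    const idx = Math.min(frames.length - 1, Math.max(0, i - half + w));
    x.set(frames[idx], w * N_MFCC);
  }
  return x;
}

// Random weights: W is an (nOut x nIn) matrix stored row-major; biases start at zero.
function initMlp(sizes = LAYERS, rand = Math.random) {
  const layers = [];
  for (let l = 0; l + 1 < sizes.length; l++) {
    const [nIn, nOut] = [sizes[l], sizes[l + 1]];
    const bound = Math.sqrt(6 / nIn); // He-uniform bound: weight variance 2 / nIn
    const W = Float32Array.from({ length: nIn * nOut }, () => (rand() * 2 - 1) * bound);
    layers.push({ W, b: new Float32Array(nOut), nIn, nOut });
  }
  return layers;
}

// Forward pass: ReLU on hidden layers, sigmoid on the 52 outputs.
function forward(layers, x) {
  let h = x;
  layers.forEach(({ W, b, nIn, nOut }, l) => {
    const y = new Float32Array(nOut);
    for (let o = 0; o < nOut; o++) {
      let s = b[o];
      for (let k = 0; k < nIn; k++) s += W[o * nIn + k] * h[k];
      y[o] = l === layers.length - 1 ? 1 / (1 + Math.exp(-s)) : Math.max(0, s);
    }
    h = y;
  });
  return h;
}
```

Centring the window gives the model four frames of look-ahead, which helps with anticipatory mouth shapes but adds that much latency in live use; a causal window (the last nine frames) trades some accuracy for lower latency.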
Saves the trained weights to your browser's
localStorage. The Demo tab auto-loads the
saved model on every visit; the mapper indicator flips to
learned.
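TF.js can persist a model directly with `model.save('localstorage://<name>')` and reload it with `tf.loadLayersModel('localstorage://<name>')`. To show what that involves without TF.js, the sketch below serialises plain `Float32Array` weights to base64 JSON under a single key and derives the mapper indicator from whether a saved model was found. The key name, record format, and the `heuristic` fallback label are hypothetical; any object with `getItem`/`setItem` (such as `window.localStorage`) works as storage.

```javascript
const MODEL_KEY = "lipsync-mlp-v1"; // hypothetical storage key

// Float32Array <-> base64 via its raw bytes (btoa/atob are browser and Node 16+ globals).
function f32ToBase64(arr) {
  const bytes = new Uint8Array(arr.buffer, arr.byteOffset, arr.byteLength);
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}
function base64ToF32(s) {
  const bin = atob(s);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return new Float32Array(bytes.buffer);
}

// layers: [{W, b, nIn, nOut}] with Float32Array weights and biases.
function saveModel(storage, layers) {
  const record = layers.map(({ W, b, nIn, nOut }) => ({
    nIn, nOut, W: f32ToBase64(W), b: f32ToBase64(b),
  }));
  storage.setItem(MODEL_KEY, JSON.stringify(record));
}

// Returns {layers, status}; status is what the mapper indicator would display.
function loadModel(storage) {
  const raw = storage.getItem(MODEL_KEY);
  if (raw === null) return { layers: null, status: "heuristic" };
  const layers = JSON.parse(raw).map(({ nIn, nOut, W, b }) => ({
    nIn, nOut, W: base64ToF32(W), b: base64ToF32(b),
  }));
  return { layers, status: "learned" };
}
```

For scale, a 117 → 128 → 64 → 52 MLP has 26,740 parameters (about 107 KB as float32), well under typical localStorage quotas; `setItem` throws a `QuotaExceededError` if the quota is exceeded.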
For shipping to Android XR or Quest, the same architecture is
implemented in training/train_jax.py, a JAX/Flax trainer
that exports to LiteRT (.tflite).