fix(crispasr): write piper TTS WAV at the model's native sample rate by localai-bot · Pull Request #10277 · mudler/LocalAI

localai-bot · 2026-06-12T20:34:02Z

Problem

CrispASR's piper backend returns PCM at the voice's native sample rate (read from the GGUF piper.sample_rate key: 16 kHz for x_low/low, 22.05 kHz for medium/high) and does not resample. The Go WAV encoder in the crispasr backend hardcoded 24000 Hz, so every piper voice was written with a wrong header and played back at the wrong pitch/speed (about 9% too fast for medium/high voices, 50% too fast for x_low/low).

The session-level C-ABI (crispasr_session_synthesize) only returns the sample buffer + count, not the rate, so the rate must be recovered on the Go side.

Fix

piperSampleRate() reads piper.sample_rate (u32) from the model's GGUF metadata via the already-vendored gguf-parser-go.
Load() stores it on the CrispASR struct, falling back to the 24 kHz default for the other CrispASR TTS engines (vibevoice / orpheus / chatterbox / qwen3-tts) that emit 24 kHz and carry no such key.
writeWAV(dst, pcm, rate) (was writeWAV24k) uses the stored rate for both the encoder and the audio.Format.

Pure Go change; no shim/C rebuild needed.

Tests

Unit specs: craft minimal in-memory GGUFs (22050 / 16000 / non-piper / garbage) and decode the produced WAV header — no network or model needed.
Env-gated e2e spec (CRISPASR_PIPER_MODEL_PATH), same convention as the other model-backed specs.

Verified e2e: built libgocrispasr-fallback.so from the current pin and synthesized en_GB-cori-medium through backend:piper → WAV header is 22050 Hz (old code: 24000).

Split out as a standalone correctness fix from in-progress work to add piper voices to the gallery.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

CrispASR's piper backend returns PCM at the voice's native rate (from the GGUF piper.sample_rate key: 16 kHz for x_low/low, 22.05 kHz for medium/high) and does not resample, but the Go WAV encoder hardcoded 24000 Hz. Every piper voice was therefore written with a wrong header and played back at the wrong pitch/speed.

Read piper.sample_rate from the model's GGUF metadata at Load via the vendored gguf-parser-go and use it for the WAV header, falling back to the 24 kHz default for the other CrispASR TTS engines (vibevoice/orpheus/chatterbox/qwen3-tts) that emit 24 kHz and carry no such key.

Adds unit specs (minimal crafted GGUFs + WAV-header decode) and an env-gated end-to-end spec (CRISPASR_PIPER_MODEL_PATH). Verified e2e: en_GB-cori-medium synthesizes a 22050 Hz WAV through backend:piper.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

localai-bot mentioned this pull request Jun 12, 2026

feat(crispasr): bundle espeak-ng and add piper TTS voices to the gallery #10283

Merged

mudler merged commit 46ba706 into master Jun 12, 2026
58 of 69 checks passed

mudler deleted the fix/crispasr-piper-samplerate branch June 12, 2026 21:10

localai-bot mentioned this pull request Jun 13, 2026

feat(gallery): add 60 piper TTS voices across 42 languages (Phase 2) #10296

Merged

BrewTestBot mentioned this pull request Jun 13, 2026

localai 4.4.3 Homebrew/homebrew-core#287865

Merged
