feat(crispasr): bundle espeak-ng and add piper TTS voices to the gallery #10283

Merged: mudler merged 1 commit into master from feat/crispasr-piper-voices on Jun 12, 2026
@localai-bot

Collaborator

Adds Piper TTS voices to the gallery, run through the crispasr backend's backend:piper engine, and bundles espeak-ng into the backend image so non-English voices work.

Pairs with #10277 (the piper WAV sample-rate fix) for correct playback rate.

espeak-ng bundling

CrispASR's piper backend phonemizes non-English text via espeak-ng (loaded through the MIT-clean dlopen path; English uses a built-in CMUdict/LTS G2P). The FROM scratch crispasr image shipped none of it, so German/Italian/etc. voices loaded but failed synthesis with piper_tts: phonemization failed.
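The failure mode described above comes from loading an optional library at runtime. As a minimal illustration in Python (not CrispASR's code, which does this natively), the hypothetical helper `can_dlopen` below shows how a dlopen attempt either succeeds or raises, including when one of the library's own dependencies cannot be resolved:

```python
import ctypes


def can_dlopen(soname: str) -> bool:
    """Return True if the dynamic loader can load `soname`, else False.

    ctypes.CDLL calls dlopen() on POSIX systems and raises OSError when the
    library itself, or a library it depends on, cannot be resolved.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # The three shared objects named in this PR; results depend on the host.
    for name in ("libespeak-ng.so.1", "libpcaudio.so.0", "libsonic.so.0"):
        status = "loadable" if can_dlopen(name) else "not loadable"
        print(f"{name}: {status}")
```

Running this inside an environment without the espeak-ng runtime would be expected to report all three as not loadable, which matches the situation in the original FROM scratch image.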

  • Dockerfile.golang - install espeak-ng-data, libespeak-ng1, libpcaudio0 and libsonic0 in the crispasr builder. The dlopen of libespeak-ng.so.1 fails unless libpcaudio.so.0 and libsonic.so.0 are also present (confirmed via strace).
  • package.sh - copy the three .so into package/lib/ and the espeak-ng-data/ dir into the package root.
  • run.sh - export CRISPASR_ESPEAK_DATA_PATH so the bundled data is found.

No CrispASR rebuild/flag needed: the dlopen path is already compiled in (CRISPASR_WITH_ESPEAK_NG=AUTO).
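The package layout described in the list above can be sanity-checked with a short script. This is a sketch under the assumption that the three libraries sit in `lib/` and `espeak-ng-data/` sits directly in the package root, as the description states; `missing_espeak_payload` is a hypothetical helper and is not part of package.sh or run.sh:

```python
from pathlib import Path
from typing import List

# Shared objects and data directory named in the PR description.
REQUIRED_LIBS = ("libespeak-ng.so.1", "libpcaudio.so.0", "libsonic.so.0")
DATA_DIR = "espeak-ng-data"


def missing_espeak_payload(package_root: Path) -> List[str]:
    """Return relative paths of espeak payload entries absent from the package."""
    missing = []
    for lib in REQUIRED_LIBS:
        if not (package_root / "lib" / lib).is_file():
            missing.append(f"lib/{lib}")
    if not (package_root / DATA_DIR).is_dir():
        missing.append(f"{DATA_DIR}/")
    return missing
```

An empty list means the payload is complete; any returned entry names a file or directory that would need to be copied into the package.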

Voices (9, single-speaker)

Hosted at LocalAI-Community/piper-voices-GGUF, converted from rhasspy/piper-voices with CrispASR's models/convert-piper-to-gguf.py:

| Voice | Lang | Quality | Rate |
|---|---|---|---|
| eva_k, karlsson, kerstin, ramona | German | x_low/low | 16 kHz |
| thorsten | German | medium | 22.05 kHz |
| cori | English (GB) | medium | 22.05 kHz |
| lessac | English (US) | medium | 22.05 kHz |
| paola, riccardo | Italian | medium/x_low | 22.05 / 16 kHz |

Only single-speaker, low/medium voices are included - the CrispASR piper engine currently segfaults on multi-speaker models (mls, thorsten_emotional, libritts_r) and high-quality decoders (thorsten-high).

Verification

Built the crispasr image (make docker-build-crispasr), extracted its package, and confirmed every voice synthesizes a WAV at the model's native sample rate using only the image-bundled espeak payload (the build host has no system espeak-ng).
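The sample-rate part of this check can be reproduced with Python's standard `wave` module. The sketch below is not the procedure used for this PR (which is not shown); `wav_sample_rate` and `make_silent_wav` are hypothetical helpers, and generated silence stands in for real synthesized output:

```python
import io
import wave


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Return the sample rate (Hz) declared in a WAV file's header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a mono 16-bit PCM WAV of silence as a stand-in for TTS output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    # 22.05 kHz and 16 kHz are the two native rates listed for the voices.
    for expected in (22050, 16000):
        actual = wav_sample_rate(make_silent_wav(expected))
        print(f"expected {expected} Hz, header says {actual} Hz")
```

In a real check, the bytes passed to `wav_sample_rate` would come from the synthesized file for each voice and be compared against that voice's native rate.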

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

CrispASR's piper backend phonemizes non-English text via espeak-ng (dlopen,
the MIT-clean path; English uses a built-in G2P). The FROM scratch crispasr
image shipped none of it, so non-English piper voices loaded but failed
synthesis with "phonemization failed". Bundle the espeak-ng runtime so they
work:

- Dockerfile.golang: install espeak-ng-data + libespeak-ng1 and its libpcaudio0
  / libsonic0 deps in the crispasr builder (espeak's dlopen fails without the
  latter two).
- package.sh: copy libespeak-ng.so.1, libpcaudio.so.0, libsonic.so.0 into
  package/lib/ and the espeak-ng-data dir into the package root.
- run.sh: export CRISPASR_ESPEAK_DATA_PATH so the bundled data is found.

Add 9 single-speaker piper voices (de/en/it, incl. Italian paola + riccardo) to
the gallery, run through backend:piper, hosted at
LocalAI-Community/piper-voices-GGUF (converted from rhasspy/piper-voices with
CrispASR's convert-piper-to-gguf.py). Only single-speaker low/medium voices are
included; the engine does not yet support multi-speaker or high-quality piper
decoders.

All 9 verified end-to-end: each synthesizes a WAV at the model's native sample
rate using only the image-bundled espeak payload.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
@mudler mudler merged commit 50dea8c into master Jun 12, 2026
19 of 20 checks passed
@mudler mudler deleted the feat/crispasr-piper-voices branch June 12, 2026 21:10