feat(#187 AC#4): Piper TTS backend + /api/voice/synthesize route #255
Merged
Pull request overview
Adds an embedded Piper TTS implementation to @skytwin/embedded-llm and exposes it via a new /api/voice/synthesize route, extending /api/voice/capabilities to report both STT and TTS capabilities while preserving the legacy STT-shaped fields.
Changes:
- Introduce `PiperTtsBackend` (+ `findFirstPiperModel`) and a new `createEmbeddedTtsPort()` factory that mirrors the existing embedded STT/text port resolution flow.
- Add `POST /api/voice/synthesize` and extend `GET /api/voice/capabilities/:userId` to return nested `stt`/`tts` blocks (legacy fields preserved).
- Add comprehensive unit tests for the new backend and API routes; document the feature in the changelog.
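The additive `/capabilities` shape described above could look roughly like the sketch below. It is an illustration only: the field names `available`, `supportedFormats`, `voices`, `stt`, and `tts` appear in this PR, while the interface names, the `buildCapabilities` helper, and the sample values are hypothetical.

```typescript
// Hypothetical shapes for illustration; only the field names `available`,
// `supportedFormats`, `voices`, `stt`, and `tts` come from the PR.
interface SttCapabilities {
  available: boolean;
  supportedFormats: string[];
}

interface TtsCapabilities {
  available: boolean;
  voices: string[];
}

// Legacy STT-shaped fields stay at the top level; the new blocks are nested.
interface VoiceCapabilities extends SttCapabilities {
  stt: SttCapabilities;
  tts: TtsCapabilities;
}

function buildCapabilities(stt: SttCapabilities, tts: TtsCapabilities): VoiceCapabilities {
  // Spreading `stt` keeps the top-level fields older clients already read.
  return { ...stt, stt, tts };
}

const caps = buildCapabilities(
  { available: true, supportedFormats: ["wav"] },
  { available: false, voices: [] },
);
```

Because the nested blocks are purely additive, a client that only reads `caps.available` or `caps.supportedFormats` would see the same values as before.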
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/embedded-llm/src/piper-tts-backend.ts | New Piper CLI-backed TTS port + model discovery helper. |
| packages/embedded-llm/src/index.ts | Re-export new TTS backend/helper and createEmbeddedTtsPort. |
| packages/embedded-llm/src/factory.ts | Add createEmbeddedTtsPort() to resolve Piper binary + model or fall back to Null port. |
| packages/embedded-llm/src/tests/piper-tts-backend.test.ts | Unit coverage for Piper backend behavior, error paths, and model discovery. |
| apps/api/src/routes/voice.ts | Add /synthesize route and extend /capabilities to include TTS. |
| apps/api/src/tests/voice-routes.test.ts | Add API tests for new synth route and expanded capabilities shape. |
| CHANGELOG.md | Document the new Piper TTS backend and API surface. |
Comment on lines 58 to 61

    export function createVoiceRouter(): Router {
      const router = Router();
      bindUserIdParamOwnership(router);

    private spawnPiper(args: string[], stdinText: string): Promise<void> {
      return new Promise((resolve, reject) => {
        const child = spawn(this.binaryPath, args, { stdio: ['pipe', 'pipe', 'pipe'] });

Comment on lines +170 to +171

    child.stdin?.write(stdinText);
    child.stdin?.end();

Comment on lines +191 to +195

    res.json({
      audioBase64: wav.toString('base64'),
      durationBytes: wav.length,
      voice: opts.voice ?? port.capabilities.voices[0] ?? '',
    });

Comment on lines +20 to +28

      * Backed by `createEmbeddedSttPort()` + `createEmbeddedTtsPort()` from
      * `@skytwin/embedded-llm`. Ports return their Null* fallbacks when the
      * corresponding binary isn't installed — those throw `NotAvailableError`
      * on use, which we surface as 503 so the client can fall back to a
      * manual transcript / silent text rendering.
      *
    - * Why the binary lives behind a single port: the same backend serves
    + * Why the binaries live behind a single port: the same backend serves
      * desktop voice-first (#194 Child 4) and mobile voice (#179). Both
    - * clients POST audio here; one place to install/upgrade the model.
    + * clients POST here; one place to install/upgrade the model.

Comment on lines +34 to +41

      function getPort(): Promise<EmbeddedSttPort> {
    -   if (cachedPort === null) cachedPort = createEmbeddedSttPort();
    -   return cachedPort;
    +   if (cachedSttPort === null) cachedSttPort = createEmbeddedSttPort();
    +   return cachedSttPort;
      }

    + function getTtsPort(): Promise<EmbeddedTtsPort> {
    +   if (cachedTtsPort === null) cachedTtsPort = createEmbeddedTtsPort();
    +   return cachedTtsPort;


    ## [unreleased] — Piper TTS backend + `/api/voice/synthesize` route (#187 AC#4)

    Closes #187 AC#4. Mirrors the proven spawn pattern of
    `LlamaCppTextBackend` and `WhisperCppSttBackend`. Three pieces:

jayzalowitz added a commit that referenced this pull request May 11, 2026
… newline, audioBytes naming

Four Copilot findings on PR #255 addressed:
1. /api/voice mount: now goes through requireOwnership so POST /transcribe and /synthesize body-userId is checked against the authenticated session. The in-router bindUserIdParamOwnership only covered :userId path params; body POSTs were unprotected.
2. PiperTtsBackend.spawnPiper: stdout switched from 'pipe' to 'ignore'. The WAV is read from --output_file, not stdout; leaving stdout piped without consuming it could block piper once the OS pipe buffer filled. Matches whisper-cli pattern.
3. Piper stdin now gets a trailing \n so the newline-delimited reader treats the input as one complete utterance. Test updated.
4. /api/voice/synthesize response: durationBytes → audioBytes. "Duration" implied seconds; the value is a byte count of the WAV. New endpoint, no compat concern.

Test plan: embedded-llm 86/86, api 551/551 green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
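
Findings 2 and 3 can be illustrated with a small sketch. The flags `--model`, `--output_file`, and `--quiet` are the ones named in this PR; the helper names `PIPER_STDIO`, `buildPiperArgs`, and `toPiperStdin` are hypothetical, and the actual backend may append the newline unconditionally rather than checking for one first.

```typescript
// Hypothetical helpers for illustration; not the PR's actual code.

// stdin is piped (the text goes in), stdout is ignored (the WAV is read from
// --output_file, so nothing would drain a stdout pipe), and stderr is piped
// (kept for the diagnostics tail on failure).
const PIPER_STDIO: ["pipe", "ignore", "pipe"] = ["pipe", "ignore", "pipe"];

function buildPiperArgs(modelPath: string, outputPath: string): string[] {
  return ["--model", modelPath, "--output_file", outputPath, "--quiet"];
}

// The reader is newline-delimited, so terminate the text with a newline to
// mark it as one complete utterance.
function toPiperStdin(text: string): string {
  return text.endsWith("\n") ? text : `${text}\n`;
}
```

These values would then be handed to `spawn(binaryPath, buildPiperArgs(model, out), { stdio: PIPER_STDIO })` from `node:child_process`, followed by `child.stdin.write(toPiperStdin(text))` and `child.stdin.end()`.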
jayzalowitz added a commit that referenced this pull request May 12, 2026
…ing, fix CHANGELOG pieces count

Three Copilot round-2 findings on PR #255 addressed:
1. getPort() renamed to getSttPort() throughout the file. With both STT and TTS ports cached in this router, "getPort" was ambiguous and could lead to accidentally calling the wrong cached port as the file evolves.
2. Header doc updated: removed the "binaries live behind a single port" wording, since there are two ports now (STT + TTS). The new wording reflects the actual architecture.
3. CHANGELOG entry "Three pieces:" → "Four pieces:" to match the actual list (backend, findFirstPiperModel, factory, API route).

Test plan: api 551/551 green; build clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
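
The cached-port pattern behind `getSttPort()` and `getTtsPort()` can be sketched generically. The `lazy` helper, the `Port` interface, and the counting stub factories below are hypothetical stand-ins for `createEmbeddedSttPort()` and `createEmbeddedTtsPort()`; the router in this PR uses two module-level variables rather than a shared helper.

```typescript
// Hypothetical stand-in for the real port interfaces.
interface Port {
  kind: "stt" | "tts";
}

// Caches the promise itself, so concurrent first callers share one in-flight
// creation instead of each probing for the binary.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) cached = factory();
    return cached;
  };
}

// Counters exist only to show that each factory runs at most once.
let sttCreations = 0;
let ttsCreations = 0;

const getSttPort = lazy(async (): Promise<Port> => {
  sttCreations += 1;
  return { kind: "stt" };
});

const getTtsPort = lazy(async (): Promise<Port> => {
  ttsCreations += 1;
  return { kind: "tts" };
});
```

Keeping one explicitly named getter per port is what the rename addresses: with two caches in the same file, a generic `getPort()` no longer says which one it returns.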
Round-2 review reply: Already fixed in round 1 (037938c):
Round-2 findings fixed in 4ffac42:
Mirrors the proven spawn pattern of LlamaCppTextBackend and
WhisperCppSttBackend.
- PiperTtsBackend implements EmbeddedTtsPort. Spawns piper with
--model <model.onnx> --output_file <tmp> --quiet, writes text to
stdin, reads the resulting WAV into a Buffer on success. Cleans up
the tempdir on both success and failure. Bounded inputs (max 8000
chars; mismatched voice request fails hard).
- findFirstPiperModel(dir) locates the first .onnx model with a paired
.onnx.json config (Piper requires both). Catches "stray .onnx,
missing config" at boot instead of at synth time.
- createEmbeddedTtsPort() factory mirrors createEmbeddedSttPort: probe
the runtime detector for a piper binary, resolve a voice model,
fall back to NullEmbeddedTtsPort when either is missing.
- POST /api/voice/synthesize consumer. Body { userId, text, voice? }
→ { audioBase64, durationBytes, voice }. 503 + hint when piper not
installed. GET /capabilities now reports stt + tts blocks alongside
legacy STT-shaped fields so older clients keep working.
Tests: 15 new vitest cases for PiperTtsBackend + findFirstPiperModel
(mocked node:child_process + node:fs so they run with no piper on
the host). 9 new API tests for /synthesize + the updated capabilities
shape. Workspace: 70/70 turbo tasks green; build clean.
Out of scope (follow-ups): bundling the piper binary + a default
voice model (joins #187 AC#1 + #188 distribution work); auto-speaking
briefings (UI follow-up; the backend it needs is now in main).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
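
The pairing rule behind `findFirstPiperModel(dir)` can be sketched as a pure function over a directory listing. This is a minimal sketch rather than the PR's implementation: `pickFirstPairedModel` is a hypothetical name, and treating "first" as alphabetical order is an assumption.

```typescript
// Hypothetical helper: given the file names in a model directory, return the
// first `.onnx` model that has a sibling `<name>.onnx.json` config, or null.
function pickFirstPairedModel(fileNames: readonly string[]): string | null {
  const names = new Set(fileNames);
  // Assumption: "first" means alphabetical order, for a deterministic pick.
  const candidates = fileNames.filter((name) => name.endsWith(".onnx")).sort();
  for (const model of candidates) {
    // Piper needs both files; an orphan .onnx is skipped, not returned.
    if (names.has(`${model}.json`)) return model;
  }
  return null;
}
```

A real implementation would presumably read the listing from `dir` and return a full path; returning `null` for an unpaired directory is what lets the factory fall back to `NullEmbeddedTtsPort` at boot.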
4ffac42 to 8536a98
This was referenced May 13, 2026
Summary
Closes #187 AC#4. Mirrors the proven spawn pattern of `LlamaCppTextBackend` and `WhisperCppSttBackend`: same shape, same diagnostics, same `Null*` fallback when the binary isn't installed.

Takes #187 from 7/8 → 8/8 closed for code-bound ACs. The last remaining AC (#1 bundling) is distribution work paired with #188.
What changed
New backend (`@skytwin/embedded-llm`)
- `piper-tts-backend.ts`: `PiperTtsBackend implements EmbeddedTtsPort`. Spawns `piper --model <model.onnx> --output_file <tmp> --quiet`, writes text to stdin, reads the resulting WAV into a `Buffer` on successful exit. Cleans up the tempdir on both success and failure. Bounded inputs: text required, max 8000 chars; mismatched voice request fails hard rather than silently substituting.
- `findFirstPiperModel(dir)` locates the first `.onnx` voice model with a paired `.onnx.json` config (Piper requires both). The pairing check is what differentiates a usable voice from a stray `.onnx` someone dropped in; catching it at detection keeps the failure visible at boot rather than at synth time.
- `factory.ts → createEmbeddedTtsPort(overrides?)` mirrors `createEmbeddedSttPort`: probe `runtime-detector` for a `piper` binary (env-var override → PATH lookup), then resolve a voice model (env-var override → first valid pair in the configured model dir). Falls back to `NullEmbeddedTtsPort` when either is missing.

New API consumer (`apps/api/src/routes/voice.ts`)
- `POST /api/voice/synthesize`: body `{ userId, text, voice? }` → response `{ audioBase64, durationBytes, voice }` (`durationBytes` was renamed to `audioBytes` in the round-1 review fixes). Base64 instead of binary so it goes through the same JSON envelope the rest of the API uses (the mobile client + web dashboard both decode base64 → Blob/audio element).
- 503 + hint when piper is not installed (`install piper-tts and an .onnx voice model, or set SKYTWIN_PIPER_BIN + SKYTWIN_PIPER_MODEL`).
- `GET /api/voice/capabilities` now reports `stt` + `tts` capability blocks alongside the legacy STT-shaped fields so older clients keep working.

What this PR deliberately does NOT do
- … (`brew install piper-tts` on macOS, `apt install piper-tts` on Ubuntu) and drop an `.onnx` + matching `.onnx.json` config in the configured model dir. Bundling joins the same distribution work as Capability loop #N: Embedded LLM with auto-upgrade path (Phi/Llama/Qwen via llama.cpp; Whisper-tiny STT; Piper TTS) #187 AC#1 (default GGUF), paired with Capability loop #M: Turnkey distribution + embedded runtime (signed installers, single-binary, embedded SQLite, auto-update) #188 turnkey distribution.

Test plan
- `piper-tts-backend.test.ts` covering:
  - … (`available=true`, voice list).
  - `findFirstPiperModel`: null dir, missing dir, paired-config check, non-`.onnx` skipped, orphan `.onnx` (no config) skipped.
- `voice-routes.test.ts` covering:
  - `/capabilities` response shape (legacy + new `stt`/`tts` blocks).
  - `/synthesize` happy path (base64 WAV + voice in response).
- `node:child_process` + `node:fs` stubs so tests run hermetically with no piper on the host.
- Workspace: 70/70 turbo tasks green (`pnpm test`).
- `pnpm build --concurrency=1` clean.
- `@skytwin/embedded-llm`: 86 tests passing (+15 new).
- `@skytwin/api`: 551 tests passing (+9 new).

Notes for reviewers
- The `--quiet` flag on piper suppresses its banner + progress output on stderr so the diagnostics tail we collect on failure isn't 90% noise; the same hygiene as the whisper-cli `-np -nt` flags.
- … `modelPath` is the documented way to switch voices.
- `stt`/`tts` nested blocks in `/capabilities` are additive; legacy clients that read `body.available` / `body.supportedFormats` keep working unchanged.

🤖 Generated with Claude Code
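
The request/response contract of `POST /api/voice/synthesize` can be sketched as three small pure helpers. This is a hedged illustration, not the route's code: the helper names, the error messages, and the 400/500 statuses are assumptions; only the body fields, the 8000-character bound, the `NotAvailableError` → 503 mapping, and the response fields (with the `audioBytes` name from the round-1 review fix) come from this PR.

```typescript
// Hypothetical helpers for illustration; names and messages are assumed.
const MAX_TEXT_CHARS = 8000; // input bound stated in the PR

class NotAvailableError extends Error {}

interface SynthesizeRequest {
  userId: string;
  text: string;
  voice?: string;
}

// Returns the parsed request, or an error message (assumed to map to a 400).
function parseSynthesizeBody(body: unknown): SynthesizeRequest | { error: string } {
  if (typeof body !== "object" || body === null) return { error: "body must be an object" };
  const { userId, text, voice } = body as Record<string, unknown>;
  if (typeof userId !== "string" || userId === "") return { error: "userId is required" };
  if (typeof text !== "string" || text.trim() === "") return { error: "text is required" };
  if (text.length > MAX_TEXT_CHARS) return { error: "text is too long" };
  let parsedVoice: string | undefined;
  if (typeof voice === "string") parsedVoice = voice;
  else if (voice !== undefined) return { error: "voice must be a string" };
  return { userId, text, voice: parsedVoice };
}

// A Null* port throws NotAvailableError, surfaced as 503; treating every
// other failure as 500 is an assumption.
function statusForError(err: unknown): 503 | 500 {
  return err instanceof NotAvailableError ? 503 : 500;
}

// The route in the PR calls wav.toString('base64') on a Buffer; btoa is used
// here only to keep the sketch free of Node-specific types.
function toSuccessBody(wav: Uint8Array, voice: string) {
  let binary = "";
  for (let i = 0; i < wav.length; i += 1) binary += String.fromCharCode(wav[i]);
  return { audioBase64: btoa(binary), audioBytes: wav.length, voice };
}
```

A client would reverse the last step by decoding `audioBase64` back into bytes and wrapping them in a Blob or audio element, as the description notes for the mobile client and web dashboard.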