feat(#187 AC#4): Piper TTS backend + /api/voice/synthesize route by jayzalowitz · Pull Request #255 · jayzalowitz/skytwin

jayzalowitz · 2026-05-11T15:22:22Z

Summary

Closes #187 AC#4. Mirrors the proven spawn pattern of LlamaCppTextBackend and WhisperCppSttBackend — same shape, same diagnostics, same Null* fallback when the binary isn't installed.

Takes #187 from 7/8 → 8/8 closed for code-bound ACs. The last remaining AC (#1 bundling) is distribution work paired with #188.

What changed

New backend (`@skytwin/embedded-llm`)

- `piper-tts-backend.ts`: `PiperTtsBackend` implements `EmbeddedTtsPort`. Spawns `piper --model <model.onnx> --output_file <tmp> --quiet`, writes text to stdin, and reads the resulting WAV into a Buffer on successful exit. Cleans up the tempdir on both success and failure. Bounded inputs: text is required, max 8000 chars; a mismatched voice request fails hard rather than silently substituting.
- `findFirstPiperModel(dir)` locates the first .onnx voice model with a paired .onnx.json config (Piper requires both). The pairing check is what differentiates a usable voice from a stray .onnx someone dropped in; catching it at detection keeps the failure visible at boot rather than at synth time.
- `factory.ts` → `createEmbeddedTtsPort(overrides?)` mirrors `createEmbeddedSttPort`: probe runtime-detector for a piper binary (env-var override → PATH lookup), then resolve a voice model (env-var override → first valid pair in the configured model dir). Falls back to `NullEmbeddedTtsPort` when either is missing.
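
The pairing check described above can be illustrated with a minimal sketch. This is not the PR's implementation: the function name `findFirstPiperModelSketch`, the alphabetical ordering used to define "first", and the demo file names are all assumptions made for illustration.

```typescript
import { existsSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for findFirstPiperModel; the real helper may differ.
function findFirstPiperModelSketch(dir: string | null | undefined): string | null {
  // A null or missing directory means no model is available.
  if (!dir || !existsSync(dir)) return null;
  // Sorting makes "first" deterministic (an assumption, not confirmed by the PR).
  for (const name of readdirSync(dir).sort()) {
    if (!name.endsWith(".onnx")) continue; // skips configs and unrelated files
    const modelPath = join(dir, name);
    // A voice is usable only when <model>.onnx.json sits next to <model>.onnx.
    if (existsSync(`${modelPath}.json`)) return modelPath;
  }
  return null; // only orphan .onnx files, or no models at all
}

// Usage: a throwaway directory with one orphan model and one valid pair.
const demoDir = mkdtempSync(join(tmpdir(), "piper-models-"));
writeFileSync(join(demoDir, "a-orphan.onnx"), "");
writeFileSync(join(demoDir, "b-voice.onnx"), "");
writeFileSync(join(demoDir, "b-voice.onnx.json"), "{}");
writeFileSync(join(demoDir, "notes.txt"), "");
const found = findFirstPiperModelSketch(demoDir);
console.log(found);
```

The orphan `a-orphan.onnx` sorts first but is skipped because it has no config, which is the case the PR wants surfaced at boot rather than at synth time.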

New API consumer (`apps/api/src/routes/voice.ts`)

- `POST /api/voice/synthesize`: body `{ userId, text, voice? }` → response `{ audioBase64, durationBytes, voice }`. Base64 instead of binary so it goes through the same JSON envelope the rest of the API uses (the mobile client and web dashboard both decode base64 → Blob/audio element).
- 503 + hint when no piper binary is on PATH, matching the STT path's shape. Same recovery message (install piper-tts and an .onnx voice model, or set SKYTWIN_PIPER_BIN + SKYTWIN_PIPER_MODEL).
- `GET /api/voice/capabilities` now reports stt + tts capability blocks alongside the legacy STT-shaped fields so older clients keep working.
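
To make the base64 envelope concrete, here is a small round-trip sketch. The field names follow the response shape listed above; the helper names `toSynthesizeResponse` and `decodeAudio`, the fake WAV bytes, and the voice name are hypothetical and only serve the illustration.

```typescript
import { Buffer } from "node:buffer";

// Response shape as listed in the PR description above.
interface SynthesizeResponse {
  audioBase64: string;
  durationBytes: number; // byte length of the WAV, not a time duration
  voice: string;
}

// Server side (hypothetical helper): wrap raw WAV bytes in the JSON envelope.
function toSynthesizeResponse(wav: Buffer, voice: string): SynthesizeResponse {
  return { audioBase64: wav.toString("base64"), durationBytes: wav.length, voice };
}

// Client side (hypothetical helper): decode and sanity-check the byte count.
function decodeAudio(resp: SynthesizeResponse): Buffer {
  const wav = Buffer.from(resp.audioBase64, "base64");
  if (wav.length !== resp.durationBytes) {
    throw new Error("decoded byte count does not match durationBytes");
  }
  return wav;
}

// Round trip through JSON, as the bytes would travel over the API.
const fakeWav = Buffer.from("RIFF....WAVEfmt ", "ascii"); // placeholder bytes, not a real WAV
const wire = JSON.stringify(toSynthesizeResponse(fakeWav, "demo-voice"));
const roundTripped = decodeAudio(JSON.parse(wire) as SynthesizeResponse);
console.log(roundTripped.equals(fakeWav));
```

Base64 inflates the payload by roughly a third, which is the cost of staying inside the shared JSON envelope instead of adding a binary response path.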

What this PR deliberately does NOT do

- Bundled piper binary / voice model. Still requires the operator to install piper-tts (brew install piper-tts on macOS, apt install piper-tts on Ubuntu) and drop an .onnx + matching .onnx.json config in the configured model dir. Bundling joins the same distribution work as #187 AC#1 (default GGUF), paired with #188 turnkey distribution.
- Briefing → speech wiring. The API surface is reachable now, but the dashboard / mobile briefing screen doesn't yet auto-speak the current briefing. That's a UI follow-up gated on the existing briefing surface; the backend it needs is in main as of this PR.

Test plan

15 new vitest cases in piper-tts-backend.test.ts covering:
- Capabilities (available=true, voice list).
- Happy path: stdin write + spawn args + WAV read + tempdir cleanup.
- Non-zero exit: stderr tail surfaced in error message + cleanup still happens.
- Exit 0 but missing WAV → rejection.
- Exit 0 but empty WAV → rejection.
- Empty text → synchronous reject, never spawns.
- Text > 8000 chars → synchronous reject, never spawns.
- Voice mismatch → reject, never spawns.
- Voice match → resolves.
- findFirstPiperModel: null dir, missing dir, paired-config check, non-.onnx skipped, orphan .onnx (no config) skipped.
9 new API tests in voice-routes.test.ts covering:
- Updated /capabilities response shape (legacy + new stt/tts blocks).
- /synthesize happy path (base64 WAV + voice in response).
- Voice option forwarded to port.
- 503 when piper unavailable.
- 400 for missing userId / empty text.
- 413 for text exceeding the 8000-char ceiling.
All mocks use node:child_process + node:fs stubs so tests run hermetically with no piper on the host.
Full workspace: 70/70 turbo tasks green (pnpm test).
pnpm build --concurrency=1 clean.
@skytwin/embedded-llm: 86 tests passing (+15 new). @skytwin/api: 551 tests passing (+9 new).
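
The status codes exercised by the API tests above (400, 413, 503) can be sketched as a single pure check. This is an illustrative sketch, not the route's code: `checkSynthesizeRequest`, the error strings, and the order in which the checks run (input validation before backend availability) are assumptions.

```typescript
const MAX_TEXT_CHARS = 8000; // ceiling stated in the PR

type SynthesizeCheck =
  | { ok: true }
  | { ok: false; status: 400 | 413 | 503; error: string };

// Hypothetical helper: maps a request body plus backend availability to a status.
function checkSynthesizeRequest(body: unknown, ttsAvailable: boolean): SynthesizeCheck {
  const b = (body ?? {}) as Record<string, unknown>;
  const userId = b["userId"];
  const text = b["text"];
  if (typeof userId !== "string" || userId.length === 0) {
    return { ok: false, status: 400, error: "userId is required" };
  }
  if (typeof text !== "string" || text.length === 0) {
    return { ok: false, status: 400, error: "text is required" };
  }
  if (text.length > MAX_TEXT_CHARS) {
    return { ok: false, status: 413, error: `text exceeds ${MAX_TEXT_CHARS} chars` };
  }
  // Assumed ordering: only report 503 once the request itself is valid.
  if (!ttsAvailable) {
    return { ok: false, status: 503, error: "TTS backend unavailable" };
  }
  return { ok: true };
}

const statusOf = (r: SynthesizeCheck): number => (r.ok ? 200 : r.status);

console.log(statusOf(checkSynthesizeRequest({ userId: "u1", text: "hello" }, true)));
console.log(statusOf(checkSynthesizeRequest({ userId: "u1", text: "x".repeat(8001) }, true)));
```

Keeping the check pure makes the boundary cases (exactly 8000 chars versus 8001) easy to test without spawning anything.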

Notes for reviewers

- The --quiet flag on piper suppresses its banner + progress output on stderr so the diagnostics tail we collect on failure isn't 90% noise. Same hygiene as the whisper-cli -np -nt flags.
- Voice mismatch hard-fails rather than silently substituting. Re-instantiating the backend with a matching modelPath is the documented way to switch voices.
- The stt/tts nested blocks in /capabilities are additive; legacy clients that read body.available / body.supportedFormats keep working unchanged.
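
The additive shape of `/capabilities` can be sketched as follows. The top-level `available` and `supportedFormats` fields come from the note above; the fields inside the nested `stt` and `tts` blocks, the helper names, and the sample values are assumptions for illustration only.

```typescript
// Assumed contents of the nested blocks; the PR does not spell these out.
interface SttCaps { available: boolean; supportedFormats: string[] }
interface TtsCaps { available: boolean; voices: string[] }

interface CapabilitiesBody {
  // Legacy STT-shaped fields, kept at the top level for older clients.
  available: boolean;
  supportedFormats: string[];
  // Additive nested blocks for newer clients.
  stt: SttCaps;
  tts: TtsCaps;
}

// Hypothetical builder: legacy fields mirror the STT block.
function buildCapabilitiesBody(stt: SttCaps, tts: TtsCaps): CapabilitiesBody {
  return {
    available: stt.available,
    supportedFormats: [...stt.supportedFormats],
    stt,
    tts,
  };
}

// A legacy client that only knows the top-level fields.
function legacyCanTranscribe(
  body: { available: boolean; supportedFormats: string[] },
  format: string,
): boolean {
  return body.available && body.supportedFormats.includes(format);
}

const caps = buildCapabilitiesBody(
  { available: true, supportedFormats: ["wav"] },
  { available: false, voices: [] },
);
console.log(legacyCanTranscribe(caps, "wav"), caps.tts.available);
```

Because the legacy client's parameter type is a structural subset of `CapabilitiesBody`, adding the nested blocks does not change what it reads.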

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds an embedded Piper TTS implementation to @skytwin/embedded-llm and exposes it via a new /api/voice/synthesize route, extending /api/voice/capabilities to report both STT and TTS capabilities while preserving the legacy STT-shaped fields.

Changes:

- Introduce PiperTtsBackend (+ findFirstPiperModel) and a new createEmbeddedTtsPort() factory that mirrors the existing embedded STT/text port resolution flow.
- Add POST /api/voice/synthesize and extend GET /api/voice/capabilities/:userId to return nested stt/tts blocks (legacy fields preserved).
- Add comprehensive unit tests for the new backend and API routes; document the feature in the changelog.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| packages/embedded-llm/src/piper-tts-backend.ts | New Piper CLI-backed TTS port + model discovery helper. |
| packages/embedded-llm/src/index.ts | Re-export new TTS backend/helper and `createEmbeddedTtsPort`. |
| packages/embedded-llm/src/factory.ts | Add `createEmbeddedTtsPort()` to resolve Piper binary + model or fall back to Null port. |
| packages/embedded-llm/src/tests/piper-tts-backend.test.ts | Unit coverage for Piper backend behavior, error paths, and model discovery. |
| apps/api/src/routes/voice.ts | Add `/synthesize` route and extend `/capabilities` to include TTS. |
| apps/api/src/tests/voice-routes.test.ts | Add API tests for new synth route and expanded capabilities shape. |
| CHANGELOG.md | Document the new Piper TTS backend and API surface. |


 export function createVoiceRouter(): Router {
  const router = Router();
  bindUserIdParamOwnership(router);



+
+  private spawnPiper(args: string[], stdinText: string): Promise<void> {
+    return new Promise((resolve, reject) => {
+      const child = spawn(this.binaryPath, args, { stdio: ['pipe', 'pipe', 'pipe'] });


+      child.stdin?.write(stdinText);
+      child.stdin?.end();


+      res.json({
+        audioBase64: wav.toString('base64'),
+        durationBytes: wav.length,
+        voice: opts.voice ?? port.capabilities.voices[0] ?? '',
+      });


Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

 export function createVoiceRouter(): Router {
  const router = Router();
  bindUserIdParamOwnership(router);



+ * Backed by `createEmbeddedSttPort()` + `createEmbeddedTtsPort()` from
+ * `@skytwin/embedded-llm`. Ports return their Null* fallbacks when the
+ * corresponding binary isn't installed — those throw `NotAvailableError`
+ * on use, which we surface as 503 so the client can fall back to a
+ * manual transcript / silent text rendering.
 *
- * Why the binary lives behind a single port: the same backend serves
+ * Why the binaries live behind a single port: the same backend serves
 * desktop voice-first (#194 Child 4) and mobile voice (#179). Both
- * clients POST audio here; one place to install/upgrade the model.
+ * clients POST here; one place to install/upgrade the model.


 function getPort(): Promise<EmbeddedSttPort> {
-  if (cachedPort === null) cachedPort = createEmbeddedSttPort();
-  return cachedPort;
+  if (cachedSttPort === null) cachedSttPort = createEmbeddedSttPort();
+  return cachedSttPort;
+}
+
+function getTtsPort(): Promise<EmbeddedTtsPort> {
+  if (cachedTtsPort === null) cachedTtsPort = createEmbeddedTtsPort();
+  return cachedTtsPort;
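
The lazy, per-port caching in the hunk above can be illustrated generically. This is a sketch under assumptions: `lazyAsync`, the `*Sketch` getter names, and the stand-in factories are hypothetical and do not appear in the PR.

```typescript
// Hypothetical helper: create the value on first call, then reuse the same promise.
function lazyAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) cached = factory();
    return cached;
  };
}

// Stand-in factories that count how often each port is created.
let sttCreations = 0;
let ttsCreations = 0;
const getSttPortSketch = lazyAsync(async () => ({ kind: "stt" as const, id: ++sttCreations }));
const getTtsPortSketch = lazyAsync(async () => ({ kind: "tts" as const, id: ++ttsCreations }));

// Two calls to the TTS getter share one promise; the STT port is never created.
const firstTts = getTtsPortSketch();
const secondTts = getTtsPortSketch();
console.log(firstTts === secondTts, ttsCreations, sttCreations);
void getSttPortSketch; // referenced only to show the two caches are independent
```

One property worth noting with this pattern in general: caching the promise also caches a rejection, so a factory that can fail transiently may need to clear the cache on error.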


+## [unreleased] — Piper TTS backend + `/api/voice/synthesize` route (#187 AC#4)
+
+Closes #187 AC#4. Mirrors the proven spawn pattern of
+`LlamaCppTextBackend` and `WhisperCppSttBackend`. Three pieces:



jayzalowitz · 2026-05-12T02:54:07Z

Round-2 review reply:

Already fixed in round 1 (037938c):

- /api/voice ownership: mounted with requireOwnership (apps/api/src/index.ts:242). Body userId is enforced.

Round-2 findings fixed in 4ffac42:

- getPort() renamed to getSttPort() to disambiguate from the new TTS port.
- "single port" wording in the header doc replaced, since there are two ports now.
- CHANGELOG "Three pieces:" → "Four pieces:" to match the actual list.
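
The body-userId enforcement mentioned in the round-1 item can be pictured with a framework-free sketch. Nothing here reflects how `requireOwnership` is actually written: the function name `checkBodyOwnership` and the 401/403 status codes are assumptions chosen for illustration.

```typescript
type OwnershipResult =
  | { allowed: true }
  | { allowed: false; status: 401 | 403 };

// Hypothetical check: the userId in a POST body must match the authenticated session.
function checkBodyOwnership(sessionUserId: string | undefined, body: unknown): OwnershipResult {
  // No authenticated session at all (status code is an assumption).
  if (!sessionUserId) return { allowed: false, status: 401 };
  const claimed = ((body ?? {}) as Record<string, unknown>)["userId"];
  // A body claiming a different user, or no user, is rejected (status code is an assumption).
  if (typeof claimed !== "string" || claimed !== sessionUserId) {
    return { allowed: false, status: 403 };
  }
  return { allowed: true };
}

console.log(checkBodyOwnership("alice", { userId: "alice", text: "hi" }));
console.log(checkBodyOwnership("alice", { userId: "bob", text: "hi" }));
```

A path-param guard alone would never look at the second request's body, which is the gap the round-1 fix describes.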

Mirrors the proven spawn pattern of LlamaCppTextBackend and WhisperCppSttBackend.

- PiperTtsBackend implements EmbeddedTtsPort. Spawns piper with --model <model.onnx> --output_file <tmp> --quiet, writes text to stdin, reads the resulting WAV into a Buffer on success. Cleans up the tempdir on both success and failure. Bounded inputs (max 8000 chars; mismatched voice request fails hard).
- findFirstPiperModel(dir) locates the first .onnx model with a paired .onnx.json config (Piper requires both). Catches "stray .onnx, missing config" at boot instead of at synth time.
- createEmbeddedTtsPort() factory mirrors createEmbeddedSttPort: probe the runtime detector for a piper binary, resolve a voice model, fall back to NullEmbeddedTtsPort when either is missing.
- POST /api/voice/synthesize consumer. Body { userId, text, voice? } → { audioBase64, durationBytes, voice }. 503 + hint when piper not installed. GET /capabilities now reports stt + tts blocks alongside legacy STT-shaped fields so older clients keep working.

Tests: 15 new vitest cases for PiperTtsBackend + findFirstPiperModel (mocked node:child_process + node:fs so they run with no piper on the host). 9 new API tests for /synthesize + the updated capabilities shape. Workspace: 70/70 turbo tasks green; build clean.

Out of scope (follow-ups): bundling the piper binary + a default voice model (joins #187 AC#1 + #188 distribution work); auto-speaking briefings (UI follow-up; the backend it needs is now in main).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… newline, audioBytes naming

Four Copilot findings on PR #255 addressed:

1. /api/voice mount: now goes through requireOwnership so POST /transcribe and /synthesize body-userId is checked against the authenticated session. The in-router bindUserIdParamOwnership only covered :userId path params; body POSTs were unprotected.
2. PiperTtsBackend.spawnPiper: stdout switched from 'pipe' to 'ignore'. The WAV is read from --output_file, not stdout; leaving stdout piped without consuming it could block piper once the OS pipe buffer filled. Matches whisper-cli pattern.
3. Piper stdin now gets a trailing \n so the newline-delimited reader treats the input as one complete utterance. Test updated.
4. /api/voice/synthesize response: durationBytes → audioBytes. "Duration" implied seconds; the value is a byte count of the WAV. New endpoint, no compat concern.

Test plan: embedded-llm 86/86, api 551/551 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
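
Findings 2 and 3 in this commit (ignored stdout, trailing newline on stdin) can be sketched together. This is an illustrative sketch, not the PR's `spawnPiper`: the helper names `buildPiperArgs`, `toUtterance`, and `runPiperSketch`, the 2000-character stderr tail, and the choice not to double an existing newline are assumptions. The three CLI flags are the ones named in the PR.

```typescript
import { spawn } from "node:child_process";

// Flags as listed in the PR description.
function buildPiperArgs(modelPath: string, outputFile: string): string[] {
  return ["--model", modelPath, "--output_file", outputFile, "--quiet"];
}

// Ensure the text ends with exactly one newline terminator (assumed behavior).
function toUtterance(text: string): string {
  return text.endsWith("\n") ? text : `${text}\n`;
}

// Hypothetical runner: resolves on exit code 0, rejects with a stderr tail otherwise.
function runPiperSketch(binaryPath: string, args: string[], text: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // stdout is ignored: the WAV goes to --output_file, so nothing would drain a pipe.
    const child = spawn(binaryPath, args, { stdio: ["pipe", "ignore", "pipe"] });
    let stderrTail = "";
    child.stderr?.on("data", (chunk: Buffer) => {
      // Keep only the last 2000 characters for diagnostics (size is an assumption).
      stderrTail = (stderrTail + chunk.toString("utf8")).slice(-2000);
    });
    child.on("error", reject); // e.g. binary not found
    child.stdin?.on("error", reject); // e.g. the child exited before reading stdin
    child.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`piper exited with code ${code}: ${stderrTail}`));
    });
    child.stdin?.write(toUtterance(text));
    child.stdin?.end();
  });
}

console.log(buildPiperArgs("voice.onnx", "/tmp/out.wav").join(" "));
console.log(JSON.stringify(toUtterance("hello")));
void runPiperSketch; // not invoked here; it needs a piper binary on the host
```

The caller would read the output file after the promise resolves and remove the temp directory in a `finally` block, matching the cleanup on both success and failure that the PR describes.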

…ing, fix CHANGELOG pieces count

Three Copilot round-2 findings on PR #255 addressed:

1. getPort() renamed to getSttPort() throughout the file. With both STT and TTS ports cached in this router, "getPort" was ambiguous and could lead to accidentally calling the wrong cached port as the file evolves.
2. Header doc updated: removed the "binaries live behind a single port" wording, since there are two ports now (STT + TTS). The new wording reflects the actual architecture.
3. CHANGELOG entry "Three pieces:" → "Four pieces:" to match the actual list (backend, findFirstPiperModel, factory, API route).

Test plan: api 551/551 green; build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 11, 2026 15:22

Copilot started reviewing on behalf of jayzalowitz May 11, 2026 15:23

Copilot AI reviewed May 11, 2026

jayzalowitz requested a review from Copilot May 11, 2026 23:11

Copilot started reviewing on behalf of jayzalowitz May 11, 2026 23:12

Copilot AI reviewed May 11, 2026

jayzalowitz and others added 3 commits May 12, 2026 00:47

jayzalowitz force-pushed the jayzalowitz/issue-187-ac4-piper-tts branch from 4ffac42 to 8536a98 May 12, 2026 04:47

jayzalowitz merged commit c32de57 into main May 12, 2026
7 checks passed

This was referenced May 13, 2026

docs: sync README + CLAUDE.md with v0.6.18-0.6.21 merge sweep #273

Merged

Capability loop #N: Embedded LLM with auto-upgrade path (Phi/Llama/Qwen via llama.cpp; Whisper-tiny STT; Piper TTS) #187

Closed

jayzalowitz merged 3 commits into main from jayzalowitz/issue-187-ac4-piper-tts
