feat(#187 AC#4): Piper TTS backend + /api/voice/synthesize route #255

Merged
jayzalowitz merged 3 commits into main from jayzalowitz/issue-187-ac4-piper-tts
May 12, 2026

@jayzalowitz

Owner

Summary

Closes #187 AC#4. Mirrors the proven spawn pattern of LlamaCppTextBackend and WhisperCppSttBackend — same shape, same diagnostics, same Null* fallback when the binary isn't installed.

Takes #187 from 7/8 → 8/8 closed for code-bound ACs. The last remaining AC (#1 bundling) is distribution work paired with #188.

What changed

New backend (@skytwin/embedded-llm)

  • piper-tts-backend.ts → PiperTtsBackend implements EmbeddedTtsPort. Spawns piper --model <model.onnx> --output_file <tmp> --quiet, writes text to stdin, reads the resulting WAV into a Buffer on successful exit. Cleans up the tempdir on both success and failure. Bounded inputs: text required, max 8000 chars; mismatched voice request fails hard rather than silently substituting.
  • findFirstPiperModel(dir) locates the first .onnx voice model with a paired .onnx.json config (Piper requires both). The pairing check is what differentiates a usable voice from a stray .onnx someone dropped in — catching it at detection keeps the failure visible at boot rather than at synth time.
  • factory.ts → createEmbeddedTtsPort(overrides?) mirrors createEmbeddedSttPort: probe runtime-detector for a piper binary (env-var override → PATH lookup), then resolve a voice model (env-var override → first valid pair in the configured model dir). Falls back to NullEmbeddedTtsPort when either is missing.
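
The pairing rule behind findFirstPiperModel can be sketched as a pure function. This is a minimal illustration, not the PR's implementation: `pickFirstPairedModel` is a hypothetical name, it takes a list of file names instead of reading a directory, and the alphabetical sort is an assumption added here so that "first" is deterministic.

```typescript
// Hypothetical helper: the PR's findFirstPiperModel(dir) reads a directory;
// this sketch receives the file names directly so it runs without disk access.
function pickFirstPairedModel(fileNames: readonly string[]): string | null {
  const names = new Set(fileNames);
  // Assumption: sort alphabetically so the "first" model is deterministic.
  const candidates = [...fileNames].filter((n) => n.endsWith(".onnx")).sort();
  for (const model of candidates) {
    // Piper needs both <voice>.onnx and <voice>.onnx.json; skip orphans.
    if (names.has(`${model}.json`)) return model;
  }
  return null;
}
```

With the input `["a.onnx", "b.onnx", "b.onnx.json"]` this returns `"b.onnx"`, because `a.onnx` has no paired config and is skipped.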

New API consumer (apps/api/src/routes/voice.ts)

  • POST /api/voice/synthesize — body { userId, text, voice? } → response { audioBase64, durationBytes, voice }. Base64 instead of binary so it goes through the same JSON envelope the rest of the API uses (the mobile client + web dashboard both decode base64 → Blob/audio element).
  • 503 + hint when no piper binary is on PATH, matching the STT path's shape. Same recovery message (install piper-tts and an .onnx voice model, or set SKYTWIN_PIPER_BIN + SKYTWIN_PIPER_MODEL).
  • GET /api/voice/capabilities now reports stt + tts capability blocks alongside the legacy STT-shaped fields so older clients keep working.
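
As a rough sketch of the request validation this route implies, the function below maps a `{ userId, text, voice? }` body to the 400 and 413 statuses listed in the test plan. `validateSynthesizeBody` and the error strings are hypothetical, and rejecting a non-string `voice` with 400 is an assumption; the 503 case is not shown because it depends on the port rather than on the body.

```typescript
const MAX_TEXT_CHARS = 8000; // ceiling stated in this PR

type Validation =
  | { ok: true; userId: string; text: string; voice?: string }
  | { ok: false; status: 400 | 413; error: string };

// Hypothetical validator; the real route may structure this differently.
function validateSynthesizeBody(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "body must be a JSON object" };
  }
  const { userId, text, voice } = body as Record<string, unknown>;
  if (typeof userId !== "string" || userId.length === 0) {
    return { ok: false, status: 400, error: "userId is required" };
  }
  if (typeof text !== "string" || text.length === 0) {
    return { ok: false, status: 400, error: "text is required" };
  }
  if (text.length > MAX_TEXT_CHARS) {
    return { ok: false, status: 413, error: "text exceeds 8000 characters" };
  }
  if (voice === undefined) return { ok: true, userId, text };
  // Assumption: a non-string voice is treated as a malformed request.
  if (typeof voice !== "string") {
    return { ok: false, status: 400, error: "voice must be a string" };
  }
  return { ok: true, userId, text, voice };
}
```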

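What this PR deliberately does NOT do

  • Bundle the piper binary or a default voice model. That joins the #187 AC#1 + #188 distribution work.
  • Auto-speak briefings. That is a UI follow-up; this PR only adds the backend it needs.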

Test plan

  • 15 new vitest cases in piper-tts-backend.test.ts covering:
    • Capabilities (available=true, voice list).
    • Happy path: stdin write + spawn args + WAV read + tempdir cleanup.
    • Non-zero exit: stderr tail surfaced in error message + cleanup still happens.
    • Exit 0 but missing WAV → rejection.
    • Exit 0 but empty WAV → rejection.
    • Empty text → synchronous reject, never spawns.
    • Text > 8000 chars → synchronous reject, never spawns.
    • Voice mismatch → reject, never spawns.
    • Voice match → resolves.
    • findFirstPiperModel: null dir, missing dir, paired-config check, non-.onnx skipped, orphan .onnx (no config) skipped.
  • 9 new API tests in voice-routes.test.ts covering:
    • Updated /capabilities response shape (legacy + new stt/tts blocks).
    • /synthesize happy path (base64 WAV + voice in response).
    • Voice option forwarded to port.
    • 503 when piper unavailable.
    • 400 for missing userId / empty text.
    • 413 for text exceeding the 8000-char ceiling.
  • All mocks use node:child_process + node:fs stubs so tests run hermetically with no piper on the host.
  • Full workspace: 70/70 turbo tasks green (pnpm test).
  • pnpm build --concurrency=1 clean.
  • @skytwin/embedded-llm: 86 tests passing (+15 new). @skytwin/api: 551 tests passing (+9 new).
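
The "never spawns" cases above can be illustrated without vitest by injecting the process runner. This is a sketch under assumptions: `createSynthesizer`, `RunPiper`, and the voice name `en_US-test` are hypothetical, and the guard throws synchronously here, whereas the real port may reject a promise instead.

```typescript
type RunPiper = (text: string) => Promise<Uint8Array>;
type Synthesize = (text: string, voice?: string) => Promise<Uint8Array>;

// Hypothetical factory: validates input before the injected runner is touched.
function createSynthesizer(loadedVoice: string, runPiper: RunPiper): Synthesize {
  return (text, voice) => {
    if (text.length === 0) throw new Error("text is required");
    if (text.length > 8000) throw new Error("text exceeds 8000 characters");
    if (voice !== undefined && voice !== loadedVoice) {
      // Hard-fail on mismatch rather than silently substituting a voice.
      throw new Error(`voice "${voice}" is not loaded`);
    }
    return runPiper(text);
  };
}

// A counting fake stands in for the real child process.
let spawnCount = 0;
const fakePiper: RunPiper = async () => {
  spawnCount += 1;
  return new Uint8Array([0x52, 0x49, 0x46, 0x46]); // the ASCII bytes "RIFF"
};
const synth = createSynthesizer("en_US-test", fakePiper);
```

Because every guard runs before `runPiper` is called, a test can assert that `spawnCount` stays at zero for each rejected input.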

Notes for reviewers

  • The --quiet flag on piper suppresses its banner + progress output on stderr so the diagnostics tail we collect on failure isn't 90% noise — same hygiene as the whisper-cli -np -nt flags.
  • Voice mismatch hard-fails rather than silently substituting. Re-instantiating the backend with a matching modelPath is the documented way to switch voices.
  • The stt/tts nested blocks in /capabilities are additive; legacy clients that read body.available / body.supportedFormats keep working unchanged.
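
One way to picture the additive /capabilities shape is the sketch below. Only `available`, `supportedFormats`, and the `stt`/`tts` block names come from this PR; `buildCapabilitiesBody` and the exact fields inside each nested block are assumptions made for illustration.

```typescript
interface SttCaps { available: boolean; supportedFormats: string[] }
interface TtsCaps { available: boolean; voices: string[] }

// Hypothetical builder: legacy top-level fields stay, nested blocks are added.
function buildCapabilitiesBody(stt: SttCaps, tts: TtsCaps) {
  return {
    // Legacy STT-shaped fields that older clients already read.
    available: stt.available,
    supportedFormats: stt.supportedFormats,
    // New nested blocks; adding them does not change the fields above.
    stt: { ...stt },
    tts: { ...tts },
  };
}
```

A legacy client that reads only `body.available` and `body.supportedFormats` sees the same values it saw before, regardless of what the `tts` block reports.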

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 11, 2026 15:22

Copilot AI left a comment

Pull request overview

Adds an embedded Piper TTS implementation to @skytwin/embedded-llm and exposes it via a new /api/voice/synthesize route, extending /api/voice/capabilities to report both STT and TTS capabilities while preserving the legacy STT-shaped fields.

Changes:

  • Introduce PiperTtsBackend (+ findFirstPiperModel) and a new createEmbeddedTtsPort() factory that mirrors the existing embedded STT/text port resolution flow.
  • Add POST /api/voice/synthesize and extend GET /api/voice/capabilities/:userId to return nested stt/tts blocks (legacy fields preserved).
  • Add comprehensive unit tests for the new backend and API routes; document the feature in the changelog.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/embedded-llm/src/piper-tts-backend.ts | New Piper CLI-backed TTS port + model discovery helper. |
| packages/embedded-llm/src/index.ts | Re-export new TTS backend/helper and createEmbeddedTtsPort. |
| packages/embedded-llm/src/factory.ts | Add createEmbeddedTtsPort() to resolve Piper binary + model or fall back to Null port. |
| packages/embedded-llm/src/tests/piper-tts-backend.test.ts | Unit coverage for Piper backend behavior, error paths, and model discovery. |
| apps/api/src/routes/voice.ts | Add /synthesize route and extend /capabilities to include TTS. |
| apps/api/src/tests/voice-routes.test.ts | Add API tests for new synth route and expanded capabilities shape. |
| CHANGELOG.md | Document the new Piper TTS backend and API surface. |

Comment on lines 58 to 61
export function createVoiceRouter(): Router {
  const router = Router();
  bindUserIdParamOwnership(router);

private spawnPiper(args: string[], stdinText: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(this.binaryPath, args, { stdio: ['pipe', 'pipe', 'pipe'] });

Comment on lines +170 to +171
child.stdin?.write(stdinText);
child.stdin?.end();

Comment on lines +191 to +195
res.json({
  audioBase64: wav.toString('base64'),
  durationBytes: wav.length,
  voice: opts.voice ?? port.capabilities.voices[0] ?? '',
});

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines 58 to 61
export function createVoiceRouter(): Router {
  const router = Router();
  bindUserIdParamOwnership(router);

Comment thread apps/api/src/routes/voice.ts Outdated
Comment on lines +20 to +28
* Backed by `createEmbeddedSttPort()` + `createEmbeddedTtsPort()` from
* `@skytwin/embedded-llm`. Ports return their Null* fallbacks when the
* corresponding binary isn't installed — those throw `NotAvailableError`
* on use, which we surface as 503 so the client can fall back to a
* manual transcript / silent text rendering.
*
* Why the binary lives behind a single port: the same backend serves
* Why the binaries live behind a single port: the same backend serves
* desktop voice-first (#194 Child 4) and mobile voice (#179). Both
* clients POST audio here; one place to install/upgrade the model.
* clients POST here; one place to install/upgrade the model.
Comment thread apps/api/src/routes/voice.ts Outdated
Comment on lines +34 to +41
function getPort(): Promise<EmbeddedSttPort> {
if (cachedPort === null) cachedPort = createEmbeddedSttPort();
return cachedPort;
if (cachedSttPort === null) cachedSttPort = createEmbeddedSttPort();
return cachedSttPort;
}

function getTtsPort(): Promise<EmbeddedTtsPort> {
if (cachedTtsPort === null) cachedTtsPort = createEmbeddedTtsPort();
return cachedTtsPort;
Comment thread CHANGELOG.md Outdated
## [unreleased] — Piper TTS backend + `/api/voice/synthesize` route (#187 AC#4)

Closes #187 AC#4. Mirrors the proven spawn pattern of
`LlamaCppTextBackend` and `WhisperCppSttBackend`. Three pieces:
jayzalowitz added a commit that referenced this pull request May 11, 2026
… newline, audioBytes naming

Four Copilot findings on PR #255 addressed:

1. /api/voice mount: now goes through requireOwnership so POST
   /transcribe and /synthesize body-userId is checked against the
   authenticated session. The in-router bindUserIdParamOwnership
   only covered :userId path params; body POSTs were unprotected.

2. PiperTtsBackend.spawnPiper: stdout switched from 'pipe' to
   'ignore'. The WAV is read from --output_file, not stdout —
   leaving stdout piped without consuming it could block piper
   once the OS pipe buffer filled. Matches whisper-cli pattern.

3. Piper stdin now gets a trailing \n so the newline-delimited
   reader treats the input as one complete utterance. Test updated.

4. /api/voice/synthesize response: durationBytes → audioBytes.
   "Duration" implied seconds; the value is a byte count of the WAV.
   New endpoint, no compat concern.

Test plan: embedded-llm 86/86, api 551/551 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
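
Findings 2 and 3 can be summarized as a small pure function that builds what gets passed to the spawn call. This is a sketch, not the code in PiperTtsBackend: `buildPiperInvocation` and `PiperInvocation` are hypothetical names, and skipping the extra newline when the text already ends with one is an assumption. The flags and the stdio layout are the ones described in this PR.

```typescript
type StdioMode = "pipe" | "ignore";

interface PiperInvocation {
  args: string[];
  stdio: [StdioMode, StdioMode, StdioMode]; // [stdin, stdout, stderr]
  stdinPayload: string;
}

// Hypothetical helper collecting the spawn inputs in one place.
function buildPiperInvocation(
  modelPath: string,
  outputFile: string,
  text: string,
): PiperInvocation {
  return {
    args: ["--model", modelPath, "--output_file", outputFile, "--quiet"],
    // stdout is ignored: the WAV comes from --output_file, and an unread
    // stdout pipe could block the child once the OS pipe buffer fills.
    stdio: ["pipe", "ignore", "pipe"],
    // Trailing newline so a newline-delimited reader sees one full utterance.
    // Assumption: do not add a second newline if one is already present.
    stdinPayload: text.endsWith("\n") ? text : `${text}\n`,
  };
}
```

The `stdio` tuple has the shape accepted by the `stdio` option of Node's `child_process.spawn`, with stderr kept piped so a diagnostics tail can still be collected on failure.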
jayzalowitz added a commit that referenced this pull request May 12, 2026
…ing, fix CHANGELOG pieces count

Three Copilot round-2 findings on PR #255 addressed:

1. getPort() renamed to getSttPort() throughout the file. With both
   STT and TTS ports cached in this router, "getPort" was ambiguous
   and could lead to accidentally calling the wrong cached port as
   the file evolves.

2. Header doc updated: removed the "binaries live behind a single
   port" wording — there are two ports now (STT + TTS). The new
   wording reflects the actual architecture.

3. CHANGELOG entry "Three pieces:" → "Four pieces:" to match the
   actual list (backend, findFirstPiperModel, factory, API route).

Test plan: api 551/551 green; build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
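
The two-getter arrangement behind finding 1 can be sketched as lazily cached promises. This is an illustration only: `createPortGetters` and the placeholder `SttPort`/`TtsPort` interfaces are hypothetical, and the injected factories stand in for createEmbeddedSttPort and createEmbeddedTtsPort.

```typescript
// Placeholder port shapes; the real EmbeddedSttPort/EmbeddedTtsPort differ.
interface SttPort { kind: "stt" }
interface TtsPort { kind: "tts" }

// Hypothetical wrapper: each port is created at most once and then reused.
function createPortGetters(
  createStt: () => Promise<SttPort>,
  createTts: () => Promise<TtsPort>,
) {
  let cachedSttPort: Promise<SttPort> | null = null;
  let cachedTtsPort: Promise<TtsPort> | null = null;
  return {
    getSttPort(): Promise<SttPort> {
      if (cachedSttPort === null) cachedSttPort = createStt();
      return cachedSttPort;
    },
    getTtsPort(): Promise<TtsPort> {
      if (cachedTtsPort === null) cachedTtsPort = createTts();
      return cachedTtsPort;
    },
  };
}
```

Caching the promise rather than the resolved port means concurrent first requests share a single factory call, and the distinct return types make it a compile-time error to use one port where the other is expected.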
@jayzalowitz

Owner Author

Round-2 review reply:

Already fixed in round 1 (037938c):

  • /api/voice ownership: mounted with requireOwnership (apps/api/src/index.ts:242). Body userId is enforced.
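
For readers unfamiliar with the gap being closed, the body-level check can be pictured roughly as follows. This is not the requireOwnership implementation, which is not shown in this PR: `checkBodyOwnership` is a hypothetical name, and the 401 and 403 status codes are assumptions.

```typescript
type OwnershipResult =
  | { allowed: true }
  | { allowed: false; status: 401 | 403 };

// Hypothetical check: compare the body's userId with the session's user.
function checkBodyOwnership(
  sessionUserId: string | null,
  body: unknown,
): OwnershipResult {
  // Assumption: no authenticated session maps to 401.
  if (sessionUserId === null) return { allowed: false, status: 401 };
  const bodyUserId =
    typeof body === "object" && body !== null
      ? (body as Record<string, unknown>)["userId"]
      : undefined;
  // Assumption: a missing or mismatched userId maps to 403.
  if (bodyUserId !== sessionUserId) return { allowed: false, status: 403 };
  return { allowed: true };
}
```

A check that only inspects `:userId` path params never reaches this comparison for POST bodies, which is the gap the round-1 fix addressed.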

Round-2 findings fixed in 4ffac42:

  • getPort() renamed to getSttPort() to disambiguate from the new TTS port.
  • "single port" wording in the header doc replaced — there are two ports now.
  • CHANGELOG "Three pieces:" → "Four pieces:" to match the actual list.

jayzalowitz and others added 3 commits May 12, 2026 00:47
Mirrors the proven spawn pattern of LlamaCppTextBackend and
WhisperCppSttBackend.

- PiperTtsBackend implements EmbeddedTtsPort. Spawns piper with
  --model <model.onnx> --output_file <tmp> --quiet, writes text to
  stdin, reads the resulting WAV into a Buffer on success. Cleans up
  the tempdir on both success and failure. Bounded inputs (max 8000
  chars; mismatched voice request fails hard).

- findFirstPiperModel(dir) locates the first .onnx model with a paired
  .onnx.json config (Piper requires both). Catches "stray .onnx,
  missing config" at boot instead of at synth time.

- createEmbeddedTtsPort() factory mirrors createEmbeddedSttPort: probe
  the runtime detector for a piper binary, resolve a voice model,
  fall back to NullEmbeddedTtsPort when either is missing.

- POST /api/voice/synthesize consumer. Body { userId, text, voice? }
  → { audioBase64, durationBytes, voice }. 503 + hint when piper not
  installed. GET /capabilities now reports stt + tts blocks alongside
  legacy STT-shaped fields so older clients keep working.

Tests: 15 new vitest cases for PiperTtsBackend + findFirstPiperModel
(mocked node:child_process + node:fs so they run with no piper on
the host). 9 new API tests for /synthesize + the updated capabilities
shape. Workspace: 70/70 turbo tasks green; build clean.

Out of scope (follow-ups): bundling the piper binary + a default
voice model (joins #187 AC#1 + #188 distribution work); auto-speaking
briefings (UI follow-up; the backend it needs is now in main).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
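
The resolution order the factory follows (env-var override, then lookup, then Null fallback) can be sketched as a pure function over injected probes. `resolveTts`, `TtsProbe`, and `TtsResolution` are hypothetical names; the env-var names SKYTWIN_PIPER_BIN and SKYTWIN_PIPER_MODEL are the ones named in this PR. Whether the real factory also validates an env-supplied model path is not stated here, so this sketch does not.

```typescript
interface TtsProbe {
  env: Record<string, string | undefined>;
  findOnPath: (name: string) => string | null; // stand-in for the runtime detector
  firstPairedModel: () => string | null;       // stand-in for findFirstPiperModel
}

type TtsResolution =
  | { kind: "piper"; binaryPath: string; modelPath: string }
  | { kind: "null"; missing: "binary" | "model" };

// Hypothetical resolver mirroring the order described in this PR.
function resolveTts(probe: TtsProbe): TtsResolution {
  // Env-var override first, then PATH lookup. An empty string falls through.
  const binaryPath = probe.env["SKYTWIN_PIPER_BIN"] || probe.findOnPath("piper");
  if (!binaryPath) return { kind: "null", missing: "binary" };
  // Env-var override first, then the first valid .onnx + .onnx.json pair.
  const modelPath = probe.env["SKYTWIN_PIPER_MODEL"] || probe.firstPairedModel();
  if (!modelPath) return { kind: "null", missing: "model" };
  return { kind: "piper", binaryPath, modelPath };
}
```

Returning a tagged "null" result instead of throwing matches the idea that the caller gets a Null port and the failure surfaces as a 503 only when synthesis is actually requested.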
… newline, audioBytes naming

…ing, fix CHANGELOG pieces count

jayzalowitz force-pushed the jayzalowitz/issue-187-ac4-piper-tts branch from 4ffac42 to 8536a98 on May 12, 2026 04:47
jayzalowitz merged commit c32de57 into main on May 12, 2026
7 checks passed