feat(#187 partial): real llama.cpp + whisper.cpp backends by jayzalowitz · Pull Request #238 · jayzalowitz/skytwin

jayzalowitz · 2026-05-09T03:15:56Z

Summary

Replaces NullEmbeddedTextPort/NullEmbeddedSttPort stubs in @skytwin/embedded-llm with real subprocess backends spawning llama-cli and whisper-cli.
Validated end-to-end against Homebrew-installed binaries (/opt/homebrew/bin/{llama,whisper}-cli) — confirmed real binary spawn, arg construction, exit-code handling, stderr-tail propagation.
36 new tests (package now 58 total). Tests mock child_process.spawn and fs so they never touch real binaries.

What ships

LlamaCppTextBackend: --no-display-prompt -no-cnv so stdout contains only generated tokens; strips [end of text] / <|im_end|> / <|endoftext|> / </s> markers; configurable timeoutMs (default 120s) and threads; non-zero exit → reject with last 5 stderr lines.
WhisperCppSttBackend: writes audio buffer to mkdtemp dir, runs whisper-cli with -oj JSON output, joins transcription[].text with single spaces, cleans up temp dir in finally (even on failure).
createEmbeddedTextPort() / createEmbeddedSttPort() factories: pick real vs Null based on detection. Model resolution: explicit override → SKYTWIN_LLAMA_MODEL/SKYTWIN_WHISPER_MODEL env var → first matching file in modelDir → Null fallback.

Closes for #187

AC#1 runtime (text gen subprocess layer)
AC#3 runtime (STT subprocess layer)

Bundling default GGUF, downloader UI, Piper TTS, and prompt-eval parity remain open.

Test plan

pnpm --filter @skytwin/embedded-llm build clean
pnpm --filter @skytwin/embedded-llm test — 58 passing
Smoke test with real /opt/homebrew/bin/llama-cli + bogus model path: surfaces llama_model_load: error loading model from binary stderr correctly
CI green on monorepo build/test

🤖 Generated with Claude Code

Replaces NullEmbeddedTextPort/NullEmbeddedSttPort stubs with subprocess backends spawning llama-cli and whisper-cli. Validated end-to-end against Homebrew-installed binaries (/opt/homebrew/bin/{llama,whisper}-cli). LlamaCppTextBackend: spawns llama-cli with --no-display-prompt + -no-cnv so stdout contains only generated tokens; strips end-of-text markers; configurable timeout (default 120s); rejects with last 5 stderr lines on non-zero exit. WhisperCppSttBackend: writes audio to mkdtemp dir, runs whisper-cli with -oj JSON output, joins transcription segments with single spaces, cleans up temp dir in finally. createEmbeddedTextPort/SttPort factories pick real vs Null based on detectEmbeddedRuntimes(); model resolution: explicit override → SKYTWIN_*_MODEL env var → first matching file in modelDir → Null fallback. 36 new unit tests; package total 58 tests, all passing. Tests mock spawn + fs so they never touch real binaries. Closes runtime portion of #187 AC#1 (text gen) and AC#3 (STT). Bundling, downloader UI, Piper TTS, and prompt-eval parity remain open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds real llama.cpp and whisper.cpp subprocess-backed implementations to @skytwin/embedded-llm, replacing the previous null/stub behavior and providing factories to select real vs. null ports based on runtime detection.

Changes:

Introduces LlamaCppTextBackend and WhisperCppSttBackend that spawn llama-cli / whisper-cli, parse output, handle timeouts, and surface stderr tails on failures.
Adds createEmbeddedTextPort() / createEmbeddedSttPort() factories with model resolution (override → env var → modelDir scan → null fallback).
Adds unit tests for the new backends/factory and documents the feature in CHANGELOG.md.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
packages/embedded-llm/src/whisper-cpp-backend.ts	Implements whisper.cpp STT backend (temp files, spawn, JSON parse, cleanup) + model scan helper
packages/embedded-llm/src/llama-cpp-backend.ts	Implements llama.cpp text backend (spawn, marker stripping) + GGUF model scan helper
packages/embedded-llm/src/index.ts	Re-exports new backends, helpers, and factories
packages/embedded-llm/src/factory.ts	Adds factories to choose real vs. null ports and resolve model paths
packages/embedded-llm/src/tests/whisper-cpp-backend.test.ts	Adds unit tests for whisper backend + helpers
packages/embedded-llm/src/tests/llama-cpp-backend.test.ts	Adds unit tests for llama backend + helper
packages/embedded-llm/src/tests/factory.test.ts	Adds unit tests for factories and resolution precedence
CHANGELOG.md	Documents the new embedded backends and usage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+export interface LlamaCppBackendOptions {
+  binaryPath: string;
+  modelPath: string;
+  contextWindow?: number;


+    this.capabilities = {
+      available: true,
+      modelName: basename(opts.modelPath),
+      contextWindow: opts.contextWindow ?? DEFAULT_CONTEXT_WINDOW,


+    const args = [
+      '-m', this.modelPath,
+      '-p', prompt,
+      '-n', String(maxTokens),
+      '--temp', String(temperature),
+      '--no-display-prompt',
+      '--no-warmup',
+      '-no-cnv',
+    ];


+}
+
+const DEFAULT_TIMEOUT_MS = 120_000;
+const SUPPORTED_FORMATS = ['wav', 'mp3', 'flac', 'ogg'];


+    const audioPath = join(work, 'input.wav');
+    const outBase = join(work, 'output');
+    writeFileSync(audioPath, audio);


+      const timer = setTimeout(() => {
+        if (settled) return;
+        settled = true;
+        child.kill('SIGKILL');
+        reject(new Error(`llama-cli timed out after ${this.timeoutMs}ms`));
+      }, this.timeoutMs);


+      const timer = setTimeout(() => {
+        if (settled) return;
+        settled = true;
+        child.kill('SIGKILL');
+        reject(new Error(`whisper-cli timed out after ${this.timeoutMs}ms`));
+      }, this.timeoutMs);


+    if (this.threads !== null) {
+      args.push('-t', String(this.threads));
+    }


+      const timer = setTimeout(() => {
+        if (settled) return;
+        settled = true;
+        child.kill('SIGKILL');
+        reject(new Error(`whisper-cli timed out after ${this.timeoutMs}ms`));
+      }, this.timeoutMs);
+
+      child.stderr?.on('data', (chunk) => { stderr += chunk.toString('utf8'); });
+      child.on('error', (err) => {
+        if (settled) return;
+        settled = true;
+        clearTimeout(timer);
+        reject(new Error(`failed to spawn whisper-cli: ${err.message}`));
+      });


+export interface CreatePortOverrides {
+  binaryPath?: string;
+  modelPath?: string;
+}


…+ voice STT route Five real implementations: - High-contrast theme variant (4th in the dropdown): pure b/w with bold yellow/blue accents, 2px borders, 3px focus rings. WCAG AAA-aimed. - Text-scale slider: 100/125/150/200% via data-text-scale on <html>; every rem-based size reflows. - Reduced-motion override: respects prefers-reduced-motion by default, user can force on/off via Settings. - Voice-first toggle: persists to localStorage, sets body class so CSS surfaces mic affordances. - /api/voice/transcribe + /api/voice/capabilities/:userId backed by createEmbeddedSttPort() from #238. Real Whisper STT exposed as a reusable HTTP endpoint (mobile #179 will consume it once recording lands; desktop voice-first does today). Sweep: - Skip-link injected as first body child (CSS hides until focused). - Global focus-visible ring across buttons, links, inputs. - prefers-reduced-motion: reduce honored across all themes. - New Accessibility card in Settings with text-size, animations, voice. 8 new voice route tests; api total 423 passing (24 skipped). Out of scope: mobile recording UI (#179 hardware-blocked), axe-core CI gate, multi-language i18n (separate slice). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…+ voice STT route (#244) Five real implementations: - High-contrast theme variant (4th in the dropdown): pure b/w with bold yellow/blue accents, 2px borders, 3px focus rings. WCAG AAA-aimed. - Text-scale slider: 100/125/150/200% via data-text-scale on <html>; every rem-based size reflows. - Reduced-motion override: respects prefers-reduced-motion by default, user can force on/off via Settings. - Voice-first toggle: persists to localStorage, sets body class so CSS surfaces mic affordances. - /api/voice/transcribe + /api/voice/capabilities/:userId backed by createEmbeddedSttPort() from #238. Real Whisper STT exposed as a reusable HTTP endpoint (mobile #179 will consume it once recording lands; desktop voice-first does today). Sweep: - Skip-link injected as first body child (CSS hides until focused). - Global focus-visible ring across buttons, links, inputs. - prefers-reduced-motion: reduce honored across all themes. - New Accessibility card in Settings with text-size, animations, voice. 8 new voice route tests; api total 423 passing (24 skipped). Out of scope: mobile recording UI (#179 hardware-blocked), axe-core CI gate, multi-language i18n (separate slice). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 9, 2026 03:15

Copilot started reviewing on behalf of jayzalowitz May 9, 2026 03:16 View session

Copilot AI reviewed May 9, 2026

View reviewed changes

This was referenced May 9, 2026

Capability loop #N: Embedded LLM with auto-upgrade path (Phi/Llama/Qwen via llama.cpp; Whisper-tiny STT; Piper TTS) #187

Closed

Capability loop #G: Mobile parity (Capabilities + Briefing + voice + push) #179

Closed

jayzalowitz merged commit fc8b7de into main May 9, 2026
10 of 11 checks passed

jayzalowitz deleted the feat/embedded-llm-real-backends branch May 9, 2026 03:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#187 partial): real llama.cpp + whisper.cpp backends#238

feat(#187 partial): real llama.cpp + whisper.cpp backends#238
jayzalowitz merged 1 commit into
mainfrom
feat/embedded-llm-real-backends

jayzalowitz commented May 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jayzalowitz commented May 9, 2026

Summary

What ships

Closes for #187

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants