Skip to content

feat(#187 partial): real llama.cpp + whisper.cpp backends#238

Merged
jayzalowitz merged 1 commit into
mainfrom
feat/embedded-llm-real-backends
May 9, 2026
Merged

feat(#187 partial): real llama.cpp + whisper.cpp backends#238
jayzalowitz merged 1 commit into
mainfrom
feat/embedded-llm-real-backends

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Summary

  • Replaces NullEmbeddedTextPort/NullEmbeddedSttPort stubs in @skytwin/embedded-llm with real subprocess backends spawning llama-cli and whisper-cli.
  • Validated end-to-end against Homebrew-installed binaries (/opt/homebrew/bin/{llama,whisper}-cli) — confirmed real binary spawn, arg construction, exit-code handling, stderr-tail propagation.
  • 36 new tests (package now 58 total). Tests mock child_process.spawn and fs so they never touch real binaries.

What ships

  • LlamaCppTextBackend: --no-display-prompt -no-cnv so stdout contains only generated tokens; strips [end of text] / <|im_end|> / <|endoftext|> / </s> markers; configurable timeoutMs (default 120s) and threads; non-zero exit → reject with last 5 stderr lines.
  • WhisperCppSttBackend: writes audio buffer to mkdtemp dir, runs whisper-cli with -oj JSON output, joins transcription[].text with single spaces, cleans up temp dir in finally (even on failure).
  • createEmbeddedTextPort() / createEmbeddedSttPort() factories: pick real vs Null based on detection. Model resolution: explicit override → SKYTWIN_LLAMA_MODEL/SKYTWIN_WHISPER_MODEL env var → first matching file in modelDir → Null fallback.

Closes for #187

  • AC#1 runtime (text gen subprocess layer)
  • AC#3 runtime (STT subprocess layer)

Bundling default GGUF, downloader UI, Piper TTS, and prompt-eval parity remain open.

Test plan

  • pnpm --filter @skytwin/embedded-llm build clean
  • pnpm --filter @skytwin/embedded-llm test — 58 passing
  • Smoke test with real /opt/homebrew/bin/llama-cli + bogus model path: surfaces llama_model_load: error loading model from binary stderr correctly
  • CI green on monorepo build/test

🤖 Generated with Claude Code

Replaces NullEmbeddedTextPort/NullEmbeddedSttPort stubs with subprocess
backends spawning llama-cli and whisper-cli. Validated end-to-end against
Homebrew-installed binaries (/opt/homebrew/bin/{llama,whisper}-cli).

LlamaCppTextBackend: spawns llama-cli with --no-display-prompt + -no-cnv so
stdout contains only generated tokens; strips end-of-text markers; configurable
timeout (default 120s); rejects with last 5 stderr lines on non-zero exit.

WhisperCppSttBackend: writes audio to mkdtemp dir, runs whisper-cli with -oj
JSON output, joins transcription segments with single spaces, cleans up temp
dir in finally.

createEmbeddedTextPort/SttPort factories pick real vs Null based on
detectEmbeddedRuntimes(); model resolution: explicit override → SKYTWIN_*_MODEL
env var → first matching file in modelDir → Null fallback.

36 new unit tests; package total 58 tests, all passing. Tests mock spawn + fs
so they never touch real binaries.

Closes runtime portion of #187 AC#1 (text gen) and AC#3 (STT).
Bundling, downloader UI, Piper TTS, and prompt-eval parity remain open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 9, 2026 03:15

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds real llama.cpp and whisper.cpp subprocess-backed implementations to @skytwin/embedded-llm, replacing the previous null/stub behavior and providing factories to select real vs. null ports based on runtime detection.

Changes:

  • Introduces LlamaCppTextBackend and WhisperCppSttBackend that spawn llama-cli / whisper-cli, parse output, handle timeouts, and surface stderr tails on failures.
  • Adds createEmbeddedTextPort() / createEmbeddedSttPort() factories with model resolution (override → env var → modelDir scan → null fallback).
  • Adds unit tests for the new backends/factory and documents the feature in CHANGELOG.md.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
packages/embedded-llm/src/whisper-cpp-backend.ts Implements whisper.cpp STT backend (temp files, spawn, JSON parse, cleanup) + model scan helper
packages/embedded-llm/src/llama-cpp-backend.ts Implements llama.cpp text backend (spawn, marker stripping) + GGUF model scan helper
packages/embedded-llm/src/index.ts Re-exports new backends, helpers, and factories
packages/embedded-llm/src/factory.ts Adds factories to choose real vs. null ports and resolve model paths
packages/embedded-llm/src/tests/whisper-cpp-backend.test.ts Adds unit tests for whisper backend + helpers
packages/embedded-llm/src/tests/llama-cpp-backend.test.ts Adds unit tests for llama backend + helper
packages/embedded-llm/src/tests/factory.test.ts Adds unit tests for factories and resolution precedence
CHANGELOG.md Documents the new embedded backends and usage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

export interface LlamaCppBackendOptions {
binaryPath: string;
modelPath: string;
contextWindow?: number;
this.capabilities = {
available: true,
modelName: basename(opts.modelPath),
contextWindow: opts.contextWindow ?? DEFAULT_CONTEXT_WINDOW,
Comment on lines +45 to +53
const args = [
'-m', this.modelPath,
'-p', prompt,
'-n', String(maxTokens),
'--temp', String(temperature),
'--no-display-prompt',
'--no-warmup',
'-no-cnv',
];
}

const DEFAULT_TIMEOUT_MS = 120_000;
const SUPPORTED_FORMATS = ['wav', 'mp3', 'flac', 'ogg'];
Comment on lines +53 to +55
const audioPath = join(work, 'input.wav');
const outBase = join(work, 'output');
writeFileSync(audioPath, audio);
Comment on lines +64 to +69
const timer = setTimeout(() => {
if (settled) return;
settled = true;
child.kill('SIGKILL');
reject(new Error(`llama-cli timed out after ${this.timeoutMs}ms`));
}, this.timeoutMs);
Comment on lines +91 to +96
const timer = setTimeout(() => {
if (settled) return;
settled = true;
child.kill('SIGKILL');
reject(new Error(`whisper-cli timed out after ${this.timeoutMs}ms`));
}, this.timeoutMs);
Comment on lines +68 to +70
if (this.threads !== null) {
args.push('-t', String(this.threads));
}
Comment on lines +91 to +104
const timer = setTimeout(() => {
if (settled) return;
settled = true;
child.kill('SIGKILL');
reject(new Error(`whisper-cli timed out after ${this.timeoutMs}ms`));
}, this.timeoutMs);

child.stderr?.on('data', (chunk) => { stderr += chunk.toString('utf8'); });
child.on('error', (err) => {
if (settled) return;
settled = true;
clearTimeout(timer);
reject(new Error(`failed to spawn whisper-cli: ${err.message}`));
});
Comment on lines +13 to +16
export interface CreatePortOverrides {
binaryPath?: string;
modelPath?: string;
}
@jayzalowitz jayzalowitz merged commit fc8b7de into main May 9, 2026
10 of 11 checks passed
@jayzalowitz jayzalowitz deleted the feat/embedded-llm-real-backends branch May 9, 2026 03:25
jayzalowitz added a commit that referenced this pull request May 9, 2026
…+ voice STT route

Five real implementations:
- High-contrast theme variant (4th in the dropdown): pure b/w with bold
  yellow/blue accents, 2px borders, 3px focus rings. WCAG AAA-aimed.
- Text-scale slider: 100/125/150/200% via data-text-scale on <html>;
  every rem-based size reflows.
- Reduced-motion override: respects prefers-reduced-motion by default,
  user can force on/off via Settings.
- Voice-first toggle: persists to localStorage, sets body class so CSS
  surfaces mic affordances.
- /api/voice/transcribe + /api/voice/capabilities/:userId backed by
  createEmbeddedSttPort() from #238. Real Whisper STT exposed as a
  reusable HTTP endpoint (mobile #179 will consume it once recording
  lands; desktop voice-first does today).

Sweep:
- Skip-link injected as first body child (CSS hides until focused).
- Global focus-visible ring across buttons, links, inputs.
- prefers-reduced-motion: reduce honored across all themes.
- New Accessibility card in Settings with text-size, animations, voice.

8 new voice route tests; api total 423 passing (24 skipped).

Out of scope: mobile recording UI (#179 hardware-blocked), axe-core CI
gate, multi-language i18n (separate slice).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 9, 2026
…+ voice STT route (#244)

Five real implementations:
- High-contrast theme variant (4th in the dropdown): pure b/w with bold
  yellow/blue accents, 2px borders, 3px focus rings. WCAG AAA-aimed.
- Text-scale slider: 100/125/150/200% via data-text-scale on <html>;
  every rem-based size reflows.
- Reduced-motion override: respects prefers-reduced-motion by default,
  user can force on/off via Settings.
- Voice-first toggle: persists to localStorage, sets body class so CSS
  surfaces mic affordances.
- /api/voice/transcribe + /api/voice/capabilities/:userId backed by
  createEmbeddedSttPort() from #238. Real Whisper STT exposed as a
  reusable HTTP endpoint (mobile #179 will consume it once recording
  lands; desktop voice-first does today).

Sweep:
- Skip-link injected as first body child (CSS hides until focused).
- Global focus-visible ring across buttons, links, inputs.
- prefers-reduced-motion: reduce honored across all themes.
- New Accessibility card in Settings with text-size, animations, voice.

8 new voice route tests; api total 423 passing (24 skipped).

Out of scope: mobile recording UI (#179 hardware-blocked), axe-core CI
gate, multi-language i18n (separate slice).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants