feat(#187 AC#5): embedded LLM model registry + upgrade-check by jayzalowitz · Pull Request #246 · jayzalowitz/skytwin

jayzalowitz · 2026-05-09T05:22:02Z

Summary

Closes AC#5 of #187: hand-curated static model registry keyed by RAM bracket, with an upgrade-check API. Static list ships with @skytwin/embedded-llm — no network fetches at runtime; bumping a model's version is the trigger for the "your twin can be N% smarter" prompt.

Why

Auto-fetching from HuggingFace risks dependency-ghosting (model disappears, gets re-quanted with worse quality, URL scheme changes). A hand-curated list ships with the package, gets PR-reviewed, and gives users a stable upgrade path.

What ships

MODEL_REGISTRY — 5 frozen entries across 4 brackets (4gb / 8gb / 16gb / 32gb-plus)
checkForUpgrade(currentId) — returns recommended upgrade in same bracket only (never auto-recommends cross-bracket since RAM might be exceeded)
recommendDefault(bracket) for first-run, findById, listByBracket (returns copy)
/api/embedded-llm/{registry,upgrade-check,recommend-default} endpoints
11 new unit tests (embedded-llm package: 71 passing)

Closes for #187

✅ AC#5 (registry + version-check)
☐ AC#2 (downloader UI — substantial UI surface, separate slice)
☐ AC#1 (default GGUF in installer payload — distribution work)
AC#8 (cost dashboard zero-cost) — implicitly satisfied by embedded provider passing spendCents:0; no rendering change needed

Test plan

pnpm --filter @skytwin/embedded-llm test — 71 passing
pnpm --filter @skytwin/api test — 415 passing (24 skipped)
pnpm build clean across all 34 packages
CI green

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds a static, package-versioned embedded LLM model registry (keyed by RAM bracket) and exposes it through new API endpoints so clients can pick a default model and check for recommended upgrades without runtime network discovery.

Changes:

Introduces MODEL_REGISTRY plus helper functions (findById, listByBracket, recommendDefault, checkForUpgrade) in @skytwin/embedded-llm.
Adds unit tests for the embedded-llm model registry helpers.
Adds /api/embedded-llm/{registry,upgrade-check,recommend-default} routes and wires them into the API app; updates API dependencies.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
pnpm-lock.yaml	Links `@skytwin/embedded-llm` workspace package into the monorepo dependency graph.
packages/embedded-llm/src/model-registry.ts	Defines the static registry and upgrade/default-selection helpers.
packages/embedded-llm/src/index.ts	Re-exports the new registry APIs from the package entrypoint.
packages/embedded-llm/src/tests/model-registry.test.ts	Adds unit tests covering registry invariants and helper behavior.
apps/api/src/routes/embedded-llm.ts	Adds HTTP endpoints to expose the registry and upgrade/default selection.
apps/api/src/index.ts	Registers the new embedded-llm router under `/api/embedded-llm`.
apps/api/package.json	Adds `@skytwin/embedded-llm` as an API dependency.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+ */
+export function checkForUpgrade(
+  currentModelId: string,
+  registry: readonly ModelEntry[] = MODEL_REGISTRY,
+): UpgradeRecommendation | null {
+  const current = registry.find((m) => m.id === currentModelId);
+  if (!current) return null;
+
+  // Look for a strictly-better model in the same bracket. "Strictly
+  // better" = higher qualityScore. We don't recommend cross-bracket
+  // upgrades automatically — going from 8GB to 16GB might exceed the
+  // user's RAM.
+  const candidates = registry.filter(
+    (m) => m.ramBracket === current.ramBracket && m.qualityScore > current.qualityScore,
+  );
+  if (candidates.length === 0) return null;
+
+  const best = candidates.reduce((a, b) => (a.qualityScore > b.qualityScore ? a : b));
+  const deltaPct = Math.round(((best.qualityScore - current.qualityScore) / current.qualityScore) * 100);
+
+  return {
+    currentId: currentModelId,
+    recommended: best,
+    qualityDeltaPct: deltaPct,
+    rationale: `${best.displayName} scores ${best.qualityScore}/100 on our curated benchmark vs your current ${current.qualityScore}/100. Same RAM bracket (${current.ramBracket}), ~${(best.approxBytes / 1024 / 1024 / 1024).toFixed(1)}GB download.`,


+export const MODEL_REGISTRY: readonly ModelEntry[] = Object.freeze([
+  {
+    id: 'phi-3.5-mini-q4',
+    displayName: 'Phi-3.5 Mini (Q4)',
+    ramBracket: '4gb',
+    approxBytes: 2.4 * 1024 * 1024 * 1024,
+    contextWindow: 4096,
+    qualityScore: 72,
+    downloadUrl: 'https://huggingface.co/microsoft/Phi-3.5-mini-instruct-gguf/resolve/main/Phi-3.5-mini-instruct-q4.gguf',
+    sha256: '0000000000000000000000000000000000000000000000000000000000000000',
+    version: 1,
+  },
+  {


+  {
+    id: 'phi-3.5-mini-q4',
+    displayName: 'Phi-3.5 Mini (Q4)',
+    ramBracket: '4gb',
+    approxBytes: 2.4 * 1024 * 1024 * 1024,
+    contextWindow: 4096,
+    qualityScore: 72,
+    downloadUrl: 'https://huggingface.co/microsoft/Phi-3.5-mini-instruct-gguf/resolve/main/Phi-3.5-mini-instruct-q4.gguf',
+    sha256: '0000000000000000000000000000000000000000000000000000000000000000',
+    version: 1,
+  },


+  if (candidates.length === 0) return null;
+
+  const best = candidates.reduce((a, b) => (a.qualityScore > b.qualityScore ? a : b));
+  const deltaPct = Math.round(((best.qualityScore - current.qualityScore) / current.qualityScore) * 100);


+ *   GET  /api/embedded-llm/registry
+ *     Returns the curated model list grouped by RAM bracket.
+ *
+ *   GET  /api/embedded-llm/upgrade-check?currentId=<id>
+ *     Returns `{ upgrade: UpgradeRecommendation | null }`. Clients call
+ *     this on app start to decide whether to surface the "your twin can
+ *     be N% smarter" prompt.
+ *
+ *   GET  /api/embedded-llm/recommend-default?bracket=<4gb|8gb|16gb|32gb-plus>
+ *     Returns the highest-quality model in the bracket — used by the
+ *     first-run flow.
+ *
+ * The registry itself is a static, hand-curated list shipped with the
+ * `@skytwin/embedded-llm` package. No external HTTP. The downloader UI
+ * (#187 AC#2) consumes the `downloadUrl` field; this API just exposes
+ * the catalog.
+ */
+export function createEmbeddedLlmRouter(): Router {
+  const router = Router();
+  bindUserIdParamOwnership(router);
+
+  router.get('/registry', (_req, res) => {
+    res.json({ models: MODEL_REGISTRY });
+  });


+export function createEmbeddedLlmRouter(): Router {
+  const router = Router();
+  bindUserIdParamOwnership(router);
+


+export function createEmbeddedLlmRouter(): Router {
+  const router = Router();
+  bindUserIdParamOwnership(router);
+
+  router.get('/registry', (_req, res) => {
+    res.json({ models: MODEL_REGISTRY });
+  });
+
+  router.get('/upgrade-check', (req, res) => {
+    const currentId = req.query['currentId'];
+    if (typeof currentId !== 'string' || currentId.length === 0) {
+      res.status(400).json({ error: 'currentId query param required' });
+      return;
+    }
+    const upgrade = checkForUpgrade(currentId);
+    res.json({ upgrade });
+  });
+
+  router.get('/recommend-default', (req, res) => {
+    const bracket = req.query['bracket'];
+    const valid: ReadonlyArray<RamBracket> = ['4gb', '8gb', '16gb', '32gb-plus'];
+    if (typeof bracket !== 'string' || !valid.includes(bracket as RamBracket)) {
+      res.status(400).json({ error: 'bracket must be one of 4gb / 8gb / 16gb / 32gb-plus' });
+      return;
+    }
+    const model = recommendDefault(bracket as RamBracket);
+    res.json({ model });
+  });


Closes AC#5: hand-curated static model registry keyed by RAM bracket (4gb/8gb/16gb/32gb-plus); upgrade-check returns a strictly-better peer in the SAME bracket so we never auto-recommend a cross-bracket "upgrade" that could OOM the user's machine. Ships: - model-registry.ts: MODEL_REGISTRY (5 models, frozen), findById, listByBracket (returns copy), recommendDefault, checkForUpgrade (returns { recommended, qualityDeltaPct, rationale } or null). - /api/embedded-llm/{registry,upgrade-check,recommend-default}. - @skytwin/embedded-llm dep added to apps/api. 11 new unit tests in embedded-llm; package 71 passing total. API 415 passing. Out of scope: AC#2 downloader UI (separate surface), AC#1 bundling (distribution change), AC#8 cost dashboard (already implicitly satisfied — embedded provider passes spendCents:0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(#187 AC#2): embedded LLM model downloader The launch-gating piece: a fresh install can now fetch its own model with no config, no API keys, no per-message cost. Settings → Local AI brain card detects RAM bracket via navigator.deviceMemory, recommends the highest-quality model that fits, click Download → 3-15 minutes later the twin works fully offline. Substrate (#246) gave us the catalog. This PR makes downloading happen. Ships: - 039 migration: model_downloads (status state machine, paused_at, completed_at, sha256_expected, target_path) - modelDownloadRepository: create, findActive (idempotency on user+model), updateProgress, setStatus, recoverOrphanedDownloads - apps/api/src/embedded-llm/downloader.ts: streams fetch() body to <target_path>.partial, atomic rename on success; HTTP Range for resume (graceful fallback when server ignores Range); SHA-256 verify post-download (skipped when registry hash is placeholder all-zeros); in-flight AbortController map for pause/cancel; default dir ~/.skytwin/models/llama matches runtime detector's read path. - /api/embedded-llm/downloads/{start,:id,user/:userId,pause,resume,cancel} endpoints; boot-time recovery wired into apps/api/src/index.ts. - apps/web/public/js/components/embedded-llm-card.js: detects bracket, recommends default, override select, progress bar (reuses .confidence-bar), pause/resume/cancel, polls every 1s during active. 14 new route tests; api total 477 passing. All 34 packages build clean. Out of scope: real SHA-256 hashes in registry (currently placeholder), first-run dashboard integration, paused-on-restart proactive toast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#187 AC#2): address Copilot review — 3 IDOR + OOM + races 12 substantive findings from Copilot's review of PR #247, all fixed: Security (3 IDOR vulns): - POST /downloads/start: requireOwnership middleware (body.userId must match req.authenticatedUserId) - GET /downloads/:id and pause/resume/cancel: loadOwnedDownload() helper fetches row, rejects 403 if row.user_id ≠ authenticated user Correctness: - SHA-256 verify: stream via createReadStream so multi-GB GGUFs don't OOM the API process (was readFile() into memory) - Schema: UNIQUE partial index on (user_id, model_id) for non-terminal statuses; startDownload() catches 23505 race - inFlight guard in startDownload() prevents two concurrent runners writing same .partial when caller rapidly retries - Persist bytes_downloaded=0 immediately when server ignores Range - Persist total_bytes when Content-Length differs from registry approxBytes (added updateTotalBytes repo method) Frontend: - Stable data-role selectors (embedded-llm-progress-text / -status) replace the brittle "labelLine.firstElementChild" walk that was clobbering action buttons after first poll - ensureListener() no longer closes over container/userId from first mount; re-derives both at click time via getElementById(CARD_TARGET_ID) + getCurrentUserId() Tests: - Switched from injecting req.user.id to req.authenticatedUserId to match production sessionAuth contract - 7 new cross-user 403 tests covering every mutating endpoint + dev-bypass-allowed path. Total: 25 tests (was 14), all passing. Build clean. 486 API tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#187 AC#2): cancel through verify/install + stale partial + dead code Four findings from Copilot's re-review of the security/correctness fixes: - **Cancel couldn't stop verify/install**: the runner removed the inFlight handle BEFORE verification, so cancelDownload() couldn't abort SHA-256 or rename. After verify completed, the runner would overwrite the user's cancelled status with complete/failed. Now the handle stays in inFlight through verify+install, and the runner checks handle.cancelled at every phase boundary (pre-verify, post-hash, pre-install, post-rename) and bails without writing terminal status if cancel fired. inFlight.delete moved to the final boundary. - **Stale .partial corrupting fresh resume**: a previous attempt that failed to delete .partial (e.g., Windows file lock during cancel) could cause a brand-new download row to "resume" from stale bytes. Now we only resume when bytes_downloaded > 0 on the row; otherwise we delete any pre-existing partial before starting fresh. - **UI showed Pause for verifying/installing but backend rejected it**: pauseDownload() returns ok:false for those statuses. Added PAUSABLE_STATUSES set (pending + downloading only); the action bar shows Cancel-only during verify/install. Cancel now actually works through those phases (per the runner change above). - **ensureDirectory() dead code**: exported but unused. Removed, along with the now-unused `dirname` import. Build clean. 25 route tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 9, 2026 05:22

Copilot started reviewing on behalf of jayzalowitz May 9, 2026 05:22 View session

Copilot AI reviewed May 9, 2026

View reviewed changes

jayzalowitz force-pushed the feat/embedded-llm-closeout branch 2 times, most recently from 5e80b81 to 74107c5 Compare May 9, 2026 05:55

jayzalowitz mentioned this pull request May 9, 2026

Capability loop #N: Embedded LLM with auto-upgrade path (Phi/Llama/Qwen via llama.cpp; Whisper-tiny STT; Piper TTS) #187

Closed

jayzalowitz force-pushed the feat/embedded-llm-closeout branch from 74107c5 to dfce5a6 Compare May 9, 2026 08:50

jayzalowitz merged commit 699dc35 into main May 9, 2026
8 checks passed

This was referenced May 9, 2026

Epic: Capability Acquisition Loop — an MCP-native autonomous twin (OSS launch v1) #195

Open

feat(#187 AC#2): embedded LLM model downloader #247

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#187 AC#5): embedded LLM model registry + upgrade-check#246

feat(#187 AC#5): embedded LLM model registry + upgrade-check#246
jayzalowitz merged 1 commit into
mainfrom
feat/embedded-llm-closeout

jayzalowitz commented May 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jayzalowitz commented May 9, 2026

Summary

Why

What ships

Closes for #187

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants