feat(#187 AC#5): embedded LLM model registry + upgrade-check#246
Merged
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a static, package-versioned embedded LLM model registry (keyed by RAM bracket) and exposes it through new API endpoints so clients can pick a default model and check for recommended upgrades without runtime network discovery.
Changes:
- Introduces
MODEL_REGISTRYplus helper functions (findById,listByBracket,recommendDefault,checkForUpgrade) in@skytwin/embedded-llm. - Adds unit tests for the embedded-llm model registry helpers.
- Adds
/api/embedded-llm/{registry,upgrade-check,recommend-default}routes and wires them into the API app; updates API dependencies.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pnpm-lock.yaml | Links @skytwin/embedded-llm workspace package into the monorepo dependency graph. |
| packages/embedded-llm/src/model-registry.ts | Defines the static registry and upgrade/default-selection helpers. |
| packages/embedded-llm/src/index.ts | Re-exports the new registry APIs from the package entrypoint. |
| packages/embedded-llm/src/tests/model-registry.test.ts | Adds unit tests covering registry invariants and helper behavior. |
| apps/api/src/routes/embedded-llm.ts | Adds HTTP endpoints to expose the registry and upgrade/default selection. |
| apps/api/src/index.ts | Registers the new embedded-llm router under /api/embedded-llm. |
| apps/api/package.json | Adds @skytwin/embedded-llm as an API dependency. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+137
to
+161
| */ | ||
| export function checkForUpgrade( | ||
| currentModelId: string, | ||
| registry: readonly ModelEntry[] = MODEL_REGISTRY, | ||
| ): UpgradeRecommendation | null { | ||
| const current = registry.find((m) => m.id === currentModelId); | ||
| if (!current) return null; | ||
|
|
||
| // Look for a strictly-better model in the same bracket. "Strictly | ||
| // better" = higher qualityScore. We don't recommend cross-bracket | ||
| // upgrades automatically — going from 8GB to 16GB might exceed the | ||
| // user's RAM. | ||
| const candidates = registry.filter( | ||
| (m) => m.ramBracket === current.ramBracket && m.qualityScore > current.qualityScore, | ||
| ); | ||
| if (candidates.length === 0) return null; | ||
|
|
||
| const best = candidates.reduce((a, b) => (a.qualityScore > b.qualityScore ? a : b)); | ||
| const deltaPct = Math.round(((best.qualityScore - current.qualityScore) / current.qualityScore) * 100); | ||
|
|
||
| return { | ||
| currentId: currentModelId, | ||
| recommended: best, | ||
| qualityDeltaPct: deltaPct, | ||
| rationale: `${best.displayName} scores ${best.qualityScore}/100 on our curated benchmark vs your current ${current.qualityScore}/100. Same RAM bracket (${current.ramBracket}), ~${(best.approxBytes / 1024 / 1024 / 1024).toFixed(1)}GB download.`, |
Comment on lines
+55
to
+67
| export const MODEL_REGISTRY: readonly ModelEntry[] = Object.freeze([ | ||
| { | ||
| id: 'phi-3.5-mini-q4', | ||
| displayName: 'Phi-3.5 Mini (Q4)', | ||
| ramBracket: '4gb', | ||
| approxBytes: 2.4 * 1024 * 1024 * 1024, | ||
| contextWindow: 4096, | ||
| qualityScore: 72, | ||
| downloadUrl: 'https://huggingface.co/microsoft/Phi-3.5-mini-instruct-gguf/resolve/main/Phi-3.5-mini-instruct-q4.gguf', | ||
| sha256: '0000000000000000000000000000000000000000000000000000000000000000', | ||
| version: 1, | ||
| }, | ||
| { |
Comment on lines
+56
to
+66
| { | ||
| id: 'phi-3.5-mini-q4', | ||
| displayName: 'Phi-3.5 Mini (Q4)', | ||
| ramBracket: '4gb', | ||
| approxBytes: 2.4 * 1024 * 1024 * 1024, | ||
| contextWindow: 4096, | ||
| qualityScore: 72, | ||
| downloadUrl: 'https://huggingface.co/microsoft/Phi-3.5-mini-instruct-gguf/resolve/main/Phi-3.5-mini-instruct-q4.gguf', | ||
| sha256: '0000000000000000000000000000000000000000000000000000000000000000', | ||
| version: 1, | ||
| }, |
| if (candidates.length === 0) return null; | ||
|
|
||
| const best = candidates.reduce((a, b) => (a.qualityScore > b.qualityScore ? a : b)); | ||
| const deltaPct = Math.round(((best.qualityScore - current.qualityScore) / current.qualityScore) * 100); |
Comment on lines
+13
to
+36
| * GET /api/embedded-llm/registry | ||
| * Returns the curated model list grouped by RAM bracket. | ||
| * | ||
| * GET /api/embedded-llm/upgrade-check?currentId=<id> | ||
| * Returns `{ upgrade: UpgradeRecommendation | null }`. Clients call | ||
| * this on app start to decide whether to surface the "your twin can | ||
| * be N% smarter" prompt. | ||
| * | ||
| * GET /api/embedded-llm/recommend-default?bracket=<4gb|8gb|16gb|32gb-plus> | ||
| * Returns the highest-quality model in the bracket — used by the | ||
| * first-run flow. | ||
| * | ||
| * The registry itself is a static, hand-curated list shipped with the | ||
| * `@skytwin/embedded-llm` package. No external HTTP. The downloader UI | ||
| * (#187 AC#2) consumes the `downloadUrl` field; this API just exposes | ||
| * the catalog. | ||
| */ | ||
| export function createEmbeddedLlmRouter(): Router { | ||
| const router = Router(); | ||
| bindUserIdParamOwnership(router); | ||
|
|
||
| router.get('/registry', (_req, res) => { | ||
| res.json({ models: MODEL_REGISTRY }); | ||
| }); |
Comment on lines
+30
to
+33
| export function createEmbeddedLlmRouter(): Router { | ||
| const router = Router(); | ||
| bindUserIdParamOwnership(router); | ||
|
|
Comment on lines
+30
to
+57
| export function createEmbeddedLlmRouter(): Router { | ||
| const router = Router(); | ||
| bindUserIdParamOwnership(router); | ||
|
|
||
| router.get('/registry', (_req, res) => { | ||
| res.json({ models: MODEL_REGISTRY }); | ||
| }); | ||
|
|
||
| router.get('/upgrade-check', (req, res) => { | ||
| const currentId = req.query['currentId']; | ||
| if (typeof currentId !== 'string' || currentId.length === 0) { | ||
| res.status(400).json({ error: 'currentId query param required' }); | ||
| return; | ||
| } | ||
| const upgrade = checkForUpgrade(currentId); | ||
| res.json({ upgrade }); | ||
| }); | ||
|
|
||
| router.get('/recommend-default', (req, res) => { | ||
| const bracket = req.query['bracket']; | ||
| const valid: ReadonlyArray<RamBracket> = ['4gb', '8gb', '16gb', '32gb-plus']; | ||
| if (typeof bracket !== 'string' || !valid.includes(bracket as RamBracket)) { | ||
| res.status(400).json({ error: 'bracket must be one of 4gb / 8gb / 16gb / 32gb-plus' }); | ||
| return; | ||
| } | ||
| const model = recommendDefault(bracket as RamBracket); | ||
| res.json({ model }); | ||
| }); |
5e80b81 to
74107c5
Compare
Closes AC#5: hand-curated static model registry keyed by RAM bracket
(4gb/8gb/16gb/32gb-plus); upgrade-check returns a strictly-better peer
in the SAME bracket so we never auto-recommend a cross-bracket
"upgrade" that could OOM the user's machine.
Ships:
- model-registry.ts: MODEL_REGISTRY (5 models, frozen), findById,
listByBracket (returns copy), recommendDefault, checkForUpgrade
(returns { recommended, qualityDeltaPct, rationale } or null).
- /api/embedded-llm/{registry,upgrade-check,recommend-default}.
- @skytwin/embedded-llm dep added to apps/api.
11 new unit tests in embedded-llm; package 71 passing total.
API 415 passing.
Out of scope: AC#2 downloader UI (separate surface), AC#1 bundling
(distribution change), AC#8 cost dashboard (already implicitly
satisfied — embedded provider passes spendCents:0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
74107c5 to
dfce5a6
Compare
This was referenced May 9, 2026
jayzalowitz
added a commit
that referenced
this pull request
May 9, 2026
* feat(#187 AC#2): embedded LLM model downloader The launch-gating piece: a fresh install can now fetch its own model with no config, no API keys, no per-message cost. Settings → Local AI brain card detects RAM bracket via navigator.deviceMemory, recommends the highest-quality model that fits, click Download → 3-15 minutes later the twin works fully offline. Substrate (#246) gave us the catalog. This PR makes downloading happen. Ships: - 039 migration: model_downloads (status state machine, paused_at, completed_at, sha256_expected, target_path) - modelDownloadRepository: create, findActive (idempotency on user+model), updateProgress, setStatus, recoverOrphanedDownloads - apps/api/src/embedded-llm/downloader.ts: streams fetch() body to <target_path>.partial, atomic rename on success; HTTP Range for resume (graceful fallback when server ignores Range); SHA-256 verify post-download (skipped when registry hash is placeholder all-zeros); in-flight AbortController map for pause/cancel; default dir ~/.skytwin/models/llama matches runtime detector's read path. - /api/embedded-llm/downloads/{start,:id,user/:userId,pause,resume,cancel} endpoints; boot-time recovery wired into apps/api/src/index.ts. - apps/web/public/js/components/embedded-llm-card.js: detects bracket, recommends default, override select, progress bar (reuses .confidence-bar), pause/resume/cancel, polls every 1s during active. 14 new route tests; api total 477 passing. All 34 packages build clean. Out of scope: real SHA-256 hashes in registry (currently placeholder), first-run dashboard integration, paused-on-restart proactive toast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#187 AC#2): address Copilot review — 3 IDOR + OOM + races 12 substantive findings from Copilot's review of PR #247, all fixed: Security (3 IDOR vulns): - POST /downloads/start: requireOwnership middleware (body.userId must match req.authenticatedUserId) - GET /downloads/:id and pause/resume/cancel: loadOwnedDownload() helper fetches row, rejects 403 if row.user_id ≠ authenticated user Correctness: - SHA-256 verify: stream via createReadStream so multi-GB GGUFs don't OOM the API process (was readFile() into memory) - Schema: UNIQUE partial index on (user_id, model_id) for non-terminal statuses; startDownload() catches 23505 race - inFlight guard in startDownload() prevents two concurrent runners writing same .partial when caller rapidly retries - Persist bytes_downloaded=0 immediately when server ignores Range - Persist total_bytes when Content-Length differs from registry approxBytes (added updateTotalBytes repo method) Frontend: - Stable data-role selectors (embedded-llm-progress-text / -status) replace the brittle "labelLine.firstElementChild" walk that was clobbering action buttons after first poll - ensureListener() no longer closes over container/userId from first mount; re-derives both at click time via getElementById(CARD_TARGET_ID) + getCurrentUserId() Tests: - Switched from injecting req.user.id to req.authenticatedUserId to match production sessionAuth contract - 7 new cross-user 403 tests covering every mutating endpoint + dev-bypass-allowed path. Total: 25 tests (was 14), all passing. Build clean. 486 API tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#187 AC#2): cancel through verify/install + stale partial + dead code Four findings from Copilot's re-review of the security/correctness fixes: - **Cancel couldn't stop verify/install**: the runner removed the inFlight handle BEFORE verification, so cancelDownload() couldn't abort SHA-256 or rename. After verify completed, the runner would overwrite the user's cancelled status with complete/failed. Now the handle stays in inFlight through verify+install, and the runner checks handle.cancelled at every phase boundary (pre-verify, post-hash, pre-install, post-rename) and bails without writing terminal status if cancel fired. inFlight.delete moved to the final boundary. - **Stale .partial corrupting fresh resume**: a previous attempt that failed to delete .partial (e.g., Windows file lock during cancel) could cause a brand-new download row to "resume" from stale bytes. Now we only resume when bytes_downloaded > 0 on the row; otherwise we delete any pre-existing partial before starting fresh. - **UI showed Pause for verifying/installing but backend rejected it**: pauseDownload() returns ok:false for those statuses. Added PAUSABLE_STATUSES set (pending + downloading only); the action bar shows Cancel-only during verify/install. Cancel now actually works through those phases (per the runner change above). - **ensureDirectory() dead code**: exported but unused. Removed, along with the now-unused `dirname` import. Build clean. 25 route tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes AC#5 of #187: hand-curated static model registry keyed by RAM bracket, with an upgrade-check API. Static list ships with
@skytwin/embedded-llm— no network fetches at runtime; bumping a model'sversionis the trigger for the "your twin can be N% smarter" prompt.Why
Auto-fetching from HuggingFace risks dependency-ghosting (model disappears, gets re-quanted with worse quality, URL scheme changes). A hand-curated list ships with the package, gets PR-reviewed, and gives users a stable upgrade path.
What ships
MODEL_REGISTRY— 5 frozen entries across 4 brackets (4gb / 8gb / 16gb / 32gb-plus)checkForUpgrade(currentId)— returns recommended upgrade in same bracket only (never auto-recommends cross-bracket since RAM might be exceeded)recommendDefault(bracket)for first-run,findById,listByBracket(returns copy)/api/embedded-llm/{registry,upgrade-check,recommend-default}endpointsCloses for #187
spendCents:0; no rendering change neededTest plan
pnpm --filter @skytwin/embedded-llm test— 71 passingpnpm --filter @skytwin/api test— 415 passing (24 skipped)pnpm buildclean across all 34 packages🤖 Generated with Claude Code