Skip to content

feat(#187 AC#5): embedded LLM model registry + upgrade-check#246

Merged
jayzalowitz merged 1 commit into
mainfrom
feat/embedded-llm-closeout
May 9, 2026
Merged

feat(#187 AC#5): embedded LLM model registry + upgrade-check#246
jayzalowitz merged 1 commit into
mainfrom
feat/embedded-llm-closeout

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Summary

Closes AC#5 of #187: hand-curated static model registry keyed by RAM bracket, with an upgrade-check API. Static list ships with @skytwin/embedded-llm — no network fetches at runtime; bumping a model's version is the trigger for the "your twin can be N% smarter" prompt.

Why

Auto-fetching from HuggingFace risks dependency-ghosting (model disappears, gets re-quanted with worse quality, URL scheme changes). A hand-curated list ships with the package, gets PR-reviewed, and gives users a stable upgrade path.

What ships

  • MODEL_REGISTRY — 5 frozen entries across 4 brackets (4gb / 8gb / 16gb / 32gb-plus)
  • checkForUpgrade(currentId) — returns recommended upgrade in same bracket only (never auto-recommends cross-bracket since RAM might be exceeded)
  • recommendDefault(bracket) for first-run, findById, listByBracket (returns copy)
  • /api/embedded-llm/{registry,upgrade-check,recommend-default} endpoints
  • 11 new unit tests (embedded-llm package: 71 passing)

Closes for #187

  • ✅ AC#5 (registry + version-check)
  • ☐ AC#2 (downloader UI — substantial UI surface, separate slice)
  • ☐ AC#1 (default GGUF in installer payload — distribution work)
  • AC#8 (cost dashboard zero-cost) — implicitly satisfied by embedded provider passing spendCents:0; no rendering change needed

Test plan

  • pnpm --filter @skytwin/embedded-llm test — 71 passing
  • pnpm --filter @skytwin/api test — 415 passing (24 skipped)
  • pnpm build clean across all 34 packages
  • CI green

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 9, 2026 05:22

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a static, package-versioned embedded LLM model registry (keyed by RAM bracket) and exposes it through new API endpoints so clients can pick a default model and check for recommended upgrades without runtime network discovery.

Changes:

  • Introduces MODEL_REGISTRY plus helper functions (findById, listByBracket, recommendDefault, checkForUpgrade) in @skytwin/embedded-llm.
  • Adds unit tests for the embedded-llm model registry helpers.
  • Adds /api/embedded-llm/{registry,upgrade-check,recommend-default} routes and wires them into the API app; updates API dependencies.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
pnpm-lock.yaml Links @skytwin/embedded-llm workspace package into the monorepo dependency graph.
packages/embedded-llm/src/model-registry.ts Defines the static registry and upgrade/default-selection helpers.
packages/embedded-llm/src/index.ts Re-exports the new registry APIs from the package entrypoint.
packages/embedded-llm/src/tests/model-registry.test.ts Adds unit tests covering registry invariants and helper behavior.
apps/api/src/routes/embedded-llm.ts Adds HTTP endpoints to expose the registry and upgrade/default selection.
apps/api/src/index.ts Registers the new embedded-llm router under /api/embedded-llm.
apps/api/package.json Adds @skytwin/embedded-llm as an API dependency.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +137 to +161
*/
export function checkForUpgrade(
currentModelId: string,
registry: readonly ModelEntry[] = MODEL_REGISTRY,
): UpgradeRecommendation | null {
const current = registry.find((m) => m.id === currentModelId);
if (!current) return null;

// Look for a strictly-better model in the same bracket. "Strictly
// better" = higher qualityScore. We don't recommend cross-bracket
// upgrades automatically — going from 8GB to 16GB might exceed the
// user's RAM.
const candidates = registry.filter(
(m) => m.ramBracket === current.ramBracket && m.qualityScore > current.qualityScore,
);
if (candidates.length === 0) return null;

const best = candidates.reduce((a, b) => (a.qualityScore > b.qualityScore ? a : b));
const deltaPct = Math.round(((best.qualityScore - current.qualityScore) / current.qualityScore) * 100);

return {
currentId: currentModelId,
recommended: best,
qualityDeltaPct: deltaPct,
rationale: `${best.displayName} scores ${best.qualityScore}/100 on our curated benchmark vs your current ${current.qualityScore}/100. Same RAM bracket (${current.ramBracket}), ~${(best.approxBytes / 1024 / 1024 / 1024).toFixed(1)}GB download.`,
Comment on lines +55 to +67
export const MODEL_REGISTRY: readonly ModelEntry[] = Object.freeze([
{
id: 'phi-3.5-mini-q4',
displayName: 'Phi-3.5 Mini (Q4)',
ramBracket: '4gb',
approxBytes: 2.4 * 1024 * 1024 * 1024,
contextWindow: 4096,
qualityScore: 72,
downloadUrl: 'https://huggingface.co/microsoft/Phi-3.5-mini-instruct-gguf/resolve/main/Phi-3.5-mini-instruct-q4.gguf',
sha256: '0000000000000000000000000000000000000000000000000000000000000000',
version: 1,
},
{
Comment on lines +56 to +66
{
id: 'phi-3.5-mini-q4',
displayName: 'Phi-3.5 Mini (Q4)',
ramBracket: '4gb',
approxBytes: 2.4 * 1024 * 1024 * 1024,
contextWindow: 4096,
qualityScore: 72,
downloadUrl: 'https://huggingface.co/microsoft/Phi-3.5-mini-instruct-gguf/resolve/main/Phi-3.5-mini-instruct-q4.gguf',
sha256: '0000000000000000000000000000000000000000000000000000000000000000',
version: 1,
},
if (candidates.length === 0) return null;

const best = candidates.reduce((a, b) => (a.qualityScore > b.qualityScore ? a : b));
const deltaPct = Math.round(((best.qualityScore - current.qualityScore) / current.qualityScore) * 100);
Comment on lines +13 to +36
* GET /api/embedded-llm/registry
* Returns the curated model list grouped by RAM bracket.
*
* GET /api/embedded-llm/upgrade-check?currentId=<id>
* Returns `{ upgrade: UpgradeRecommendation | null }`. Clients call
* this on app start to decide whether to surface the "your twin can
* be N% smarter" prompt.
*
* GET /api/embedded-llm/recommend-default?bracket=<4gb|8gb|16gb|32gb-plus>
* Returns the highest-quality model in the bracket — used by the
* first-run flow.
*
* The registry itself is a static, hand-curated list shipped with the
* `@skytwin/embedded-llm` package. No external HTTP. The downloader UI
* (#187 AC#2) consumes the `downloadUrl` field; this API just exposes
* the catalog.
*/
export function createEmbeddedLlmRouter(): Router {
const router = Router();
bindUserIdParamOwnership(router);

router.get('/registry', (_req, res) => {
res.json({ models: MODEL_REGISTRY });
});
Comment on lines +30 to +33
export function createEmbeddedLlmRouter(): Router {
const router = Router();
bindUserIdParamOwnership(router);

Comment on lines +30 to +57
export function createEmbeddedLlmRouter(): Router {
const router = Router();
bindUserIdParamOwnership(router);

router.get('/registry', (_req, res) => {
res.json({ models: MODEL_REGISTRY });
});

router.get('/upgrade-check', (req, res) => {
const currentId = req.query['currentId'];
if (typeof currentId !== 'string' || currentId.length === 0) {
res.status(400).json({ error: 'currentId query param required' });
return;
}
const upgrade = checkForUpgrade(currentId);
res.json({ upgrade });
});

router.get('/recommend-default', (req, res) => {
const bracket = req.query['bracket'];
const valid: ReadonlyArray<RamBracket> = ['4gb', '8gb', '16gb', '32gb-plus'];
if (typeof bracket !== 'string' || !valid.includes(bracket as RamBracket)) {
res.status(400).json({ error: 'bracket must be one of 4gb / 8gb / 16gb / 32gb-plus' });
return;
}
const model = recommendDefault(bracket as RamBracket);
res.json({ model });
});
Closes AC#5: hand-curated static model registry keyed by RAM bracket
(4gb/8gb/16gb/32gb-plus); upgrade-check returns a strictly-better peer
in the SAME bracket so we never auto-recommend a cross-bracket
"upgrade" that could OOM the user's machine.

Ships:
- model-registry.ts: MODEL_REGISTRY (5 models, frozen), findById,
  listByBracket (returns copy), recommendDefault, checkForUpgrade
  (returns { recommended, qualityDeltaPct, rationale } or null).
- /api/embedded-llm/{registry,upgrade-check,recommend-default}.
- @skytwin/embedded-llm dep added to apps/api.

11 new unit tests in embedded-llm; package 71 passing total.
API 415 passing.

Out of scope: AC#2 downloader UI (separate surface), AC#1 bundling
(distribution change), AC#8 cost dashboard (already implicitly
satisfied — embedded provider passes spendCents:0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the feat/embedded-llm-closeout branch from 74107c5 to dfce5a6 Compare May 9, 2026 08:50
@jayzalowitz jayzalowitz merged commit 699dc35 into main May 9, 2026
8 checks passed
jayzalowitz added a commit that referenced this pull request May 9, 2026
* feat(#187 AC#2): embedded LLM model downloader

The launch-gating piece: a fresh install can now fetch its own model
with no config, no API keys, no per-message cost. Settings → Local AI
brain card detects RAM bracket via navigator.deviceMemory, recommends
the highest-quality model that fits, click Download → 3-15 minutes
later the twin works fully offline.

Substrate (#246) gave us the catalog. This PR makes downloading happen.

Ships:
- 039 migration: model_downloads (status state machine, paused_at,
  completed_at, sha256_expected, target_path)
- modelDownloadRepository: create, findActive (idempotency on
  user+model), updateProgress, setStatus, recoverOrphanedDownloads
- apps/api/src/embedded-llm/downloader.ts: streams fetch() body to
  <target_path>.partial, atomic rename on success; HTTP Range for
  resume (graceful fallback when server ignores Range); SHA-256 verify
  post-download (skipped when registry hash is placeholder all-zeros);
  in-flight AbortController map for pause/cancel; default dir
  ~/.skytwin/models/llama matches runtime detector's read path.
- /api/embedded-llm/downloads/{start,:id,user/:userId,pause,resume,cancel}
  endpoints; boot-time recovery wired into apps/api/src/index.ts.
- apps/web/public/js/components/embedded-llm-card.js: detects bracket,
  recommends default, override select, progress bar (reuses
  .confidence-bar), pause/resume/cancel, polls every 1s during active.

14 new route tests; api total 477 passing. All 34 packages build clean.

Out of scope: real SHA-256 hashes in registry (currently placeholder),
first-run dashboard integration, paused-on-restart proactive toast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#187 AC#2): address Copilot review — 3 IDOR + OOM + races

12 substantive findings from Copilot's review of PR #247, all fixed:

Security (3 IDOR vulns):
- POST /downloads/start: requireOwnership middleware (body.userId
  must match req.authenticatedUserId)
- GET /downloads/:id and pause/resume/cancel: loadOwnedDownload()
  helper fetches row, rejects 403 if row.user_id ≠ authenticated user

Correctness:
- SHA-256 verify: stream via createReadStream so multi-GB GGUFs
  don't OOM the API process (was readFile() into memory)
- Schema: UNIQUE partial index on (user_id, model_id) for
  non-terminal statuses; startDownload() catches 23505 race
- inFlight guard in startDownload() prevents two concurrent
  runners writing same .partial when caller rapidly retries
- Persist bytes_downloaded=0 immediately when server ignores Range
- Persist total_bytes when Content-Length differs from registry
  approxBytes (added updateTotalBytes repo method)

Frontend:
- Stable data-role selectors (embedded-llm-progress-text /
  -status) replace the brittle "labelLine.firstElementChild" walk
  that was clobbering action buttons after first poll
- ensureListener() no longer closes over container/userId from
  first mount; re-derives both at click time via
  getElementById(CARD_TARGET_ID) + getCurrentUserId()

Tests:
- Switched from injecting req.user.id to req.authenticatedUserId
  to match production sessionAuth contract
- 7 new cross-user 403 tests covering every mutating endpoint +
  dev-bypass-allowed path. Total: 25 tests (was 14), all passing.

Build clean. 486 API tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#187 AC#2): cancel through verify/install + stale partial + dead code

Four findings from Copilot's re-review of the security/correctness fixes:

- **Cancel couldn't stop verify/install**: the runner removed the
  inFlight handle BEFORE verification, so cancelDownload() couldn't
  abort SHA-256 or rename. After verify completed, the runner would
  overwrite the user's cancelled status with complete/failed. Now
  the handle stays in inFlight through verify+install, and the runner
  checks handle.cancelled at every phase boundary (pre-verify,
  post-hash, pre-install, post-rename) and bails without writing
  terminal status if cancel fired. inFlight.delete moved to the
  final boundary.

- **Stale .partial corrupting fresh resume**: a previous attempt that
  failed to delete .partial (e.g., Windows file lock during cancel)
  could cause a brand-new download row to "resume" from stale bytes.
  Now we only resume when bytes_downloaded > 0 on the row; otherwise
  we delete any pre-existing partial before starting fresh.

- **UI showed Pause for verifying/installing but backend rejected it**:
  pauseDownload() returns ok:false for those statuses. Added
  PAUSABLE_STATUSES set (pending + downloading only); the action
  bar shows Cancel-only during verify/install. Cancel now actually
  works through those phases (per the runner change above).

- **ensureDirectory() dead code**: exported but unused. Removed,
  along with the now-unused `dirname` import.

Build clean. 25 route tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants