Skip to content

Feature/video gen features/fixes/improvements#5

Merged
cryptopoly merged 11 commits into
mainfrom
feature/video-gen
Apr 25, 2026
Merged

Feature/video gen features/fixes/improvements#5
cryptopoly merged 11 commits into
mainfrom
feature/video-gen

Conversation

@cryptopoly

Copy link
Copy Markdown
Owner

No description provided.

User reported CUDA torch install succeeded but Image/Video Studio
still showed DEVICE: CPU and video generation ran on CPU despite
an RTX 4090 being present.

Root cause: the backend was in source-workspace launcher mode (Tauri
couldn't find / fell back from the embedded runtime), so Python ran
against the dev .venv. The extras-site-packages prepend to PYTHONPATH
only existed in apply_embedded_runtime_env, which the source-workspace
branch never calls. Result: Python started with no PYTHONPATH,
found torch in .venv (or nowhere), never looked at the 2.5 GB of
CUDA torch the GPU bundle install had just dropped into
~/.chaosengine/extras/site-packages/.

Evidence from user's diagnostics snapshot:
  PYTHONPATH      | null (not set)
  sysPath         | includes .venv/Lib/site-packages, missing extras

Fix: add the matching PYTHONPATH prepend in the source-workspace
branch of bootstrap(). Same shape as the embedded path, just without
the runtime-specific entries (source-workspace Python auto-discovers
.venv via sys.prefix, so we only need to inject extras to win over
whatever the dev venv happens to have).

Per Karpathy CLAUDE.md #3 (Surgical Changes): single additive block
inside an existing else branch. No existing logic modified. No new
helpers. Embedded path is untouched because it already handles this.

cargo check clean. No Python or TS tests exercise this path (it's
purely subprocess env-var wiring), verification is post-rebuild on
the user's Windows VM: /api/diagnostics/snapshot should show the
extras path in both PYTHONPATH and sysPath.
User asked for: fixed-width terminal, single scroll region, step
counter showing progress. Previous per-step <details> cards were OK
on a 3-package install but stacked too tall on the 13-package GPU
bundle — output scrolled off-screen and users lost track of which
step was current.

New layout:

- Single monospace <pre> region, max-height 380px, auto-scrolls to
  bottom on new attempts (tail -f behaviour). Doesn't steal scroll
  on phase transitions — only on new output — so a user reading
  earlier lines doesn't get yanked forward.
- Step line above the terminal shows 'Step 3/13: accelerate · 42%'
  while running, 'Final: 12/13 packages · 100%' when done.
- Per-attempt markers ([ OK ], [FAIL], [....] for in-progress) line
  up on the left edge so failures are scannable.

Also strips pip's dep-resolver noise from the displayed output. The
user hit this on their Windows box where their .venv's leftover
turboquant-mlx-full declared an mlx>= constraint that will never be
satisfied on Windows — pip prints a scary-looking ERROR block
('chaos-engine-compressor ... requires safetensors, which is not
installed'), cosmetic but alarming. Raw attempt.output still has
the noise; we just filter it from the rendered terminal. Users who
want the full pip trace still get it via the attempts array.

Per CLAUDE.md #2 (Simplicity First): single component file, no new
abstractions, no new deps. Per #3 (Surgical Changes): only touched
InstallLogPanel.tsx + its CSS block. Per #4 (Goal-Driven): visible
improvement (scroll works, step counter visible, noise suppressed)
verifiable at a glance on next build.

174 TS tests still pass, tsc clean.
Two UX asks from the user's latest round:

1. 'Image Studio doesn't say whether it's trying to use CPU or CUDA'
2. 'Even though I just installed everything, the Image Studio still
   shows an install GPU runtime button. I closed and reopened the
   app and it disappeared.'

## Fix C: Device chip

The chip at line 256 only rendered when runtime.device was set,
and since my earlier refactor removed the speculative torch import
from probe() the device is now null until a model is actually loaded.
Added an expectedDevice field to ImageRuntimeStatus that's
computed WITHOUT importing torch — find_spec + nvidia_gpu_present +
platform.machine — so we can show 'Device: cuda (expected)' before
the first Generate even fires. Same constraint as probe(): absolutely
no torch import (would pin torch/lib/*.dll and break the install
flow we just fixed).

The chip now reads:

  Device: cuda                (model loaded, actual device)
  Device: cuda (expected)     (torch installed + NVIDIA seen)
  Device: mps (expected)      (Apple Silicon)
  Device: cpu (expected)      (torch installed, no GPU)
  (hidden)                    (torch not installed)

## Fix D: Post-install restart nudge

Before this, after a successful GPU bundle install the Image Studio
still showed 'Install GPU runtime'. Root cause: backend's sys.path
is snapshotted at spawn time, so find_spec still reports torch as
missing until the backend restarts with the new PYTHONPATH (Fix A's
domain). User was confused — install said success, UI said install
again. App restart made it go away.

Split the 'runtime-not-available' block into two paths:

  - Post-install awaiting restart (job.phase === 'done' &&
    job.requiresRestart): show 'installed to <path>, restart to
    activate' + a Restart Backend button. No install button — you
    just installed, clicking it again would be confusing.
  - All other cases: show the install button as before.

Mirrored the same split to Video Studio for consistency.

Per CLAUDE.md #1 (Think Before Coding): surfaced the 'running
backend can't see new packages' reality to the user instead of
hiding it. Per #3 (Surgical Changes): added one field to the
backend status + one branch to each Studio tab; no shared
component extraction yet because we only use this shape in two
places and hoisting would obscure the per-tab differences.

478 Python tests pass, 174 TS tests pass, tsc clean.
Bundles today's work across five themes. All changes ship together
because the fixes for FLUX perf and Wan size display depend on the
HF metadata wiring from the Discover work.

CUDA torch install + diagnostics
- install-cuda-torch now writes to the user-persistent extras dir via
  pip --target, not the bundled venv. The venv path on packaged builds
  is read-only (Program Files / AppData\Local\Programs) and anyway the
  next app upgrade wipes it -- fresh installs were silently landing
  CUDA torch somewhere Python never imports from.
- Purge stale torch/* + nvidia-* from extras before reinstalling so a
  half-installed prior attempt doesn't poison the next import. Field
  repro: torch-2.6.0+cu124.dist-info left orphaned alongside a
  torch-2.11.0+cpu folder -> ModuleNotFoundError at runtime.
- GPU bundle installer now writes .chaosengine-torch-constraints.txt
  pinning the installed torch version, and passes --constraint for
  every follow-up pip call. Stops diffusers/transformers from letting
  pip's resolver silently swap the CUDA wheel for CPU-only from
  default PyPI.
- Always-reachable "Install / reinstall CUDA torch" button in the
  Diagnostics panel -- the main-page banner is dismissible-forever
  via localStorage, so there needs to be a permanent entry point.
- Tests cover extras targeting, stale-purge family match, torch
  version parsing, and constraint writing.

FLUX performance
- For FLUX repos on CUDA: bfloat16 + enable_model_cpu_offload()
  instead of .to("cuda") + enable_attention_slicing(). The old path
  tried to fit ~33 GB of weights in 24 GB VRAM, spilled to pinned
  host memory, pagefile-thrashed at ~8 min/step.
- NF4 quantization via bitsandbytes (transformer 24 GB -> 7 GB).
  Falls back to bf16 + cpu_offload if bitsandbytes isn't installed
  or diffusers predates BitsAndBytesConfig, with a user-visible note.
- bitsandbytes added to _INSTALLABLE_PIP_PACKAGES and the GPU bundle
  so the Setup page can install it, and so fresh installs pick it up
  as part of the one-click runtime.
- Other pipelines (SD 1.5 / SDXL / Qwen-Image) stay on the existing
  .to(device) + slicing path -- no regression for models that fit.

Video UX
- New device chip on Video Studio matching the image studio:
  "Device: cuda (expected)" before preload, real device after.
  Derived via nvidia_gpu_present + find_spec so we don't import
  torch just to compute it (same DLL-lock trap the image runtime hit).
- Default guidance is now per-family (LTX=3.0, Hunyuan=6.0, everything
  else keeps 5.0). LTX at 5.0 over-guided and produced the "random
  shapes" output; 3.0 matches the reference pipeline.
- Default negative prompt pre-fills the Video Studio field with a
  generic prompt tuned across LTX/Wan/Hunyuan/Mochi. Blank negatives
  gave the models no correction signal.
- Quality presets (Draft/Standard/High/Max) as pill buttons above
  the params row. Set frames + steps; deliberately DO NOT touch
  guidance so the model-aware values survive a preset click.
- Aspect-ratio presets (1:1/4:3/16:9/9:16/21:9) with fixed safe
  resolutions (all div-by-8, inside every supported model's tested
  envelope).
- Sliders for Steps + Guidance alongside the number inputs. Frames
  and FPS stay number-only because their snapping rules fight
  smooth dragging.
- New InfoTooltip component -- hover/focus-reveal bubble next to
  every video param explaining what it does and a sane starting
  point. Native title attr was considered but its 500-1500ms delay
  and un-styleable body didn't work for dense forms.

Discover polish
- Drop "Curated" badge from image cards + "curated" wording from
  both Discover tab subtitles. Runtime "Real engine ready" callout
  removed from both Discover tabs -- that status lives in the Studio
  tabs where it's actionable.
- Sort dropdown on both tabs: Newest released (default) / Most likes
  / Most downloads. Shared comparator in utils/discoverSort consumed
  by both hooks; prefers curated releaseDate, falls back to HF
  createdAt, then lastModified.
- Video catalog now fetches HF metadata (downloads/likes/
  lastModified/pipelineTag) in parallel from _image_repo_live_metadata
  -- despite the name it's a generic HF-repo fetcher, just wasn't
  wired into video_model_payloads before. 6 h in-process cache
  shared with the image tab.
- Video Discover cards now render downloads + likes chips alongside
  the release date.
- videoPrimarySizeLabel prefers onDiskGb -> coreWeightsGb ->
  repoSizeGb -> hardcoded sizeGb (mirrors the image helper). Added
  videoSecondarySizeLabel to surface "109 GB full repo" when the
  HF tree is meaningfully larger than the diffusers-only subset
  (Wan 2.2: 14 GB weights, 109 GB full repo).
- Fix stale-selection bug in Video Studio: validation effect now
  checks studioFamilies (the dropdown's actual option set) instead
  of videoCatalog. Repro: delete the currently-selected model ->
  dropdown visually shows the next installed model while React
  state stays on the deleted one -> everything downstream
  (chips, Generate button) reads from the wrong variant.

Model storage
- New hfCachePath setting. Path validation accepts Windows
  (D:\foo), POSIX, and ~-relative; rejects bare relative to avoid
  silently landing blobs in the backend's cwd.
- Tauri shell reads hfCachePath from settings.json and injects
  HF_HOME on backend spawn. MUST happen before Python starts --
  huggingface_hub reads HF_HOME at module import, setting it from
  inside the backend via os.environ is a no-op.
- New routes/storage.py: GET/POST the path, and a background-thread
  move worker with GET/status for progress polling.
- Move worker preserves the snapshots/<commit>/ -> blobs/<hash>
  symlink structure HF relies on. Naive shutil.copytree would
  resolve symlinks and explode a 100 GB tree to 500+ GB. Preflight
  requires 1 GB of destination headroom beyond the copy size so
  we don't fill the drive exactly and block other writes mid-move.
- ModelStoragePanel in Settings: effective path, on-disk size, free
  bytes on target drive, path input + Browse + Save + Reset, "Move
  existing models" button (only shows when data to move + path
  changed), live progress panel while the move runs, restart-backend
  prompt after save.
Two user-visible features and a build-blocker fix, version-bumped
together so the diff stays bisect-friendly.

Cancellable image + video generation
- ProgressTracker now exposes request_cancel() / is_cancelled() backed
  by a threading.Event. Snapshot includes a new cancelRequested field
  so the UI can render "Cancelling..." without a separate poll. The
  flag is cleared on begin() so a stale cancel from a previous run
  doesn't abort the next Generate.
- New GenerationCancelled exception keeps user-requested aborts
  distinct from real pipeline crashes — handlers map it to
  HTTPException(409, "cancelled") so the frontend can render a calm
  "Cancelled" state instead of a red error callout.
- image_runtime + video_runtime: the existing callback_on_step_end
  hook now also checks the cancel flag and raises GenerationCancelled
  when set, plus sets pipeline._interrupt = True (diffusers' graceful
  stop signal). An early-cancel check after _ensure_pipeline returns
  catches Cancel clicks during the blocking from_pretrained load —
  we can't interrupt the C extension itself, but we bail before
  sinking time into T5 / VAE / denoising once it returns.
- New POST /api/images/cancel + POST /api/video/cancel endpoints —
  thin handlers that call request_cancel() and return
  {cancelled: bool}. Idempotent.
- Frontend: cancelImageGeneration() / cancelVideoGeneration() API
  wrappers; useImageState / useVideoState track new cancelled +
  cancelling states (cleared on next Generate). Existing error path
  special-cases the "cancelled" message string so the modal
  transitions to a calm cancelled state rather than the red error
  callout.
- ImageGenerationModal + VideoGenerationModal: footer is now always
  rendered. During busy → red "Cancel generation" button (disabled
  + "Cancelling..." while the request is in flight). After → "Done"
  / "Close". New "cancelled" render state between busy and success
  with a callout explaining "no image saved / model still loaded —
  next Generate will skip the cold-start wait".
- Tests: 3 new ProgressTracker cancel tests (idle no-op, mid-run flag
  set, begin() clears stale flag). _REQUIRED_SNAPSHOT_KEYS contract
  test extended with the new cancelRequested key.

Windows build fix: platform-gate MLX optional packages
- scripts/stage-runtime.mjs now declares per-package platform gates.
  dflash-mlx and turboquant-mlx-full are MLX-family (Apple Silicon
  only) — bundling them on Windows tries to install mlx>=0.20.0 as
  a build-time dep, which has no Windows wheel, so pip exits non-zero
  and the whole stage:runtime step blows up. Gate them to ["darwin"]
  with a clean info-line skip elsewhere. turboquant (pure Python)
  stays unrestricted.

Version bump
- package.json + package-lock.json + pyproject.toml +
  src-tauri/Cargo.{toml,lock} + src-tauri/tauri.conf.json all roll
  0.6.2 → 0.6.3 in the same commit so installer + updater + Python
  sdist + npm package all agree on what's running.
Fixes hard-crash when generating with Wan 2.2: catalog reported 14 GB
but the A14B MoE actually needs ~126 GB resident (dual-expert:
transformer/ + transformer_2/). The safety heuristic was correct — it
was fed the wrong input.

Catalog:
- Wan 2.2 A14B sizeGb 14.0 → 126.0 with MoE note
- defaultVariantId switched to Wan 2.2 TI2V-5B
- Added Wan 2.2 TI2V-5B Diffusers variant + QuantStack GGUFs (Q4_K_M /
  Q6_K / Q8_0) as permissive-licence, single-transformer drop-ins

Runtime:
- video_runtime PIPELINE_REGISTRY + _GGUF_VIDEO_TRANSFORMER_CLASSES
  route Wan2.2-TI2V-5B to WanPipeline / WanTransformer3DModel
- image_runtime.ImageRuntimeStatus gains deviceMemoryGb, populated in
  probe() via helpers.gpu.get_device_vram_total_gb

UI safety:
- New assessImageGenerationSafety in utils/images.ts mirrors video
  heuristic with image-specific tweaks (×4 slab, /8 latent, no frames)
- ImageStudioTab + VideoStudioTab gain danger-blocks-generate with
  explicit override checkbox that resets on variant / resolution /
  frame-count change
- Removed frames field from video QUALITY_PRESETS — frames is clip
  length, not quality

Tests:
- test_video_runtime expected-set includes Wan2.2-TI2V-5B-Diffusers
- pytest: 129 pass  |  vitest: 174 pass  |  tsc --noEmit clean
- Add Dependabot config for pip, cargo, and GitHub Actions (weekly/monthly)
- Add pre-commit hooks: large-file check, merge-conflict, private-key detect, EOF/trailing-whitespace
- Update Cargo.lock with security-relevant crate bumps (rustls 0.23.39, hyper-rustls 0.27.9, rustls-webpki 0.103.13, tokio 1.52.1)

Vulnerability scanning delegated to Dependabot and GitHub native alerts rather than osv-scanner pre-commit hook, which was blocking commits on pre-existing unfixable transitive deps in the Tauri/GTK stack.
- video_runtime.probe: detect device memory without waiting for model
  preload. Sysctl/nvidia-smi read is cheap and model-independent; the
  prior gate left the frontend falling back to a hardcoded 16 GB MPS
  default on 64 GB Macs.
- mlx_worker: split JSON protocol channel from stdout. Dup FD 1 into a
  private sink, redirect FD 1 to stderr, rebind sys.stdout. mlx-lm's
  print-to-stdout warnings no longer corrupt the response stream
  ("MLX worker returned invalid JSON: [WARNING] Generating with a
  model that requires 48128 MB...").
- PerformancePreview: when "May not fit" triggers, show a targeted
  advice line. Distinguishes cache-dominated (lower context / pick
  compressed strategy) from weight-dominated (smaller model)
  oversubscription.
- Add LongLive integration (FU-003) for NVlabs/LongLive-1.3B: CUDA-only
  install script, subprocess engine with torchrun dispatch, video_runtime
  routing for NVlabs/LongLive* repos, catalog entry (832x480, 30s default,
  10GB), /api/video/longlive capabilities endpoint, setup route install
  action, and frontend probe + install callout.
- Show clip length in seconds under frames+fps in Video Studio so users
  can eyeball duration without mental math.
- Add Follow-Ups Tracker to CLAUDE.md capturing FU-001..FU-006 for
  deferred upstream work (turboquant 0.3.x, TriAttention MLX wiring,
  LongLive, SGLang, dflash-mlx pin verification).
- Wire TriAttention MLX compressor plumbing in cache_compression
  (blocked on upstream API gap per FU-002).
- Bump triattention/turboquant extras to git-pinned + 0.1.3 respectively.
- Fix update-llama-turbo.sh default branch to feature/turboquant-kv-cache.
- Tests: 14 new unit tests for frame math (Wan VAE 4x + block-align),
  install resolution, YAML render, probe states (Darwin/missing/ready),
  and dispatch routing.
The Video Discover Download button for LongLive was calling HF
snapshot_download on `NVlabs/LongLive-1.3B` -- which is the GitHub
org, not a Hugging Face repo -- so it failed with a misleading "not
found on Hugging Face" error. Even the Studio tab install path,
which knew about the real install script, was broken on Windows:
it shelled out to `scripts/install-longlive.sh` and the generic
FileNotFoundError handler was hardcoded to suggest "Install
Homebrew first: https://brew.sh" -- wrong OS, wrong tool.

Two fixes:

Port the installer to Python (backend_service.longlive_installer)

LongLive is CUDA-only (Windows + Linux), so shipping a bash-only
installer was a dead end on the platforms that actually run it.
The new module does everything the shell script did -- clone the
NVlabs/LongLive repo, build an isolated venv, install pip deps
(incl. best-effort flash-attn), snapshot_download the LongLive +
Wan 2.1 base weights, write the ready marker last so partial
installs don't read as complete. It handles the Windows vs POSIX
venv python layout (Scripts/python.exe vs bin/python) and rejects
macOS up front. Invoked as `python -m backend_service.longlive_installer`
by the setup route, with --help / unknown-arg handling so an
accidental CLI invocation doesn't kick off a 10-minute install.
The legacy scripts/install-longlive.sh is deleted -- it was never
staged into the embedded backend runtime anyway, which is a
separate reason the Windows path couldn't find it.

Also scoped the "Install Homebrew first" hint in the
install-system-package route to only fire when cmd[0] == "brew",
so the LongLive path on Windows no longer emits a macOS-only
error.

Install LongLive CTA in Video Discover

The Download button for LongLive in Video Discover now becomes
"Install LongLive" (matching the Studio tab's existing install
affordance). VideoDiscoverTab probes the LongLive runtime status
when the results include a LongLive variant, renders "Installed"
vs "Not installed" based on that probe rather than the HF
snapshot dir (which never exists for LongLive), and suppresses
the stale "Download Failed" badge + Delete button since LongLive
was never downloading through the HF pipeline in the first place.

Tests

* tests/test_longlive_installer.py -- platform guards, missing
  git, venv path layout for both OSes, CLI --help / unknown-arg
  behaviour.
* tests/test_setup_routes.py -- new regression tests pinning the
  LongLive -> python-module wiring and the brew-error scoping.
Two follow-ups land alongside research tracker entries (FU-007..010
in CLAUDE.md): a working TeaCache scaffold for diffusion pipelines
and an Apple Silicon MLX video engine probe + UI surface.

TeaCache (FU-007) -- training-free DiT cache, Apache 2.0
---------------------------------------------------------

Extends the cache_compression registry beyond text. The base
CacheStrategy now declares applies_to() (default {"text"}); TeaCache
returns {"image", "video"} so the UI can surface it only in
diffusion contexts. The available() JSON payload gains an
"appliesTo" field so the frontend can filter.

A shared apply_diffusion_cache_strategy() helper in
cache_compression/__init__.py is invoked from both
DiffusersTextToImageEngine.generate (image_runtime.py) and
DiffusersVideoEngine.generate (video_runtime.py) just before pipeline
invoke. Falsy strategy id is a silent no-op; unknown id, domain
mismatch, and unsupported pipeline class all return a note string
without raising -- the engine continues with the stock pipeline.

Per-pipeline forward patches are deferred (cache_compression/
_teacache_patches/ vendoring lands incrementally). The strategy
ships with an empty _FORWARD_PATCHES dict; apply_diffusers_hook
raises NotImplementedError for any class until vendoring lands.
Validation order: rel_l1_thresh and num_inference_steps are checked
before transformer-class lookup so callers see ValueError for bad
args, NotImplementedError only for unsupported shapes.

ImageGenerationConfig and VideoGenerationConfig grow optional
cacheStrategy and cacheRelL1Thresh fields. UI surfacing of those
inputs is deferred -- the engine path is wired and tested first so
the strategy can be smoked via direct API calls before adding
controls.

mlx-video (FU-009) -- Apple Silicon video engine
-------------------------------------------------

backend_service/mlx_video_runtime.py adds MlxVideoEngine, a
subprocess-style wrapper around Blaizzy/mlx-video (MIT). Probe
gates on Darwin arm64 and importlib.util.find_spec("mlx_video"),
mirroring the MfluxImageEngine pattern. Generate raises
NotImplementedError pointing at FU-009 -- end-to-end generation
lands when the follow-up promotes from scaffold.

VideoRuntimeManager exposes mlx_video_capabilities() (lazy-
constructed like _longlive). New route GET /api/video/mlx-runtime
proxies it. supported_repos() lists the Wan2.1 / Wan2.2 / LTX-2
repos mlx-video covers natively so the Studio can decide chip
visibility without an extra round-trip.

Frontend wiring:
  - src/api.ts -- getMlxVideoRuntime()
  - src/hooks/useVideoState.ts -- mlxVideoStatus + handlers,
    bundled into refreshVideoData() Promise.allSettled
  - src/features/video/VideoStudioTab.tsx -- chip (warning when
    pkg missing, subtle when scaffold-installed) + "Install
    mlx-video" button gated on Apple Silicon via the probe's
    expectedDevice/device fields
  - src/App.tsx -- props passthrough

Install path is the existing /api/setup/install-package endpoint
with mlx-video added to _INSTALLABLE_PIP_PACKAGES.

Tracker + licences
------------------

CLAUDE.md follow-ups: FU-007 (TeaCache, in-flight), FU-008
(stable-diffusion.cpp, next cycle), FU-009 (mlx-video, scaffolded),
FU-010 (vllm-swift, watch-only).

THIRD_PARTY_NOTICES.md: TeaCache (Apache 2.0, vendored patches)
and mlx-video (MIT, optional Apple Silicon dep).

pyproject.toml: new diffusion-accel and mlx-video extras.

Tests
-----

  - tests/test_teacache.py -- 27 tests covering registry, metadata,
    availability badge, hook contract (raises for missing
    transformer / unsupported class / bad threshold / zero steps),
    text-engine refusal, helper swallow contract.
  - tests/test_mlx_video.py -- 15 tests covering supported repo
    set, platform gating (mocked Darwin/Linux/Intel Mac),
    install probe, lifecycle (preload/unload), generate refuses,
    manager surface + lazy construction.
  - tests/test_video_routes.py -- two new tests pinning the
    /api/video/mlx-runtime shape and that it delegates to
    VideoRuntimeManager.mlx_video_capabilities (not the diffusers
    probe).

Full sweep on Apple Silicon: 590 pytest pass + 1 skip, 174 vitest
pass / 13 files, npx tsc --noEmit clean.
@cryptopoly cryptopoly merged commit 7f429bc into main Apr 25, 2026
1 check passed
cryptopoly added a commit that referenced this pull request Jun 1, 2026
…n-any-HF, connect presets)

Five local-AI-app parity features to close gaps vs Ollama / LM Studio,
each reusing existing infra rather than adding new heavy subsystems.

1. Out-of-box RAG — one-click nomic-embed-text-v1.5 install
   (/api/setup/install-embedding-model) + /api/rag/status. Chat doc
   panel shows vector vs lexical mode and offers the upgrade
   (RagStatusBadge). Retrieval was silently lexical-only without a model.

2. Server "Connect your app" presets — base_url + Python/JS snippets +
   Open WebUI / Continue.dev / Ollama presets in ServerTab.

3. Ollama-compatible API — /api/{chat,generate,tags,show,version,
   embeddings,embed} layered over the existing OpenAI generation path,
   translating SSE to NDJSON. Inherits auth + format->json_schema.
   Unlocks Ollama-preset tools (Open WebUI, Continue, Raycast, n8n).

4. Import Ollama / LM Studio models by reference — scans the Ollama blob
   store (manifest -> blob) and LM Studio cache, symlinks into a managed
   imported-models dir (no re-download), auto-registers for library scan.

5. Run any Hugging Face repo — /api/models/resolve-hf classifies backend,
   picks the GGUF file, and infers context + capabilities from the repo's
   own metadata; loads with canonicalRepo set to bypass the FU-041
   catalog fuzzy-match that mis-tagged off-catalog models (RunFromHuggingFace).

Tests: +42 backend (test_embedding_setup, test_hf_resolve, test_model_import,
+ Ollama shim cases in test_backend_service); vitest 453 green; tsc clean;
i18n 100%; full E2E suite 8/8 phases pass incl. new phase-0 checks.

Known follow-ups: stage llama-embedding binary for packaged builds (#1);
Windows symlink privilege (#4); raw-safetensors repos flagged vLLM/CUDA (#5).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant