Feature/video gen features/fixes/improvements#5
Merged
Conversation
User reported CUDA torch install succeeded but Image/Video Studio still showed DEVICE: CPU and video generation ran on CPU despite an RTX 4090 being present. Root cause: the backend was in source-workspace launcher mode (Tauri couldn't find / fell back from the embedded runtime), so Python ran against the dev .venv. The extras-site-packages prepend to PYTHONPATH only existed in apply_embedded_runtime_env, which the source-workspace branch never calls. Result: Python started with no PYTHONPATH, found torch in .venv (or nowhere), never looked at the 2.5 GB of CUDA torch the GPU bundle install had just dropped into ~/.chaosengine/extras/site-packages/. Evidence from user's diagnostics snapshot: PYTHONPATH | null (not set) sysPath | includes .venv/Lib/site-packages, missing extras Fix: add the matching PYTHONPATH prepend in the source-workspace branch of bootstrap(). Same shape as the embedded path, just without the runtime-specific entries (source-workspace Python auto-discovers .venv via sys.prefix, so we only need to inject extras to win over whatever the dev venv happens to have). Per Karpathy CLAUDE.md #3 (Surgical Changes): single additive block inside an existing else branch. No existing logic modified. No new helpers. Embedded path is untouched because it already handles this. cargo check clean. No Python or TS tests exercise this path (it's purely subprocess env-var wiring), verification is post-rebuild on the user's Windows VM: /api/diagnostics/snapshot should show the extras path in both PYTHONPATH and sysPath.
User asked for: fixed-width terminal, single scroll region, step
counter showing progress. Previous per-step <details> cards were OK
on a 3-package install but stacked too tall on the 13-package GPU
bundle — output scrolled off-screen and users lost track of which
step was current.
New layout:
- Single monospace <pre> region, max-height 380px, auto-scrolls to
bottom on new attempts (tail -f behaviour). Doesn't steal scroll
on phase transitions — only on new output — so a user reading
earlier lines doesn't get yanked forward.
- Step line above the terminal shows 'Step 3/13: accelerate · 42%'
while running, 'Final: 12/13 packages · 100%' when done.
- Per-attempt markers ([ OK ], [FAIL], [....] for in-progress) line
up on the left edge so failures are scannable.
Also strips pip's dep-resolver noise from the displayed output. The
user hit this on their Windows box where their .venv's leftover
turboquant-mlx-full declared an mlx>= constraint that will never be
satisfied on Windows — pip prints a scary-looking ERROR block
('chaos-engine-compressor ... requires safetensors, which is not
installed'), cosmetic but alarming. Raw attempt.output still has
the noise; we just filter it from the rendered terminal. Users who
want the full pip trace still get it via the attempts array.
Per CLAUDE.md #2 (Simplicity First): single component file, no new
abstractions, no new deps. Per #3 (Surgical Changes): only touched
InstallLogPanel.tsx + its CSS block. Per #4 (Goal-Driven): visible
improvement (scroll works, step counter visible, noise suppressed)
verifiable at a glance on next build.
174 TS tests still pass, tsc clean.
Two UX asks from the user's latest round:
1. 'Image Studio doesn't say whether it's trying to use CPU or CUDA'
2. 'Even though I just installed everything, the Image Studio still
shows an install GPU runtime button. I closed and reopened the
app and it disappeared.'
## Fix C: Device chip
The chip at line 256 only rendered when runtime.device was set,
and since my earlier refactor removed the speculative torch import
from probe() the device is now null until a model is actually loaded.
Added an expectedDevice field to ImageRuntimeStatus that's
computed WITHOUT importing torch — find_spec + nvidia_gpu_present +
platform.machine — so we can show 'Device: cuda (expected)' before
the first Generate even fires. Same constraint as probe(): absolutely
no torch import (would pin torch/lib/*.dll and break the install
flow we just fixed).
The chip now reads:
Device: cuda (model loaded, actual device)
Device: cuda (expected) (torch installed + NVIDIA seen)
Device: mps (expected) (Apple Silicon)
Device: cpu (expected) (torch installed, no GPU)
(hidden) (torch not installed)
## Fix D: Post-install restart nudge
Before this, after a successful GPU bundle install the Image Studio
still showed 'Install GPU runtime'. Root cause: backend's sys.path
is snapshotted at spawn time, so find_spec still reports torch as
missing until the backend restarts with the new PYTHONPATH (Fix A's
domain). User was confused — install said success, UI said install
again. App restart made it go away.
Split the 'runtime-not-available' block into two paths:
- Post-install awaiting restart (job.phase === 'done' &&
job.requiresRestart): show 'installed to <path>, restart to
activate' + a Restart Backend button. No install button — you
just installed, clicking it again would be confusing.
- All other cases: show the install button as before.
Mirrored the same split to Video Studio for consistency.
Per CLAUDE.md #1 (Think Before Coding): surfaced the 'running
backend can't see new packages' reality to the user instead of
hiding it. Per #3 (Surgical Changes): added one field to the
backend status + one branch to each Studio tab; no shared
component extraction yet because we only use this shape in two
places and hoisting would obscure the per-tab differences.
478 Python tests pass, 174 TS tests pass, tsc clean.
Bundles today's work across five themes. All changes ship together
because the fixes for FLUX perf and Wan size display depend on the
HF metadata wiring from the Discover work.
CUDA torch install + diagnostics
- install-cuda-torch now writes to the user-persistent extras dir via
pip --target, not the bundled venv. The venv path on packaged builds
is read-only (Program Files / AppData\Local\Programs) and anyway the
next app upgrade wipes it -- fresh installs were silently landing
CUDA torch somewhere Python never imports from.
- Purge stale torch/* + nvidia-* from extras before reinstalling so a
half-installed prior attempt doesn't poison the next import. Field
repro: torch-2.6.0+cu124.dist-info left orphaned alongside a
torch-2.11.0+cpu folder -> ModuleNotFoundError at runtime.
- GPU bundle installer now writes .chaosengine-torch-constraints.txt
pinning the installed torch version, and passes --constraint for
every follow-up pip call. Stops diffusers/transformers from letting
pip's resolver silently swap the CUDA wheel for CPU-only from
default PyPI.
- Always-reachable "Install / reinstall CUDA torch" button in the
Diagnostics panel -- the main-page banner is dismissible-forever
via localStorage, so there needs to be a permanent entry point.
- Tests cover extras targeting, stale-purge family match, torch
version parsing, and constraint writing.
FLUX performance
- For FLUX repos on CUDA: bfloat16 + enable_model_cpu_offload()
instead of .to("cuda") + enable_attention_slicing(). The old path
tried to fit ~33 GB of weights in 24 GB VRAM, spilled to pinned
host memory, pagefile-thrashed at ~8 min/step.
- NF4 quantization via bitsandbytes (transformer 24 GB -> 7 GB).
Falls back to bf16 + cpu_offload if bitsandbytes isn't installed
or diffusers predates BitsAndBytesConfig, with a user-visible note.
- bitsandbytes added to _INSTALLABLE_PIP_PACKAGES and the GPU bundle
so the Setup page can install it, and so fresh installs pick it up
as part of the one-click runtime.
- Other pipelines (SD 1.5 / SDXL / Qwen-Image) stay on the existing
.to(device) + slicing path -- no regression for models that fit.
Video UX
- New device chip on Video Studio matching the image studio:
"Device: cuda (expected)" before preload, real device after.
Derived via nvidia_gpu_present + find_spec so we don't import
torch just to compute it (same DLL-lock trap the image runtime hit).
- Default guidance is now per-family (LTX=3.0, Hunyuan=6.0, everything
else keeps 5.0). LTX at 5.0 over-guided and produced the "random
shapes" output; 3.0 matches the reference pipeline.
- Default negative prompt pre-fills the Video Studio field with a
generic prompt tuned across LTX/Wan/Hunyuan/Mochi. Blank negatives
gave the models no correction signal.
- Quality presets (Draft/Standard/High/Max) as pill buttons above
the params row. Set frames + steps; deliberately DO NOT touch
guidance so the model-aware values survive a preset click.
- Aspect-ratio presets (1:1/4:3/16:9/9:16/21:9) with fixed safe
resolutions (all div-by-8, inside every supported model's tested
envelope).
- Sliders for Steps + Guidance alongside the number inputs. Frames
and FPS stay number-only because their snapping rules fight
smooth dragging.
- New InfoTooltip component -- hover/focus-reveal bubble next to
every video param explaining what it does and a sane starting
point. Native title attr was considered but its 500-1500ms delay
and un-styleable body didn't work for dense forms.
Discover polish
- Drop "Curated" badge from image cards + "curated" wording from
both Discover tab subtitles. Runtime "Real engine ready" callout
removed from both Discover tabs -- that status lives in the Studio
tabs where it's actionable.
- Sort dropdown on both tabs: Newest released (default) / Most likes
/ Most downloads. Shared comparator in utils/discoverSort consumed
by both hooks; prefers curated releaseDate, falls back to HF
createdAt, then lastModified.
- Video catalog now fetches HF metadata (downloads/likes/
lastModified/pipelineTag) in parallel from _image_repo_live_metadata
-- despite the name it's a generic HF-repo fetcher, just wasn't
wired into video_model_payloads before. 6 h in-process cache
shared with the image tab.
- Video Discover cards now render downloads + likes chips alongside
the release date.
- videoPrimarySizeLabel prefers onDiskGb -> coreWeightsGb ->
repoSizeGb -> hardcoded sizeGb (mirrors the image helper). Added
videoSecondarySizeLabel to surface "109 GB full repo" when the
HF tree is meaningfully larger than the diffusers-only subset
(Wan 2.2: 14 GB weights, 109 GB full repo).
- Fix stale-selection bug in Video Studio: validation effect now
checks studioFamilies (the dropdown's actual option set) instead
of videoCatalog. Repro: delete the currently-selected model ->
dropdown visually shows the next installed model while React
state stays on the deleted one -> everything downstream
(chips, Generate button) reads from the wrong variant.
Model storage
- New hfCachePath setting. Path validation accepts Windows
(D:\foo), POSIX, and ~-relative; rejects bare relative to avoid
silently landing blobs in the backend's cwd.
- Tauri shell reads hfCachePath from settings.json and injects
HF_HOME on backend spawn. MUST happen before Python starts --
huggingface_hub reads HF_HOME at module import, setting it from
inside the backend via os.environ is a no-op.
- New routes/storage.py: GET/POST the path, and a background-thread
move worker with GET/status for progress polling.
- Move worker preserves the snapshots/<commit>/ -> blobs/<hash>
symlink structure HF relies on. Naive shutil.copytree would
resolve symlinks and explode a 100 GB tree to 500+ GB. Preflight
requires 1 GB of destination headroom beyond the copy size so
we don't fill the drive exactly and block other writes mid-move.
- ModelStoragePanel in Settings: effective path, on-disk size, free
bytes on target drive, path input + Browse + Save + Reset, "Move
existing models" button (only shows when data to move + path
changed), live progress panel while the move runs, restart-backend
prompt after save.
Two user-visible features and a build-blocker fix, version-bumped
together so the diff stays bisect-friendly.
Cancellable image + video generation
- ProgressTracker now exposes request_cancel() / is_cancelled() backed
by a threading.Event. Snapshot includes a new cancelRequested field
so the UI can render "Cancelling..." without a separate poll. The
flag is cleared on begin() so a stale cancel from a previous run
doesn't abort the next Generate.
- New GenerationCancelled exception keeps user-requested aborts
distinct from real pipeline crashes — handlers map it to
HTTPException(409, "cancelled") so the frontend can render a calm
"Cancelled" state instead of a red error callout.
- image_runtime + video_runtime: the existing callback_on_step_end
hook now also checks the cancel flag and raises GenerationCancelled
when set, plus sets pipeline._interrupt = True (diffusers' graceful
stop signal). An early-cancel check after _ensure_pipeline returns
catches Cancel clicks during the blocking from_pretrained load —
we can't interrupt the C extension itself, but we bail before
sinking time into T5 / VAE / denoising once it returns.
- New POST /api/images/cancel + POST /api/video/cancel endpoints —
thin handlers that call request_cancel() and return
{cancelled: bool}. Idempotent.
- Frontend: cancelImageGeneration() / cancelVideoGeneration() API
wrappers; useImageState / useVideoState track new cancelled +
cancelling states (cleared on next Generate). Existing error path
special-cases the "cancelled" message string so the modal
transitions to a calm cancelled state rather than the red error
callout.
- ImageGenerationModal + VideoGenerationModal: footer is now always
rendered. During busy → red "Cancel generation" button (disabled
+ "Cancelling..." while the request is in flight). After → "Done"
/ "Close". New "cancelled" render state between busy and success
with a callout explaining "no image saved / model still loaded —
next Generate will skip the cold-start wait".
- Tests: 3 new ProgressTracker cancel tests (idle no-op, mid-run flag
set, begin() clears stale flag). _REQUIRED_SNAPSHOT_KEYS contract
test extended with the new cancelRequested key.
Windows build fix: platform-gate MLX optional packages
- scripts/stage-runtime.mjs now declares per-package platform gates.
dflash-mlx and turboquant-mlx-full are MLX-family (Apple Silicon
only) — bundling them on Windows tries to install mlx>=0.20.0 as
a build-time dep, which has no Windows wheel, so pip exits non-zero
and the whole stage:runtime step blows up. Gate them to ["darwin"]
with a clean info-line skip elsewhere. turboquant (pure Python)
stays unrestricted.
Version bump
- package.json + package-lock.json + pyproject.toml +
src-tauri/Cargo.{toml,lock} + src-tauri/tauri.conf.json all roll
0.6.2 → 0.6.3 in the same commit so installer + updater + Python
sdist + npm package all agree on what's running.
Fixes hard-crash when generating with Wan 2.2: catalog reported 14 GB but the A14B MoE actually needs ~126 GB resident (dual-expert: transformer/ + transformer_2/). The safety heuristic was correct — it was fed the wrong input. Catalog: - Wan 2.2 A14B sizeGb 14.0 → 126.0 with MoE note - defaultVariantId switched to Wan 2.2 TI2V-5B - Added Wan 2.2 TI2V-5B Diffusers variant + QuantStack GGUFs (Q4_K_M / Q6_K / Q8_0) as permissive-licence, single-transformer drop-ins Runtime: - video_runtime PIPELINE_REGISTRY + _GGUF_VIDEO_TRANSFORMER_CLASSES route Wan2.2-TI2V-5B to WanPipeline / WanTransformer3DModel - image_runtime.ImageRuntimeStatus gains deviceMemoryGb, populated in probe() via helpers.gpu.get_device_vram_total_gb UI safety: - New assessImageGenerationSafety in utils/images.ts mirrors video heuristic with image-specific tweaks (×4 slab, /8 latent, no frames) - ImageStudioTab + VideoStudioTab gain danger-blocks-generate with explicit override checkbox that resets on variant / resolution / frame-count change - Removed frames field from video QUALITY_PRESETS — frames is clip length, not quality Tests: - test_video_runtime expected-set includes Wan2.2-TI2V-5B-Diffusers - pytest: 129 pass | vitest: 174 pass | tsc --noEmit clean
- Add Dependabot config for pip, cargo, and GitHub Actions (weekly/monthly) - Add pre-commit hooks: large-file check, merge-conflict, private-key detect, EOF/trailing-whitespace - Update Cargo.lock with security-relevant crate bumps (rustls 0.23.39, hyper-rustls 0.27.9, rustls-webpki 0.103.13, tokio 1.52.1) Vulnerability scanning delegated to Dependabot and GitHub native alerts rather than osv-scanner pre-commit hook, which was blocking commits on pre-existing unfixable transitive deps in the Tauri/GTK stack.
- video_runtime.probe: detect device memory without waiting for model
preload. Sysctl/nvidia-smi read is cheap and model-independent; the
prior gate left the frontend falling back to a hardcoded 16 GB MPS
default on 64 GB Macs.
- mlx_worker: split JSON protocol channel from stdout. Dup FD 1 into a
private sink, redirect FD 1 to stderr, rebind sys.stdout. mlx-lm's
print-to-stdout warnings no longer corrupt the response stream
("MLX worker returned invalid JSON: [WARNING] Generating with a
model that requires 48128 MB...").
- PerformancePreview: when "May not fit" triggers, show a targeted
advice line. Distinguishes cache-dominated (lower context / pick
compressed strategy) from weight-dominated (smaller model)
oversubscription.
- Add LongLive integration (FU-003) for NVlabs/LongLive-1.3B: CUDA-only install script, subprocess engine with torchrun dispatch, video_runtime routing for NVlabs/LongLive* repos, catalog entry (832x480, 30s default, 10GB), /api/video/longlive capabilities endpoint, setup route install action, and frontend probe + install callout. - Show clip length in seconds under frames+fps in Video Studio so users can eyeball duration without mental math. - Add Follow-Ups Tracker to CLAUDE.md capturing FU-001..FU-006 for deferred upstream work (turboquant 0.3.x, TriAttention MLX wiring, LongLive, SGLang, dflash-mlx pin verification). - Wire TriAttention MLX compressor plumbing in cache_compression (blocked on upstream API gap per FU-002). - Bump triattention/turboquant extras to git-pinned + 0.1.3 respectively. - Fix update-llama-turbo.sh default branch to feature/turboquant-kv-cache. - Tests: 14 new unit tests for frame math (Wan VAE 4x + block-align), install resolution, YAML render, probe states (Darwin/missing/ready), and dispatch routing.
The Video Discover Download button for LongLive was calling HF snapshot_download on `NVlabs/LongLive-1.3B` -- which is the GitHub org, not a Hugging Face repo -- so it failed with a misleading "not found on Hugging Face" error. Even the Studio tab install path, which knew about the real install script, was broken on Windows: it shelled out to `scripts/install-longlive.sh` and the generic FileNotFoundError handler was hardcoded to suggest "Install Homebrew first: https://brew.sh" -- wrong OS, wrong tool. Two fixes: Port the installer to Python (backend_service.longlive_installer) LongLive is CUDA-only (Windows + Linux), so shipping a bash-only installer was a dead end on the platforms that actually run it. The new module does everything the shell script did -- clone the NVlabs/LongLive repo, build an isolated venv, install pip deps (incl. best-effort flash-attn), snapshot_download the LongLive + Wan 2.1 base weights, write the ready marker last so partial installs don't read as complete. It handles the Windows vs POSIX venv python layout (Scripts/python.exe vs bin/python) and rejects macOS up front. Invoked as `python -m backend_service.longlive_installer` by the setup route, with --help / unknown-arg handling so an accidental CLI invocation doesn't kick off a 10-minute install. The legacy scripts/install-longlive.sh is deleted -- it was never staged into the embedded backend runtime anyway, which is a separate reason the Windows path couldn't find it. Also scoped the "Install Homebrew first" hint in the install-system-package route to only fire when cmd[0] == "brew", so the LongLive path on Windows no longer emits a macOS-only error. Install LongLive CTA in Video Discover The Download button for LongLive in Video Discover now becomes "Install LongLive" (matching the Studio tab's existing install affordance). VideoDiscoverTab probes the LongLive runtime status when the results include a LongLive variant, renders "Installed" vs "Not installed" based on that probe rather than the HF snapshot dir (which never exists for LongLive), and suppresses the stale "Download Failed" badge + Delete button since LongLive was never downloading through the HF pipeline in the first place. Tests * tests/test_longlive_installer.py -- platform guards, missing git, venv path layout for both OSes, CLI --help / unknown-arg behaviour. * tests/test_setup_routes.py -- new regression tests pinning the LongLive -> python-module wiring and the brew-error scoping.
Two follow-ups land alongside research tracker entries (FU-007..010
in CLAUDE.md): a working TeaCache scaffold for diffusion pipelines
and an Apple Silicon MLX video engine probe + UI surface.
TeaCache (FU-007) -- training-free DiT cache, Apache 2.0
---------------------------------------------------------
Extends the cache_compression registry beyond text. The base
CacheStrategy now declares applies_to() (default {"text"}); TeaCache
returns {"image", "video"} so the UI can surface it only in
diffusion contexts. The available() JSON payload gains an
"appliesTo" field so the frontend can filter.
A shared apply_diffusion_cache_strategy() helper in
cache_compression/__init__.py is invoked from both
DiffusersTextToImageEngine.generate (image_runtime.py) and
DiffusersVideoEngine.generate (video_runtime.py) just before pipeline
invoke. Falsy strategy id is a silent no-op; unknown id, domain
mismatch, and unsupported pipeline class all return a note string
without raising -- the engine continues with the stock pipeline.
Per-pipeline forward patches are deferred (cache_compression/
_teacache_patches/ vendoring lands incrementally). The strategy
ships with an empty _FORWARD_PATCHES dict; apply_diffusers_hook
raises NotImplementedError for any class until vendoring lands.
Validation order: rel_l1_thresh and num_inference_steps are checked
before transformer-class lookup so callers see ValueError for bad
args, NotImplementedError only for unsupported shapes.
ImageGenerationConfig and VideoGenerationConfig grow optional
cacheStrategy and cacheRelL1Thresh fields. UI surfacing of those
inputs is deferred -- the engine path is wired and tested first so
the strategy can be smoked via direct API calls before adding
controls.
mlx-video (FU-009) -- Apple Silicon video engine
-------------------------------------------------
backend_service/mlx_video_runtime.py adds MlxVideoEngine, a
subprocess-style wrapper around Blaizzy/mlx-video (MIT). Probe
gates on Darwin arm64 and importlib.util.find_spec("mlx_video"),
mirroring the MfluxImageEngine pattern. Generate raises
NotImplementedError pointing at FU-009 -- end-to-end generation
lands when the follow-up promotes from scaffold.
VideoRuntimeManager exposes mlx_video_capabilities() (lazy-
constructed like _longlive). New route GET /api/video/mlx-runtime
proxies it. supported_repos() lists the Wan2.1 / Wan2.2 / LTX-2
repos mlx-video covers natively so the Studio can decide chip
visibility without an extra round-trip.
Frontend wiring:
- src/api.ts -- getMlxVideoRuntime()
- src/hooks/useVideoState.ts -- mlxVideoStatus + handlers,
bundled into refreshVideoData() Promise.allSettled
- src/features/video/VideoStudioTab.tsx -- chip (warning when
pkg missing, subtle when scaffold-installed) + "Install
mlx-video" button gated on Apple Silicon via the probe's
expectedDevice/device fields
- src/App.tsx -- props passthrough
Install path is the existing /api/setup/install-package endpoint
with mlx-video added to _INSTALLABLE_PIP_PACKAGES.
Tracker + licences
------------------
CLAUDE.md follow-ups: FU-007 (TeaCache, in-flight), FU-008
(stable-diffusion.cpp, next cycle), FU-009 (mlx-video, scaffolded),
FU-010 (vllm-swift, watch-only).
THIRD_PARTY_NOTICES.md: TeaCache (Apache 2.0, vendored patches)
and mlx-video (MIT, optional Apple Silicon dep).
pyproject.toml: new diffusion-accel and mlx-video extras.
Tests
-----
- tests/test_teacache.py -- 27 tests covering registry, metadata,
availability badge, hook contract (raises for missing
transformer / unsupported class / bad threshold / zero steps),
text-engine refusal, helper swallow contract.
- tests/test_mlx_video.py -- 15 tests covering supported repo
set, platform gating (mocked Darwin/Linux/Intel Mac),
install probe, lifecycle (preload/unload), generate refuses,
manager surface + lazy construction.
- tests/test_video_routes.py -- two new tests pinning the
/api/video/mlx-runtime shape and that it delegates to
VideoRuntimeManager.mlx_video_capabilities (not the diffusers
probe).
Full sweep on Apple Silicon: 590 pytest pass + 1 skip, 174 vitest
pass / 13 files, npx tsc --noEmit clean.
cryptopoly
added a commit
that referenced
this pull request
Jun 1, 2026
…n-any-HF, connect presets)
Five local-AI-app parity features to close gaps vs Ollama / LM Studio,
each reusing existing infra rather than adding new heavy subsystems.
1. Out-of-box RAG — one-click nomic-embed-text-v1.5 install
(/api/setup/install-embedding-model) + /api/rag/status. Chat doc
panel shows vector vs lexical mode and offers the upgrade
(RagStatusBadge). Retrieval was silently lexical-only without a model.
2. Server "Connect your app" presets — base_url + Python/JS snippets +
Open WebUI / Continue.dev / Ollama presets in ServerTab.
3. Ollama-compatible API — /api/{chat,generate,tags,show,version,
embeddings,embed} layered over the existing OpenAI generation path,
translating SSE to NDJSON. Inherits auth + format->json_schema.
Unlocks Ollama-preset tools (Open WebUI, Continue, Raycast, n8n).
4. Import Ollama / LM Studio models by reference — scans the Ollama blob
store (manifest -> blob) and LM Studio cache, symlinks into a managed
imported-models dir (no re-download), auto-registers for library scan.
5. Run any Hugging Face repo — /api/models/resolve-hf classifies backend,
picks the GGUF file, and infers context + capabilities from the repo's
own metadata; loads with canonicalRepo set to bypass the FU-041
catalog fuzzy-match that mis-tagged off-catalog models (RunFromHuggingFace).
Tests: +42 backend (test_embedding_setup, test_hf_resolve, test_model_import,
+ Ollama shim cases in test_backend_service); vitest 453 green; tsc clean;
i18n 100%; full E2E suite 8/8 phases pass incl. new phase-0 checks.
Known follow-ups: stage llama-embedding binary for packaged builds (#1);
Windows symlink privilege (#4); raw-safetensors repos flagged vLLM/CUDA (#5).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.