Feat/web chat multimodal by primoco · Pull Request #183 · eullm/eullm

primoco · 2026-06-09T08:55:15Z

No description provided.

… run Windows CUDA's "Install CUDA toolkit" step swung between 6 and 17 min because `method: network` only caches the tiny network bootstrapper: running it re-fetches ~5 GB of packages from NVIDIA's CDN on every run (logs show cuda_installer-...-x64_12.8.0.exe -s pulling packages live). The actual compile is already warm (~5.5 min via sccache/S3); this download was the real source of variance.

Switch to `method: local` (full ~3 GB offline installer). With `use-github-cache` (default true) the full installer is stored in the GitHub Actions cache, so subsequent runs restore it and install offline, with no NVIDIA CDN round-trip. sccache stays on S3 and is untouched; the only new cost is that the ~3 GB installer now shares the 10 GB GitHub cache with the rust-cache. The first run after this change still pays the one-time full-installer download; later runs should drop the step to ~2-4 min.

Compared our generate_multimodal against llama.cpp's reference mtmd-cli (tools/mtmd/mtmd-cli.cpp). The reference sets `text.add_special = add_bos` (true on the first turn), so it prepends the model's BOS token. We set `add_special = !request.raw`, and the multimodal path always uses raw=true (hand-built turn template), so we were sending the prompt WITHOUT <bos>.

Gemma requires BOS. Without it the prompt is malformed and the model degrades into "wall of text / line art / collage" confabulation, but only when the image signal is weak. Strong landscapes survived the missing BOS (which is why some images "worked"); portraits, people and dense/abstract images tipped into garbage. This reproduced identically on CLI and web, and on 0.6.0. In other words, it was neither a beta regression nor a web-pipeline bug; it was this one missing token in the shared core, present since multimodal landed.

Fix: `add_special = true` (always prepend BOS) for the mtmd tokenize, matching the reference. parse_special stays true so the hand-built turn markers are still recognised.
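To make the flag change concrete, here is a minimal sketch of the before/after logic. It is not the project's actual code: `Request`, `TokenizeFlags`, `flags_before_fix` and `flags_after_fix` are hypothetical names invented for illustration, and only the two booleans described above are modelled.

```rust
/// Hypothetical stand-in for the request type; only the `raw` flag matters here.
struct Request {
    raw: bool,
}

/// Hypothetical mirror of the two flags handed to the mtmd tokenize call.
#[derive(Debug, PartialEq)]
struct TokenizeFlags {
    /// Prepend the model's special tokens (BOS) to the prompt.
    add_special: bool,
    /// Recognise special tokens written out in the prompt text (turn markers).
    parse_special: bool,
}

/// Behaviour described before the fix: BOS was tied to `!request.raw`.
fn flags_before_fix(request: &Request) -> TokenizeFlags {
    TokenizeFlags {
        add_special: !request.raw,
        parse_special: true,
    }
}

/// Behaviour described after the fix: always prepend BOS, regardless of `raw`.
fn flags_after_fix(_request: &Request) -> TokenizeFlags {
    TokenizeFlags {
        add_special: true,
        parse_special: true,
    }
}

fn main() {
    // The multimodal path always uses raw = true (hand-built turn template).
    let request = Request { raw: true };
    println!("before: {:?}", flags_before_fix(&request));
    println!("after:  {:?}", flags_after_fix(&request));
}
```

With `raw = true`, the old expression yields `add_special = false`, which is exactly the missing-BOS case; the new one yields `true` while leaving `parse_special` unchanged.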

The headline of this beta is the fix for the missing <bos> in the image prompt; that bug reproduced the "wall of text / line art" misreads on CLI, web and 0.6.0 alike. The release also carries the n_ubatch sizing, the image_max_tokens override, WebP→PNG conversion, error surfacing, web audio, and the local CUDA installer CI change. This is the first release on `method: local` for the Windows CUDA toolkit: that step is slow once (it downloads and caches the full offline installer) and fast afterwards.

primoco added 3 commits June 9, 2026 07:03

primoco merged commit dc15e5a into main Jun 9, 2026

primoco merged 3 commits into main from feat/web-chat-multimodal