Skip to content

Feat/web chat multimodal#183

Merged
primoco merged 3 commits into
mainfrom
feat/web-chat-multimodal
Jun 9, 2026
Merged

Feat/web chat multimodal#183
primoco merged 3 commits into
mainfrom
feat/web-chat-multimodal

Conversation

@primoco

@primoco primoco commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

primoco added 3 commits June 9, 2026 07:03
… run

Windows CUDA's "Install CUDA toolkit" step swung 6-17 min because method:
network only caches the tiny network bootstrapper — running it re-fetches
~5 GB of packages from NVIDIA's CDN on every run (logs show
cuda_installer-...-x64_12.8.0.exe -s pulling packages live). The actual
compile is already warm (~5.5 min via sccache/S3); this download was the
real variance.

Switch to method: local (full ~3 GB offline installer). With
use-github-cache (default true) the full installer is cached in GitHub
Actions cache, so subsequent runs restore it locally and install offline —
no NVIDIA CDN round-trip. sccache stays on S3 and is untouched; only the
~3 GB installer shares the 10 GB GitHub cache with the rust-cache.

First run after this change still pays the one-time full-installer download;
runs after that should drop the step to ~2-4 min.
Compared our generate_multimodal against llama.cpp's reference mtmd-cli
(tools/mtmd/mtmd-cli.cpp). The reference sets `text.add_special = add_bos`
(true on the first turn), so it prepends the model's BOS token. We set
`add_special = !request.raw`, and the multimodal path always uses raw=true
(hand-built turn template) — so we were sending the prompt WITHOUT <bos>.

Gemma requires BOS. Without it the prompt is malformed and the model
degrades into "wall of text / line art / collage" confabulation — but only
when the image signal is weak. Strong landscapes survived the missing BOS
(which is why some images "worked"); portraits, people and dense/abstract
images tipped into garbage. This reproduced identically on CLI and web, and
on 0.6.0 — i.e. it was never a beta regression nor the web pipeline; it was
this one missing token in the shared core, present since multimodal landed.

Fix: add_special = true (always prepend BOS) for the mtmd tokenize, matching
the reference. parse_special stays true so the hand-built turn markers are
still recognised.
The headline of this beta: the missing-<bos> fix in the image prompt, which
reproduced the "wall of text / line art" misreads on CLI, web and 0.6.0
alike. Also carries the n_ubatch sizing, image_max_tokens override, WebP→PNG,
error surfacing, web audio, and the local CUDA installer CI change.

First release on `method: local` for the Windows CUDA toolkit — that step is
slow once (downloads + caches the full offline installer), fast afterwards.
@primoco primoco merged commit dc15e5a into main Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant