Feat/web chat multimodal#180

Merged
primoco merged 2 commits into main from feat/web-chat-multimodal on Jun 8, 2026

primoco (Collaborator) commented Jun 8, 2026

No description provided.

primoco added 2 commits June 8, 2026 10:28
…text per request

Three independent correctness fixes for the web multimodal path, all
verified against today's QA logs:

1) WebP / AVIF / HEIC images (ui/app.js)
   mtmd decodes via stb_image (jpg/png/bmp/gif only). WebP reached the
   server as `mtmd_helper_bitmap_init_from_buf: failed to decode image
   bytes`, the stream ended at 0 chunks, and the model then hallucinated.
   The picker now re-encodes any non-stb format to PNG in-browser via a
   canvas, transparently. HEIC (which the browser itself can't decode)
   gets an actionable message instead of silence.

2) Surface backend stream errors (ui/app.js)
   The server already emits `{"error": "..."}` as an NDJSON line, but the
   UI only looked for `message.content`, so decode failures showed as an
   empty "0 chunks" reply that looked like a hallucination. The stream
   loop now renders the error in the assistant bubble and stops.

3) EXPERIMENT — rebuild MtmdContext per request (inference/mod.rs)
   The image is encoded on every turn (logs show 273/260 tokens), yet the
   model is intermittently "blind" and confabulates (repeated text in
   Chinese/Cyrillic). The only mutable state shared across the otherwise
   fresh per-request LlamaContext is the long-lived MtmdContext, so we
   now rebuild it per request to rule that out. If the flakiness
   disappears, the bug was shared mtmd state; if not, it points upstream
   (mtmd_helper_eval_chunks / SWA KV-position handling, llama.cpp#17930)
   and we stop chasing it from our side.
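
The picker behaviour described in fix 1 could look roughly like the sketch below. It is not the code from ui/app.js: `planImageUpload` and `reencodeToPng` are hypothetical names, and the MIME-type lists are assumptions derived from the formats named in the commit message. `createImageBitmap` and `canvas.toBlob` are standard browser APIs, so the second function only runs in a browser.

```javascript
// Formats the commit message says stb_image can decode (assumed MIME names).
const STB_MIME_TYPES = new Set(["image/jpeg", "image/png", "image/bmp", "image/gif"]);

// Formats the commit message says the browser itself cannot decode.
const UNDECODABLE_MIME_TYPES = new Set(["image/heic", "image/heif"]);

// Hypothetical helper: decide what the picker does with a chosen file.
// Returns "passthrough", "reencode" or "unsupported".
function planImageUpload(mimeType) {
  const type = String(mimeType || "").toLowerCase();
  if (STB_MIME_TYPES.has(type)) return "passthrough";
  if (UNDECODABLE_MIME_TYPES.has(type)) return "unsupported";
  // Any other image format (WebP, AVIF, ...) is re-encoded to PNG.
  if (type.startsWith("image/")) return "reencode";
  // Non-image files (for example audio) are left untouched.
  return "passthrough";
}

// Browser-only: decode the file, draw it on a canvas, export it as PNG.
async function reencodeToPng(file) {
  const bitmap = await createImageBitmap(file);
  const canvas = document.createElement("canvas");
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  canvas.getContext("2d").drawImage(bitmap, 0, 0);
  bitmap.close();
  return new Promise((resolve, reject) =>
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("PNG encoding failed"))),
      "image/png"
    )
  );
}
```

A caller would show the actionable message when the plan is "unsupported" and upload the original file when it is "passthrough".
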

Pre-release to validate the 0.6.1 candidates on official CUDA binaries:
* web chat accepts audio files (mtmd auto-detects),
* WebP/AVIF images re-encoded to PNG in-browser (HEIC gets an actionable message),
* backend decode errors surfaced in the chat instead of a silent 0-chunk reply,
* EXPERIMENT: MtmdContext rebuilt per request to test the non-deterministic
  "image not seen / hallucinated" behaviour.

Marked beta so 0.6.0 stays the latest stable; the experiment's fate (keep
or revert) is decided after runtime testing on this build.
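
The error-surfacing behaviour (backend decode errors shown in the chat instead of a silent 0-chunk reply) can be sketched as a small NDJSON reducer. This is an illustration under assumptions, not the actual stream loop in ui/app.js: `applyStreamLine` and `readReply` are hypothetical names, and the only line shapes assumed are the two the commit message mentions, `{"error": "..."}` and an object carrying `message.content`.

```javascript
// Hypothetical helper: fold one NDJSON line into the reply state.
// state = { text: string, error: string | null, done: boolean }
function applyStreamLine(state, line) {
  const trimmed = line.trim();
  if (trimmed === "" || state.done) return state;

  let event;
  try {
    event = JSON.parse(trimmed);
  } catch {
    // A line that is not valid JSON ends the reply with a visible error.
    return { ...state, error: "Malformed stream line", done: true };
  }
  if (event === null || typeof event !== "object") return state;

  // Backend error: keep the text received so far, record the error, stop.
  if (typeof event.error === "string") {
    return { ...state, error: event.error, done: true };
  }

  // Normal chunk: append the assistant text.
  const content = event.message && event.message.content;
  if (typeof content === "string") {
    return { ...state, text: state.text + content };
  }
  return state;
}

// Hypothetical driver: process a whole NDJSON payload line by line.
function readReply(ndjson) {
  let state = { text: "", error: null, done: false };
  for (const line of ndjson.split("\n")) {
    state = applyStreamLine(state, line);
    if (state.done) break;
  }
  return state;
}
```

A UI built on this would render `state.error` in the assistant bubble whenever it is set, so a decode failure no longer looks like an empty reply.
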
@primoco primoco merged commit 677df9d into main Jun 8, 2026