Feat/web chat multimodal #180
Merged
…text per request
Three independent correctness fixes for the web multimodal path, all
verified against today's QA logs:
1) WebP / AVIF / HEIC images (ui/app.js)
mtmd decodes images via stb_image (jpg/png/bmp/gif only). A WebP upload
failed on the server with `mtmd_helper_bitmap_init_from_buf: failed to
decode image bytes`, the stream ended at 0 chunks, and the model then
hallucinated. The picker now transparently re-encodes any format that
stb_image cannot decode to PNG in-browser via a canvas. HEIC (which the
browser itself can't decode) gets an actionable message instead of
silence.
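
A minimal sketch of the picker-side re-encode described above, not the
actual ui/app.js code. The names `classifyImage` and `prepareImage`, the
MIME-type spellings, and the error wording are assumptions made for
illustration; only the behaviour (pass stb formats through, reject HEIC
with a message, re-encode everything else to PNG via a canvas) comes
from the description.

```javascript
// Formats stb_image can decode, per the commit message (MIME spellings assumed).
const STB_MIME_TYPES = new Set(["image/jpeg", "image/png", "image/bmp", "image/gif"]);
// Formats the commit message says the browser itself cannot decode.
const BROWSER_UNDECODABLE = new Set(["image/heic", "image/heif"]);

// Pure decision step: "pass" (send as-is), "reject" (tell the user), or "reencode".
function classifyImage(mimeType) {
  const type = (mimeType || "").toLowerCase();
  if (STB_MIME_TYPES.has(type)) return "pass";
  if (BROWSER_UNDECODABLE.has(type)) return "reject";
  return "reencode";
}

// Browser-only step (hypothetical helper): returns a File the server can decode.
async function prepareImage(file) {
  const action = classifyImage(file.type);
  if (action === "pass") return file;
  if (action === "reject") {
    throw new Error("HEIC images are not supported here; please convert to JPEG or PNG.");
  }
  let bitmap;
  try {
    bitmap = await createImageBitmap(file); // decode with the browser's own codecs
  } catch (err) {
    throw new Error("This image could not be decoded; please convert to JPEG or PNG.");
  }
  const canvas = document.createElement("canvas");
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  canvas.getContext("2d").drawImage(bitmap, 0, 0);
  bitmap.close();
  // toBlob passes null to the callback when encoding fails.
  const blob = await new Promise((resolve, reject) =>
    canvas.toBlob((b) => (b ? resolve(b) : reject(new Error("PNG encoding failed"))), "image/png")
  );
  const pngName = file.name.replace(/\.[^.]+$/, "") + ".png";
  return new File([blob], pngName, { type: "image/png" });
}
```

Keeping the decision in a pure function makes the format table easy to
check without a DOM; the canvas path only runs for formats that need it.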
2) Surface backend stream errors (ui/app.js)
The server already emits `{"error": "..."}` as an NDJSON line, but the
UI only looked for `message.content`, so decode failures showed as an
empty "0 chunks" reply that looked like a hallucination. The stream
loop now renders the error in the assistant bubble and stops.
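
The stream handling above could look roughly like the following sketch.
It is not the actual ui/app.js loop: `parseStreamLine`, `consumeStream`,
and the two callbacks are hypothetical names, and treating a malformed
line as an error is an assumption. The `{"error": "..."}` and
`message.content` shapes are the ones named in the description.

```javascript
// Classify one NDJSON line as content, error, or something to skip.
function parseStreamLine(line) {
  const trimmed = line.trim();
  if (!trimmed) return { kind: "skip" };
  let obj;
  try {
    obj = JSON.parse(trimmed);
  } catch (err) {
    // Assumption: surface malformed lines instead of dropping them silently.
    return { kind: "error", text: "Malformed response line from server" };
  }
  if (obj && typeof obj.error === "string") return { kind: "error", text: obj.error };
  const content = obj && obj.message && obj.message.content;
  if (typeof content === "string") return { kind: "content", text: content };
  return { kind: "skip" };
}

// Read from a ReadableStream reader (e.g. response.body.getReader()),
// forwarding content and stopping at the first error line.
// Resolves to true on a clean end of stream, false if an error was shown.
async function consumeStream(reader, onContent, onError) {
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    // Keep the trailing partial line for the next chunk unless the stream ended.
    buffer = done ? "" : lines.pop();
    for (const line of lines) {
      const event = parseStreamLine(line);
      if (event.kind === "error") {
        onError(event.text); // render in the assistant bubble
        await reader.cancel(); // stop reading further chunks
        return false;
      }
      if (event.kind === "content") onContent(event.text);
    }
    if (done) return true;
  }
}
```

Buffering on newlines matters because a network chunk can end in the
middle of a JSON line; parsing per chunk would misreport such splits as
malformed data.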
3) EXPERIMENT — rebuild MtmdContext per request (inference/mod.rs)
The image is encoded on every turn (logs show 273/260 tokens), yet the
model is intermittently "blind" and confabulates (repeated text in
Chinese/Cyrillic). The only mutable state shared across the otherwise
fresh per-request LlamaContext is the long-lived MtmdContext, so we
now rebuild it per request to rule that out. If the flakiness
disappears the bug was shared mtmd state; if not, it points upstream
(mtmd_helper_eval_chunks / SWA KV-position handling, llama.cpp#17930)
and we stop chasing it from our side.
Pre-release to validate the 0.6.1 candidates on official CUDA binaries:
* web chat accepts audio files (mtmd auto-detects),
* WebP/AVIF/HEIC images re-encoded to PNG in-browser,
* backend decode errors surfaced in the chat instead of a silent
  0-chunk reply,
* EXPERIMENT: MtmdContext rebuilt per request to test the
  non-deterministic "image not seen / hallucinated" behaviour.
Marked beta so 0.6.0 stays the latest stable; the experiment's fate
(keep or revert) is decided after runtime testing on this build.