Feat/web chat multimodal by primoco · Pull Request #181 · eullm/eullm

primoco · 2026-06-08T14:17:22Z

No description provided.

Tested on the 0.6.1-beta.1 build: rebuilding the projector per request did NOT change the failure. Dark / low-contrast / portrait images still degrade identically ("collage / vertical repetition", person not seen), while clear landscape photos still work. The failure is therefore deterministic by image type and lives upstream in the Gemma 4 projector/encoder (cf. llama.cpp multimodal issues), not in shared mtmd state on our side. Reverting restores the load-once projector and removes the ~150-300 ms per-request re-init cost that bought us nothing. The WebP→PNG conversion and the stream-error surfacing from the same beta stay — those are real fixes.
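To make the load-once versus per-request trade-off concrete, here is a minimal sketch. It does not use the real mtmd binding: `Projector`, `init_projector`, `shared_projector` and `per_request_projector` are hypothetical stand-ins, and the counter only models the re-init cost that the revert removes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an expensive projector handle
/// (NOT the real MtmdContext from the vendored binding).
struct Projector {
    id: usize,
}

/// Counts how many times the (pretend) expensive init ran.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Process-wide slot for the load-once projector.
static PROJECTOR: OnceLock<Projector> = OnceLock::new();

/// Pretend-expensive initialisation; each call bumps the counter.
fn init_projector() -> Projector {
    let n = INIT_COUNT.fetch_add(1, Ordering::SeqCst) + 1;
    Projector { id: n }
}

/// Load-once: the first call initialises, later calls reuse the same handle.
fn shared_projector() -> &'static Projector {
    PROJECTOR.get_or_init(init_projector)
}

/// Per-request: pays the init cost on every call.
fn per_request_projector() -> Projector {
    init_projector()
}

/// Number of initialisations performed so far.
fn init_count() -> usize {
    INIT_COUNT.load(Ordering::SeqCst)
}
```

In this sketch every `per_request_projector()` call increments the counter, while any number of `shared_projector()` calls increments it at most once.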

Root-caused the "hard images misread" behaviour: Gemma 4 (GEMMA4UV) is a dynamic-resolution vision model that caps an image at max 280 tokens (clip.cpp: set_limit_image_tokens(40, 280)). That low cap downscales images hard; on dark / low-contrast / small-subject photos the encoder loses the subject and the model confabulates ("collage / repeated text"). The author's own comment notes the model "performs quite poor with small images".

The mtmd C API already exposes image_min_tokens / image_max_tokens and passes them to clip, but our vendored binding didn't surface them. This:

* adds image_min_tokens / image_max_tokens to MtmdContextParams (+ both From impls), defaulting to -1 (= keep model metadata default),
* wires EULLM_IMAGE_MAX_TOKENS / EULLM_IMAGE_MIN_TOKENS env overrides in init_mtmd_optional so the budget can be raised without recompiling.

Lets us empirically test whether more resolution fixes the hard images.

Caveat: very high values can OOM/crash some projectors (cf. llama.cpp#21550); raise in moderate steps (512, then 1024).
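The env-override behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `parse_token_override`, `ImageTokenBudget`, `budget_from_raw` and `budget_from_env` are hypothetical names. Falling back to -1 for unparsable or non-positive values is also an assumed policy; the source only states that -1 means "keep model metadata default".

```rust
/// Sentinel meaning "keep the model metadata default" (per the commit message).
const KEEP_MODEL_DEFAULT: i32 = -1;

/// Hypothetical helper: turn an optional raw env value into a token budget.
/// Assumption: anything missing, unparsable, or <= 0 falls back to the sentinel.
fn parse_token_override(raw: Option<&str>) -> i32 {
    match raw.map(str::trim).and_then(|s| s.parse::<i32>().ok()) {
        Some(n) if n > 0 => n,
        _ => KEEP_MODEL_DEFAULT,
    }
}

/// Hypothetical pair of limits, mirroring the two fields named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ImageTokenBudget {
    image_min_tokens: i32,
    image_max_tokens: i32,
}

/// Build the budget from raw strings (kept separate so it is easy to test).
fn budget_from_raw(min: Option<&str>, max: Option<&str>) -> ImageTokenBudget {
    ImageTokenBudget {
        image_min_tokens: parse_token_override(min),
        image_max_tokens: parse_token_override(max),
    }
}

/// Read the two env vars named in the PR and build the budget from them.
fn budget_from_env() -> ImageTokenBudget {
    let min = std::env::var("EULLM_IMAGE_MIN_TOKENS").ok();
    let max = std::env::var("EULLM_IMAGE_MAX_TOKENS").ok();
    budget_from_raw(min.as_deref(), max.as_deref())
}
```

With neither variable set, both fields stay at -1 and the model's own limits apply. Setting only EULLM_IMAGE_MAX_TOKENS=512 raises the cap while leaving the minimum at its default, which matches the "raise in moderate steps" advice.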

Second 0.6.1 pre-release. Over beta.1:

* expose EULLM_IMAGE_MAX_TOKENS / EULLM_IMAGE_MIN_TOKENS to raise Gemma 4's 280-token image cap (the suspected cause of hard-image misreads),
* revert the per-request MtmdContext experiment (no effect on the failure).

WebP→PNG conversion, stream-error surfacing and web audio carried over.

primoco added 3 commits June 8, 2026 14:10

primoco merged commit 7574dcc into main Jun 8, 2026

primoco merged 3 commits into main from feat/web-chat-multimodal
