chore(engine): 0.6.1 — multimodal vision fix (BOS) promoted to stable #184

Merged: primoco merged 1 commit into main from feat/web-chat-multimodal on Jun 9, 2026
@primoco primoco commented Jun 9, 2026

Vision is now correct on all images (portraits, vertical/dark shots,
not just easy landscapes). Root cause was the missing <bos> token in the
shared mtmd tokenize path (fixed in 0698e2c): Gemma requires BOS, and
generate_multimodal sent the hand-built prompt without it, degrading the
model into 'wall of text / line art' confabulation on harder subjects.
Confirmed by an A/B of released binaries — beta.3 (no BOS) hallucinates,
beta.4 (BOS) sees — on both the web/API and CLI one-shot paths.
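
The failure mode described above (a hand-built prompt reaching the model without its leading BOS) can be illustrated with a minimal sketch. This is not the code from 0698e2c: `ensure_bos`, `with_bos_marker`, and the ids used in the usage note are hypothetical names and placeholder values chosen for illustration, and the real fix lives in the shared mtmd tokenize path.

```rust
/// Hypothetical helper (not the project's actual fix): guarantee that a
/// token sequence starts with the BOS id, without duplicating it when the
/// tokenizer has already added one.
fn ensure_bos(mut tokens: Vec<i32>, bos_id: i32) -> Vec<i32> {
    if tokens.first() != Some(&bos_id) {
        // BOS is missing: prepend it so the model sees a well-formed start.
        tokens.insert(0, bos_id);
    }
    tokens
}

/// Hypothetical text-level variant: prepend the BOS marker string to a
/// hand-built prompt unless the prompt already begins with it.
fn with_bos_marker(prompt: &str, bos: &str) -> String {
    if prompt.starts_with(bos) {
        prompt.to_string()
    } else {
        format!("{bos}{prompt}")
    }
}
```

Assuming a placeholder BOS id of 2, `ensure_bos(vec![10, 11], 2)` yields `[2, 10, 11]`, while a sequence that already starts with 2 is returned unchanged. Making the helper idempotent matters because a doubled BOS can also degrade output, so both the web/API and CLI paths can call it safely.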

Also refresh two stale docs: the CLI --image help (the HTTP API does
route media now) and the E4B catalog blurb (the 12B gemma4uv projector
is served by the vendored llama-cpp-rs, no upstream bump pending).
@primoco primoco merged commit 738aa87 into main Jun 9, 2026