Feat/web chat multimodal#177
Merged
Multimodal is no longer CLI-only: the HTTP /api/chat endpoint now accepts the
Ollama convention `{"role":"user", "content":"...", "images":[<base64>...]}`
and routes the request through `engine.generate_multimodal()`, the same
mtmd path that `--image` already used on the CLI.
Load side (api/mod.rs swap_model + main.rs cmd_run):
* resolve `store.mmproj_path(model)` and pass it to InferenceConfig
instead of the hard-coded `None`,
* when an mmproj is present, force batch_size=0 (sequential engine).
The continuous-batching scheduler is text-only — it does not route
mtmd chunks — so multimodal models MUST load through InferenceEngine.
Vision use is interactive and single-user anyway, so losing batching here
is not a practical regression. Both load sites carry a comment saying so.
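A minimal sketch of the load-side rule, not the actual code in `swap_model` or `cmd_run`. The `InferenceConfig` struct here is a hypothetical stand-in with only the two relevant fields, and `build_config` is an illustrative helper name; the only behaviour taken from the description is that a present mmproj forces `batch_size = 0`.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the real `InferenceConfig`; only the two
/// fields discussed above are modelled.
#[derive(Debug, PartialEq)]
pub struct InferenceConfig {
    pub mmproj_path: Option<PathBuf>,
    /// 0 selects the sequential engine; anything else enables batching.
    pub batch_size: usize,
}

/// Carry the resolved mmproj through to the config and force sequential
/// mode whenever one is present, because the batching scheduler is text-only.
pub fn build_config(mmproj: Option<PathBuf>, requested_batch_size: usize) -> InferenceConfig {
    let batch_size = if mmproj.is_some() { 0 } else { requested_batch_size };
    InferenceConfig { mmproj_path: mmproj, batch_size }
}
```

Under these assumptions, a call such as `build_config(store_result, 8)` keeps the requested batch size for text-only models and silently drops to the sequential engine for vision models.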
Dispatch side (api/routes.rs `chat` handler, feature-gated on `multimodal`):
* `extract_multimodal_payload` pulls images from the last user message,
accepting both raw base64 and `data:...;base64,...` prefixes,
* `gemma_multimodal_prompt` wraps the user text in the Gemma chat
template with `mtmd_default_marker()` placed inside the user turn
(matches `run_multimodal_oneshot`),
* `multimodal_to_channel` mirrors `sequential_to_channel` but calls
`engine.generate_multimodal()` instead of `generate_streaming`,
* the new branch runs BEFORE the text-path prompt builder; on a
text-only engine it returns a 503 with an explicit message instead
of silently dropping the images.
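The dispatch steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the handler in api/routes.rs: `ChatMessage`, `Dispatch`, `strip_data_url_prefix`, `extract_payload`, `gemma_prompt` and `dispatch` are hypothetical names, the marker is passed in as a string rather than obtained from `mtmd_default_marker()`, its position before the user text is a guess, and the 503 message text is made up.

```rust
/// Hypothetical, simplified chat message.
pub struct ChatMessage {
    pub role: String,
    pub content: String,
    pub images: Vec<String>,
}

/// Accept raw base64 or `data:<mime>;base64,<payload>` and return the payload.
pub fn strip_data_url_prefix(image: &str) -> &str {
    const TAG: &str = ";base64,";
    if image.starts_with("data:") {
        if let Some(idx) = image.find(TAG) {
            return &image[idx + TAG.len()..];
        }
    }
    image
}

/// Text and images of the last user message, or `None` if it has no images.
pub fn extract_payload(messages: &[ChatMessage]) -> Option<(String, Vec<String>)> {
    let last_user = messages.iter().rev().find(|m| m.role == "user")?;
    if last_user.images.is_empty() {
        return None;
    }
    let images = last_user
        .images
        .iter()
        .map(|img| strip_data_url_prefix(img).to_string())
        .collect();
    Some((last_user.content.clone(), images))
}

/// Gemma single-turn prompt with the media marker inside the user turn.
/// Assumption: the marker goes on its own line before the user text.
pub fn gemma_prompt(marker: &str, user_text: &str) -> String {
    format!(
        "<start_of_turn>user\n{}\n{}<end_of_turn>\n<start_of_turn>model\n",
        marker, user_text
    )
}

#[derive(Debug, PartialEq)]
pub enum Dispatch {
    Multimodal { prompt: String, images: Vec<String> },
    Unavailable { status: u16, message: &'static str },
    TextPath,
}

/// Decide the branch before any text-path prompt is built.
pub fn dispatch(messages: &[ChatMessage], engine_has_mmproj: bool, marker: &str) -> Dispatch {
    match extract_payload(messages) {
        None => Dispatch::TextPath,
        Some(_) if !engine_has_mmproj => Dispatch::Unavailable {
            status: 503,
            message: "images were sent but the loaded model has no mmproj",
        },
        Some((text, images)) => Dispatch::Multimodal {
            prompt: gemma_prompt(marker, &text),
            images,
        },
    }
}
```

Running the image check first is what makes the 503 possible: a request carrying images never reaches the text path, so the images cannot be dropped unnoticed.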
Scope (deliberate, MVP):
* `/api/chat` only — `/api/generate` and `/v1/chat/completions`
(OpenAI `image_url`) stay text-only for now,
* Gemma chat template only — switch on `template.family()` when more
vision-capable families land,
* the multimodal turn ignores prior chat history (one-shot probe).
Round-trips the multimodal pipeline from the embedded chat:
* 📎 button beside the textarea opens a file picker (`accept="image/*"`),
* the selected image is read as a data URL, the `data:image/...;base64,`
prefix is stripped so only the bare base64 payload goes on the wire,
and a thumbnail preview bar appears above the textarea until the
user sends or removes it with ×,
* image-only turns are allowed (the model falls back to its default
"describe this image" behaviour); fully empty submits are not,
* the user bubble renders the same thumbnail above the prompt so the
conversation transcript stays self-contained.
Dispatch logic — `send()` picks the endpoint based on `pendingImage`:
* with image → POST /api/chat (Ollama NDJSON), body shape
`{ messages: [{ role:"user", content, images:[<base64>] }], stream:true }`.
History is intentionally NOT replayed: the multimodal MVP is a one-shot
probe (matching the backend), and re-sending old base64 bytes every
turn would balloon the prompt context for nothing.
* without image → POST /v1/chat/completions (SSE), unchanged.
* the streaming loop handles both formats in the same `while`/`read` loop:
NDJSON lines parse directly as JSON and yield `message.content`; SSE lines
are still detected by their `data:` prefix and yield `choices[0].delta.content`.
Attachment is cleared on send and on "clear conversation" so a stale
image can never leak into a later turn.
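The client-side dispatch described above lives in the embedded chat's JavaScript. The sketch below restates the same two decisions in Rust, matching the backend language, purely for illustration. `endpoint_for`, `StreamLine` and `classify_line` are hypothetical names, and treating `data: [DONE]` as the SSE terminator is an assumption based on the OpenAI streaming convention, not something the description states.

```rust
/// Endpoint for one turn: image turns use the Ollama route,
/// text-only turns stay on the OpenAI-compatible route.
pub fn endpoint_for(has_pending_image: bool) -> &'static str {
    if has_pending_image {
        "/api/chat"
    } else {
        "/v1/chat/completions"
    }
}

#[derive(Debug, PartialEq)]
pub enum StreamLine<'a> {
    /// Ollama NDJSON: the whole line is a JSON object (`message.content`).
    Ndjson(&'a str),
    /// SSE: JSON follows the `data:` prefix (`choices[0].delta.content`).
    Sse(&'a str),
    /// Assumed SSE terminator, `data: [DONE]`.
    Done,
    /// Blank lines and anything unrecognised.
    Skip,
}

/// Classify one line read from the response stream so that a single
/// read loop can serve both wire formats.
pub fn classify_line(line: &str) -> StreamLine<'_> {
    let line = line.trim();
    if line.is_empty() {
        return StreamLine::Skip;
    }
    if let Some(rest) = line.strip_prefix("data:") {
        let rest = rest.trim_start();
        return if rest == "[DONE]" {
            StreamLine::Done
        } else {
            StreamLine::Sse(rest)
        };
    }
    if line.starts_with('{') {
        StreamLine::Ndjson(line)
    } else {
        StreamLine::Skip
    }
}
```

The returned slice would then be handed to a JSON parser, and the caller picks the field path according to the variant.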
Releasing the web-chat multimodal MVP:
* /api/chat routes the Ollama `images:[<base64>]` field through
engine.generate_multimodal() when an mmproj is loaded,
* loading a multimodal model now resolves the mmproj sibling from
the store and forces sequential mode (the batching scheduler is
text-only),
* the embedded chat ships an attach-image button + thumbnail preview
and dispatches multimodal turns over /api/chat (NDJSON) while
text-only turns keep using /v1/chat/completions (SSE).
No description provided.