Feat/web chat multimodal #177

Merged: primoco merged 3 commits into main from feat/web-chat-multimodal on Jun 6, 2026
@primoco primoco commented Jun 6, 2026

Collaborator

No description provided.

primoco added 3 commits June 6, 2026 20:13
End of "multimodal CLI-only": the HTTP /api/chat endpoint now accepts the
Ollama convention `{"role":"user", "content":"...", "images":[<base64>...]}`
and routes the request through `engine.generate_multimodal()`, the same
mtmd path that --image already used on the CLI.

Load side (api/mod.rs swap_model + main.rs cmd_run):
  * resolve `store.mmproj_path(model)` and pass it to InferenceConfig
    instead of the hard-coded `None`,
  * when an mmproj is present, force batch_size=0 (sequential engine).
    The continuous-batching scheduler is text-only — it does not route
    mtmd chunks — so multimodal models MUST load through InferenceEngine.
    Vision is interactive and single-user anyway, so losing batching here is
    not a practical regression. Both call sites document this.
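A minimal sketch of the load-side rule described above. The `InferenceConfig` shown here is a hypothetical stand-in (its field names are assumptions, not the project's actual struct), and `mmproj` stands for whatever `store.mmproj_path(model)` returned, assumed here to be an `Option<PathBuf>`.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the real `InferenceConfig`; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct InferenceConfig {
    pub model_path: PathBuf,
    /// Multimodal projector that sits next to the model in the store, if any.
    pub mmproj_path: Option<PathBuf>,
    /// 0 selects the sequential engine; any other value enables batching.
    pub batch_size: usize,
}

/// Builds the load config from the resolved projector path and the
/// batch size the caller asked for.
pub fn build_config(
    model_path: PathBuf,
    mmproj: Option<PathBuf>,
    requested_batch: usize,
) -> InferenceConfig {
    // The batching scheduler is text-only, so the presence of a projector
    // forces sequential mode regardless of what was requested.
    let batch_size = if mmproj.is_some() { 0 } else { requested_batch };
    InferenceConfig {
        model_path,
        mmproj_path: mmproj,
        batch_size,
    }
}
```

Keeping the override in one helper would let `swap_model` and `cmd_run` share the rule instead of repeating it, though the PR itself applies it at both sites.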

Dispatch side (api/routes.rs `chat` handler, feature-gated on `multimodal`):
  * `extract_multimodal_payload` pulls images from the last user message,
    accepting both raw base64 and `data:...;base64,...` prefixes,
  * `gemma_multimodal_prompt` wraps the user text in the Gemma chat
    template with `mtmd_default_marker()` placed inside the user turn
    (matches `run_multimodal_oneshot`),
  * `multimodal_to_channel` mirrors `sequential_to_channel` but calls
    `engine.generate_multimodal()` instead of `generate_streaming`,
  * the new branch runs BEFORE the text-path prompt builder; on a
    text-only engine it returns a 503 with an explicit message instead
    of silently dropping the images.
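The dispatch steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `api/routes.rs`: `strip_data_url_prefix` and `pick_route` are hypothetical helper names, the marker is hard-coded to the string llama.cpp's `mtmd_default_marker()` is understood to return, and the exact position of the marker inside the user turn is a guess.

```rust
/// Assumed value of `mtmd_default_marker()`; hard-coded here so the sketch
/// needs no FFI. The real handler calls the function instead.
const MEDIA_MARKER: &str = "<__media__>";

/// Accepts raw base64 or a `data:...;base64,...` URL and returns only the
/// base64 payload.
pub fn strip_data_url_prefix(image: &str) -> &str {
    let trimmed = image.trim();
    if trimmed.starts_with("data:") {
        if let Some(idx) = trimmed.find(";base64,") {
            return &trimmed[idx + ";base64,".len()..];
        }
    }
    trimmed
}

/// Wraps the user text in the Gemma chat template with the media marker
/// inside the user turn (marker placement before the text is an assumption).
pub fn gemma_multimodal_prompt(user_text: &str) -> String {
    format!(
        "<start_of_turn>user\n{}\n{}<end_of_turn>\n<start_of_turn>model\n",
        MEDIA_MARKER, user_text
    )
}

/// Outcome of the check that runs before the text-path prompt builder.
#[derive(Debug, PartialEq)]
pub enum Route {
    Text,
    Multimodal,
    /// Images were sent to a text-only engine: answer 503 with a message.
    Rejected503,
}

pub fn pick_route(has_images: bool, engine_is_multimodal: bool) -> Route {
    match (has_images, engine_is_multimodal) {
        (false, _) => Route::Text,
        (true, true) => Route::Multimodal,
        (true, false) => Route::Rejected503,
    }
}
```

The explicit `Rejected503` arm mirrors the design choice in the last bullet: a request carrying images never falls through to the text path.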

Scope (deliberate, MVP):
  * `/api/chat` only — `/api/generate` and `/v1/chat/completions`
    (OpenAI `image_url`) stay text-only for now,
  * Gemma chat template only — switch on `template.family()` when more
    vision-capable families land,
  * the multimodal turn ignores prior chat history (one-shot probe).

Round-trips the multimodal pipeline from the embedded chat:
* 📎 button beside the textarea opens a file picker (`accept="image/*"`),
* the selected image is read as a data URL; the `data:image/...;base64,`
  prefix is stripped so that only the base64 payload goes on the wire,
  and a thumbnail preview bar appears above the textarea until the
  user sends the message or removes the image with ×,
* image-only turns are allowed (the model falls back to its default
  "describe this image" behaviour); fully empty submits are not,
* the user bubble renders the same thumbnail above the prompt so the
  conversation transcript stays self-contained.

Dispatch logic — `send()` picks the endpoint based on `pendingImage`:
* with image  → POST /api/chat (Ollama NDJSON), body shape
  `{ messages: [{ role:"user", content, images:[<base64>] }], stream:true }`.
  History is intentionally NOT replayed: the multimodal MVP is a one-shot
  probe (matching the backend), and re-sending old base64 bytes every
  turn would balloon the prompt context for nothing.
* without image → POST /v1/chat/completions (SSE), unchanged.
* the streaming loop handles both formats from the same `while/read`:
  NDJSON lines parse straight as JSON with `message.content`; SSE lines
  keep their `data:` prefix detection and `choices[0].delta.content`.
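The per-line branching in the streaming loop can be illustrated as below. The embedded chat is JavaScript; this sketch is written in Rust only to stay consistent with the backend examples, and `classify_line` and `StreamLine` are hypothetical names. It also assumes the SSE stream ends with OpenAI's `data: [DONE]` sentinel, which the commit message does not state.

```rust
/// What a single line of the response stream yields.
#[derive(Debug, PartialEq)]
pub enum StreamLine<'a> {
    /// Ollama NDJSON: parse as JSON and read `message.content`.
    Ndjson(&'a str),
    /// SSE payload: parse as JSON and read `choices[0].delta.content`.
    Sse(&'a str),
    /// SSE terminator (assumed OpenAI-style `[DONE]`).
    Done,
    /// Blank line or anything unrecognised; nothing to render.
    Skip,
}

/// Decides how one line should be handled, without parsing the JSON itself.
pub fn classify_line(line: &str) -> StreamLine<'_> {
    let line = line.trim();
    if line.is_empty() {
        return StreamLine::Skip;
    }
    if let Some(rest) = line.strip_prefix("data:") {
        // SSE path: drop the field name and optional space.
        let payload = rest.trim_start();
        if payload == "[DONE]" {
            StreamLine::Done
        } else {
            StreamLine::Sse(payload)
        }
    } else if line.starts_with('{') {
        // NDJSON path: the whole line is one JSON object.
        StreamLine::Ndjson(line)
    } else {
        StreamLine::Skip
    }
}
```

Because the `data:` prefix is checked first, one loop can serve both endpoints without knowing in advance which one was called.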

Attachment is cleared on send and on "clear conversation" so a stale
image can never leak into a later turn.

Releasing the web-chat multimodal MVP:
  * /api/chat routes the Ollama `images:[<base64>]` field through
    engine.generate_multimodal() when an mmproj is loaded,
  * loading a multimodal model now resolves the mmproj sibling from
    the store and forces sequential mode (the batching scheduler is
    text-only),
  * the embedded chat ships an attach-image button + thumbnail preview
    and dispatches multimodal turns over /api/chat (NDJSON) while
    text-only turns keep using /v1/chat/completions (SSE).
@primoco primoco merged commit ba62695 into main Jun 6, 2026