issue: Vision capability check blocks follow-up messages after Image Generation

### Check Existing Issues

- [x] I have searched for any existing and/or related issues.
- [x] I have searched for any existing and/or related discussions.
- [x] I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
- [x] I am using the latest version of Open WebUI.

### Installation Method

Docker

### Open WebUI Version

v0.6.43

### Ollama Version (if applicable)

N/A (I very recently swapped over to use llama.cpp server, but this should not matter)

### Operating System

Ubuntu 24.04.3 LTS

### Browser (if applicable)

Mozilla Firefox Snap for Ubuntu v146.0 (64-bit)

### Confirmation

- [x] I have read and followed all instructions in `README.md`.
- [x] I am using the latest version of **both** Open WebUI and Ollama.
- [x] I have included the browser console logs.
- [x] I have included the Docker container logs.
- [x] I have **provided every relevant configuration, setting, and environment variable used in my setup.**
- [x] I have clearly **listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup** (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
- [x] I have documented **step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation**. My steps:
- Start with the initial platform/version/OS and dependencies used,
- Specify exact install/launch/configure commands,
- List URLs visited, user input (incl. example values/emails/passwords if needed),
- Describe all options and toggles enabled or changed,
- Include any files or environmental changes,
- Identify the expected and actual result at each stage,
- Ensure any reasonably skilled user can follow and hit the same issue.


### Expected Behavior

The model should handle follow-up messages gracefully even if an image was generated in a previous turn. Since "Vision" is disabled for the model, the system should either:

1. Automatically exclude the previously generated image from the context sent to the model for the follow-up message.
2. Allow the text-only follow-up to proceed without erroring on the presence of the image in history (assuming the backend or model can handle "invisible" image history or just text context).

The user expects to continue the conversation after image generation without being blocked by a "not vision capable" error.
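A minimal sketch of option 1, to make the idea concrete. This is not Open WebUI's actual code: `buildContextForModel` is a hypothetical helper, and the message shape (`files` entries with `type: 'image'`) is an assumption based on how the check quoted under Logs & Screenshots appears to work.

```js
// Hypothetical helper, not part of Open WebUI.
// Assumed message shape: { role, content, files?: [{ type, url }] }.

// Mirrors the default used in the existing check: a model with no explicit
// capability setting is treated as vision capable.
function isVisionCapable(model) {
  return model?.info?.meta?.capabilities?.vision ?? true;
}

// Returns the message list to send to the model. For non-vision models,
// image files are dropped from every message; text content is untouched.
// The input array and its messages are not mutated.
function buildContextForModel(messages, model) {
  if (isVisionCapable(model)) return messages;

  return messages.map((message) => {
    const hasImage = message.files?.some((file) => file.type === 'image');
    if (!hasImage) return message;
    return {
      ...message,
      files: message.files.filter((file) => file.type !== 'image')
    };
  });
}
```

With something like this in place, the generated image would stay visible in the chat UI but would never reach a model whose Vision capability is off.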

### Actual Behavior

When attempting to send a follow-up message after an image has been generated in the chat, the application displays an error toast: "Model [Model Name] is not vision capable". The generated image is still part of the conversation history, so the validation finds an image in the context while the model's vision capability is explicitly set to false, and raises the error.

### Steps to Reproduce

1. Edit a model via its edit page (Admin Settings -> Models).
2. Toggle `OFF` the `Vision` capability for the model and save changes.
3. Start a new chat and select this model.
4. In the chat input integrations menu, toggle `ON` Image Generation.
5. Send a query to trigger image generation (e.g., "Generate an image of a cat").
6. Wait for the model to return the generated image.
7. Turn `OFF` Image Generation (or leave it on; the issue occurs either way as long as the previous image is in history).
8. Send a text-only follow-up message (e.g., "Thanks, now tell me a joke").
9. **Observe:** A notification toast appears at the top right: "Model [Name] is not vision capable", and the message is still sent to the model. The model still returns a response, either textual or with an image attached.

### Logs & Screenshots

Error triggered in `src/lib/components/chat/Chat.svelte`:

```js
// src/lib/components/chat/Chat.svelte around line 1740
if (hasImages && !(model.info?.meta?.capabilities?.vision ?? true)) {
    toast.error(
        $i18n.t('Model {{modelName}} is not vision capable', {
            modelName: model.name ?? model.id
        })
    );
}
```

The `hasImages` check scans the entire message history (via `createMessagesList`) and finds the generated image from the previous turn, causing the validation failure.
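One possible way to narrow the check, sketched under assumptions: only count images that the user attached, and ignore images on assistant messages (which is where a generated image would presumably live). `hasUserSuppliedImages` and `shouldWarnNotVisionCapable` are hypothetical names, and the `role` / `files` message shape is assumed rather than taken from the codebase.

```js
// Hypothetical helpers, not part of Open WebUI.
// Assumed message shape: { role, content, files?: [{ type, url }] }.

// True only if a user-authored message carries an image file.
// Images on assistant messages (e.g. generated images) are ignored.
function hasUserSuppliedImages(messages) {
  return messages.some(
    (message) =>
      message.role === 'user' &&
      (message.files?.some((file) => file.type === 'image') ?? false)
  );
}

// Same capability default as the existing check (missing setting => true).
function shouldWarnNotVisionCapable(messages, model) {
  const visionCapable = model?.info?.meta?.capabilities?.vision ?? true;
  return !visionCapable && hasUserSuppliedImages(messages);
}
```

Under this variant, the reproduction steps above would no longer show the toast, while uploading an image to a non-vision model would still be flagged.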

### Additional Information

This error was initially reported by @Classic298, but it was requested that I open this issue.

issue: Vision capability check blocks follow-up messages after Image Generation #20129
