issue: Vision capability check blocks follow-up messages after Image Generation #20129

@silentoplayz

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.43

Ollama Version (if applicable)

N/A (I very recently swapped over to use llama.cpp server, but this should not matter)

Operating System

Ubuntu 24.04.3 LTS

Browser (if applicable)

Mozilla Firefox Snap for Ubuntu v146.0 (64-bit)

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The model should handle follow-up messages gracefully even if an image was generated in a previous turn. Since "Vision" is disabled for the model, the system should either:

  1. Automatically exclude the previously generated image from the context sent to the model for the follow-up message.
  2. Allow the text-only follow-up to proceed without raising an error about the image in history (assuming the backend or model can handle "invisible" image history or just text context).

Either way, the user expects to continue the conversation after image generation without being blocked by a "not vision capable" error.
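
As an illustration of option 1, here is a minimal sketch of filtering image attachments out of the history before it is sent to a model without vision. The `Attachment` and `ChatMessage` shapes and the `stripImagesForModel` helper are hypothetical names made up for this example; they are not taken from the Open WebUI codebase, and the real message structure may differ.

```typescript
// Hypothetical message shape. Field names are assumptions for illustration,
// not Open WebUI's actual types.
interface Attachment {
  type: string; // e.g. "image" or "file"
  url: string;
}

interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
  files?: Attachment[];
}

// Return a copy of the history with image attachments dropped when the
// target model is not vision capable. The input array is not mutated.
function stripImagesForModel(
  messages: ChatMessage[],
  visionCapable: boolean
): ChatMessage[] {
  if (visionCapable) return messages;
  return messages.map((m) => ({
    ...m,
    files: m.files?.filter((f) => f.type !== "image"),
  }));
}
```

With something along these lines, the text of earlier turns would still reach the model, while the generated image would be omitted from the request rather than triggering an error.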

Actual Behavior

When attempting to send a follow-up message after an image has been generated in the chat, the application displays an error toast: "Model [Model Name] is not vision capable". This happens because the generated image remains in the conversation history: the backend/frontend validation detects an image in the context while the model's vision capability is explicitly set to false, and raises the error.

Steps to Reproduce

  1. Edit a model via its edit page (Admin Settings -> Models).
  2. Toggle OFF the Vision capability for the model and save changes.
  3. Start a new chat and select this model.
  4. In the chat input integrations menu, toggle ON Image Generation.
  5. Send a query to trigger image generation (e.g., "Generate an image of a cat").
  6. Wait for the model to return the generated image.
  7. Turn OFF Image Generation (or leave it, the issue is present regardless as long as the previous image is in history).
  8. Send a text-only follow-up message (e.g., "Thanks, now tell me a joke").
  9. Observe: A notification toast appears at the top right: "Model [Name] is not vision capable", and the message is still sent to the model. The model still returns a response, either textual or with an image attached.

Logs & Screenshots

Error triggered in src/lib/components/chat/Chat.svelte:

// src/lib/components/chat/Chat.svelte around line 1740
if (hasImages && !(model.info?.meta?.capabilities?.vision ?? true)) {
    toast.error(
        $i18n.t('Model {{modelName}} is not vision capable', {
            modelName: model.name ?? model.id
        })
    );
}

The hasImages check scans the entire message history (via createMessagesList) and finds the generated image from the previous turn, causing the validation failure.
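
To make the difference concrete, the sketch below contrasts a whole-history image check with one scoped to the latest user message. It is only an illustration of the idea: the `Attachment` and `ChatMessage` shapes and both helper functions are hypothetical and are not the actual `hasImages` or `createMessagesList` logic in Chat.svelte.

```typescript
// Hypothetical message shape. Field names are assumptions for illustration,
// not Open WebUI's actual types.
interface Attachment {
  type: string; // e.g. "image" or "file"
  url: string;
}

interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
  files?: Attachment[];
}

// Whole-history check: true if any message in the chat carries an image,
// including images the assistant generated in an earlier turn.
function historyHasImages(messages: ChatMessage[]): boolean {
  return messages.some(
    (m) => m.files?.some((f) => f.type === "image") ?? false
  );
}

// Scoped check: only looks at the most recent user message, so an image
// generated in a previous turn does not count.
function latestUserMessageHasImages(messages: ChatMessage[]): boolean {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m !== undefined && m.role === "user") {
      return m.files?.some((f) => f.type === "image") ?? false;
    }
  }
  return false;
}
```

For the reproduction scenario above (a generated cat image followed by a text-only follow-up), the whole-history check would report an image while the scoped check would not, which is the distinction this issue hinges on.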

Additional Information

This error was initially reported by @Classic298, but it was requested that I open this issue.
