Skip to content

Support image parts in Ollama chat transformer#1610

Merged
looplj merged 1 commit into
looplj:unstablefrom
zhzy0077:support-ollama-image-parts
May 8, 2026
Merged

Support image parts in Ollama chat transformer#1610
looplj merged 1 commit into
looplj:unstablefrom
zhzy0077:support-ollama-image-parts

Conversation

@zhzy0077

@zhzy0077 zhzy0077 commented May 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • pass OpenAI-style image_url message parts through to Ollama chat requests as message images
  • strip data URL prefixes so Ollama receives raw base64 image payloads
  • add transformer coverage for data URL and plain image URL image parts

Testing

  • Not run: Go toolchain is not installed in this environment.

@greptile-apps

greptile-apps Bot commented May 8, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extends the Ollama outbound transformer to forward image parts from OpenAI-style multipart messages into Ollama's images field, correctly stripping data:…;base64, prefixes so Ollama receives raw base64 payloads.

  • getImages() helper iterates MultipleContent parts, skips non-image entries, and strips data-URL prefixes before appending to the slice sent in ChatMessage.Images.
  • data: URL handling is correct and well-tested, but plain HTTP/HTTPS URLs are forwarded verbatim — the Ollama REST API accepts only base64-encoded image data, so those requests will silently deliver no image to the model.
  • The plain-URL test (TestTransformRequestPreservesPlainImageURLParts) asserts the broken pass-through behaviour rather than catching it.

Confidence Score: 3/5

The data-URL stripping path works correctly, but plain HTTP/HTTPS image URLs are forwarded verbatim to Ollama, which only accepts base64 — those images will silently not reach the model.

The data-URL extraction is correct and the test for it is solid. However, the plain-URL code path passes HTTPS image URLs directly into Ollama's images field without fetching or encoding them, and the accompanying test validates this incorrect behaviour rather than guarding against it. Users who pass standard image URLs expecting vision model results will get silent no-ops from Ollama's REST API.

llm/transformer/ollama/outbound.go — specifically the plain-URL branch inside getImages(), and llm/transformer/ollama/outbound_test.go where the plain-URL test encodes the broken behaviour as expected output.

Important Files Changed

Filename Overview
llm/transformer/ollama/outbound.go Adds Images field to ChatMessage and getImages() helper that strips data: prefixes correctly, but passes plain HTTP/HTTPS image URLs directly to Ollama's REST API, which only accepts raw base64 — those requests will silently produce no image input.
llm/transformer/ollama/outbound_test.go New test file covering data-URL stripping (correct) and plain URL pass-through (tests the broken behavior — asserting https://example.com/image.png lands verbatim in the images array, which Ollama's REST API would reject).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[llm.Request message part] --> B{part.Type == image_url?}
    B -- No --> C[skip]
    B -- Yes --> D{URL starts with data:?}
    D -- Yes --> E[find comma index]
    E --> F{comma found?}
    F -- Yes --> G[strip prefix → raw base64]
    F -- No --> H[pass as-is ⚠️]
    D -- No --> I[plain HTTP/HTTPS URL\npassed as-is ⚠️]
    G --> J[append to images slice]
    H --> J
    I --> J
    J --> K[ChatMessage.Images sent to Ollama REST API]
    K --> L{Ollama REST API}
    L -- raw base64 ✅ --> M[model receives image]
    L -- plain URL ❌ --> N[Ollama ignores / rejects image]
Loading

Reviews (1): Last reviewed commit: "Support image parts in Ollama chat trans..." | Re-trigger Greptile

Comment on lines +197 to +203
if comma := strings.Index(image, ","); strings.HasPrefix(image, "data:") && comma >= 0 {
image = image[comma+1:]
}

if image != "" {
images = append(images, image)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Plain HTTP/HTTPS URLs passed to Ollama's REST API will silently fail

The Ollama REST API's images field only accepts raw base64-encoded image data — not plain HTTP/HTTPS URLs. When a non-data: URL (e.g., https://example.com/image.png) is provided, the code passes it through unchanged into the images array. Ollama will either reject the request or silently ignore the image, so the model receives no visual input despite the caller expecting it to. The Ollama docs and the linked cline bug report (issue #10368) both confirm this: the REST API requires base64, while only the language SDKs download URLs automatically. Either fetch and base64-encode non-data URLs, or return an explicit error so callers know the image was not usable.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for images in the Ollama outbound transformer by introducing an "Images" field to the ChatMessage struct and implementing logic to extract image data from message content. Feedback indicates that the current implementation incorrectly allows remote HTTP(S) URLs, which Ollama's API does not support; these should be filtered or converted. Furthermore, the reviewer suggested refactoring the transformation logic to process text and images in a single pass to avoid redundant iterations over the message content.

Comment on lines +180 to +207
func getImages(content llm.MessageContent) []string {
if len(content.MultipleContent) == 0 {
return nil
}

images := make([]string, 0)

for _, part := range content.MultipleContent {
if part.Type != "image_url" || part.ImageURL == nil {
continue
}

image := strings.TrimSpace(part.ImageURL.URL)
if image == "" {
continue
}

if comma := strings.Index(image, ","); strings.HasPrefix(image, "data:") && comma >= 0 {
image = image[comma+1:]
}

if image != "" {
images = append(images, image)
}
}

return images
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Ollama's /api/chat endpoint only supports base64-encoded images in the images field and does not support remote HTTP(S) URLs. The current implementation passes remote URLs through as-is, which will cause Ollama to return an error. I've suggested an updated implementation that filters out remote URLs and uses a more idiomatic nil slice initialization.

References
  1. The transformer should ensure that the output format strictly adheres to the target provider's API requirements to avoid runtime errors.

Comment on lines +51 to +81
func TestTransformRequestPreservesPlainImageURLParts(t *testing.T) {
transformer, err := NewOutboundTransformerWithConfig(&Config{
BaseURL: "http://localhost:11434",
})
require.NoError(t, err)

req, err := transformer.TransformRequest(context.Background(), &llm.Request{
Model: "vision-model",
Messages: []llm.Message{
{
Role: "user",
Content: llm.MessageContent{
MultipleContent: []llm.MessageContentPart{
{
Type: "image_url",
ImageURL: &llm.ImageURL{
URL: "https://example.com/image.png",
},
},
},
},
},
},
})
require.NoError(t, err)

var got ChatRequest
require.NoError(t, json.Unmarshal(req.Body, &got))
require.Len(t, got.Messages, 1)
require.Equal(t, []string{"https://example.com/image.png"}, got.Messages[0].Images)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test verifies that plain HTTP URLs are preserved and passed to Ollama. However, Ollama's API does not support remote URLs in the images field (it expects base64 data). This test should be updated to reflect that remote URLs are either filtered out or should be converted to base64 before reaching the transformer.

References
  1. Tests should verify correct behavior according to the target system's specifications.

Comment on lines 109 to 115
for _, msg := range llmReq.Messages {
content := getContentString(msg.Content)
ollamaReq.Messages = append(ollamaReq.Messages, ChatMessage{
Role: msg.Role,
Content: content,
Content: getContentString(msg.Content),
Images: getImages(msg.Content),
})
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The message content is iterated twice: once in getContentString and once in getImages. While not a major performance issue for typical message sizes, it could be refactored to process both text and image parts in a single pass for better efficiency and maintainability.

References
  1. Avoid redundant iterations over the same data structure when multiple pieces of information can be extracted in a single pass.

@looplj looplj merged commit bf3a06a into looplj:unstable May 8, 2026
4 checks passed
looplj pushed a commit that referenced this pull request May 8, 2026
Co-authored-by: zhzy0077 <zhzy0077@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants