Support image parts in Ollama chat transformer#1610
Conversation
Greptile SummaryThis PR extends the Ollama outbound transformer to forward image parts from OpenAI-style multipart messages into Ollama's
Confidence Score: 3/5The data-URL stripping path works correctly, but plain HTTP/HTTPS image URLs are forwarded verbatim to Ollama, which only accepts base64 — those images will silently not reach the model. The data-URL extraction is correct and the test for it is solid. However, the plain-URL code path passes HTTPS image URLs directly into Ollama's
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[llm.Request message part] --> B{part.Type == image_url?}
B -- No --> C[skip]
B -- Yes --> D{URL starts with data:?}
D -- Yes --> E[find comma index]
E --> F{comma found?}
F -- Yes --> G[strip prefix → raw base64]
F -- No --> H[pass as-is ⚠️]
D -- No --> I[plain HTTP/HTTPS URL\npassed as-is ⚠️]
G --> J[append to images slice]
H --> J
I --> J
J --> K[ChatMessage.Images sent to Ollama REST API]
K --> L{Ollama REST API}
L -- raw base64 ✅ --> M[model receives image]
L -- plain URL ❌ --> N[Ollama ignores / rejects image]
Reviews (1): Last reviewed commit: "Support image parts in Ollama chat trans..." | Re-trigger Greptile |
| if comma := strings.Index(image, ","); strings.HasPrefix(image, "data:") && comma >= 0 { | ||
| image = image[comma+1:] | ||
| } | ||
|
|
||
| if image != "" { | ||
| images = append(images, image) | ||
| } |
There was a problem hiding this comment.
Plain HTTP/HTTPS URLs passed to Ollama's REST API will silently fail
The Ollama REST API's images field only accepts raw base64-encoded image data — not plain HTTP/HTTPS URLs. When a non-data: URL (e.g., https://example.com/image.png) is provided, the code passes it through unchanged into the images array. Ollama will either reject the request or silently ignore the image, so the model receives no visual input despite the caller expecting it to. The Ollama docs and the linked cline bug report (issue #10368) both confirm this: the REST API requires base64, while only the language SDKs download URLs automatically. Either fetch and base64-encode non-data URLs, or return an explicit error so callers know the image was not usable.
There was a problem hiding this comment.
Code Review
This pull request adds support for images in the Ollama outbound transformer by introducing an "Images" field to the ChatMessage struct and implementing logic to extract image data from message content. Feedback indicates that the current implementation incorrectly allows remote HTTP(S) URLs, which Ollama's API does not support; these should be filtered or converted. Furthermore, the reviewer suggested refactoring the transformation logic to process text and images in a single pass to avoid redundant iterations over the message content.
| func getImages(content llm.MessageContent) []string { | ||
| if len(content.MultipleContent) == 0 { | ||
| return nil | ||
| } | ||
|
|
||
| images := make([]string, 0) | ||
|
|
||
| for _, part := range content.MultipleContent { | ||
| if part.Type != "image_url" || part.ImageURL == nil { | ||
| continue | ||
| } | ||
|
|
||
| image := strings.TrimSpace(part.ImageURL.URL) | ||
| if image == "" { | ||
| continue | ||
| } | ||
|
|
||
| if comma := strings.Index(image, ","); strings.HasPrefix(image, "data:") && comma >= 0 { | ||
| image = image[comma+1:] | ||
| } | ||
|
|
||
| if image != "" { | ||
| images = append(images, image) | ||
| } | ||
| } | ||
|
|
||
| return images | ||
| } |
There was a problem hiding this comment.
Ollama's /api/chat endpoint only supports base64-encoded images in the images field and does not support remote HTTP(S) URLs. The current implementation passes remote URLs through as-is, which will cause Ollama to return an error. I've suggested an updated implementation that filters out remote URLs and uses a more idiomatic nil slice initialization.
References
- The transformer should ensure that the output format strictly adheres to the target provider's API requirements to avoid runtime errors.
| func TestTransformRequestPreservesPlainImageURLParts(t *testing.T) { | ||
| transformer, err := NewOutboundTransformerWithConfig(&Config{ | ||
| BaseURL: "http://localhost:11434", | ||
| }) | ||
| require.NoError(t, err) | ||
|
|
||
| req, err := transformer.TransformRequest(context.Background(), &llm.Request{ | ||
| Model: "vision-model", | ||
| Messages: []llm.Message{ | ||
| { | ||
| Role: "user", | ||
| Content: llm.MessageContent{ | ||
| MultipleContent: []llm.MessageContentPart{ | ||
| { | ||
| Type: "image_url", | ||
| ImageURL: &llm.ImageURL{ | ||
| URL: "https://example.com/image.png", | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| }) | ||
| require.NoError(t, err) | ||
|
|
||
| var got ChatRequest | ||
| require.NoError(t, json.Unmarshal(req.Body, &got)) | ||
| require.Len(t, got.Messages, 1) | ||
| require.Equal(t, []string{"https://example.com/image.png"}, got.Messages[0].Images) | ||
| } |
There was a problem hiding this comment.
This test verifies that plain HTTP URLs are preserved and passed to Ollama. However, Ollama's API does not support remote URLs in the images field (it expects base64 data). This test should be updated to reflect that remote URLs are either filtered out or should be converted to base64 before reaching the transformer.
References
- Tests should verify correct behavior according to the target system's specifications.
| for _, msg := range llmReq.Messages { | ||
| content := getContentString(msg.Content) | ||
| ollamaReq.Messages = append(ollamaReq.Messages, ChatMessage{ | ||
| Role: msg.Role, | ||
| Content: content, | ||
| Content: getContentString(msg.Content), | ||
| Images: getImages(msg.Content), | ||
| }) | ||
| } |
There was a problem hiding this comment.
The message content is iterated twice: once in getContentString and once in getImages. While not a major performance issue for typical message sizes, it could be refactored to process both text and image parts in a single pass for better efficiency and maintainability.
References
- Avoid redundant iterations over the same data structure when multiple pieces of information can be extracted in a single pass.
Co-authored-by: zhzy0077 <zhzy0077@users.noreply.github.com>
Summary
Testing