[feat] Add detail in image_data by yuhyao · Pull Request #8596 · sgl-project/sglang

yuhyao · 2025-07-31T04:06:18Z

Motivation

In the OpenAI Chat Completion API, the image_url object can include a detail field that indicates the resolution at which the image will be processed, which in turn affects the length of the image embeddings. Because future models are likely to make use of this detail field, this PR adds it to the data class GenerateReqInput.

Modifications

Modified generate_chat_conv so that the returned Conversation object’s image_data now includes a detail field.
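For readers unfamiliar with the field, here is a minimal sketch of how a detail value might travel from an OpenAI-style content part into image_data. The ImageData field names (url, detail) follow the review discussion in this PR; the helper to_image_data and the fallback to "auto" when detail is omitted are assumptions made for illustration, not the merged code.

```python
from dataclasses import dataclass
from typing import Literal

Detail = Literal["auto", "low", "high"]


@dataclass
class ImageData:
    """Stand-in for the dataclass added by this PR (assumed fields: url, detail)."""

    url: str
    detail: Detail = "auto"


def to_image_data(part: dict) -> ImageData:
    """Hypothetical helper: convert an OpenAI-style image_url content part."""
    image_url = part["image_url"]
    # Assumption: fall back to "auto" when the client omits detail.
    return ImageData(url=image_url["url"], detail=image_url.get("detail", "auto"))


part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/cat.png", "detail": "low"},
}
print(to_image_data(part))
```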

Accuracy Test

Benchmark & Profiling

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

gemini-code-assist

Summary of Changes

Hello @yuhyao, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the system's capability to handle image data by integrating the detail field, which specifies the desired resolution for image processing. This involves updating core data structures, modifying conversation generation logic, and introducing a new dataclass to encapsulate image information, ensuring compatibility with advanced multimodal model features.

Highlights

Feature Enhancement: Added support for the detail field in image data, aligning with the OpenAI Chat Completion API's capability to specify image processing resolution (auto, low, high).
Data Model Update: The image_data field within the Conversation object has been updated to store ImageData objects instead of raw image URLs (strings), providing a richer structure for image information.
API Integration: The append_image method in Conversation now accepts the detail parameter, and the generate_chat_conv function has been modified to extract and pass this detail from incoming requests to ensure proper propagation.
Utility Refinement: A new ImageData dataclass has been introduced in sglang/srt/utils.py to encapsulate image URL and detail, and the load_image utility function has been updated to correctly process this new ImageData type.
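The type dispatch described in the last highlight can be sketched roughly as follows: a loader entry point that accepts either a plain string or an ImageData and normalizes both to a (url, detail) pair. The function name split_image_input is hypothetical, and the actual decoding done by load_image in sglang/srt/utils.py is omitted; this only illustrates the branching.

```python
from dataclasses import dataclass
from typing import Literal, Tuple, Union


@dataclass
class ImageData:
    """Stand-in for the PR's dataclass (assumed fields: url, detail)."""

    url: str
    detail: Literal["auto", "low", "high"] = "auto"


def split_image_input(image_file: Union[str, ImageData]) -> Tuple[str, str]:
    """Hypothetical helper: normalize the two accepted input types."""
    if isinstance(image_file, ImageData):
        return image_file.url, image_file.detail
    if isinstance(image_file, str):
        # Assumption: a bare string carries no detail, so use "auto".
        return image_file, "auto"
    raise ValueError(f"Invalid image: {type(image_file)=} {image_file=}")


print(split_image_input("a.png"))
print(split_image_input(ImageData(url="b.png", detail="high")))
```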

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a detail field for image data, aligning with the OpenAI API for specifying image processing resolution. The changes are well-contained and correctly implemented across the modified files. I have a couple of suggestions to improve maintainability and robustness.

gemini-code-assist · 2025-07-31T04:08:08Z

The append_image method can raise an AttributeError if self.image_data is None. Making the method itself more robust by handling the None case will prevent potential issues if it's used in other contexts.

Suggested change

-    def append_image(self, image: str, detail: Literal["auto", "low", "high"]):
-        """Append a new image."""
-        self.image_data.append(image)
-        self.image_data.append(ImageData(url=image, detail=detail))
+    def append_image(self, image: str, detail: Literal["auto", "low", "high"]):
+        """Append a new image."""
+        if self.image_data is None:
+            self.image_data = []
+        self.image_data.append(ImageData(url=image, detail=detail))
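To see why the guard matters, consider a reduced stand-in for Conversation whose image_data defaults to None. This is a sketch under that assumption (the real class has many more fields and may initialize image_data differently); without the None check, the first append_image call would raise AttributeError.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass
class ImageData:
    """Stand-in for the PR's dataclass (assumed fields: url, detail)."""

    url: str
    detail: Literal["auto", "low", "high"] = "auto"


@dataclass
class Conversation:
    """Reduced stand-in; only the field relevant to this suggestion."""

    # Assumption for illustration: image_data may start out as None.
    image_data: Optional[List[ImageData]] = None

    def append_image(self, image: str, detail: Literal["auto", "low", "high"]):
        """Append a new image, creating the list on first use."""
        if self.image_data is None:
            self.image_data = []
        self.image_data.append(ImageData(url=image, detail=detail))


conv = Conversation()
conv.append_image("https://example.com/a.png", "high")
print(conv.image_data)
```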

gemini-code-assist · 2025-07-31T04:08:08Z

For better maintainability and clarity, it's best to import ImageData directly from sglang.srt.utils where it is defined. This avoids confusion and potential circular dependency issues in the future.

Suggested change

- from sglang.srt.managers.io_struct import ImageData
+ from sglang.srt.utils import ImageData

gemini-code-assist · 2025-07-31T04:08:08Z

Consider adding a more descriptive error message that includes the actual value of image_file for easier debugging.

Suggested change

- raise ValueError(f"Invalid image: {image_file}")
+ raise ValueError(f"Invalid image: {type(image_file)=} {image_file=}")
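The suggested message relies on the `=` specifier in f-strings (Python 3.8+), which prints the expression text followed by the repr of its value. A small illustration, using an integer as an arbitrary unsupported input:

```python
image_file = 42  # an unsupported input type, chosen only for illustration

old_message = f"Invalid image: {image_file}"
new_message = f"Invalid image: {type(image_file)=} {image_file=}"

print(old_message)  # Invalid image: 42
print(new_message)  # Invalid image: type(image_file)=<class 'int'> image_file=42
```

Because `=` uses repr by default, string inputs appear quoted, which makes empty or whitespace-only values easier to spot in logs.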

JustinTong0323 · 2025-07-31T04:22:25Z

Would it be better to put it in io_struct?

yuhyao · 2025-07-31

Yeah, I tried this, but io_struct.py depends on utils.py, so the isinstance(image_file, ImageData) check in utils.py would cause a circular import. Any suggestions?

JustinTong0323 · 2025-07-31T04:24:00Z

You may also need to modify the logic here:

sglang/python/sglang/srt/jinja_template_utils.py

Lines 142 to 147 in c6d59f2

    if chunk_type == "image_url":
        image_data.append(chunk["image_url"]["url"])
        if chunk.get("modalities"):
            modalities.append(chunk.get("modalities"))
        # Normalize to simple 'image' type for template compatibility
        processed_content_parts.append({"type": "image"})
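One way the quoted branch could carry detail along is sketched below. It is a simplified, self-contained loop, not the code that was eventually merged: the function name collect_images is hypothetical, the modalities handling is left out, and defaulting to "auto" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Tuple


@dataclass
class ImageData:
    """Stand-in for the PR's dataclass (assumed fields: url, detail)."""

    url: str
    detail: Literal["auto", "low", "high"] = "auto"


def collect_images(
    content: List[Dict[str, Any]],
) -> Tuple[List[ImageData], List[Dict[str, Any]]]:
    """Hypothetical helper mirroring the quoted image_url branch."""
    image_data: List[ImageData] = []
    processed_content_parts: List[Dict[str, Any]] = []
    for chunk in content:
        if chunk.get("type") == "image_url":
            image_url = chunk["image_url"]
            # Keep the detail next to the URL instead of storing a bare string.
            image_data.append(
                ImageData(url=image_url["url"], detail=image_url.get("detail", "auto"))
            )
            # Normalize to simple 'image' type for template compatibility
            processed_content_parts.append({"type": "image"})
        else:
            processed_content_parts.append(chunk)
    return image_data, processed_content_parts


images, parts = collect_images(
    [
        {"type": "text", "text": "Describe this."},
        {"type": "image_url", "image_url": {"url": "u.png", "detail": "high"}},
    ]
)
print(images, parts)
```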

yuhyao · 2025-07-31T08:07:25Z

Thanks, fixed.

JustinTong0323 · 2025-08-01T02:45:06Z

Please hold on for a while; the VLM CI is somewhat unstable. I will trigger CI after #8635 merges.

JustinTong0323 · 2025-08-02T06:09:32Z

Check errors in CI

yuhyao · 2025-08-04T03:55:55Z

Check errors in CI

Thanks, I have updated my modification to process_content_for_template_format and also made a small change to test_jinja_template_utils.py (should be fine).

I also looked into the other CI errors, but they don’t seem to be caused by my PR (e.g., launch error, vllm installation error, etc.). I would appreciate it if you could help check those.

gemini-code-assist · 2025-08-05T06:01:40Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

yuhyao requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners July 31, 2025 04:06

gemini-code-assist Bot reviewed Jul 31, 2025

JustinTong0323 reviewed Jul 31, 2025

JustinTong0323 self-assigned this Jul 31, 2025

yuhyao added 3 commits July 31, 2025 21:16

add detail in image_data

98f77c9

Tidy

f7d37dc

add modification for jinja template.

c1d6415

yuhyao force-pushed the feat/image-detail branch from d9b0283 to c1d6415 Compare July 31, 2025 13:17

yuhyao changed the title from "feat: add detail in image_data" to "[feat] Add detail in image_data" Aug 1, 2025

Merge branch 'main' into feat/image-detail

c9a21e6

yuhyao added 2 commits August 4, 2025 11:33

Fix by making process_content_for_template_format robust.

d444285

Apply a small modification to test_jinja_template_utils.py.

d611f11

yuhyao added 2 commits August 4, 2025 16:08

Merge branch 'main' into feat/image-detail

a0efe9f

Merge branch 'main' into feat/image-detail

d7a859d

hnyls2002 merged commit 873f384 into sgl-project:main Aug 5, 2025
171 of 185 checks passed

narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025

[feat] Add detail in image_data (sgl-project#8596)

a1a2039

MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025

[feat] Add detail in image_data (sgl-project#8596)

3634327
