[feat] Add detail in image_data#8596
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @yuhyao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the system's capability to handle image data by integrating the detail field, which specifies the desired resolution for image processing. This involves updating core data structures, modifying conversation generation logic, and introducing a new dataclass to encapsulate image information, ensuring compatibility with advanced multimodal model features.
Highlights
- Feature Enhancement: Added support for the
detailfield in image data, aligning with the OpenAI Chat Completion API's capability to specify image processing resolution (auto, low, high). - Data Model Update: The
image_datafield within theConversationobject has been updated to storeImageDataobjects instead of raw image URLs (strings), providing a richer structure for image information. - API Integration: The
append_imagemethod inConversationnow accepts thedetailparameter, and thegenerate_chat_convfunction has been modified to extract and pass this detail from incoming requests to ensure proper propagation. - Utility Refinement: A new
ImageDatadataclass has been introduced insglang/srt/utils.pyto encapsulate image URL and detail, and theload_imageutility function has been updated to correctly process this newImageDatatype.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request introduces a detail field for image data, aligning with the OpenAI API for specifying image processing resolution. The changes are well-contained and correctly implemented across the modified files. I have a couple of suggestions to improve maintainability and robustness.
There was a problem hiding this comment.
The append_image method can raise an AttributeError if self.image_data is None. Making the method itself more robust by handling the None case will prevent potential issues if it's used in other contexts.
| def append_image(self, image: str, detail: Literal["auto", "low", "high"]): | |
| """Append a new image.""" | |
| self.image_data.append(image) | |
| self.image_data.append(ImageData(url=image, detail=detail)) | |
| def append_image(self, image: str, detail: Literal["auto", "low", "high"]): | |
| """Append a new image.""" | |
| if self.image_data is None: | |
| self.image_data = [] | |
| self.image_data.append(ImageData(url=image, detail=detail)) |
There was a problem hiding this comment.
For better maintainability and clarity, it's best to import ImageData directly from sglang.srt.utils where it is defined. This avoids confusion and potential circular dependency issues in the future.
| from sglang.srt.managers.io_struct import ImageData | |
| from sglang.srt.utils import ImageData |
There was a problem hiding this comment.
We'd better put it in io_struct?
There was a problem hiding this comment.
Yeah I tried this but it looks like io_struct.py depends on utils.py and isinstance(image_file, ImageData) will cause circular import. Any suggestions?
|
You may also need to modify the logic here: sglang/python/sglang/srt/jinja_template_utils.py Lines 142 to 147 in c6d59f2 |
Thanks, fixed. |
d9b0283 to
c1d6415
Compare
|
Hold on for a while, the VLM CI is somewhat unstable, would trigger CI after #8635 merge |
|
Check errors in CI |
Thanks, I have updated my modification to I also looked into the other CI errors, but they don’t seem to be caused by my PR (e.g., launch error, vllm installation error, etc.). I would appreciate it if you could help check those. |
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Motivation
In the OpenAI Chat Completion API, the
image_urlobject can include adetailfield to indicate the resolution at which the image will be processed, which in turn affects the length of the image embeddings. Anticipating that future models will make use of this detail field, it has been added to the data classGenerateReqInput.Modifications
Modified
generate_chat_convso that the returnedConversationobject’simage_datanow includes adetailfield.Accuracy Test
Benchmark & Profiling
Checklist