[Feature][MM] split the images of one request into multiparts by XucSh · Pull Request #11828 · sgl-project/sglang

XucSh · 2025-10-19T12:52:40Z

Motivation

[2025-10-19 19:44:46] Prefill batch. #new-seq: 1, #new-token: 328, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-10-19 19:44:46] INFO: 127.0.0.1:59486 - "POST /generate HTTP/1.1" 200 OK
[2025-10-19 19:44:46] Prefill batch. #new-seq: 1, #new-token: 327, #cached-token: 328, token usage: 0.00, #running-req: 0, #queue-req: 0,
[2025-10-19 19:44:46] INFO: 127.0.0.1:59498 - "POST /generate HTTP/1.1" 200 OK

Cc @stmatengss @ByronHsu

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>

gemini-code-assist · 2025-10-19T12:53:02Z

Summary of Changes

Hello @XucSh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug related to the handling of multiple multimodal inputs, specifically images, within a single request. The core changes involve introducing logic to decompose bundled multimodal data into individual items, which then allows for more granular and correct processing. This includes applying unique token padding per item and caching embeddings on an individual item basis, ultimately enhancing the system's ability to manage complex multimodal requests efficiently and accurately.

Highlights

Image Splitting Logic: Implemented a new mechanism within MultimodalInputs.from_dict to automatically split bundled multimodal items (e.g., multiple images represented as a single entity) into distinct individual items. This ensures that each image is processed independently.
Refined Token Padding for Individual Items: Modified the pad_input_tokens function to apply specific pad values to placeholder tokens for each individual multimodal item. This change ensures correct tokenization and padding when a request contains multiple distinct items of the same modality.
Per-Item Embedding Caching: Updated the _get_chunked_prefill_embedding function to compute and cache embeddings for individual multimodal items rather than the entire request. This significantly improves caching efficiency and accuracy, especially for multi-image inputs, by avoiding recomputing embeddings for previously seen individual images.
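The splitting and per-item padding described in the highlights can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Item` class, `split_bundled_item`, `assign_pad_values`, and `pad_placeholders` are hypothetical stand-ins, and the sketch assumes each image occupies one contiguous run of placeholder tokens separated from the next run by at least one ordinary token.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a multimodal data item (not the sglang class)."""
    modality: str
    feature: List[bytes]            # one entry per image when bundled
    pad_value: Optional[int] = None


def split_bundled_item(item: Item) -> List[Item]:
    """Return one Item per image; single-image items pass through unchanged."""
    if len(item.feature) <= 1:
        return [item]
    return [Item(modality=item.modality, feature=[f]) for f in item.feature]


def assign_pad_values(items: List[Item]) -> None:
    """Derive a pad value from each item's own content hash (assumed scheme)."""
    for it in items:
        digest = hashlib.sha256(b"".join(it.feature)).digest()
        it.pad_value = int.from_bytes(digest[:4], "big")


def pad_placeholders(input_ids: List[int], placeholder_id: int, items: List[Item]) -> List[int]:
    """Replace the k-th contiguous run of placeholder tokens with items[k].pad_value."""
    out: List[int] = []
    run_index = -1
    in_run = False
    for tok in input_ids:
        if tok == placeholder_id:
            if not in_run:
                run_index += 1      # a new run starts, so move to the next item
            out.append(items[run_index].pad_value)
            in_run = True
        else:
            out.append(tok)
            in_run = False
    return out


bundle = Item("image", [b"image-A", b"image-B"])
items = split_bundled_item(bundle)
assign_pad_values(items)
padded = pad_placeholders([1, 9, 9, 2, 9, 9, 9, 3], placeholder_id=9, items=items)
```

Because each run of placeholders now carries a value derived from a single image, two requests that share an image produce the same padded tokens for that image, which is what allows prefix and embedding caches to match per image.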

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant improvement by splitting bundled multi-image requests into individual parts. This enables per-image embedding caching, which should improve performance and efficiency for requests with multiple images. The changes in mm_utils.py to handle token padding and embedding caching are well-implemented. The corresponding logic in schedule_batch.py to expand bundled items is also functionally correct. My main feedback is on improving the performance of the item splitting logic to avoid unnecessary memory copies.

gemini-code-assist · 2025-10-19T12:54:12Z

+                    continue
+
+                for i in range(num_images):
+                    new_item = copy.deepcopy(item)


Using copy.deepcopy(item) inside this loop can be inefficient, especially when dealing with a large number of images. The item.feature tensor can be quite large, and deepcopy will create a full copy of this tensor's data for each new item before it's sliced. This leads to unnecessary memory allocation and copying, potentially impacting performance.

A more efficient approach would be to construct a new MultimodalDataItem and selectively copy the necessary attributes, avoiding the deep copy of the large feature tensor. You only need to deep copy model_specific_data since it's modified in-place.

Suggested change

-                    new_item = copy.deepcopy(item)
+                    new_item = MultimodalDataItem(
+                        modality=item.modality,
+                        precomputed_embeddings=item.precomputed_embeddings,
+                        model_specific_data=copy.deepcopy(item.model_specific_data),
+                    )
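To illustrate the cost difference the review comment points at, here is a small self-contained sketch. It does not use sglang classes: `Item` is a hypothetical dataclass, the `per_image_meta` key is made up for the example, and a `bytearray` stands in for the large feature tensor.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a multimodal data item (not the sglang class)."""
    modality: str
    feature: Optional[Any] = None
    model_specific_data: Dict[str, Any] = field(default_factory=dict)


# A 1 MiB buffer stands in for a large bundled feature tensor.
bundle = Item(
    modality="image",
    feature=bytearray(1024 * 1024),
    model_specific_data={"per_image_meta": [[1, 2, 2], [1, 4, 4]]},
)

# Approach 1: deepcopy duplicates the whole buffer before any slicing happens.
deep = copy.deepcopy(bundle)

# Approach 2: build a fresh item, deep-copying only the small metadata dict.
light = Item(
    modality=bundle.modality,
    model_specific_data=copy.deepcopy(bundle.model_specific_data),
)
# The metadata copy can be edited without touching the original bundle.
light.model_specific_data["per_image_meta"] = light.model_specific_data["per_image_meta"][:1]
# A memoryview slice references the original buffer instead of copying it.
light.feature = memoryview(bundle.feature)[:256]
```

With N images in a bundle, the first approach copies the full buffer N times, while the second only copies the metadata and takes a view of the slice each new item needs.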


ByronHsu · 2025-10-20T03:59:29Z

This breaks the following assumption. I am not sure if it will have any side effect cc @mickqian @JustinTong0323

One MultimodalDataItem contains all inputs for one modality. For example, if there are 3 images and 1 audio inputs, there will be 2 MultimodalDataItem. One for images and one for audio.


JustinTong0323 · 2025-10-20T18:28:23Z

> This breaks the following assumption. I am not sure if it will have any side effect cc @mickqian @JustinTong0323
>
> One MultimodalDataItem contains all inputs for one modality. For example, if there are 3 images and 1 audio inputs, there will be 2 MultimodalDataItem. One for images and one for audio.

Yes I think it would break the internal processing logic...

XucSh · 2025-10-21T00:31:18Z

> > This breaks the following assumption. I am not sure if it will have any side effect cc @mickqian @JustinTong0323
> >
> > One MultimodalDataItem contains all inputs for one modality. For example, if there are 3 images and 1 audio inputs, there will be 2 MultimodalDataItem. One for images and one for audio.
>
> Yes I think it would break the internal processing logic...

Great review, thanks! Any thoughts on the overall architecture? We can pair up on a refactor. Thanks! Cc @stmatengss @ByronHsu

mickqian · 2025-11-15T12:33:31Z

#9529


liusy58 · 2025-12-04T07:47:41Z

Problem

When a request contains multiple images [A, B, C], they are hashed together as one bundle. If another request has [A, B, D], we cannot reuse the cache for images A and B.

Solution

Split bundled images into individual items so each image has its own hash.

Before: hash([A, B, C]) vs hash([A, B, D]) → no cache hit

After: hash(A), hash(B), hash(C) vs hash(A), hash(B), hash(D) → A and B hit cache
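The before/after comparison can be reproduced with a small sketch. This is only an illustration under stated assumptions: `bundle_key`, `image_key`, and `count_hits` are hypothetical helpers, SHA-256 over raw bytes stands in for whatever hashing the server actually uses, and a plain dict stands in for the embedding cache.

```python
import hashlib
from typing import Dict, List


def image_key(image_bytes: bytes) -> str:
    """Cache key for a single image (assumed: SHA-256 of its bytes)."""
    return hashlib.sha256(image_bytes).hexdigest()


def bundle_key(images: List[bytes]) -> str:
    """Cache key for a whole bundle: changing any image changes the key."""
    h = hashlib.sha256()
    for img in images:
        h.update(hashlib.sha256(img).digest())
    return h.hexdigest()


def count_hits(cache: Dict[str, str], keys: List[str]) -> int:
    """Look up each key, filling the cache on a miss; return the number of hits."""
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
        else:
            cache[key] = "embedding"   # placeholder for a computed embedding
    return hits


A, B, C, D = b"img-A", b"img-B", b"img-C", b"img-D"

# Before: one key per request bundle.
bundled_cache: Dict[str, str] = {}
count_hits(bundled_cache, [bundle_key([A, B, C])])
bundled_hits = count_hits(bundled_cache, [bundle_key([A, B, D])])

# After: one key per image.
per_image_cache: Dict[str, str] = {}
count_hits(per_image_cache, [image_key(x) for x in (A, B, C)])
per_image_hits = count_hits(per_image_cache, [image_key(x) for x in (A, B, D)])
```

In this sketch the second request gets no hits with bundle-level keys and two hits (A and B) with per-image keys, matching the description above.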

This is a great PR! I think we should merge this.

yhyang201 · 2025-12-04T19:13:15Z

/tag-and-rerun-ci

yhyang201 · 2025-12-04T19:14:53Z

Great work, let's take a look at the CI.

ShangmingCai · 2025-12-05T02:40:42Z

/tag-and-rerun-ci 2

ShangmingCai · 2025-12-08T02:27:45Z

@yhyang201 CI is all green now, except xpu. Do you think this PR is ready to merge?

yhyang201 · 2025-12-08T02:34:54Z

I think it’s okay.

ShangmingCai

LGTM. But still need a double-check from @mickqian

minor: is it possible that we wrap some logic in the def from_dict(obj: dict): to a mm util? The code seems pretty long in the scheduler batch.

liusy58 · 2025-12-08T09:42:25Z

> LGTM. But still need a double-check from @mickqian
>
> minor: is it possible that we wrap some logic in the def from_dict(obj: dict): to a mm util? The code seems pretty long in the scheduler batch.

Done

ShangmingCai

Thx. This looks better to me now.

stmatengss · 2025-12-10T16:30:00Z

/rerun-failed-ci

stmatengss · 2025-12-19T06:04:29Z

This PR is approved. Please ensure the requested changes are implemented. THX. @mickqian

JustinTong0323 · 2025-12-19T15:35:36Z

/rerun-failed-ci

XucSh · 2025-12-23T03:44:55Z

/rerun-failed-ci


XucSh · 2025-12-23T14:44:15Z

/rerun-failed-ci

JustinTong0323 · 2025-12-23T23:43:43Z

/rerun-failed-ci

mickqian · 2025-12-24T02:19:37Z

@yhyang201 @yuan-luo Would you take a look?

ShangmingCai · 2025-12-24T03:11:27Z

@yhyang201 CI looks OK. If you think this PR is ready to merge after double-checking, you can ping me to merge.

yhyang201 · 2025-12-24T03:18:48Z

LGTM. I think it can be merged.
@ShangmingCai

ShangmingCai · 2025-12-24T03:21:46Z

> LGTM. I think it can be merged. @ShangmingCai

@yhyang201 OK, let's get it merged. If any bug is reported, we can do a quick fix or revert.

…oject#11828) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Signed-off-by: Kun(llfl) <i@imux.top> Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com> Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Kun(llfl) <llfl@linux.alibaba.com> Co-authored-by: Kun(llfl) <i@imux.top> Co-authored-by: liusy58 <liusy58@linux.alibaba.com>

[buf fix] split the images of one request into multi part

8df071d

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>

XucSh requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners October 19, 2025 12:52

XucSh mentioned this pull request Oct 19, 2025

[Bug] KV Cache is not reused for image in multi-turn text + image convo #11785

Closed

5 tasks

gemini-code-assist Bot reviewed Oct 19, 2025


stmatengss added the run-ci label Oct 19, 2025

fix lint

f85f5e9

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>

XucSh changed the title ~~[buf fix] split the images of one request into multi part~~ [bug fix] split the images of one request into multi part Oct 20, 2025

XucSh changed the title ~~[bug fix] split the images of one request into multi part~~ [Performance] split the images of one request into multi part Oct 20, 2025

XucSh changed the title ~~[Performance] split the images of one request into multi part~~ [bug fix] split the images of one request into multi part Oct 20, 2025

update

312068d

Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>

JustinTong0323 self-assigned this Oct 20, 2025

mickqian changed the title ~~[bug fix] split the images of one request into multi part~~ [bug fix] split the images of one request into multiparts Oct 21, 2025

mickqian requested changes Oct 21, 2025


Comment thread python/sglang/srt/managers/mm_utils.py Outdated

XucSh requested a review from mickqian October 21, 2025 02:50

XucSh force-pushed the Xuchun/mm-dev branch from 83522a9 to 312068d Compare October 21, 2025 06:38

stmatengss and others added 4 commits October 29, 2025 14:56

Merge branch 'main' into Xuchun/mm-dev

27031fd

Merge branch 'main' into Xuchun/mm-dev

99d20ec

Merge branch 'main' into Xuchun/mm-dev

55309af

Merge branch 'main' into Xuchun/mm-dev

6e1d469

llfl requested a review from zhyncs as a code owner November 12, 2025 01:05

add switch for splitting images

48b141e

mickqian mentioned this pull request Nov 15, 2025

vlm: resume batching reqs in Vits to speedup #9529

Closed

4 tasks

liusy58 added 3 commits December 2, 2025 23:29

fix

951f477

speedup hash

15e1a85

Revert "speedup hash"

994a2de

This reverts commit 15e1a85.

ShangmingCai reviewed Dec 8, 2025


refactor code

f9cd623

ShangmingCai approved these changes Dec 8, 2025


Merge branch 'main' into Xuchun/mm-dev

e99475f

XucSh added 2 commits December 23, 2025 16:56

Merge remote-tracking branch 'origin/main' into mm-dev

ab58a49

default to False

b67b899

Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>

ShangmingCai merged commit 3bf07c6 into sgl-project:main Dec 24, 2025
167 of 171 checks passed

wili-65535 mentioned this pull request Mar 4, 2026

[Bug] SGLANG_USE_CUDA_IPC_TRANSPORT=1 and SGLANG_ENABLE_MM_SPLITTING=1 do not work at the same time. #19893

Closed

5 tasks

-                    new_item = copy.deepcopy(item)
+                    new_item = MultimodalDataItem(
+                        modality=item.modality,
+                        precomputed_embeddings=item.precomputed_embeddings,
+                        model_specific_data=copy.deepcopy(item.model_specific_data),
+                    )
