[Diffusion] Add mixed-resolution benchmark support (for #20762) by fengyuanyu1 · Pull Request #20863 · sgl-project/sglang

fengyuanyu1 · 2026-03-18T13:45:15Z

Motivation

Add mixed-resolution support to sglang/python/sglang/multimodal_gen/benchmarks/bench_serving.py,
so that the server can be benchmarked with requests of different image sizes.

Modifications

Add --random-request-config for benchmarking with mixed resolutions. Accepts a JSON string of profiles with width, height, num_inference_steps, and weight fields. RandomDataset uses weighted sampling to assign profiles to requests. Also adds --random-request-seed for reproducibility.
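The weighted assignment described above can be sketched as follows. This is a minimal illustration, not the PR's actual RandomDataset code: the function name `sample_profiles` is hypothetical, and only the field names and the `p.pop("weight")` idiom come from this thread.

```python
import json
import random


def sample_profiles(config_json, num_requests, seed=None):
    """Assign one profile to each request by weighted sampling (hypothetical helper)."""
    profiles = json.loads(config_json)
    # Pop the weight so each remaining dict holds only request parameters.
    weights = [p.pop("weight") for p in profiles]
    # A private RNG keeps the assignment reproducible for a given seed.
    rng = random.Random(seed)
    return rng.choices(profiles, weights=weights, k=num_requests)


config = (
    '[{"width":512,"height":512,"num_inference_steps":5,"weight":0.5},'
    '{"width":1024,"height":1024,"num_inference_steps":5,"weight":0.5}]'
)
requests = sample_profiles(config, num_requests=4, seed=0)
for r in requests:
    print(r["width"], r["height"], r["num_inference_steps"])
```

Calling the helper twice with the same seed returns the same sequence of profiles, which is the property --random-request-seed is meant to provide.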

Accuracy Tests

Benchmarking and Profiling

The test environment is: AMD CPU + RTX 3090 GPU.

$ python -m sglang.multimodal_gen.benchmarks.bench_serving \
  --dataset random \
  --num-prompts 4 \
  --port 30000 \
  --task text-to-image \
  --random-request-config '[{"width":512,"height":512,"num_inference_steps":5,"weight":0.5},{"width":1024,"height":1024,"num_inference_steps":5,"weight":0.5}]' \
  --warmup-requests 0
[03-18 13:35:54] Waiting for service at http://localhost:30000...
[03-18 13:35:54] Service is ready.
[03-18 13:35:54] Updated model name from server: /home/ainfra/Sana_600M_1024px_diffusers
[03-18 13:35:54] Using task from --task: text-to-image
[03-18 13:35:54] Loading requests...
[03-18 13:35:54] Prepared 4 requests from random dataset.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:15<00:00,  3.90s/it]

================= Serving Benchmark Result =================
Task:                                         text-to-image                 
Model:                                        /home/ainfra/Sana_600M_1024px_diffusers
Dataset:                                      random                        
--------------------------------------------------
Benchmark duration (s):                       15.60                         
Request rate:                                 inf                           
Max request concurrency:                      1                             
Successful requests:                          4/4                           
--------------------------------------------------
Request throughput (req/s):                   0.26                          
Latency Mean (s):                             3.90                          
Latency Median (s):                           2.74                          
Latency P99 (s):                              7.52                          
--------------------------------------------------
Peak Memory Max (MB):                         3948.00                       
Peak Memory Mean (MB):                        3565.50                       
Peak Memory Median (MB):                      3438.00                       
------------------------------------------------------------
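As a sanity check, the headline numbers in the report above are consistent with each other. The short calculation below uses only values printed in the report; the variable names are illustrative.

```python
# Values copied from the benchmark report above.
num_requests = 4
duration_s = 15.60
mean_latency_s = 3.90

# Request throughput is completed requests divided by wall-clock duration.
throughput = num_requests / duration_s
print(f"{throughput:.2f} req/s")

# With max concurrency 1, requests run back to back, so the summed
# latencies should roughly equal the benchmark duration.
summed_latency_s = mean_latency_s * num_requests
print(f"{summed_latency_s:.2f} s")
```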

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-03-18T13:45:32Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the benchmarking utility for diffusion models by introducing the capability to simulate diverse request loads. Instead of fixed parameters, users can now define a set of request profiles with different image resolutions and inference steps, and the system will randomly sample from these profiles based on specified weights. This change is crucial for understanding how diffusion model serving systems perform under more varied and realistic operational conditions, providing a more comprehensive performance assessment.

Highlights

Mixed-Resolution Benchmarking: Added support for benchmarking diffusion models with mixed resolutions and varying parameters, allowing for more realistic performance evaluations.
Configurable Request Profiles: Introduced a new command-line argument --random-request-config that accepts a JSON string to define weighted profiles for request parameters like width, height, and inference steps.
Reproducible Randomness: Included a --random-request-seed argument to ensure reproducibility when sampling request profiles for benchmarking.
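A rough sketch of how the two flags might be parsed and checked is shown below. It is not the code from bench_serving.py: the `parse_args` helper, the defaults, and the rule that every profile must carry all four fields are assumptions. The restriction to --dataset random follows a later commit message in this PR.

```python
import argparse
import json

REQUIRED_FIELDS = {"width", "height", "num_inference_steps", "weight"}


def parse_args(argv=None):
    """Parse the benchmark flags discussed in this PR (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="random")  # default is an assumption
    parser.add_argument("--random-request-config", type=str, default=None)
    parser.add_argument("--random-request-seed", type=int, default=None)
    args = parser.parse_args(argv)

    if args.random_request_config is not None:
        # Mixed-resolution profiles only make sense for the random dataset.
        if args.dataset != "random":
            parser.error("--random-request-config requires --dataset random")
        profiles = json.loads(args.random_request_config)
        for i, profile in enumerate(profiles):
            missing = REQUIRED_FIELDS - profile.keys()
            if missing:
                parser.error(f"profile {i} is missing {sorted(missing)}")
        args.profiles = profiles
    return args


args = parse_args([
    "--random-request-config",
    '[{"width":512,"height":512,"num_inference_steps":5,"weight":1.0}]',
    "--random-request-seed", "42",
])
print(args.random_request_seed, args.profiles)
```

`parser.error` prints a usage message and exits with status 2, so a bad combination of flags fails before any request is sent.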

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request successfully adds support for mixed-resolution benchmarks by introducing --random-request-config and --random-request-seed arguments. The implementation in RandomDataset correctly uses weighted sampling to generate varied request profiles. The code is well-structured, but there is a small opportunity to improve the clarity and efficiency of the configuration parsing logic, for which I've left a specific comment.

Ratish1 · 2026-03-19T06:16:50Z

Could you take this PR out of draft mode if you think it is ready, @fengyuanyu1? I will then take another look. Thanks.

gemini-code-assist · 2026-03-19T06:34:22Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

fengyuanyu1 · 2026-03-19T06:35:02Z

> Could you take this PR out of draft mode if you think it is ready, @fengyuanyu1? I will then take another look. Thanks.

Yes, thanks for your attention!

fengyuanyu1 · 2026-03-20T11:30:10Z

Hi @ping1jing2 @Makcum888e @Ratish1, if you see no other problems, could you please add the run-ci tag to trigger CI?

ping1jing2 · 2026-03-20T14:13:29Z

/tag-and-rerun-ci

Ratish1 · 2026-03-20T14:41:53Z

Can you fix the lint, @fengyuanyu1?

fengyuanyu1 · 2026-03-20T15:33:28Z

> Can you fix the lint, @fengyuanyu1?

Emmm, I think the lint failure is unrelated to this PR: it is in test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064 (undefined `sys`), which is not part of this change.
Should I fix it in this PR, or would you prefer it to be handled separately?

Ratish1 · 2026-03-20T15:45:29Z

> > Can you fix the lint, @fengyuanyu1?
>
> Emmm, I think the lint failure is unrelated to this PR: it is in test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064 (undefined `sys`), which is not part of this change. Should I fix it in this PR, or would you prefer it to be handled separately?

No, don't fix it here. I will look into it, or I think someone else will.

Ratish1 · 2026-03-20T16:00:22Z

Merge main @fengyuanyu1, I think it should work.

fengyuanyu1 · 2026-03-20T16:07:54Z

> Merge main @fengyuanyu1, I think it should work.

Thanks for your patience and all your help!

fengyuanyu1 · 2026-03-21T01:07:55Z

Hi @ping1jing2, CI has several failures, but none are related to this PR. Could you take a look and, if you agree, either re-trigger CI or merge directly?

Lint / lint (pull_request): Failing

F821 Undefined name `sys`
    --> test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064:5
     |
2063 | if __name__ == "__main__":
2064 |     sys.exit(pytest.main([__file__]))
     |     ^^^
     |

Pre-existing issue.

PR Test (NPU) / multimodal-gen-test-*-npu-a3 (pull_request)

  /tmp/pip-build-env-jad1ux1i/overlay/lib/python3.11/site-packages/vcs_versioning/overrides.py:609: UserWarning: No GlobalOverrides context is active. Auto-creating one with SETUPTOOLS_SCM prefix for backwards compatibility. Consider using 'with GlobalOverrides.from_env("YOUR_TOOL"):' explicitly.
    return get_active_overrides().subprocess_timeout
  fatal: detected dubious ownership in repository at '/__w/sglang/sglang'
  To add an exception for this directory, call:

        git config --global --add safe.directory /__w/sglang/sglang
  git introspection failed: fatal: detected dubious ownership in repository at '/__w/sglang/sglang'
  error: subprocess-exited-with-error

CI container environment issue

PR Test / multimodal-gen-test-1-gpu (1) (pull_request)

            if is_amd:
                logger.warning(
                    f"[AMD TIMEOUT WARNING] {case_id}: video job {video_id} did not complete "
                    f"within {timeout}s timeout. This may indicate performance issues on AMD."
                )
                pytest.skip(
                    f"{case_id}: video job timed out on AMD after {timeout}s - skipping"
                )
    
>           pytest.fail(f"{case_id}: video job {video_id} did not complete in time")
E           Failed: helios_distilled_t2v: video job 24725e31-202a-4c4d-ae06-61f53b743b79 did not complete in time

sglang/multimodal_gen/test/server/test_server_utils.py:826: Failed

Timeout on video generation. 20/21 tests passed.

PR Test / multimodal-gen-test-2-gpu (0) (pull_request)

FAILED sglang/multimodal_gen/test/server/test_server_2_gpu_a.py::TestDiffusionServerTwoGpu::test_diffusion_generation[fsdp-inference] - AssertionError: Validation failed for 'E2E Latency'.
    Actual:   3114.1671ms
    Expected: 2103.0500ms
    Limit:    2418.5075ms (rel_tol: 15.0%, abs_pad: 20.0ms)
assert 3114.167139865458 <= 2418.5075
FAILED sglang/multimodal_gen/test/server/test_server_2_gpu_b.py::TestDiffusionServerTwoGpu::test_diffusion_generation[flux_2_image_t2i_2_gpus] - AssertionError: Validation failed for 'Stage 'TextEncodingStage''.
    Actual:   940.0051ms
    Expected: 518.8800ms
    Limit:    830.2080ms (rel_tol: 60.0%, abs_pad: 120.0ms)
assert 940.0051319971681 <= 830.2080000000001
=========== 2 failed, 6 deselected, 2 warnings in 246.94s (0:04:06) ============

Perf baseline exceeded. 6/8 tests passed.
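The two limits in the log above can be reproduced from the printed expected values and tolerances. The rule used here, `max(expected * (1 + rel_tol), expected + abs_pad)`, is inferred from these two log entries only and is an assumption, not taken from the test source; `latency_limit` is a hypothetical name.

```python
import math


def latency_limit(expected_ms, rel_tol, abs_pad_ms):
    """Assumed limit rule, inferred from the log output rather than the test code."""
    return max(expected_ms * (1 + rel_tol), expected_ms + abs_pad_ms)


# (name, actual, expected, rel_tol, abs_pad, limit printed in the log)
cases = [
    ("E2E Latency", 3114.1671, 2103.05, 0.15, 20.0, 2418.5075),
    ("TextEncodingStage", 940.0051, 518.88, 0.60, 120.0, 830.2080),
]

for name, actual, expected, rel_tol, abs_pad, logged_limit in cases:
    limit = latency_limit(expected, rel_tol, abs_pad)
    # The computed limit matches the logged one under the assumed rule.
    assert math.isclose(limit, logged_limit, rel_tol=1e-9)
    over_baseline = actual / expected - 1
    print(f"{name}: limit {limit:.4f} ms, actual is {over_baseline:.1%} above baseline")
```

Both measurements exceed their baselines by far more than the configured tolerance, which is consistent with a slow CI runner rather than a small regression, although the log alone cannot confirm the cause.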

Ratish1 · 2026-03-21T09:38:56Z

Hey @fengyuanyu1, merge main into your branch. Thanks.

fengyuanyu1 · 2026-03-21T09:44:28Z

> Hey @fengyuanyu1, merge main into your branch. Thanks.

Done, branch updated.

Add --random-request-config for benchmarking with mixed resolutions. Accepts a JSON string of profiles with width, height, num_inference_steps, and weight fields. RandomDataset uses weighted sampling to assign profiles to requests. Also adds --random-request-seed for reproducibility.
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>

- Replace getattr(args, ...) with direct attribute access
- Use p.pop("weight") to extract weights and remove key in a single pass

- Add random_request_config and random_request_seed to BenchArgs
- Add get_sampling_params() to RandomDataset for public access
- Change generate_batch to accept per-request sampling params list
- Build per-request sampling params in mix-diffusion mode
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>

Add num_inference_steps to image and video JSON payloads to forward per-request denoising steps to the server.
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>

…usion
- Calculate per-request pixel count in calculate_metrics() for accurate megapixels throughput under mixed-resolution workloads
- Validate that --random-request-config is only used with --dataset random
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
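To illustrate why a per-request pixel count matters, the sketch below compares megapixel throughput computed per request with a figure that assumes every request used one resolution. The function name and the two-plus-two request mix are hypothetical; this is not the PR's calculate_metrics() code.

```python
def megapixels_per_second(requests, duration_s):
    """Sum width * height over the requests actually sent, then normalise."""
    total_pixels = sum(r["width"] * r["height"] for r in requests)
    return total_pixels / 1e6 / duration_s


# Hypothetical mix: two 512x512 and two 1024x1024 requests in 15.60 s.
mixed = [
    {"width": 512, "height": 512},
    {"width": 1024, "height": 1024},
    {"width": 512, "height": 512},
    {"width": 1024, "height": 1024},
]
duration_s = 15.60

per_request = megapixels_per_second(mixed, duration_s)
# Naive figure that pretends every request was 1024x1024.
single_resolution = len(mixed) * 1024 * 1024 / 1e6 / duration_s

print(f"per-request: {per_request:.3f} MP/s")
print(f"single-resolution assumption: {single_resolution:.3f} MP/s")
```

Under this mix the single-resolution assumption overstates throughput, since half of the requests produce only a quarter of the pixels.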

ping1jing2 · 2026-03-24T07:36:12Z

/rerun-failed-ci

ping1jing2 · 2026-03-24T14:07:52Z

/rerun-failed-ci

fengyuanyu1 · 2026-03-28T02:59:57Z

@ping1jing2
Thank you for the attention!
May I ask if there is anything else that needs to be modified in this PR?

ping1jing2 · 2026-04-15T06:37:45Z

/rerun-failed-ci

…0762) (sgl-project#20863) Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com> Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>

github-actions Bot added the diffusion SGLang Diffusion label Mar 18, 2026

gemini-code-assist Bot reviewed Mar 18, 2026

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated

ping1jing2 self-assigned this Mar 18, 2026

ping1jing2 linked an issue Mar 18, 2026 that may be closed by this pull request

[Feature] [Diffusion] Benchmark mix resolution #20762

Closed

2 tasks

ping1jing2 reviewed Mar 18, 2026

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated

fengyuanyu1 requested a review from ping1jing2 March 19, 2026 03:18

Makcum888e reviewed Mar 19, 2026

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated

fengyuanyu1 marked this pull request as ready for review March 19, 2026 06:34

fengyuanyu1 requested review from mickqian and yhyang201 as code owners March 19, 2026 06:34

Ratish1 reviewed Mar 19, 2026

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py

ping1jing2 reviewed Mar 19, 2026

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated

fengyuanyu1 requested review from Makcum888e, Ratish1 and ping1jing2 March 19, 2026 12:14

Ratish1 reviewed Mar 19, 2026

Comment thread python/sglang/multimodal_gen/benchmarks/bench_offline_throughput.py

fengyuanyu1 requested a review from Ratish1 March 19, 2026 14:09

fengyuanyu1 force-pushed the feature/benchmark-mix-resolution branch from 55cd06e to e7dbf71 Compare March 20, 2026 11:28

github-actions Bot added the run-ci label Mar 20, 2026

fengyuanyu1 force-pushed the feature/benchmark-mix-resolution branch from ab89021 to 4a458b1 Compare March 21, 2026 09:42

Fengyuan Yu added 6 commits March 23, 2026 21:55

Address review feedback: remove getattr and simplify weight extraction

db3336f

- Replace getattr(args, ...) with direct attribute access
- Use p.pop("weight") to extract weights and remove key in a single pass

Include num_inference_steps in benchmark HTTP request payload

d15a808

Add num_inference_steps to image and video JSON payloads to forward per-request denoising steps to the server. Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>

Replace **params with explicit field assignment in RandomDataset

4539fe3

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>

fengyuanyu1 force-pushed the feature/benchmark-mix-resolution branch from 4a458b1 to 24c19c4 Compare March 23, 2026 13:55

Ratish1 approved these changes Mar 23, 2026

ping1jing2 added 3 commits April 2, 2026 04:49

Merge branch 'main' into feature/benchmark-mix-resolution

ffc1306

Merge branch 'main' into feature/benchmark-mix-resolution

2cb9d44

Merge branch 'main' into feature/benchmark-mix-resolution

bf119b9

ping1jing2 added 2 commits April 17, 2026 10:10

Merge branch 'main' into feature/benchmark-mix-resolution

3101b32

Merge branch 'main' into feature/benchmark-mix-resolution

10a349e

sglang-npu-bot merged commit 5c245d9 into sgl-project:main Apr 22, 2026
79 of 105 checks passed
