[Diffusion] Add mixed-resolution benchmark support (for #20762) #20863

Merged
sglang-npu-bot merged 11 commits into sgl-project:main from fengyuanyu1:feature/benchmark-mix-resolution on Apr 22, 2026

@fengyuanyu1
Contributor

Motivation

Add mixed-resolution support to sglang/python/sglang/multimodal_gen/benchmarks/bench_serving.py, so that the server can be tested with requests of different sizes.

Modifications

Add --random-request-config for benchmarking with mixed resolutions. Accepts a JSON string of profiles with width, height, num_inference_steps, and weight fields. RandomDataset uses weighted sampling to assign profiles to requests. Also adds --random-request-seed for reproducibility.
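
The weighted assignment described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `sample_profiles` and the default weight of 1.0 are assumptions, and only the profile fields and the idea of seeded weighted sampling come from the description.

```python
import json
import random


def sample_profiles(config_json, num_requests, seed=None):
    """Assign one profile to each request by weighted sampling (hypothetical helper)."""
    profiles = json.loads(config_json)
    # Pull the weight out of each profile so that only request parameters remain.
    # The 1.0 fallback is an assumption made for this sketch.
    weights = [p.pop("weight", 1.0) for p in profiles]
    # A private RNG keeps the assignment reproducible for a given seed.
    rng = random.Random(seed)
    return rng.choices(profiles, weights=weights, k=num_requests)


config = (
    '[{"width":512,"height":512,"num_inference_steps":5,"weight":0.5},'
    '{"width":1024,"height":1024,"num_inference_steps":5,"weight":0.5}]'
)
first = sample_profiles(config, 4, seed=0)
second = sample_profiles(config, 4, seed=0)
print(first)
print("same assignment for same seed:", first == second)
```

Using a dedicated `random.Random` instance rather than the module-level functions means the sampling does not disturb, and is not disturbed by, other random calls in the benchmark.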

Accuracy Tests

Benchmarking and Profiling

The test environment is: AMD CPU + RTX 3090 GPU.

$ python -m sglang.multimodal_gen.benchmarks.bench_serving \
  --dataset random \
  --num-prompts 4 \
  --port 30000 \
  --task text-to-image \
  --random-request-config '[{"width":512,"height":512,"num_inference_steps":5,"weight":0.5},{"width":1024,"height":1024,"num_inference_steps":5,"weight":0.5}]' \
  --warmup-requests 0
[03-18 13:35:54] Waiting for service at http://localhost:30000...
[03-18 13:35:54] Service is ready.
[03-18 13:35:54] Updated model name from server: /home/ainfra/Sana_600M_1024px_diffusers
[03-18 13:35:54] Using task from --task: text-to-image
[03-18 13:35:54] Loading requests...
[03-18 13:35:54] Prepared 4 requests from random dataset.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:15<00:00,  3.90s/it]

================= Serving Benchmark Result =================
Task:                                         text-to-image                 
Model:                                        /home/ainfra/Sana_600M_1024px_diffusers
Dataset:                                      random                        
--------------------------------------------------
Benchmark duration (s):                       15.60                         
Request rate:                                 inf                           
Max request concurrency:                      1                             
Successful requests:                          4/4                           
--------------------------------------------------
Request throughput (req/s):                   0.26                          
Latency Mean (s):                             3.90                          
Latency Median (s):                           2.74                          
Latency P99 (s):                              7.52                          
--------------------------------------------------
Peak Memory Max (MB):                         3948.00                       
Peak Memory Mean (MB):                        3565.50                       
Peak Memory Median (MB):                      3438.00                       
------------------------------------------------------------
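
As a quick consistency check on the report above: with a maximum request concurrency of 1 the requests run one after another, so the throughput and mean latency should both follow from the request count and the total duration. The sketch below only redoes that arithmetic with the two reported numbers; it assumes that per-request overhead outside the measured latency is negligible.

```python
num_requests = 4
duration_s = 15.60  # "Benchmark duration (s)" from the report

throughput = num_requests / duration_s
# Valid only because the requests ran serially (concurrency 1).
mean_latency = duration_s / num_requests

print(f"throughput   = {throughput:.2f} req/s")  # 0.26
print(f"mean latency = {mean_latency:.2f} s")    # 3.90
```

Both values match the reported "Request throughput" and "Latency Mean" lines. The median (2.74 s) sitting well below the mean is what one would expect when half of the profiles are 512x512 and half are 1024x1024.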

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions Bot added the diffusion SGLang Diffusion label Mar 18, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the benchmarking utility for diffusion models by introducing the capability to simulate diverse request loads. Instead of fixed parameters, users can now define a set of request profiles with different image resolutions and inference steps, and the system will randomly sample from these profiles based on specified weights. This change is crucial for understanding how diffusion model serving systems perform under more varied and realistic operational conditions, providing a more comprehensive performance assessment.

Highlights

  • Mixed-Resolution Benchmarking: Added support for benchmarking diffusion models with mixed resolutions and varying parameters, allowing for more realistic performance evaluations.
  • Configurable Request Profiles: Introduced a new command-line argument --random-request-config that accepts a JSON string to define weighted profiles for request parameters like width, height, and inference steps.
  • Reproducible Randomness: Included a --random-request-seed argument to ensure reproducibility when sampling request profiles for benchmarking.
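
For readers who want a feel for how the two flags might be wired up, here is a hedged sketch using `argparse`. The flag names come from this PR; everything else is an assumption for illustration, including the helper `parse_request_config`, the choice to treat all four fields as required, and the defaults. The restriction to `--dataset random` mirrors a validation mentioned in the PR's commit messages.

```python
import argparse
import json

# Assumption for this sketch: every profile must carry all four fields.
REQUIRED_FIELDS = ("width", "height", "num_inference_steps", "weight")


def parse_request_config(text):
    """Parse the JSON profile list and reject malformed profiles early."""
    profiles = json.loads(text)
    if not isinstance(profiles, list) or not profiles:
        raise ValueError("config must be a non-empty JSON list of profiles")
    for i, profile in enumerate(profiles):
        missing = [f for f in REQUIRED_FIELDS if f not in profile]
        if missing:
            raise ValueError(f"profile {i} is missing fields: {missing}")
        if profile["weight"] <= 0:
            raise ValueError(f"profile {i} must have a positive weight")
    return profiles


parser = argparse.ArgumentParser()
parser.add_argument("--dataset", default="random")
parser.add_argument("--random-request-config", type=str, default=None)
parser.add_argument("--random-request-seed", type=int, default=None)

# Parse an explicit argument list so the sketch runs without a real command line.
args = parser.parse_args([
    "--random-request-config",
    '[{"width":512,"height":512,"num_inference_steps":5,"weight":1.0}]',
    "--random-request-seed", "42",
])
if args.random_request_config is not None and args.dataset != "random":
    parser.error("--random-request-config requires --dataset random")
profiles = parse_request_config(args.random_request_config)
print(profiles, args.random_request_seed)
```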

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request successfully adds support for mixed-resolution benchmarks by introducing --random-request-config and --random-request-seed arguments. The implementation in RandomDataset correctly uses weighted sampling to generate varied request profiles. The code is well-structured, but there is a small opportunity to improve the clarity and efficiency of the configuration parsing logic, for which I've left a specific comment.

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated
@ping1jing2 ping1jing2 self-assigned this Mar 18, 2026
@ping1jing2 ping1jing2 linked an issue Mar 18, 2026 that may be closed by this pull request
2 tasks
Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated
Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated
Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated
@fengyuanyu1 fengyuanyu1 requested a review from ping1jing2 March 19, 2026 03:18
@Ratish1
Collaborator

Ratish1 commented Mar 19, 2026

Could you take this PR out of draft mode if you think it is ready, @fengyuanyu1? I will take another look. Thanks.

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated
@fengyuanyu1 fengyuanyu1 marked this pull request as ready for review March 19, 2026 06:34
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@fengyuanyu1
Contributor Author

Could you take this PR out of draft mode if you think it is ready, @fengyuanyu1? I will take another look. Thanks.

Yes, thanks for your attention!

Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py
Comment thread python/sglang/multimodal_gen/benchmarks/datasets.py Outdated
Comment thread python/sglang/multimodal_gen/benchmarks/bench_offline_throughput.py
@fengyuanyu1 fengyuanyu1 requested a review from Ratish1 March 19, 2026 14:09
@fengyuanyu1 fengyuanyu1 force-pushed the feature/benchmark-mix-resolution branch from 55cd06e to e7dbf71 Compare March 20, 2026 11:28
@fengyuanyu1
Contributor Author

fengyuanyu1 commented Mar 20, 2026

Hi @ping1jing2, @Makcum888e, @Ratish1. If you feel there aren't any other problems, could you please tag run-ci to trigger CI?

@ping1jing2
Collaborator

/tag-and-rerun-ci

@Ratish1
Collaborator

Ratish1 commented Mar 20, 2026

Can you fix the lint, @fengyuanyu1?

@fengyuanyu1
Contributor Author

Can you fix the lint, @fengyuanyu1?

Hmm, I think the lint failure is unrelated to this PR: it's in test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064 (undefined sys), which is not part of this change.
Should I fix it in this PR, or would you prefer it to be handled separately?

@Ratish1
Collaborator

Ratish1 commented Mar 20, 2026

Can you fix the lint, @fengyuanyu1?

Hmm, I think the lint failure is unrelated to this PR: it's in test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064 (undefined sys), which is not part of this change. Should I fix it in this PR, or would you prefer it to be handled separately?

No, don't fix it here. I will look into it, or I think someone else will.

@Ratish1
Collaborator

Ratish1 commented Mar 20, 2026

Merge main, @fengyuanyu1. I think it should work.

@fengyuanyu1
Contributor Author

Merge main, @fengyuanyu1. I think it should work.

Thanks for your patience and all your help!

@fengyuanyu1
Contributor Author

fengyuanyu1 commented Mar 21, 2026

Hi @ping1jing2, CI has several failures, but none are related to this PR. Could you take a look and re-trigger CI if you agree, or merge it directly?

  1. Lint / lint (pull_request): Failing
F821 Undefined name `sys`
    --> test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064:5
     |
2063 | if __name__ == "__main__":
2064 |     sys.exit(pytest.main([__file__]))
     |     ^^^
     |

Pre-existing issue.

  2. PR Test (NPU) / multimodal-gen-test-*-npu-a3 (pull_request)
  /tmp/pip-build-env-jad1ux1i/overlay/lib/python3.11/site-packages/vcs_versioning/overrides.py:609: UserWarning: No GlobalOverrides context is active. Auto-creating one with SETUPTOOLS_SCM prefix for backwards compatibility. Consider using 'with GlobalOverrides.from_env("YOUR_TOOL"):' explicitly.
    return get_active_overrides().subprocess_timeout
  fatal: detected dubious ownership in repository at '/__w/sglang/sglang'
  To add an exception for this directory, call:

        git config --global --add safe.directory /__w/sglang/sglang
  git introspection failed: fatal: detected dubious ownership in repository at '/__w/sglang/sglang'
  error: subprocess-exited-with-error

CI container environment issue

  3. PR Test / multimodal-gen-test-1-gpu (1) (pull_request)
            if is_amd:
                logger.warning(
                    f"[AMD TIMEOUT WARNING] {case_id}: video job {video_id} did not complete "
                    f"within {timeout}s timeout. This may indicate performance issues on AMD."
                )
                pytest.skip(
                    f"{case_id}: video job timed out on AMD after {timeout}s - skipping"
                )
    
>           pytest.fail(f"{case_id}: video job {video_id} did not complete in time")
E           Failed: helios_distilled_t2v: video job 24725e31-202a-4c4d-ae06-61f53b743b79 did not complete in time

sglang/multimodal_gen/test/server/test_server_utils.py:826: Failed

Timeout on video generation. 20/21 tests passed.

  4. PR Test / multimodal-gen-test-2-gpu (0) (pull_request)
FAILED sglang/multimodal_gen/test/server/test_server_2_gpu_a.py::TestDiffusionServerTwoGpu::test_diffusion_generation[fsdp-inference] - AssertionError: Validation failed for 'E2E Latency'.
    Actual:   3114.1671ms
    Expected: 2103.0500ms
    Limit:    2418.5075ms (rel_tol: 15.0%, abs_pad: 20.0ms)
assert 3114.167139865458 <= 2418.5075
FAILED sglang/multimodal_gen/test/server/test_server_2_gpu_b.py::TestDiffusionServerTwoGpu::test_diffusion_generation[flux_2_image_t2i_2_gpus] - AssertionError: Validation failed for 'Stage 'TextEncodingStage''.
    Actual:   940.0051ms
    Expected: 518.8800ms
    Limit:    830.2080ms (rel_tol: 60.0%, abs_pad: 120.0ms)
assert 940.0051319971681 <= 830.2080000000001
=========== 2 failed, 6 deselected, 2 warnings in 246.94s (0:04:06) ============

Perf baseline exceeded. 6/8 tests passed.
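
The limits in the two assertion messages above can be reproduced from the expected value and the relative tolerance alone. The sketch below redoes that arithmetic with the numbers from the log. How `abs_pad` enters the real check cannot be determined from these two cases, so it is deliberately left out here, and `relative_limit` is a hypothetical helper, not the test suite's function.

```python
import math


def relative_limit(expected_ms, rel_tol):
    """Upper bound implied by a relative tolerance alone (hypothetical helper)."""
    return expected_ms * (1.0 + rel_tol)


cases = [
    # (name, actual_ms, expected_ms, rel_tol, reported_limit_ms), copied from the log
    ("E2E Latency", 3114.1671, 2103.05, 0.15, 2418.5075),
    ("TextEncodingStage", 940.0051, 518.88, 0.60, 830.2080),
]
for name, actual, expected, rel_tol, reported in cases:
    limit = relative_limit(expected, rel_tol)
    # The recomputed bound agrees with the limit printed in the assertion message.
    assert math.isclose(limit, reported, rel_tol=1e-9)
    print(f"{name}: limit {limit:.4f} ms, actual is {actual / limit:.2f}x the limit")
```

Both actual values exceed their limits by a wide margin, which is consistent with a slow or noisy CI runner rather than with a change confined to the benchmark scripts.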

@Ratish1
Collaborator

Ratish1 commented Mar 21, 2026

Hey @fengyuanyu1, merge main into your branch. thanks

@fengyuanyu1 fengyuanyu1 force-pushed the feature/benchmark-mix-resolution branch from ab89021 to 4a458b1 Compare March 21, 2026 09:42
@fengyuanyu1
Contributor Author

Hey @fengyuanyu1, merge main into your branch. thanks

Done, branch updated.

Fengyuan Yu added 6 commits March 23, 2026 21:55
Add --random-request-config for benchmarking with mixed resolutions.
Accepts a JSON string of profiles with width, height, num_inference_steps,
and weight fields. RandomDataset uses weighted sampling to assign profiles
to requests. Also adds --random-request-seed for reproducibility.

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
- Replace getattr(args, ...) with direct attribute access
- Use p.pop("weight") to extract weights and remove key in a single pass
- Add random_request_config and random_request_seed to BenchArgs
- Add get_sampling_params() to RandomDataset for public access
- Change generate_batch to accept per-request sampling params list
- Build per-request sampling params in mix-diffusion mode

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
Add num_inference_steps to image and video JSON payloads to
forward per-request denoising steps to the server.

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
…usion

- Calculate per-request pixel count in calculate_metrics() for accurate
  megapixels throughput under mixed-resolution workloads
- Validate that --random-request-config is only used with --dataset random

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
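
The per-request pixel count mentioned in the last commit message matters because a single global width and height would misstate throughput once resolutions are mixed. A minimal sketch of the idea follows. It is not the PR's `calculate_metrics()`: the helper name, the request dictionaries, and the 10-second duration are all made up for illustration, and video frame counts are ignored.

```python
def megapixel_throughput(requests, duration_s):
    """Megapixels generated per second, summing each request's own resolution."""
    total_pixels = sum(r["width"] * r["height"] for r in requests)
    return total_pixels / 1e6 / duration_s


# Hypothetical mixed workload: two small and two large images.
requests = [
    {"width": 512, "height": 512},
    {"width": 1024, "height": 1024},
    {"width": 1024, "height": 1024},
    {"width": 512, "height": 512},
]
duration_s = 10.0  # made-up benchmark duration

mixed = megapixel_throughput(requests, duration_s)
# Treating every request as 1024x1024 would overstate the result.
uniform = len(requests) * 1024 * 1024 / 1e6 / duration_s
print(f"per-request: {mixed:.4f} MP/s, assuming 1024x1024 for all: {uniform:.4f} MP/s")
```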
@fengyuanyu1 fengyuanyu1 force-pushed the feature/benchmark-mix-resolution branch from 4a458b1 to 24c19c4 Compare March 23, 2026 13:55
@ping1jing2
Collaborator

/rerun-failed-ci

1 similar comment
@ping1jing2
Collaborator

/rerun-failed-ci

@fengyuanyu1
Contributor Author

@ping1jing2
Thank you for the attention!
May I ask if there is anything else that needs to be modified in this PR?

@ping1jing2
Collaborator

/rerun-failed-ci

@sglang-npu-bot sglang-npu-bot merged commit 5c245d9 into sgl-project:main Apr 22, 2026
79 of 105 checks passed
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
…0762) (sgl-project#20863)

Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Labels

diffusion SGLang Diffusion run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] [Diffusion] Benchmark mix resolution

5 participants