[Diffusion] Add mixed-resolution benchmark support (for #20762) #20863
Summary of Changes
This pull request enhances the benchmarking utility for diffusion models by adding the ability to simulate diverse request loads. Instead of fixed parameters, users can now define a set of request profiles with different image resolutions and inference steps, and the system randomly samples from these profiles according to the specified weights. This shows how diffusion model serving systems perform under more varied and realistic operating conditions, giving a more comprehensive performance assessment.
|
Code Review
This pull request successfully adds support for mixed-resolution benchmarks by introducing --random-request-config and --random-request-seed arguments. The implementation in RandomDataset correctly uses weighted sampling to generate varied request profiles. The code is well-structured, but there is a small opportunity to improve the clarity and efficiency of the configuration parsing logic, for which I've left a specific comment.
|
Could you take this PR out of draft mode if you think it is ready, @fengyuanyu1? I will then take another look. Thanks. |
|
Yes, thanks for your attention! |
55cd06e to e7dbf71
|
Hi @ping1jing2 @Makcum888e @Ratish1, if you don't see any other problems, could you please tag this PR to trigger CI? |
|
/tag-and-rerun-ci |
|
Can you fix the lint, @fengyuanyu1? |
Emmm, I think the lint failure is unrelated to this PR — it's in |
No, don't fix it here. I will look into it, or someone else will. |
|
Merge main, @fengyuanyu1. I think it should work. |
Thanks for your patience and all your help! |
|
Hi @ping1jing2, CI has several failures, but none are related to this PR. Could you take a look and, if you agree, re-trigger CI or merge directly?
F821 Undefined name `sys`
--> test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py:2064:5
|
2063 | if __name__ == "__main__":
2064 | sys.exit(pytest.main([__file__]))
| ^^^
|
Pre-existing issue.
/tmp/pip-build-env-jad1ux1i/overlay/lib/python3.11/site-packages/vcs_versioning/overrides.py:609: UserWarning: No GlobalOverrides context is active. Auto-creating one with SETUPTOOLS_SCM prefix for backwards compatibility. Consider using 'with GlobalOverrides.from_env("YOUR_TOOL"):' explicitly.
return get_active_overrides().subprocess_timeout
fatal: detected dubious ownership in repository at '/__w/sglang/sglang'
To add an exception for this directory, call:
git config --global --add safe.directory /__w/sglang/sglang
git introspection failed: fatal: detected dubious ownership in repository at '/__w/sglang/sglang'
error: subprocess-exited-with-error
CI container environment issue.
if is_amd:
logger.warning(
f"[AMD TIMEOUT WARNING] {case_id}: video job {video_id} did not complete "
f"within {timeout}s timeout. This may indicate performance issues on AMD."
)
pytest.skip(
f"{case_id}: video job timed out on AMD after {timeout}s - skipping"
)
> pytest.fail(f"{case_id}: video job {video_id} did not complete in time")
E Failed: helios_distilled_t2v: video job 24725e31-202a-4c4d-ae06-61f53b743b79 did not complete in time
sglang/multimodal_gen/test/server/test_server_utils.py:826: Failed
Timeout on video generation. 20/21 tests passed.
FAILED sglang/multimodal_gen/test/server/test_server_2_gpu_a.py::TestDiffusionServerTwoGpu::test_diffusion_generation[fsdp-inference] - AssertionError: Validation failed for 'E2E Latency'.
Actual: 3114.1671ms
Expected: 2103.0500ms
Limit: 2418.5075ms (rel_tol: 15.0%, abs_pad: 20.0ms)
assert 3114.167139865458 <= 2418.5075
FAILED sglang/multimodal_gen/test/server/test_server_2_gpu_b.py::TestDiffusionServerTwoGpu::test_diffusion_generation[flux_2_image_t2i_2_gpus] - AssertionError: Validation failed for 'Stage 'TextEncodingStage''.
Actual: 940.0051ms
Expected: 518.8800ms
Limit: 830.2080ms (rel_tol: 60.0%, abs_pad: 120.0ms)
assert 940.0051319971681 <= 830.2080000000001
=========== 2 failed, 6 deselected, 2 warnings in 246.94s (0:04:06) ============
Perf baseline exceeded. 6/8 tests passed. |
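For readers checking the two perf failures above, here is a small sketch of how the logged `Limit` values can be reproduced from `Expected`, `rel_tol`, and `abs_pad`. The test harness's actual formula is not shown in the log, so this is an assumption: taking the larger of the relative and absolute allowances happens to match both logged limits. The function name `latency_limit` is hypothetical.

```python
def latency_limit(expected_ms, rel_tol, abs_pad_ms):
    """Upper bound for a latency check (assumed formula, not taken from the harness).

    Takes the larger of the relative allowance and the absolute padding.
    """
    return max(expected_ms * (1 + rel_tol), expected_ms + abs_pad_ms)


# Values copied from the two failing checks in the CI log.
e2e_limit = latency_limit(2103.05, rel_tol=0.15, abs_pad_ms=20.0)
text_enc_limit = latency_limit(518.88, rel_tol=0.60, abs_pad_ms=120.0)

# Both measured values exceed their limits, which is why the checks failed.
e2e_failed = 3114.1671 > e2e_limit
text_enc_failed = 940.0051 > text_enc_limit
```

Under this reading, the E2E latency is roughly 29% over its limit and the text-encoding stage roughly 13% over, so both are threshold misses rather than functional errors.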
|
Hey @fengyuanyu1, please merge main into your branch. Thanks. |
ab89021 to 4a458b1
Done, branch updated. |
Add --random-request-config for benchmarking with mixed resolutions. Accepts a JSON string of profiles with width, height, num_inference_steps, and weight fields. RandomDataset uses weighted sampling to assign profiles to requests. Also adds --random-request-seed for reproducibility. Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
- Replace getattr(args, ...) with direct attribute access
- Use p.pop("weight") to extract weights and remove key in a single pass
- Add random_request_config and random_request_seed to BenchArgs
- Add get_sampling_params() to RandomDataset for public access
- Change generate_batch to accept per-request sampling params list
- Build per-request sampling params in mix-diffusion mode
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
Add num_inference_steps to image and video JSON payloads to forward per-request denoising steps to the server. Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
…usion
- Calculate per-request pixel count in calculate_metrics() for accurate megapixels throughput under mixed-resolution workloads
- Validate that --random-request-config is only used with --dataset random
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
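To illustrate why the pixel count has to be computed per request, here is a minimal sketch of a megapixels-per-second calculation for a mixed-resolution run. It is not the code in `calculate_metrics()`; the function name `megapixels_per_second` and the `(width, height)` input shape are assumptions for illustration, and it covers single images only.

```python
def megapixels_per_second(completed, duration_s):
    """Throughput in megapixels/s for a list of completed (width, height) requests.

    Hypothetical helper: sums each request's own pixel count instead of
    multiplying one fixed resolution by the number of requests.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_pixels = sum(width * height for width, height in completed)
    return total_pixels / 1e6 / duration_s


# Two 512x512 requests and one 1024x1024 request finished within 2 seconds.
completed = [(512, 512), (1024, 1024), (512, 512)]
throughput = megapixels_per_second(completed, duration_s=2.0)
```

With a single fixed resolution, the same run would be reported as either three 512x512 images or three 1024x1024 images, which under- or over-states the work actually done.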
4a458b1 to 24c19c4
|
/rerun-failed-ci |
|
/rerun-failed-ci |
|
@ping1jing2 |
|
/rerun-failed-ci |
…0762) (sgl-project#20863) Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com> Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>
Motivation
Add mixed-resolution support to sglang/python/sglang/multimodal_gen/benchmarks/bench_serving.py so that the server can be tested with requests of different sizes.
Modifications
Add --random-request-config for benchmarking with mixed resolutions. Accepts a JSON string of profiles with width, height, num_inference_steps, and weight fields. RandomDataset uses weighted sampling to assign profiles to requests. Also adds --random-request-seed for reproducibility.
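As a rough illustration of the behavior described above, here is a minimal sketch of parsing a profile list and assigning profiles to requests by weight. It is not the code in `RandomDataset`: the helper names `parse_profiles` and `sample_profiles`, the default weight of 1.0, and the validation rules are assumptions. Only the field names (`width`, `height`, `num_inference_steps`, `weight`) and the use of `p.pop("weight")` come from this PR.

```python
import json
import random


def parse_profiles(config_json):
    """Split a JSON list of profiles into (profiles, weights).

    Hypothetical helper. A missing weight defaults to 1.0 (an assumption).
    """
    profiles = json.loads(config_json)
    if not isinstance(profiles, list) or not profiles:
        raise ValueError("expected a non-empty JSON list of profiles")
    # pop() reads the weight and removes the key in a single pass.
    weights = [float(p.pop("weight", 1.0)) for p in profiles]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return profiles, weights


def sample_profiles(profiles, weights, num_requests, seed=None):
    """Assign one profile per request by weighted sampling with replacement."""
    rng = random.Random(seed)  # private RNG, so a fixed seed reproduces the same mix
    return rng.choices(profiles, weights=weights, k=num_requests)


# The kind of string that could be passed to --random-request-config.
config = json.dumps([
    {"width": 512, "height": 512, "num_inference_steps": 20, "weight": 3},
    {"width": 1024, "height": 1024, "num_inference_steps": 30, "weight": 1},
])
profiles, weights = parse_profiles(config)
assigned = sample_profiles(profiles, weights, num_requests=8, seed=0)
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the profile mix reproducible even if other parts of the benchmark also draw random numbers.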
Accuracy Tests
Benchmarking and Profiling
The test environment is: AMD CPU + RTX 3090 GPU.