[diffusion] chore: auto-enable best parallel setting if unspecified #22763

Merged
mickqian merged 9 commits into main from worktree-auto-cfg-parallel
Apr 14, 2026

@mickqian
Collaborator

@mickqian mickqian commented Apr 14, 2026

When a user launches with --num-gpus >= 2 and sets none of the parallelism flags (--sp-degree, --ulysses-degree, --ring-degree, --enable-cfg-parallel), CFG parallel is now enabled automatically. This replaces the previous default of pure sequence parallelism (Ulysses).

Benchmarked on H200 (2 GPUs, warmup excluded; percentages are CFG relative to SP):

  Qwen-Image 1024x1024:  SP 11.21s / TP 13.43s / CFG  7.14s (-36%)
  Wan-1.3B T2V 480x832:  SP  5.95s /            CFG  4.77s (-20%)
  Wan-14B T2V 480x832:   SP 23.75s /            CFG 21.27s (-10%)
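
The percentages in the table can be reproduced from the raw timings. The sketch below assumes each percentage is the change in CFG latency relative to the SP latency on the same row; `relative_change` is a helper name introduced here for illustration, not something from the PR.

```python
# Timings (seconds) copied from the benchmark table: (SP, CFG) per model.
timings = {
    "Qwen-Image 1024x1024": (11.21, 7.14),
    "Wan-1.3B T2V 480x832": (5.95, 4.77),
    "Wan-14B T2V 480x832": (23.75, 21.27),
}


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate vs. baseline (negative means faster)."""
    return (candidate - baseline) / baseline


for name, (sp, cfg) in timings.items():
    # Format as a signed, rounded percentage, e.g. -36%.
    print(f"{name}: {relative_change(sp, cfg):+.0%}")
```

Under that assumption the three rows round to -36%, -20%, and -10%, matching the table.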

Users can opt out by explicitly setting --sp-degree or --ulysses-degree.
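
The defaulting rule described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `ParallelArgs` and `resolve_parallel_defaults` are hypothetical names, the fields simply mirror the CLI flags listed in the description, and `None` is assumed to mean "flag not passed". How any remaining GPUs are split across other parallel dimensions is out of scope here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ParallelArgs:
    # Hypothetical container mirroring the CLI flags named in the PR text.
    # None means the user did not pass the flag.
    num_gpus: int = 1
    sp_degree: Optional[int] = None
    ulysses_degree: Optional[int] = None
    ring_degree: Optional[int] = None
    enable_cfg_parallel: Optional[bool] = None


def resolve_parallel_defaults(args: ParallelArgs) -> ParallelArgs:
    """Enable CFG parallel only when no parallelism flag was given."""
    user_specified = any(
        value is not None
        for value in (
            args.sp_degree,
            args.ulysses_degree,
            args.ring_degree,
            args.enable_cfg_parallel,
        )
    )
    if args.num_gpus >= 2 and not user_specified:
        # Nothing was requested explicitly: pick CFG parallel as the default.
        return replace(args, enable_cfg_parallel=True)
    # Single GPU, or the user made an explicit choice: leave it untouched.
    return args
```

With this sketch, `ParallelArgs(num_gpus=2)` resolves to CFG parallel, while `ParallelArgs(num_gpus=2, sp_degree=2)` is returned unchanged, which corresponds to the opt-out path.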

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mickqian mickqian marked this pull request as ready for review April 14, 2026 06:27
@github-actions github-actions Bot added the diffusion SGLang Diffusion label Apr 14, 2026
@mickqian
Collaborator Author

/tag-and-rerun-ci

@mickqian mickqian merged commit d2f479e into main Apr 14, 2026
88 of 94 checks passed
@mickqian mickqian deleted the worktree-auto-cfg-parallel branch April 14, 2026 16:02
Labels

diffusion SGLang Diffusion run-ci
