
[Diffusion] Add Wan2.2 ModelOpt NVFP4 support #22681

Merged
mickqian merged 2 commits into codex/flux1-modelopt-nvfp4-resubmit from codex/wan22-modelopt-nvfp4-from-22672 on Apr 13, 2026

Conversation

@BBuf (Collaborator) commented on Apr 13, 2026

Summary

  • add Wan2.2 ModelOpt NVFP4 support on top of #22672
  • keep a global --transformer-weights-path override scoped to the primary transformer, so transformer_2 stays on the base BF16 checkpoint unless explicitly overridden (see the sketches after this list)
  • make scheduler loading tolerate newer config fields such as shift_terminal when the resolved SGLang scheduler class does not accept them (see the sketches after this list)
  • honor SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND in diffusion ModelOpt FP4 GEMM selection so Blackwell bring-up can force the validated FlashInfer path
  • document the validated Wan2.2 NVFP4 launch recipe
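
A minimal sketch of the --transformer-weights-path scoping described above; the function and argument names are illustrative, not the actual SGLang implementation:

```python
# Hypothetical helper (illustrative names, not the real SGLang code): apply a
# global --transformer-weights-path override only to the primary "transformer"
# component so "transformer_2" keeps loading its base BF16 checkpoint.
def resolve_weights_path(component_name: str,
                         base_path: str,
                         override_path: str | None) -> str:
    if override_path is not None and component_name == "transformer":
        return override_path
    return base_path

base = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"
nvfp4 = "/path/to/wan22-nvfp4-export"  # assumed local ModelOpt export path

print(resolve_weights_path("transformer", base, nvfp4))    # NVFP4 export
print(resolve_weights_path("transformer_2", base, nvfp4))  # base BF16 weights
```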

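And a self-contained sketch of the scheduler kwarg filtering, assuming a generic filter built on inspect.signature (the real SGLang utility's name may differ):

```python
import inspect

# Illustrative version of the filtering idea: drop config fields (e.g. a newer
# shift_terminal) that the resolved scheduler class's __init__ does not accept,
# instead of raising a TypeError at construction time.
def filter_scheduler_kwargs(scheduler_cls, config: dict) -> dict:
    params = inspect.signature(scheduler_cls.__init__).parameters
    # If __init__ takes **kwargs, every field is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(config)
    return {k: v for k, v in config.items() if k in params}

class LegacyScheduler:  # stand-in for a scheduler class without shift_terminal
    def __init__(self, num_train_timesteps: int = 1000, shift: float = 3.0):
        self.num_train_timesteps = num_train_timesteps
        self.shift = shift

cfg = {"num_train_timesteps": 1000, "shift": 5.0, "shift_terminal": 0.1}
scheduler = LegacyScheduler(**filter_scheduler_kwargs(LegacyScheduler, cfg))
```
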
Validation

  • official ModelOpt FP4 export for Wan-AI/Wan2.2-T2V-A14B-Diffusers, with only the primary transformer quantized and transformer_2 kept BF16
  • B200 no-compile generation with base BF16 model + --transformer-weights-path override succeeded after forcing SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND=cudnn
  • fixed-config B200 comparison on 832x480 / 17 frames / 2 steps:
    • main BF16 no-compile: E2E 55.89s, DenoisingStage 53.51s
    • this branch NVFP4 no-compile: E2E 25.72s, DenoisingStage 23.46s
    • delta: E2E -54.0%, DenoisingStage -56.2%
  • warmup compile check on the same PR branch/config:
    • warmup eager: E2E 17.87s, DenoisingStage 16.20s
    • warmup compile: E2E 22.64s, DenoisingStage 20.96s
    • compile was slower on this setup (+26.7% E2E, +29.4% denoise)

Notes

  • this PR is intentionally stacked on #22672
  • the validated Blackwell bring-up path currently uses FlashInfer FP4 GEMM via SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND=cudnn (a minimal sketch follows this list)
  • local B200 artifacts (videos, perf dumps, traces, summaries) were collected separately outside the repo
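
A rough illustration of the env-var override mentioned above (only the variable name comes from this PR; the surrounding selection logic is assumed):

```python
import os

# Sketch: honor SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND when set, so a
# Blackwell bring-up can pin the validated backend instead of auto-detection.
def select_fp4_gemm_backend(default_backend: str = "auto") -> str:
    forced = os.environ.get("SGLANG_DIFFUSION_FLASHINFER_FP4_GEMM_BACKEND")
    if forced:
        return forced  # e.g. "cudnn", as used in the validation runs
    return default_backend
```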

@BBuf (Collaborator, Author) commented on Apr 13, 2026

/tag-and-rerun-ci

@github-actions bot added the documentation (Improvements or additions to documentation), quant (LLM Quantization), diffusion (SGLang Diffusion), and run-ci labels on Apr 13, 2026
@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces support for Wan2.2-T2V-A14B-Diffusers quantization and improves the robustness of component loading. Key changes include documentation for dual-transformer FP4 exports, a new utility to filter unsupported scheduler initialization arguments, and logic to mask global quantization overrides for secondary transformer components. Additionally, the CUDA platform now supports an environment variable to prefer FlashInfer for FP4 GEMM operations. Unit tests were added to verify the new filtering and masking behaviors. I have no feedback to provide.

@mickqian merged commit 85863f4 into codex/flux1-modelopt-nvfp4-resubmit on Apr 13, 2026 (2 checks passed)
@mickqian deleted the codex/wan22-modelopt-nvfp4-from-22672 branch on Apr 13, 2026 at 10:12
