[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support #22574

Merged
BBuf merged 7 commits into sgl-project:main from BBuf:codex/flux1-modelopt-nvfp4 on Apr 12, 2026

BBuf (Collaborator) commented Apr 11, 2026

Summary

  • Add a FLUX.1-dev ModelOpt NVFP4 mixed-transformer builder for SGLang diffusion.
  • Make nibble swapping configurable during NVFP4 loading, and preserve the validated FLUX.1-dev export layout.
  • Fix the FLUX attention and single-block quant prefixes so that the FLUX.1 fallback excludes match the intended modules.
  • Add unit coverage for the new NVFP4 config and the FLUX prefix behavior.
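
For readers unfamiliar with the term, nibble swapping matters because FP4 values are stored two per byte, and a loader has to agree with the exporter on which 4-bit half comes first. The following is a minimal pure-Python sketch of that idea only. The function names and the `low_nibble_first` parameter are hypothetical and are not taken from this PR or from the SGLang or ModelOpt code.

```python
def swap_nibbles(packed: bytes) -> bytes:
    """Swap the high and low 4-bit halves of every byte."""
    return bytes(((b & 0x0F) << 4) | (b >> 4) for b in packed)


def unpack_fp4_codes(packed: bytes, low_nibble_first: bool = True) -> list[int]:
    """Split each byte into two 4-bit codes (hypothetical helper).

    The codes are raw 4-bit integers; decoding them to floating-point
    values and applying block scales is outside this sketch.
    """
    codes = []
    for b in packed:
        lo, hi = b & 0x0F, b >> 4
        codes.extend((lo, hi) if low_nibble_first else (hi, lo))
    return codes


packed = bytes([0x21, 0x43])

# Reading the low nibble first yields one element order...
assert unpack_fp4_codes(packed) == [1, 2, 3, 4]
# ...and swapping nibbles flips each adjacent pair.
assert unpack_fp4_codes(swap_nibbles(packed)) == [2, 1, 4, 3]
# Swapping and then reading the high nibble first restores the original order.
assert unpack_fp4_codes(swap_nibbles(packed), low_nibble_first=False) == [1, 2, 3, 4]
```

If a checkpoint is loaded with the wrong convention, every adjacent pair of weights is transposed, which is why a configurable switch is useful when different exports disagree on the layout.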

Validation

  • Hardware: remote RTX 5090 machine (4 GPUs); torch.compile was disabled for all benchmark, profile, and correctness runs.
  • Unit tests: pytest -q python/sglang/multimodal_gen/test/unit/test_transformer_quant.py -q, run in the remote diffusion container.
  • BF16 benchmark denoise: 37.6940s
  • NVFP4 benchmark denoise: 29.0421s (22.95% faster than BF16)
  • BF16 end-to-end: 38.2545s
  • NVFP4 end-to-end: 29.4954s (22.90% faster than BF16)
  • Correctness check against BF16 at 512x512 / 8 steps: trajectory cosine 0.9933, final image PSNR 28.16 dB
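
The percentages above follow from the reported timings as a relative reduction against the BF16 baseline. The sketch below reproduces that arithmetic and also shows the standard definitions of cosine similarity and PSNR. It is an illustration under assumptions: the PR does not say how "trajectory cosine" was aggregated or what peak value the PSNR used, so `cosine_similarity` over flattened values and the `max_value` default of 1.0 are guesses, and all function names are hypothetical.

```python
import math


def percent_faster(baseline_s: float, candidate_s: float) -> float:
    """Relative time reduction versus the baseline, in percent."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def psnr_db(reference, test, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value is an assumed peak."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Timings reported in the validation list above.
print(f"{percent_faster(37.6940, 29.0421):.2f}")  # denoise: 22.95
print(f"{percent_faster(38.2545, 29.4954):.2f}")  # end-to-end: 22.90
```

Higher is better for both correctness metrics: a cosine of 1.0 and an infinite PSNR would mean the NVFP4 output is identical to the BF16 reference.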

bf16:

[image: flux1_bf16_main_4gpu_1024_layeroffload]

nvfp4:

[image: flux1_nvfp4_pr_4gpu_1024_layeroffload]

Notes

  • The validated FLUX.1-dev path uses --transformer-path for the mixed SGLang transformer override.
  • Profiling traces were captured on both main and this branch with identical 4-GPU settings and torch.compile disabled.

github-actions bot added the quant (LLM Quantization), blackwell (SM100/SM120), and diffusion (SGLang Diffusion) labels on Apr 11, 2026
github-actions bot added the documentation (Improvements or additions to documentation) label on Apr 11, 2026
BBuf (Collaborator, Author) commented Apr 12, 2026

/tag-and-rerun-ci

BBuf merged commit 03a1a7b into sgl-project:main on Apr 12, 2026
86 of 124 checks passed
mickqian added a commit that referenced this pull request Apr 13, 2026
BBuf added a commit that referenced this pull request Apr 13, 2026
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026

Labels

blackwell (SM100/SM120), diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), jit-kernel, quant (LLM Quantization), run-ci
