
[diffusion] Support nvfp4 for Flux.2#20137

Merged
mickqian merged 70 commits into sgl-project:main from ykcai-daniel:nvfp4
Mar 25, 2026

Conversation

@ykcai-daniel
Contributor

@ykcai-daniel ykcai-daniel commented Mar 8, 2026

Modifications

  • Weight loading logic for flux2-nvfp4: added utilities for parsing safetensors metadata, enabling the correct quantized-layer selection logic for flux2.
  • NVFP4 layer: there are currently two implementations. ModelOptNvfp4Layer is adapted from a similar layer in srt from the previous PR; however, the weight_scale tensors in flux2 checkpoints differ from the format this layer requires. A second reference implementation based on ComfyUI is therefore included. Currently, the nvfp4 version of flux2 is only supported in ComfyUI.

There are two checkpoints in the repo: flux2-dev-nvfp4 and flux2-dev-nvfp4-mixed. The mixed checkpoint has no input_scale for its quantized layers and keeps more layers in bf16.
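The safetensors metadata parsing mentioned above can be sketched as follows. This is a minimal illustration, not the PR's actual code: it reads the JSON header of a .safetensors file (8-byte little-endian length followed by a JSON blob) and treats any layer that ships a `weight_scale` companion tensor as quantized; the tensor-naming convention here is an assumption.

```python
import json
import struct


def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file without loading tensors.

    The format stores an 8-byte little-endian u64 header length, followed
    by that many bytes of JSON describing every tensor in the file.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def find_nvfp4_layers(header):
    """Return layer prefixes whose weight has an nvfp4-style `weight_scale`
    companion tensor (hypothetical naming convention for illustration)."""
    names = set(header) - {"__metadata__"}
    suffix = ".weight_scale"
    return sorted(
        name[: -len(suffix)]
        for name in names
        if name.endswith(suffix) and name[: -len(suffix)] + ".weight" in names
    )
```

For example, a header containing `blocks.0.attn.qkv.weight` and `blocks.0.attn.qkv.weight_scale` would yield `['blocks.0.attn.qkv']`, while a plain bf16 layer without a scale tensor is skipped.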

Run diffusion model with:

sglang generate --model-path black-forest-labs/FLUX.2-dev --transformer-model-path "hf://black-forest-labs/FLUX.2-dev-NVFP4/blob/flux2-dev-nvfp4.safetensors" --prompt "A smiling girl holding a rectangular white signboard with the text 'sGl Diffusion x FLUx.2', in anime style"

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@ykcai-daniel
Contributor Author

/rerun-failed-ci

@yhyang201
Collaborator

/rerun-failed-ci

4 similar comments
@yhyang201
Collaborator

/rerun-failed-ci

@yhyang201
Collaborator

/rerun-failed-ci

@yhyang201
Collaborator

/rerun-failed-ci

@ping1jing2
Collaborator

/rerun-failed-ci

@mickqian mickqian merged commit 281fe10 into sgl-project:main Mar 25, 2026
218 of 228 checks passed
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
@yhyang201 yhyang201 mentioned this pull request Mar 30, 2026
5 tasks
yhyang201 added a commit to yhyang201/sglang that referenced this pull request Mar 30, 2026
The single-stream block's to_out (RowParallelLinear) receives input as
[attn_shard | mlp_shard], where each part is independently sharded by
MergedColumnParallelLinear. However, RowParallelLinear's default weight
loader slices weight columns as a contiguous block, causing a mismatch
between weight columns and input features when TP > 1.

Fix: patch the weight loader to select the correct non-contiguous columns
(attn slice + mlp slice) from the full checkpoint weight per rank.

Introduced by 281fe10 (sgl-project#20137).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
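The sharding mismatch described in this commit message can be illustrated with a small sketch. This is not the patch itself: numpy stands in for torch, and the function below simply shows the fix's core idea, selecting the two non-contiguous column slices (attn slice + mlp slice) that each TP rank needs, instead of one contiguous block.

```python
import numpy as np


def load_row_parallel_weight(full_weight, tp_rank, tp_size, attn_dim, mlp_dim):
    """Slice a RowParallelLinear checkpoint weight whose runtime input is
    [attn_shard | mlp_shard], each sharded independently across TP ranks.

    full_weight: [out_features, attn_dim + mlp_dim] from the checkpoint.
    Returns the per-rank weight built from two NON-contiguous column
    slices, matching the layout the layer actually receives at runtime.
    """
    attn_per_rank = attn_dim // tp_size
    mlp_per_rank = mlp_dim // tp_size
    # Columns for this rank's attention shard.
    attn_cols = full_weight[
        :, tp_rank * attn_per_rank : (tp_rank + 1) * attn_per_rank
    ]
    # Columns for this rank's MLP shard, offset past the attention block.
    mlp_cols = full_weight[
        :,
        attn_dim + tp_rank * mlp_per_rank : attn_dim + (tp_rank + 1) * mlp_per_rank,
    ]
    return np.concatenate([attn_cols, mlp_cols], axis=1)
```

With attn_dim = mlp_dim = 4 and tp_size = 2, rank 0 gets columns [0, 1, 4, 5] and rank 1 gets [2, 3, 6, 7]; the default contiguous slicing would instead hand rank 0 columns [0, 1, 2, 3], mixing attn and mlp features.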
yhyang201 added a commit to yhyang201/sglang that referenced this pull request Mar 30, 2026
The single-stream block's to_out (RowParallelLinear) receives input as
[attn_shard | mlp_shard], where each part is independently sharded by
MergedColumnParallelLinear. However, RowParallelLinear's default weight
loader slices weight columns as a contiguous block, causing a mismatch
between weight columns and input features when TP > 1.

Fix: patch the weight loader to select the correct non-contiguous columns
(attn slice + mlp slice) from the full checkpoint weight per rank.

Introduced by 281fe10 (sgl-project#20137).
satyamk7054 pushed a commit to satyamk7054/sglang that referenced this pull request Apr 3, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
lishunyang12 added a commit to lishunyang12/vllm-omni that referenced this pull request Apr 6, 2026
Add NVIDIA FP4 (NVFP4) quantization support for Flux.2 diffusion models,
enabling running FLUX.2-dev-NVFP4 pre-quantized checkpoints via ModelOpt.

Key changes:
- Add BFL checkpoint weight name mapping (WeightsMapper) to
  Flux2Transformer2DModel.load_weights() with auto-detection
- Add transformer_weights_path to OmniDiffusionConfig for loading
  transformer weights from a separate checkpoint path
- Add NVFP4 auto-detection from safetensors file headers
- Update FluxPipeline and Flux2KleinPipeline to support separate
  transformer weight paths
- Add unit tests for weight mapping, format detection, and auto-detection

Leverages vLLM's existing ModelOptNvFp4Config and ModelOptNvFp4LinearMethod
registered as "modelopt_fp4" — no custom quantization config needed.

Reference: sgl-project/sglang#20137

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
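For readers unfamiliar with the NVFP4 format these commits target, the dequantization step can be sketched as below. This is a simplified illustration, not ModelOpt's implementation: codes are left unpacked (one 4-bit E2M1 code per byte), and scales are plain floats, whereas real nvfp4 checkpoints pack two FP4 values per byte and store per-16-element block scales in FP8 alongside a global scale.

```python
import numpy as np

# E2M1 (FP4) magnitude lookup; the 4th bit of each code is the sign.
_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def dequant_nvfp4(codes, block_scales, block_size=16):
    """Dequantize unpacked FP4 codes with one scale per block.

    codes: iterable of uint8, one E2M1 code in the low nibble of each byte.
    block_scales: one scale per `block_size` consecutive elements.
    """
    codes = np.asarray(codes, dtype=np.uint8)
    sign = np.where(codes & 0x8, -1.0, 1.0)  # bit 3 is the sign bit
    magnitude = _E2M1[codes & 0x7]           # low 3 bits index the table
    scales = np.repeat(np.asarray(block_scales, dtype=np.float64), block_size)
    return sign * magnitude * scales
```

For example, code 0b0010 (magnitude 1.0) under a block scale of 2.0 dequantizes to 2.0, and code 0b1011 (sign bit set, magnitude 1.5) under a scale of 0.5 dequantizes to -0.75.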
lishunyang12 added a commit to lishunyang12/vllm-omni that referenced this pull request Apr 6, 2026
Add NVIDIA FP4 (NVFP4) quantization support for Flux.2 diffusion models,
enabling running FLUX.2-dev-NVFP4 pre-quantized checkpoints via ModelOpt.

Key changes:
- Add BFL checkpoint weight name mapping (WeightsMapper) to
  Flux2Transformer2DModel.load_weights() with auto-detection
- Add transformer_weights_path to OmniDiffusionConfig for loading
  transformer weights from a separate checkpoint path
- Add NVFP4 auto-detection from safetensors file headers
- Update FluxPipeline and Flux2KleinPipeline to support separate
  transformer weight paths
- Add unit tests for weight mapping, format detection, and auto-detection

Leverages vLLM's existing ModelOptNvFp4Config and ModelOptNvFp4LinearMethod
registered as "modelopt_fp4" — no custom quantization config needed.

Reference: sgl-project/sglang#20137

Signed-off-by: lishunyang <lishunyang12@163.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

Labels

blackwell (SM100/SM120), diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), quant (LLM Quantization), run-ci

Projects

None yet

Development


7 participants