
[diffusion] Support nvfp4 for Flux.2#20137

Merged
mickqian merged 70 commits into sgl-project:main from ykcai-daniel:nvfp4
Mar 25, 2026

Conversation

@ykcai-daniel
Contributor

@ykcai-daniel ykcai-daniel commented Mar 8, 2026

Modifications

  • Weight loading logic for flux2-nvfp4: added utilities for parsing safetensors metadata, enabling the correct quantized-layer selection logic for flux2.
  • NVFP4 layer: there are currently two implementations. ModelOptNvfp4Layer is adapted from a similar layer in srt from the previous PR; however, the weight_scale tensors in flux2 checkpoints differ from the format this layer requires. A second reference implementation based on ComfyUI is therefore included. Currently, the nvfp4 version of flux2 is only supported in ComfyUI.

There are two checkpoints in the repo: flux2-dev-nvfp4 and flux2-dev-nvfp4-mixed. The mixed checkpoint has no input_scale for its quantized layers and keeps more layers in bf16.
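The safetensors metadata parsing mentioned above can be sketched as follows. This is a minimal illustration, not the PR's actual code: it reads the JSON header of a .safetensors file (8-byte little-endian length followed by a JSON blob) and treats any layer that ships a `weight_scale` companion tensor as quantized; the tensor-naming convention here is an assumption.

```python
import json
import struct


def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file without loading tensors.

    The format stores an 8-byte little-endian u64 header length, followed
    by that many bytes of JSON describing every tensor in the file.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def find_nvfp4_layers(header):
    """Return layer prefixes whose weight has an nvfp4-style `weight_scale`
    companion tensor (hypothetical naming convention for illustration)."""
    names = set(header) - {"__metadata__"}
    suffix = ".weight_scale"
    return sorted(
        name[: -len(suffix)]
        for name in names
        if name.endswith(suffix) and name[: -len(suffix)] + ".weight" in names
    )
```

For example, a header containing `blocks.0.attn.qkv.weight` and `blocks.0.attn.qkv.weight_scale` would yield `['blocks.0.attn.qkv']`, while a plain bf16 layer without a scale tensor is skipped.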

Run diffusion model with:

sglang generate --model-path black-forest-labs/FLUX.2-dev --transformer-model-path "hf://black-forest-labs/FLUX.2-dev-NVFP4/blob/flux2-dev-nvfp4.safetensors" --prompt "A smiling girl holding a rectangular white signboard with the text 'sGl Diffusion x FLUx.2', in anime style"

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@ykcai-daniel
Contributor Author

/rerun-failed-ci

@yhyang201
Collaborator

/rerun-failed-ci

4 similar comments
@yhyang201
Collaborator

/rerun-failed-ci

@yhyang201
Collaborator

/rerun-failed-ci

@yhyang201
Collaborator

/rerun-failed-ci

@ping1jing2
Collaborator

/rerun-failed-ci

@mickqian mickqian merged commit 281fe10 into sgl-project:main Mar 25, 2026
218 of 228 checks passed
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
@yhyang201 yhyang201 mentioned this pull request Mar 30, 2026
5 tasks
yhyang201 added a commit to yhyang201/sglang that referenced this pull request Mar 30, 2026
The single-stream block's to_out (RowParallelLinear) receives input as
[attn_shard | mlp_shard], where each part is independently sharded by
MergedColumnParallelLinear. However, RowParallelLinear's default weight
loader slices weight columns as a contiguous block, causing a mismatch
between weight columns and input features when TP > 1.

Fix: patch the weight loader to select the correct non-contiguous columns
(attn slice + mlp slice) from the full checkpoint weight per rank.

Introduced by 281fe10 (sgl-project#20137).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
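The sharding mismatch described in this commit message can be illustrated with a small sketch. This is not the patch itself: numpy stands in for torch, and the function below simply shows the fix's core idea, selecting the two non-contiguous column slices (attn slice + mlp slice) that each TP rank needs, instead of one contiguous block.

```python
import numpy as np


def load_row_parallel_weight(full_weight, tp_rank, tp_size, attn_dim, mlp_dim):
    """Slice a RowParallelLinear checkpoint weight whose runtime input is
    [attn_shard | mlp_shard], each sharded independently across TP ranks.

    full_weight: [out_features, attn_dim + mlp_dim] from the checkpoint.
    Returns the per-rank weight built from two NON-contiguous column
    slices, matching the layout the layer actually receives at runtime.
    """
    attn_per_rank = attn_dim // tp_size
    mlp_per_rank = mlp_dim // tp_size
    # Columns for this rank's attention shard.
    attn_cols = full_weight[
        :, tp_rank * attn_per_rank : (tp_rank + 1) * attn_per_rank
    ]
    # Columns for this rank's MLP shard, offset past the attention block.
    mlp_cols = full_weight[
        :,
        attn_dim + tp_rank * mlp_per_rank : attn_dim + (tp_rank + 1) * mlp_per_rank,
    ]
    return np.concatenate([attn_cols, mlp_cols], axis=1)
```

With attn_dim = mlp_dim = 4 and tp_size = 2, rank 0 gets columns [0, 1, 4, 5] and rank 1 gets [2, 3, 6, 7]; the default contiguous slicing would instead hand rank 0 columns [0, 1, 2, 3], mixing attn and mlp features.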
yhyang201 added a commit to yhyang201/sglang that referenced this pull request Mar 30, 2026
The single-stream block's to_out (RowParallelLinear) receives input as
[attn_shard | mlp_shard], where each part is independently sharded by
MergedColumnParallelLinear. However, RowParallelLinear's default weight
loader slices weight columns as a contiguous block, causing a mismatch
between weight columns and input features when TP > 1.

Fix: patch the weight loader to select the correct non-contiguous columns
(attn slice + mlp slice) from the full checkpoint weight per rank.

Introduced by 281fe10 (sgl-project#20137).
satyamk7054 pushed a commit to satyamk7054/sglang that referenced this pull request Apr 3, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
lishunyang12 added a commit to lishunyang12/vllm-omni that referenced this pull request Apr 6, 2026
Add NVIDIA FP4 (NVFP4) quantization support for Flux.2 diffusion models,
enabling running FLUX.2-dev-NVFP4 pre-quantized checkpoints via ModelOpt.

Key changes:
- Add BFL checkpoint weight name mapping (WeightsMapper) to
  Flux2Transformer2DModel.load_weights() with auto-detection
- Add transformer_weights_path to OmniDiffusionConfig for loading
  transformer weights from a separate checkpoint path
- Add NVFP4 auto-detection from safetensors file headers
- Update FluxPipeline and Flux2KleinPipeline to support separate
  transformer weight paths
- Add unit tests for weight mapping, format detection, and auto-detection

Leverages vLLM's existing ModelOptNvFp4Config and ModelOptNvFp4LinearMethod
registered as "modelopt_fp4" — no custom quantization config needed.

Reference: sgl-project/sglang#20137

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
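For readers unfamiliar with the NVFP4 format these commits target, the dequantization step can be sketched as below. This is a simplified illustration, not ModelOpt's implementation: codes are left unpacked (one 4-bit E2M1 code per byte), and scales are plain floats, whereas real nvfp4 checkpoints pack two FP4 values per byte and store per-16-element block scales in FP8 alongside a global scale.

```python
import numpy as np

# E2M1 (FP4) magnitude lookup; the 4th bit of each code is the sign.
_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def dequant_nvfp4(codes, block_scales, block_size=16):
    """Dequantize unpacked FP4 codes with one scale per block.

    codes: iterable of uint8, one E2M1 code in the low nibble of each byte.
    block_scales: one scale per `block_size` consecutive elements.
    """
    codes = np.asarray(codes, dtype=np.uint8)
    sign = np.where(codes & 0x8, -1.0, 1.0)  # bit 3 is the sign bit
    magnitude = _E2M1[codes & 0x7]           # low 3 bits index the table
    scales = np.repeat(np.asarray(block_scales, dtype=np.float64), block_size)
    return sign * magnitude * scales
```

For example, code 0b0010 (magnitude 1.0) under a block scale of 2.0 dequantizes to 2.0, and code 0b1011 (sign bit set, magnitude 1.5) under a scale of 0.5 dequantizes to -0.75.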
lishunyang12 added a commit to lishunyang12/vllm-omni that referenced this pull request Apr 6, 2026
Add NVIDIA FP4 (NVFP4) quantization support for Flux.2 diffusion models,
enabling running FLUX.2-dev-NVFP4 pre-quantized checkpoints via ModelOpt.

Key changes:
- Add BFL checkpoint weight name mapping (WeightsMapper) to
  Flux2Transformer2DModel.load_weights() with auto-detection
- Add transformer_weights_path to OmniDiffusionConfig for loading
  transformer weights from a separate checkpoint path
- Add NVFP4 auto-detection from safetensors file headers
- Update FluxPipeline and Flux2KleinPipeline to support separate
  transformer weight paths
- Add unit tests for weight mapping, format detection, and auto-detection

Leverages vLLM's existing ModelOptNvFp4Config and ModelOptNvFp4LinearMethod
registered as "modelopt_fp4" — no custom quantization config needed.

Reference: sgl-project/sglang#20137

Signed-off-by: lishunyang <lishunyang12@163.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

Labels

blackwell (SM100/SM120), diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), quant (LLM Quantization), run-ci

Projects

None yet

Development


7 participants