[Diffusion] modelopt diffusion fp8 support for flux1/flux2 and wan2.2 #22365
Conversation
Code Review
This pull request implements support for NVIDIA ModelOpt FP8 and NVFP4 quantization in SGLang Diffusion, introducing new runtime layers, loading adapters, and tools for checkpoint conversion and accuracy validation. Feedback focuses on generalizing the layer exclusion logic to avoid LLM-specific assumptions and ensuring that parameter metadata is preserved during weight processing by using appropriate utility functions.
```python
import regex as re

fused_patterns = ["q_a_proj", "q_b_proj", "kv_a_proj_with_mqa", "kv_b_proj"]
prefix_split = prefix.split(".")
for pattern in self.exclude_modules:
    regex_str = pattern.replace(".", r"\.").replace("*", r".*")
    pattern_split = pattern.split(".")
    if re.fullmatch(regex_str, prefix):
        return True
    if (
        pattern_split[-1] in fused_patterns
        and pattern_split[-1] in prefix_split[-1]
    ):
        assert len(prefix_split) == 5 and len(pattern_split) == 5
        return True
return False
```
The `is_layer_excluded` method contains logic and assertions that are specific to LLM layer structures in sglang.srt (e.g., `fused_patterns` entries like `q_a_proj` and `assert len(prefix_split) == 5`). These are likely not applicable to diffusion models and could cause runtime errors or incorrect exclusion behavior. Additionally, it's recommended to use the standard `re` library instead of `regex` for these simple patterns.
```python
import re

for pattern in self.exclude_modules:
    regex_str = pattern.replace(".", r"\.").replace("*", r".*")
    if re.fullmatch(regex_str, prefix):
        return True
return False
```

```python
layer.weight = Parameter(quantized_weight.t(), requires_grad=False)
if self.cutlass_fp8_supported:
    max_w_scale = convert_to_channelwise(max_w_scale, layer.logical_widths)
layer.weight_scale = Parameter(max_w_scale, requires_grad=False)
layer.input_scale = Parameter(layer.input_scale.max(), requires_grad=False)
```
In `process_weights_after_loading`, replacing `layer.weight`, `layer.weight_scale`, and `layer.input_scale` with plain `Parameter` objects removes the custom metadata and attributes (like `weight_loader`, `input_dim`, etc.) associated with `ModelWeightParameter` and `PerTensorScaleParameter`. It is safer to use `copy_or_rebind_param` to update the data while preserving the parameter types and their metadata.
```diff
-layer.weight = Parameter(quantized_weight.t(), requires_grad=False)
-if self.cutlass_fp8_supported:
-    max_w_scale = convert_to_channelwise(max_w_scale, layer.logical_widths)
-layer.weight_scale = Parameter(max_w_scale, requires_grad=False)
-layer.input_scale = Parameter(layer.input_scale.max(), requires_grad=False)
+copy_or_rebind_param(layer, "weight", quantized_weight.t())
+if self.cutlass_fp8_supported:
+    max_w_scale = convert_to_channelwise(max_w_scale, layer.logical_widths)
+copy_or_rebind_param(layer, "weight_scale", max_w_scale)
+copy_or_rebind_param(layer, "input_scale", layer.input_scale.max())
```
/tag-and-rerun-ci |
mickqian left a comment:
some TODOs:
- adapt quantization doc if necessary
- add at least one testcase for modelopt fp8
| """ | ||
| quant_config = get_quant_config(hf_config, component_model_path) | ||
| if quant_config is None and server_args.transformer_weights_path: | ||
| override_quantized_path = maybe_download_model( |
maybe extract to a dedicated function here to better illustrate the quant load logic
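Something like the sketch below would capture it (signatures and call sites are inferred from the diff above, not from the final implementation):

```python
# Sketch of the suggested extraction; names and signatures are inferred
# from the quoted diff, the merged code may differ.
def _resolve_quant_config_from_transformer_override(
    hf_config, component_model_path, server_args
):
    quant_config = get_quant_config(hf_config, component_model_path)
    if quant_config is None and server_args.transformer_weights_path:
        # The override may be a remote repo id; resolve it to a local path.
        override_quantized_path = maybe_download_model(
            server_args.transformer_weights_path
        )
        quant_config = get_quant_config(hf_config, override_quantized_path)
    return quant_config
```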
Split the ModelOpt FP8 skill and helper tooling out into stacked PR #22492 so this PR stays focused on the runtime / loader / test changes. This PR now only keeps the runtime-side code, docs, and the diffusion FP8 correctness test.
/tag-and-rerun-ci |
Mock `maybe_download_model` in `test_resolve_transformer_quant_load_spec_keeps_nunchaku_hook` to prevent it from trying to download a fake local path as an HF repo. #22365 added `_resolve_quant_config_from_transformer_override`, which calls `maybe_download_model` on the `transformer_weights_path`, but the test uses a non-existent `/tmp` path that fails HF Hub validation.
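A minimal form of that fix, assuming a pytest-style test (the patch target's module path below is an assumption):

```python
# Patch maybe_download_model so the fake /tmp override path is returned
# as-is instead of being validated as an HF repo id. The patch target
# module path here is assumed; adjust it to wherever the loader imports it.
from unittest import mock

def test_resolve_transformer_quant_load_spec_keeps_nunchaku_hook():
    with mock.patch(
        "sglang.multimodal_gen.loader.maybe_download_model",
        side_effect=lambda path, **kwargs: path,
    ):
        ...  # original test body unchanged
```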
…gl-project#22560) Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>


## Summary
This PR adds a diffusion-side ModelOpt FP8 loading path for SGLang and a reusable workflow for converting ModelOpt diffusers exports into SGLang-loadable checkpoints.
The main goal is to make ModelOpt FP8 practical for SGLang diffusion models without requiring users to manually reconstruct FP8 checkpoints from `backbone.pt` every time.

## What changed
### Runtime support
- New `modelopt_fp8` quantization path for diffusion models
- Maps `quant_method=modelopt` + `quant_algo=FP8` into the SGLang diffusion FP8 runtime path
- Disables `dit_cpu_offload` and `dit_layerwise_offload` for ModelOpt FP8 checkpoints
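For orientation, the routing in the second bullet can be pictured roughly like this (a sketch only; the helper name is made up here, while the `quant_method` / `quant_algo` keys are the ones named above):

```python
# Illustrative only: route ModelOpt FP8 exports to the diffusion
# modelopt_fp8 runtime path based on the exported quantization config.
def _maybe_modelopt_fp8(quantization_config: dict | None) -> str | None:
    if not quantization_config:
        return None
    if (
        quantization_config.get("quant_method") == "modelopt"
        and quantization_config.get("quant_algo") == "FP8"
    ):
        return "modelopt_fp8"
    return None
```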
Why offload is disabled:

### FP8 checkpoint conversion
- New tool: `python -m sglang.multimodal_gen.tools.convert_modelopt_fp8_checkpoint`
- Reads the ModelOpt `backbone.pt` export
- Materializes the `weight_scale` / `input_scale` tensors and `float8_e4m3fn` weights
- Keeps `ignore` layers in their original dtype

The converter is generic in its core flow. The only model-family-specific part is an optional BF16 fallback profile; today the validated built-in fallback profile is for FLUX.2.
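For intuition, the per-weight core of such a conversion is standard per-tensor FP8 quantization. A minimal sketch, assuming `backbone.pt` holds higher-precision weights alongside ModelOpt scales (the tool's actual internals may differ):

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

def quantize_weight(weight: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
    # Scale down by the ModelOpt weight_scale, clamp to the e4m3 range, cast.
    return (weight.float() / weight_scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
```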
### Validation helper
- `python -m sglang.multimodal_gen.tools.compare_diffusion_trajectory_similarity`
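Conceptually, the helper reduces to comparing the BF16 and FP8 latent trajectories step by step; a rough sketch of that idea (not the tool's actual code):

```python
import torch
import torch.nn.functional as F

def trajectory_similarity(bf16_latents, fp8_latents):
    # Mean cosine similarity across denoising steps; 1.0 means identical trajectories.
    sims = [
        F.cosine_similarity(a.flatten().float(), b.flatten().float(), dim=0)
        for a, b in zip(bf16_latents, fp8_latents)
    ]
    return torch.stack(sims).mean().item()
```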
### Skill

- `python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-modelopt-quant/SKILL.md`

## Notes on ModelOpt formats
FP8 currently needs an extra SGLang-side conversion step.
Why:
- the ModelOpt diffusers export keeps the `weight_scale` and `input_scale` tensors inside `backbone.pt`
- SGLang expects real `float8_e4m3fn` weights in the converted checkpoint

NVFP4 is different:
## Published checkpoints
The following converted checkpoints are already published so users do not need to run ModelOpt export + SGLang conversion themselves.
- `BBuf/flux2-dev-modelopt-fp8-sglang-transformer`
- `BBuf/wan22-t2v-a14b-modelopt-fp8-sglang-transformer`

Example usage:
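A hedged sketch of the usage pattern (the entrypoint and flag spelling are assumptions derived from `server_args.transformer_weights_path`; the base model id is a placeholder):

```bash
# Assumed invocation, not verified: load the base model in BF16 and override
# only the transformer component with the published FP8 checkpoint.
python -m sglang.multimodal_gen.launch_server \
  --model-path <base-diffusion-model> \
  --transformer-weights-path BBuf/flux2-dev-modelopt-fp8-sglang-transformer
```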
Note:
- `transformer` is the FP8 override currently used in our H100 validation runs
- `transformer_2` remains loaded from the base model in BF16 for that published recipe

## Validation
### FLUX.2
Validation was run on H100 with nightly-aligned settings and BF16/FP8 output comparisons.
Observed latency:
- BF16: `24.47 s` total, `23.21 s` denoising
- FP8: `17.13 s` total, `16.21 s` denoising
- Speedup: `30.0%` total and `30.1%` denoising

Reduced deterministic validation also showed high latent trajectory agreement:
- `0.9971`
- `5.7ms -> 2.5ms` in Profile 5 step's last layer
### Wan2.2
Validation was run on H100 with nightly-aligned settings using the validated primary-transformer FP8 override.
Observed latency:
- BF16: `212.19 s` total, `204.09 s` denoising
- FP8: `204.38 s` total, `196.28 s` denoising
- Speedup: `3.68%` total and `3.83%` denoising

Reduced deterministic validation also showed stable trajectory agreement:
- `0.9755`

wan22_bf16_nocompile.mp4

wan22_fp8_nocompile.mp4
## Artifacts
For both FLUX.2 and Wan2.2, I collected:
These artifacts were used during local validation and can be attached in review if needed.
## Scope
This PR focuses on:
It does not add ModelOpt mixed precision support.
## Modifications

## Accuracy Tests

## Speed Tests and Profiling

## Checklist

## Review and Merge Process
- `/tag-and-rerun-ci`, `/tag-run-ci-label`, `/rerun-failed-ci`