
Revert "[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)"#22649

Merged
mickqian merged 1 commit into main from diffusion-ci on Apr 13, 2026
Conversation

@mickqian (Collaborator) commented Apr 13, 2026

This reverts commit 03a1a7b (#22574).

Failed CI run: https://github.com/sgl-project/sglang/actions/runs/24322506533/job/71011288001?pr=22633


Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@github-actions Bot added labels: documentation (Improvements or additions to documentation), quant (LLM Quantization), blackwell (SM100/SM120), diffusion (SGLang Diffusion), jit-kernel — Apr 13, 2026
@mickqian (Collaborator, Author) commented:

/tag-and-rerun-ci

@gemini-code-assist Bot (Contributor) left a comment:

Code Review

This pull request simplifies the ModelOpt quantization workflow by removing the NVFP4 mixed-precision builder, hardcoding nibble swapping, and renaming the FP8 conversion tool to convert_modelopt_fp8_checkpoint. It also removes the prefix argument from several linear layer initializations within the FLUX model. Feedback indicates that the removal of the prefix argument is inconsistent in FluxAttention, which could lead to runtime errors, and suggests removing the now-unused prefix parameter from the FluxSingleTransformerBlock constructor.

bias=bias,
gather_output=True,
quant_config=quant_config,
prefix=f"{prefix}.to_q" if prefix else "to_q",
Severity: critical

While this revert correctly removes the prefix argument from this ColumnParallelLinear call, other calls within FluxAttention (e.g., for to_out and to_add_out) still use the prefix argument. This suggests an incomplete revert. If ColumnParallelLinear no longer accepts prefix after this revert, those calls will cause a runtime error. Please ensure the prefix argument is removed from all ColumnParallelLinear calls consistently.
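The failure mode the reviewer describes can be illustrated with a minimal, hypothetical sketch. The class below is a stand-in, not the real sglang layer, and the projection names are taken from the comment above: once the constructor no longer accepts `prefix`, any call site in FluxAttention that still passes it raises a `TypeError` at model construction time.

```python
# Hypothetical stand-in for ColumnParallelLinear after the revert:
# the constructor no longer accepts a `prefix` keyword argument.
class ColumnParallelLinear:
    def __init__(self, input_size, output_size, bias=True,
                 gather_output=True, quant_config=None):
        self.input_size = input_size
        self.output_size = output_size


def build_flux_attention_projections(dim, quant_config=None):
    # A consistent revert drops `prefix` from every call site,
    # covering to_q as well as to_out and to_add_out.
    names = ("to_q", "to_k", "to_v", "to_out", "to_add_out")
    return {
        name: ColumnParallelLinear(dim, dim, bias=True,
                                   gather_output=True,
                                   quant_config=quant_config)
        for name in names
    }


projections = build_flux_attention_projections(3072)

# An incompletely reverted call site fails immediately:
try:
    ColumnParallelLinear(3072, 3072, prefix="to_out")
except TypeError as exc:
    print(f"leftover prefix kwarg raises: {exc}")
```

This is why the revert must touch all call sites in one change: the error surfaces as soon as the partially reverted module is instantiated, not lazily at forward time.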

bias=True,
gather_output=True,
quant_config=quant_config,
prefix=f"{prefix}.proj_mlp" if prefix else "proj_mlp",
Severity: medium

This change, along with other similar changes in this block, makes the prefix parameter of FluxSingleTransformerBlock.__init__ unused. Please consider removing the prefix parameter from the method signature in a follow-up change to improve code clarity.
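The suggested follow-up amounts to a signature cleanup. A hypothetical before/after sketch (class bodies are illustrative, not the real FLUX implementation) shows the dead parameter being dropped once nothing below it consumes `prefix`:

```python
import inspect


# Hypothetical "before": `prefix` is still accepted but, after the
# revert, no child layer ever reads it.
class FluxSingleTransformerBlockBefore:
    def __init__(self, dim, quant_config=None, prefix=""):
        self.mlp_hidden_dim = dim * 4


# Hypothetical "after": the unused parameter is removed from the
# constructor, so the signature matches what callers actually need.
class FluxSingleTransformerBlockAfter:
    def __init__(self, dim, quant_config=None):
        self.mlp_hidden_dim = dim * 4


before = inspect.signature(FluxSingleTransformerBlockBefore.__init__)
after = inspect.signature(FluxSingleTransformerBlockAfter.__init__)
print("prefix" in before.parameters, "prefix" in after.parameters)
# prints: True False
```

Keeping an unused keyword argument is harmless at runtime but misleading to readers, who may assume the prefix still influences weight naming; removing it in a follow-up keeps the API honest.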

@mickqian mickqian merged commit bf022e1 into main Apr 13, 2026
136 of 179 checks passed
@mickqian mickqian deleted the diffusion-ci branch April 13, 2026 03:17
BBuf added a commit that referenced this pull request Apr 13, 2026

Labels

blackwell (SM100/SM120), diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), jit-kernel, quant (LLM Quantization), run-ci
