Revert "[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)"#22649
Revert "[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)"#22649
Conversation
This reverts commit 03a1a7b
/tag-and-rerun-ci
Code Review
This pull request simplifies the ModelOpt quantization workflow by removing the NVFP4 mixed-precision builder, hardcoding nibble swapping, and renaming the FP8 conversion tool to convert_modelopt_fp8_checkpoint. It also removes the prefix argument from several linear layer initializations within the FLUX model. Feedback indicates that the removal of the prefix argument is inconsistent in FluxAttention, which could lead to runtime errors, and suggests removing the now-unused prefix parameter from the FluxSingleTransformerBlock constructor.
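For context on the nibble-swapping step that the revert hardcodes, the following is a minimal, illustrative sketch of swapping the two 4-bit halves of each packed NVFP4 byte; the function name and tensor layout are assumptions for illustration, not the checkpoint converter's actual code:

import torch

def swap_nibbles(packed: torch.Tensor) -> torch.Tensor:
    # Two 4-bit values share each uint8 byte; exchange the high and low nibbles.
    assert packed.dtype == torch.uint8
    return ((packed << 4) & 0xFF) | (packed >> 4)

packed = torch.tensor([0x12, 0xAB], dtype=torch.uint8)
print([hex(v) for v in swap_nibbles(packed).tolist()])  # ['0x21', '0xba']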
    bias=bias,
    gather_output=True,
    quant_config=quant_config,
    prefix=f"{prefix}.to_q" if prefix else "to_q",
While this revert correctly removes the prefix argument from this ColumnParallelLinear call, other calls within FluxAttention (e.g., for to_out and to_add_out) still use the prefix argument. This suggests an incomplete revert. If ColumnParallelLinear no longer accepts prefix after this revert, those calls will cause a runtime error. Please ensure the prefix argument is removed from all ColumnParallelLinear calls consistently.
    bias=True,
    gather_output=True,
    quant_config=quant_config,
    prefix=f"{prefix}.proj_mlp" if prefix else "proj_mlp",
This reverts commit 03a1a7b (#22574).
Failed CI run: https://github.com/sgl-project/sglang/actions/runs/24322506533/job/71011288001?pr=22633
/tag-and-rerun-ci,/tag-run-ci-label,/rerun-failed-ci