
Feature/support distilled vae generic #14195

Merged
mickqian merged 29 commits into sgl-project:main from baonudesifeizhai:feature/support-distilled-vae-generic
Dec 3, 2025

Conversation

baonudesifeizhai (Contributor) commented Dec 1, 2025

Motivation

Add support for fal/FLUX.2-Tiny-AutoEncoder, a distilled VAE that doesn't work with the standard diffusers pipeline.
#14004

Modifications

- Added Flux2FinetunedPipelineConfig for distilled VAEs
- Auto-detect FLUX.2-Tiny-AutoEncoder and switch to the specialized config
- Handle patchified latents (128 channels) without unpatchify
- Skip external scaling (the VAE handles it internally)
- Support 5D latents for decoding (the adapted decode path is sketched after this list)
- Load custom VAE classes via auto_map from HuggingFace
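A minimal sketch of the adapted decode path these items describe, assuming a hypothetical helper name (preprocess_decoding here is illustrative; the merged logic lives in Flux2FinetunedPipelineConfig and may differ):

```python
import torch

def preprocess_decoding(latents: torch.Tensor) -> torch.Tensor:
    """Prepare latents for a distilled VAE such as FLUX.2-Tiny-AutoEncoder."""
    # 5D latents (batch, channels, frames, height, width): fold the frame
    # axis into the batch axis so the 2D decoder can consume them.
    if latents.dim() == 5:
        b, c, f, h, w = latents.shape
        latents = latents.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    # The tiny autoencoder decodes patchified 128-channel latents directly,
    # so no unpatchify step is applied here; it also scales/shifts
    # internally, so no external scaling_factor / shift_factor is applied.
    return latents
```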

Accuracy Tests

Benchmarking and Profiling

sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --vae-path fal/FLUX.2-Tiny-AutoEncoder \
  --num-gpus 8 \
  --tp-size 8 \
  --prompt "beatiful women with long yellow hair,around 30 ages" \
  --width 384 \
  --height 384 \
  --trust-remote-code \
  --vae-precision bf16 \
  --vae-cpu-offload \
  --text-encoder-cpu-offload \
  --image-encoder-cpu-offload \
  --dit-cpu-offload \
  --pin-cpu-memory \
  --log-level debug
(generated image attached)

Checklist

BBuf and others added 20 commits November 29, 2025 13:57
- Add dynamic VAE type detection based on actual attributes (bn presence); see the detection sketch after this list
- Support both standard Flux2 VAE and distilled VAE automatically
- Add --vae-path CLI argument for custom VAE models
- Handle DecoderOutput return type from VAE decode
- Automatically adapt preprocess_decoding and get_decode_scale_and_shift based on VAE type
- Support any HuggingFace VAE model with access permissions
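An illustrative sketch of the attribute-based detection and DecoderOutput handling described in these commits (the helper names are assumptions, not the merged code):

```python
import torch

def is_distilled_vae(vae) -> bool:
    # The distilled FLUX.2-Tiny-AutoEncoder exposes a `bn` module that the
    # standard Flux2 VAE lacks, so attribute presence discriminates them.
    return hasattr(vae, "bn")

def decode_latents(vae, latents: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        out = vae.decode(latents)
    # diffusers-style VAEs may wrap the tensor in a DecoderOutput.
    return out.sample if hasattr(out, "sample") else out
```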
github-actions bot added the diffusion SGLang Diffusion label on Dec 1, 2025
gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @baonudesifeizhai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's flexibility and compatibility by introducing robust support for distilled and fine-tuned VAE models, particularly the fal/FLUX.2-Tiny-AutoEncoder. The changes enable dynamic adaptation to different VAE behaviors, streamline the loading of custom VAE architectures, and ensure correct processing of various latent formats, ultimately broadening the range of generative models that can be effectively utilized.

Highlights

  • Distilled VAE Support: Introduced a new Flux2FinetunedPipelineConfig specifically designed to support distilled VAEs like fal/FLUX.2-Tiny-AutoEncoder, which require different handling than standard VAEs.
  • Dynamic VAE Handling: Implemented dynamic detection of VAE types (standard vs. distilled) during preprocessing and scaling. This allows the system to correctly handle patchified latents (128 channels) without unpatchifying for distilled VAEs and to skip external scaling when the VAE handles it internally.
  • 5D Latent Support: Added support for 5D latents (batch, channels, frames, height, width) during decoding, enabling compatibility with multi-frame generation scenarios.
  • Custom VAE Loading: Enhanced the VAE loading mechanism to support custom VAE classes via HuggingFace's auto_map feature, providing greater flexibility for integrating diverse VAE architectures.
  • Configurable VAE Path: Added a --vae-path argument to ServerArgs, allowing users to specify a custom path for loading VAE models, which can also trigger automatic download from the HuggingFace Hub if the path is a model ID.

gemini-code-assist bot left a review:


Code Review

This pull request introduces support for distilled VAEs, specifically fal/FLUX.2-Tiny-AutoEncoder, by adding a new Flux2FinetunedPipelineConfig and implementing dynamic VAE loading and adaptive preprocessing logic. Key changes include modifying FluxVAEConfig to dynamically calculate vae_scale_factor, updating PipelineConfig.from_kwargs to auto-detect and use the fine-tuned VAE configuration, and enhancing VAELoader to support custom VAE classes loaded via HuggingFace's auto_map. The preprocess_decoding and get_decode_scale_and_shift methods in Flux2PipelineConfig and Flux2FinetunedPipelineConfig have been adapted to handle the specific requirements of distilled VAEs, including 5D latent support and conditional unpatchifying. Overall, the changes are well-structured and address the motivation effectively.
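As a rough illustration of the dynamic vae_scale_factor calculation mentioned above (a sketch only; the function name and the fallback of 8 are assumptions, not the merged code):

```python
def compute_vae_scale_factor(vae_config) -> int:
    # diffusers convention: each downsampling block halves the resolution,
    # so the spatial factor is 2 ** (number of blocks - 1).
    block_out_channels = getattr(vae_config, "block_out_channels", None)
    if block_out_channels:
        return 2 ** (len(block_out_channels) - 1)
    # A distilled tiny autoencoder may not expose block_out_channels;
    # fall back to the standard Flux factor.
    return 8
```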

(Resolved review threads on python/sglang/multimodal_gen/configs/pipeline_configs/flux.py and python/sglang/multimodal_gen/runtime/pipelines_core/stages/decoding.py.)
mickqian (Collaborator) left a review:

Overall LGTM. We should add it to CI in this PR.

def load_customized(
    self, component_model_path: str, server_args: ServerArgs, module_name: str
):
    """
Collaborator:

Could you restore the method docstring?

(Resolved review threads on python/sglang/multimodal_gen/runtime/loader/component_loader.py, python/sglang/multimodal_gen/runtime/pipelines_core/stages/decoding.py, and python/sglang/multimodal_gen/configs/pipeline_configs/base.py.)
baonudesifeizhai force-pushed the feature/support-distilled-vae-generic branch from bfee256 to c478270 on December 2, 2025
baonudesifeizhai (Author) commented:

Works normally now (generated image attached).

trust_remote_code=server_args.trust_remote_code,
)
vae = vae.to(device=target_device, dtype=vae_dtype)
return vae.eval()
Collaborator:

/gemini any idea for this part

gemini-code-assist (Contributor) replied:

This new block of code is essential for supporting custom VAE architectures, such as the fal/FLUX.2-Tiny-AutoEncoder mentioned in the pull request description. These custom VAEs often define their loading mechanism via an auto_map entry in their configuration, which points to a specific module and class name.

The importlib logic dynamically loads the specified VAE class from a Python file within the model's directory. This allows the system to correctly instantiate and use VAEs that don't conform to a standard, predefined structure, leveraging HuggingFace's from_pretrained method with trust_remote_code for flexibility. This approach is robust for handling such specialized models.
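A minimal sketch of that auto_map loading pattern, assuming a config layout like {"auto_map": {"AutoModel": "modeling_taef.TinyAutoEncoder"}}; the key, file, and class names are hypothetical and the merged implementation may differ:

```python
import importlib.util
import json
import os

def load_custom_vae(component_model_path: str, torch_dtype):
    with open(os.path.join(component_model_path, "config.json")) as f:
        config = json.load(f)
    # e.g. "modeling_taef.TinyAutoEncoder" -> module file stem + class name
    module_file, class_name = config["auto_map"]["AutoModel"].rsplit(".", 1)
    # Import the Python file shipped alongside the checkpoint.
    spec = importlib.util.spec_from_file_location(
        module_file, os.path.join(component_model_path, f"{module_file}.py")
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # from_pretrained comes from the checkpoint's own class definition.
    vae_cls = getattr(module, class_name)
    return vae_cls.from_pretrained(component_model_path, torch_dtype=torch_dtype)
```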

mickqian (Collaborator) left a review:

LGTM


component_model_path = os.path.join(self.model_path, load_module_name)
# Use custom VAE path if provided, otherwise use default path
if module_name == "vae" and server_args.vae_path is not None:
Collaborator:

Better to make this a separate function for future custom VAEs; a possible shape is sketched below.
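One possible shape for that helper, reusing the server_args.vae_path check from the snippet above (resolve_component_path is a hypothetical name, not the merged code):

```python
import os

def resolve_component_path(model_path: str, module_name: str, server_args) -> str:
    # A custom VAE path takes precedence when provided; it may be a local
    # directory or a HuggingFace model ID resolved by the downloader.
    if module_name == "vae" and server_args.vae_path is not None:
        return server_args.vae_path
    return os.path.join(model_path, module_name)
```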

mickqian (Collaborator) commented Dec 2, 2025:

/tag-and-rerun-ci

github-actions bot added the run-ci label on Dec 2, 2025
baonudesifeizhai (Author) commented:

So what's next? It looks like all checks have passed.

mickqian merged commit f764c69 into sgl-project:main on Dec 3, 2025 (47 checks passed)
mickqian (Collaborator) commented Dec 3, 2025:

Could you add this arg to cli.md?

baonudesifeizhai (Author) replied:

> Could you add this arg to cli.md?

#14355

yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request on Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request on Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request on Dec 8, 2025
(each commit co-authored by BBuf <1182563586@qq.com> and Mick <mickjagger19@icloud.com>)
Labels: diffusion SGLang Diffusion, run-ci