
Feature/support distilled vae generic #14195

Merged
mickqian merged 29 commits into sgl-project:main from baonudesifeizhai:feature/support-distilled-vae-generic
Dec 3, 2025

Conversation

baonudesifeizhai (Contributor) commented Dec 1, 2025

Motivation

Add support for fal/FLUX.2-Tiny-AutoEncoder, a distilled VAE that doesn't work with the standard diffusers pipeline.
#14004

Modifications

- Added Flux2FinetunedPipelineConfig for distilled VAEs
- Auto-detect FLUX.2-Tiny-AutoEncoder and switch to the specialized config
- Handle patchified latents (128 channels) without unpatchify
- Skip external scaling (the VAE handles it internally)
- Support 5D latents for decoding (the adapted decode path is sketched after this list)
- Load custom VAE classes via auto_map from HuggingFace
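A minimal sketch of the adapted decode path these items describe, assuming a hypothetical helper name (preprocess_decoding here is illustrative; the merged logic lives in Flux2FinetunedPipelineConfig and may differ):

```python
import torch

def preprocess_decoding(latents: torch.Tensor) -> torch.Tensor:
    """Prepare latents for a distilled VAE such as FLUX.2-Tiny-AutoEncoder."""
    # 5D latents (batch, channels, frames, height, width): fold the frame
    # axis into the batch axis so the 2D decoder can consume them.
    if latents.dim() == 5:
        b, c, f, h, w = latents.shape
        latents = latents.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    # The tiny autoencoder decodes patchified 128-channel latents directly,
    # so no unpatchify step is applied here; it also scales/shifts
    # internally, so no external scaling_factor / shift_factor is applied.
    return latents
```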

Accuracy Tests

Benchmarking and Profiling

sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --vae-path fal/FLUX.2-Tiny-AutoEncoder \
  --num-gpus 8 \
  --tp-size 8 \
  --prompt "beatiful women with long yellow hair,around 30 ages" \
  --width 384 \
  --height 384 \
  --trust-remote-code \
  --vae-precision bf16 \
  --vae-cpu-offload \
  --text-encoder-cpu-offload \
  --image-encoder-cpu-offload \
  --dit-cpu-offload \
  --pin-cpu-memory \
  --log-level debug
(generated image attached)

Checklist

BBuf and others added 20 commits November 29, 2025 13:57
- Add dynamic VAE type detection based on actual attributes (bn presence); see the detection sketch after this list
- Support both standard Flux2 VAE and distilled VAE automatically
- Add --vae-path CLI argument for custom VAE models
- Handle DecoderOutput return type from VAE decode
- Automatically adapt preprocess_decoding and get_decode_scale_and_shift based on VAE type
- Support any HuggingFace VAE model with access permissions
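An illustrative sketch of the attribute-based detection and DecoderOutput handling described in these commits (the helper names are assumptions, not the merged code):

```python
import torch

def is_distilled_vae(vae) -> bool:
    # The distilled FLUX.2-Tiny-AutoEncoder exposes a `bn` module that the
    # standard Flux2 VAE lacks, so attribute presence discriminates them.
    return hasattr(vae, "bn")

def decode_latents(vae, latents: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        out = vae.decode(latents)
    # diffusers-style VAEs may wrap the tensor in a DecoderOutput.
    return out.sample if hasattr(out, "sample") else out
```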
github-actions bot added the diffusion SGLang Diffusion label on Dec 1, 2025
gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @baonudesifeizhai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's flexibility and compatibility by introducing robust support for distilled and fine-tuned VAE models, particularly the fal/FLUX.2-Tiny-AutoEncoder. The changes enable dynamic adaptation to different VAE behaviors, streamline the loading of custom VAE architectures, and ensure correct processing of various latent formats, ultimately broadening the range of generative models that can be effectively utilized.

Highlights

  • Distilled VAE Support: Introduced a new Flux2FinetunedPipelineConfig specifically designed to support distilled VAEs like fal/FLUX.2-Tiny-AutoEncoder, which require different handling than standard VAEs.
  • Dynamic VAE Handling: Implemented dynamic detection of VAE types (standard vs. distilled) during preprocessing and scaling. This allows the system to correctly handle patchified latents (128 channels) without unpatchifying for distilled VAEs and to skip external scaling when the VAE handles it internally.
  • 5D Latent Support: Added support for 5D latents (batch, channels, frames, height, width) during decoding, enabling compatibility with multi-frame generation scenarios.
  • Custom VAE Loading: Enhanced the VAE loading mechanism to support custom VAE classes via HuggingFace's auto_map feature, providing greater flexibility for integrating diverse VAE architectures.
  • Configurable VAE Path: Added a --vae-path argument to ServerArgs, allowing users to specify a custom path for loading VAE models, which can also trigger automatic download from the HuggingFace Hub if the path is a model ID.

gemini-code-assist bot left a review:


Code Review

This pull request introduces support for distilled VAEs, specifically fal/FLUX.2-Tiny-AutoEncoder, by adding a new Flux2FinetunedPipelineConfig and implementing dynamic VAE loading and adaptive preprocessing logic. Key changes include modifying FluxVAEConfig to dynamically calculate vae_scale_factor, updating PipelineConfig.from_kwargs to auto-detect and use the fine-tuned VAE configuration, and enhancing VAELoader to support custom VAE classes loaded via HuggingFace's auto_map. The preprocess_decoding and get_decode_scale_and_shift methods in Flux2PipelineConfig and Flux2FinetunedPipelineConfig have been adapted to handle the specific requirements of distilled VAEs, including 5D latent support and conditional unpatchifying. Overall, the changes are well-structured and address the motivation effectively.
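As a rough illustration of the dynamic vae_scale_factor calculation mentioned above (a sketch only; the function name and the fallback of 8 are assumptions, not the merged code):

```python
def compute_vae_scale_factor(vae_config) -> int:
    # diffusers convention: each downsampling block halves the resolution,
    # so the spatial factor is 2 ** (number of blocks - 1).
    block_out_channels = getattr(vae_config, "block_out_channels", None)
    if block_out_channels:
        return 2 ** (len(block_out_channels) - 1)
    # A distilled tiny autoencoder may not expose block_out_channels;
    # fall back to the standard Flux factor.
    return 8
```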

(Resolved review threads on python/sglang/multimodal_gen/configs/pipeline_configs/flux.py and python/sglang/multimodal_gen/runtime/pipelines_core/stages/decoding.py.)
mickqian (Collaborator) left a review:

Overall LGTM. We should add it to CI in this PR.

def load_customized(
    self, component_model_path: str, server_args: ServerArgs, module_name: str
):
    """
Collaborator:

Could you restore the method docstring?

(Resolved review threads on python/sglang/multimodal_gen/runtime/loader/component_loader.py, python/sglang/multimodal_gen/runtime/pipelines_core/stages/decoding.py, and python/sglang/multimodal_gen/configs/pipeline_configs/base.py.)
baonudesifeizhai force-pushed the feature/support-distilled-vae-generic branch from bfee256 to c478270 on December 2, 2025
baonudesifeizhai (Author) commented:

Works normally now (generated image attached).

trust_remote_code=server_args.trust_remote_code,
)
vae = vae.to(device=target_device, dtype=vae_dtype)
return vae.eval()
Collaborator:

/gemini any idea for this part

gemini-code-assist (Contributor) replied:

This new block of code is essential for supporting custom VAE architectures, such as the fal/FLUX.2-Tiny-AutoEncoder mentioned in the pull request description. These custom VAEs often define their loading mechanism via an auto_map entry in their configuration, which points to a specific module and class name.

The importlib logic dynamically loads the specified VAE class from a Python file within the model's directory. This allows the system to correctly instantiate and use VAEs that don't conform to a standard, predefined structure, leveraging HuggingFace's from_pretrained method with trust_remote_code for flexibility. This approach is robust for handling such specialized models.
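A minimal sketch of that auto_map loading pattern, assuming a config layout like {"auto_map": {"AutoModel": "modeling_taef.TinyAutoEncoder"}}; the key, file, and class names are hypothetical and the merged implementation may differ:

```python
import importlib.util
import json
import os

def load_custom_vae(component_model_path: str, torch_dtype):
    with open(os.path.join(component_model_path, "config.json")) as f:
        config = json.load(f)
    # e.g. "modeling_taef.TinyAutoEncoder" -> module file stem + class name
    module_file, class_name = config["auto_map"]["AutoModel"].rsplit(".", 1)
    # Import the Python file shipped alongside the checkpoint.
    spec = importlib.util.spec_from_file_location(
        module_file, os.path.join(component_model_path, f"{module_file}.py")
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # from_pretrained comes from the checkpoint's own class definition.
    vae_cls = getattr(module, class_name)
    return vae_cls.from_pretrained(component_model_path, torch_dtype=torch_dtype)
```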

mickqian (Collaborator) left a review:

LGTM


component_model_path = os.path.join(self.model_path, load_module_name)
# Use custom VAE path if provided, otherwise use default path
if module_name == "vae" and server_args.vae_path is not None:
Collaborator:

Better to make this a separate function for future custom VAEs; a possible shape is sketched below.
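One possible shape for that helper, reusing the server_args.vae_path check from the snippet above (resolve_component_path is a hypothetical name, not the merged code):

```python
import os

def resolve_component_path(model_path: str, module_name: str, server_args) -> str:
    # A custom VAE path takes precedence when provided; it may be a local
    # directory or a HuggingFace model ID resolved by the downloader.
    if module_name == "vae" and server_args.vae_path is not None:
        return server_args.vae_path
    return os.path.join(model_path, module_name)
```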

mickqian (Collaborator) commented Dec 2, 2025:

/tag-and-rerun-ci

github-actions bot added the run-ci label on Dec 2, 2025
baonudesifeizhai (Author) commented:

So what's next? It looks like all checks have passed.

mickqian merged commit f764c69 into sgl-project:main on Dec 3, 2025 (47 checks passed)
mickqian (Collaborator) commented Dec 3, 2025:

Could you add this arg to cli.md?

baonudesifeizhai (Author) replied:

> Could you add this arg to cli.md?

#14355

yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request on Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request on Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request on Dec 8, 2025
(each commit co-authored by BBuf <1182563586@qq.com> and Mick <mickjagger19@icloud.com>)
Labels: diffusion SGLang Diffusion, run-ci