Feature/support distilled vae generic #14195
Conversation
This reverts commit 6f48bbd.
- Add dynamic VAE type detection based on actual attributes (`bn` presence)
- Support both standard Flux2 VAE and distilled VAE automatically
- Add `--vae-path` CLI argument for custom VAE models
- Handle `DecoderOutput` return type from VAE decode
- Automatically adapt `preprocess_decoding` and `get_decode_scale_and_shift` based on VAE type
- Support any HuggingFace VAE model with access permissions
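The attribute-based detection described above might look like the following sketch. The class and attribute names here (`StandardVAE`, `DistilledVAE`, the `bn` check) are illustrative stand-ins, not the PR's actual code:

```python
class StandardVAE:
    """Stand-in for the standard Flux2 VAE (no batch-norm attribute)."""


class DistilledVAE:
    """Stand-in for a distilled VAE such as FLUX.2-Tiny-AutoEncoder."""

    bn = object()  # distilled VAEs expose a batch-norm attribute


def is_distilled_vae(vae) -> bool:
    # Detect by what the module actually has, not by its class name,
    # so any fine-tuned VAE with the same structure is handled too.
    return hasattr(vae, "bn")
```

Checking structure rather than class names is what lets "any HuggingFace VAE model" work without hard-coding a model list.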
Code Review
This pull request introduces support for distilled VAEs, specifically fal/FLUX.2-Tiny-AutoEncoder, by adding a new Flux2FinetunedPipelineConfig and implementing dynamic VAE loading and adaptive preprocessing logic. Key changes include modifying FluxVAEConfig to dynamically calculate vae_scale_factor, updating PipelineConfig.from_kwargs to auto-detect and use the fine-tuned VAE configuration, and enhancing VAELoader to support custom VAE classes loaded via HuggingFace's auto_map. The preprocess_decoding and get_decode_scale_and_shift methods in Flux2PipelineConfig and Flux2FinetunedPipelineConfig have been adapted to handle the specific requirements of distilled VAEs, including 5D latent support and conditional unpatchifying. Overall, the changes are well-structured and address the motivation effectively.
mickqian left a comment:
Overall LGTM.
We should add it to CI in this PR.
```python
def load_customized(
    self, component_model_path: str, server_args: ServerArgs, module_name: str
):
    """
```
Could you complete the method docstring?
```python
    trust_remote_code=server_args.trust_remote_code,
)
vae = vae.to(device=target_device, dtype=vae_dtype)
return vae.eval()
```
/gemini any idea for this part
This new block of code is essential for supporting custom VAE architectures, such as the fal/FLUX.2-Tiny-AutoEncoder mentioned in the pull request description. These custom VAEs often define their loading mechanism via an auto_map entry in their configuration, which points to a specific module and class name.
The importlib logic dynamically loads the specified VAE class from a Python file within the model's directory. This allows the system to correctly instantiate and use VAEs that don't conform to a standard, predefined structure, leveraging HuggingFace's from_pretrained method with trust_remote_code for flexibility. This approach is robust for handling such specialized models.
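The `auto_map` flow described above can be sketched roughly as follows. This is a hypothetical, simplified version of what `VAELoader` does: the `AutoModel` key and the `modeling_file.ClassName` value format follow HuggingFace's `auto_map` convention, but the function name and error handling are illustrative only:

```python
import importlib.util
import json
import os


def load_custom_vae_class(model_dir: str):
    """Resolve a VAE class declared via an `auto_map` entry in config.json.

    HF custom models ship a .py file next to the weights; `auto_map`
    values take the form "modeling_file.ClassName".
    """
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    module_file, class_name = config["auto_map"]["AutoModel"].split(".")

    # Dynamically import the module shipped inside the model directory.
    spec = importlib.util.spec_from_file_location(
        module_file, os.path.join(model_dir, module_file + ".py")
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)
```

The returned class can then be instantiated with `from_pretrained(model_dir)` when `trust_remote_code` is enabled, which is what lets non-standard VAEs like FLUX.2-Tiny-AutoEncoder load without a predefined class in the codebase.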
```python
component_model_path = os.path.join(self.model_path, load_module_name)
# Use custom VAE path if provided, otherwise use default path
if module_name == "vae" and server_args.vae_path is not None:
```
Better to make this a separate function for future custom VAEs.
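The suggested extraction could look something like this. Names here (`resolve_component_path`, the `vae_path` parameter) are illustrative, mirroring the quoted diff rather than reproducing the PR's actual helper:

```python
import os


def resolve_component_path(model_path: str, module_name: str, vae_path=None) -> str:
    """Return the load path for a pipeline component.

    A user-supplied --vae-path overrides the VAE bundled with the model;
    all other components resolve to a subdirectory of model_path.
    """
    if module_name == "vae" and vae_path is not None:
        # Custom VAE (e.g. a distilled one) takes precedence.
        return vae_path
    return os.path.join(model_path, module_name)
```

Isolating the override keeps the main loading loop free of per-component special cases as more custom components are added.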
/tag-and-rerun-ci

Emmm, so what's next? Looks like all checks passed?

Could you add this arg to cli.md?
Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Mick <mickjagger19@icloud.com>

Motivation
Add support for fal/FLUX.2-Tiny-AutoEncoder, a distilled VAE that doesn't work with the standard diffusers pipeline.
#14004
Modifications
- Added `Flux2FinetunedPipelineConfig` for distilled VAEs
- Auto-detect FLUX.2-Tiny-AutoEncoder and switch to the specialized config
- Handle patchified latents (128 channels) without unpatchify
- Skip external scaling (the VAE handles it internally)
- Support 5D latents for decoding
- Load custom VAE classes via `auto_map` from HuggingFace
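The latent-handling changes above can be sketched as a single adaptive preprocessing step. This is an illustrative NumPy mock-up (the PR operates on torch tensors, and the scale/shift constants here are Flux.1-style placeholders, not values taken from this PR):

```python
import numpy as np


def preprocess_decoding(latents: np.ndarray, distilled: bool) -> np.ndarray:
    """Adapt latents for VAE decode depending on the VAE type."""
    if latents.ndim == 5:
        # Collapse a leading frame dimension: (B, F, C, H, W) -> (B*F, C, H, W)
        b, f, c, h, w = latents.shape
        latents = latents.reshape(b * f, c, h, w)
    if distilled:
        # Distilled VAE consumes patchified 128-channel latents directly
        # and applies scale/shift internally, so nothing more to do.
        return latents
    # Standard path applies external scale/shift before decoding
    # (illustrative constants).
    scale, shift = 0.3611, 0.1159
    return latents / scale + shift
```

The branch on `distilled` is why both the standard Flux2 VAE and FLUX.2-Tiny-AutoEncoder can share one decode entry point.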
Accuracy Tests
Benchmarking and Profiling
sglang generate



```shell
sglang generate \
  --model-path black-forest-labs/FLUX.2-dev \
  --vae-path fal/FLUX.2-Tiny-AutoEncoder \
  --num-gpus 8 \
  --tp-size 8 \
  --prompt "beatiful women with long yellow hair,around 30 ages" \
  --width 384 \
  --height 384 \
  --trust-remote-code \
  --vae-precision bf16 \
  --vae-cpu-offload \
  --text-encoder-cpu-offload \
  --image-encoder-cpu-offload \
  --dit-cpu-offload \
  --pin-cpu-memory \
  --log-level debug
```
Checklist