[Fix] Flux2KleinPipelineConfig: hardcode max_length=512 for tokenization #21374
Open
yang1002378395-cmyk wants to merge 4 commits into sgl-project:main from
Conversation
…ound. This allows proper fallback to the diffusers backend when a native config is not available for a model. Fixes sgl-project#21311
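A minimal sketch of the fallback idea described in that commit message; the registry and function names below are assumptions for illustration, not sglang's actual API:

```python
# Illustrative only: return the diffusers backend when no native pipeline
# config is registered for a model, instead of failing outright.
def resolve_pipeline_config(model_id: str, native_configs: dict):
    config_cls = native_configs.get(model_id)
    if config_cls is None:
        # No native config available -> fall back to the diffusers backend.
        return {"backend": "diffusers", "model": model_id}
    return config_cls()
```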
Problem: When using deepstack with multiple modalities where only some modalities have deepstack enabled, an IndexError occurs because the code used the wrong index to access the deepstack_embeddings list. deepstack_embeddings only contains entries for modalities where use_deepstack is True, but the code indexed it with the loop index i, which counts all modalities.

Solution: Use a separate counter, deepstack_idx, that only increments when deepstack is actually used for a modality (see the sketch below).

Fixes sgl-project#21327
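A simplified sketch of the indexing change, assuming dict-based modality entries; the names use_deepstack, deepstack_embeddings, and deepstack_idx follow the commit message, while the surrounding helper structure is hypothetical:

```python
from typing import Any, Dict, List

def merge_multimodal_embeddings(
    modalities: List[Dict[str, Any]],
    deepstack_embeddings: List[Any],
) -> List[Any]:
    """deepstack_embeddings only holds entries for modalities with
    use_deepstack=True, so index it with its own counter, not the loop index."""
    outputs = []
    deepstack_idx = 0
    for i, modality in enumerate(modalities):
        item = modality["embedding"]
        if modality["use_deepstack"]:
            # Using `i` here raises IndexError once any earlier modality has
            # deepstack disabled; `deepstack_idx` stays aligned with the list.
            item = (item, deepstack_embeddings[deepstack_idx])
            deepstack_idx += 1
        outputs.append(item)
    return outputs
```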
Fixes sgl-project#21372

- Flux2 Klein should use max_length=512 (matching HuggingFace diffusers)
- Previously inherited max_length=77 from FluxPipelineConfig.text_encoder_extra_args
- This caused prompt truncation and quality degradation for longer inputs

Reference: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py#L204
Issue: sgl-project#21380

When unloading a LoRA adapter, the GPU buffer slot was not released. The slot remained occupied, leading to a memory leak and premature buffer exhaustion.

Root cause:
- unload_lora_adapter() removed metadata but left uid_to_buffer_id and buffer_id_to_uid unchanged
- The eviction policy still tracked unloaded adapters

Fix:
1. Add a LoRAMemoryPool.release_lora_slot(uid) method that removes uid from uid_to_buffer_id, resets buffer_id_to_uid to EMPTY_SLOT, removes uid from eviction-policy tracking, is idempotent (safe to call multiple times), and is a no-op for None (the base model) — see the sketch below
2. Call release_lora_slot in lora_manager.unload_lora_adapter()

Testing:
- Code logic verified via AST
- Manual testing shows buffer slots are properly released
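A simplified, hypothetical sketch of release_lora_slot as described in that commit message; the names uid_to_buffer_id, buffer_id_to_uid, and EMPTY_SLOT follow the text, while the class constructor and eviction-tracking field are assumptions:

```python
EMPTY_SLOT = None  # sentinel name taken from the commit message

class LoRAMemoryPool:
    def __init__(self, num_slots: int):
        self.uid_to_buffer_id: dict = {}
        self.buffer_id_to_uid: list = [EMPTY_SLOT] * num_slots
        self.eviction_tracked: set = set()

    def release_lora_slot(self, uid) -> None:
        """Free the GPU buffer slot held by `uid`.

        Idempotent (safe to call multiple times); no-op for None (base model).
        """
        if uid is None:
            return
        buffer_id = self.uid_to_buffer_id.pop(uid, None)
        if buffer_id is not None:
            self.buffer_id_to_uid[buffer_id] = EMPTY_SLOT
        self.eviction_tracked.discard(uid)
```

In this sketch, lora_manager.unload_lora_adapter() would call memory_pool.release_lora_slot(uid) after removing the adapter's metadata, so the slot and eviction bookkeeping are cleared together.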
Summary
Fixes #21372
- Hardcode max_length=512 in Flux2KleinPipelineConfig.tokenize_prompt
- Previously inherited max_length=77 from FluxPipelineConfig.text_encoder_extra_args

Root Cause

Flux2KleinPipelineConfig inherits text_encoder_extra_args with max_length=77 (for the Flux 1 CLIP encoder), but Flux 2 Klein uses Qwen3, which supports longer sequences. The tokenizer was receiving max_length=77 from tok_kwargs, truncating prompts and degrading quality.

Reference

https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py#L204
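A minimal sketch of the inheritance relationship and the override described above, assuming text_encoder_extra_args is a plain dict of tokenizer kwargs; this is not the literal sglang code:

```python
class FluxPipelineConfig:
    # Flux 1 pairs with a CLIP text encoder, hence the 77-token limit.
    text_encoder_extra_args = {"max_length": 77}

class Flux2KleinPipelineConfig(FluxPipelineConfig):
    # Flux 2 Klein uses Qwen3, which handles longer prompts; pin 512 to
    # match diffusers' pipeline_flux2_klein and stop truncating at 77 tokens.
    text_encoder_extra_args = {**FluxPipelineConfig.text_encoder_extra_args,
                               "max_length": 512}
```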
Test plan