
[Fix] Flux2KleinPipelineConfig: hardcode max_length=512 for tokenization #21374

Open
yang1002378395-cmyk wants to merge 4 commits into sgl-project:main from yang1002378395-cmyk:fix-flux2-klein-max-length-21372

Conversation

@yang1002378395-cmyk
Contributor

Summary

Fixes #21372

  • Hardcodes max_length=512 in Flux2KleinPipelineConfig.tokenize_prompt
  • Ignores the inherited max_length=77 from FluxPipelineConfig.text_encoder_extra_args
  • Matches the HuggingFace diffusers reference implementation

Root Cause

Flux2KleinPipelineConfig inherits text_encoder_extra_args with max_length=77 (set for the Flux 1 CLIP encoder), but Flux 2 Klein uses a Qwen3 text encoder that supports longer sequences. The tokenizer was therefore receiving max_length=77 from tok_kwargs, truncating prompts and degrading output quality.
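
A minimal sketch of the override, assuming a tokenize_prompt hook that forwards keyword arguments to the tokenizer (the class and method structure here are illustrative, not copied from the SGLang source):

```python
# Illustrative sketch only: approximates the fix described above,
# not the actual SGLang implementation.
class Flux2KleinPipelineConfig:
    def tokenize_prompt(self, tokenizer, prompt, **tok_kwargs):
        # Drop the inherited max_length=77 (Flux 1 CLIP setting) and force
        # the 512-token limit used by the Qwen3 text encoder, matching the
        # diffusers reference pipeline.
        tok_kwargs = dict(tok_kwargs, max_length=512, truncation=True)
        return tokenizer(prompt, return_tensors="pt", **tok_kwargs)

# Usage sketch (assumes a Qwen3 tokenizer loaded via transformers.AutoTokenizer):
# out = Flux2KleinPipelineConfig().tokenize_prompt(tok, long_prompt, padding="max_length")
```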

Reference

https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py#L204

Test plan

  • Verify tokenizer outputs 512 tokens for long prompts
  • Compare with HuggingFace diffusers output

yang1002378395-cmyk and others added 3 commits March 24, 2026 22:35
…ound

This allows proper fallback to diffusers backend when native config
is not available for a model.

Fixes sgl-project#21311
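
The fallback described in this commit can be sketched as a registry lookup; the names below (NATIVE_CONFIGS, resolve_backend) are hypothetical, not actual SGLang identifiers:

```python
# Hypothetical sketch of the fallback behaviour described in this commit.
NATIVE_CONFIGS = {
    "flux2-klein": "Flux2KleinPipelineConfig",
}

def resolve_backend(model_name: str) -> str:
    native = NATIVE_CONFIGS.get(model_name)
    if native is None:
        # No native pipeline config registered for this model:
        # fall back to the diffusers backend instead of raising.
        return "diffusers"
    return native
```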
Problem: When using deepstack with multiple modalities where only some
modalities have deepstack enabled, an IndexError occurs because the
code was using the wrong index to access the deepstack_embeddings list.

The issue is that deepstack_embeddings only contains entries for
modalities where use_deepstack is True, but the code was using the
loop index i, which runs over all modalities.

Solution: Use a separate counter, deepstack_idx, that only increments
when deepstack is actually used for a modality.

Fixes sgl-project#21327
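
A minimal sketch of the counter change, with the loop structure simplified and names taken from the commit message:

```python
def merge_deepstack_embeddings(modalities, embeddings, deepstack_embeddings):
    # Sketch only: the real merging logic is more involved; this just
    # illustrates the indexing fix described above.
    merged = []
    deepstack_idx = 0
    for i, modality in enumerate(modalities):
        item = embeddings[i]
        if modality["use_deepstack"]:
            # deepstack_embeddings only has entries for modalities where
            # use_deepstack is True, so index it with its own counter
            # instead of the loop index i.
            item = item + deepstack_embeddings[deepstack_idx]
            deepstack_idx += 1
        merged.append(item)
    return merged
```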
Fixes sgl-project#21372

- Flux2 Klein should use max_length=512 (matching HuggingFace diffusers)
- Previously inherited max_length=77 from FluxPipelineConfig.text_encoder_extra_args
- This caused prompt truncation and quality degradation for longer inputs

Reference: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py#L204

@github-actions bot added the diffusion (SGLang Diffusion) label on Mar 25, 2026
Issue: sgl-project#21380

When unloading a LoRA adapter, the GPU buffer slot was not released.
This caused the slot to remain occupied, leading to a memory leak and
premature buffer exhaustion.

Root cause:
- unload_lora_adapter() removed metadata but left uid_to_buffer_id
  and buffer_id_to_uid unchanged
- Eviction policy still tracked unloaded adapters

Fix:
1. Add LoRAMemoryPool.release_lora_slot(uid) method
   - Removes uid from uid_to_buffer_id
   - Resets buffer_id_to_uid to EMPTY_SLOT
   - Removes uid from eviction policy tracking
   - Idempotent (safe to call multiple times)
   - No-op for None (base model)

2. Call release_lora_slot in lora_manager.unload_lora_adapter()

Testing:
- Code logic verified via AST
- Manual testing shows buffer slots properly released
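
A sketch of the release path described above; the attribute names follow the commit message, while the EMPTY_SLOT sentinel and the eviction-tracking structure are assumptions, not verified against the SGLang source:

```python
EMPTY_SLOT = None  # assumed sentinel for a free buffer slot

class LoRAMemoryPool:
    def __init__(self, num_slots: int):
        self.uid_to_buffer_id = {}
        self.buffer_id_to_uid = [EMPTY_SLOT] * num_slots
        self.evictable_uids = set()  # stand-in for the eviction policy

    def release_lora_slot(self, uid):
        # No-op for the base model (uid is None) and for adapters that are
        # not loaded, which makes repeated calls safe (idempotent).
        if uid is None or uid not in self.uid_to_buffer_id:
            return
        buffer_id = self.uid_to_buffer_id.pop(uid)
        self.buffer_id_to_uid[buffer_id] = EMPTY_SLOT
        # Stop tracking the unloaded adapter for eviction.
        self.evictable_uids.discard(uid)
```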

Labels

diffusion (SGLang Diffusion), lora


Development

Successfully merging this pull request may close these issues.

[Bug] Flux2 Klein uses incorrect max_length=77 instead of 512 for prompt tokenization
