Description
`Flux2KleinPipelineConfig` uses an effective `max_length` of 77 for prompt tokenization, while the reference implementation in Hugging Face diffusers uses 512. This likely truncates prompts and degrades generation quality for longer text inputs.
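A minimal sketch of the impact (the tokenizer id here is illustrative, not the pipeline's actual text encoder):

```python
from transformers import AutoTokenizer

# Illustrative tokenizer; the real pipeline uses its own text encoder's tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

prompt = "a highly detailed scene description, " * 30  # well past 77 tokens

short = tok(prompt, max_length=77, truncation=True)["input_ids"]
full = tok(prompt, max_length=512, truncation=True)["input_ids"]
print(len(short), len(full))  # 77 vs the full length; everything past token 77 is dropped
```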
Root Cause
`Flux2KleinPipelineConfig` inherits `text_encoder_extra_args` from `FluxPipelineConfig`, which sets `max_length=77` (originally for the CLIP text encoder in Flux 1):
```python
# FluxPipelineConfig (base class)
text_encoder_extra_args: list[dict] = field(
    default_factory=lambda: [
        dict(
            max_length=77,
            padding="max_length",
            truncation=True,
            ...
        ),
        None,
    ]
)
```
Neither `Flux2PipelineConfig` nor `Flux2KleinPipelineConfig` overrides this field.

In `Flux2KleinPipelineConfig.tokenize_prompt`, the `max_length` is read from `tok_kwargs`:
```python
max_length = tok_kwargs.pop("max_length", 512)  # default is 512, but tok_kwargs has 77
```
The default of 512 is correct, but it is never reached because the inherited `text_encoder_extra_args[0]` supplies `max_length=77`.
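The interaction is easy to miss because `dict.pop` only falls back to its default when the key is absent; a standalone illustration:

```python
# Inherited from FluxPipelineConfig: tok_kwargs already contains max_length=77.
tok_kwargs = {"max_length": 77, "padding": "max_length", "truncation": True}

# The 512 default only applies when the key is missing, so here it is dead code.
max_length = tok_kwargs.pop("max_length", 512)
print(max_length)  # 77
```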
Note that `Flux2PipelineConfig.tokenize_prompt` (for non-Klein Flux2) avoids this issue by hardcoding `max_length=512` directly:
```python
# Flux2PipelineConfig.tokenize_prompt — correct
inputs = tokenizer.apply_chat_template(
    prompts,
    ...
    max_length=512,  # hardcoded, ignores tok_kwargs
)
```
Reference
In the Hugging Face diffusers implementation, `Flux2KleinPipeline` explicitly uses `max_sequence_length=512` (see `pipeline_flux2_klein.py`).
Suggested Fix
Either:
- Override `text_encoder_extra_args` in `Flux2PipelineConfig` (or `Flux2KleinPipelineConfig`) to use `max_length=512`, since both Flux2 variants use a single text encoder (Mistral/Qwen3) that supports longer sequences. A sketch of this option follows the list.

Or:
- Hardcode `max_length=512` in `Flux2KleinPipelineConfig.tokenize_prompt` (similar to what `Flux2PipelineConfig` already does), removing the dependency on `tok_kwargs` for this value.
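A minimal sketch of the first option, assuming the dataclass layout shown under Root Cause (the exact dict contents and list shape should mirror whatever the base class actually defines):

```python
from dataclasses import dataclass, field

# Assumes FluxPipelineConfig is imported from the project as in the snippet above.
@dataclass
class Flux2PipelineConfig(FluxPipelineConfig):
    # Override the CLIP-era default of 77: the single Mistral/Qwen3 text
    # encoder used by both Flux2 variants supports 512-token prompts.
    text_encoder_extra_args: list[dict] = field(
        default_factory=lambda: [
            dict(
                max_length=512,
                padding="max_length",
                truncation=True,
            ),
        ]
    )
```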
Reproduction
Using an H100 with the latest sglang (the sglang dev image) will reproduce the issue.
Environment
H100 GPU; latest sglang (sglang dev image).