
[Bug] Flux2 Klein uses incorrect max_length=77 instead of 512 for prompt tokenization #21372

@yikaizhu-baseten


Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Description

Flux2KleinPipelineConfig uses an effective max_length of 77 for prompt tokenization, while the reference implementation in Hugging Face diffusers uses 512. Because truncation=True is set, any prompt longer than 77 tokens is silently cut off, which degrades generation quality for long text inputs.
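
To make the impact concrete, here is a minimal, illustrative sketch of the truncation (the tokenizer checkpoint is a stand-in for demonstration, not the one Flux2 Klein actually loads):

# Stand-in tokenizer to demonstrate the effect of max_length on long prompts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Build a prompt that is comfortably longer than 77 tokens.
long_prompt = " ".join(["a highly detailed scene description"] * 40)

truncated = tokenizer(long_prompt, max_length=77, padding="max_length", truncation=True)
full = tokenizer(long_prompt, max_length=512, padding="max_length", truncation=True)

print(len(truncated["input_ids"]))  # 77  -> everything past token 77 is dropped
print(len(full["input_ids"]))       # 512 -> the whole prompt is preserved

With max_length=77, the text encoder never sees the tail of the prompt, which matches the quality degradation described above.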

Root Cause

Flux2KleinPipelineConfig inherits text_encoder_extra_args from FluxPipelineConfig, which sets max_length=77 (originally for the CLIP text encoder in Flux 1):

# FluxPipelineConfig (base class)
text_encoder_extra_args: list[dict] = field(
    default_factory=lambda: [
        dict(
            max_length=77,
            padding="max_length",
            truncation=True,
            ...
        ),
        None,
    ]
)

Neither Flux2PipelineConfig nor Flux2KleinPipelineConfig overrides this field.

In Flux2KleinPipelineConfig.tokenize_prompt, the max_length is read from tok_kwargs:

max_length = tok_kwargs.pop("max_length", 512)  # default is 512, but tok_kwargs has 77

The default of 512 is correct, but it is never reached because the inherited text_encoder_extra_args[0] supplies max_length=77.
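
The mechanics are easy to miss: dict.pop(key, default) only falls back to the default when the key is absent, so the inherited value always shadows the 512. A minimal illustration:

# tok_kwargs as inherited from FluxPipelineConfig.text_encoder_extra_args[0]
tok_kwargs = {"max_length": 77, "padding": "max_length", "truncation": True}

# The fallback 512 is effectively dead code: the key is present, so pop returns 77.
max_length = tok_kwargs.pop("max_length", 512)
print(max_length)  # 77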

Note that Flux2PipelineConfig.tokenize_prompt (for non-Klein Flux2) avoids this issue by hardcoding max_length=512 directly:

# Flux2PipelineConfig.tokenize_prompt — correct
inputs = tokenizer.apply_chat_template(
    prompts,
    ...
    max_length=512,  # hardcoded, ignores tok_kwargs
)

Reference

In the Hugging Face diffusers implementation, Flux2KleinPipeline explicitly uses max_sequence_length=512.
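
For illustration, a hedged sketch of the diffusers call site (only the class name and max_sequence_length=512 come from the reference above; the variable names and prompt are placeholders):

# pipe is assumed to be an already-loaded diffusers Flux2KleinPipeline instance.
image = pipe(
    prompt=long_prompt,
    max_sequence_length=512,  # diffusers' explicit default for this pipeline
).images[0]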

Suggested Fix

Either of the following:

  1. Override text_encoder_extra_args in Flux2PipelineConfig (or Flux2KleinPipelineConfig) to use max_length=512, since both Flux2 variants use a single text encoder (Mistral/Qwen3) that supports longer sequences. A sketch follows this list.
  2. Hardcode max_length=512 in Flux2KleinPipelineConfig.tokenize_prompt (similar to what Flux2PipelineConfig already does), removing the dependency on tok_kwargs for this value.
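
A minimal sketch of option 1, with FluxPipelineConfig reduced to a stand-in containing only the relevant field (the real class lives in sglang; the tokenizer kwargs are carried over from the snippet quoted above):

from dataclasses import dataclass, field

# Stand-in for sglang's FluxPipelineConfig, reduced to the field in question.
@dataclass
class FluxPipelineConfig:
    text_encoder_extra_args: list = field(
        default_factory=lambda: [
            dict(max_length=77, padding="max_length", truncation=True),
            None,
        ]
    )

@dataclass
class Flux2PipelineConfig(FluxPipelineConfig):
    # Override the inherited CLIP-era default: Flux2's single text encoder
    # (Mistral/Qwen3) supports longer sequences, so tokenize up to 512 tokens.
    text_encoder_extra_args: list = field(
        default_factory=lambda: [
            dict(max_length=512, padding="max_length", truncation=True),
        ]
    )

print(Flux2PipelineConfig().text_encoder_extra_args[0]["max_length"])  # 512

The list shape (one entry vs. the base class's two) should match however the Flux2 pipelines index their text encoders; option 2 sidesteps that question entirely by not reading this value from tok_kwargs.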

Reproduction

Reproducible on an H100 with the latest sglang (sglang dev image). Run the Flux2 Klein pipeline with any prompt longer than 77 tokens; per the root cause above, the tokenized input is truncated to 77 tokens.

Environment

H100 GPU; latest sglang (sglang dev image).
