
[Bug] Flux2 Klein uses incorrect max_length=77 instead of 512 for prompt tokenization #21372

@yikaizhu-baseten


Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Description

Flux2KleinPipelineConfig uses an effective max_length of 77 for prompt tokenization, while the reference implementation in Hugging Face diffusers uses 512. Because truncation=True is set, any prompt longer than 77 tokens is silently cut off, which degrades generation quality for long text inputs.
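
To make the impact concrete, here is a minimal, illustrative sketch of the truncation (the tokenizer checkpoint is a stand-in for demonstration, not the one Flux2 Klein actually loads):

# Stand-in tokenizer to demonstrate the effect of max_length on long prompts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Build a prompt that is comfortably longer than 77 tokens.
long_prompt = " ".join(["a highly detailed scene description"] * 40)

truncated = tokenizer(long_prompt, max_length=77, padding="max_length", truncation=True)
full = tokenizer(long_prompt, max_length=512, padding="max_length", truncation=True)

print(len(truncated["input_ids"]))  # 77  -> everything past token 77 is dropped
print(len(full["input_ids"]))       # 512 -> the whole prompt is preserved

With max_length=77, the text encoder never sees the tail of the prompt, which matches the quality degradation described above.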

Root Cause

Flux2KleinPipelineConfig inherits text_encoder_extra_args from FluxPipelineConfig, which sets max_length=77 (originally for the CLIP text encoder in Flux 1):

# FluxPipelineConfig (base class)
text_encoder_extra_args: list[dict] = field(
    default_factory=lambda: [
        dict(
            max_length=77,
            padding="max_length",
            truncation=True,
            ...
        ),
        None,
    ]
)

Neither Flux2PipelineConfig nor Flux2KleinPipelineConfig overrides this field.

In Flux2KleinPipelineConfig.tokenize_prompt, the max_length is read from tok_kwargs:

max_length = tok_kwargs.pop("max_length", 512)  # default is 512, but tok_kwargs has 77

The default of 512 is correct, but it is never reached because the inherited text_encoder_extra_args[0] supplies max_length=77.
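
The mechanics are easy to miss: dict.pop(key, default) only falls back to the default when the key is absent, so the inherited value always shadows the 512. A minimal illustration:

# tok_kwargs as inherited from FluxPipelineConfig.text_encoder_extra_args[0]
tok_kwargs = {"max_length": 77, "padding": "max_length", "truncation": True}

# The fallback 512 is effectively dead code: the key is present, so pop returns 77.
max_length = tok_kwargs.pop("max_length", 512)
print(max_length)  # 77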

Note that Flux2PipelineConfig.tokenize_prompt (for non-Klein Flux2) avoids this issue by hardcoding max_length=512 directly:

# Flux2PipelineConfig.tokenize_prompt — correct
inputs = tokenizer.apply_chat_template(
    prompts,
    ...
    max_length=512,  # hardcoded, ignores tok_kwargs
)

Reference

In the Hugging Face diffusers implementation, Flux2KleinPipeline explicitly uses max_sequence_length=512.
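
For illustration, a hedged sketch of the diffusers call site (only the class name and max_sequence_length=512 come from the reference above; the variable names and prompt are placeholders):

# pipe is assumed to be an already-loaded diffusers Flux2KleinPipeline instance.
image = pipe(
    prompt=long_prompt,
    max_sequence_length=512,  # diffusers' explicit default for this pipeline
).images[0]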

Suggested Fix

Either of the following:

  1. Override text_encoder_extra_args in Flux2PipelineConfig (or Flux2KleinPipelineConfig) to use max_length=512, since both Flux2 variants use a single text encoder (Mistral/Qwen3) that supports longer sequences. A sketch follows this list.
  2. Hardcode max_length=512 in Flux2KleinPipelineConfig.tokenize_prompt (similar to what Flux2PipelineConfig already does), removing the dependency on tok_kwargs for this value.
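
A minimal sketch of option 1, with FluxPipelineConfig reduced to a stand-in containing only the relevant field (the real class lives in sglang; the tokenizer kwargs are carried over from the snippet quoted above):

from dataclasses import dataclass, field

# Stand-in for sglang's FluxPipelineConfig, reduced to the field in question.
@dataclass
class FluxPipelineConfig:
    text_encoder_extra_args: list = field(
        default_factory=lambda: [
            dict(max_length=77, padding="max_length", truncation=True),
            None,
        ]
    )

@dataclass
class Flux2PipelineConfig(FluxPipelineConfig):
    # Override the inherited CLIP-era default: Flux2's single text encoder
    # (Mistral/Qwen3) supports longer sequences, so tokenize up to 512 tokens.
    text_encoder_extra_args: list = field(
        default_factory=lambda: [
            dict(max_length=512, padding="max_length", truncation=True),
        ]
    )

print(Flux2PipelineConfig().text_encoder_extra_args[0]["max_length"])  # 512

The list shape (one entry vs. the base class's two) should match however the Flux2 pipelines index their text encoders; option 2 sidesteps that question entirely by not reading this value from tok_kwargs.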

Reproduction

Reproducible on an H100 with the latest sglang (sglang dev image). Run the Flux2 Klein pipeline with any prompt longer than 77 tokens; per the root cause above, the tokenized input is truncated to 77 tokens.

Environment

H100 GPU; latest sglang (sglang dev image).
