
[Diffusion] add FireRed-Image-Edit models#20862

Merged
mickqian merged 4 commits into sgl-project:main from yuumn:add_firered-image-edit
Mar 23, 2026

Conversation

@yuumn
Contributor

@yuumn yuumn commented Mar 18, 2026

Motivation

The FireRed team from Xiaohongshu has released two image editing models with the same architecture, FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1, whose model structure matches the Qwen-Image-Edit-2509 series.
However, deploying these two models directly with SGLang Diffusion for inference ran into a few issues. This PR enables SGLang Diffusion to properly support inference and deployment for FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1.

Modifications

The main changes consist of the following two code additions:

  1. The same pipeline_config and sampling_param as Qwen-Image-Edit-2509 are used. Qwen-Image-Edit-2511's QwenImageEditPlus_2511_PipelineConfig is not used because the transformer/config.json files of the two FireRed-Image-Edit models do not set "zero_cond_t": true, which matches Qwen-Image-Edit-2509, whereas the Qwen-Image-Edit-2511 model does set "zero_cond_t": true.
  2. In FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1, multimodal token IDs such as image_token_id are placed inside "text_config" in text_encoder/config.json, unlike the Qwen-Image-Edit series, where they are defined at the top level. This causes the following error:
 File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/models/encoders/qwen2_5vl.py", line 836, in get_placeholder_mask
    special_image_mask = input_ids == self.config.image_token_id
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/configs/models/base.py", line 27, in __getattr__
    raise AttributeError(
AttributeError: 'QwenImageArchConfig' object has no attribute 'image_token_id'. Did you mean: 'pad_token_id'?
  • Solution: Following the approach discussed in issue 15630, the relevant token_id fields were added to the QwenImageArchConfig class (a minimal sketch follows this list). This does not affect inference for the Qwen-Image series models, since the same token_id values are used. Thanks to benihime91.
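
A minimal sketch of what the QwenImageArchConfig change amounts to, for reference only. The real class in configs/models/encoders/qwen_image.py inherits from a shared base config rather than being a standalone dataclass, and the default values shown here are the standard Qwen2.5-VL special-token IDs, which are an assumption and may differ from the exact defaults added in this PR.

from dataclasses import dataclass

@dataclass
class QwenImageArchConfig:
    # ... existing architecture fields omitted ...
    # Multimodal special-token IDs exposed on the config itself, so that
    # self.config.image_token_id resolves even when the checkpoint nests
    # them under "text_config" in text_encoder/config.json.
    vision_start_token_id: int = 151652
    vision_end_token_id: int = 151653
    vision_token_id: int = 151654
    image_token_id: int = 151655
    video_token_id: int = 151656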

Modified files:

  • sglang/python/sglang/multimodal_gen/registry.py: Added registration for the FireRed-Image-Edit models, using the same pipeline_config and sampling_param as Qwen-Image-Edit-2509 (see the zero_cond_t check sketched below).
  • sglang/python/sglang/multimodal_gen/configs/models/encoders/qwen_image.py: Added image_token_id and other token IDs to the QwenImageArchConfig class.
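
As a quick illustration of why the Qwen-Image-Edit-2509 configuration is reused instead of Qwen-Image-Edit-2511's, the snippet below checks the zero_cond_t flag in a locally downloaded checkpoint. The local directory name is a placeholder; only the flag name and the expected values come from the description above.

import json
from pathlib import Path

# Placeholder path to a locally downloaded FireRed-Image-Edit checkpoint.
ckpt_dir = Path("FireRed-Image-Edit-1.1")

with open(ckpt_dir / "transformer" / "config.json") as f:
    transformer_cfg = json.load(f)

# FireRed-Image-Edit-1.0/1.1 do not set "zero_cond_t": true, matching
# Qwen-Image-Edit-2509; Qwen-Image-Edit-2511 does set it, which is why
# QwenImageEditPlus_2511_PipelineConfig is not reused here.
print("zero_cond_t:", transformer_cfg.get("zero_cond_t", False))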

Accuracy Tests

I tested it on 4 H20 GPUs using the following command.

sglang serve --model-path FireRedTeam/FireRed-Image-Edit-1.1 --num-gpus 4 --tp-size 2 --enable-cfg-parallel

I tested the service using the following script and obtained the correct results.

import base64
from openai import OpenAI

client = OpenAI(
    api_key="<api-key>",  
    base_url="http://127.0.0.1:30000/v1",  
)

result = client.images.edit(
    image=[
        open("input1.png", "rb"),
        open("input2.png", "rb"),
    ],
    model="FireRed-Image-Edit-1.1",
    prompt="Replace the model in Figure 1 with the long dress and high-top canvas shoes from Figure 2, maintaining the original pose and accessories, and ensuring overall style consistency.",
    size='1024x1024', # WxH 
    stream=False,
    output_format='png',
    extra_body={
        "num_inference_steps": 40, 
        "guidance_scale": 4.0, 
        "true_cfg_scale": 4.0, 
        "negative_prompt": " ",
        "seed": 42,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

with open("output.png", "wb") as f:
    f.write(image_bytes)

[03-18 20:30:24] Starting FastAPI server.
[2026-03-18 20:30:24] INFO:     Started server process [358190]
[2026-03-18 20:30:24] INFO:     Waiting for application startup.
[03-18 20:30:24] ZMQ Broker is listening for offline jobs on tcp://*:30001
[2026-03-18 20:30:24] INFO:     Application startup complete.
[2026-03-18 20:30:24] INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
[03-18 20:37:38] Diffusers version: 0.36.0.dev0
[03-18 20:37:38] Sampling params:
                       width: 1024
                      height: 1024
                  num_frames: 1
                         fps: 24
                      prompt: <redacted, len=177>
                  neg_prompt: <redacted, len=1>
                        seed: 42
                 infer_steps: 40
      num_outputs_per_prompt: 1
              guidance_scale: 4.0
     embedded_guidance_scale: 6.0
                    n_tokens: None
                  flow_shift: None
                  image_path: ['inputs/uploads/a5306dd5-c7da-46e9-9534-d78f886f46e1_0_input1.png', 'inputs/uploads/a5306dd5-c7da-46e9-9534-d78f886f46e1_1_input2.png']
                 save_output: True
            output_file_path: outputs/a5306dd5-c7da-46e9-9534-d78f886f46e1.png
        
[03-18 20:37:38] Running pipeline stages: ['input_validation_stage', 'image_encoding_stage', 'image_v_a_e_encoding_stage', 'latent_preparation_stage', 'timestep_preparation_stage', 'denoising_stage', 'decoding_stage']
[03-18 20:37:38] [InputValidationStage] started...
[03-18 20:37:38] [InputValidationStage] finished in 0.1990 seconds
[03-18 20:37:38] [ImageEncodingStage] started...
[03-18 20:37:40] [ImageEncodingStage] finished in 2.2332 seconds
[03-18 20:37:40] [ImageVAEEncodingStage] started...
[03-18 20:37:41] [ImageVAEEncodingStage] finished in 0.6484 seconds
[03-18 20:37:41] [LatentPreparationStage] started...
[03-18 20:37:41] [LatentPreparationStage] finished in 0.0022 seconds
[03-18 20:37:41] [TimestepPreparationStage] started...
[03-18 20:37:41] [TimestepPreparationStage] finished in 0.0009 seconds
[03-18 20:37:41] [DenoisingStage] started...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:19<00:00,  3.48s/it]
[03-18 20:40:00] [DenoisingStage] average time per step: 3.4784 seconds
[03-18 20:40:00] [DenoisingStage] finished in 139.1436 seconds
[03-18 20:40:00] [DecodingStage] started...
[03-18 20:40:01] [DecodingStage] finished in 1.2831 seconds
[03-18 20:40:04] Peak GPU memory: 49.23 GB, Peak allocated: 46.86 GB, Memory pool overhead: 2.37 GB (4.8%), Remaining GPU memory at peak: 91.17 GB. Components that could stay resident (based on the last request workload): ['text_encoder', 'transformer']. Related offload server args to disable: --dit-cpu-offload, --text-encoder-cpu-offload
[03-18 20:40:04] Output saved to outputs/a5306dd5-c7da-46e9-9534-d78f886f46e1.png
[03-18 20:40:05] Pixel data generated successfully in 147.28 seconds
[03-18 20:40:05] Completed batch processing. Generated 1 outputs in 147.28 seconds
[03-18 20:40:05] Peak memory usage: 50414.00 MB
[2026-03-18 20:40:05] INFO:     127.0.0.1:60704 - "POST /v1/images/edits HTTP/1.1" 200 OK

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

github-actions bot added the diffusion (SGLang Diffusion) label on Mar 18, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1 models into SGLang Diffusion. The changes primarily involve aligning their configuration with existing Qwen-Image-Edit models and modifying the QwenImageArchConfig to correctly parse multimodal token IDs, ensuring seamless inference and deployment for these new image editing models.

Highlights

  • FireRed-Image-Edit Model Support: Added support for the FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1 models from the FireRed team, enabling their deployment and inference within SGLang Diffusion.
  • Configuration Alignment: Configured the new FireRed models to use the same pipeline and sampling parameters as the Qwen-Image-Edit-2509 series, addressing differences in the zero_cond_t setting in their transformer/config.json.
  • Multimodal Token ID Fix: Resolved an AttributeError by explicitly adding vision_start_token_id, vision_end_token_id, vision_token_id, image_token_id, and video_token_id to the QwenImageArchConfig class, as these were nested differently in FireRed models' text_encoder/config.json compared to Qwen-Image-Edit models.


@yuumn
Contributor Author

yuumn commented Mar 18, 2026

/tag-run-ci-label

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1 models. The changes are well-contained and justified. Specifically, token IDs are added to QwenImageArchConfig to accommodate differences in the new models' configuration files, and the models are registered in registry.py using appropriate existing configurations from a similar model. The implementation is sound and aligns with the existing codebase structure. I have reviewed the changes and find them to be correct.

"FireRedTeam/FireRed-Image-Edit-1.0",
"FireRedTeam/FireRed-Image-Edit-1.1",
],
model_detectors=[lambda hf_id: "firered-image-edit" in hf_id.lower()],
Collaborator

Do we still need this? We're deprecating model_detectors.

Contributor Author

I tested removing the model_detectors and verified that it works fine, so I have pushed a new commit to remove it.

@mickqian
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

4 similar comments

@yhyang201
Collaborator

/rerun-failed-ci

4 similar comments

@mickqian mickqian merged commit 889e848 into sgl-project:main Mar 23, 2026
70 of 73 checks passed
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
Co-authored-by: yuumn <1010797597@qq.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: yuumn <1010797597@qq.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: yuumn <1010797597@qq.com>

Labels

diffusion (SGLang Diffusion), run-ci
