
[Diffusion] add FireRed-Image-Edit models#20862

Merged
mickqian merged 4 commits into sgl-project:main from yuumn:add_firered-image-edit
Mar 23, 2026

Conversation

@yuumn
Contributor

@yuumn yuumn commented Mar 18, 2026

Motivation

The FireRed team from Xiaohongshu has released two image editing models with the same architecture, FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1, whose model structure matches the Qwen-Image-Edit-2509 series.
However, deploying these two models directly with SGLang Diffusion for inference ran into a few issues. This PR enables SGLang Diffusion to properly support inference and deployment for FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1.

Modifications

The main changes consist of the following two code additions:

  1. The same pipeline_config and sampling_param as Qwen-Image-Edit-2509 are used. Qwen-Image-Edit-2511's QwenImageEditPlus_2511_PipelineConfig is not used because the transformer/config.json files of the two FireRed-Image-Edit models do not set "zero_cond_t": true, which matches Qwen-Image-Edit-2509, whereas the Qwen-Image-Edit-2511 model does set "zero_cond_t": true.
  2. In FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1, multimodal token IDs such as image_token_id are placed inside "text_config" in text_encoder/config.json, unlike the Qwen-Image-Edit series, where they are defined at the top level. This causes the following error:
 File "/sgl-workspace/sglang/python/sglang/multimodal_gen/runtime/models/encoders/qwen2_5vl.py", line 836, in get_placeholder_mask
    special_image_mask = input_ids == self.config.image_token_id
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/multimodal_gen/configs/models/base.py", line 27, in __getattr__
    raise AttributeError(
AttributeError: 'QwenImageArchConfig' object has no attribute 'image_token_id'. Did you mean: 'pad_token_id'?
  • Solution: Following the approach discussed in issue 15630, the relevant token_id fields were added to the QwenImageArchConfig class (a minimal sketch follows this list). This does not affect inference for the Qwen-Image series models, since the same token_id values are used. Thanks to benihime91.
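
A minimal sketch of what the QwenImageArchConfig change amounts to, for reference only. The real class in configs/models/encoders/qwen_image.py inherits from a shared base config rather than being a standalone dataclass, and the default values shown here are the standard Qwen2.5-VL special-token IDs, which are an assumption and may differ from the exact defaults added in this PR.

from dataclasses import dataclass

@dataclass
class QwenImageArchConfig:
    # ... existing architecture fields omitted ...
    # Multimodal special-token IDs exposed on the config itself, so that
    # self.config.image_token_id resolves even when the checkpoint nests
    # them under "text_config" in text_encoder/config.json.
    vision_start_token_id: int = 151652
    vision_end_token_id: int = 151653
    vision_token_id: int = 151654
    image_token_id: int = 151655
    video_token_id: int = 151656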

Modified files:

  • sglang/python/sglang/multimodal_gen/registry.py: Added registration for the FireRed-Image-Edit models, using the same pipeline_config and sampling_param as Qwen-Image-Edit-2509 (see the zero_cond_t check sketched below).
  • sglang/python/sglang/multimodal_gen/configs/models/encoders/qwen_image.py: Added image_token_id and other token IDs to the QwenImageArchConfig class.
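
As a quick illustration of why the Qwen-Image-Edit-2509 configuration is reused instead of Qwen-Image-Edit-2511's, the snippet below checks the zero_cond_t flag in a locally downloaded checkpoint. The local directory name is a placeholder; only the flag name and the expected values come from the description above.

import json
from pathlib import Path

# Placeholder path to a locally downloaded FireRed-Image-Edit checkpoint.
ckpt_dir = Path("FireRed-Image-Edit-1.1")

with open(ckpt_dir / "transformer" / "config.json") as f:
    transformer_cfg = json.load(f)

# FireRed-Image-Edit-1.0/1.1 do not set "zero_cond_t": true, matching
# Qwen-Image-Edit-2509; Qwen-Image-Edit-2511 does set it, which is why
# QwenImageEditPlus_2511_PipelineConfig is not reused here.
print("zero_cond_t:", transformer_cfg.get("zero_cond_t", False))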

Accuracy Tests

I tested it on 4 H20 GPUs using the following command.

sglang serve --model-path FireRedTeam/FireRed-Image-Edit-1.1 --num-gpus 4 --tp-size 2 --enable-cfg-parallel

I tested the service using the following script and obtained the correct results.

import base64
from openai import OpenAI

client = OpenAI(
    api_key="<api-key>",  
    base_url="http://127.0.0.1:30000/v1",  
)

result = client.images.edit(
    image=[
        open("input1.png", "rb"),
        open("input2.png", "rb"),
    ],
    model="FireRed-Image-Edit-1.1",
    prompt="Replace the model in Figure 1 with the long dress and high-top canvas shoes from Figure 2, maintaining the original pose and accessories, and ensuring overall style consistency.",
    size='1024x1024', # WxH 
    stream=False,
    output_format='png',
    extra_body={
        "num_inference_steps": 40, 
        "guidance_scale": 4.0, 
        "true_cfg_scale": 4.0, 
        "negative_prompt": " ",
        "seed": 42,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

with open("output.png", "wb") as f:
    f.write(image_bytes)

[03-18 20:30:24] Starting FastAPI server.
[2026-03-18 20:30:24] INFO:     Started server process [358190]
[2026-03-18 20:30:24] INFO:     Waiting for application startup.
[03-18 20:30:24] ZMQ Broker is listening for offline jobs on tcp://*:30001
[2026-03-18 20:30:24] INFO:     Application startup complete.
[2026-03-18 20:30:24] INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
[03-18 20:37:38] Diffusers version: 0.36.0.dev0
[03-18 20:37:38] Sampling params:
                       width: 1024
                      height: 1024
                  num_frames: 1
                         fps: 24
                      prompt: <redacted, len=177>
                  neg_prompt: <redacted, len=1>
                        seed: 42
                 infer_steps: 40
      num_outputs_per_prompt: 1
              guidance_scale: 4.0
     embedded_guidance_scale: 6.0
                    n_tokens: None
                  flow_shift: None
                  image_path: ['inputs/uploads/a5306dd5-c7da-46e9-9534-d78f886f46e1_0_input1.png', 'inputs/uploads/a5306dd5-c7da-46e9-9534-d78f886f46e1_1_input2.png']
                 save_output: True
            output_file_path: outputs/a5306dd5-c7da-46e9-9534-d78f886f46e1.png
        
[03-18 20:37:38] Running pipeline stages: ['input_validation_stage', 'image_encoding_stage', 'image_v_a_e_encoding_stage', 'latent_preparation_stage', 'timestep_preparation_stage', 'denoising_stage', 'decoding_stage']
[03-18 20:37:38] [InputValidationStage] started...
[03-18 20:37:38] [InputValidationStage] finished in 0.1990 seconds
[03-18 20:37:38] [ImageEncodingStage] started...
[03-18 20:37:40] [ImageEncodingStage] finished in 2.2332 seconds
[03-18 20:37:40] [ImageVAEEncodingStage] started...
[03-18 20:37:41] [ImageVAEEncodingStage] finished in 0.6484 seconds
[03-18 20:37:41] [LatentPreparationStage] started...
[03-18 20:37:41] [LatentPreparationStage] finished in 0.0022 seconds
[03-18 20:37:41] [TimestepPreparationStage] started...
[03-18 20:37:41] [TimestepPreparationStage] finished in 0.0009 seconds
[03-18 20:37:41] [DenoisingStage] started...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:19<00:00,  3.48s/it]
[03-18 20:40:00] [DenoisingStage] average time per step: 3.4784 seconds
[03-18 20:40:00] [DenoisingStage] finished in 139.1436 seconds
[03-18 20:40:00] [DecodingStage] started...
[03-18 20:40:01] [DecodingStage] finished in 1.2831 seconds
[03-18 20:40:04] Peak GPU memory: 49.23 GB, Peak allocated: 46.86 GB, Memory pool overhead: 2.37 GB (4.8%), Remaining GPU memory at peak: 91.17 GB. Components that could stay resident (based on the last request workload): ['text_encoder', 'transformer']. Related offload server args to disable: --dit-cpu-offload, --text-encoder-cpu-offload
[03-18 20:40:04] Output saved to outputs/a5306dd5-c7da-46e9-9534-d78f886f46e1.png
[03-18 20:40:05] Pixel data generated successfully in 147.28 seconds
[03-18 20:40:05] Completed batch processing. Generated 1 outputs in 147.28 seconds
[03-18 20:40:05] Peak memory usage: 50414.00 MB
[2026-03-18 20:40:05] INFO:     127.0.0.1:60704 - "POST /v1/images/edits HTTP/1.1" 200 OK

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

github-actions bot added the diffusion (SGLang Diffusion) label on Mar 18, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1 models into SGLang Diffusion. The changes primarily involve aligning their configuration with existing Qwen-Image-Edit models and modifying the QwenImageArchConfig to correctly parse multimodal token IDs, ensuring seamless inference and deployment for these new image editing models.

Highlights

  • FireRed-Image-Edit Model Support: Added support for the FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1 models from the FireRed team, enabling their deployment and inference within SGLang Diffusion.
  • Configuration Alignment: Configured the new FireRed models to use the same pipeline and sampling parameters as the Qwen-Image-Edit-2509 series, addressing differences in the zero_cond_t setting in their transformer/config.json.
  • Multimodal Token ID Fix: Resolved an AttributeError by explicitly adding vision_start_token_id, vision_end_token_id, vision_token_id, image_token_id, and video_token_id to the QwenImageArchConfig class, as these were nested differently in FireRed models' text_encoder/config.json compared to Qwen-Image-Edit models.


@yuumn
Contributor Author

yuumn commented Mar 18, 2026

/tag-run-ci-label

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the FireRed-Image-Edit-1.0 and FireRed-Image-Edit-1.1 models. The changes are well-contained and justified. Specifically, token IDs are added to QwenImageArchConfig to accommodate differences in the new models' configuration files, and the models are registered in registry.py using appropriate existing configurations from a similar model. The implementation is sound and aligns with the existing codebase structure. I have reviewed the changes and find them to be correct.

"FireRedTeam/FireRed-Image-Edit-1.0",
"FireRedTeam/FireRed-Image-Edit-1.1",
],
model_detectors=[lambda hf_id: "firered-image-edit" in hf_id.lower()],
Collaborator

Do we still need this? We're deprecating model_detectors.

Contributor Author

I tested removing the model_detectors and verified that it works fine, so I have pushed a new commit to remove it.

@mickqian
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

4 similar comments

@yhyang201
Collaborator

/rerun-failed-ci

4 similar comments

@mickqian mickqian merged commit 889e848 into sgl-project:main Mar 23, 2026
70 of 73 checks passed
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
Co-authored-by: yuumn <1010797597@qq.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: yuumn <1010797597@qq.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: yuumn <1010797597@qq.com>

Labels

diffusion (SGLang Diffusion), run-ci
