
[NPU][diffusion] model: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend#13662

Merged
mickqian merged 135 commits into sgl-project:main from Makcum888e:diffusion_npu
Feb 8, 2026

Conversation

@Makcum888e
Contributor

@Makcum888e Makcum888e commented Nov 20, 2025

Motivation

Enable sglang diffusion on NPU platform

Tested models:
Wan-AI/Wan2.1-T2V-1.3B-Diffusers
black-forest-labs/FLUX.1-dev
Qwen/Qwen-Image-Edit
Qwen/Qwen-Image

Modifications

Added an NPU platform abstraction.
Changed distributed_init_method to TCP.
Added dependencies in python/pyproject_other.toml; for now, yunchang must be installed from source.
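
The PR description does not show how the TCP init method is built. As a rough sketch, assuming the standard `tcp://host:port` form that `torch.distributed.init_process_group` accepts through its `init_method` argument, a launcher could pick a free port and format the address as below. `find_free_port` and `tcp_init_method` are hypothetical helper names for illustration, not sglang APIs.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def tcp_init_method(host: str, port: int) -> str:
    """Format a TCP rendezvous address (hypothetical helper)."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"tcp://{host}:{port}"


if __name__ == "__main__":
    init_method = tcp_init_method("127.0.0.1", find_free_port())
    # The resulting string is what would be passed as
    # torch.distributed.init_process_group(init_method=init_method, ...)
    print(init_method)
```

Every rank has to receive the same address, so in practice the port would be chosen once by the launcher and handed to the workers rather than looked up per process.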

Accuracy Tests

Validation script: gen.py

from sglang.multimodal_gen import DiffGenerator

def main():
    # Create a diff generator from a pre-trained model
    model_path = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
    tp = 1
    dp = 1
    sp = 1
    enable_cfg_parallel = False
    num_gpus = dp * tp * sp * (2 if enable_cfg_parallel else 1)
    generator = DiffGenerator.from_pretrained(
        model_path=model_path,
        num_gpus=num_gpus,  # Adjust based on your hardware
        server_args={
            "model_path":model_path,
            "dp_size":dp,
            "tp_size":tp,
            "sp_degree":sp,
            "enable_cfg_parallel":enable_cfg_parallel,
            "num_gpus":num_gpus,
        },
    )

    # Provide a prompt for your video
    prompt = "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest."
    # Generate the video
    video = generator.generate(
        sampling_params_kwargs={
            # "image_path": "your/path/to/picture",  # for Qwen-Image-Edit
            "prompt":prompt,
            "output_path":"my_videos/",  # Controls where videos are saved
            "num_inference_steps":50,
            "save_output":True,
        }
    )

if __name__ == '__main__':
    main()
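
The device count in gen.py is the product of the parallel degrees, doubled when CFG parallelism is enabled. A minimal sketch of that arithmetic as a reusable check follows; `required_devices` is a hypothetical helper, and whether sglang validates the values this way is not stated in the PR.

```python
def required_devices(dp: int, tp: int, sp: int, enable_cfg_parallel: bool) -> int:
    """Return the number of devices implied by the parallel settings.

    Mirrors the expression used in gen.py:
    dp * tp * sp * (2 if enable_cfg_parallel else 1)
    """
    for name, value in (("dp", dp), ("tp", tp), ("sp", sp)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    cfg_factor = 2 if enable_cfg_parallel else 1
    return dp * tp * sp * cfg_factor


if __name__ == "__main__":
    # Settings used in the validation script: a single device.
    print(required_devices(dp=1, tp=1, sp=1, enable_cfg_parallel=False))
```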

python gen.py with model_path="Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
output:
Pixel data generated successfully in 273.75 seconds

python gen.py with model_path="black-forest-labs/FLUX.1-dev"
output:
Pixel data generated successfully in 27.76 seconds

ASCEND_LAUNCH_BLOCKING=1 python gen.py with model_path="Qwen/Qwen-Image"
output:
Pixel data generated successfully in 108.33 seconds

python gen.py with model_path="Qwen/Qwen-Image-Edit", prompt="change sunflowers to roses", and image_path="/path/to/picture/generated/by/Qwen-Image.jpg"
output:
Pixel data generated successfully in 143.8 seconds

Benchmarking and Profiling

sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A benchmark prompt" --perf-dump-path baseline.json

performance for FLUX
{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 28260.682209860533,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 0.07356004789471626
    },
    {
      "name": "TextEncodingStage",
      "duration_ms": 2615.356710040942
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.02898997627198696
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 76.16624981164932
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 9.80090000666678
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 21346.04144981131
    },
    {
      "name": "DecodingStage",
      "duration_ms": 3899.3239698465914
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "black-forest-labs/FLUX.1-dev/"
  }
}

sglang generate --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers/ --prompt "A benchmark prompt" --perf-dump-path baseline.json

performance for Wan
{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 272303.70815005153,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 0.11193007230758667
    },
    {
      "name": "TextEncodingStage",
      "duration_ms": 3129.72661992535
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.03347010351717472
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 4.545190138742328
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 42.87018999457359
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 250732.00444993563
    },
    {
      "name": "DecodingStage",
      "duration_ms": 18370.871579973027
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers/"
  }
}

ASCEND_LAUNCH_BLOCKING=1 sglang generate --model-path Qwen/Qwen-Image --prompt "A benchmark prompt" --perf-dump-path baseline.json

performance for Qwen-Image
{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 97346.0955300834,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 0.05142996087670326
    },
    {
      "name": "TextEncodingStage",
      "duration_ms": 3380.918240174651
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.02226000651717186
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 59.400869999080896
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 8.644150104373693
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 90416.92453017458
    },
    {
      "name": "DecodingStage",
      "duration_ms": 3459.5016699749976
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "Qwen/Qwen-Image"
  }
}

sglang generate --model-path Qwen/Qwen-Image-Edit/ --prompt "A benchmark prompt" --perf-dump-path baseline.json --image-path path/to/picture.jpg

performance for Qwen-Image-Edit
{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 140093.6643499881,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 45.48515984788537
    },
    {
      "name": "ImageEncodingStage",
      "duration_ms": 3814.3088999204338
    },
    {
      "name": "ImageVAEEncodingStage",
      "duration_ms": 3868.5707200784236
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 44.87665998749435
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 8.529479848220944
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.0241999514400959
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 129670.34499999136
    },
    {
      "name": "DecodingStage",
      "duration_ms": 2622.476930031553
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "Qwen/Qwen-Image-Edit/"
  }
}

Results

  • FLUX: [image: flux]

  • Qwen-Image: [image: qwen-image]

  • Qwen-Image-Edit: [image: Qwen-image-edit]

  • WAN: [video: Wan_output.mp4]

Checklist

@github-actions github-actions Bot added dependencies Pull requests that update a dependency file npu labels Nov 20, 2025
@ping1jing2 ping1jing2 self-assigned this Nov 21, 2025
@L4-1024

L4-1024 commented Nov 22, 2025

After the inference process finishes, it shows the following error log:
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!

@ping1jing2
Collaborator

@L4-1024 please add your command and more information (such as the output of sglang.check_env).

@Makcum888e
Contributor Author

Makcum888e commented Nov 25, 2025

After the inference process finishes, it shows the following error log:
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!

This is a common problem for SGLang on Ascend. You will see it every time you shut down the server, no matter which model you use.

@github-actions github-actions Bot added the diffusion SGLang Diffusion label Nov 25, 2025
@Makcum888e
Contributor Author

please wait for @BBuf @RubiaCx @yhyang201 's approval, thanks for your understanding
@mickqian do we need to wait for approval from all of them, or is BBuf's approval enough?

@ping1jing2
Collaborator

ping1jing2 commented Feb 3, 2026

MOVA is merged. Will some tests be added for the new MOVA model? Can it run on NPU?

Hi @L4-1024, thanks for your interest in Ascend. We currently don't support MOVA on Ascend; we will add it after this PR is merged.

We will also publish our roadmap and plans ASAP.

@ping1jing2
Collaborator

/rerun-failed-ci

@L4-1024

L4-1024 commented Feb 5, 2026

MOVA is merged. Will some tests be added for the new MOVA model? Can it run on NPU?

Hi @L4-1024, thanks for your interest in Ascend. We currently don't support MOVA on Ascend; we will add it after this PR is merged.

We will also publish our roadmap and plans ASAP.

When will this pull request be merged?

@mickqian
Collaborator

mickqian commented Feb 7, 2026

please resolve the conflict

@Makcum888e
Contributor Author

please resolve the conflict

done

@mickqian mickqian merged commit 00248d8 into sgl-project:main Feb 8, 2026
82 of 84 checks passed

_is_cuda = current_platform.is_cuda()
_is_hip = current_platform.is_hip()
_is_npu = current_platform.is_npu()
Collaborator

@mickqian mickqian Feb 8, 2026

Please clean this up as a follow-up, and make sure to avoid these scattered variables in the future.
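
One possible shape for such a cleanup, offered only as a sketch: compute the platform flags once in a single place and have modules import the cached result instead of defining their own `_is_cuda`, `_is_hip`, and `_is_npu` variables. `PlatformFlags`, `detect_backend`, and `get_platform_flags` are hypothetical names; real code would query `current_platform` where the stand-in below returns a fixed string.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class PlatformFlags:
    """Immutable bundle of platform checks, computed once."""

    is_cuda: bool
    is_hip: bool
    is_npu: bool


def detect_backend() -> str:
    """Hypothetical stand-in for querying current_platform.

    Always reports "npu" here so the example is self-contained.
    """
    return "npu"


@lru_cache(maxsize=1)
def get_platform_flags() -> PlatformFlags:
    """Return the cached flags; detection runs only on the first call."""
    backend = detect_backend()
    return PlatformFlags(
        is_cuda=backend == "cuda",
        is_hip=backend == "hip",
        is_npu=backend == "npu",
    )


if __name__ == "__main__":
    flags = get_platform_flags()
    print(flags)
```

Call sites would then read `get_platform_flags().is_npu` rather than keeping a module-level copy, which keeps the detection logic in one file.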

charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Feb 9, 2026
…Ascend (sgl-project#13662)

Co-authored-by: dhx98 <haox.dai@gmail.com>
Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: DHX98 <DHX98@noreply.gitcode.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
yeahdongcn added a commit to yeahdongcn/sglang that referenced this pull request Feb 9, 2026
…t#13662

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
yeahdongcn added a commit to yeahdongcn/sglang that referenced this pull request Feb 12, 2026
…t#13662

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
…Ascend (sgl-project#13662)

Co-authored-by: dhx98 <haox.dai@gmail.com>
Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: DHX98 <DHX98@noreply.gitcode.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Kangyan-Zhou pushed a commit that referenced this pull request Feb 14, 2026
…8456)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
1StepForever pushed a commit to 1StepForever/sglang that referenced this pull request Feb 26, 2026
* www/pr/ks: (265 commits)
  [BugFix][PD]Fix metadata_buffer_index leak when aborted in PD (sgl-project#17483)
  Refactoring Mooncake TE as a shared distributed component (sgl-project#17810)
  [ModelOPT] Support Qwen 3 Next Coder NVFP4 (sgl-project#18224)
  Update author information in pyproject.toml (sgl-project#18453)
  [Kimi-K2.5] Fix missing `quant_config` in `KimiK25` (sgl-project#18440)
  Add tensor parallelism support to LFM2 ShortConv layers (sgl-project#17777)
  [diffusion] chore: revise process title (sgl-project#18446)
  Fix TRT-LLM MLA backend applying k_scale to BF16 KV cache in BMM1 (sgl-project#18396)
  [diffusion] refactor: group component loaders under the component_loaders/ directory (sgl-project#18438)
  [ModelOpt] Fix broken Qwen3-235B-A22B-Instruct-2507-NVFP4 launch (sgl-project#18189)
  [diffusion] feat: support efficient sequence shard (sgl-project#18161)
  [CI] fix: notebook ci may not working (sgl-project#18417)
  fix: sync server_args.kv_cache_dtype when detecting FP8 KV cache (sgl-project#18394)
  [Fix] Fix backend selection after flashinfer version update (sgl-project#18364)
  [diffusion] platform: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend (sgl-project#13662)
  fix: fix NVFP4 Kimi-K2.5 weight mapping and exclude list (sgl-project#18370)
  [diffusion] feat: support saving videos directly on the server to avoid the overhead of tensor transfer (sgl-project#18253)
  [diffusion] fix: respect dist_timeout option (sgl-project#18386)
  [Doc] Fix outdated `--fp4-gemm-backend` documentation (sgl-project#18350)
  [diffusion] fix: remove unnecessary norm_type argument from GLM-Image dits (sgl-project#18382)
  ...
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
…Ascend (sgl-project#13662)

Co-authored-by: dhx98 <haox.dai@gmail.com>
Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: DHX98 <DHX98@noreply.gitcode.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…Ascend (sgl-project#13662)

Co-authored-by: dhx98 <haox.dai@gmail.com>
Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: DHX98 <DHX98@noreply.gitcode.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

Labels

amd, dependencies (Pull requests that update a dependency file), diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), lora, npu, run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Does SGLang Diffusion support Ascend NPU?

9 participants