[NPU][diffusion] model: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend by Makcum888e · Pull Request #13662 · sgl-project/sglang

Makcum888e · 2025-11-20T14:36:24Z

Motivation

Enable SGLang diffusion on the Ascend NPU platform.

Tested models:
Wan-AI/Wan2.1-T2V-1.3B-Diffusers
black-forest-labs/FLUX.1-dev
Qwen/Qwen-Image-Edit
Qwen/Qwen-Image

Modifications

Added an NPU platform abstraction.
Changed distributed_init_method to TCP.
Added dependencies in python/pyproject_other.toml. For now, yunchang must be installed from source.
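
The second item concerns the rendezvous address used when the distributed process group is initialized. As a minimal sketch of what a TCP init method looks like, the code below builds a `tcp://host:port` URL of the form accepted by `torch.distributed.init_process_group(init_method=...)`. The helper names `find_free_port` and `build_tcp_init_method` are hypothetical and are not taken from this PR.

```python
import socket
from typing import Optional


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def build_tcp_init_method(host: str = "127.0.0.1", port: Optional[int] = None) -> str:
    """Return a tcp://host:port rendezvous URL (hypothetical helper)."""
    if port is None:
        port = find_free_port(host)
    return f"tcp://{host}:{port}"


if __name__ == "__main__":
    init_method = build_tcp_init_method()
    print(init_method)
    # With PyTorch installed, every rank would pass the same URL, e.g.:
    # torch.distributed.init_process_group(
    #     backend=..., init_method=init_method, rank=rank, world_size=world_size
    # )
```

All ranks must agree on the same host and port, so in practice the URL is chosen once and handed to every worker rather than computed independently per process.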

Accuracy Tests

Validation script: gen.py

from sglang.multimodal_gen import DiffGenerator


def main():
    # Create a diff generator from a pre-trained model
    model_path = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
    tp = 1
    dp = 1
    sp = 1
    enable_cfg_parallel = False
    # CFG parallelism doubles the number of devices required
    num_gpus = dp * tp * sp * (2 if enable_cfg_parallel else 1)
    generator = DiffGenerator.from_pretrained(
        model_path=model_path,
        num_gpus=num_gpus,  # Adjust based on your hardware
        server_args={
            "model_path": model_path,
            "dp_size": dp,
            "tp_size": tp,
            "sp_degree": sp,
            "enable_cfg_parallel": enable_cfg_parallel,
            "num_gpus": num_gpus,
        },
    )

    # Provide a prompt for your video
    prompt = "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest."
    # Generate the video
    video = generator.generate(
        sampling_params_kwargs={
            # "image_path": "your/path/to/picture",  # for Qwen-Image-Edit
            "prompt": prompt,
            "output_path": "my_videos/",  # Controls where videos are saved
            "num_inference_steps": 50,
            "save_output": True,
        }
    )


if __name__ == "__main__":
    main()
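
The `num_gpus` expression in the script multiplies the parallel degrees and doubles the result when CFG parallelism is enabled. The sketch below isolates that arithmetic in a small helper with input validation. `required_devices` is a hypothetical name used for illustration, not an SGLang API.

```python
def required_devices(dp: int = 1, tp: int = 1, sp: int = 1,
                     enable_cfg_parallel: bool = False) -> int:
    """Mirror the num_gpus formula from gen.py: dp * tp * sp * (2 if CFG parallel)."""
    for name, value in (("dp", dp), ("tp", tp), ("sp", sp)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return dp * tp * sp * (2 if enable_cfg_parallel else 1)


if __name__ == "__main__":
    print(required_devices())  # the configuration used in gen.py -> 1
    print(required_devices(tp=2, sp=2, enable_cfg_parallel=True))  # -> 8
```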

python gen.py with model_path="Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
output:
Pixel data generated successfully in 273.75 seconds

python gen.py with model_path="black-forest-labs/FLUX.1-dev"
output:
Pixel data generated successfully in 27.76 seconds

ASCEND_LAUNCH_BLOCKING=1 python gen.py with model_path="Qwen/Qwen-Image"
output:
Pixel data generated successfully in 108.33 seconds

python gen.py with model_path="Qwen/Qwen-Image-Edit", prompt="change sunflowers to roses", and image_path="/path/to/picture/generated/by/Qwen-Image.jpg"
output:
Pixel data generated successfully in 143.8 seconds

Benchmarking and Profiling

sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A benchmark prompt" --perf-dump-path baseline.json

performance for FLUX

{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 28260.682209860533,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 0.07356004789471626
    },
    {
      "name": "TextEncodingStage",
      "duration_ms": 2615.356710040942
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.02898997627198696
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 76.16624981164932
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 9.80090000666678
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 21346.04144981131
    },
    {
      "name": "DecodingStage",
      "duration_ms": 3899.3239698465914
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "black-forest-labs/FLUX.1-dev/"
  }
}
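
A dump like the one above can be summarized with a few lines of Python. The sketch below computes each stage's share of `total_duration_ms` and the time not attributed to any listed stage. It relies only on the field names visible in the dump; the function names are hypothetical, and the embedded values are the FLUX numbers above rounded to two decimals.

```python
def stage_shares(dump: dict) -> dict:
    """Map each stage name to its fraction of total_duration_ms."""
    total = dump["total_duration_ms"]
    return {step["name"]: step["duration_ms"] / total for step in dump["steps"]}


def unaccounted_ms(dump: dict) -> float:
    """Time in total_duration_ms that no listed stage accounts for."""
    return dump["total_duration_ms"] - sum(s["duration_ms"] for s in dump["steps"])


# FLUX dump from above, rounded to two decimals.
FLUX_DUMP = {
    "total_duration_ms": 28260.68,
    "steps": [
        {"name": "InputValidationStage", "duration_ms": 0.07},
        {"name": "TextEncodingStage", "duration_ms": 2615.36},
        {"name": "ConditioningStage", "duration_ms": 0.03},
        {"name": "TimestepPreparationStage", "duration_ms": 76.17},
        {"name": "LatentPreparationStage", "duration_ms": 9.80},
        {"name": "DenoisingStage", "duration_ms": 21346.04},
        {"name": "DecodingStage", "duration_ms": 3899.32},
    ],
}

if __name__ == "__main__":
    shares = stage_shares(FLUX_DUMP)
    # Print stages from most to least expensive.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:28s} {share:6.1%}")
    print(f"unaccounted: {unaccounted_ms(FLUX_DUMP):.2f} ms")
```

For this dump, denoising accounts for roughly three quarters of the total, with decoding and text encoding making up most of the rest.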

sglang generate --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers/ --prompt "A benchmark prompt" --perf-dump-path baseline.json

performance for Wan

{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 272303.70815005153,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 0.11193007230758667
    },
    {
      "name": "TextEncodingStage",
      "duration_ms": 3129.72661992535
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.03347010351717472
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 4.545190138742328
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 42.87018999457359
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 250732.00444993563
    },
    {
      "name": "DecodingStage",
      "duration_ms": 18370.871579973027
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers/"
  }
}

ASCEND_LAUNCH_BLOCKING=1 sglang generate --model-path Qwen/Qwen-Image --prompt "A benchmark prompt" --perf-dump-path baseline.json

performance for Qwen-Image

{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 97346.0955300834,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 0.05142996087670326
    },
    {
      "name": "TextEncodingStage",
      "duration_ms": 3380.918240174651
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.02226000651717186
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 59.400869999080896
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 8.644150104373693
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 90416.92453017458
    },
    {
      "name": "DecodingStage",
      "duration_ms": 3459.5016699749976
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "Qwen/Qwen-Image"
  }
}

sglang generate --model-path Qwen/Qwen-Image-Edit/ --prompt "A benchmark prompt" --perf-dump-path baseline.json --image-path path/to/picture.jpg

performance for Qwen-Image-Edit

{
  "timestamp": "",
  "request_id": "",
  "commit_hash": "N/A",
  "tag": "cli_generate",
  "total_duration_ms": 140093.6643499881,
  "steps": [
    {
      "name": "InputValidationStage",
      "duration_ms": 45.48515984788537
    },
    {
      "name": "ImageEncodingStage",
      "duration_ms": 3814.3088999204338
    },
    {
      "name": "ImageVAEEncodingStage",
      "duration_ms": 3868.5707200784236
    },
    {
      "name": "TimestepPreparationStage",
      "duration_ms": 44.87665998749435
    },
    {
      "name": "LatentPreparationStage",
      "duration_ms": 8.529479848220944
    },
    {
      "name": "ConditioningStage",
      "duration_ms": 0.0241999514400959
    },
    {
      "name": "DenoisingStage",
      "duration_ms": 129670.34499999136
    },
    {
      "name": "DecodingStage",
      "duration_ms": 2622.476930031553
    }
  ],
  "meta": {
    "prompt": "A benchmark prompt",
    "model": "Qwen/Qwen-Image-Edit/"
  }
}
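
Since every run writes the same structure, two dumps can be compared stage by stage. The sketch below shows one way to do that, treating a stage that is missing on one side as 0 ms. The function name `compare_stages` is hypothetical, and the embedded values are the Qwen-Image and Qwen-Image-Edit numbers above rounded to two decimals. These two runs use different models and inputs, so the output illustrates the mechanics rather than a like-for-like regression check.

```python
def compare_stages(baseline: dict, candidate: dict) -> dict:
    """Return {stage: candidate_ms - baseline_ms}; a missing stage counts as 0 ms."""
    base = {s["name"]: s["duration_ms"] for s in baseline["steps"]}
    cand = {s["name"]: s["duration_ms"] for s in candidate["steps"]}
    # Keep baseline order, then append stages that only the candidate has.
    names = list(base) + [n for n in cand if n not in base]
    return {n: cand.get(n, 0.0) - base.get(n, 0.0) for n in names}


# Values copied from the dumps above, rounded to two decimals.
QWEN_IMAGE = {"steps": [
    {"name": "InputValidationStage", "duration_ms": 0.05},
    {"name": "TextEncodingStage", "duration_ms": 3380.92},
    {"name": "ConditioningStage", "duration_ms": 0.02},
    {"name": "TimestepPreparationStage", "duration_ms": 59.40},
    {"name": "LatentPreparationStage", "duration_ms": 8.64},
    {"name": "DenoisingStage", "duration_ms": 90416.92},
    {"name": "DecodingStage", "duration_ms": 3459.50},
]}
QWEN_IMAGE_EDIT = {"steps": [
    {"name": "InputValidationStage", "duration_ms": 45.49},
    {"name": "ImageEncodingStage", "duration_ms": 3814.31},
    {"name": "ImageVAEEncodingStage", "duration_ms": 3868.57},
    {"name": "TimestepPreparationStage", "duration_ms": 44.88},
    {"name": "LatentPreparationStage", "duration_ms": 8.53},
    {"name": "ConditioningStage", "duration_ms": 0.02},
    {"name": "DenoisingStage", "duration_ms": 129670.34},
    {"name": "DecodingStage", "duration_ms": 2622.48},
]}

if __name__ == "__main__":
    for name, delta in compare_stages(QWEN_IMAGE, QWEN_IMAGE_EDIT).items():
        print(f"{name:28s} {delta:+12.2f} ms")
```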

Results

flux
qwen-image
qwen-image-edit
WAN

Wan_output.mp4

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

L4-1024 · 2025-11-22T14:59:56Z

After the inference process finishes, the following error log is shown:
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!

ping1jing2 · 2025-11-22T19:42:19Z

@L4-1024 please add your command and more information (such as the output of sglang.check_env).

Makcum888e · 2025-11-25T06:29:14Z

After the inference process finishes, the following error log is shown:
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!

This is a common problem for SGLang on Ascend: you can see it every time you shut down the server, no matter which model you use.

Makcum888e · 2026-02-03T09:18:38Z

Please wait for approval from @BBuf, @RubiaCx, and @yhyang201. Thanks for your understanding.
@mickqian do we need to wait for approval from all of them, or is BBuf's approval enough?

ping1jing2 · 2026-02-03T13:32:54Z

MOVA is merged. Will tests be added for the new MOVA model? Can it run on NPU?

Hi @L4-1024, thanks for your interest in Ascend. We currently don't support MOVA on Ascend; we will add it after this PR is merged.

We will also publish our roadmap and plans ASAP.

ping1jing2 · 2026-02-04T10:13:47Z

/rerun-failed-ci

ping1jing2 · 2026-02-04T12:47:12Z

/rerun-failed-ci

L4-1024 · 2026-02-05T09:16:33Z

MOVA is merged. Will tests be added for the new MOVA model? Can it run on NPU?

Hi @L4-1024, thanks for your interest in Ascend. We currently don't support MOVA on Ascend; we will add it after this PR is merged.

We will also publish our roadmap and plans ASAP.

When will this pull request be merged?

mickqian · 2026-02-07T16:20:40Z

please resolve the conflict

Makcum888e · 2026-02-07T17:01:12Z

please resolve the conflict

done

mickqian · 2026-02-08T02:46:15Z


 _is_cuda = current_platform.is_cuda()
 _is_hip = current_platform.is_hip()
+_is_npu = current_platform.is_npu()


Please clean this up as a follow-up, and make sure to avoid these scattered variables in the future.
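
One way to act on this comment is to resolve platform-specific behavior in a single place instead of defining `_is_cuda`, `_is_hip`, and `_is_npu` at the top of every module. The sketch below is illustrative only: `DeviceKind`, `Platform`, and `select_impl` are hypothetical stand-ins, not the classes used in SGLang. Only the `is_cuda()`, `is_hip()`, and `is_npu()` method names come from the snippet above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class DeviceKind(Enum):
    CUDA = "cuda"
    HIP = "hip"
    NPU = "npu"


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for a `current_platform` object."""
    kind: DeviceKind

    def is_cuda(self) -> bool:
        return self.kind is DeviceKind.CUDA

    def is_hip(self) -> bool:
        return self.kind is DeviceKind.HIP

    def is_npu(self) -> bool:
        return self.kind is DeviceKind.NPU


def select_impl(platform: Platform,
                impls: Dict[DeviceKind, Callable[[], str]]) -> Callable[[], str]:
    """Pick the implementation registered for this platform in one central place."""
    try:
        return impls[platform.kind]
    except KeyError:
        raise NotImplementedError(
            f"no implementation registered for {platform.kind.value}"
        ) from None


if __name__ == "__main__":
    current_platform = Platform(DeviceKind.NPU)
    impls = {
        DeviceKind.CUDA: lambda: "cuda kernel",
        DeviceKind.NPU: lambda: "npu kernel",
    }
    print(select_impl(current_platform, impls)())
```

With a lookup like this, adding a new backend means registering one more entry rather than adding another module-level flag to each file that branches on the device type.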

…Ascend (sgl-project#13662) Co-authored-by: dhx98 <haox.dai@gmail.com> Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu> Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: DHX98 <DHX98@noreply.gitcode.com> Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>

DHX98 and others added 8 commits October 27, 2025 21:26

Dedicated toml files for NPU

8533edc

Merge branch 'main' into npu_toml

db27dc7

Merge branch 'main' into npu_toml

0a11ff3

Merge branch 'main' into npu_toml

d4da594

Merge branch 'main' into npu_toml

2a57b4b

Merge branch 'main' into npu_toml

b42515d

Merge branch 'main' into npu_toml

fe026cb

initial support

234d222

github-actions Bot added the dependencies and npu labels Nov 20, 2025

Makcum888e added 6 commits November 21, 2025 09:21

fix

7638601

remove

589906f

fix

74f5355

fix tp

58870bd

fix tp

72173c4

fix

72fe029

ping1jing2 self-assigned this Nov 21, 2025

Merge remote-tracking branch 'base/main' into diffusion_npu

e6f0e1d

DHX98 and others added 2 commits November 24, 2025 10:35

Merge branch 'main' into npu_toml

71ff4e4

Merge branch 'main' into npu_toml

172e25b

github-actions Bot added the diffusion label Nov 25, 2025

Makcum888e force-pushed the diffusion_npu branch from d3fac19 to e6f0e1d Compare November 25, 2025 06:42

Makcum888e and others added 5 commits November 25, 2025 11:33

move

aebbb69

device type

6953d5c

Merge branch 'sgl-project:main' into diffusion_npu

9133d3e

remove always failed check

01f7e35

fix cossin device

5e8900e

yhyang201 approved these changes Feb 3, 2026

Makcum888e mentioned this pull request Feb 3, 2026

[Roadmap] [NPU] Sglang Diffusion on Ascend #18177

Closed

18 tasks

ping1jing2 and others added 2 commits February 3, 2026 19:00

Merge branch 'main' into diffusion_npu

7cf3055

Merge remote-tracking branch 'base/main' into diffusion_npu

9a8cfad

ping1jing2 mentioned this pull request Feb 4, 2026

[Feature] Can image generation models be supported on 910B? #18245

Closed

2 tasks

mickqian approved these changes Feb 7, 2026

Merge branch 'main' into diffusion_npu

cf1f4bf

Makcum888e requested a review from yingluosanqian as a code owner February 7, 2026 16:47

mickqian approved these changes Feb 8, 2026

mickqian merged commit 00248d8 into sgl-project:main Feb 8, 2026
82 of 84 checks passed

mickqian reviewed Feb 8, 2026

yeahdongcn added a commit to yeahdongcn/sglang that referenced this pull request Feb 9, 2026

[diffusion][MUSA] fix: MUSA platform breakage caused by PR sgl-projec…

cc18263

…t#13662 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

yeahdongcn mentioned this pull request Feb 9, 2026

[diffusion][MUSA] fix: MUSA platform breakage caused by PR #13662 #18456

Merged

5 tasks

yeahdongcn added a commit to yeahdongcn/sglang that referenced this pull request Feb 12, 2026

[diffusion][MUSA] fix: MUSA platform breakage caused by PR sgl-projec…

d4374df

…t#13662 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

Kangyan-Zhou pushed a commit that referenced this pull request Feb 14, 2026

[diffusion][MUSA] fix: MUSA platform breakage caused by PR #13662 (#1…

45a4697

…8456) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

ping1jing2 mentioned this pull request Feb 18, 2026

[Roadmap] [NPU] Sglang Diffusion on Ascend #18967

Open

89 tasks

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026

[diffusion][MUSA] fix: MUSA platform breakage caused by PR sgl-projec…

8bf076f

…t#13662 (sgl-project#18456) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

[diffusion][MUSA] fix: MUSA platform breakage caused by PR sgl-projec…

9d4cc6d

…t#13662 (sgl-project#18456) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

mickqian merged 135 commits into sgl-project:main from Makcum888e:diffusion_npu

9 participants
