[NPU][diffusion] model: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend#13662
mickqian merged 135 commits into sgl-project:main
After the inference process, it shows the error log below:
@L4-1024 please add your command and more information (like
This is a common problem for SGLang on Ascend: you will see it every time you shut down the server, no matter which model you use.
Hi @L4-1024, thanks for your attention to Ascend. We currently don't support MOVA on Ascend; we will add it after this PR is merged. We will also publish our roadmap and plans as soon as possible.
/rerun-failed-ci
When will this pull request be merged?
Please resolve the conflict.
done |
_is_cuda = current_platform.is_cuda()
_is_hip = current_platform.is_hip()
_is_npu = current_platform.is_npu()
Please clean this up as a follow-up, and make sure to avoid these scattered variables in the future.
…Ascend (sgl-project#13662) Co-authored-by: dhx98 <haox.dai@gmail.com> Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu> Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: DHX98 <DHX98@noreply.gitcode.com> Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
…t#13662 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
…t#13662 (sgl-project#18456) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Motivation
Enable SGLang diffusion on the NPU (Ascend) platform.
Tested models:
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
- Qwen/Qwen-Image

Modifications
- Added an NPU platform abstraction.
- Changed distributed_init_method to TCP.
- Added dependencies in python/pyproject_other.toml. For now, yunchang needs to be installed from source.

Accuracy Tests
Validation script: gen.py

python gen.py with model_path="Wan-AI/Wan2.1-T2V-1.3B-Diffusers", output:
Pixel data generated successfully in 273.75 seconds

python gen.py with model_path="black-forest-labs/FLUX.1-dev", output:
Pixel data generated successfully in 27.76 seconds

ASCEND_LAUNCH_BLOCKING=1 python gen.py with model_path="Qwen/Qwen-Image", output:
Pixel data generated successfully in 108.33 seconds

python gen.py with model_path="Qwen/Qwen-Image-Edit", prompt = "change sunflowers to roses", and image_path = "/path/to/picture/generated/by/Qwen-Image.jpg", output:
Pixel data generated successfully in 143.8 seconds

Benchmarking and Profiling
sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A benchmark prompt" --perf-dump-path baseline.json
Performance for FLUX

sglang generate --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers/ --prompt "A benchmark prompt" --perf-dump-path baseline.json
Performance for Wan

ASCEND_LAUNCH_BLOCKING=1 sglang generate --model-path Qwen/Qwen-Image --prompt "A benchmark prompt" --perf-dump-path baseline.json
Performance for Qwen-Image

sglang generate --model-path Qwen/Qwen-Image-Edit/ --prompt "A benchmark prompt" --perf-dump-path baseline.json --image-path path/to/picture.jpg
Performance for Qwen-Image-Edit
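The four benchmark commands differ only in model, optional input image, and environment, so they could be driven from one small script. This is a hypothetical sketch. BenchCase, build_argv and run_all are illustrative names. It uses only the sglang generate flags that appear in the commands (--model-path, --prompt, --perf-dump-path, --image-path) and sets ASCEND_LAUNCH_BLOCKING=1 for Qwen-Image alone, mirroring the original commands. Two choices are assumptions:

- It writes one perf dump per model (flux_baseline.json and so on), because all four original commands write to the same baseline.json.
- It uses the Hub model IDs from the validation section, without the trailing slashes.

```python
import os
import shlex
import subprocess
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BenchCase:
    """One `sglang generate` benchmark run (hypothetical helper)."""

    name: str
    model_path: str
    image_path: Optional[str] = None
    extra_env: Dict[str, str] = field(default_factory=dict)


CASES = [
    BenchCase("flux", "black-forest-labs/FLUX.1-dev"),
    BenchCase("wan", "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"),
    # The PR runs Qwen-Image with ASCEND_LAUNCH_BLOCKING=1; mirrored here.
    BenchCase("qwen-image", "Qwen/Qwen-Image", extra_env={"ASCEND_LAUNCH_BLOCKING": "1"}),
    BenchCase("qwen-image-edit", "Qwen/Qwen-Image-Edit", image_path="path/to/picture.jpg"),
]


def build_argv(case: BenchCase, prompt: str = "A benchmark prompt") -> List[str]:
    """Assemble the CLI arguments for one case using only flags shown in the PR."""
    argv = [
        "sglang", "generate",
        "--model-path", case.model_path,
        "--prompt", prompt,
        # Assumption: one dump file per model so runs do not overwrite each other.
        "--perf-dump-path", f"{case.name}_baseline.json",
    ]
    if case.image_path is not None:
        argv += ["--image-path", case.image_path]
    return argv


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for case in CASES:
        argv = build_argv(case)
        env_prefix = " ".join(f"{k}={v}" for k, v in case.extra_env.items())
        print((env_prefix + " " if env_prefix else "") + shlex.join(argv))
        if not dry_run:
            # Merge the per-case variables into the current environment.
            subprocess.run(argv, env={**os.environ, **case.extra_env}, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Running with dry_run=True only prints the commands. Switching it to False executes them sequentially and stops at the first failing run because of check=True.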
Results
flux

qwen-image

qwen-image-edit

WAN
Wan_output.mp4
Checklist