Teammates: @Makcum888e @OrangeRedeng @e-martirosian @ssshinigami

## Foundational Capability Development

Initial NPU support for SGLang-Diffusion was added in #13662.

- Support NPU
  - [x] support `NPUPlatformBase` (`python/sglang/multimodal_gen/runtime/platforms/npu.py`)
  - [ ] support models
    - [x] Wan/Qwen-Image/FLUX #13662
    - [x] MOVA @LLThomas #21633
    - [x] Hunyuan3D @e-martirosian #20352
      - [ ] rasterize_image_npu
    - [ ] BAGEL
    - [x] GLM-Image
    - [ ] ERNIE-Image
    - [ ] Hunyuan-Image/Video
    - [ ] GPT-Image 2.0
    - [ ] Happy Horse 1.0
- Tools
  - [x] profiling #17807
  - [x] benchmark #18907
- Documentation
  - [x] get started #18894
  - [x] basic usage
  - [ ] advanced features
  - [ ] developer guide
  - [ ] cookbook
- CI/CD & Observability
  - [ ] PR-test design (including test cases and their run time)
  - [ ] nightly-test design (including test cases and their run time)
  - [ ] Improve the [CI monitor](https://github.com/sgl-project/sglang/actions/workflows/ci-monitor.yml) workflow
  - [ ] More tests
    - [ ] Different parallelism
      - [ ] ulysses+ring+tp
      - [ ] ulysses+ring
      - [ ] tp+ring
    - [ ] Models
      - [ ] accuracy test
      - [ ] module test
      - [ ] v-bench

## Performance Optimization

> P0 means high priority, P1 means medium priority, P2 means low priority.

- Graph optimization
  - [ ] [P1] support ACLGraph [in progress] @Alisehen
  - [x] support torch.compile @Alisehen #20687
- Advanced attention
  - [x] USPAttention
  - [x] LocalAttention
  - [x] UlyssesAttention
  - [ ] [P2] UlyssesAttention_VSA
  - [ ] [P2] Video Sparse Attention
  - [x] MinimalA2AAttnOp
  - [ ] [P2] Sparse Linear attention backend
  - [ ] [P2] Sage Sparse Linear attention backend
  - [ ] [P2] SAGEAttention
  - [ ] [P0] RainFusion Attention @Napkin-AI @Svoloch2940194
  - [ ] Block Sparse Attention
    - [ ] [P0] Initial support @Napkin-AI @Svoloch2940194
    - [ ] [P1] Improved version support
  - [ ] Laser Attention
    - [ ] [P0] Initial support @Napkin-AI @Svoloch2940194
    - [ ] [P1] Improved version support
  - [ ] Ulysses Anything Attention
    - [ ] Support cross-attention skipping for Ulysses SP
- Cache
  - [x] Cache-DiT
  - [x] TeaCache
  - [ ] [P1] AttentionCache
- Memory optimization
  - [x] CPU offload (needs support for quantized models)
- Parallelism
  - [x] TP
  - [x] Sequence Parallel (SP)
  - [x] DP
  - [ ] Image-model SP (with padding)
  - [x] VAE decoding parallelism @gxxx-hum #20764
  - [ ] VAE tiled parallel encode
  - [ ] Pipeline-stage parallelism
  - [x] Ring Sequence Parallel #20013 #21383 #20998
  - [x] Ulysses Sequence Parallel
  - [x] CFG parallelism
  - [ ] Expert Parallelism
- Other optimizations
  - [ ] Support fast_layernorm from MindIE-SD @Napkin-AI @Svoloch2940194
  - [ ] Support BF16 datatype in the VAE
  - [ ] NZ format support for MoE
- Quantization and compression @OrangeRedeng
  - [x] [modelslim](https://gitcode.com/Ascend/msmodelslim/tree/master/example/multimodal_sd)
    - [x] Wan2.2 #17996
    - [ ] Wan2.1 (needs testing with #17996)
    - [ ] SD3
    - [ ] Open-Sora-Plan v1.2
    - [ ] Flux.1
    - [ ] HunyuanVideo
    - [ ] Qwen-Image-Edit
  - [ ] [nunchaku](https://github.com/nunchaku-ai/nunchaku)
  - [ ] [TurboDiffusion](https://github.com/thu-ml/TurboDiffusion)
- Scheduling & Serving
  - [x] ComfyUI web serving
  - [x] LoRA support
  - [ ] Batch scheduler

## Advanced features

- [ ] Real-time diffusion
- [x] Postprocessing
  - [x] Upscaling #18327
  - [x] Frame interpolation #19384

## Refactoring

- [ ] Refactor custom ops
  - [ ] Refactor rotary embedding: custom op for RoPE
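For contributors picking up the `NPUPlatformBase` item, a minimal sketch of the kind of device-platform abstraction involved may help. All class and method names below are illustrative assumptions, not the actual SGLang API; a real implementation lives in `python/sglang/multimodal_gen/runtime/platforms/npu.py`:

```python
from abc import ABC, abstractmethod


class PlatformBase(ABC):
    """Hypothetical device-platform base class (names are illustrative,
    not the real SGLang interface)."""

    device_type: str = "cpu"

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if this device backend can be used."""

    def device_name(self, index: int = 0) -> str:
        # e.g. "npu:0", mirroring torch-style device strings
        return f"{self.device_type}:{index}"


class NPUPlatformBase(PlatformBase):
    device_type = "npu"

    def is_available(self) -> bool:
        # A real implementation would probe the Ascend runtime;
        # here we just check whether torch_npu is importable.
        try:
            import torch_npu  # noqa: F401
            return True
        except ImportError:
            return False


npu = NPUPlatformBase()
print(npu.device_name(0))  # "npu:0"
```

The point of the abstraction is that model and runtime code asks the platform object for device strings and capability checks instead of hard-coding `cuda`, so adding a backend reduces to subclassing the base.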
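The caching items above (TeaCache, AttentionCache) share a common idea: skip the expensive transformer call at a diffusion timestep and reuse a cached residual while the accumulated relative change of the inputs stays small. A hedged sketch of that reuse policy, with made-up names and a plain-Python L1 metric standing in for the real tensor math:

```python
class ResidualCache:
    """Illustrative TeaCache-style skip policy (not the SGLang code):
    reuse the cached residual until accumulated input drift exceeds
    a threshold, then force a recompute."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.accumulated = 0.0          # drift accumulated since last recompute
        self.prev_input = None
        self.cached_residual = None     # set by the caller after a real forward

    def should_reuse(self, x: list[float]) -> bool:
        if self.prev_input is None or self.cached_residual is None:
            # First step, or nothing cached yet: must compute.
            self.prev_input = x
            return False
        # Relative L1 change between consecutive timestep inputs.
        num = sum(abs(a - b) for a, b in zip(x, self.prev_input))
        den = sum(abs(b) for b in self.prev_input) or 1.0
        self.accumulated += num / den
        self.prev_input = x
        if self.accumulated < self.threshold:
            return True                 # drift still small: reuse cache
        self.accumulated = 0.0          # drift too large: recompute
        return False


cache = ResidualCache(threshold=0.05)
print(cache.should_reuse([1.0, 2.0]))      # False: first step, must compute
cache.cached_residual = [0.1, 0.2]         # caller stores residual after forward
print(cache.should_reuse([1.001, 2.001]))  # True: tiny change, reuse cache
```

Tuning the threshold trades speed for fidelity: a larger value skips more timesteps but lets more drift accumulate before recomputing.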