
[Diffusion][CPU] Init CPU platform support for SGLang Diffusion#20816

Merged
mingfeima merged 16 commits into sgl-project:main from jianan-gu:sglang_diffusion_cpu
Apr 21, 2026

@jianan-gu jianan-gu commented Mar 18, 2026

Motivation

This PR adds native support for running SGLang Diffusion on CPU-only platforms (e.g., Intel Xeon).

Key changes

  1. CPU source installation for SGLang Diffusion
  2. General CPU-only path logic (e.g., no offloading...)
  3. CPU OMP core binding and automatic NUMA node binding
  4. CPU functionality with key ops on the torch native path (SDPA attention, apply_rotary_embedding, and more)
  5. TP functionality and communication ops with shared-memory optimizations (allreduce/allgather)
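
As a rough illustration of item 3, the sketch below shows one way OMP core binding and NUMA node binding could be combined on Linux: each tensor-parallel rank takes the cores of one NUMA node, pins itself to them, and sizes its OpenMP pool to match. This is not the code from this PR. The helper names (`parse_cpu_list`, `cores_for_rank`, `bind_current_process`) are hypothetical, and the round-robin rank-to-node mapping is an assumption.

```python
import os


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8-11" into core ids."""
    cores: list[int] = []
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores


def cores_for_rank(numa_cpulists: list[str], rank: int) -> list[int]:
    """Assumed policy: rank i uses NUMA node i, wrapping around if needed."""
    return parse_cpu_list(numa_cpulists[rank % len(numa_cpulists)])


def bind_current_process(cores: list[int]) -> None:
    """Pin this process to `cores` and match the OpenMP thread count (Linux only)."""
    # OMP_NUM_THREADS should be set before the OpenMP runtime starts,
    # i.e. before importing libraries that initialize it.
    os.environ["OMP_NUM_THREADS"] = str(len(cores))
    os.sched_setaffinity(0, set(cores))
```

On a real machine the cpulist strings could be read from `/sys/devices/system/node/node<N>/cpulist`; a worker would then call `bind_current_process(cores_for_rank(cpulists, rank))` early in startup.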

Tested models

Tongyi-MAI/Z-Image-Turbo

sglang generate --model-path Tongyi-MAI/Z-Image-Turbo --prompt "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest" --generator-device cpu
[Output image: A_curious_raccoon_peers_through_a_vibrant_field_of_yellow_sunflowers_its_eyes_wide_with_interest_20260318-045352_9f34c904]

black-forest-labs/FLUX.1-dev

sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest" --generator-device cpu
[Output image: A_curious_raccoon_peers_through_a_vibrant_field_of_yellow_sunflowers_its_eyes_wide_with_interest_20260318-050227_67fd7637]

black-forest-labs/FLUX.2-klein-4B

sglang generate --model-path black-forest-labs/FLUX.2-klein-4B --prompt "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest" --generator-device cpu
[Output image: A_curious_raccoon_peers_through_a_vibrant_field_of_yellow_sunflowers_its_eyes_wide_with_interest_20260318-051215_ba6c5bf9]

Wan-AI/Wan2.2-TI2V-5B-Diffusers

sglang generate --prompt 'A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest' --save-output --model-path Wan-AI/Wan2.2-TI2V-5B-Diffusers --height 480 --width 832 --generator-device cpu
[Output video: A_curious_raccoon_peers_through_a_vibrant_field_of_yellow_sunflowers_its_eyes_wide_with_interest_20260318-065816_3a3a7f5c.mp4]

Qwen/Qwen-Image-Edit

sglang generate --model-path Qwen/Qwen-Image-Edit --prompt="Convert 2D style to 3D style" --image-path="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Input.jpg" --width=1536 --height=1024 --save-output --generator-device cpu
[Output image: Convert_2D_style_to_3D_style_20260318-083736_ee9a7bf1]

FastVideo/FastWan2.1-T2V-1.3B-Diffusers

sglang generate --model-path FastVideo/FastWan2.1-T2V-1.3B-Diffusers --prompt "A curious raccoon" --save-output --generator-device cpu
[Output video: A_curious_raccoon_20260318-091646_d9d96fab.mp4]
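
To repeat the text-to-image and text-to-video runs above as a batch, a small driver along these lines may be convenient. It is only a sketch: `build_generate_cmd` and `all_commands` are hypothetical helpers, and the only CLI flags used are the ones that appear in the commands above. Qwen/Qwen-Image-Edit is left out because it also needs an input image.

```python
import shlex

# Model ids and extra flags copied from the commands above.
TESTED_MODELS = {
    "Tongyi-MAI/Z-Image-Turbo": [],
    "black-forest-labs/FLUX.1-dev": [],
    "black-forest-labs/FLUX.2-klein-4B": [],
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers": ["--save-output", "--height", "480", "--width", "832"],
    "FastVideo/FastWan2.1-T2V-1.3B-Diffusers": ["--save-output"],
}


def build_generate_cmd(model_path, prompt, extra_args=()):
    """Hypothetical helper: build the argv for one CPU generation run."""
    return [
        "sglang", "generate",
        "--model-path", model_path,
        "--prompt", prompt,
        *extra_args,
        "--generator-device", "cpu",
    ]


def all_commands(prompt):
    """Return one shell-quoted command line per tested model."""
    return [
        shlex.join(build_generate_cmd(model, prompt, extra))
        for model, extra in TESTED_MODELS.items()
    ]


if __name__ == "__main__":
    for line in all_commands("A curious raccoon"):
        print(line)
```

The script only prints the command lines; passing each argv list to `subprocess.run(argv, check=True)` instead would execute them one after another.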

Follow-up plans after this PR:

  1. Enable CPU kernel optimizations in sgl-kernels and integrate them (replacing native ops like apply_rotary_embedding)
  2. Design and integrate a CPU AMX attention backend (replacing SDPA attention; also consider variants like SEGA)
  3. More parallelism evaluation and support
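
For reference, the two native ops named in the plans above compute the standard quantities below. This is the textbook formulation rather than anything specific to this PR; how features are grouped into pairs and how the angle is derived from the position are model-specific and not stated here.

```latex
% Scaled dot-product attention for queries Q, keys K, values V with head dimension d
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V

% Rotary embedding applied to one feature pair (x_1, x_2) at angle \theta
\begin{pmatrix} x_1' \\ x_2' \end{pmatrix}
=
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
```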

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the dependencies, diffusion, and jit-kernel labels Mar 18, 2026
@jianan-gu jianan-gu changed the title from "[Diffusion][CPU] Init pure cpu platform support for SGLang Diffusion" to "[Diffusion][CPU] Init pure CPU platform support for SGLang Diffusion" Mar 18, 2026
@jianan-gu jianan-gu changed the title from "[Diffusion][CPU] Init pure CPU platform support for SGLang Diffusion" to "[Diffusion][CPU] Init CPU platform support for SGLang Diffusion" Mar 18, 2026
@jianan-gu jianan-gu marked this pull request as ready for review March 19, 2026 06:27
Comment thread python/sglang/jit_kernel/diffusion/triton/rotary.py

@mingfeima mingfeima left a comment

How do we handle the attention backend?
Which attention backend from python/sglang/multimodal_gen/runtime/layers/attention/backends is used for the CPU device?

Comment thread python/sglang/jit_kernel/diffusion/triton/cpu_fallback.py Outdated
Comment thread python/sglang/jit_kernel/diffusion/triton/norm.py
Comment thread python/sglang/jit_kernel/diffusion/triton/rmsnorm_onepass.py
Comment thread python/sglang/jit_kernel/diffusion/triton/scale_shift.py
Comment thread python/sglang/multimodal_gen/runtime/distributed/group_coordinator.py Outdated
Comment thread python/sglang/multimodal_gen/runtime/layers/layernorm.py
Comment thread python/sglang/multimodal_gen/runtime/managers/gpu_worker.py Outdated
@jianan-gu jianan-gu requested a review from mingfeima April 16, 2026 07:06
@jianan-gu

Hi @mickqian, could you please help review this PR? Thanks.

Comment thread python/sglang/multimodal_gen/runtime/managers/cpu_worker.py
@jianan-gu jianan-gu requested a review from mickqian April 17, 2026 08:26
Comment thread python/sglang/multimodal_gen/runtime/managers/cpu_worker.py Outdated
@jianan-gu

/tag-and-rerun-ci

@jianan-gu jianan-gu requested a review from mickqian April 20, 2026 06:58
@mingfeima mingfeima merged commit 2cf3ac5 into sgl-project:main Apr 21, 2026
130 of 164 checks passed

Labels

dependencies, diffusion, jit-kernel, run-ci


3 participants