
[CPU] Add Qwen3.5 model optimization for CPU#19484

Merged
Kangyan-Zhou merged 43 commits into sgl-project:main from jianan-gu:qwen3.5_cpu_opt on Apr 26, 2026


@jianan-gu (Contributor) commented Feb 27, 2026

This PR (joint work with @blzheng) adds support for the Qwen3.5 series with CPU-optimized performance, including the following changes:

  1. Dtype support for the fused fused_sigmoid_gating_delta_rule_update kernel
  2. Continued support for the fused fused_qkvzba_split_reshape_cat_contiguous_cpu kernel
  3. Padding support for tensor-parallel (TP) cases, for both bf16 and fp8
  4. Refined logging for the CPU padding logic
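Item 3 concerns dimensions (such as a head count) that do not divide evenly by the TP size. A minimal sketch of the general idea, in plain Python with nested lists standing in for weight tensors: round the sharded dimension up to the next multiple of tp_size, fill the extra rows with zeros, then give each rank an equal contiguous slice. The helper names (padded_dim, pad_rows_for_tp, shard_for_rank) are hypothetical and not taken from the PR, and dtype-specific details such as fp8 scale tensors are not modeled.

```python
from typing import List

Row = List[float]  # one row of a weight matrix (e.g. one head's slice)


def padded_dim(size: int, tp_size: int) -> int:
    """Round size up to the next multiple of tp_size (integer ceil-division)."""
    return (size + tp_size - 1) // tp_size * tp_size


def pad_rows_for_tp(rows: List[Row], tp_size: int) -> List[Row]:
    """Append all-zero rows so that len(rows) divides evenly across tp_size ranks."""
    row_len = len(rows[0]) if rows else 0
    extra = padded_dim(len(rows), tp_size) - len(rows)
    return rows + [[0.0] * row_len for _ in range(extra)]


def shard_for_rank(rows: List[Row], tp_size: int, rank: int) -> List[Row]:
    """Return the contiguous slice of rows owned by `rank` (rows must already be padded)."""
    if len(rows) % tp_size != 0:
        raise ValueError("rows must be padded to a multiple of tp_size first")
    per_rank = len(rows) // tp_size
    return rows[rank * per_rank:(rank + 1) * per_rank]


# 6 "heads" of width 2 split across 4 ranks: padded to 8 rows, 2 rows per rank.
weights = [[float(h), float(h)] for h in range(6)]
padded = pad_rows_for_tp(weights, tp_size=4)
shards = [shard_for_rank(padded, tp_size=4, rank=r) for r in range(4)]
```

Zero rows keep the padded shards numerically inert for a plain matmul; a real implementation also has to make sure any padded outputs are dropped or masked before they affect later layers.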

Note that this PR depends on the earlier CPU mrope kernel support in #12531.


@github-actions (bot) added the Multi-modal (multi-modal language model) and sgl-kernel labels on Feb 27, 2026
@jianan-gu changed the title from "[CPU] Add Qwen3.5 model optimized opt" to "[CPU] Add Qwen3.5 model optimization for CPU" on Feb 27, 2026
jianan-gu and others added 4 commits March 2, 2026 02:28
* refactor and format code

* add log for config setting and updating

* only log config update info on rank 0

* refactor code

change log info to debug
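The last two commit messages describe logging config updates only on rank 0 and at debug level. A minimal sketch of that pattern with the standard logging module, assuming the rank is passed in explicitly (in a real distributed run it would come from the process group). The logger name and the log_config_update helper are hypothetical, not the PR's code.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("cpu_padding_example")


def log_config_update(rank: int, key: str, old, new) -> bool:
    """Log a config change at DEBUG level, but only from rank 0.

    Returns True if this rank is the one responsible for logging.
    """
    if rank != 0:
        # Other ranks stay silent so the message is not repeated once per rank.
        return False
    logger.debug("Updated config %s: %r -> %r", key, old, new)
    return True
```

Using DEBUG rather than INFO keeps the message out of default output while still making it available when troubleshooting padding decisions.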
@mingfeima (Collaborator)

@jianan-gu CI fails at https://github.com/sgl-project/sglang/actions/runs/24720109083/job/72306535388?pr=19484#step:6:2541. Please rebase and test again.

@JustinTong0323 (Collaborator) left a comment


A few issues in the follow-up: the three new asserts use assert (cond, msg,), which Python parses as asserting a non-empty tuple, so they never fire (this also triggers SyntaxWarning: assertion is always true). In addition, the two new TORCH_CHECK lines in qwen3.cpp are missing a trailing semicolon.
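The assert pitfall described above can be reproduced in a few lines. This sketch compiles a buggy and a fixed variant from source strings so the compile-time SyntaxWarning can be captured; the check function and its message are illustrative, not code from the PR. Run it without python -O, since -O strips assert statements entirely.

```python
import warnings

# Buggy form from the review: the parentheses turn (cond, msg) into a 2-tuple.
BUGGY_SRC = 'def check(x):\n    assert (x > 0, "x must be positive",)\n'
# Fixed form: condition and message are separate operands of assert.
FIXED_SRC = 'def check(x):\n    assert x > 0, "x must be positive"\n'


def load_check(src):
    """Compile src and return (check function, SyntaxWarning messages raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        code = compile(src, "<example>", "exec")
    namespace = {}
    exec(code, namespace)
    messages = [str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)]
    return namespace["check"], messages


def assertion_fires(check, value):
    """Return True if calling check(value) raises AssertionError."""
    try:
        check(value)
    except AssertionError:
        return True
    return False


buggy, buggy_warnings = load_check(BUGGY_SRC)
fixed, fixed_warnings = load_check(FIXED_SRC)
print(assertion_fires(buggy, -1))  # False: a non-empty tuple is always truthy
print(assertion_fires(fixed, -1))  # True
```

The exact warning text varies slightly across Python versions, but it contains "assertion is always true".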

Comment thread python/sglang/srt/model_loader/weight_utils.py Outdated
Comment thread python/sglang/srt/models/qwen3_5.py Outdated
Comment thread python/sglang/srt/layers/attention/mamba/mamba.py Outdated
Comment thread sgl-kernel/csrc/cpu/model/qwen3.cpp Outdated
Comment thread python/sglang/srt/layers/attention/mamba/mamba.py Outdated
Comment thread python/sglang/srt/utils/common.py Outdated
Comment thread python/sglang/srt/configs/update_config.py Outdated
@JustinTong0323 self-assigned this on Apr 24, 2026

@JustinTong0323 (Collaborator) left a comment


Round-2 fixes all landed cleanly: the asserts, TORCH_CHECK, head_dim resolver, and rank-0 logging all look good. Two new issues surfaced on a final pass, one of which is a crash on Qwen3-Omni with unaligned TP.

Comment thread python/sglang/srt/configs/update_config.py Outdated
Comment thread python/sglang/srt/utils/common.py
Comment thread python/sglang/srt/layers/attention/mamba/mamba.py
@JustinTong0323 (Collaborator)

/rerun-failed-ci

5 similar comments from @jianan-gu (Contributor Author): /rerun-failed-ci

@Kangyan-Zhou merged commit 10fd0fa into sgl-project:main on Apr 26, 2026
476 of 547 checks passed
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request on May 2, 2026
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>

Labels

cpu (cpu backend performance optimization), intel, Multi-modal (multi-modal language model), run-ci, sgl-kernel


6 participants