[CPU] Add Qwen3.5 model optimization for CPU by jianan-gu · Pull Request #19484 · sgl-project/sglang

jianan-gu · 2026-02-27T07:02:50Z

This PR (joint work with @blzheng) adds CPU-optimized support for the Qwen3.5 series. Changes include:

Dtype support in the fused_sigmoid_gating_delta_rule_update fusion
Continued support for the fused_qkvzba_split_reshape_cat_contiguous_cpu fusion
Padding support for TP cases, for both bf16 and fp8
Refinements to the logging of the CPU padding logic
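
To make the TP padding item concrete, here is a minimal sketch of the general idea: rounding a head count up so it divides evenly across tensor-parallel ranks. The helper names `pad_to_multiple` and `padded_heads_per_rank` are hypothetical and are not taken from this PR; the actual padding logic in sglang may differ.

```python
def pad_to_multiple(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def padded_heads_per_rank(num_heads: int, tp_size: int) -> tuple[int, int]:
    """Return (padded total heads, heads per rank) for a hypothetical TP split.

    Assumption: padding simply adds dummy heads until the total is
    divisible by tp_size; this is an illustration, not the PR's code.
    """
    padded = pad_to_multiple(num_heads, tp_size)
    return padded, padded // tp_size


if __name__ == "__main__":
    # 14 heads do not divide evenly over 4 ranks, so pad up to 16.
    print(padded_heads_per_rank(14, 4))
```

When the head count already divides evenly, the helper leaves it unchanged, so the aligned case pays no padding cost.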

Note that this PR depends on the earlier CPU mrope kernel support in #12531.

mingfeima · 2026-04-22T07:31:45Z

@jianan-gu CI fails at https://github.com/sgl-project/sglang/actions/runs/24720109083/job/72306535388?pr=19484#step:6:2541. Please rebase and test again.

JustinTong0323

A few issues in the follow-up — the three new asserts use assert (cond, msg,) which Python parses as asserting a non-empty tuple, so they never fire (also triggers SyntaxWarning: assertion is always true). And the two new TORCH_CHECK lines in qwen3.cpp are missing trailing ;.
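
The assert pitfall described above can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names (`tuple_is_truthy`, `checked`), not code from the PR: a parenthesized `(cond, msg,)` is a two-element tuple, and any non-empty tuple is truthy, so the assertion can never fail.

```python
def tuple_is_truthy(cond, msg):
    # This is the value that `assert (cond, msg,)` actually tests:
    # a 2-element tuple, which is truthy regardless of `cond`.
    return bool((cond, msg,))


def checked(cond, msg):
    # Correct form: condition and message are separated by a comma,
    # with no enclosing parentheses.
    assert cond, msg
    return True


if __name__ == "__main__":
    print(tuple_is_truthy(False, "never shown"))
    try:
        checked(False, "head count must be positive")
    except AssertionError as exc:
        print("caught:", exc)
```

Note that assert statements are skipped entirely when Python runs with `-O`, so checks that must always run are better written as explicit `if` / `raise`.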

JustinTong0323

Round-2 fixes all landed cleanly — asserts, TORCH_CHECK, head_dim resolver, rank-0 logging all look good. Two new things surfaced on a final pass, one of which is a crash on Qwen3-Omni with unaligned TP.

JustinTong0323 · 2026-04-24T17:11:26Z

/rerun-failed-ci

jianan-gu · 2026-04-24T22:40:48Z

/rerun-failed-ci

jianan-gu · 2026-04-25T01:20:42Z

/rerun-failed-ci

jianan-gu · 2026-04-25T13:17:50Z

/rerun-failed-ci

jianan-gu · 2026-04-26T01:11:48Z

/rerun-failed-ci

jianan-gu · 2026-04-26T04:41:34Z

/rerun-failed-ci

Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>

blzheng and others added 9 commits February 26, 2026 01:59

add support for bf16 A_log

2cb1e6e

add support for bf16 A_log

307d2f2

fix tp

226348a

fix acc issue when tp is enabled

f72bc7a

tp padding fix

606c5c4

workaround for fp8 new qkv padding

7c5e47e

formal tp fp8 fix

660faf7

minor refinement

26b8437

format fix

dbaee41

github-actions Bot added the Multi-modal and sgl-kernel labels Feb 27, 2026

jianan-gu changed the title ~~[CPU] Add Qwen3.5 model optimized opt~~ [CPU] Add Qwen3.5 model optimization for CPU Feb 27, 2026

jianan-gu and others added 4 commits March 2, 2026 02:28

[PR19484] fix tp head dim

9f81037

[PR19484] Add logging info for config updates (#104)

ea4c197

* refactor and format code
* add log for config setting and updating
* only log config update info on rank 0
* refactor code change log info to debug
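
The "only log on rank 0" item in the commit above follows a common pattern in multi-process serving: every rank applies the same config update, but only one rank reports it, so logs are not duplicated per process. A minimal sketch follows; the function name `log_config_update` and the logger name are hypothetical, not the PR's actual code.

```python
import logging

# Hypothetical logger name, used for illustration only.
logger = logging.getLogger("config_update_demo")


def log_config_update(rank: int, key: str, old, new) -> bool:
    """Emit a debug-level message about a config change on rank 0 only.

    Returns True if a message was handed to the logger, False if the
    call was skipped because this process is not rank 0.
    """
    if rank != 0:
        return False
    logger.debug("config update: %s: %r -> %r", key, old, new)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    for rank in range(4):
        log_config_update(rank, "head_dim", 128, 256)
```

Using debug level, as the commit message describes, keeps these messages out of the default output while leaving them available for troubleshooting.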

apply sgl kernel when tie_word_embeddings is True (#105)

76b5980

[PR to track] reduce duplicated copy overhead in moe (#106)

37d4b71

jianan-gu marked this pull request as ready for review March 3, 2026 01:43

jianan-gu requested review from BBuf, FlamingoPg, Fridge003, HaiShaw, Qiaolin-Yu, Ying1123, ch-wan, hebiao064, ispobock, merrymercy, yizhang2077 and zhyncs as code owners March 3, 2026 01:43

code refactor

ac753b3

jianan-gu requested a review from JustinTong0323 April 19, 2026 14:14

jianan-gu and others added 5 commits April 20, 2026 09:30

minor refine

382d9bc

Merge branch 'main' into qwen3.5_cpu_opt

1a4ec02

fix qwen3-next

3aea8dd

Merge branch 'main' into qwen3.5_cpu_opt

8828cd3

Merge branch 'main' into qwen3.5_cpu_opt

79be113

Merge branch 'main' into qwen3.5_cpu_opt

f81009f

JustinTong0323 reviewed Apr 22, 2026

jianan-gu added 2 commits April 22, 2026 22:58

revise logs/asserts etc.

b0e0f86

Merge branch 'main' into qwen3.5_cpu_opt

ec25926

jianan-gu requested a review from JustinTong0323 April 23, 2026 03:03

JustinTong0323 self-assigned this Apr 24, 2026

JustinTong0323 reviewed Apr 24, 2026

Comment thread python/sglang/srt/configs/update_config.py Outdated

Comment thread python/sglang/srt/utils/common.py

Comment thread python/sglang/srt/layers/attention/mamba/mamba.py

jianan-gu added 2 commits April 24, 2026 10:44

refactor

4302c37

Merge branch 'main' into qwen3.5_cpu_opt

5d0f5f1

jianan-gu requested a review from JustinTong0323 April 24, 2026 14:47

jianan-gu added 2 commits April 24, 2026 23:43

Merge branch 'main' into qwen3.5_cpu_opt

0e9b4f3

refine record_func after rebase

8de75f0

JustinTong0323 approved these changes Apr 26, 2026

mickqian approved these changes Apr 26, 2026

Kangyan-Zhou merged commit 10fd0fa into sgl-project:main Apr 26, 2026
476 of 547 checks passed

6 participants
