[CPU] Add Qwen3.5 model optimization for CPU #19484
Kangyan-Zhou merged 43 commits into sgl-project:main from
* refactor and format code
* add log for config setting and updating
* only log config update info on rank 0
* refactor code
* change log info to debug
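The commit notes above mention logging config updates only on rank 0 and at debug level. A minimal sketch of that pattern follows. The logger name, the function `log_config_update`, and the example values are all hypothetical and are not taken from the PR's code.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("cpu_config_demo")


def log_config_update(rank, name, old, new):
    """Log a config change at debug level, but only on rank 0.

    Returns True if a record was emitted, False if the call was skipped
    because this process is not rank 0.
    """
    if rank != 0:
        # Other ranks stay silent so the same message is not repeated per rank.
        return False
    logger.debug("config %s updated: %r -> %r", name, old, new)
    return True


# Example with made-up values: only the rank-0 call logs anything.
log_config_update(0, "head_dim", 128, 256)
log_config_update(1, "head_dim", 128, 256)
```

Gating on the rank keeps the log readable under tensor parallelism, where every rank would otherwise print an identical line.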
@jianan-gu CI failed at https://github.com/sgl-project/sglang/actions/runs/24720109083/job/72306535388?pr=19484#step:6:2541. Please rebase and test again.
JustinTong0323 left a comment
A few issues in the follow-up: the three new asserts use the form assert (cond, msg,), which Python parses as asserting a non-empty tuple, so they never fire (this also triggers SyntaxWarning: assertion is always true). The two new TORCH_CHECK lines in qwen3.cpp are also missing their trailing semicolons (;).
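The assert pitfall described in this comment can be reproduced in isolation. The sketch below is illustrative only: the helper names are hypothetical and the code is not from the PR. It assumes Python is not run with -O, which strips assert statements.

```python
import warnings


def tuple_assert_fires(cond, msg):
    """Report whether `assert (cond, msg,)` would raise.

    The asserted expression is the whole tuple, so only the tuple's
    truthiness matters, and a non-empty tuple is always truthy.
    """
    asserted_value = (cond, msg,)
    return not asserted_value


def plain_assert_fires(cond, msg):
    """Report whether the intended form `assert cond, msg` raises."""
    try:
        assert cond, msg
    except AssertionError:
        return True
    return False


def compile_warnings(source):
    """Compile a snippet and return the warning categories the compiler emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<assert-demo>", "exec")
    return [w.category for w in caught]


# The parenthesized form never fires, even when the condition is False.
print(tuple_assert_fires(False, "head_dim mismatch"))
# The unparenthesized form fires as intended.
print(plain_assert_fires(False, "head_dim mismatch"))
```

On recent CPython versions, calling `compile_warnings('assert (False, "msg",)')` should include `SyntaxWarning` in its result, which matches the warning quoted in the review. The fix is simply to drop the parentheses around the condition and message.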
JustinTong0323 left a comment
Round-2 fixes all landed cleanly — asserts, TORCH_CHECK, head_dim resolver, rank-0 logging all look good. Two new things surfaced on a final pass, one of which is a crash on Qwen3-Omni with unaligned TP.
/rerun-failed-ci
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
This PR (joint work with @blzheng) adds CPU-optimized support for the Qwen3.5 series, including the following changes:
* fused_sigmoid_gating_delta_rule_update
* fused_qkvzba_split_reshape_cat_contiguous_cpu

Note that this PR depends on the previous CPU mrope kernel support in #12531.