Port 6641 by jianan-gu · Pull Request #14 · jianan-gu/sglang

jianan-gu · 2025-06-12T05:49:32Z

* Use fused_experts_cpu and add weight packing
* add check on whether AMX is supported
* move utils to cpu_utils.py
* address comment
* no need to pass in is_vnni since it's True by default; change inplace to True
* refactor prepack_weight_if_needed
* Only import sgl_kernel.cpu once
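The AMX check mentioned in the list above could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's code: `cpuinfo_has_amx` and `read_cpuinfo` are hypothetical helpers that inspect the `flags` line of Linux `/proc/cpuinfo` text for the `amx_tile`, `amx_bf16`, and `amx_int8` flags. The check in the PR may rely on a different mechanism.

```python
# Hypothetical helpers, not taken from the PR: detect AMX by parsing the
# "flags" line of Linux /proc/cpuinfo text.
REQUIRED_AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def cpuinfo_has_amx(cpuinfo_text: str) -> bool:
    """Return True if any CPU entry advertises all required AMX flags."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            if REQUIRED_AMX_FLAGS <= set(value.split()):
                return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read cpuinfo text; return an empty string where the file is absent."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""
```

Calling `cpuinfo_has_amx(read_cpuinfo())` returns `False` on platforms without `/proc/cpuinfo`, so a caller would fall back to the non-AMX path there.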

* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
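The dispatch pattern described above, where a layer caches a `use_intel_amx_backend` flag and switches kernels based on it, can be illustrated with a small sketch. Everything here is hypothetical: `LinearSketch` and `reference_linear` are illustrative names, the `packed_kernel` callable is only a stand-in for a packed-weight kernel, and its signature is an assumption rather than the real `weight_packed_linear` API.

```python
from typing import Callable, List

Matrix = List[List[float]]


def reference_linear(x: Matrix, weight: Matrix) -> Matrix:
    """Plain x @ weight.T, standing in for the default (non-AMX) path."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in weight] for row in x]


class LinearSketch:
    """Hypothetical layer that decides on its kernel once, at construction."""

    def __init__(
        self,
        weight: Matrix,
        amx_available: bool,
        packed_kernel: Callable[[Matrix, Matrix], Matrix] = reference_linear,
    ) -> None:
        self.weight = weight
        # Cache the decision so forward() does not re-query the CPU each call.
        self.use_intel_amx_backend = amx_available
        # Stand-in for a packed-weight kernel; the signature is assumed.
        self._packed_kernel = packed_kernel

    def forward(self, x: Matrix) -> Matrix:
        if self.use_intel_amx_backend:
            return self._packed_kernel(x, self.weight)
        return reference_linear(x, self.weight)
```

Storing the flag on the instance keeps the branch in `forward` cheap and makes the chosen path easy to inspect when debugging.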

* Switch torch.bmm to sgl_kernel.cpu.bmm
* check if transpose_dims[i] is None
* remove memory copies

* Integrate qkv_proj_with_rope
* add forward_absorb_fused_mla_rope_cpu

chunyuan-w added 12 commits June 12, 2025 01:48

switch to weight_packed_linear if cpu_has_amx_support (#11)

fbccc9e

* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend

Switch to weight_packed_linear for MoEGate and lm_head (#16)

2f0a284

Replace torch.bmm in forward_absorb with sgl_kernel.cpu.bmm (#21)

0f7ed22

* Switch torch.bmm to sgl_kernel.cpu.bmm
* check if transpose_dims[i] is None
* remove memory copies

don't use c++ kernel if apply_router_weight_on_input is True

88acf3b

Integrate qkv_proj_with_rope (#34)

0f27607

* Integrate qkv_proj_with_rope
* add forward_absorb_fused_mla_rope_cpu

update API for fused_qkv_a_proj_with_mqa

3c62d9a

revert changes to bmm

a6644fe

update qkv_proj OP name

12d5754

refine comment

ce678dd

only pack weight if using the fused_qkv_proj_with_rope kernel

74f4dc2

remove dead code

1d495c5

jianan-gu merged commit 68512a9 into hybrid_device Jun 12, 2025
1 check failed

jianan-gu merged 12 commits into hybrid_device from port_6641
