Port 6641 #14

Merged
jianan-gu merged 12 commits into hybrid_device from port_6641
Jun 12, 2025

@jianan-gu (Owner)

* Use fused_experts_cpu and add weight packing
* add check on whether AMX is supported
* move utils to cpu_utils.py
* address comment
* no need to pass in is_vnni since it's True by default; change inplace to True
* refactor prepack_weight_if_needed
* Only import sgl_kernel.cpu once
* switch to weight_packed_linear if cpu_has_amx_support
* add self.use_intel_amx_backend
* Switch torch.bmm to sgl_kernel.cpu.bmm
* check if transpose_dims[i] is None
* remove memory copies
* Integrate qkv_proj_with_rope
* add forward_absorb_fused_mla_rope_cpu
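
The commits above gate the packed-weight path on whether the CPU supports AMX. The PR's own check is not shown on this page, so the following is only a minimal sketch of one way such a gate could look, assuming a Linux host where `/proc/cpuinfo` lists the `amx_tile`, `amx_bf16` and `amx_int8` flags. The helpers `parse_cpu_flags`, `has_amx` and `pick_linear_backend` are hypothetical names for illustration; only the string `weight_packed_linear` comes from the commit list, and `default_linear` is a placeholder for whatever the non-AMX path is.

```python
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_amx(flags: Set[str]) -> bool:
    """Assumed rule: tile support plus at least one AMX compute extension."""
    return "amx_tile" in flags and ("amx_bf16" in flags or "amx_int8" in flags)


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read cpuinfo; return an empty string where the file is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def pick_linear_backend(flags: Set[str]) -> str:
    """Hypothetical dispatch: name the linear path to use for these flags."""
    return "weight_packed_linear" if has_amx(flags) else "default_linear"


if __name__ == "__main__":
    flags = parse_cpu_flags(read_cpuinfo())
    print(pick_linear_backend(flags))
```

Computing the result once at start-up and storing it on the module (in the spirit of `self.use_intel_amx_backend`) would avoid re-reading CPU information on every forward pass.
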
@jianan-gu jianan-gu merged commit 68512a9 into hybrid_device Jun 12, 2025
1 check failed