
[CPU] Fix argument issues in qkv_proj_with_rope_fused_weight and bmm_cpu #21367

Merged
mingfeima merged 8 commits into sgl-project:main from blzheng:beilei/fix_dsv31_terminus on Apr 13, 2026

@blzheng (Contributor) commented Mar 25, 2026


Motivation

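This PR fixes the error below, which occurs when running DeepSeek-V3.1-Terminus on CPU. The kernels qkv_proj_with_rope_fused_weight and bmm_cpu only require w_scale for FP8 weights, and they declare it as Optional[Tensor]. For other data types, such as the w8a8_int8 checkpoint below, the frontend passed a float instead, which the kernel rejects. The frontend logic now passes w_scale=None for all non-FP8 data types.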
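A minimal sketch of the frontend rule described above, assuming weight dtypes are represented as plain strings (the real frontend works with torch dtypes such as torch.float8_e4m3fn). The helper select_w_scale and the FP8_DTYPES set are hypothetical and not part of sglang; the sketch only illustrates forwarding w_scale for FP8 weights and None otherwise.

```python
from typing import Optional

# Hypothetical dtype names used only for this sketch; the real frontend
# compares torch dtypes (e.g. torch.float8_e4m3fn) rather than strings.
FP8_DTYPES = {"float8_e4m3fn", "float8_e5m2"}


def select_w_scale(weight_dtype: str, w_scale: Optional[object]) -> Optional[object]:
    """Hypothetical helper: forward the weight scale only for FP8 weights.

    Kernels such as qkv_proj_with_rope_fused_weight and bmm_cpu accept
    w_scale as an optional tensor, so every non-FP8 path passes None
    instead of a placeholder value like a float.
    """
    if weight_dtype in FP8_DTYPES:
        return w_scale
    # Non-FP8 weights (bf16, int8, ...): the kernel does not use w_scale.
    return None
```

For example, select_w_scale("int8", 1.0) returns None, so an int8 run never hands a float to an argument typed Optional[Tensor].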

Command:
python -m sglang.launch_server --model IntervitensInc/DeepSeek-V3.1-Terminus-Channel-int8 --trust-remote-code --disable-overlap-schedule --device cpu --quantization w8a8_int8 --tp 6

Error:
RuntimeError: sgl_kernel::qkv_proj_with_rope_fused_weight() Expected a value of type 'Optional[Tensor]' for argument 'w_scale' but instead found type 'float'.
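The error comes from the kernel's operator schema, which types w_scale as Optional[Tensor], so only None or a tensor is accepted. The sketch below mimics that check in plain Python to show why a float fails and None passes. Tensor is a hypothetical stand-in for torch.Tensor, and check_optional_tensor is illustrative only, not PyTorch's actual dispatcher code.

```python
from typing import Any


class Tensor:
    """Hypothetical stand-in for torch.Tensor, used only in this sketch."""


def check_optional_tensor(name: str, value: Any) -> None:
    """Mimic validation of a schema argument typed Optional[Tensor].

    Accepts None or a Tensor; anything else (e.g. a float scale) raises,
    with a message shaped like the RuntimeError reported in this PR.
    """
    if value is None or isinstance(value, Tensor):
        return
    raise RuntimeError(
        f"Expected a value of type 'Optional[Tensor]' for argument '{name}' "
        f"but instead found type '{type(value).__name__}'."
    )
```

Calling check_optional_tensor("w_scale", 1.0) raises with "found type 'float'", while check_optional_tensor("w_scale", None) returns quietly, which is why passing None for non-FP8 data types resolves the error.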

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. Once CI is green and the required approvals are in, ask Merge Oncalls to merge.

@blzheng (Contributor, Author) commented Mar 25, 2026

/tag-run-ci-label

@blzheng force-pushed the beilei/fix_dsv31_terminus branch from 971a858 to f977d67 on April 9, 2026 09:15
@blzheng mentioned this pull request on Apr 10, 2026
@mingfeima merged commit 934e19a into sgl-project:main on Apr 13, 2026
66 of 92 checks passed
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
4 participants