
[NPU] Fix issue and support GLM-4.5V #22961

Merged
iforgetmyname merged 14 commits into sgl-project:main from zhsurpass:glm-4.5v on Apr 28, 2026



@zhsurpass (Contributor) commented Apr 16, 2026

Motivation

Fix an issue and add support for GLM-4.5V on NPU.
Issue link: Ascend#343

Modifications

  1. When calling the split_qkv_rmsnorm_rope function, pass the correct arguments based on the use_qk_norm parameter. The split_qkv_rmsnorm_rope kernel already supports a NORMS=False mode internally.
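The NORMS=False idea can be illustrated with a minimal, single-head, pure-Python reference. Everything in this sketch (split_qkv_rmsnorm_rope_ref, prepare_qkv, the argument order, and the interleaved-pair RoPE layout) is hypothetical and is not the NPU kernel's actual signature or layout; it only shows the control flow of skipping QK RMSNorm when use_qk_norm is off while still applying RoPE to q and k.

```python
import math
from typing import List, Optional, Tuple

Vec = List[float]


def rms_norm(x: Vec, weight: Vec, eps: float = 1e-6) -> Vec:
    # Divide x by its root-mean-square, then scale elementwise by a learned weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def rope(x: Vec, pos: int, base: float = 10000.0) -> Vec:
    # Interleaved RoPE (illustrative layout): rotate each pair (x[i], x[i+1]),
    # i even, by the angle pos * base^(-i/d).
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def split_qkv_rmsnorm_rope_ref(
    qkv: Vec,
    q_size: int,
    kv_size: int,
    pos: int,
    q_norm_weight: Optional[Vec] = None,
    k_norm_weight: Optional[Vec] = None,
) -> Tuple[Vec, Vec, Vec]:
    # Hypothetical reference: split the fused projection output into q, k, v.
    q = qkv[:q_size]
    k = qkv[q_size : q_size + kv_size]
    v = qkv[q_size + kv_size :]
    # "NORMS=True" behavior: apply QK RMSNorm only when both weights are given.
    if q_norm_weight is not None and k_norm_weight is not None:
        q = rms_norm(q, q_norm_weight)
        k = rms_norm(k, k_norm_weight)
    # RoPE is applied to q and k in both modes; v passes through unchanged.
    return rope(q, pos), rope(k, pos), v


def prepare_qkv(qkv, q_size, kv_size, pos, use_qk_norm, q_w, k_w):
    # Hypothetical caller: forward the norm weights only if the model uses QK norm.
    if use_qk_norm:
        return split_qkv_rmsnorm_rope_ref(qkv, q_size, kv_size, pos, q_w, k_w)
    return split_qkv_rmsnorm_rope_ref(qkv, q_size, kv_size, pos)


# At position 0 RoPE is the identity, so with use_qk_norm=False q/k/v are just the split.
q, k, v = prepare_qkv([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 4, 2, 0, False, None, None)
print(q, k, v)  # [1.0, 2.0, 3.0, 4.0] [5.0, 6.0] [7.0, 8.0]
```

Keeping a single entry point with optional norm weights mirrors the description above: the caller decides based on use_qk_norm, and the kernel-side branch simply skips normalization.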

Accuracy Tests

Accuracy on the MMMU dataset:

- Accuracy: 0.2802
- Invalid: 0.000
- Latency: 89.380 s
- Output throughput: 33.565 tokens/s
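As a rough consistency check on these figures, and assuming "Output throughput" is total generated tokens divided by end-to-end latency (the benchmark's exact definition is not stated in this PR), the run produced on the order of 3,000 output tokens:

```python
# Figures reported for the MMMU run above.
latency_s = 89.380
output_throughput_tok_s = 33.565

# Assumption: throughput = total output tokens / end-to-end latency.
estimated_output_tokens = latency_s * output_throughput_tok_s
print(round(estimated_output_tokens))  # 3000
```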

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the forward_prepare method in glm4_moe.py to include self.use_qk_norm in the conditional logic for QKV splitting and removes a redundant blank line. A review comment suggests refactoring the complex conditional logic into a descriptive boolean variable to improve code readability and maintainability.

Comment on lines 298 to 302
if (
    not _is_npu
    or forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
    or not self.use_qk_norm
):

Priority: medium

The conditional logic is becoming complex. For better readability and maintainability, consider refactoring this to explicitly check for the conditions required for the NPU-specific path. This makes the intent clearer.

Suggested change:
- if (
-     not _is_npu
-     or forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
-     or not self.use_qk_norm
- ):
+ use_npu_decode_path = (
+     _is_npu
+     and not forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
+     and self.use_qk_norm
+ )
+ if not use_npu_decode_path:
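The suggested refactor is a De Morgan rewrite of the original condition, so the two forms should agree on every input. This sketch checks that exhaustively; the boolean is_extend is a hypothetical stand-in for forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed(), and both function names are illustrative only.

```python
from itertools import product


def use_generic_path_original(is_npu: bool, is_extend: bool, use_qk_norm: bool) -> bool:
    # Condition as written in the PR: take the generic path unless this is
    # an NPU decode step on a model that uses QK norm.
    return (not is_npu) or is_extend or (not use_qk_norm)


def use_generic_path_suggested(is_npu: bool, is_extend: bool, use_qk_norm: bool) -> bool:
    # Reviewer's formulation: name the NPU fused decode path, then negate it.
    use_npu_decode_path = is_npu and (not is_extend) and use_qk_norm
    return not use_npu_decode_path


# Compare both formulations on all 8 input combinations.
for combo in product([False, True], repeat=3):
    assert use_generic_path_original(*combo) == use_generic_path_suggested(*combo)

# Exactly one combination selects the NPU fused decode path.
npu_cases = [c for c in product([False, True], repeat=3) if not use_generic_path_original(*c)]
print(npu_cases)  # [(True, False, True)]
```

The named boolean makes the single fast-path case explicit, which is the readability gain the review is after; behavior is unchanged.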

@sglang-npu-bot (Collaborator)

/tag-and-rerun-ci

@iforgetmyname iforgetmyname self-assigned this Apr 23, 2026
@iforgetmyname iforgetmyname merged commit 9ffc0cc into sgl-project:main Apr 28, 2026
390 of 439 checks passed
@zhsurpass changed the title from "[NPU] Support GLM-4.5V" to "[NPU] Fix issue and support GLM-4.5V" on Apr 29, 2026
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026


4 participants