[NPU] Fix issue and support GLM-4.5V #22961
Merged
iforgetmyname merged 14 commits into sgl-project:main on Apr 28, 2026
Contributor
Code Review
This pull request updates the forward_prepare method in glm4_moe.py to include self.use_qk_norm in the conditional logic for QKV splitting and removes a redundant blank line. A review comment suggests refactoring the complex conditional logic into a descriptive boolean variable to improve code readability and maintainability.
Comment on lines 298 to 302
```python
if (
    not _is_npu
    or forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
    or not self.use_qk_norm
):
    ...  # body not shown in this excerpt
```
Contributor
The conditional logic is becoming complex. For better readability and maintainability, consider refactoring this to explicitly check for the conditions required for the NPU-specific path. This makes the intent clearer.
Suggested change
```diff
-if (
-    not _is_npu
-    or forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
-    or not self.use_qk_norm
-):
+use_npu_decode_path = (
+    _is_npu
+    and not forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
+    and self.use_qk_norm
+)
+if not use_npu_decode_path:
```
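The suggested change is a De Morgan rewrite of the original condition, so it should not change behavior. A quick way to check this is to enumerate all eight combinations of the three flags. The sketch below stands in plain booleans for `_is_npu`, `forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()`, and `self.use_qk_norm`; the helper function names are hypothetical and are not part of sglang.

```python
from itertools import product


def original_condition(is_npu: bool, is_extend_like: bool, use_qk_norm: bool) -> bool:
    """Mirrors the merged check: True means the `if` body runs (NPU decode path not used)."""
    return (not is_npu) or is_extend_like or (not use_qk_norm)


def refactored_condition(is_npu: bool, is_extend_like: bool, use_qk_norm: bool) -> bool:
    """Mirrors the reviewer's suggestion: name the NPU decode path, then negate it."""
    use_npu_decode_path = is_npu and not is_extend_like and use_qk_norm
    return not use_npu_decode_path


def check_equivalence() -> list[tuple[bool, bool, bool]]:
    """Return every flag combination on which the two forms disagree (expected: none)."""
    mismatches = []
    for combo in product([False, True], repeat=3):
        if original_condition(*combo) != refactored_condition(*combo):
            mismatches.append(combo)
    return mismatches


if __name__ == "__main__":
    print(check_equivalence())  # prints []
    # Only NPU + non-extend (decode) mode + QK norm enabled skips the `if` body.
    print(refactored_condition(True, False, True))  # prints False
```

Naming the positive case (`use_npu_decode_path`) makes it clear that only one of the eight combinations selects the NPU-specific path, which is the readability gain the review comment is after.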
Hexq0210 approved these changes on Apr 16, 2026
sglang-npu-bot approved these changes on Apr 16, 2026
Collaborator
/tag-and-rerun-ci
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request on May 2, 2026
Motivation
Fix issue and support GLM-4.5V on NPU.
Issue link: Ascend#343
Modifications
The `split_qkv_rmsnorm_rope` kernel already supports `NORMS=False` mode internally.
Accuracy Tests
Accuracy on MMMU dataset:
- Accuracy: 0.2802
- Invalid: 0.000
- Latency: 89.380 s
- Output throughput: 33.565 token/s
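As a rough consistency check on these figures, and assuming that output throughput is total generated tokens divided by end-to-end latency (an assumption; the benchmark's exact definition is not stated here), the run produced about 3,000 output tokens:

```python
# Figures reported for the MMMU run above.
latency_s = 89.380
output_throughput_tok_per_s = 33.565

# Assumption: throughput = total output tokens / end-to-end latency.
total_output_tokens = latency_s * output_throughput_tok_per_s
print(f"{total_output_tokens:.1f}")  # prints 3000.0
```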
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci