[NPU] Fix GLM-4.7-Flash failed on NPU #22509
Merged
iforgetmyname merged 6 commits into sgl-project:main on Apr 22, 2026
Code Review
This pull request improves the robustness of the DeepSeek-V2 and GLM4-MoE-Lite models. In deepseek_v2.py, a safe attribute access is implemented for _gfx95_quant_format using getattr. In glm4_moe_lite.py, the dsv3_router_gemm import is moved inside a CUDA-specific conditional block to prevent import errors on non-CUDA environments. I have no feedback to provide.
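The import change described above can be sketched as follows. This is a minimal illustration, not the code from glm4_moe_lite.py: the helper load_optional_kernel and its platform_ok flag are hypothetical, and the module name sgl_kernel is an assumption about where dsv3_router_gemm lives.

```python
import importlib


def load_optional_kernel(module_name, attr_name, platform_ok):
    """Return a platform-specific kernel, or None when it is unavailable.

    Hypothetical helper: the import is attempted only when the caller says
    the platform supports it, so other backends never touch the module.
    """
    if not platform_ok:
        return None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is missing even though the platform check passed.
        return None
    return getattr(module, attr_name, None)


# Assumption: in the real code this flag would come from the project's own
# CUDA check; it is hard-coded here to simulate a non-CUDA (NPU) host.
_is_cuda = False
dsv3_router_gemm = load_optional_kernel("sgl_kernel", "dsv3_router_gemm", _is_cuda)


def pick_router_path():
    """Report which routing path a caller would take on this host."""
    return "fused" if dsv3_router_gemm is not None else "fallback"
```

On a host where the flag is false, the kernel name resolves to None and callers fall back to a generic path instead of failing at import time.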
iforgetmyname approved these changes on Apr 17, 2026
Collaborator
/tag-run-ci-label
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request on Apr 23, 2026
Motivation
GPU op optimizations caused GLM-4.7-Flash to fail when running on the NPU; this PR implements compatibility adjustments to address this issue.
Modifications
self lacked the _gfx95_quant_format member variable.
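A minimal sketch of the safe attribute access, assuming the attribute is only assigned on some hardware paths. The class Layer, the on_gfx95 flag, the "fp8" value, and the None default are all hypothetical stand-ins, not taken from deepseek_v2.py.

```python
class Layer:
    """Hypothetical stand-in for a model layer; not the real sglang class."""

    def __init__(self, on_gfx95):
        # The attribute is only assigned on one hardware path, so on other
        # backends (for example an NPU) the instance never has it.
        if on_gfx95:
            self._gfx95_quant_format = "fp8"  # illustrative value only

    def quant_format(self):
        # getattr with a default avoids the AttributeError that a direct
        # self._gfx95_quant_format access would raise when it is missing.
        return getattr(self, "_gfx95_quant_format", None)
```

With this pattern, Layer(False).quant_format() returns the default instead of raising, which lets shared code run on backends that never set the attribute.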