GLM-5/5.1 MXFP4 Checkpoint Inference Compatibility Fix #22543
HaiShaw merged 3 commits into sgl-project:main
Code Review
This pull request introduces several updates to the DeepSeek model implementation and server argument handling. Specifically, it adds a check to ensure only 'DeepseekV3ForCausalLM' models are processed during certain quantization steps, updates the packed module mapping for DeepSeek V2, and normalizes device strings by stripping indices. A review comment suggests adding a safety check when accessing the architectures list in the weight loader to prevent a potential IndexError.
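A minimal sketch of the two defensive patterns mentioned in the summary above: stripping the index from a device string, and reading the first entry of an architectures list without risking an IndexError. The helper names (`normalize_device`, `first_architecture`, `should_run_quark_post_load`) are hypothetical and are not the functions used in SGLang; only the architecture name `DeepseekV3ForCausalLM` comes from this PR.

```python
from typing import Optional, Sequence


def normalize_device(device: str) -> str:
    """Drop a trailing device index, e.g. 'cuda:0' becomes 'cuda'."""
    return device.split(":", 1)[0]


def first_architecture(architectures: Optional[Sequence[str]]) -> Optional[str]:
    """Return the first architecture name, or None if the list is missing or empty."""
    if not architectures:
        return None
    return architectures[0]


def should_run_quark_post_load(architectures: Optional[Sequence[str]]) -> bool:
    """Hypothetical guard: only DeepseekV3ForCausalLM takes the Quark post-load path."""
    return first_architecture(architectures) == "DeepseekV3ForCausalLM"
```

With this shape, an empty or missing architectures list simply skips the post-load step instead of raising.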
…uard
Cherry-pick critical fixes from PR #22543 (ColinZ22):
1. Add packed_modules_mapping for DeepseekV2ForCausalLM so Quark's should_ignore_layer() can resolve fused gate_up_proj -> [gate_proj, up_proj] against the exclude list
2. Guard quark_post_load_weights to only run on DeepseekV3ForCausalLM, preventing GlmMoeDsaForCausalLM from hitting the wrong weight transformation path
Co-authored-by: ColinZ22 <ColinZ22@users.noreply.github.com>
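To illustrate why the mapping matters for exclude-list resolution, here is a rough sketch. It is not Quark's or SGLang's actual should_ignore_layer(); the functions `expand_fused_name` and `is_excluded`, and the layer names in the usage note, are assumptions made for illustration. A checkpoint's exclude list names the unfused projections, while the runtime module is the fused gate_up_proj, so the fused name has to be expanded before the lookup.

```python
from typing import Dict, List, Set

# Mapping taken from the PR: one fused runtime module covers two checkpoint modules.
PACKED_MODULES_MAPPING: Dict[str, List[str]] = {
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def expand_fused_name(layer_name: str, mapping: Dict[str, List[str]]) -> List[str]:
    """Expand a fused layer name into the unfused names it was packed from."""
    prefix, _, leaf = layer_name.rpartition(".")
    if leaf not in mapping:
        return [layer_name]
    return [f"{prefix}.{sub}" if prefix else sub for sub in mapping[leaf]]


def is_excluded(layer_name: str, exclude: Set[str], mapping: Dict[str, List[str]]) -> bool:
    """Hypothetical check: a fused layer is excluded only if all of its parts are.

    Treating a partially excluded fused layer as "not excluded" is an assumption
    of this sketch, not confirmed behaviour of the real implementation.
    """
    return all(name in exclude for name in expand_fused_name(layer_name, mapping))
```

For example, with `exclude = {"model.layers.0.mlp.gate_proj", "model.layers.0.mlp.up_proj"}`, the call `is_excluded("model.layers.0.mlp.gate_up_proj", exclude, PACKED_MODULES_MAPPING)` matches, whereas passing an empty mapping does not, which mirrors the name mismatch this PR addresses.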
- packed_modules_mapping = {}
+ packed_modules_mapping = {
+     "gate_up_proj": ["gate_proj", "up_proj"],
+ }
This is the wrong place to introduce Quark-specific changes.
Please refer to _get_quantization_config() in model_loader/loader.py for the proper place to make this change.
tbf this is not Quark-specific, as can be seen in various other models:
sglang/python/sglang/srt/models/qwen3_moe.py
Lines 936 to 942 in e9d6b9e
sglang/python/sglang/srt/models/mllama4.py
Lines 417 to 421 in e9d6b9e
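The two positions in this thread can be sketched side by side: the mapping as a class attribute on the model (as in the referenced models), or supplied at quantization-config resolution time. The code below is a toy illustration only; `resolve_packed_mapping`, `QUARK_DEFAULT_MAPPING`, and the toy classes are hypothetical and do not reflect the actual body of _get_quantization_config().

```python
from typing import Dict, List

# Hypothetical loader-level default, applied only for Quark checkpoints in this sketch.
QUARK_DEFAULT_MAPPING: Dict[str, List[str]] = {
    "gate_up_proj": ["gate_proj", "up_proj"],
}


class ToyModelWithMapping:
    # Option A: the model class declares its own fused-module mapping.
    packed_modules_mapping = {"gate_up_proj": ["gate_proj", "up_proj"]}


class ToyModelEmptyMapping:
    # Mirrors the pre-PR state shown in the diff: an empty mapping.
    packed_modules_mapping: Dict[str, List[str]] = {}


def resolve_packed_mapping(model_cls: type, quant_method: str) -> Dict[str, List[str]]:
    """Prefer the model's own mapping; fall back to a loader default for Quark."""
    mapping = dict(getattr(model_cls, "packed_modules_mapping", None) or {})
    if not mapping and quant_method == "quark":
        # Option B: the loader fills in the mapping when the model has none.
        mapping = {k: list(v) for k, v in QUARK_DEFAULT_MAPPING.items()}
    return mapping
```

Either route yields the same mapping for a Quark checkpoint; the difference is whether non-Quark quantization methods also see it.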
Add nightly CI tests for amd/GLM-5-MXFP4 (Quark MXFP4 quantized) on MI35x GPUs with accuracy (GSM8K) and performance (bench_one_batch) benchmarks, plus engine fixes to enable Quark MXFP4 on GlmMoeDsaForCausalLM.
Engine fixes (cherry-picked from PR #22543 by ColinZ22):
- Add packed_modules_mapping to DeepseekV2ForCausalLM for Quark exclude-layer name resolution (gate_up_proj -> [gate_proj, up_proj])
- Guard quark_post_load_weights to only run on DeepseekV3ForCausalLM
Test files:
- test/registered/amd/accuracy/mi35x/test_glm5_mxfp4_eval_mi35x.py
- test/registered/amd/perf/mi35x/test_glm5_mxfp4_perf_mi35x.py
Workflow: combined accuracy+perf jobs in nightly-test-amd.yml and nightly-test-amd-rocm720.yml
Verified: GSM8K accuracy 0.93+ on MI35x (run #7 passed) https://github.com/sgl-project/sglang/actions/runs/24268460251
Co-authored-by: ColinZ22 <ColinZ22@users.noreply.github.com>
Add nightly CI tests for amd/GLM-5-MXFP4 (Quark MXFP4 quantized) on MI35x GPUs with accuracy (GSM8K) and performance (bench_one_batch) benchmarks, plus engine fixes to enable Quark MXFP4 on GlmMoeDsaForCausalLM.
Engine fixes (aligned with PR #22543 by ColinZ22, per HaiShaw review):
- loader.py: Add packed_modules_mapping for Quark in _get_quantization_config()
- deepseek_weight_loader.py: Guard quark_post_load_weights to DeepseekV3 only
Test files:
- test/registered/amd/accuracy/mi35x/test_glm5_mxfp4_eval_mi35x.py
- test/registered/amd/perf/mi35x/test_glm5_mxfp4_perf_mi35x.py
Workflow: combined accuracy+perf jobs in nightly-test-amd.yml and nightly-test-amd-rocm720.yml
Co-authored-by: ColinZ22 <ColinZ22@users.noreply.github.com>
LGTM
…ks for MI30x and MI35x
Add nightly CI tests for amd/GLM-5.1-MXFP4 (408B MoE, Quark MXFP4) on both MI30x and MI35x GPUs. Includes pre-download step for the 425GB model.
Depends on PR #22543 for engine fixes (cherry-picked above):
- packed_modules_mapping for Quark exclude-layer resolution
- Guard quark_post_load_weights to only run on DeepseekV3ForCausalLM
/tag-and-rerun-ci
@amd-bot ci-status
CI Status for PR #22543
PR: GLM-5/5.1 MXFP4 Checkpoint Inference Compatibility Fix
Details
This PR makes 3 small, targeted changes:
None of the 4 distinct root-cause failures are related to this PR:
Verdict: All failures are pre-existing infrastructure issues. No action needed from the PR author.
Generated by amd-bot using Claude Code CLI
Add test files for amd/GLM-5.1-MXFP4 (408B MoE, Quark MXFP4) on MI30x and MI35x. Workflow jobs and pre-download steps are already on main. Engine fixes from PR #22543 are already merged.
Test suites:
- nightly-amd-accuracy-8-gpu-glm51-mxfp4 (MI30x accuracy)
- nightly-amd-8-gpu-mi35x-glm51-mxfp4 (MI35x accuracy)
- nightly-perf-8-gpu-glm51-mxfp4 (MI30x perf)
- nightly-perf-8-gpu-mi35x-glm51-mxfp4 (MI35x perf)
…ks for MI30x and MI35x
Add nightly CI tests for amd/GLM-5.1-MXFP4 (408B MoE, Quark MXFP4) on MI30x and MI35x. Includes pre-download step (120min timeout) to cache the 425GB model on persistent runner storage before server start.
Test files:
- test/registered/amd/accuracy/mi30x/test_glm51_mxfp4_eval_amd.py
- test/registered/amd/accuracy/mi35x/test_glm51_mxfp4_eval_mi35x.py
- test/registered/amd/perf/mi30x/test_glm51_mxfp4_perf_amd.py
- test/registered/amd/perf/mi35x/test_glm51_mxfp4_perf_mi35x.py
Workflow: MI30x + MI35x jobs for default ROCm and ROCm 7.2 with accuracy + perf steps, dropdown entries, and check-all-jobs needs.
Engine fixes already merged via PR #22543.
…2543) Co-authored-by: HAI <hixiao@gmail.com>
Motivation
Addresses this issue regarding AMD Quark-quantized GLM-5 and GLM-5.1 MXFP4 checkpoints when used with SGLang: exclude-layer names don't match SGLang's internal names, and weight shapes mismatch during MoE loading.
Modifications
- packed_modules_mapping for DeepseekV2ForCausalLM
- quark_post_load_weights function (previously causing GlmMoeDsaForCausalLM models to break)
Accuracy Tests
Using lm_eval in SGLang
(Fixes amd/Quark#25)