[AMD] Fix Kimi-K2.6 Quark MXFP4 loading prefix and packed module mapping #23408

Merged
HaiShaw merged 2 commits into sgl-project:main from ColinZ22:Kimi-K2.6_fix
Apr 27, 2026

@ColinZ22
Contributor

@ColinZ22 ColinZ22 commented Apr 21, 2026

Motivation

Fix Kimi-K2.6 Quark MXFP4 checkpoint loading.
Two issues:

  1. DeepseekV3ForCausalLM in KimiK25ForConditionalGeneration was only given the "language_model" prefix when the quant config was ModelSlimConfig. For Quark MXFP4 checkpoints, the exclude_layers list contains entries prefixed with language_model (e.g., language_model.model.layers.0.self_attn.q_a_proj). Without prefix="language_model", the layer prefixes built during model construction do not match the exclude list, so layers that should stay unquantized are incorrectly quantized.
  2. The Quark quantization path was missing the packed module mapping for the fused QKV-A projection used in DeepseekV3-based models. This mapping is needed to load weights correctly into the fused q_a_proj + kv_a_proj_with_mqa layer.
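
To illustrate the first issue, here is a minimal sketch of how a missing prefix makes an exclude-list lookup fail. The helpers `build_name` and `is_excluded` are hypothetical and use a plain exact-match check; the actual Quark matching logic in sglang may differ (for example, it may support patterns). Only the layer name is taken from the description above.

```python
from typing import List

# Exclude list entry in the format quoted above (from a Quark MXFP4 checkpoint config).
exclude_layers: List[str] = ["language_model.model.layers.0.self_attn.q_a_proj"]


def build_name(prefix: str, suffix: str) -> str:
    """Hypothetical helper: join a parent prefix and a child name, skipping an empty prefix."""
    return f"{prefix}.{suffix}" if prefix else suffix


def is_excluded(layer_name: str, exclude: List[str]) -> bool:
    """Hypothetical exact-match check; assumed here for illustration only."""
    return layer_name in exclude


suffix = "model.layers.0.self_attn.q_a_proj"

# Without the prefix, the constructed name does not match the exclude entry,
# so the layer would be treated as quantized.
without_prefix = build_name("", suffix)
print(is_excluded(without_prefix, exclude_layers))  # False

# With prefix="language_model", the name matches and the layer stays unquantized.
with_prefix = build_name("language_model", suffix)
print(is_excluded(with_prefix, exclude_layers))  # True
```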

Modifications

  • Add QuarkConfig to the isinstance check so DeepseekV3ForCausalLM receives prefix="language_model" for both ModelSlimConfig and QuarkConfig.
  • Add fused_qkv_a_proj_with_mqa to the Quark packed_modules_mapping.
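
As a rough sketch of what the packed module mapping provides, the snippet below maps a per-shard checkpoint name to its fused module name and shard index. The dictionary entry follows the description above (fused q_a_proj + kv_a_proj_with_mqa); the `resolve_fused` helper is hypothetical and is not the loader code from this PR.

```python
from typing import Dict, List, Optional, Tuple

# Fused module name -> names of the checkpoint shards it packs, in order.
packed_modules_mapping: Dict[str, List[str]] = {
    "fused_qkv_a_proj_with_mqa": ["q_a_proj", "kv_a_proj_with_mqa"],
}


def resolve_fused(
    name: str, mapping: Dict[str, List[str]]
) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (fused module name, shard index) when the
    last component of `name` is a shard of a fused module, else None."""
    parent, _, leaf = name.rpartition(".")
    for fused, shards in mapping.items():
        if leaf in shards:
            fused_name = f"{parent}.{fused}" if parent else fused
            return fused_name, shards.index(leaf)
    return None


print(resolve_fused("model.layers.0.self_attn.kv_a_proj_with_mqa", packed_modules_mapping))
print(resolve_fused("model.layers.0.self_attn.o_proj", packed_modules_mapping))
```

Without an entry for the fused module, a lookup like this has nothing to match, which is the gap the second modification closes.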

Server Command and Accuracy Verification

Serving Command:

sglang serve --model-path [Kimi-K2.6-MXFP4] --host 0.0.0.0 --tp 4 --trust-remote-code

Accuracy Verification via lm_eval:

lm_eval --model sglang --model_args pretrained=[Kimi-K2.6-MXFP4],tp_size=4 --tasks gsm8k --batch_size auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9348|±  |0.0068|
|     |       |strict-match    |     5|exact_match|↑  |0.9356|±  |0.0068|

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@hubertlu-tw
Collaborator

@ColinZ22 could you please add the server command you use along with the gsm8k benchmark results? Thanks.

@ColinZ22
Contributor Author

> @ColinZ22 could you please add the server command you use along with the gsm8k benchmark results? Thanks.

Added

@HaiShaw
Collaborator

HaiShaw commented Apr 27, 2026

/tag-and-rerun-ci

@HaiShaw HaiShaw merged commit d49561b into sgl-project:main Apr 27, 2026
157 of 186 checks passed
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026