[AMD] Fix Kimi-K2.6 Quark MXFP4 loading prefix and packed module mapping by ColinZ22 · Pull Request #23408 · sgl-project/sglang

ColinZ22 · 2026-04-21T21:12:12Z

Motivation

This PR fixes Kimi-K2.6 Quark MXFP4 checkpoint loading. It addresses two issues:

1. KimiK25ForConditionalGeneration passed prefix="language_model" to DeepseekV3ForCausalLM only when the quant config was a ModelSlimConfig. For Quark MXFP4 checkpoints, the exclude_layers list contains entries that start with "language_model." (e.g., language_model.model.layers.0.self_attn.q_a_proj). Without prefix="language_model", the layer prefixes built during model construction do not match the exclude list, so layers that should stay unquantized are incorrectly quantized.
2. The Quark quantization path was missing the packed module mapping for the fused QKV-A projection used in DeepseekV3-based models. This mapping is needed to load weights correctly for the fused q_a_proj + kv_a_proj_with_mqa layers.
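
The prefix mismatch behind the first issue can be illustrated with a minimal sketch. The helpers `qualify` and `is_excluded`, and the exact-match rule, are assumptions made for illustration; they are not SGLang's or Quark's actual matching code. Only the example layer name comes from the description above.

```python
# Hypothetical, simplified model of exclude-list matching; not SGLang's real code.

def qualify(prefix: str, name: str) -> str:
    """Join a parent prefix and a child module name with a dot."""
    return f"{prefix}.{name}" if prefix else name


def is_excluded(layer_name: str, exclude_layers: list[str]) -> bool:
    """Assumed rule: a layer stays unquantized only on an exact name match."""
    return layer_name in exclude_layers


# Entry format taken from the PR description.
exclude_layers = ["language_model.model.layers.0.self_attn.q_a_proj"]
local_name = "model.layers.0.self_attn.q_a_proj"

without_prefix = qualify("", local_name)
with_prefix = qualify("language_model", local_name)

print(is_excluded(without_prefix, exclude_layers))  # False: layer would be quantized
print(is_excluded(with_prefix, exclude_layers))     # True: layer stays unquantized
```

Under these assumptions, the name built without the prefix never matches the checkpoint's exclude entry, which is the failure mode described in the first issue.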

Modifications

- Add QuarkConfig to the isinstance check so DeepseekV3ForCausalLM receives prefix="language_model" for both ModelSlimConfig and QuarkConfig.
- Add fused_qkv_a_proj_with_mqa to the Quark packed_modules_mapping.
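
A rough sketch of the two changes follows. The classes are empty stand-ins for the real config types, `language_model_prefix` and `shard_names` are hypothetical helpers, the empty-prefix fallback is an assumption, and the shard list for `fused_qkv_a_proj_with_mqa` is inferred from the description above (fused q_a_proj + kv_a_proj_with_mqa). None of this is copied from the PR diff.

```python
# Stand-ins for the real quantization config classes (assumption: names only).
class ModelSlimConfig: ...
class QuarkConfig: ...
class OtherConfig: ...


# Assumed shape of the new mapping entry: fused module name -> checkpoint shard names.
PACKED_MODULES_MAPPING = {
    "fused_qkv_a_proj_with_mqa": ["q_a_proj", "kv_a_proj_with_mqa"],
}


def language_model_prefix(quant_config) -> str:
    """Return the prefix to pass to the language model for this quant config."""
    if isinstance(quant_config, (ModelSlimConfig, QuarkConfig)):
        return "language_model"
    return ""  # assumption: other configs get no prefix


def shard_names(layer_name: str) -> list[str]:
    """Expand a fused layer name into the per-shard names found in a checkpoint."""
    parent, _, leaf = layer_name.rpartition(".")
    shards = PACKED_MODULES_MAPPING.get(leaf, [leaf])
    return [f"{parent}.{shard}" if parent else shard for shard in shards]


print(language_model_prefix(QuarkConfig()))
print(shard_names("language_model.model.layers.0.self_attn.fused_qkv_a_proj_with_mqa"))
```

With a mapping like this, a loader or an exclude-list check can resolve the fused module back to the q_a_proj and kv_a_proj_with_mqa names that appear in the checkpoint.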

Server Command and Accuracy Verification

Serving Command:

sglang serve --model-path [Kimi-K2.6-MXFP4] --host 0.0.0.0 --tp 4 --trust-remote-code

Accuracy Verification via lm_eval:

lm_eval --model sglang --model_args pretrained=[Kimi-K2.6-MXFP4],tp_size=4 --tasks gsm8k --batch_size auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9348|±  |0.0068|
|     |       |strict-match    |     5|exact_match|↑  |0.9356|±  |0.0068|
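
As a sanity check on the table, the reported standard error is consistent with a binomial estimate over the full GSM8K test split. The sketch below assumes n = 1319 questions (the size of the GSM8K test set) and a sample-variance formula; neither is stated in the PR, and lm_eval's exact computation may differ.

```python
import math

N_QUESTIONS = 1319  # assumption: full GSM8K test split, not stated in the PR


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample variance (n - 1)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


for name, acc in [("flexible-extract", 0.9348), ("strict-match", 0.9356)]:
    print(f"{name}: {binomial_stderr(acc, N_QUESTIONS):.4f}")
```

Both values round to 0.0068, matching the Stderr column. The 0.0008 gap between the two filters is well inside one standard error.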

Review and Merge Process

1. Ping Merge Oncalls to start the process. See the PR Merge Process.
2. Get approvals from CODEOWNERS and other reviewers.
3. Trigger CI tests with comments or contact authorized users to do so.
   - Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

hubertlu-tw · 2026-04-21T21:38:31Z

@ColinZ22 could you please add the server command you use along with the gsm8k benchmark results? Thanks.

ColinZ22 · 2026-04-22T18:32:37Z

> @ColinZ22 could you please add the server command you use along with the gsm8k benchmark results? Thanks.

Added

HaiShaw · 2026-04-27T04:32:28Z

/tag-and-rerun-ci

ColinZ22 added 2 commits April 21, 2026 21:09

Fix Kimi-K2.6 quark MXFP4 checkpoint loading

3dee9e6

Merge branch 'main' into Kimi-K2.6_fix

624c274

hubertlu-tw added the amd and run-ci labels Apr 21, 2026

hubertlu-tw requested review from BowenBao, HaiShaw and hubertlu-tw April 21, 2026 21:38

BowenBao approved these changes Apr 22, 2026

HaiShaw approved these changes Apr 27, 2026

HaiShaw merged commit d49561b into sgl-project:main Apr 27, 2026
157 of 186 checks passed

vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026

[AMD] Fix Kimi-K2.6 Quark MXFP4 loading prefix and packed module mapping (sgl-project#23408)

1b3c066

HaiShaw merged 2 commits into sgl-project:main from ColinZ22:Kimi-K2.6_fix

5 participants
