[DeepseekV32] Enable flashmla_prefill kernel with fp8 kvcache #11655
Merged
Fridge003 merged 4 commits into sgl-project:main (Oct 28, 2025)
Conversation
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Fridge003 reviewed Oct 25, 2025
Collaborator
@hlu1 Please fix the bug here https://github.com/sgl-project/sglang/actions/runs/18812434633/job/53676478829?pr=11655

Author
Will do.
Fixed.
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Motivation
- Add logic to dequantize the kvcache from fp8 to bf16 in a separate kernel (`dequantize_k_cache_paged`) and use the `flashmla_prefill` kernel with fp8 kvcache.
- Add a `flashmla_auto` mode and use it as the default mode for prefill when fp8 kvcache is enabled.
- Use `flashmla_kv` with fp8 kvcache when spec decoding is detected.

Attention kernels used:
- `flashmla_decode` (before)
- `flashmla_prefill` with no kvcache reuse or chunked prefill (after)
- `flashmla_prefill` with kvcache reuse or chunked prefill (after)

(profiler screenshots omitted)
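As a rough illustration of the dequant step, here is a pure-Python sketch of dequantizing a paged K cache with per-block scales. The function name echoes `dequantize_k_cache_paged`, but the layout, signature, and the int-plus-scale stand-in for fp8 storage are all assumptions for illustration, not the actual kernel:

```python
def dequantize_k_cache_paged_sketch(k_cache_q, k_scales, block_table, seq_len):
    """Gather one sequence's pages from a paged, quantized K cache and
    dequantize them to floats (illustrative sketch only).

    k_cache_q:   list of physical blocks; each block is a list of rows,
                 each row a list of quantized ints (stand-in for fp8 values).
    k_scales:    per-block dequantization scale (stand-in for fp8 scales).
    block_table: physical block ids of this sequence's pages, in order.
    seq_len:     number of valid tokens (the last page may be partial).
    """
    out = []
    for blk in block_table:
        scale = k_scales[blk]
        for row in k_cache_q[blk]:
            if len(out) == seq_len:  # stop inside a partial last page
                return out
            # dequant: quantized value times the block's scale
            out.append([v * scale for v in row])
    return out
```

In the real kernel this is a GPU pass producing bf16; the point is only that dequantization is a gather-plus-scale step separate from the attention kernel itself.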
Accuracy Tests
gpqa (with an early fp4 checkpoint)
with fp8 kvcache
- before: ['0.768', '0.823', '0.773', '0.783']
- after: ['0.788', '0.798', '0.773', '0.798']
with bf16 kvcache
- after: ['0.818', '0.818', '0.828', '0.758']
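The mode selection described in the motivation (default to `flashmla_prefill` with fp8 kvcache, but use `flashmla_kv` when spec decoding is detected) can be sketched as a simple dispatch. The function name and string flags here are illustrative assumptions, not sglang's actual API:

```python
def choose_prefill_attention_mode(kv_cache_dtype: str, is_spec_decoding: bool) -> str:
    """Illustrative sketch of the flashmla_auto prefill dispatch (hypothetical API)."""
    if kv_cache_dtype == "fp8":
        if is_spec_decoding:
            # spec decoding detected: use flashmla_kv with the fp8 kvcache directly
            return "flashmla_kv"
        # default for fp8 kvcache: dequantize to bf16, then run flashmla_prefill
        return "flashmla_prefill"
    # bf16 kvcache: flashmla_prefill runs directly, no dequant pass needed
    return "flashmla_prefill"
```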
Benchmarking and Profiling
With fp4 checkpoint:
- before: input throughput 12536.68 tok/s
- after: input throughput 14610.87 tok/s
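For reference, the reported input-throughput numbers work out to roughly a 16.5% improvement:

```python
before_tok_s = 12536.68  # input throughput before, from the numbers above
after_tok_s = 14610.87   # input throughput after
speedup = after_tok_s / before_tok_s
improvement_pct = (speedup - 1.0) * 100
# roughly 1.17x, i.e. about 16.5% higher input throughput
```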
Checklist