Support piecewise cuda graph for fused marlin moe by ispobock · Pull Request #15100 · sgl-project/sglang

ispobock · 2025-12-14T06:42:38Z

Motivation

Support piecewise CUDA graph for models that use fused_marlin_moe, such as MoE models with GPTQ/AWQ quantization and the Kimi-K2-Thinking model.

Accuracy Tests

python3 -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B-GPTQ-Int4 --trust-remote-code --enable-piecewise-cuda-graph

python3 benchmark/gsm8k/bench_sglang.py --num-questions 1400 --parallel 1400

Accuracy: 0.903
Invalid: 0.000
Latency: 17.246 s
Output throughput: 9550.159 token/s
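
As a quick sanity check on the numbers above, the following minimal sketch derives the implied total output tokens and the average output length per question. It assumes the reported output throughput is total output tokens divided by latency, which this PR does not state; the variable names are illustrative only.

```python
# Figures copied from the benchmark output reported above.
num_questions = 1400
accuracy = 0.903
latency_s = 17.246
output_throughput_tok_s = 9550.159

# Assumption: throughput = total output tokens / latency.
total_output_tokens = output_throughput_tok_s * latency_s
avg_tokens_per_question = total_output_tokens / num_questions

# Accuracy is a fraction of questions, so round to the nearest whole count.
approx_correct = round(accuracy * num_questions)

print(f"implied total output tokens: {total_output_tokens:.0f}")
print(f"average output tokens per question: {avg_tokens_per_question:.1f}")
print(f"approximate correct answers: {approx_correct} of {num_questions}")
```

Under that assumption, the run generated on the order of 165k output tokens, or a little under 120 tokens per question.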

gemini-code-assist · 2025-12-14T06:42:41Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

ispobock · 2025-12-14T07:08:24Z

/tag-and-rerun-ci

ispobock · 2025-12-14T12:08:03Z

CI passed: https://github.com/sgl-project/sglang/actions/runs/20204229586/job/58004459976?pr=15100

fused marlin moe piecewise cuda graph

30ea849

ispobock requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, Ying1123, ch-wan and merrymercy as code owners December 14, 2025 06:42

ispobock added 2 commits December 14, 2025 06:53

add unit test

4e563ab

Merge branch 'main' into pcg-marlin-moe

1928a25

BBuf approved these changes Dec 14, 2025

github-actions Bot added the run-ci label Dec 14, 2025

ispobock merged commit b399e3a into sgl-project:main Dec 16, 2025
366 of 399 checks passed

This was referenced Dec 16, 2025

[Piecewise CUDA Graph] Partially Support Kimi-K2 Thinking #15006

Closed

[Feature] Roadmap for Prefill (Piecewise) CUDA Graph #11490

Closed

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 17, 2025

Support piecewise cuda graph for fused marlin moe (sgl-project#15100)

c1b9add

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

Support piecewise cuda graph for fused marlin moe (sgl-project#15100)

15faa0a

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

Support piecewise cuda graph for fused marlin moe (sgl-project#15100)

2fd73bf

ispobock merged 3 commits into sgl-project:main from ispobock:pcg-marlin-moe
