
Support piecewise cuda graph for fused marlin moe#15100

Merged
ispobock merged 3 commits into sgl-project:main from ispobock:pcg-marlin-moe on Dec 16, 2025


@ispobock (Collaborator) commented Dec 14, 2025

Motivation

Support piecewise CUDA graph for models that use fused_marlin_moe, such as MoE models with GPTQ/AWQ quantization and the Kimi-K2-Thinking model.
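As a rough mental model of "piecewise" capture: the forward pass is split at ops that must run eagerly (typically attention), and each run of ops between split points is captured as its own CUDA graph. The toy sketch below only partitions a list of op names. It is not SGLang's implementation; the function split_into_pieces and the op names are hypothetical, and treating an uncapturable op as an extra split point is an assumption made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def split_into_pieces(ops: Sequence[str], split_ops: Set[str]) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: partition a linear op sequence into
    ("graph", [...]) pieces (would be captured as CUDA graphs) and
    ("eager", [op]) pieces (run outside any graph)."""
    pieces: List[Tuple[str, List[str]]] = []
    current: List[str] = []
    for op in ops:
        if op in split_ops:
            # Close the capturable run collected so far, then emit the split op on its own.
            if current:
                pieces.append(("graph", current))
                current = []
            pieces.append(("eager", [op]))
        else:
            current.append(op)
    if current:
        pieces.append(("graph", current))
    return pieces


# Hypothetical decoder layer; only attention is a split point, so the MoE op
# (assumed capturable) stays inside a graph piece.
layer = ["rmsnorm", "qkv_proj", "attention", "o_proj", "rmsnorm", "fused_marlin_moe"]
for kind, ops in split_into_pieces(layer, {"attention"}):
    print(kind, ops)

# If the MoE op could not be captured, this toy model would have to treat it
# as another split point, producing more, smaller pieces.
print(len(split_into_pieces(layer, {"attention", "fused_marlin_moe"})))
```

With attention as the only split point, the layer yields three pieces; adding fused_marlin_moe as a split point yields four.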

Accuracy Tests

python3 -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B-GPTQ-Int4 --trust-remote-code --enable-piecewise-cuda-graph

python3 benchmark/gsm8k/bench_sglang.py --num-questions 1400 --parallel 1400

Accuracy: 0.903
Invalid: 0.000
Latency: 17.246 s
Output throughput: 9550.159 token/s
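The reported numbers can be sanity-checked with a little arithmetic. Assuming output throughput is total generated tokens divided by end-to-end latency over all 1400 questions (an assumption about how the benchmark script reports it), the run produced roughly 164,702 output tokens, or about 117.6 tokens per answer. Because the inputs are rounded, the derived values are approximate.

```python
# Values copied from the benchmark output above.
latency_s = 17.246           # Latency
output_tok_per_s = 9550.159  # Output throughput
num_questions = 1400         # --num-questions

# Assumption: throughput = total output tokens / latency.
total_output_tokens = latency_s * output_tok_per_s
avg_tokens_per_answer = total_output_tokens / num_questions

print(f"total output tokens ~ {total_output_tokens:,.0f}")
print(f"avg tokens per answer ~ {avg_tokens_per_answer:.1f}")
```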


@ispobock (Collaborator, Author) commented:

/tag-and-rerun-ci


ispobock merged commit b399e3a into sgl-project:main on Dec 16, 2025
366 of 399 checks passed
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 17, 2025
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026