Tune triton fused moe for the case of glm-4.6-fp8 b200 tp4 by Qiaolin-Yu · Pull Request #15020 · sgl-project/sglang

Qiaolin-Yu · 2025-12-13T01:02:56Z

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process.

gemini-code-assist · 2025-12-13T01:02:59Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

…n_eagle3_npu

* 'main' of https://github.com/sgl-project/sglang: (121 commits)
  Super tiny add gsp-fast-prepare (sgl-project#14992)
  Super tiny fix confusing slash_command_handler hint (sgl-project#14976)
  Super tiny remove unused argument (sgl-project#14966)
  [registry] Add a strict mode to model registration (sgl-project#14933)
  Feature/Fix multi lora scheduler blocking issue and evict LoRA None lastly (sgl-project#14795)
  Tune triton fused moe for the case of glm-4.6-fp8 b200 tp4 (sgl-project#15020)
  [model-gateway] refactor: unify worker management into modular workflow structure (sgl-project#15010)
  Update ci permission (sgl-project#15014)
  Refactor of http and engine entrypoints to allow custom override (sgl-project#14869)
  Add KV4-capable backend flashmla and update server args (sgl-project#14989)
  Revert several PRs (sgl-project#14958)
  Super tiny extract route_typed_request_once (sgl-project#14951)
  Fix CI by reverting incorrect metric check logic (sgl-project#15004)
  [model-gateway] refactor: workflow engine cleanup and minor optimization (sgl-project#15001)
  [model-gateway] fix: handle workflow deadlock and optimize cycle detection (sgl-project#15000)
  [model-gateway] feat: add DAG parallel execution support and workflow optimization (sgl-project#14999)
  [model-gateway] refactor: extract workflow engine to src/workflow module (sgl-project#14996)
  Update CODEOWNERS for multimodal_gen (sgl-project#14995)
  [diffusion] docker: Tiny fix Docker Hub link in installation documentation (sgl-project#14987)
  [PD] Add decode PP event loop for PD disaggregation (sgl-project#14945)
  ...

# Conflicts:
#	python/sglang/srt/model_executor/piecewise_cuda_graph_runner.py

…ct#15020)

upd

712598a

Qiaolin-Yu requested review from BBuf, Edwardf0t1, Fridge003, HaiShaw, Ying1123, ch-wan, ispobock and merrymercy as code owners December 13, 2025 01:02

github-actions Bot added the quant LLM Quantization label Dec 13, 2025

Qiaolin-Yu assigned Fridge003 Dec 13, 2025

Fridge003 approved these changes Dec 13, 2025

View reviewed changes

Fridge003 merged commit 44fd701 into sgl-project:main Dec 13, 2025
50 of 58 checks passed

Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 17, 2025

Tune triton fused moe for the case of glm-4.6-fp8 b200 tp4 (sgl-proje…

e5ba497

…ct#15020)

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

Tune triton fused moe for the case of glm-4.6-fp8 b200 tp4 (sgl-proje…

b49f7fc

…ct#15020)

Fridge003 merged 1 commit into sgl-project:main from Qiaolin-Yu:tune_moe


2 participants
