Integrate DeepGemm contiguous group gemm into Fused MoE #4343

Closed
laixinn wants to merge 9 commits into sgl-project:main from laixinn:deep-gemm-contiguous

Conversation

laixinn (Contributor) commented Mar 12, 2025

Motivation

Integrate DeepGemm's m_grouped_gemm_fp8_fp8_bf16_nt_contiguous as the default group GEMM kernel for Fused MoE. Depends on #4165.
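
For context, this is roughly how the contiguous kernel is invoked (a minimal sketch following the DeepGEMM README of the time; the shapes, scale layouts, and helper names are assumptions, not code from this PR):

```python
import torch
import deep_gemm

# Per-expert problem sizes; each expert's row block is padded to the
# M alignment the contiguous layout requires (128 at the time of writing).
num_experts, n, k = 8, 4096, 7168
m_per_expert = deep_gemm.get_m_alignment_for_contiguous_layout()
m_sum = num_experts * m_per_expert

# LHS: all tokens concatenated expert by expert, FP8 values plus one scale
# per 128-column block (the LHS scales need a TMA-aligned layout, hence the helper).
x_fp8 = torch.randn(m_sum, k, device="cuda").to(torch.float8_e4m3fn)
x_scale = deep_gemm.get_col_major_tma_aligned_tensor(
    torch.ones(m_sum, k // 128, device="cuda", dtype=torch.float32))

# RHS: one [n, k] weight matrix per expert, FP8 values plus 128x128-block scales.
w_fp8 = torch.randn(num_experts, n, k, device="cuda").to(torch.float8_e4m3fn)
w_scale = torch.ones(num_experts, n // 128, k // 128, device="cuda",
                     dtype=torch.float32)

out = torch.empty(m_sum, n, device="cuda", dtype=torch.bfloat16)

# m_indices[i] = expert that owns row i; rows of one expert sit contiguously.
m_indices = torch.arange(num_experts, device="cuda", dtype=torch.int32)
m_indices = m_indices.repeat_interleave(m_per_expert)

deep_gemm.m_grouped_gemm_fp8_fp8_bf16_nt_contiguous(
    (x_fp8, x_scale), (w_fp8, w_scale), out, m_indices)
```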

Modifications

Checklist

Co-authored-by: yinfan98 <1106310035@qq.com>
sleepcoo requested review from sleepcoo and zhyncs on March 12, 2025
laixinn force-pushed the deep-gemm-contiguous branch from f2ac813 to ce2677b on March 14, 2025
laixinn force-pushed the deep-gemm-contiguous branch from 13b3cdc to e5d3d3a on March 14, 2025
merrymercy mentioned this pull request on Feb 24, 2025
ch-wan (Collaborator) commented Mar 26, 2025

@laixinn Is there any plan to support masked gemm? It can be integrated with low_latency_dispatch seamlessly.
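
(For reference, the masked variant looks roughly like this per the DeepGEMM README of the time; the names and shapes here are assumptions, not code from this PR. Each group computes only its first masked_m[g] rows, which matches the fixed-capacity buffers low_latency_dispatch returns.)

```python
import torch
import deep_gemm

num_groups, m_max, n, k = 8, 128, 4096, 7168

# Fixed-capacity per-expert buffers, as a low-latency dispatch would produce.
x_fp8 = torch.randn(num_groups, m_max, k, device="cuda").to(torch.float8_e4m3fn)
x_scale = torch.ones(num_groups, m_max, k // 128, device="cuda",
                     dtype=torch.float32)
w_fp8 = torch.randn(num_groups, n, k, device="cuda").to(torch.float8_e4m3fn)
w_scale = torch.ones(num_groups, n // 128, k // 128, device="cuda",
                     dtype=torch.float32)
out = torch.empty(num_groups, m_max, n, device="cuda", dtype=torch.bfloat16)

# masked_m[g]: number of valid rows for expert g; rows beyond it are skipped.
masked_m = torch.randint(1, m_max, (num_groups,), device="cuda",
                         dtype=torch.int32)
expected_m = m_max // 2  # scheduling hint for the kernel

deep_gemm.m_grouped_gemm_fp8_fp8_bf16_nt_masked(
    (x_fp8, x_scale), (w_fp8, w_scale), out, masked_m, expected_m)
```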

laixinn (Contributor, Author) commented Mar 26, 2025

@ch-wan I heard some EP features are being developed on top of the masked GEMM.

ch-wan mentioned this pull request on Mar 26, 2025
laixinn (Contributor, Author) commented Mar 26, 2025

This PR passes the unit tests and supports CUDA graph, but the pre- and post-processing overhead is currently unacceptable. Optimizing it will take a while.
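
To make the cost concrete, the pre-processing amounts to roughly the permutation below: rows must be regrouped per expert and padded to the GEMM's M alignment before the kernel call, then scattered back to token order afterwards (a hypothetical sketch, not this PR's actual code):

```python
import torch

def permute_for_contiguous_gemm(topk_ids: torch.Tensor, num_experts: int,
                                alignment: int):
    """topk_ids: flat [num_tokens * top_k] tensor of expert ids, one per
    (token, expert) pair. Returns a gather order plus the m_indices the
    contiguous GEMM expects."""
    sorted_ids = torch.argsort(topk_ids)              # rows grouped by expert
    counts = torch.bincount(topk_ids, minlength=num_experts)
    padded = (counts + alignment - 1) // alignment * alignment  # align groups
    m_indices = torch.repeat_interleave(
        torch.arange(num_experts, dtype=torch.int32, device=topk_ids.device),
        padded)
    # A real integration must also map each padded slot back to its source row
    # (or mark it invalid) and scatter the GEMM output back to token order.
    return sorted_ids, m_indices, padded

# Example: 6 (token, expert) pairs over 4 experts, alignment 4.
topk_ids = torch.tensor([2, 0, 1, 0, 2, 2])
sorted_ids, m_indices, padded = permute_for_contiguous_gemm(topk_ids, 4, 4)
print(padded.tolist())  # [4, 4, 4, 0] -> expert 3 holds no rows
```

The gather into the expert-contiguous layout, plus the matching scatter back after the GEMM, is pure data movement on top of the kernel itself, which is where this overhead comes from.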

ch-wan (Collaborator) commented Mar 26, 2025

That sounds reasonable. The current fused_moe kernel matches the data layout of inference both without EP and with --enable-ep, so changing the data layout just to call DeepGEMM may incur unnecessary overhead.

How about we focus on integrating DeepGEMM with DeepEP? Our current implementation also requires pre- and post-processing before computing the grouped GEMM (see #4643), and DeepGEMM may achieve better performance in that case. Note that CUDA graph has to be disabled if we focus on DeepEP.

laixinn (Contributor, Author) commented Mar 26, 2025

@ch-wan Exactly. I suppose the DeepGEMM kernels are designed for EP.

merrymercy closed this on Apr 21, 2025